From patchwork Sun Sep 21 20:12:14 2025
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 120585
From: Uros Bizjak
To: libc-alpha@sourceware.org, hjl.tools@gmail.com
Cc: Uros Bizjak, Adhemerval Zanella Netto, Florian Weimer
Subject: [PATCH v2 2/2] x86: Use "%v" to emit VEX encoded instructions for AVX targets
Date: Sun, 21 Sep 2025 22:12:14 +0200
Message-ID: <20250921201252.3680335-2-ubizjak@gmail.com>
In-Reply-To: <20250921201252.3680335-1-ubizjak@gmail.com>
References: <20250921201252.3680335-1-ubizjak@gmail.com>

Legacy encodings of SSE instructions incur AVX-SSE domain transition
penalties on some Intel microarchitectures (e.g. Haswell, Broadwell).
Using the VEX forms avoids these penalties and keeps all instructions
in the VEX decode domain.
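As an illustration only (not part of the patch), the helpers below read and write MXCSR with the same "%v" template the patch adopts. GCC expands "%v" at the start of an opcode to "v" when compiling for AVX (for example with -mavx) and drops it otherwise, so comparing `gcc -O2 -S` output with and without -mavx should show stmxcsr/ldmxcsr versus vstmxcsr/vldmxcsr. The names read_mxcsr and write_mxcsr are hypothetical, and the sketch assumes GCC targeting x86 or x86-64 with SSE available.

```c
/* Hypothetical MXCSR helpers illustrating GCC's "%v" opcode prefix.
   Assumes GCC targeting x86 or x86-64 with SSE available.  */
#if !defined __x86_64__ && !defined __i386__
# error "this sketch targets x86 only"
#endif

#include <assert.h>

/* Read MXCSR.  "%v" expands to "v" when compiling for AVX and to
   nothing otherwise, so this assembles to vstmxcsr or stmxcsr.  */
static inline unsigned int
read_mxcsr (void)
{
  unsigned int mxcsr;
  __asm__ __volatile__ ("%vstmxcsr %0" : "=m" (mxcsr));
  return mxcsr;
}

/* Write MXCSR; assembles to vldmxcsr or ldmxcsr.  */
static inline void
write_mxcsr (unsigned int mxcsr)
{
  /* Bits 16-31 of MXCSR are reserved; loading them would fault.  */
  assert ((mxcsr & 0xffff0000u) == 0);
  __asm__ __volatile__ ("%vldmxcsr %0" : : "m" (mxcsr));
}
```

Because the prefix is resolved by the compiler rather than the preprocessor, no `__AVX__` test is needed in the source, which is what allows the STMXCSR/LDMXCSR and AVX_INSN_PREFIX macros to go away.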
Use the "%v" sequence to emit the "v" prefix for opcodes when
compiling with -mavx.

No functional changes intended.

Signed-off-by: Uros Bizjak
Cc: Adhemerval Zanella Netto
Reviewed-by: Florian Weimer
Reviewed-by: H.J. Lu
---
v2: Split from v1 patch and describe the reason for change.
---
 sysdeps/i386/fpu/fclrexcpt.c      |  4 +--
 sysdeps/i386/fpu/fedisblxcpt.c    |  4 +--
 sysdeps/i386/fpu/feenablxcpt.c    |  4 +--
 sysdeps/i386/fpu/fegetenv.c       |  2 +-
 sysdeps/i386/fpu/fegetmode.c      |  2 +-
 sysdeps/i386/fpu/feholdexcpt.c    |  4 +--
 sysdeps/i386/fpu/fesetenv.c       |  4 +--
 sysdeps/i386/fpu/fesetexcept.c    |  4 +--
 sysdeps/i386/fpu/fesetmode.c      |  4 +--
 sysdeps/i386/fpu/fesetround.c     |  4 +--
 sysdeps/i386/fpu/feupdateenv.c    |  2 +-
 sysdeps/i386/fpu/fgetexcptflg.c   |  2 +-
 sysdeps/i386/fpu/fsetexcptflg.c   |  4 +--
 sysdeps/i386/fpu/ftestexcept.c    |  2 +-
 sysdeps/i386/setfpucw.c           |  4 +--
 sysdeps/x86/fpu/fenv_private.h    | 44 +++++++++++++------------------
 sysdeps/x86/fpu/sfp-machine.h     |  8 +-----
 sysdeps/x86/fpu/test-fenv-sse-2.c |  4 +--
 sysdeps/x86_64/fpu/fclrexcpt.c    |  4 +--
 sysdeps/x86_64/fpu/fedisblxcpt.c  |  4 +--
 sysdeps/x86_64/fpu/feenablxcpt.c  |  4 +--
 sysdeps/x86_64/fpu/fegetenv.c     |  2 +-
 sysdeps/x86_64/fpu/fegetmode.c    |  2 +-
 sysdeps/x86_64/fpu/feholdexcpt.c  |  4 +--
 sysdeps/x86_64/fpu/fesetenv.c     |  4 +--
 sysdeps/x86_64/fpu/fesetexcept.c  |  4 +--
 sysdeps/x86_64/fpu/fesetmode.c    |  4 +--
 sysdeps/x86_64/fpu/fesetround.c   |  4 +--
 sysdeps/x86_64/fpu/feupdateenv.c  |  2 +-
 sysdeps/x86_64/fpu/fgetexcptflg.c |  2 +-
 sysdeps/x86_64/fpu/fraiseexcpt.c  |  4 +--
 sysdeps/x86_64/fpu/fsetexcptflg.c |  4 +--
 sysdeps/x86_64/fpu/ftestexcept.c  |  2 +-
 33 files changed, 71 insertions(+), 85 deletions(-)

diff --git a/sysdeps/i386/fpu/fclrexcpt.c b/sysdeps/i386/fpu/fclrexcpt.c
index 17012635f1..39bcf3de59 100644
--- a/sysdeps/i386/fpu/fclrexcpt.c
+++ b/sysdeps/i386/fpu/fclrexcpt.c
@@ -44,13 +44,13 @@ __feclearexcept (int excepts)
       unsigned int xnew_exc;
 
       /* Get the current MXCSR. */
-      __asm__ ("stmxcsr %0" : "=m" (xnew_exc));
+      __asm__ ("%vstmxcsr %0" : "=m" (xnew_exc));
 
       /* Clear the relevant bits. */
       xnew_exc &= ~excepts;
 
       /* Put the new data in effect. */
-      __asm__ ("ldmxcsr %0" : : "m" (xnew_exc));
+      __asm__ ("%vldmxcsr %0" : : "m" (xnew_exc));
     }
 
   /* Success. */
diff --git a/sysdeps/i386/fpu/fedisblxcpt.c b/sysdeps/i386/fpu/fedisblxcpt.c
index c2f59231a6..a2dfa8e4c9 100644
--- a/sysdeps/i386/fpu/fedisblxcpt.c
+++ b/sysdeps/i386/fpu/fedisblxcpt.c
@@ -41,11 +41,11 @@ fedisableexcept (int excepts)
       unsigned int xnew_exc;
 
       /* Get the current control word. */
-      __asm__ ("stmxcsr %0" : "=m" (xnew_exc));
+      __asm__ ("%vstmxcsr %0" : "=m" (xnew_exc));
 
       xnew_exc |= excepts << 7;
 
-      __asm__ ("ldmxcsr %0" : : "m" (xnew_exc));
+      __asm__ ("%vldmxcsr %0" : : "m" (xnew_exc));
     }
 
   return old_exc;
diff --git a/sysdeps/i386/fpu/feenablxcpt.c b/sysdeps/i386/fpu/feenablxcpt.c
index bffcc02bd8..fa1d82a4b6 100644
--- a/sysdeps/i386/fpu/feenablxcpt.c
+++ b/sysdeps/i386/fpu/feenablxcpt.c
@@ -41,11 +41,11 @@ feenableexcept (int excepts)
       unsigned int xnew_exc;
 
       /* Get the current control word. */
-      __asm__ ("stmxcsr %0" : "=m" (xnew_exc));
+      __asm__ ("%vstmxcsr %0" : "=m" (xnew_exc));
 
       xnew_exc &= ~(excepts << 7);
 
-      __asm__ ("ldmxcsr %0" : : "m" (xnew_exc));
+      __asm__ ("%vldmxcsr %0" : : "m" (xnew_exc));
     }
 
   return old_exc;
diff --git a/sysdeps/i386/fpu/fegetenv.c b/sysdeps/i386/fpu/fegetenv.c
index 0d2b87db93..5b35577151 100644
--- a/sysdeps/i386/fpu/fegetenv.c
+++ b/sysdeps/i386/fpu/fegetenv.c
@@ -30,7 +30,7 @@ __fegetenv (fenv_t *envp)
   __asm__ ("fldenv %0" : : "m" (*envp));
 
   if (CPU_FEATURE_USABLE (SSE))
-    __asm__ ("stmxcsr %0" : "=m" (envp->__eip));
+    __asm__ ("%vstmxcsr %0" : "=m" (envp->__eip));
 
   /* Success. */
   return 0;
diff --git a/sysdeps/i386/fpu/fegetmode.c b/sysdeps/i386/fpu/fegetmode.c
index 41275e1036..8b109072f5 100644
--- a/sysdeps/i386/fpu/fegetmode.c
+++ b/sysdeps/i386/fpu/fegetmode.c
@@ -26,6 +26,6 @@ fegetmode (femode_t *modep)
 {
   _FPU_GETCW (modep->__control_word);
   if (CPU_FEATURE_USABLE (SSE))
-    __asm__ ("stmxcsr %0" : "=m" (modep->__mxcsr));
+    __asm__ ("%vstmxcsr %0" : "=m" (modep->__mxcsr));
   return 0;
 }
diff --git a/sysdeps/i386/fpu/feholdexcpt.c b/sysdeps/i386/fpu/feholdexcpt.c
index cd4b357d74..f6f6b70dd4 100644
--- a/sysdeps/i386/fpu/feholdexcpt.c
+++ b/sysdeps/i386/fpu/feholdexcpt.c
@@ -33,12 +33,12 @@ __feholdexcept (fenv_t *envp)
       unsigned int xwork;
 
       /* Get the current control word. */
-      __asm__ ("stmxcsr %0" : "=m" (envp->__eip));
+      __asm__ ("%vstmxcsr %0" : "=m" (envp->__eip));
 
       /* Set all exceptions to non-stop and clear them. */
       xwork = (envp->__eip | 0x1f80) & ~0x3f;
 
-      __asm__ ("ldmxcsr %0" : : "m" (xwork));
+      __asm__ ("%vldmxcsr %0" : : "m" (xwork));
     }
 
   return 0;
diff --git a/sysdeps/i386/fpu/fesetenv.c b/sysdeps/i386/fpu/fesetenv.c
index 3fec7af43a..e6b276a0fc 100644
--- a/sysdeps/i386/fpu/fesetenv.c
+++ b/sysdeps/i386/fpu/fesetenv.c
@@ -80,7 +80,7 @@ __fesetenv (const fenv_t *envp)
   if (CPU_FEATURE_USABLE (SSE))
     {
       unsigned int mxcsr;
-      __asm__ ("stmxcsr %0" : "=m" (mxcsr));
+      __asm__ ("%vstmxcsr %0" : "=m" (mxcsr));
 
       if (envp == FE_DFL_ENV)
         {
@@ -111,7 +111,7 @@ __fesetenv (const fenv_t *envp)
       else
         mxcsr = envp->__eip;
 
-      __asm__ ("ldmxcsr %0" : : "m" (mxcsr));
+      __asm__ ("%vldmxcsr %0" : : "m" (mxcsr));
     }
 
   /* Success. */
diff --git a/sysdeps/i386/fpu/fesetexcept.c b/sysdeps/i386/fpu/fesetexcept.c
index 7d1a4c5b52..876bde233f 100644
--- a/sysdeps/i386/fpu/fesetexcept.c
+++ b/sysdeps/i386/fpu/fesetexcept.c
@@ -33,13 +33,13 @@ fesetexcept (int excepts)
     {
       /* Get the control word of the SSE unit. */
       unsigned int mxcsr;
-      __asm__ ("stmxcsr %0" : "=m" (mxcsr));
+      __asm__ ("%vstmxcsr %0" : "=m" (mxcsr));
 
       /* Set relevant flags. */
       mxcsr |= excepts;
 
       /* Put the new data in effect. */
-      __asm__ ("ldmxcsr %0" : : "m" (mxcsr));
+      __asm__ ("%vldmxcsr %0" : : "m" (mxcsr));
     }
   else
     {
diff --git a/sysdeps/i386/fpu/fesetmode.c b/sysdeps/i386/fpu/fesetmode.c
index eab0a5d683..ee61ca1cec 100644
--- a/sysdeps/i386/fpu/fesetmode.c
+++ b/sysdeps/i386/fpu/fesetmode.c
@@ -37,7 +37,7 @@ fesetmode (const femode_t *modep)
   if (CPU_FEATURE_USABLE (SSE))
     {
       unsigned int mxcsr;
-      __asm__ ("stmxcsr %0" : "=m" (mxcsr));
+      __asm__ ("%vstmxcsr %0" : "=m" (mxcsr));
 
       /* Preserve SSE exception flags but restore other state in MXCSR. */
       mxcsr &= FE_ALL_EXCEPT_X86;
@@ -47,7 +47,7 @@ fesetmode (const femode_t *modep)
         mxcsr |= FE_ALL_EXCEPT_X86 << 7;
       else
         mxcsr |= modep->__mxcsr & ~FE_ALL_EXCEPT_X86;
-      __asm__ ("ldmxcsr %0" : : "m" (mxcsr));
+      __asm__ ("%vldmxcsr %0" : : "m" (mxcsr));
     }
   return 0;
 }
diff --git a/sysdeps/i386/fpu/fesetround.c b/sysdeps/i386/fpu/fesetround.c
index bd976a4755..e87d794319 100644
--- a/sysdeps/i386/fpu/fesetround.c
+++ b/sysdeps/i386/fpu/fesetround.c
@@ -39,10 +39,10 @@ __fesetround (int round)
     {
       unsigned int xcw;
 
-      __asm__ ("stmxcsr %0" : "=m" (xcw));
+      __asm__ ("%vstmxcsr %0" : "=m" (xcw));
       xcw &= ~0x6000;
       xcw |= round << 3;
-      __asm__ ("ldmxcsr %0" : : "m" (xcw));
+      __asm__ ("%vldmxcsr %0" : : "m" (xcw));
     }
 
   return 0;
diff --git a/sysdeps/i386/fpu/feupdateenv.c b/sysdeps/i386/fpu/feupdateenv.c
index f8ad46db51..9e1ad97118 100644
--- a/sysdeps/i386/fpu/feupdateenv.c
+++ b/sysdeps/i386/fpu/feupdateenv.c
@@ -31,7 +31,7 @@ __feupdateenv (const fenv_t *envp)
 
   /* If the CPU supports SSE we test the MXCSR as well. */
   if (CPU_FEATURE_USABLE (SSE))
-    __asm__ ("stmxcsr %0" : "=m" (xtemp));
+    __asm__ ("%vstmxcsr %0" : "=m" (xtemp));
 
   temp = (temp | xtemp) & FE_ALL_EXCEPT;
 
diff --git a/sysdeps/i386/fpu/fgetexcptflg.c b/sysdeps/i386/fpu/fgetexcptflg.c
index da2f00a91a..36dd297cdc 100644
--- a/sysdeps/i386/fpu/fgetexcptflg.c
+++ b/sysdeps/i386/fpu/fgetexcptflg.c
@@ -37,7 +37,7 @@ __fegetexceptflag (fexcept_t *flagp, int excepts)
       unsigned int sse_exc;
 
       /* Get the current MXCSR. */
-      __asm__ ("stmxcsr %0" : "=m" (sse_exc));
+      __asm__ ("%vstmxcsr %0" : "=m" (sse_exc));
 
       *flagp |= sse_exc & excepts & FE_ALL_EXCEPT;
     }
diff --git a/sysdeps/i386/fpu/fsetexcptflg.c b/sysdeps/i386/fpu/fsetexcptflg.c
index 49c2facf37..b78d1dcd3c 100644
--- a/sysdeps/i386/fpu/fsetexcptflg.c
+++ b/sysdeps/i386/fpu/fsetexcptflg.c
@@ -50,13 +50,13 @@ __fesetexceptflag (const fexcept_t *flagp, int excepts)
       __asm__ ("fldenv %0" : : "m" (temp));
 
       /* And now similarly for SSE. */
-      __asm__ ("stmxcsr %0" : "=m" (mxcsr));
+      __asm__ ("%vstmxcsr %0" : "=m" (mxcsr));
 
       /* Clear or set relevant flags. */
       mxcsr ^= (mxcsr ^ *flagp) & excepts;
 
       /* Put the new data in effect. */
-      __asm__ ("ldmxcsr %0" : : "m" (mxcsr));
+      __asm__ ("%vldmxcsr %0" : : "m" (mxcsr));
     }
   else
     {
diff --git a/sysdeps/i386/fpu/ftestexcept.c b/sysdeps/i386/fpu/ftestexcept.c
index 3b966c2095..51abfd3917 100644
--- a/sysdeps/i386/fpu/ftestexcept.c
+++ b/sysdeps/i386/fpu/ftestexcept.c
@@ -31,7 +31,7 @@ __fetestexcept (int excepts)
 
   /* If the CPU supports SSE we test the MXCSR as well. */
   if (CPU_FEATURE_USABLE (SSE))
-    __asm__ ("stmxcsr %0" : "=m" (xtemp));
+    __asm__ ("%vstmxcsr %0" : "=m" (xtemp));
 
   return (temp | xtemp) & excepts & FE_ALL_EXCEPT;
 }
diff --git a/sysdeps/i386/setfpucw.c b/sysdeps/i386/setfpucw.c
index 9b13425682..8438c7ed75 100644
--- a/sysdeps/i386/setfpucw.c
+++ b/sysdeps/i386/setfpucw.c
@@ -43,11 +43,11 @@ __setfpucw (fpu_control_t set)
       unsigned int xnew_exc;
 
       /* Get the current MXCSR. */
-      __asm__ ("stmxcsr %0" : "=m" (xnew_exc));
+      __asm__ ("%vstmxcsr %0" : "=m" (xnew_exc));
 
       xnew_exc &= ~((0xc00 << 3) | (FE_ALL_EXCEPT << 7));
       xnew_exc |= ((set & 0xc00) << 3) | ((set & FE_ALL_EXCEPT) << 7);
 
-      __asm__ ("ldmxcsr %0" : : "m" (xnew_exc));
+      __asm__ ("%vldmxcsr %0" : : "m" (xnew_exc));
     }
 }
diff --git a/sysdeps/x86/fpu/fenv_private.h b/sysdeps/x86/fpu/fenv_private.h
index 22036654e9..c9b573cacd 100644
--- a/sysdeps/x86/fpu/fenv_private.h
+++ b/sysdeps/x86/fpu/fenv_private.h
@@ -18,22 +18,14 @@
    need not care for both the 387 and the sse unit, only the one we're
    actually using. */
 
-#if defined __AVX__ || defined SSE2AVX
-# define STMXCSR "vstmxcsr"
-# define LDMXCSR "vldmxcsr"
-#else
-# define STMXCSR "stmxcsr"
-# define LDMXCSR "ldmxcsr"
-#endif
-
 static __always_inline void
 libc_feholdexcept_sse (fenv_t *e)
 {
   unsigned int mxcsr;
-  asm (STMXCSR " %0" : "=m" (mxcsr));
+  asm ("%vstmxcsr %0" : "=m" (mxcsr));
   e->__mxcsr = mxcsr;
   mxcsr = (mxcsr | 0x1f80) & ~0x3f;
-  asm volatile (LDMXCSR " %0" : : "m" (mxcsr));
+  asm volatile ("%vldmxcsr %0" : : "m" (mxcsr));
 }
 
 static __always_inline void
@@ -51,9 +43,9 @@ static __always_inline void
 libc_fesetround_sse (int r)
 {
   unsigned int mxcsr;
-  asm (STMXCSR " %0" : "=m" (mxcsr));
+  asm ("%vstmxcsr %0" : "=m" (mxcsr));
   mxcsr = (mxcsr & ~0x6000) | (r << 3);
-  asm volatile (LDMXCSR " %0" : : "m" (mxcsr));
+  asm volatile ("%vldmxcsr %0" : : "m" (mxcsr));
 }
 
 static __always_inline void
@@ -69,10 +61,10 @@ static __always_inline void
 libc_feholdexcept_setround_sse (fenv_t *e, int r)
 {
   unsigned int mxcsr;
-  asm (STMXCSR " %0" : "=m" (mxcsr));
+  asm ("%vstmxcsr %0" : "=m" (mxcsr));
   e->__mxcsr = mxcsr;
   mxcsr = ((mxcsr | 0x1f80) & ~0x603f) | (r << 3);
-  asm volatile (LDMXCSR " %0" : : "m" (mxcsr));
+  asm volatile ("%vldmxcsr %0" : : "m" (mxcsr));
 }
 
 /* Set both rounding mode and precision. A convenience function for use
@@ -104,7 +96,7 @@ static __always_inline int
 libc_fetestexcept_sse (int e)
 {
   unsigned int mxcsr;
-  asm volatile (STMXCSR " %0" : "=m" (mxcsr));
+  asm volatile ("%vstmxcsr %0" : "=m" (mxcsr));
   return mxcsr & e & FE_ALL_EXCEPT;
 }
 
@@ -119,7 +111,7 @@ libc_fetestexcept_387 (int ex)
 static __always_inline void
 libc_fesetenv_sse (fenv_t *e)
 {
-  asm volatile (LDMXCSR " %0" : : "m" (e->__mxcsr));
+  asm volatile ("%vldmxcsr %0" : : "m" (e->__mxcsr));
 }
 
 static __always_inline void
@@ -137,13 +129,13 @@ static __always_inline int
 libc_feupdateenv_test_sse (fenv_t *e, int ex)
 {
   unsigned int mxcsr, old_mxcsr, cur_ex;
-  asm volatile (STMXCSR " %0" : "=m" (mxcsr));
+  asm volatile ("%vstmxcsr %0" : "=m" (mxcsr));
   cur_ex = mxcsr & FE_ALL_EXCEPT;
 
   /* Merge current exceptions with the old environment. */
   old_mxcsr = e->__mxcsr;
   mxcsr = old_mxcsr | cur_ex;
-  asm volatile (LDMXCSR " %0" : : "m" (mxcsr));
+  asm volatile ("%vldmxcsr %0" : : "m" (mxcsr));
 
   /* Raise SIGFPE for any new exceptions since the hold. Expect
      that the normal environment has all exceptions masked. */
@@ -189,10 +181,10 @@ static __always_inline void
 libc_feholdsetround_sse (fenv_t *e, int r)
 {
   unsigned int mxcsr;
-  asm (STMXCSR " %0" : "=m" (mxcsr));
+  asm ("%vstmxcsr %0" : "=m" (mxcsr));
   e->__mxcsr = mxcsr;
   mxcsr = (mxcsr & ~0x6000) | (r << 3);
-  asm volatile (LDMXCSR " %0" : : "m" (mxcsr));
+  asm volatile ("%vldmxcsr %0" : : "m" (mxcsr));
 }
 
 static __always_inline void
@@ -223,9 +215,9 @@ static __always_inline void
 libc_feresetround_sse (fenv_t *e)
 {
   unsigned int mxcsr;
-  asm (STMXCSR " %0" : "=m" (mxcsr));
+  asm ("%vstmxcsr %0" : "=m" (mxcsr));
   mxcsr = (mxcsr & ~0x6000) | (e->__mxcsr & 0x6000);
-  asm volatile (LDMXCSR " %0" : : "m" (mxcsr));
+  asm volatile ("%vldmxcsr %0" : : "m" (mxcsr));
 }
 
 static __always_inline void
@@ -315,13 +307,13 @@ static __always_inline void
 libc_feholdexcept_setround_sse_ctx (struct rm_ctx *ctx, int r)
 {
   unsigned int mxcsr, new_mxcsr;
-  asm (STMXCSR " %0" : "=m" (mxcsr));
+  asm ("%vstmxcsr %0" : "=m" (mxcsr));
   new_mxcsr = ((mxcsr | 0x1f80) & ~0x603f) | (r << 3);
 
   ctx->env.__mxcsr = mxcsr;
   if (__glibc_unlikely (mxcsr != new_mxcsr))
     {
-      asm volatile (LDMXCSR " %0" : : "m" (new_mxcsr));
+      asm volatile ("%vldmxcsr %0" : : "m" (new_mxcsr));
       ctx->updated_status = true;
     }
   else
@@ -412,13 +404,13 @@ libc_feholdsetround_sse_ctx (struct rm_ctx *ctx, int r)
 {
   unsigned int mxcsr, new_mxcsr;
 
-  asm (STMXCSR " %0" : "=m" (mxcsr));
+  asm ("%vstmxcsr %0" : "=m" (mxcsr));
   new_mxcsr = (mxcsr & ~0x6000) | (r << 3);
 
   ctx->env.__mxcsr = mxcsr;
   if (__glibc_unlikely (new_mxcsr != mxcsr))
     {
-      asm volatile (LDMXCSR " %0" : : "m" (new_mxcsr));
+      asm volatile ("%vldmxcsr %0" : : "m" (new_mxcsr));
       ctx->updated_status = true;
     }
   else
diff --git a/sysdeps/x86/fpu/sfp-machine.h b/sysdeps/x86/fpu/sfp-machine.h
index bc3fe332df..5892f4f5fe 100644
--- a/sysdeps/x86/fpu/sfp-machine.h
+++ b/sysdeps/x86/fpu/sfp-machine.h
@@ -39,15 +39,9 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 
 # define FP_RND_MASK 0x6000
 
-# ifdef __AVX__
-# define AVX_INSN_PREFIX "v"
-# else
-# define AVX_INSN_PREFIX ""
-# endif
-
 # define FP_INIT_ROUNDMODE \
   do { \
-    __asm__ __volatile__ (AVX_INSN_PREFIX "stmxcsr\t%0" : "=m" (_fcw)); \
+    __asm__ __volatile__ ("%vstmxcsr\t%0" : "=m" (_fcw)); \
   } while (0)
 #else
 # define _FP_W_TYPE_SIZE 32
diff --git a/sysdeps/x86/fpu/test-fenv-sse-2.c b/sysdeps/x86/fpu/test-fenv-sse-2.c
index 39526e06ee..d12009bb81 100644
--- a/sysdeps/x86/fpu/test-fenv-sse-2.c
+++ b/sysdeps/x86/fpu/test-fenv-sse-2.c
@@ -29,14 +29,14 @@ static uint32_t
 get_sse_mxcsr (void)
 {
   uint32_t temp;
-  __asm__ __volatile__ ("stmxcsr %0" : "=m" (temp));
+  __asm__ __volatile__ ("%vstmxcsr %0" : "=m" (temp));
   return temp;
 }
 
 static void
 set_sse_mxcsr (uint32_t val)
 {
-  __asm__ __volatile__ ("ldmxcsr %0" : : "m" (val));
+  __asm__ __volatile__ ("%vldmxcsr %0" : : "m" (val));
 }
 
 static void
diff --git a/sysdeps/x86_64/fpu/fclrexcpt.c b/sysdeps/x86_64/fpu/fclrexcpt.c
index 1ce14ece14..86b4228f2f 100644
--- a/sysdeps/x86_64/fpu/fclrexcpt.c
+++ b/sysdeps/x86_64/fpu/fclrexcpt.c
@@ -38,13 +38,13 @@ __feclearexcept (int excepts)
   __asm__ ("fldenv %0" : : "m" (temp));
 
   /* And the same procedure for SSE. */
-  __asm__ ("stmxcsr %0" : "=m" (mxcsr));
+  __asm__ ("%vstmxcsr %0" : "=m" (mxcsr));
 
   /* Clear the relevant bits. */
   mxcsr &= ~excepts;
 
   /* And put them into effect. */
-  __asm__ ("ldmxcsr %0" : : "m" (mxcsr));
+  __asm__ ("%vldmxcsr %0" : : "m" (mxcsr));
 
   /* Success. */
   return 0;
diff --git a/sysdeps/x86_64/fpu/fedisblxcpt.c b/sysdeps/x86_64/fpu/fedisblxcpt.c
index 873ee65f4e..dab9ad19c2 100644
--- a/sysdeps/x86_64/fpu/fedisblxcpt.c
+++ b/sysdeps/x86_64/fpu/fedisblxcpt.c
@@ -35,11 +35,11 @@ fedisableexcept (int excepts)
   __asm__ ("fldcw %0" : : "m" (new_exc));
 
   /* And now the same for the SSE MXCSR register. */
-  __asm__ ("stmxcsr %0" : "=m" (new));
+  __asm__ ("%vstmxcsr %0" : "=m" (new));
 
   /* The SSE exception masks are shifted by 7 bits. */
   new |= excepts << 7;
-  __asm__ ("ldmxcsr %0" : : "m" (new));
+  __asm__ ("%vldmxcsr %0" : : "m" (new));
 
   return old_exc;
 }
diff --git a/sysdeps/x86_64/fpu/feenablxcpt.c b/sysdeps/x86_64/fpu/feenablxcpt.c
index 81630841c7..828b2b247a 100644
--- a/sysdeps/x86_64/fpu/feenablxcpt.c
+++ b/sysdeps/x86_64/fpu/feenablxcpt.c
@@ -35,11 +35,11 @@ feenableexcept (int excepts)
   __asm__ ("fldcw %0" : : "m" (new_exc));
 
   /* And now the same for the SSE MXCSR register. */
-  __asm__ ("stmxcsr %0" : "=m" (new));
+  __asm__ ("%vstmxcsr %0" : "=m" (new));
 
   /* The SSE exception masks are shifted by 7 bits. */
   new &= ~(excepts << 7);
-  __asm__ ("ldmxcsr %0" : : "m" (new));
+  __asm__ ("%vldmxcsr %0" : : "m" (new));
 
   return old_exc;
 }
diff --git a/sysdeps/x86_64/fpu/fegetenv.c b/sysdeps/x86_64/fpu/fegetenv.c
index 7c89583c0d..eea9d6bee7 100644
--- a/sysdeps/x86_64/fpu/fegetenv.c
+++ b/sysdeps/x86_64/fpu/fegetenv.c
@@ -25,7 +25,7 @@ __fegetenv (fenv_t *envp)
            /* fnstenv changes the exception mask, so load back the
               stored environment. */
            "fldenv %0\n"
-           "stmxcsr %1" : "=m" (*envp), "=m" (envp->__mxcsr));
+           "%vstmxcsr %1" : "=m" (*envp), "=m" (envp->__mxcsr));
 
   /* Success. */
   return 0;
diff --git a/sysdeps/x86_64/fpu/fegetmode.c b/sysdeps/x86_64/fpu/fegetmode.c
index 8830a161d6..39d124a6d8 100644
--- a/sysdeps/x86_64/fpu/fegetmode.c
+++ b/sysdeps/x86_64/fpu/fegetmode.c
@@ -23,6 +23,6 @@ fegetmode (femode_t *modep)
 {
   _FPU_GETCW (modep->__control_word);
-  __asm__ ("stmxcsr %0" : "=m" (modep->__mxcsr));
+  __asm__ ("%vstmxcsr %0" : "=m" (modep->__mxcsr));
   return 0;
 }
diff --git a/sysdeps/x86_64/fpu/feholdexcpt.c b/sysdeps/x86_64/fpu/feholdexcpt.c
index 446e98d19f..9a22a2ea77 100644
--- a/sysdeps/x86_64/fpu/feholdexcpt.c
+++ b/sysdeps/x86_64/fpu/feholdexcpt.c
@@ -26,13 +26,13 @@ __feholdexcept (fenv_t *envp)
   /* Store the environment. Recall that fnstenv has a side effect of
      masking all exceptions. Then clear all exceptions. */
   __asm__ ("fnstenv %0\n\t"
-           "stmxcsr %1\n\t"
+           "%vstmxcsr %1\n\t"
            "fnclex"
            : "=m" (*envp), "=m" (envp->__mxcsr));
 
   /* Set the SSE MXCSR register. */
   mxcsr = (envp->__mxcsr | 0x1f80) & ~0x3f;
-  __asm__ ("ldmxcsr %0" : : "m" (mxcsr));
+  __asm__ ("%vldmxcsr %0" : : "m" (mxcsr));
 
   return 0;
 }
diff --git a/sysdeps/x86_64/fpu/fesetenv.c b/sysdeps/x86_64/fpu/fesetenv.c
index 0ab3059889..e4e721afff 100644
--- a/sysdeps/x86_64/fpu/fesetenv.c
+++ b/sysdeps/x86_64/fpu/fesetenv.c
@@ -36,7 +36,7 @@ __fesetenv (const fenv_t *envp)
      Therefore, we get the current environment and replace the values
      we want to use from the environment specified by the parameter. */
   __asm__ ("fnstenv %0\n"
-           "stmxcsr %1" : "=m" (temp), "=m" (temp.__mxcsr));
+           "%vstmxcsr %1" : "=m" (temp), "=m" (temp.__mxcsr));
 
   if (envp == FE_DFL_ENV)
     {
@@ -104,7 +104,7 @@ __fesetenv (const fenv_t *envp)
     }
 
   __asm__ ("fldenv %0\n"
-           "ldmxcsr %1" : : "m" (temp), "m" (temp.__mxcsr));
+           "%vldmxcsr %1" : : "m" (temp), "m" (temp.__mxcsr));
 
   /* Success. */
   return 0;
diff --git a/sysdeps/x86_64/fpu/fesetexcept.c b/sysdeps/x86_64/fpu/fesetexcept.c
index 22ce321bc3..91d5270f8e 100644
--- a/sysdeps/x86_64/fpu/fesetexcept.c
+++ b/sysdeps/x86_64/fpu/fesetexcept.c
@@ -23,9 +23,9 @@ fesetexcept (int excepts)
 {
   unsigned int mxcsr;
 
-  __asm__ ("stmxcsr %0" : "=m" (mxcsr));
+  __asm__ ("%vstmxcsr %0" : "=m" (mxcsr));
   mxcsr |= excepts & FE_ALL_EXCEPT;
-  __asm__ ("ldmxcsr %0" : : "m" (mxcsr));
+  __asm__ ("%vldmxcsr %0" : : "m" (mxcsr));
 
   return 0;
 }
diff --git a/sysdeps/x86_64/fpu/fesetmode.c b/sysdeps/x86_64/fpu/fesetmode.c
index 3bd728e599..2b35d7e719 100644
--- a/sysdeps/x86_64/fpu/fesetmode.c
+++ b/sysdeps/x86_64/fpu/fesetmode.c
@@ -28,7 +28,7 @@ fesetmode (const femode_t *modep)
 {
   fpu_control_t cw;
   unsigned int mxcsr;
-  __asm__ ("stmxcsr %0" : "=m" (mxcsr));
+  __asm__ ("%vstmxcsr %0" : "=m" (mxcsr));
 
   /* Preserve SSE exception flags but restore other state in MXCSR. */
   mxcsr &= FE_ALL_EXCEPT_X86;
@@ -45,6 +45,6 @@ fesetmode (const femode_t *modep)
       mxcsr |= modep->__mxcsr & ~FE_ALL_EXCEPT_X86;
     }
   _FPU_SETCW (cw);
-  __asm__ ("ldmxcsr %0" : : "m" (mxcsr));
+  __asm__ ("%vldmxcsr %0" : : "m" (mxcsr));
   return 0;
 }
diff --git a/sysdeps/x86_64/fpu/fesetround.c b/sysdeps/x86_64/fpu/fesetround.c
index dda635ed19..e1ffb3b7a9 100644
--- a/sysdeps/x86_64/fpu/fesetround.c
+++ b/sysdeps/x86_64/fpu/fesetround.c
@@ -36,10 +36,10 @@ __fesetround (int round)
 
   /* And now the MSCSR register for SSE, the precision is at different bit
      positions in the different units, we need to shift it 3 bits. */
-  asm ("stmxcsr %0" : "=m" (mxcsr));
+  asm ("%vstmxcsr %0" : "=m" (mxcsr));
   mxcsr &= ~ 0x6000;
   mxcsr |= round << 3;
-  asm ("ldmxcsr %0" : : "m" (mxcsr));
+  asm ("%vldmxcsr %0" : : "m" (mxcsr));
 
   return 0;
 }
diff --git a/sysdeps/x86_64/fpu/feupdateenv.c b/sysdeps/x86_64/fpu/feupdateenv.c
index 72abc188e1..0e26b92af5 100644
--- a/sysdeps/x86_64/fpu/feupdateenv.c
+++ b/sysdeps/x86_64/fpu/feupdateenv.c
@@ -25,7 +25,7 @@ __feupdateenv (const fenv_t *envp)
   unsigned int xtemp;
 
   /* Save current exceptions. */
-  __asm__ ("fnstsw %0\n\tstmxcsr %1" : "=m" (temp), "=m" (xtemp));
+  __asm__ ("fnstsw %0\n\t%vstmxcsr %1" : "=m" (temp), "=m" (xtemp));
   temp = (temp | xtemp) & FE_ALL_EXCEPT;
 
   /* Install new environment. */
diff --git a/sysdeps/x86_64/fpu/fgetexcptflg.c b/sysdeps/x86_64/fpu/fgetexcptflg.c
index d11d3465e2..a7b500b600 100644
--- a/sysdeps/x86_64/fpu/fgetexcptflg.c
+++ b/sysdeps/x86_64/fpu/fgetexcptflg.c
@@ -26,7 +26,7 @@ fegetexceptflag (fexcept_t *flagp, int excepts)
 
   /* Get the current exceptions for the x87 FPU and SSE unit. */
   __asm__ ("fnstsw %0\n"
-           "stmxcsr %1" : "=m" (temp), "=m" (mxscr));
+           "%vstmxcsr %1" : "=m" (temp), "=m" (mxscr));
 
   *flagp = (temp | mxscr) & FE_ALL_EXCEPT & excepts;
 
diff --git a/sysdeps/x86_64/fpu/fraiseexcpt.c b/sysdeps/x86_64/fpu/fraiseexcpt.c
index c340730ed5..a97f8f0b95 100644
--- a/sysdeps/x86_64/fpu/fraiseexcpt.c
+++ b/sysdeps/x86_64/fpu/fraiseexcpt.c
@@ -33,7 +33,7 @@ __feraiseexcept (int excepts)
       /* One example of an invalid operation is 0.0 / 0.0. */
       float f = 0.0;
 
-      __asm__ __volatile__ ("divss %0, %0 " : "+x" (f));
+      __asm__ __volatile__ ("%vdivss %0, %0 " : "+x" (f));
       (void) &f;
     }
 
@@ -43,7 +43,7 @@ __feraiseexcept (int excepts)
       float f = 1.0;
       float g = 0.0;
 
-      __asm__ __volatile__ ("divss %1, %0" : "+x" (f) : "x" (g));
+      __asm__ __volatile__ ("%vdivss %1, %0" : "+x" (f) : "x" (g));
       (void) &f;
     }
 
diff --git a/sysdeps/x86_64/fpu/fsetexcptflg.c b/sysdeps/x86_64/fpu/fsetexcptflg.c
index 9dec41c1b3..34ea24c061 100644
--- a/sysdeps/x86_64/fpu/fsetexcptflg.c
+++ b/sysdeps/x86_64/fpu/fsetexcptflg.c
@@ -44,13 +44,13 @@ fesetexceptflag (const fexcept_t *flagp, int excepts)
   __asm__ ("fldenv %0" : : "m" (temp));
 
   /* And now similarly for SSE. */
-  __asm__ ("stmxcsr %0" : "=m" (mxcsr));
+  __asm__ ("%vstmxcsr %0" : "=m" (mxcsr));
 
   /* Clear or set relevant flags. */
   mxcsr ^= (mxcsr ^ *flagp) & excepts;
 
   /* Put the new data in effect. */
-  __asm__ ("ldmxcsr %0" : : "m" (mxcsr));
+  __asm__ ("%vldmxcsr %0" : : "m" (mxcsr));
 
   /* Success. */
   return 0;
diff --git a/sysdeps/x86_64/fpu/ftestexcept.c b/sysdeps/x86_64/fpu/ftestexcept.c
index f2aae5e66c..39df30fbd2 100644
--- a/sysdeps/x86_64/fpu/ftestexcept.c
+++ b/sysdeps/x86_64/fpu/ftestexcept.c
@@ -26,7 +26,7 @@ __fetestexcept (int excepts)
 
   /* Get current exceptions. */
   __asm__ ("fnstsw %0\n"
-           "stmxcsr %1" : "=m" (temp), "=m" (mxscr));
+           "%vstmxcsr %1" : "=m" (temp), "=m" (mxscr));
 
   return (temp | mxscr) & excepts & FE_ALL_EXCEPT;
 }
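The fenv_private.h helpers in this patch share one read-modify-write pattern. A standalone sketch modeled on libc_fesetround_sse, assuming GCC on x86/x86-64 and glibc's x86 <fenv.h>, where FE_DOWNWARD, FE_UPWARD and FE_TOWARDZERO occupy x87 control-word bits 10-11, so shifting left by 3 places them in the MXCSR rounding field (bits 13-14, mask 0x6000). The names sse_setround, sse_getround and MXCSR_RC_MASK are hypothetical.

```c
/* Hypothetical sketch of the MXCSR read-modify-write pattern used by
   libc_fesetround_sse.  Assumes GCC on x86/x86-64 and glibc's x86
   <fenv.h>, where the FE_* rounding modes are x87 control-word bits.  */
#if !defined __x86_64__ && !defined __i386__
# error "this sketch targets x86 only"
#endif

#include <assert.h>
#include <fenv.h>

#define MXCSR_RC_MASK 0x6000u   /* MXCSR rounding control, bits 13-14.  */

/* Set the SSE rounding mode only; the x87 control word is untouched.  */
static void
sse_setround (int r)
{
  unsigned int mxcsr;
  /* Only FE_TONEAREST, FE_DOWNWARD, FE_UPWARD, FE_TOWARDZERO fit.  */
  assert ((r & ~0xc00) == 0);
  __asm__ __volatile__ ("%vstmxcsr %0" : "=m" (mxcsr));
  /* x87 RC is bits 10-11, MXCSR RC is bits 13-14: shift by 3.  */
  mxcsr = (mxcsr & ~MXCSR_RC_MASK) | ((unsigned int) r << 3);
  __asm__ __volatile__ ("%vldmxcsr %0" : : "m" (mxcsr));
}

/* Return the current SSE rounding mode as an FE_* value.  */
static int
sse_getround (void)
{
  unsigned int mxcsr;
  __asm__ __volatile__ ("%vstmxcsr %0" : "=m" (mxcsr));
  return (int) ((mxcsr & MXCSR_RC_MASK) >> 3);
}
```

Unlike the public fesetround, which also updates the x87 control word, this sketch touches MXCSR only, mirroring the SSE-only helpers in fenv_private.h.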