From patchwork Fri Sep 12 12:57:01 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 120138
From: "H.J. Lu"
To: libc-alpha@sourceware.org
Cc: fw@deneb.enyo.de
Subject: [PATCH v2] x86-64: Don't use asm statement for trunc/truncf
Date: Fri, 12 Sep 2025 05:57:01 -0700
Message-ID: <20250912125701.3584407-1-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.51.0
List-Id: Libc-alpha mailing list

The compiler inlines trunc and truncf with SSE4.1, but older versions of
GCC don't inline them with -Os:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121861

Don't use asm statements for trunc and truncf if the compiler can inline
them with -Os.
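
For reference, the inlined SSE4.1 form of trunc/truncf is a single roundsd/roundss with the immediate $11, the same immediate the asm statements in this patch use. The following is a minimal sketch of how that immediate decomposes. The enum and function names are hypothetical and local to this sketch; the bit values are assumed to match _MM_FROUND_TO_ZERO and _MM_FROUND_NO_EXC from <smmintrin.h>.

```c
#include <assert.h>

/* Rounding-control bits of the ROUNDSS/ROUNDSD imm8 operand.
   Hypothetical names for this sketch only; the values are assumed to
   match _MM_FROUND_TO_ZERO and _MM_FROUND_NO_EXC in <smmintrin.h>.  */
enum
{
  ROUND_TO_ZERO = 0x03,	/* imm8[1:0] = 11b: round toward zero.  */
  ROUND_NO_EXC = 0x08	/* imm8[3] = 1: do not signal inexact.  */
};

/* Return the immediate that selects truncation without raising the
   precision exception.  */
int
trunc_round_imm (void)
{
  return ROUND_TO_ZERO | ROUND_NO_EXC;
}

/* Check that the combined value is the $11 seen in the patch.  */
void
check_trunc_round_imm (void)
{
  assert (trunc_round_imm () == 11);
}
```

Calling check_trunc_round_imm () from a test program passes silently, since 0x03 | 0x08 is 11.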

It removes one register move with GCC 16:

__modff_sse41:                      __modff_sse41:
.LFB23:                             .LFB23:
    .cfi_startproc                      .cfi_startproc
    endbr64                             endbr64
    subq $24, %rsp                      subq $24, %rsp
    .cfi_def_cfa_offset 32              .cfi_def_cfa_offset 32
    movq %fs:40, %rax                   movq %fs:40, %rax
    movq %rax, 8(%rsp)                  movq %rax, 8(%rsp)
    xorl %eax, %eax                     xorl %eax, %eax
    movd %xmm0, %eax                    movd %xmm0, %eax
    addl %eax, %eax                     addl %eax, %eax
    cmpl $-16777216, %eax               cmpl $-16777216, %eax
    je .L7                              je .L7
                                  >     movaps %xmm0, %xmm3
    movaps %xmm0, %xmm4                 movaps %xmm0, %xmm4
    movss .LC0(%rip), %xmm2       |     movss .LC0(%rip), %xmm1
    movaps %xmm2, %xmm3           |     movaps %xmm1, %xmm2
    andps %xmm0, %xmm2            |     roundss $11, %xmm3, %xmm3
    roundss $11, %xmm0, %xmm1     |     subss %xmm3, %xmm4
    subss %xmm1, %xmm4            |     andps %xmm0, %xmm1
    andnps %xmm4, %xmm3           |     andnps %xmm4, %xmm2
    orps %xmm3, %xmm2             |     orps %xmm2, %xmm1
.L3:                                .L3:
    movss %xmm1, (%rdi)           |     movss %xmm3, (%rdi)
    movq 8(%rsp), %rax                  movq 8(%rsp), %rax
    subq %fs:40, %rax                   subq %fs:40, %rax
    jne .L8                             jne .L8
    movaps %xmm2, %xmm0           |     movaps %xmm1, %xmm0
    addq $24, %rsp                      addq $24, %rsp
    .cfi_remember_state                 .cfi_remember_state
    .cfi_def_cfa_offset 8               .cfi_def_cfa_offset 8
    ret                                 ret

Signed-off-by: H.J. Lu
---
 config.h.in                    |  3 ++
 sysdeps/x86/fpu/math_private.h | 24 ++++++++++------
 sysdeps/x86_64/configure       | 52 ++++++++++++++++++++++++++++++++++
 sysdeps/x86_64/configure.ac    | 31 ++++++++++++++++++++
 4 files changed, 102 insertions(+), 8 deletions(-)

diff --git a/config.h.in b/config.h.in
index 8b4077f578..af2ab31379 100644
--- a/config.h.in
+++ b/config.h.in
@@ -308,4 +308,7 @@
 /* Define if -mapxf is enabled by default on x86.  */
 #undef HAVE_X86_APX
 
+/* Define if -Os inlines trunc on x86.  */
+#undef HAVE_X86_OS_INLINE_TRUNC
+
 #endif
diff --git a/sysdeps/x86/fpu/math_private.h b/sysdeps/x86/fpu/math_private.h
index d30d580cea..610ae364f3 100644
--- a/sysdeps/x86/fpu/math_private.h
+++ b/sysdeps/x86/fpu/math_private.h
@@ -33,27 +33,35 @@ __NTH (__ieee754_atan2l (long double y, long double x))
 __extern_always_inline double
 __trunc (double x)
 {
-#ifdef __AVX__
+#if HAVE_X86_OS_INLINE_TRUNC
+  return trunc (x);
+#else
+# ifdef __AVX__
   asm ("vroundsd $11, %1, %1, %0" : "=v" (x) : "v" (x));
-#elif defined __SSE4_1__
+# elif defined __SSE4_1__
   asm ("roundsd $11, %1, %0" : "=x" (x) : "x" (x));
-#else
+# else
   x = trunc (x);
-#endif
+# endif
   return x;
+#endif
 }
 
 __extern_always_inline float
 __truncf (float x)
 {
-#ifdef __AVX__
+#if HAVE_X86_OS_INLINE_TRUNC
+  return truncf (x);
+#else
+# ifdef __AVX__
   asm ("vroundss $11, %1, %1, %0" : "=v" (x) : "v" (x));
-#elif defined __SSE4_1__
+# elif defined __SSE4_1__
   asm ("roundss $11, %1, %0" : "=x" (x) : "x" (x));
-#else
+# else
   x = truncf (x);
-#endif
+# endif
   return x;
+#endif
 }
 
 #endif
diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
index 32324f62da..9e98d020ae 100644
--- a/sysdeps/x86_64/configure
+++ b/sysdeps/x86_64/configure
@@ -289,6 +289,58 @@ fi
 config_vars="$config_vars
 have-x86-apx = $libc_cv_x86_have_apx"
 
+conftest_code="
+extern float truncf (float __x) __attribute__ ((__nothrow__,__const__));
+
+float
+tf (float x)
+{
+  return truncf (x);
+}
+"
+
+cat > conftest.c <<EOF
+$conftest_code
+EOF
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if -Os inlines trunc" >&5
+printf %s "checking if -Os inlines trunc... " >&6; }
+if test ${libc_cv_cc_x86_inline_trunc+y}
+then :
+  printf %s "(cached) " >&6
+else case e in #(
+  e) if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS -S -Os -msse4.1 conftest.c -o conftest 1>&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+  then
+
+libc_cv_cc_x86_inline_trunc=no
+if grep -E -q "roundss" conftest; then
+  libc_cv_cc_x86_inline_trunc=yes
+fi
+
+  else
+
+echo "failed to check if -Os inlines trunc."
+rm -f conftest*
+exit 1
+
+  fi ;;
+esac
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_cv_cc_x86_inline_trunc" >&5
+printf "%s\n" "$libc_cv_cc_x86_inline_trunc" >&6; }
+rm -f conftest*
+if test "$libc_cv_cc_x86_inline_trunc" = yes; then
+  printf "%s\n" "#define HAVE_X86_OS_INLINE_TRUNC 1" >>confdefs.h
+
+else
+  printf "%s\n" "#define HAVE_X86_OS_INLINE_TRUNC 0" >>confdefs.h
+
+fi
+
 libc_cv_support_sframe=yes
 
 test -n "$critic_missing" && as_fn_error $? "
diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
index a00958e219..848dc4e170 100644
--- a/sysdeps/x86_64/configure.ac
+++ b/sysdeps/x86_64/configure.ac
@@ -104,6 +104,37 @@ if test $libc_cv_x86_have_apx = yes; then
 fi
 LIBC_CONFIG_VAR([have-x86-apx], [$libc_cv_x86_have_apx])
 
+conftest_code="
+extern float truncf (float __x) __attribute__ ((__nothrow__,__const__));
+
+float
+tf (float x)
+{
+  return truncf (x);
+}
+"
+dnl Check if CC inlines trunc with -Os.
+LIBC_TRY_CC_COMMAND([if -Os inlines trunc],
+  [$conftest_code],
+  [-S -Os -msse4.1],
+  libc_cv_cc_x86_inline_trunc,
+  [
+libc_cv_cc_x86_inline_trunc=no
+if grep -E -q "roundss" conftest; then
+  libc_cv_cc_x86_inline_trunc=yes
+fi
+],
+[
+echo "failed to check if -Os inlines trunc."
+rm -f conftest*
+exit 1
+])
+if test "$libc_cv_cc_x86_inline_trunc" = yes; then
+  AC_DEFINE(HAVE_X86_OS_INLINE_TRUNC, 1)
+else
+  AC_DEFINE(HAVE_X86_OS_INLINE_TRUNC, 0)
+fi
+
 libc_cv_support_sframe=yes
 
 test -n "$critic_missing" && AC_MSG_ERROR([