From patchwork Fri Sep 12 03:40:02 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 120120
From: "H.J. Lu"
To: libc-alpha@sourceware.org
Subject: [PATCH] x86-64: Don't use asm statement for trunc/truncf
Date: Thu, 11 Sep 2025 20:40:02 -0700
Message-ID: <20250912034002.3577435-1-hjl.tools@gmail.com>
X-Mailer: git-send-email 2.51.0
List-Id: Libc-alpha mailing list

Don't use asm statement for trunc/truncf if compiler can inline them with -Os.
It removes one register move with GCC 16:

__modff_sse41:                  __modff_sse41:
.LFB23:                         .LFB23:
  .cfi_startproc                  .cfi_startproc
  endbr64                         endbr64
  subq $24, %rsp                  subq $24, %rsp
  .cfi_def_cfa_offset 32          .cfi_def_cfa_offset 32
  movq %fs:40, %rax               movq %fs:40, %rax
  movq %rax, 8(%rsp)              movq %rax, 8(%rsp)
  xorl %eax, %eax                 xorl %eax, %eax
  movd %xmm0, %eax                movd %xmm0, %eax
  addl %eax, %eax                 addl %eax, %eax
  cmpl $-16777216, %eax           cmpl $-16777216, %eax
  je .L7                          je .L7
                              >   movaps %xmm0, %xmm3
  movaps %xmm0, %xmm4             movaps %xmm0, %xmm4
  movss .LC0(%rip), %xmm2     |   movss .LC0(%rip), %xmm1
  movaps %xmm2, %xmm3         |   movaps %xmm1, %xmm2
  andps %xmm0, %xmm2          |   roundss $11, %xmm3, %xmm3
  roundss $11, %xmm0, %xmm1   |   subss %xmm3, %xmm4
  subss %xmm1, %xmm4          |   andps %xmm0, %xmm1
  andnps %xmm4, %xmm3         |   andnps %xmm4, %xmm2
  orps %xmm3, %xmm2           |   orps %xmm2, %xmm1
.L3:                            .L3:
  movss %xmm1, (%rdi)         |   movss %xmm3, (%rdi)
  movq 8(%rsp), %rax              movq 8(%rsp), %rax
  subq %fs:40, %rax               subq %fs:40, %rax
  jne .L8                         jne .L8
  movaps %xmm2, %xmm0         |   movaps %xmm1, %xmm0
  addq $24, %rsp                  addq $24, %rsp
  .cfi_remember_state             .cfi_remember_state
  .cfi_def_cfa_offset 8           .cfi_def_cfa_offset 8
  ret                             ret

Signed-off-by: H.J. Lu
---
 config.h.in                    |  3 ++
 sysdeps/x86/fpu/math_private.h | 24 ++++++++++------
 sysdeps/x86_64/configure       | 52 ++++++++++++++++++++++++++++++++++
 sysdeps/x86_64/configure.ac    | 31 ++++++++++++++++++++
 4 files changed, 102 insertions(+), 8 deletions(-)

diff --git a/config.h.in b/config.h.in
index 8b4077f578..af2ab31379 100644
--- a/config.h.in
+++ b/config.h.in
@@ -308,4 +308,7 @@
 /* Define if -mapxf is enabled by default on x86. */
 #undef HAVE_X86_APX
 
+/* Define if -Os inlines trunc on x86. */
+#undef HAVE_X86_OS_INLINE_TRUNC
+
 #endif
diff --git a/sysdeps/x86/fpu/math_private.h b/sysdeps/x86/fpu/math_private.h
index d30d580cea..610ae364f3 100644
--- a/sysdeps/x86/fpu/math_private.h
+++ b/sysdeps/x86/fpu/math_private.h
@@ -33,27 +33,35 @@ __NTH (__ieee754_atan2l (long double y, long double x))
 __extern_always_inline double
 __trunc (double x)
 {
-#ifdef __AVX__
+#if HAVE_X86_OS_INLINE_TRUNC
+  return trunc (x);
+#else
+# ifdef __AVX__
   asm ("vroundsd $11, %1, %1, %0" : "=v" (x) : "v" (x));
-#elif defined __SSE4_1__
+# elif defined __SSE4_1__
   asm ("roundsd $11, %1, %0" : "=x" (x) : "x" (x));
-#else
+# else
   x = trunc (x);
-#endif
+# endif
   return x;
+#endif
 }
 
 __extern_always_inline float
 __truncf (float x)
 {
-#ifdef __AVX__
+#if HAVE_X86_OS_INLINE_TRUNC
+  return truncf (x);
+#else
+# ifdef __AVX__
   asm ("vroundss $11, %1, %1, %0" : "=v" (x) : "v" (x));
-#elif defined __SSE4_1__
+# elif defined __SSE4_1__
   asm ("roundss $11, %1, %0" : "=x" (x) : "x" (x));
-#else
+# else
   x = truncf (x);
-#endif
+# endif
   return x;
+#endif
 }
 
 #endif
diff --git a/sysdeps/x86_64/configure b/sysdeps/x86_64/configure
index 32324f62da..6eb9b8d200 100644
--- a/sysdeps/x86_64/configure
+++ b/sysdeps/x86_64/configure
@@ -289,6 +289,58 @@ fi
 config_vars="$config_vars
 have-x86-apx = $libc_cv_x86_have_apx"
 
+conftest_code="
+extern float truncf (float __x) __attribute__ ((__nothrow__,__const__));
+
+float
+tf (float x)
+{
+  return truncf (x);
+}
+"
+
+cat > conftest.c <&5
+printf %s "checking if -Os inlines trunc... " >&6; }
+if test ${libc_cv_cc_x86_inline_trunc+y}
+then :
+  printf %s "(cached) " >&6
+else case e in #(
+  e) if { ac_try='${CC-cc} $CFLAGS $CPPFLAGS -S -Os -mavx conftest.c -o conftest 1>&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+  then
+
+libc_cv_cc_x86_inline_trunc=no
+if grep -E -q "roundss" conftest; then
+  libc_cv_cc_x86_inline_trunc=yes
+fi
+
+  else
+
+echo "failed to check if -Os inlines trunc."
+rm -f conftest*
+exit 1
+
+  fi ;;
+esac
+fi
+{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $libc_cv_cc_x86_inline_trunc" >&5
+printf "%s\n" "$libc_cv_cc_x86_inline_trunc" >&6; }
+rm -f conftest*
+if test "$libc_cv_cc_x86_inline_trunc" = yes; then
+  printf "%s\n" "#define HAVE_X86_OS_INLINE_TRUNC 1" >>confdefs.h
+
+else
+  printf "%s\n" "#define HAVE_X86_OS_INLINE_TRUNC 0" >>confdefs.h
+
+fi
+
 libc_cv_support_sframe=yes
 
 test -n "$critic_missing" && as_fn_error $? "
diff --git a/sysdeps/x86_64/configure.ac b/sysdeps/x86_64/configure.ac
index a00958e219..a3651f7f5d 100644
--- a/sysdeps/x86_64/configure.ac
+++ b/sysdeps/x86_64/configure.ac
@@ -104,6 +104,37 @@ if test $libc_cv_x86_have_apx = yes; then
 fi
 LIBC_CONFIG_VAR([have-x86-apx], [$libc_cv_x86_have_apx])
 
+conftest_code="
+extern float truncf (float __x) __attribute__ ((__nothrow__,__const__));
+
+float
+tf (float x)
+{
+  return truncf (x);
+}
+"
+dnl Check if CC inlines trunc with -Os.
+LIBC_TRY_CC_COMMAND([if -Os inlines trunc],
+  [$conftest_code],
+  [-S -Os -mavx],
+  libc_cv_cc_x86_inline_trunc,
+  [
+libc_cv_cc_x86_inline_trunc=no
+if grep -E -q "roundss" conftest; then
+  libc_cv_cc_x86_inline_trunc=yes
+fi
+],
+[
+echo "failed to check if -Os inlines trunc."
+rm -f conftest*
+exit 1
+])
+if test "$libc_cv_cc_x86_inline_trunc" = yes; then
+  AC_DEFINE(HAVE_X86_OS_INLINE_TRUNC, 1)
+else
+  AC_DEFINE(HAVE_X86_OS_INLINE_TRUNC, 0)
+fi
+
 libc_cv_support_sframe=yes
 
 test -n "$critic_missing" && AC_MSG_ERROR([