From patchwork Tue Oct 21 16:41:07 2025
X-Patchwork-Submitter: Siddhesh Poyarekar
X-Patchwork-Id: 122380
From: Siddhesh Poyarekar
To: libc-alpha@sourceware.org
Cc: josmyers@redhat.com, Paul.Zimmermann@inria.fr
Subject: [PATCH v3] Simplify powl computation for small integral y [BZ #33411]
Date: Tue, 21 Oct 2025 12:41:07 -0400
Message-ID: <20251021164107.390419-1-siddhesh@sourceware.org>
In-Reply-To: <20251021030219.189346-1-siddhesh@sourceware.org>
References: <20251021030219.189346-1-siddhesh@sourceware.org>

The powl implementation for x86_64 ends up multiplying X once more than
necessary and then throwing away that result.  This results in an
overflow flag being set in cases where there is no overflow.

Simplify the relevant portion by special casing the -3 to 3 range and
simply multiplying repetitively.

Resolves: BZ #33411
Signed-off-by: Siddhesh Poyarekar
Reviewed by: Paul Zimmermann
---
Changes from v2:
- Added test input to auto-libm-test-in instead.

I'll push this shortly if there are no objections, since Paul has
reviewed and acked the change itself.

 math/auto-libm-test-in      |  3 ++
 math/auto-libm-test-out-pow | 65 +++++++++++++++++++++++++++++++++++++
 sysdeps/x86_64/fpu/e_powl.S | 56 +++++++++++++++++---------------
 3 files changed, 97 insertions(+), 27 deletions(-)

diff --git a/math/auto-libm-test-in b/math/auto-libm-test-in
index 1397d317fb..9bce356252 100644
--- a/math/auto-libm-test-in
+++ b/math/auto-libm-test-in
@@ -8372,6 +8372,9 @@ pow 0x1.059c76p+0 0x1.ff80bep+11
 pow 0x1.7ac7cp+5 23
 pow -0x1.7ac7cp+5 23
 
+# BZ33411
+pow 0x1p+8192 1.0 xfail:binary32
+
 pown 0 0
 pown 0 -0
 pown -0 0
diff --git a/math/auto-libm-test-out-pow b/math/auto-libm-test-out-pow
index 09ec53e49e..cbca46cd0c 100644
--- a/math/auto-libm-test-out-pow
+++ b/math/auto-libm-test-out-pow
@@ -44221,3 +44221,68 @@ pow -0x1.7ac7cp+5 23
 = pow tonearest ibm128 -0x2.f58f8p+4 0x1.7p+4 : -0xf.fffff29cf02eeec4a7cde7b5a4p+124 : inexact-ok
 = pow towardzero ibm128 -0x2.f58f8p+4 0x1.7p+4 : -0xf.fffff29cf02eeec4a7cde7b5ap+124 : inexact-ok
 = pow upward ibm128 -0x2.f58f8p+4 0x1.7p+4 : -0xf.fffff29cf02eeec4a7cde7b5ap+124 : inexact-ok
+pow 0x1p+8192 1.0 xfail:binary32
+= pow downward binary32 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow tonearest binary32 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow towardzero binary32 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow upward binary32 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow downward binary64 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow tonearest binary64 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow towardzero binary64 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow upward binary64 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow downward intel96 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow tonearest intel96 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow towardzero intel96 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow upward intel96 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow downward m68k96 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow tonearest m68k96 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow towardzero m68k96 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow upward m68k96 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow downward binary128 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow tonearest binary128 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow towardzero binary128 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow upward binary128 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow downward ibm128 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow tonearest ibm128 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow towardzero ibm128 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow upward ibm128 0xf.fffffp+124 0x1p+0 : 0xf.fffffp+124 : xfail:binary32 inexact-ok
+= pow downward binary64 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow tonearest binary64 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow towardzero binary64 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow upward binary64 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow downward intel96 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow tonearest intel96 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow towardzero intel96 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow upward intel96 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow downward m68k96 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow tonearest m68k96 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow towardzero m68k96 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow upward m68k96 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow downward binary128 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow tonearest binary128 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow towardzero binary128 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow upward binary128 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow downward ibm128 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow tonearest ibm128 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow towardzero ibm128 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow upward ibm128 0xf.ffffffffffff8p+1020 0x1p+0 : 0xf.ffffffffffff8p+1020 : xfail:binary32 inexact-ok
+= pow downward intel96 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow tonearest intel96 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow towardzero intel96 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow upward intel96 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow downward m68k96 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow tonearest m68k96 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow towardzero m68k96 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow upward m68k96 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow downward binary128 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow tonearest binary128 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow towardzero binary128 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow upward binary128 0x1p+8192 0x1p+0 : 0x1p+8192 : xfail:binary32 inexact-ok
+= pow downward binary128 0xf.ffffffffffffbffffffffffffcp+1020 0x1p+0 : 0xf.ffffffffffffbffffffffffffcp+1020 : xfail:binary32 inexact-ok
+= pow tonearest binary128 0xf.ffffffffffffbffffffffffffcp+1020 0x1p+0 : 0xf.ffffffffffffbffffffffffffcp+1020 : xfail:binary32 inexact-ok
+= pow towardzero binary128 0xf.ffffffffffffbffffffffffffcp+1020 0x1p+0 : 0xf.ffffffffffffbffffffffffffcp+1020 : xfail:binary32 inexact-ok
+= pow upward binary128 0xf.ffffffffffffbffffffffffffcp+1020 0x1p+0 : 0xf.ffffffffffffbffffffffffffcp+1020 : xfail:binary32 inexact-ok
+= pow downward ibm128 0xf.ffffffffffffbffffffffffffcp+1020 0x1p+0 : 0xf.ffffffffffffbffffffffffffcp+1020 : xfail:binary32 inexact-ok
+= pow tonearest ibm128 0xf.ffffffffffffbffffffffffffcp+1020 0x1p+0 : 0xf.ffffffffffffbffffffffffffcp+1020 : xfail:binary32 inexact-ok
+= pow towardzero ibm128 0xf.ffffffffffffbffffffffffffcp+1020 0x1p+0 : 0xf.ffffffffffffbffffffffffffcp+1020 : xfail:binary32 inexact-ok
+= pow upward ibm128 0xf.ffffffffffffbffffffffffffcp+1020 0x1p+0 : 0xf.ffffffffffffbffffffffffffcp+1020 : xfail:binary32 inexact-ok
diff --git a/sysdeps/x86_64/fpu/e_powl.S b/sysdeps/x86_64/fpu/e_powl.S
index 620ef765a7..39f77480e8 100644
--- a/sysdeps/x86_64/fpu/e_powl.S
+++ b/sysdeps/x86_64/fpu/e_powl.S
@@ -144,39 +144,41 @@ ENTRY(__ieee754_powl)
 	fcomip	%st(1), %st	// 4 : y : x
 	fstp	%st(0)		// y : x
 	jnc	3f
-	mov	-8(%rsp),%eax
-	mov	-4(%rsp),%edx
-	orl	$0, %edx
+
+	/* Here onwards, it's just integral y in range [-3, 3].  */
+	movq	-8(%rsp),%rax
+	orq	$0, %rax
 	fstp	%st(0)		// x
 	jns	4f		// y >= 0, jump
 	fdivrl	MO(one)		// 1/x		(now referred to as x)
-	negl	%eax
-	adcl	$0, %edx
-	negl	%edx
+	negq	%rax
 4:	fldl	MO(one)		// 1 : x
-	fxch
 
-	/* If y is even, take the absolute value of x.  Otherwise,
-	   ensure all intermediate values that might overflow have the
-	   sign of x.  */
+	/* y range is further reduced to [0, 3].  Simply walk through the
+	   options.  First up, 0 and 1.  */
+	test	%eax, %eax
+	jz	6f
+	fxch			// x : 1
+	subl	$1, %eax
+	jz	6f
+
+	/* Finally, y == 2 and 3.  For y == 3 we do |x| * x * |x| because x * x
+	   and |x| * |x| decay faster towards infinity compared to x * |x|.  */
+	fld	%st		// x : x : 1
+	fabs			// |x| : x : 1
+	fxch			// x : |x| : 1
+	fld	%st(1)		// |x| : x : |x| : 1
 	testb	$1, %al
-	jnz	6f
-	fabs
-
-6:	shrdl	$1, %edx, %eax
-	jnc	5f
-	fxch
-	fabs
-	fmul	%st(1)		// x : ST*x
-	fxch
-5:	fld	%st		// x : x : ST*x
-	fabs			// |x| : x : ST*x
-	fmulp			// |x|*x : ST*x
-	shrl	$1, %edx
-	movl	%eax, %ecx
-	orl	%edx, %ecx
-	jnz	6b
-	fstp	%st(0)		// ST*x
+	jz	7f
+	fmulp	%st(2)		// x : |x| * |x| : 1
+	fstp	%st(0)		// |x| * |x| : 1
+	jmp	6f
+7:	fmulp			// |x| * x : |x| : 1
+	fmulp			// |x| * x * |x| : 1
+
+	/* We come here with the stack as RES : , so pop off
+	   .  */
+6:	fstp	%st(1)
 	LDBL_CHECK_FORCE_UFLOW_NONNAN
 	ret
 
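
For readers less familiar with x87 stack code, the control flow of the patched fast path can be approximated in C. This is only a sketch and not glibc code: the function name small_int_powl and the nmul counter are hypothetical, introduced here for illustration. It assumes y has already been identified as an integer in [-3, 3], which is what the assembly guarantees at this point. The counter records how many multiplications were performed, since avoiding a surplus multiplication is the point of the patch.

```c
#include <assert.h>
#include <float.h>	/* LDBL_MAX, referenced in the usage note.  */
#include <math.h>	/* fabsl */

/* Hypothetical C model of the patched fast path in e_powl.S; not glibc code.
   Precondition: y is an integer in [-3, 3].  *nmul receives the number of
   multiplications performed.  */
long double
small_int_powl (long double x, int y, int *nmul)
{
  assert (y >= -3 && y <= 3);
  *nmul = 0;

  if (y < 0)
    {
      x = 1.0L / x;	/* fdivrl MO(one): continue with 1/x.  */
      y = -y;		/* negq %rax */
    }

  if (y == 0)
    return 1.0L;	/* First "jz 6f": the result is the 1 already loaded.  */
  if (y == 1)
    return x;		/* Second "jz 6f": x is returned untouched.  */

  long double ax = fabsl (x);
  if (y == 2)
    {
      *nmul = 1;
      return ax * ax;	/* |x| * |x| */
    }

  *nmul = 2;
  return (ax * x) * ax;	/* y == 3: |x| * x * |x| keeps the sign of x.  */
}
```

In this model, small_int_powl (LDBL_MAX, 1, &n) returns LDBL_MAX with n == 0. The removed loop instead computed |x| * x once more before leaving and then discarded the product, which is how the overflow flag came to be set for the new test input.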