From patchwork Tue Oct 14 12:10:31 2025
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 121848
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: Wilco Dijkstra, Paul Zimmermann, DJ Delorie
Subject: [PATCH v2 1/4] math: Optimize fma call on asinpif
Date: Tue, 14 Oct 2025 09:10:31 -0300
Message-ID: <20251014121153.1058692-2-adhemerval.zanella@linaro.org>
In-Reply-To: <20251014121153.1058692-1-adhemerval.zanella@linaro.org>
References: <20251014121153.1058692-1-adhemerval.zanella@linaro.org>

The fma is required only for x == +/-0x1.6371e8p-4f in FE_TOWARDZERO
to provide correctly rounded results.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra
---
 math/auto-libm-test-in             |  2 ++
 math/auto-libm-test-out-asinpi     | 50 ++++++++++++++++++++++++++++++
 sysdeps/ieee754/flt-32/s_asinpif.c |  9 ++++--
 3 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/math/auto-libm-test-in b/math/auto-libm-test-in
index 198dac54551..7e8cb4cef83 100644
--- a/math/auto-libm-test-in
+++ b/math/auto-libm-test-in
@@ -524,6 +524,8 @@ asinpi 0x1.f1c012p-1
 asinpi -0x1.8805060cb885cp-3
 asinpi 0x8.14d7e32b5c44642p-4
 asinpi -0xa.7ca6c96caefe80b9d757de58a578p-4
+asinpi 0x1.6371e8p-4
+asinpi -0x1.6371e8p-4
 
 atan inf
 atan -inf
diff --git a/math/auto-libm-test-out-asinpi b/math/auto-libm-test-out-asinpi
index 31fe8064116..80f83eb6541 100644
--- a/math/auto-libm-test-out-asinpi
+++ b/math/auto-libm-test-out-asinpi
@@ -2780,3 +2780,53 @@ asinpi -0xa.7ca6c96caefe80b9d757de58a578p-4
 = asinpi tonearest ibm128 -0xa.7ca6c96caefe80b9d757de58a8p-4 : -0x3.a3e55379cf8d0f73aac00cc2e5p-4 : inexact-ok
 = asinpi towardzero ibm128 -0xa.7ca6c96caefe80b9d757de58a8p-4 : -0x3.a3e55379cf8d0f73aac00cc2e4p-4 : inexact-ok
 = asinpi upward ibm128 -0xa.7ca6c96caefe80b9d757de58a8p-4 : -0x3.a3e55379cf8d0f73aac00cc2e4p-4 : inexact-ok
+asinpi 0x1.6371e8p-4
+= asinpi downward binary32 0x1.6371e8p-4 : 0x7.148bcp-8 : inexact-ok
+= asinpi tonearest binary32 0x1.6371e8p-4 : 0x7.148bc8p-8 : inexact-ok
+= asinpi towardzero binary32 0x1.6371e8p-4 : 0x7.148bcp-8 : inexact-ok
+= asinpi upward binary32 0x1.6371e8p-4 : 0x7.148bc8p-8 : inexact-ok
+= asinpi downward binary64 0x1.6371e8p-4 : 0x7.148bc7fffff78p-8 : inexact-ok
+= asinpi tonearest binary64 0x1.6371e8p-4 : 0x7.148bc7fffff7cp-8 : inexact-ok
+= asinpi towardzero binary64 0x1.6371e8p-4 : 0x7.148bc7fffff78p-8 : inexact-ok
+= asinpi upward binary64 0x1.6371e8p-4 : 0x7.148bc7fffff7cp-8 : inexact-ok
+= asinpi downward intel96 0x1.6371e8p-4 : 0x7.148bc7fffff7af9p-8 : inexact-ok
+= asinpi tonearest intel96 0x1.6371e8p-4 : 0x7.148bc7fffff7af98p-8 : inexact-ok
+= asinpi towardzero intel96 0x1.6371e8p-4 : 0x7.148bc7fffff7af9p-8 : inexact-ok
+= asinpi upward intel96 0x1.6371e8p-4 : 0x7.148bc7fffff7af98p-8 : inexact-ok
+= asinpi downward m68k96 0x1.6371e8p-4 : 0x7.148bc7fffff7af9p-8 : inexact-ok
+= asinpi tonearest m68k96 0x1.6371e8p-4 : 0x7.148bc7fffff7af98p-8 : inexact-ok
+= asinpi towardzero m68k96 0x1.6371e8p-4 : 0x7.148bc7fffff7af9p-8 : inexact-ok
+= asinpi upward m68k96 0x1.6371e8p-4 : 0x7.148bc7fffff7af98p-8 : inexact-ok
+= asinpi downward binary128 0x1.6371e8p-4 : 0x7.148bc7fffff7af94c63520731f08p-8 : inexact-ok
+= asinpi tonearest binary128 0x1.6371e8p-4 : 0x7.148bc7fffff7af94c63520731f08p-8 : inexact-ok
+= asinpi towardzero binary128 0x1.6371e8p-4 : 0x7.148bc7fffff7af94c63520731f08p-8 : inexact-ok
+= asinpi upward binary128 0x1.6371e8p-4 : 0x7.148bc7fffff7af94c63520731f0cp-8 : inexact-ok
+= asinpi downward ibm128 0x1.6371e8p-4 : 0x7.148bc7fffff7af94c63520731ep-8 : inexact-ok
+= asinpi tonearest ibm128 0x1.6371e8p-4 : 0x7.148bc7fffff7af94c63520732p-8 : inexact-ok
+= asinpi towardzero ibm128 0x1.6371e8p-4 : 0x7.148bc7fffff7af94c63520731ep-8 : inexact-ok
+= asinpi upward ibm128 0x1.6371e8p-4 : 0x7.148bc7fffff7af94c63520732p-8 : inexact-ok
+asinpi -0x1.6371e8p-4
+= asinpi downward binary32 -0x1.6371e8p-4 : -0x7.148bc8p-8 : inexact-ok
+= asinpi tonearest binary32 -0x1.6371e8p-4 : -0x7.148bc8p-8 : inexact-ok
+= asinpi towardzero binary32 -0x1.6371e8p-4 : -0x7.148bcp-8 : inexact-ok
+= asinpi upward binary32 -0x1.6371e8p-4 : -0x7.148bcp-8 : inexact-ok
+= asinpi downward binary64 -0x1.6371e8p-4 : -0x7.148bc7fffff7cp-8 : inexact-ok
+= asinpi tonearest binary64 -0x1.6371e8p-4 : -0x7.148bc7fffff7cp-8 : inexact-ok
+= asinpi towardzero binary64 -0x1.6371e8p-4 : -0x7.148bc7fffff78p-8 : inexact-ok
+= asinpi upward binary64 -0x1.6371e8p-4 : -0x7.148bc7fffff78p-8 : inexact-ok
+= asinpi downward intel96 -0x1.6371e8p-4 : -0x7.148bc7fffff7af98p-8 : inexact-ok
+= asinpi tonearest intel96 -0x1.6371e8p-4 : -0x7.148bc7fffff7af98p-8 : inexact-ok
+= asinpi towardzero intel96 -0x1.6371e8p-4 : -0x7.148bc7fffff7af9p-8 : inexact-ok
+= asinpi upward intel96 -0x1.6371e8p-4 : -0x7.148bc7fffff7af9p-8 : inexact-ok
+= asinpi downward m68k96 -0x1.6371e8p-4 : -0x7.148bc7fffff7af98p-8 : inexact-ok
+= asinpi tonearest m68k96 -0x1.6371e8p-4 : -0x7.148bc7fffff7af98p-8 : inexact-ok
+= asinpi towardzero m68k96 -0x1.6371e8p-4 : -0x7.148bc7fffff7af9p-8 : inexact-ok
+= asinpi upward m68k96 -0x1.6371e8p-4 : -0x7.148bc7fffff7af9p-8 : inexact-ok
+= asinpi downward binary128 -0x1.6371e8p-4 : -0x7.148bc7fffff7af94c63520731f0cp-8 : inexact-ok
+= asinpi tonearest binary128 -0x1.6371e8p-4 : -0x7.148bc7fffff7af94c63520731f08p-8 : inexact-ok
+= asinpi towardzero binary128 -0x1.6371e8p-4 : -0x7.148bc7fffff7af94c63520731f08p-8 : inexact-ok
+= asinpi upward binary128 -0x1.6371e8p-4 : -0x7.148bc7fffff7af94c63520731f08p-8 : inexact-ok
+= asinpi downward ibm128 -0x1.6371e8p-4 : -0x7.148bc7fffff7af94c63520732p-8 : inexact-ok
+= asinpi tonearest ibm128 -0x1.6371e8p-4 : -0x7.148bc7fffff7af94c63520732p-8 : inexact-ok
+= asinpi towardzero ibm128 -0x1.6371e8p-4 : -0x7.148bc7fffff7af94c63520731ep-8 : inexact-ok
+= asinpi upward ibm128 -0x1.6371e8p-4 : -0x7.148bc7fffff7af94c63520731ep-8 : inexact-ok
diff --git a/sysdeps/ieee754/flt-32/s_asinpif.c b/sysdeps/ieee754/flt-32/s_asinpif.c
index f9e93533d4a..d50de7fcd7a 100644
--- a/sysdeps/ieee754/flt-32/s_asinpif.c
+++ b/sysdeps/ieee754/flt-32/s_asinpif.c
@@ -79,8 +79,13 @@ __asinpif (float x)
       c0 += c2 * z2;
       c4 += c6 * z2;
       c0 += c4 * z4;
-      double r = fma (-c0, copysign (f, x), copysign (0.5, x));
-      return r;
+#ifndef __FP_FAST_FMA
+      /* The fma is required only for x == 0x1.6371e8p-4f in FE_TOWARDZERO
+         to provide correctly rounded results.  */
+      if (__glibc_likely (ax != 0x1.6371e8p-4f))
+        return copysign (0.5, x) - c0 * copysign (f, x);
+#endif
+      return fma (-c0, copysign (f, x), copysign (0.5, x));
     }
 }
 libm_alias_float (__asinpi, asinpi)

From patchwork Tue Oct 14 12:10:32 2025
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 121849
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: Wilco Dijkstra, Paul Zimmermann, DJ Delorie
Subject: [PATCH v2 2/4] math: Optimize fma call on log2p1f
Date: Tue, 14 Oct 2025 09:10:32 -0300
Message-ID: <20251014121153.1058692-3-adhemerval.zanella@linaro.org>
In-Reply-To: <20251014121153.1058692-1-adhemerval.zanella@linaro.org>
References: <20251014121153.1058692-1-adhemerval.zanella@linaro.org>

The fma is required only for x == -0x1.da285cp-5 in FE_TONEAREST
to provide correctly rounded results.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Wilco Dijkstra
---
 math/auto-libm-test-in             |  1 +
 math/auto-libm-test-out-log2p1     | 25 +++++++++++++++++++++++++
 sysdeps/ieee754/flt-32/s_log2p1f.c |  7 ++++++-
 3 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/math/auto-libm-test-in b/math/auto-libm-test-in
index 7e8cb4cef83..03f6fee1f00 100644
--- a/math/auto-libm-test-in
+++ b/math/auto-libm-test-in
@@ -7714,6 +7714,7 @@ log2p1 -0x4.f37d3c9ce0b14bdd86eb157df5d4p-4
 log2p1 0x7.2eca50c4d93196362b4f37f6e8dcp-4
 log2p1 -0x6.3fef3067427e43dfcde9e48f74bcp-4
 log2p1 0x6.af53d00fd2845d4772260ef5adc4p-4
+log2p1 -0x1.da285cp-5
 
 mul 0 0
 mul 0 -0
diff --git a/math/auto-libm-test-out-log2p1 b/math/auto-libm-test-out-log2p1
index 5e395fffa47..3902600a340 100644
--- a/math/auto-libm-test-out-log2p1
+++ b/math/auto-libm-test-out-log2p1
@@ -4467,3 +4467,28 @@ log2p1 0x6.af53d00fd2845d4772260ef5adc4p-4
 = log2p1 tonearest ibm128 0x6.af53d00fd2845d4772260ef5acp-4 : 0x8.0efc6087a73bba3b9eb65a673p-4 : inexact-ok
 = log2p1 towardzero ibm128 0x6.af53d00fd2845d4772260ef5acp-4 : 0x8.0efc6087a73bba3b9eb65a673p-4 : inexact-ok
 = log2p1 upward ibm128 0x6.af53d00fd2845d4772260ef5acp-4 : 0x8.0efc6087a73bba3b9eb65a6734p-4 : inexact-ok
+log2p1 -0x1.da285cp-5
+= log2p1 downward binary32 -0xe.d142ep-8 : -0x1.60549p-4 : inexact-ok
+= log2p1 tonearest binary32 -0xe.d142ep-8 : -0x1.60549p-4 : inexact-ok
+= log2p1 towardzero binary32 -0xe.d142ep-8 : -0x1.60548ep-4 : inexact-ok
+= log2p1 upward binary32 -0xe.d142ep-8 : -0x1.60548ep-4 : inexact-ok
+= log2p1 downward binary64 -0xe.d142ep-8 : -0x1.60548f0000002p-4 : inexact-ok
+= log2p1 tonearest binary64 -0xe.d142ep-8 : -0x1.60548f0000001p-4 : inexact-ok
+= log2p1 towardzero binary64 -0xe.d142ep-8 : -0x1.60548f0000001p-4 : inexact-ok
+= log2p1 upward binary64 -0xe.d142ep-8 : -0x1.60548f0000001p-4 : inexact-ok
+= log2p1 downward intel96 -0xe.d142ep-8 : -0x1.60548f00000016dp-4 : inexact-ok
+= log2p1 tonearest intel96 -0xe.d142ep-8 : -0x1.60548f00000016dp-4 : inexact-ok
+= log2p1 towardzero intel96 -0xe.d142ep-8 : -0x1.60548f00000016cep-4 : inexact-ok
+= log2p1 upward intel96 -0xe.d142ep-8 : -0x1.60548f00000016cep-4 : inexact-ok
+= log2p1 downward m68k96 -0xe.d142ep-8 : -0x1.60548f00000016dp-4 : inexact-ok
+= log2p1 tonearest m68k96 -0xe.d142ep-8 : -0x1.60548f00000016dp-4 : inexact-ok
+= log2p1 towardzero m68k96 -0xe.d142ep-8 : -0x1.60548f00000016cep-4 : inexact-ok
+= log2p1 upward m68k96 -0xe.d142ep-8 : -0x1.60548f00000016cep-4 : inexact-ok
+= log2p1 downward binary128 -0xe.d142ep-8 : -0x1.60548f00000016cf47431656929dp-4 : inexact-ok
+= log2p1 tonearest binary128 -0xe.d142ep-8 : -0x1.60548f00000016cf47431656929dp-4 : inexact-ok
+= log2p1 towardzero binary128 -0xe.d142ep-8 : -0x1.60548f00000016cf47431656929cp-4 : inexact-ok
+= log2p1 upward binary128 -0xe.d142ep-8 : -0x1.60548f00000016cf47431656929cp-4 : inexact-ok
+= log2p1 downward ibm128 -0xe.d142ep-8 : -0x1.60548f00000016cf4743165693p-4 : inexact-ok
+= log2p1 tonearest ibm128 -0xe.d142ep-8 : -0x1.60548f00000016cf47431656928p-4 : inexact-ok
+= log2p1 towardzero ibm128 -0xe.d142ep-8 : -0x1.60548f00000016cf47431656928p-4 : inexact-ok
+= log2p1 upward ibm128 -0xe.d142ep-8 : -0x1.60548f00000016cf47431656928p-4 : inexact-ok
diff --git a/sysdeps/ieee754/flt-32/s_log2p1f.c b/sysdeps/ieee754/flt-32/s_log2p1f.c
index 09e77dc08ad..6fa7e5dc7a4 100644
--- a/sysdeps/ieee754/flt-32/s_log2p1f.c
+++ b/sysdeps/ieee754/flt-32/s_log2p1f.c
@@ -231,7 +231,12 @@ __log2p1f (float x)
   int j = (m + ((int64_t) 1 << (52 - 8))) >> (52 - 7), k = j > 53;
   e += k;
   double xd = asdouble (m | (uint64_t) 0x3ff << 52);
-  z = fma (xd, ix[j], -1.0);
+#ifndef __FP_FAST_FMA
+  if (__glibc_unlikely (x == -0x1.da285cp-5f))
+    z = fma (xd, ix[j], -1.0);
+  else
+#endif
+    z = xd * ix[j] - 1.0;
   static const double c[] =
     {
       0x1.71547652b82fep+0, -0x1.71547652b82ffp-1, 0x1.ec709dc32988bp-2,

From patchwork Tue Oct 14 12:10:33 2025
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 121847
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: Wilco Dijkstra, Paul Zimmermann, DJ Delorie
Subject: [PATCH v2 3/4] math: Use stdbit.h instead of builtin in math_config.h
Date: Tue, 14 Oct 2025 09:10:33 -0300
Message-ID: <20251014121153.1058692-4-adhemerval.zanella@linaro.org>
In-Reply-To: <20251014121153.1058692-1-adhemerval.zanella@linaro.org>
References: <20251014121153.1058692-1-adhemerval.zanella@linaro.org>

---
 sysdeps/ieee754/flt-32/math_config.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Wilco Dijkstra

diff --git a/sysdeps/ieee754/flt-32/math_config.h b/sysdeps/ieee754/flt-32/math_config.h
index 6bb5c3324cc..230aee591ce 100644
--- a/sysdeps/ieee754/flt-32/math_config.h
+++ b/sysdeps/ieee754/flt-32/math_config.h
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include <stdbit.h>
 
 #ifndef WANT_ROUNDING
 /* Correct special case results in non-nearest rounding modes. */
@@ -77,7 +78,7 @@ roundeven_finite (double x)
     {
       union { double f; uint64_t i; } u = {y};
       union { double f; uint64_t i; } v = {y - copysign (1.0, x)};
-      if (__builtin_ctzll (v.i) > __builtin_ctzll (u.i))
+      if (stdc_trailing_zeros (v.i) > stdc_trailing_zeros (u.i))
         y = v.f;
     }
   return y;
@@ -101,8 +102,8 @@ roundevenf_finite (float x)
   if (fabs (x - y) == 0.5)
     {
       union { float f; uint32_t i; } u = {y};
-      union { float f; uint32_t i; } v = {y - copysignf (1.0, x)};
-      if (__builtin_ctzl (v.i) > __builtin_ctzl (u.i))
+      union { float f; uint32_t i; } v = {y - copysignf (1.0f, x)};
+      if (stdc_trailing_zeros (v.i) > stdc_trailing_zeros (u.i))
         y = v.f;
     }
   return y;

From patchwork Tue Oct 14 12:10:34 2025
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 121851
([2804:1b3:a7c0:bf74:7212:598e:9e48:2320]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-29034de6c07sm163321505ad.1.2025.10.14.05.12.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Oct 2025 05:12:07 -0700 (PDT) From: Adhemerval Zanella To: libc-alpha@sourceware.org Cc: Wilco Dijkstra , Paul Zimmermann , DJ Delorie Subject: [PATCH v2 4/4] math: Use binary search on lgammaf slow path Date: Tue, 14 Oct 2025 09:10:34 -0300 Message-ID: <20251014121153.1058692-5-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20251014121153.1058692-1-adhemerval.zanella@linaro.org> References: <20251014121153.1058692-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_LOTSOFHASH, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org And remove some unused entries of the fallback table. Checked on x86_64-linux-gnu and aarch64-linux-gnu. 
Reviewed-by: Wilco Dijkstra
---
 math/auto-libm-test-in               |  3 ++
 math/auto-libm-test-out-lgamma       | 75 ++++++++++++++++++++++++++
 sysdeps/ieee754/flt-32/e_lgammaf_r.c | 78 ++++++++++++++--------------
 3 files changed, 117 insertions(+), 39 deletions(-)

diff --git a/math/auto-libm-test-in b/math/auto-libm-test-in
index 03f6fee1f00..1397d317fba 100644
--- a/math/auto-libm-test-in
+++ b/math/auto-libm-test-in
@@ -6933,6 +6933,9 @@ lgamma 0x1p-16494
 lgamma -0x1p-16494
 # the next value generates larger error bounds on x86_64 (binary32)
 lgamma -0x3.ec4298p+0
+lgamma 0x1.ecf3fep-73
+lgamma 0x1.58ace8p+112
+lgamma -0x1.efc2a2p+14
 
 # Values +/- 10ulp from overflow threshold. (Values very close to
 # overflow threshold produce results very close of that threshold,
diff --git a/math/auto-libm-test-out-lgamma b/math/auto-libm-test-out-lgamma
index 36665b85602..d27c186639a 100644
--- a/math/auto-libm-test-out-lgamma
+++ b/math/auto-libm-test-out-lgamma
@@ -2226,6 +2226,81 @@ lgamma -0x3.ec4298p+0
 = lgamma tonearest ibm128 -0x3.ec4298p+0 : -0x7.d809ecd340fc16da6722ad1166p-4 1 : inexact-ok
 = lgamma towardzero ibm128 -0x3.ec4298p+0 : -0x7.d809ecd340fc16da6722ad1166p-4 1 : inexact-ok
 = lgamma upward ibm128 -0x3.ec4298p+0 : -0x7.d809ecd340fc16da6722ad1166p-4 1 : inexact-ok
+lgamma 0x1.ecf3fep-73
+= lgamma downward binary32 0xf.679ffp-76 : 0x3.1f1cbp+4 1 : inexact-ok
+= lgamma tonearest binary32 0xf.679ffp-76 : 0x3.1f1cb4p+4 1 : inexact-ok
+= lgamma towardzero binary32 0xf.679ffp-76 : 0x3.1f1cbp+4 1 : inexact-ok
+= lgamma upward binary32 0xf.679ffp-76 : 0x3.1f1cb4p+4 1 : inexact-ok
+= lgamma downward binary64 0xf.679ffp-76 : 0x3.1f1cb3ffffffep+4 1 : inexact-ok
+= lgamma tonearest binary64 0xf.679ffp-76 : 0x3.1f1cb4p+4 1 : inexact-ok
+= lgamma towardzero binary64 0xf.679ffp-76 : 0x3.1f1cb3ffffffep+4 1 : inexact-ok
+= lgamma upward binary64 0xf.679ffp-76 : 0x3.1f1cb4p+4 1 : inexact-ok
+= lgamma downward intel96 0xf.679ffp-76 : 0x3.1f1cb3fffffff08cp+4 1 : inexact-ok
+= lgamma tonearest intel96 0xf.679ffp-76 : 0x3.1f1cb3fffffff08cp+4 1 : inexact-ok
+= lgamma towardzero intel96 0xf.679ffp-76 : 0x3.1f1cb3fffffff08cp+4 1 : inexact-ok
+= lgamma upward intel96 0xf.679ffp-76 : 0x3.1f1cb3fffffff09p+4 1 : inexact-ok
+= lgamma downward m68k96 0xf.679ffp-76 : 0x3.1f1cb3fffffff08cp+4 1 : inexact-ok
+= lgamma tonearest m68k96 0xf.679ffp-76 : 0x3.1f1cb3fffffff08cp+4 1 : inexact-ok
+= lgamma towardzero m68k96 0xf.679ffp-76 : 0x3.1f1cb3fffffff08cp+4 1 : inexact-ok
+= lgamma upward m68k96 0xf.679ffp-76 : 0x3.1f1cb3fffffff09p+4 1 : inexact-ok
+= lgamma downward binary128 0xf.679ffp-76 : 0x3.1f1cb3fffffff08c0c4788f0c8a8p+4 1 : inexact-ok
+= lgamma tonearest binary128 0xf.679ffp-76 : 0x3.1f1cb3fffffff08c0c4788f0c8a8p+4 1 : inexact-ok
+= lgamma towardzero binary128 0xf.679ffp-76 : 0x3.1f1cb3fffffff08c0c4788f0c8a8p+4 1 : inexact-ok
+= lgamma upward binary128 0xf.679ffp-76 : 0x3.1f1cb3fffffff08c0c4788f0c8aap+4 1 : inexact-ok
+= lgamma downward ibm128 0xf.679ffp-76 : 0x3.1f1cb3fffffff08c0c4788f0c8p+4 1 : inexact-ok
+= lgamma tonearest ibm128 0xf.679ffp-76 : 0x3.1f1cb3fffffff08c0c4788f0c9p+4 1 : inexact-ok
+= lgamma towardzero ibm128 0xf.679ffp-76 : 0x3.1f1cb3fffffff08c0c4788f0c8p+4 1 : inexact-ok
+= lgamma upward ibm128 0xf.679ffp-76 : 0x3.1f1cb3fffffff08c0c4788f0c9p+4 1 : inexact-ok
+lgamma 0x1.58ace8p+112
+= lgamma downward binary32 0x1.58ace8p+112 : 0x6.793d9p+116 1 : inexact-ok
+= lgamma tonearest binary32 0x1.58ace8p+112 : 0x6.793d98p+116 1 : inexact-ok
+= lgamma towardzero binary32 0x1.58ace8p+112 : 0x6.793d9p+116 1 : inexact-ok
+= lgamma upward binary32 0x1.58ace8p+112 : 0x6.793d98p+116 1 : inexact-ok
+= lgamma downward binary64 0x1.58ace8p+112 : 0x6.793d94p+116 1 : inexact-ok
+= lgamma tonearest binary64 0x1.58ace8p+112 : 0x6.793d940000004p+116 1 : inexact-ok
+= lgamma towardzero binary64 0x1.58ace8p+112 : 0x6.793d94p+116 1 : inexact-ok
+= lgamma upward binary64 0x1.58ace8p+112 : 0x6.793d940000004p+116 1 : inexact-ok
+= lgamma downward intel96 0x1.58ace8p+112 : 0x6.793d940000003d2p+116 1 : inexact-ok
+= lgamma tonearest intel96 0x1.58ace8p+112 : 0x6.793d940000003d28p+116 1 : inexact-ok
+= lgamma towardzero intel96 0x1.58ace8p+112 : 0x6.793d940000003d2p+116 1 : inexact-ok
+= lgamma upward intel96 0x1.58ace8p+112 : 0x6.793d940000003d28p+116 1 : inexact-ok
+= lgamma downward m68k96 0x1.58ace8p+112 : 0x6.793d940000003d2p+116 1 : inexact-ok
+= lgamma tonearest m68k96 0x1.58ace8p+112 : 0x6.793d940000003d28p+116 1 : inexact-ok
+= lgamma towardzero m68k96 0x1.58ace8p+112 : 0x6.793d940000003d2p+116 1 : inexact-ok
+= lgamma upward m68k96 0x1.58ace8p+112 : 0x6.793d940000003d28p+116 1 : inexact-ok
+= lgamma downward binary128 0x1.58ace8p+112 : 0x6.793d940000003d252ede096c3f0cp+116 1 : inexact-ok
+= lgamma tonearest binary128 0x1.58ace8p+112 : 0x6.793d940000003d252ede096c3f0cp+116 1 : inexact-ok
+= lgamma towardzero binary128 0x1.58ace8p+112 : 0x6.793d940000003d252ede096c3f0cp+116 1 : inexact-ok
+= lgamma upward binary128 0x1.58ace8p+112 : 0x6.793d940000003d252ede096c3f1p+116 1 : inexact-ok
+= lgamma downward ibm128 0x1.58ace8p+112 : 0x6.793d940000003d252ede096c3ep+116 1 : inexact-ok
+= lgamma tonearest ibm128 0x1.58ace8p+112 : 0x6.793d940000003d252ede096c4p+116 1 : inexact-ok
+= lgamma towardzero ibm128 0x1.58ace8p+112 : 0x6.793d940000003d252ede096c3ep+116 1 : inexact-ok
+= lgamma upward ibm128 0x1.58ace8p+112 : 0x6.793d940000003d252ede096c4p+116 1 : inexact-ok
+lgamma -0x1.efc2a2p+14
+= lgamma downward binary32 -0x7.bf0a88p+12 : -0x4.88b6f8p+16 -1 : inexact-ok
+= lgamma tonearest binary32 -0x7.bf0a88p+12 : -0x4.88b6fp+16 -1 : inexact-ok
+= lgamma towardzero binary32 -0x7.bf0a88p+12 : -0x4.88b6fp+16 -1 : inexact-ok
+= lgamma upward binary32 -0x7.bf0a88p+12 : -0x4.88b6fp+16 -1 : inexact-ok
+= lgamma downward binary64 -0x7.bf0a88p+12 : -0x4.88b6f00000008p+16 -1 : inexact-ok
+= lgamma tonearest binary64 -0x7.bf0a88p+12 : -0x4.88b6f00000004p+16 -1 : inexact-ok
+= lgamma towardzero binary64 -0x7.bf0a88p+12 : -0x4.88b6f00000004p+16 -1 : inexact-ok
+= lgamma upward binary64 -0x7.bf0a88p+12 : -0x4.88b6f00000004p+16 -1 : inexact-ok
+= lgamma downward intel96 -0x7.bf0a88p+12 : -0x4.88b6f00000005978p+16 -1 : inexact-ok
+= lgamma tonearest intel96 -0x7.bf0a88p+12 : -0x4.88b6f0000000597p+16 -1 : inexact-ok
+= lgamma towardzero intel96 -0x7.bf0a88p+12 : -0x4.88b6f0000000597p+16 -1 : inexact-ok
+= lgamma upward intel96 -0x7.bf0a88p+12 : -0x4.88b6f0000000597p+16 -1 : inexact-ok
+= lgamma downward m68k96 -0x7.bf0a88p+12 : -0x4.88b6f00000005978p+16 -1 : inexact-ok
+= lgamma tonearest m68k96 -0x7.bf0a88p+12 : -0x4.88b6f0000000597p+16 -1 : inexact-ok
+= lgamma towardzero m68k96 -0x7.bf0a88p+12 : -0x4.88b6f0000000597p+16 -1 : inexact-ok
+= lgamma upward m68k96 -0x7.bf0a88p+12 : -0x4.88b6f0000000597p+16 -1 : inexact-ok
+= lgamma downward binary128 -0x7.bf0a88p+12 : -0x4.88b6f00000005971e29c3b8dd834p+16 -1 : inexact-ok
+= lgamma tonearest binary128 -0x7.bf0a88p+12 : -0x4.88b6f00000005971e29c3b8dd834p+16 -1 : inexact-ok
+= lgamma towardzero binary128 -0x7.bf0a88p+12 : -0x4.88b6f00000005971e29c3b8dd83p+16 -1 : inexact-ok
+= lgamma upward binary128 -0x7.bf0a88p+12 : -0x4.88b6f00000005971e29c3b8dd83p+16 -1 : inexact-ok
+= lgamma downward ibm128 -0x7.bf0a88p+12 : -0x4.88b6f00000005971e29c3b8ddap+16 -1 : inexact-ok
+= lgamma tonearest ibm128 -0x7.bf0a88p+12 : -0x4.88b6f00000005971e29c3b8dd8p+16 -1 : inexact-ok
+= lgamma towardzero ibm128 -0x7.bf0a88p+12 : -0x4.88b6f00000005971e29c3b8dd8p+16 -1 : inexact-ok
+= lgamma upward ibm128 -0x7.bf0a88p+12 : -0x4.88b6f00000005971e29c3b8dd8p+16 -1 : inexact-ok
 lgamma 0x3.12be0cp+120
 = lgamma downward binary32 0x3.12be0cp+120 : 0xf.ffff1p+124 1 : inexact-ok
 = lgamma tonearest binary32 0x3.12be0cp+120 : 0xf.ffff1p+124 1 : inexact-ok
diff --git a/sysdeps/ieee754/flt-32/e_lgammaf_r.c b/sysdeps/ieee754/flt-32/e_lgammaf_r.c
index 75ec25fb9e1..cd87248ffe0 100644
--- a/sysdeps/ieee754/flt-32/e_lgammaf_r.c
+++ b/sysdeps/ieee754/flt-32/e_lgammaf_r.c
@@ -116,41 +116,34 @@ __ieee754_lgammaf_r (float x, int *signgamp)
     float f;
     float df;
   } tb[] = {
-    { -0x1.efc2a2p+14, -0x1.222dbcp+18, -0x1p-7 },
-    { -0x1.627346p+7, -0x1.73235ep+9, -0x1p-16 },
-    { -0x1.08b14p+4, -0x1.f0cbe6p+4, -0x1p-21 },
-    { -0x1.69d628p+3, -0x1.0eac2ap+4, -0x1p-21 },
-    { -0x1.904902p+2, -0x1.65532cp+2, 0x1p-23 },
-    { -0x1.9272d2p+1, -0x1.170b98p-8, 0x1p-33 },
-    { -0x1.625edap+1, 0x1.6a6c4ap-5, -0x1p-30 },
-    { -0x1.5fc2aep+1, 0x1.c0a484p-11, -0x1p-36 },
-    { -0x1.5fb43ep+1, 0x1.5b697p-17, 0x1p-42 },
-    { -0x1.5fa20cp+1, -0x1.132f7ap-10, 0x1p-35 },
-    { -0x1.580c1ep+1, -0x1.5787c6p-4, 0x1p-29 },
-    { -0x1.3a7fcap+1, -0x1.e4cf24p-24, -0x1p-49 },
-    { -0x1.c2f04p-30, 0x1.43a6f6p+4, 0x1p-21 },
-    { -0x1.ade594p-30, 0x1.446ab2p+4, -0x1p-21 },
-    { -0x1.437e74p-40, 0x1.b7dec2p+4, -0x1p-21 },
-    { -0x1.d85bfep-43, 0x1.d31592p+4, -0x1p-21 },
-    { -0x1.f51c8ep-49, 0x1.0a572ap+5, -0x1p-20 },
-    { -0x1.108a5ap-66, 0x1.6d7b18p+5, -0x1p-20 },
-    { -0x1.ecf3fep-73, 0x1.8f8e5ap+5, -0x1p-20 },
-    { -0x1.25cb66p-123, 0x1.547a44p+6, -0x1p-19 },
-    { 0x1.ecf3fep-73, 0x1.8f8e5ap+5, -0x1p-20 },
-    { 0x1.108a5ap-66, 0x1.6d7b18p+5, -0x1p-20 },
-    { 0x1.a68bbcp-42, 0x1.c9c6e8p+4, 0x1p-21 },
-    { 0x1.ddfd06p-12, 0x1.ec5ba8p+2, -0x1p-23 },
-    { 0x1.f8a754p-9, 0x1.63acc2p+2, 0x1p-23 },
-    { 0x1.8d16b2p+5, 0x1.1e4b4ep+7, 0x1p-18 },
-    { 0x1.359e0ep+10, 0x1.d9ad02p+12, -0x1p-13 },
-    { 0x1.a82a2cp+13, 0x1.c38036p+16, 0x1p-9 },
-    { 0x1.62c646p+14, 0x1.9075bep+17, -0x1p-8 },
-    { 0x1.7f298p+31, 0x1.f44946p+35, -0x1p+10 },
-    { 0x1.a45ea4p+33, 0x1.25dcbcp+38, -0x1p+13 },
-    { 0x1.f9413ep+76, 0x1.9d5ab4p+82, -0x1p+57 },
-    { 0x1.dcbbaap+99, 0x1.fc5772p+105, 0x1p+80 },
-    { 0x1.58ace8p+112, 0x1.9e4f66p+118, -0x1p+93 },
-    { 0x1.87bdfp+115, 0x1.e465aep+121, 0x1p+96 },
+    /* NB: the entries should be sorted by the asuint (x) value. */
+    { 0x1.ecf3fep-73f, 0x1.8f8e5ap+5f, -0x1p-20f },
+    { 0x1.108a5ap-66f, 0x1.6d7b18p+5f, -0x1p-20f },
+    { 0x1.a68bbcp-42f, 0x1.c9c6e8p+4f, 0x1p-21f },
+    { 0x1.ddfd06p-12f, 0x1.ec5ba8p+2f, -0x1p-23f },
+    { 0x1.f8a754p-9f, 0x1.63acc2p+2f, 0x1p-23f },
+    { 0x1.8d16b2p+5f, 0x1.1e4b4ep+7f, 0x1p-18f },
+    { 0x1.359e0ep+10f, 0x1.d9ad02p+12f, -0x1p-13f },
+    { 0x1.a82a2cp+13f, 0x1.c38036p+16f, 0x1p-9f },
+    { 0x1.62c646p+14f, 0x1.9075bep+17f, -0x1p-8f },
+    { 0x1.7f298p+31f, 0x1.f44946p+35f, -0x1p+10f },
+    { 0x1.a45ea4p+33f, 0x1.25dcbcp+38f, -0x1p+13f },
+    { 0x1.f9413ep+76f, 0x1.9d5ab4p+82f, -0x1p+57f },
+    { 0x1.dcbbaap+99f, 0x1.fc5772p+105f, 0x1p+80f },
+    { 0x1.58ace8p+112f, 0x1.9e4f66p+118f, -0x1p+93f },
+    { 0x1.87bdfp+115f, 0x1.e465aep+121f, 0x1p+96f },
+    { -0x1.25cb66p-123f, 0x1.547a44p+6f, -0x1p-19f },
+    { -0x1.ecf3fep-73f, 0x1.8f8e5ap+5f, -0x1p-20f },
+    { -0x1.108a5ap-66f, 0x1.6d7b18p+5f, -0x1p-20f },
+    { -0x1.f51c8ep-49f, 0x1.0a572ap+5f, -0x1p-20f },
+    { -0x1.d85bfep-43f, 0x1.d31592p+4f, -0x1p-21f },
+    { -0x1.437e74p-40f, 0x1.b7dec2p+4f, -0x1p-21f },
+    { -0x1.ade594p-30f, 0x1.446ab2p+4f, -0x1p-21f },
+    { -0x1.c2f04p-30f, 0x1.43a6f6p+4f, 0x1p-21f },
+    { -0x1.580c1ep+1f, -0x1.5787c6p-4f, 0x1p-29f },
+    { -0x1.69d628p+3f, -0x1.0eac2ap+4f, -0x1p-21f },
+    { -0x1.627346p+7f, -0x1.73235ep+9f, -0x1p-16f },
+    { -0x1.efc2a2p+14f, -0x1.222dbcp+18f, -0x1p-7f }
   };
 
   float fx = floor (x);
@@ -355,11 +348,18 @@ __ieee754_lgammaf_r (float x, int *signgamp)
   if (__glibc_unlikely (tl <= 31u))
     {
       t = asuint (x);
-      for (unsigned i = 0; i < array_length (tb); i++)
-        {
-          if (t == asuint (tb[i].x))
-            return tb[i].f + tb[i].df;
+      int a = 0, b = array_length (tb) - 1;
+      while (a < b)
+        { /* Binary search. */
+          int m = (a + b) >> 1;
+          uint32_t tbi = asuint (tb[m].x);
+          if (t > tbi)
+            a = m + 1;
+          else
+            b = m;
         }
+      if (t == asuint (tb[a].x))
+        return tb[a].f + tb[a].df;
     }
   return r;
 }
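For readers who want to try the lookup outside of glibc, here is a minimal standalone sketch of the same lower-bound search over float bit patterns. It is an illustration under stated assumptions, not the glibc code: asuint below is a memcpy-based stand-in for the internal helper, struct entry and lookup are hypothetical names, and the table holds only the three entries that this patch also adds to auto-libm-test-in. Because the comparison is on the unsigned bit pattern, negative inputs (sign bit set) sort after every positive input, which is why the patched table lists positive entries first.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for glibc's internal asuint: reinterpret the bits of a
   float as an unsigned 32-bit integer (assumption, not the glibc
   definition).  */
static inline uint32_t
asuint (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Hypothetical entry type mirroring the x/f/df fields of the patch.  */
struct entry
{
  float x;
  float f;
  float df;
};

/* Three entries taken from the patched table, kept in ascending
   asuint (x) order: positives first, then negatives.  */
static const struct entry tb[] = {
  { 0x1.ecf3fep-73f, 0x1.8f8e5ap+5f, -0x1p-20f },
  { 0x1.58ace8p+112f, 0x1.9e4f66p+118f, -0x1p+93f },
  { -0x1.efc2a2p+14f, -0x1.222dbcp+18f, -0x1p-7f }
};

/* Return the index of X in tb, or -1 if X is not a table entry.
   The loop narrows [a, b] to the first entry whose bit pattern is
   not below that of X; one final equality test decides the hit.  */
static int
lookup (float x)
{
  uint32_t t = asuint (x);
  int a = 0, b = (int) (sizeof tb / sizeof tb[0]) - 1;
  while (a < b)
    {
      int m = (a + b) >> 1;
      if (t > asuint (tb[m].x))
        a = m + 1;
      else
        b = m;
    }
  return t == asuint (tb[a].x) ? a : -1;
}
```

On a hit, the patched function returns tb[a].f + tb[a].df; in this sketch a caller would do the same with the index returned by lookup and fall back to the regular result when it gets -1.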