From patchwork Thu Nov 13 18:05:17 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Pierre Blanchard
X-Patchwork-Id: 124191
From: Pierre Blanchard
To:
CC: Pierre Blanchard
Subject: [PATCH] aarch64: Fix and improve SVE pow(f) special cases.
Date: Thu, 13 Nov 2025 18:05:17 +0000
Message-ID: <20251113180517.1557371-1-pierre.blanchard@arm.com>
X-Mailer: git-send-email 2.34.1
List-Id: Libc-alpha mailing list

powf: Update scalar special case function to best
use new interface.
pow: Make specialcase NOINLINE to prevent str/ldr leaking in fast path.
Remove dependency in sv_call2, as new callback impl is not a performance gain.
Replace with vectorised specialcase since structure of scalar routine is
fairly simple.

Throughput gain of about 5-10% on V1 for large values and 25% for subnormal `x`.
---
Ok for master? If so please commit for me as I don't have commit rights.

Thanks,
Pierre

 sysdeps/aarch64/fpu/pow_sve.c  | 80 +++++++++++++++-------------------
 sysdeps/aarch64/fpu/powf_sve.c | 30 +++----------
 2 files changed, 40 insertions(+), 70 deletions(-)

diff --git a/sysdeps/aarch64/fpu/pow_sve.c b/sysdeps/aarch64/fpu/pow_sve.c
index 8fe51b7d4b..e09a52f115 100644
--- a/sysdeps/aarch64/fpu/pow_sve.c
+++ b/sysdeps/aarch64/fpu/pow_sve.c
@@ -31,8 +31,8 @@
    The SVE algorithm drops the tail in the exp computation at the price of
    a lower accuracy, slightly above 1ULP.
    The SVE algorithm also drops the special treatement of small (< 2^-65) and
-   large (> 2^63) finite values of |y|, as they only affect non-round to nearest
-   modes.
+   large (> 2^63) finite values of |y|, as they only affect non-round to
+   nearest modes.
 
    Maximum measured error is 1.04 ULPs:
    SV_NAME_D2 (pow) (0x1.3d2d45bc848acp+63, -0x1.a48a38b40cd43p-12)
@@ -156,42 +156,22 @@ sv_zeroinfnan (svbool_t pg, svuint64_t i)
    a double. (int32_t)KI is the k used in the argument reduction and exponent
    adjustment of scale, positive k here means the result may overflow and
    negative k means the result may underflow. */
-static inline double
-specialcase (double tmp, uint64_t sbits, uint64_t ki)
-{
-  double scale;
-  if ((ki & 0x80000000) == 0)
-    {
-      /* k > 0, the exponent of scale might have overflowed by <= 460. */
-      sbits -= 1009ull << 52;
-      scale = asdouble (sbits);
-      return 0x1p1009 * (scale + scale * tmp);
-    }
-  /* k < 0, need special care in the subnormal range. */
-  sbits += 1022ull << 52;
-  /* Note: sbits is signed scale. */
-  scale = asdouble (sbits);
-  double y = scale + scale * tmp;
-  return 0x1p-1022 * y;
-}
-
-/* Scalar fallback for special cases of SVE pow's exp. */
 static inline svfloat64_t
-sv_call_specialcase (svfloat64_t x1, svuint64_t u1, svuint64_t u2,
-                     svfloat64_t y, svbool_t cmp)
+specialcase (svfloat64_t tmp, svuint64_t sbits, svuint64_t ki, svbool_t cmp)
 {
-  svbool_t p = svpfirst (cmp, svpfalse ());
-  while (svptest_any (cmp, p))
-    {
-      double sx1 = svclastb (p, 0, x1);
-      uint64_t su1 = svclastb (p, 0, u1);
-      uint64_t su2 = svclastb (p, 0, u2);
-      double elem = specialcase (sx1, su1, su2);
-      svfloat64_t y2 = sv_f64 (elem);
-      y = svsel (p, y2, y);
-      p = svpnext_b64 (cmp, p);
-    }
-  return y;
+  svbool_t p_pos = svcmpge_n_f64 (cmp, svreinterpret_f64_u64 (ki), 0.0);
+
+  /* Scale up or down depending on sign of k. */
+  svint64_t offset
+      = svsel_s64 (p_pos, sv_s64 (1009ull << 52), sv_s64 (-1022ull << 52));
+  svfloat64_t factor
+      = svsel_f64 (p_pos, sv_f64 (0x1p1009), sv_f64 (0x1p-1022));
+
+  svuint64_t offset_sbits
+      = svsub_u64_x (cmp, sbits, svreinterpret_u64_s64 (offset));
+  svfloat64_t scale = svreinterpret_f64_u64 (offset_sbits);
+  svfloat64_t res = svmad_f64_x (cmp, scale, tmp, scale);
+  return svmul_f64_x (cmp, res, factor);
 }
 
 /* Compute y+TAIL = log(x) where the rounded result is y and TAIL has about
@@ -214,8 +194,8 @@ sv_log_inline (svbool_t pg, svuint64_t ix, svfloat64_t *tail,
   /* log(x) = k*Ln2 + log(c) + log1p(z/c-1). */
 
   /* SVE lookup requires 3 separate lookup tables, as opposed to scalar version
-     that uses array of structures. We also do the lookup earlier in the code to
-     make sure it finishes as early as possible. */
+     that uses array of structures. We also do the lookup earlier in the code
+     to make sure it finishes as early as possible. */
   svfloat64_t invc = svld1_gather_index (pg, __v_pow_log_data.invc, i);
   svfloat64_t logc = svld1_gather_index (pg, __v_pow_log_data.logc, i);
   svfloat64_t logctail = svld1_gather_index (pg, __v_pow_log_data.logctail, i);
@@ -325,14 +305,14 @@ sv_exp_inline (svbool_t pg, svfloat64_t x, svfloat64_t xtail,
   svbool_t oflow = svcmpge (pg, abstop, HugeExp);
   oflow = svand_z (pg, uoflow, svbic_z (pg, oflow, uflow));
 
-  /* For large |x| values (512 < |x| < 1024) scale * (1 + TMP) can overflow
-     or underflow. */
+  /* Handle underflow and overflow in scale.
+     For large |x| values (512 < |x| < 1024), scale * (1 + TMP) can
+     overflow or underflow. */
   svbool_t special = svbic_z (pg, uoflow, svorr_z (pg, uflow, oflow));
+  if (__glibc_unlikely (svptest_any (pg, special)))
+    z = svsel (special, specialcase (tmp, sbits, ki, special), z);
 
-  /* Update result with special and large cases. */
-  z = sv_call_specialcase (tmp, sbits, ki, z, special);
-
-  /* Handle underflow and overflow. */
+  /* Handle underflow and overflow in exp. */
   svbool_t x_is_neg = svcmplt (pg, x, 0);
   svuint64_t sign_mask
       = svlsl_x (pg, sign_bias, 52 - V_POW_EXP_TABLE_BITS);
@@ -353,7 +333,7 @@ sv_exp_inline (svbool_t pg, svfloat64_t x, svfloat64_t xtail,
 }
 
 static inline double
-pow_sc (double x, double y)
+pow_specialcase (double x, double y)
 {
   uint64_t ix = asuint64 (x);
   uint64_t iy = asuint64 (y);
@@ -382,6 +362,14 @@ pow_sc (double x, double y)
   return x;
 }
 
+/* Scalar fallback for special case routines with custom signature. */
+static svfloat64_t NOINLINE
+sv_pow_specialcase (svfloat64_t x1, svfloat64_t x2, svfloat64_t y,
+                    svbool_t cmp)
+{
+  return sv_call2_f64 (pow_specialcase, x1, x2, y, cmp);
+}
+
 svfloat64_t SV_NAME_D2 (pow) (svfloat64_t x, svfloat64_t y, const svbool_t pg)
 {
   const struct data *d = ptr_barrier (&data);
@@ -444,7 +432,7 @@ svfloat64_t SV_NAME_D2 (pow) (svfloat64_t x, svfloat64_t y, const svbool_t pg)
 
   /* Cases of zero/inf/nan x or y. */
   if (__glibc_unlikely (svptest_any (svptrue_b64 (), special)))
-    vz = sv_call2_f64 (pow_sc, x, y, vz, special);
+    vz = sv_pow_specialcase (x, y, vz, special);
 
   return vz;
 }
diff --git a/sysdeps/aarch64/fpu/powf_sve.c b/sysdeps/aarch64/fpu/powf_sve.c
index 22e6cc54fb..cbe2044926 100644
--- a/sysdeps/aarch64/fpu/powf_sve.c
+++ b/sysdeps/aarch64/fpu/powf_sve.c
@@ -116,11 +116,10 @@ zeroinfnan (uint32_t ix)
    preamble of scalar powf except that we do not update ix and sign_bias. This
    is done in the preamble of the SVE powf. */
 static inline float
-powf_specialcase (float x, float y, float z)
+powf_specialcase (float x, float y)
 {
   uint32_t ix = asuint (x);
   uint32_t iy = asuint (y);
-  /* Either (x < 0x1p-126 or inf or nan) or (y is 0 or inf or nan). */
   if (__glibc_unlikely (zeroinfnan (iy)))
     {
       if (2 * iy == 0)
@@ -142,32 +141,15 @@ powf_specialcase (float x, float y, float z)
         x2 = -x2;
       return iy & 0x80000000 ? 1 / x2 : x2;
     }
-  /* We need a return here in case x<0 and y is integer, but all other tests
-     need to be run. */
-  return z;
+  /* Return x for convenience, but make sure result is never used. */
+  return x;
 }
 
 /* Scalar fallback for special case routines with custom signature. */
 static svfloat32_t NOINLINE
-sv_call_powf_sc (svfloat32_t x1, svfloat32_t x2, svfloat32_t y)
+sv_call_powf_sc (svfloat32_t x1, svfloat32_t x2, svfloat32_t y, svbool_t cmp)
 {
-  /* Special cases of x or y: zero, inf and nan. */
-  svbool_t xspecial = sv_zeroinfnan (svptrue_b32 (), svreinterpret_u32 (x1));
-  svbool_t yspecial = sv_zeroinfnan (svptrue_b32 (), svreinterpret_u32 (x2));
-  svbool_t cmp = svorr_z (svptrue_b32 (), xspecial, yspecial);
-
-  svbool_t p = svpfirst (cmp, svpfalse ());
-  while (svptest_any (cmp, p))
-    {
-      float sx1 = svclastb (p, 0, x1);
-      float sx2 = svclastb (p, 0, x2);
-      float elem = svclastb (p, 0, y);
-      elem = powf_specialcase (sx1, sx2, elem);
-      svfloat32_t y2 = sv_f32 (elem);
-      y = svsel (p, y2, y);
-      p = svpnext_b32 (cmp, p);
-    }
-  return y;
+  return sv_call2_f32 (powf_specialcase, x1, x2, y, cmp);
 }
 
 /* Compute core for half of the lanes in double precision. */
@@ -330,7 +312,7 @@ svfloat32_t SV_NAME_F2 (pow) (svfloat32_t x, svfloat32_t y, const svbool_t pg)
   ret = svsel (yint_or_xpos, ret, sv_f32 (__builtin_nanf ("")));
 
   if (__glibc_unlikely (svptest_any (cmp, cmp)))
-    return sv_call_powf_sc (x, y, ret);
+    return sv_call_powf_sc (x, y, ret, cmp);
 
   return ret;
 }
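
Note for reviewers: to check the vectorised specialcase against the scalar routine it replaces, here is a minimal single-lane sketch in plain C. It is not part of the patch. It assumes k is available as a sign-extended 64-bit integer, and the names as_double, specialcase_branchy, specialcase_select and check_examples are hypothetical, used only for illustration. The select form follows the structure of the new SVE code: pick an offset and a factor from the sign of k, then run one straight-line sequence.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit-cast helper (hypothetical name, standing in for asdouble).  */
static inline double
as_double (uint64_t u)
{
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}

/* Branchy form, following the scalar specialcase removed by the patch.  */
static double
specialcase_branchy (double tmp, uint64_t sbits, uint64_t ki)
{
  double scale;
  if ((ki & 0x80000000) == 0)
    {
      /* k > 0: undo a possible exponent overflow, rescale at the end.  */
      sbits -= 1009ull << 52;
      scale = as_double (sbits);
      return 0x1p1009 * (scale + scale * tmp);
    }
  /* k < 0: lift the exponent out of the subnormal range first.  */
  sbits += 1022ull << 52;
  scale = as_double (sbits);
  return 0x1p-1022 * (scale + scale * tmp);
}

/* Select form for one lane (assumption: k is sign-extended to 64 bits).  */
static double
specialcase_select (double tmp, uint64_t sbits, int64_t k)
{
  int k_pos = k >= 0;
  /* Subtracting -(1022 << 52) is the same as adding 1022 << 52.  */
  uint64_t offset = k_pos ? (1009ull << 52) : -(1022ull << 52);
  double factor = k_pos ? 0x1p1009 : 0x1p-1022;
  double scale = as_double (sbits - offset);
  return factor * (scale + scale * tmp);
}

/* Inputs chosen so that every intermediate value is exact.  */
static void
check_examples (void)
{
  /* k > 0: the bits encode 2^1000, expected result is 1.5 * 2^1000.  */
  uint64_t big = 2023ull << 52;
  assert (specialcase_branchy (0.5, big, 1000) == 0x1.8p1000);
  assert (specialcase_select (0.5, big, 1000) == 0x1.8p1000);

  /* k < 0: biased exponent -7 wraps into the sign bit, true scale 2^-1030.  */
  uint64_t tiny = (uint64_t) -7 << 52;
  assert (specialcase_branchy (0.5, tiny, (uint64_t) (int64_t) -1075)
          == 0x1.8p-1030);
  assert (specialcase_select (0.5, tiny, -1075) == 0x1.8p-1030);
}
```

Calling check_examples () from a test harness exercises both signs of k. Because the chosen inputs keep all intermediates exact, the comparison does not depend on whether the compiler fuses the multiply-add.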