From patchwork Wed Dec 10 15:19:29 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pierre Blanchard X-Patchwork-Id: 126335 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from vm01.sourceware.org (localhost [127.0.0.1]) by sourceware.org (Postfix) with ESMTP id 2F61F4BA2E30 for ; Wed, 10 Dec 2025 15:22:35 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2F61F4BA2E30 Authentication-Results: sourceware.org; dkim=pass (1024-bit key, unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=iWCRrp0P; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=iWCRrp0P X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from DU2PR03CU002.outbound.protection.outlook.com (mail-northeuropeazon11011003.outbound.protection.outlook.com [52.101.65.3]) by sourceware.org (Postfix) with ESMTPS id 1CCB34BA2E02 for ; Wed, 10 Dec 2025 15:20:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1CCB34BA2E02 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 1CCB34BA2E02 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=52.101.65.3 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1765380061; cv=pass; b=emncRwiJwGoMn1KK9DQ/S7aU5JNy+vvVgpx1pZgNeUl/KXwMocA98tUTV3XmQKVrks3KVUT2TT9S66POR+3q9fkFuklgQbqsy2po7MynQ33JqxaD2jtB0UEDCyvw9L3cJMMj/OpSrl2kJWBWwzJn3+3FUmroNW84f6LFxD04ADE= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1765380061; c=relaxed/simple; bh=BwKmKdRbdE647aRXE4aGLrqIy92Na9P2xzDLnQJp1+Q=; h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID: MIME-Version; b=PHLhFZf8NKW59Mtnhe1pXt0bbOxHnTa0vwrn9xGnneHYo31FhJ6my0x6ChDxrnAT8DczCQCaLRxDbOPJRQAufbLs1vrEw0kp/wIdu6fHGc07SXMLUIMyagheQwTStRdaoaK7peoKFwCJ49/Duk6yUJBoMc1w6iXd94nl09/upjE= ARC-Authentication-Results: i=3; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1CCB34BA2E02 ARC-Seal: i=2; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=pass; b=EHhyxH562xhJIpK1EyHxrBGvzxGx+y46E8Sc6U/RJhCz3oGH2ZtiYrYsABS9X6Y+GqqZmHiNEZwjHX8ZFypLSR7eR//825nO+ZXDb6ethlzZYlaFbvIWSD8ECM4eiAQBEiM7gD1czxP4lNOw/dhZWm/QHsJM5hEALOWcw7TrB6rVgqqcvDJw+MzbBCTelB/juOWL9Hhkr4YZc7bsYlWWQkc6k3EfktZlcE1fxbwkh9/bVpumM+zmyKrupT31qMVZC1iqZhxozBXAGodSKza5N43CZCZiMC1//g/QR8AgiHSaLw+ES1RCbxMI/CiJ5rmIcxY9gWjCB8rcQ7kag9+GcA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=W1lqJ68T/UbdAcUqba8lkMBQ4ZuJuKrTZJn8A/GMNZ4=; b=HUMAGR2LZpAgi3flq+LaLLKViKEWKW9xx9eV6CZKElYs6sM8bOPpBgNvWDigRxU8u/445N0pLYEbVoeyyd0/AV4aPitDxyC/nHhNcsuwnlACP0rgQINSobZUqgapqeadAwZIsQh9dvqeKgum8uzJAMgV/El9JDO4tOaGeQRqW71fKzU3ewHiZwbolT8w/hjZgalbqjwDUxazL8XiBsyoM8lvcIhUl7tPe52sCluSEl9ekgM/oCRLu9jR/H24GDgE7NjUglSC7ozMh23oQhQHfs7nBhaEGupIJclsLcotI19RQlrzOesemmOrGs8AHsG310E0Ow2i1i09h0EKbLfUOw== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 4.158.2.129) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=W1lqJ68T/UbdAcUqba8lkMBQ4ZuJuKrTZJn8A/GMNZ4=; b=iWCRrp0PBWnpMCaJUG3JiR/j8d19xGl32jFFQKlZ8E/QueMbNwT3X8r2d1nnp5hIqRYWH1eAB+oJ2dp/EBmimUTIHTV17se8LHCrKO5g2XWkz8p10TE8pE8Ew71ZdwDV8+wrXL50lBhsJTPPxXoHAsd6FVy2b74ALpyJfmjklcM= Received: from AM0PR02CA0172.eurprd02.prod.outlook.com (2603:10a6:20b:28e::9) by PAXPR08MB6429.eurprd08.prod.outlook.com (2603:10a6:102:dc::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9412.7; Wed, 10 Dec 2025 15:20:49 +0000 Received: from AM4PEPF00027A5F.eurprd04.prod.outlook.com (2603:10a6:20b:28e:cafe::c3) by AM0PR02CA0172.outlook.office365.com (2603:10a6:20b:28e::9) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9388.14 via Frontend Transport; Wed, 10 Dec 2025 15:20:43 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 4.158.2.129) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 4.158.2.129 as permitted sender) receiver=protection.outlook.com; client-ip=4.158.2.129; helo=outbound-uk1.az.dlp.m.darktrace.com; pr=C Received: from outbound-uk1.az.dlp.m.darktrace.com (4.158.2.129) by AM4PEPF00027A5F.mail.protection.outlook.com (10.167.16.74) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9412.4 via Frontend Transport; Wed, 10 Dec 2025 15:20:49 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=bR18d/tZc9spxN7Nlk+VIg8o7d5Lsyad1TgGJmSPqrUlqd2pRpC2bFVvZKYL3cjUi4/T8xdrEI8JUshevw8O0lG80UE/80LK1ODQAPwsvCMzzbCEO+D2/nfMvpOb8pnRoMgQT7Qto01zlB6Wv6QR4rqDIOek2IoNfWZCsZ/K/klkuhnJ574TlUGIOqVb0GSBZGvI2QJcgqZFcGTwXr/LTdyQTkji9AagzX5W5dweM/vDW4wGeWdvFrgJZMOqS8fICN41zDy+VLFhnxMciJrDkR70h2nRcQQN9csH6eKYNNO9CsFh5YsoJ1Hf85THAGg0BKCo6T1ynS7a5LkjPPwigQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=W1lqJ68T/UbdAcUqba8lkMBQ4ZuJuKrTZJn8A/GMNZ4=; b=YUlDK2xZqQtZazBtO2wjsxNVT6NI40WCHTsMoy1oMEYJgNXYYzzkfECh7iAzd/ajWzHQ3TOggkwFbmL0MbkgWFZlquSCjDFYbNfm3N0nCRFyJ+QO2AjyaHrOPJd/pbAdSp8W5VrC8KXuTwdcx+GnwElJYft/5J/lswEJK/0XeNKI80E3ID57WEh7HV/Ge3Tm6xAPUXTGTx7iEKRiUto8fFFwkbRbcyqz6ITS6GPpbEKkoDySQURvNL7yhoOmKYwOqBEPFhaFZV5ZouVsfAzNQ2dwDPLy4znl5gnqU2f56DLnLwmV3cM9QaIZ+3aR7Ruff22g9fNkFbs0Fba28GiCLw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 172.205.89.229) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=W1lqJ68T/UbdAcUqba8lkMBQ4ZuJuKrTZJn8A/GMNZ4=; b=iWCRrp0PBWnpMCaJUG3JiR/j8d19xGl32jFFQKlZ8E/QueMbNwT3X8r2d1nnp5hIqRYWH1eAB+oJ2dp/EBmimUTIHTV17se8LHCrKO5g2XWkz8p10TE8pE8Ew71ZdwDV8+wrXL50lBhsJTPPxXoHAsd6FVy2b74ALpyJfmjklcM= Received: from AM8P191CA0010.EURP191.PROD.OUTLOOK.COM (2603:10a6:20b:21a::15) by AS2PR08MB9498.eurprd08.prod.outlook.com (2603:10a6:20b:60e::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9343.14; Wed, 10 Dec 2025 15:19:43 +0000 Received: from AM2PEPF0001C70C.eurprd05.prod.outlook.com (2603:10a6:20b:21a:cafe::fa) by AM8P191CA0010.outlook.office365.com (2603:10a6:20b:21a::15) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9412.7 via Frontend Transport; Wed, 10 Dec 2025 15:19:43 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 172.205.89.229) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 172.205.89.229 as permitted sender) receiver=protection.outlook.com; client-ip=172.205.89.229; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (172.205.89.229) by AM2PEPF0001C70C.mail.protection.outlook.com (10.167.16.200) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9388.8 via Frontend Transport; Wed, 10 Dec 2025 15:19:43 +0000 Received: from AZ-NEU-EX03.Arm.com (10.240.25.137) by AZ-NEU-EX04.Arm.com (10.240.25.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.29; Wed, 10 Dec 2025 15:19:39 +0000 Received: from ip-10-252-30-205.eu-west-1.compute.internal (10.252.0.220) by mail.arm.com (10.240.25.137) with Microsoft SMTP Server id 15.2.2562.29 via Frontend Transport; Wed, 10 Dec 2025 15:19:39 +0000 From: Pierre Blanchard To: CC: Pierre Blanchard Subject: [PATCH 4/4] AArch64: Implement AdvSIMD and SVE powr(f) routines Date: Wed, 10 Dec 2025 15:19:29 +0000 Message-ID: <20251210151929.185631-4-pierre.blanchard@arm.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20251210151929.185631-1-pierre.blanchard@arm.com> References: <20251210151929.185631-1-pierre.blanchard@arm.com> MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: AM2PEPF0001C70C:EE_|AS2PR08MB9498:EE_|AM4PEPF00027A5F:EE_|PAXPR08MB6429:EE_ X-MS-Office365-Filtering-Correlation-Id: 3bb40dc0-4fef-4b8c-f0ba-08de37ffb2e7 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; ARA:13230040|376014|36860700013|82310400026|1800799024|13003099007; X-Microsoft-Antispam-Message-Info-Original: 3gIUMbh/CH8ZN4KDVQ8OowxvRSjkjvPJtvcWlIgb28Zo+egr1vjttC5W4yzqXAmMGw9eZGGZ8s/eTs1I/1u0uSrv+h0fPR7Dwv9n/o2WzIudbrCYay61INNqXuDnYlZNXvKiQ5thElNxmZfQWtd88hIh88rLAbh5SoQdy+BT8moOU1klYR9r3olaBPCbNR5/XEtjMAedo3Y+j+dqpdYP7kMnJEgBDbu+qd6w4nZrIFR3+Xsf3CLo+j0HaPl5p8zwmmUnwWxS5Bmv3QbMglqoE9sCsCBPYiyX+dn50UYZMjGfbxe47wSTXwc6lyD6ikfKvOOQwt0myCobgezPTz5+eG7hbJrO9XSvlCLamQQ1wW8eRVcgnf2gxB4kZplVb5rIzwZ6EYsfN633TlkiaOOtCz61fmdSQiC8eKaPBXJUHPS3yvndiA8YznwYV56rM+KLl3cVyfDRSxmbZBqS/9Ob6mtVzlE9+nUfQXqLqwPlaHEzhbLc/IGIvwOHgplHiayYszpnCz48WNhP5/OBNvjlyFU3hsvkIY0CEJrADeWa7aJoRFjFy3rBWW3hyV52c6MGccVKr9pbBRslC8xphJ60sgSgme0LeOfJCfkTD6xIC71pMdACpF3I5suc2YzfTl6lEUT8OQV0yK0sv/PYXniF7otcvXfD/xjvriQSKoZgWviTV56rDJSf3B5HNBcxxbZ763xo0vtLt7Wes6BVevXTlb68uHnXfA0OCptZlWi36KBbxL7kI/nPBecJQafCHwHty6GEjVd2Vwr6tj152ZgTchVRPD8VJ/ZUh1w5M0J9KheYYupuTFF/guBGjKL83Svt9usuL/cGzdHJRsZ3slc88UyWVDINAOvlmSnYP1oCk1Fk99mwGxvz89raWF25ambftp5EVXs4MjJunINEcsaWCCfW1jI1lp+L90ra4mvo3Mbbz97dPFP+I/OmFhGUqLWin/JCR/o9yEFBMkzj8iSkSBC2BeC5gIDhGAIqFQj46PZb5xOSHOXQrLg5rA1QnEOp7HPv65gaZX5X38ui7b2Jzhz5Yfwo6rRlrvB3vn84FHJG/RPp8xDibN0RlFgjik5GVsGzO9DYsL6EzVCk7cKSssGz+xqtfDHed+bCWr6kPFnujwDNQpqgLR2ksXknaYQD0z7v7dsejD10xJtgJ81OoLO669gVGMTyML3LKMGqNuugcibR1UYJxOnAsfPF/rl8eXrhUabM9mgdy7EykDFh773Xnn8jv8NJEIPGKHNJ3bWN3/2V1+jstjtsCiQx9k2wXTxWNqN8mfYNgCiGx1Pf6Lvqlq6kWjEcs6hfxNp1s/ZoNxbRqDWX1ydk3HW/YPOVsEFEbxQ8p0kvt1ySUy0w9XN/60p1V27WCzEOf2c8Rr57hMI37AqjyeP3TT6wezN41cnf/kfLz+JZti8wN69A6/TTMBKlLXjLQ68OnSX76yfkNV6widIjRjQ+b1KyGqUTbloOeGg8U/QS4J7V2swGdos48RIiyCmeY0JpWJLTOWayFC6mEj9ti765NhvejBAncTlrrzs35xqh4wmN2q6J5B2AJ+yFVI1XZQALhAIOU7s= X-Forefront-Antispam-Report-Untrusted: CIP:172.205.89.229; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230040)(376014)(36860700013)(82310400026)(1800799024)(13003099007); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS2PR08MB9498 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM4PEPF00027A5F.eurprd04.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 91dabf6f-b587-403a-100d-08de37ff8bb9 X-Microsoft-Antispam: BCL:0; ARA:13230040|376014|1800799024|36860700013|82310400026|35042699022|14060799003|13003099007; X-Microsoft-Antispam-Message-Info: sRRnLXdL0fuTy8a+MwDNpUC0oPNRv3L3WjS5pjL/bsijelT2OhjkC6AFvJk++bYCJC7Jy1FXx9W5p56VrL/3q6E0UDfYYaYrxh+v40MfjrvW5OgnBn6yMlHMZ1O/Hd1F85ElBj5n/nN6UPlDEsT5v+8lOYeqlgv/kS41IcEyKW+wWO1FCghrSem1B4qmR9FGALItliHGyZEdI/VVMzq6mke+fMptO0XJk/FIn4oS4Xf/pisfEBydUl1NjpA9S+eUr6JZ0omk0tywi+LRer+2EP1eq5ip6sbr4ALwfBdTGuXi/DHWf57Ha2gXksBXuTDNnZkaL2JQCFIn9x+Pfbu6EUuJIoqQHF7nsIGt7s9aIYx7W/JJlaeGWHbfY1wgVQ4nM4L+ZsMTKcY56UGbRurwoO6JdJwUjGXajSPSIgdGEdiYn3TM5I+ED0Q+pALNbUulq/qbjFjcAiV+d5CnvCVKPq2+/66vFd3J48f9ovLsujIlNahub1S5aupu1TnHh3j3cf1zAFPT4D73HBwhnzK+DBqeFRbiwsQtB2hHkCxuKeftDAgAM09EcWYxa35wOouWsZnRzlD+1xCTCLdXwXY6yUpw43KAhsjclND/Ax0zL3qUg4UlXos4xgc9iyXUEZX+r88cJb+TG+UWoGMFKPyjzBXX0ljV+paBSmcc+w6IZ1XR5JtGUZVw3ceLaai5LsIhrs4EKhuccZjn1D9+b6QzF6pE3+AbASP6TVRW0U87boRoasmpHXx2aaT7yDQW4oRDGdBBdHE4BzFJoYHs5mfZUKOXpKgZ9/mHdLASDKcahzHNNMkCZWoWfhZR2BMkuPTcQb5+kfm2YvOuM2bjWBQjgRW63Sp2JavM+qCHMzjyGkELaKV4HfYrF61s9MkzXiQG98+XDyiWbii+YEGIFsATReOcI8m7K96Yew5a7YIiiKtkSOqlp/lwLY01e36GP3eEqY1nx5fjQwUeq0nezcbf23eLeFHDFXeUglinRhW8tyrlFtpfh0f/dT6CZ9YSj+d7JMhRmXaKxODnaJuEqeXxyq99LPKjABXRMvBSbgfKckEWyI6zrmImmAWSF0cZZb0J3mSXzp/OmMzU/96uStyIK2Y+X01LyH4rRWwnXb1fZ6GPt9+xyb5+X3Sw7IDj6v7VjaS+zk+URse2f/ZBZb4TKCqNuI0P+oqB0Tf/g1MJz65F6szs5HgzrC20UA235PqXXFnOpJ4kV9yzW19c93cOPlMehUn9ccTTn04s2GdsNRrLo9Ps2ZvKMJF90XFUlVY6ImyF/OwywLTItQRs+p1W0DXf2vkTEFApuVX+qJDY+Ud6tLZ7YHSmXduX/4Yk4MTh156k5FMxDWBqwrNkhPKVeZd2nhGHl9VrztJ2osKM05kZcWXZAeHNLhvl5WFVYoABrYrUmaCuGYlz4KODhqlnUI0pX32dJe6Wln08wq3KIswGhlU/bzZPGRN3koKGkoP/YO388yKBCiYrBHog1stOQKvLaPDpzO+wJBXmx6fEPKnYVplUyyW9C+cEcN39PnXRSUvTHfGtq5kTlvo9NkWFge0lGe70EQsh59/R2dfDpE0= X-Forefront-Antispam-Report: CIP:4.158.2.129; CTRY:GB; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:outbound-uk1.az.dlp.m.darktrace.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230040)(376014)(1800799024)(36860700013)(82310400026)(35042699022)(14060799003)(13003099007); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 10 Dec 2025 15:20:49.0570 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 3bb40dc0-4fef-4b8c-f0ba-08de37ffb2e7 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[4.158.2.129]; Helo=[outbound-uk1.az.dlp.m.darktrace.com] X-MS-Exchange-CrossTenant-AuthSource: AM4PEPF00027A5F.eurprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAXPR08MB6429 X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FORGED_SPF_HELO, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_BLOCKED, RCVD_IN_MSPIKE_H2, RCVD_IN_VALIDITY_RPBL_BLOCKED, RCVD_IN_VALIDITY_SAFE_BLOCKED, SPF_HELO_PASS, SPF_NONE, TXREP, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org Vector variants of the new C23 powr routines. These provide same maximum error error as by virtue of relying on shared approximation techniques and sources. Note: Benchmark inputs for powr(f) are identical to pow(f). Performance gain over pow on V1 with GCC@15: - SVE powr: 10-12% on subnormal x, 12-13% on x < 0. - SVE powrf: 15% on all x < 0. - AdvSIMD powr: for x < 0, 40% if x subnormal, 60% otherwise. - AdvSIMD powrf: 4% on x subnormals or x < 0. --- Ok for master? If so please commit for me as I don't have commit rights. Thanks, Pierre bits/libm-simd-decl-stubs.h | 11 ++ math/bits/mathcalls.h | 1 + sysdeps/aarch64/fpu/Makefile | 1 + sysdeps/aarch64/fpu/Versions | 5 + sysdeps/aarch64/fpu/advsimd_f32_protos.h | 1 + sysdeps/aarch64/fpu/bits/math-vector.h | 8 + .../fpu/finclude/math-vector-fortran.h | 2 + sysdeps/aarch64/fpu/powr_advsimd.c | 147 ++++++++++++++++++ sysdeps/aarch64/fpu/powr_sve.c | 122 +++++++++++++++ sysdeps/aarch64/fpu/powrf_advsimd.c | 135 ++++++++++++++++ sysdeps/aarch64/fpu/powrf_sve.c | 135 ++++++++++++++++ .../fpu/test-double-advsimd-wrappers.c | 1 + .../aarch64/fpu/test-double-sve-wrappers.c | 1 + .../aarch64/fpu/test-float-advsimd-wrappers.c | 1 + sysdeps/aarch64/fpu/test-float-sve-wrappers.c | 1 + .../unix/sysv/linux/aarch64/libmvec.abilist | 5 + 16 files changed, 577 insertions(+) create mode 100644 sysdeps/aarch64/fpu/powr_advsimd.c create mode 100644 sysdeps/aarch64/fpu/powr_sve.c create mode 100644 sysdeps/aarch64/fpu/powrf_advsimd.c create mode 100644 sysdeps/aarch64/fpu/powrf_sve.c diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h index e12936d7f7..2715d07e6b 100644 --- a/bits/libm-simd-decl-stubs.h +++ b/bits/libm-simd-decl-stubs.h @@ -99,6 +99,17 @@ #define __DECL_SIMD_powf64x #define __DECL_SIMD_powf128x +#define __DECL_SIMD_powr +#define __DECL_SIMD_powrf +#define __DECL_SIMD_powrl +#define __DECL_SIMD_powrf16 +#define __DECL_SIMD_powrf32 +#define __DECL_SIMD_powrf64 +#define __DECL_SIMD_powrf128 +#define __DECL_SIMD_powrf32x +#define __DECL_SIMD_powrf64x +#define __DECL_SIMD_powrf128x + #define __DECL_SIMD_acos #define __DECL_SIMD_acosf #define __DECL_SIMD_acosl diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h index 592a80fcb9..f1919cfd49 100644 --- a/math/bits/mathcalls.h +++ b/math/bits/mathcalls.h @@ -197,6 +197,7 @@ __MATHCALL (compoundn,, (_Mdouble_ __x, long long int __y)); __MATHCALL (pown,, (_Mdouble_ __x, long long int __y)); /* Return X to the Y power. */ +__MATHCALL_VEC (powr,, (_Mdouble_ __x, _Mdouble_ __y)); __MATHCALL (powr,, (_Mdouble_ __x, _Mdouble_ __y)); /* Return the Yth root of X. */ diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile index 998fc08d43..6c8cacf21d 100644 --- a/sysdeps/aarch64/fpu/Makefile +++ b/sysdeps/aarch64/fpu/Makefile @@ -29,6 +29,7 @@ libmvec-supported-funcs = acos \ log2 \ log2p1 \ pow \ + powr \ rsqrt \ sin \ sinh \ diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions index d68510a20e..c70537911a 100644 --- a/sysdeps/aarch64/fpu/Versions +++ b/sysdeps/aarch64/fpu/Versions @@ -200,6 +200,11 @@ libmvec { _ZGVnN4v_log10p1f; _ZGVsMxv_log10p1; _ZGVsMxv_log10p1f; + _ZGVnN2vv_powr; + _ZGVnN2vv_powrf; + _ZGVnN4vv_powrf; + _ZGVsMxvv_powr; + _ZGVsMxvv_powrf; _ZGVnN2v_rsqrt; _ZGVnN2v_rsqrtf; _ZGVnN4v_rsqrtf; diff --git a/sysdeps/aarch64/fpu/advsimd_f32_protos.h b/sysdeps/aarch64/fpu/advsimd_f32_protos.h index abdb1ff114..613440e7db 100644 --- a/sysdeps/aarch64/fpu/advsimd_f32_protos.h +++ b/sysdeps/aarch64/fpu/advsimd_f32_protos.h @@ -47,6 +47,7 @@ libmvec_hidden_proto (V_NAME_F1(log2p1)); libmvec_hidden_proto (V_NAME_F1(logp1)); libmvec_hidden_proto (V_NAME_F1(log)); libmvec_hidden_proto (V_NAME_F2(pow)); +libmvec_hidden_proto (V_NAME_F2(powr)); libmvec_hidden_proto (V_NAME_F1(rsqrt)); libmvec_hidden_proto (V_NAME_F1(sin)); libmvec_hidden_proto (V_NAME_F1(sinh)); diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h index 7406552f49..8f668c49cf 100644 --- a/sysdeps/aarch64/fpu/bits/math-vector.h +++ b/sysdeps/aarch64/fpu/bits/math-vector.h @@ -157,6 +157,10 @@ # define __DECL_SIMD_pow __DECL_SIMD_aarch64 # undef __DECL_SIMD_powf # define __DECL_SIMD_powf __DECL_SIMD_aarch64 +# undef __DECL_SIMD_powr +# define __DECL_SIMD_powr __DECL_SIMD_aarch64 +# undef __DECL_SIMD_powrf +# define __DECL_SIMD_powrf __DECL_SIMD_aarch64 # undef __DECL_SIMD_rsqrt # define __DECL_SIMD_rsqrt __DECL_SIMD_aarch64 # undef __DECL_SIMD_rsqrtf @@ -243,6 +247,7 @@ __vpcs __f32x4_t _ZGVnN4v_log2f (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_log2p1f (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_logp1f (__f32x4_t); __vpcs __f32x4_t _ZGVnN4vv_powf (__f32x4_t, __f32x4_t); +__vpcs __f32x4_t _ZGVnN4vv_powrf (__f32x4_t, __f32x4_t); __vpcs __f32x4_t _ZGVnN4v_rsqrtf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_sinf (__f32x4_t); __vpcs __f32x4_t _ZGVnN4v_sinhf (__f32x4_t); @@ -283,6 +288,7 @@ __vpcs __f64x2_t _ZGVnN2v_log2 (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_log2p1 (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_logp1 (__f64x2_t); __vpcs __f64x2_t _ZGVnN2vv_pow (__f64x2_t, __f64x2_t); +__vpcs __f64x2_t _ZGVnN2vv_powr (__f64x2_t, __f64x2_t); __vpcs __f64x2_t _ZGVnN2v_rsqrt (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_sin (__f64x2_t); __vpcs __f64x2_t _ZGVnN2v_sinh (__f64x2_t); @@ -328,6 +334,7 @@ __sv_f32_t _ZGVsMxv_log2f (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_log2p1f (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_logp1f (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxvv_powf (__sv_f32_t, __sv_f32_t, __sv_bool_t); +__sv_f32_t _ZGVsMxvv_powrf (__sv_f32_t, __sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_rsqrtf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_sinf (__sv_f32_t, __sv_bool_t); __sv_f32_t _ZGVsMxv_sinhf (__sv_f32_t, __sv_bool_t); @@ -368,6 +375,7 @@ __sv_f64_t _ZGVsMxv_log2 (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_log2p1 (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_logp1 (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxvv_pow (__sv_f64_t, __sv_f64_t, __sv_bool_t); +__sv_f64_t _ZGVsMxvv_powr (__sv_f64_t, __sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_rsqrt (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_sin (__sv_f64_t, __sv_bool_t); __sv_f64_t _ZGVsMxv_sinh (__sv_f64_t, __sv_bool_t); diff --git a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h index 55e3469a2c..909b378a68 100644 --- a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h +++ b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h @@ -80,6 +80,8 @@ !GCC$ builtin (logp1f) attributes simd (notinbranch) !GCC$ builtin (pow) attributes simd (notinbranch) !GCC$ builtin (powf) attributes simd (notinbranch) +!GCC$ builtin (powr) attributes simd (notinbranch) +!GCC$ builtin (powrf) attributes simd (notinbranch) !GCC$ builtin (rsqrt) attributes simd (notinbranch) !GCC$ builtin (rsqrtf) attributes simd (notinbranch) !GCC$ builtin (sin) attributes simd (notinbranch) diff --git a/sysdeps/aarch64/fpu/powr_advsimd.c b/sysdeps/aarch64/fpu/powr_advsimd.c new file mode 100644 index 0000000000..3890c8f5f9 --- /dev/null +++ b/sysdeps/aarch64/fpu/powr_advsimd.c @@ -0,0 +1,147 @@ +/* Double-precision vector (AdvSIMD) powr function + + Copyright (C) 2025 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "v_math.h" + +#include "v_pow_inline.h" + +static double NOINLINE +powr_scalar_special_case (double x, double y) +{ + /* Negative x returns NaN (+0/-0 and NaN x not handled here). */ + if (x < 0) + return __builtin_nan (""); + + uint64_t ix = asuint64 (x); + uint64_t iy = asuint64 (y); + uint32_t topx = top12 (x); + uint32_t topy = top12 (y); + + /* Special cases: (x < 0x1p-126 or inf or nan) or + (|y| < 0x1p-65 or |y| >= 0x1p63 or nan). */ + if (__glibc_unlikely (topx - SmallPowX >= ThresPowX + || (topy & 0x7ff) - SmallPowY >= ThresPowY)) + { + /* |y| is 0, Inf or NaN. */ + if (__glibc_unlikely (zeroinfnan (iy))) + { + if (2 * ix > 2 * asuint64 (INFINITY) + || 2 * iy > 2 * asuint64 (INFINITY)) + return __builtin_nan (""); + if (2 * iy == 0) + { + /* |x| = 0 or inf. */ + if ((2 * ix == 0) || (2 * ix == 2 * asuint64 (INFINITY))) + return __builtin_nan (""); + /* x is finite. */ + return 1.0; + } + /* |y| = Inf and x = 1.0. */ + if (ix == asuint64 (1.0)) + return __builtin_nan (""); + /* |x| < 1 and y = Inf or |x| > 1 and y = -Inf. */ + if ((2 * ix < 2 * asuint64 (1.0)) == !(iy >> 63)) + return 0.0; + /* |y| = Inf and previous conditions not met. */ + return y * y; + } + /* |x| is 0, Inf or NaN. */ + if (__glibc_unlikely (zeroinfnan (ix))) + { + double x2 = x * x; + return iy >> 63 ? 1 / x2 : x2; + } + /* Here x and y are non-zero finite. */ + /* Note: if |y| > 1075 * ln2 * 2^53 ~= 0x1.749p62 then powr(x,y) = inf/0 + and if |y| < 2^-54 / 1075 ~= 0x1.e7b6p-65 then powr(x,y) = +-1. */ + if ((topy & 0x7ff) - SmallPowY >= ThresPowY) + { + if (ix == asuint64 (1.0)) + return 1.0; + /* |y| < 2^-65, x^y ~= 1 + y*log(x). */ + if ((topy & 0x7ff) < SmallPowY) + return 1.0; + return (ix > asuint64 (1.0)) == (topy < 0x800) ? INFINITY : 0; + } + if (topx == 0) + { + /* Normalize subnormal x so exponent becomes negative. */ + ix = asuint64 (x * 0x1p52); + ix -= 52ULL << 52; + } + } + + /* Core computation of exp (y * log (x)). */ + double lo; + double hi = log_inline (ix, &lo); + double ehi = y * hi; + double elo = y * lo + fma (y, hi, -ehi); + return exp_inline (ehi, elo, 0); +} + +static float64x2_t VPCS_ATTR NOINLINE +scalar_fallback (float64x2_t x, float64x2_t y) +{ + return (float64x2_t){ powr_scalar_special_case (x[0], y[0]), + powr_scalar_special_case (x[1], y[1]) }; +} + +/* Implementation of AdvSIMD powr. + Maximum measured error is 1.04 ULPs: + _ZGVnN2vv_powr(0x1.024a3e56b3c3p-136, 0x1.87910248b58acp-13) + got 0x1.f71162f473251p-1 + want 0x1.f71162f473252p-1. */ +float64x2_t VPCS_ATTR V_NAME_D2 (powr) (float64x2_t x, float64x2_t y) +{ + const struct data *d = ptr_barrier (&data); + + /* Case of x <= 0 is too complicated to be vectorised efficiently here, + fallback to scalar pow for all lanes if any x < 0 detected. */ + if (v_any_u64 (vclezq_s64 (vreinterpretq_s64_f64 (x)))) + return scalar_fallback (x, y); + + uint64x2_t vix = vreinterpretq_u64_f64 (x); + uint64x2_t viy = vreinterpretq_u64_f64 (y); + + /* Special cases of x or y. + The case y==0 does not trigger a special case, since in this case it is + necessary to fix the result only if x is a signalling nan, which already + triggers a special case. We test y==0 directly in the scalar fallback. */ + uint64x2_t x_is_inf_or_nan = vcgeq_u64 (vandq_u64 (vix, d->inf), d->inf); + uint64x2_t y_is_inf_or_nan = vcgeq_u64 (vandq_u64 (viy, d->inf), d->inf); + uint64x2_t special = vorrq_u64 (x_is_inf_or_nan, y_is_inf_or_nan); + + /* Fallback to scalar on all lanes if any lane is inf or nan. */ + if (__glibc_unlikely (v_any_u64 (special))) + return scalar_fallback (x, y); + + /* Cases of subnormal x: |x| < 0x1p-1022. */ + uint64x2_t x_is_subnormal = vcaltq_f64 (x, d->subnormal_bound); + if (__glibc_unlikely (v_any_u64 (x_is_subnormal))) + { + /* Normalize subnormal x so exponent becomes negative. */ + uint64x2_t vix_norm + = vreinterpretq_u64_f64 (vmulq_f64 (x, d->subnormal_scale)); + vix_norm = vsubq_u64 (vix_norm, d->subnormal_bias); + x = vbslq_f64 (x_is_subnormal, vreinterpretq_f64_u64 (vix_norm), x); + } + + /* Core computation of exp (y * log (x)). */ + return v_pow_inline (x, y, d); +} diff --git a/sysdeps/aarch64/fpu/powr_sve.c b/sysdeps/aarch64/fpu/powr_sve.c new file mode 100644 index 0000000000..142ff6cc65 --- /dev/null +++ b/sysdeps/aarch64/fpu/powr_sve.c @@ -0,0 +1,122 @@ +/* Double-precision vector (SVE) powr function + + Copyright (C) 2025 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "math_config.h" +#include "sv_math.h" + +#define WANT_SV_POW_SIGN_BIAS 0 +#include "sv_pow_inline.h" + +/* A scalar subroutine used to fix main powr special cases. */ +static inline double +powr_specialcase (double x, double y) +{ + uint64_t ix = asuint64 (x); + uint64_t iy = asuint64 (y); + /* |y| is 0, Inf or NaN. */ + if (__glibc_unlikely (zeroinfnan (iy))) + { + /* |x| or |y| is NaN. */ + if (2 * ix > 2 * asuint64 (INFINITY) || 2 * iy > 2 * asuint64 (INFINITY)) + return __builtin_nan (""); + /* |y| is 0.0. */ + if (2 * iy == 0) + { + /* |x| = 0 or Inf. */ + if ((2 * ix == 0) || (2 * ix == 2 * asuint64 (INFINITY))) + return __builtin_nan (""); + /* x is finite. */ + return 1.0; + } + /* x is 1.0. */ + if (ix == asuint64 (1.0)) + return __builtin_nan (""); + /* |x| < 1 and y = Inf or |x| > 1 and y = -Inf. */ + if ((2 * ix < 2 * asuint64 (1.0)) == !(iy >> 63)) + return 0.0; + /* |y| = Inf and previous conditions not met. */ + return y * y; + } + /* x is 0, Inf or NaN. Negative x are handled in the core. */ + if (__glibc_unlikely (zeroinfnan (ix))) + { + double x2 = x * x; + return (iy >> 63) ? 1 / x2 : x2; + } + /* Return x for convenience, but make sure result is never used. */ + return x; +} + +/* Scalar fallback for special case routines with custom signature. */ +static svfloat64_t NOINLINE +sv_powr_specialcase (svfloat64_t x1, svfloat64_t x2, svfloat64_t y, + svbool_t cmp) +{ + return sv_call2_f64 (powr_specialcase, x1, x2, y, cmp); +} + +/* Implementation of SVE powr. + + Provides the same accuracy as AdvSIMD pow and powr, since it relies on the + same algorithm. + + Maximum measured error is 1.04 ULPs: + SV_NAME_D2 (powr) (0x1.3d2d45bc848acp+63, -0x1.a48a38b40cd43p-12) + got 0x1.f7116284221fcp-1 + want 0x1.f7116284221fdp-1. */ +svfloat64_t SV_NAME_D2 (powr) (svfloat64_t x, svfloat64_t y, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + svuint64_t vix = svreinterpret_u64 (x); + svuint64_t viy = svreinterpret_u64 (y); + + svbool_t xpos = svcmpge (pg, x, sv_f64 (0.0)); + + /* Special cases of x or y: zero, inf and nan. */ + svbool_t xspecial = sv_zeroinfnan (xpos, vix); + svbool_t yspecial = sv_zeroinfnan (xpos, viy); + svbool_t cmp = svorr_z (xpos, xspecial, yspecial); + + /* Cases of positive subnormal x: 0 < x < 0x1p-1022. */ + svbool_t x_is_subnormal = svaclt (xpos, x, 0x1p-1022); + if (__glibc_unlikely (svptest_any (xpos, x_is_subnormal))) + { + /* Normalize subnormal x so exponent becomes negative. */ + svuint64_t vix_norm + = svreinterpret_u64 (svmul_m (x_is_subnormal, x, 0x1p52)); + vix = svsub_m (x_is_subnormal, vix_norm, 52ULL << 52); + } + + svfloat64_t vlo; + svfloat64_t vhi = sv_log_inline (xpos, vix, &vlo, d); + + svfloat64_t vehi = svmul_x (svptrue_b64 (), y, vhi); + svfloat64_t vemi = svmls_x (xpos, vehi, y, vhi); + svfloat64_t velo = svnmls_x (xpos, vemi, y, vlo); + svfloat64_t vz = sv_exp_inline (xpos, vehi, velo, sv_u64 (0), d); + + /* Cases of negative x. */ + vz = svsel (xpos, vz, sv_f64 (__builtin_nan (""))); + + if (__glibc_unlikely (svptest_any (cmp, cmp))) + return sv_powr_specialcase (x, y, vz, cmp); + + return vz; +} diff --git a/sysdeps/aarch64/fpu/powrf_advsimd.c b/sysdeps/aarch64/fpu/powrf_advsimd.c new file mode 100644 index 0000000000..64ddd65ac7 --- /dev/null +++ b/sysdeps/aarch64/fpu/powrf_advsimd.c @@ -0,0 +1,135 @@ +/* Single-precision vector (AdvSIMD) powr function + + Copyright (C) 2025 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "flt-32/math_config.h" +#include "v_math.h" +#include "v_powrf_inline.h" + +/* A scalar subroutine used to fix main powrf special cases. */ +static inline float +powrf_specialcase (float x, float y) +{ + /* Negative x returns NaN (+0/-0 and NaN x not handled here). */ + if (x < 0) + return __builtin_nanf (""); + + uint32_t ix = asuint (x); + uint32_t iy = asuint (y); + /* y is 0, Inf or NaN. */ + if (__glibc_unlikely (zeroinfnan (iy))) + { + /* |x| or |y| is NaN. */ + if (2 * ix > 2u * 0x7f800000 || 2 * iy > 2u * 0x7f800000) + return __builtin_nanf (""); + /* |y| = 0. */ + if (2 * iy == 0) + { + /* |x| = 0 or inf. */ + if ((2 * ix == 0) || (2 * ix == 2u * 0x7f800000)) + return __builtin_nanf (""); + /* x is finite. */ + return 1.0f; + } + /* |y| = Inf and x = 1.0. */ + if (ix == 0x3f800000) + return __builtin_nanf (""); + /* |x| < 1 and y = Inf or |x| > 1 and y = -Inf. */ + if ((2 * ix < 2 * 0x3f800000) == !(iy & 0x80000000)) + return 0.0f; + /* |y| = Inf and previous conditions not met. */ + return y * y; + } + /* x is 0, Inf or NaN. Negative x are handled in the core. */ + if (__glibc_unlikely (zeroinfnan (ix))) + { + float x2 = x * x; + return iy & 0x80000000 ? 1 / x2 : x2; + } + + /* Return x for convenience, but make sure result is never used. */ + return x; +} + +/* Special case function wrapper. */ +static float32x4_t VPCS_ATTR NOINLINE +special_case (float32x4_t x, float32x4_t y, float32x4_t ret, uint32x4_t cmp) +{ + return v_call2_f32 (powrf_specialcase, x, y, ret, cmp); +} + +/* Power implementation for x containing negative or subnormal lanes. */ +static inline float32x4_t +v_powrf_x_is_neg_or_sub (float32x4_t x, float32x4_t y, const struct data *d) +{ + uint32x4_t xsmall = vcaltq_f32 (x, v_f32 (0x1p-126f)); + + /* Normalize subnormals. */ + float32x4_t a = vabsq_f32 (x); + uint32x4_t ia_norm = vreinterpretq_u32_f32 (vmulq_f32 (a, d->norm)); + ia_norm = vsubq_u32 (ia_norm, d->subnormal_bias); + a = vbslq_f32 (xsmall, vreinterpretq_f32_u32 (ia_norm), a); + + /* Evaluate exp (y * log(x)) using |x| and sign bias correction. */ + float32x4_t ret = v_powrf_core (a, y, d); + + /* Cases of finite y and finite negative x. */ + uint32x4_t xisneg = vcltzq_f32 (x); + return vbslq_f32 (xisneg, d->nan, ret); +} + +/* Implementation of AdvSIMD powrf. + + powr(x,y) := exp(y * log (x)) + + This means powr(x,y) core computation matches that of pow(x,y) + but powr returns NaN for negative x even if y is an integer. + + Maximum measured error is 2.57 ULPs: + V_NAME_F2 (powr) (0x1.031706p+0, 0x1.ce2ec2p+12) + got 0x1.fff868p+127 + want 0x1.fff862p+127. */ +float32x4_t VPCS_ATTR NOINLINE V_NAME_F2 (powr) (float32x4_t x, float32x4_t y) +{ + const struct data *d = ptr_barrier (&data); + + /* Special cases of x or y: zero, inf and nan. */ + uint32x4_t ix = vreinterpretq_u32_f32 (x); + uint32x4_t iy = vreinterpretq_u32_f32 (y); + uint32x4_t xspecial = v_zeroinfnan (d, ix); + uint32x4_t yspecial = v_zeroinfnan (d, iy); + uint32x4_t cmp = vorrq_u32 (xspecial, yspecial); + + /* Evaluate pow(x, y) for x containing negative or subnormal lanes. */ + uint32x4_t x_is_neg_or_sub = vcltq_f32 (x, v_f32 (0x1p-126f)); + if (__glibc_unlikely (v_any_u32 (x_is_neg_or_sub))) + { + float32x4_t ret = v_powrf_x_is_neg_or_sub (x, y, d); + if (__glibc_unlikely (v_any_u32 (cmp))) + return special_case (x, y, ret, cmp); + return ret; + } + + /* Else evaluate pow(x, y) for normal and positive x only. */ + if (__glibc_unlikely (v_any_u32 (cmp))) + return special_case (x, y, v_powrf_core (x, y, d), cmp); + return v_powrf_core (x, y, d); +} + +libmvec_hidden_def (V_NAME_F2 (powr)) +HALF_WIDTH_ALIAS_F2 (powr) diff --git a/sysdeps/aarch64/fpu/powrf_sve.c b/sysdeps/aarch64/fpu/powrf_sve.c new file mode 100644 index 0000000000..fc6adee659 --- /dev/null +++ b/sysdeps/aarch64/fpu/powrf_sve.c @@ -0,0 +1,135 @@ +/* Single-precision vector (SVE) powr function + + Copyright (C) 2025 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include "flt-32/math_config.h" +#include "sv_math.h" + +#define WANT_SV_POWF_SIGN_BIAS 0 +#include "sv_powf_inline.h" + +/* A scalar subroutine used to fix main powrf special cases. */ +static inline float +powrf_specialcase (float x, float y) +{ + uint32_t ix = asuint (x); + uint32_t iy = asuint (y); + /* |y| is 0, Inf or NaN. */ + if (__glibc_unlikely (zeroinfnan (iy))) + { + /* |x| or |y| is NaN. */ + if (2 * ix > 2u * 0x7f800000 || 2 * iy > 2u * 0x7f800000) + return __builtin_nanf (""); + /* |y| = 0. */ + if (2 * iy == 0) + { + /* |x| = 0 or Inf. */ + if ((2 * ix == 0) || (2 * ix == 2u * 0x7f800000)) + return __builtin_nanf (""); + /* x is finite. */ + return 1.0f; + } + /* |y| = Inf and x = 1.0. */ + if (ix == 0x3f800000) + return __builtin_nanf (""); + /* |x| < 1 and y = Inf or |x| > 1 and y = -Inf. */ + if ((2 * ix < 2 * 0x3f800000) == !(iy & 0x80000000)) + return 0.0f; + /* |y| = Inf and previous conditions not met. */ + return y * y; + } + /* x is 0, Inf or NaN. Negative x are handled in the core. */ + if (__glibc_unlikely (zeroinfnan (ix))) + { + float x2 = x * x; + return iy & 0x80000000 ? 1 / x2 : x2; + } + /* Return x for convenience, but make sure result is never used. */ + return x; +} + +/* Scalar fallback for special case routines with custom signature. */ +static svfloat32_t NOINLINE +sv_call_powrf_sc (svfloat32_t x1, svfloat32_t x2, svfloat32_t y, svbool_t cmp) +{ + return sv_call2_f32 (powrf_specialcase, x1, x2, y, cmp); +} + +/* Implementation of SVE powrf. + + Provides the same accuracy as AdvSIMD powf and powrf, since it relies on the + same algorithm. + + Maximum measured error is 2.57 ULPs: + SV_NAME_F2 (powr) (0x1.031706p+0, 0x1.ce2ec2p+12) + got 0x1.fff868p+127 + want 0x1.fff862p+127. */ +svfloat32_t SV_NAME_F2 (powr) (svfloat32_t x, svfloat32_t y, const svbool_t pg) +{ + const struct data *d = ptr_barrier (&data); + + svuint32_t vix = svreinterpret_u32 (x); + svuint32_t viy = svreinterpret_u32 (y); + + svbool_t xpos = svcmpge (pg, x, sv_f32 (0.0f)); + + /* Special cases of x or y: zero, inf and nan. */ + svbool_t xspecial = sv_zeroinfnan (xpos, vix); + svbool_t yspecial = sv_zeroinfnan (xpos, viy); + svbool_t cmp = svorr_z (xpos, xspecial, yspecial); + + /* Cases of subnormal x: |x| < 0x1p-126. */ + svbool_t x_is_subnormal = svaclt (xpos, x, d->small_bound); + if (__glibc_unlikely (svptest_any (xpos, x_is_subnormal))) + { + /* Normalize subnormal x so exponent becomes negative. */ + vix = svreinterpret_u32 (svmul_m (x_is_subnormal, x, 0x1p23f)); + vix = svsub_m (x_is_subnormal, vix, d->subnormal_bias); + } + + /* Part of core computation carried in working precision. */ + svuint32_t tmp = svsub_x (xpos, vix, d->off); + svuint32_t i + = svand_x (xpos, svlsr_x (xpos, tmp, (23 - V_POWF_LOG2_TABLE_BITS)), + V_POWF_LOG2_N - 1); + svuint32_t top = svand_x (xpos, tmp, 0xff800000); + svuint32_t iz = svsub_x (xpos, vix, top); + svint32_t k + = svasr_x (xpos, svreinterpret_s32 (top), (23 - V_POWF_EXP2_TABLE_BITS)); + + /* Compute core in extended precision and return intermediate ylogx results + to handle cases of underflow and underflow in exp. */ + svfloat32_t ylogx; + /* Pass a dummy sign_bias so we can re-use powf core. + The core is simplified by setting WANT_SV_POWF_SIGN_BIAS = 0. */ + svfloat32_t ret = sv_powf_core (xpos, i, iz, k, y, sv_u32 (0), &ylogx, d); + + /* Handle exp special cases of underflow and overflow. */ + svbool_t no_uflow = svcmpgt (xpos, ylogx, d->uflow_bound); + svbool_t oflow = svcmpgt (xpos, ylogx, d->oflow_bound); + svfloat32_t ret_flow = svdup_n_f32_z (no_uflow, INFINITY); + ret = svsel (svorn_z (xpos, oflow, no_uflow), ret_flow, ret); + + /* Cases of negative x. */ + ret = svsel (xpos, ret, sv_f32 (__builtin_nanf (""))); + + if (__glibc_unlikely (svptest_any (cmp, cmp))) + return sv_call_powrf_sc (x, y, ret, cmp); + + return ret; +} diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c index 42d076b9a9..ead961aa7d 100644 --- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c @@ -54,6 +54,7 @@ VPCS_VECTOR_WRAPPER (log1p_advsimd, _ZGVnN2v_log1p) VPCS_VECTOR_WRAPPER (log2_advsimd, _ZGVnN2v_log2) VPCS_VECTOR_WRAPPER (log2p1_advsimd, _ZGVnN2v_log2p1) VPCS_VECTOR_WRAPPER_ff (pow_advsimd, _ZGVnN2vv_pow) +VPCS_VECTOR_WRAPPER_ff (powr_advsimd, _ZGVnN2vv_powr) VPCS_VECTOR_WRAPPER (rsqrt_advsimd, _ZGVnN2v_rsqrt) VPCS_VECTOR_WRAPPER (sin_advsimd, _ZGVnN2v_sin) VPCS_VECTOR_WRAPPER (sinh_advsimd, _ZGVnN2v_sinh) diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c index 543816558b..6481b781bf 100644 --- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c @@ -73,6 +73,7 @@ SVE_VECTOR_WRAPPER (log1p_sve, _ZGVsMxv_log1p) SVE_VECTOR_WRAPPER (log2_sve, _ZGVsMxv_log2) SVE_VECTOR_WRAPPER (log2p1_sve, _ZGVsMxv_log2p1) SVE_VECTOR_WRAPPER_ff (pow_sve, _ZGVsMxvv_pow) +SVE_VECTOR_WRAPPER_ff (powr_sve, _ZGVsMxvv_powr) SVE_VECTOR_WRAPPER (rsqrt_sve, _ZGVsMxv_rsqrt) SVE_VECTOR_WRAPPER (sin_sve, _ZGVsMxv_sin) SVE_VECTOR_WRAPPER (sinh_sve, _ZGVsMxv_sinh) diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c index 5217709796..6f82117615 100644 --- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c @@ -54,6 +54,7 @@ VPCS_VECTOR_WRAPPER (log1pf_advsimd, _ZGVnN4v_log1pf) VPCS_VECTOR_WRAPPER (log2f_advsimd, _ZGVnN4v_log2f) VPCS_VECTOR_WRAPPER (log2p1f_advsimd, _ZGVnN4v_log2p1f) VPCS_VECTOR_WRAPPER_ff (powf_advsimd, _ZGVnN4vv_powf) +VPCS_VECTOR_WRAPPER_ff (powrf_advsimd, _ZGVnN4vv_powrf) VPCS_VECTOR_WRAPPER (rsqrtf_advsimd, _ZGVnN4v_rsqrtf) VPCS_VECTOR_WRAPPER (sinf_advsimd, _ZGVnN4v_sinf) VPCS_VECTOR_WRAPPER (sinhf_advsimd, _ZGVnN4v_sinhf) diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c index a35b2fc7b5..8234bb3318 100644 --- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c +++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c @@ -73,6 +73,7 @@ SVE_VECTOR_WRAPPER (log1pf_sve, _ZGVsMxv_log1pf) SVE_VECTOR_WRAPPER (log2f_sve, _ZGVsMxv_log2f) SVE_VECTOR_WRAPPER (log2p1f_sve, _ZGVsMxv_log2p1f) SVE_VECTOR_WRAPPER_ff (powf_sve, _ZGVsMxvv_powf) +SVE_VECTOR_WRAPPER_ff (powrf_sve, _ZGVsMxvv_powrf) SVE_VECTOR_WRAPPER (rsqrtf_sve, _ZGVsMxv_rsqrtf) SVE_VECTOR_WRAPPER (sinf_sve, _ZGVsMxv_sinf) SVE_VECTOR_WRAPPER (sinhf_sve, _ZGVsMxv_sinhf) diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist index 6d13d53613..ac86c52bce 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -178,11 +178,14 @@ GLIBC_2.43 _ZGVnN2v_log2p1 F GLIBC_2.43 _ZGVnN2v_log2p1f F GLIBC_2.43 _ZGVnN2v_rsqrt F GLIBC_2.43 _ZGVnN2v_rsqrtf F +GLIBC_2.43 _ZGVnN2vv_powr F +GLIBC_2.43 _ZGVnN2vv_powrf F GLIBC_2.43 _ZGVnN4v_exp10m1f F GLIBC_2.43 _ZGVnN4v_exp2m1f F GLIBC_2.43 _ZGVnN4v_log10p1f F GLIBC_2.43 _ZGVnN4v_log2p1f F GLIBC_2.43 _ZGVnN4v_rsqrtf F +GLIBC_2.43 _ZGVnN4vv_powrf F GLIBC_2.43 _ZGVsMxv_exp10m1 F GLIBC_2.43 _ZGVsMxv_exp10m1f F GLIBC_2.43 _ZGVsMxv_exp2m1 F @@ -193,3 +196,5 @@ GLIBC_2.43 _ZGVsMxv_log2p1 F GLIBC_2.43 _ZGVsMxv_log2p1f F GLIBC_2.43 _ZGVsMxv_rsqrt F GLIBC_2.43 _ZGVsMxv_rsqrtf F +GLIBC_2.43 _ZGVsMxvv_powr F +GLIBC_2.43 _ZGVsMxvv_powrf F