From patchwork Thu Jan 2 10:43:47 2025
X-Patchwork-Submitter: Luna Lamb
X-Patchwork-Id: 103878
From: Luna Lamb
CC: Luna Lamb
Subject: [PATCH] aarch64: Improve codegen in SVE expm1f and users.
Date: Thu, 2 Jan 2025 10:43:47 +0000
Message-ID: <20250102104347.2535-1-Luna.lamb@arm.com>
List-Id: Libc-alpha mailing list

Use unpredicated muls, use absolute compare and improve memory access.
7%, 5% and 1% improvement in throughput microbenchmark on Neoverse V1,
for expm1f, sinhf and tanhf respectively.
---
OK for master? If so please commit for me as I don't have commit rights.

Thanks,
Luna

 sysdeps/aarch64/fpu/expm1f_sve.c       | 31 +++++++++++---------------
 sysdeps/aarch64/fpu/sinhf_sve.c        |  2 +-
 sysdeps/aarch64/fpu/sv_expm1f_inline.h | 30 +++++++++++--------------
 sysdeps/aarch64/fpu/tanhf_sve.c        | 28 ++++++++++++++---------
 4 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/sysdeps/aarch64/fpu/expm1f_sve.c b/sysdeps/aarch64/fpu/expm1f_sve.c
index 7c852125cd..05a66400d4 100644
--- a/sysdeps/aarch64/fpu/expm1f_sve.c
+++ b/sysdeps/aarch64/fpu/expm1f_sve.c
@@ -18,7 +18,6 @@
    . */
 
 #include "sv_math.h"
-#include "poly_sve_f32.h"
 
 /* Largest value of x for which expm1(x) should round to -1. */
 #define SpecialBound 0x1.5ebc4p+6f
@@ -28,20 +27,17 @@ static const struct data
   /* These 4 are grouped together so they can be loaded as one quadword, then
      used with _lane forms of svmla/svmls. */
   float c2, c4, ln2_hi, ln2_lo;
-  float c0, c1, c3, inv_ln2, special_bound, shift;
+  float c0, inv_ln2, c1, c3, special_bound;
 } data = {
   /* Generated using fpminimax. */
   .c0 = 0x1.fffffep-2, .c1 = 0x1.5554aep-3,
   .c2 = 0x1.555736p-5, .c3 = 0x1.12287cp-7,
-  .c4 = 0x1.6b55a2p-10,
+  .c4 = 0x1.6b55a2p-10, .inv_ln2 = 0x1.715476p+0f,
+  .special_bound = SpecialBound, .ln2_lo = 0x1.7f7d1cp-20f,
+  .ln2_hi = 0x1.62e4p-1f,
 
-  .special_bound = SpecialBound, .shift = 0x1.8p23f,
-  .inv_ln2 = 0x1.715476p+0f, .ln2_hi = 0x1.62e4p-1f,
-  .ln2_lo = 0x1.7f7d1cp-20f,
 };
 
-#define C(i) sv_f32 (d->c##i)
-
 static svfloat32_t NOINLINE
 special_case (svfloat32_t x, svbool_t pg)
 {
@@ -71,9 +67,8 @@ svfloat32_t SV_NAME_F1 (expm1) (svfloat32_t x, svbool_t pg)
      and f = x - i * ln2, then f is in [-ln2/2, ln2/2].
      exp(x) - 1 = 2^i * (expm1(f) + 1) - 1
      where 2^i is exact because i is an integer. */
-  svfloat32_t j = svmla_x (pg, sv_f32 (d->shift), x, d->inv_ln2);
-  j = svsub_x (pg, j, d->shift);
-  svint32_t i = svcvt_s32_x (pg, j);
+  svfloat32_t j = svmul_x (svptrue_b32 (), x, d->inv_ln2);
+  j = svrinta_x (pg, j);
 
   svfloat32_t f = svmls_lane (x, j, lane_constants, 2);
   f = svmls_lane (f, j, lane_constants, 3);
@@ -83,17 +78,17 @@ svfloat32_t SV_NAME_F1 (expm1) (svfloat32_t x, svbool_t pg)
      x + ax^2 + bx^3 + cx^4 ....
      So we calculate the polynomial P(f) = a + bf + cf^2 + ...
      and assemble the approximation expm1(f) ~= f + f^2 * P(f). */
-  svfloat32_t p12 = svmla_lane (C (1), f, lane_constants, 0);
-  svfloat32_t p34 = svmla_lane (C (3), f, lane_constants, 1);
-  svfloat32_t f2 = svmul_x (pg, f, f);
+  svfloat32_t p12 = svmla_lane (sv_f32 (d->c1), f, lane_constants, 0);
+  svfloat32_t p34 = svmla_lane (sv_f32 (d->c3), f, lane_constants, 1);
+  svfloat32_t f2 = svmul_x (svptrue_b32 (), f, f);
   svfloat32_t p = svmla_x (pg, p12, f2, p34);
-  p = svmla_x (pg, C (0), f, p);
+
+  p = svmla_x (pg, sv_f32 (d->c0), f, p);
   p = svmla_x (pg, f, f2, p);
 
   /* Assemble the result.
      expm1(x) ~= 2^i * (p + 1) - 1
      Let t = 2^i. */
-  svfloat32_t t = svreinterpret_f32 (
-      svadd_x (pg, svreinterpret_u32 (svlsl_x (pg, i, 23)), 0x3f800000));
-  return svmla_x (pg, svsub_x (pg, t, 1), p, t);
+  svfloat32_t t = svscale_x (pg, sv_f32 (1.0f), svcvt_s32_x (pg, j));
+  return svmla_x (pg, svsub_x (pg, t, 1.0f), p, t);
 }
diff --git a/sysdeps/aarch64/fpu/sinhf_sve.c b/sysdeps/aarch64/fpu/sinhf_sve.c
index 6c204b57a2..50dd386774 100644
--- a/sysdeps/aarch64/fpu/sinhf_sve.c
+++ b/sysdeps/aarch64/fpu/sinhf_sve.c
@@ -63,5 +63,5 @@ svfloat32_t SV_NAME_F1 (sinh) (svfloat32_t x, const svbool_t pg)
   if (__glibc_unlikely (svptest_any (pg, special)))
     return special_case (x, svmul_x (pg, t, halfsign), special);
 
-  return svmul_x (pg, t, halfsign);
+  return svmul_x (svptrue_b32 (), t, halfsign);
 }
diff --git a/sysdeps/aarch64/fpu/sv_expm1f_inline.h b/sysdeps/aarch64/fpu/sv_expm1f_inline.h
index 5b72451222..83319a2228 100644
--- a/sysdeps/aarch64/fpu/sv_expm1f_inline.h
+++ b/sysdeps/aarch64/fpu/sv_expm1f_inline.h
@@ -27,21 +27,18 @@ struct sv_expm1f_data
   /* These 4 are grouped together so they can be loaded as one quadword, then
      used with _lane forms of svmla/svmls. */
   float32_t c2, c4, ln2_hi, ln2_lo;
-  float32_t c0, c1, c3, inv_ln2, shift;
+  float c0, inv_ln2, c1, c3, special_bound;
 };
 
 /* Coefficients generated using fpminimax. */
 #define SV_EXPM1F_DATA \
   { \
-    .c0 = 0x1.fffffep-2, .c1 = 0x1.5554aep-3, .c2 = 0x1.555736p-5, \
-    .c3 = 0x1.12287cp-7, .c4 = 0x1.6b55a2p-10, \
+    .c0 = 0x1.fffffep-2, .c1 = 0x1.5554aep-3, .inv_ln2 = 0x1.715476p+0f, \
+    .c2 = 0x1.555736p-5, .c3 = 0x1.12287cp-7, \
     \
-    .shift = 0x1.8p23f, .inv_ln2 = 0x1.715476p+0f, .ln2_hi = 0x1.62e4p-1f, \
-    .ln2_lo = 0x1.7f7d1cp-20f, \
+    .c4 = 0x1.6b55a2p-10, .ln2_lo = 0x1.7f7d1cp-20f, .ln2_hi = 0x1.62e4p-1f, \
   }
 
-#define C(i) sv_f32 (d->c##i)
-
 static inline svfloat32_t
 expm1f_inline (svfloat32_t x, svbool_t pg, const struct sv_expm1f_data *d)
 {
@@ -55,9 +52,8 @@ expm1f_inline (svfloat32_t x, svbool_t pg, const struct sv_expm1f_data *d)
      and f = x - i * ln2, then f is in [-ln2/2, ln2/2].
      exp(x) - 1 = 2^i * (expm1(f) + 1) - 1
      where 2^i is exact because i is an integer. */
-  svfloat32_t j = svmla_x (pg, sv_f32 (d->shift), x, d->inv_ln2);
-  j = svsub_x (pg, j, d->shift);
-  svint32_t i = svcvt_s32_x (pg, j);
+  svfloat32_t j = svmul_x (svptrue_b32 (), x, d->inv_ln2);
+  j = svrinta_x (pg, j);
 
   svfloat32_t f = svmls_lane (x, j, lane_constants, 2);
   f = svmls_lane (f, j, lane_constants, 3);
@@ -67,18 +63,18 @@ expm1f_inline (svfloat32_t x, svbool_t pg, const struct sv_expm1f_data *d)
      x + ax^2 + bx^3 + cx^4 ....
      So we calculate the polynomial P(f) = a + bf + cf^2 + ...
      and assemble the approximation expm1(f) ~= f + f^2 * P(f). */
-  svfloat32_t p12 = svmla_lane (C (1), f, lane_constants, 0);
-  svfloat32_t p34 = svmla_lane (C (3), f, lane_constants, 1);
-  svfloat32_t f2 = svmul_x (pg, f, f);
+  svfloat32_t p12 = svmla_lane (sv_f32 (d->c1), f, lane_constants, 0);
+  svfloat32_t p34 = svmla_lane (sv_f32 (d->c3), f, lane_constants, 1);
+  svfloat32_t f2 = svmul_x (svptrue_b32 (), f, f);
   svfloat32_t p = svmla_x (pg, p12, f2, p34);
-  p = svmla_x (pg, C (0), f, p);
+  p = svmla_x (pg, sv_f32 (d->c0), f, p);
   p = svmla_x (pg, f, f2, p);
 
   /* Assemble the result.
      expm1(x) ~= 2^i * (p + 1) - 1
      Let t = 2^i. */
-  svfloat32_t t = svscale_x (pg, sv_f32 (1), i);
-  return svmla_x (pg, svsub_x (pg, t, 1), p, t);
+  svfloat32_t t = svscale_x (pg, sv_f32 (1.0f), svcvt_s32_x (pg, j));
+  return svmla_x (pg, svsub_x (pg, t, 1.0f), p, t);
 }
 
-#endif
+#endif
\ No newline at end of file
diff --git a/sysdeps/aarch64/fpu/tanhf_sve.c b/sysdeps/aarch64/fpu/tanhf_sve.c
index 0b94523cf5..80dd679346 100644
--- a/sysdeps/aarch64/fpu/tanhf_sve.c
+++ b/sysdeps/aarch64/fpu/tanhf_sve.c
@@ -19,20 +19,27 @@
 
 #include "sv_expm1f_inline.h"
 
+/* Largest value of x for which tanhf(x) rounds to 1 (or -1 for negative). */
+#define BoringBound 0x1.205966p+3f
+
 static const struct data
 {
   struct sv_expm1f_data expm1f_consts;
-  uint32_t boring_bound, onef;
+  uint32_t onef, special_bound;
+  float boring_bound;
 } data = {
   .expm1f_consts = SV_EXPM1F_DATA,
-  /* 0x1.205966p+3, above which tanhf rounds to 1 (or -1 for negative). */
-  .boring_bound = 0x41102cb3,
   .onef = 0x3f800000,
+  .special_bound = 0x7f800000,
+  .boring_bound = BoringBound,
 };
 
 static svfloat32_t NOINLINE
-special_case (svfloat32_t x, svfloat32_t y, svbool_t special)
+special_case (svfloat32_t x, svbool_t pg, svbool_t is_boring,
+              svfloat32_t boring, svfloat32_t q, svbool_t special)
 {
+  svfloat32_t y
+      = svsel_f32 (is_boring, boring, svdiv_x (pg, q, svadd_x (pg, q, 2.0)));
   return sv_call_f32 (tanhf, x, y, special);
 }
 
@@ -47,15 +54,16 @@ svfloat32_t SV_NAME_F1 (tanh) (svfloat32_t x, const svbool_t pg)
   svfloat32_t ax = svabs_x (pg, x);
   svuint32_t iax = svreinterpret_u32 (ax);
   svuint32_t sign = sveor_x (pg, svreinterpret_u32 (x), iax);
-  svbool_t is_boring = svcmpgt (pg, iax, d->boring_bound);
   svfloat32_t boring = svreinterpret_f32 (svorr_x (pg, sign, d->onef));
-
-  svbool_t special = svcmpgt (pg, iax, 0x7f800000);
+  svbool_t special = svcmpgt (pg, iax, d->special_bound);
+  svbool_t is_boring = svacgt (pg, x, d->boring_bound);
 
   /* tanh(x) = (e^2x - 1) / (e^2x + 1). */
-  svfloat32_t q = expm1f_inline (svmul_x (pg, x, 2.0), pg, &d->expm1f_consts);
-  svfloat32_t y = svdiv_x (pg, q, svadd_x (pg, q, 2.0));
+  svfloat32_t q = expm1f_inline (svmul_x (svptrue_b32 (), x, 2.0), pg,
+                                 &d->expm1f_consts);
+
   if (__glibc_unlikely (svptest_any (pg, special)))
-    return special_case (x, svsel_f32 (is_boring, boring, y), special);
+    return special_case (x, pg, is_boring, boring, q, special);
+  svfloat32_t y = svdiv_x (pg, q, svadd_x (pg, q, 2.0));
   return svsel_f32 (is_boring, boring, y);
 }