From patchwork Wed Jun 18 15:40:23 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luna Lamb
X-Patchwork-Id: 114680
X-Patchwork-Delegate: Wilco.Dijkstra@arm.com
From: Luna Lamb
To:
CC: Luna Lamb
Subject: [PATCH] aarch64: Improve codegen SVE log1p helper.
Date: Wed, 18 Jun 2025 15:40:23 +0000
Message-ID: <20250618154023.77959-1-Luna.lamb@arm.com>
X-Mailer: git-send-email 2.34.1
List-Id: Libc-alpha mailing list

Improves codegen by packing coefficients.
4% and 2% improvement in throughput microbenchmark on Neoverse V1,
for acosh and atanh respectively.
---
OK for master? If so please commit for me as I don't have commit rights.
Thanks,
Luna

 sysdeps/aarch64/fpu/acosh_sve.c       |  6 +-
 sysdeps/aarch64/fpu/atanh_sve.c       |  3 +-
 sysdeps/aarch64/fpu/sv_log1p_inline.h | 88 +++++++++++++++++++++------
 3 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/sysdeps/aarch64/fpu/acosh_sve.c b/sysdeps/aarch64/fpu/acosh_sve.c
index 326b2cca2e..3a84959f0a 100644
--- a/sysdeps/aarch64/fpu/acosh_sve.c
+++ b/sysdeps/aarch64/fpu/acosh_sve.c
@@ -30,10 +30,10 @@ special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
 }
 
 /* SVE approximation for double-precision acosh, based on log1p.
-   The largest observed error is 3.19 ULP in the region where the
+   The largest observed error is 3.14 ULP in the region where the
    argument to log1p falls in the k=0 interval, i.e. x close to 1:
-   SV_NAME_D1 (acosh)(0x1.1e4388d4ca821p+0) got 0x1.ed23399f5137p-2
-                                           want 0x1.ed23399f51373p-2.  */
+   SV_NAME_D1 (acosh)(0x1.1e80ed12f0ad1p+0) got 0x1.ef0cee7c33ce1p-2
+                                           want 0x1.ef0cee7c33ce4p-2.  */
 svfloat64_t SV_NAME_D1 (acosh) (svfloat64_t x, const svbool_t pg)
 {
   /* (ix - One) >= (BigBound - One).  */
diff --git a/sysdeps/aarch64/fpu/atanh_sve.c b/sysdeps/aarch64/fpu/atanh_sve.c
index 16a7cf6aa7..958d69a5f5 100644
--- a/sysdeps/aarch64/fpu/atanh_sve.c
+++ b/sysdeps/aarch64/fpu/atanh_sve.c
@@ -30,7 +30,7 @@ special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
 }
 
 /* SVE approximation for double-precision atanh, based on log1p.
-   The greatest observed error is 2.81 ULP:
+   The greatest observed error is 3.3 ULP:
    _ZGVsMxv_atanh(0x1.ffae6288b601p-6) got 0x1.ffd8ff31b5019p-6
                                       want 0x1.ffd8ff31b501cp-6.  */
 svfloat64_t SV_NAME_D1 (atanh) (svfloat64_t x, const svbool_t pg)
@@ -42,7 +42,6 @@ svfloat64_t SV_NAME_D1 (atanh) (svfloat64_t x, const svbool_t pg)
   svfloat64_t halfsign = svreinterpret_f64 (svorr_x (pg, sign, Half));
 
   /* It is special if iax >= 1.  */
-//  svbool_t special = svcmpge (pg, iax, One);
   svbool_t special = svacge (pg, x, 1.0);
 
   /* Computation is performed based on the following sequence of equality:
diff --git a/sysdeps/aarch64/fpu/sv_log1p_inline.h b/sysdeps/aarch64/fpu/sv_log1p_inline.h
index 71f88e02de..6fac4cb91d 100644
--- a/sysdeps/aarch64/fpu/sv_log1p_inline.h
+++ b/sysdeps/aarch64/fpu/sv_log1p_inline.h
@@ -21,11 +21,12 @@
 #define AARCH64_FPU_SV_LOG1P_INLINE_H
 
 #include "sv_math.h"
-#include "poly_sve_f64.h"
 
 static const struct sv_log1p_data
 {
-  double poly[19], ln2[2];
+  double c0, c2, c4, c6, c8, c10, c12, c14, c16;
+  double c1, c3, c5, c7, c9, c11, c13, c15, c17, c18;
+  double ln2_lo, ln2_hi;
   uint64_t hf_rt2_top;
   uint64_t one_m_hf_rt2_top;
   uint32_t bottom_mask;
@@ -33,15 +34,30 @@ static const struct sv_log1p_data
 } sv_log1p_data = {
   /* Coefficients generated using Remez, deg=20, in [sqrt(2)/2-1, sqrt(2)-1].
    */
-  .poly = { -0x1.ffffffffffffbp-2, 0x1.55555555551a9p-2, -0x1.00000000008e3p-2,
-            0x1.9999999a32797p-3, -0x1.555555552fecfp-3, 0x1.249248e071e5ap-3,
-            -0x1.ffffff8bf8482p-4, 0x1.c71c8f07da57ap-4, -0x1.9999ca4ccb617p-4,
-            0x1.7459ad2e1dfa3p-4, -0x1.554d2680a3ff2p-4, 0x1.3b4c54d487455p-4,
-            -0x1.2548a9ffe80e6p-4, 0x1.0f389a24b2e07p-4, -0x1.eee4db15db335p-5,
-            0x1.e95b494d4a5ddp-5, -0x1.15fdf07cb7c73p-4, 0x1.0310b70800fcfp-4,
-            -0x1.cfa7385bdb37ep-6 },
-  .ln2 = { 0x1.62e42fefa3800p-1, 0x1.ef35793c76730p-45 },
+  .c0 = -0x1.ffffffffffffbp-2,
+  .c1 = 0x1.55555555551a9p-2,
+  .c2 = -0x1.00000000008e3p-2,
+  .c3 = 0x1.9999999a32797p-3,
+  .c4 = -0x1.555555552fecfp-3,
+  .c5 = 0x1.249248e071e5ap-3,
+  .c6 = -0x1.ffffff8bf8482p-4,
+  .c7 = 0x1.c71c8f07da57ap-4,
+  .c8 = -0x1.9999ca4ccb617p-4,
+  .c9 = 0x1.7459ad2e1dfa3p-4,
+  .c10 = -0x1.554d2680a3ff2p-4,
+  .c11 = 0x1.3b4c54d487455p-4,
+  .c12 = -0x1.2548a9ffe80e6p-4,
+  .c13 = 0x1.0f389a24b2e07p-4,
+  .c14 = -0x1.eee4db15db335p-5,
+  .c15 = 0x1.e95b494d4a5ddp-5,
+  .c16 = -0x1.15fdf07cb7c73p-4,
+  .c17 = 0x1.0310b70800fcfp-4,
+  .c18 = -0x1.cfa7385bdb37ep-6,
+  .ln2_lo = 0x1.62e42fefa3800p-1,
+  .ln2_hi = 0x1.ef35793c76730p-45,
+  /* top32(asuint64(sqrt(2)/2)) << 32.  */
   .hf_rt2_top = 0x3fe6a09e00000000,
+  /* (top32(asuint64(1)) - top32(asuint64(sqrt(2)/2))) << 32.  */
   .one_m_hf_rt2_top = 0x00095f6200000000,
   .bottom_mask = 0xffffffff,
   .one_top = 0x3ff
@@ -51,14 +67,14 @@ static inline svfloat64_t
 sv_log1p_inline (svfloat64_t x, const svbool_t pg)
 {
   /* Helper for calculating log(x + 1). Adapted from v_log1p_inline.h, which
-     differs from v_log1p_2u5.c by:
+     differs from advsimd/log1p.c by:
      - No special-case handling - this should be dealt with by the caller.
      - Pairwise Horner polynomial evaluation for improved accuracy.
      - Optionally simulate the shortcut for k=0, used in the scalar routine,
        using svsel, for improved accuracy when the argument to log1p is close
        to 0. This feature is enabled by defining WANT_SV_LOG1P_K0_SHORTCUT as 1
        in the source of the caller before including this file.
-     See sv_log1p_2u1.c for details of the algorithm.  */
+     See sve/log1p.c for details of the algorithm.  */
   const struct sv_log1p_data *d = ptr_barrier (&sv_log1p_data);
   svfloat64_t m = svadd_x (pg, x, 1);
   svuint64_t mi = svreinterpret_u64 (m);
@@ -79,7 +95,7 @@ sv_log1p_inline (svfloat64_t x, const svbool_t pg)
   svfloat64_t cm;
 
 #ifndef WANT_SV_LOG1P_K0_SHORTCUT
-#error \
+#error \
   "Cannot use sv_log1p_inline.h without specifying whether you need the k0 shortcut for greater accuracy close to 0"
 #elif WANT_SV_LOG1P_K0_SHORTCUT
   /* Shortcut if k is 0 - set correction term to 0 and f to x. The result is
@@ -96,14 +112,46 @@ sv_log1p_inline (svfloat64_t x, const svbool_t pg)
 #endif
 
   /* Approximate log1p(f) on the reduced input using a polynomial.  */
-  svfloat64_t f2 = svmul_x (pg, f, f);
-  svfloat64_t p = sv_pw_horner_18_f64_x (pg, f, f2, d->poly);
+  svfloat64_t f2 = svmul_x (svptrue_b64 (), f, f),
+              f4 = svmul_x (svptrue_b64 (), f2, f2),
+              f8 = svmul_x (svptrue_b64 (), f4, f4),
+              f16 = svmul_x (svptrue_b64 (), f8, f8);
+
+  svfloat64_t c13 = svld1rq (svptrue_b64 (), &d->c1);
+  svfloat64_t c57 = svld1rq (svptrue_b64 (), &d->c5);
+  svfloat64_t c911 = svld1rq (svptrue_b64 (), &d->c9);
+  svfloat64_t c1315 = svld1rq (svptrue_b64 (), &d->c13);
+  svfloat64_t c1718 = svld1rq (svptrue_b64 (), &d->c17);
+
+  /* Order-18 Estrin scheme.  */
+  svfloat64_t p01 = svmla_lane (sv_f64 (d->c0), f, c13, 0);
+  svfloat64_t p23 = svmla_lane (sv_f64 (d->c2), f, c13, 1);
+  svfloat64_t p45 = svmla_lane (sv_f64 (d->c4), f, c57, 0);
+  svfloat64_t p67 = svmla_lane (sv_f64 (d->c6), f, c57, 1);
+
+  svfloat64_t p03 = svmla_x (pg, p01, f2, p23);
+  svfloat64_t p47 = svmla_x (pg, p45, f2, p67);
+  svfloat64_t p07 = svmla_x (pg, p03, f4, p47);
+
+  svfloat64_t p89 = svmla_lane (sv_f64 (d->c8), f, c911, 0);
+  svfloat64_t p1011 = svmla_lane (sv_f64 (d->c10), f, c911, 1);
+  svfloat64_t p1213 = svmla_lane (sv_f64 (d->c12), f, c1315, 0);
+  svfloat64_t p1415 = svmla_lane (sv_f64 (d->c14), f, c1315, 1);
+
+  svfloat64_t p811 = svmla_x (pg, p89, f2, p1011);
+  svfloat64_t p1215 = svmla_x (pg, p1213, f2, p1415);
+  svfloat64_t p815 = svmla_x (pg, p811, f4, p1215);
+
+  svfloat64_t p015 = svmla_x (pg, p07, f8, p815);
+  svfloat64_t p1617 = svmla_lane (sv_f64 (d->c16), f, c1718, 0);
+  svfloat64_t p1618 = svmla_lane (p1617, f2, c1718, 1);
+  svfloat64_t p = svmla_x (pg, p015, f16, p1618);
 
   /* Assemble log1p(x) = k * log2 + log1p(f) + c/m.  */
-  svfloat64_t ylo = svmla_x (pg, cm, k, d->ln2[0]);
-  svfloat64_t yhi = svmla_x (pg, f, k, d->ln2[1]);
+  svfloat64_t ln2_lo_hi = svld1rq (svptrue_b64 (), &d->ln2_lo);
+  svfloat64_t ylo = svmla_lane (cm, k, ln2_lo_hi, 0);
+  svfloat64_t yhi = svmla_lane (f, k, ln2_lo_hi, 1);
 
-  return svmla_x (pg, svadd_x (pg, ylo, yhi), f2, p);
+  return svmad_x (pg, p, f2, svadd_x (pg, ylo, yhi));
 }
-
-#endif
+#endif
\ No newline at end of file