From patchwork Thu Nov 6 09:32:33 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Ramsay X-Patchwork-Id: 123649 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E4843385DC32 for ; Thu, 6 Nov 2025 15:30:33 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E4843385DC32 Authentication-Results: sourceware.org; dkim=pass (1024-bit key, unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=eRD2D3z9; dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com header.a=rsa-sha256 header.s=selector1 header.b=eRD2D3z9 X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from OSPPR02CU001.outbound.protection.outlook.com (mail-norwayeastazon11013003.outbound.protection.outlook.com [40.107.159.3]) by sourceware.org (Postfix) with ESMTPS id A83CC385C6F3 for ; Thu, 6 Nov 2025 09:33:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A83CC385C6F3 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A83CC385C6F3 Authentication-Results: server2.sourceware.org; arc=pass smtp.remote-ip=40.107.159.3 ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1762421599; cv=pass; b=qkFhq6dOCznqdI/pUCdtBzxnU+IbsqyVCd0v8ZLqGxZoVhjC4HoIvFIMVmXUA05ruOmro8dMTu77BC3QtlY4j4lDtSnZEdbHIoMISItooNkmLQrFMe00u8g/QhuW8l6zGHgcwhrwW+AjX4ZWNkOP+4msBisD24UzZ7iQFFOoqqc= ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1762421599; c=relaxed/simple; bh=XfebBgYXgtID/bDoN55iYF3duf97IBIYWlOP5kVpQww=; h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID: MIME-Version; b=GBBhbNaSB8p41hN5i4QaglkM0NCEev/0uZFsO7Hh60wMdPQDUsTstGVcrnfqSbCdYZgS+crla3xpZW+qt13fAAzC22NzsDi4MKEpDAghxBxwS1mz+pW1IBg24l2FQnfL+7UoikxbswRspqPm0BJesCo4evIeDvi48e62uhXX/MI= ARC-Authentication-Results: i=3; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A83CC385C6F3 ARC-Seal: i=2; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=pass; b=SfQ4W/KtTUo1p/k1yy4vryJIdNaUFnPwggQ6pMRnzppPV/Y/e/6+KKBaPQnqwAPQ8zj3LxhU6SdCBSAKDaTDtOSa6/YEGtj+8Aa9DZyxxnaOlGuJPz/QhIAMYJXlo0rpYZaOi5iSytvKe0Jep8tP4cXdpQKhDYTtpzD1eVrJ/MLqND15U+9WYazLpaiqqm3dYoEXhYZWwWXO6jX1751y1/Hp2fXRzpjj1K8MvqRjjMIyCeEXNiEYhRLAkJiuUHvBf/4RYYLuWpCZweCnsWNHhoMgpBVot1ie/b1LZBTFwVqBiLrV9nnIkIPLTblS6X85o1c+idEHeVUFo4OeK20+Yg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=lEesLGjTPI3j+p2wPczxgLTW1X+hIzvRfh76RV883e8=; b=M0Q0ljoHSQNJby/qAgc5vesDyA5ZQFPXdYirbAV5zfENhC/g8SxS9eEvj5MwPaqzMf3Q52OPiHxvBso5c2fmFFD78YXS6QImDRqrnhsSs59urwOHaOG7t+1ztRzfLGNEgeViNJRxgexa8p4bH6KpnQbpllgjcxpG/v6HoeTicOabzE8EiMmoA0W+2/18cMbh1QZDbeqFr1jxPrFWL/CP4RJLUEMhzSRCR0v03ft7pniPhJaeVrqu8pRl7pZxSfui3Io8T0vyDcXLrzZkdWj+LtEbxJeB5BLhoGlMxVOmQKlw/YtpWIoStvuIu+DFKPBap+8awMVHZI70uZt+fBYJKw== ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is 4.158.2.129) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com]) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=lEesLGjTPI3j+p2wPczxgLTW1X+hIzvRfh76RV883e8=; b=eRD2D3z9HFWTMrzo7ezzKBXTiNyQvrytJrJIKTLO7UC9BIPOVlez7Vv2jAXxAbeTVG4qn5wy2yOqW1ODHZQEB7NGm3hNLuFdzV30fN5hHg2FLk3Y/tObEt+A4nyUUgu+JSGSTOMOx5Stc7VcoKn0nNFHc3FD/ZgmOeF3W98X/A0= Received: from AS4PR09CA0013.eurprd09.prod.outlook.com (2603:10a6:20b:5e0::19) by PAVPR08MB8846.eurprd08.prod.outlook.com (2603:10a6:102:2fe::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9275.16; Thu, 6 Nov 2025 09:33:14 +0000 Received: from AM3PEPF00009BA1.eurprd04.prod.outlook.com (2603:10a6:20b:5e0:cafe::5) by AS4PR09CA0013.outlook.office365.com (2603:10a6:20b:5e0::19) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9298.12 via Frontend Transport; Thu, 6 Nov 2025 09:33:12 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 4.158.2.129) smtp.mailfrom=arm.com; dkim=pass (signature was verified) header.d=arm.com;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 4.158.2.129 as permitted sender) receiver=protection.outlook.com; client-ip=4.158.2.129; helo=outbound-uk1.az.dlp.m.darktrace.com; pr=C Received: from outbound-uk1.az.dlp.m.darktrace.com (4.158.2.129) by AM3PEPF00009BA1.mail.protection.outlook.com (10.167.16.26) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9298.6 via Frontend Transport; Thu, 6 Nov 2025 09:33:13 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=s7hOcZFYHC/deMiZdXX3BAV6TQnRSw3/Cvn3Dif63J9rB3Tc/NW6yYuuYcaCRmW5X5I71/LRFLJDcMCRf+5xwX1ntYE5a0rLGOCaMgzUUc3F1XGeFXrLRpKsdr9GxzFnv8BRnu8dyE79RzEtAQPuh5r0jXTUqHWADTNcb1biDdYxpRw4OVHY0xKlNU3N2zTScKrYitxWrcOk0dFwzxuPDnB1ALmAFUrNs728YW8tG7kcXJwFKRQNie3PJ/zyfySe8vIIKHBm1q3xQlWeeOPgIV96zMXFg413dBdUWxcUizTuipAFj9Ocrzc1JTYswS7FfHywm0RM6aMMhq+WHHwgJw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=lEesLGjTPI3j+p2wPczxgLTW1X+hIzvRfh76RV883e8=; b=YzNPaaDQCRrFc6KYQqW48UM+PXEWDuIc1AtsELRzb3AO2J08XlRjmPHwZUQ7fva7dllONJrgCg3J+Af89KXjDSyUMvKs38t6ChzpBOeAaxrJ/046BZwHQN4fTqG7jfd6jIyJY3k8Po6ncFyWMHYM2tx0vVHPK4zD31MFR7xFa2TV4rXpEMBtqffivpyS2u7yzoqM0Sfwz5Yr790kxpm2FxCgB2STWE2ezUdMmoLYa13/K5yo1VEd/P2XnrjV1wKfhoqzJAT9mSrRIi1ef6FtWYH/4isdGGcTq266zx9OQ5cPWTtX92ltUKrqY3o2TAv6JFBf4SvwOxeKwMn2fK1Kyg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 172.205.89.229) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=lEesLGjTPI3j+p2wPczxgLTW1X+hIzvRfh76RV883e8=; b=eRD2D3z9HFWTMrzo7ezzKBXTiNyQvrytJrJIKTLO7UC9BIPOVlez7Vv2jAXxAbeTVG4qn5wy2yOqW1ODHZQEB7NGm3hNLuFdzV30fN5hHg2FLk3Y/tObEt+A4nyUUgu+JSGSTOMOx5Stc7VcoKn0nNFHc3FD/ZgmOeF3W98X/A0= Received: from DB9PR02CA0029.eurprd02.prod.outlook.com (2603:10a6:10:1d9::34) by AS2PR08MB9787.eurprd08.prod.outlook.com (2603:10a6:20b:604::14) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9275.16; Thu, 6 Nov 2025 09:32:40 +0000 Received: from DU6PEPF0000A7E0.eurprd02.prod.outlook.com (2603:10a6:10:1d9:cafe::a) by DB9PR02CA0029.outlook.office365.com (2603:10a6:10:1d9::34) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9298.10 via Frontend Transport; Thu, 6 Nov 2025 09:32:32 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 172.205.89.229) smtp.mailfrom=arm.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=arm.com; Received-SPF: Pass (protection.outlook.com: domain of arm.com designates 172.205.89.229 as permitted sender) receiver=protection.outlook.com; client-ip=172.205.89.229; helo=nebula.arm.com; pr=C Received: from nebula.arm.com (172.205.89.229) by DU6PEPF0000A7E0.mail.protection.outlook.com (10.167.8.39) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9298.6 via Frontend Transport; Thu, 6 Nov 2025 09:32:35 +0000 Received: from AZ-NEU-EX06.Arm.com (10.240.25.134) by AZ-NEU-EX04.Arm.com (10.240.25.138) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.27; Thu, 6 Nov 2025 09:32:35 +0000 Received: from AZ-NEU-EX04.Arm.com (10.240.25.138) by AZ-NEU-EX06.Arm.com (10.240.25.134) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Thu, 6 Nov 2025 09:32:34 +0000 Received: from H2JQC6K4Y7.arm.com (10.57.72.55) by mail.arm.com (10.240.25.138) with Microsoft SMTP Server id 15.2.2562.27 via Frontend Transport; Thu, 6 Nov 2025 09:32:34 +0000 From: Joe Ramsay To: CC: Joe Ramsay Subject: [PATCH v2] AArch64: Optimise SVE scalar callbacks Date: Thu, 6 Nov 2025 09:32:33 +0000 Message-ID: <20251106093233.55262-1-Joe.Ramsay@arm.com> X-Mailer: git-send-email 2.39.5 (Apple Git-154) MIME-Version: 1.0 X-EOPAttributedMessage: 1 X-MS-TrafficTypeDiagnostic: DU6PEPF0000A7E0:EE_|AS2PR08MB9787:EE_|AM3PEPF00009BA1:EE_|PAVPR08MB8846:EE_ X-MS-Office365-Filtering-Correlation-Id: 438f16f3-191f-4072-5502-08de1d1781e6 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; ARA:13230040|82310400026|36860700013|376014|1800799024; X-Microsoft-Antispam-Message-Info-Original: JVO/B5D1L2c2/cF3VVnui4y1f6uuBo5TvBcmRPTG1Rr0q8MHbucTDy8MC9XU1SrgI0P/q4nfWg5mgmYwtPnx9QuC+UHEdscpqGjE5LBpm5rrHHmZpqr6/lGkLY0M1dcWEvcL0i6uLHW0iEPjnIpvMS6dDU8rSmTvLbptN0ZzPrm6zbAnMrnZhzd9zpoZ+qbtARAtSYWWThb7/3Xkcolu/UZ0UwSq+W0DFTjh04NM1Ar3krY18lsdpgl9qzQM5XvuRY9LidY/yC4T+LS0IvfZ0lBJTF3hqGZ3g9Il6CnAKBR78YrNb6J6yQUvK+TkT3FxDqEylFxUW3U/dLjGaJY+eqfF3bCpTgepqF7VCIx9bFce27iBRCoTdYFscjfyvn8hnMPRfiM8aKtnhbLdKNb5S2yqt1yXMmvMi64MpGf+unuCBmE6E63W1VXP9GrJmb09GKQIsFlIVaIr0tKLoUB6On1mJwJ2uV/haiCkU4Rz3a/fbPvA0iY3RV/yXCFpQoF5DDvIakDaCYeqbmSCc4h1/xRSTaLTm6jZWOiQ6t7FqicSgLk0K5RrfkdnyUYRMh46cgTZn71SVmlqDS/HO3fbwN5ZhFQDNmLM7yF0S40X5oTeqKiWIdPg1dEca/JVbE4zKDyn8L+l1hdqxQXj7xCxbWMq5WU3/lwQaB/XRoHbrLyiAgEkinu2rNLddSj1E2xBXVzOJk38ltUlLn9uWI64byOPTXbSoIEKbpc1kLrtu03nEiA+lxaua9Tpy6bCdBtPyZxznAplFJLNN8j4VT3ZStkwmjQF1SDwgZ8WUnyRA/DSgvFoJWXS1y8IQLylpFBOqOfoY0AwsvCD5FkEy6h4Y2P6H4PFzAKzDL5YPZpF3Fswlz2P28qI9fb+VZtozvXkXaG/ZUYu97PtrRmOftIKxtexBo6EeK+ralZYQxTcBya1tlpqC7g4Gw4Euip/fgOvTQNht0zdb9D/oqFYTwCY85SZHeacK7eS8qaG4aMMPiAJtLbJe7sAnkdVc7fqEs8BDJY3XtWQ5wB5tTc9zehysc5ckcPUx/6qJ+8qnmqIKghGRtZ7DDGX66p+4L5Crk45S9MbK0V/tZc1R6Wh7F/bb0fxp/Vr1+noKXdlmM4fC3Ff8fW3U9Gm/lU5mzwDZqGigrgJ45gEbb/G/zRN/VtyN6DLX9iOP6ORrjdDAcp+rZGNBTQdyt4uUG7H7zqzn3Q7FdqaCubLKRZOXtY6xy2m/M8pNw7kksUcdv1JWnRk/cA0RI5xXQ9g6lytiTKrhGAYcO6YPp1ikaYSgQ/ulalFQAYd4nNfl553T3wUv4v5Y01TSRnu6+5AC0ygpBZAoB5kRJjgIQt0+OVDBwjnsM/SXt0DJBdMCNtFIvTSRKUGapAO9tTetRu2pn3/y+jCQBmyKdiq1XUrBJ7moxVj1LoWkWENjr7Il4j4lSV81VJvzDoOAMS36oP2Q12YkZm4QZM5ta2/3edMU1sDzVtyWyIYB7ehIGguXb6H2KuRZM1C2wyzpZNOVw3OxXSwcvaPTKYy70G0Ai8eUUKjE27NWYnncvBbgVyLgJr0NElgISTkQl8= X-Forefront-Antispam-Report-Untrusted: CIP:172.205.89.229; CTRY:IE; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230040)(82310400026)(36860700013)(376014)(1800799024); DIR:OUT; SFP:1101; X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS2PR08MB9787 X-MS-Exchange-Transport-CrossTenantHeadersStripped: AM3PEPF00009BA1.eurprd04.prod.outlook.com X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id-Prvs: 09ed651a-1bcc-45e9-89f4-08de1d176b82 X-Microsoft-Antispam: BCL:0; ARA:13230040|35042699022|82310400026|1800799024|14060799003|36860700013|376014; X-Microsoft-Antispam-Message-Info: +yNhqspSyTDZ6YHq8qnglKpILUqOMFN+JAuzu+dQPPe6PjEFwntGOb2/BXBnoO9bYVYaiJvCP4Q9BWrEFvzEMbCqLIGm76VfVrMR8BJT7oWjLkeeDr8nhEy9SPDLIfsvX5g+Xlh8ZaAbYz7jf+SS8/fCXp3BO8Pw6kWsMmPxwVV4VKYHRnx924D7TyTLhpOfW61yYGAub9dNJDlZscTGC+tBv3YCI0rRpFSYk/Y/0xoc3Ij9fF1r2bl1iWivuyHBWw9UiHKUVTAQp52DJdpBiMTbfOSfUS2nq61Nzs5NqIpz0zelF1mVBj7e5h+HVz2UB63LbMpv/N55DHDd3NJfH/AYIBJdP5JiG1LernTLwePzSJx6blPrmIbaKLB4iolYlZ5sng3aRf9XgubNRWWZgS0pU2jSsi6s3mXaDmiu/nUuvPhgigyPPUkPp64KgRixVZrt+kGqkkmDv2dKmFIzZUo8uAbH112nzM/45ZY1cdLIPX/rM0mZfubtj+hDneJJAD6zK2Qn3KZqNkt1PeRRHlUZ8J2qTbdA/TdVa9FDyKZMn8tUMEE6x1zVJjzQOeQ0GTcwG9sQsiujzdRaD/cD2IP682JWFhh0yq4ZSbE6lKnW2iuLtST3Fqw7eDYDJ6DS6W6jJ87TxsEwW218ziRKwF4ff/rinDn7T3cM9aJkxxM5gy1k7bZwuliC7ShyxWUWQbp8t09gABj+ZCdhSa5XYFmcqFRP0SVwrjPIrzgtADnil6uGSFYkSFh6FXvivrR804nhHwaqtmoMXbSxwgfBJnsOAcWSQ8EupjuI2eJYjGMbif9+GOvmjO2+RTcXTkQ6w94o5nbgEW4J5Z50/43uU87I6mmUuhYH6p8KjIlnzECK0Tg6hL6EnRdCtsSQ+Ihf63wminXdKVx/Ky1vRAiXGzNcMmmDAVLcid4z+EekIKxMHuPjIjxKqj/A7SjLx6T2hbSHQKfYhUMDrTqtKi49WUJxhwThoWjRys+1oTx0D/Pc8S5qRgrQaDJ4H3znJQsHDBlSO7sARllnMQucGCazDi0j7NPrmPN+A3FP4dHICvYjImCq73seSf48xAwjZDqMrx0qydU+kaiebvT+9hvmizY1ZyXZiPDOJHktdhnCGigB/CbzCNjvcn7jEvGaupypQoQyVrgdeXR4SGXMMeHdtmtsp0R19N+fJ/zYvzId171DpPf8N+MC39xwZiBJT3yw0eE9bwlOgraCKsY+dZagnU9pHVYRqp7Cct3CQZ3oBXqzbXDdud5VRTvbMLfpG0zL1dCPCFhWu+KDDcHFaHF+0Uv2GHMqznkPP3Ep0zF0LwZV2vpwE7S9S05uxNWbFwGfkbGT7g23693yh8ZTaJ9iDw0gUrG6BqV6Qg+hR7gAkwLWgFq5lXyZieAR9R8WqA3GlaIIBA7hzVa1QzBfL51hLEmNuiXacOYfGm5b57vZ+eqXKLm4uOUdy1n9HC49OXLYXPD+vYN+IMgvtYNafbuC8sLKl5X1JXtSO1yDmRg8JQnc5tnGAuOUPMkja2Auls/IQhvPYmRdlYaBAr1b3iYS7kuAMVkawuSDgubHzlZtbKE= X-Forefront-Antispam-Report: CIP:4.158.2.129; CTRY:GB; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:outbound-uk1.az.dlp.m.darktrace.com; PTR:InfoDomainNonexistent; CAT:NONE; SFS:(13230040)(35042699022)(82310400026)(1800799024)(14060799003)(36860700013)(376014); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Nov 2025 09:33:13.3615 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 438f16f3-191f-4072-5502-08de1d1781e6 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[4.158.2.129]; Helo=[outbound-uk1.az.dlp.m.darktrace.com] X-MS-Exchange-CrossTenant-AuthSource: AM3PEPF00009BA1.eurprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: PAVPR08MB8846 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FORGED_SPF_HELO, GIT_PATCH_0, RCVD_IN_MSPIKE_H2, RCVD_IN_VALIDITY_RPBL_BLOCKED, RCVD_IN_VALIDITY_SAFE_BLOCKED, SPF_HELO_PASS, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org Instead of using SVE instructions to marshall special results into the correct lane, just write the entire vector (and the predicate) to memory, then use cheaper scalar operations. Geomean speedup of 16% in special intervals on Neoverse with GCC 14. --- Changes from v1: * Use fixed-size arrays rather than VLA * Don't force building with -fno-stack-protector OK for master? If so please commit for me. Thanks, Joe sysdeps/aarch64/fpu/sv_math.h | 97 ++++++++++++++++++++++------------- 1 file changed, 62 insertions(+), 35 deletions(-) diff --git a/sysdeps/aarch64/fpu/sv_math.h b/sysdeps/aarch64/fpu/sv_math.h index 3d576df4cc..65d7f0ff20 100644 --- a/sysdeps/aarch64/fpu/sv_math.h +++ b/sysdeps/aarch64/fpu/sv_math.h @@ -24,11 +24,29 @@ #include "vecmath_config.h" +#if !defined(__ARM_FEATURE_SVE_BITS) || __ARM_FEATURE_SVE_BITS == 0 +/* If not specified by -msve-vector-bits, assume maximum vector length. */ +# define SVE_VECTOR_BYTES 256 +#else +# define SVE_VECTOR_BYTES (__ARM_FEATURE_SVE_BITS / 8) +#endif +#define SVE_NUM_FLTS (SVE_VECTOR_BYTES / sizeof (float)) +#define SVE_NUM_DBLS (SVE_VECTOR_BYTES / sizeof (double)) +/* Predicate is stored as one bit per byte of VL so requires VL / 64 bytes. */ +#define SVE_NUM_PG_BYTES (SVE_VECTOR_BYTES / sizeof (uint64_t)) + #define SV_NAME_F1(fun) _ZGVsMxv_##fun##f #define SV_NAME_D1(fun) _ZGVsMxv_##fun #define SV_NAME_F2(fun) _ZGVsMxvv_##fun##f #define SV_NAME_D2(fun) _ZGVsMxvv_##fun +static inline void +svstr_p (uint8_t *dst, svbool_t p) +{ + /* Predicate STR does not currently have an intrinsic. */ + __asm__("str %0, [%x1]\n" : : "Upa"(p), "r"(dst) : "memory"); +} + /* Double precision. */ static inline svint64_t sv_s64 (int64_t x) @@ -51,33 +69,35 @@ sv_f64 (double x) static inline svfloat64_t sv_call_f64 (double (*f) (double), svfloat64_t x, svfloat64_t y, svbool_t cmp) { - svbool_t p = svpfirst (cmp, svpfalse ()); - while (svptest_any (cmp, p)) + double tmp[SVE_NUM_DBLS]; + uint8_t pg_bits[SVE_NUM_PG_BYTES]; + svstr_p (pg_bits, cmp); + svst1 (svptrue_b64 (), tmp, svsel (cmp, x, y)); + + for (int i = 0; i < svcntd (); i++) { - double elem = svclastb_n_f64 (p, 0, x); - elem = (*f) (elem); - svfloat64_t y2 = svdup_n_f64 (elem); - y = svsel_f64 (p, y2, y); - p = svpnext_b64 (cmp, p); + if (pg_bits[i] & 1) + tmp[i] = f (tmp[i]); } - return y; + return svld1 (svptrue_b64 (), tmp); } static inline svfloat64_t sv_call2_f64 (double (*f) (double, double), svfloat64_t x1, svfloat64_t x2, svfloat64_t y, svbool_t cmp) { - svbool_t p = svpfirst (cmp, svpfalse ()); - while (svptest_any (cmp, p)) + double tmp1[SVE_NUM_DBLS], tmp2[SVE_NUM_DBLS]; + uint8_t pg_bits[SVE_NUM_PG_BYTES]; + svstr_p (pg_bits, cmp); + svst1 (svptrue_b64 (), tmp1, svsel (cmp, x1, y)); + svst1 (cmp, tmp2, x2); + + for (int i = 0; i < svcntd (); i++) { - double elem1 = svclastb_n_f64 (p, 0, x1); - double elem2 = svclastb_n_f64 (p, 0, x2); - double ret = (*f) (elem1, elem2); - svfloat64_t y2 = svdup_n_f64 (ret); - y = svsel_f64 (p, y2, y); - p = svpnext_b64 (cmp, p); + if (pg_bits[i] & 1) + tmp1[i] = f (tmp1[i], tmp2[i]); } - return y; + return svld1 (svptrue_b64 (), tmp1); } static inline svuint64_t @@ -109,33 +129,40 @@ sv_f32 (float x) static inline svfloat32_t sv_call_f32 (float (*f) (float), svfloat32_t x, svfloat32_t y, svbool_t cmp) { - svbool_t p = svpfirst (cmp, svpfalse ()); - while (svptest_any (cmp, p)) + float tmp[SVE_NUM_FLTS]; + uint8_t pg_bits[SVE_NUM_PG_BYTES]; + svstr_p (pg_bits, cmp); + svst1 (svptrue_b32 (), tmp, svsel (cmp, x, y)); + + for (int i = 0; i < svcntd (); i++) { - float elem = svclastb_n_f32 (p, 0, x); - elem = f (elem); - svfloat32_t y2 = svdup_n_f32 (elem); - y = svsel_f32 (p, y2, y); - p = svpnext_b32 (cmp, p); + uint8_t p = pg_bits[i]; + if (p & 1) + tmp[i * 2] = f (tmp[i * 2]); + if (p & (1 << 4)) + tmp[i * 2 + 1] = f (tmp[i * 2 + 1]); } - return y; + return svld1 (svptrue_b32 (), tmp); } static inline svfloat32_t sv_call2_f32 (float (*f) (float, float), svfloat32_t x1, svfloat32_t x2, svfloat32_t y, svbool_t cmp) { - svbool_t p = svpfirst (cmp, svpfalse ()); - while (svptest_any (cmp, p)) + float tmp1[SVE_NUM_FLTS], tmp2[SVE_NUM_FLTS]; + uint8_t pg_bits[SVE_NUM_PG_BYTES]; + svstr_p (pg_bits, cmp); + svst1 (svptrue_b32 (), tmp1, svsel (cmp, x1, y)); + svst1 (cmp, tmp2, x2); + + for (int i = 0; i < svcntd (); i++) { - float elem1 = svclastb_n_f32 (p, 0, x1); - float elem2 = svclastb_n_f32 (p, 0, x2); - float ret = f (elem1, elem2); - svfloat32_t y2 = svdup_n_f32 (ret); - y = svsel_f32 (p, y2, y); - p = svpnext_b32 (cmp, p); + uint8_t p = pg_bits[i]; + if (p & 1) + tmp1[i * 2] = f (tmp1[i * 2], tmp2[i * 2]); + if (p & (1 << 4)) + tmp1[i * 2 + 1] = f (tmp1[i * 2 + 1], tmp2[i * 2 + 1]); } - return y; + return svld1 (svptrue_b32 (), tmp1); } - #endif