From patchwork Tue Dec 17 10:35:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joe Ramsay
X-Patchwork-Id: 103254
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
From: Joe Ramsay
To:
CC: Joe Ramsay
Subject: [PATCH 3/4] AArch64: Add vector cospi routines
Date: Tue, 17 Dec 2024 10:35:25 +0000
Message-ID: <20241217103527.1992781-3-Joe.Ramsay@arm.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20241217103527.1992781-1-Joe.Ramsay@arm.com>
References: <20241217103527.1992781-1-Joe.Ramsay@arm.com>
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org

Vector variant of the new C23 cospi. New tests pass on AArch64.
---
OK for master? If so please commit for me as I don't have commit rights.

Thanks,
Joe

 bits/libm-simd-decl-stubs.h                   | 11 +++
 math/bits/mathcalls.h                         |  2 +-
 sysdeps/aarch64/fpu/Makefile                  |  1 +
 sysdeps/aarch64/fpu/Versions                  |  5 ++
 sysdeps/aarch64/fpu/advsimd_f32_protos.h      |  1 +
 sysdeps/aarch64/fpu/bits/math-vector.h        |  8 ++
 sysdeps/aarch64/fpu/cospi_advsimd.c           | 87 +++++++++++++++++++
 sysdeps/aarch64/fpu/cospi_sve.c               | 65 ++++++++++++++
 sysdeps/aarch64/fpu/cospif_advsimd.c          | 87 +++++++++++++++++++
 sysdeps/aarch64/fpu/cospif_sve.c              | 61 +++++++++++++
 .../fpu/test-double-advsimd-wrappers.c        |  1 +
 .../aarch64/fpu/test-double-sve-wrappers.c    |  1 +
 .../aarch64/fpu/test-float-advsimd-wrappers.c |  1 +
 sysdeps/aarch64/fpu/test-float-sve-wrappers.c |  1 +
 sysdeps/aarch64/libm-test-ulps                |  8 ++
 .../unix/sysv/linux/aarch64/libmvec.abilist   |  5 ++
 16 files changed, 344 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/aarch64/fpu/cospi_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/cospi_sve.c
 create mode 100644 sysdeps/aarch64/fpu/cospif_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/cospif_sve.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 805a04473e..7f2857a13d 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -351,4 +351,15 @@
 #define __DECL_SIMD_sinpif32x
 #define __DECL_SIMD_sinpif64x
 #define __DECL_SIMD_sinpif128x
+
+#define __DECL_SIMD_cospi
+#define __DECL_SIMD_cospif
+#define __DECL_SIMD_cospil
+#define __DECL_SIMD_cospif16
+#define __DECL_SIMD_cospif32
+#define __DECL_SIMD_cospif64
+#define __DECL_SIMD_cospif128
+#define __DECL_SIMD_cospif32x
+#define __DECL_SIMD_cospif64x
+#define __DECL_SIMD_cospif128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 240cecf003..67f8d0b853 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -76,7 +76,7 @@ __MATHCALL (atanpi,, (_Mdouble_ __x));
 __MATHCALL (atan2pi,, (_Mdouble_ __y, _Mdouble_ __x));
 
 /* Cosine of pi * X.  */
-__MATHCALL (cospi,, (_Mdouble_ __x));
+__MATHCALL_VEC (cospi,, (_Mdouble_ __x));
 /* Sine of pi * X.  */
 __MATHCALL_VEC (sinpi,, (_Mdouble_ __x));
 /* Tangent of pi * X.  */
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
index 915da37a06..6d1e55c4e6 100644
--- a/sysdeps/aarch64/fpu/Makefile
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -8,6 +8,7 @@ libmvec-supported-funcs = acos \
                           cbrt \
                           cos \
                           cosh \
+                          cospi \
                           erf \
                           erfc \
                           exp \
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
index 4cbb906022..f8581cf881 100644
--- a/sysdeps/aarch64/fpu/Versions
+++ b/sysdeps/aarch64/fpu/Versions
@@ -136,6 +136,11 @@ libmvec {
     _ZGVsMxv_tanhf;
   }
   GLIBC_2.41 {
+    _ZGVnN2v_cospi;
+    _ZGVnN2v_cospif;
+    _ZGVnN4v_cospif;
+    _ZGVsMxv_cospi;
+    _ZGVsMxv_cospif;
     _ZGVnN2v_logp1;
     _ZGVnN2v_logp1f;
     _ZGVnN4v_logp1f;
diff --git a/sysdeps/aarch64/fpu/advsimd_f32_protos.h b/sysdeps/aarch64/fpu/advsimd_f32_protos.h
index 103983f671..eca8dfd616 100644
--- a/sysdeps/aarch64/fpu/advsimd_f32_protos.h
+++ b/sysdeps/aarch64/fpu/advsimd_f32_protos.h
@@ -26,6 +26,7 @@ libmvec_hidden_proto (V_NAME_F1(atanh));
 libmvec_hidden_proto (V_NAME_F1(cbrt));
 libmvec_hidden_proto (V_NAME_F1(cos));
 libmvec_hidden_proto (V_NAME_F1(cosh));
+libmvec_hidden_proto (V_NAME_F1(cospi));
 libmvec_hidden_proto (V_NAME_F1(erf));
 libmvec_hidden_proto (V_NAME_F1(erfc));
 libmvec_hidden_proto (V_NAME_F1(exp10));
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
index b9092a4ad1..530ad246ea 100644
--- a/sysdeps/aarch64/fpu/bits/math-vector.h
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -69,6 +69,10 @@
 # define __DECL_SIMD_cosh __DECL_SIMD_aarch64
 # undef __DECL_SIMD_coshf
 # define __DECL_SIMD_coshf __DECL_SIMD_aarch64
+# undef __DECL_SIMD_cospi
+# define __DECL_SIMD_cospi __DECL_SIMD_aarch64
+# undef __DECL_SIMD_cospif
+# define __DECL_SIMD_cospif __DECL_SIMD_aarch64
 # undef __DECL_SIMD_erf
 # define __DECL_SIMD_erf __DECL_SIMD_aarch64
 # undef __DECL_SIMD_erff
@@ -177,6 +181,7 @@ __vpcs __f32x4_t _ZGVnN4v_atanhf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_cbrtf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_coshf (__f32x4_t);
+__vpcs __f32x4_t _ZGVnN4v_cospif (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_erff (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_erfcf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_expf (__f32x4_t);
@@ -206,6 +211,7 @@ __vpcs __f64x2_t _ZGVnN2v_atanh (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_cbrt (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_cos (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_cosh (__f64x2_t);
+__vpcs __f64x2_t _ZGVnN2v_cospi (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_erf (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_erfc (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_exp (__f64x2_t);
@@ -240,6 +246,7 @@ __sv_f32_t _ZGVsMxv_atanhf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_cbrtf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_coshf (__sv_f32_t, __sv_bool_t);
+__sv_f32_t _ZGVsMxv_cospif (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_erff (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_erfcf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_expf (__sv_f32_t, __sv_bool_t);
@@ -269,6 +276,7 @@ __sv_f64_t _ZGVsMxv_atanh (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_cbrt (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_cos (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_cosh (__sv_f64_t, __sv_bool_t);
+__sv_f64_t _ZGVsMxv_cospi (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_erf (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_erfc (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_exp (__sv_f64_t, __sv_bool_t);
diff --git a/sysdeps/aarch64/fpu/cospi_advsimd.c b/sysdeps/aarch64/fpu/cospi_advsimd.c
new file mode 100644
index 0000000000..dcd12c8f89
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cospi_advsimd.c
@@ -0,0 +1,87 @@
+/* Double-precision (Advanced SIMD) cospi function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include "v_math.h"
+#include "poly_advsimd_f64.h"
+
+static const struct data
+{
+  float64x2_t poly[10];
+  float64x2_t range_val;
+} data = {
+  /* Polynomial coefficients generated using Remez algorithm,
+     see sinpi.sollya for details.  */
+  .poly = { V2 (0x1.921fb54442d184p1), V2 (-0x1.4abbce625be53p2),
+            V2 (0x1.466bc6775ab16p1), V2 (-0x1.32d2cce62dc33p-1),
+            V2 (0x1.507834891188ep-4), V2 (-0x1.e30750a28c88ep-8),
+            V2 (0x1.e8f48308acda4p-12), V2 (-0x1.6fc0032b3c29fp-16),
+            V2 (0x1.af86ae521260bp-21), V2 (-0x1.012a9870eeb7dp-25) },
+  .range_val = V2 (0x1p63),
+};
+
+static float64x2_t VPCS_ATTR NOINLINE
+special_case (float64x2_t x, float64x2_t y, uint64x2_t odd, uint64x2_t cmp)
+{
+  /* Fall back to scalar code.  */
+  y = vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
+  return v_call_f64 (cospi, x, y, cmp);
+}
+
+/* Approximation for vector double-precision cospi(x).
+   Maximum Error 3.06 ULP:
+   _ZGVnN2v_cospi(0x1.7dd4c0b03cc66p-5) got 0x1.fa854babfb6bep-1
+                                        want 0x1.fa854babfb6c1p-1.  */
+float64x2_t VPCS_ATTR V_NAME_D1 (cospi) (float64x2_t x)
+{
+  const struct data *d = ptr_barrier (&data);
+
+#if WANT_SIMD_EXCEPT
+  float64x2_t r = vabsq_f64 (x);
+  uint64x2_t cmp = vcaleq_f64 (v_f64 (0x1p64), x);
+
+  /* When WANT_SIMD_EXCEPT = 1, special lanes should be zero'd
+     to avoid them overflowing and throwing exceptions.  */
+  r = v_zerofy_f64 (r, cmp);
+  uint64x2_t odd = vshlq_n_u64 (vcvtnq_u64_f64 (r), 63);
+
+#else
+  float64x2_t r = x;
+  uint64x2_t cmp = vcageq_f64 (r, d->range_val);
+  uint64x2_t odd
+      = vshlq_n_u64 (vreinterpretq_u64_s64 (vcvtaq_s64_f64 (r)), 63);
+
+#endif
+
+  r = vsubq_f64 (r, vrndaq_f64 (r));
+
+  /* cospi(x) = sinpi(0.5 - abs(x)) for values -1/2 .. 1/2.  */
+  r = vsubq_f64 (v_f64 (0.5), vabsq_f64 (r));
+
+  /* y = sin(r).  */
+  float64x2_t r2 = vmulq_f64 (r, r);
+  float64x2_t r4 = vmulq_f64 (r2, r2);
+  float64x2_t y = vmulq_f64 (v_pw_horner_9_f64 (r2, r4, d->poly), r);
+
+  /* Fallback to scalar.  */
+  if (__glibc_unlikely (v_any_u64 (cmp)))
+    return special_case (x, y, odd, cmp);
+
+  /* Reintroduce the sign bit for inputs which round to odd.  */
+  return vreinterpretq_f64_u64 (veorq_u64 (vreinterpretq_u64_f64 (y), odd));
+}
diff --git a/sysdeps/aarch64/fpu/cospi_sve.c b/sysdeps/aarch64/fpu/cospi_sve.c
new file mode 100644
index 0000000000..dd98815907
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cospi_sve.c
@@ -0,0 +1,65 @@
+/* Double-precision (SVE) cospi function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include "sv_math.h"
+#include "poly_sve_f64.h"
+
+static const struct data
+{
+  double poly[10];
+  double range_val;
+} data = {
+  /* Polynomial coefficients generated using Remez algorithm,
+     see sinpi.sollya for details.  */
+  .poly = { 0x1.921fb54442d184p1, -0x1.4abbce625be53p2, 0x1.466bc6775ab16p1,
+            -0x1.32d2cce62dc33p-1, 0x1.507834891188ep-4, -0x1.e30750a28c88ep-8,
+            0x1.e8f48308acda4p-12, -0x1.6fc0032b3c29fp-16,
+            0x1.af86ae521260bp-21, -0x1.012a9870eeb7dp-25 },
+  .range_val = 0x1p53,
+};
+
+/* A fast SVE implementation of cospi.
+   Maximum error 3.20 ULP:
+   _ZGVsMxv_cospi(0x1.f18ba32c63159p-6) got 0x1.fdabf595f9763p-1
+                                        want 0x1.fdabf595f9766p-1.  */
+svfloat64_t SV_NAME_D1 (cospi) (svfloat64_t x, const svbool_t pg)
+{
+  const struct data *d = ptr_barrier (&data);
+
+  /* Using cospi(x) = sinpi(0.5 - x)
+     range reduction and offset into sinpi range -1/2 .. 1/2
+     r = 0.5 - |x - rint(x)|.  */
+  svfloat64_t n = svrinta_x (pg, x);
+  svfloat64_t r = svsub_x (pg, x, n);
+  r = svsub_x (pg, sv_f64 (0.5), svabs_x (pg, r));
+
+  /* Result should be negated based on if n is odd or not.
+     If ax >= 2^53, the result will always be positive.  */
+  svbool_t cmp = svaclt (pg, x, d->range_val);
+  svuint64_t intn = svreinterpret_u64 (svcvt_s64_z (pg, n));
+  svuint64_t sign = svlsl_z (cmp, intn, 63);
+
+  /* y = sin(r).  */
+  svfloat64_t r2 = svmul_x (pg, r, r);
+  svfloat64_t r4 = svmul_x (pg, r2, r2);
+  svfloat64_t y = sv_pw_horner_9_f64_x (pg, r2, r4, d->poly);
+  y = svmul_x (pg, y, r);
+
+  return svreinterpret_f64 (sveor_x (pg, svreinterpret_u64 (y), sign));
+}
diff --git a/sysdeps/aarch64/fpu/cospif_advsimd.c b/sysdeps/aarch64/fpu/cospif_advsimd.c
new file mode 100644
index 0000000000..a81471f408
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cospif_advsimd.c
@@ -0,0 +1,87 @@
+/* Single-precision (Advanced SIMD) cospi function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include "v_math.h"
+#include "poly_advsimd_f32.h"
+
+static const struct data
+{
+  float32x4_t poly[6];
+  float32x4_t range_val;
+} data = {
+  /* Taylor series coefficients for sin(pi * x).  */
+  .poly = { V4 (0x1.921fb6p1f), V4 (-0x1.4abbcep2f), V4 (0x1.466bc6p1f),
+            V4 (-0x1.32d2ccp-1f), V4 (0x1.50783p-4f), V4 (-0x1.e30750p-8f) },
+  .range_val = V4 (0x1p31f),
+};
+
+static float32x4_t VPCS_ATTR NOINLINE
+special_case (float32x4_t x, float32x4_t y, uint32x4_t odd, uint32x4_t cmp)
+{
+  y = vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
+  return v_call_f32 (cospif, x, y, cmp);
+}
+
+/* Approximation for vector single-precision cospi(x)
+   Maximum Error: 3.17 ULP:
+   _ZGVnN4v_cospif(0x1.d341a8p-5) got 0x1.f7cd56p-1
+                                  want 0x1.f7cd5p-1.  */
+float32x4_t VPCS_ATTR V_NAME_F1 (cospi) (float32x4_t x)
+{
+  const struct data *d = ptr_barrier (&data);
+
+#if WANT_SIMD_EXCEPT
+  float32x4_t r = vabsq_f32 (x);
+  uint32x4_t cmp = vcaleq_f32 (v_f32 (0x1p32f), x);
+
+  /* When WANT_SIMD_EXCEPT = 1, special lanes should be zero'd
+     to avoid them overflowing and throwing exceptions.  */
+  r = v_zerofy_f32 (r, cmp);
+  uint32x4_t odd = vshlq_n_u32 (vcvtnq_u32_f32 (r), 31);
+
+#else
+  float32x4_t r = x;
+  uint32x4_t cmp = vcageq_f32 (r, d->range_val);
+
+  uint32x4_t odd
+      = vshlq_n_u32 (vreinterpretq_u32_s32 (vcvtaq_s32_f32 (r)), 31);
+
+#endif
+
+  /* r = x - rint(x).  */
+  r = vsubq_f32 (r, vrndaq_f32 (r));
+
+  /* cospi(x) = sinpi(0.5 - abs(x)) for values -1/2 .. 1/2.  */
+  r = vsubq_f32 (v_f32 (0.5f), vabsq_f32 (r));
+
+  /* Pairwise Horner approximation for y = sin(r * pi).  */
+  float32x4_t r2 = vmulq_f32 (r, r);
+  float32x4_t r4 = vmulq_f32 (r2, r2);
+  float32x4_t y = vmulq_f32 (v_pw_horner_5_f32 (r2, r4, d->poly), r);
+
+  /* Fallback to scalar.  */
+  if (__glibc_unlikely (v_any_u32 (cmp)))
+    return special_case (x, y, odd, cmp);
+
+  /* Reintroduce the sign bit for inputs which round to odd.  */
+  return vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), odd));
+}
+
+libmvec_hidden_def (V_NAME_F1 (cospi))
+HALF_WIDTH_ALIAS_F1 (cospi)
diff --git a/sysdeps/aarch64/fpu/cospif_sve.c b/sysdeps/aarch64/fpu/cospif_sve.c
new file mode 100644
index 0000000000..e8980dac1c
--- /dev/null
+++ b/sysdeps/aarch64/fpu/cospif_sve.c
@@ -0,0 +1,61 @@
+/* Single-precision (SVE) cospi function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include "sv_math.h"
+#include "poly_sve_f32.h"
+
+static const struct data
+{
+  float poly[6];
+  float range_val;
+} data = {
+  /* Taylor series coefficients for sin(pi * x).  */
+  .poly = { 0x1.921fb6p1f, -0x1.4abbcep2f, 0x1.466bc6p1f, -0x1.32d2ccp-1f,
+            0x1.50783p-4f, -0x1.e30750p-8f },
+  .range_val = 0x1p31f,
+};
+
+/* A fast SVE implementation of cospif.
+   Maximum error: 2.60 ULP:
+   _ZGVsMxv_cospif(+/-0x1.cae664p-4) got 0x1.e09c9ep-1
+                                     want 0x1.e09c98p-1.  */
+svfloat32_t SV_NAME_F1 (cospi) (svfloat32_t x, const svbool_t pg)
+{
+  const struct data *d = ptr_barrier (&data);
+
+  /* Using cospi(x) = sinpi(0.5 - x)
+     range reduction and offset into sinpi range -1/2 .. 1/2
+     r = 0.5 - |x - rint(x)|.  */
+  svfloat32_t n = svrinta_x (pg, x);
+  svfloat32_t r = svsub_x (pg, x, n);
+  r = svsub_x (pg, sv_f32 (0.5f), svabs_x (pg, r));
+
+  /* Result should be negated based on if n is odd or not.
+     If ax >= 2^31, the result will always be positive.  */
+  svbool_t cmp = svaclt (pg, x, d->range_val);
+  svuint32_t intn = svreinterpret_u32 (svcvt_s32_x (pg, n));
+  svuint32_t sign = svlsl_z (cmp, intn, 31);
+
+  /* y = sin(r).  */
+  svfloat32_t r2 = svmul_x (pg, r, r);
+  svfloat32_t y = sv_horner_5_f32_x (pg, r2, d->poly);
+  y = svmul_x (pg, y, r);
+
+  return svreinterpret_f32 (sveor_x (pg, svreinterpret_u32 (y), sign));
+}
diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
index 9b72293bed..f4babdda95 100644
--- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
@@ -33,6 +33,7 @@ VPCS_VECTOR_WRAPPER_ff (atan2_advsimd, _ZGVnN2vv_atan2)
 VPCS_VECTOR_WRAPPER (cbrt_advsimd, _ZGVnN2v_cbrt)
 VPCS_VECTOR_WRAPPER (cos_advsimd, _ZGVnN2v_cos)
 VPCS_VECTOR_WRAPPER (cosh_advsimd, _ZGVnN2v_cosh)
+VPCS_VECTOR_WRAPPER (cospi_advsimd, _ZGVnN2v_cospi)
 VPCS_VECTOR_WRAPPER (erf_advsimd, _ZGVnN2v_erf)
 VPCS_VECTOR_WRAPPER (erfc_advsimd, _ZGVnN2v_erfc)
 VPCS_VECTOR_WRAPPER (exp_advsimd, _ZGVnN2v_exp)
diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
index bb0886580a..4627ea91b1 100644
--- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
@@ -52,6 +52,7 @@ SVE_VECTOR_WRAPPER_ff (atan2_sve, _ZGVsMxvv_atan2)
 SVE_VECTOR_WRAPPER (cbrt_sve, _ZGVsMxv_cbrt)
 SVE_VECTOR_WRAPPER (cos_sve, _ZGVsMxv_cos)
 SVE_VECTOR_WRAPPER (cosh_sve, _ZGVsMxv_cosh)
+SVE_VECTOR_WRAPPER (cospi_sve, _ZGVsMxv_cospi)
 SVE_VECTOR_WRAPPER (erf_sve, _ZGVsMxv_erf)
 SVE_VECTOR_WRAPPER (erfc_sve, _ZGVsMxv_erfc)
 SVE_VECTOR_WRAPPER (exp_sve, _ZGVsMxv_exp)
diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
index 4beb5ba9e5..882109d986 100644
--- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
@@ -33,6 +33,7 @@ VPCS_VECTOR_WRAPPER_ff (atan2f_advsimd, _ZGVnN4vv_atan2f)
 VPCS_VECTOR_WRAPPER (cbrtf_advsimd, _ZGVnN4v_cbrtf)
 VPCS_VECTOR_WRAPPER (cosf_advsimd, _ZGVnN4v_cosf)
 VPCS_VECTOR_WRAPPER (coshf_advsimd, _ZGVnN4v_coshf)
+VPCS_VECTOR_WRAPPER (cospif_advsimd, _ZGVnN4v_cospif)
 VPCS_VECTOR_WRAPPER (erff_advsimd, _ZGVnN4v_erff)
 VPCS_VECTOR_WRAPPER (erfcf_advsimd, _ZGVnN4v_erfcf)
 VPCS_VECTOR_WRAPPER (expf_advsimd, _ZGVnN4v_expf)
diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
index 8ac48902d2..8b4e17e09a 100644
--- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
@@ -52,6 +52,7 @@ SVE_VECTOR_WRAPPER_ff (atan2f_sve, _ZGVsMxvv_atan2f)
 SVE_VECTOR_WRAPPER (cbrtf_sve, _ZGVsMxv_cbrtf)
 SVE_VECTOR_WRAPPER (cosf_sve, _ZGVsMxv_cosf)
 SVE_VECTOR_WRAPPER (coshf_sve, _ZGVsMxv_coshf)
+SVE_VECTOR_WRAPPER (cospif_sve, _ZGVsMxv_cospif)
 SVE_VECTOR_WRAPPER (erff_sve, _ZGVsMxv_erff)
 SVE_VECTOR_WRAPPER (erfcf_sve, _ZGVsMxv_erfcf)
 SVE_VECTOR_WRAPPER (expf_sve, _ZGVsMxv_expf)
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 6a409f4b88..4534c4de45 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -837,11 +837,19 @@ double: 1
 float: 1
 ldouble: 1
 
+Function: "cospi_advsimd":
+double: 2
+float: 1
+
 Function: "cospi_downward":
 double: 1
 float: 1
 ldouble: 2
 
+Function: "cospi_sve":
+double: 2
+float: 1
+
 Function: "cospi_towardzero":
 double: 1
 float: 1
diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
index dd69f818c1..c081f5fb28 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
+++ 
b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist @@ -128,12 +128,17 @@ GLIBC_2.40 _ZGVsMxvv_hypot F GLIBC_2.40 _ZGVsMxvv_hypotf F GLIBC_2.40 _ZGVsMxvv_pow F GLIBC_2.40 _ZGVsMxvv_powf F +GLIBC_2.41 _ZGVnN2v_cospi F +GLIBC_2.41 _ZGVnN2v_cospif F GLIBC_2.41 _ZGVnN2v_logp1 F GLIBC_2.41 _ZGVnN2v_logp1f F GLIBC_2.41 _ZGVnN2v_sinpi F GLIBC_2.41 _ZGVnN2v_sinpif F +GLIBC_2.41 _ZGVnN4v_cospif F GLIBC_2.41 _ZGVnN4v_logp1f F GLIBC_2.41 _ZGVnN4v_sinpif F +GLIBC_2.41 _ZGVsMxv_cospi F +GLIBC_2.41 _ZGVsMxv_cospif F GLIBC_2.41 _ZGVsMxv_logp1 F GLIBC_2.41 _ZGVsMxv_logp1f F GLIBC_2.41 _ZGVsMxv_sinpi F