From patchwork Tue Dec 17 10:35:26 2024
X-Patchwork-Submitter: Joe Ramsay
X-Patchwork-Id: 103252
Delivered-To: libc-alpha@sourceware.org
From: Joe Ramsay
CC: Joe Ramsay
Subject: [PATCH 4/4] AArch64: Add vector tanpi routines
Date: Tue, 17 Dec 2024 10:35:26 +0000
Message-ID: <20241217103527.1992781-4-Joe.Ramsay@arm.com>
In-Reply-To: <20241217103527.1992781-1-Joe.Ramsay@arm.com>
References: <20241217103527.1992781-1-Joe.Ramsay@arm.com>
List-Id: Libc-alpha mailing list

Vector variant of the new C23 tanpi. New tests pass on AArch64.
---
OK for master?
If so please commit for me as I don't have commit rights.

Thanks,
Joe

 bits/libm-simd-decl-stubs.h                   | 11 +++
 math/bits/mathcalls.h                         |  2 +-
 sysdeps/aarch64/fpu/Makefile                  |  3 +-
 sysdeps/aarch64/fpu/Versions                  |  5 ++
 sysdeps/aarch64/fpu/advsimd_f32_protos.h      |  1 +
 sysdeps/aarch64/fpu/bits/math-vector.h        |  8 ++
 sysdeps/aarch64/fpu/tanpi_advsimd.c           | 88 +++++++++++++++++++
 sysdeps/aarch64/fpu/tanpi_sve.c               | 88 +++++++++++++++++++
 sysdeps/aarch64/fpu/tanpif_advsimd.c          | 72 +++++++++++++++
 sysdeps/aarch64/fpu/tanpif_sve.c              | 68 ++++++++++++++
 .../fpu/test-double-advsimd-wrappers.c        |  1 +
 .../aarch64/fpu/test-double-sve-wrappers.c    |  1 +
 .../aarch64/fpu/test-float-advsimd-wrappers.c |  1 +
 sysdeps/aarch64/fpu/test-float-sve-wrappers.c |  1 +
 sysdeps/aarch64/libm-test-ulps                |  8 ++
 .../unix/sysv/linux/aarch64/libmvec.abilist   |  5 ++
 16 files changed, 361 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/aarch64/fpu/tanpi_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/tanpi_sve.c
 create mode 100644 sysdeps/aarch64/fpu/tanpif_advsimd.c
 create mode 100644 sysdeps/aarch64/fpu/tanpif_sve.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h
index 7f2857a13d..c926ed9de1 100644
--- a/bits/libm-simd-decl-stubs.h
+++ b/bits/libm-simd-decl-stubs.h
@@ -362,4 +362,15 @@
 #define __DECL_SIMD_cospif32x
 #define __DECL_SIMD_cospif64x
 #define __DECL_SIMD_cospif128x
+
+#define __DECL_SIMD_tanpi
+#define __DECL_SIMD_tanpif
+#define __DECL_SIMD_tanpil
+#define __DECL_SIMD_tanpif16
+#define __DECL_SIMD_tanpif32
+#define __DECL_SIMD_tanpif64
+#define __DECL_SIMD_tanpif128
+#define __DECL_SIMD_tanpif32x
+#define __DECL_SIMD_tanpif64x
+#define __DECL_SIMD_tanpif128x
 #endif
diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h
index 67f8d0b853..ccfa056814 100644
--- a/math/bits/mathcalls.h
+++ b/math/bits/mathcalls.h
@@ -80,7 +80,7 @@ __MATHCALL_VEC (cospi,, (_Mdouble_ __x));
 /* Sine of pi * X.  */
 __MATHCALL_VEC (sinpi,, (_Mdouble_ __x));
 /* Tangent of pi * X.  */
-__MATHCALL (tanpi,, (_Mdouble_ __x));
+__MATHCALL_VEC (tanpi,, (_Mdouble_ __x));
 #endif
 
 /* Hyperbolic functions.  */
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
index 6d1e55c4e6..aadedf1517 100644
--- a/sysdeps/aarch64/fpu/Makefile
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -25,7 +25,8 @@ libmvec-supported-funcs = acos \
                           sinh \
                           sinpi \
                           tan \
-                          tanh
+                          tanh \
+                          tanpi
 
 float-advsimd-funcs = $(libmvec-supported-funcs)
 double-advsimd-funcs = $(libmvec-supported-funcs)
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
index f8581cf881..0f9503f9d8 100644
--- a/sysdeps/aarch64/fpu/Versions
+++ b/sysdeps/aarch64/fpu/Versions
@@ -151,5 +151,10 @@ libmvec {
     _ZGVnN4v_sinpif;
     _ZGVsMxv_sinpi;
     _ZGVsMxv_sinpif;
+    _ZGVnN2v_tanpi;
+    _ZGVnN2v_tanpif;
+    _ZGVnN4v_tanpif;
+    _ZGVsMxv_tanpi;
+    _ZGVsMxv_tanpif;
   }
 }
diff --git a/sysdeps/aarch64/fpu/advsimd_f32_protos.h b/sysdeps/aarch64/fpu/advsimd_f32_protos.h
index eca8dfd616..2471b148ad 100644
--- a/sysdeps/aarch64/fpu/advsimd_f32_protos.h
+++ b/sysdeps/aarch64/fpu/advsimd_f32_protos.h
@@ -45,4 +45,5 @@ libmvec_hidden_proto (V_NAME_F1(sinh));
 libmvec_hidden_proto (V_NAME_F1(sinpi));
 libmvec_hidden_proto (V_NAME_F1(tan));
 libmvec_hidden_proto (V_NAME_F1(tanh));
+libmvec_hidden_proto (V_NAME_F1(tanpi));
 libmvec_hidden_proto (V_NAME_F2(atan2));
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
index 530ad246ea..b242cef7ef 100644
--- a/sysdeps/aarch64/fpu/bits/math-vector.h
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -145,6 +145,10 @@
 # define __DECL_SIMD_tanh __DECL_SIMD_aarch64
 # undef __DECL_SIMD_tanhf
 # define __DECL_SIMD_tanhf __DECL_SIMD_aarch64
+# undef __DECL_SIMD_tanpi
+# define __DECL_SIMD_tanpi __DECL_SIMD_aarch64
+# undef __DECL_SIMD_tanpif
+# define __DECL_SIMD_tanpif __DECL_SIMD_aarch64
 #endif
 
 #if __GNUC_PREREQ(9, 0)
@@ -200,6 +204,7 @@ __vpcs __f32x4_t _ZGVnN4v_sinhf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_sinpif (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_tanf (__f32x4_t);
 __vpcs __f32x4_t _ZGVnN4v_tanhf (__f32x4_t);
+__vpcs __f32x4_t _ZGVnN4v_tanpif (__f32x4_t);
 
 __vpcs __f64x2_t _ZGVnN2vv_atan2 (__f64x2_t, __f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_acos (__f64x2_t);
@@ -230,6 +235,7 @@ __vpcs __f64x2_t _ZGVnN2v_sinh (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_sinpi (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_tan (__f64x2_t);
 __vpcs __f64x2_t _ZGVnN2v_tanh (__f64x2_t);
+__vpcs __f64x2_t _ZGVnN2v_tanpi (__f64x2_t);
 
 # undef __ADVSIMD_VEC_MATH_SUPPORTED
 #endif /* __ADVSIMD_VEC_MATH_SUPPORTED */
@@ -265,6 +271,7 @@ __sv_f32_t _ZGVsMxv_sinhf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_sinpif (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_tanf (__sv_f32_t, __sv_bool_t);
 __sv_f32_t _ZGVsMxv_tanhf (__sv_f32_t, __sv_bool_t);
+__sv_f32_t _ZGVsMxv_tanpif (__sv_f32_t, __sv_bool_t);
 
 __sv_f64_t _ZGVsMxvv_atan2 (__sv_f64_t, __sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_acos (__sv_f64_t, __sv_bool_t);
@@ -295,6 +302,7 @@ __sv_f64_t _ZGVsMxv_sinh (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_sinpi (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_tan (__sv_f64_t, __sv_bool_t);
 __sv_f64_t _ZGVsMxv_tanh (__sv_f64_t, __sv_bool_t);
+__sv_f64_t _ZGVsMxv_tanpi (__sv_f64_t, __sv_bool_t);
 
 # undef __SVE_VEC_MATH_SUPPORTED
 #endif /* __SVE_VEC_MATH_SUPPORTED */
diff --git a/sysdeps/aarch64/fpu/tanpi_advsimd.c b/sysdeps/aarch64/fpu/tanpi_advsimd.c
new file mode 100644
index 0000000000..0a93beebca
--- /dev/null
+++ b/sysdeps/aarch64/fpu/tanpi_advsimd.c
@@ -0,0 +1,88 @@
+/* Double-precision (Advanced SIMD) tanpi function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "v_math.h"
+
+const static struct v_tanpi_data
+{
+  float64x2_t c0, c2, c4, c6, c8, c10, c12;
+  double c1, c3, c5, c7, c9, c11, c13, c14;
+} tanpi_data = {
+  /* Coefficients for tan(pi * x) computed with fpminimax
+     on [ 0x1p-1022 0x1p-2 ]
+     approx rel error: 0x1.7eap-55
+     approx abs error: 0x1.7eap-55.  */
+  .c0 = V2 (0x1.921fb54442d18p1), /* pi.  */
+  .c1 = 0x1.4abbce625be52p3, .c2 = V2 (0x1.466bc6775b0f9p5),
+  .c3 = 0x1.45fff9b426f5ep7, .c4 = V2 (0x1.45f4730dbca5cp9),
+  .c5 = 0x1.45f3265994f85p11, .c6 = V2 (0x1.45f4234b330cap13),
+  .c7 = 0x1.45dca11be79ebp15, .c8 = V2 (0x1.47283fc5eea69p17),
+  .c9 = 0x1.3a6d958cdefaep19, .c10 = V2 (0x1.927896baee627p21),
+  .c11 = -0x1.89333f6acd922p19, .c12 = V2 (0x1.5d4e912bb8456p27),
+  .c13 = -0x1.a854d53ab6874p29, .c14 = 0x1.1b76de7681424p32,
+};
+
+/* Approximation for double-precision vector tanpi(x)
+   The maximum error is 3.06 ULP:
+   _ZGVnN2v_tanpi(0x1.0a4a07dfcca3ep-1) got -0x1.fa30112702c98p+3
+                                        want -0x1.fa30112702c95p+3.  */
+float64x2_t VPCS_ATTR V_NAME_D1 (tanpi) (float64x2_t x)
+{
+  const struct v_tanpi_data *d = ptr_barrier (&tanpi_data);
+
+  float64x2_t n = vrndnq_f64 (x);
+
+  /* inf produces nan that propagates.  */
+  float64x2_t xr = vsubq_f64 (x, n);
+  float64x2_t ar = vabdq_f64 (x, n);
+  uint64x2_t flip = vcgtq_f64 (ar, v_f64 (0.25));
+  float64x2_t r = vbslq_f64 (flip, vsubq_f64 (v_f64 (0.5), ar), ar);
+
+  /* Order-14 pairwise Horner.  */
+  float64x2_t r2 = vmulq_f64 (r, r);
+  float64x2_t r4 = vmulq_f64 (r2, r2);
+
+  float64x2_t c_1_3 = vld1q_f64 (&d->c1);
+  float64x2_t c_5_7 = vld1q_f64 (&d->c5);
+  float64x2_t c_9_11 = vld1q_f64 (&d->c9);
+  float64x2_t c_13_14 = vld1q_f64 (&d->c13);
+  float64x2_t p01 = vfmaq_laneq_f64 (d->c0, r2, c_1_3, 0);
+  float64x2_t p23 = vfmaq_laneq_f64 (d->c2, r2, c_1_3, 1);
+  float64x2_t p45 = vfmaq_laneq_f64 (d->c4, r2, c_5_7, 0);
+  float64x2_t p67 = vfmaq_laneq_f64 (d->c6, r2, c_5_7, 1);
+  float64x2_t p89 = vfmaq_laneq_f64 (d->c8, r2, c_9_11, 0);
+  float64x2_t p1011 = vfmaq_laneq_f64 (d->c10, r2, c_9_11, 1);
+  float64x2_t p1213 = vfmaq_laneq_f64 (d->c12, r2, c_13_14, 0);
+
+  float64x2_t p = vfmaq_laneq_f64 (p1213, r4, c_13_14, 1);
+  p = vfmaq_f64 (p1011, r4, p);
+  p = vfmaq_f64 (p89, r4, p);
+  p = vfmaq_f64 (p67, r4, p);
+  p = vfmaq_f64 (p45, r4, p);
+  p = vfmaq_f64 (p23, r4, p);
+  p = vfmaq_f64 (p01, r4, p);
+  p = vmulq_f64 (r, p);
+
+  float64x2_t p_recip = vdivq_f64 (v_f64 (1.0), p);
+  float64x2_t y = vbslq_f64 (flip, p_recip, p);
+
+  uint64x2_t sign
+      = veorq_u64 (vreinterpretq_u64_f64 (xr), vreinterpretq_u64_f64 (ar));
+  return vreinterpretq_f64_u64 (vorrq_u64 (vreinterpretq_u64_f64 (y), sign));
+}
diff --git a/sysdeps/aarch64/fpu/tanpi_sve.c b/sysdeps/aarch64/fpu/tanpi_sve.c
new file mode 100644
index 0000000000..57c643ae29
--- /dev/null
+++ b/sysdeps/aarch64/fpu/tanpi_sve.c
@@ -0,0 +1,88 @@
+/* Double-precision (SVE) tanpi function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "sv_math.h"
+
+const static struct v_tanpi_data
+{
+  double c0, c2, c4, c6, c8, c10, c12;
+  double c1, c3, c5, c7, c9, c11, c13, c14;
+} tanpi_data = {
+  /* Coefficients for tan(pi * x) computed with fpminimax
+     on [ 0x1p-1022 0x1p-2 ]
+     approx rel error: 0x1.7eap-55
+     approx abs error: 0x1.7eap-55.  */
+  .c0 = 0x1.921fb54442d18p1, /* pi.  */
+  .c1 = 0x1.4abbce625be52p3, .c2 = 0x1.466bc6775b0f9p5,
+  .c3 = 0x1.45fff9b426f5ep7, .c4 = 0x1.45f4730dbca5cp9,
+  .c5 = 0x1.45f3265994f85p11, .c6 = 0x1.45f4234b330cap13,
+  .c7 = 0x1.45dca11be79ebp15, .c8 = 0x1.47283fc5eea69p17,
+  .c9 = 0x1.3a6d958cdefaep19, .c10 = 0x1.927896baee627p21,
+  .c11 = -0x1.89333f6acd922p19, .c12 = 0x1.5d4e912bb8456p27,
+  .c13 = -0x1.a854d53ab6874p29, .c14 = 0x1.1b76de7681424p32,
+};
+
+/* Approximation for double-precision vector tanpi(x)
+   The maximum error is 3.06 ULP:
+   _ZGVsMxv_tanpi(0x1.0a4a07dfcca3ep-1) got -0x1.fa30112702c98p+3
+                                        want -0x1.fa30112702c95p+3.  */
+svfloat64_t SV_NAME_D1 (tanpi) (svfloat64_t x, const svbool_t pg)
+{
+  const struct v_tanpi_data *d = ptr_barrier (&tanpi_data);
+
+  svfloat64_t n = svrintn_x (pg, x);
+
+  /* inf produces nan that propagates.  */
+  svfloat64_t xr = svsub_x (pg, x, n);
+  svfloat64_t ar = svabd_x (pg, x, n);
+  svbool_t flip = svcmpgt (pg, ar, 0.25);
+  svfloat64_t r = svsel (flip, svsubr_x (pg, ar, 0.5), ar);
+
+  /* Order-14 pairwise Horner.  */
+  svfloat64_t r2 = svmul_x (pg, r, r);
+  svfloat64_t r4 = svmul_x (pg, r2, r2);
+
+  svfloat64_t c_1_3 = svld1rq (pg, &d->c1);
+  svfloat64_t c_5_7 = svld1rq (pg, &d->c5);
+  svfloat64_t c_9_11 = svld1rq (pg, &d->c9);
+  svfloat64_t c_13_14 = svld1rq (pg, &d->c13);
+  svfloat64_t p01 = svmla_lane (sv_f64 (d->c0), r2, c_1_3, 0);
+  svfloat64_t p23 = svmla_lane (sv_f64 (d->c2), r2, c_1_3, 1);
+  svfloat64_t p45 = svmla_lane (sv_f64 (d->c4), r2, c_5_7, 0);
+  svfloat64_t p67 = svmla_lane (sv_f64 (d->c6), r2, c_5_7, 1);
+  svfloat64_t p89 = svmla_lane (sv_f64 (d->c8), r2, c_9_11, 0);
+  svfloat64_t p1011 = svmla_lane (sv_f64 (d->c10), r2, c_9_11, 1);
+  svfloat64_t p1213 = svmla_lane (sv_f64 (d->c12), r2, c_13_14, 0);
+
+  svfloat64_t p = svmla_lane (p1213, r4, c_13_14, 1);
+  p = svmad_x (pg, p, r4, p1011);
+  p = svmad_x (pg, p, r4, p89);
+  p = svmad_x (pg, p, r4, p67);
+  p = svmad_x (pg, p, r4, p45);
+  p = svmad_x (pg, p, r4, p23);
+  p = svmad_x (pg, p, r4, p01);
+  p = svmul_x (pg, r, p);
+
+  svfloat64_t p_recip = svdivr_x (pg, p, 1.0);
+  svfloat64_t y = svsel (flip, p_recip, p);
+
+  svuint64_t sign
+      = sveor_x (pg, svreinterpret_u64 (xr), svreinterpret_u64 (ar));
+  return svreinterpret_f64 (svorr_x (pg, svreinterpret_u64 (y), sign));
+}
diff --git a/sysdeps/aarch64/fpu/tanpif_advsimd.c b/sysdeps/aarch64/fpu/tanpif_advsimd.c
new file mode 100644
index 0000000000..248cb0f999
--- /dev/null
+++ b/sysdeps/aarch64/fpu/tanpif_advsimd.c
@@ -0,0 +1,72 @@
+/* Single-precision (Advanced SIMD) tanpi function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "v_math.h"
+
+const static struct v_tanpif_data
+{
+  float32x4_t c0, c2, c4, c6;
+  float c1, c3, c5, c7;
+} tanpif_data = {
+  /* Coefficients for tan(pi * x).  */
+  .c0 = V4 (0x1.921fb4p1f), .c1 = 0x1.4abbcep3f, .c2 = V4 (0x1.466b8p5f),
+  .c3 = 0x1.461c72p7f, .c4 = V4 (0x1.42e9d4p9f), .c5 = 0x1.69e2c4p11f,
+  .c6 = V4 (0x1.e85558p11f), .c7 = 0x1.a52e08p16f,
+};
+
+/* Approximation for single-precision vector tanpi(x)
+   The maximum error is 3.34 ULP:
+   _ZGVnN4v_tanpif(0x1.d6c09ap-2) got 0x1.f70aacp+2
+                                  want 0x1.f70aa6p+2.  */
+float32x4_t VPCS_ATTR V_NAME_F1 (tanpi) (float32x4_t x)
+{
+  const struct v_tanpif_data *d = ptr_barrier (&tanpif_data);
+
+  float32x4_t n = vrndnq_f32 (x);
+
+  /* inf produces nan that propagates.  */
+  float32x4_t xr = vsubq_f32 (x, n);
+  float32x4_t ar = vabdq_f32 (x, n);
+  uint32x4_t flip = vcgtq_f32 (ar, v_f32 (0.25f));
+  float32x4_t r = vbslq_f32 (flip, vsubq_f32 (v_f32 (0.5f), ar), ar);
+
+  /* Order-7 pairwise Horner polynomial evaluation scheme.  */
+  float32x4_t r2 = vmulq_f32 (r, r);
+  float32x4_t r4 = vmulq_f32 (r2, r2);
+
+  float32x4_t odd_coeffs = vld1q_f32 (&d->c1);
+  float32x4_t p01 = vfmaq_laneq_f32 (d->c0, r2, odd_coeffs, 0);
+  float32x4_t p23 = vfmaq_laneq_f32 (d->c2, r2, odd_coeffs, 1);
+  float32x4_t p45 = vfmaq_laneq_f32 (d->c4, r2, odd_coeffs, 2);
+  float32x4_t p67 = vfmaq_laneq_f32 (d->c6, r2, odd_coeffs, 3);
+  float32x4_t p = vfmaq_f32 (p45, r4, p67);
+  p = vfmaq_f32 (p23, r4, p);
+  p = vfmaq_f32 (p01, r4, p);
+
+  p = vmulq_f32 (r, p);
+  float32x4_t p_recip = vdivq_f32 (v_f32 (1.0f), p);
+  float32x4_t y = vbslq_f32 (flip, p_recip, p);
+
+  uint32x4_t sign
+      = veorq_u32 (vreinterpretq_u32_f32 (xr), vreinterpretq_u32_f32 (ar));
+  return vreinterpretq_f32_u32 (vorrq_u32 (vreinterpretq_u32_f32 (y), sign));
+}
+
+libmvec_hidden_def (V_NAME_F1 (tanpi))
+HALF_WIDTH_ALIAS_F1 (tanpi)
diff --git a/sysdeps/aarch64/fpu/tanpif_sve.c b/sysdeps/aarch64/fpu/tanpif_sve.c
new file mode 100644
index 0000000000..0285f56f34
--- /dev/null
+++ b/sysdeps/aarch64/fpu/tanpif_sve.c
@@ -0,0 +1,68 @@
+/* Single-precision (SVE) tanpi function
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "sv_math.h"
+
+const static struct v_tanpif_data
+{
+  float c0, c2, c4, c6;
+  float c1, c3, c5, c7;
+} tanpif_data = {
+  /* Coefficients for tan(pi * x).  */
+  .c0 = 0x1.921fb4p1f, .c1 = 0x1.4abbcep3f, .c2 = 0x1.466b8p5f,
+  .c3 = 0x1.461c72p7f, .c4 = 0x1.42e9d4p9f, .c5 = 0x1.69e2c4p11f,
+  .c6 = 0x1.e85558p11f, .c7 = 0x1.a52e08p16f,
+};
+
+/* Approximation for single-precision vector tanpif(x)
+   The maximum error is 3.34 ULP:
+   _ZGVsMxv_tanpif(0x1.d6c09ap-2) got 0x1.f70aacp+2
+                                  want 0x1.f70aa6p+2.  */
+svfloat32_t SV_NAME_F1 (tanpi) (svfloat32_t x, const svbool_t pg)
+{
+  const struct v_tanpif_data *d = ptr_barrier (&tanpif_data);
+  svfloat32_t odd_coeffs = svld1rq (pg, &d->c1);
+  svfloat32_t n = svrintn_x (pg, x);
+
+  /* inf produces nan that propagates.  */
+  svfloat32_t xr = svsub_x (pg, x, n);
+  svfloat32_t ar = svabd_x (pg, x, n);
+  svbool_t flip = svcmpgt (pg, ar, 0.25f);
+  svfloat32_t r = svsel (flip, svsub_x (pg, sv_f32 (0.5f), ar), ar);
+
+  svfloat32_t r2 = svmul_x (pg, r, r);
+  svfloat32_t r4 = svmul_x (pg, r2, r2);
+
+  /* Order-7 Pairwise Horner.  */
+  svfloat32_t p01 = svmla_lane (sv_f32 (d->c0), r2, odd_coeffs, 0);
+  svfloat32_t p23 = svmla_lane (sv_f32 (d->c2), r2, odd_coeffs, 1);
+  svfloat32_t p45 = svmla_lane (sv_f32 (d->c4), r2, odd_coeffs, 2);
+  svfloat32_t p67 = svmla_lane (sv_f32 (d->c6), r2, odd_coeffs, 3);
+  svfloat32_t p = svmad_x (pg, p67, r4, p45);
+  p = svmad_x (pg, p, r4, p23);
+  p = svmad_x (pg, p, r4, p01);
+  svfloat32_t poly = svmul_x (pg, r, p);
+
+  svfloat32_t poly_recip = svdiv_x (pg, sv_f32 (1.0), poly);
+  svfloat32_t y = svsel (flip, poly_recip, poly);
+
+  svuint32_t sign
+      = sveor_x (pg, svreinterpret_u32 (xr), svreinterpret_u32 (ar));
+  return svreinterpret_f32 (svorr_x (pg, svreinterpret_u32 (y), sign));
+}
diff --git a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
index f4babdda95..1855ee1aaf 100644
--- a/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-advsimd-wrappers.c
@@ -51,3 +51,4 @@ VPCS_VECTOR_WRAPPER (sinh_advsimd, _ZGVnN2v_sinh)
 VPCS_VECTOR_WRAPPER (sinpi_advsimd, _ZGVnN2v_sinpi)
 VPCS_VECTOR_WRAPPER (tan_advsimd, _ZGVnN2v_tan)
 VPCS_VECTOR_WRAPPER (tanh_advsimd, _ZGVnN2v_tanh)
+VPCS_VECTOR_WRAPPER (tanpi_advsimd, _ZGVnN2v_tanpi)
diff --git a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
index 4627ea91b1..db35819172 100644
--- a/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-sve-wrappers.c
@@ -70,3 +70,4 @@ SVE_VECTOR_WRAPPER (sinh_sve, _ZGVsMxv_sinh)
 SVE_VECTOR_WRAPPER (sinpi_sve, _ZGVsMxv_sinpi)
 SVE_VECTOR_WRAPPER (tan_sve, _ZGVsMxv_tan)
 SVE_VECTOR_WRAPPER (tanh_sve, _ZGVsMxv_tanh)
+SVE_VECTOR_WRAPPER (tanpi_sve, _ZGVsMxv_tanpi)
diff --git a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
index 882109d986..6811eefa0c 100644
--- a/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-advsimd-wrappers.c
@@ -51,3 +51,4 @@ VPCS_VECTOR_WRAPPER (sinhf_advsimd, _ZGVnN4v_sinhf)
 VPCS_VECTOR_WRAPPER (sinpif_advsimd, _ZGVnN4v_sinpif)
 VPCS_VECTOR_WRAPPER (tanf_advsimd, _ZGVnN4v_tanf)
 VPCS_VECTOR_WRAPPER (tanhf_advsimd, _ZGVnN4v_tanhf)
+VPCS_VECTOR_WRAPPER (tanpif_advsimd, _ZGVnN4v_tanpif)
diff --git a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
index 8b4e17e09a..ffe505334f 100644
--- a/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-sve-wrappers.c
@@ -70,3 +70,4 @@ SVE_VECTOR_WRAPPER (sinhf_sve, _ZGVsMxv_sinhf)
 SVE_VECTOR_WRAPPER (sinpif_sve, _ZGVsMxv_sinpif)
 SVE_VECTOR_WRAPPER (tanf_sve, _ZGVsMxv_tanf)
 SVE_VECTOR_WRAPPER (tanhf_sve, _ZGVsMxv_tanhf)
+SVE_VECTOR_WRAPPER (tanpif_sve, _ZGVsMxv_tanpif)
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 4534c4de45..a71001249e 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1752,11 +1752,19 @@ double: 2
 float: 2
 ldouble: 2
 
+Function: "tanpi_advsimd":
+double: 2
+float: 1
+
 Function: "tanpi_downward":
 double: 2
 float: 3
 ldouble: 4
 
+Function: "tanpi_sve":
+double: 2
+float: 1
+
 Function: "tanpi_towardzero":
 double: 2
 float: 3
diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
index c081f5fb28..a56ce7f4e2 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
@@ -134,12 +134,17 @@ GLIBC_2.41 _ZGVnN2v_logp1 F
 GLIBC_2.41 _ZGVnN2v_logp1f F
 GLIBC_2.41 _ZGVnN2v_sinpi F
 GLIBC_2.41 _ZGVnN2v_sinpif F
+GLIBC_2.41 _ZGVnN2v_tanpi F
+GLIBC_2.41 _ZGVnN2v_tanpif F
 GLIBC_2.41 _ZGVnN4v_cospif F
 GLIBC_2.41 _ZGVnN4v_logp1f F
 GLIBC_2.41 _ZGVnN4v_sinpif F
+GLIBC_2.41 _ZGVnN4v_tanpif F
 GLIBC_2.41 _ZGVsMxv_cospi F
 GLIBC_2.41 _ZGVsMxv_cospif F
 GLIBC_2.41 _ZGVsMxv_logp1 F
 GLIBC_2.41 _ZGVsMxv_logp1f F
 GLIBC_2.41 _ZGVsMxv_sinpi F
 GLIBC_2.41 _ZGVsMxv_sinpif F
+GLIBC_2.41 _ZGVsMxv_tanpi F
+GLIBC_2.41 _ZGVsMxv_tanpif F