From patchwork Thu Feb 26 14:23:45 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Cosmina.Dunca@arm.com" <cosmina.dunca@arm.com>
X-Patchwork-Id: 130709
Return-Path: <libc-alpha-bounces~patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from vm01.sourceware.org (localhost [127.0.0.1])
	by sourceware.org (Postfix) with ESMTP id EBE414B9DB40
	for <patchwork@sourceware.org>; Thu, 26 Feb 2026 14:26:47 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org EBE414B9DB40
Authentication-Results: sourceware.org;
	dkim=pass (1024-bit key,
 unprotected) header.d=arm.com header.i=@arm.com header.a=rsa-sha256
 header.s=selector1 header.b=oioKgU/Y;
	dkim=pass (1024-bit key) header.d=arm.com header.i=@arm.com
 header.a=rsa-sha256 header.s=selector1 header.b=oioKgU/Y
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from DB3PR0202CU003.outbound.protection.outlook.com
 (mail-northeuropeazlp170100001.outbound.protection.outlook.com
 [IPv6:2a01:111:f403:c200::1])
 by sourceware.org (Postfix) with ESMTPS id A4A524BA23E0
 for <libc-alpha@sourceware.org>; Thu, 26 Feb 2026 14:25:13 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A4A524BA23E0
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A4A524BA23E0
Authentication-Results: server2.sourceware.org;
 arc=pass smtp.remote-ip=2a01:111:f403:c200::1
ARC-Seal: i=3; a=rsa-sha256; d=sourceware.org; s=key; t=1772115913; cv=pass;
 b=KHh4U6IPkQb1eWmFaK5OMS+EnmmXTBpwFFy2o3BpGx2zWAd2eEfP9FqzqRaAH85fkRWGftti5koujOHp/2CUbV+IkDjLyEnwI9TQMyUpSc29vtjn3CrriHFGaiW8GYywEDpM/OueAgn/6qnJgVSFWzFFttgNYXTqCLr3JPnPLec=
ARC-Message-Signature: i=3; a=rsa-sha256; d=sourceware.org; s=key;
 t=1772115913; c=relaxed/simple;
 bh=MmrhXL5a6ATa4tQDBRS+mb+YwKXkLrmNJSG0bvdQ8JA=;
 h=DKIM-Signature:DKIM-Signature:From:To:Subject:Date:Message-ID:
 MIME-Version;
 b=e+DZ5bdng0ckP20IILeVkGEmpT4jeVhnPXhzCIVuPRN7o+PafVTOy22du0t3vesRxudIgWAEaMnVmzCQ1oMBWXBr3RSZdgy9L2ABgk3Da3rQfc5CnEURyvl8PFRNsSLgdGkp6nCzaEEcEXSCtiITXkbWweidiXCx/cSu51F5eOs=
ARC-Authentication-Results: i=3; server2.sourceware.org
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A4A524BA23E0
ARC-Seal: i=2; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=pass;
 b=C2q+4/wNFOPMynnjsJhxIUUyvNg+tGecNOtgjYsHdTLdSQVAvH3QBImEE6JtP9Y8Od57kFeIz/M3ngD9THWIWLOQx3/YpxNoxOQNz+8bhaf7TW2G3LsmBabC7tjxna2sqyvbIAWXXTlOtBWOBEJj4yY3xtU0EPCondAwASTnkfQPN2zFYVDTpbjkO8uZskikTjx6TGh2liDCshAemR6i5nOnNXhtmv20/rWyUt79IZGA4TkIo3xAWslWjh7P+CjOGZWeBtSmFBHleXwz15RmIiGfm8J6FTISJgMkm7CMSaxuUJkTQ8GfDnurOJQzQ9IiMBm9B8N/z3C+iiMfONuEXg==
ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=RrJRjFc2vWhJjb+l24om2IvFyn4kJBJqBdVUsYXOjmM=;
 b=P5xWntkMHusJiivIQarLuHRXEW7xYyOvUrD6XpRQ3xHcxsJpHJN5fKdW8wjvE8iWEpe1WCZMIeUYwXu8KD4mpkfS4QO7EQEMX7zT8jH3VS7eOebYjmnL+rKFSrJXJdXD4OQeQkMA7TeIxxro2V8TF7Zb8FdZpfIA5obx2iSWIi+iclF2DDqp0wqg8tAwGl0+DY8Z9eD6/XOWJEbnGcq+nRRyoH9I0mLVtuw4fKGzqHdSIV9ZGfCnMW5xE6ImE81SPY1CaT65r3i0tIP+jn3P4mTQ/uFqUVShcdyjqE2B0piW46mQytcX3EOcCO0S90LDgwrqLuTBEhRKvbXd4hl+DA==
ARC-Authentication-Results: i=2; mx.microsoft.com 1; spf=pass (sender ip is
 4.158.2.129) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com;
 dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com;
 dkim=pass (signature was verified) header.d=arm.com; arc=pass (0 oda=1 ltdi=1
 spf=[1,1,smtp.mailfrom=arm.com] dmarc=[1,1,header.from=arm.com])
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=RrJRjFc2vWhJjb+l24om2IvFyn4kJBJqBdVUsYXOjmM=;
 b=oioKgU/YunX0m2MGekn5WoLfAlL2gAeRtb+LY1HWLJQQdytol4bw8QvXNNzr66NFTgwaVIdN1tM4Qs1av/ScO2hYk080vG1RFHBc7ff8rprU95EA3vPyOGYY5mFVdqcUvulZXUe976pOGFyE4c5t7RkqMrChoyghsy60TWNVZYg=
Received: from AS4P189CA0008.EURP189.PROD.OUTLOOK.COM (2603:10a6:20b:5d7::9)
 by AS8PR08MB8016.eurprd08.prod.outlook.com (2603:10a6:20b:549::14) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9632.23; Thu, 26 Feb
 2026 14:25:04 +0000
Received: from AMS1EPF00000045.eurprd04.prod.outlook.com
 (2603:10a6:20b:5d7:cafe::2d) by AS4P189CA0008.outlook.office365.com
 (2603:10a6:20b:5d7::9) with Microsoft SMTP Server (version=TLS1_3,
 cipher=TLS_AES_256_GCM_SHA384) id 15.20.9632.26 via Frontend Transport; Thu,
 26 Feb 2026 14:25:05 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 4.158.2.129)
 smtp.mailfrom=arm.com; dkim=pass (signature was verified)
 header.d=arm.com;dmarc=pass action=none header.from=arm.com;
Received-SPF: Pass (protection.outlook.com: domain of arm.com designates
 4.158.2.129 as permitted sender) receiver=protection.outlook.com;
 client-ip=4.158.2.129; helo=outbound-uk1.az.dlp.m.darktrace.com; pr=C
Received: from outbound-uk1.az.dlp.m.darktrace.com (4.158.2.129) by
 AMS1EPF00000045.mail.protection.outlook.com (10.167.16.42) with Microsoft
 SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9632.12
 via Frontend Transport; Thu, 26 Feb 2026 14:25:02 +0000
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=QzRGAk7Q5Ksv+Snt6bK3cKIa7geho7oZNw6FU5T5XDS3DZtijps9surpSTPGhDbAROrJZn/LFNRcjGSIo1TiwJTA9BONAXSxttFXEB9N1VjT2hxhZXA6iL8yBU4rQw4h3E5mz9XLcjsHHgP28bK1mUwvHYc0qQgpA8EMBhUk7U9d9jpVu/ykEnWKKU88bzxsIlQBg2OuMbK5EmD9nHJ4BZkQsGlVmg/qpQd6kJVPRUCpvTa8hkk+mJecAIpb3+nXTVqFjhQraTLx9MMh0nHbhXx9mCNRXyPe1bEGRPZlckrd+RtDSi/2cHUc3wYyEPlXzkdCum/IXBeT7QAvnQn3WQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=RrJRjFc2vWhJjb+l24om2IvFyn4kJBJqBdVUsYXOjmM=;
 b=MGmczmWJH+xBPmxNYKBojrWW0End141njbxE6E0uEO/G1n1SJR1sKnsCfu2a14rEowZoX2hAmPilTr1JvoqOt+lINROzYA1aE9iXliP66SDCzWDQK+EyyPi37HayMurHFsOwhfjqsGDOo+cqEvR5dnNoU2Sbk7A1GiG4MkLPQf3Hw0RwkYmltReMaJwHREnt9PZFNNdZ4mIl2T1RfL0jzP/bsZG+RLVm1ufUsKqHDE+ee4A7zO3oxqwKotwFL2ebcz5caoJi01iGQXte/1IBr2dhyBIQ/B5n+G51ekEQa7zw/WB/tnzlmOEUIrm3AqiO3K/lHeoaT7s+AFTUhD9e0g==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is
 172.205.89.229) smtp.rcpttodomain=sourceware.org smtp.mailfrom=arm.com;
 dmarc=pass (p=none sp=none pct=100) action=none header.from=arm.com;
 dkim=none (message not signed); arc=none (0)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=arm.com; s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=RrJRjFc2vWhJjb+l24om2IvFyn4kJBJqBdVUsYXOjmM=;
 b=oioKgU/YunX0m2MGekn5WoLfAlL2gAeRtb+LY1HWLJQQdytol4bw8QvXNNzr66NFTgwaVIdN1tM4Qs1av/ScO2hYk080vG1RFHBc7ff8rprU95EA3vPyOGYY5mFVdqcUvulZXUe976pOGFyE4c5t7RkqMrChoyghsy60TWNVZYg=
Received: from AM5PR0301CA0010.eurprd03.prod.outlook.com
 (2603:10a6:206:14::23) by DB9PR08MB9489.eurprd08.prod.outlook.com
 (2603:10a6:10:458::6) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9632.22; Thu, 26 Feb
 2026 14:23:52 +0000
Received: from AMS1EPF0000003F.eurprd04.prod.outlook.com
 (2603:10a6:206:14:cafe::30) by AM5PR0301CA0010.outlook.office365.com
 (2603:10a6:206:14::23) with Microsoft SMTP Server (version=TLS1_3,
 cipher=TLS_AES_256_GCM_SHA384) id 15.20.9632.24 via Frontend Transport; Thu,
 26 Feb 2026 14:23:50 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 172.205.89.229)
 smtp.mailfrom=arm.com; dkim=none (message not signed)
 header.d=none;dmarc=pass action=none header.from=arm.com;
Received-SPF: Pass (protection.outlook.com: domain of arm.com designates
 172.205.89.229 as permitted sender) receiver=protection.outlook.com;
 client-ip=172.205.89.229; helo=nebula.arm.com; pr=C
Received: from nebula.arm.com (172.205.89.229) by
 AMS1EPF0000003F.mail.protection.outlook.com (10.167.16.36) with Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.9632.12 via Frontend Transport; Thu, 26 Feb 2026 14:23:51 +0000
Received: from AZ-NEU-EXJ02.Arm.com (10.240.25.139) by AZ-NEU-EX04.Arm.com
 (10.240.25.138) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.29; Thu, 26 Feb
 2026 14:23:48 +0000
Received: from AZ-NEU-EX03.Arm.com (10.240.25.137) by AZ-NEU-EXJ02.Arm.com
 (10.240.25.139) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.29; Thu, 26 Feb
 2026 14:23:48 +0000
Received: from ip-10-249-51-4.eu-west-1.compute.internal (10.252.0.220) by
 mail.arm.com (10.240.25.137) with Microsoft SMTP Server id 15.2.2562.29 via
 Frontend Transport; Thu, 26 Feb 2026 14:23:48 +0000
From: "Cosmina.Dunca@arm.com" <cosmina.dunca@arm.com>
To: <libc-alpha@sourceware.org>
CC: "Cosmina.Dunca@arm.com" <cosmina.dunca@arm.com>
Subject: [PATCH] Vectorise inverse hyperbolic special cases
Date: Thu, 26 Feb 2026 14:23:45 +0000
Message-ID: <20260226142346.3483-1-cosmina.dunca@arm.com>
X-Mailer: git-send-email 2.43.0
MIME-Version: 1.0
X-EOPAttributedMessage: 1
X-MS-TrafficTypeDiagnostic: 
 AMS1EPF0000003F:EE_|DB9PR08MB9489:EE_|AMS1EPF00000045:EE_|AS8PR08MB8016:EE_
X-MS-Office365-Filtering-Correlation-Id: 69ffdc86-32c5-41d7-d21b-08de7542d4a2
x-checkrecipientrouted: true
NoDisclaimer: true
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam-Untrusted: BCL:0;
 ARA:13230040|376014|36860700013|1800799024|82310400026;
X-Microsoft-Antispam-Message-Info-Original: 
 Cgu6iyOmIxgYqjU0rrsjReX0zOJM1p1jUhNMXgGhyFr/0HnM11dpKT6CJsQG2scMHXtEfKBM/ECFvdMWlJS1M+Gg+PzCvJoymSGREpKadPze/0SLET19z50h1D2h+vePCzOvKoTO5xN6vZK9wZJUw31/crZO3zWKiz065iWu0nZYisY86nIgujjOOgCy/deHu1VTQZPt4lXa5+OoccyU/XagImfu/TeEIa1gXy3IeNeC3Lipv4qxYto3D/REj2LAT8UR/5Oczu6PMBstWG4ccMsUQwMMU4Dl761lIO7EtWVRagjJV3wAohkggdRvItHrS6/aWOPnFdLl7/NzF0USLCSWXHMrgUH/oDiZ2m54IKpewGUb3YlsaQJORj406VuX65GY9J4tULHHgtlbbwXbz+VwNyHZEut37lsxInEefRhHodUzn/uiVgJZl3uDEakGC+VIwJOtWI0g7SRieXZ4PPm5n+ES1PLDfJpUcreGNVFLud7wAKy66DU9yMu45KJ6mZyFJBQ2gh2BL8MnYCsl63msoWoQbvo5p78SEwaiaChquVJZwKvHaBVF/mHsn5sRRiePm+xO+N4deK31xwRwd55PLXe5/tpy6pMRx0uMJbyTJVHP5cUMFKTSJUE+2kO1BPjRyIlVBaP04CR/X8Mpfkg6RCJ3FVKPA5dpN1MdZ7qJvrmUi7519dZLi3cAxaVqjlF/jQo4IE9qC/trmjiMI6SeVHS2jIpyl+q5E4AkRpjdjV4Llk9G6E/LNI0Ddv0ON+yORHWpqNiNsFmB2siTBmVyc1sLpAQrY5LqOcZ17qkeWFoPoYIMTFVfxhKU0xrQoAMgYfMdVqYMN+tx0VqAOw==
X-Forefront-Antispam-Report-Untrusted: CIP:172.205.89.229; CTRY:IE; LANG:en;
 SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:nebula.arm.com; PTR:InfoDomainNonexistent;
 CAT:NONE; SFS:(13230040)(376014)(36860700013)(1800799024)(82310400026);
 DIR:OUT; SFP:1101;
X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB9PR08MB9489
X-MS-Exchange-Transport-CrossTenantHeadersStripped: 
 AMS1EPF00000045.eurprd04.prod.outlook.com
X-MS-PublicTrafficType: Email
X-MS-Office365-Filtering-Correlation-Id-Prvs: 
 229e0830-c8bf-4f9d-a450-08de7542aa2f
X-Microsoft-Antispam: BCL:0;
 ARA:13230040|14060799003|376014|1800799024|82310400026|35042699022|36860700013;
X-Microsoft-Antispam-Message-Info: 
 0rNsfzWXvM46adWELTy9+HCY+rznwcDnVyD3fiPt9JjP+SOL7D6rsAKBycjlxVOICz5pUBUw217B/yEGJxr1HT6UyqohiujXHruKqlAoYEhRwIICTWtpFRVcI8snmq42BABpkOzvNgl7FGlCeexpkXQgBWWCZAKzMzW9X9roQAlOf8JyrHrO54WSSYFEMhZ64X6kTZptEMjFArfClGggilgHVFjewsmu1LqbQBsU/E9koO65QlacnQiB4tpqone04T0VWt/1F+BxU8gYcV9xRmOmGDJ8p+r2ijAfkoiorVV4C4OKDZpxwqkTIoY1A1KWFM2Hw4KuvJEV6ghgLDyGuWl2pJtX84UXDfB6znDNetdDHY6q8BaULpuh/4ODml0zQltOiAap18ZUHVitS1KpItVjKxqExkb4hsAvgKlBCYYWzzC1/xenkPXCyiwO95dqj5C8YMjSoqynkK4FzMF3h7NDc8puTcbDrN3gWtXeDC9MT8i4LSE8rRxD280j7VdJZZxR3815Fidg0anAoyntoYkmV9h8bJbaB/wUqz9uD4XfpgjgL5ZZ7AOF9Fw2HM0YX0SfFFjNSWeyfATa5zGRgYpK79iHmCBbZt6Mz+UabxhdUroIGmDb6THbRlwANcrACQXQGbXT+SzCSHFDtS2R3mZJs771AbdLCNOTbj/jhvOYrtScabDyN8c9HNFhMm7sVUW9J66G80GZv20MWyvaja4iJcSkGD9NWHw1c5awI4z3iBbb3aBYh3ZFWYekCVqI8Ro6zOTJMNVXZGq4QAgNFLcc8B0Lv5KW0w9ERanYKPBXlmWs/6PXpwfcsj5CECbo2h7IzNQJBEUfa9xrdwRB3g==
X-Forefront-Antispam-Report: CIP:4.158.2.129; CTRY:GB; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:outbound-uk1.az.dlp.m.darktrace.com;
 PTR:InfoDomainNonexistent; CAT:NONE;
 SFS:(13230040)(14060799003)(376014)(1800799024)(82310400026)(35042699022)(36860700013);
 DIR:OUT; SFP:1101;
X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1
X-MS-Exchange-AntiSpam-MessageData-0: 
 qt5VJz4UFKhxABtJT5s9FDQaRKYeDRUTKW5ybZlHetNP9YUioYNiBovVz8aXkHCuZLa7e6Av/TbIHXCzXsi2SPjBY8L7GaGY9xu2CnxAOcYxFsDqPWdSfHJ+VSavCfHhLqvY6AMhv0pHwcCO4nFDfVQ2hepGLCUmz+wPoGmkN5gDU0b7IjQC1hzSt400USqVeUAG4/N8di70oiGc+Mg3XLZNotIhGT2qbTy/W75G4vBmwLkmaRnGbvgHlIJTVqTld6q4uW3gQ7YIYUTkdGlBOMzFXdn+828RotA2VcBB0ULNknARoPBMucc3KkCH9m9ClzIiLVbDtmAVJWerXtEPNbd9yscjYZW+JgnlLyxDIxSx2QfYocVAb4MNTLDm8w7S6Anqho9Hmutrk2gZQRwIRTZnWJd7nPTy2gXqPg8K1JvXPZoifegypwwV6fPOhmFa
X-OriginatorOrg: arm.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 26 Feb 2026 14:25:02.8540 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 
 69ffdc86-32c5-41d7-d21b-08de7542d4a2
X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
 TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[4.158.2.129];
 Helo=[outbound-uk1.az.dlp.m.darktrace.com]
X-MS-Exchange-CrossTenant-AuthSource: 
 AMS1EPF00000045.eurprd04.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS8PR08MB8016
X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FORGED_SPF_HELO, GIT_PATCH_0,
 RCVD_IN_DNSWL_BLOCKED, SPF_HELO_PASS, SPF_NONE, TXREP,
 URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org

Vectorise SVE and AdvSIMD special-case handling for inverse
hyperbolic functions (acosh, asinh, atanh).

General-case improvements yield an average 11% speedup, with
peak gains of up to 80%.

For benchmarking I used Neoverse V2 with GCC@15.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
---
 sysdeps/aarch64/fpu/acosh_advsimd.c  | 13 ++++-
 sysdeps/aarch64/fpu/acosh_sve.c      | 32 ++++++++++--
 sysdeps/aarch64/fpu/acoshf_advsimd.c | 73 +++++++++++++++-------------
 sysdeps/aarch64/fpu/asinh_advsimd.c  | 68 +++++++++++++++++---------
 sysdeps/aarch64/fpu/asinh_sve.c      | 48 ++++++++++++------
 sysdeps/aarch64/fpu/asinhf_advsimd.c | 63 ++++++++++++++----------
 sysdeps/aarch64/fpu/atanh_advsimd.c  | 27 +++++-----
 sysdeps/aarch64/fpu/atanh_sve.c      | 24 ++++++---
 8 files changed, 224 insertions(+), 124 deletions(-)

diff --git a/sysdeps/aarch64/fpu/acosh_advsimd.c b/sysdeps/aarch64/fpu/acosh_advsimd.c
index ac0db3ed15..b5efe9222f 100644
--- a/sysdeps/aarch64/fpu/acosh_advsimd.c
+++ b/sysdeps/aarch64/fpu/acosh_advsimd.c
@@ -34,7 +34,18 @@ static float64x2_t NOINLINE VPCS_ATTR
 special_case (float64x2_t x, float64x2_t y, uint64x2_t special,
 	      const struct v_log1p_data *d)
 {
-  return v_call_f64 (acosh, x, log1p_inline (y, d), special);
+  /* Acosh is define on [1, inf). Its formula can be re-written knowing that 1
+    becomes negligible when x is a very large number. So for special numbers,
+    where x >= 2^511, acosh ~= ln(2x). But, ln(2x) = ln(2) + ln(x) and below we
+    calculate ln(x) and then add ln(2) to the result.
+
+    Right before returning we check if x is infinity or if x is lower than 1,
+    in which case we return infinity or NaN.  */
+  float64x2_t one = v_f64 (1.0);
+  float64x2_t t = log1p_inline (vbslq_f64 (special, x, y), d);
+  t = vbslq_f64 (special, vaddq_f64 (t, v_f64 (d->ln2[0])), t);
+  t = vbslq_f64 (vceqq_f64 (x, v_f64 (INFINITY)), v_f64 (INFINITY), t);
+  return vbslq_f64 (vcltq_f64 (x, one), v_f64 (NAN), t);
 }
 
 /* Vector approximation for double-precision acosh, based on log1p.
diff --git a/sysdeps/aarch64/fpu/acosh_sve.c b/sysdeps/aarch64/fpu/acosh_sve.c
index 1f6b8fc19a..6d996e3d36 100644
--- a/sysdeps/aarch64/fpu/acosh_sve.c
+++ b/sysdeps/aarch64/fpu/acosh_sve.c
@@ -20,13 +20,33 @@
 #define WANT_SV_LOG1P_K0_SHORTCUT 1
 #include "sv_log1p_inline.h"
 
-#define One (0x3ff0000000000000)
-#define Thres (0x1ff0000000000000) /* asuint64 (0x1p511) - One.  */
+#define One (0x3ff0000000000000ULL)
+#define Thres (0x1ff0000000000000ULL)
 
+const static struct data
+{
+  double ln2;
+} data = { .ln2 = 0x1.62e42fefa39efp-1 };
+
+/* Acosh is define on [1, inf). Its formula can be re-written knowing that 1
+   becomes negligible when x is a very large number. So for special numbers,
+   where x >= 2^511, acosh ~= ln(2x). But, ln(2x) = ln(2) + ln(x) and below we
+   calculate ln(x) and then add ln(2) to the result.
+
+   Right before returning we check if x is infinity or if x is lower than 1,
+   in which case we return infinity or NaN.  */
 static svfloat64_t NOINLINE
-special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
+special_case (svfloat64_t x, svfloat64_t xm1, svfloat64_t y, svbool_t special,
+	      svbool_t pg, const struct data *d)
 {
-  return sv_call_f64 (acosh, x, y, special);
+  svfloat64_t inf = sv_f64 (INFINITY);
+  svfloat64_t nan = sv_f64 (NAN);
+  svfloat64_t log = sv_log1p_inline (svsel (special, xm1, y), pg);
+  svfloat64_t result = svadd_m (special, log, sv_f64 (d->ln2));
+  svbool_t is_inf_nan = svorr_b_z (special, svcmpge (special, x, inf),
+				   svcmplt (special, x, sv_f64 (1.0)));
+  svfloat64_t inf_or_nan = svsel (svcmpge (special, x, inf), inf, nan);
+  return svsel (is_inf_nan, inf_or_nan, result);
 }
 
 /* SVE approximation for double-precision acosh, based on log1p.
@@ -36,6 +56,8 @@ special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
 					   want 0x1.ef0cee7c33ce4p-2.  */
 svfloat64_t SV_NAME_D1 (acosh) (svfloat64_t x, const svbool_t pg)
 {
+  const struct data *d = ptr_barrier (&data);
+
   /* (ix - One) >= (BigBound - One).  */
   svuint64_t ix = svreinterpret_u64 (x);
   svbool_t special = svcmpge (pg, svsub_x (pg, ix, One), Thres);
@@ -46,6 +68,6 @@ svfloat64_t SV_NAME_D1 (acosh) (svfloat64_t x, const svbool_t pg)
 
   /* Fall back to scalar routine for special lanes.  */
   if (__glibc_unlikely (svptest_any (pg, special)))
-    return special_case (x, sv_log1p_inline (y, pg), special);
+    return special_case (x, xm1, y, special, pg, d);
   return sv_log1p_inline (y, pg);
 }
diff --git a/sysdeps/aarch64/fpu/acoshf_advsimd.c b/sysdeps/aarch64/fpu/acoshf_advsimd.c
index 6071190938..275de77e1b 100644
--- a/sysdeps/aarch64/fpu/acoshf_advsimd.c
+++ b/sysdeps/aarch64/fpu/acoshf_advsimd.c
@@ -24,15 +24,13 @@ const static struct data
   struct v_log1pf_data log1pf_consts;
   uint32x4_t one;
   uint16x4_t special_bound_u16;
-  uint32x4_t special_bound_u32;
-  float32x4_t pinf, nan;
+  float32x4_t inf, nan;
 } data = {
   .log1pf_consts = V_LOG1PF_CONSTANTS_TABLE,
   .one = V4 (0x3f800000),
-  .special_bound_u16 = V4 (0x2000),
   /* asuint(sqrt(FLT_MAX)) - asuint(1).  */
-  .special_bound_u32 = V4 (0x20000000),
-  .pinf = V4 (INFINITY),
+  .special_bound_u16 = V4 (0x2000),
+  .inf = V4 (INFINITY),
   .nan = V4 (NAN),
 };
 
@@ -51,34 +49,42 @@ inline_acoshf (float32x4_t x, const struct data *d)
 }
 
 static float32x4_t VPCS_ATTR NOINLINE
-special_case (float32x4_t x, const struct data *d)
+special_case (float32x4_t x, uint16x4_t special_u16, const struct data *d)
 {
-  uint32x4_t special = vcgeq_u32 (
-      vsubq_u32 (vreinterpretq_u32_f32 (x), d->one), d->special_bound_u32);
-
-  /* To avoid the overflow in x^2 (so the x < sqrt(FLT_MAX) constraint), we
-    reduce the input of acosh to a narrower interval by relying on the identity
-    acosh(t) = 1/2acosh(2t^2 - 1) for t>=1.
-    If we set t=sqrt((x+1)/2), since x>=1 then t>=sqrt(2/2)=1, and therefore
-    acosh(x) = 2acosh(sqrt((x+1)/2)).  */
-  float32x4_t r = vaddq_f32 (x, vreinterpretq_f32_u32 (d->one));
-  r = vmulq_f32 (r, v_f32 (0.5f));
-  r = vbslq_f32 (special, vsqrtq_f32 (r), x);
-
-  float32x4_t y = inline_acoshf (r, d);
-
-  y = vbslq_f32 (special, vmulq_f32 (y, v_f32 (2.0f)), y);
-
-  /* Check whether x is less than 1, or x is inf or nan.  */
-  uint32x4_t inf_minus_one
-      = vsubq_u32 (vreinterpretq_u32_f32 (d->pinf), d->one);
-  uint32x4_t is_infnan = vcgeq_u32 (
-      vsubq_u32 (vreinterpretq_u32_f32 (x), d->one), inf_minus_one);
-
-  y = vbslq_f32 (is_infnan, d->nan, y);
-  uint32x4_t ret_pinf = vceqq_f32 (x, d->pinf);
-  y = vbslq_f32 (ret_pinf, d->pinf, y);
-  return y;
+  float32x4_t one = v_f32 (1.0);
+  float32x4_t ln2 = d->log1pf_consts.ln2;
+
+  /* acosh(x) = ln(x + sqrt[x^2 -1]).
+    So acosh(x) = log1p (x + sqrt[x^2 - 1] - 1).  */
+  float32x4_t xm1 = vsubq_f32 (x, one);
+  float32x4_t u = vmulq_f32 (xm1, vaddq_f32 (x, one));
+
+  float32x4_t y = vaddq_f32 (xm1, vsqrtq_f32 (u));
+
+  /* special_u16 will be 0x0000ffff for true lanes, which doesn't work with bit
+     selects therefore to make it a 32 bit predicate, we have to add special
+     << 16.  */
+  uint32x4_t special = vmovl_u16 (special_u16);
+  special = vaddq_u32 (special, vshlq_n_u32 (special, 16));
+  float32x4_t xy = vbslq_f32 (special, x, y);
+
+  /* For large inputs, acosh(x) ≈ log(x) + ln(2).
+     We use log1pf-inline log implementation and add ln(2).  */
+  float32x4_t log_xy = log1pf_inline (xy, &d->log1pf_consts);
+
+  /* For acoshf there are three special cases that need considering. Infinity
+     and NaNs, which are also returned unchanged and for cases of x < 1 we'll
+     return all NaNs since acoshf is defined on (1, inf).  */
+  uint32x4_t is_lower_one = vcltq_f32 (x, one);
+  uint32x4_t is_finite = vcltq_f32 (x, d->inf);
+
+  /* This dependency chain of selects can run in parallel with y and log_x
+     being calculated before the last addition.  */
+  float32x4_t ln2_inf_nan = vbslq_f32 (is_finite, ln2, x);
+  ln2_inf_nan = vbslq_f32 (is_lower_one, d->nan, ln2_inf_nan);
+  float32x4_t ln2_inf_nan_zero = vbslq_f32 (special, ln2_inf_nan, v_f32 (0));
+
+  return vaddq_f32 (log_xy, ln2_inf_nan_zero);
 }
 
 /* Vector approximation for single-precision acosh, based on log1p.
@@ -97,8 +103,9 @@ float32x4_t VPCS_ATTR NOINLINE V_NAME_F1 (acosh) (float32x4_t x)
       = vcge_u16 (vsubhn_u32 (ix, d->one), d->special_bound_u16);
 
   if (__glibc_unlikely (v_any_u16h (special)))
-    return special_case (x, d);
+    return special_case (x, special, d);
   return inline_acoshf (x, d);
 }
+
 libmvec_hidden_def (V_NAME_F1 (acosh))
 HALF_WIDTH_ALIAS_F1 (acosh)
diff --git a/sysdeps/aarch64/fpu/asinh_advsimd.c b/sysdeps/aarch64/fpu/asinh_advsimd.c
index 51689179ff..650ed46eec 100644
--- a/sysdeps/aarch64/fpu/asinh_advsimd.c
+++ b/sysdeps/aarch64/fpu/asinh_advsimd.c
@@ -27,36 +27,43 @@ const static struct data
   double lc1, lc3, ln2, lc4;
   float64x2_t c0, c2, c4, c6, c8, c10, c12, c14, c16, c17;
   double c1, c3, c5, c7, c9, c11, c13, c15;
+  float64x2_t inf;
 } data = {
 
   /* Even terms of polynomial s.t. asinh(x) is approximated by
      asinh(x) ~= x + x^3 * (C0 + C1 * x + C2 * x^2 + C3 * x^3 + ...).
      Generated using Remez, f = (asinh(sqrt(x)) - sqrt(x))/x^(3/2).  */
-  .c0 = V2 (-0x1.55555555554a7p-3),    .c1 = 0x1.3333333326c7p-4,
-  .c2 = V2 (-0x1.6db6db68332e6p-5),    .c3 = 0x1.f1c71b26fb40dp-6,
-  .c4 = V2 (-0x1.6e8b8b654a621p-6),    .c5 = 0x1.1c4daa9e67871p-6,
-  .c6 = V2 (-0x1.c9871d10885afp-7),    .c7 = 0x1.7a16e8d9d2ecfp-7,
-  .c8 = V2 (-0x1.3ddca533e9f54p-7),    .c9 = 0x1.0becef748dafcp-7,
-  .c10 = V2 (-0x1.b90c7099dd397p-8),   .c11 = 0x1.541f2bb1ffe51p-8,
-  .c12 = V2 (-0x1.d217026a669ecp-9),   .c13 = 0x1.0b5c7977aaf7p-9,
-  .c14 = V2 (-0x1.e0f37daef9127p-11),  .c15 = 0x1.388b5fe542a6p-12,
-  .c16 = V2 (-0x1.021a48685e287p-14),  .c17 = V2 (0x1.93d4ba83d34dap-18),
-  .lc0 = V2 (-0x1.ffffffffffff7p-2),   .lc1 = 0x1.55555555170d4p-2,
-  .lc2 = V2 (-0x1.0000000399c27p-2),   .lc3 = 0x1.999b2e90e94cap-3,
-  .lc4 = -0x1.554e550bd501ep-3,	       .ln2 = 0x1.62e42fefa39efp-1,
-  .off = V2 (0x3fe6900900000000),      .huge_bound = V2 (0x5fe0000000000000),
-  .abs_mask = V2 (0x7fffffffffffffff), .mask = V2 (0xfffULL << 52),
+  .c0 = V2 (-0x1.55555555554a7p-3),
+  .c1 = 0x1.3333333326c7p-4,
+  .c2 = V2 (-0x1.6db6db68332e6p-5),
+  .c3 = 0x1.f1c71b26fb40dp-6,
+  .c4 = V2 (-0x1.6e8b8b654a621p-6),
+  .c5 = 0x1.1c4daa9e67871p-6,
+  .c6 = V2 (-0x1.c9871d10885afp-7),
+  .c7 = 0x1.7a16e8d9d2ecfp-7,
+  .c8 = V2 (-0x1.3ddca533e9f54p-7),
+  .c9 = 0x1.0becef748dafcp-7,
+  .c10 = V2 (-0x1.b90c7099dd397p-8),
+  .c11 = 0x1.541f2bb1ffe51p-8,
+  .c12 = V2 (-0x1.d217026a669ecp-9),
+  .c13 = 0x1.0b5c7977aaf7p-9,
+  .c14 = V2 (-0x1.e0f37daef9127p-11),
+  .c15 = 0x1.388b5fe542a6p-12,
+  .c16 = V2 (-0x1.021a48685e287p-14),
+  .c17 = V2 (0x1.93d4ba83d34dap-18),
+  .lc0 = V2 (-0x1.ffffffffffff7p-2),
+  .lc1 = 0x1.55555555170d4p-2,
+  .lc2 = V2 (-0x1.0000000399c27p-2),
+  .lc3 = 0x1.999b2e90e94cap-3,
+  .lc4 = -0x1.554e550bd501ep-3,
+  .ln2 = 0x1.62e42fefa39efp-1,
+  .off = V2 (0x3fe6900900000000),
+  .huge_bound = V2 (0x5fe0000000000000),
+  .abs_mask = V2 (0x7fffffffffffffff),
+  .mask = V2 (0xfffULL << 52),
+  .inf = V2 (INFINITY)
 };
 
-static float64x2_t NOINLINE VPCS_ATTR
-special_case (float64x2_t x, float64x2_t y, uint64x2_t abs_mask,
-	      uint64x2_t special)
-{
-  /* Copy sign.  */
-  y = vbslq_f64 (abs_mask, y, x);
-  return v_call_f64 (asinh, x, y, special);
-}
-
 #define N (1 << V_LOG_TABLE_BITS)
 #define IndexMask (N - 1)
 
@@ -110,6 +117,19 @@ log_inline (float64x2_t ax, const struct data *d)
   return vfmaq_f64 (hi, y, r2);
 }
 
+static float64x2_t NOINLINE VPCS_ATTR
+special_case (float64x2_t x, uint64x2_t special, float64x2_t y, float64x2_t ax,
+	      const struct data *d)
+{
+  /* For very large inputs (x > 2^511), asinh(x) ≈ ln(2x).
+    In this range the +sqrt(x^2+1) term is negligible, so we compute
+    asinh(x) as ln(x) + ln(2).  */
+  float64x2_t res_asinh = vaddq_f64 (log_inline (ax, d), v_f64 (d->ln2));
+  res_asinh = vbslq_f64 (vceqq_f64 (ax, d->inf), d->inf, res_asinh);
+  res_asinh = vbslq_f64 (special, res_asinh, y);
+  return vbslq_f64 (d->abs_mask, res_asinh, x);
+}
+
 /* Double-precision implementation of vector asinh(x).
    asinh is very sensitive around 1, so it is impractical to devise a single
    low-cost algorithm which is sufficiently accurate on a wide range of input.
@@ -179,7 +199,7 @@ VPCS_ATTR float64x2_t V_NAME_D1 (asinh) (float64x2_t x)
   /* Choose the right option for each lane.  */
   float64x2_t y = vbslq_f64 (gt1, option_1, option_2);
   if (__glibc_unlikely (v_any_u64 (special)))
-    return special_case (x, y, d->abs_mask, special);
+    return special_case (x, special, y, ax, d);
 
   /* Copy sign.  */
   return vbslq_f64 (d->abs_mask, y, x);
diff --git a/sysdeps/aarch64/fpu/asinh_sve.c b/sysdeps/aarch64/fpu/asinh_sve.c
index bf9dad9e4b..cb32b60c3d 100644
--- a/sysdeps/aarch64/fpu/asinh_sve.c
+++ b/sysdeps/aarch64/fpu/asinh_sve.c
@@ -29,7 +29,7 @@ static const struct data
   double even_coeffs[9];
   double ln2, p3, p1, p4, p0, p2, c1, c3, c5, c7, c9, c11, c13, c15, c17;
   uint64_t off, mask;
-
+  double inf;
 } data = {
    /* Polynomial generated using Remez on [2^-26, 1].  */
   .even_coeffs ={
@@ -61,14 +61,9 @@ static const struct data
   .p4 = -0x1.554e550bd501ep-3,
   .off = 0x3fe6900900000000,
   .mask = 0xfffULL << 52,
+  .inf = INFINITY
 };
 
-static svfloat64_t NOINLINE
-special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
-{
-  return sv_call_f64 (asinh, x, y, special);
-}
-
 static inline svfloat64_t
 __sv_log_inline (svfloat64_t x, const struct data *d, const svbool_t pg)
 {
@@ -104,12 +99,38 @@ __sv_log_inline (svfloat64_t x, const struct data *d, const svbool_t pg)
   return y;
 }
 
+static svfloat64_t NOINLINE
+special_case (svfloat64_t ax, svfloat64_t y, svuint64_t sign, svbool_t special,
+	      svbool_t pg, const struct data *d)
+{
+  /* For very large inputs (x > 2^511), asinh(x) ≈ ln(2x).
+     In this range the +sqrt(x^2+1) term is negligible, so we compute
+     asinh(x) as ln(x) + ln(2) later in this function.  */
+  svfloat64_t log_ax = __sv_log_inline (ax, d, special);
+
+  /* The only special cases that need considering are infinity and NaNs since
+     0 will be handled by other calculations.  */
+  svfloat64_t inf = sv_f64 (d->inf);
+  svbool_t is_inf = svcmpeq (special, ax, inf);
+  svbool_t is_nan = svcmpne (special, ax, ax);
+  svfloat64_t inf_ln2 = svsel (is_inf, inf, sv_f64 (d->ln2));
+  svfloat64_t inf_nan_ln2 = svsel (is_nan, sv_f64 (NAN), inf_ln2);
+  svfloat64_t asinh_x_res = svadd_x (special, log_ax, inf_nan_ln2);
+
+  /* Now select (based on special) between x and y to change the type and,
+     return either the positive or negative value, considering the input and
+     its sign.  */
+  svfloat64_t result = svsel (special, asinh_x_res, y);
+  svuint64_t result_uint = svreinterpret_u64 (result);
+  return svreinterpret_f64 (sveor_m (pg, result_uint, sign));
+}
+
 /* Double-precision implementation of SVE asinh(x).
    asinh is very sensitive around 1, so it is impractical to devise a single
    low-cost algorithm which is sufficiently accurate on a wide range of input.
    Instead we use two different algorithms:
    asinh(x) = sign(x) * log(|x| + sqrt(x^2 + 1)      if |x| >= 1
-	    = sign(x) * (|x| + |x|^3 * P(x^2))       otherwise
+      = sign(x) * (|x| + |x|^3 * P(x^2))       otherwise
    where log(x) is an optimized log approximation, and P(x) is a polynomial
    shared with the scalar routine. The greatest observed error 2.51 ULP, in
    |x| >= 1:
@@ -180,14 +201,11 @@ svfloat64_t SV_NAME_D1 (asinh) (svfloat64_t x, const svbool_t pg)
       option_2 = svmla_x (pg, ax, p, svmul_x (svptrue_b64 (), x2, ax));
     }
 
-  if (__glibc_unlikely (svptest_any (pg, special)))
-    return special_case (
-	x,
-	svreinterpret_f64 (sveor_x (
-	    pg, svreinterpret_u64 (svsel (ge1, option_1, option_2)), sign)),
-	special);
-
   /* Choose the right option for each lane.  */
   svfloat64_t y = svsel (ge1, option_1, option_2);
+
+  if (__glibc_unlikely (svptest_any (pg, special)))
+    return special_case (ax, y, sign, special, pg, d);
+
   return svreinterpret_f64 (sveor_x (pg, svreinterpret_u64 (y), sign));
 }
diff --git a/sysdeps/aarch64/fpu/asinhf_advsimd.c b/sysdeps/aarch64/fpu/asinhf_advsimd.c
index f1b1dd7a46..3454e465cb 100644
--- a/sysdeps/aarch64/fpu/asinhf_advsimd.c
+++ b/sysdeps/aarch64/fpu/asinhf_advsimd.c
@@ -25,12 +25,12 @@ const static struct data
   struct v_log1pf_data log1pf_consts;
   float32x4_t one;
   uint32x4_t square_lim;
-  float32x4_t pinf, nan;
+  float32x4_t inf, nan;
 } data = {
   .one = V4 (1),
   .log1pf_consts = V_LOG1PF_CONSTANTS_TABLE,
   .square_lim = V4 (0x5f800000), /* asuint(sqrt(FLT_MAX)).  */
-  .pinf = V4 (INFINITY),
+  .inf = V4 (INFINITY),
   .nan = V4 (NAN),
 };
 
@@ -51,31 +51,39 @@ static float32x4_t VPCS_ATTR NOINLINE
 special_case (float32x4_t ax, uint32x4_t sign, uint32x4_t special,
 	      const struct data *d)
 {
-  /* To avoid overflow in x^2 (so the x < sqrt(FLT_MAX) constraint), we
-    reduce the input of asinh to a narrower interval by relying on the
-    identity: 2asinh(t) = +-acosh(2t^2 + 1)
-    If we set t=sqrt((x-1)/2), then
-    2asinh(sqrt((x-1)/2)) = acosh(x).
-    Found that, for a high input x, asinh(x) very closely approximates
-    acosh(x), so implemented it with this function instead.  */
-  float32x4_t r = vsubq_f32 (ax, d->one);
-  r = vmulq_f32 (r, v_f32 (0.5f));
-  r = vbslq_f32 (special, vsqrtq_f32 (r), ax);
-
-  float32x4_t y = inline_asinhf (r, sign, d);
-
-  y = vbslq_f32 (special, vmulq_f32 (y, v_f32 (2.0f)), y);
-
-  /* Check whether x is inf or nan.  */
-  uint32x4_t ret_inf = vceqq_f32 (ax, d->pinf);
-  uint32x4_t ret_nan = vmvnq_u32 (vcleq_f32 (ax, d->pinf));
-  y = vbslq_f32 (ret_inf, d->pinf, y);
-  y = vbslq_f32 (ret_nan, d->nan, y);
-  /* Put sign back in for minf, as it doesn't happen in log1pf_inline call.  */
-  y = vbslq_f32 (
-      ret_inf,
-      vreinterpretq_f32_u32 (veorq_u32 (vreinterpretq_u32_f32 (y), sign)), y);
-  return y;
+  float32x4_t t
+      = vaddq_f32 (v_f32 (1.0f), vsqrtq_f32 (vfmaq_f32 (d->one, ax, ax)));
+  float32x4_t y = vaddq_f32 (ax, vdivq_f32 (vmulq_f32 (ax, ax), t));
+
+  /* For large inputs (x > 2^64), asinh(x) ≈ ln(2x).
+     1 becomes negligible in sqrt(x^2+1), so we compute
+     asinh(x) as ln(x) + ln(2).  */
+  float32x4_t xy = vbslq_f32 (special, ax, y);
+  float32x4_t log_xy = log1pf_inline (xy, &d->log1pf_consts);
+
+  /* Infinity and NaNs are the only other special cases that need checking
+     before we return the values. 0 is handled by inline_asinhf as it returns
+     0. Below we implememt the logic that returns infinity when infinity is
+     passed and NaNs when NaNs are passed. Sometimes due to intrinsics like
+     vdivq_f32, infinity can change into NaNs, so we want to make sure the
+     right result is returned.
+
+     Since these steps run in parallel with the log, we select between the
+     results we'll want to add to log_x in time, since addition with infinity
+     and NaNs doesn't make a difference. When we get to the adition step,
+     everything is already in the right place.  */
+  float32x4_t ln2 = d->log1pf_consts.ln2;
+  uint32x4_t is_finite = vcltq_f32 (ax, d->inf);
+  float32x4_t ln2_inf_nan = vbslq_f32 (is_finite, ln2, ax);
+
+  /* Before returning the result, the right sign will be assinged to the
+     absolute result. This is because we pass an absoulte x to the function.
+   */
+
+  float32x4_t asinhf
+      = vbslq_f32 (special, vaddq_f32 (log_xy, ln2_inf_nan), log_xy);
+  return vreinterpretq_f32_u32 (
+      veorq_u32 (sign, vreinterpretq_u32_f32 (asinhf)));
 }
 
 /* Single-precision implementation of vector asinh(x), using vector log1p.
@@ -99,5 +107,6 @@ float32x4_t VPCS_ATTR NOINLINE V_NAME_F1 (asinh) (float32x4_t x)
     return special_case (ax, sign, special, d);
   return inline_asinhf (ax, sign, d);
 }
+
 libmvec_hidden_def (V_NAME_F1 (asinh))
 HALF_WIDTH_ALIAS_F1 (asinh)
diff --git a/sysdeps/aarch64/fpu/atanh_advsimd.c b/sysdeps/aarch64/fpu/atanh_advsimd.c
index 711a234b41..a3798efc34 100644
--- a/sysdeps/aarch64/fpu/atanh_advsimd.c
+++ b/sysdeps/aarch64/fpu/atanh_advsimd.c
@@ -23,38 +23,43 @@
 const static struct data
 {
   struct v_log1p_data log1p_consts;
-  uint64x2_t one;
+  float64x2_t one;
   uint64x2_t sign_mask;
 } data = { .log1p_consts = V_LOG1P_CONSTANTS_TABLE,
-	   .one = V2 (0x3ff0000000000000),
+	   .one = V2 (1.0),
 	   .sign_mask = V2 (0x8000000000000000) };
 
 static float64x2_t VPCS_ATTR NOINLINE
-special_case (float64x2_t x, float64x2_t halfsign, float64x2_t y,
+special_case (float64x2_t ax, float64x2_t halfsign, float64x2_t y,
 	      uint64x2_t special, const struct data *d)
 {
   y = log1p_inline (y, &d->log1p_consts);
-  return v_call_f64 (atanh, vbslq_f64 (d->sign_mask, halfsign, x),
-		     vmulq_f64 (halfsign, y), special);
+  /* Copy the sign bit from the input to inf.  */
+  uint64x2_t is_one = vceqq_f64 (ax, d->one);
+  float64x2_t signed_inf
+      = vbslq_f64 (d->sign_mask, halfsign, v_f64 (INFINITY));
+  /* Here we check for all the rest of the cases where |x| > 1 will return a
+     NaN, including if x = NaN.  */
+  float64x2_t res = vbslq_f64 (special, v_f64 (NAN), vmulq_f64 (halfsign, y));
+
+  return vbslq_f64 (is_one, signed_inf, res);
 }
 
 /* Approximation for vector double-precision atanh(x) using modified log1p.
-   The greatest observed error is 3.31 ULP:
+   The greatest observed error is 2.81 + 0.5 ULP:
    _ZGVnN2v_atanh(0x1.ffae6288b601p-6) got 0x1.ffd8ff31b5019p-6
 				      want 0x1.ffd8ff31b501cp-6.  */
-VPCS_ATTR
-float64x2_t V_NAME_D1 (atanh) (float64x2_t x)
+VPCS_ATTR float64x2_t V_NAME_D1 (atanh) (float64x2_t x)
 {
   const struct data *d = ptr_barrier (&data);
 
   float64x2_t halfsign = vbslq_f64 (d->sign_mask, x, v_f64 (0.5));
   float64x2_t ax = vabsq_f64 (x);
-  uint64x2_t ia = vreinterpretq_u64_f64 (ax);
-  uint64x2_t special = vcgeq_u64 (ia, d->one);
+  uint64x2_t special = vcageq_f64 (x, d->one);
 
   float64x2_t y;
   y = vaddq_f64 (ax, ax);
-  y = vdivq_f64 (y, vsubq_f64 (vreinterpretq_f64_u64 (d->one), ax));
+  y = vdivq_f64 (y, vsubq_f64 (d->one, ax));
 
   if (__glibc_unlikely (v_any_u64 (special)))
     return special_case (ax, halfsign, y, special, d);
diff --git a/sysdeps/aarch64/fpu/atanh_sve.c b/sysdeps/aarch64/fpu/atanh_sve.c
index fddf4c8706..558984974d 100644
--- a/sysdeps/aarch64/fpu/atanh_sve.c
+++ b/sysdeps/aarch64/fpu/atanh_sve.c
@@ -20,13 +20,20 @@
 #define WANT_SV_LOG1P_K0_SHORTCUT 0
 #include "sv_log1p_inline.h"
 
-#define One (0x3ff0000000000000)
-#define Half (0x3fe0000000000000)
+static const struct data
+{
+  uint64_t half;
+  double inf;
+  double nan;
+} data = { .half = 0x3fe0000000000000, .inf = INFINITY, .nan = NAN };
 
 static svfloat64_t NOINLINE
-special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
+special_case (svfloat64_t ax, svfloat64_t y, svbool_t pg, svbool_t special,
+	      svfloat64_t halfsign, const struct data *d)
 {
-  return sv_call_f64 (atanh, x, y, special);
+  svfloat64_t res = svsel (special, sv_f64 (d->nan), y);
+  res = svsel (svcmpeq (special, ax, sv_f64 (1.0)), sv_f64 (d->inf), res);
+  return svmul_x (pg, res, halfsign);
 }
 
 /* SVE approximation for double-precision atanh, based on log1p.
@@ -35,24 +42,25 @@ special_case (svfloat64_t x, svfloat64_t y, svbool_t special)
 				      want 0x1.ffd8ff31b501cp-6.  */
 svfloat64_t SV_NAME_D1 (atanh) (svfloat64_t x, const svbool_t pg)
 {
+  const struct data *d = ptr_barrier (&data);
 
   svfloat64_t ax = svabs_x (pg, x);
   svuint64_t iax = svreinterpret_u64 (ax);
   svuint64_t sign = sveor_x (pg, svreinterpret_u64 (x), iax);
-  svfloat64_t halfsign = svreinterpret_f64 (svorr_x (pg, sign, Half));
+  svfloat64_t halfsign = svreinterpret_f64 (svorr_x (pg, sign, d->half));
 
   /* It is special if iax >= 1.  */
-  svbool_t special = svacge (pg, x, 1.0);
+  svbool_t special = svacge (pg, ax, 1.0);
 
   /* Computation is performed based on the following sequence of equality:
 	(1+x)/(1-x) = 1 + 2x/(1-x).  */
   svfloat64_t y;
   y = svadd_x (pg, ax, ax);
-  y = svdiv_x (pg, y, svsub_x (pg, sv_f64 (1), ax));
+  y = svdiv_x (pg, y, svsub_x (pg, sv_f64 (1.0), ax));
   /* ln((1+x)/(1-x)) = ln(1+2x/(1-x)) = ln(1 + y).  */
   y = sv_log1p_inline (y, pg);
 
   if (__glibc_unlikely (svptest_any (pg, special)))
-    return special_case (x, svmul_x (pg, halfsign, y), special);
+    return special_case (ax, y, pg, special, halfsign, d);
   return svmul_x (pg, halfsign, y);
 }