From: Wilco Dijkstra
To: "weihong_ye@foxmail.com"
CC: GNU C Library
Subject: [PATCH] aarch64: Optimize memcpy for Kunpeng 950 using SVE and loop unrolling
Date: Mon, 29 Dec 2025 15:00:19 +0000
X-Patchwork-Submitter: Wilco Dijkstra
X-Patchwork-Id: 127184

Hi Weihong,

So overall the results look good. I've pointed out a few minor issues below.
However the main issue is that it won't work with different vector lengths
and thus fails the benchmarks and testsuite on other machines, so that will
need to be fixed.

It's clearly better in the random and large benchmarks - the main loop is
fastest from 65536 bytes. However memcpy_sve and memcpy_a64fx are typically
faster from 64 bytes. Since the sizes 64-1024 are important, it is worth
checking whether you can further improve that range. Perhaps a lower
alignment, less unrolling or aligning the destination (which is what
memcpy_a64fx does) works out better for medium sizes. Note this could be done
in later patches if needed.
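
To make the vector length issue concrete, here is a rough scalar C model of
the small-copy path in the patch (sizes up to 192). It is only a sketch under
stated assumptions: model_small_copy, model_is_correct, vl and step are names
made up for illustration, and each predicated ld1b/st1b pair is emulated with
a byte loop. The patch corresponds to step = 64; it only behaves like memcpy
when vl is 32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scalar model of the predicated small-copy path (hypothetical helper, not
   glibc code).  VL is the SVE vector length in bytes, STEP is the amount
   subtracted from the count after each pair of vectors.  Returns the highest
   destination offset written plus one, so callers can detect overruns.  */
static size_t
model_small_copy (unsigned char *dst, const unsigned char *src, size_t cnt,
                  size_t vl, size_t step)
{
  assert (vl > 0 && step > 0);
  size_t rem = cnt;
  size_t high = 0;

  for (size_t block = 0; block < 3; block++)
    {
      /* Two vectors per block, predicated like
           whilelo p.b, xzr, rem  and  whilelo p.b, vlen, rem
         so lane i of vector v is active iff v * vl + i < rem.  */
      for (size_t v = 0; v < 2; v++)
        for (size_t i = 0; i < vl; i++)
          if (v * vl + i < rem)
            {
              /* Address form [ptr, N, mul vl] with N = 2 * block + v.  */
              size_t off = (2 * block + v) * vl + i;
              dst[off] = src[off];
              if (off + 1 > high)
                high = off + 1;
            }

      /* subs rem, rem, step; b.hi next_block; ret  */
      if (rem <= step)
        break;
      rem -= step;
    }
  return high;
}

/* Returns 1 if the model copied exactly CNT bytes and wrote nothing else.  */
static int
model_is_correct (size_t cnt, size_t vl, size_t step)
{
  unsigned char src[2048], dst[2048];
  assert (vl <= 256 && cnt <= sizeof src);
  for (size_t i = 0; i < sizeof src; i++)
    src[i] = (unsigned char) (i % 255 + 1);   /* Never zero.  */
  memset (dst, 0, sizeof dst);
  size_t high = model_small_copy (dst, src, cnt, vl, step);
  return high <= cnt && memcmp (dst, src, cnt) == 0;
}
```

In this model, model_is_correct (50, 16, 64) is 0 because bytes 32-49 are
never copied with 16-byte vectors, and model_is_correct (100, 64, 64) is 0
because the second block writes past the end with 64-byte vectors. Using
step = 2 * vl fixes both of those cases, but model_is_correct (192, 16, 32)
is still 0: three blocks of two 16-byte vectors cover only 96 bytes, so the
192-byte threshold would also have to scale with the vector length.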
diff --git a/sysdeps/aarch64/cpu-features.h b/sysdeps/aarch64/cpu-features.h
index 855990b575..8330c13884 100644
--- a/sysdeps/aarch64/cpu-features.h
+++ b/sysdeps/aarch64/cpu-features.h
@@ -45,6 +45,9 @@
 #define IS_KUNPENG920(midr) (MIDR_IMPLEMENTOR(midr) == 'H' \
                         && MIDR_PARTNUM(midr) == 0xd01)
+
+#define IS_KUNPENG950(midr) (MIDR_IMPLEMENTOR(midr) == 'H' \
+                        && MIDR_PARTNUM(midr) == 0xd06)

 #define IS_A64FX(midr) (MIDR_IMPLEMENTOR(midr) == 'F' \
                         && MIDR_PARTNUM(midr) == 0x001)
diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile
index 1c3c392513..96ad9828d2 100644
--- a/sysdeps/aarch64/multiarch/Makefile
+++ b/sysdeps/aarch64/multiarch/Makefile
@@ -3,6 +3,7 @@ sysdep_routines += \
   memchr_generic \
   memchr_nosimd \
   memcpy_a64fx \
+  memcpy_kunpeng950 \

As you already noted, the entries need to be alphabetically sorted.

   memcpy_generic \
   memcpy_mops \
   memcpy_oryon1 \
diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
index 0e26171929..a06c8882e4 100644
--- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c
@@ -38,6 +38,7 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_oryon1)
               IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_a64fx)
               IFUNC_IMPL_ADD (array, i, memcpy, sve, __memcpy_sve)
+              IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_kunpeng950)

This needs to use "sve". Currently it will fail benchmarks and the testsuite
on any machine that doesn't have SVE or the same SVE vector length... Also
note indentation.
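
A toy C model of why the fourth argument matters, assuming (as the existing
entries suggest) that it is the condition under which tests and benchmarks
treat an implementation as runnable. The types and helpers below
(struct impl_entry, build_list, count_unsafe) are invented for illustration
and are not the glibc definitions. Note that this only covers the "no SVE"
case; gating on sve does not by itself address the vector length dependence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an ifunc implementation list entry
   (hypothetical type, not the glibc definition).  */
struct impl_entry
{
  const char *name;
  bool needs_sve;   /* The routine executes SVE instructions.  */
  bool usable;      /* What the list reports to tests and benchmarks.  */
};

/* Fill LIST for a machine where SVE is available iff HAS_SVE.  If
   GATE_ON_SVE is false, the Kunpeng 950 entry is marked usable
   unconditionally, as in the patch.  Returns the number of entries.  */
static size_t
build_list (struct impl_entry *list, bool has_sve, bool gate_on_sve)
{
  size_t n = 0;
  list[n++] = (struct impl_entry) { "__memcpy_sve", true, has_sve };
  list[n++] = (struct impl_entry) { "__memcpy_kunpeng950", true,
                                    gate_on_sve ? has_sve : true };
  list[n++] = (struct impl_entry) { "__memcpy_generic", false, true };
  return n;
}

/* Count entries a harness would run although the machine cannot
   execute them.  */
static size_t
count_unsafe (const struct impl_entry *list, size_t n, bool has_sve)
{
  assert (list != NULL);
  size_t bad = 0;
  for (size_t i = 0; i < n; i++)
    if (list[i].usable && list[i].needs_sve && !has_sve)
      bad++;
  return bad;
}
```

With has_sve false, the ungated list yields one unsafe entry (the new
routine) and the gated list yields none.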
               IFUNC_IMPL_ADD (array, i, memcpy, mops, __memcpy_mops)
               IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic))
   IFUNC_IMPL (i, name, memmove,
diff --git a/sysdeps/aarch64/multiarch/memcpy.c b/sysdeps/aarch64/multiarch/memcpy.c
index 894dabe2ef..d649f22c1c 100644
--- a/sysdeps/aarch64/multiarch/memcpy.c
+++ b/sysdeps/aarch64/multiarch/memcpy.c
@@ -34,12 +34,16 @@ extern __typeof (__redirect_memcpy) __memcpy_a64fx attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_sve attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_mops attribute_hidden;
 extern __typeof (__redirect_memcpy) __memcpy_oryon1 attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_kunpeng950 attribute_hidden;

 static inline __typeof (__redirect_memcpy) *
 select_memcpy_ifunc (void)
 {
   INIT_ARCH ();

+  if (IS_KUNPENG950 (midr))
+    return __memcpy_kunpeng950;
+

This should be inside the if (sve) block. You cannot assume the OS always
enables SVE.

diff --git a/sysdeps/aarch64/multiarch/memcpy_kunpeng950.S b/sysdeps/aarch64/multiarch/memcpy_kunpeng950.S
new file mode 100644
index 0000000000..cb03a1f762
--- /dev/null
+++ b/sysdeps/aarch64/multiarch/memcpy_kunpeng950.S
@@ -0,0 +1,137 @@
+/* Optimized glibc function for Huawei Kupeng 950 processor.
+   Copyright (C) 2012-2022 Free Software Foundation, Inc.

Surely 2025?

+   Copyright (c) 2025 Huawei Technologies Co., Ltd.

Does Huawei have a copyright assignment with FSF? If so, you only need the
FSF Copyright here. If not, and it is as an individual Signed-off-by, then I
think you just mention "Copyright The GNU Toolchain Authors." here.

+ENTRY (__memcpy_kunpeng950)
+        cmp     cnt, 192
+        b.hi    L(192_more)
+        cntb    vlen
+
+L(less_64):
+        whilelo p0.b, xzr, cnt
+        whilelo p1.b, vlen, cnt
+        ld1b    z0.b, p0/z, [src, 0, mul vl]
+        ld1b    z1.b, p1/z, [src, 1, mul vl]
+        st1b    z0.b, p0, [dst_in, 0, mul vl]
+        st1b    z1.b, p1, [dst_in, 1, mul vl]
+        subs    cnt, cnt, 64

This makes the code dependent on vector length...
You can do "subs cnt, cnt, vlen, lsl 1" here (which should not be slower),
but it will need more changes to become vector length agnostic.

+        b.hi    L(64_more)
+        ret
+
+L(64_more):
+        whilelo p2.b, xzr, cnt
+        whilelo p3.b, vlen, cnt
+        ld1b    z2.b, p2/z, [src, 2, mul vl]
+        ld1b    z3.b, p3/z, [src, 3, mul vl]
+        st1b    z2.b, p2, [dst_in, 2, mul vl]
+        st1b    z3.b, p3, [dst_in, 3, mul vl]
+        subs    cnt, cnt, 64
+        b.hi    L(128_more)
+        ret
+
+L(128_more):
+        whilelo p4.b, xzr, cnt
+        whilelo p5.b, vlen, cnt
+        ld1b    z4.b, p4/z, [src, 4, mul vl]
+        ld1b    z5.b, p5/z, [src, 5, mul vl]
+        st1b    z4.b, p4, [dst_in, 4, mul vl]
+        st1b    z5.b, p5, [dst_in, 5, mul vl]
+        ret

Is it worth aligning so the main loop ends up 16-byte aligned without nops?

+L(192_more):
+        ldp     E_q, F_q, [src]
+        ldp     G_q, H_q, [src, 32]
+        add     src_end, src, cnt
+        add     dst_end, dst_in, cnt
+        mov     dst, dst_in
+        and     tmp1, src, 63
+        cbz     tmp1, L(already_align_64)

This adds 2 instructions to skip 3 - and since it's more likely to not be
64-byte aligned, you usually execute 5 instructions rather than 3... Also is
there a real benefit to 64-byte alignment? LDP might prefer 32-byte alignment
rather than 16, but 64? Also it is worth checking whether aligning STP instead
works out better based on the benchmark results.
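
The three skipped instructions are no-ops when the source is already aligned,
which is why the branch (and, presumably, the "mov dst, dst_in" it requires)
can be dropped. A minimal C sketch of that arithmetic follows; align_source
and struct loop_start are made-up names, and the alignment is a parameter
only so that 32 and 64 can be compared in the same model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values the copy loop starts from (hypothetical struct for illustration).  */
struct loop_start
{
  uintptr_t src;   /* Source address rounded down to the alignment.  */
  uintptr_t dst;   /* Destination moved back by the same amount.  */
  size_t cnt;      /* Count grown by the same amount.  */
};

/* Branch-free form of the alignment prologue:
     and tmp1, src, ALIGN - 1
     bic src, src, ALIGN - 1
     sub dst, dst_in, tmp1
     add cnt, cnt, tmp1
   ALIGN must be a power of two.  */
static struct loop_start
align_source (uintptr_t src, uintptr_t dst_in, size_t cnt, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  uintptr_t tmp1 = src & (align - 1);
  struct loop_start s;
  s.src = src & ~(align - 1);
  s.dst = dst_in - tmp1;
  s.cnt = cnt + tmp1;
  return s;
}
```

For an already aligned source tmp1 is 0, so src, dst and cnt come out
unchanged without any branch; for an unaligned source, src + cnt still points
at the original end of the buffer.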
+        bic     src, src, 63
+        sub     dst, dst_in, tmp1
+        add     cnt, cnt, tmp1
+L(already_align_64):
+        ldp     A_q, B_q, [src, 64]
+        ldp     C_q, D_q, [src, 96]
+
+        ldp     I_q, J_q, [src_end, -128]
+        ldp     K_q, L_q, [src_end, -96]
+        ldp     M_q, N_q, [src_end, -64]
+        ldp     O_q, P_q, [src_end, -32]
+
+        stp     E_q, F_q, [dst_in]
+        stp     G_q, H_q, [dst_in, 32]
+        subs    cnt, cnt, 128+64+64
+        b.ls    L(tail128_align_64)
+
+        .p2align 4
+L(loop128_align_64):
+        ldp     E_q, F_q, [src, 128]
+        stp     A_q, B_q, [dst, 64]
+        ldp     G_q, H_q, [src, 160]
+        stp     C_q, D_q, [dst, 96]
+        ldp     A_q, B_q, [src, 192]
+        stp     E_q, F_q, [dst, 128]
+        ldp     C_q, D_q, [src, 224]
+        stp     G_q, H_q, [dst, 160]
+
+        add     src, src, 128
+        add     dst, dst, 128
+        subs    cnt, cnt, 128
+        b.hi    L(loop128_align_64)
+
+L(tail128_align_64):
+        stp     A_q, B_q, [dst, 64]
+        stp     C_q, D_q, [dst, 96]
+
+        stp     I_q, J_q, [dst_end, -128]
+        stp     K_q, L_q, [dst_end, -96]
+        stp     M_q, N_q, [dst_end, -64]
+        stp     O_q, P_q, [dst_end, -32]
+        ret
+END (__memcpy_kunpeng950)
\ No newline at end of file

Please fix.

Cheers,
Wilco