From patchwork Tue Oct 14 12:14:11 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiamei Xie
X-Patchwork-Id: 121850
From: Jiamei Xie
To:
CC: xiejiamei, Li jing
Subject: [PATCH v2 1/1] x86: fix wmemset ifunc stray '!' (bug 33542)
Date: Tue, 14 Oct 2025 20:14:11 +0800
Message-ID: <20251014121411.11623-2-xiejiamei@hygon.cn>
In-Reply-To: <20251014121411.11623-1-xiejiamei@hygon.cn>
References: <20251014121411.11623-1-xiejiamei@hygon.cn>
X-Mailer: git-send-email 2.17.1
List-Id: Libc-alpha mailing list

The ifunc selector for wmemset had a stray '!' in the X86_ISA_CPU_FEATURES_ARCH_P(...)
check:

  if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
      && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
                                      AVX_Fast_Unaligned_Load, !))

This effectively negated the predicate and caused the AVX2/AVX512 paths
to be skipped, making the dispatcher fall back to the SSE2 implementation
even on CPUs where AVX2/AVX512 are available. The regression leads to
noticeable throughput loss for wmemset.

Remove the stray '!' so the AVX_Fast_Unaligned_Load capability is tested
as intended and the correct AVX2/EVEX variants are selected.

Impact:
- On AVX2/AVX512-capable x86_64, wmemset no longer incorrectly falls back
  to SSE2; perf now shows __wmemset_evex/avx2 variants.

Testing:
- benchtests/bench-wmemset shows improved bandwidth across sizes.
- perf confirms the selected symbol is no longer SSE2.

Signed-off-by: xiejiamei
Signed-off-by: Li jing
---
 sysdeps/x86_64/multiarch/ifunc-wmemset.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/multiarch/ifunc-wmemset.h b/sysdeps/x86_64/multiarch/ifunc-wmemset.h
index f95cca6ae5..50af138230 100644
--- a/sysdeps/x86_64/multiarch/ifunc-wmemset.h
+++ b/sysdeps/x86_64/multiarch/ifunc-wmemset.h
@@ -35,7 +35,7 @@ IFUNC_SELECTOR (void)
 
   if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2)
       && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features,
-				      AVX_Fast_Unaligned_Load, !))
+				      AVX_Fast_Unaligned_Load,))
     {
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL))
 	{
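
Note for reviewers: the effect of the stray '!' can be reproduced outside glibc with a simplified model. The sketch below is not the glibc code; `struct cpu_features_model`, `MODEL_ARCH_P`, `select_with_bang` and `select_fixed` are hypothetical names, and the macro only assumes that the third argument is pasted in front of the feature test, which is what the commit message describes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU feature data; not the glibc struct.  */
struct cpu_features_model
{
  bool avx2_usable;
  bool avx_fast_unaligned_load;
};

/* Simplified model (an assumption, not the glibc definition): the third
   argument is placed in front of the test, so passing `!` inverts the
   result and passing an empty argument leaves it unchanged.  */
#define MODEL_ARCH_P(ptr, name, not) (not ((ptr)->name))

/* Selector as written before the patch: the '!' inverts the check.  */
const char *
select_with_bang (const struct cpu_features_model *f)
{
  if (f->avx2_usable && MODEL_ARCH_P (f, avx_fast_unaligned_load, !))
    return "avx2";
  return "sse2";
}

/* Selector after the patch: the empty argument tests the flag directly.  */
const char *
select_fixed (const struct cpu_features_model *f)
{
  if (f->avx2_usable && MODEL_ARCH_P (f, avx_fast_unaligned_load, ))
    return "avx2";
  return "sse2";
}
```

For a model CPU with both flags set, `select_with_bang` returns "sse2" while `select_fixed` returns "avx2", which mirrors the fallback described in the commit message.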