From patchwork Fri Nov 28 08:37:20 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 125479
From: Noah Goldstein
To: libc-alpha@sourceware.org
Cc: goldstein.w.n@gmail.com, hjl.tools@gmail.com, carlos@systemhalted.org
Subject: [PATCH v6 4/4] x86/string: Make page-unrolled memmove default for Intel processors
Date: Fri, 28 Nov 2025 03:37:20 -0500
Message-ID: <20251128083720.92561-4-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251128083720.92561-1-goldstein.w.n@gmail.com>
References: <20251115093318.830179-1-goldstein.w.n@gmail.com> <20251128083720.92561-1-goldstein.w.n@gmail.com>
List-Id: Libc-alpha mailing list

Continue using page-unrolled large memmove implementation for
all Intel processors to stay consistent with how it has been since
glibc 2.34.

So far the page-unrolled implementation has been benchmarked on
HSW-SPR, with it yielding 5-15% performance improvements on
HSW-ICX. On SPR it appears to be roughly equal, and it is not known
how it compares on newer hardware (although Sunil has some benchmarks
that suggest it may still yield performance improvements).

Since, as of now, we only have positive or roughly neutral benchmark
results on Intel hardware, it seems fair to continue using this
implementation as the default. It would be prudent, however, to
continue benchmarking the page-unrolled vs standard implementation
for future hardware to ensure that it remains performant.
---
 sysdeps/x86/cpu-features.c                    | 14 ++++
 sysdeps/x86_64/multiarch/Makefile             |  2 +
 sysdeps/x86_64/multiarch/ifunc-impl-list.c    | 72 +++++++++++++++++--
 sysdeps/x86_64/multiarch/ifunc-memmove.h      | 38 +++++++---
 ...move-avx512-unaligned-erms-page-unrolled.S | 24 +++++++
 ...emmove-sse2-unaligned-erms-page-unrolled.S | 24 +++++++
 .../multiarch/memmove-sse2-unaligned-erms.S   |  2 +-
 7 files changed, 161 insertions(+), 15 deletions(-)
 create mode 100644 sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms-page-unrolled.S
 create mode 100644 sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms-page-unrolled.S

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 36803aa53f..2d9ded010f 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -769,6 +769,20 @@ init_cpu_features (struct cpu_features *cpu_features)
       cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
           &= ~bit_arch_Avoid_Non_Temporal_Memset;
 
+      /* Enable page unrolled large implementation to remain consistent
+         with glibc 2.34 and earlier.  Thus far all benchmarks on Intel Bigcore
+         hardware suggest the large implementation is equal to or more
+         performant than the standard large memmove implementation.
+         However, since the page-unrolled implementation is less
+         standard/obvious, it may not receive as much optimization
+         attention from hardware implementors, so we should continue to
+         benchmark new hardware to ensure this is the right decision.
+         Note: As of writing this, there have been no benchmarks indicating the
+         page-unrolled implementation is preferred on Atom processors.  We keep
+         it enabled in the interest of remaining consistent with glibc 2.34.  */
+      cpu_features->preferred[index_arch_Prefer_Page_Unrolled_Large_Copy]
+          |= bit_arch_Prefer_Page_Unrolled_Large_Copy;
+
       enum intel_microarch microarch = INTEL_UNKNOWN;
       if (family == 0x06)
         {
diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 381eaef455..c4573f27df 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -21,10 +21,12 @@ sysdep_routines += \
   memmove-avx-unaligned-erms-rtm \
   memmove-avx512-no-vzeroupper \
   memmove-avx512-unaligned-erms \
+  memmove-avx512-unaligned-erms-page-unrolled \
   memmove-erms \
   memmove-evex-unaligned-erms \
   memmove-evex-unaligned-erms-page-unrolled \
   memmove-sse2-unaligned-erms \
+  memmove-sse2-unaligned-erms-page-unrolled \
   memmove-ssse3 \
   memrchr-avx2 \
   memrchr-avx2-rtm \
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index f9add65d24..26278da8d4 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -127,9 +127,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memmove_chk_avx512_unaligned)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __memmove_chk_avx512_unaligned_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memmove_chk_avx512_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __memmove_chk_avx512_unaligned_erms_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memmove_chk_evex_unaligned)
@@ -181,7 +187,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk, 1,
                                      __memmove_chk_sse2_unaligned)
               X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk, 1,
-                                     __memmove_chk_sse2_unaligned_erms))
+                                     __memmove_chk_sse2_unaligned_page_unrolled)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk, 1,
+                                     __memmove_chk_sse2_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk, 1,
+                                     __memmove_chk_sse2_unaligned_erms_page_unrolled))
 #endif
 
   /* Support sysdeps/x86_64/multiarch/memmove.c.  */
@@ -194,9 +204,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memmove_avx512_unaligned)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __memmove_avx512_unaligned_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memmove_avx512_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __memmove_avx512_unaligned_erms_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memmove_evex_unaligned)
@@ -248,7 +264,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V2 (array, i, memmove, 1,
                                      __memmove_sse2_unaligned)
               X86_IFUNC_IMPL_ADD_V2 (array, i, memmove, 1,
-                                     __memmove_sse2_unaligned_erms))
+                                     __memmove_sse2_unaligned_page_unrolled)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, memmove, 1,
+                                     __memmove_sse2_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, memmove, 1,
+                                     __memmove_sse2_unaligned_erms_page_unrolled))
 
   /* Support sysdeps/x86_64/multiarch/memrchr.c.  */
   IFUNC_IMPL (i, name, memrchr,
@@ -1174,9 +1194,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memcpy_chk_avx512_unaligned)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __memcpy_chk_avx512_unaligned_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memcpy_chk_avx512_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __memcpy_chk_avx512_unaligned_erms_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memcpy_chk_evex_unaligned)
@@ -1228,7 +1254,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk, 1,
                                      __memcpy_chk_sse2_unaligned)
               X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk, 1,
-                                     __memcpy_chk_sse2_unaligned_erms))
+                                     __memcpy_chk_sse2_unaligned_page_unrolled)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk, 1,
+                                     __memcpy_chk_sse2_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk, 1,
+                                     __memcpy_chk_sse2_unaligned_erms_page_unrolled))
 #endif
 
   /* Support sysdeps/x86_64/multiarch/memcpy.c.  */
@@ -1241,9 +1271,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memcpy_avx512_unaligned)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __memcpy_avx512_unaligned_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memcpy_avx512_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __memcpy_avx512_unaligned_erms_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __memcpy_evex_unaligned)
@@ -1295,7 +1331,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy, 1,
                                      __memcpy_sse2_unaligned)
               X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy, 1,
-                                     __memcpy_sse2_unaligned_erms))
+                                     __memcpy_sse2_unaligned_page_unrolled)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy, 1,
+                                     __memcpy_sse2_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy, 1,
+                                     __memcpy_sse2_unaligned_erms_page_unrolled))
 
 #ifdef SHARED
   /* Support sysdeps/x86_64/multiarch/mempcpy_chk.c.  */
@@ -1308,9 +1348,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __mempcpy_chk_avx512_unaligned)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __mempcpy_chk_avx512_unaligned_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __mempcpy_chk_avx512_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __mempcpy_chk_avx512_unaligned_erms_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __mempcpy_chk_evex_unaligned)
@@ -1362,7 +1408,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk, 1,
                                      __mempcpy_chk_sse2_unaligned)
               X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk, 1,
-                                     __mempcpy_chk_sse2_unaligned_erms))
+                                     __mempcpy_chk_sse2_unaligned_page_unrolled)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk, 1,
+                                     __mempcpy_chk_sse2_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk, 1,
+                                     __mempcpy_chk_sse2_unaligned_erms_page_unrolled))
 #endif
 
   /* Support sysdeps/x86_64/multiarch/mempcpy.c.  */
@@ -1375,9 +1425,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __mempcpy_avx512_unaligned)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __mempcpy_avx512_unaligned_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __mempcpy_avx512_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
+                                     CPU_FEATURE_USABLE (AVX512VL),
+                                     __mempcpy_avx512_unaligned_erms_page_unrolled)
               X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
                                      CPU_FEATURE_USABLE (AVX512VL),
                                      __mempcpy_evex_unaligned)
@@ -1429,7 +1485,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
               X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy, 1,
                                      __mempcpy_sse2_unaligned)
               X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy, 1,
-                                     __mempcpy_sse2_unaligned_erms))
+                                     __mempcpy_sse2_unaligned_page_unrolled)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy, 1,
+                                     __mempcpy_sse2_unaligned_erms)
+              X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy, 1,
+                                     __mempcpy_sse2_unaligned_erms_page_unrolled))
 
   /* Support sysdeps/x86_64/multiarch/strncmp.c.  */
   IFUNC_IMPL (i, name, strncmp,
diff --git a/sysdeps/x86_64/multiarch/ifunc-memmove.h b/sysdeps/x86_64/multiarch/ifunc-memmove.h
index 6d5df8a9eb..dc03269d8f 100644
--- a/sysdeps/x86_64/multiarch/ifunc-memmove.h
+++ b/sysdeps/x86_64/multiarch/ifunc-memmove.h
@@ -23,10 +23,14 @@ extern __typeof (REDIRECT_NAME) OPTIMIZE (erms) attribute_hidden;
 
 extern __typeof (REDIRECT_NAME) OPTIMIZE (avx512_unaligned)
   attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (avx512_unaligned_erms)
-  attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (avx512_no_vzeroupper)
-  attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (avx512_unaligned_page_unrolled) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (avx512_unaligned_erms) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (avx512_unaligned_erms_page_unrolled) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (avx512_no_vzeroupper) attribute_hidden;
 extern __typeof (REDIRECT_NAME) OPTIMIZE (evex_unaligned)
   attribute_hidden;
 extern __typeof (REDIRECT_NAME)
@@ -54,8 +58,12 @@ extern __typeof (REDIRECT_NAME) OPTIMIZE (ssse3) attribute_hidden;
 
 extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned)
   attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned_erms)
-  attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (sse2_unaligned_page_unrolled) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (sse2_unaligned_erms) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (sse2_unaligned_erms_page_unrolled) attribute_hidden;
 
 static inline void *
 IFUNC_SELECTOR (void)
@@ -72,8 +80,16 @@ IFUNC_SELECTOR (void)
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL))
         {
           if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
-            return OPTIMIZE (avx512_unaligned_erms);
+            {
+              if (CPU_FEATURES_ARCH_P (cpu_features,
+                                       Prefer_Page_Unrolled_Large_Copy))
+                return OPTIMIZE (avx512_unaligned_erms_page_unrolled);
+              return OPTIMIZE (avx512_unaligned_erms);
+            }
 
+          if (CPU_FEATURES_ARCH_P (cpu_features,
+                                   Prefer_Page_Unrolled_Large_Copy))
+            return OPTIMIZE (avx512_unaligned_page_unrolled);
           return OPTIMIZE (avx512_unaligned);
         }
 
@@ -140,7 +156,13 @@ IFUNC_SELECTOR (void)
     }
 
   if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
-    return OPTIMIZE (sse2_unaligned_erms);
+    {
+      if (CPU_FEATURES_ARCH_P (cpu_features, Prefer_Page_Unrolled_Large_Copy))
+        return OPTIMIZE (sse2_unaligned_erms_page_unrolled);
+      return OPTIMIZE (sse2_unaligned_erms);
+    }
 
+  if (CPU_FEATURES_ARCH_P (cpu_features, Prefer_Page_Unrolled_Large_Copy))
+    return OPTIMIZE (sse2_unaligned_page_unrolled);
   return OPTIMIZE (sse2_unaligned);
 }
diff --git a/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms-page-unrolled.S b/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms-page-unrolled.S
new file mode 100644
index 0000000000..a7dee420b4
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms-page-unrolled.S
@@ -0,0 +1,24 @@
+/* Memmove w/ AVX512 and Page Unrolled Large Copy
+   Copyright (C) 2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+
+#ifndef MEMMOVE_SYMBOL
+# define MEMMOVE_SYMBOL(p,s) p##_avx512_##s##_page_unrolled
+#endif
+#define MEMMOVE_VEC_LARGE_IMPL "memmove-vec-large-page-unrolled.S"
+#include "memmove-avx512-unaligned-erms.S"
diff --git a/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms-page-unrolled.S b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms-page-unrolled.S
new file mode 100644
index 0000000000..9ecd223e4e
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms-page-unrolled.S
@@ -0,0 +1,24 @@
+/* Memmove w/ SSE2 and Page Unrolled Large Copy
+   Copyright (C) 2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#ifndef MEMMOVE_SYMBOL
+# define MEMMOVE_SYMBOL(p,s) p##_sse2_##s##_page_unrolled
+#endif
+#define MEMMOVE_VEC_LARGE_IMPL "memmove-vec-large-page-unrolled.S"
+#define PAGE_UNROLLED 1
+#include "memmove-sse2-unaligned-erms.S"
diff --git a/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
index aeaa3bd2f0..c941d62279 100644
--- a/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
@@ -32,7 +32,7 @@
 
 # include "multiarch/memmove-vec-unaligned-erms.S"
 
-# if MINIMUM_X86_ISA_LEVEL <= 2
+# if MINIMUM_X86_ISA_LEVEL <= 2 && !(defined PAGE_UNROLLED)
 #  include "memmove-shlib-compat.h"
 # endif
 #endif
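
Note for reviewers: the selection change in ifunc-memmove.h can be
modelled roughly as below. This is only a sketch under stated
assumptions, not glibc code: struct cpu_model, enum memmove_impl and
select_memmove are hypothetical names, the three booleans stand in for
the AVX512VL, ERMS and Prefer_Page_Unrolled_Large_Copy checks, and the
EVEX, AVX and SSSE3 branches that sit between the two patched branches
in the real selector are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three inputs the patched branches
   consult; the real selector reads them from struct cpu_features.  */
struct cpu_model
{
  bool avx512vl;              /* AVX512VL usable.  */
  bool erms;                  /* ERMS usable.  */
  bool prefer_page_unrolled;  /* Prefer_Page_Unrolled_Large_Copy set.  */
};

/* Hypothetical tags mirroring the OPTIMIZE () suffixes in the patch.  */
enum memmove_impl
{
  AVX512_UNALIGNED,
  AVX512_UNALIGNED_PAGE_UNROLLED,
  AVX512_UNALIGNED_ERMS,
  AVX512_UNALIGNED_ERMS_PAGE_UNROLLED,
  SSE2_UNALIGNED,
  SSE2_UNALIGNED_PAGE_UNROLLED,
  SSE2_UNALIGNED_ERMS,
  SSE2_UNALIGNED_ERMS_PAGE_UNROLLED
};

/* Simplified model of the two branches touched by the patch: the
   preference bit only picks between the plain and the page-unrolled
   flavour of whatever variant would have been chosen anyway.  */
static enum memmove_impl
select_memmove (const struct cpu_model *cpu)
{
  assert (cpu != NULL);

  if (cpu->avx512vl)
    {
      if (cpu->erms)
        return (cpu->prefer_page_unrolled
                ? AVX512_UNALIGNED_ERMS_PAGE_UNROLLED
                : AVX512_UNALIGNED_ERMS);
      return (cpu->prefer_page_unrolled
              ? AVX512_UNALIGNED_PAGE_UNROLLED
              : AVX512_UNALIGNED);
    }

  /* Baseline fallback, reached in this model whenever AVX512VL is
     not usable.  */
  if (cpu->erms)
    return (cpu->prefer_page_unrolled
            ? SSE2_UNALIGNED_ERMS_PAGE_UNROLLED
            : SSE2_UNALIGNED_ERMS);
  return (cpu->prefer_page_unrolled
          ? SSE2_UNALIGNED_PAGE_UNROLLED
          : SSE2_UNALIGNED);
}
```

In this model, clearing prefer_page_unrolled returns exactly the
variants the selector chose before the patch, which is why the
cpu-features.c hunk alone decides the default.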