From patchwork Fri Nov 28 08:37:20 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Noah Goldstein <goldstein.w.n@gmail.com>
X-Patchwork-Id: 125479
Return-Path: <libc-alpha-bounces~patchwork=sourceware.org@sourceware.org>
X-Original-To: patchwork@sourceware.org
Delivered-To: patchwork@sourceware.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 681433858D29
	for <patchwork@sourceware.org>; Fri, 28 Nov 2025 08:46:34 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 681433858D29
Authentication-Results: sourceware.org;
	dkim=pass (2048-bit key,
 unprotected) header.d=gmail.com header.i=@gmail.com header.a=rsa-sha256
 header.s=20230601 header.b=PG1aXr9+
X-Original-To: libc-alpha@sourceware.org
Delivered-To: libc-alpha@sourceware.org
Received: from mail-pl1-x630.google.com (mail-pl1-x630.google.com
 [IPv6:2607:f8b0:4864:20::630])
 by sourceware.org (Postfix) with ESMTPS id 038B83858408
 for <libc-alpha@sourceware.org>; Fri, 28 Nov 2025 08:37:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 038B83858408
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 038B83858408
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=2607:f8b0:4864:20::630
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1764319053; cv=none;
 b=W9wHZXyYF9kxN07khGlHCNidwMbWG3drwj3EP18wAj1jRUohatVnY1k54iuHEEHUH6f/0xCSvT4LjZ43WmDjMimhZ4uHMpV0Z0wpsyZrUFQgTRSOu1HSypp7CM+B4vB+yoctStzOU5Mcn5LqlXCwQ1UQXUPqb5GQLOs+Ez6+zX8=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1764319053; c=relaxed/simple;
 bh=XHCyt6OHEPCBE+GgPnVhPLwmPo86pJlBtdxVWZNcRio=;
 h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version;
 b=mRsD31YMWqBcdiPAz/X0Rez/dLRzb5yrGwqZCp0JO0JosRRjAJz6erLiCQYNhPozZPa8jUWINLBN0zllL/ac+gbCr1bqnY1+NDagOB6zERAC1hmpY3NkISKE9qH+7EqztpXUZ6UOMKRcTqYOnhKVmAl6HYI3sYc3J+uNOS+i1wA=
ARC-Authentication-Results: i=1; server2.sourceware.org
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 038B83858408
Received: by mail-pl1-x630.google.com with SMTP id
 d9443c01a7336-298287a26c3so19126715ad.0
 for <libc-alpha@sourceware.org>; Fri, 28 Nov 2025 00:37:32 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=gmail.com; s=20230601; t=1764319052; x=1764923852; darn=sourceware.org;
 h=content-transfer-encoding:mime-version:references:in-reply-to
 :message-id:date:subject:cc:to:from:from:to:cc:subject:date
 :message-id:reply-to;
 bh=lOMkK2kbjAnlYglHHu4tAQizj5WiBSmTlrkRz3Ai1qI=;
 b=PG1aXr9+I3a/1OUjqg+X4GpKZTXIjvlljk3qjfRCedtG5wc9dlDy4w8quwYFQ30VxI
 Kk0ZpcXK3j/6SvA5wTpT9UZeU7lpsJDn6j+6C2g3PSjPtbKWIPbt9IYdYXUxFlZxVym/
 0KUeMU2njb97Fu6xDWnxexPJ6NV9nx2+cQdjgqvmvjhO9dw6a0NkcEgLn49pWlmXUTaA
 1bKNTjNtK2Yx3BGDSyRdbeCdgdgsu4jQF/GHptvuTQ4lCdDDaNGGYBS6cMRU8OgoFvGc
 ZFBiHKJ6Mllte38+s7HnU4lqk9lAb29fDlo5Bijc5TyZB2ZXV2tCf+dbBFnKlkGwVuZx
 k7ng==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1764319052; x=1764923852;
 h=content-transfer-encoding:mime-version:references:in-reply-to
 :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from
 :to:cc:subject:date:message-id:reply-to;
 bh=lOMkK2kbjAnlYglHHu4tAQizj5WiBSmTlrkRz3Ai1qI=;
 b=RDTx3JrvH3FQyPFRxMxfvDVI8kAovs9Vo7mFA+94PXENysuFKgI4UAuxHqYSWQu2L+
 iYxuMV6fgH2LQONYf+Wvv8DL2QZOAb7ajBWIBG6XLiJ3iZ2Ds2yk90tOff+14VqLFJS2
 VmUqC/y7q6QOsvscUElMmvY3EC57uQS4rX16pzWi32nnknKkFfyeH0Nm+azAaSzsEQ8s
 OZ+OoBicOg9Mm4TQOivTzkslCj2EwgoOL0/AsfcRd5EuCKJeOsTsgA1rPdSgvwX0Srhe
 K5ZKzFWS36rJtU4J1jnGNd9oIXGTB8vGwY3D6a7AILFtb/IdgyN8spwa8YA+gdDpdwvT
 OrMQ==
X-Gm-Message-State: AOJu0YzJFJXwjlK9HDbLyzom3suFcYpS2RO+YNS0YF+ku7bW4Lsqy8gT
 mWaR4tkeh54JIHvcYuCdXXosWkSQsFAl6Ru44QjsMSJLRfkBUmAbwiEwqHjKGFqWs1s=
X-Gm-Gg: ASbGnctawSA8EDD48KgpQRwtFldIu8JQY2WOGzGBewgpfF39LK/QqksqysmkK0kkYdx
 1m7NPzF3cf53rYM3H7u1LWXsy7ZUTUp2W9YlFQj24cYJrn2Ndym6Yf8Vptl8uEJGU8cbNHh+W49
 CX2zG5pU2fzBZW84ulMbdmRG8Tbzo45jfDVjXQsBV4CefO5XzVJE/ODknpgut554e0psCtuVgTY
 JfH0NCB41neAKDVFSXUvlsnyqElatM5PjBi9xw/J3gadb/cqL2YpiP8xkL7Sgtr6nrSTS/4Acq8
 CIoshlS+j1r0/oIGsSfEnh4e4ImoqeGTMbp5afpJb5xB0Y5W8JT8lhVEVHJmDAptQC0oUV/35DB
 RNXs/BxN5teh+ha4PYJdq20zuoL2rIHk4L6CmDaGSmij77ncOG88hsi15zVdapRcrYaiO5g7Phd
 ALbPcauWqamBqsQWmT1SQqKjgfewd7wRiAjuRu3NZTpP67J3IwHy7oJWn6nDD9hg==
X-Google-Smtp-Source: 
 AGHT+IEegFyMggvbQ5V+tGajkeP0dFrRGGKHaCC+0yQAUoYDE4W+5/gDl9aiLMfuV4HA/VeW7HSdYw==
X-Received: by 2002:a17:90b:2f84:b0:343:7714:4c9e with SMTP id
 98e67ed59e1d1-34733e46f2amr22194980a91.2.1764319051571;
 Fri, 28 Nov 2025 00:37:31 -0800 (PST)
Received: from localhost.localdomain ([103.137.210.78])
 by smtp.gmail.com with ESMTPSA id
 d2e1a72fcca58-7d15fb1417csm4143843b3a.60.2025.11.28.00.37.30
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Fri, 28 Nov 2025 00:37:31 -0800 (PST)
From: Noah Goldstein <goldstein.w.n@gmail.com>
To: libc-alpha@sourceware.org
Cc: goldstein.w.n@gmail.com,
	hjl.tools@gmail.com,
	carlos@systemhalted.org
Subject: [PATCH v6 4/4] x86/string: Make page-unrolled memmove default for
 Intel processors
Date: Fri, 28 Nov 2025 03:37:20 -0500
Message-ID: <20251128083720.92561-4-goldstein.w.n@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251128083720.92561-1-goldstein.w.n@gmail.com>
References: <20251115093318.830179-1-goldstein.w.n@gmail.com>
 <20251128083720.92561-1-goldstein.w.n@gmail.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0,
 KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org

Continue using page-unrolled large memmove implementation for all
Intel processors to stay consistent with how it has been since glibc
2.34.

So far the page-unrolled implementation has been benchmarked on
HSW-SPR, with it yielding 5-15% performance improvements on
HSW-ICX. On SPR it appears to be roughly equal, and it is not known
how it compares on newer hardware (although Sunil has some benchmarks
that suggest it may still yield performance improvements).

Since we have so far seen only positive or roughly neutral benchmark
results on Intel hardware, it seems fair to continue using this
implementation as the default. It would be prudent, however, to
continue benchmarking the page-unrolled vs standard implementation on
future hardware to ensure that it remains performant.
---
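Reviewer aid, not part of the commit: a minimal sketch of the selection
order that the ifunc-memmove.h hunks below appear to implement for the
AVX512VL and SSE2 tiers. The function name and the boolean parameters
are hypothetical stand-ins for the cpu_features checks, and the other
tiers of the real selector (EVEX, AVX, SSSE3, and so on) are left out.

```c
#include <stdbool.h>

/* Illustrative model only.  The three flags stand in for
   X86_ISA_CPU_FEATURE_USABLE_P (AVX512VL), CPU_FEATURE_USABLE_P (ERMS)
   and CPU_FEATURES_ARCH_P (Prefer_Page_Unrolled_Large_Copy).  The
   returned string mirrors the OPTIMIZE (...) suffix that the patched
   selector would return in the same situation.  */
static const char *
sketch_select_memmove (bool avx512vl, bool erms, bool prefer_page_unrolled)
{
  if (avx512vl)
    {
      if (erms)
        return (prefer_page_unrolled
                ? "avx512_unaligned_erms_page_unrolled"
                : "avx512_unaligned_erms");
      return (prefer_page_unrolled
              ? "avx512_unaligned_page_unrolled"
              : "avx512_unaligned");
    }

  /* SSE2 fallback tier, reached here whenever AVX512VL is not usable
     (the real selector tries several other tiers first).  */
  if (erms)
    return (prefer_page_unrolled
            ? "sse2_unaligned_erms_page_unrolled"
            : "sse2_unaligned_erms");
  return (prefer_page_unrolled
          ? "sse2_unaligned_page_unrolled"
          : "sse2_unaligned");
}
```

Under this model the preference bit never changes which ISA tier is
chosen. It only switches between the standard and the page-unrolled
variant within a tier, which is why each affected tier gains two new
symbols (with and without ERMS).
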
 sysdeps/x86/cpu-features.c                    | 14 ++++
 sysdeps/x86_64/multiarch/Makefile             |  2 +
 sysdeps/x86_64/multiarch/ifunc-impl-list.c    | 72 +++++++++++++++++--
 sysdeps/x86_64/multiarch/ifunc-memmove.h      | 38 +++++++---
 ...move-avx512-unaligned-erms-page-unrolled.S | 24 +++++++
 ...emmove-sse2-unaligned-erms-page-unrolled.S | 24 +++++++
 .../multiarch/memmove-sse2-unaligned-erms.S   |  2 +-
 7 files changed, 161 insertions(+), 15 deletions(-)
 create mode 100644 sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms-page-unrolled.S
 create mode 100644 sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms-page-unrolled.S

diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 36803aa53f..2d9ded010f 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -769,6 +769,20 @@ init_cpu_features (struct cpu_features *cpu_features)
       cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
 	  &= ~bit_arch_Avoid_Non_Temporal_Memset;
 
+      /* Enable page unrolled large implementation to remain consistent
+	 with glibc 2.34 and earlier. Thus far all benchmarks on Intel Bigcore
+	 hardware suggest the page-unrolled large implementation is equal to or
+	 more performant than the standard large memmove implementation.
+	 However, since the page-unrolled implementation is less
+	 standard/obvious, it may not receive as much optimization
+	 attention from hardware implementors, so we should continue to
+	 benchmark new hardware to ensure this is the right decision.
+	 Note: As of writing this, there have been no benchmarks indicating the
+	 page-unrolled implementation is preferred on Atom processors. We keep
+	 it enabled in the interest of staying consistent with glibc 2.34.  */
+      cpu_features->preferred[index_arch_Prefer_Page_Unrolled_Large_Copy]
+	  |= bit_arch_Prefer_Page_Unrolled_Large_Copy;
+
       enum intel_microarch microarch = INTEL_UNKNOWN;
       if (family == 0x06)
 	{
diff --git a/sysdeps/x86_64/multiarch/Makefile b/sysdeps/x86_64/multiarch/Makefile
index 381eaef455..c4573f27df 100644
--- a/sysdeps/x86_64/multiarch/Makefile
+++ b/sysdeps/x86_64/multiarch/Makefile
@@ -21,10 +21,12 @@ sysdep_routines += \
   memmove-avx-unaligned-erms-rtm \
   memmove-avx512-no-vzeroupper \
   memmove-avx512-unaligned-erms \
+  memmove-avx512-unaligned-erms-page-unrolled \
   memmove-erms \
   memmove-evex-unaligned-erms \
   memmove-evex-unaligned-erms-page-unrolled \
   memmove-sse2-unaligned-erms \
+  memmove-sse2-unaligned-erms-page-unrolled \
   memmove-ssse3 \
   memrchr-avx2 \
   memrchr-avx2-rtm \
diff --git a/sysdeps/x86_64/multiarch/ifunc-impl-list.c b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
index f9add65d24..26278da8d4 100644
--- a/sysdeps/x86_64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/x86_64/multiarch/ifunc-impl-list.c
@@ -127,9 +127,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memmove_chk_avx512_unaligned)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __memmove_chk_avx512_unaligned_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memmove_chk_avx512_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __memmove_chk_avx512_unaligned_erms_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memmove_chk,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memmove_chk_evex_unaligned)
@@ -181,7 +187,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk, 1,
 				     __memmove_chk_sse2_unaligned)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk, 1,
-				     __memmove_chk_sse2_unaligned_erms))
+				     __memmove_chk_sse2_unaligned_page_unrolled)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk, 1,
+				     __memmove_chk_sse2_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memmove_chk, 1,
+				     __memmove_chk_sse2_unaligned_erms_page_unrolled))
 #endif
 
   /* Support sysdeps/x86_64/multiarch/memmove.c.  */
@@ -194,9 +204,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memmove_avx512_unaligned)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __memmove_avx512_unaligned_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memmove_avx512_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __memmove_avx512_unaligned_erms_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, memmove,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memmove_evex_unaligned)
@@ -248,7 +264,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, memmove, 1,
 				     __memmove_sse2_unaligned)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, memmove, 1,
-				     __memmove_sse2_unaligned_erms))
+				     __memmove_sse2_unaligned_page_unrolled)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, memmove, 1,
+				     __memmove_sse2_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, memmove, 1,
+				     __memmove_sse2_unaligned_erms_page_unrolled))
 
   /* Support sysdeps/x86_64/multiarch/memrchr.c.  */
   IFUNC_IMPL (i, name, memrchr,
@@ -1174,9 +1194,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memcpy_chk_avx512_unaligned)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __memcpy_chk_avx512_unaligned_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memcpy_chk_avx512_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __memcpy_chk_avx512_unaligned_erms_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, __memcpy_chk,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memcpy_chk_evex_unaligned)
@@ -1228,7 +1254,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk, 1,
 				     __memcpy_chk_sse2_unaligned)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk, 1,
-				     __memcpy_chk_sse2_unaligned_erms))
+				     __memcpy_chk_sse2_unaligned_page_unrolled)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk, 1,
+				     __memcpy_chk_sse2_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, __memcpy_chk, 1,
+				     __memcpy_chk_sse2_unaligned_erms_page_unrolled))
 #endif
 
   /* Support sysdeps/x86_64/multiarch/memcpy.c.  */
@@ -1241,9 +1271,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memcpy_avx512_unaligned)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __memcpy_avx512_unaligned_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memcpy_avx512_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __memcpy_avx512_unaligned_erms_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, memcpy,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __memcpy_evex_unaligned)
@@ -1295,7 +1331,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy, 1,
 				     __memcpy_sse2_unaligned)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy, 1,
-				     __memcpy_sse2_unaligned_erms))
+				     __memcpy_sse2_unaligned_page_unrolled)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy, 1,
+				     __memcpy_sse2_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, memcpy, 1,
+				     __memcpy_sse2_unaligned_erms_page_unrolled))
 
 #ifdef SHARED
   /* Support sysdeps/x86_64/multiarch/mempcpy_chk.c.  */
@@ -1308,9 +1348,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __mempcpy_chk_avx512_unaligned)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __mempcpy_chk_avx512_unaligned_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __mempcpy_chk_avx512_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __mempcpy_chk_avx512_unaligned_erms_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, __mempcpy_chk,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __mempcpy_chk_evex_unaligned)
@@ -1362,7 +1408,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk, 1,
 				     __mempcpy_chk_sse2_unaligned)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk, 1,
-				     __mempcpy_chk_sse2_unaligned_erms))
+				     __mempcpy_chk_sse2_unaligned_page_unrolled)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk, 1,
+				     __mempcpy_chk_sse2_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, __mempcpy_chk, 1,
+				     __mempcpy_chk_sse2_unaligned_erms_page_unrolled))
 #endif
 
   /* Support sysdeps/x86_64/multiarch/mempcpy.c.  */
@@ -1375,9 +1425,15 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __mempcpy_avx512_unaligned)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __mempcpy_avx512_unaligned_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __mempcpy_avx512_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
+				     CPU_FEATURE_USABLE (AVX512VL),
+				     __mempcpy_avx512_unaligned_erms_page_unrolled)
 	      X86_IFUNC_IMPL_ADD_V4 (array, i, mempcpy,
 				     CPU_FEATURE_USABLE (AVX512VL),
 				     __mempcpy_evex_unaligned)
@@ -1429,7 +1485,11 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy, 1,
 				     __mempcpy_sse2_unaligned)
 	      X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy, 1,
-				     __mempcpy_sse2_unaligned_erms))
+				     __mempcpy_sse2_unaligned_page_unrolled)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy, 1,
+				     __mempcpy_sse2_unaligned_erms)
+	      X86_IFUNC_IMPL_ADD_V2 (array, i, mempcpy, 1,
+				     __mempcpy_sse2_unaligned_erms_page_unrolled))
 
   /* Support sysdeps/x86_64/multiarch/strncmp.c.  */
   IFUNC_IMPL (i, name, strncmp,
diff --git a/sysdeps/x86_64/multiarch/ifunc-memmove.h b/sysdeps/x86_64/multiarch/ifunc-memmove.h
index 6d5df8a9eb..dc03269d8f 100644
--- a/sysdeps/x86_64/multiarch/ifunc-memmove.h
+++ b/sysdeps/x86_64/multiarch/ifunc-memmove.h
@@ -23,10 +23,14 @@ extern __typeof (REDIRECT_NAME) OPTIMIZE (erms) attribute_hidden;
 
 extern __typeof (REDIRECT_NAME) OPTIMIZE (avx512_unaligned)
   attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (avx512_unaligned_erms)
-  attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (avx512_no_vzeroupper)
-  attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (avx512_unaligned_page_unrolled) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (avx512_unaligned_erms) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (avx512_unaligned_erms_page_unrolled) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (avx512_no_vzeroupper) attribute_hidden;
 
 extern __typeof (REDIRECT_NAME) OPTIMIZE (evex_unaligned) attribute_hidden;
 extern __typeof (REDIRECT_NAME)
@@ -54,8 +58,12 @@ extern __typeof (REDIRECT_NAME) OPTIMIZE (ssse3) attribute_hidden;
 
 extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned)
   attribute_hidden;
-extern __typeof (REDIRECT_NAME) OPTIMIZE (sse2_unaligned_erms)
-  attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (sse2_unaligned_page_unrolled) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (sse2_unaligned_erms) attribute_hidden;
+extern __typeof (REDIRECT_NAME)
+    OPTIMIZE (sse2_unaligned_erms_page_unrolled) attribute_hidden;
 
 static inline void *
 IFUNC_SELECTOR (void)
@@ -72,8 +80,16 @@ IFUNC_SELECTOR (void)
       if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX512VL))
 	{
 	  if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
-	    return OPTIMIZE (avx512_unaligned_erms);
+	    {
+	      if (CPU_FEATURES_ARCH_P (cpu_features,
+				       Prefer_Page_Unrolled_Large_Copy))
+		return OPTIMIZE (avx512_unaligned_erms_page_unrolled);
+	      return OPTIMIZE (avx512_unaligned_erms);
+	    }
 
+	  if (CPU_FEATURES_ARCH_P (cpu_features,
+				   Prefer_Page_Unrolled_Large_Copy))
+	    return OPTIMIZE (avx512_unaligned_page_unrolled);
 	  return OPTIMIZE (avx512_unaligned);
 	}
 
@@ -140,7 +156,13 @@ IFUNC_SELECTOR (void)
     }
 
   if (CPU_FEATURE_USABLE_P (cpu_features, ERMS))
-    return OPTIMIZE (sse2_unaligned_erms);
+    {
+      if (CPU_FEATURES_ARCH_P (cpu_features, Prefer_Page_Unrolled_Large_Copy))
+	return OPTIMIZE (sse2_unaligned_erms_page_unrolled);
+      return OPTIMIZE (sse2_unaligned_erms);
+    }
 
+  if (CPU_FEATURES_ARCH_P (cpu_features, Prefer_Page_Unrolled_Large_Copy))
+    return OPTIMIZE (sse2_unaligned_page_unrolled);
   return OPTIMIZE (sse2_unaligned);
 }
diff --git a/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms-page-unrolled.S b/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms-page-unrolled.S
new file mode 100644
index 0000000000..a7dee420b4
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms-page-unrolled.S
@@ -0,0 +1,24 @@
+/* Memmove w/ AVX512 and Page Unrolled Large Copy
+   Copyright (C) 2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+
+#ifndef MEMMOVE_SYMBOL
+# define MEMMOVE_SYMBOL(p,s)	p##_avx512_##s##_page_unrolled
+#endif
+#define MEMMOVE_VEC_LARGE_IMPL	"memmove-vec-large-page-unrolled.S"
+#include "memmove-avx512-unaligned-erms.S"
diff --git a/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms-page-unrolled.S b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms-page-unrolled.S
new file mode 100644
index 0000000000..9ecd223e4e
--- /dev/null
+++ b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms-page-unrolled.S
@@ -0,0 +1,24 @@
+/* Memmove w/ SSE2 and Page Unrolled Large Copy
+   Copyright (C) 2025 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef MEMMOVE_SYMBOL
+# define MEMMOVE_SYMBOL(p,s)	p##_sse2_##s##_page_unrolled
+#endif
+#define MEMMOVE_VEC_LARGE_IMPL	"memmove-vec-large-page-unrolled.S"
+#define PAGE_UNROLLED	1
+#include "memmove-sse2-unaligned-erms.S"
diff --git a/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
index aeaa3bd2f0..c941d62279 100644
--- a/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
@@ -32,7 +32,7 @@
 
 # include "multiarch/memmove-vec-unaligned-erms.S"
 
-# if MINIMUM_X86_ISA_LEVEL <= 2
+# if MINIMUM_X86_ISA_LEVEL <= 2 && !(defined PAGE_UNROLLED)
 #  include "memmove-shlib-compat.h"
 # endif
 #endif