From patchwork Fri Jan 16 06:54:38 2026
X-Patchwork-Submitter: Zheng Ziyang
X-Patchwork-Id: 128207
From: Zheng Ziyang <zheng.ziyang@zte.com.cn>
To: libc-alpha@sourceware.org
Cc: jeffrey.law@oss.qualcomm.com, adhemerval.zanella@linaro.org, bergner@tenstorrent.com, palmer@dabbelt.com, darius@bluespec.com, zhengziyang
Subject: [PATCH v2] riscv: Add optimised memrchr implementation using RVV extension
Date: Fri, 16 Jan 2026 14:54:38 +0800
Message-Id: <20260116065438.21776-1-zheng.ziyang@zte.com.cn>
In-Reply-To: <20251230070510.8676-1-zheng.ziyang@zte.com.cn>
References: <20251230070510.8676-1-zheng.ziyang@zte.com.cn>
From: zhengziyang

This patch adds an optimised memrchr implementation for RISC-V using the
RVV 1.0 extension.  memrchr is the reverse-direction variant of memchr:
it searches for the last (rightmost) occurrence of a byte in a buffer.

Structure of the implementation for a buffer of length N:
- Immediate NULL return when N is 0
- Main vector path using LMUL=m4 for every other N
- Backward scanning from the end of the buffer

Key optimisation techniques:
1. Backward scanning: search from end to start, so the first block
   containing a match also holds the result and no forward pass is
   needed
2. Unrolled vector loop (2x, LMUL=4): process two vector register groups
   per iteration to improve throughput and hide memory latency
3. Efficient last-match finding: use vcpop.m + vid.v + vcompress.vm +
   vslidedown.vx to locate the highest matching index within a vector
   register group
4. Fast single-match path: skip the vcompress/vslidedown sequence when
   only one match exists in the current block and use the vfirst.m
   index directly (common-case optimisation)
5. Clean tail handling: vsetvli shortens vl for the remaining bytes near
   the start of the buffer, so no separate tail loop is needed
6. Early load of the next block: issue the load for the second block
   before testing the first one, to overlap computation and memory
   access

The implementation assumes RVV 1.0 with VLEN >= 128, obtains the vector
length from vsetvli rather than hard-coding it, and works on both RV32
and RV64.  No page-size assumptions are made.

Performance improvements (relative speedup %) over generic scalar
memrchr:

| Test Category       | Config (VLEN in bits) | vs. generic memrchr |
|---------------------|-----------------------|---------------------|
| **memrchr-default** | XuanTie C920 (128)    | +33.85%             |
|                     | Spacemit X60 (256)    | +15.07%             |

Signed-off-by: Zheng Ziyang
---
 sysdeps/riscv/multiarch/memrchr-generic.c     |  26 ++++
 sysdeps/riscv/multiarch/memrchr-vector.S      | 121 ++++++++++++++++++
 .../unix/sysv/linux/riscv/multiarch/Makefile  |   3 +
 .../linux/riscv/multiarch/ifunc-impl-list.c   |   5 +
 .../unix/sysv/linux/riscv/multiarch/memrchr.c |  68 ++++++++++
 5 files changed, 223 insertions(+)
 create mode 100644 sysdeps/riscv/multiarch/memrchr-generic.c
 create mode 100644 sysdeps/riscv/multiarch/memrchr-vector.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/memrchr.c

diff --git a/sysdeps/riscv/multiarch/memrchr-generic.c b/sysdeps/riscv/multiarch/memrchr-generic.c
new file mode 100644
index 0000000000..16d9b2eb81
--- /dev/null
+++ b/sysdeps/riscv/multiarch/memrchr-generic.c
@@ -0,0 +1,26 @@
+/* Re-include the default memrchr implementation.
+   Copyright (C) 2026 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include
+
+#if IS_IN(libc)
+# define MEMRCHR __memrchr_generic
+# undef libc_hidden_builtin_def
+# define libc_hidden_builtin_def(x)
+#endif
+#include
diff --git a/sysdeps/riscv/multiarch/memrchr-vector.S b/sysdeps/riscv/multiarch/memrchr-vector.S
new file mode 100644
index 0000000000..49c43a57bb
--- /dev/null
+++ b/sysdeps/riscv/multiarch/memrchr-vector.S
@@ -0,0 +1,121 @@
+/* memrchr for RISC-V, ignoring buffer alignment
+   Copyright (C) 2026 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.
+*/
+/*
+ * Optimized memrchr for RISC-V with Vector extension.
+ * Finds the LAST occurrence of a character in a memory region.
+ *
+ * Key optimizations:
+ * 1. Backward scanning: search from end to start, reducing address
+ *    calculations for reverse search semantics
+ * 2. Unrolled vector loop (2×LMUL=4): process two vector groups per
+ *    iteration to improve throughput and hide memory latency
+ * 3. Efficient last-match finding: use vcpop + vcompress + vslidedown
+ *    to locate the highest matching index within a vector register
+ * 4. Fast single-match path: skip vcompress/vslidedown when only one
+ *    match exists (common case optimization)
+ * 5. Prefetching: load second block while processing first block to
+ *    overlap computation with memory access
+ */
+
+#include
+.text
+.p2align 2
+ENTRY(__memrchr_vector)
+
+	beqz a2, .Lnomatch
+	mv a6, a0                       # srcin_save = srcin
+	add a3, a0, a2                  # end = srcin + cntin
+
+	/* ========== Main vector loop (2x unrolled) ========== */
+.Lvec_loop:
+	vsetvli t0, a2, e8, m4, ta, ma
+	sub a3, a3, t0                  # src -= vl, point to first block
+
+	vle8.v v8, (a3)                 # load first block into v8-v11
+	vmseq.vx v0, v8, a1             # v0 = (v8 == chrin) mask
+
+	sub t1, a2, t0                  # remaining length after first block
+	bleu t1, t0, .Lvec_check_single
+
+	sub a4, a3, t0                  # a4 = address of second block (use different reg)
+	vle8.v v12, (a4)                # prefetch second group into v12-v15
+
+	vfirst.m a5, v0
+	bgez a5, .Lvec_found_block      # a3 still points to first block here
+
+	sub a2, a2, t0                  # cntin -= vl (first block)
+	mv a3, a4                       # update a3 to second block
+	vmseq.vx v0, v12, a1            # compare second block
+	vfirst.m a5, v0
+	bgez a5, .Lvec_found_block_v12
+
+	sub a2, a2, t0                  # cntin -= vl (second block)
+	bnez a2, .Lvec_loop
+	j .Lnomatch
+
+	/* ========== Single block check ========== */
+.Lvec_check_single:
+	vfirst.m a5, v0
+	bgez a5, .Lvec_found_block
+	sub a2, a2, t0                  # cntin -= vl
+	bnez a2, .Lvec_loop
+	j .Lnomatch
+
+	/* ========== Found match in first block (v8) ========== */
+.Lvec_found_block:
+	vcpop.m t1, v0                  # t1 = match count
+	li t2, 1
+	beq t1, t2, .Lvec_single_match
+
+	/* Multiple matches: find the last one */
+	vid.v v16                       # v16[i] = i
+	vcompress.vm v20, v16, v0       # compress matched position indices
+	addi t1, t1, -1                 # index of last element
+	vslidedown.vx v20, v20, t1      # move last element to position 0
+	vmv.x.s a5, v20                 # extract offset value
+	j .Lvec_calc_result
+
+	/* ========== Found match in second block (v12) ========== */
+.Lvec_found_block_v12:
+	vcpop.m t1, v0                  # t1 = match count
+	li t2, 1
+	beq t1, t2, .Lvec_single_match
+
+	/* Multiple matches: find the last one */
+	vid.v v16
+	vcompress.vm v20, v16, v0
+	addi t1, t1, -1
+	vslidedown.vx v20, v20, t1
+	vmv.x.s a5, v20
+	j .Lvec_calc_result
+
+	/* ========== Single match case ========== */
+.Lvec_single_match:
+	/* ========== Calculate result ========== */
+.Lvec_calc_result:
+	add a0, a3, a5                  # result = src + offset
+	bltu a0, a6, .Lnomatch          # safety check: result >= srcin
+	ret
+
+	/* ========== No match found ========== */
+.Lnomatch:
+	li a0, 0
+	ret
+
+END(__memrchr_vector)
diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
index 1d26966ded..4466696179 100644
--- a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile
@@ -3,6 +3,9 @@ sysdep_routines += \
   memcpy \
   memcpy-generic \
   memcpy_noalignment \
+  memrchr \
+  memrchr-generic \
+  memrchr-vector \
   memset \
   memset-generic \
   memset-vector \
diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
index 87456f3370..5738d0bce9 100644
--- a/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
+++ b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c
@@ -53,5 +53,10 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 			      __memset_vector)
 	      IFUNC_IMPL_ADD (array, i, memset, 1, __memset_generic))
 
+  IFUNC_IMPL (i, name, memrchr,
+	      IFUNC_IMPL_ADD (array, i, memrchr, rvv_enabled,
+			      __memrchr_vector)
+	      IFUNC_IMPL_ADD (array, i, memrchr, 1, __memrchr_generic))
+
   return 0;
 }
diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/memrchr.c b/sysdeps/unix/sysv/linux/riscv/multiarch/memrchr.c
new file mode 100644
index 0000000000..729f2417a0
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/multiarch/memrchr.c
@@ -0,0 +1,68 @@
+/* Multiple versions of memrchr.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2026 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#if IS_IN (libc)
+/* Redefine memrchr so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below.  */
+# undef memrchr
+# define memrchr __redirect_memrchr
+# undef __memrchr
+# define __memrchr __redirect___memrchr
+# include
+# include
+# include
+# include
+# include
+
+extern __typeof (__redirect_memrchr) __libc_memrchr;
+
+extern __typeof (__redirect_memrchr) __memrchr_generic attribute_hidden;
+extern __typeof (__redirect_memrchr) __memrchr_vector attribute_hidden;
+
+static inline __typeof (__redirect_memrchr) *
+select_memrchr_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func)
+{
+  unsigned long long int v;
+  if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_IMA_EXT_0, &v) == 0
+      && (v & RISCV_HWPROBE_IMA_V) == RISCV_HWPROBE_IMA_V)
+    return __memrchr_vector;
+
+  return __memrchr_generic;
+}
+
+riscv_libc_ifunc (__libc_memrchr, select_memrchr_ifunc);
+
+# undef memrchr
+# undef __memrchr
+
+strong_alias (__libc_memrchr, memrchr)
+strong_alias (__libc_memrchr, __memrchr)
+
+# ifdef SHARED
+__hidden_ver1 (memrchr, __GI_memrchr, __redirect_memrchr)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memrchr);
+__hidden_ver1 (__memrchr, __GI___memrchr, __redirect___memrchr)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memrchr);
+
+# endif
+
+#else
+# include
+#endif
\ No newline at end of file
-- 
2.21.0.windows.1
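
As a reading aid for the assembly in memrchr-vector.S, the backward block scan can be modelled in portable C. This is only a sketch and not part of the patch. The names `memrchr_block_model`, `match_mask`, and `last_set` are hypothetical, a 64-bit integer stands in for the vector mask register, and the `block` parameter (1 to 64) is an assumed, simplified stand-in for the vl that vsetvli returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the compare step: bit i of the result is set
   when p[i] == c.  Valid for len <= 64.  */
static uint64_t
match_mask (const unsigned char *p, size_t len, unsigned char c)
{
  uint64_t mask = 0;
  for (size_t i = 0; i < len; i++)
    if (p[i] == c)
      mask |= (uint64_t) 1 << i;
  return mask;
}

/* Index of the highest set bit of a non-zero mask, i.e. the position
   of the last match inside one block.  */
static size_t
last_set (uint64_t mask)
{
  size_t idx = 0;
  while (mask >>= 1)
    idx++;
  return idx;
}

/* Scan S backwards in blocks of at most BLOCK bytes and return a
   pointer to the last byte equal to C, or NULL if there is none.  */
void *
memrchr_block_model (const void *s, int c, size_t n, size_t block)
{
  assert (block >= 1 && block <= 64);
  const unsigned char *end = (const unsigned char *) s + n;
  while (n > 0)
    {
      /* Simplified stand-in for vsetvli: take min (n, block) bytes.  */
      size_t vl = n < block ? n : block;
      end -= vl;
      uint64_t mask = match_mask (end, vl, (unsigned char) c);
      if (mask != 0)
        return (void *) (end + last_set (mask));
      n -= vl;
    }
  return NULL;
}
```

With the buffer "abcabcabc", n = 9 and block = 4, the model inspects offsets 5..8 first and returns a pointer to offset 6 for 'a'. Because the scan starts at the end, the first block with a non-zero mask always contains the answer.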