From patchwork Thu Jan 23 13:43:01 2025
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105305
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com, cfu@mips.com, Faraz Shahbazker
Subject: [PATCH 05/11] Add optimized assembly for strcmp
Date: Thu, 23 Jan 2025 14:43:01 +0100
Message-Id: <20250123134308.1785777-7-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>

Cherry-picked ff356419673a5d122335dd81bd5726de7bc5e08f
from https://github.com/MIPS/glibc

Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rakic
---
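The word loop in this patch relies on the classic "does this word contain a zero byte?" test, and L(worddiff) must then examine the bytes of the offending word in string (memory) order, which lives in different bit positions on little- and big-endian targets. The C sketch below models both steps on plain uint32_t values, as a 32-bit lw would load them. It is an illustration under stated assumptions, not glibc code: has_zero_byte, byte_at and word_diff are hypothetical names, and the big_endian flag stands in for the __BYTE_ORDER__ checks in the assembly.

```c
#include <stdint.h>

/* Nonzero iff some byte of w is 0x00.  Same arithmetic as the non-DSP
   STRCMP32 macro: t0 = w - 0x01010101, t1 = nor(w, 0x7f7f7f7f), which
   equals ~w & 0x80808080, then t0 & t1.  */
static uint32_t has_zero_byte(uint32_t w)
{
    return (w - 0x01010101u) & ~(w | 0x7f7f7f7fu);
}

/* Byte number i of the string (0 = lowest address) inside a word loaded
   with a 32-bit load.  Little-endian keeps the first byte in bits 7..0,
   big-endian keeps it in bits 31..24.  */
static unsigned byte_at(uint32_t w, int i, int big_endian)
{
    int shift = big_endian ? 24 - 8 * i : 8 * i;
    return (w >> shift) & 0xffu;
}

/* Resolve a word pair that left the word loop: scan bytes in string
   order and stop at the first NUL in w1 or the first mismatch.  Bytes
   are compared as unsigned char, as strcmp requires.  */
static int word_diff(uint32_t w1, uint32_t w2, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        unsigned c1 = byte_at(w1, i, big_endian);
        unsigned c2 = byte_at(w2, i, big_endian);
        if (c1 == 0 || c1 != c2)
            return (int) c1 - (int) c2;
    }
    /* Only reached for equal, NUL-free words, which never leave the
       word loop.  */
    return 0;
}
```

For instance, on a little-endian target "ab\0x" and "ab\0y" load as 0x78006261 and 0x79006261: the words differ, but word_diff stops at the NUL in byte 2 and returns 0, as strcmp must. Examining the bytes out of memory order would instead report a difference in bytes that follow the terminator.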
 sysdeps/mips/strcmp.S | 228 +++++++++++++++++++++++++-----------------
 1 file changed, 137 insertions(+), 91 deletions(-)

diff --git a/sysdeps/mips/strcmp.S b/sysdeps/mips/strcmp.S
index 36379be021..4878cd3aac 100644
--- a/sysdeps/mips/strcmp.S
+++ b/sysdeps/mips/strcmp.S
@@ -1,4 +1,5 @@
 /* Copyright (C) 2014-2024 Free Software Foundation, Inc.
+   Optimized strcmp for MIPS
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -22,9 +23,6 @@
 # include
 # include
 # include
-#elif defined _COMPILING_NEWLIB
-# include "machine/asm.h"
-# include "machine/regdef.h"
 #else
 # include
 # include
@@ -46,6 +44,10 @@ performance loss, so we are not turning it on by default.  */
 
 #if defined(ENABLE_CLZ) && (__mips_isa_rev > 1)
 # define USE_CLZ
+#elif (__mips_isa_rev >= 2)
+# define USE_EXT 1
+#else
+# define USE_EXT 0
 #endif
 
 /* Some asm.h files do not have the L macro definition.  */
@@ -66,6 +68,10 @@
 # endif
 #endif
 
+/* Haven't yet found a configuration where DSP code outperforms
+   normal assembly.  */
+#define __mips_using_dsp 0
+
 /* Allow the routine to be named something else if desired.  */
 #ifndef STRCMP_NAME
 # define STRCMP_NAME strcmp
@@ -77,28 +83,35 @@ LEAF(STRCMP_NAME, 0)
 LEAF(STRCMP_NAME)
 #endif
 	.set	nomips16
-	.set	noreorder
-
 	or	t0, a0, a1
-	andi	t0,0x3
+	andi	t0, t0, 0x3
 	bne	t0, zero, L(byteloop)
 
 /* Both strings are 4 byte aligned at this point.  */
 
+	li	t8, 0x01010101
+#if !__mips_using_dsp
+	li	t9, 0x7f7f7f7f
+#endif
-	lui	t8, 0x0101
-	ori	t8, t8, 0x0101
-	lui	t9, 0x7f7f
-	ori	t9, 0x7f7f
-
-#define STRCMP32(OFFSET) \
-	lw	v0, OFFSET(a0); \
-	lw	v1, OFFSET(a1); \
-	subu	t0, v0, t8; \
-	bne	v0, v1, L(worddiff); \
-	nor	t1, v0, t9; \
-	and	t0, t0, t1; \
+#if __mips_using_dsp
+# define STRCMP32(OFFSET) \
+	lw	a2, OFFSET(a0); \
+	lw	a3, OFFSET(a1); \
+	subu_s.qb t0, t8, a2; \
+	bne	a2, a3, L(worddiff); \
 	bne	t0, zero, L(returnzero)
+#else /* !__mips_using_dsp */
+# define STRCMP32(OFFSET) \
+	lw	a2, OFFSET(a0); \
+	lw	a3, OFFSET(a1); \
+	subu	t0, a2, t8; \
+	nor	t1, a2, t9; \
+	bne	a2, a3, L(worddiff); \
+	and	t1, t0, t1; \
+	bne	t1, zero, L(returnzero)
+#endif /* __mips_using_dsp */
 
+	.align 2
 L(wordloop):
 	STRCMP32(0)
 	DELAY_READ
@@ -113,112 +126,143 @@ L(wordloop):
 	STRCMP32(20)
 	DELAY_READ
 	STRCMP32(24)
-	DELAY_READ
-	STRCMP32(28)
+	lw	a2, 28(a0)
+	lw	a3, 28(a1)
+#if __mips_using_dsp
+	subu_s.qb t0, t8, a2
+#else
+	subu	t0, a2, t8
+	nor	t1, a2, t9
+	and	t1, t0, t1
+#endif
+
 	PTR_ADDIU a0, a0, 32
-	b	L(wordloop)
+	bne	a2, a3, L(worddiff)
 	PTR_ADDIU a1, a1, 32
+	beq	t1, zero, L(wordloop)
 
 L(returnzero):
-	j	ra
 	move	v0, zero
+	jr	ra
 
+	.align 2
 L(worddiff):
 #ifdef USE_CLZ
-	subu	t0, v0, t8
-	nor	t1, v0, t9
-	and	t1, t0, t1
-	xor	t0, v0, v1
+	xor	t0, a2, a3
 	or	t0, t0, t1
 # if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
 	wsbh	t0, t0
 	rotr	t0, t0, 16
-# endif
+# endif /* LITTLE_ENDIAN */
 	clz	t1, t0
-	and	t1, 0xf8
-# if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
-	neg	t1
-	addu	t1, 24
+	or	t0, t1, 24	/* Only care about multiples of 8.  */
+	xor	t1, t1, t0	/* {0,8,16,24} => {24,16,8,0} */
+# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+	sllv	a2,a2,t1
+	sllv	a3,a3,t1
+# else
+	srlv	a2,a2,t1
+	srlv	a3,a3,t1
 # endif
-	rotrv	v0, v0, t1
-	rotrv	v1, v1, t1
-	and	v0, v0, 0xff
-	and	v1, v1, 0xff
-	j	ra
-	subu	v0, v0, v1
+	subu	v0, a2, a3
+	jr	ra
 #else /* USE_CLZ */
 # if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-	andi	t0, v0, 0xff
-	beq	t0, zero, L(wexit01)
-	andi	t1, v1, 0xff
-	bne	t0, t1, L(wexit01)
-
-	srl	t8, v0, 8
-	srl	t9, v1, 8
-	andi	t8, t8, 0xff
+	andi	a0, a2, 0xff	/* abcd => d */
+	andi	a1, a3, 0xff
+	beq	a0, zero, L(wexit01)
+# if USE_EXT
+	ext	t8, a2, 8, 8
+	bne	a0, a1, L(wexit01)
+	ext	t9, a3, 8, 8
 	beq	t8, zero, L(wexit89)
+	ext	a0, a2, 16, 8
+	bne	t8, t9, L(wexit89)
+	ext	a1, a3, 16, 8
+# else /* !USE_EXT */
+	srl	t8, a2, 8
+	bne	a0, a1, L(wexit01)
+	srl	t9, a3, 8
+	andi	t8, t8, 0xff
 	andi	t9, t9, 0xff
+	beq	t8, zero, L(wexit89)
+	srl	a0, a2, 16
 	bne	t8, t9, L(wexit89)
+	srl	a1, a3, 16
+	andi	a0, a0, 0xff
+	andi	a1, a1, 0xff
+# endif /* !USE_EXT */
 
-	srl	t0, v0, 16
-	srl	t1, v1, 16
-	andi	t0, t0, 0xff
-	beq	t0, zero, L(wexit01)
-	andi	t1, t1, 0xff
-	bne	t0, t1, L(wexit01)
-
-	srl	t8, v0, 24
-	srl	t9, v1, 24
 # else /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
-	srl	t0, v0, 24
-	beq	t0, zero, L(wexit01)
-	srl	t1, v1, 24
-	bne	t0, t1, L(wexit01)
+	srl	a0, a2, 24	/* abcd => a */
+	srl	a1, a3, 24
+	beq	a0, zero, L(wexit01)
 
-	srl	t8, v0, 16
-	srl	t9, v1, 16
-	andi	t8, t8, 0xff
+# if USE_EXT
+	ext	t8, a2, 16, 8
+	bne	a0, a1, L(wexit01)
+	ext	t9, a3, 16, 8
 	beq	t8, zero, L(wexit89)
+	ext	a0, a2, 8, 8
+	bne	t8, t9, L(wexit89)
+	ext	a1, a3, 8, 8
+# else /* !USE_EXT */
+	srl	t8, a2, 16
+	bne	a0, a1, L(wexit01)
+	srl	t9, a3, 16
+	andi	t8, t8, 0xff
 	andi	t9, t9, 0xff
+	beq	t8, zero, L(wexit89)
+	srl	a0, a2, 8
 	bne	t8, t9, L(wexit89)
+	srl	a1, a3, 8
+	andi	a0, a0, 0xff
+	andi	a1, a1, 0xff
+# endif /* !USE_EXT */
 
-	srl	t0, v0, 8
-	srl	t1, v1, 8
-	andi	t0, t0, 0xff
-	beq	t0, zero, L(wexit01)
-	andi	t1, t1, 0xff
-	bne	t0, t1, L(wexit01)
-
-	andi	t8, v0, 0xff
-	andi	t9, v1, 0xff
 # endif /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
 
+	beq	a0, zero, L(wexit01)
+	bne	a0, a1, L(wexit01)
+
+	/* The other bytes are identical, so just subtract the 2 words
+	   and return the difference.  */
+	move	a0, a2
+	move	a1, a3
+
+L(wexit01):
+	subu	v0, a0, a1
+	jr	ra
+
 L(wexit89):
-	j	ra
 	subu	v0, t8, t9
-L(wexit01):
-	j	ra
-	subu	v0, t0, t1
+	jr	ra
+
 #endif /* USE_CLZ */
 
+#define DELAY_NOP nop
+
 /* It might seem better to do the 'beq' instruction between the two 'lbu'
    instructions so that the nop is not needed but testing showed that this
    code is actually faster (based on glibc strcmp test).  */
-#define BYTECMP01(OFFSET) \
-	lbu	v0, OFFSET(a0); \
-	lbu	v1, OFFSET(a1); \
-	beq	v0, zero, L(bexit01); \
-	nop; \
-	bne	v0, v1, L(bexit01)
-
-#define BYTECMP89(OFFSET) \
-	lbu	t8, OFFSET(a0); \
+
+#define BYTECMP01(OFFSET) \
+	lbu	a3, OFFSET(a1); \
+	DELAY_NOP; \
+	beq	a2, zero, L(bexit01); \
+	lbu	t8, OFFSET+1(a0); \
+	bne	a2, a3, L(bexit01)
+
+#define BYTECMP89(OFFSET) \
 	lbu	t9, OFFSET(a1); \
+	DELAY_NOP; \
 	beq	t8, zero, L(bexit89); \
-	nop; \
+	lbu	a2, OFFSET+1(a0); \
 	bne	t8, t9, L(bexit89)
 
+	.align 2
 L(byteloop):
+	lbu	a2, 0(a0)
 	BYTECMP01(0)
 	BYTECMP89(1)
 	BYTECMP01(2)
@@ -226,20 +270,22 @@ L(byteloop):
 	BYTECMP01(4)
 	BYTECMP89(5)
 	BYTECMP01(6)
-	BYTECMP89(7)
+	lbu	t9, 7(a1)
+
 	PTR_ADDIU a0, a0, 8
-	b	L(byteloop)
+	beq	t8, zero, L(bexit89)
 	PTR_ADDIU a1, a1, 8
+	beq	t8, t9, L(byteloop)
 
-L(bexit01):
-	j	ra
-	subu	v0, v0, v1
 L(bexit89):
-	j	ra
 	subu	v0, t8, t9
+	jr	ra
+
+L(bexit01):
+	subu	v0, a2, a3
+	jr	ra
 
 	.set	at
-	.set	reorder
 
 END(STRCMP_NAME)
 #ifndef ANDROID_CHANGES