From patchwork Thu Jan 23 13:42:57 2025
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105296
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com,
 cfu@mips.com, Matthew Fortune, Andrew Bennett, Faraz Shahbazker
Subject: [PATCH 01/11] Updates for microMIPS Release 6
Date: Thu, 23 Jan 2025 14:42:57 +0100
Message-Id: <20250123134308.1785777-3-aleksandar.rakic@htecgroup.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>

* Remove noreorder
* Fix PC relative code label calculations for microMIPSR6
* Add special versions
  of code that would be de-optimised by removing noreorder
* Avoid use of un-aligned ADDIUPC instruction for address calculation.

Cherry-picked 94a52199502361be4a5b1cc616661e287416cc8d
from https://github.com/MIPS/glibc

Signed-off-by: Matthew Fortune
Signed-off-by: Andrew Bennett
Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rakic
---
 sysdeps/mips/add_n.S                         |  12 +-
 sysdeps/mips/addmul_1.S                      |  11 +-
 sysdeps/mips/dl-machine.h                    |  15 ++-
 sysdeps/mips/dl-trampoline.c                 |   4 -
 sysdeps/mips/lshift.S                        |  12 +-
 sysdeps/mips/machine-gmon.h                  |  82 +++++++++++++
 sysdeps/mips/memcpy.S                        | 120 +++++++++++--------
 sysdeps/mips/memset.S                        |  62 +++++-----
 sysdeps/mips/mips32/crtn.S                   |  12 +-
 sysdeps/mips/mips64/__longjmp.c              |   2 +-
 sysdeps/mips/mips64/add_n.S                  |  12 +-
 sysdeps/mips/mips64/addmul_1.S               |  11 +-
 sysdeps/mips/mips64/lshift.S                 |  12 +-
 sysdeps/mips/mips64/mul_1.S                  |  11 +-
 sysdeps/mips/mips64/n32/crtn.S               |  12 +-
 sysdeps/mips/mips64/n64/crtn.S               |  12 +-
 sysdeps/mips/mips64/rshift.S                 |  12 +-
 sysdeps/mips/mips64/sub_n.S                  |  12 +-
 sysdeps/mips/mips64/submul_1.S               |  11 +-
 sysdeps/mips/mul_1.S                         |  11 +-
 sysdeps/mips/rshift.S                        |  12 +-
 sysdeps/mips/sub_n.S                         |  12 +-
 sysdeps/mips/submul_1.S                      |  11 +-
 sysdeps/mips/sys/asm.h                       |  20 +---
 sysdeps/unix/mips/mips32/sysdep.h            |   4 -
 sysdeps/unix/mips/mips64/sysdep.h            |   4 -
 sysdeps/unix/mips/sysdep.h                   |   2 -
 sysdeps/unix/sysv/linux/mips/mips32/sysdep.h |  10 --
 sysdeps/unix/sysv/linux/mips/mips64/sysdep.h |  14 ---
 29 files changed, 260 insertions(+), 277 deletions(-)

diff --git a/sysdeps/mips/add_n.S b/sysdeps/mips/add_n.S
index 234e1e3c8d..f4d98fa38c 100644
--- a/sysdeps/mips/add_n.S
+++ b/sysdeps/mips/add_n.S
@@ -31,19 +31,16 @@ along with the GNU MP Library.
 If not, see .
 	.option pic2
 #endif
 ENTRY (__mpn_add_n)
-	.set	noreorder
 #ifdef __PIC__
 	.cpload t9
 #endif
-	.set	nomacro
-
 	lw	$10,0($5)
 	lw	$11,0($6)

 	addiu	$7,$7,-1
 	and	$9,$7,4-1	/* number of limbs in first loop */
-	beq	$9,$0,L(L0)	/* if multiple of 4 limbs, skip first loop */
 	move	$2,$0
+	beq	$9,$0,L(L0)	/* if multiple of 4 limbs, skip first loop */

 	subu	$7,$7,$9

@@ -61,11 +58,10 @@ L(Loop0):	addiu	$9,$9,-1
 	addiu	$6,$6,4
 	move	$10,$12
 	move	$11,$13
-	bne	$9,$0,L(Loop0)
 	addiu	$4,$4,4
+	bne	$9,$0,L(Loop0)

 L(L0):	beq	$7,$0,L(end)
-	nop

 L(Loop):	addiu	$7,$7,-4
@@ -108,14 +104,14 @@ L(Loop):	addiu	$7,$7,-4
 	addiu	$5,$5,16
 	addiu	$6,$6,16
-	bne	$7,$0,L(Loop)
 	addiu	$4,$4,16
+	bne	$7,$0,L(Loop)

 L(end):	addu	$11,$11,$2
 	sltu	$8,$11,$2
 	addu	$11,$10,$11
 	sltu	$2,$11,$10
 	sw	$11,0($4)
-	j	$31
 	or	$2,$2,$8
+	jr	$31
 	END (__mpn_add_n)

diff --git a/sysdeps/mips/addmul_1.S b/sysdeps/mips/addmul_1.S
index 523478d7e8..eea26630fc 100644
--- a/sysdeps/mips/addmul_1.S
+++ b/sysdeps/mips/addmul_1.S
@@ -31,12 +31,9 @@ along with the GNU MP Library.
 If not, see .
 	.option pic2
 #endif
 ENTRY (__mpn_addmul_1)
-	.set	noreorder
 #ifdef __PIC__
 	.cpload t9
 #endif
-	.set	nomacro
-
 	/* warm up phase 0 */
 	lw	$8,0($5)
@@ -50,12 +47,12 @@ ENTRY (__mpn_addmul_1)
 #endif
 	addiu	$6,$6,-1
-	beq	$6,$0,L(LC0)
 	move	$2,$0	/* zero cy2 */
+	beq	$6,$0,L(LC0)

 	addiu	$6,$6,-1
-	beq	$6,$0,L(LC1)
 	lw	$8,0($5)	/* load new s1 limb as early as possible */
+	beq	$6,$0,L(LC1)

 L(Loop):	lw	$10,0($4)
 #if __mips_isa_rev < 6
@@ -81,8 +78,8 @@ L(Loop):	lw	$10,0($4)
 	addu	$2,$2,$10
 	sw	$3,0($4)
 	addiu	$4,$4,4
-	bne	$6,$0,L(Loop)	/* should be "bnel" */
 	addu	$2,$9,$2	/* add high product limb and carry from addition */
+	bne	$6,$0,L(Loop)	/* should be "bnel" */

 /* cool down phase 1 */
 L(LC1):	lw	$10,0($4)
@@ -123,6 +120,6 @@ L(LC0):	lw	$10,0($4)
 	sltu	$10,$3,$10
 	addu	$2,$2,$10
 	sw	$3,0($4)
-	j	$31
 	addu	$2,$9,$2	/* add high product limb and carry from addition */
+	jr	$31
 	END (__mpn_addmul_1)

diff --git a/sysdeps/mips/dl-machine.h b/sysdeps/mips/dl-machine.h
index 10e30f1e90..a360dfcd63 100644
--- a/sysdeps/mips/dl-machine.h
+++ b/sysdeps/mips/dl-machine.h
@@ -127,16 +127,13 @@ elf_machine_load_address (void)
 {
   ElfW(Addr) addr;
 #ifndef __mips16
-  asm ("	.set noreorder\n"
-       "	" STRINGXP (PTR_LA) " %0, 0f\n"
+  asm ("	" STRINGXP (PTR_LA) " %0, 0f\n"
 # if !defined __mips_isa_rev || __mips_isa_rev < 6
        "	bltzal $0, 0f\n"
-       "	nop\n"
+#else
+       "	bal 0f\n"
+#endif
        "0:	" STRINGXP (PTR_SUBU) " %0, $31, %0\n"
-# else
-       "0:	addiupc $31, 0\n"
-       "	" STRINGXP (PTR_SUBU) " %0, $31, %0\n"
-# endif
       "	.set reorder\n"
       :	"=r" (addr)
       :	/* No inputs */
@@ -237,7 +234,9 @@ do { \
    and not just plain _start.  */

 #ifndef __mips16
-# if !defined __mips_isa_rev || __mips_isa_rev < 6
+/* Although microMIPSr6 has an ADDIUPC instruction, it must be 4-byte aligned
+   for the address calculation to be valid.  */
+# if !defined __mips_isa_rev || __mips_isa_rev < 6 || defined __mips_micromips
 # define LCOFF STRINGXP(.Lcof2)
 # define LOAD_31 STRINGXP(bltzal $8) "," STRINGXP(.Lcof2)
 # else

diff --git a/sysdeps/mips/dl-trampoline.c b/sysdeps/mips/dl-trampoline.c
index 603ee2d2f8..915e1da6ad 100644
--- a/sysdeps/mips/dl-trampoline.c
+++ b/sysdeps/mips/dl-trampoline.c
@@ -301,7 +301,6 @@ asm ("\n\
 	.ent _dl_runtime_resolve\n\
 _dl_runtime_resolve:\n\
 	.frame $29, " STRINGXP(ELF_DL_FRAME_SIZE) ", $31\n\
-	.set noreorder\n\
 	# Save GP.\n\
 1:	move $3, $28\n\
 	# Save arguments and sp value in stack.\n\
@@ -311,7 +310,6 @@ _dl_runtime_resolve:\n\
 	# Compute GP.\n\
 2:	" STRINGXP(SETUP_GP) "\n\
 	" STRINGXV(SETUP_GP64 (0, _dl_runtime_resolve)) "\n\
-	.set reorder\n\
 	# Save slot call pc.\n\
 	move $2, $31\n\
 	" IFABIO32(STRINGXP(CPRESTORE(32))) "\n\
@@ -358,7 +356,6 @@ asm ("\n\
 	.ent _dl_runtime_pltresolve\n\
 _dl_runtime_pltresolve:\n\
 	.frame $29, " STRINGXP(ELF_DL_PLT_FRAME_SIZE) ", $31\n\
-	.set noreorder\n\
 	# Save arguments and sp value in stack.\n\
 1:	" STRINGXP(PTR_SUBIU) " $29, " STRINGXP(ELF_DL_PLT_FRAME_SIZE) "\n\
 	" IFABIO32(STRINGXP(PTR_L) " $13, " STRINGXP(PTRSIZE) "($28)") "\n\
@@ -368,7 +365,6 @@ _dl_runtime_pltresolve:\n\
 	# Compute GP.\n\
 2:	" STRINGXP(SETUP_GP) "\n\
 	" STRINGXV(SETUP_GP64 (0, _dl_runtime_pltresolve)) "\n\
-	.set reorder\n\
 	" IFABIO32(STRINGXP(CPRESTORE(32))) "\n\
 	" ELF_DL_PLT_SAVE_ARG_REGS "\
 	move $4, $13\n\

diff --git a/sysdeps/mips/lshift.S b/sysdeps/mips/lshift.S
index 04caa76a84..c6c42aa1f5 100644
--- a/sysdeps/mips/lshift.S
+++ b/sysdeps/mips/lshift.S
@@ -30,12 +30,9 @@ along with the GNU MP Library.
 If not, see .
 	.option pic2
 #endif
 ENTRY (__mpn_lshift)
-	.set	noreorder
 #ifdef __PIC__
 	.cpload t9
 #endif
-	.set	nomacro
-
 	sll	$2,$6,2
 	addu	$5,$5,$2	/* make r5 point at end of src */
 	lw	$10,-4($5)	/* load first limb */
@@ -43,8 +40,8 @@ ENTRY (__mpn_lshift)
 	addu	$4,$4,$2	/* make r4 point at end of res */
 	addiu	$6,$6,-1
 	and	$9,$6,4-1	/* number of limbs in first loop */
-	beq	$9,$0,L(L0)	/* if multiple of 4 limbs, skip first loop */
 	srl	$2,$10,$13	/* compute function result */
+	beq	$9,$0,L(L0)	/* if multiple of 4 limbs, skip first loop */

 	subu	$6,$6,$9

@@ -56,11 +53,10 @@ L(Loop0):	lw	$3,-8($5)
 	srl	$12,$3,$13
 	move	$10,$3
 	or	$8,$11,$12
-	bne	$9,$0,L(Loop0)
 	sw	$8,0($4)
+	bne	$9,$0,L(Loop0)

 L(L0):	beq	$6,$0,L(Lend)
-	nop

 L(Loop):	lw	$3,-8($5)
 	addiu	$4,$4,-16
@@ -88,10 +84,10 @@ L(Loop):	lw	$3,-8($5)
 	addiu	$5,$5,-16
 	or	$8,$14,$9
-	bgtz	$6,L(Loop)
 	sw	$8,0($4)
+	bgtz	$6,L(Loop)

 L(Lend):	sll	$8,$10,$7
-	j	$31
 	sw	$8,-4($4)
+	jr	$31
 	END (__mpn_lshift)

diff --git a/sysdeps/mips/machine-gmon.h b/sysdeps/mips/machine-gmon.h
index e2e0756575..d890e5ec19 100644
--- a/sysdeps/mips/machine-gmon.h
+++ b/sysdeps/mips/machine-gmon.h
@@ -34,6 +34,42 @@ static void __attribute_used__ __mcount (u_long frompc, u_long selfpc)
 # define CPRESTORE
 #endif

+#if __mips_isa_rev > 5 && defined (__mips_micromips)
+#define MCOUNT asm(\
+".globl _mcount;\n\t" \
+".align 2;\n\t" \
+".set push;\n\t" \
+".set nomips16;\n\t" \
+".type _mcount,@function;\n\t" \
+".ent _mcount\n\t" \
+"_mcount:\n\t" \
+".frame $sp,44,$31\n\t" \
+".set noat;\n\t" \
+CPLOAD \
+"subu $29,$29,48;\n\t" \
+CPRESTORE \
+"sw $4,24($29);\n\t" \
+"sw $5,28($29);\n\t" \
+"sw $6,32($29);\n\t" \
+"sw $7,36($29);\n\t" \
+"sw $2,40($29);\n\t" \
+"sw $1,16($29);\n\t" \
+"sw $31,20($29);\n\t" \
+"move $5,$31;\n\t" \
+"move $4,$1;\n\t" \
+"balc __mcount;\n\t" \
+"lw $4,24($29);\n\t" \
+"lw $5,28($29);\n\t" \
+"lw $6,32($29);\n\t" \
+"lw $7,36($29);\n\t" \
+"lw $2,40($29);\n\t" \
+"lw $1,20($29);\n\t" \
+"lw $31,16($29);\n\t" \
+"addu $29,$29,56;\n\t" \
+"jrc $1;\n\t" \
+".end _mcount;\n\t" \
+".set pop");
+#else
 #define MCOUNT asm(\
 ".globl _mcount;\n\t" \
 ".align 2;\n\t" \
@@ -71,6 +107,7 @@ static void __attribute_used__ __mcount (u_long frompc, u_long selfpc)
 "move $31,$1;\n\t" \
 ".end _mcount;\n\t" \
 ".set pop");
+#endif

 #else

@@ -97,6 +134,50 @@ static void __attribute_used__ __mcount (u_long frompc, u_long selfpc)
 # error "Unknown ABI"
 #endif

+#if __mips_isa_rev > 5 && defined (__mips_micromips)
+#define MCOUNT asm(\
+".globl _mcount;\n\t" \
+".align 3;\n\t" \
+".set push;\n\t" \
+".set nomips16;\n\t" \
+".type _mcount,@function;\n\t" \
+".ent _mcount\n\t" \
+"_mcount:\n\t" \
+".frame $sp,88,$31\n\t" \
+".set noat;\n\t" \
+PTR_SUBU_STRING " $29,$29,96;\n\t" \
+CPSETUP \
+"sd $4,24($29);\n\t" \
+"sd $5,32($29);\n\t" \
+"sd $6,40($29);\n\t" \
+"sd $7,48($29);\n\t" \
+"sd $8,56($29);\n\t" \
+"sd $9,64($29);\n\t" \
+"sd $10,72($29);\n\t" \
+"sd $11,80($29);\n\t" \
+"sd $2,16($29);\n\t" \
+"sd $1,0($29);\n\t" \
+"sd $31,8($29);\n\t" \
+"move $5,$31;\n\t" \
+"move $4,$1;\n\t" \
+"balc __mcount;\n\t" \
+"ld $4,24($29);\n\t" \
+"ld $5,32($29);\n\t" \
+"ld $6,40($29);\n\t" \
+"ld $7,48($29);\n\t" \
+"ld $8,56($29);\n\t" \
+"ld $9,64($29);\n\t" \
+"ld $10,72($29);\n\t" \
+"ld $11,80($29);\n\t" \
+"ld $2,16($29);\n\t" \
+"ld $1,8($29);\n\t" \
+"ld $31,0($29);\n\t" \
+CPRETURN \
+PTR_ADDU_STRING " $29,$29,96;\n\t" \
+"jrc $1;\n\t" \
+".end _mcount;\n\t" \
+".set pop");
+#else
 #define MCOUNT asm(\
 ".globl _mcount;\n\t" \
 ".align 3;\n\t" \
@@ -142,5 +223,6 @@ static void __attribute_used__ __mcount (u_long frompc, u_long selfpc)
 "move $31,$1;\n\t" \
 ".end _mcount;\n\t" \
 ".set pop");
+#endif

 #endif

diff --git a/sysdeps/mips/memcpy.S b/sysdeps/mips/memcpy.S
index 5b277e07c5..96d1c92d89 100644
--- a/sysdeps/mips/memcpy.S
+++ b/sysdeps/mips/memcpy.S
@@ -86,6 +86,12 @@
 # endif
 #endif

+#if __mips_isa_rev > 5 && defined (__mips_micromips)
+# define PTR_BC bc16
+#else
+# define PTR_BC bc
+#endif
+
 /*
  * Using PREFETCH_HINT_LOAD_STREAMED instead of PREFETCH_LOAD on load
  * prefetches appear to offer a slight performance advantage.
@@ -272,7 +278,6 @@ LEAF(MEMCPY_NAME, 0)
 LEAF(MEMCPY_NAME)
 #endif
 	.set	nomips16
-	.set	noreorder
 /*
  * Below we handle the case where memcpy is called with overlapping src and dst.
  * Although memcpy is not required to handle this case, some parts of Android
@@ -284,10 +289,9 @@ LEAF(MEMCPY_NAME)
 	xor	t1,t0,t2
 	PTR_SUBU t0,t1,t2
 	sltu	t2,t0,a2
-	beq	t2,zero,L(memcpy)
 	la	t9,memmove
+	beq	t2,zero,L(memcpy)
 	jr	t9
-	nop
 L(memcpy):
 #endif
 /*
@@ -295,12 +299,12 @@ L(memcpy):
  * size, copy dst pointer to v0 for the return value.
  */
 	slti	t2,a2,(2 * NSIZE)
-	bne	t2,zero,L(lasts)
 #if defined(RETURN_FIRST_PREFETCH) || defined(RETURN_LAST_PREFETCH)
 	move	v0,zero
 #else
 	move	v0,a0
 #endif
+	bne	t2,zero,L(lasts)

 #ifndef R6_CODE

@@ -312,12 +316,12 @@ L(memcpy):
  */
 	xor	t8,a1,a0
 	andi	t8,t8,(NSIZE-1)	/* t8 is a0/a1 word-displacement */
-	bne	t8,zero,L(unaligned)
 	PTR_SUBU a3, zero, a0
+	bne	t8,zero,L(unaligned)

 	andi	a3,a3,(NSIZE-1)	/* copy a3 bytes to align a0/a1 */
+	PTR_SUBU a2,a2,a3	/* a2 is the remaining bytes count */
 	beq	a3,zero,L(aligned)	/* if a3=0, it is already aligned */
-	PTR_SUBU a2,a2,a3	/* a2 is the remaining bytes count */

 	C_LDHI	t8,0(a1)
 	PTR_ADDU a1,a1,a3
@@ -332,18 +336,24 @@ L(memcpy):
  * align instruction.
  */
 	andi	t8,a0,7
+#ifdef __mips_micromips
+	auipc	t9,%pcrel_hi(L(atable))
+	addiu	t9,t9,%pcrel_lo(L(atable)+4)
+	PTR_LSA	t9,t8,t9,1
+#else
 	lapc	t9,L(atable)
 	PTR_LSA	t9,t8,t9,2
+#endif
 	jrc	t9
 L(atable):
-	bc	L(lb0)
-	bc	L(lb7)
-	bc	L(lb6)
-	bc	L(lb5)
-	bc	L(lb4)
-	bc	L(lb3)
-	bc	L(lb2)
-	bc	L(lb1)
+	PTR_BC	L(lb0)
+	PTR_BC	L(lb7)
+	PTR_BC	L(lb6)
+	PTR_BC	L(lb5)
+	PTR_BC	L(lb4)
+	PTR_BC	L(lb3)
+	PTR_BC	L(lb2)
+	PTR_BC	L(lb1)
 L(lb7):
 	lb	a3, 6(a1)
 	sb	a3, 6(a0)
@@ -374,20 +384,26 @@ L(lb1):

 L(lb0):

 	andi	t8,a1,(NSIZE-1)
+#ifdef __mips_micromips
+	auipc	t9,%pcrel_hi(L(jtable))
+	addiu	t9,t9,%pcrel_lo(L(jtable)+4)
+	PTR_LSA	t9,t8,t9,1
+#else
 	lapc	t9,L(jtable)
 	PTR_LSA	t9,t8,t9,2
+#endif
 	jrc	t9
 L(jtable):
-	bc	L(aligned)
-	bc	L(r6_unaligned1)
-	bc	L(r6_unaligned2)
-	bc	L(r6_unaligned3)
-# ifdef USE_DOUBLE
-	bc	L(r6_unaligned4)
-	bc	L(r6_unaligned5)
-	bc	L(r6_unaligned6)
-	bc	L(r6_unaligned7)
-# endif
+	PTR_BC	L(aligned)
+	PTR_BC	L(r6_unaligned1)
+	PTR_BC	L(r6_unaligned2)
+	PTR_BC	L(r6_unaligned3)
+#ifdef USE_DOUBLE
+	PTR_BC	L(r6_unaligned4)
+	PTR_BC	L(r6_unaligned5)
+	PTR_BC	L(r6_unaligned6)
+	PTR_BC	L(r6_unaligned7)
+#endif
 #endif /* R6_CODE */

 L(aligned):
@@ -401,8 +417,8 @@ L(aligned):
  */
 	andi	t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */
-	beq	a2,t8,L(chkw)	 /* if a2==t8, no 64-byte/128-byte chunks */
 	PTR_SUBU a3,a2,t8	 /* subtract from a2 the reminder */
+	beq	a2,t8,L(chkw)	 /* if a2==t8, no 64-byte/128-byte chunks */
 	PTR_ADDU a3,a0,a3	 /* Now a3 is the final dst after loop */

 /* When in the loop we may prefetch with the 'prepare to store' hint,
@@ -428,7 +444,6 @@ L(aligned):
 # if PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE
 	sltu	v1,t9,a0
 	bgtz	v1,L(skip_set)
-	nop
 	PTR_ADDIU v0,a0,(PREFETCH_CHUNK*4)
 L(skip_set):
 # else
@@ -444,11 +459,16 @@ L(skip_set):
 #endif
 L(loop16w):
 	C_LD	t0,UNIT(0)(a1)
+/* We need to separate out the C_LD instruction here so that it will work
+   both when it is used by itself and when it is used with the branch
+   instruction.  */
 #if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
 	sltu	v1,t9,a0	/* If a0 > t9 don't use next prefetch */
+	C_LD	t1,UNIT(1)(a1)
 	bgtz	v1,L(skip_pref)
-#endif
+#else
 	C_LD	t1,UNIT(1)(a1)
+#endif
 #ifdef R6_CODE
 	PREFETCH_FOR_STORE (2, a0)
 #else
@@ -502,8 +522,8 @@ L(skip_pref):
 	C_ST	REG6,UNIT(14)(a0)
 	C_ST	REG7,UNIT(15)(a0)
 	PTR_ADDIU a0,a0,UNIT(16)	/* adding 64/128 to dest */
-	bne	a0,a3,L(loop16w)
 	PTR_ADDIU a1,a1,UNIT(16)	/* adding 64/128 to src */
+	bne	a0,a3,L(loop16w)
 	move	a2,t8

 /* Here we have src and dest word-aligned but less than 64-bytes or
@@ -517,7 +537,6 @@ L(chkw):
 	andi	t8,a2,NSIZEMASK	/* Is there a 32-byte/64-byte chunk.  */
 				/* The t8 is the reminder count past 32-bytes */
 	beq	a2,t8,L(chk1w)	/* When a2=t8, no 32-byte chunk */
-	nop
 	C_LD	t0,UNIT(0)(a1)
 	C_LD	t1,UNIT(1)(a1)
 	C_LD	REG2,UNIT(2)(a1)
@@ -546,8 +565,8 @@ L(chkw):
  */
 L(chk1w):
 	andi	a2,t8,(NSIZE-1)	/* a2 is the reminder past one (d)word chunks */
-	beq	a2,t8,L(lastw)
 	PTR_SUBU a3,t8,a2	/* a3 is count of bytes in one (d)word chunks */
+	beq	a2,t8,L(lastw)
 	PTR_ADDU a3,a0,a3	/* a3 is the dst address after loop */

 /* copying in words (4-byte or 8-byte chunks) */
@@ -555,8 +574,8 @@ L(wordCopy_loop):
 	C_LD	REG3,UNIT(0)(a1)
 	PTR_ADDIU a0,a0,UNIT(1)
 	PTR_ADDIU a1,a1,UNIT(1)
-	bne	a0,a3,L(wordCopy_loop)
 	C_ST	REG3,UNIT(-1)(a0)
+	bne	a0,a3,L(wordCopy_loop)

 /* If we have been copying double words, see if we can copy a single word
    before doing byte copies.  We can have, at most, one word to copy.  */
@@ -574,17 +593,16 @@ L(lastw):

 /* Copy the last 8 (or 16) bytes */
 L(lastb):
-	blez	a2,L(leave)
 	PTR_ADDU a3,a0,a2	/* a3 is the last dst address */
+	blez	a2,L(leave)
 L(lastbloop):
 	lb	v1,0(a1)
 	PTR_ADDIU a0,a0,1
 	PTR_ADDIU a1,a1,1
-	bne	a0,a3,L(lastbloop)
 	sb	v1,-1(a0)
+	bne	a0,a3,L(lastbloop)
 L(leave):
-	j	ra
-	nop
+	jr	ra

 /* We jump here with a memcpy of less than 8 or 16 bytes, depending on
    whether or not USE_DOUBLE is defined.  Instead of just doing byte
@@ -625,8 +643,8 @@ L(wcopy_loop):

 L(unaligned):
 	andi	a3,a3,(NSIZE-1)	/* copy a3 bytes to align a0/a1 */
+	PTR_SUBU a2,a2,a3	/* a2 is the remaining bytes count */
 	beqz	a3,L(ua_chk16w)	/* if a3=0, it is already aligned */
-	PTR_SUBU a2,a2,a3	/* a2 is the remaining bytes count */

 	C_LDHI	v1,UNIT(0)(a1)
 	C_LDLO	v1,UNITM1(1)(a1)
@@ -644,8 +662,8 @@ L(unaligned):

 L(ua_chk16w):
 	andi	t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */
-	beq	a2,t8,L(ua_chkw) /* if a2==t8, no 64-byte/128-byte chunks */
 	PTR_SUBU a3,a2,t8	 /* subtract from a2 the reminder */
+	beq	a2,t8,L(ua_chkw) /* if a2==t8, no 64-byte/128-byte chunks */
 	PTR_ADDU a3,a0,a3	 /* Now a3 is the final dst after loop */

 # if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
@@ -664,7 +682,6 @@ L(ua_chk16w):
 # if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
 	sltu	v1,t9,a0
 	bgtz	v1,L(ua_skip_set)
-	nop
 	PTR_ADDIU v0,a0,(PREFETCH_CHUNK*4)
 L(ua_skip_set):
 # else
@@ -676,11 +693,16 @@ L(ua_loop16w):
 	C_LDHI	t0,UNIT(0)(a1)
 	C_LDHI	t1,UNIT(1)(a1)
 	C_LDHI	REG2,UNIT(2)(a1)
+/* We need to separate out the C_LDHI instruction here so that it will work
+   both when it is used by itself and when it is used with the branch
+   instruction.  */
 # if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
 	sltu	v1,t9,a0
+	C_LDHI	REG3,UNIT(3)(a1)
 	bgtz	v1,L(ua_skip_pref)
-# endif
+# else
 	C_LDHI	REG3,UNIT(3)(a1)
+# endif
 	PREFETCH_FOR_STORE (4, a0)
 	PREFETCH_FOR_STORE (5, a0)
 L(ua_skip_pref):
@@ -731,8 +753,8 @@ L(ua_skip_pref):
 	C_ST	REG6,UNIT(14)(a0)
 	C_ST	REG7,UNIT(15)(a0)
 	PTR_ADDIU a0,a0,UNIT(16)	/* adding 64/128 to dest */
-	bne	a0,a3,L(ua_loop16w)
 	PTR_ADDIU a1,a1,UNIT(16)	/* adding 64/128 to src */
+	bne	a0,a3,L(ua_loop16w)
 	move	a2,t8

 /* Here we have src and dest word-aligned but less than 64-bytes or
@@ -745,7 +767,6 @@ L(ua_chkw):
 	andi	t8,a2,NSIZEMASK  /* Is there a 32-byte/64-byte chunk.  */
 				 /* t8 is the reminder count past 32-bytes */
 	beq	a2,t8,L(ua_chk1w) /* When a2=t8, no 32-byte chunk */
-	nop
 	C_LDHI	t0,UNIT(0)(a1)
 	C_LDHI	t1,UNIT(1)(a1)
 	C_LDHI	REG2,UNIT(2)(a1)
@@ -778,8 +799,8 @@ L(ua_chkw):
  */
 L(ua_chk1w):
 	andi	a2,t8,(NSIZE-1)	/* a2 is the reminder past one (d)word chunks */
-	beq	a2,t8,L(ua_smallCopy)
 	PTR_SUBU a3,t8,a2	/* a3 is count of bytes in one (d)word chunks */
+	beq	a2,t8,L(ua_smallCopy)
 	PTR_ADDU a3,a0,a3	/* a3 is the dst address after loop */

 /* copying in words (4-byte or 8-byte chunks) */
@@ -788,22 +809,21 @@ L(ua_wordCopy_loop):
 	C_LDLO	v1,UNITM1(1)(a1)
 	PTR_ADDIU a0,a0,UNIT(1)
 	PTR_ADDIU a1,a1,UNIT(1)
-	bne	a0,a3,L(ua_wordCopy_loop)
 	C_ST	v1,UNIT(-1)(a0)
+	bne	a0,a3,L(ua_wordCopy_loop)

 /* Copy the last 8 (or 16) bytes */
 L(ua_smallCopy):
-	beqz	a2,L(leave)
 	PTR_ADDU a3,a0,a2	/* a3 is the last dst address */
+	beqz	a2,L(leave)
 L(ua_smallCopy_loop):
 	lb	v1,0(a1)
 	PTR_ADDIU a0,a0,1
 	PTR_ADDIU a1,a1,1
-	bne	a0,a3,L(ua_smallCopy_loop)
 	sb	v1,-1(a0)
+	bne	a0,a3,L(ua_smallCopy_loop)

-	j	ra
-	nop
+	jr	ra

 #else /* R6_CODE */

@@ -816,9 +836,9 @@ L(ua_smallCopy_loop):
 # endif
 # define R6_UNALIGNED_WORD_COPY(BYTEOFFSET) \
 	andi	REG7, a2, (NSIZE-1);/* REG7 is # of bytes to by bytes.     */ \
-	beq	REG7, a2, L(lastb); /* Check for bytes to copy by word	   */ \
 	PTR_SUBU a3, a2, REG7;	/* a3 is number of bytes to be copied in   */ \
				/* (d)word chunks.			   */ \
+	beq	REG7, a2, L(lastb); /* Check for bytes to copy by word	   */ \
 	move	a2, REG7;	/* a2 is # of bytes to copy byte by byte   */ \
				/* after word loop is finished.		   */ \
 	PTR_ADDU REG6, a0, a3;	/* REG6 is the dst address after loop.	   */ \
@@ -831,10 +851,9 @@ L(r6_ua_wordcopy##BYTEOFFSET): \
 	PTR_ADDIU a0, a0, UNIT(1);	/* Increment destination pointer.  */ \
 	PTR_ADDIU REG2, REG2, UNIT(1);	/* Increment aligned source pointer.*/ \
 	move	t0, t1;		/* Move second part of source to first.	   */ \
-	bne	a0, REG6,L(r6_ua_wordcopy##BYTEOFFSET); \
 	C_ST	REG3, UNIT(-1)(a0); \
+	bne	a0, REG6,L(r6_ua_wordcopy##BYTEOFFSET); \
 	j	L(lastb); \
-	nop

 /* We are generating R6 code, the destination is 4 byte aligned and
    the source is not 4 byte aligned. t8 is 1, 2, or 3 depending on the
@@ -859,7 +878,6 @@ L(r6_unaligned7):
 #endif /* R6_CODE */

 	.set	at
-	.set	reorder

 END(MEMCPY_NAME)
 #ifndef ANDROID_CHANGES
 # ifdef _LIBC

diff --git a/sysdeps/mips/memset.S b/sysdeps/mips/memset.S
index 466599b9f4..0c8375c9f5 100644
--- a/sysdeps/mips/memset.S
+++ b/sysdeps/mips/memset.S
@@ -82,6 +82,12 @@
 # endif
 #endif

+#if __mips_isa_rev > 5 && defined (__mips_micromips)
+# define PTR_BC bc16
+#else
+# define PTR_BC bc
+#endif
+
 /* Using PREFETCH_HINT_PREPAREFORSTORE instead of PREFETCH_STORE
    or PREFETCH_STORE_STREAMED offers a large performance advantage
    but PREPAREFORSTORE has some special restrictions to consider.
@@ -205,17 +211,16 @@ LEAF(MEMSET_NAME)
 #endif

 	.set	nomips16
-	.set	noreorder
-/* If the size is less than 2*NSIZE (8 or 16), go to L(lastb).  Regardless of
+/* If the size is less than 4*NSIZE (16 or 32), go to L(lastb).  Regardless of
    size, copy dst pointer to v0 for the return value.  */
-	slti	t2,a2,(2 * NSIZE)
-	bne	t2,zero,L(lastb)
+	slti	t2,a2,(4 * NSIZE)
 	move	v0,a0
+	bne	t2,zero,L(lastb)

 /* If memset value is not zero, we copy it to all the bytes in a 32 or 64
    bit word.  */
-	beq	a1,zero,L(set0)	/* If memset value is zero no smear */
 	PTR_SUBU a3,zero,a0
+	beq	a1,zero,L(set0)	/* If memset value is zero no smear */
 	nop

 /* smear byte into 32 or 64 bit word */
@@ -251,26 +256,30 @@ LEAF(MEMSET_NAME)
 L(set0):
 #ifndef R6_CODE
 	andi	t2,a3,(NSIZE-1)	/* word-unaligned address?  */
-	beq	t2,zero,L(aligned)	/* t2 is the unalignment count */
 	PTR_SUBU a2,a2,t2
+	beq	t2,zero,L(aligned)	/* t2 is the unalignment count */
 	C_STHI	a1,0(a0)
 	PTR_ADDU a0,a0,t2
 #else /* R6_CODE */
-	andi	t2,a0,(NSIZE-1)
+	andi	t2,a0,7
+# ifdef __mips_micromips
+	auipc	t9,%pcrel_hi(L(atable))
+	addiu	t9,t9,%pcrel_lo(L(atable)+4)
+	PTR_LSA	t9,t2,t9,1
+# else
 	lapc	t9,L(atable)
 	PTR_LSA	t9,t2,t9,2
+# endif
 	jrc	t9
 L(atable):
-	bc	L(aligned)
-# ifdef USE_DOUBLE
-	bc	L(lb7)
-	bc	L(lb6)
-	bc	L(lb5)
-	bc	L(lb4)
-# endif
-	bc	L(lb3)
-	bc	L(lb2)
-	bc	L(lb1)
+	PTR_BC	L(aligned)
+	PTR_BC	L(lb7)
+	PTR_BC	L(lb6)
+	PTR_BC	L(lb5)
+	PTR_BC	L(lb4)
+	PTR_BC	L(lb3)
+	PTR_BC	L(lb2)
+	PTR_BC	L(lb1)
 L(lb7):
 	sb	a1,6(a0)
 L(lb6):
@@ -300,8 +309,8 @@ L(aligned):
    left to store or we would have jumped to L(lastb) earlier in the code.  */
 #ifdef DOUBLE_ALIGN
 	andi	t2,a3,4
-	beq	t2,zero,L(double_aligned)
 	PTR_SUBU a2,a2,t2
+	beq	t2,zero,L(double_aligned)
 	sw	a1,0(a0)
 	PTR_ADDU a0,a0,t2
 L(double_aligned):
@@ -313,8 +322,8 @@ L(double_aligned):
    chunks have been copied.  We will loop, incrementing a0 until it equals
   a3.  */
 	andi	t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */
-	beq	a2,t8,L(chkw)	 /* if a2==t8, no 64-byte/128-byte chunks */
 	PTR_SUBU a3,a2,t8	 /* subtract from a2 the reminder */
+	beq	a2,t8,L(chkw)	 /* if a2==t8, no 64-byte/128-byte chunks */
 	PTR_ADDU a3,a0,a3	 /* Now a3 is the final dst after loop */

 /* When in the loop we may prefetch with the 'prepare to store' hint,
@@ -339,7 +348,6 @@ L(loop16w):
    && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE)
 	sltu	v1,t9,a0	/* If a0 > t9 don't use next prefetch */
 	bgtz	v1,L(skip_pref)
-	nop
 #endif
 #ifdef R6_CODE
 	PREFETCH_FOR_STORE (2, a0)
@@ -366,7 +374,6 @@ L(skip_pref):
 	C_ST	a1,UNIT(15)(a0)
 	PTR_ADDIU a0,a0,UNIT(16)	/* adding 64/128 to dest */
 	bne	a0,a3,L(loop16w)
-	nop
 	move	a2,t8

 /* Here we have dest word-aligned but less than 64-bytes or 128 bytes to go.
@@ -376,7 +383,6 @@ L(chkw):
 	andi	t8,a2,NSIZEMASK	/* is there a 32-byte/64-byte chunk.  */
				/* the t8 is the reminder count past 32-bytes */
 	beq	a2,t8,L(chk1w)	/* when a2==t8, no 32-byte chunk */
-	nop
 	C_ST	a1,UNIT(0)(a0)
 	C_ST	a1,UNIT(1)(a0)
 	C_ST	a1,UNIT(2)(a0)
@@ -394,30 +400,28 @@ L(chkw):
    been copied.  We will loop, incrementing a0 until a0 equals a3.  */
 L(chk1w):
 	andi	a2,t8,(NSIZE-1)	/* a2 is the reminder past one (d)word chunks */
-	beq	a2,t8,L(lastb)
 	PTR_SUBU a3,t8,a2	/* a3 is count of bytes in one (d)word chunks */
+	beq	a2,t8,L(lastb)
 	PTR_ADDU a3,a0,a3	/* a3 is the dst address after loop */

 /* copying in words (4-byte or 8 byte chunks) */
 L(wordCopy_loop):
 	PTR_ADDIU a0,a0,UNIT(1)
-	bne	a0,a3,L(wordCopy_loop)
 	C_ST	a1,UNIT(-1)(a0)
+	bne	a0,a3,L(wordCopy_loop)

 /* Copy the last 8 (or 16) bytes */
 L(lastb):
-	blez	a2,L(leave)
 	PTR_ADDU a3,a0,a2	/* a3 is the last dst address */
+	blez	a2,L(leave)
 L(lastbloop):
 	PTR_ADDIU a0,a0,1
-	bne	a0,a3,L(lastbloop)
 	sb	a1,-1(a0)
+	bne	a0,a3,L(lastbloop)
 L(leave):
-	j	ra
-	nop
+	jr	ra

 	.set	at
-	.set	reorder
 END(MEMSET_NAME)
 #ifndef ANDROID_CHANGES
 # ifdef _LIBC

diff --git a/sysdeps/mips/mips32/crtn.S b/sysdeps/mips/mips32/crtn.S
index 89ecbd9882..568aabd86e 100644
--- a/sysdeps/mips/mips32/crtn.S
+++ b/sysdeps/mips/mips32/crtn.S
@@ -40,18 +40,10 @@

 	.section .init,"ax",@progbits
 	lw	$31,28($sp)
-	.set	noreorder
-	.set	nomacro
-	j	$31
 	addiu	$sp,$sp,32
-	.set	macro
-	.set	reorder
+	jr	$31

 	.section .fini,"ax",@progbits
 	lw	$31,28($sp)
-	.set	noreorder
-	.set	nomacro
-	j	$31
 	addiu	$sp,$sp,32
-	.set	macro
-	.set	reorder
+	jr	$31

diff --git a/sysdeps/mips/mips64/__longjmp.c b/sysdeps/mips/mips64/__longjmp.c
index 4a93e884c0..1a9bb7b23e 100644
--- a/sysdeps/mips/mips64/__longjmp.c
+++ b/sysdeps/mips/mips64/__longjmp.c
@@ -87,7 +87,7 @@ __longjmp (__jmp_buf env_arg, int val_arg)
   else
     asm volatile ("move $2, %0" : : "r" (val));

-  asm volatile ("j $31");
+  asm volatile ("jr $31");

   /* Avoid `volatile function does return' warnings.  */
   for (;;);

diff --git a/sysdeps/mips/mips64/add_n.S b/sysdeps/mips/mips64/add_n.S
index 345d62fbc5..bab523fd5a 100644
--- a/sysdeps/mips/mips64/add_n.S
+++ b/sysdeps/mips/mips64/add_n.S
@@ -37,16 +37,13 @@ ENTRY (__mpn_add_n)
 #ifdef __PIC__
 	SETUP_GP /* ??? unused */
 #endif
-	.set	noreorder
-	.set	nomacro
-
 	ld	$10,0($5)
 	ld	$11,0($6)

 	daddiu	$7,$7,-1
 	and	$9,$7,4-1	# number of limbs in first loop
-	beq	$9,$0,L(L0)	# if multiple of 4 limbs, skip first loop
 	move	$2,$0
+	beq	$9,$0,L(L0)	# if multiple of 4 limbs, skip first loop

 	dsubu	$7,$7,$9

@@ -64,11 +61,10 @@ L(Loop0):	daddiu	$9,$9,-1
 	daddiu	$6,$6,8
 	move	$10,$12
 	move	$11,$13
-	bne	$9,$0,L(Loop0)
 	daddiu	$4,$4,8
+	bne	$9,$0,L(Loop0)

 L(L0):	beq	$7,$0,L(Lend)
-	nop

 L(Loop):	daddiu	$7,$7,-4
@@ -111,15 +107,15 @@ L(Loop):	daddiu	$7,$7,-4
 	daddiu	$5,$5,32
 	daddiu	$6,$6,32
-	bne	$7,$0,L(Loop)
 	daddiu	$4,$4,32
+	bne	$7,$0,L(Loop)

 L(Lend):	daddu	$11,$11,$2
 	sltu	$8,$11,$2
 	daddu	$11,$10,$11
 	sltu	$2,$11,$10
 	sd	$11,0($4)
-	j	$31
 	or	$2,$2,$8
+	jr	$31
 	END (__mpn_add_n)

diff --git a/sysdeps/mips/mips64/addmul_1.S b/sysdeps/mips/mips64/addmul_1.S
index d105938f00..d84edd76a0 100644
--- a/sysdeps/mips/mips64/addmul_1.S
+++ b/sysdeps/mips/mips64/addmul_1.S
@@ -36,9 +36,6 @@ ENTRY (__mpn_addmul_1)
 #ifdef PIC
 	SETUP_GP /* ??? unused */
 #endif
-	.set	noreorder
-	.set	nomacro
-
 	# warm up phase 0
 	ld	$8,0($5)
@@ -52,12 +49,12 @@ ENTRY (__mpn_addmul_1)
 #endif
 	daddiu	$6,$6,-1
-	beq	$6,$0,L(LC0)
 	move	$2,$0	# zero cy2
+	beq	$6,$0,L(LC0)

 	daddiu	$6,$6,-1
-	beq	$6,$0,L(LC1)
 	ld	$8,0($5)	# load new s1 limb as early as possible
+	beq	$6,$0,L(LC1)

 L(Loop):	ld	$10,0($4)
 #if __mips_isa_rev < 6
@@ -83,8 +80,8 @@ L(Loop):	ld	$10,0($4)
 	daddu	$2,$2,$10
 	sd	$3,0($4)
 	daddiu	$4,$4,8
-	bne	$6,$0,L(Loop)
 	daddu	$2,$9,$2	# add high product limb and carry from addition
+	bne	$6,$0,L(Loop)

 	# cool down phase 1
 L(LC1):	ld	$10,0($4)
@@ -125,7 +122,7 @@ L(LC0):	ld	$10,0($4)
 	sltu	$10,$3,$10
 	daddu	$2,$2,$10
 	sd	$3,0($4)
-	j	$31
 	daddu	$2,$9,$2	# add high product limb and carry from addition
+	jr	$31
 	END (__mpn_addmul_1)

diff --git a/sysdeps/mips/mips64/lshift.S b/sysdeps/mips/mips64/lshift.S
index 2ea2e58b85..ca84385998 100644
--- a/sysdeps/mips/mips64/lshift.S
+++ b/sysdeps/mips/mips64/lshift.S
@@ -36,9 +36,6 @@ ENTRY (__mpn_lshift)
 #ifdef __PIC__
 	SETUP_GP /* ??? unused */
 #endif
-	.set	noreorder
-	.set	nomacro
-
 	dsll	$2,$6,3
 	daddu	$5,$5,$2	# make r5 point at end of src
 	ld	$10,-8($5)	# load first limb
@@ -46,8 +43,8 @@ ENTRY (__mpn_lshift)
 	daddu	$4,$4,$2	# make r4 point at end of res
 	daddiu	$6,$6,-1
 	and	$9,$6,4-1	# number of limbs in first loop
-	beq	$9,$0,L(L0)	# if multiple of 4 limbs, skip first loop
 	dsrl	$2,$10,$13	# compute function result
+	beq	$9,$0,L(L0)	# if multiple of 4 limbs, skip first loop

 	dsubu	$6,$6,$9

@@ -59,11 +56,10 @@ L(Loop0):	ld	$3,-16($5)
 	dsrl	$12,$3,$13
 	move	$10,$3
 	or	$8,$11,$12
-	bne	$9,$0,L(Loop0)
 	sd	$8,0($4)
+	bne	$9,$0,L(Loop0)

 L(L0):	beq	$6,$0,L(Lend)
-	nop

 L(Loop):	ld	$3,-16($5)
 	daddiu	$4,$4,-32
@@ -91,10 +87,10 @@ L(Loop):	ld	$3,-16($5)
 	daddiu	$5,$5,-32
 	or	$8,$14,$9
-	bgtz	$6,L(Loop)
 	sd	$8,0($4)
+	bgtz	$6,L(Loop)

 L(Lend):	dsll	$8,$10,$7
-	j	$31
 	sd	$8,-8($4)
+	jr	$31
 	END (__mpn_lshift)

diff --git a/sysdeps/mips/mips64/mul_1.S b/sysdeps/mips/mips64/mul_1.S
index 321789b345..7604bac3a2 100644
--- a/sysdeps/mips/mips64/mul_1.S
+++ b/sysdeps/mips/mips64/mul_1.S
@@ -37,9 +37,6 @@ ENTRY (__mpn_mul_1)
 #ifdef __PIC__
 	SETUP_GP /* ???
unused */ #endif - .set noreorder - .set nomacro - # warm up phase 0 ld $8,0($5) @@ -53,12 +50,12 @@ ENTRY (__mpn_mul_1) #endif daddiu $6,$6,-1 - beq $6,$0,L(LC0) move $2,$0 # zero cy2 + beq $6,$0,L(LC0) daddiu $6,$6,-1 - beq $6,$0,L(LC1) ld $8,0($5) # load new s1 limb as early as possible + beq $6,$0,L(LC1) #if __mips_isa_rev < 6 L(Loop): mflo $10 @@ -80,8 +77,8 @@ L(Loop): move $10,$11 sltu $2,$10,$2 # carry from previous addition -> $2 sd $10,0($4) daddiu $4,$4,8 - bne $6,$0,L(Loop) daddu $2,$9,$2 # add high product limb and carry from addition + bne $6,$0,L(Loop) # cool down phase 1 #if __mips_isa_rev < 6 @@ -114,7 +111,7 @@ L(LC0): move $10,$11 daddu $10,$10,$2 sltu $2,$10,$2 sd $10,0($4) - j $31 daddu $2,$9,$2 # add high product limb and carry from addition + jr $31 END (__mpn_mul_1) diff --git a/sysdeps/mips/mips64/n32/crtn.S b/sysdeps/mips/mips64/n32/crtn.S index 633d79cfad..8d4c83381c 100644 --- a/sysdeps/mips/mips64/n32/crtn.S +++ b/sysdeps/mips/mips64/n32/crtn.S @@ -41,19 +41,11 @@ .section .init,"ax",@progbits ld $31,8($sp) ld $28,0($sp) - .set noreorder - .set nomacro - j $31 addiu $sp,$sp,16 - .set macro - .set reorder + jr $31 .section .fini,"ax",@progbits ld $31,8($sp) ld $28,0($sp) - .set noreorder - .set nomacro - j $31 addiu $sp,$sp,16 - .set macro - .set reorder + jr $31 diff --git a/sysdeps/mips/mips64/n64/crtn.S b/sysdeps/mips/mips64/n64/crtn.S index 99ed1e3263..110040c9fc 100644 --- a/sysdeps/mips/mips64/n64/crtn.S +++ b/sysdeps/mips/mips64/n64/crtn.S @@ -41,19 +41,11 @@ .section .init,"ax",@progbits ld $31,8($sp) ld $28,0($sp) - .set noreorder - .set nomacro - j $31 daddiu $sp,$sp,16 - .set macro - .set reorder + jr $31 .section .fini,"ax",@progbits ld $31,8($sp) ld $28,0($sp) - .set noreorder - .set nomacro - j $31 daddiu $sp,$sp,16 - .set macro - .set reorder + jr $31 diff --git a/sysdeps/mips/mips64/rshift.S b/sysdeps/mips/mips64/rshift.S index 1f6e3a2a12..153aacfd86 100644 --- a/sysdeps/mips/mips64/rshift.S +++ 
b/sysdeps/mips/mips64/rshift.S @@ -36,15 +36,12 @@ ENTRY (__mpn_rshift) #ifdef __PIC__ SETUP_GP /* ??? unused */ #endif - .set noreorder - .set nomacro - ld $10,0($5) # load first limb dsubu $13,$0,$7 daddiu $6,$6,-1 and $9,$6,4-1 # number of limbs in first loop - beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop dsll $2,$10,$13 # compute function result + beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop dsubu $6,$6,$9 @@ -56,11 +53,10 @@ L(Loop0): ld $3,8($5) dsll $12,$3,$13 move $10,$3 or $8,$11,$12 - bne $9,$0,L(Loop0) sd $8,-8($4) + bne $9,$0,L(Loop0) L(L0): beq $6,$0,L(Lend) - nop L(Loop): ld $3,8($5) daddiu $4,$4,32 @@ -88,10 +84,10 @@ L(Loop): ld $3,8($5) daddiu $5,$5,32 or $8,$14,$9 - bgtz $6,L(Loop) sd $8,-8($4) + bgtz $6,L(Loop) L(Lend): dsrl $8,$10,$7 - j $31 sd $8,0($4) + jr $31 END (__mpn_rshift) diff --git a/sysdeps/mips/mips64/sub_n.S b/sysdeps/mips/mips64/sub_n.S index b83d5ccab6..5b7337472f 100644 --- a/sysdeps/mips/mips64/sub_n.S +++ b/sysdeps/mips/mips64/sub_n.S @@ -37,16 +37,13 @@ ENTRY (__mpn_sub_n) #ifdef __PIC__ SETUP_GP /* ??? 
unused */ #endif - .set noreorder - .set nomacro - ld $10,0($5) ld $11,0($6) daddiu $7,$7,-1 and $9,$7,4-1 # number of limbs in first loop - beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop move $2,$0 + beq $9,$0,L(L0) # if multiple of 4 limbs, skip first loop dsubu $7,$7,$9 @@ -64,11 +61,10 @@ L(Loop0): daddiu $9,$9,-1 daddiu $6,$6,8 move $10,$12 move $11,$13 - bne $9,$0,L(Loop0) daddiu $4,$4,8 + bne $9,$0,L(Loop0) L(L0): beq $7,$0,L(Lend) - nop L(Loop): daddiu $7,$7,-4 @@ -111,15 +107,15 @@ L(Loop): daddiu $7,$7,-4 daddiu $5,$5,32 daddiu $6,$6,32 - bne $7,$0,L(Loop) daddiu $4,$4,32 + bne $7,$0,L(Loop) L(Lend): daddu $11,$11,$2 sltu $8,$11,$2 dsubu $11,$10,$11 sltu $2,$10,$11 sd $11,0($4) - j $31 or $2,$2,$8 + jr $31 END (__mpn_sub_n) diff --git a/sysdeps/mips/mips64/submul_1.S b/sysdeps/mips/mips64/submul_1.S index 46f26e8dde..121433d232 100644 --- a/sysdeps/mips/mips64/submul_1.S +++ b/sysdeps/mips/mips64/submul_1.S @@ -37,9 +37,6 @@ ENTRY (__mpn_submul_1) #ifdef __PIC__ SETUP_GP /* ??? unused */ #endif - .set noreorder - .set nomacro - # warm up phase 0 ld $8,0($5) @@ -53,12 +50,12 @@ ENTRY (__mpn_submul_1) #endif daddiu $6,$6,-1 - beq $6,$0,L(LC0) move $2,$0 # zero cy2 + beq $6,$0,L(LC0) daddiu $6,$6,-1 - beq $6,$0,L(LC1) ld $8,0($5) # load new s1 limb as early as possible + beq $6,$0,L(LC1) L(Loop): ld $10,0($4) #if __mips_isa_rev < 6 @@ -84,8 +81,8 @@ L(Loop): ld $10,0($4) daddu $2,$2,$10 sd $3,0($4) daddiu $4,$4,8 - bne $6,$0,L(Loop) daddu $2,$9,$2 # add high product limb and carry from addition + bne $6,$0,L(Loop) # cool down phase 1 L(LC1): ld $10,0($4) @@ -126,7 +123,7 @@ L(LC0): ld $10,0($4) sgtu $10,$3,$10 daddu $2,$2,$10 sd $3,0($4) - j $31 daddu $2,$9,$2 # add high product limb and carry from addition + jr $31 END (__mpn_submul_1) diff --git a/sysdeps/mips/mul_1.S b/sysdeps/mips/mul_1.S index cfd4cc7cd5..ae65ebe79d 100644 --- a/sysdeps/mips/mul_1.S +++ b/sysdeps/mips/mul_1.S @@ -31,12 +31,9 @@ along with the GNU MP Library. 
If not, see .option pic2 #endif ENTRY (__mpn_mul_1) - .set noreorder #ifdef __PIC__ .cpload t9 #endif - .set nomacro - /* warm up phase 0 */ lw $8,0($5) @@ -50,12 +47,12 @@ ENTRY (__mpn_mul_1) #endif addiu $6,$6,-1 - beq $6,$0,L(LC0) move $2,$0 /* zero cy2 */ + beq $6,$0,L(LC0) addiu $6,$6,-1 - beq $6,$0,L(LC1) lw $8,0($5) /* load new s1 limb as early as possible */ + beq $6,$0,L(LC1) #if __mips_isa_rev < 6 @@ -78,8 +75,8 @@ L(Loop): move $10,$11 sltu $2,$10,$2 /* carry from previous addition -> $2 */ sw $10,0($4) addiu $4,$4,4 - bne $6,$0,L(Loop) /* should be "bnel" */ addu $2,$9,$2 /* add high product limb and carry from addition */ + bne $6,$0,L(Loop) /* should be "bnel" */ /* cool down phase 1 */ #if __mips_isa_rev < 6 @@ -112,6 +109,6 @@ L(LC0): move $10,$11 addu $10,$10,$2 sltu $2,$10,$2 sw $10,0($4) - j $31 addu $2,$9,$2 /* add high product limb and carry from addition */ + jr $31 END (__mpn_mul_1) diff --git a/sysdeps/mips/rshift.S b/sysdeps/mips/rshift.S index e19fa41234..b453ca2ba7 100644 --- a/sysdeps/mips/rshift.S +++ b/sysdeps/mips/rshift.S @@ -30,18 +30,15 @@ along with the GNU MP Library. 
If not, see .option pic2 #endif ENTRY (__mpn_rshift) - .set noreorder #ifdef __PIC__ .cpload t9 #endif - .set nomacro - lw $10,0($5) /* load first limb */ subu $13,$0,$7 addiu $6,$6,-1 and $9,$6,4-1 /* number of limbs in first loop */ + sll $2,$10,$13 /* compute function result */ beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop*/ - sll $2,$10,$13 /* compute function result */ subu $6,$6,$9 @@ -53,11 +50,10 @@ L(Loop0): lw $3,4($5) sll $12,$3,$13 move $10,$3 or $8,$11,$12 + sw $8,-4($4) bne $9,$0,L(Loop0) - sw $8,-4($4) L(L0): beq $6,$0,L(Lend) - nop L(Loop): lw $3,4($5) addiu $4,$4,16 @@ -85,10 +81,10 @@ L(Loop): lw $3,4($5) addiu $5,$5,16 or $8,$14,$9 + sw $8,-4($4) bgtz $6,L(Loop) - sw $8,-4($4) L(Lend): srl $8,$10,$7 - j $31 sw $8,0($4) + jr $31 END (__mpn_rshift) diff --git a/sysdeps/mips/sub_n.S b/sysdeps/mips/sub_n.S index 3e988ecbb4..9f7cb5458d 100644 --- a/sysdeps/mips/sub_n.S +++ b/sysdeps/mips/sub_n.S @@ -31,19 +31,16 @@ along with the GNU MP Library. If not, see .option pic2 #endif ENTRY (__mpn_sub_n) - .set noreorder #ifdef __PIC__ .cpload t9 #endif - .set nomacro - lw $10,0($5) lw $11,0($6) addiu $7,$7,-1 and $9,$7,4-1 /* number of limbs in first loop */ - beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop */ move $2,$0 + beq $9,$0,L(L0) /* if multiple of 4 limbs, skip first loop */ subu $7,$7,$9 @@ -61,11 +58,10 @@ L(Loop0): addiu $9,$9,-1 addiu $6,$6,4 move $10,$12 move $11,$13 - bne $9,$0,L(Loop0) addiu $4,$4,4 + bne $9,$0,L(Loop0) L(L0): beq $7,$0,L(Lend) - nop L(Loop): addiu $7,$7,-4 @@ -108,14 +104,14 @@ L(Loop): addiu $7,$7,-4 addiu $5,$5,16 addiu $6,$6,16 - bne $7,$0,L(Loop) addiu $4,$4,16 + bne $7,$0,L(Loop) L(Lend): addu $11,$11,$2 sltu $8,$11,$2 subu $11,$10,$11 sltu $2,$10,$11 sw $11,0($4) - j $31 or $2,$2,$8 + jr $31 END (__mpn_sub_n) diff --git a/sysdeps/mips/submul_1.S b/sysdeps/mips/submul_1.S index be8e2844ef..8405801c57 100644 --- a/sysdeps/mips/submul_1.S +++ b/sysdeps/mips/submul_1.S @@ -31,12 +31,9 @@ along with 
the GNU MP Library. If not, see .option pic2 #endif ENTRY (__mpn_submul_1) - .set noreorder #ifdef __PIC__ .cpload t9 #endif - .set nomacro - /* warm up phase 0 */ lw $8,0($5) @@ -50,12 +47,12 @@ ENTRY (__mpn_submul_1) #endif addiu $6,$6,-1 - beq $6,$0,L(LC0) move $2,$0 /* zero cy2 */ + beq $6,$0,L(LC0) addiu $6,$6,-1 - beq $6,$0,L(LC1) lw $8,0($5) /* load new s1 limb as early as possible */ + beq $6,$0,L(LC1) L(Loop): lw $10,0($4) #if __mips_isa_rev < 6 @@ -81,8 +78,8 @@ L(Loop): lw $10,0($4) addu $2,$2,$10 sw $3,0($4) addiu $4,$4,4 - bne $6,$0,L(Loop) /* should be "bnel" */ addu $2,$9,$2 /* add high product limb and carry from addition */ + bne $6,$0,L(Loop) /* should be "bnel" */ /* cool down phase 1 */ L(LC1): lw $10,0($4) @@ -123,6 +120,6 @@ L(LC0): lw $10,0($4) sgtu $10,$3,$10 addu $2,$2,$10 sw $3,0($4) - j $31 addu $2,$9,$2 /* add high product limb and carry from addition */ + jr $31 END (__mpn_submul_1) diff --git a/sysdeps/mips/sys/asm.h b/sysdeps/mips/sys/asm.h index e43eb39ca3..62f9e549c6 100644 --- a/sysdeps/mips/sys/asm.h +++ b/sysdeps/mips/sys/asm.h @@ -71,23 +71,21 @@ .set reorder /* Set gp when not at 1st instruction */ # define SETUP_GPX(r) \ - .set noreorder; \ move r, $31; /* Save old ra. */ \ bal 10f; /* Find addr of cpload. */ \ - nop; \ 10: \ + .set noreorder; \ .cpload $31; \ - move $31, r; \ - .set reorder + .set reorder; \ + move $31, r; # define SETUP_GPX_L(r, l) \ - .set noreorder; \ move r, $31; /* Save old ra. */ \ bal l; /* Find addr of cpload. */ \ - nop; \ l: \ + .set noreorder; \ .cpload $31; \ - move $31, r; \ - .set reorder + .set reorder; \ + move $31, r; # define SAVE_GP(x) \ .cprestore x /* Save gp trigger t9/jalr conversion. */ # define SETUP_GP64(a, b) @@ -108,20 +106,14 @@ l: \ .cpsetup $25, gpoffset, proc # define SETUP_GPX64(cp_reg, ra_save) \ move ra_save, $31; /* Save old ra. */ \ - .set noreorder; \ bal 10f; /* Find addr of .cpsetup. 
*/ \ - nop; \ 10: \ - .set reorder; \ .cpsetup $31, cp_reg, 10b; \ move $31, ra_save # define SETUP_GPX64_L(cp_reg, ra_save, l) \ move ra_save, $31; /* Save old ra. */ \ - .set noreorder; \ bal l; /* Find addr of .cpsetup. */ \ - nop; \ l: \ - .set reorder; \ .cpsetup $31, cp_reg, l; \ move $31, ra_save # define RESTORE_GP64 \ diff --git a/sysdeps/unix/mips/mips32/sysdep.h b/sysdeps/unix/mips/mips32/sysdep.h index c515b94540..df3f73a4eb 100644 --- a/sysdeps/unix/mips/mips32/sysdep.h +++ b/sysdeps/unix/mips/mips32/sysdep.h @@ -38,18 +38,14 @@ L(syse1): #else #define PSEUDO(name, syscall_name, args) \ - .set noreorder; \ .set nomips16; \ .align 2; \ cfi_startproc; \ 99: j __syscall_error; \ - nop; \ cfi_endproc; \ ENTRY(name) \ - .set noreorder; \ li v0, SYS_ify(syscall_name); \ syscall; \ - .set reorder; \ bne a3, zero, 99b; \ L(syse1): #endif diff --git a/sysdeps/unix/mips/mips64/sysdep.h b/sysdeps/unix/mips/mips64/sysdep.h index 6565b84e3a..c0772002e6 100644 --- a/sysdeps/unix/mips/mips64/sysdep.h +++ b/sysdeps/unix/mips/mips64/sysdep.h @@ -45,18 +45,14 @@ L(syse1): #else #define PSEUDO(name, syscall_name, args) \ - .set noreorder; \ .align 2; \ .set nomips16; \ cfi_startproc; \ 99: j __syscall_error; \ - nop; \ cfi_endproc; \ ENTRY(name) \ - .set noreorder; \ li v0, SYS_ify(syscall_name); \ syscall; \ - .set reorder; \ bne a3, zero, 99b; \ L(syse1): #endif diff --git a/sysdeps/unix/mips/sysdep.h b/sysdeps/unix/mips/sysdep.h index d1e0460260..07cd5c4a06 100644 --- a/sysdeps/unix/mips/sysdep.h +++ b/sysdeps/unix/mips/sysdep.h @@ -48,7 +48,6 @@ .align 2; \ ENTRY(name) \ .set nomips16; \ - .set noreorder; \ li v0, SYS_ify(syscall_name); \ syscall @@ -61,7 +60,6 @@ .align 2; \ ENTRY(name) \ .set nomips16; \ - .set noreorder; \ li v0, SYS_ify(syscall_name); \ syscall diff --git a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h index 47a1b97351..647a66ee1f 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h +++ 
b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h @@ -140,10 +140,8 @@ union __mips_syscall_return register long int __v0 asm ("$2"); \ register long int __a3 asm ("$7"); \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set reorder" \ : "=r" (__v0), "=r" (__a3) \ : input \ : __SYSCALL_CLOBBERS); \ @@ -164,10 +162,8 @@ union __mips_syscall_return register long int __a0 asm ("$4") = _arg1; \ register long int __a3 asm ("$7"); \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set reorder" \ : "=r" (__v0), "=r" (__a3) \ : input, "r" (__a0) \ : __SYSCALL_CLOBBERS); \ @@ -190,10 +186,8 @@ union __mips_syscall_return register long int __a1 asm ("$5") = _arg2; \ register long int __a3 asm ("$7"); \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set\treorder" \ : "=r" (__v0), "=r" (__a3) \ : input, "r" (__a0), "r" (__a1) \ : __SYSCALL_CLOBBERS); \ @@ -219,10 +213,8 @@ union __mips_syscall_return register long int __a2 asm ("$6") = _arg3; \ register long int __a3 asm ("$7"); \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set\treorder" \ : "=r" (__v0), "=r" (__a3) \ : input, "r" (__a0), "r" (__a1), "r" (__a2) \ : __SYSCALL_CLOBBERS); \ @@ -249,10 +241,8 @@ union __mips_syscall_return register long int __a2 asm ("$6") = _arg3; \ register long int __a3 asm ("$7") = _arg4; \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set\treorder" \ : "=r" (__v0), "+r" (__a3) \ : input, "r" (__a0), "r" (__a1), "r" (__a2) \ : __SYSCALL_CLOBBERS); \ diff --git a/sysdeps/unix/sysv/linux/mips/mips64/sysdep.h b/sysdeps/unix/sysv/linux/mips/mips64/sysdep.h index 0438bed23d..8f4787352a 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/sysdep.h +++ b/sysdeps/unix/sysv/linux/mips/mips64/sysdep.h @@ -95,10 +95,8 @@ register __syscall_arg_t __v0 asm ("$2"); \ register __syscall_arg_t __a3 asm ("$7"); \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ 
"syscall\n\t" \ - ".set reorder" \ : "=r" (__v0), "=r" (__a3) \ : input \ : __SYSCALL_CLOBBERS); \ @@ -119,10 +117,8 @@ register __syscall_arg_t __a0 asm ("$4") = _arg1; \ register __syscall_arg_t __a3 asm ("$7"); \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set reorder" \ : "=r" (__v0), "=r" (__a3) \ : input, "r" (__a0) \ : __SYSCALL_CLOBBERS); \ @@ -145,10 +141,8 @@ register __syscall_arg_t __a1 asm ("$5") = _arg2; \ register __syscall_arg_t __a3 asm ("$7"); \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set\treorder" \ : "=r" (__v0), "=r" (__a3) \ : input, "r" (__a0), "r" (__a1) \ : __SYSCALL_CLOBBERS); \ @@ -173,10 +167,8 @@ register __syscall_arg_t __a2 asm ("$6") = _arg3; \ register __syscall_arg_t __a3 asm ("$7"); \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set\treorder" \ : "=r" (__v0), "=r" (__a3) \ : input, "r" (__a0), "r" (__a1), "r" (__a2) \ : __SYSCALL_CLOBBERS); \ @@ -203,10 +195,8 @@ register __syscall_arg_t __a2 asm ("$6") = _arg3; \ register __syscall_arg_t __a3 asm ("$7") = _arg4; \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set\treorder" \ : "=r" (__v0), "+r" (__a3) \ : input, "r" (__a0), "r" (__a1), "r" (__a2) \ : __SYSCALL_CLOBBERS); \ @@ -235,10 +225,8 @@ register __syscall_arg_t __a3 asm ("$7") = _arg4; \ register __syscall_arg_t __a4 asm ("$8") = _arg5; \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set\treorder" \ : "=r" (__v0), "+r" (__a3) \ : input, "r" (__a0), "r" (__a1), "r" (__a2), "r" (__a4) \ : __SYSCALL_CLOBBERS); \ @@ -269,10 +257,8 @@ register __syscall_arg_t __a4 asm ("$8") = _arg5; \ register __syscall_arg_t __a5 asm ("$9") = _arg6; \ __asm__ volatile ( \ - ".set\tnoreorder\n\t" \ v0_init \ "syscall\n\t" \ - ".set\treorder" \ : "=r" (__v0), "+r" (__a3) \ : input, "r" (__a0), "r" (__a1), "r" (__a2), "r" (__a4), \ "r" (__a5) \ From patchwork Thu Jan 23 13:42:58 2025 
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105297
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com,
 cfu@mips.com, Matthew Fortune, Faraz Shahbazker
Subject: [PATCH 02/11] Fix rtld link_map initialization issues
Date: Thu, 23 Jan 2025 14:42:58 +0100
Message-Id: <20250123134308.1785777-4-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>

Import patch fixing rtld link_map initialization issues from:
https://sourceware.org/ml/libc-alpha/2015-03/msg00704.html

Author: Sandra Loosemore

Cherry-picked
1507c7be47ef07d4b264168ab031d8c2ed4678f2 from
https://github.com/MIPS/glibc

Signed-off-by: Matthew Fortune
Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rakic
---
 elf/rtld.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/elf/rtld.c b/elf/rtld.c
index 1e2e9ad5a8..252f4d6666 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -522,7 +522,7 @@ _dl_start (void *arg)
   rtld_timer_start (&info.start_time);
 #endif
 
-  /* Partly clean the `bootstrap_map' structure up.  Don't use
+  /* Zero-initialize the `bootstrap_map' structure.  Don't use
      `memset' since it might not be built in or inlined and we cannot
      make function calls at this point.  Use '__builtin_memset' if we
      know it is available.  We do not have to clear the memory if we
@@ -530,12 +530,14 @@ _dl_start (void *arg)
      are initialized to zero by default.  */
 #ifndef DONT_USE_BOOTSTRAP_MAP
 # ifdef HAVE_BUILTIN_MEMSET
-  __builtin_memset (bootstrap_map.l_info, '\0', sizeof (bootstrap_map.l_info));
+  __builtin_memset (&bootstrap_map, '\0', sizeof (struct link_map));
 # else
-  for (size_t cnt = 0;
-       cnt < sizeof (bootstrap_map.l_info) / sizeof (bootstrap_map.l_info[0]);
-       ++cnt)
-    bootstrap_map.l_info[cnt] = 0;
+  {
+    char *p = (char *) &bootstrap_map;
+    char *pend = p + sizeof (struct link_map);
+    while (p < pend)
+      *(p++) = '\0';
+  }
 # endif
 #endif

From patchwork Thu Jan 23 13:42:59 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105303
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com,
 cfu@mips.com, Andrew Bennett, Faraz Shahbazker
Subject: [PATCH 03/11] Fix issues with removing no-reorder directives
Date: Thu, 23 Jan 2025 14:42:59 +0100
Message-Id: <20250123134308.1785777-5-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>

1. Added -O2 to the Makefile to ensure that assembly sources have their
   delay slots filled.
2. Also move the no-reorder directive into the PIC section of the
   setjmp code.
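The mechanical pattern behind these hunks, for readers unfamiliar with MIPS branch delay slots: under `.set noreorder` the delay slot is filled by hand, and once that directive is dropped (reorder mode, with optimization enabled for the assembler), the delay-slot instruction is instead written before the branch and the assembler schedules the slot itself. An illustrative fragment, not taken from any one file in the series:

```
	.set	noreorder		# old style: slot filled by hand
	bne	$6, $0, L(Loop)
	daddiu	$4, $4, 8		# executes in the branch delay slot

	# reorder mode (the default): write the instruction before the
	# branch and let the assembler move it into the delay slot,
	# inserting a nop only when it cannot.
	daddiu	$4, $4, 8
	bne	$6, $0, L(Loop)
```

The same reasoning explains the `j $31` to `jr $31` changes: `jr` is the canonical register jump, and in reorder mode the assembler fills its delay slot too.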
Cherry-picked 4e451260675b2e54535eafc2df35d92653acd084 from
https://github.com/MIPS/glibc

Signed-off-by: Andrew Bennett
Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rakic
---
 sysdeps/mips/Makefile     | 2 ++
 sysdeps/mips/bsd-setjmp.S | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/sysdeps/mips/Makefile b/sysdeps/mips/Makefile
index d189973aa0..17ddc2a97c 100644
--- a/sysdeps/mips/Makefile
+++ b/sysdeps/mips/Makefile
@@ -18,9 +18,11 @@ CPPFLAGS-crtn.S += $(pic-ccflag)
 endif
 
 ASFLAGS-.os += $(pic-ccflag)
+
 # libc.a and libc_p.a must be compiled with -fPIE/-fpie for static PIE.
 ASFLAGS-.o += $(pie-default)
 ASFLAGS-.op += $(pie-default)
+ASFLAGS += -O2
 
 ifeq ($(subdir),elf)

diff --git a/sysdeps/mips/bsd-setjmp.S b/sysdeps/mips/bsd-setjmp.S
index 7e4d7dcb0b..8c06b9957c 100644
--- a/sysdeps/mips/bsd-setjmp.S
+++ b/sysdeps/mips/bsd-setjmp.S
@@ -28,8 +28,8 @@
 .option pic2
 #endif
 ENTRY (setjmp)
-	.set noreorder
 #ifdef __PIC__
+	.set noreorder
 	.cpload t9
 	.set reorder
 	la t9, C_SYMBOL_NAME (__sigsetjmp)

From patchwork Thu Jan 23 13:43:00 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105298
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com,
 cfu@mips.com, Faraz Shahbazker
Subject: [PATCH 04/11] Add C implementation of memcpy/memset
Date: Thu, 23 Jan 2025 14:43:00 +0100
Message-Id: <20250123134308.1785777-6-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>

Add an improved C implementation of memcpy/memset and remove the
corresponding .S files.
Cherry-picked 6b74133706246af94b71e4154e4ca09482828c9f
from https://github.com/MIPS/glibc

Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rakic
---
 sysdeps/mips/memcpy.S | 886 ------------------------------------------
 sysdeps/mips/memcpy.c | 415 ++++++++++++++++++++
 sysdeps/mips/memset.S | 430 --------------------
 sysdeps/mips/memset.c | 187 +++++++++
 4 files changed, 602 insertions(+), 1316 deletions(-)
 delete mode 100644 sysdeps/mips/memcpy.S
 create mode 100644 sysdeps/mips/memcpy.c
 delete mode 100644 sysdeps/mips/memset.S
 create mode 100644 sysdeps/mips/memset.c

diff --git a/sysdeps/mips/memcpy.S b/sysdeps/mips/memcpy.S
deleted file mode 100644
index 96d1c92d89..0000000000
--- a/sysdeps/mips/memcpy.S
+++ /dev/null
@@ -1,886 +0,0 @@
-/* Copyright (C) 2012-2024 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library.  If not, see
-   .
*/ - -#ifdef ANDROID_CHANGES -# include "machine/asm.h" -# include "machine/regdef.h" -# define USE_MEMMOVE_FOR_OVERLAP -# define PREFETCH_LOAD_HINT PREFETCH_HINT_LOAD_STREAMED -# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE -#elif _LIBC -# include -# include -# include -# define PREFETCH_LOAD_HINT PREFETCH_HINT_LOAD_STREAMED -# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE -#elif defined _COMPILING_NEWLIB -# include "machine/asm.h" -# include "machine/regdef.h" -# define PREFETCH_LOAD_HINT PREFETCH_HINT_LOAD_STREAMED -# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE -#else -# include -# include -#endif - -#if (_MIPS_ISA == _MIPS_ISA_MIPS4) || (_MIPS_ISA == _MIPS_ISA_MIPS5) || \ - (_MIPS_ISA == _MIPS_ISA_MIPS32) || (_MIPS_ISA == _MIPS_ISA_MIPS64) -# ifndef DISABLE_PREFETCH -# define USE_PREFETCH -# endif -#endif - -#if defined(_MIPS_SIM) && ((_MIPS_SIM == _ABI64) || (_MIPS_SIM == _ABIN32)) -# ifndef DISABLE_DOUBLE -# define USE_DOUBLE -# endif -#endif - -/* Some asm.h files do not have the L macro definition. */ -#ifndef L -# if _MIPS_SIM == _ABIO32 -# define L(label) $L ## label -# else -# define L(label) .L ## label -# endif -#endif - -/* Some asm.h files do not have the PTR_ADDIU macro definition. */ -#ifndef PTR_ADDIU -# ifdef USE_DOUBLE -# define PTR_ADDIU daddiu -# else -# define PTR_ADDIU addiu -# endif -#endif - -/* Some asm.h files do not have the PTR_SRA macro definition. */ -#ifndef PTR_SRA -# ifdef USE_DOUBLE -# define PTR_SRA dsra -# else -# define PTR_SRA sra -# endif -#endif - -/* New R6 instructions that may not be in asm.h. */ -#ifndef PTR_LSA -# if _MIPS_SIM == _ABI64 -# define PTR_LSA dlsa -# else -# define PTR_LSA lsa -# endif -#endif - -#if __mips_isa_rev > 5 && defined (__mips_micromips) -# define PTR_BC bc16 -#else -# define PTR_BC bc -#endif - -/* - * Using PREFETCH_HINT_LOAD_STREAMED instead of PREFETCH_LOAD on load - * prefetches appear to offer a slight performance advantage. 
- * - * Using PREFETCH_HINT_PREPAREFORSTORE instead of PREFETCH_STORE - * or PREFETCH_STORE_STREAMED offers a large performance advantage - * but PREPAREFORSTORE has some special restrictions to consider. - * - * Prefetch with the 'prepare for store' hint does not copy a memory - * location into the cache, it just allocates a cache line and zeros - * it out. This means that if you do not write to the entire cache - * line before writing it out to memory some data will get zero'ed out - * when the cache line is written back to memory and data will be lost. - * - * Also if you are using this memcpy to copy overlapping buffers it may - * not behave correctly when using the 'prepare for store' hint. If you - * use the 'prepare for store' prefetch on a memory area that is in the - * memcpy source (as well as the memcpy destination), then you will get - * some data zero'ed out before you have a chance to read it and data will - * be lost. - * - * If you are going to use this memcpy routine with the 'prepare for store' - * prefetch you may want to set USE_MEMMOVE_FOR_OVERLAP in order to avoid - * the problem of running memcpy on overlapping buffers. - * - * There are ifdef'ed sections of this memcpy to make sure that it does not - * do prefetches on cache lines that are not going to be completely written. - * This code is only needed and only used when PREFETCH_STORE_HINT is set to - * PREFETCH_HINT_PREPAREFORSTORE. This code assumes that cache lines are - * 32 bytes and if the cache line is larger it will not work correctly. 
- */ - -#ifdef USE_PREFETCH -# define PREFETCH_HINT_LOAD 0 -# define PREFETCH_HINT_STORE 1 -# define PREFETCH_HINT_LOAD_STREAMED 4 -# define PREFETCH_HINT_STORE_STREAMED 5 -# define PREFETCH_HINT_LOAD_RETAINED 6 -# define PREFETCH_HINT_STORE_RETAINED 7 -# define PREFETCH_HINT_WRITEBACK_INVAL 25 -# define PREFETCH_HINT_PREPAREFORSTORE 30 - -/* - * If we have not picked out what hints to use at this point use the - * standard load and store prefetch hints. - */ -# ifndef PREFETCH_STORE_HINT -# define PREFETCH_STORE_HINT PREFETCH_HINT_STORE -# endif -# ifndef PREFETCH_LOAD_HINT -# define PREFETCH_LOAD_HINT PREFETCH_HINT_LOAD -# endif - -/* - * We double everything when USE_DOUBLE is true so we do 2 prefetches to - * get 64 bytes in that case. The assumption is that each individual - * prefetch brings in 32 bytes. - */ - -# ifdef USE_DOUBLE -# define PREFETCH_CHUNK 64 -# define PREFETCH_FOR_LOAD(chunk, reg) \ - pref PREFETCH_LOAD_HINT, (chunk)*64(reg); \ - pref PREFETCH_LOAD_HINT, ((chunk)*64)+32(reg) -# define PREFETCH_FOR_STORE(chunk, reg) \ - pref PREFETCH_STORE_HINT, (chunk)*64(reg); \ - pref PREFETCH_STORE_HINT, ((chunk)*64)+32(reg) -# else -# define PREFETCH_CHUNK 32 -# define PREFETCH_FOR_LOAD(chunk, reg) \ - pref PREFETCH_LOAD_HINT, (chunk)*32(reg) -# define PREFETCH_FOR_STORE(chunk, reg) \ - pref PREFETCH_STORE_HINT, (chunk)*32(reg) -# endif -/* MAX_PREFETCH_SIZE is the maximum size of a prefetch, it must not be less - * than PREFETCH_CHUNK, the assumed size of each prefetch. If the real size - * of a prefetch is greater than MAX_PREFETCH_SIZE and the PREPAREFORSTORE - * hint is used, the code will not work correctly. If PREPAREFORSTORE is not - * used then MAX_PREFETCH_SIZE does not matter. */ -# define MAX_PREFETCH_SIZE 128 -/* PREFETCH_LIMIT is set based on the fact that we never use an offset greater - * than 5 on a STORE prefetch and that a single prefetch can never be larger - * than MAX_PREFETCH_SIZE. 
We add the extra 32 when USE_DOUBLE is set because - * we actually do two prefetches in that case, one 32 bytes after the other. */ -# ifdef USE_DOUBLE -# define PREFETCH_LIMIT (5 * PREFETCH_CHUNK) + 32 + MAX_PREFETCH_SIZE -# else -# define PREFETCH_LIMIT (5 * PREFETCH_CHUNK) + MAX_PREFETCH_SIZE -# endif -# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) \ - && ((PREFETCH_CHUNK * 4) < MAX_PREFETCH_SIZE) -/* We cannot handle this because the initial prefetches may fetch bytes that - * are before the buffer being copied. We start copies with an offset - * of 4 so avoid this situation when using PREPAREFORSTORE. */ -#error "PREFETCH_CHUNK is too large and/or MAX_PREFETCH_SIZE is too small." -# endif -#else /* USE_PREFETCH not defined */ -# define PREFETCH_FOR_LOAD(offset, reg) -# define PREFETCH_FOR_STORE(offset, reg) -#endif - -#if __mips_isa_rev > 5 -# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) -# undef PREFETCH_STORE_HINT -# define PREFETCH_STORE_HINT PREFETCH_HINT_STORE_STREAMED -# endif -# define R6_CODE -#endif - -/* Allow the routine to be named something else if desired. */ -#ifndef MEMCPY_NAME -# define MEMCPY_NAME memcpy -#endif - -/* We use these 32/64 bit registers as temporaries to do the copying. */ -#define REG0 t0 -#define REG1 t1 -#define REG2 t2 -#define REG3 t3 -#if defined(_MIPS_SIM) && ((_MIPS_SIM == _ABIO32) || (_MIPS_SIM == _ABIO64)) -# define REG4 t4 -# define REG5 t5 -# define REG6 t6 -# define REG7 t7 -#else -# define REG4 ta0 -# define REG5 ta1 -# define REG6 ta2 -# define REG7 ta3 -#endif - -/* We load/store 64 bits at a time when USE_DOUBLE is true. - * The C_ prefix stands for CHUNK and is used to avoid macro name - * conflicts with system header files. 
*/ - -#ifdef USE_DOUBLE -# define C_ST sd -# define C_LD ld -# ifdef __MIPSEB -# define C_LDHI ldl /* high part is left in big-endian */ -# define C_STHI sdl /* high part is left in big-endian */ -# define C_LDLO ldr /* low part is right in big-endian */ -# define C_STLO sdr /* low part is right in big-endian */ -# else -# define C_LDHI ldr /* high part is right in little-endian */ -# define C_STHI sdr /* high part is right in little-endian */ -# define C_LDLO ldl /* low part is left in little-endian */ -# define C_STLO sdl /* low part is left in little-endian */ -# endif -# define C_ALIGN dalign /* r6 align instruction */ -#else -# define C_ST sw -# define C_LD lw -# ifdef __MIPSEB -# define C_LDHI lwl /* high part is left in big-endian */ -# define C_STHI swl /* high part is left in big-endian */ -# define C_LDLO lwr /* low part is right in big-endian */ -# define C_STLO swr /* low part is right in big-endian */ -# else -# define C_LDHI lwr /* high part is right in little-endian */ -# define C_STHI swr /* high part is right in little-endian */ -# define C_LDLO lwl /* low part is left in little-endian */ -# define C_STLO swl /* low part is left in little-endian */ -# endif -# define C_ALIGN align /* r6 align instruction */ -#endif - -/* Bookkeeping values for 32 vs. 64 bit mode. */ -#ifdef USE_DOUBLE -# define NSIZE 8 -# define NSIZEMASK 0x3f -# define NSIZEDMASK 0x7f -#else -# define NSIZE 4 -# define NSIZEMASK 0x1f -# define NSIZEDMASK 0x3f -#endif -#define UNIT(unit) ((unit)*NSIZE) -#define UNITM1(unit) (((unit)*NSIZE)-1) - -#ifdef ANDROID_CHANGES -LEAF(MEMCPY_NAME, 0) -#else -LEAF(MEMCPY_NAME) -#endif - .set nomips16 -/* - * Below we handle the case where memcpy is called with overlapping src and dst. - * Although memcpy is not required to handle this case, some parts of Android - * like Skia rely on such usage. We call memmove to handle such cases. 
- */ -#ifdef USE_MEMMOVE_FOR_OVERLAP - PTR_SUBU t0,a0,a1 - PTR_SRA t2,t0,31 - xor t1,t0,t2 - PTR_SUBU t0,t1,t2 - sltu t2,t0,a2 - la t9,memmove - beq t2,zero,L(memcpy) - jr t9 -L(memcpy): -#endif -/* - * If the size is less than 2*NSIZE (8 or 16), go to L(lastb). Regardless of - * size, copy dst pointer to v0 for the return value. - */ - slti t2,a2,(2 * NSIZE) -#if defined(RETURN_FIRST_PREFETCH) || defined(RETURN_LAST_PREFETCH) - move v0,zero -#else - move v0,a0 -#endif - bne t2,zero,L(lasts) - -#ifndef R6_CODE - -/* - * If src and dst have different alignments, go to L(unaligned), if they - * have the same alignment (but are not actually aligned) do a partial - * load/store to make them aligned. If they are both already aligned - * we can start copying at L(aligned). - */ - xor t8,a1,a0 - andi t8,t8,(NSIZE-1) /* t8 is a0/a1 word-displacement */ - PTR_SUBU a3, zero, a0 - bne t8,zero,L(unaligned) - - andi a3,a3,(NSIZE-1) /* copy a3 bytes to align a0/a1 */ - PTR_SUBU a2,a2,a3 /* a2 is the remining bytes count */ - beq a3,zero,L(aligned) /* if a3=0, it is already aligned */ - - C_LDHI t8,0(a1) - PTR_ADDU a1,a1,a3 - C_STHI t8,0(a0) - PTR_ADDU a0,a0,a3 - -#else /* R6_CODE */ - -/* - * Align the destination and hope that the source gets aligned too. If it - * doesn't we jump to L(r6_unaligned*) to do unaligned copies using the r6 - * align instruction. 
- */ - andi t8,a0,7 -#ifdef __mips_micromips - auipc t9,%pcrel_hi(L(atable)) - addiu t9,t9,%pcrel_lo(L(atable)+4) - PTR_LSA t9,t8,t9,1 -#else - lapc t9,L(atable) - PTR_LSA t9,t8,t9,2 -#endif - jrc t9 -L(atable): - PTR_BC L(lb0) - PTR_BC L(lb7) - PTR_BC L(lb6) - PTR_BC L(lb5) - PTR_BC L(lb4) - PTR_BC L(lb3) - PTR_BC L(lb2) - PTR_BC L(lb1) -L(lb7): - lb a3, 6(a1) - sb a3, 6(a0) -L(lb6): - lb a3, 5(a1) - sb a3, 5(a0) -L(lb5): - lb a3, 4(a1) - sb a3, 4(a0) -L(lb4): - lb a3, 3(a1) - sb a3, 3(a0) -L(lb3): - lb a3, 2(a1) - sb a3, 2(a0) -L(lb2): - lb a3, 1(a1) - sb a3, 1(a0) -L(lb1): - lb a3, 0(a1) - sb a3, 0(a0) - - li t9,8 - subu t8,t9,t8 - PTR_SUBU a2,a2,t8 - PTR_ADDU a0,a0,t8 - PTR_ADDU a1,a1,t8 -L(lb0): - - andi t8,a1,(NSIZE-1) -#ifdef __mips_micromips - auipc t9,%pcrel_hi(L(jtable)) - addiu t9,t9,%pcrel_lo(L(jtable)+4) - PTR_LSA t9,t8,t9,1 -#else - lapc t9,L(jtable) - PTR_LSA t9,t8,t9,2 -#endif - jrc t9 -L(jtable): - PTR_BC L(aligned) - PTR_BC L(r6_unaligned1) - PTR_BC L(r6_unaligned2) - PTR_BC L(r6_unaligned3) -#ifdef USE_DOUBLE - PTR_BC L(r6_unaligned4) - PTR_BC L(r6_unaligned5) - PTR_BC L(r6_unaligned6) - PTR_BC L(r6_unaligned7) -#endif -#endif /* R6_CODE */ - -L(aligned): - -/* - * Now dst/src are both aligned to (word or double word) aligned addresses - * Set a2 to count how many bytes we have to copy after all the 64/128 byte - * chunks are copied and a3 to the dst pointer after all the 64/128 byte - * chunks have been copied. We will loop, incrementing a0 and a1 until a0 - * equals a3. - */ - - andi t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */ - PTR_SUBU a3,a2,t8 /* subtract from a2 the reminder */ - beq a2,t8,L(chkw) /* if a2==t8, no 64-byte/128-byte chunks */ - PTR_ADDU a3,a0,a3 /* Now a3 is the final dst after loop */ - -/* When in the loop we may prefetch with the 'prepare to store' hint, - * in this case the a0+x should not be past the "t0-32" address. This - * means: for x=128 the last "safe" a0 address is "t0-160". 
Alternatively, - * for x=64 the last "safe" a0 address is "t0-96" In the current version we - * will use "prefetch hint,128(a0)", so "t0-160" is the limit. - */ -#if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) - PTR_ADDU t0,a0,a2 /* t0 is the "past the end" address */ - PTR_SUBU t9,t0,PREFETCH_LIMIT /* t9 is the "last safe pref" address */ -#endif - PREFETCH_FOR_LOAD (0, a1) - PREFETCH_FOR_LOAD (1, a1) - PREFETCH_FOR_LOAD (2, a1) - PREFETCH_FOR_LOAD (3, a1) -#if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT != PREFETCH_HINT_PREPAREFORSTORE) - PREFETCH_FOR_STORE (1, a0) - PREFETCH_FOR_STORE (2, a0) - PREFETCH_FOR_STORE (3, a0) -#endif -#if defined(RETURN_FIRST_PREFETCH) && defined(USE_PREFETCH) -# if PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE - sltu v1,t9,a0 - bgtz v1,L(skip_set) - PTR_ADDIU v0,a0,(PREFETCH_CHUNK*4) -L(skip_set): -# else - PTR_ADDIU v0,a0,(PREFETCH_CHUNK*1) -# endif -#endif -#if defined(RETURN_LAST_PREFETCH) && defined(USE_PREFETCH) \ - && (PREFETCH_STORE_HINT != PREFETCH_HINT_PREPAREFORSTORE) - PTR_ADDIU v0,a0,(PREFETCH_CHUNK*3) -# ifdef USE_DOUBLE - PTR_ADDIU v0,v0,32 -# endif -#endif -L(loop16w): - C_LD t0,UNIT(0)(a1) -/* We need to separate out the C_LD instruction here so that it will work - both when it is used by itself and when it is used with the branch - instruction. 
*/ -#if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) - sltu v1,t9,a0 /* If a0 > t9 don't use next prefetch */ - C_LD t1,UNIT(1)(a1) - bgtz v1,L(skip_pref) -#else - C_LD t1,UNIT(1)(a1) -#endif -#ifdef R6_CODE - PREFETCH_FOR_STORE (2, a0) -#else - PREFETCH_FOR_STORE (4, a0) - PREFETCH_FOR_STORE (5, a0) -#endif -#if defined(RETURN_LAST_PREFETCH) && defined(USE_PREFETCH) - PTR_ADDIU v0,a0,(PREFETCH_CHUNK*5) -# ifdef USE_DOUBLE - PTR_ADDIU v0,v0,32 -# endif -#endif -L(skip_pref): - C_LD REG2,UNIT(2)(a1) - C_LD REG3,UNIT(3)(a1) - C_LD REG4,UNIT(4)(a1) - C_LD REG5,UNIT(5)(a1) - C_LD REG6,UNIT(6)(a1) - C_LD REG7,UNIT(7)(a1) -#ifdef R6_CODE - PREFETCH_FOR_LOAD (3, a1) -#else - PREFETCH_FOR_LOAD (4, a1) -#endif - C_ST t0,UNIT(0)(a0) - C_ST t1,UNIT(1)(a0) - C_ST REG2,UNIT(2)(a0) - C_ST REG3,UNIT(3)(a0) - C_ST REG4,UNIT(4)(a0) - C_ST REG5,UNIT(5)(a0) - C_ST REG6,UNIT(6)(a0) - C_ST REG7,UNIT(7)(a0) - - C_LD t0,UNIT(8)(a1) - C_LD t1,UNIT(9)(a1) - C_LD REG2,UNIT(10)(a1) - C_LD REG3,UNIT(11)(a1) - C_LD REG4,UNIT(12)(a1) - C_LD REG5,UNIT(13)(a1) - C_LD REG6,UNIT(14)(a1) - C_LD REG7,UNIT(15)(a1) -#ifndef R6_CODE - PREFETCH_FOR_LOAD (5, a1) -#endif - C_ST t0,UNIT(8)(a0) - C_ST t1,UNIT(9)(a0) - C_ST REG2,UNIT(10)(a0) - C_ST REG3,UNIT(11)(a0) - C_ST REG4,UNIT(12)(a0) - C_ST REG5,UNIT(13)(a0) - C_ST REG6,UNIT(14)(a0) - C_ST REG7,UNIT(15)(a0) - PTR_ADDIU a0,a0,UNIT(16) /* adding 64/128 to dest */ - PTR_ADDIU a1,a1,UNIT(16) /* adding 64/128 to src */ - bne a0,a3,L(loop16w) - move a2,t8 - -/* Here we have src and dest word-aligned but less than 64-bytes or - * 128 bytes to go. Check for a 32(64) byte chunk and copy if there - * is one. Otherwise jump down to L(chk1w) to handle the tail end of - * the copy. - */ - -L(chkw): - PREFETCH_FOR_LOAD (0, a1) - andi t8,a2,NSIZEMASK /* Is there a 32-byte/64-byte chunk. 
*/ - /* The t8 is the reminder count past 32-bytes */ - beq a2,t8,L(chk1w) /* When a2=t8, no 32-byte chunk */ - C_LD t0,UNIT(0)(a1) - C_LD t1,UNIT(1)(a1) - C_LD REG2,UNIT(2)(a1) - C_LD REG3,UNIT(3)(a1) - C_LD REG4,UNIT(4)(a1) - C_LD REG5,UNIT(5)(a1) - C_LD REG6,UNIT(6)(a1) - C_LD REG7,UNIT(7)(a1) - PTR_ADDIU a1,a1,UNIT(8) - C_ST t0,UNIT(0)(a0) - C_ST t1,UNIT(1)(a0) - C_ST REG2,UNIT(2)(a0) - C_ST REG3,UNIT(3)(a0) - C_ST REG4,UNIT(4)(a0) - C_ST REG5,UNIT(5)(a0) - C_ST REG6,UNIT(6)(a0) - C_ST REG7,UNIT(7)(a0) - PTR_ADDIU a0,a0,UNIT(8) - -/* - * Here we have less than 32(64) bytes to copy. Set up for a loop to - * copy one word (or double word) at a time. Set a2 to count how many - * bytes we have to copy after all the word (or double word) chunks are - * copied and a3 to the dst pointer after all the (d)word chunks have - * been copied. We will loop, incrementing a0 and a1 until a0 equals a3. - */ -L(chk1w): - andi a2,t8,(NSIZE-1) /* a2 is the reminder past one (d)word chunks */ - PTR_SUBU a3,t8,a2 /* a3 is count of bytes in one (d)word chunks */ - beq a2,t8,L(lastw) - PTR_ADDU a3,a0,a3 /* a3 is the dst address after loop */ - -/* copying in words (4-byte or 8-byte chunks) */ -L(wordCopy_loop): - C_LD REG3,UNIT(0)(a1) - PTR_ADDIU a0,a0,UNIT(1) - PTR_ADDIU a1,a1,UNIT(1) - C_ST REG3,UNIT(-1)(a0) - bne a0,a3,L(wordCopy_loop) - -/* If we have been copying double words, see if we can copy a single word - before doing byte copies. We can have, at most, one word to copy. */ - -L(lastw): -#ifdef USE_DOUBLE - andi t8,a2,3 /* a2 is the remainder past 4 byte chunks. 
*/ - beq t8,a2,L(lastb) - move a2,t8 - lw REG3,0(a1) - sw REG3,0(a0) - PTR_ADDIU a0,a0,4 - PTR_ADDIU a1,a1,4 -#endif - -/* Copy the last 8 (or 16) bytes */ -L(lastb): - PTR_ADDU a3,a0,a2 /* a3 is the last dst address */ - blez a2,L(leave) -L(lastbloop): - lb v1,0(a1) - PTR_ADDIU a0,a0,1 - PTR_ADDIU a1,a1,1 - sb v1,-1(a0) - bne a0,a3,L(lastbloop) -L(leave): - jr ra - -/* We jump here with a memcpy of less than 8 or 16 bytes, depending on - whether or not USE_DOUBLE is defined. Instead of just doing byte - copies, check the alignment and size and use lw/sw if possible. - Otherwise, do byte copies. */ - -L(lasts): - andi t8,a2,3 - beq t8,a2,L(lastb) - - andi t9,a0,3 - bne t9,zero,L(lastb) - andi t9,a1,3 - bne t9,zero,L(lastb) - - PTR_SUBU a3,a2,t8 - PTR_ADDU a3,a0,a3 - -L(wcopy_loop): - lw REG3,0(a1) - PTR_ADDIU a0,a0,4 - PTR_ADDIU a1,a1,4 - bne a0,a3,L(wcopy_loop) - sw REG3,-4(a0) - - b L(lastb) - move a2,t8 - -#ifndef R6_CODE -/* - * UNALIGNED case, got here with a3 = "negu a0" - * This code is nearly identical to the aligned code above - * but only the destination (not the source) gets aligned - * so we need to do partial loads of the source followed - * by normal stores to the destination (once we have aligned - * the destination). - */ - -L(unaligned): - andi a3,a3,(NSIZE-1) /* copy a3 bytes to align a0/a1 */ - PTR_SUBU a2,a2,a3 /* a2 is the remining bytes count */ - beqz a3,L(ua_chk16w) /* if a3=0, it is already aligned */ - - C_LDHI v1,UNIT(0)(a1) - C_LDLO v1,UNITM1(1)(a1) - PTR_ADDU a1,a1,a3 - C_STHI v1,UNIT(0)(a0) - PTR_ADDU a0,a0,a3 - -/* - * Now the destination (but not the source) is aligned - * Set a2 to count how many bytes we have to copy after all the 64/128 byte - * chunks are copied and a3 to the dst pointer after all the 64/128 byte - * chunks have been copied. We will loop, incrementing a0 and a1 until a0 - * equals a3. - */ - -L(ua_chk16w): - andi t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? 
*/ - PTR_SUBU a3,a2,t8 /* subtract from a2 the reminder */ - beq a2,t8,L(ua_chkw) /* if a2==t8, no 64-byte/128-byte chunks */ - PTR_ADDU a3,a0,a3 /* Now a3 is the final dst after loop */ - -# if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) - PTR_ADDU t0,a0,a2 /* t0 is the "past the end" address */ - PTR_SUBU t9,t0,PREFETCH_LIMIT /* t9 is the "last safe pref" address */ -# endif - PREFETCH_FOR_LOAD (0, a1) - PREFETCH_FOR_LOAD (1, a1) - PREFETCH_FOR_LOAD (2, a1) -# if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT != PREFETCH_HINT_PREPAREFORSTORE) - PREFETCH_FOR_STORE (1, a0) - PREFETCH_FOR_STORE (2, a0) - PREFETCH_FOR_STORE (3, a0) -# endif -# if defined(RETURN_FIRST_PREFETCH) && defined(USE_PREFETCH) -# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) - sltu v1,t9,a0 - bgtz v1,L(ua_skip_set) - PTR_ADDIU v0,a0,(PREFETCH_CHUNK*4) -L(ua_skip_set): -# else - PTR_ADDIU v0,a0,(PREFETCH_CHUNK*1) -# endif -# endif -L(ua_loop16w): - PREFETCH_FOR_LOAD (3, a1) - C_LDHI t0,UNIT(0)(a1) - C_LDHI t1,UNIT(1)(a1) - C_LDHI REG2,UNIT(2)(a1) -/* We need to separate out the C_LDHI instruction here so that it will work - both when it is used by itself and when it is used with the branch - instruction. 
*/ -# if defined(USE_PREFETCH) && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) - sltu v1,t9,a0 - C_LDHI REG3,UNIT(3)(a1) - bgtz v1,L(ua_skip_pref) -# else - C_LDHI REG3,UNIT(3)(a1) -# endif - PREFETCH_FOR_STORE (4, a0) - PREFETCH_FOR_STORE (5, a0) -L(ua_skip_pref): - C_LDHI REG4,UNIT(4)(a1) - C_LDHI REG5,UNIT(5)(a1) - C_LDHI REG6,UNIT(6)(a1) - C_LDHI REG7,UNIT(7)(a1) - C_LDLO t0,UNITM1(1)(a1) - C_LDLO t1,UNITM1(2)(a1) - C_LDLO REG2,UNITM1(3)(a1) - C_LDLO REG3,UNITM1(4)(a1) - C_LDLO REG4,UNITM1(5)(a1) - C_LDLO REG5,UNITM1(6)(a1) - C_LDLO REG6,UNITM1(7)(a1) - C_LDLO REG7,UNITM1(8)(a1) - PREFETCH_FOR_LOAD (4, a1) - C_ST t0,UNIT(0)(a0) - C_ST t1,UNIT(1)(a0) - C_ST REG2,UNIT(2)(a0) - C_ST REG3,UNIT(3)(a0) - C_ST REG4,UNIT(4)(a0) - C_ST REG5,UNIT(5)(a0) - C_ST REG6,UNIT(6)(a0) - C_ST REG7,UNIT(7)(a0) - C_LDHI t0,UNIT(8)(a1) - C_LDHI t1,UNIT(9)(a1) - C_LDHI REG2,UNIT(10)(a1) - C_LDHI REG3,UNIT(11)(a1) - C_LDHI REG4,UNIT(12)(a1) - C_LDHI REG5,UNIT(13)(a1) - C_LDHI REG6,UNIT(14)(a1) - C_LDHI REG7,UNIT(15)(a1) - C_LDLO t0,UNITM1(9)(a1) - C_LDLO t1,UNITM1(10)(a1) - C_LDLO REG2,UNITM1(11)(a1) - C_LDLO REG3,UNITM1(12)(a1) - C_LDLO REG4,UNITM1(13)(a1) - C_LDLO REG5,UNITM1(14)(a1) - C_LDLO REG6,UNITM1(15)(a1) - C_LDLO REG7,UNITM1(16)(a1) - PREFETCH_FOR_LOAD (5, a1) - C_ST t0,UNIT(8)(a0) - C_ST t1,UNIT(9)(a0) - C_ST REG2,UNIT(10)(a0) - C_ST REG3,UNIT(11)(a0) - C_ST REG4,UNIT(12)(a0) - C_ST REG5,UNIT(13)(a0) - C_ST REG6,UNIT(14)(a0) - C_ST REG7,UNIT(15)(a0) - PTR_ADDIU a0,a0,UNIT(16) /* adding 64/128 to dest */ - PTR_ADDIU a1,a1,UNIT(16) /* adding 64/128 to src */ - bne a0,a3,L(ua_loop16w) - move a2,t8 - -/* Here we have src and dest word-aligned but less than 64-bytes or - * 128 bytes to go. Check for a 32(64) byte chunk and copy if there - * is one. Otherwise jump down to L(ua_chk1w) to handle the tail end of - * the copy. */ - -L(ua_chkw): - PREFETCH_FOR_LOAD (0, a1) - andi t8,a2,NSIZEMASK /* Is there a 32-byte/64-byte chunk. 
*/ - /* t8 is the reminder count past 32-bytes */ - beq a2,t8,L(ua_chk1w) /* When a2=t8, no 32-byte chunk */ - C_LDHI t0,UNIT(0)(a1) - C_LDHI t1,UNIT(1)(a1) - C_LDHI REG2,UNIT(2)(a1) - C_LDHI REG3,UNIT(3)(a1) - C_LDHI REG4,UNIT(4)(a1) - C_LDHI REG5,UNIT(5)(a1) - C_LDHI REG6,UNIT(6)(a1) - C_LDHI REG7,UNIT(7)(a1) - C_LDLO t0,UNITM1(1)(a1) - C_LDLO t1,UNITM1(2)(a1) - C_LDLO REG2,UNITM1(3)(a1) - C_LDLO REG3,UNITM1(4)(a1) - C_LDLO REG4,UNITM1(5)(a1) - C_LDLO REG5,UNITM1(6)(a1) - C_LDLO REG6,UNITM1(7)(a1) - C_LDLO REG7,UNITM1(8)(a1) - PTR_ADDIU a1,a1,UNIT(8) - C_ST t0,UNIT(0)(a0) - C_ST t1,UNIT(1)(a0) - C_ST REG2,UNIT(2)(a0) - C_ST REG3,UNIT(3)(a0) - C_ST REG4,UNIT(4)(a0) - C_ST REG5,UNIT(5)(a0) - C_ST REG6,UNIT(6)(a0) - C_ST REG7,UNIT(7)(a0) - PTR_ADDIU a0,a0,UNIT(8) -/* - * Here we have less than 32(64) bytes to copy. Set up for a loop to - * copy one word (or double word) at a time. - */ -L(ua_chk1w): - andi a2,t8,(NSIZE-1) /* a2 is the reminder past one (d)word chunks */ - PTR_SUBU a3,t8,a2 /* a3 is count of bytes in one (d)word chunks */ - beq a2,t8,L(ua_smallCopy) - PTR_ADDU a3,a0,a3 /* a3 is the dst address after loop */ - -/* copying in words (4-byte or 8-byte chunks) */ -L(ua_wordCopy_loop): - C_LDHI v1,UNIT(0)(a1) - C_LDLO v1,UNITM1(1)(a1) - PTR_ADDIU a0,a0,UNIT(1) - PTR_ADDIU a1,a1,UNIT(1) - C_ST v1,UNIT(-1)(a0) - bne a0,a3,L(ua_wordCopy_loop) - -/* Copy the last 8 (or 16) bytes */ -L(ua_smallCopy): - PTR_ADDU a3,a0,a2 /* a3 is the last dst address */ - beqz a2,L(leave) -L(ua_smallCopy_loop): - lb v1,0(a1) - PTR_ADDIU a0,a0,1 - PTR_ADDIU a1,a1,1 - sb v1,-1(a0) - bne a0,a3,L(ua_smallCopy_loop) - - jr ra - -#else /* R6_CODE */ - -# ifdef __MIPSEB -# define SWAP_REGS(X,Y) X, Y -# define ALIGN_OFFSET(N) (N) -# else -# define SWAP_REGS(X,Y) Y, X -# define ALIGN_OFFSET(N) (NSIZE-N) -# endif -# define R6_UNALIGNED_WORD_COPY(BYTEOFFSET) \ - andi REG7, a2, (NSIZE-1);/* REG7 is # of bytes to by bytes. 
*/ \ - PTR_SUBU a3, a2, REG7; /* a3 is number of bytes to be copied in */ \ - /* (d)word chunks. */ \ - beq REG7, a2, L(lastb); /* Check for bytes to copy by word */ \ - move a2, REG7; /* a2 is # of bytes to copy byte by byte */ \ - /* after word loop is finished. */ \ - PTR_ADDU REG6, a0, a3; /* REG6 is the dst address after loop. */ \ - PTR_SUBU REG2, a1, t8; /* REG2 is the aligned src address. */ \ - PTR_ADDU a1, a1, a3; /* a1 is addr of source after word loop. */ \ - C_LD t0, UNIT(0)(REG2); /* Load first part of source. */ \ -L(r6_ua_wordcopy##BYTEOFFSET): \ - C_LD t1, UNIT(1)(REG2); /* Load second part of source. */ \ - C_ALIGN REG3, SWAP_REGS(t1,t0), ALIGN_OFFSET(BYTEOFFSET); \ - PTR_ADDIU a0, a0, UNIT(1); /* Increment destination pointer. */ \ - PTR_ADDIU REG2, REG2, UNIT(1); /* Increment aligned source pointer.*/ \ - move t0, t1; /* Move second part of source to first. */ \ - C_ST REG3, UNIT(-1)(a0); \ - bne a0, REG6,L(r6_ua_wordcopy##BYTEOFFSET); \ - j L(lastb); \ - - /* We are generating R6 code, the destination is 4 byte aligned and - the source is not 4 byte aligned. t8 is 1, 2, or 3 depending on the - alignment of the source. 
*/ - -L(r6_unaligned1): - R6_UNALIGNED_WORD_COPY(1) -L(r6_unaligned2): - R6_UNALIGNED_WORD_COPY(2) -L(r6_unaligned3): - R6_UNALIGNED_WORD_COPY(3) -# ifdef USE_DOUBLE -L(r6_unaligned4): - R6_UNALIGNED_WORD_COPY(4) -L(r6_unaligned5): - R6_UNALIGNED_WORD_COPY(5) -L(r6_unaligned6): - R6_UNALIGNED_WORD_COPY(6) -L(r6_unaligned7): - R6_UNALIGNED_WORD_COPY(7) -# endif -#endif /* R6_CODE */ - - .set at -END(MEMCPY_NAME) -#ifndef ANDROID_CHANGES -# ifdef _LIBC -libc_hidden_builtin_def (MEMCPY_NAME) -# endif -#endif diff --git a/sysdeps/mips/memcpy.c b/sysdeps/mips/memcpy.c new file mode 100644 index 0000000000..8c3aec7b36 --- /dev/null +++ b/sysdeps/mips/memcpy.c @@ -0,0 +1,415 @@ +/* + * Copyright (C) 2024 MIPS Tech, LLC + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. + * 3. Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from this + * software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. +*/ + +#ifdef __GNUC__ + +#undef memcpy + +/* Typical observed latency in cycles in fetching from DRAM. */ +#define LATENCY_CYCLES 63 + +/* Pre-fetch performance is subject to accurate prefetch ahead, + which in turn depends on both the cache-line size and the amount + of look-ahead. Since cache-line size is not nominally fixed in + a typical library built for multiple platforms, we make conservative + assumptions in the default case. This code will typically operate + on such conservative assumptions, but if compiled with the correct + -mtune=xx options, will perform even better on those specific + platforms. */ +#if defined(_MIPS_TUNE_OCTEON2) || defined(_MIPS_TUNE_OCTEON3) + #define CACHE_LINE 128 + #define BLOCK_CYCLES 30 + #undef LATENCY_CYCLES + #define LATENCY_CYCLES 150 +#elif defined(_MIPS_TUNE_I6400) || defined(_MIPS_TUNE_I6500) + #define CACHE_LINE 64 + #define BLOCK_CYCLES 16 +#elif defined(_MIPS_TUNE_P6600) + #define CACHE_LINE 32 + #define BLOCK_CYCLES 12 +#elif defined(_MIPS_TUNE_INTERAPTIV) || defined(_MIPS_TUNE_INTERAPTIV_MR2) + #define CACHE_LINE 32 + #define BLOCK_CYCLES 30 +#else + #define CACHE_LINE 32 + #define BLOCK_CYCLES 11 +#endif + +/* Pre-fetch look ahead = ceil (latency / block-cycles) */ +#define PREF_AHEAD (LATENCY_CYCLES / BLOCK_CYCLES \ + + ((LATENCY_CYCLES % BLOCK_CYCLES) == 0 ? 0 : 1)) + +/* Unroll factor: controls how many words are copied per iteration of the core loop. */ +#define BLOCK (CACHE_LINE == 128 ? 
16 : 8) + +#define __overloadable +#if !defined(UNALIGNED_INSTR_SUPPORT) +/* does target have unaligned lw/ld/ualw/uald instructions? */ + #define UNALIGNED_INSTR_SUPPORT 0 +#if (__mips_isa_rev < 6 && !defined(__mips1)) + #undef UNALIGNED_INSTR_SUPPORT + #define UNALIGNED_INSTR_SUPPORT 1 + #endif +#endif +#if !defined(HW_UNALIGNED_SUPPORT) +/* Does target have hardware support for unaligned accesses? */ + #define HW_UNALIGNED_SUPPORT 0 + #if __mips_isa_rev >= 6 + #undef HW_UNALIGNED_SUPPORT + #define HW_UNALIGNED_SUPPORT 1 + #endif +#endif +#define ENABLE_PREFETCH 1 +#if ENABLE_PREFETCH + #define PREFETCH(addr) __builtin_prefetch (addr, 0, 0) +#else + #define PREFETCH(addr) +#endif + +#include + +#ifdef __mips64 +typedef unsigned long long reg_t; +typedef struct +{ + reg_t B0:8, B1:8, B2:8, B3:8, B4:8, B5:8, B6:8, B7:8; +} bits_t; +#else +typedef unsigned long reg_t; +typedef struct +{ + reg_t B0:8, B1:8, B2:8, B3:8; +} bits_t; +#endif + +#define CACHE_LINES_PER_BLOCK ((BLOCK * sizeof (reg_t) > CACHE_LINE) ? \ + (BLOCK * sizeof (reg_t) / CACHE_LINE) \ + : 1) + +typedef union +{ + reg_t v; + bits_t b; +} bitfields_t; + +#define DO_BYTE(a, i) \ + a[i] = bw.b.B##i; \ + len--; \ + if(!len) return ret; \ + +/* This code is called when aligning a pointer, there are remaining bytes + after doing word compares, or architecture does not have some form + of unaligned support. */ +static inline void * __attribute__ ((always_inline)) +do_bytes (void *a, const void *b, unsigned long len, void *ret) +{ + unsigned char *x = (unsigned char *) a; + unsigned char *y = (unsigned char *) b; + unsigned long i; + /* 'len' might be zero here, so preloading the first two values + before the loop may access unallocated memory. 
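[Editor's note: the PREF_AHEAD macro above is a ceiling division of the DRAM latency by the cycles spent per copied block. A minimal standalone sketch of the same arithmetic, using the patch's default tuning numbers; the helper name `pref_ahead` is ours, not part of the patch:]

```c
#include <assert.h>

/* How many blocks ahead to prefetch so data arrives by the time the
   copy loop reaches it: ceil (latency_cycles / block_cycles).
   Mirrors the PREF_AHEAD macro in the patch.  */
static unsigned
pref_ahead (unsigned latency_cycles, unsigned block_cycles)
{
  return latency_cycles / block_cycles
         + ((latency_cycles % block_cycles) == 0 ? 0 : 1);
}
```

With the patch's defaults (latency 63, 11 cycles per block) this yields a look-ahead of 6 blocks; the Octeon tuning (150/30) yields exactly 5 with no rounding.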
*/ + for (i = 0; i < len; i++) + { + *x = *y; + x++; + y++; + } + return ret; +} + +/* This code is called to copy only remaining bytes within word or doubleword */ +static inline void * __attribute__ ((always_inline)) +do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret) +{ + unsigned char *x = (unsigned char *) a; + bitfields_t bw; + if(len > 0) + { + bw.v = *(reg_t *)b; + DO_BYTE(x, 0); + DO_BYTE(x, 1); + DO_BYTE(x, 2); +#ifdef __mips64 + DO_BYTE(x, 3); + DO_BYTE(x, 4); + DO_BYTE(x, 5); + DO_BYTE(x, 6); +#endif + } + return ret; +} + +static inline void * __attribute__ ((always_inline)) +do_words_remaining (reg_t *a, const reg_t *b, unsigned long words, + unsigned long bytes, void *ret) +{ + /* Use a set-back so that load/stores have incremented addresses in + order to promote bonding. */ + int off = (BLOCK - words); + a -= off; + b -= off; + switch (off) + { + case 1: a[1] = b[1]; // Fall through + case 2: a[2] = b[2]; // Fall through + case 3: a[3] = b[3]; // Fall through + case 4: a[4] = b[4]; // Fall through + case 5: a[5] = b[5]; // Fall through + case 6: a[6] = b[6]; // Fall through + case 7: a[7] = b[7]; // Fall through +#if BLOCK==16 + case 8: a[8] = b[8]; // Fall through + case 9: a[9] = b[9]; // Fall through + case 10: a[10] = b[10]; // Fall through + case 11: a[11] = b[11]; // Fall through + case 12: a[12] = b[12]; // Fall through + case 13: a[13] = b[13]; // Fall through + case 14: a[14] = b[14]; // Fall through + case 15: a[15] = b[15]; +#endif + } + return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret); +} + +#if !HW_UNALIGNED_SUPPORT +#if UNALIGNED_INSTR_SUPPORT +/* For MIPS GCC, there are no unaligned builtins - so this struct forces + the compiler to treat the pointer access as unaligned. 
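[Editor's note: `do_words_remaining` above jumps into a fall-through switch after "setting back" both pointers by `BLOCK - words`, so the stores that actually execute still use ascending offsets — which is what promotes load/store bonding. A simplified sketch of the technique for BLOCK == 8; the helper name is ours, and like the original it relies on the set-back pointers only ever being dereferenced inside the destination buffer:]

```c
#include <assert.h>
#include <stddef.h>

/* Copy the trailing 0..7 words of an 8-word block.  The pointers are
   set back by (8 - words) so the executed cases use ascending
   offsets 1..7, as in the patch's do_words_remaining.  */
static void
copy_tail_words (unsigned long *a, const unsigned long *b, size_t words)
{
  int off = 8 - (int) words;
  a -= off;
  b -= off;
  switch (off)
    {
    case 1: a[1] = b[1]; /* fall through */
    case 2: a[2] = b[2]; /* fall through */
    case 3: a[3] = b[3]; /* fall through */
    case 4: a[4] = b[4]; /* fall through */
    case 5: a[5] = b[5]; /* fall through */
    case 6: a[6] = b[6]; /* fall through */
    case 7: a[7] = b[7];
    }
}
```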
*/ +struct ulw +{ + reg_t uli; +} __attribute__ ((packed)); +static inline void * __attribute__ ((always_inline)) +do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words, + unsigned long bytes, void *ret) +{ + /* Use a set-back so that load/stores have incremented addresses in + order to promote bonding. */ + int off = (BLOCK - words); + a -= off; + b -= off; + switch (off) + { + case 1: a[1].uli = b[1]; // Fall through + case 2: a[2].uli = b[2]; // Fall through + case 3: a[3].uli = b[3]; // Fall through + case 4: a[4].uli = b[4]; // Fall through + case 5: a[5].uli = b[5]; // Fall through + case 6: a[6].uli = b[6]; // Fall through + case 7: a[7].uli = b[7]; // Fall through +#if BLOCK==16 + case 8: a[8].uli = b[8]; // Fall through + case 9: a[9].uli = b[9]; // Fall through + case 10: a[10].uli = b[10]; // Fall through + case 11: a[11].uli = b[11]; // Fall through + case 12: a[12].uli = b[12]; // Fall through + case 13: a[13].uli = b[13]; // Fall through + case 14: a[14].uli = b[14]; // Fall through + case 15: a[15].uli = b[15]; +#endif + } + return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret); +} + +/* The first pointer is not aligned while second pointer is. 
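[Editor's note: the packed single-member struct is the standard GCC idiom for forcing an unaligned word access when no unaligned builtin exists — the compiler emits whatever the target supports (ulw/uld, lwl/lwr pairs, or byte loads). A minimal GCC-specific sketch with a fixed-width type standing in for `reg_t`:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A packed single-member struct tells GCC the pointer may be
   misaligned, so a plain member access is safe at any alignment.
   Equivalent in spirit to the patch's `struct ulw`.  */
struct unaligned_word
{
  uint32_t v;
} __attribute__ ((packed));

static uint32_t
load_unaligned (const void *p)
{
  return ((const struct unaligned_word *) p)->v;
}
```

The `memcpy`-based load in the test is the portable reference; both must agree regardless of endianness.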
*/ +static void * +unaligned_words (struct ulw *a, const reg_t * b, + unsigned long words, unsigned long bytes, void *ret) +{ + unsigned long i, words_by_block, words_by_1; + words_by_1 = words % BLOCK; + words_by_block = words / BLOCK; + for (; words_by_block > 0; words_by_block--) + { + if (words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK) + for (i = 0; i < CACHE_LINES_PER_BLOCK; i++) + PREFETCH (b + (BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i)); + + reg_t y0 = b[0], y1 = b[1], y2 = b[2], y3 = b[3]; + reg_t y4 = b[4], y5 = b[5], y6 = b[6], y7 = b[7]; + a[0].uli = y0; + a[1].uli = y1; + a[2].uli = y2; + a[3].uli = y3; + a[4].uli = y4; + a[5].uli = y5; + a[6].uli = y6; + a[7].uli = y7; +#if BLOCK==16 + y0 = b[8], y1 = b[9], y2 = b[10], y3 = b[11]; + y4 = b[12], y5 = b[13], y6 = b[14], y7 = b[15]; + a[8].uli = y0; + a[9].uli = y1; + a[10].uli = y2; + a[11].uli = y3; + a[12].uli = y4; + a[13].uli = y5; + a[14].uli = y6; + a[15].uli = y7; +#endif + a += BLOCK; + b += BLOCK; + } + + /* Mop up any remaining bytes. */ + return do_uwords_remaining (a, b, words_by_1, bytes, ret); +} + +#else + +/* No HW support or unaligned lw/ld/ualw/uald instructions. */ +static void * +unaligned_words (reg_t * a, const reg_t * b, + unsigned long words, unsigned long bytes, void *ret) +{ + unsigned long i; + unsigned char *x; + for (i = 0; i < words; i++) + { + bitfields_t bw; + bw.v = *((reg_t*) b); + x = (unsigned char *) a; + x[0] = bw.b.B0; + x[1] = bw.b.B1; + x[2] = bw.b.B2; + x[3] = bw.b.B3; +#ifdef __mips64 + x[4] = bw.b.B4; + x[5] = bw.b.B5; + x[6] = bw.b.B6; + x[7] = bw.b.B7; +#endif + a += 1; + b += 1; + } + /* Mop up any remaining bytes. */ + return do_bytes_remaining (a, b, bytes, ret); +} + +#endif /* UNALIGNED_INSTR_SUPPORT */ +#endif /* HW_UNALIGNED_SUPPORT */ + +/* both pointers are aligned, or first isn't and HW support for unaligned. 
*/ +static void * +aligned_words (reg_t * a, const reg_t * b, + unsigned long words, unsigned long bytes, void *ret) +{ + unsigned long i, words_by_block, words_by_1; + words_by_1 = words % BLOCK; + words_by_block = words / BLOCK; + for (; words_by_block > 0; words_by_block--) + { + if(words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK) + for (i = 0; i < CACHE_LINES_PER_BLOCK; i++) + PREFETCH (b + ((BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i))); + + reg_t x0 = b[0], x1 = b[1], x2 = b[2], x3 = b[3]; + reg_t x4 = b[4], x5 = b[5], x6 = b[6], x7 = b[7]; + a[0] = x0; + a[1] = x1; + a[2] = x2; + a[3] = x3; + a[4] = x4; + a[5] = x5; + a[6] = x6; + a[7] = x7; +#if BLOCK==16 + x0 = b[8], x1 = b[9], x2 = b[10], x3 = b[11]; + x4 = b[12], x5 = b[13], x6 = b[14], x7 = b[15]; + a[8] = x0; + a[9] = x1; + a[10] = x2; + a[11] = x3; + a[12] = x4; + a[13] = x5; + a[14] = x6; + a[15] = x7; +#endif + a += BLOCK; + b += BLOCK; + } + + /* mop up any remaining bytes. */ + return do_words_remaining (a, b, words_by_1, bytes, ret); +} + +void * +memcpy (void *a, const void *b, size_t len) __overloadable +{ + unsigned long bytes, words, i; + void *ret = a; + /* shouldn't hit that often. */ + if (len <= 8) + return do_bytes (a, b, len, a); + + /* Start pre-fetches ahead of time. */ + if (len > CACHE_LINE * (PREF_AHEAD - 1)) + for (i = 1; i < PREF_AHEAD - 1; i++) + PREFETCH ((char *)b + CACHE_LINE * i); + else + for (i = 1; i < len / CACHE_LINE; i++) + PREFETCH ((char *)b + CACHE_LINE * i); + + /* Align the second pointer to word/dword alignment. + Note that the pointer is only 32-bits for o32/n32 ABIs. For + n32, loads are done as 64-bit while address remains 32-bit. 
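[Editor's note: the alignment prologue described in the comment above copies `sizeof (reg_t) - (addr % sizeof (reg_t))` head bytes, clamped to the total length, before handing the now-aligned source to the word loops. A sketch of just that bookkeeping; the helper name is ours:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of head bytes to copy so an address becomes word-aligned,
   never more than the total length.  Mirrors the alignment prologue
   in the patch's memcpy.  */
static size_t
head_bytes (uintptr_t addr, size_t word, size_t len)
{
  size_t bytes = (size_t) (addr % word);
  if (bytes == 0)
    return 0;
  bytes = word - bytes;
  return bytes > len ? len : bytes;
}
```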
*/ + bytes = ((unsigned long) b) % (sizeof (reg_t)); + + if (bytes) + { + bytes = (sizeof (reg_t)) - bytes; + if (bytes > len) + bytes = len; + do_bytes (a, b, bytes, ret); + if (len == bytes) + return ret; + len -= bytes; + a = (void *) (((unsigned char *) a) + bytes); + b = (const void *) (((unsigned char *) b) + bytes); + } + + /* Second pointer now aligned. */ + words = len / sizeof (reg_t); + bytes = len % sizeof (reg_t); + +#if HW_UNALIGNED_SUPPORT + /* treat possible unaligned first pointer as aligned. */ + return aligned_words (a, b, words, bytes, ret); +#else + if (((unsigned long) a) % sizeof (reg_t) == 0) + return aligned_words (a, b, words, bytes, ret); + /* need to use unaligned instructions on first pointer. */ + return unaligned_words (a, b, words, bytes, ret); +#endif +} + +libc_hidden_builtin_def (memcpy) + +#else +#include +#endif diff --git a/sysdeps/mips/memset.S b/sysdeps/mips/memset.S deleted file mode 100644 index 0c8375c9f5..0000000000 --- a/sysdeps/mips/memset.S +++ /dev/null @@ -1,430 +0,0 @@ -/* Copyright (C) 2013-2024 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library. If not, see - . 
*/ - -#ifdef ANDROID_CHANGES -# include "machine/asm.h" -# include "machine/regdef.h" -# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE -#elif _LIBC -# include -# include -# include -# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE -#elif defined _COMPILING_NEWLIB -# include "machine/asm.h" -# include "machine/regdef.h" -# define PREFETCH_STORE_HINT PREFETCH_HINT_PREPAREFORSTORE -#else -# include -# include -#endif - -/* Check to see if the MIPS architecture we are compiling for supports - prefetching. */ - -#if (__mips == 4) || (__mips == 5) || (__mips == 32) || (__mips == 64) -# ifndef DISABLE_PREFETCH -# define USE_PREFETCH -# endif -#endif - -#if defined(_MIPS_SIM) && ((_MIPS_SIM == _ABI64) || (_MIPS_SIM == _ABIN32)) -# ifndef DISABLE_DOUBLE -# define USE_DOUBLE -# endif -#endif - -#ifndef USE_DOUBLE -# ifndef DISABLE_DOUBLE_ALIGN -# define DOUBLE_ALIGN -# endif -#endif - - -/* Some asm.h files do not have the L macro definition. */ -#ifndef L -# if _MIPS_SIM == _ABIO32 -# define L(label) $L ## label -# else -# define L(label) .L ## label -# endif -#endif - -/* Some asm.h files do not have the PTR_ADDIU macro definition. */ -#ifndef PTR_ADDIU -# ifdef USE_DOUBLE -# define PTR_ADDIU daddiu -# else -# define PTR_ADDIU addiu -# endif -#endif - -/* New R6 instructions that may not be in asm.h. */ -#ifndef PTR_LSA -# if _MIPS_SIM == _ABI64 -# define PTR_LSA dlsa -# else -# define PTR_LSA lsa -# endif -#endif - -#if __mips_isa_rev > 5 && defined (__mips_micromips) -# define PTR_BC bc16 -#else -# define PTR_BC bc -#endif - -/* Using PREFETCH_HINT_PREPAREFORSTORE instead of PREFETCH_STORE - or PREFETCH_STORE_STREAMED offers a large performance advantage - but PREPAREFORSTORE has some special restrictions to consider. - - Prefetch with the 'prepare for store' hint does not copy a memory - location into the cache, it just allocates a cache line and zeros - it out. 
This means that if you do not write to the entire cache - line before writing it out to memory some data will get zero'ed out - when the cache line is written back to memory and data will be lost. - - There are ifdef'ed sections of this memcpy to make sure that it does not - do prefetches on cache lines that are not going to be completely written. - This code is only needed and only used when PREFETCH_STORE_HINT is set to - PREFETCH_HINT_PREPAREFORSTORE. This code assumes that cache lines are - less than MAX_PREFETCH_SIZE bytes and if the cache line is larger it will - not work correctly. */ - -#ifdef USE_PREFETCH -# define PREFETCH_HINT_STORE 1 -# define PREFETCH_HINT_STORE_STREAMED 5 -# define PREFETCH_HINT_STORE_RETAINED 7 -# define PREFETCH_HINT_PREPAREFORSTORE 30 - -/* If we have not picked out what hints to use at this point use the - standard load and store prefetch hints. */ -# ifndef PREFETCH_STORE_HINT -# define PREFETCH_STORE_HINT PREFETCH_HINT_STORE -# endif - -/* We double everything when USE_DOUBLE is true so we do 2 prefetches to - get 64 bytes in that case. The assumption is that each individual - prefetch brings in 32 bytes. */ -# ifdef USE_DOUBLE -# define PREFETCH_CHUNK 64 -# define PREFETCH_FOR_STORE(chunk, reg) \ - pref PREFETCH_STORE_HINT, (chunk)*64(reg); \ - pref PREFETCH_STORE_HINT, ((chunk)*64)+32(reg) -# else -# define PREFETCH_CHUNK 32 -# define PREFETCH_FOR_STORE(chunk, reg) \ - pref PREFETCH_STORE_HINT, (chunk)*32(reg) -# endif - -/* MAX_PREFETCH_SIZE is the maximum size of a prefetch, it must not be less - than PREFETCH_CHUNK, the assumed size of each prefetch. If the real size - of a prefetch is greater than MAX_PREFETCH_SIZE and the PREPAREFORSTORE - hint is used, the code will not work correctly. If PREPAREFORSTORE is not - used than MAX_PREFETCH_SIZE does not matter. 
*/ -# define MAX_PREFETCH_SIZE 128 -/* PREFETCH_LIMIT is set based on the fact that we never use an offset greater - than 5 on a STORE prefetch and that a single prefetch can never be larger - than MAX_PREFETCH_SIZE. We add the extra 32 when USE_DOUBLE is set because - we actually do two prefetches in that case, one 32 bytes after the other. */ -# ifdef USE_DOUBLE -# define PREFETCH_LIMIT (5 * PREFETCH_CHUNK) + 32 + MAX_PREFETCH_SIZE -# else -# define PREFETCH_LIMIT (5 * PREFETCH_CHUNK) + MAX_PREFETCH_SIZE -# endif - -# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) \ - && ((PREFETCH_CHUNK * 4) < MAX_PREFETCH_SIZE) -/* We cannot handle this because the initial prefetches may fetch bytes that - are before the buffer being copied. We start copies with an offset - of 4 so avoid this situation when using PREPAREFORSTORE. */ -# error "PREFETCH_CHUNK is too large and/or MAX_PREFETCH_SIZE is too small." -# endif -#else /* USE_PREFETCH not defined */ -# define PREFETCH_FOR_STORE(offset, reg) -#endif - -#if __mips_isa_rev > 5 -# if (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) -# undef PREFETCH_STORE_HINT -# define PREFETCH_STORE_HINT PREFETCH_HINT_STORE_STREAMED -# endif -# define R6_CODE -#endif - -/* Allow the routine to be named something else if desired. */ -#ifndef MEMSET_NAME -# define MEMSET_NAME memset -#endif - -/* We load/store 64 bits at a time when USE_DOUBLE is true. - The C_ prefix stands for CHUNK and is used to avoid macro name - conflicts with system header files. */ - -#ifdef USE_DOUBLE -# define C_ST sd -# ifdef __MIPSEB -# define C_STHI sdl /* high part is left in big-endian */ -# else -# define C_STHI sdr /* high part is right in little-endian */ -# endif -#else -# define C_ST sw -# ifdef __MIPSEB -# define C_STHI swl /* high part is left in big-endian */ -# else -# define C_STHI swr /* high part is right in little-endian */ -# endif -#endif - -/* Bookkeeping values for 32 vs. 64 bit mode. 
*/ -#ifdef USE_DOUBLE -# define NSIZE 8 -# define NSIZEMASK 0x3f -# define NSIZEDMASK 0x7f -#else -# define NSIZE 4 -# define NSIZEMASK 0x1f -# define NSIZEDMASK 0x3f -#endif -#define UNIT(unit) ((unit)*NSIZE) -#define UNITM1(unit) (((unit)*NSIZE)-1) - -#ifdef ANDROID_CHANGES -LEAF(MEMSET_NAME,0) -#else -LEAF(MEMSET_NAME) -#endif - - .set nomips16 -/* If the size is less than 4*NSIZE (16 or 32), go to L(lastb). Regardless of - size, copy dst pointer to v0 for the return value. */ - slti t2,a2,(4 * NSIZE) - move v0,a0 - bne t2,zero,L(lastb) - -/* If memset value is not zero, we copy it to all the bytes in a 32 or 64 - bit word. */ - PTR_SUBU a3,zero,a0 - beq a1,zero,L(set0) /* If memset value is zero no smear */ - nop - - /* smear byte into 32 or 64 bit word */ -#if ((__mips == 64) || (__mips == 32)) && (__mips_isa_rev >= 2) -# ifdef USE_DOUBLE - dins a1, a1, 8, 8 /* Replicate fill byte into half-word. */ - dins a1, a1, 16, 16 /* Replicate fill byte into word. */ - dins a1, a1, 32, 32 /* Replicate fill byte into dbl word. */ -# else - ins a1, a1, 8, 8 /* Replicate fill byte into half-word. */ - ins a1, a1, 16, 16 /* Replicate fill byte into word. */ -# endif -#else -# ifdef USE_DOUBLE - and a1,0xff - dsll t2,a1,8 - or a1,t2 - dsll t2,a1,16 - or a1,t2 - dsll t2,a1,32 - or a1,t2 -# else - and a1,0xff - sll t2,a1,8 - or a1,t2 - sll t2,a1,16 - or a1,t2 -# endif -#endif - -/* If the destination address is not aligned do a partial store to get it - aligned. If it is already aligned just jump to L(aligned). */ -L(set0): -#ifndef R6_CODE - andi t2,a3,(NSIZE-1) /* word-unaligned address? 
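[Editor's note: the dins/ins (or shift-and-or) sequences in the memset smear step above replicate the fill byte across the whole register. In C the same replication is a single multiply by 0x0101...01; a sketch for the 64-bit case:]

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the low byte across a 64-bit word, the C analogue of the
   dins/dsll-or smearing: 0xab -> 0xabababababababab.  */
static uint64_t
smear_byte (unsigned char fill)
{
  return (uint64_t) fill * 0x0101010101010101ULL;
}
```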
*/ - PTR_SUBU a2,a2,t2 - beq t2,zero,L(aligned) /* t2 is the unalignment count */ - C_STHI a1,0(a0) - PTR_ADDU a0,a0,t2 -#else /* R6_CODE */ - andi t2,a0,7 -# ifdef __mips_micromips - auipc t9,%pcrel_hi(L(atable)) - addiu t9,t9,%pcrel_lo(L(atable)+4) - PTR_LSA t9,t2,t9,1 -# else - lapc t9,L(atable) - PTR_LSA t9,t2,t9,2 -# endif - jrc t9 -L(atable): - PTR_BC L(aligned) - PTR_BC L(lb7) - PTR_BC L(lb6) - PTR_BC L(lb5) - PTR_BC L(lb4) - PTR_BC L(lb3) - PTR_BC L(lb2) - PTR_BC L(lb1) -L(lb7): - sb a1,6(a0) -L(lb6): - sb a1,5(a0) -L(lb5): - sb a1,4(a0) -L(lb4): - sb a1,3(a0) -L(lb3): - sb a1,2(a0) -L(lb2): - sb a1,1(a0) -L(lb1): - sb a1,0(a0) - - li t9,NSIZE - subu t2,t9,t2 - PTR_SUBU a2,a2,t2 - PTR_ADDU a0,a0,t2 -#endif /* R6_CODE */ - -L(aligned): -/* If USE_DOUBLE is not set we may still want to align the data on a 16 - byte boundary instead of an 8 byte boundary to maximize the opportunity - of proAptiv chips to do memory bonding (combining two sequential 4 - byte stores into one 8 byte store). We know there are at least 4 bytes - left to store or we would have jumped to L(lastb) earlier in the code. */ -#ifdef DOUBLE_ALIGN - andi t2,a3,4 - PTR_SUBU a2,a2,t2 - beq t2,zero,L(double_aligned) - sw a1,0(a0) - PTR_ADDU a0,a0,t2 -L(double_aligned): -#endif - -/* Now the destination is aligned to (word or double word) aligned address - Set a2 to count how many bytes we have to copy after all the 64/128 byte - chunks are copied and a3 to the dest pointer after all the 64/128 byte - chunks have been copied. We will loop, incrementing a0 until it equals - a3. */ - andi t8,a2,NSIZEDMASK /* any whole 64-byte/128-byte chunks? */ - PTR_SUBU a3,a2,t8 /* subtract from a2 the reminder */ - beq a2,t8,L(chkw) /* if a2==t8, no 64-byte/128-byte chunks */ - PTR_ADDU a3,a0,a3 /* Now a3 is the final dst after loop */ - -/* When in the loop we may prefetch with the 'prepare to store' hint, - in this case the a0+x should not be past the "t0-32" address. 
This - means: for x=128 the last "safe" a0 address is "t0-160". Alternatively, - for x=64 the last "safe" a0 address is "t0-96" In the current version we - will use "prefetch hint,128(a0)", so "t0-160" is the limit. */ -#if defined(USE_PREFETCH) \ - && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) - PTR_ADDU t0,a0,a2 /* t0 is the "past the end" address */ - PTR_SUBU t9,t0,PREFETCH_LIMIT /* t9 is the "last safe pref" address */ -#endif -#if defined(USE_PREFETCH) \ - && (PREFETCH_STORE_HINT != PREFETCH_HINT_PREPAREFORSTORE) - PREFETCH_FOR_STORE (1, a0) - PREFETCH_FOR_STORE (2, a0) - PREFETCH_FOR_STORE (3, a0) -#endif - -L(loop16w): -#if defined(USE_PREFETCH) \ - && (PREFETCH_STORE_HINT == PREFETCH_HINT_PREPAREFORSTORE) - sltu v1,t9,a0 /* If a0 > t9 don't use next prefetch */ - bgtz v1,L(skip_pref) -#endif -#ifdef R6_CODE - PREFETCH_FOR_STORE (2, a0) -#else - PREFETCH_FOR_STORE (4, a0) - PREFETCH_FOR_STORE (5, a0) -#endif -L(skip_pref): - C_ST a1,UNIT(0)(a0) - C_ST a1,UNIT(1)(a0) - C_ST a1,UNIT(2)(a0) - C_ST a1,UNIT(3)(a0) - C_ST a1,UNIT(4)(a0) - C_ST a1,UNIT(5)(a0) - C_ST a1,UNIT(6)(a0) - C_ST a1,UNIT(7)(a0) - C_ST a1,UNIT(8)(a0) - C_ST a1,UNIT(9)(a0) - C_ST a1,UNIT(10)(a0) - C_ST a1,UNIT(11)(a0) - C_ST a1,UNIT(12)(a0) - C_ST a1,UNIT(13)(a0) - C_ST a1,UNIT(14)(a0) - C_ST a1,UNIT(15)(a0) - PTR_ADDIU a0,a0,UNIT(16) /* adding 64/128 to dest */ - bne a0,a3,L(loop16w) - move a2,t8 - -/* Here we have dest word-aligned but less than 64-bytes or 128 bytes to go. - Check for a 32(64) byte chunk and copy if there is one. Otherwise - jump down to L(chk1w) to handle the tail end of the copy. */ -L(chkw): - andi t8,a2,NSIZEMASK /* is there a 32-byte/64-byte chunk. 
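[Editor's note: the andi/PTR_SUBU bookkeeping in this region splits the remaining length into large chunks, whole (d)words, and a byte tail before each loop. A simplified two-tier C sketch for the 32-bit case (NSIZE == 4, NSIZEDMASK == 0x3f); the real code has an additional 32-byte middle tier, and the helper name is ours:]

```c
#include <assert.h>
#include <stddef.h>

/* Split a length into full 64-byte chunks, then whole 4-byte words,
   then a byte tail, as the memset loop bookkeeping does.  */
static void
split_len (size_t len, size_t *chunks64, size_t *words, size_t *tail)
{
  size_t rem = len & 0x3f;            /* len & NSIZEDMASK */
  *chunks64 = len >> 6;
  *words = (rem & ~(size_t) 3) >> 2;  /* whole words in the remainder */
  *tail = rem & 3;
}
```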
*/ - /* the t8 is the reminder count past 32-bytes */ - beq a2,t8,L(chk1w)/* when a2==t8, no 32-byte chunk */ - C_ST a1,UNIT(0)(a0) - C_ST a1,UNIT(1)(a0) - C_ST a1,UNIT(2)(a0) - C_ST a1,UNIT(3)(a0) - C_ST a1,UNIT(4)(a0) - C_ST a1,UNIT(5)(a0) - C_ST a1,UNIT(6)(a0) - C_ST a1,UNIT(7)(a0) - PTR_ADDIU a0,a0,UNIT(8) - -/* Here we have less than 32(64) bytes to set. Set up for a loop to - copy one word (or double word) at a time. Set a2 to count how many - bytes we have to copy after all the word (or double word) chunks are - copied and a3 to the dest pointer after all the (d)word chunks have - been copied. We will loop, incrementing a0 until a0 equals a3. */ -L(chk1w): - andi a2,t8,(NSIZE-1) /* a2 is the reminder past one (d)word chunks */ - PTR_SUBU a3,t8,a2 /* a3 is count of bytes in one (d)word chunks */ - beq a2,t8,L(lastb) - PTR_ADDU a3,a0,a3 /* a3 is the dst address after loop */ - -/* copying in words (4-byte or 8 byte chunks) */ -L(wordCopy_loop): - PTR_ADDIU a0,a0,UNIT(1) - C_ST a1,UNIT(-1)(a0) - bne a0,a3,L(wordCopy_loop) - -/* Copy the last 8 (or 16) bytes */ -L(lastb): - PTR_ADDU a3,a0,a2 /* a3 is the last dst address */ - blez a2,L(leave) -L(lastbloop): - PTR_ADDIU a0,a0,1 - sb a1,-1(a0) - bne a0,a3,L(lastbloop) -L(leave): - jr ra - - .set at -END(MEMSET_NAME) -#ifndef ANDROID_CHANGES -# ifdef _LIBC -libc_hidden_builtin_def (MEMSET_NAME) -# endif -#endif diff --git a/sysdeps/mips/memset.c b/sysdeps/mips/memset.c new file mode 100644 index 0000000000..813b3bc0e6 --- /dev/null +++ b/sysdeps/mips/memset.c @@ -0,0 +1,187 @@ +/* + * Copyright (C) 2024 MIPS Tech, LLC + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + * this list of conditions and the following disclaimer. + * 2. 
Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. + * 3. Neither the name of the copyright holder nor the names of its + * contributors may be used to endorse or promote products derived from this + * software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. +*/ + +#ifdef __GNUC__ + +#undef memset + +#include + +#if _MIPS_SIM == _ABIO32 +#define SIZEOF_reg_t 4 +typedef unsigned long reg_t; +#else +#define SIZEOF_reg_t 8 +typedef unsigned long long reg_t; +#endif + +typedef struct bits8 +{ + reg_t B0:8, B1:8, B2:8, B3:8; +#if SIZEOF_reg_t == 8 + reg_t B4:8, B5:8, B6:8, B7:8; +#endif +} bits8_t; +typedef struct bits16 +{ + reg_t B0:16, B1:16; +#if SIZEOF_reg_t == 8 + reg_t B2:16, B3:16; +#endif +} bits16_t; +typedef struct bits32 +{ + reg_t B0:32; +#if SIZEOF_reg_t == 8 + reg_t B1:32; +#endif +} bits32_t; + +/* This union assumes that small structures can be in registers. If + not, then memory accesses will be done - not optimal, but ok. 
*/ +typedef union +{ + reg_t v; + bits8_t b8; + bits16_t b16; + bits32_t b32; +} bitfields_t; + +/* This code is called when aligning a pointer or there are remaining bytes + after doing word sets. */ +static inline void * __attribute__ ((always_inline)) +do_bytes (void *a, void *retval, unsigned char fill, const unsigned long len) +{ + unsigned char *x = ((unsigned char *) a); + unsigned long i; + + for (i = 0; i < len; i++) + *x++ = fill; + + return retval; +} + +/* Pointer is aligned. */ +static void * +do_aligned_words (reg_t * a, void * retval, reg_t fill, + unsigned long words, unsigned long bytes) +{ + unsigned long i, words_by_1, words_by_16; + + words_by_1 = words % 16; + words_by_16 = words / 16; + + /* + * Note: prefetching the store memory is not beneficial on most + * cores since the ls/st unit has store buffers that will be filled + * before the cache line is actually needed. + * + * Also, using prepare-for-store cache op is problematic since we + * don't know the implementation-defined cache line length and we + * don't want to touch unintended memory. + */ + for (i = 0; i < words_by_16; i++) + { + a[0] = fill; + a[1] = fill; + a[2] = fill; + a[3] = fill; + a[4] = fill; + a[5] = fill; + a[6] = fill; + a[7] = fill; + a[8] = fill; + a[9] = fill; + a[10] = fill; + a[11] = fill; + a[12] = fill; + a[13] = fill; + a[14] = fill; + a[15] = fill; + a += 16; + } + + /* do remaining words. */ + for (i = 0; i < words_by_1; i++) + *a++ = fill; + + /* mop up any remaining bytes. */ + return do_bytes (a, retval, fill, bytes); +} + +void * +memset (void *a, int ifill, size_t len) +{ + unsigned long bytes, words; + bitfields_t fill; + void *retval = (void *) a; + + /* shouldn't hit that often. */ + if (len < 16) + return do_bytes (a, retval, ifill, len); + + /* Align the pointer to word/dword alignment. + Note that the pointer is only 32-bits for o32/n32 ABIs. For + n32, loads are done as 64-bit while address remains 32-bit. 
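[Editor's note: memset below builds its register-wide fill value by successive doubling through the bitfield union — byte into the second byte lane, then halfword into the second halfword, and so on. The same doubling written with plain shifts, for a 32-bit register:]

```c
#include <assert.h>
#include <stdint.h>

/* Build the fill word by successive doubling, the shift equivalent of
   fill.b8.B1 = fill.b8.B0; fill.b16.B1 = fill.b16.B0; in the patch.  */
static uint32_t
make_fill32 (unsigned char c)
{
  uint32_t v = c;
  v |= v << 8;    /* byte -> halfword */
  v |= v << 16;   /* halfword -> word */
  return v;
}
```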
*/ + bytes = ((unsigned long) a) % (sizeof (reg_t) * 2); + if (bytes) + { + bytes = (sizeof (reg_t) * 2 - bytes); + if (bytes > len) + bytes = len; + do_bytes (a, retval, ifill, bytes); + if (len == bytes) + return retval; + len -= bytes; + a = (void *) (((unsigned char *) a) + bytes); + } + + /* Create correct fill value for reg_t sized variable. */ + if (ifill != 0) + { + fill.b8.B0 = (unsigned char) ifill; + fill.b8.B1 = fill.b8.B0; + fill.b16.B1 = fill.b16.B0; +#if SIZEOF_reg_t == 8 + fill.b32.B1 = fill.b32.B0; +#endif + } + else + fill.v = 0; + + words = len / sizeof (reg_t); + bytes = len % sizeof (reg_t); + return do_aligned_words (a, retval, fill.v, words, bytes); +} + + +libc_hidden_builtin_def (memset) + +#else +#include +#endif From patchwork Thu Jan 23 13:43:01 2025
bh=thcRS8XnNRIb5etwmQcQcSlXfuzf6lwZY/J3zYB7QsQ=; b=sPJ+kyiyic3qsov/YbiHx/low+JZxrNdcWQu93CrostI8lBirHvzn7+Jmnx4Vqc5FS a0BzGmFf2yTOtHWLUyJvtSIxTf/nB5Dd6mDh++Wu19zwRVSSsfiQtgyt7OBG2hY5sgbI GcyrC30zohMZ6LIeMueypC54LXLVQ2TwMu6Q5XwXl5kZf5rdwbMS0huVsdTnQpKdIvds 2Tk9cAKc0r83DnayamztW98zt6LGC1FC3DzK1tR5ZwlgRl+N24C8yQXuWyLuqV5IFNgm CKXj1Rl5nORR0y+7uIN6jCPPlDHfoEqjSCKqLIQhNSPDDtIY+TEM+99LMW8/8U8Cguoj fcaQ== X-Gm-Message-State: AOJu0YzLB30VQ5gajFjoW8HLMPhXO4QfXhuOG7wgZHVpgADeyKyok/KO IXJxNoBHn3pEQd/uQEy8+mF890AumOmSp1MIfjBn1M/MtT2sRcig2P97wg== X-Gm-Gg: ASbGncvheTjk5LmfRVuYYKSr8S5HehTl6nIqsvFbtO5bzJApvlcHq/80Z7zPuUKcVxy lzlfyJrdgMhSL9MqBLIK+r7nRt9HFwqtXM58gWhkG1Mpetkeb9YoiGfaDTnPOirDWDvoMTP9fBZ yZsWkN7ocGuwzG9mWgPZYIUf61b8ajFcbq7pGs1pZBk2FTmpCuwAr6ebc8Fn2HOEaNIgb2Dmo// uRfZDXXTGiiGq3SQBgZJoCOSvjmObiJtcAsGcCMcjsDG40eG+7Ill92xhJc5W03Ady0f/FnK4tx 7RtC13lr0P7tnQ6Q83BkiTiicI/5 X-Google-Smtp-Source: AGHT+IEpgAodxjtqUpcVD+B5D4zLpjXQX3UjkvWQ2QhnjO4y5pu4qkI5syH83Ju/3uSYp/23grbVzA== X-Received: by 2002:a05:600c:444d:b0:434:f335:85c with SMTP id 5b1f17b1804b1-438914321ebmr96939905e9.6.1737639813313; Thu, 23 Jan 2025 05:43:33 -0800 (PST) Received: from L-H2N0CV05D839062.. 
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com,
    cfu@mips.com, Faraz Shahbazker
Subject: [PATCH 05/11] Add optimized assembly for strcmp
Date: Thu, 23 Jan 2025 14:43:01 +0100
Message-Id: <20250123134308.1785777-7-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>

Cherry-picked ff356419673a5d122335dd81bd5726de7bc5e08f from
https://github.com/MIPS/glibc

Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rakic
---
 sysdeps/mips/strcmp.S | 228 +++++++++++++++++++++++++-----------------
 1 file changed, 137 insertions(+), 91 deletions(-)

diff --git a/sysdeps/mips/strcmp.S b/sysdeps/mips/strcmp.S
index 36379be021..4878cd3aac 100644
--- a/sysdeps/mips/strcmp.S
+++ b/sysdeps/mips/strcmp.S
@@ -1,4 +1,5 @@
 /* Copyright (C) 2014-2024 Free Software Foundation, Inc.
+   Optimized strcmp for MIPS
    This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or @@ -22,9 +23,6 @@ # include # include # include -#elif defined _COMPILING_NEWLIB -# include "machine/asm.h" -# include "machine/regdef.h" #else # include # include @@ -46,6 +44,10 @@ performance loss, so we are not turning it on by default. */ #if defined(ENABLE_CLZ) && (__mips_isa_rev > 1) # define USE_CLZ +#elif (__mips_isa_rev >= 2) +# define USE_EXT 1 +#else +# define USE_EXT 0 #endif /* Some asm.h files do not have the L macro definition. */ @@ -66,6 +68,10 @@ # endif #endif +/* Haven't yet found a configuration where DSP code outperforms + normal assembly. */ +#define __mips_using_dsp 0 + /* Allow the routine to be named something else if desired. */ #ifndef STRCMP_NAME # define STRCMP_NAME strcmp @@ -77,28 +83,35 @@ LEAF(STRCMP_NAME, 0) LEAF(STRCMP_NAME) #endif .set nomips16 - .set noreorder - or t0, a0, a1 - andi t0,0x3 + andi t0, t0, 0x3 bne t0, zero, L(byteloop) /* Both strings are 4 byte aligned at this point. 
*/ + li t8, 0x01010101 +#if !__mips_using_dsp + li t9, 0x7f7f7f7f +#endif - lui t8, 0x0101 - ori t8, t8, 0x0101 - lui t9, 0x7f7f - ori t9, 0x7f7f - -#define STRCMP32(OFFSET) \ - lw v0, OFFSET(a0); \ - lw v1, OFFSET(a1); \ - subu t0, v0, t8; \ - bne v0, v1, L(worddiff); \ - nor t1, v0, t9; \ - and t0, t0, t1; \ +#if __mips_using_dsp +# define STRCMP32(OFFSET) \ + lw a2, OFFSET(a0); \ + lw a3, OFFSET(a1); \ + subu_s.qb t0, t8, a2; \ + bne a2, a3, L(worddiff); \ bne t0, zero, L(returnzero) +#else /* !__mips_using_dsp */ +# define STRCMP32(OFFSET) \ + lw a2, OFFSET(a0); \ + lw a3, OFFSET(a1); \ + subu t0, a2, t8; \ + nor t1, a2, t9; \ + bne a2, a3, L(worddiff); \ + and t1, t0, t1; \ + bne t1, zero, L(returnzero) +#endif /* __mips_using_dsp */ + .align 2 L(wordloop): STRCMP32(0) DELAY_READ @@ -113,112 +126,143 @@ L(wordloop): STRCMP32(20) DELAY_READ STRCMP32(24) - DELAY_READ - STRCMP32(28) + lw a2, 28(a0) + lw a3, 28(a1) +#if __mips_using_dsp + subu_s.qb t0, t8, a2 +#else + subu t0, a2, t8 + nor t1, a2, t9 + and t1, t0, t1 +#endif + PTR_ADDIU a0, a0, 32 - b L(wordloop) + bne a2, a3, L(worddiff) PTR_ADDIU a1, a1, 32 + beq t1, zero, L(wordloop) L(returnzero): - j ra move v0, zero + jr ra + .align 2 L(worddiff): #ifdef USE_CLZ - subu t0, v0, t8 - nor t1, v0, t9 - and t1, t0, t1 - xor t0, v0, v1 + xor t0, a2, a3 or t0, t0, t1 # if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ wsbh t0, t0 rotr t0, t0, 16 -# endif +# endif /* LITTLE_ENDIAN */ clz t1, t0 - and t1, 0xf8 -# if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ - neg t1 - addu t1, 24 + or t0, t1, 24 /* Only care about multiples of 8. 
*/ + xor t1, t1, t0 /* {0,8,16,24} => {24,16,8,0} */ +# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + sllv a2,a2,t1 + sllv a3,a3,t1 +# else + srlv a2,a2,t1 + srlv a3,a3,t1 # endif - rotrv v0, v0, t1 - rotrv v1, v1, t1 - and v0, v0, 0xff - and v1, v1, 0xff - j ra - subu v0, v0, v1 + subu v0, a2, a3 + jr ra #else /* USE_CLZ */ # if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ - andi t0, v0, 0xff - beq t0, zero, L(wexit01) - andi t1, v1, 0xff - bne t0, t1, L(wexit01) - - srl t8, v0, 8 - srl t9, v1, 8 - andi t8, t8, 0xff + andi a0, a2, 0xff /* abcd => d */ + andi a1, a3, 0xff + beq a0, zero, L(wexit01) +# if USE_EXT + ext t8, a2, 8, 8 + bne a0, a1, L(wexit01) + ext t9, a3, 8, 8 beq t8, zero, L(wexit89) + ext a0, a2, 16, 8 + bne t8, t9, L(wexit89) + ext a1, a3, 16, 8 +# else /* !USE_EXT */ + srl t8, a2, 8 + bne a0, a1, L(wexit01) + srl t9, a3, 8 + andi t8, t8, 0xff andi t9, t9, 0xff + beq t8, zero, L(wexit89) + srl a0, a2, 16 bne t8, t9, L(wexit89) + srl a1, a3, 16 + andi a0, a0, 0xff + andi a1, a1, 0xff +# endif /* !USE_EXT */ - srl t0, v0, 16 - srl t1, v1, 16 - andi t0, t0, 0xff - beq t0, zero, L(wexit01) - andi t1, t1, 0xff - bne t0, t1, L(wexit01) - - srl t8, v0, 24 - srl t9, v1, 24 # else /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */ - srl t0, v0, 24 - beq t0, zero, L(wexit01) - srl t1, v1, 24 - bne t0, t1, L(wexit01) + srl a0, a2, 24 /* abcd => a */ + srl a1, a3, 24 + beq a0, zero, L(wexit01) - srl t8, v0, 16 - srl t9, v1, 16 - andi t8, t8, 0xff +# if USE_EXT + ext t8, a2, 16, 8 + bne a0, a1, L(wexit01) + ext t9, a3, 16, 8 beq t8, zero, L(wexit89) + ext a0, a2, 8, 8 + bne t8, t9, L(wexit89) + ext a1, a3, 8, 8 +# else /* ! 
USE_EXT */ + srl t8, a2, 8 + bne a0, a1, L(wexit01) + srl t9, a3, 8 + andi t8, t8, 0xff andi t9, t9, 0xff + beq t8, zero, L(wexit89) + srl a0, a2, 16 bne t8, t9, L(wexit89) + srl a1, a3, 16 + andi a0, a0, 0xff + andi a1, a1, 0xff +# endif /* USE_EXT */ - srl t0, v0, 8 - srl t1, v1, 8 - andi t0, t0, 0xff - beq t0, zero, L(wexit01) - andi t1, t1, 0xff - bne t0, t1, L(wexit01) - - andi t8, v0, 0xff - andi t9, v1, 0xff # endif /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */ + beq a0, zero, L(wexit01) + bne a0, a1, L(wexit01) + + /* The other bytes are identical, so just subract the 2 words + and return the difference. */ + move a0, a2 + move a1, a3 + +L(wexit01): + subu v0, a0, a1 + jr ra + L(wexit89): - j ra subu v0, t8, t9 -L(wexit01): - j ra - subu v0, t0, t1 + jr ra + #endif /* USE_CLZ */ +#define DELAY_NOP nop + /* It might seem better to do the 'beq' instruction between the two 'lbu' instructions so that the nop is not needed but testing showed that this code is actually faster (based on glibc strcmp test). 
*/
-#define BYTECMP01(OFFSET) \
-	lbu	v0, OFFSET(a0); \
-	lbu	v1, OFFSET(a1); \
-	beq	v0, zero, L(bexit01); \
-	nop; \
-	bne	v0, v1, L(bexit01)
-
-#define BYTECMP89(OFFSET) \
-	lbu	t8, OFFSET(a0); \
+
+#define BYTECMP01(OFFSET) \
+	lbu	a3, OFFSET(a1); \
+	DELAY_NOP; \
+	beq	a2, zero, L(bexit01); \
+	lbu	t8, OFFSET+1(a0); \
+	bne	a2, a3, L(bexit01)
+
+#define BYTECMP89(OFFSET) \
 	lbu	t9, OFFSET(a1); \
+	DELAY_NOP; \
 	beq	t8, zero, L(bexit89); \
-	nop; \
+	lbu	a2, OFFSET+1(a0); \
 	bne	t8, t9, L(bexit89)
 
+	.align	2
 L(byteloop):
+	lbu	a2, 0(a0)
 	BYTECMP01(0)
 	BYTECMP89(1)
 	BYTECMP01(2)
@@ -226,20 +270,22 @@ L(byteloop):
 	BYTECMP01(4)
 	BYTECMP89(5)
 	BYTECMP01(6)
-	BYTECMP89(7)
+	lbu	t9, 7(a1)
+
 	PTR_ADDIU a0, a0, 8
-	b	L(byteloop)
+	beq	t8, zero, L(bexit89)
 	PTR_ADDIU a1, a1, 8
+	beq	t8, t9, L(byteloop)
 
-L(bexit01):
-	j	ra
-	subu	v0, v0, v1
 L(bexit89):
-	j	ra
 	subu	v0, t8, t9
+	jr	ra
+
+L(bexit01):
+	subu	v0, a2, a3
+	jr	ra
 
 	.set	at
-	.set	reorder
 
 END(STRCMP_NAME)
 
 #ifndef ANDROID_CHANGES

From patchwork Thu Jan 23 13:43:02 2025
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105299
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com,
    cfu@mips.com, Faraz Shahbazker
Subject: [PATCH 06/11] Fix prefetching beyond copied memory
Date: Thu, 23 Jan 2025 14:43:02 +0100
Message-Id: <20250123134308.1785777-8-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>

GTM18-287/PP118771: memcpy prefetches beyond copied memory.

Fix prefetching in the core loop so that it never reaches beyond the
memory region being operated upon.  Revert an accidentally changed
prefetch hint back to streaming mode.  Refactor various bits and add
preprocessor checks so that the tuning parameters can be overridden
from the compiler command line.
Cherry-picked 132e0bbbbed01f95ec88b68b5f7f2056f6125531 from https://github.com/MIPS/glibc Signed-off-by: Faraz Shahbazker Signed-off-by: Aleksandar Rakic --- sysdeps/mips/memcpy.c | 188 +++++++++++++++++++++++++----------------- 1 file changed, 111 insertions(+), 77 deletions(-) diff --git a/sysdeps/mips/memcpy.c b/sysdeps/mips/memcpy.c index 8c3aec7b36..798e991f6d 100644 --- a/sysdeps/mips/memcpy.c +++ b/sysdeps/mips/memcpy.c @@ -1,37 +1,29 @@ -/* - * Copyright (C) 2024 MIPS Tech, LLC - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright notice, - * this list of conditions and the following disclaimer. - * 2. Redistributions in binary form must reproduce the above copyright notice, - * this list of conditions and the following disclaimer in the documentation - * and/or other materials provided with the distribution. - * 3. Neither the name of the copyright holder nor the names of its - * contributors may be used to endorse or promote products derived from this - * software without specific prior written permission. - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" - * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. -*/ +/* Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + Contributed by Wave Computing + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ #ifdef __GNUC__ #undef memcpy /* Typical observed latency in cycles in fetching from DRAM. 
*/ -#define LATENCY_CYCLES 63 +#ifndef LATENCY_CYCLES + #define LATENCY_CYCLES 63 +#endif /* Pre-fetch performance is subject to accurate prefetch ahead, which in turn depends on both the cache-line size and the amount @@ -48,30 +40,42 @@ #define LATENCY_CYCLES 150 #elif defined(_MIPS_TUNE_I6400) || defined(_MIPS_TUNE_I6500) #define CACHE_LINE 64 - #define BLOCK_CYCLES 16 + #define BLOCK_CYCLES 15 #elif defined(_MIPS_TUNE_P6600) #define CACHE_LINE 32 - #define BLOCK_CYCLES 12 + #define BLOCK_CYCLES 15 #elif defined(_MIPS_TUNE_INTERAPTIV) || defined(_MIPS_TUNE_INTERAPTIV_MR2) #define CACHE_LINE 32 #define BLOCK_CYCLES 30 #else - #define CACHE_LINE 32 - #define BLOCK_CYCLES 11 + #ifndef CACHE_LINE + #define CACHE_LINE 32 + #endif + #ifndef BLOCK_CYCLES + #ifdef __nanomips__ + #define BLOCK_CYCLES 20 + #else + #define BLOCK_CYCLES 11 + #endif + #endif #endif /* Pre-fetch look ahead = ceil (latency / block-cycles) */ #define PREF_AHEAD (LATENCY_CYCLES / BLOCK_CYCLES \ + ((LATENCY_CYCLES % BLOCK_CYCLES) == 0 ? 0 : 1)) -/* Unroll-factor, controls how many words at a time in the core loop. */ -#define BLOCK (CACHE_LINE == 128 ? 16 : 8) +/* The unroll-factor controls how many words at a time in the core loop. */ +#ifndef BLOCK_SIZE + #define BLOCK_SIZE (CACHE_LINE == 128 ? 16 : 8) +#elif BLOCK_SIZE != 8 && BLOCK_SIZE != 16 + #error "BLOCK_SIZE must be 8 or 16" +#endif #define __overloadable #if !defined(UNALIGNED_INSTR_SUPPORT) /* does target have unaligned lw/ld/ualw/uald instructions? */ #define UNALIGNED_INSTR_SUPPORT 0 -#if (__mips_isa_rev < 6 && !defined(__mips1)) +#if (__mips_isa_rev < 6 && !defined(__mips1)) || defined(__nanomips__) #undef UNALIGNED_INSTR_SUPPORT #define UNALIGNED_INSTR_SUPPORT 1 #endif @@ -79,17 +83,35 @@ #if !defined(HW_UNALIGNED_SUPPORT) /* Does target have hardware support for unaligned accesses? 
*/ #define HW_UNALIGNED_SUPPORT 0 - #if __mips_isa_rev >= 6 + #if __mips_isa_rev >= 6 && !defined(__nanomips__) #undef HW_UNALIGNED_SUPPORT #define HW_UNALIGNED_SUPPORT 1 #endif #endif -#define ENABLE_PREFETCH 1 + +#ifndef ENABLE_PREFETCH + #define ENABLE_PREFETCH 1 +#endif + +#ifndef ENABLE_PREFETCH_CHECK + #define ENABLE_PREFETCH_CHECK 0 +#endif + #if ENABLE_PREFETCH - #define PREFETCH(addr) __builtin_prefetch (addr, 0, 0) -#else + #if ENABLE_PREFETCH_CHECK +#include +static char *limit; +#define PREFETCH(addr) \ + do { \ + assert ((char *)(addr) < limit); \ + __builtin_prefetch ((addr), 0, 1); \ + } while (0) +#else /* ENABLE_PREFETCH_CHECK */ + #define PREFETCH(addr) __builtin_prefetch (addr, 0, 1) + #endif /* ENABLE_PREFETCH_CHECK */ +#else /* ENABLE_PREFETCH */ #define PREFETCH(addr) -#endif +#endif /* ENABLE_PREFETCH */ #include @@ -99,17 +121,18 @@ typedef struct { reg_t B0:8, B1:8, B2:8, B3:8, B4:8, B5:8, B6:8, B7:8; } bits_t; -#else +#else /* __mips64 */ typedef unsigned long reg_t; typedef struct { reg_t B0:8, B1:8, B2:8, B3:8; } bits_t; -#endif +#endif /* __mips64 */ -#define CACHE_LINES_PER_BLOCK ((BLOCK * sizeof (reg_t) > CACHE_LINE) ? \ - (BLOCK * sizeof (reg_t) / CACHE_LINE) \ - : 1) +#define CACHE_LINES_PER_BLOCK \ + ((BLOCK_SIZE * sizeof (reg_t) > CACHE_LINE) \ + ? 
(BLOCK_SIZE * sizeof (reg_t) / CACHE_LINE) \ + : 1) typedef union { @@ -120,7 +143,7 @@ typedef union #define DO_BYTE(a, i) \ a[i] = bw.b.B##i; \ len--; \ - if(!len) return ret; \ + if (!len) return ret; \ /* This code is called when aligning a pointer, there are remaining bytes after doing word compares, or architecture does not have some form @@ -148,7 +171,7 @@ do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret) { unsigned char *x = (unsigned char *) a; bitfields_t bw; - if(len > 0) + if (len > 0) { bw.v = *(reg_t *)b; DO_BYTE(x, 0); @@ -159,7 +182,7 @@ do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret) DO_BYTE(x, 4); DO_BYTE(x, 5); DO_BYTE(x, 6); -#endif +#endif /* __mips64 */ } return ret; } @@ -170,7 +193,7 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words, { /* Use a set-back so that load/stores have incremented addresses in order to promote bonding. */ - int off = (BLOCK - words); + int off = (BLOCK_SIZE - words); a -= off; b -= off; switch (off) @@ -182,7 +205,7 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words, case 5: a[5] = b[5]; // Fall through case 6: a[6] = b[6]; // Fall through case 7: a[7] = b[7]; // Fall through -#if BLOCK==16 +#if BLOCK_SIZE==16 case 8: a[8] = b[8]; // Fall through case 9: a[9] = b[9]; // Fall through case 10: a[10] = b[10]; // Fall through @@ -191,9 +214,9 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words, case 13: a[13] = b[13]; // Fall through case 14: a[14] = b[14]; // Fall through case 15: a[15] = b[15]; -#endif +#endif /* BLOCK_SIZE==16 */ } - return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret); + return do_bytes_remaining (a + BLOCK_SIZE, b + BLOCK_SIZE, bytes, ret); } #if !HW_UNALIGNED_SUPPORT @@ -210,7 +233,7 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words, { /* Use a set-back so that load/stores have incremented addresses in order to promote bonding. 
*/ - int off = (BLOCK - words); + int off = (BLOCK_SIZE - words); a -= off; b -= off; switch (off) @@ -222,7 +245,7 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words, case 5: a[5].uli = b[5]; // Fall through case 6: a[6].uli = b[6]; // Fall through case 7: a[7].uli = b[7]; // Fall through -#if BLOCK==16 +#if BLOCK_SIZE==16 case 8: a[8].uli = b[8]; // Fall through case 9: a[9].uli = b[9]; // Fall through case 10: a[10].uli = b[10]; // Fall through @@ -231,9 +254,9 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words, case 13: a[13].uli = b[13]; // Fall through case 14: a[14].uli = b[14]; // Fall through case 15: a[15].uli = b[15]; -#endif +#endif /* BLOCK_SIZE==16 */ } - return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret); + return do_bytes_remaining (a + BLOCK_SIZE, b + BLOCK_SIZE, bytes, ret); } /* The first pointer is not aligned while second pointer is. */ @@ -242,13 +265,19 @@ unaligned_words (struct ulw *a, const reg_t * b, unsigned long words, unsigned long bytes, void *ret) { unsigned long i, words_by_block, words_by_1; - words_by_1 = words % BLOCK; - words_by_block = words / BLOCK; + words_by_1 = words % BLOCK_SIZE; + words_by_block = words / BLOCK_SIZE; + for (; words_by_block > 0; words_by_block--) { - if (words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK) + /* This condition is deliberately conservative. One could theoretically + pre-fetch another time around in some cases without crossing the page + boundary at the limit, but checking for the right conditions here is + too expensive to be worth it. 
*/ + if (words_by_block > PREF_AHEAD) for (i = 0; i < CACHE_LINES_PER_BLOCK; i++) - PREFETCH (b + (BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i)); + PREFETCH (b + ((BLOCK_SIZE / CACHE_LINES_PER_BLOCK) + * (PREF_AHEAD + i))); reg_t y0 = b[0], y1 = b[1], y2 = b[2], y3 = b[3]; reg_t y4 = b[4], y5 = b[5], y6 = b[6], y7 = b[7]; @@ -260,7 +289,7 @@ unaligned_words (struct ulw *a, const reg_t * b, a[5].uli = y5; a[6].uli = y6; a[7].uli = y7; -#if BLOCK==16 +#if BLOCK_SIZE==16 y0 = b[8], y1 = b[9], y2 = b[10], y3 = b[11]; y4 = b[12], y5 = b[13], y6 = b[14], y7 = b[15]; a[8].uli = y0; @@ -271,16 +300,16 @@ unaligned_words (struct ulw *a, const reg_t * b, a[13].uli = y5; a[14].uli = y6; a[15].uli = y7; -#endif - a += BLOCK; - b += BLOCK; +#endif /* BLOCK_SIZE==16 */ + a += BLOCK_SIZE; + b += BLOCK_SIZE; } /* Mop up any remaining bytes. */ return do_uwords_remaining (a, b, words_by_1, bytes, ret); } -#else +#else /* !UNALIGNED_INSTR_SUPPORT */ /* No HW support or unaligned lw/ld/ualw/uald instructions. 
*/ static void * @@ -320,13 +349,15 @@ aligned_words (reg_t * a, const reg_t * b, unsigned long words, unsigned long bytes, void *ret) { unsigned long i, words_by_block, words_by_1; - words_by_1 = words % BLOCK; - words_by_block = words / BLOCK; + words_by_1 = words % BLOCK_SIZE; + words_by_block = words / BLOCK_SIZE; + for (; words_by_block > 0; words_by_block--) { - if(words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK) + if (words_by_block > PREF_AHEAD) for (i = 0; i < CACHE_LINES_PER_BLOCK; i++) - PREFETCH (b + ((BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i))); + PREFETCH (b + ((BLOCK_SIZE / CACHE_LINES_PER_BLOCK) + * (PREF_AHEAD + i))); reg_t x0 = b[0], x1 = b[1], x2 = b[2], x3 = b[3]; reg_t x4 = b[4], x5 = b[5], x6 = b[6], x7 = b[7]; @@ -338,7 +369,7 @@ aligned_words (reg_t * a, const reg_t * b, a[5] = x5; a[6] = x6; a[7] = x7; -#if BLOCK==16 +#if BLOCK_SIZE==16 x0 = b[8], x1 = b[9], x2 = b[10], x3 = b[11]; x4 = b[12], x5 = b[13], x6 = b[14], x7 = b[15]; a[8] = x0; @@ -349,9 +380,9 @@ aligned_words (reg_t * a, const reg_t * b, a[13] = x5; a[14] = x6; a[15] = x7; -#endif - a += BLOCK; - b += BLOCK; +#endif /* BLOCK_SIZE==16 */ + a += BLOCK_SIZE; + b += BLOCK_SIZE; } /* mop up any remaining bytes. */ @@ -363,13 +394,16 @@ memcpy (void *a, const void *b, size_t len) __overloadable { unsigned long bytes, words, i; void *ret = a; +#if ENABLE_PREFETCH_CHECK + limit = (char *)b + len; +#endif /* ENABLE_PREFETCH_CHECK */ /* shouldn't hit that often. */ if (len <= 8) return do_bytes (a, b, len, a); /* Start pre-fetches ahead of time. */ - if (len > CACHE_LINE * (PREF_AHEAD - 1)) - for (i = 1; i < PREF_AHEAD - 1; i++) + if (len > CACHE_LINE * PREF_AHEAD) + for (i = 1; i < PREF_AHEAD; i++) PREFETCH ((char *)b + CACHE_LINE * i); else for (i = 1; i < len / CACHE_LINE; i++) @@ -400,12 +434,12 @@ memcpy (void *a, const void *b, size_t len) __overloadable #if HW_UNALIGNED_SUPPORT /* treat possible unaligned first pointer as aligned. 
*/
   return aligned_words (a, b, words, bytes, ret);
-#else
+#else /* !HW_UNALIGNED_SUPPORT */
   if (((unsigned long) a) % sizeof (reg_t) == 0)
     return aligned_words (a, b, words, bytes, ret);
   /* need to use unaligned instructions on first pointer.  */
   return unaligned_words (a, b, words, bytes, ret);
-#endif
+#endif /* HW_UNALIGNED_SUPPORT */
 }
 
 libc_hidden_builtin_def (memcpy)

From patchwork Thu Jan 23 13:43:03 2025
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105306
AOJu0Yy7IWYYLdI5wz/joudc4OjLCxBDJv6fPvF4y/VFPLRjkro/cyZi 72Vl6SIU1rsiUFuNKM8Ii7qkAlpsWxQgPfeopT4xAu5n/VOF0NoCBz72MA== X-Gm-Gg: ASbGnctes/zpKH46f7+2PDhgxFySM+l5W6o2R0sH8SPpRxRCfPQLLo5DIh4nwq4me5y 1DxV1n1tfrI0IbhkFvo/jNcVn2ujdqMmChVrw2wuXMoT9m3aQR5RSUYazKpX9CFd83ab8NYjRUk eHSpNnfkrilTWziZEed9+IMCKzswrElWe8E1OyhjwT6IbaHxGxC8PNBO9mdry7muz0plrwMe8/6 oi1S3HFuOwBjH1GlXzuyGltjH4oK15Z4U7rbEJyB3YSzqgWoT5Ua/9HovMPoszBvBHzXgHSa3zh jVgajjbvLVRML49hg61cofLMWVE5 X-Google-Smtp-Source: AGHT+IEuEIhQBPd9I6fNR/SewDH0/l+Pom16K6FHbtXP5FaZIGL+lUmif1IZLBAlDyPw8uPnx0BVYw== X-Received: by 2002:a05:600c:cc8:b0:42c:aeee:e603 with SMTP id 5b1f17b1804b1-4389db13a7fmr91611955e9.7.1737639816177; Thu, 23 Jan 2025 05:43:36 -0800 (PST) Received: from L-H2N0CV05D839062.. ([79.175.87.218]) by smtp.googlemail.com with ESMTPSA id 5b1f17b1804b1-438b318c1a2sm64597575e9.7.2025.01.23.05.43.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 23 Jan 2025 05:43:35 -0800 (PST) From: Aleksandar Rakic X-Google-Original-From: Aleksandar Rakic To: libc-alpha@sourceware.org Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com, cfu@mips.com, Faraz Shahbazker Subject: [PATCH 07/11] Fix strcmp bug for little endian target Date: Thu, 23 Jan 2025 14:43:03 +0100 Message-Id: <20250123134308.1785777-9-aleksandar.rakic@htecgroup.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com> References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com> MIME-Version: 1.0 X-Spam-Status: No, score=-8.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha 
strcmp gives an incorrect result on little-endian targets when all of the
following conditions hold:

1. The length of the first string is one less than a multiple of 4
   (i.e. len % 4 == 3).
2. The first string is a prefix of the second string.
3. The first differing character in the second string is extended ASCII
   (that is, > 127).

Cherry-picked 7c709e878f836069bbdbf42979937794623cfa68 from
https://github.com/MIPS/glibc

Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rakic
---
 sysdeps/mips/strcmp.S | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/sysdeps/mips/strcmp.S b/sysdeps/mips/strcmp.S
index 4878cd3aac..8d1bab12ec 100644
--- a/sysdeps/mips/strcmp.S
+++ b/sysdeps/mips/strcmp.S
@@ -225,10 +225,13 @@ L(worddiff):
 	beq	a0, zero, L(wexit01)
 	bne	a0, a1, L(wexit01)
 
-	/* The other bytes are identical, so just subract the 2 words
-	   and return the difference.  */
+# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+	srl	a0, a2, 24
+	srl	a1, a3, 24
+# else /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
 	move	a0, a2
 	move	a1, a3
+# endif /* __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ */
 
 L(wexit01):
 	subu	v0, a0, a1
From patchwork Thu Jan 23 13:43:04 2025
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105302
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com, cfu@mips.com, Faraz Shahbazker
Subject: [PATCH 08/11] Add script to run tests through a qemu wrapper
Date: Thu, 23 Jan 2025 14:43:04 +0100
Message-Id: <20250123134308.1785777-10-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
GTM19-545: Add script to run tests through a qemu wrapper

Cherry-picked 9f9923a4f14406026426d857acf9c2babe2908bf from
https://github.com/MIPS/glibc

Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rakic
---
 scripts/cross-test-qemu.sh | 152 +++++++++++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)
 create mode 100755 scripts/cross-test-qemu.sh

diff --git a/scripts/cross-test-qemu.sh b/scripts/cross-test-qemu.sh
new file mode 100755
index 0000000000..7636414141
--- /dev/null
+++ b/scripts/cross-test-qemu.sh
@@ -0,0 +1,152 @@
+#!/bin/bash
+# Run a testcase on a remote system, via qemu.
+# Copyright (C) 2024 Free Software Foundation, Inc.
+# This file is part of the GNU C Library.
+
+# The GNU C Library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+
+# The GNU C Library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+
+# You should have received a copy of the GNU Lesser General Public
+# License along with the GNU C Library; if not, see
+# .
+
+# usage: cross-test-qemu.sh HOST COMMAND ...
+# Run with --help flag to get more detailed help.
+
+progname="$(basename $0)"
+
+usage="usage: ${progname} [--ssh SSH] HOST COMMAND ..."
+timeoutfactor=$TIMEOUTFACTOR
+addon_libpath=""
+while [ $# -gt 0 ]; do
+  case "$1" in
+
+    "--timeoutfactor")
+      shift
+      if [ $# -lt 1 ]; then
+        break
+      fi
+      timeoutfactor="$1"
+      ;;
+
+    "--addon-libpath")
+      shift
+      if [ $# -lt 1 ]; then
+        break
+      fi
+      addon_libpath="$1"
+      ;;
+
+    "--help")
+      echo "$usage"
+      echo "$help"
+      exit 0
+      ;;
+
+    *)
+      break
+      ;;
+  esac
+  shift
+done
+
+if [ $# -lt 1 ]; then
+  echo "$usage" >&2
+  echo "Type '${progname} --help' for more detailed help." >&2
+  exit 1
+fi
+
+emulator="$1"; shift
+envpat="[:alpha:]*=.*"
+ldpat=".*/.*ld.*\.so.*"
+lgccpat="libgcc_s.so.1"
+libpat="--library-path"
+ldpath=""
+lgccpath=""
+envlist=""
+liblist=""
+command=""
+toolchain=`dirname \`dirname $emulator\``
+target=`ls $toolchain | grep -e linux-gnu`
+# Print the sequence of arguments as strings properly quoted for the
+# Bourne shell, separated by spaces.
+bourne_quote ()
+{
+  local arg qarg libflag variant
+  libflag=0
+
+  for arg in $@; do
+    if [ "x$done" != "x" ]; then
+      command="$command $arg"
+    elif [[ $arg =~ $envpat ]]; then
+      if [ -z $envlist ]; then
+        envlist="$arg"
+      else
+        envlist="$arg,$envlist"
+      fi
+    elif [[ $arg =~ $ldpat ]]; then
+      ldfile=`basename $arg`
+      variant=`basename \`dirname \\\`dirname $arg\\\`\``
+      libdir=${variant##*_}
+      variant=${variant%_*}
+      variant=${variant#obj_}
+      ldpath=$toolchain/sysroot/$variant
+      if [ ! -f $ldpath/$libdir/$ldfile ]; then
+        ldpath=`dirname $arg`
+      fi
+      lgccpath=$toolchain/$target/lib/$variant/$libdir
+      liblist="$ldpath:$lgccpath:$liblist"
+    elif [[ $arg =~ $libpat ]]; then
+      libflag=1
+    elif [ $libflag -ne 0 ]; then
+      liblist="$arg:$liblist"
+      libflag=0
+    elif [ "x$arg" != "xenv" ]; then
+      if [[ $arg =~ "tst-" ]]; then
+        if [ -f $arg ]; then
+          done=1
+        fi
+      fi
+      command="$command $arg"
+    fi
+  done
+}
+
+# Transform the current argument list into a properly quoted Bourne shell
+# command string.
+bourne_quote "$@"
+
+liblist=$addon_libpath:$liblist
+liblist=`tr -s : <<< $liblist`
+liblist=${liblist#:*}
+liblist=${liblist%*:}
+
+if [ "x$liblist" != "x" ]; then
+  LIBPATH_OPT="-E LD_LIBRARY_PATH=$liblist"
+fi
+
+if [ "x$envlist" != "x" ]; then
+  ENV_OPT="-E $envlist"
+fi
+
+if [ "x$ldpath" != "x" ]; then
+  LDPATH_OPT="-L $ldpath"
+fi
+
+if [ "x$timeoutfactor" != "x" ]; then
+  $emulator $LDPATH_OPT $LIBPATH_OPT $ENV_OPT $command &
+  pid=$!
+  trap "kill -SIGINT $pid" SIGALRM
+  sleep $timeoutfactor && kill -SIGALRM $$
+  exit 1
+else
+  $emulator $LDPATH_OPT $LIBPATH_OPT $ENV_OPT $command
+fi

From patchwork Thu Jan 23 13:43:05 2025
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105301
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com, cfu@mips.com
Subject: [PATCH 09/11] Avoid warning from -Wbuiltin-declaration-mismatch
Date: Thu, 23 Jan 2025 14:43:05 +0100
Message-Id: <20250123134308.1785777-11-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
Avoid the GCC 11 warning from -Wbuiltin-declaration-mismatch for modfl
and sincosl under the MIPS o32 ABI.

Cherry-picked 056065bbe644d396a6fadd7c759f91bba1855bd6 from
https://github.com/MIPS/glibc

Signed-off-by: Chao-ying Fu
Signed-off-by: Aleksandar Rakic
---
 sysdeps/ieee754/dbl-64/s_modf.c   | 4 ++++
 sysdeps/ieee754/dbl-64/s_sincos.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/sysdeps/ieee754/dbl-64/s_modf.c b/sysdeps/ieee754/dbl-64/s_modf.c
index 0de2084caf..eda2d65b51 100644
--- a/sysdeps/ieee754/dbl-64/s_modf.c
+++ b/sysdeps/ieee754/dbl-64/s_modf.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 static const double one = 1.0;
 
@@ -60,5 +61,8 @@ __modf(double x, double *iptr)
     }
 }
 #ifndef __modf
+DIAG_PUSH_NEEDS_COMMENT;
+DIAG_IGNORE_NEEDS_COMMENT (11, "-Wbuiltin-declaration-mismatch");
 libm_alias_double (__modf, modf)
+DIAG_POP_NEEDS_COMMENT;
 #endif
diff --git a/sysdeps/ieee754/dbl-64/s_sincos.c b/sysdeps/ieee754/dbl-64/s_sincos.c
index adbc57af28..531940d4c8 100644
--- a/sysdeps/ieee754/dbl-64/s_sincos.c
+++ b/sysdeps/ieee754/dbl-64/s_sincos.c
@@ -23,6 +23,7 @@
 #include
 #include
 #include
+#include
 
 #ifndef SECTION
 # define SECTION
@@ -106,5 +107,8 @@ __sincos (double x, double *sinx, double *cosx)
       *sinx = *cosx = x / x;
     }
 }
 #ifndef __sincos
+DIAG_PUSH_NEEDS_COMMENT;
+DIAG_IGNORE_NEEDS_COMMENT (11, "-Wbuiltin-declaration-mismatch");
 libm_alias_double (__sincos, sincos)
+DIAG_POP_NEEDS_COMMENT;
 #endif

From patchwork Thu Jan 23 13:43:06 2025
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105300
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com, cfu@mips.com
Subject: [PATCH 10/11] Avoid GCC 11 warning from -Wmaybe-uninitialized
Date: Thu, 23 Jan 2025 14:43:06 +0100
Message-Id: <20250123134308.1785777-12-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>

Cherry-picked 4dad697124b3bc82d9f4fbad62f30224216ab996 from
https://github.com/MIPS/glibc

Signed-off-by: Chao-ying Fu
Signed-off-by: Aleksandar Rakic
---
 sysdeps/ieee754/soft-fp/s_fdiv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sysdeps/ieee754/soft-fp/s_fdiv.c b/sysdeps/ieee754/soft-fp/s_fdiv.c
index 8c92aa6fb2..d02da4ca71 100644
--- a/sysdeps/ieee754/soft-fp/s_fdiv.c
+++ b/sysdeps/ieee754/soft-fp/s_fdiv.c
@@ -35,6 +35,7 @@
    may be where the macro is defined.  This happens only with -O1.  */
 DIAG_PUSH_NEEDS_COMMENT;
 DIAG_IGNORE_NEEDS_COMMENT (8, "-Wmaybe-uninitialized");
+DIAG_IGNORE_NEEDS_COMMENT (11, "-Wmaybe-uninitialized");
 #include
 #include
 #include

From patchwork Thu Jan 23 13:43:07 2025
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105304
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com, cfu@mips.com, Dragan Mladjenovic
Subject: [PATCH 11/11] Prevent turning memset into self-recursion
Date: Thu, 23 Jan 2025 14:43:07 +0100
Message-Id: <20250123134308.1785777-13-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
Prevent GCC 11 from turning memset into self-recursion.  GCC 11
transforms the byte-by-byte set loop in memset.c into a call to memset
itself, causing runtime failures.  Apply -fno-builtin to both memset.c
and memcpy.c to prevent similar bugs in the future.

Cherry-picked 31906b3556bc18cfdb7a3d84a669d95486450704 from
https://github.com/MIPS/glibc

Signed-off-by: Dragan Mladjenovic
Signed-off-by: Aleksandar Rakic
---
 sysdeps/mips/Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sysdeps/mips/Makefile b/sysdeps/mips/Makefile
index 17ddc2a97c..4464d73902 100644
--- a/sysdeps/mips/Makefile
+++ b/sysdeps/mips/Makefile
@@ -24,6 +24,9 @@ ASFLAGS-.o += $(pie-default)
 ASFLAGS-.op += $(pie-default)
 ASFLAGS += -O2
 
+CFLAGS-memset.c += -fno-builtin
+CFLAGS-memcpy.c += -fno-builtin
+
 ifeq ($(subdir),elf)
 # These tests fail on all mips configurations (BZ 29404)