From patchwork Mon Jul 10 09:41:51 2023
X-Patchwork-Submitter: caiyinyu
X-Patchwork-Id: 72411
From: caiyinyu
To: libc-alpha@sourceware.org
Cc: adhemerval.zanella@linaro.org, xry111@xry111.site, caiyinyu
Subject: [PATCH 2/2] LoongArch: Add vector implementation for _dl_runtime_resolve.
Date: Mon, 10 Jul 2023 17:41:51 +0800
Message-Id: <20230710094151.3002001-2-caiyinyu@loongson.cn>
In-Reply-To: <20230710094151.3002001-1-caiyinyu@loongson.cn>
References: <20230710094151.3002001-1-caiyinyu@loongson.cn>
X-Mailer: git-send-email 2.31.1

---
 sysdeps/loongarch/dl-machine.h                |  13 +-
 sysdeps/loongarch/dl-trampoline.S             |  84 +++--------
 sysdeps/loongarch/dl-trampoline.h             | 131 ++++++++++++++++++
 sysdeps/loongarch/ldsodefs.h                  |   1 +
 sysdeps/loongarch/sys/asm.h                   |   2 +
 sysdeps/loongarch/sys/regdef.h                |  18 +++
 .../unix/sysv/linux/loongarch/bits/hwcap.h    |  37 +++++
 .../unix/sysv/linux/loongarch/cpu-features.h  |  29 ++++
 8 files changed, 246 insertions(+), 69 deletions(-)
 create mode 100644 sysdeps/loongarch/dl-trampoline.h
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/bits/hwcap.h
 create mode 100644 sysdeps/unix/sysv/linux/loongarch/cpu-features.h

diff --git a/sysdeps/loongarch/dl-machine.h b/sysdeps/loongarch/dl-machine.h
index e217d37c4b..02ce17852c 100644
--- a/sysdeps/loongarch/dl-machine.h
+++ b/sysdeps/loongarch/dl-machine.h
@@ -270,6 +270,10 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
   /* If using PLTs, fill in the first two entries of .got.plt.  */
   if (l->l_info[DT_JMPREL])
     {
+#if HAVE_LOONGARCH_VEC_ASM
+      extern void _dl_runtime_resolve_lasx (void) attribute_hidden;
+      extern void _dl_runtime_resolve_lsx (void) attribute_hidden;
+#endif
       extern void _dl_runtime_resolve (void) attribute_hidden;
       extern void _dl_runtime_profile (void) attribute_hidden;
 
@@ -296,7 +300,14 @@ elf_machine_runtime_setup (struct link_map *l, struct r_scope_elem *scope[],
 	  /* This function will get called to fix up the GOT entry
 	     indicated by the offset on the stack, and then jump to
 	     the resolved address.  */
-	  gotplt[0] = (ElfW (Addr)) & _dl_runtime_resolve;
+#if HAVE_LOONGARCH_VEC_ASM
+	  if (SUPPORT_LASX)
+	    gotplt[0] = (ElfW(Addr)) &_dl_runtime_resolve_lasx;
+	  else if (SUPPORT_LSX)
+	    gotplt[0] = (ElfW(Addr)) &_dl_runtime_resolve_lsx;
+	  else
+#endif
+	    gotplt[0] = (ElfW(Addr)) &_dl_runtime_resolve;
 	}
       gotplt[1] = (ElfW (Addr)) l;
     }
diff --git a/sysdeps/loongarch/dl-trampoline.S b/sysdeps/loongarch/dl-trampoline.S
index ed9ec0901c..2a561b7136 100644
--- a/sysdeps/loongarch/dl-trampoline.S
+++ b/sysdeps/loongarch/dl-trampoline.S
@@ -19,77 +19,25 @@
 #include
 #include
 
-#include "dl-link.h"
-
-/* Assembler veneer called from the PLT header code for lazy loading.
-   The PLT header passes its own args in t0-t2.  */
-#ifdef __loongarch_soft_float
-#define FRAME_SIZE (-((-10 * SZREG) & ALMASK))
-#else
-#define FRAME_SIZE (-((-10 * SZREG - 8 * SZFREG) & ALMASK))
+#if HAVE_LOONGARCH_VEC_ASM
+#define USE_LASX
+#define _dl_runtime_resolve _dl_runtime_resolve_lasx
+#include "dl-trampoline.h"
+#undef FRAME_SIZE
+#undef USE_LASX
+#undef _dl_runtime_resolve
+
+#define USE_LSX
+#define _dl_runtime_resolve _dl_runtime_resolve_lsx
+#include "dl-trampoline.h"
+#undef FRAME_SIZE
+#undef USE_LSX
+#undef _dl_runtime_resolve
 #endif
 
-ENTRY (_dl_runtime_resolve)
-
-	/* Save arguments to stack.  */
-	ADDI	sp, sp, -FRAME_SIZE
-	REG_S	ra, sp, 9*SZREG
-	REG_S	a0, sp, 1*SZREG
-	REG_S	a1, sp, 2*SZREG
-	REG_S	a2, sp, 3*SZREG
-	REG_S	a3, sp, 4*SZREG
-	REG_S	a4, sp, 5*SZREG
-	REG_S	a5, sp, 6*SZREG
-	REG_S	a6, sp, 7*SZREG
-	REG_S	a7, sp, 8*SZREG
-
-#ifndef __loongarch_soft_float
-	FREG_S	fa0, sp, 10*SZREG + 0*SZFREG
-	FREG_S	fa1, sp, 10*SZREG + 1*SZFREG
-	FREG_S	fa2, sp, 10*SZREG + 2*SZFREG
-	FREG_S	fa3, sp, 10*SZREG + 3*SZFREG
-	FREG_S	fa4, sp, 10*SZREG + 4*SZFREG
-	FREG_S	fa5, sp, 10*SZREG + 5*SZFREG
-	FREG_S	fa6, sp, 10*SZREG + 6*SZFREG
-	FREG_S	fa7, sp, 10*SZREG + 7*SZFREG
-#endif
-
-	/* Update .got.plt and obtain runtime address of callee */
-	SLLI	a1, t1, 1
-	or	a0, t0, zero
-	ADD	a1, a1, t1
-	la	a2, _dl_fixup
-	jirl	ra, a2, 0
-	or	t1, v0, zero
-
-	/* Restore arguments from stack.  */
-	REG_L	ra, sp, 9*SZREG
-	REG_L	a0, sp, 1*SZREG
-	REG_L	a1, sp, 2*SZREG
-	REG_L	a2, sp, 3*SZREG
-	REG_L	a3, sp, 4*SZREG
-	REG_L	a4, sp, 5*SZREG
-	REG_L	a5, sp, 6*SZREG
-	REG_L	a6, sp, 7*SZREG
-	REG_L	a7, sp, 8*SZREG
-
-#ifndef __loongarch_soft_float
-	FREG_L	fa0, sp, 10*SZREG + 0*SZFREG
-	FREG_L	fa1, sp, 10*SZREG + 1*SZFREG
-	FREG_L	fa2, sp, 10*SZREG + 2*SZFREG
-	FREG_L	fa3, sp, 10*SZREG + 3*SZFREG
-	FREG_L	fa4, sp, 10*SZREG + 4*SZFREG
-	FREG_L	fa5, sp, 10*SZREG + 5*SZFREG
-	FREG_L	fa6, sp, 10*SZREG + 6*SZFREG
-	FREG_L	fa7, sp, 10*SZREG + 7*SZFREG
-#endif
-
-	ADDI	sp, sp, FRAME_SIZE
-
-	/* Invoke the callee.  */
-	jirl	zero, t1, 0
-END (_dl_runtime_resolve)
+#include "dl-trampoline.h"
+#include "dl-link.h"
 
 ENTRY (_dl_runtime_profile)
 	/* LoongArch we get called with:
diff --git a/sysdeps/loongarch/dl-trampoline.h b/sysdeps/loongarch/dl-trampoline.h
new file mode 100644
index 0000000000..d2833488df
--- /dev/null
+++ b/sysdeps/loongarch/dl-trampoline.h
@@ -0,0 +1,131 @@
+/* PLT trampolines.
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   .  */
+
+/* Assembler veneer called from the PLT header code for lazy loading.
+   The PLT header passes its own args in t0-t2.  */
+#ifndef __loongarch_soft_float
+# ifdef USE_LASX
+#  define FRAME_SIZE (-((-9 * SZREG - 8 * SZFREG - 8 * SZXREG) & ALMASK))
+# elif defined USE_LSX
+#  define FRAME_SIZE (-((-9 * SZREG - 8 * SZFREG - 8 * SZVREG) & ALMASK))
+# else
+#  define FRAME_SIZE (-((-9 * SZREG - 8 * SZFREG) & ALMASK))
+# endif
+#else
+# define FRAME_SIZE (-((-9 * SZREG) & ALMASK))
+#endif
+
+ENTRY (_dl_runtime_resolve)
+
+	/* Save arguments to stack.  */
+	ADDI	sp, sp, -FRAME_SIZE
+
+	REG_S	ra, sp, 0*SZREG
+	REG_S	a0, sp, 1*SZREG
+	REG_S	a1, sp, 2*SZREG
+	REG_S	a2, sp, 3*SZREG
+	REG_S	a3, sp, 4*SZREG
+	REG_S	a4, sp, 5*SZREG
+	REG_S	a5, sp, 6*SZREG
+	REG_S	a6, sp, 7*SZREG
+	REG_S	a7, sp, 8*SZREG
+
+#ifndef __loongarch_soft_float
+	FREG_S	fa0, sp, 9*SZREG + 0*SZFREG
+	FREG_S	fa1, sp, 9*SZREG + 1*SZFREG
+	FREG_S	fa2, sp, 9*SZREG + 2*SZFREG
+	FREG_S	fa3, sp, 9*SZREG + 3*SZFREG
+	FREG_S	fa4, sp, 9*SZREG + 4*SZFREG
+	FREG_S	fa5, sp, 9*SZREG + 5*SZFREG
+	FREG_S	fa6, sp, 9*SZREG + 6*SZFREG
+	FREG_S	fa7, sp, 9*SZREG + 7*SZFREG
+#ifdef USE_LASX
+	xvst	xr0, sp, 9*SZREG + 8*SZFREG + 0*SZXREG
+	xvst	xr1, sp, 9*SZREG + 8*SZFREG + 1*SZXREG
+	xvst	xr2, sp, 9*SZREG + 8*SZFREG + 2*SZXREG
+	xvst	xr3, sp, 9*SZREG + 8*SZFREG + 3*SZXREG
+	xvst	xr4, sp, 9*SZREG + 8*SZFREG + 4*SZXREG
+	xvst	xr5, sp, 9*SZREG + 8*SZFREG + 5*SZXREG
+	xvst	xr6, sp, 9*SZREG + 8*SZFREG + 6*SZXREG
+	xvst	xr7, sp, 9*SZREG + 8*SZFREG + 7*SZXREG
+#elif defined USE_LSX
+	vst	vr0, sp, 9*SZREG + 8*SZFREG + 0*SZVREG
+	vst	vr1, sp, 9*SZREG + 8*SZFREG + 1*SZVREG
+	vst	vr2, sp, 9*SZREG + 8*SZFREG + 2*SZVREG
+	vst	vr3, sp, 9*SZREG + 8*SZFREG + 3*SZVREG
+	vst	vr4, sp, 9*SZREG + 8*SZFREG + 4*SZVREG
+	vst	vr5, sp, 9*SZREG + 8*SZFREG + 5*SZVREG
+	vst	vr6, sp, 9*SZREG + 8*SZFREG + 6*SZVREG
+	vst	vr7, sp, 9*SZREG + 8*SZFREG + 7*SZVREG
+#endif
+#endif
+
+	/* Update .got.plt and obtain runtime address of callee */
+	SLLI	a1, t1, 1
+	or	a0, t0, zero
+	ADD	a1, a1, t1
+	la	a2, _dl_fixup
+	jirl	ra, a2, 0
+	or	t1, v0, zero
+
+	/* Restore arguments from stack.  */
+	REG_L	ra, sp, 0*SZREG
+	REG_L	a0, sp, 1*SZREG
+	REG_L	a1, sp, 2*SZREG
+	REG_L	a2, sp, 3*SZREG
+	REG_L	a3, sp, 4*SZREG
+	REG_L	a4, sp, 5*SZREG
+	REG_L	a5, sp, 6*SZREG
+	REG_L	a6, sp, 7*SZREG
+	REG_L	a7, sp, 8*SZREG
+
+#ifndef __loongarch_soft_float
+	FREG_L	fa0, sp, 9*SZREG + 0*SZFREG
+	FREG_L	fa1, sp, 9*SZREG + 1*SZFREG
+	FREG_L	fa2, sp, 9*SZREG + 2*SZFREG
+	FREG_L	fa3, sp, 9*SZREG + 3*SZFREG
+	FREG_L	fa4, sp, 9*SZREG + 4*SZFREG
+	FREG_L	fa5, sp, 9*SZREG + 5*SZFREG
+	FREG_L	fa6, sp, 9*SZREG + 6*SZFREG
+	FREG_L	fa7, sp, 9*SZREG + 7*SZFREG
+#ifdef USE_LASX
+	xvld	xr0, sp, 9*SZREG + 8*SZFREG + 0*SZXREG
+	xvld	xr1, sp, 9*SZREG + 8*SZFREG + 1*SZXREG
+	xvld	xr2, sp, 9*SZREG + 8*SZFREG + 2*SZXREG
+	xvld	xr3, sp, 9*SZREG + 8*SZFREG + 3*SZXREG
+	xvld	xr4, sp, 9*SZREG + 8*SZFREG + 4*SZXREG
+	xvld	xr5, sp, 9*SZREG + 8*SZFREG + 5*SZXREG
+	xvld	xr6, sp, 9*SZREG + 8*SZFREG + 6*SZXREG
+	xvld	xr7, sp, 9*SZREG + 8*SZFREG + 7*SZXREG
+#elif defined USE_LSX
+	vld	vr0, sp, 9*SZREG + 8*SZFREG + 0*SZVREG
+	vld	vr1, sp, 9*SZREG + 8*SZFREG + 1*SZVREG
+	vld	vr2, sp, 9*SZREG + 8*SZFREG + 2*SZVREG
+	vld	vr3, sp, 9*SZREG + 8*SZFREG + 3*SZVREG
+	vld	vr4, sp, 9*SZREG + 8*SZFREG + 4*SZVREG
+	vld	vr5, sp, 9*SZREG + 8*SZFREG + 5*SZVREG
+	vld	vr6, sp, 9*SZREG + 8*SZFREG + 6*SZVREG
+	vld	vr7, sp, 9*SZREG + 8*SZFREG + 7*SZVREG
+#endif
+#endif
+
+	ADDI	sp, sp, FRAME_SIZE
+
+	/* Invoke the callee.  */
+	jirl	zero, t1, 0
+END (_dl_runtime_resolve)
diff --git a/sysdeps/loongarch/ldsodefs.h b/sysdeps/loongarch/ldsodefs.h
index a8ef803aec..3b7c4ab83d 100644
--- a/sysdeps/loongarch/ldsodefs.h
+++ b/sysdeps/loongarch/ldsodefs.h
@@ -20,6 +20,7 @@
 #define _LOONGARCH_LDSODEFS_H 1
 
 #include
+#include
 
 struct La_loongarch_regs;
 struct La_loongarch_retval;
diff --git a/sysdeps/loongarch/sys/asm.h b/sysdeps/loongarch/sys/asm.h
index 0bb430bb05..d1a279b8fb 100644
--- a/sysdeps/loongarch/sys/asm.h
+++ b/sysdeps/loongarch/sys/asm.h
@@ -25,6 +25,8 @@
 /* Macros to handle different pointer/register sizes for 32/64-bit code.  */
 #define SZREG 8
 #define SZFREG 8
+#define SZVREG 16
+#define SZXREG 32
 #define REG_L ld.d
 #define REG_S st.d
 #define SRLI srli.d
diff --git a/sysdeps/loongarch/sys/regdef.h b/sysdeps/loongarch/sys/regdef.h
index 91810f5e8e..5100f36d24 100644
--- a/sysdeps/loongarch/sys/regdef.h
+++ b/sysdeps/loongarch/sys/regdef.h
@@ -90,4 +90,22 @@
 #define fs6 $f30
 #define fs7 $f31
 
+#define vr0 $vr0
+#define vr1 $vr1
+#define vr2 $vr2
+#define vr3 $vr3
+#define vr4 $vr4
+#define vr5 $vr5
+#define vr6 $vr6
+#define vr7 $vr7
+
+#define xr0 $xr0
+#define xr1 $xr1
+#define xr2 $xr2
+#define xr3 $xr3
+#define xr4 $xr4
+#define xr5 $xr5
+#define xr6 $xr6
+#define xr7 $xr7
+
 #endif /* _SYS_REGDEF_H */
diff --git a/sysdeps/unix/sysv/linux/loongarch/bits/hwcap.h b/sysdeps/unix/sysv/linux/loongarch/bits/hwcap.h
new file mode 100644
index 0000000000..5104b69cbc
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/bits/hwcap.h
@@ -0,0 +1,37 @@
+/* Defines for bits in AT_HWCAP.  LoongArch64 Linux version.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#if !defined (_SYS_AUXV_H)
+# error "Never include directly; use instead."
+#endif
+
+/* The following must match the kernel's .  */
+/* HWCAP flags */
+#define HWCAP_LOONGARCH_CPUCFG		(1 << 0)
+#define HWCAP_LOONGARCH_LAM		(1 << 1)
+#define HWCAP_LOONGARCH_UAL		(1 << 2)
+#define HWCAP_LOONGARCH_FPU		(1 << 3)
+#define HWCAP_LOONGARCH_LSX		(1 << 4)
+#define HWCAP_LOONGARCH_LASX		(1 << 5)
+#define HWCAP_LOONGARCH_CRC32		(1 << 6)
+#define HWCAP_LOONGARCH_COMPLEX		(1 << 7)
+#define HWCAP_LOONGARCH_CRYPTO		(1 << 8)
+#define HWCAP_LOONGARCH_LVZ		(1 << 9)
+#define HWCAP_LOONGARCH_LBT_X86		(1 << 10)
+#define HWCAP_LOONGARCH_LBT_ARM		(1 << 11)
+#define HWCAP_LOONGARCH_LBT_MIPS	(1 << 12)
diff --git a/sysdeps/unix/sysv/linux/loongarch/cpu-features.h b/sysdeps/unix/sysv/linux/loongarch/cpu-features.h
new file mode 100644
index 0000000000..e371e13b15
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/loongarch/cpu-features.h
@@ -0,0 +1,29 @@
+/* Initialize CPU feature data.  LoongArch64 version.
+   This file is part of the GNU C Library.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#ifndef _CPU_FEATURES_LOONGARCH64_H
+#define _CPU_FEATURES_LOONGARCH64_H
+
+#include
+
+#define SUPPORT_UAL (GLRO (dl_hwcap) & HWCAP_LOONGARCH_UAL)
+#define SUPPORT_LSX (GLRO (dl_hwcap) & HWCAP_LOONGARCH_LSX)
+#define SUPPORT_LASX (GLRO (dl_hwcap) & HWCAP_LOONGARCH_LASX)
+
+#endif /* _CPU_FEATURES_LOONGARCH64_H */
+