From patchwork Wed Aug 18 14:19:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella X-Patchwork-Id: 44693 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1838939874CD for ; Wed, 18 Aug 2021 14:22:48 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1838939874CD DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1629296568; bh=AeC/pPeKInw5MZwcOfwSSoVZy5OtDK/t339aZyoB9w0=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=jJG+hQ0XE2/oL4P6KnEgHvjKNAMByL9HEwLUTyDscrDvERA9lA4hQXDfQ6F9utOO2 U6bJEWw+6n23lMoPWhuaXpI6pgk2Qq0Oc0M0gEuywgdsBN9JqLpRFONOmTS+TlSx3p SWwNCoSIBW+8uoDQR5C2iCc6Euf6GNvc8mUscrak= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) by sourceware.org (Postfix) with ESMTPS id 5730F3986423 for ; Wed, 18 Aug 2021 14:20:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5730F3986423 Received: by mail-pf1-x433.google.com with SMTP id i133so2283006pfe.12 for ; Wed, 18 Aug 2021 07:20:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=AeC/pPeKInw5MZwcOfwSSoVZy5OtDK/t339aZyoB9w0=; b=F6lq7aVUktPj5XaV7/BUpxBTNvGnpnBMn3aR0wR92CC37Bu73EOLEDkcdAunHJYcvC JQD8RSuTzOxjh9bMU1jGbnQ3jf1dl5KgHxzFn9FYSLnBl/2cWTRKX9okurk8vtLl/sOh gvnIcvcahRBtSqEJJd2NVaxe61FEtu5LLXFqiKeBbL/0nCbfEFSICWk3cOMqUV3Nkbwt +WoKAYLUveCCiq892dsccSw+x2qEqH7bYwmfo52t83nNo0jFOQ5zWfrTUIL81z803sVA 
w03LZXL9pCHHxCgDW1b54s3tQTOHfYh1ugmPLGvcnLqom0aE4dmj/mCxpEMiR+omln3C vQWA== X-Gm-Message-State: AOAM530Yro+iU3ac/8RMtO1VzfrZmJgAqiBb//7ySHS+Mo6kWyUSMFQD P9YtvDFowRNQ//zexwxU9+w5u1+z381eqA== X-Google-Smtp-Source: ABdhPJyeAsjJzhyjzO4kzPEZPB8N/LSBeMKDaBbuJwjQvsCGhote5Tys+qTcFbs6YmGziURxvg/4NQ== X-Received: by 2002:a65:6717:: with SMTP id u23mr9287855pgf.28.1629296406247; Wed, 18 Aug 2021 07:20:06 -0700 (PDT) Received: from birita.. ([2804:431:c7ca:cd83:8c0a:d250:6dae:d807]) by smtp.gmail.com with ESMTPSA id c133sm6805015pfb.39.2021.08.18.07.20.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Aug 2021 07:20:05 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages Date: Wed, 18 Aug 2021 11:19:57 -0300 Message-Id: <20210818142000.128752-2-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818142000.128752-1-adhemerval.zanella@linaro.org> References: <20210818142000.128752-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Reply-To: Adhemerval Zanella Cc: Norbert Manthey , Guillaume Morin , Siddhesh Poyarekar Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" Linux Transparent Huge Pages (THP) current support three different states: 'never', 'madvise', and 'always'. 
The 'never' is self-explanatory and 'always' will enable THP for all anonymous memory. However, 'madvise' is still the default for some system and for such case THP will be only used if the memory range is explicity advertise by the program through a madvise(MADV_HUGEPAGE) call. To enable it a new tunable is provided, 'glibc.malloc.thp_madvise', where setting to a value diffent than 0 enables the madvise call. Linux current only support one page size for THP, even if the architecture supports multiple sizes. This patch issues the madvise(MADV_HUGEPAGE) call after a successful mmap() call at sysmalloc() with sizes larger than the default huge page size. The madvise() call is disable is system does not support THP or if it has the mode set to "never". Checked on x86_64-linux-gnu. --- NEWS | 5 +- elf/dl-tunables.list | 5 ++ elf/tst-rtld-list-tunables.exp | 1 + malloc/arena.c | 5 ++ malloc/malloc-internal.h | 1 + malloc/malloc.c | 48 ++++++++++++++ manual/tunables.texi | 9 +++ sysdeps/generic/Makefile | 8 +++ sysdeps/generic/malloc-hugepages.c | 31 +++++++++ sysdeps/generic/malloc-hugepages.h | 37 +++++++++++ sysdeps/unix/sysv/linux/malloc-hugepages.c | 76 ++++++++++++++++++++++ 11 files changed, 225 insertions(+), 1 deletion(-) create mode 100644 sysdeps/generic/malloc-hugepages.c create mode 100644 sysdeps/generic/malloc-hugepages.h create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c diff --git a/NEWS b/NEWS index 79c895e382..9b2345d08c 100644 --- a/NEWS +++ b/NEWS @@ -9,7 +9,10 @@ Version 2.35 Major new features: - [Add new features here] +* On Linux, a new tunable, glibc.malloc.thp_madvise, can be used to + make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls. + It might improve performance with Transparent Huge Pages madvise mode + depending of the workload. 
Deprecated and removed features, and other changes affecting compatibility: diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list index 8ddd4a2314..67df6dbc2c 100644 --- a/elf/dl-tunables.list +++ b/elf/dl-tunables.list @@ -92,6 +92,11 @@ glibc { minval: 0 security_level: SXID_IGNORE } + thp_madvise { + type: INT_32 + minval: 0 + maxval: 1 + } } cpu { hwcap_mask { diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp index 9f66c52885..d8109fa31c 100644 --- a/elf/tst-rtld-list-tunables.exp +++ b/elf/tst-rtld-list-tunables.exp @@ -8,6 +8,7 @@ glibc.malloc.perturb: 0 (min: 0, max: 255) glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0x[f]+) glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0x[f]+) glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0x[f]+) +glibc.malloc.thp_madvise: 0 (min: 0, max: 1) glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0x[f]+) glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0x[f]+) glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10) diff --git a/malloc/arena.c b/malloc/arena.c index 667484630e..81bff54303 100644 --- a/malloc/arena.c +++ b/malloc/arena.c @@ -231,6 +231,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t) TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t) #endif TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t) +TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t) #else /* Initialization routine. 
*/ #include @@ -331,6 +332,7 @@ ptmalloc_init (void) TUNABLE_CALLBACK (set_tcache_unsorted_limit)); # endif TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast)); + TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise)); #else if (__glibc_likely (_environ != NULL)) { @@ -509,6 +511,9 @@ new_heap (size_t size, size_t top_pad) __munmap (p2, HEAP_MAX_SIZE); return 0; } + + sysmadvise_thp (p2, size); + h = (heap_info *) p2; h->size = size; h->mprotect_size = size; diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h index 0c7b5a183c..7493e34d86 100644 --- a/malloc/malloc-internal.h +++ b/malloc/malloc-internal.h @@ -22,6 +22,7 @@ #include #include #include +#include /* Called in the parent process before a fork. */ void __malloc_fork_lock_parent (void) attribute_hidden; diff --git a/malloc/malloc.c b/malloc/malloc.c index e065785af7..ad3eec41ac 100644 --- a/malloc/malloc.c +++ b/malloc/malloc.c @@ -1881,6 +1881,11 @@ struct malloc_par INTERNAL_SIZE_T arena_test; INTERNAL_SIZE_T arena_max; +#if HAVE_TUNABLES + /* Transparent Large Page support. */ + INTERNAL_SIZE_T thp_pagesize; +#endif + /* Memory map support */ int n_mmaps; int n_mmaps_max; @@ -2009,6 +2014,20 @@ free_perturb (char *p, size_t n) #include +/* ----------- Routines dealing with transparent huge pages ----------- */ + +static inline void +sysmadvise_thp (void *p, INTERNAL_SIZE_T size) +{ +#if HAVE_TUNABLES && defined (MADV_HUGEPAGE) + /* Do not consider areas smaller than a huge page or if the tunable is + not active. */ + if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize) + return; + __madvise (p, size, MADV_HUGEPAGE); +#endif +} + /* ------------------- Support for multiple arenas -------------------- */ #include "arena.c" @@ -2446,6 +2465,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av) if (mm != MAP_FAILED) { + sysmadvise_thp (mm, size); + /* The offset to the start of the mmapped region is stored in the prev_size field of the chunk. 
This allows us to adjust @@ -2607,6 +2628,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av) if (size > 0) { brk = (char *) (MORECORE (size)); + if (brk != (char *) (MORECORE_FAILURE)) + sysmadvise_thp (brk, size); LIBC_PROBE (memory_sbrk_more, 2, brk, size); } @@ -2638,6 +2661,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av) if (mbrk != MAP_FAILED) { + sysmadvise_thp (mbrk, size); + /* We do not need, and cannot use, another sbrk call to find end */ brk = mbrk; snd_brk = brk + size; @@ -2749,6 +2774,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av) correction = 0; snd_brk = (char *) (MORECORE (0)); } + else + sysmadvise_thp (snd_brk, correction); } /* handle non-contiguous cases */ @@ -2989,6 +3016,8 @@ mremap_chunk (mchunkptr p, size_t new_size) if (cp == MAP_FAILED) return 0; + sysmadvise_thp (cp, new_size); + p = (mchunkptr) (cp + offset); assert (aligned_OK (chunk2mem (p))); @@ -5325,6 +5354,25 @@ do_set_mxfast (size_t value) return 0; } +#if HAVE_TUNABLES +static __always_inline int +do_set_thp_madvise (int32_t value) +{ + if (value > 0) + { + enum malloc_thp_mode_t thp_mode = __malloc_thp_mode (); + /* + Only enables THP usage is system does support it and has at least + always or madvise mode. Otherwise the madvise() call is wasteful. + */ + if (thp_mode != malloc_thp_mode_not_supported + && thp_mode != malloc_thp_mode_never) + mp_.thp_pagesize = __malloc_default_thp_pagesize (); + } + return 0; +} +#endif + int __libc_mallopt (int param_number, int value) { diff --git a/manual/tunables.texi b/manual/tunables.texi index 658547c613..93c46807f9 100644 --- a/manual/tunables.texi +++ b/manual/tunables.texi @@ -270,6 +270,15 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size passed to @code{malloc} for the largest bin size to enable. 
@end deftp +@deftp Tunable glibc.malloc.thp_madivse +This tunable enable the use of @code{madvise} with @code{MADV_HUGEPAGE} after +the system allocator allocated memory through @code{mmap} if the system supports +Transparent Huge Page (currently only Linux). + +The default value of this tunable is @code{0}, which disable its usage. +Setting to a positive value enable the @code{madvise} call. +@end deftp + @node Dynamic Linking Tunables @section Dynamic Linking Tunables @cindex dynamic linking tunables diff --git a/sysdeps/generic/Makefile b/sysdeps/generic/Makefile index a209e85cc4..8eef83c94d 100644 --- a/sysdeps/generic/Makefile +++ b/sysdeps/generic/Makefile @@ -27,3 +27,11 @@ sysdep_routines += framestate unwind-pe shared-only-routines += framestate unwind-pe endif endif + +ifeq ($(subdir),malloc) +sysdep_malloc_debug_routines += malloc-hugepages +endif + +ifeq ($(subdir),misc) +sysdep_routines += malloc-hugepages +endif diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c new file mode 100644 index 0000000000..262bcdbeb8 --- /dev/null +++ b/sysdeps/generic/malloc-hugepages.c @@ -0,0 +1,31 @@ +/* Huge Page support. Generic implementation. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; see the file COPYING.LIB. If + not, see . 
*/ + +#include + +size_t +__malloc_default_thp_pagesize (void) +{ + return 0; +} + +enum malloc_thp_mode_t +__malloc_thp_mode (void) +{ + return malloc_thp_mode_not_supported; +} diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h new file mode 100644 index 0000000000..664cda9b67 --- /dev/null +++ b/sysdeps/generic/malloc-hugepages.h @@ -0,0 +1,37 @@ +/* Malloc huge page support. Generic implementation. + Copyright (C) 2021 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; see the file COPYING.LIB. If + not, see . */ + +#ifndef _MALLOC_HUGEPAGES_H +#define _MALLOC_HUGEPAGES_H + +#include + +/* Return the default transparent huge page size. */ +size_t __malloc_default_thp_pagesize (void) attribute_hidden; + +enum malloc_thp_mode_t +{ + malloc_thp_mode_always, + malloc_thp_mode_madvise, + malloc_thp_mode_never, + malloc_thp_mode_not_supported +}; + +enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden; + +#endif /* _MALLOC_HUGEPAGES_H */ diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c new file mode 100644 index 0000000000..66589127cd --- /dev/null +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c @@ -0,0 +1,76 @@ +/* Huge Page support. Linux implementation. + Copyright (C) 2021 Free Software Foundation, Inc. 
+ This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; see the file COPYING.LIB. If + not, see . */ + +#include +#include +#include + +size_t +__malloc_default_thp_pagesize (void) +{ + int fd = __open64_nocancel ( + "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY); + if (fd == -1) + return 0; + + + char str[INT_BUFSIZE_BOUND (size_t)]; + ssize_t s = __read_nocancel (fd, str, sizeof (str)); + __close_nocancel (fd); + + if (s < 0) + return 0; + + int r = 0; + for (ssize_t i = 0; i < s; i++) + { + if (str[i] == '\n') + break; + r *= 10; + r += str[i] - '0'; + } + return r; +} + +enum malloc_thp_mode_t +__malloc_thp_mode (void) +{ + int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled", + O_RDONLY); + if (fd == -1) + return malloc_thp_mode_not_supported; + + static const char mode_always[] = "[always] madvise never\n"; + static const char mode_madvise[] = "always [madvise] never\n"; + static const char mode_never[] = "always madvise [never]\n"; + + char str[sizeof(mode_always)]; + ssize_t s = __read_nocancel (fd, str, sizeof (str)); + __close_nocancel (fd); + + if (s == sizeof (mode_always) - 1) + { + if (strcmp (str, mode_always) == 0) + return malloc_thp_mode_always; + else if (strcmp (str, mode_madvise) == 0) + return malloc_thp_mode_madvise; + else if (strcmp (str, mode_never) == 0) + return malloc_thp_mode_never; + } + return 
malloc_thp_mode_not_supported; +} From patchwork Wed Aug 18 14:19:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella X-Patchwork-Id: 44691 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A36553890421 for ; Wed, 18 Aug 2021 14:21:22 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A36553890421 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1629296482; bh=jcM+C6JZQ1A3uo9l1MlqaP46WTLcRHVMqwYvbBGqhLU=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=xB1BvSkI+Tmj18Hs6kFhQoOC6An/47pxGIPEMmNR0tURGIraoYJoAa5feEBzs9QUQ PVWsR/ABYps77IfKBUPFVnTh4iLjND6uO030zEXpv9eQiSdpjvFNyD1lRIJ8t15b0q pYsvhT4uqrAlkB6uELg8UFH1NAIRBOPSgTgUSuvU= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-pf1-x434.google.com (mail-pf1-x434.google.com [IPv6:2607:f8b0:4864:20::434]) by sourceware.org (Postfix) with ESMTPS id 3DE083986430 for ; Wed, 18 Aug 2021 14:20:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3DE083986430 Received: by mail-pf1-x434.google.com with SMTP id y11so2283514pfl.13 for ; Wed, 18 Aug 2021 07:20:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jcM+C6JZQ1A3uo9l1MlqaP46WTLcRHVMqwYvbBGqhLU=; b=IGIcNskkstBe6WfVfZZYFxauG3nnehxk5/n6crtu4kF3GDm7hTe69FJZyvRGqWLast sqeTCngxMcFmZAlJgKwI2Zx4RKdT2RsMQ2p9Upmijefnz7V6SwfSrHSyfr79FgSFqSWS a/VboHkXiWCZ5Z7xe7/pzMpFqtWiosSbZ6J5iezoYYaL1y0WOul7k3MhBf/9QTRb7tFZ 23pcDjnyvhub6oD+yTfHi6X6wniHIgSqYyskP/9BwkTUT6ZdMd1MgsCeslqmiSx4M9Y1 
BKNTzirWOUXBTI8Axu8QhsR9PZEee7YRVmaZ5IUJqkMGyryXkzepRT22c+WmLlbMjyv0 vXrQ== X-Gm-Message-State: AOAM533nP9jf7lr0hQC+wiYDEEYflsBFH0Z/7vAFwNc/O8H5wrqp8RwN plX1eIk4ifXcrD+awykwTZLH2wKLynt7AQ== X-Google-Smtp-Source: ABdhPJzitzoOgboIaNHJhlG3s7KdM4HTe1QQ8E6Xb6VFFV9u6+Z+kKUJDPJSwydFzWFtZXOorAR3Tg== X-Received: by 2002:aa7:9056:0:b0:3e3:332:c8b7 with SMTP id n22-20020aa79056000000b003e30332c8b7mr1593898pfo.69.1629296407955; Wed, 18 Aug 2021 07:20:07 -0700 (PDT) Received: from birita.. ([2804:431:c7ca:cd83:8c0a:d250:6dae:d807]) by smtp.gmail.com with ESMTPSA id c133sm6805015pfb.39.2021.08.18.07.20.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Aug 2021 07:20:07 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v2 2/4] malloc: Add THP/madvise support for sbrk Date: Wed, 18 Aug 2021 11:19:58 -0300 Message-Id: <20210818142000.128752-3-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818142000.128752-1-adhemerval.zanella@linaro.org> References: <20210818142000.128752-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=unavailable autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Reply-To: Adhemerval Zanella Cc: Norbert Manthey , Guillaume Morin , Siddhesh Poyarekar Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" For the main arena, the sbrk() might the preferable syscall instead of mmap(). 
And the granularity used when increasing the program segment is the default page size. To increase effectiveness with Transparent Huge Page with madvise, the large page size is use instead. This is enabled with the new tunable 'glibc.malloc.thp_pagesize'. Checked on x86_64-linux-gnu. --- include/libc-pointer-arith.h | 10 ++++++++++ malloc/malloc.c | 35 ++++++++++++++++++++++++++++++----- 2 files changed, 40 insertions(+), 5 deletions(-) diff --git a/include/libc-pointer-arith.h b/include/libc-pointer-arith.h index 04ba537617..f592cbafec 100644 --- a/include/libc-pointer-arith.h +++ b/include/libc-pointer-arith.h @@ -37,6 +37,16 @@ /* Cast an integer or a pointer VAL to integer with proper type. */ # define cast_to_integer(val) ((__integer_if_pointer_type (val)) (val)) +/* Check if SIZE is aligned on SIZE */ +#define IS_ALIGNED(base, size) \ + (((base) & (size - 1)) == 0) + +#define PTR_IS_ALIGNED(base, size) \ + ((((uintptr_t) (base)) & (size - 1)) == 0) + +#define PTR_DIFF(p1, p2) \ + ((ptrdiff_t)((uintptr_t)(p1) - (uintptr_t)(p2))) + /* Cast an integer VAL to void * pointer. */ # define cast_to_pointer(val) ((void *) (uintptr_t) (val)) diff --git a/malloc/malloc.c b/malloc/malloc.c index ad3eec41ac..1a2c798a35 100644 --- a/malloc/malloc.c +++ b/malloc/malloc.c @@ -2024,6 +2024,17 @@ sysmadvise_thp (void *p, INTERNAL_SIZE_T size) not active. */ if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize) return; + + /* madvise() requires at least the input to be aligned to system page and + MADV_HUGEPAGE should handle unaligned address. Also unaligned inputs + should happen only for the initial data segment. */ + if (__glibc_unlikely (!PTR_IS_ALIGNED (p, GLRO (dl_pagesize)))) + { + void *q = PTR_ALIGN_DOWN (p, GLRO (dl_pagesize)); + size += PTR_DIFF (p, q); + p = q; + } + __madvise (p, size, MADV_HUGEPAGE); #endif } @@ -2610,14 +2621,25 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av) size -= old_size; /* - Round to a multiple of page size. 
+ Round to a multiple of page size or huge page size. If MORECORE is not contiguous, this ensures that we only call it with whole-page arguments. And if MORECORE is contiguous and this is not first time through, this preserves page-alignment of previous calls. Otherwise, we correct to page-align below. */ - size = ALIGN_UP (size, pagesize); +#if HAVE_TUNABLES && defined (MADV_HUGEPAGE) + /* Defined in brk.c. */ + extern void *__curbrk; + if (mp_.thp_pagesize != 0) + { + uintptr_t top = ALIGN_UP ((uintptr_t) __curbrk + size, + mp_.thp_pagesize); + size = top - (uintptr_t) __curbrk; + } + else +#endif + size = ALIGN_UP (size, GLRO(dl_pagesize)); /* Don't try to call MORECORE if argument is so big as to appear @@ -2900,10 +2922,8 @@ systrim (size_t pad, mstate av) long released; /* Amount actually released */ char *current_brk; /* address returned by pre-check sbrk call */ char *new_brk; /* address returned by post-check sbrk call */ - size_t pagesize; long top_area; - pagesize = GLRO (dl_pagesize); top_size = chunksize (av->top); top_area = top_size - MINSIZE - 1; @@ -2911,7 +2931,12 @@ systrim (size_t pad, mstate av) return 0; /* Release in pagesize units and round down to the nearest page. 
*/ - extra = ALIGN_DOWN(top_area - pad, pagesize); +#if HAVE_TUNABLES && defined (MADV_HUGEPAGE) + if (mp_.thp_pagesize != 0) + extra = ALIGN_DOWN (top_area - pad, mp_.thp_pagesize); + else +#endif + extra = ALIGN_DOWN (top_area - pad, GLRO(dl_pagesize)); if (extra == 0) return 0; From patchwork Wed Aug 18 14:19:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella X-Patchwork-Id: 44692 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2A46E3986407 for ; Wed, 18 Aug 2021 14:22:05 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2A46E3986407 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1629296525; bh=kU8Jnx8CXfZO+RRmTfFFfrSJvcNr/nzYWm18XvJeKm0=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=joEMTWuIU/1oRigDIxCUufOSvmdX1ZvWf3aU6OAcOh3YgnHc4aKQEOQdijSQEM3Es KDkIEzfkX/7Qvb1tnedR5IU+bNORBUPDJXuqipJxGxRYW9q7kYJ8gYtZYzw2s16ObK sApvSqlDRvLUYA3s8UI9etLUCqr3AODFGJBaOpSU= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-pf1-x436.google.com (mail-pf1-x436.google.com [IPv6:2607:f8b0:4864:20::436]) by sourceware.org (Postfix) with ESMTPS id C9F633986434 for ; Wed, 18 Aug 2021 14:20:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org C9F633986434 Received: by mail-pf1-x436.google.com with SMTP id y190so2280488pfg.7 for ; Wed, 18 Aug 2021 07:20:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=kU8Jnx8CXfZO+RRmTfFFfrSJvcNr/nzYWm18XvJeKm0=; 
b=MoULgeLJ76ccMnMB5uwoUubQUjvzf7fhroj/lQjF3/0ViThNWffVh4cNKCJIr9A8rO FUy6Q+6WwnINlol5EHYScnZmE2ManQSIOPfmdogXhczbFOIW33bb/yTNejRDS9YkVX1r 4Ki0LlE88EklTWfOnDGW71wuk6nVMJ61VpLNwny78CxbGPN3/LUOoG9IKBl0Fb9XtQEp 5XPhWSCLVFfJXf7Nf04JSMerI991bifROYz6j3g3CXWGQ+rM/kKRifGoMi+Xo9RL3ZKv Jta/W04UWntC45Y2w0u2Y/skqc5cFaawVM5uilwuWWnL0kxFyc5RfjkY8iY2R0YSg7K6 l9Rg== X-Gm-Message-State: AOAM530eJILiEWq5L8jW7flllUnxALLGUPWR9asBdexqLZpFoYg2ydPT GbE+5MGZ1uI2bkm9VZhnYP1ThCysiUkCeA== X-Google-Smtp-Source: ABdhPJzeeYYVaX9kGSpgHMpDrsElq2fbD7gsXw0UoF9dDUAvRnrxDNJSQs/+btjqczzAScflntqpnw== X-Received: by 2002:a65:448a:: with SMTP id l10mr9091771pgq.313.1629296409692; Wed, 18 Aug 2021 07:20:09 -0700 (PDT) Received: from birita.. ([2804:431:c7ca:cd83:8c0a:d250:6dae:d807]) by smtp.gmail.com with ESMTPSA id c133sm6805015pfb.39.2021.08.18.07.20.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 Aug 2021 07:20:09 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v2 3/4] malloc: Move mmap logic to its own function Date: Wed, 18 Aug 2021 11:19:59 -0300 Message-Id: <20210818142000.128752-4-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210818142000.128752-1-adhemerval.zanella@linaro.org> References: <20210818142000.128752-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Reply-To: Adhemerval Zanella Cc: Norbert Manthey , Guillaume Morin , Siddhesh Poyarekar Errors-To: 
libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" So it can be used with different pagesize and flags. --- malloc/malloc.c | 155 +++++++++++++++++++++++++----------------------- 1 file changed, 82 insertions(+), 73 deletions(-) diff --git a/malloc/malloc.c b/malloc/malloc.c index 1a2c798a35..4bfcea286f 100644 --- a/malloc/malloc.c +++ b/malloc/malloc.c @@ -2414,6 +2414,85 @@ do_check_malloc_state (mstate av) be extended or replaced. */ +static void * +sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av) +{ + long int size; + + /* + Round up size to nearest page. For mmapped chunks, the overhead is one + SIZE_SZ unit larger than for normal chunks, because there is no + following chunk whose prev_size field could be used. + + See the front_misalign handling below, for glibc there is no need for + further alignments unless we have have high alignment. + */ + if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ) + size = ALIGN_UP (nb + SIZE_SZ, pagesize); + else + size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize); + + /* Don't try if size wraps around 0. */ + if ((unsigned long) (size) <= (unsigned long) (nb)) + return MAP_FAILED; + + char *mm = (char *) MMAP (0, size, + mtag_mmap_flags | PROT_READ | PROT_WRITE, + extra_flags); + if (mm == MAP_FAILED) + return mm; + + sysmadvise_thp (mm, size); + + /* + The offset to the start of the mmapped region is stored in the prev_size + field of the chunk. This allows us to adjust returned start address to + meet alignment requirements here and in memalign(), and still be able to + compute proper address argument for later munmap in free() and realloc(). + */ + + INTERNAL_SIZE_T front_misalign; /* unusable bytes at front of new space */ + + if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ) + { + /* For glibc, chunk2mem increases the address by CHUNK_HDR_SZ and + MALLOC_ALIGN_MASK is CHUNK_HDR_SZ-1. Each mmap'ed area is page + aligned and therefore definitely MALLOC_ALIGN_MASK-aligned. 
*/ + assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0); + front_misalign = 0; + } + else + front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK; + + mchunkptr p; /* the allocated/returned chunk */ + + if (front_misalign > 0) + { + ptrdiff_t correction = MALLOC_ALIGNMENT - front_misalign; + p = (mchunkptr) (mm + correction); + set_prev_size (p, correction); + set_head (p, (size - correction) | IS_MMAPPED); + } + else + { + p = (mchunkptr) mm; + set_prev_size (p, 0); + set_head (p, size | IS_MMAPPED); + } + + /* update statistics */ + int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1; + atomic_max (&mp_.max_n_mmaps, new); + + unsigned long sum; + sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size; + atomic_max (&mp_.max_mmapped_mem, sum); + + check_chunk (av, p); + + return chunk2mem (p); +} + static void * sysmalloc (INTERNAL_SIZE_T nb, mstate av) { @@ -2451,81 +2530,11 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av) || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold) && (mp_.n_mmaps < mp_.n_mmaps_max))) { - char *mm; /* return value from mmap call*/ - try_mmap: - /* - Round up size to nearest page. For mmapped chunks, the overhead - is one SIZE_SZ unit larger than for normal chunks, because there - is no following chunk whose prev_size field could be used. - - See the front_misalign handling below, for glibc there is no - need for further alignments unless we have have high alignment. 
- */ - if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ) - size = ALIGN_UP (nb + SIZE_SZ, pagesize); - else - size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize); + char *mm = sysmalloc_mmap (nb, pagesize, 0, av); + if (mm != MAP_FAILED) + return mm; tried_mmap = true; - - /* Don't try if size wraps around 0 */ - if ((unsigned long) (size) > (unsigned long) (nb)) - { - mm = (char *) (MMAP (0, size, - mtag_mmap_flags | PROT_READ | PROT_WRITE, 0)); - - if (mm != MAP_FAILED) - { - sysmadvise_thp (mm, size); - - /* - The offset to the start of the mmapped region is stored - in the prev_size field of the chunk. This allows us to adjust - returned start address to meet alignment requirements here - and in memalign(), and still be able to compute proper - address argument for later munmap in free() and realloc(). - */ - - if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ) - { - /* For glibc, chunk2mem increases the address by - CHUNK_HDR_SZ and MALLOC_ALIGN_MASK is - CHUNK_HDR_SZ-1. Each mmap'ed area is page - aligned and therefore definitely - MALLOC_ALIGN_MASK-aligned. */ - assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0); - front_misalign = 0; - } - else - front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK; - if (front_misalign > 0) - { - correction = MALLOC_ALIGNMENT - front_misalign; - p = (mchunkptr) (mm + correction); - set_prev_size (p, correction); - set_head (p, (size - correction) | IS_MMAPPED); - } - else - { - p = (mchunkptr) mm; - set_prev_size (p, 0); - set_head (p, size | IS_MMAPPED); - } - - /* update statistics */ - - int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1; - atomic_max (&mp_.max_n_mmaps, new); - - unsigned long sum; - sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size; - atomic_max (&mp_.max_mmapped_mem, sum); - - check_chunk (av, p); - - return chunk2mem (p); - } - } } /* There are no usable arenas and mmap also failed. 
*/ From patchwork Wed Aug 18 14:20:00 2021 X-Patchwork-Submitter: Adhemerval Zanella X-Patchwork-Id: 44694 To: libc-alpha@sourceware.org Subject: [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc Date: Wed, 18 Aug 2021 11:20:00 -0300 Message-Id: <20210818142000.128752-5-adhemerval.zanella@linaro.org> In-Reply-To: <20210818142000.128752-1-adhemerval.zanella@linaro.org> References: <20210818142000.128752-1-adhemerval.zanella@linaro.org> From: Adhemerval Zanella Cc: Norbert Manthey , Guillaume Morin , Siddhesh Poyarekar A new tunable, 'glibc.malloc.mmap_hugetlb', adds support for using Huge Pages directly with mmap() calls.
The required supported sizes and flags for mmap() are provided by an arch-specific internal hook, __malloc_hugepage_config(). Currently malloc first tries mmap() using the huge page size and falls back to the default page size and an sbrk() call if the kernel returns MAP_FAILED. The default __malloc_hugepage_config() implementation does not enable it even if the tunable is set. Checked on x86_64-linux-gnu. --- NEWS | 4 + elf/dl-tunables.list | 4 + elf/tst-rtld-list-tunables.exp | 1 + malloc/arena.c | 2 + malloc/malloc.c | 35 +++++- manual/tunables.texi | 14 +++ sysdeps/generic/malloc-hugepages.c | 6 + sysdeps/generic/malloc-hugepages.h | 12 ++ sysdeps/unix/sysv/linux/malloc-hugepages.c | 125 +++++++++++++++++++++ 9 files changed, 200 insertions(+), 3 deletions(-) diff --git a/NEWS b/NEWS index 9b2345d08c..412bf3e6f8 100644 --- a/NEWS +++ b/NEWS @@ -14,6 +14,10 @@ Major new features: It might improve performance with Transparent Huge Pages madvise mode depending on the workload. +* On Linux, a new tunable, glibc.malloc.mmap_hugetlb, can be used to + instruct malloc to try to use Huge Pages when allocating memory with mmap() + calls (through the use of MAP_HUGETLB).
+ Deprecated and removed features, and other changes affecting compatibility: [Add deprecations, removals and changes affecting compatibility here] diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list index 67df6dbc2c..209c2d8592 100644 --- a/elf/dl-tunables.list +++ b/elf/dl-tunables.list @@ -97,6 +97,10 @@ glibc { minval: 0 maxval: 1 } + mmap_hugetlb { + type: SIZE_T + minval: 0 + } } cpu { hwcap_mask { diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp index d8109fa31c..49f033ce91 100644 --- a/elf/tst-rtld-list-tunables.exp +++ b/elf/tst-rtld-list-tunables.exp @@ -1,6 +1,7 @@ glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0x[f]+) glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0x[f]+) glibc.malloc.check: 0 (min: 0, max: 3) +glibc.malloc.mmap_hugetlb: 0x0 (min: 0x0, max: 0x[f]+) glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647) glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0x[f]+) glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0x[f]+) diff --git a/malloc/arena.c b/malloc/arena.c index 81bff54303..4efb5581c1 100644 --- a/malloc/arena.c +++ b/malloc/arena.c @@ -232,6 +232,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t) #endif TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t) TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t) +TUNABLE_CALLBACK_FNDECL (set_mmap_hugetlb, size_t) #else /* Initialization routine. */ #include @@ -333,6 +334,7 @@ ptmalloc_init (void) # endif TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast)); TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise)); + TUNABLE_GET (mmap_hugetlb, size_t, TUNABLE_CALLBACK (set_mmap_hugetlb)); #else if (__glibc_likely (_environ != NULL)) { diff --git a/malloc/malloc.c b/malloc/malloc.c index 4bfcea286f..8cf2d6855e 100644 --- a/malloc/malloc.c +++ b/malloc/malloc.c @@ -1884,6 +1884,10 @@ struct malloc_par #if HAVE_TUNABLES /* Transparent Large Page support. 
*/ INTERNAL_SIZE_T thp_pagesize; + /* A value different from 0 means to align mmap allocations to hp_pagesize + and add hp_flags to the mmap flags. */ + INTERNAL_SIZE_T hp_pagesize; + int hp_flags; #endif /* Memory map support */ @@ -2415,7 +2419,8 @@ do_check_malloc_state (mstate av) */ static void * -sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av) +sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av, + bool set_thp) { long int size; @@ -2442,7 +2447,8 @@ sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av) if (mm == MAP_FAILED) return mm; - sysmadvise_thp (mm, size); + if (set_thp) + sysmadvise_thp (mm, size); /* The offset to the start of the mmapped region is stored in the prev_size @@ -2531,7 +2537,18 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av) && (mp_.n_mmaps < mp_.n_mmaps_max))) { try_mmap: - char *mm = sysmalloc_mmap (nb, pagesize, 0, av); + char *mm; +#if HAVE_TUNABLES + if (mp_.hp_pagesize > 0) + { + /* There is no need to issue the THP madvise call if Huge Pages are + used directly. */ + mm = sysmalloc_mmap (nb, mp_.hp_pagesize, mp_.hp_flags, av, false); + if (mm != MAP_FAILED) + return mm; + } +#endif + mm = sysmalloc_mmap (nb, pagesize, 0, av, true); if (mm != MAP_FAILED) return mm; tried_mmap = true; @@ -5405,6 +5422,18 @@ do_set_thp_madvise (int32_t value) } return 0; } + +static __always_inline int +do_set_mmap_hugetlb (size_t value) +{ + if (value > 0) + { + struct malloc_hugepage_config_t cfg = __malloc_hugepage_config (value); + mp_.hp_pagesize = cfg.pagesize; + mp_.hp_flags = cfg.flags; + } + return 0; +} #endif int diff --git a/manual/tunables.texi b/manual/tunables.texi index 93c46807f9..4da6a02778 100644 --- a/manual/tunables.texi +++ b/manual/tunables.texi @@ -279,6 +279,20 @@ The default value of this tunable is @code{0}, which disables its usage. Setting it to a positive value enables the @code{madvise} call.
@end deftp +@deftp Tunable glibc.malloc.mmap_hugetlb +This tunable enables the use of Huge Pages when the system supports them (currently +only Linux). It is done by aligning the allocation size and passing the required +flags (@code{MAP_HUGETLB} on Linux) when issuing the @code{mmap} call to allocate +memory from the system. + +The default value of this tunable is @code{0}, which disables its usage. +The special value @code{1} will try to obtain the system default huge page size, +while a value larger than @code{1} will try to match it against the supported system +huge page sizes. If either no default huge page size could be obtained or the +requested size does not match a supported one, huge page support will be +disabled. +@end deftp + @node Dynamic Linking Tunables @section Dynamic Linking Tunables @cindex dynamic linking tunables diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c index 262bcdbeb8..e5f5c1ec98 100644 --- a/sysdeps/generic/malloc-hugepages.c +++ b/sysdeps/generic/malloc-hugepages.c @@ -29,3 +29,9 @@ __malloc_thp_mode (void) { return malloc_thp_mode_not_supported; } + +/* Return the huge page configuration for REQUESTED; the generic + implementation always disables huge page usage. */ struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested) +{ + return (struct malloc_hugepage_config_t) { 0, 0 }; +} diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h index 664cda9b67..27f7adfea5 100644 --- a/sysdeps/generic/malloc-hugepages.h +++ b/sysdeps/generic/malloc-hugepages.h @@ -34,4 +34,16 @@ enum malloc_thp_mode_t enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden; +struct malloc_hugepage_config_t +{ + size_t pagesize; + int flags; +}; + +/* Return the supported huge page size matching the requested size, along + with the required extra mmap flags. Returning a 0 value for pagesize + disables its usage.
*/ +struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested) + attribute_hidden; + #endif /* _MALLOC_HUGEPAGES_H */ diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c index 66589127cd..0eb0c764ad 100644 --- a/sysdeps/unix/sysv/linux/malloc-hugepages.c +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c @@ -17,8 +17,10 @@ not, see . */ #include +#include #include #include +#include size_t __malloc_default_thp_pagesize (void) @@ -74,3 +76,126 @@ __malloc_thp_mode (void) } return malloc_thp_mode_not_supported; } + +static size_t +malloc_default_hugepage_size (void) +{ + int fd = __open64_nocancel ("/proc/meminfo", O_RDONLY); + if (fd == -1) + return 0; + + char buf[512]; + off64_t off = 0; + while (1) + { + ssize_t r = __pread64_nocancel (fd, buf, sizeof (buf) - 1, off); + if (r <= 0) + break; + buf[r] = '\0'; + + const char *s = strstr (buf, "Hugepagesize:"); + if (s == NULL) + { + char *nl = strrchr (buf, '\n'); + if (nl == NULL) + break; + off += (nl + 1) - buf; + continue; + } + + /* The default huge page size is in the form: + Hugepagesize: NUMBER kB */ + size_t hpsize = 0; + s += sizeof ("Hugepagesize: ") - 1; + for (int i = 0; (s[i] >= '0' && s[i] <= '9') || s[i] == ' '; i++) + { + if (s[i] == ' ') + continue; + hpsize *= 10; + hpsize += s[i] - '0'; + } + __close_nocancel (fd); + return hpsize * 1024; + } + + __close_nocancel (fd); + + return 0; +} + +static inline struct malloc_hugepage_config_t +make_malloc_hugepage_config (size_t pagesize) +{ + int flags = MAP_HUGETLB | (__builtin_ctzll (pagesize) << MAP_HUGE_SHIFT); + return (struct malloc_hugepage_config_t) { pagesize, flags }; +} + +struct malloc_hugepage_config_t +__malloc_hugepage_config (size_t requested) +{ + if (requested == 1) + { + size_t pagesize = malloc_default_hugepage_size (); + if (pagesize != 0) + return make_malloc_hugepage_config (pagesize); + } + + int dirfd = __open64_nocancel ("/sys/kernel/mm/hugepages", + O_RDONLY | O_DIRECTORY, 0); + if
(dirfd == -1) + return (struct malloc_hugepage_config_t) { 0, 0 }; + + bool found = false; + + char buffer[1024]; + while (true) + { +#if !IS_IN(libc) +# define __getdents64 getdents64 +#endif + ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer)); + if (ret == -1) + break; + else if (ret == 0) + break; + + char *begin = buffer, *end = buffer + ret; + while (begin != end) + { + unsigned short int d_reclen; + memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen), + sizeof (d_reclen)); + const char *dname = begin + offsetof (struct dirent64, d_name); + begin += d_reclen; + + if (dname[0] == '.' + || strncmp (dname, "hugepages-", sizeof ("hugepages-") - 1) != 0) + continue; + + /* Each entry represents a supported huge page in the form of: + hugepages-kB. */ + size_t hpsize = 0; + const char *sizestr = dname + sizeof ("hugepages-") - 1; + for (int i = 0; sizestr[i] >= '0' && sizestr[i] <= '9'; i++) + { + hpsize *= 10; + hpsize += sizestr[i] - '0'; + } + hpsize *= 1024; + + if (hpsize == requested) + { + found = true; + break; + } + } + if (found) + break; + } + + __close_nocancel (dirfd); + + if (found) + return make_malloc_hugepage_config (requested); + + return (struct malloc_hugepage_config_t) { 0, 0 }; +}
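For reference, the two mechanisms the Linux patch relies on can be sketched in isolation: make_malloc_hugepage_config encodes the base-2 log of the huge page size into the mmap flags starting at MAP_HUGE_SHIFT, and malloc_default_hugepage_size parses the "Hugepagesize: NUMBER kB" line from /proc/meminfo. The stand-alone sketch below is not glibc code; the helper names (hugepage_flags, parse_hugepagesize) are illustrative, and the fallback constant values mirror what Linux's <linux/mman.h> defines.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Fallbacks matching <linux/mman.h>, in case <sys/mman.h> does not
   expose them.  */
#ifndef MAP_HUGETLB
# define MAP_HUGETLB 0x40000
#endif
#ifndef MAP_HUGE_SHIFT
# define MAP_HUGE_SHIFT 26
#endif

/* Encode a huge page size into mmap flags: the kernel reads log2 of the
   page size from the bits starting at MAP_HUGE_SHIFT.  PAGESIZE must be
   a power of two, so __builtin_ctzll yields its base-2 log.  */
int
hugepage_flags (size_t pagesize)
{
  return MAP_HUGETLB | (__builtin_ctzll (pagesize) << MAP_HUGE_SHIFT);
}

/* Parse a /proc/meminfo-style "Hugepagesize:   NUMBER kB" line and
   return the size in bytes, or 0 if the line does not match.  */
size_t
parse_hugepagesize (const char *line)
{
  const char *s = strstr (line, "Hugepagesize:");
  if (s == NULL)
    return 0;
  s += sizeof ("Hugepagesize:") - 1;
  while (*s == ' ')
    s++;
  size_t kb = 0;
  for (; *s >= '0' && *s <= '9'; s++)
    kb = kb * 10 + (*s - '0');
  return kb * 1024;
}
```

With a 2 MiB default huge page, hugepage_flags (2UL << 20) sets MAP_HUGETLB and stores 21 (log2 of 2 MiB) in the MAP_HUGE_SHIFT bits, which is exactly the flag word the patch passes as extra_flags to sysmalloc_mmap.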