From patchwork Mon Aug 23 21:57:09 2021
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 44770
To: libc-alpha@sourceware.org
Subject: [PATCH v3 1/5] malloc: Add madvise support for Transparent Huge Pages
Date: Mon, 23 Aug 2021 18:57:09 -0300
Message-Id: <20210823215713.3304523-2-adhemerval.zanella@linaro.org>
In-Reply-To: <20210823215713.3304523-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella
Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar

Linux Transparent Huge Pages (THP) currently supports three different
states: 'never', 'madvise', and 'always'.  The 'never' state is
self-explanatory, and 'always' enables THP for all anonymous memory.
However, 'madvise' is still the default on some systems, and in that
case THP is only used if the memory range is explicitly advised by the
program through a madvise(MADV_HUGEPAGE) call.

To enable it a new tunable is provided, 'glibc.malloc.hugetlb', where
setting it to a value different from 0 enables the madvise call.

This patch issues the madvise(MADV_HUGEPAGE) call after a successful
mmap() call at sysmalloc() for sizes larger than the default huge page
size.  The madvise() call is disabled if the system does not support
THP or if it has the mode set to 'never'.  Note that Linux currently
supports only one page size for THP, even if the architecture supports
multiple sizes.

For testing, a new rule, tests-malloc-hugetlb1, is added; it runs the
selected tests with the required GLIBC_TUNABLES setting.

Checked on x86_64-linux-gnu.
---
 NEWS                                       |  5 +-
 Rules                                      | 19 ++++++
 elf/dl-tunables.list                       |  5 ++
 elf/tst-rtld-list-tunables.exp             |  1 +
 malloc/Makefile                            | 16 +++++
 malloc/arena.c                             |  5 ++
 malloc/malloc-internal.h                   |  1 +
 malloc/malloc.c                            | 47 +++++++++++++
 manual/tunables.texi                       |  9 +++
 sysdeps/generic/Makefile                   |  8 +++
 sysdeps/generic/malloc-hugepages.c         | 31 +++++++++
 sysdeps/generic/malloc-hugepages.h         | 37 +++++++++++
 sysdeps/unix/sysv/linux/malloc-hugepages.c | 76 ++++++++++++++++++++++
 13 files changed, 259 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/generic/malloc-hugepages.c
 create mode 100644 sysdeps/generic/malloc-hugepages.h
 create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c

diff --git a/NEWS b/NEWS
index 79c895e382..5c9486b468 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,10 @@ Version 2.35
 
 Major new features:
 
-  [Add new features here]
+* On Linux, a new tunable, glibc.malloc.hugetlb, can be used to
+  make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
+  It might improve performance with Transparent Huge Pages madvise mode
+  depending on the workload.
 Deprecated and removed features, and other changes affecting compatibility:

diff --git a/Rules b/Rules
index b1137afe71..471458ad4a 100644
--- a/Rules
+++ b/Rules
@@ -157,6 +157,7 @@ tests: $(tests:%=$(objpfx)%.out) $(tests-internal:%=$(objpfx)%.out) \
 	 $(tests-container:%=$(objpfx)%.out) \
 	 $(tests-mcheck:%=$(objpfx)%-mcheck.out) \
 	 $(tests-malloc-check:%=$(objpfx)%-malloc-check.out) \
+	 $(tests-malloc-hugetlb1:%=$(objpfx)%-malloc-hugetlb1.out) \
 	 $(tests-special) $(tests-printers-out)
 xtests: tests $(xtests:%=$(objpfx)%.out) $(xtests-special)
 endif
@@ -168,6 +169,7 @@ tests-expected =
 else
 tests-expected = $(tests) $(tests-internal) $(tests-printers) \
		 $(tests-container) $(tests-malloc-check:%=%-malloc-check) \
+		 $(tests-malloc-hugetlb1:%=%-malloc-hugetlb1) \
		 $(tests-mcheck:%=%-mcheck)
 endif
 tests:
@@ -196,6 +198,7 @@ binaries-pie-notests =
 endif
 binaries-mcheck-tests = $(tests-mcheck:%=%-mcheck)
 binaries-malloc-check-tests = $(tests-malloc-check:%=%-malloc-check)
+binaries-malloc-hugetlb1-tests = $(tests-malloc-hugetlb1:%=%-malloc-hugetlb1)
 else
 binaries-all-notests =
 binaries-all-tests = $(tests) $(tests-internal) $(xtests) $(test-srcs)
@@ -207,6 +210,7 @@ binaries-pie-tests =
 binaries-pie-notests =
 binaries-mcheck-tests =
 binaries-malloc-check-tests =
+binaries-malloc-hugetlb1-tests =
 endif
 
 binaries-pie = $(binaries-pie-tests) $(binaries-pie-notests)
@@ -247,6 +251,14 @@ $(addprefix $(objpfx),$(binaries-malloc-check-tests)): %-malloc-check: %.o \
	$(+link-tests)
 endif
 
+ifneq "$(strip $(binaries-malloc-hugetlb1-tests))" ""
+$(addprefix $(objpfx),$(binaries-malloc-hugetlb1-tests)): %-malloc-hugetlb1: %.o \
+  $(link-extra-libs-tests) \
+  $(sort $(filter $(common-objpfx)lib%,$(link-libc))) \
+  $(addprefix $(csu-objpfx),start.o) $(+preinit) $(+postinit)
+	$(+link-tests)
+endif
+
 ifneq "$(strip $(binaries-pie-tests))" ""
 $(addprefix $(objpfx),$(binaries-pie-tests)): %: %.o \
   $(link-extra-libs-tests) \
@@ -284,6 +296,13 @@ $(1)-malloc-check-ENV = MALLOC_CHECK_=3 \
 endef
 $(foreach t,$(tests-malloc-check),$(eval $(call malloc-check-ENVS,$(t))))
 
+# All malloc-hugetlb1 tests will be run with GLIBC_TUNABLES=glibc.malloc.hugetlb=1.
+define malloc-hugetlb1-ENVS
+$(1)-malloc-hugetlb1-ENV += GLIBC_TUNABLES=glibc.malloc.hugetlb=1
+endef
+$(foreach t,$(tests-malloc-hugetlb1),$(eval $(call malloc-hugetlb1-ENVS,$(t))))
+
+
 # mcheck tests need the debug DSO to support -lmcheck.
 define mcheck-ENVS
 $(1)-mcheck-ENV = LD_PRELOAD=$(common-objpfx)/malloc/libc_malloc_debug.so

diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 8ddd4a2314..1b347487f7 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -92,6 +92,11 @@ glibc {
       minval: 0
       security_level: SXID_IGNORE
     }
+    hugetlb {
+      type: INT_32
+      minval: 0
+      maxval: 1
+    }
   }
   cpu {
     hwcap_mask {
diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
index 9f66c52885..89aa5c0d40 100644
--- a/elf/tst-rtld-list-tunables.exp
+++ b/elf/tst-rtld-list-tunables.exp
@@ -1,6 +1,7 @@
 glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0x[f]+)
 glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0x[f]+)
 glibc.malloc.check: 0 (min: 0, max: 3)
+glibc.malloc.hugetlb: 0 (min: 0, max: 1)
 glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
 glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0x[f]+)
diff --git a/malloc/Makefile b/malloc/Makefile
index 63cd7c0734..e47fd660f6 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -78,6 +78,22 @@ tests-exclude-malloc-check = tst-malloc-check tst-malloc-usable \
 tests-malloc-check = $(filter-out $(tests-exclude-malloc-check) \
				   $(tests-static),$(tests))
 
+# Run all tests with GLIBC_TUNABLES=glibc.malloc.hugetlb=1 that check the
+# Transparent Huge Pages support.  We need to exclude some tests that define
+# the ENV vars.
+tests-exclude-hugetlb1 = \
+	tst-compathooks-off \
+	tst-compathooks-on \
+	tst-interpose-nothread \
+	tst-interpose-thread \
+	tst-interpose-static-nothread \
+	tst-interpose-static-thread \
+	tst-malloc-usable \
+	tst-malloc-usable-tunables \
+	tst-mallocstate
+tests-malloc-hugetlb1 = \
+	$(filter-out $(tests-exclude-hugetlb1), $(tests))
+
 # -lmcheck needs __malloc_initialize_hook, which was deprecated in 2.24.
 ifeq ($(have-GLIBC_2.23)$(build-shared),yesyes)
 # Tests that don't play well with mcheck.  They are either bugs in mcheck or
diff --git a/malloc/arena.c b/malloc/arena.c
index 667484630e..6decf97915 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -231,6 +231,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t)
 TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
 #endif
 TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
+TUNABLE_CALLBACK_FNDECL (set_hugetlb, int32_t)
 #else
 /* Initialization routine. */
 #include <string.h>
@@ -331,6 +332,7 @@ ptmalloc_init (void)
			  TUNABLE_CALLBACK (set_tcache_unsorted_limit));
 # endif
   TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
+  TUNABLE_GET (hugetlb, int32_t, TUNABLE_CALLBACK (set_hugetlb));
 #else
   if (__glibc_likely (_environ != NULL))
     {
@@ -509,6 +511,9 @@ new_heap (size_t size, size_t top_pad)
       __munmap (p2, HEAP_MAX_SIZE);
       return 0;
     }
+
+  madvise_thp (p2, size);
+
   h = (heap_info *) p2;
   h->size = size;
   h->mprotect_size = size;
diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
index 0c7b5a183c..7493e34d86 100644
--- a/malloc/malloc-internal.h
+++ b/malloc/malloc-internal.h
@@ -22,6 +22,7 @@
 #include <malloc-machine.h>
 #include <malloc-sysdep.h>
 #include <malloc-size.h>
+#include <malloc-hugepages.h>
 
 /* Called in the parent process before a fork. */
 void __malloc_fork_lock_parent (void) attribute_hidden;
diff --git a/malloc/malloc.c b/malloc/malloc.c
index e065785af7..81d3411560 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1881,6 +1881,11 @@ struct malloc_par
   INTERNAL_SIZE_T arena_test;
   INTERNAL_SIZE_T arena_max;
 
+#if HAVE_TUNABLES
+  /* Transparent Large Page support.  */
+  INTERNAL_SIZE_T thp_pagesize;
+#endif
+
   /* Memory map support */
   int n_mmaps;
   int n_mmaps_max;
@@ -2009,6 +2014,20 @@ free_perturb (char *p, size_t n)
 
 #include <stap-probe.h>
 
+/* ----------- Routines dealing with transparent huge pages ----------- */
+
+static inline void
+madvise_thp (void *p, INTERNAL_SIZE_T size)
+{
+#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
+  /* Do not consider areas smaller than a huge page or if the tunable is
+     not active.  */
+  if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
+    return;
+  __madvise (p, size, MADV_HUGEPAGE);
+#endif
+}
+
 /* ------------------- Support for multiple arenas -------------------- */
 
 #include "arena.c"
@@ -2446,6 +2465,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
       if (mm != MAP_FAILED)
         {
+          madvise_thp (mm, size);
+
           /*
              The offset to the start of the mmapped region is stored
              in the prev_size field of the chunk.  This allows us to adjust
@@ -2607,6 +2628,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       if (size > 0)
         {
           brk = (char *) (MORECORE (size));
+          if (brk != (char *) (MORECORE_FAILURE))
+            madvise_thp (brk, size);
           LIBC_PROBE (memory_sbrk_more, 2, brk, size);
         }
@@ -2638,6 +2661,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
           if (mbrk != MAP_FAILED)
             {
+              madvise_thp (mbrk, size);
+
               /* We do not need, and cannot use, another sbrk call to find end */
               brk = mbrk;
               snd_brk = brk + size;
@@ -2749,6 +2774,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
                   correction = 0;
                   snd_brk = (char *) (MORECORE (0));
                 }
+              else
+                madvise_thp (snd_brk, correction);
         }
 
       /* handle non-contiguous cases */
@@ -2989,6 +3016,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
   if (cp == MAP_FAILED)
     return 0;
 
+  madvise_thp (cp, new_size);
+
   p = (mchunkptr) (cp + offset);
 
   assert (aligned_OK (chunk2mem (p)));
@@ -5325,6 +5354,24 @@ do_set_mxfast (size_t value)
   return 0;
 }
 
+#if HAVE_TUNABLES
+static __always_inline int
+do_set_hugetlb (int32_t value)
+{
+  if (value == 1)
+    {
+      enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
+      /* Only enable THP usage if the system supports it and the mode is
+         set to 'madvise'; in 'always' mode the kernel already backs
+         anonymous mappings with huge pages, so the madvise () call would
+         be wasteful.  */
+      if (thp_mode == malloc_thp_mode_madvise)
+        mp_.thp_pagesize = __malloc_default_thp_pagesize ();
+    }
+  return 0;
+}
+#endif
+
 int
 __libc_mallopt (int param_number, int value)
 {
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 658547c613..799fa76258 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -270,6 +270,15 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
 passed to @code{malloc} for the largest bin size to enable.
 @end deftp
 
+@deftp Tunable glibc.malloc.hugetlb
+This tunable controls the use of Huge Pages in @code{malloc} calls.  The default
+value is @code{0}, which disables any additional support on @code{malloc}.
+
+Setting its value to @code{1} enables the use of @code{madvise} with
+@code{MADV_HUGEPAGE} after memory allocation with @code{mmap}.  It is enabled
+only if the system supports Transparent Huge Pages (currently only on Linux).
+@end deftp
+
 @node Dynamic Linking Tunables
 @section Dynamic Linking Tunables
 @cindex dynamic linking tunables
diff --git a/sysdeps/generic/Makefile b/sysdeps/generic/Makefile
index a209e85cc4..8eef83c94d 100644
--- a/sysdeps/generic/Makefile
+++ b/sysdeps/generic/Makefile
@@ -27,3 +27,11 @@ sysdep_routines += framestate unwind-pe
 shared-only-routines += framestate unwind-pe
 endif
 endif
+
+ifeq ($(subdir),malloc)
+sysdep_malloc_debug_routines += malloc-hugepages
+endif
+
+ifeq ($(subdir),misc)
+sysdep_routines += malloc-hugepages
+endif
diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
new file mode 100644
index 0000000000..262bcdbeb8
--- /dev/null
+++ b/sysdeps/generic/malloc-hugepages.c
@@ -0,0 +1,31 @@
+/* Huge Page support.  Generic implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#include <malloc-hugepages.h>
+
+size_t
+__malloc_default_thp_pagesize (void)
+{
+  return 0;
+}
+
+enum malloc_thp_mode_t
+__malloc_thp_mode (void)
+{
+  return malloc_thp_mode_not_supported;
+}
diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
new file mode 100644
index 0000000000..664cda9b67
--- /dev/null
+++ b/sysdeps/generic/malloc-hugepages.h
@@ -0,0 +1,37 @@
+/* Malloc huge page support.  Generic implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_HUGEPAGES_H
+#define _MALLOC_HUGEPAGES_H
+
+#include <stddef.h>
+
+/* Return the default transparent huge page size.  */
+size_t __malloc_default_thp_pagesize (void) attribute_hidden;
+
+enum malloc_thp_mode_t
+{
+  malloc_thp_mode_always,
+  malloc_thp_mode_madvise,
+  malloc_thp_mode_never,
+  malloc_thp_mode_not_supported
+};
+
+enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
+
+#endif /* _MALLOC_HUGEPAGES_H */
diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
new file mode 100644
index 0000000000..66589127cd
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
@@ -0,0 +1,76 @@
+/* Huge Page support.  Linux implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#include <intprops.h>
+#include <malloc-hugepages.h>
+#include <not-cancel.h>
+
+size_t
+__malloc_default_thp_pagesize (void)
+{
+  int fd = __open64_nocancel (
+    "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
+  if (fd == -1)
+    return 0;
+
+  char str[INT_BUFSIZE_BOUND (size_t)];
+  ssize_t s = __read_nocancel (fd, str, sizeof (str));
+  __close_nocancel (fd);
+
+  if (s < 0)
+    return 0;
+
+  int r = 0;
+  for (ssize_t i = 0; i < s; i++)
+    {
+      if (str[i] == '\n')
+	break;
+      r *= 10;
+      r += str[i] - '0';
+    }
+  return r;
+}
+
+enum malloc_thp_mode_t
+__malloc_thp_mode (void)
+{
+  int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
+			      O_RDONLY);
+  if (fd == -1)
+    return malloc_thp_mode_not_supported;
+
+  static const char mode_always[]  = "[always] madvise never\n";
+  static const char mode_madvise[] = "always [madvise] never\n";
+  static const char mode_never[]   = "always madvise [never]\n";
+
+  char str[sizeof (mode_always)];
+  ssize_t s = __read_nocancel (fd, str, sizeof (str));
+  __close_nocancel (fd);
+
+  if (s == sizeof (mode_always) - 1)
+    {
+      /* The read does not include the trailing NUL needed by strcmp.  */
+      str[s] = '\0';
+      if (strcmp (str, mode_always) == 0)
+	return malloc_thp_mode_always;
+      else if (strcmp (str, mode_madvise) == 0)
+	return malloc_thp_mode_madvise;
+      else if (strcmp (str, mode_never) == 0)
+	return malloc_thp_mode_never;
+    }
+  return malloc_thp_mode_not_supported;
+}
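The tunable path above ultimately reduces to a single madvise(MADV_HUGEPAGE) on a sufficiently large anonymous mapping.  A minimal standalone sketch of that flow follows; it is not part of the patch, and it hard-codes the common 2 MiB pmd size that the patch instead reads from hpage_pmd_size:

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

int
main (void)
{
  size_t thp_size = 2 * 1024 * 1024;   /* Assumed pmd size; the patch reads
                                          .../hpage_pmd_size instead.  */
  size_t len = 4 * thp_size;
  void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return 1;
#ifdef MADV_HUGEPAGE
  /* Mirror of the patch's madvise_thp(): skip areas smaller than one
     huge page, otherwise advise the whole mapping.  */
  if (len >= thp_size)
    madvise (p, len, MADV_HUGEPAGE);
#endif
  memset (p, 1, len);   /* Touching the pages lets the fault path back
                           them with huge pages under 'madvise' mode.  */
  munmap (p, len);
  return 0;
}

Under the 'madvise' THP mode this is the only way a mapping becomes eligible for huge pages, which is why the patch hooks it into every large sysmalloc() mapping.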
From patchwork Mon Aug 23 21:57:10 2021
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 44768
To: libc-alpha@sourceware.org
Subject: [PATCH v3 2/5] malloc: Add THP/madvise support for sbrk
Date: Mon, 23 Aug 2021 18:57:10 -0300
Message-Id: <20210823215713.3304523-3-adhemerval.zanella@linaro.org>
In-Reply-To: <20210823215713.3304523-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella
Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar

For the main arena, sbrk() is used by default instead of mmap(), and the
granularity used when increasing the program segment is the default page
size.  To increase effectiveness with the Transparent Huge Pages madvise
mode, the large page size is used instead.  This is enabled by setting
the new huge pages tunable 'glibc.malloc.hugetlb' to '1'.

Checked on x86_64-linux-gnu.
---
 include/libc-pointer-arith.h | 10 ++++++++++
 malloc/malloc.c              | 35 ++++++++++++++++++++++++++++++-----
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/libc-pointer-arith.h b/include/libc-pointer-arith.h
index 04ba537617..f592cbafec 100644
--- a/include/libc-pointer-arith.h
+++ b/include/libc-pointer-arith.h
@@ -37,6 +37,16 @@
 /* Cast an integer or a pointer VAL to integer with proper type.  */
 # define cast_to_integer(val) ((__integer_if_pointer_type (val)) (val))
 
+/* Check if BASE is aligned on SIZE.  */
+#define IS_ALIGNED(base, size) \
+  (((base) & (size - 1)) == 0)
+
+#define PTR_IS_ALIGNED(base, size) \
+  ((((uintptr_t) (base)) & (size - 1)) == 0)
+
+#define PTR_DIFF(p1, p2) \
+  ((ptrdiff_t)((uintptr_t)(p1) - (uintptr_t)(p2)))
+
 /* Cast an integer VAL to void * pointer.  */
 # define cast_to_pointer(val) ((void *) (uintptr_t) (val))

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 81d3411560..f65e448130 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2024,6 +2024,17 @@ madvise_thp (void *p, INTERNAL_SIZE_T size)
      not active.  */
   if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
     return;
+
+  /* madvise () requires the input to be aligned to the system page size,
+     while MADV_HUGEPAGE is expected to handle unaligned addresses.
+     Unaligned inputs should happen only for the initial data segment.  */
+  if (__glibc_unlikely (!PTR_IS_ALIGNED (p, GLRO (dl_pagesize))))
+    {
+      void *q = PTR_ALIGN_DOWN (p, GLRO (dl_pagesize));
+      size += PTR_DIFF (p, q);
+      p = q;
+    }
+
   __madvise (p, size, MADV_HUGEPAGE);
 #endif
 }
@@ -2610,14 +2621,25 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
           size -= old_size;
 
           /*
-            Round to a multiple of page size.
+            Round to a multiple of page size or huge page size.
             If MORECORE is not contiguous, this ensures that we only call it
             with whole-page arguments.  And if MORECORE is contiguous and
             this is not first time through, this preserves page-alignment of
             previous calls.  Otherwise, we correct to page-align below.
           */
 
-          size = ALIGN_UP (size, pagesize);
+#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
+          /* Defined in brk.c.  */
+          extern void *__curbrk;
+          if (mp_.thp_pagesize != 0)
+            {
+              uintptr_t top = ALIGN_UP ((uintptr_t) __curbrk + size,
+                                        mp_.thp_pagesize);
+              size = top - (uintptr_t) __curbrk;
+            }
+          else
+#endif
+            size = ALIGN_UP (size, GLRO(dl_pagesize));
 
           /*
              Don't try to call MORECORE if argument is so big as to appear
@@ -2900,10 +2922,8 @@ systrim (size_t pad, mstate av)
   long released;         /* Amount actually released */
   char *current_brk;     /* address returned by pre-check sbrk call */
   char *new_brk;         /* address returned by post-check sbrk call */
-  size_t pagesize;
   long top_area;
 
-  pagesize = GLRO (dl_pagesize);
   top_size = chunksize (av->top);
 
   top_area = top_size - MINSIZE - 1;
@@ -2911,7 +2931,12 @@ systrim (size_t pad, mstate av)
     return 0;
 
   /* Release in pagesize units and round down to the nearest page.  */
-  extra = ALIGN_DOWN(top_area - pad, pagesize);
+#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
+  if (mp_.thp_pagesize != 0)
+    extra = ALIGN_DOWN (top_area - pad, mp_.thp_pagesize);
+  else
+#endif
+    extra = ALIGN_DOWN (top_area - pad, GLRO(dl_pagesize));
 
   if (extra == 0)
     return 0;
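The key design choice in this patch is to round the resulting program break, not the sbrk increment, so the heap top stays huge-page aligned across calls.  A small self-contained sketch of that arithmetic (the addresses and sizes are hypothetical; ALIGN_UP is redefined locally with glibc's power-of-two semantics):

#include <stdint.h>
#include <stdio.h>

/* Local stand-in for glibc's ALIGN_UP (power-of-two alignment).  */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uintptr_t) (a) - 1))

int
main (void)
{
  uintptr_t curbrk = 0x555555561234;  /* hypothetical current break */
  uintptr_t request = 300 * 1024;     /* bytes sysmalloc needs */
  uintptr_t thp = 2 * 1024 * 1024;    /* assumed THP page size */

  /* Rounding only the increment would leave the break misaligned
     whenever it starts misaligned; rounding the resulting top keeps
     it THP-aligned for every later call.  */
  uintptr_t top = ALIGN_UP (curbrk + request, thp);
  printf ("increment: %#lx (naive rounding would give %#lx)\n",
          (unsigned long) (top - curbrk),
          (unsigned long) ALIGN_UP (request, thp));
  return 0;
}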
From patchwork Mon Aug 23 21:57:11 2021
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 44769
To: libc-alpha@sourceware.org
Subject: [PATCH v3 3/5] malloc: Move mmap logic to its own function
Date: Mon, 23 Aug 2021 18:57:11 -0300
Message-Id: <20210823215713.3304523-4-adhemerval.zanella@linaro.org>
In-Reply-To: <20210823215713.3304523-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella
Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar

So that it can be used with different page sizes and flags.
---
 malloc/malloc.c | 164 ++++++++++++++++++++++++++----------------------
 1 file changed, 88 insertions(+), 76 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index f65e448130..dc5ecb84c5 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2414,6 +2414,85 @@ do_check_malloc_state (mstate av)
    be extended or replaced.
  */
 
+static void *
+sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
+{
+  long int size;
+
+  /*
+    Round up size to nearest page.  For mmapped chunks, the overhead is one
+    SIZE_SZ unit larger than for normal chunks, because there is no
+    following chunk whose prev_size field could be used.
+
+    See the front_misalign handling below, for glibc there is no need for
+    further alignments unless we have high alignment.
+  */
+  if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
+    size = ALIGN_UP (nb + SIZE_SZ, pagesize);
+  else
+    size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize);
+
+  /* Don't try if size wraps around 0.  */
+  if ((unsigned long) (size) <= (unsigned long) (nb))
+    return MAP_FAILED;
+
+  char *mm = (char *) MMAP (0, size,
+			    mtag_mmap_flags | PROT_READ | PROT_WRITE,
+			    extra_flags);
+  if (mm == MAP_FAILED)
+    return mm;
+
+  madvise_thp (mm, size);
+
+  /*
+    The offset to the start of the mmapped region is stored in the prev_size
+    field of the chunk.  This allows us to adjust returned start address to
+    meet alignment requirements here and in memalign(), and still be able to
+    compute proper address argument for later munmap in free() and realloc().
+  */
+
+  INTERNAL_SIZE_T front_misalign; /* unusable bytes at front of new space */
+
+  if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
+    {
+      /* For glibc, chunk2mem increases the address by CHUNK_HDR_SZ and
+	 MALLOC_ALIGN_MASK is CHUNK_HDR_SZ-1.  Each mmap'ed area is page
+	 aligned and therefore definitely MALLOC_ALIGN_MASK-aligned.  */
+      assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0);
+      front_misalign = 0;
+    }
+  else
+    front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK;
+
+  mchunkptr p;                    /* the allocated/returned chunk */
+
+  if (front_misalign > 0)
+    {
+      ptrdiff_t correction = MALLOC_ALIGNMENT - front_misalign;
+      p = (mchunkptr) (mm + correction);
+      set_prev_size (p, correction);
+      set_head (p, (size - correction) | IS_MMAPPED);
+    }
+  else
+    {
+      p = (mchunkptr) mm;
+      set_prev_size (p, 0);
+      set_head (p, size | IS_MMAPPED);
+    }
+
+  /* update statistics */
+  int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1;
+  atomic_max (&mp_.max_n_mmaps, new);
+
+  unsigned long sum;
+  sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size;
+  atomic_max (&mp_.max_mmapped_mem, sum);
+
+  check_chunk (av, p);
+
+  return chunk2mem (p);
+}
+
 static void *
 sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 {
@@ -2451,81 +2530,10 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
           && (mp_.n_mmaps < mp_.n_mmaps_max)))
     {
-      char *mm;           /* return value from mmap call*/
-
-    try_mmap:
-      /*
-         Round up size to nearest page.  For mmapped chunks, the overhead
-         is one SIZE_SZ unit larger than for normal chunks, because there
-         is no following chunk whose prev_size field could be used.
-
-         See the front_misalign handling below, for glibc there is no
-         need for further alignments unless we have have high alignment.
-       */
-      if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
-        size = ALIGN_UP (nb + SIZE_SZ, pagesize);
-      else
-        size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize);
+      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
+      if (mm != MAP_FAILED)
+        return mm;
       tried_mmap = true;
-
-      /* Don't try if size wraps around 0 */
-      if ((unsigned long) (size) > (unsigned long) (nb))
-        {
-          mm = (char *) (MMAP (0, size,
-                               mtag_mmap_flags | PROT_READ | PROT_WRITE, 0));
-
-          if (mm != MAP_FAILED)
-            {
-              madvise_thp (mm, size);
-
-              /*
-                 The offset to the start of the mmapped region is stored
-                 in the prev_size field of the chunk.  This allows us to adjust
-                 returned start address to meet alignment requirements here
-                 and in memalign(), and still be able to compute proper
-                 address argument for later munmap in free() and realloc().
-               */
-
-              if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
-                {
-                  /* For glibc, chunk2mem increases the address by
-                     CHUNK_HDR_SZ and MALLOC_ALIGN_MASK is
-                     CHUNK_HDR_SZ-1.  Each mmap'ed area is page
-                     aligned and therefore definitely
-                     MALLOC_ALIGN_MASK-aligned.  */
-                  assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0);
-                  front_misalign = 0;
-                }
-              else
-                front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK;
-              if (front_misalign > 0)
-                {
-                  correction = MALLOC_ALIGNMENT - front_misalign;
-                  p = (mchunkptr) (mm + correction);
-                  set_prev_size (p, correction);
-                  set_head (p, (size - correction) | IS_MMAPPED);
-                }
-              else
-                {
-                  p = (mchunkptr) mm;
-                  set_prev_size (p, 0);
-                  set_head (p, size | IS_MMAPPED);
-                }
-
-              /* update statistics */
-
-              int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1;
-              atomic_max (&mp_.max_n_mmaps, new);
-
-              unsigned long sum;
-              sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size;
-              atomic_max (&mp_.max_mmapped_mem, sum);
-
-              check_chunk (av, p);
-
-              return chunk2mem (p);
-            }
-        }
     }
 
   /* There are no usable arenas and mmap also failed.  */
@@ -2602,8 +2610,12 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
         }
     }
   else if (!tried_mmap)
-    /* We can at least try to use to mmap memory.  */
-    goto try_mmap;
+    {
+      /* We can at least try to use mmap to allocate memory.  */
+      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
+      if (mm != MAP_FAILED)
+        return mm;
+    }
     }
   else                     /* av == main_arena */
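One detail the refactored sysmalloc_mmap keeps from the old inline code is the overflow guard: after rounding the request up to the page size, a value that wrapped past zero can only compare less than or equal to the original request.  A standalone sketch of why that comparison catches the wrap (the constants are purely illustrative):

#include <stdint.h>
#include <stdio.h>

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uintptr_t) (a) - 1))

int
main (void)
{
  uintptr_t pagesize = 4096;
  uintptr_t nb = UINTPTR_MAX - 100;             /* hypothetical oversized request */
  uintptr_t size = ALIGN_UP (nb + 8, pagesize); /* wraps past zero: size == 0 */

  if (size <= nb)                               /* the guard sysmalloc_mmap keeps */
    puts ("size wrapped around 0: refuse to call mmap");
  return 0;
}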
From patchwork Mon Aug 23 21:57:12 2021
X-Patchwork-Submitter: Adhemerval Zanella
X-Patchwork-Id: 44772
To: libc-alpha@sourceware.org
Subject: [PATCH v3 4/5] malloc: Add Huge Page support for mmap()
Date: Mon, 23 Aug 2021 18:57:12 -0300
Message-Id: <20210823215713.3304523-5-adhemerval.zanella@linaro.org>
In-Reply-To: <20210823215713.3304523-1-adhemerval.zanella@linaro.org>
From: Adhemerval Zanella
Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar

With the morecore hook gone, there is no easy way to force the system
allocator to use huge pages directly without resorting to transparent
huge pages.  Some users and programs do prefer to use huge pages
directly instead of THP for multiple reasons: no splitting and
re-merging by the VM, no TLB shootdowns for running processes, fast
allocation from the reserve pool, no competition with the rest of the
processes unlike THP, no swapping at all, etc.

This patch extends the 'glibc.malloc.hugetlb' tunable: the value '2'
means to use huge pages directly using the system default size, while a
larger value specifies a huge page size that is matched against the
ones supported by the system.  Currently only memory allocated by
sysmalloc() is handled; the arenas still use the default system page
size.

For testing, a new rule, tests-malloc-hugetlb2, is added; it runs the
selected tests with the required GLIBC_TUNABLES setting.  On systems
without a reserved huge pages pool, this just stresses the
mmap(MAP_HUGETLB) allocation failure path.  To improve test coverage it
is required to create a pool with some preallocated pages.

Checked on x86_64-linux-gnu.
---
 NEWS                                       |  12 +-
 Rules                                      |  17 +++
 elf/dl-tunables.list                       |   3 +-
 elf/tst-rtld-list-tunables.exp             |   2 +-
 malloc/Makefile                            |  12 +-
 malloc/malloc.c                            |  30 ++++-
 manual/tunables.texi                       |   7 ++
 sysdeps/generic/malloc-hugepages.c         |   7 ++
 sysdeps/generic/malloc-hugepages.h         |   7 ++
 sysdeps/unix/sysv/linux/malloc-hugepages.c | 126 +++++++++++++++++++++
 10 files changed, 209 insertions(+), 14 deletions(-)

diff --git a/NEWS b/NEWS
index 5c9486b468..ac8f31950f 100644
--- a/NEWS
+++ b/NEWS
@@ -10,9 +10,15 @@ Version 2.35
 Major new features:
 
 * On Linux, a new tunable, glibc.malloc.hugetlb, can be used to
-  make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
-  It might improve performance with Transparent Huge Pages madvise mode
-  depending on the workload.
+  either make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk
+  calls, or use huge pages directly with mmap calls with the MAP_HUGETLB
+  flag.  The former can improve performance when Transparent Huge Pages
The former can improve performance when Transparent Huge Pages + is set to 'madvise' mode while the latter uses the system reversed + pages. + +* On Linux, a new tunable, glibc.malloc.mmap_hugetlb, can be used to + instruct malloc to try use Huge Pages when allocate memory with mmap() + calls (through the use of MAP_HUGETLB). Deprecated and removed features, and other changes affecting compatibility: diff --git a/Rules b/Rules index 471458ad4a..542a37eef0 100644 --- a/Rules +++ b/Rules @@ -158,6 +158,7 @@ tests: $(tests:%=$(objpfx)%.out) $(tests-internal:%=$(objpfx)%.out) \ $(tests-mcheck:%=$(objpfx)%-mcheck.out) \ $(tests-malloc-check:%=$(objpfx)%-malloc-check.out) \ $(tests-malloc-hugetlb1:%=$(objpfx)%-malloc-hugetlb1.out) \ + $(tests-malloc-hugetlb2:%=$(objpfx)%-malloc-hugetlb2.out) \ $(tests-special) $(tests-printers-out) xtests: tests $(xtests:%=$(objpfx)%.out) $(xtests-special) endif @@ -170,6 +171,7 @@ else tests-expected = $(tests) $(tests-internal) $(tests-printers) \ $(tests-container) $(tests-malloc-check:%=%-malloc-check) \ $(tests-malloc-hugetlb1:%=%-malloc-hugetlb1) \ + $(tests-malloc-hugetlb2:%=%-malloc-hugetlb2) \ $(tests-mcheck:%=%-mcheck) endif tests: @@ -199,6 +201,7 @@ endif binaries-mcheck-tests = $(tests-mcheck:%=%-mcheck) binaries-malloc-check-tests = $(tests-malloc-check:%=%-malloc-check) binaries-malloc-hugetlb1-tests = $(tests-malloc-hugetlb1:%=%-malloc-hugetlb1) +binaries-malloc-hugetlb2-tests = $(tests-malloc-hugetlb2:%=%-malloc-hugetlb2) else binaries-all-notests = binaries-all-tests = $(tests) $(tests-internal) $(xtests) $(test-srcs) @@ -211,6 +214,7 @@ binaries-pie-notests = binaries-mcheck-tests = binaries-malloc-check-tests = binaries-malloc-hugetlb1-tests = +binaries-malloc-hugetlb2-tests = endif binaries-pie = $(binaries-pie-tests) $(binaries-pie-notests) @@ -259,6 +263,14 @@ $(addprefix $(objpfx),$(binaries-malloc-hugetlb1-tests)): %-malloc-hugetlb1: %.o $(+link-tests) endif +ifneq "$(strip $(binaries-malloc-hugetlb2-tests))" "" +$(addprefix $(objpfx),$(binaries-malloc-hugetlb2-tests)): %-malloc-hugetlb2: %.o \ + $(link-extra-libs-tests) \ + $(sort $(filter $(common-objpfx)lib%,$(link-libc))) \ + $(addprefix $(csu-objpfx),start.o) $(+preinit) $(+postinit) + $(+link-tests) +endif + ifneq "$(strip $(binaries-pie-tests))" "" $(addprefix $(objpfx),$(binaries-pie-tests)): %: %.o \ $(link-extra-libs-tests) \ @@ -302,6 +314,11 @@ $(1)-malloc-hugetlb1-ENV += GLIBC_TUNABLES=glibc.malloc.hugetlb=1 endef $(foreach t,$(tests-malloc-hugetlb1),$(eval $(call malloc-hugetlb1-ENVS,$(t)))) +# All malloc-hugetlb2 tests will be run with GLIBC_TUNABLE=glibc.malloc.hugetlb=2 +define malloc-hugetlb2-ENVS +$(1)-malloc-hugetlb2-ENV += GLIBC_TUNABLES=glibc.malloc.hugetlb=2 +endef +$(foreach t,$(tests-malloc-hugetlb2),$(eval $(call malloc-hugetlb2-ENVS,$(t)))) # mcheck tests need the debug DSO to support -lmcheck. 
 define mcheck-ENVS

diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 1b347487f7..379701b84f 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -93,9 +93,8 @@ glibc {
       security_level: SXID_IGNORE
     }
     hugetlb {
-      type: INT_32
+      type: SIZE_T
       minval: 0
-      maxval: 1
     }
   }
   cpu {
diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
index 89aa5c0d40..245b074432 100644
--- a/elf/tst-rtld-list-tunables.exp
+++ b/elf/tst-rtld-list-tunables.exp
@@ -1,7 +1,7 @@
 glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0x[f]+)
 glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0x[f]+)
 glibc.malloc.check: 0 (min: 0, max: 3)
-glibc.malloc.hugetlb: 0 (min: 0, max: 1)
+glibc.malloc.hugetlb: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
 glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0x[f]+)
diff --git a/malloc/Makefile b/malloc/Makefile
index e47fd660f6..a03739d3e1 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -78,10 +78,10 @@ tests-exclude-malloc-check = tst-malloc-check tst-malloc-usable \
 tests-malloc-check = $(filter-out $(tests-exclude-malloc-check) \
				   $(tests-static),$(tests))
 
-# Run all tests with GLIBC_TUNABLES=glibc.malloc.hugetlb=1 that check the
-# Transparent Huge Pages support.  We need to exclude some tests that define
-# the ENV vars.
-tests-exclude-hugetlb1 = \
+# Run all tests with GLIBC_TUNABLES=glibc.malloc.hugetlb={1,2}, which check
+# the Transparent Huge Pages support (1) or Huge Page support (2).  We need
+# to exclude some tests that define the ENV vars.
+tests-exclude-hugetlb = \
	tst-compathooks-off \
	tst-compathooks-on \
	tst-interpose-nothread \
	tst-interpose-thread \
@@ -92,7 +92,9 @@ tests-exclude-hugetlb = \
	tst-malloc-usable-tunables \
	tst-mallocstate
 tests-malloc-hugetlb1 = \
-	$(filter-out $(tests-exclude-hugetlb1), $(tests))
+	$(filter-out $(tests-exclude-hugetlb), $(tests))
+tests-malloc-hugetlb2 = \
+	$(filter-out $(tests-exclude-hugetlb), $(tests))
 
 # -lmcheck needs __malloc_initialize_hook, which was deprecated in 2.24.
 ifeq ($(have-GLIBC_2.23)$(build-shared),yesyes)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index dc5ecb84c5..370d9ffac0 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1884,6 +1884,10 @@ struct malloc_par
 #if HAVE_TUNABLES
   /* Transparent Large Page support.  */
   INTERNAL_SIZE_T thp_pagesize;
+  /* A value different from 0 means to align mmap allocations to
+     hp_pagesize and to add hp_flags to the mmap flags.  */
+  INTERNAL_SIZE_T hp_pagesize;
+  int hp_flags;
 #endif
 
   /* Memory map support */
@@ -2442,7 +2446,10 @@ sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
   if (mm == MAP_FAILED)
     return mm;
 
-  madvise_thp (mm, size);
+#ifdef MAP_HUGETLB
+  if (!(extra_flags & MAP_HUGETLB))
+    madvise_thp (mm, size);
+#endif
 
   /*
     The offset to the start of the mmapped region is stored in the prev_size
@@ -2530,7 +2537,18 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
           && (mp_.n_mmaps < mp_.n_mmaps_max)))
     {
-      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
+      char *mm;
+#if HAVE_TUNABLES
+      if (mp_.hp_pagesize > 0)
+        {
+          /* There is no need to issue the THP madvise call if Huge Pages
+             are used directly.  */
+          mm = sysmalloc_mmap (nb, mp_.hp_pagesize, mp_.hp_flags, av);
+          if (mm != MAP_FAILED)
+            return mm;
+        }
+#endif
+      mm = sysmalloc_mmap (nb, pagesize, 0, av);
       if (mm != MAP_FAILED)
         return mm;
       tried_mmap = true;
@@ -2611,7 +2629,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
     }
   else if (!tried_mmap)
     {
-      /* We can at least try to use mmap to allocate memory.  */
+      /* We can at least try to use mmap to allocate memory.  If new_heap()
+         fails it is unlikely that allocating a huge page would succeed.  */
       char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
       if (mm != MAP_FAILED)
         return mm;
@@ -5405,6 +5424,11 @@ do_set_hugetlb (int32_t value)
       if (thp_mode == malloc_thp_mode_madvise)
         mp_.thp_pagesize = __malloc_default_thp_pagesize ();
     }
+  else if (value >= 2)
+    {
+      __malloc_hugepage_config (value == 2 ? 0 : value, &mp_.hp_pagesize,
+				&mp_.hp_flags);
+    }
   return 0;
 }
 #endif
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 799fa76258..1961adcbcb 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -277,6 +277,13 @@ value is @code{0}, which disables any additional support on @code{malloc}.
 Setting its value to @code{1} enables the use of @code{madvise} with
 @code{MADV_HUGEPAGE} after memory allocation with @code{mmap}.  It is enabled
 only if the system supports Transparent Huge Pages (currently only on Linux).
+
+Setting its value to @code{2} enables the use of Huge Pages directly with
+@code{mmap} and the @code{MAP_HUGETLB} flag.  The huge page size to use
+will be the default one provided by the system.  A value larger than
+@code{2} specifies a huge page size, which will be matched against the
+sizes supported by the system.  If the default huge page size cannot be
+obtained or if the provided value is invalid, huge page usage is disabled.
 @end deftp
 
 @node Dynamic Linking Tunables
diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
index 262bcdbeb8..258f7db7e6 100644
--- a/sysdeps/generic/malloc-hugepages.c
+++ b/sysdeps/generic/malloc-hugepages.c
@@ -29,3 +29,10 @@ __malloc_thp_mode (void)
 {
   return malloc_thp_mode_not_supported;
 }
+
+/* Return the huge page configuration for the REQUESTED size (generic stub).  */
+void __malloc_hugepage_config (size_t requested, size_t *pagesize, int *flags)
+{
+  *pagesize = 0;
+  *flags = 0;
+}
diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
index 664cda9b67..a957611c06 100644
--- a/sysdeps/generic/malloc-hugepages.h
+++ b/sysdeps/generic/malloc-hugepages.h
@@ -34,4 +34,11 @@ enum malloc_thp_mode_t
 
 enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
 
+/* Return the supported huge page size for the REQUESTED size in PAGESIZE,
+   along with the required extra mmap flags in FLAGS.  Requesting a value
+   of 0 returns the default huge page size; otherwise the value is matched
+   against the sizes supported by the system.  */
+void __malloc_hugepage_config (size_t requested, size_t *pagesize, int *flags)
+  attribute_hidden;
+
 #endif /* _MALLOC_HUGEPAGES_H */
diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
index 66589127cd..d5ec93a093 100644
--- a/sysdeps/unix/sysv/linux/malloc-hugepages.c
+++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
@@ -17,8 +17,10 @@
    not, see <https://www.gnu.org/licenses/>.  */
 #include <intprops.h>
+#include <dirent.h>
 #include <malloc-hugepages.h>
 #include <not-cancel.h>
+#include <sys/mman.h>
 
 size_t
 __malloc_default_thp_pagesize (void)
@@ -74,3 +76,127 @@ __malloc_thp_mode (void)
     }
   return malloc_thp_mode_not_supported;
 }
+
+static size_t
+malloc_default_hugepage_size (void)
+{
+  int fd = __open64_nocancel ("/proc/meminfo", O_RDONLY);
+  if (fd == -1)
+    return 0;
+
+  size_t hpsize = 0;
+
+  char buf[512];
+  off64_t off = 0;
+  while (1)
+    {
+      ssize_t r = __pread64_nocancel (fd, buf, sizeof (buf) - 1, off);
+      if (r < 0)
+	break;
+      buf[r - 1] = '\0';
+
+      const char *s = strstr (buf, "Hugepagesize:");
+      if (s == NULL)
+	{
+	  char *nl = strrchr (buf, '\n');
+	  if (nl == NULL)
+	    break;
+	  off += (nl + 1) - buf;
+	  continue;
+	}
+
+      /* The default huge page size is in the form:
+	 Hugepagesize:       NUMBER kB  */
+      s += sizeof ("Hugepagesize: ") - 1;
+      for (int i = 0; (s[i] >= '0' && s[i] <= '9') || s[i] == ' '; i++)
+	{
+	  if (s[i] == ' ')
+	    continue;
+	  hpsize *= 10;
+	  hpsize += s[i] - '0';
+	}
+      hpsize *= 1024;
+      break;
+    }
+
+  __close_nocancel (fd);
+
+  return hpsize;
+}
+
+static inline int
+hugepage_flags (size_t pagesize)
+{
+  return MAP_HUGETLB | (__builtin_ctzll (pagesize) << MAP_HUGE_SHIFT);
+}
+
+void
+__malloc_hugepage_config (size_t requested, size_t *pagesize, int *flags)
+{
+  *pagesize = 0;
+  *flags = 0;
+
+  if (requested == 0)
+    {
+      *pagesize = malloc_default_hugepage_size ();
+      if (*pagesize != 0)
+	*flags = hugepage_flags (*pagesize);
+      return;
+    }
+
+  int dirfd = __open64_nocancel ("/sys/kernel/mm/hugepages",
+				 O_RDONLY | O_DIRECTORY, 0);
+  if (dirfd == -1)
+    return;
+
+  char buffer[1024];
+  while (true)
+    {
+#if !IS_IN(libc)
+# define __getdents64 getdents64
+#endif
+      ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer));
+      if (ret == -1)
+	break;
+      else if (ret == 0)
+	break;
+
+      bool found = false;
+      char *begin = buffer, *end = buffer + ret;
+      while (begin != end)
+	{
+	  unsigned short int d_reclen;
+	  memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen),
+		  sizeof (d_reclen));
+	  const char *dname = begin + offsetof (struct dirent64, d_name);
+	  begin += d_reclen;
+
+	  if (dname[0] == '.'
+	      || strncmp (dname, "hugepages-", sizeof ("hugepages-") - 1) != 0)
+	    continue;
+
+	  /* Each entry represents a supported huge page in the form:
+	     hugepages-<size>kB.  */
+	  size_t hpsize = 0;
+	  const char *sizestr = dname + sizeof ("hugepages-") - 1;
+	  for (int i = 0; sizestr[i] >= '0' && sizestr[i] <= '9'; i++)
+	    {
+	      hpsize *= 10;
+	      hpsize += sizestr[i] - '0';
+	    }
+	  hpsize *= 1024;
+
+	  if (hpsize == requested)
+	    {
+	      *pagesize = hpsize;
+	      *flags = hugepage_flags (*pagesize);
+	      found = true;
+	      break;
+	    }
+	}
+      if (found)
+	break;
+    }
+
+  __close_nocancel (dirfd);
+}
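The flags built by hugepage_flags encode log2 of the page size in the bits above MAP_HUGE_SHIFT, which is how the kernel selects a pool when several huge page sizes coexist.  A minimal user-level sketch of the same mmap call follows; the 2 MiB size is an assumption, MAP_HUGE_SHIFT is defined locally in case the headers predate it, and on a machine without reserved huge pages the map is expected to fail:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT          /* provided by newer kernel/libc headers */
# define MAP_HUGE_SHIFT 26
#endif

int
main (void)
{
  size_t hpsize = 2 * 1024 * 1024;   /* assumed pool page size */
  int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB
              | (__builtin_ctzll (hpsize) << MAP_HUGE_SHIFT);

  void *p = mmap (NULL, hpsize, PROT_READ | PROT_WRITE, flags, -1, 0);
  if (p == MAP_FAILED)
    {
      perror ("mmap (MAP_HUGETLB)"); /* expected when the pool is empty */
      return 1;
    }
  munmap (p, hpsize);
  return 0;
}

Reserving pages first (for example via /proc/sys/vm/nr_hugepages) makes the call succeed, which matches the test-coverage note in the commit message.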
---
 malloc/arena.c  | 119 ++++++++++++++++++++++++++++++++----------------
 malloc/malloc.c |   2 +-
 2 files changed, 80 insertions(+), 41 deletions(-)
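A note on the aligned mapping scheme in alloc_new_heap in the diff below
(illustrative, not glibc code): heaps must start at a multiple of the
maximum heap size so that heap_for_ptr can recover the heap_info of any
chunk by masking off the pointer's low bits.  Since mmap() guarantees no
alignment beyond the page size, the code over-maps twice the required
size and trims both ends.  A standalone sketch of the idea (map_aligned
is a hypothetical name):

#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/mman.h>

/* Obtain a SIZE-byte mapping aligned to SIZE (a power of two) by
   mapping 2 * SIZE bytes and unmapping the misaligned head and the
   remaining tail.  */
static void *
map_aligned (size_t size)
{
  char *p1 = mmap (NULL, size << 1, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (p1 == MAP_FAILED)
    return NULL;
  /* Round up to the next multiple of SIZE inside the double-sized map.  */
  char *p2 = (char *) (((uintptr_t) p1 + (size - 1)) & ~(size - 1));
  size_t head = p2 - p1;
  if (head != 0)
    munmap (p1, head);               /* Trim the misaligned head.  */
  munmap (p2 + size, size - head);   /* Trim the tail.  */
  return p2;
}

With this invariant in place, recovering the enclosing heap is a single
mask operation, which is exactly what the new heap_for_ptr does via
PTR_ALIGN_DOWN.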
diff --git a/malloc/arena.c b/malloc/arena.c
index 6decf97915..d9f17fb256 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -42,6 +42,21 @@
    mmap threshold, so that requests with a size just below that
    threshold can be fulfilled without creating too many heaps.  */

+/* When huge pages are used to create new arenas, the maximum and minimum
+   size are based on the runtime defined huge page size.  */
+
+static inline size_t
+heap_min_size (void)
+{
+  return mp_.hp_pagesize == 0 ? HEAP_MIN_SIZE : mp_.hp_pagesize;
+}
+
+static inline size_t
+heap_max_size (void)
+{
+  return mp_.hp_pagesize == 0 ? HEAP_MAX_SIZE : mp_.hp_pagesize * 4;
+}
+
 /***************************************************************************/

 #define top(ar_ptr) ((ar_ptr)->top)
@@ -57,10 +72,11 @@ typedef struct _heap_info
   size_t size;   /* Current size in bytes.  */
   size_t mprotect_size; /* Size in bytes that has been mprotected
                            PROT_READ|PROT_WRITE.  */
+  size_t pagesize; /* Page size used when allocating the arena.  */
   /* Make sure the following data is properly aligned, particularly
      that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
      MALLOC_ALIGNMENT.  */
-  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
+  char pad[-5 * SIZE_SZ & MALLOC_ALIGN_MASK];
 } heap_info;

 /* Get a compile-time error if the heap_info padding is not correct
@@ -126,10 +142,18 @@ static bool __malloc_initialized = false;

 /* find the heap and corresponding arena for a given ptr */

-#define heap_for_ptr(ptr) \
-  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
-#define arena_for_chunk(ptr) \
-  (chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr)
+static inline heap_info *
+heap_for_ptr (void *ptr)
+{
+  size_t max_size = heap_max_size ();
+  return PTR_ALIGN_DOWN (ptr, max_size);
+}
+
+static inline struct malloc_state *
+arena_for_chunk (mchunkptr ptr)
+{
+  return chunk_main_arena (ptr) ? &main_arena : heap_for_ptr (ptr)->ar_ptr;
+}

 /**************************************************************************/

@@ -444,71 +468,72 @@ static char *aligned_heap_area;
    of the page size.  */

 static heap_info *
-new_heap (size_t size, size_t top_pad)
+alloc_new_heap (size_t size, size_t top_pad, size_t pagesize,
+                int mmap_flags)
 {
-  size_t pagesize = GLRO (dl_pagesize);
   char *p1, *p2;
   unsigned long ul;
   heap_info *h;
+  size_t min_size = heap_min_size ();
+  size_t max_size = heap_max_size ();

-  if (size + top_pad < HEAP_MIN_SIZE)
-    size = HEAP_MIN_SIZE;
-  else if (size + top_pad <= HEAP_MAX_SIZE)
+  if (size + top_pad < min_size)
+    size = min_size;
+  else if (size + top_pad <= max_size)
     size += top_pad;
-  else if (size > HEAP_MAX_SIZE)
+  else if (size > max_size)
     return 0;
   else
-    size = HEAP_MAX_SIZE;
+    size = max_size;
   size = ALIGN_UP (size, pagesize);

-  /* A memory region aligned to a multiple of HEAP_MAX_SIZE is needed.
+  /* A memory region aligned to a multiple of max_size is needed.
      No swap space needs to be reserved for the following large
      mapping (on Linux, this is the case for all non-writable mappings
      anyway).  */
   p2 = MAP_FAILED;
   if (aligned_heap_area)
     {
-      p2 = (char *) MMAP (aligned_heap_area, HEAP_MAX_SIZE, PROT_NONE,
-                          MAP_NORESERVE);
+      p2 = (char *) MMAP (aligned_heap_area, max_size, PROT_NONE, mmap_flags);
       aligned_heap_area = NULL;
-      if (p2 != MAP_FAILED && ((unsigned long) p2 & (HEAP_MAX_SIZE - 1)))
+      if (p2 != MAP_FAILED && ((unsigned long) p2 & (max_size - 1)))
         {
-          __munmap (p2, HEAP_MAX_SIZE);
+          __munmap (p2, max_size);
           p2 = MAP_FAILED;
         }
     }
   if (p2 == MAP_FAILED)
     {
-      p1 = (char *) MMAP (0, HEAP_MAX_SIZE << 1, PROT_NONE, MAP_NORESERVE);
+      p1 = (char *) MMAP (0, max_size << 1, PROT_NONE, mmap_flags);
       if (p1 != MAP_FAILED)
         {
-          p2 = (char *) (((unsigned long) p1 + (HEAP_MAX_SIZE - 1))
-                         & ~(HEAP_MAX_SIZE - 1));
+          p2 = (char *) (((unsigned long) p1 + (max_size - 1))
+                         & ~(max_size - 1));
           ul = p2 - p1;
           if (ul)
             __munmap (p1, ul);
           else
-            aligned_heap_area = p2 + HEAP_MAX_SIZE;
-          __munmap (p2 + HEAP_MAX_SIZE, HEAP_MAX_SIZE - ul);
+            aligned_heap_area = p2 + max_size;
+          __munmap (p2 + max_size, max_size - ul);
         }
       else
         {
-          /* Try to take the chance that an allocation of only HEAP_MAX_SIZE
+          /* Try to take the chance that an allocation of only max_size
              is already aligned.  */
-          p2 = (char *) MMAP (0, HEAP_MAX_SIZE, PROT_NONE, MAP_NORESERVE);
+          p2 = (char *) MMAP (0, max_size, PROT_NONE, mmap_flags);
           if (p2 == MAP_FAILED)
             return 0;

-          if ((unsigned long) p2 & (HEAP_MAX_SIZE - 1))
+          if ((unsigned long) p2 & (max_size - 1))
             {
-              __munmap (p2, HEAP_MAX_SIZE);
+              __munmap (p2, max_size);
               return 0;
             }
         }
     }
   if (__mprotect (p2, size, mtag_mmap_flags | PROT_READ | PROT_WRITE) != 0)
     {
-      __munmap (p2, HEAP_MAX_SIZE);
+      __munmap (p2, max_size);
       return 0;
     }

@@ -517,22 +542,40 @@ new_heap (size_t size, size_t top_pad)
   h = (heap_info *) p2;
   h->size = size;
   h->mprotect_size = size;
+  h->pagesize = pagesize;
   LIBC_PROBE (memory_heap_new, 2, h, h->size);
   return h;
 }

+static heap_info *
+new_heap (size_t size, size_t top_pad)
+{
+  if (mp_.hp_pagesize != 0)
+    {
+      /* MAP_NORESERVE is not used for huge pages because some kernels may
+         not reserve the mmap() region and a subsequent access may then
+         trigger a SIGBUS if there are no free pages in the pool.  */
+      heap_info *h = alloc_new_heap (size, top_pad, mp_.hp_pagesize,
+                                     mp_.hp_flags);
+      if (h != NULL)
+        return h;
+    }
+  return alloc_new_heap (size, top_pad, GLRO (dl_pagesize), MAP_NORESERVE);
+}
+
 /* Grow a heap.  size is automatically rounded up to a
    multiple of the page size.  */

 static int
 grow_heap (heap_info *h, long diff)
 {
-  size_t pagesize = GLRO (dl_pagesize);
+  size_t pagesize = h->pagesize;
+  size_t max_size = heap_max_size ();
   long new_size;

   diff = ALIGN_UP (diff, pagesize);
   new_size = (long) h->size + diff;
-  if ((unsigned long) new_size > (unsigned long) HEAP_MAX_SIZE)
+  if ((unsigned long) new_size > (unsigned long) max_size)
     return -1;

   if ((unsigned long) new_size > h->mprotect_size)
@@ -582,21 +625,15 @@ shrink_heap (heap_info *h, long diff)

 /* Delete a heap.  */

-#define delete_heap(heap) \
-  do {                                                          \
-    if ((char *) (heap) + HEAP_MAX_SIZE == aligned_heap_area)   \
-      aligned_heap_area = NULL;                                 \
-    __munmap ((char *) (heap), HEAP_MAX_SIZE);                  \
-  } while (0)
-
 static int
 heap_trim (heap_info *heap, size_t pad)
 {
   mstate ar_ptr = heap->ar_ptr;
-  unsigned long pagesz = GLRO (dl_pagesize);
+  unsigned long pagesz = heap->pagesize;
   mchunkptr top_chunk = top (ar_ptr), p;
   heap_info *prev_heap;
   long new_size, top_size, top_area, extra, prev_size, misalign;
+  size_t max_size = heap_max_size ();

   /* Can this heap go away completely? */
   while (top_chunk == chunk_at_offset (heap, sizeof (*heap)))
@@ -613,12 +650,14 @@ heap_trim (heap_info *heap, size_t pad)
       assert (new_size > 0 && new_size < (long) (2 * MINSIZE));
       if (!prev_inuse (p))
         new_size += prev_size (p);
-      assert (new_size > 0 && new_size < HEAP_MAX_SIZE);
-      if (new_size + (HEAP_MAX_SIZE - prev_heap->size) < pad + MINSIZE + pagesz)
+      assert (new_size > 0 && new_size < max_size);
+      if (new_size + (max_size - prev_heap->size) < pad + MINSIZE + pagesz)
         break;
       ar_ptr->system_mem -= heap->size;
       LIBC_PROBE (memory_heap_free, 2, heap, heap->size);
-      delete_heap (heap);
+      if ((char *) heap + max_size == aligned_heap_area)
+        aligned_heap_area = NULL;
+      __munmap (heap, max_size);
       heap = prev_heap;
       if (!prev_inuse (p)) /* consolidate backward */
         {
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 370d9ffac0..c91554edf9 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -5311,7 +5311,7 @@ static __always_inline int
 do_set_mmap_threshold (size_t value)
 {
   /* Forbid setting the threshold too high.  */
-  if (value <= HEAP_MAX_SIZE / 2)
+  if (value <= heap_max_size () / 2)
     {
       LIBC_PROBE (memory_mallopt_mmap_threshold, 3, value,
                   mp_.mmap_threshold, mp_.no_dyn_threshold);
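
One consequence of the last hunk, for illustration: with
glibc.malloc.hugetlb=2 the cap on the mmap threshold scales with the
huge page size.  Assuming 2 MiB default huge pages, heap_max_size () is
8 MiB, so mallopt accepts M_MMAP_THRESHOLD values only up to 4 MiB,
whereas the default configuration caps it at HEAP_MAX_SIZE / 2.  A
minimal sketch (exact limits depend on the system's default huge page
size):

#include <malloc.h>
#include <stdio.h>

int
main (void)
{
  /* Run as: GLIBC_TUNABLES=glibc.malloc.hugetlb=2 ./a.out
     mallopt returns 1 on success and 0 on failure.  With 2 MiB huge
     pages the cap is 4 MiB, so 16 MiB is rejected under the tunable
     while it is accepted on a default 64-bit configuration.  */
  printf ("16M: %d\n", mallopt (M_MMAP_THRESHOLD, 16 * 1024 * 1024));
  printf (" 4M: %d\n", mallopt (M_MMAP_THRESHOLD, 4 * 1024 * 1024));
  return 0;
}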