From patchwork Tue Nov 26 07:33:40 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Wangyang Guo
X-Patchwork-Id: 101868
From: Wangyang Guo
To: fweimer@redhat.com, hjl.tools@gmail.com
Cc: libc-alpha@sourceware.org, goldstein.w.n@gmail.com, tianyou.li@intel.com, Wangyang Guo
Subject: [PATCH v4 3/3] malloc: Add tcache path for calloc
Date: Tue, 26 Nov 2024 15:33:40 +0800
Message-ID: <20241126073340.3724382-4-wangyang.guo@intel.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20241126073340.3724382-1-wangyang.guo@intel.com>
References: <20241126073340.3724382-1-wangyang.guo@intel.com>

This commit adds tcache support to calloc(), which can significantly
improve the performance of small allocations, especially in
multi-threaded scenarios.

clear_mem() and tcache_available() are split out as helper functions
so the code can be reused.

Also fix the tst-safe-linking failure that appears once calloc() uses
tcache.  Previously, the test used calloc() as a way to bypass tcache
during allocation and trigger the safe-linking check in the fastbin
path.  With tcache enabled in calloc(), the test needs extra
workarounds to bypass tcache.

Result of bench-malloc-thread benchmark

Test Platform: Xeon-8380
Bench Function: calloc
Ratio: New / Original time_per_iteration (Lower is Better)

Threads#   | Ratio
-----------|------
1 thread   | 0.724
4 threads  | 0.534

---
Changes in v3:
- Split out tcache_available() as a helper function.
- Link to v2: https://sourceware.org/pipermail/libc-alpha/2024-August/159430.html

Changes in v2:
- Merge the tst-safe-linking fix to make sure the CI check passes.
- Link to v1: https://sourceware.org/pipermail/libc-alpha/2024-August/159362.html
---
 malloc/malloc.c           | 129 ++++++++++++++++++++++++--------------
 malloc/tst-safe-linking.c |  81 ++++++++++++++++++++----
 2 files changed, 150 insertions(+), 60 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 81ddd2c3a8..1437ec20fb 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3208,6 +3208,18 @@ tcache_next (tcache_entry *e)
   return (tcache_entry *) REVEAL_PTR (e->next);
 }
 
+/* Check if tcache is available for alloc by corresponding tc_idx.  */
+static __always_inline bool
+tcache_availabe (size_t tc_idx)
+{
+  if (tc_idx < mp_.tcache_bins
+      && tcache != NULL
+      && tcache->counts[tc_idx] > 0)
+    return true;
+  else
+    return false;
+}
+
 /* Verify if the suspicious tcache_entry is double free.
    It's not expected to execute very often, mark it as noinline.  */
 static __attribute__ ((noinline)) void
@@ -3366,9 +3378,7 @@ __libc_malloc (size_t bytes)
   MAYBE_INIT_TCACHE ();
 
   DIAG_PUSH_NEEDS_COMMENT;
-  if (tc_idx < mp_.tcache_bins
-      && tcache != NULL
-      && tcache->counts[tc_idx] > 0)
+  if (tcache_availabe (tc_idx))
     {
       victim = tcache_get (tc_idx);
       return tag_new_usable (victim);
@@ -3667,9 +3677,7 @@ _mid_memalign (size_t alignment, size_t bytes, void *address)
       }
     size_t tc_idx = csize2tidx (tbytes);
 
-    if (tc_idx < mp_.tcache_bins
-        && tcache != NULL
-        && tcache->counts[tc_idx] > 0)
+    if (tcache_availabe (tc_idx))
       {
         /* The tcache itself isn't encoded, but the chain is.  */
         tcache_entry **tep = & tcache->entries[tc_idx];
@@ -3747,16 +3755,55 @@ __libc_pvalloc (size_t bytes)
   return _mid_memalign (pagesize, rounded_bytes, address);
 }
 
+static __always_inline void *
+clear_mem (void *mem, INTERNAL_SIZE_T csz)
+{
+  INTERNAL_SIZE_T *d;
+  unsigned long clearsize, nclears;
+
+  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
+     contents have an odd number of INTERNAL_SIZE_T-sized words;
+     minimally 3.  */
+  d = (INTERNAL_SIZE_T *) mem;
+  clearsize = csz - SIZE_SZ;
+  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
+  assert (nclears >= 3);
+
+  if (nclears > 9)
+    return memset (d, 0, clearsize);
+
+  else
+    {
+      *(d + 0) = 0;
+      *(d + 1) = 0;
+      *(d + 2) = 0;
+      if (nclears > 4)
+        {
+          *(d + 3) = 0;
+          *(d + 4) = 0;
+          if (nclears > 6)
+            {
+              *(d + 5) = 0;
+              *(d + 6) = 0;
+              if (nclears > 8)
+                {
+                  *(d + 7) = 0;
+                  *(d + 8) = 0;
+                }
+            }
+        }
+    }
+
+  return mem;
+}
+
 void *
 __libc_calloc (size_t n, size_t elem_size)
 {
   mstate av;
-  mchunkptr oldtop;
-  INTERNAL_SIZE_T sz, oldtopsize;
+  mchunkptr oldtop, p;
+  INTERNAL_SIZE_T sz, oldtopsize, csz;
   void *mem;
-  unsigned long clearsize;
-  unsigned long nclears;
-  INTERNAL_SIZE_T *d;
   ptrdiff_t bytes;
 
   if (__glibc_unlikely (__builtin_mul_overflow (n, elem_size, &bytes)))
@@ -3772,6 +3819,27 @@ __libc_calloc (size_t n, size_t elem_size)
 
   MAYBE_INIT_TCACHE ();
 
+#if USE_TCACHE
+  /* int_free also calls request2size, be careful to not pad twice.  */
+  size_t tbytes = checked_request2size (bytes);
+  if (tbytes == 0)
+    {
+      __set_errno (ENOMEM);
+      return NULL;
+    }
+  size_t tc_idx = csize2tidx (tbytes);
+
+  if (tcache_availabe (tc_idx))
+    {
+      mem = tcache_get (tc_idx);
+      p = mem2chunk (mem);
+      if (__glibc_unlikely (mtag_enabled))
+        return tag_new_zero_region (mem, memsize (p));
+      csz = chunksize (p);
+      return clear_mem (mem, csz);
+    }
+#endif
+
   if (SINGLE_THREAD_P)
     av = &main_arena;
   else
@@ -3826,7 +3894,7 @@ __libc_calloc (size_t n, size_t elem_size)
   if (mem == NULL)
     return NULL;
 
-  mchunkptr p = mem2chunk (mem);
+  p = mem2chunk (mem);
 
   /* If we are using memory tagging, then we need to set the tags
      regardless of MORECORE_CLEARS, so we zero the whole block while
@@ -3834,7 +3902,7 @@ __libc_calloc (size_t n, size_t elem_size)
   if (__glibc_unlikely (mtag_enabled))
     return tag_new_zero_region (mem, memsize (p));
 
-  INTERNAL_SIZE_T csz = chunksize (p);
+  csz = chunksize (p);
 
   /* Two optional cases in which clearing not necessary */
   if (chunk_is_mmapped (p))
@@ -3853,40 +3921,7 @@ __libc_calloc (size_t n, size_t elem_size)
     }
 #endif
 
-  /* Unroll clear of <= 36 bytes (72 if 8byte sizes).  We know that
-     contents have an odd number of INTERNAL_SIZE_T-sized words;
-     minimally 3.  */
-  d = (INTERNAL_SIZE_T *) mem;
-  clearsize = csz - SIZE_SZ;
-  nclears = clearsize / sizeof (INTERNAL_SIZE_T);
-  assert (nclears >= 3);
-
-  if (nclears > 9)
-    return memset (d, 0, clearsize);
-
-  else
-    {
-      *(d + 0) = 0;
-      *(d + 1) = 0;
-      *(d + 2) = 0;
-      if (nclears > 4)
-        {
-          *(d + 3) = 0;
-          *(d + 4) = 0;
-          if (nclears > 6)
-            {
-              *(d + 5) = 0;
-              *(d + 6) = 0;
-              if (nclears > 8)
-                {
-                  *(d + 7) = 0;
-                  *(d + 8) = 0;
-                }
-            }
-        }
-    }
-
-  return mem;
+  return clear_mem (mem, csz);
 }
 #endif /* IS_IN (libc) */
 
diff --git a/malloc/tst-safe-linking.c b/malloc/tst-safe-linking.c
index 01dd07004d..5302575ad1 100644
--- a/malloc/tst-safe-linking.c
+++ b/malloc/tst-safe-linking.c
@@ -111,22 +111,37 @@ test_fastbin (void *closure)
   int i;
   int mask = ((int *)closure)[0];
   size_t size = TCACHE_ALLOC_SIZE;
+  void * ps[TCACHE_FILL_COUNT];
+  void * pps[TCACHE_FILL_COUNT];
 
   printf ("++ fastbin ++\n");
 
+  /* Populate the fastbin list.  */
+  void * volatile a = calloc (1, size);
+  void * volatile b = calloc (1, size);
+  void * volatile c = calloc (1, size);
+  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+
+  /* Chunks for later tcache filling from fastbins.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      void * volatile p = calloc (1, size);
+      pps[i] = p;
+    }
+
   /* Take the tcache out of the game.  */
   for (i = 0; i < TCACHE_FILL_COUNT; ++i)
     {
       void * volatile p = calloc (1, size);
-      printf ("p=%p\n", p);
-      free (p);
+      ps[i] = p;
     }
 
-  /* Populate the fastbin list.  */
-  void * volatile a = calloc (1, size);
-  void * volatile b = calloc (1, size);
-  void * volatile c = calloc (1, size);
-  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (ps[i]);
+    }
+
+  /* Free abc will return to fastbin in FIFO order.  */
   free (a);
   free (b);
   free (c);
@@ -136,11 +151,43 @@ test_fastbin (void *closure)
   memset (c, mask & 0xFF, size);
   printf ("After: c=%p, c[0]=%p\n", c, ((void **)c)[0]);
 
+  /* Filling fastbins, will be copied to tcache later.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (pps[i]);
+    }
+
+  /* Drain out tcache to make sure later alloc from fastbins.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      void * volatile p = calloc (1, size);
+      ps[i] = p;
+    }
+
+  /* This line will also filling tcache with remain pps and c.  */
+  pps[TCACHE_FILL_COUNT - 1] = calloc (1, size);
+
+  /* Tcache is FILO, now the first one is c, take it out.  */
   c = calloc (1, size);
   printf ("Allocated: c=%p\n", c);
+
+  /* Drain out remain pps from tcache.  */
+  for (i = 0; i < TCACHE_FILL_COUNT - 1; ++i)
+    {
+      void * volatile p = calloc (1, size);
+      pps[i] = p;
+    }
+
   /* This line will trigger the Safe-Linking check.  */
   b = calloc (1, size);
   printf ("b=%p\n", b);
+
+  /* Free previous pointers.  */
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (ps[i]);
+      free (pps[i]);
+    }
 }
 
 /* Try corrupting the fastbin list and trigger a consolidate.  */
@@ -150,21 +197,29 @@ test_fastbin_consolidate (void *closure)
   int i;
   int mask = ((int*)closure)[0];
   size_t size = TCACHE_ALLOC_SIZE;
+  void * ps[TCACHE_FILL_COUNT];
 
   printf ("++ fastbin consolidate ++\n");
 
+  /* Populate the fastbin list.  */
+  void * volatile a = calloc (1, size);
+  void * volatile b = calloc (1, size);
+  void * volatile c = calloc (1, size);
+  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+
   /* Take the tcache out of the game.  */
   for (i = 0; i < TCACHE_FILL_COUNT; ++i)
     {
       void * volatile p = calloc (1, size);
-      free (p);
+      ps[i] = p;
     }
 
-  /* Populate the fastbin list.  */
-  void * volatile a = calloc (1, size);
-  void * volatile b = calloc (1, size);
-  void * volatile c = calloc (1, size);
-  printf ("a=%p, b=%p, c=%p\n", a, b, c);
+  for (i = 0; i < TCACHE_FILL_COUNT; ++i)
+    {
+      free (ps[i]);
+    }
+
+  /* Free abc will return to fastbin.  */
   free (a);
   free (b);
   free (c);