From patchwork Tue Apr 21 08:39:12 2026
X-Patchwork-Submitter: Xavier Roche
X-Patchwork-Id: 133449
From: Xavier Roche
To: libc-alpha@sourceware.org
Cc: wilco.dijkstra@arm.com, adhemerval.zanella@linaro.org
Subject: [RFC PATCH 1/1] malloc: madvise interior free chunks above a threshold
Date: Tue, 21 Apr 2026 10:39:12 +0200
X-Mailer: git-send-email 2.43.0

Since glibc 2.26 introduced tcache, free() no longer returns physical
pages to the kernel for chunks that sit in the interior of an arena
heap. malloc_trim(0) still recovers the memory, but free() itself
cannot. A bisect between 2.25 and 2.26 shows a 4x RSS regression on a
reproducer that interleaves long-lived index data with short-lived
filler allocations.

Extend _int_free_maybe_trim to madvise the page-aligned interior of
consolidated chunks >= ATTEMPT_TRIMMING_THRESHOLD (64 KB), using the
same page-alignment logic as mtrim.

To avoid a flood of madvise calls when many small frees merge into one
chunk (the concern raised by Wilco on BZ #33886 comment 10), gate the
call on the caller's pre-consolidation size: direct madvise when that
size already covers a full page, accumulator-batched madvise (fires
every MADVISE_PURGE_THRESHOLD = 256 KB of sub-page frees) otherwise.
The per-arena accumulator is read and written only under the arena
mutex, which is held by both call sites (_int_free_merge_chunk and
_int_memalign).

Advice type is MADV_FREE for moderate chunks and MADV_DONTNEED for
chunks >= 2 * ATTEMPT_TRIMMING_THRESHOLD. MADV_DONTNEED matches the
existing mtrim behavior and gives operators the predictable RSS drop
they expect; MADV_FREE amortises the per-page cost for moderate chunks
that are likely to be reused.

Reproducer (tst-madvise-threshold): 16 threads, 256 MB live data,
10 GB short-lived churn.

  RSS after free, before: 1247 MB
  RSS after free, after:   296 MB

Runtime overhead: +0.16 s on a tight malloc/free loop.

Related: BZ #15321, #18910, #27976, #33886.
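
As a reading aid for the gating rules described above, here is a minimal
standalone sketch of the decision. It is not the patch itself:
sketch_decide, struct sketch_arena, enum sketch_advice and the
header_size parameter (standing in for sizeof (struct malloc_chunk))
are hypothetical names introduced only for illustration, and the
function computes the page range and advice rather than calling
madvise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Threshold values mirror the patch; everything else in this file is
   a hypothetical stand-in for glibc internals.  */
#define ATTEMPT_TRIMMING_THRESHOLD (65536UL)
#define MADVISE_PURGE_THRESHOLD (4 * ATTEMPT_TRIMMING_THRESHOLD)
#define MADVISE_DONTNEED_THRESHOLD (2 * ATTEMPT_TRIMMING_THRESHOLD)

enum sketch_advice { SKETCH_NONE, SKETCH_FREE, SKETCH_DONTNEED };

/* Stand-in for the per-arena state; only the accumulator is modeled.  */
struct sketch_arena { size_t madvise_accumulator; };

/* Decide what to do for a consolidated free chunk of SIZE bytes at
   address CHUNK, produced by freeing ORIG_SIZE bytes.  On a non-NONE
   result, [*START, *END) is the page-aligned interior to advise.  */
static enum sketch_advice
sketch_decide (struct sketch_arena *av, uintptr_t chunk, size_t header_size,
               size_t orig_size, size_t size, size_t ps,
               uintptr_t *start, uintptr_t *end)
{
  assert (ps != 0 && (ps & (ps - 1)) == 0);  /* Page size: power of two.  */

  if (size < ATTEMPT_TRIMMING_THRESHOLD)
    return SKETCH_NONE;

  /* Direct path: the original free already spans a full page.  */
  bool do_madvise = orig_size >= ps + header_size;
  if (!do_madvise)
    {
      /* Batched path: count sub-page frees until the purge threshold.  */
      av->madvise_accumulator += orig_size;
      do_madvise = av->madvise_accumulator >= MADVISE_PURGE_THRESHOLD;
    }
  if (!do_madvise)
    return SKETCH_NONE;

  /* Round the start up past the header and the end down.  */
  *start = (chunk + header_size + ps - 1) & ~(uintptr_t) (ps - 1);
  *end = (chunk + size) & ~(uintptr_t) (ps - 1);
  if (*end <= *start)
    return SKETCH_NONE;

  av->madvise_accumulator = 0;
  return size >= MADVISE_DONTNEED_THRESHOLD ? SKETCH_DONTNEED : SKETCH_FREE;
}
```

With a 4096-byte page, a 32-byte header and a chunk at 0x100010, a
64 KB consolidated chunk whose original free was 8192 bytes takes the
direct path and selects the lazy advice for [0x101000, 0x110000); a
512-byte original free only bumps the accumulator.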

Signed-off-by: Xavier Roche
---
 malloc/Makefile                |   1 +
 malloc/malloc.c                |  61 +++++++++++++---
 malloc/tst-madvise-threshold.c | 128 +++++++++++++++++++++++++++++++++
 3 files changed, 182 insertions(+), 8 deletions(-)
 create mode 100644 malloc/tst-madvise-threshold.c

diff --git a/malloc/Makefile b/malloc/Makefile
index fef5021298..d663454e57 100644
--- a/malloc/Makefile
+++ b/malloc/Makefile
@@ -39,6 +39,7 @@ tests := \
   tst-free-sized-trace \
   tst-interpose-nothread \
   tst-interpose-thread \
+  tst-madvise-threshold \
   tst-mallinfo2 \
   tst-malloc \
   tst-malloc-alternate-path \
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 57b58382b1..d20e22a463 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1029,7 +1029,8 @@ static void _int_free_merge_chunk (mstate, mchunkptr, INTERNAL_SIZE_T);
 static INTERNAL_SIZE_T _int_free_create_chunk (mstate,
                                                mchunkptr, INTERNAL_SIZE_T,
                                                mchunkptr, INTERNAL_SIZE_T);
-static void _int_free_maybe_trim (mstate, INTERNAL_SIZE_T);
+static void _int_free_maybe_trim (mstate, mchunkptr, INTERNAL_SIZE_T,
+                                  INTERNAL_SIZE_T);
 static void* _int_realloc(mstate, mchunkptr, INTERNAL_SIZE_T,
                            INTERNAL_SIZE_T);
 static void* _int_memalign(mstate, size_t, size_t);
@@ -1691,6 +1692,18 @@ unlink_chunk (mstate av, mchunkptr p)
 
 #define ATTEMPT_TRIMMING_THRESHOLD (65536UL)
 
+/* Cumulative bytes freed per arena before triggering madvise for
+   sub-page frees that individually skip the page-size gate.  */
+
+#define MADVISE_PURGE_THRESHOLD (4 * ATTEMPT_TRIMMING_THRESHOLD)
+
+/* Consolidated chunks above this size use MADV_DONTNEED (immediate
+   page release) instead of MADV_FREE (lazy release).  Large chunks
+   are unlikely to be reused at the same size, and the immediate RSS
+   reduction is worth the higher per-call cost.  */
+
+#define MADVISE_DONTNEED_THRESHOLD (2 * ATTEMPT_TRIMMING_THRESHOLD)
+
 /*
    NONCONTIGUOUS_BIT indicates that MORECORE does not return contiguous
    regions.  Otherwise, contiguity is exploited in merging together,
@@ -1747,6 +1760,9 @@ struct malloc_state
   /* Memory allocated from the system in this arena.  */
   INTERNAL_SIZE_T system_mem;
   INTERNAL_SIZE_T max_system_mem;
+
+  /* Cumulative sub-page bytes freed since the last madvise.  */
+  INTERNAL_SIZE_T madvise_accumulator;
 };
 
 struct malloc_par
@@ -4315,6 +4331,7 @@ _int_free_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size, int have_lock)
 static void
 _int_free_merge_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size)
 {
+  INTERNAL_SIZE_T orig_size = size;
   mchunkptr nextchunk = chunk_at_offset(p, size);
 
   check_inuse_chunk (av, p);
@@ -4352,7 +4369,7 @@ _int_free_merge_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size)
 
   /* Write the chunk header, maybe after merging with the following chunk.  */
   size = _int_free_create_chunk (av, p, size, nextchunk, nextsize);
-  _int_free_maybe_trim (av, size);
+  _int_free_maybe_trim (av, p, orig_size, size);
 }
 
 /* Create a chunk at P of SIZE bytes, with SIZE potentially increased
@@ -4432,14 +4449,41 @@ _int_free_create_chunk (mstate av, mchunkptr p, INTERNAL_SIZE_T size,
 }
 
 /* If the total unused topmost memory exceeds trim threshold, ask malloc_trim
-   to reduce top.  */
+   to reduce top.  Also release physical pages from interior free chunks.  */
 static void
-_int_free_maybe_trim (mstate av, INTERNAL_SIZE_T size)
+_int_free_maybe_trim (mstate av, mchunkptr p,
+                      INTERNAL_SIZE_T orig_size, INTERNAL_SIZE_T size)
 {
-  /* We don't want to trim on each free.  As a compromise, trimming is attempted
-     if ATTEMPT_TRIMMING_THRESHOLD is reached.  */
   if (size >= ATTEMPT_TRIMMING_THRESHOLD)
     {
+      /* Release interior pages of the consolidated chunk.  MADV_FREE
+         for moderate chunks (pages kept until kernel pressure, no
+         re-fault on quick reuse).  MADV_DONTNEED for large chunks
+         (immediate RSS reduction, worth the cost at this size).
+         Sub-page frees accumulate until MADVISE_PURGE_THRESHOLD.  */
+      size_t ps = GLRO (dl_pagesize);
+      bool do_madvise
+        = (orig_size >= ps + sizeof (struct malloc_chunk));
+      if (!do_madvise)
+        {
+          av->madvise_accumulator += orig_size;
+          if (av->madvise_accumulator >= MADVISE_PURGE_THRESHOLD)
+            do_madvise = true;
+        }
+      if (do_madvise)
+        {
+          char *paligned = PTR_ALIGN_UP ((char *) p
+                                         + sizeof (struct malloc_chunk), ps);
+          char *pend = PTR_ALIGN_DOWN ((char *) p + size, ps);
+          if (pend > paligned)
+            {
+              int advice = (size >= MADVISE_DONTNEED_THRESHOLD)
+                           ? MADV_DONTNEED : MADV_FREE;
+              __madvise (paligned, pend - paligned, advice);
+              av->madvise_accumulator = 0;
+            }
+        }
+
       if (av == &main_arena)
         {
 #ifndef MORECORE_CANNOT_TRIM
@@ -4646,9 +4690,10 @@ _int_memalign (mstate av, size_t alignment, size_t bytes)
           mchunkptr nextchunk = chunk_at_offset (p, size);
           mchunkptr remainder = chunk_at_offset (p, nb);
           set_head_size (p, nb);
-          size = _int_free_create_chunk (av, remainder, size - nb, nextchunk,
+          INTERNAL_SIZE_T remainder_size = size - nb;
+          size = _int_free_create_chunk (av, remainder, remainder_size, nextchunk,
                                          chunksize (nextchunk));
-          _int_free_maybe_trim (av, size);
+          _int_free_maybe_trim (av, remainder, remainder_size, size);
         }
 
       check_inuse_chunk (av, p);
diff --git a/malloc/tst-madvise-threshold.c b/malloc/tst-madvise-threshold.c
new file mode 100644
index 0000000000..964ec0ba30
--- /dev/null
+++ b/malloc/tst-madvise-threshold.c
@@ -0,0 +1,128 @@
+/* Test for the glibc.malloc.madvise_threshold tunable.
+
+   Verify that when the tunable is set, free() returns physical memory
+   to the OS for interior free chunks (not just the top chunk).
+
+   Copyright (C) 2026 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* Read RSS from /proc/self/statm in bytes.  */
+static long
+get_rss (void)
+{
+  FILE *f = fopen ("/proc/self/statm", "r");
+  if (f == NULL)
+    FAIL_UNSUPPORTED ("/proc/self/statm not available");
+
+  long pages;
+  if (fscanf (f, "%*d %ld", &pages) != 1)
+    {
+      fclose (f);
+      FAIL_UNSUPPORTED ("cannot parse /proc/self/statm");
+    }
+  fclose (f);
+  return pages * sysconf (_SC_PAGESIZE);
+}
+
+/* Number of pinning (index) allocations.  These stay alive and
+   prevent heap segments from being unmapped.  */
+#define N_INDEX 200
+
+/* Number of filler (query) allocations per round.  These are freed
+   and should have their physical pages returned when the tunable
+   is set.  */
+#define N_FILLER 2000
+
+/* Size of each filler allocation.  Must be below mmap threshold
+   so allocations go through arenas.  */
+#define FILLER_SIZE (64 * 1024)
+
+/* Size of each index allocation.  Small enough to fit between
+   filler chunks.  */
+#define INDEX_SIZE 1024
+
+static int
+do_test (void)
+{
+  void *index_ptrs[N_INDEX];
+  void *filler_ptrs[N_FILLER];
+
+  /* Phase 1: Allocate index and filler data interleaved.
+     This creates the fragmentation pattern: index chunks scattered
+     among filler chunks in the arena heaps.  */
+  int idx = 0;
+  for (int i = 0; i < N_FILLER; i++)
+    {
+      /* Every N_FILLER/N_INDEX filler allocs, insert an index alloc.  */
+      if (idx < N_INDEX && i % (N_FILLER / N_INDEX) == 0)
+        {
+          index_ptrs[idx] = malloc (INDEX_SIZE);
+          TEST_VERIFY_EXIT (index_ptrs[idx] != NULL);
+          memset (index_ptrs[idx], 0xAA, INDEX_SIZE);
+          idx++;
+        }
+
+      filler_ptrs[i] = malloc (FILLER_SIZE);
+      TEST_VERIFY_EXIT (filler_ptrs[i] != NULL);
+      memset (filler_ptrs[i], 0xBB, FILLER_SIZE);
+    }
+
+  long rss_peak = get_rss ();
+  printf ("RSS after allocation: %ld MB\n", rss_peak / (1024 * 1024));
+
+  /* Phase 2: Free all filler data.  Index data stays alive and
+     pins the heap segments, so the freed space is interior.  */
+  for (int i = 0; i < N_FILLER; i++)
+    free (filler_ptrs[i]);
+
+  long rss_after_free = get_rss ();
+  printf ("RSS after free: %ld MB\n", rss_after_free / (1024 * 1024));
+
+  /* Phase 3: Check that RSS dropped.
+     With madvise_threshold set (via GLIBC_TUNABLES in the test
+     environment), free() calls madvise(MADV_DONTNEED) on the
+     interior of the freed chunks, so RSS should drop.
+
+     Without the tunable, RSS stays near the peak because the freed
+     memory is interior to the heap, not at the top.
+
+     We expect at least 50% of the filler memory to be returned.  */
+  long filler_bytes = (long) N_FILLER * FILLER_SIZE;
+  long recovered = rss_peak - rss_after_free;
+
+  printf ("Filler data: %ld MB\n", filler_bytes / (1024 * 1024));
+  printf ("Recovered by free(): %ld MB\n", recovered / (1024 * 1024));
+
+  /* The threshold is set via the test environment.  If it's working,
+     we should recover at least half the filler memory.  */
+  TEST_VERIFY (recovered > filler_bytes / 2);
+
+  /* Cleanup.  */
+  for (int i = 0; i < N_INDEX; i++)
+    free (index_ptrs[i]);
+
+  return 0;
+}
+
+#include
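
For readers who want to see the effect the test relies on without
building glibc, the following is a small sketch, assuming Linux and a
private anonymous mapping. interior_dontneed_demo is a hypothetical
helper, not part of the patch or the test; it dirties a region,
applies MADV_DONTNEED to the page-aligned interior only, and checks
that the first and last pages keep their data while the interior
reads back as zero-filled pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the interior pages were dropped
   and the boundary pages were preserved, 0 if not, -1 on setup error.
   Relies on Linux semantics for MADV_DONTNEED on private anonymous
   memory (later reads see zero-filled pages).  */
static int
interior_dontneed_demo (size_t npages)
{
  assert (npages >= 3);  /* Need at least one interior page.  */

  size_t ps = (size_t) sysconf (_SC_PAGESIZE);
  size_t len = npages * ps;
  unsigned char *base = mmap (NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return -1;

  /* Touch every page so that all of them are resident.  */
  memset (base, 0xBB, len);

  /* Drop everything except the first and the last page.  */
  if (madvise (base + ps, len - 2 * ps, MADV_DONTNEED) != 0)
    {
      munmap (base, len);
      return -1;
    }

  int ok = base[0] == 0xBB && base[len - 1] == 0xBB
           && base[ps] == 0 && base[len - ps - 1] == 0;
  munmap (base, len);
  return ok;
}
```

Calling interior_dontneed_demo (16) exercises the same "keep the
edges, release the middle" pattern that the patch applies to
consolidated free chunks. MADV_FREE is deliberately not shown here,
since its pages are reclaimed lazily and the outcome is not
observable in a deterministic way.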