From patchwork Sun Feb 1 19:35:09 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Fabian Rast
X-Patchwork-Id: 129378
X-Original-To: libc-alpha@sourceware.org
Date: Sun, 01 Feb 2026 20:35:09 +0100
From: "Fabian Rast"
Subject:
 [PATCH] rtld: cache cpuid results on the stack for intel
List-Id: Libc-alpha mailing list

Previously, the same cpuid leaves were queried multiple times.
Do the query once, and cache its result on the stack.

Signed-off-by: Fabian Rast
---
 sysdeps/x86/dl-cacheinfo.h | 112 +++++++++++++++++++++----------------
 1 file changed, 65 insertions(+), 47 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index b6520bddaa..32f6ef8007 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -108,6 +108,11 @@ static const struct intel_02_cache_info
 
 #define nintel_02_known (sizeof (intel_02_known) / sizeof (intel_02_known [0]))
 
+struct intel_cpuid_cache {
+  char leaf2_valid, leaf4_valid;
+  unsigned int leaf2[4], leaf4[0x10][4];
+};
+
 static int
 intel_02_known_compare (const void *p1, const void *p2)
 {
@@ -128,7 +133,8 @@ static long int
 __attribute__ ((noinline))
 intel_check_word (int name, unsigned int value, bool *has_level_2,
                   bool *no_level_2_or_3,
-                  const struct cpu_features *cpu_features)
+                  const struct cpu_features *cpu_features,
+                  struct intel_cpuid_cache *cache)
 {
   if ((value & 0x80000000) != 0)
     /* The register value is reserved.  */
@@ -162,7 +168,19 @@ intel_check_word (int name, unsigned int value, bool *has_level_2,
           unsigned int round = 0;
           while (1)
             {
-              __cpuid_count (4, round, eax, ebx, ecx, edx);
+              if (round < cache->leaf4_valid)
+                eax = cache->leaf4[round][0], ebx = cache->leaf4[round][1],
+                ecx = cache->leaf4[round][2], edx = cache->leaf4[round][3];
+              else if (round == cache->leaf4_valid
+                       && round < sizeof(cache->leaf4)/sizeof(*cache->leaf4))
+                {
+                  __cpuid_count (4, round, eax, ebx, ecx, edx);
+                  cache->leaf4[round][0] = eax, cache->leaf4[round][1] = ebx;
+                  cache->leaf4[round][2] = ecx, cache->leaf4[round][3] = edx;
+                  cache->leaf4_valid++;
+                }
+              else
+                __cpuid_count (4, round, eax, ebx, ecx, edx);
 
               enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
               if (type == null)
@@ -258,7 +276,8 @@ intel_check_word (int name, unsigned int value, bool *has_level_2,
 
 static long int
 __attribute__ ((noinline))
-handle_intel (int name, const struct cpu_features *cpu_features)
+handle_intel (int name, const struct cpu_features *cpu_features,
+              struct intel_cpuid_cache *cache)
 {
   unsigned int maxidx = cpu_features->basic.max_cpuid;
 
@@ -271,41 +290,33 @@ handle_intel (int name, const struct cpu_features *cpu_features)
   long int result = 0;
   bool no_level_2_or_3 = false;
   bool has_level_2 = false;
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-  __cpuid (2, eax, ebx, ecx, edx);
+  int i;
+
+  if (!cache->leaf2_valid)
+    {
+      __cpuid (2, cache->leaf2[0], cache->leaf2[1],
+               cache->leaf2[2], cache->leaf2[3]);
+      cache->leaf2_valid = 1;
+    }
 
   /* The low byte of EAX of CPUID leaf 2 should always return 1 and it
      should be ignored.  If it isn't 1, use CPUID leaf 4 instead.  */
-  if ((eax & 0xff) != 1)
+  if ((cache->leaf2[0] & 0xff) != 1)
     return intel_check_word (name, 0xff, &has_level_2, &no_level_2_or_3,
-                             cpu_features);
-  else
-    {
-      eax &= 0xffffff00;
+                             cpu_features, cache);
 
-      /* Process the individual registers' value.  */
-      result = intel_check_word (name, eax, &has_level_2,
-                                 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-        return result;
-
-      result = intel_check_word (name, ebx, &has_level_2,
-                                 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-        return result;
+  /* Process all descriptors in leaf 2.  */
+  result = intel_check_word (name, cache->leaf2[0]>>8, &has_level_2,
+                             &no_level_2_or_3, cpu_features, cache);
+  if (result != 0)
+    return result;
 
-      result = intel_check_word (name, ecx, &has_level_2,
-                                 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-        return result;
-
-      result = intel_check_word (name, edx, &has_level_2,
-                                 &no_level_2_or_3, cpu_features);
+  for (i = 1; i < 4; i++)
+    {
+      result = intel_check_word (name, cache->leaf2[i], &has_level_2,
+                                 &no_level_2_or_3, cpu_features, cache);
       if (result != 0)
-        return result;
+        return result;
     }
 
   if (name >= _SC_LEVEL2_CACHE_SIZE && name <= _SC_LEVEL3_CACHE_LINESIZE
@@ -622,7 +633,7 @@ handle_hygon (int name)
 
 static void
 get_common_cache_info (long int *shared_ptr, long int * shared_per_thread_ptr, unsigned int *threads_ptr,
-                       long int core)
+                       long int core, struct intel_cpuid_cache *cache)
 {
   unsigned int eax;
   unsigned int ebx;
@@ -680,7 +691,11 @@ get_common_cache_info (long int *shared_ptr, long int * shared_per_thread_ptr, u
           int check = 0x1 | (threads_l3 == 0) << 1;
           do
             {
-              __cpuid_count (4, i++, eax, ebx, ecx, edx);
+              if (cache && i < cache->leaf4_valid)
+                eax = cache->leaf4[i][0], ebx = cache->leaf4[i][1],
+                ecx = cache->leaf4[i][2], edx = cache->leaf4[i++][3];
+              else
+                __cpuid_count (4, i++, eax, ebx, ecx, edx);
 
               /* There seems to be a bug in at least some Pentium Ds
                  which sometimes fail to iterate all cache parameters.
@@ -860,35 +875,38 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
 
   if (cpu_features->basic.kind == arch_kind_intel)
     {
-      data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-      shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
+      struct intel_cpuid_cache cache;
+      cache.leaf2_valid = cache.leaf4_valid = 0;
+
+      data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features, &cache);
+      shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features, &cache);
       shared_per_thread = shared;
 
       level1_icache_size
-        = handle_intel (_SC_LEVEL1_ICACHE_SIZE, cpu_features);
+        = handle_intel (_SC_LEVEL1_ICACHE_SIZE, cpu_features, &cache);
       level1_icache_linesize
-        = handle_intel (_SC_LEVEL1_ICACHE_LINESIZE, cpu_features);
+        = handle_intel (_SC_LEVEL1_ICACHE_LINESIZE, cpu_features, &cache);
       level1_dcache_size = data;
       level1_dcache_assoc
-        = handle_intel (_SC_LEVEL1_DCACHE_ASSOC, cpu_features);
+        = handle_intel (_SC_LEVEL1_DCACHE_ASSOC, cpu_features, &cache);
       level1_dcache_linesize
-        = handle_intel (_SC_LEVEL1_DCACHE_LINESIZE, cpu_features);
+        = handle_intel (_SC_LEVEL1_DCACHE_LINESIZE, cpu_features, &cache);
       level2_cache_size
-        = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
+        = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features, &cache);
       level2_cache_assoc
-        = handle_intel (_SC_LEVEL2_CACHE_ASSOC, cpu_features);
+        = handle_intel (_SC_LEVEL2_CACHE_ASSOC, cpu_features, &cache);
       level2_cache_linesize
-        = handle_intel (_SC_LEVEL2_CACHE_LINESIZE, cpu_features);
+        = handle_intel (_SC_LEVEL2_CACHE_LINESIZE, cpu_features, &cache);
       level3_cache_size = shared;
       level3_cache_assoc
-        = handle_intel (_SC_LEVEL3_CACHE_ASSOC, cpu_features);
+        = handle_intel (_SC_LEVEL3_CACHE_ASSOC, cpu_features, &cache);
       level3_cache_linesize
-        = handle_intel (_SC_LEVEL3_CACHE_LINESIZE, cpu_features);
+        = handle_intel (_SC_LEVEL3_CACHE_LINESIZE, cpu_features, &cache);
       level4_cache_size
-        = handle_intel (_SC_LEVEL4_CACHE_SIZE, cpu_features);
+        = handle_intel (_SC_LEVEL4_CACHE_SIZE, cpu_features, &cache);
 
       get_common_cache_info (&shared, &shared_per_thread, &threads,
-                             level2_cache_size);
+                             level2_cache_size, &cache);
     }
   else if (cpu_features->basic.kind == arch_kind_zhaoxin)
     {
@@ -909,7 +927,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level3_cache_linesize = handle_zhaoxin (_SC_LEVEL3_CACHE_LINESIZE);
 
       get_common_cache_info (&shared, &shared_per_thread, &threads,
-                             level2_cache_size);
+                             level2_cache_size, NULL);
     }
   else if (cpu_features->basic.kind == arch_kind_amd)
     {