From patchwork Mon Jul 17 20:10:43 2023
X-Patchwork-Submitter: Noah Goldstein
X-Patchwork-Id: 72810
To: libc-alpha@sourceware.org
Cc: goldstein.w.n@gmail.com, hjl.tools@gmail.com, carlos@systemhalted.org
Subject: [PATCH v2] x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold.
Date: Mon, 17 Jul 2023 15:10:43 -0500
Message-Id: <20230717201043.105528-1-goldstein.w.n@gmail.com>
In-Reply-To: <20230714151459.3357038-1-goldstein.w.n@gmail.com>
References: <20230714151459.3357038-1-goldstein.w.n@gmail.com>
From: Noah Goldstein

On some machines
we end up with incomplete cache information. This can make the new
calculation of `sizeof(total-L3)/custom-divisor` end up lower than
intended (and lower than the prior value). So reintroduce the old bound
as a lower bound to avoid potentially regressing code where we don't
have complete information to make the decision.
---
 sysdeps/x86/dl-cacheinfo.h | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index c98fa57a7b..cd4d0351ae 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -614,8 +614,8 @@ get_common_cache_info (long int *shared_ptr, long int * shared_per_thread_ptr, u
   /* Account for non-inclusive L2 and L3 caches.  */
   if (!inclusive_cache)
     {
-      if (threads_l2 > 0)
-	shared_per_thread += core / threads_l2;
+      long int core_per_thread = threads_l2 > 0 ? (core / threads_l2) : core;
+      shared_per_thread += core_per_thread;
       shared += core;
     }

@@ -745,8 +745,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)

   /* The default setting for the non_temporal threshold is [1/8, 1/2] of size
      of the chip's cache (depending on `cachesize_non_temporal_divisor` which
-     is microarch specific.  The default is 1/4).  For most Intel and AMD
-     processors with an initial release date between 2017 and 2023, a thread's
+     is microarch specific.  The default is 1/4).  For most Intel processors
+     with an initial release date between 2017 and 2023, a thread's
      typical share of the cache is from 18-64MB.  Using a reasonable size
      fraction of L3 is meant to estimate the point where non-temporal stores
      begin out-competing REP MOVSB.  As well the point where the fact that
@@ -757,12 +757,21 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      the maximum thrashing capped at 1/associativity.  */
   unsigned long int non_temporal_threshold
       = shared / cachesize_non_temporal_divisor;
+
+  /* If the computed non_temporal_threshold <= 3/4 * per-thread L3, we most
+     likely have incorrect/incomplete cache info in which case, default to
+     3/4 * per-thread L3 to avoid regressions.  */
+  unsigned long int non_temporal_threshold_lowbound
+      = shared_per_thread * 3 / 4;
+  if (non_temporal_threshold < non_temporal_threshold_lowbound)
+    non_temporal_threshold = non_temporal_threshold_lowbound;
+
   /* If no ERMS, we use the per-thread L3 chunking.  Normal cacheable stores
      run a higher risk of actually thrashing the cache as they don't have a HW
      LRU hint.  As well, their performance in highly parallel situations is
      noticeably worse.  */
   if (!CPU_FEATURE_USABLE_P (cpu_features, ERMS))
-    non_temporal_threshold = shared_per_thread * 3 / 4;
+    non_temporal_threshold = non_temporal_threshold_lowbound;
   /* SIZE_MAX >> 4 because memmove-vec-unaligned-erms right-shifts the value
      of 'x86_non_temporal_threshold' by `LOG_4X_MEMCPY_THRESH` (4) and it is
      best if that operation cannot overflow.  Minimum of 0x4040 (16448) because the