From patchwork Fri Dec 15 17:03:15 2023
X-Patchwork-Submitter: James Tirta Halim
X-Patchwork-Id: 82266
From: James Tirta Halim <tirtajames45@gmail.com>
To: goldstein.w.n@gmail.com
Cc: libc-alpha@sourceware.org, skpgkp2@gmail.com, tirtajames45@gmail.com
Subject: [PATCH] sysdeps/memmem-avx2.c: add memmem-avx2.c
Date: Sat, 16 Dec 2023 00:03:15 +0700
Message-ID: <20231215170315.1806024-1-tirtajames45@gmail.com>

Find the rarest byte in NE.  Do a naive loop until HS is aligned.  Once
aligned, find the parts of HS that match the rare byte and the byte
after it, shift back to the position in HS that should match NE, and do
a memcmp.

Average timings (Core i5 8400):

  __memmem_avx2  basic_memmem   twoway_memmem  memmem
    1342.942864   19100.870740    3335.335377  2745.971856

Passes make check.
---
 sysdeps/x86_64/multiarch/memmem-avx2.c | 83 ++++++++++++++++----------
 1 file changed, 50 insertions(+), 33 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memmem-avx2.c b/sysdeps/x86_64/multiarch/memmem-avx2.c
index b0cced73aa..524d0fe45f 100644
--- a/sysdeps/x86_64/multiarch/memmem-avx2.c
+++ b/sysdeps/x86_64/multiarch/memmem-avx2.c
@@ -3,53 +3,70 @@
 #include
 #include
 
+static inline void *
+__find_rarest_byte (const void *ne, size_t n)
+{
+  static const unsigned char rarebyte_table[256] =
+    { 0, 1, 13, 56, 59, 60, 61, 62, 63, 232, 248, 2, 158, 4, 5, 6,
+      7, 8, 9, 10, 14, 20, 26, 29, 37, 46, 52, 53, 54, 55, 57, 58,
+      255, 172, 242, 193, 162, 174, 178, 182, 218, 219, 212, 180, 249, 197, 221, 210,
+      253, 231, 230, 224, 225, 226, 227, 223, 222, 220, 176, 213, 184, 229, 188, 164,
+      159, 209, 181, 203, 189, 216, 196, 192, 185, 205, 161, 168, 215, 187, 211, 194,
+      195, 165, 206, 204, 214, 198, 173, 179, 175, 183, 167, 202, 239, 201, 160, 241,
+      163, 246, 233, 238, 240, 254, 237, 208, 234, 250, 169, 186, 236, 217, 245, 243,
+      228, 170, 247, 244, 251, 235, 199, 200, 252, 207, 177, 191, 171, 190, 166, 3,
+      140, 134, 124, 126, 86, 128, 95, 117, 114, 93, 81, 87, 132, 96, 112, 97,
+      103, 82, 139, 89, 98, 88, 119, 74, 156, 115, 104, 75, 120, 106, 76, 155,
+      90, 122, 107, 125, 152, 145, 136, 137, 101, 116, 102, 108, 99, 141, 77, 78,
+      118, 79, 109, 100, 150, 73, 94, 72, 121, 151, 113, 135, 110, 105, 83, 91,
+      11, 12, 64, 149, 146, 111, 65, 69, 66, 15, 16, 17, 18, 19, 130, 92,
+      144, 123, 21, 22, 23, 24, 131, 133, 127, 142, 25, 70, 129, 27, 28, 67,
+      153, 84, 143, 138, 147, 157, 148, 68, 71, 30, 31, 32, 33, 34, 35, 36,
+      154, 38, 39, 40, 41, 42, 80, 43, 44, 45, 47, 48, 85, 49, 50, 51 };
+  const unsigned char *rare = (const unsigned char *) ne;
+  const unsigned char *p = (const unsigned char *) ne;
+  int c_rare = rarebyte_table[*rare];
+  int c;
+  for (; n--; ++p)
+    {
+      c = rarebyte_table[*p];
+      if (c < c_rare)
+	{
+	  rare = p;
+	  c_rare = c;
+	}
+    }
+  return (void *) rare;
+}
+
 void *
-__memmem_avx2 (const void *hs, size_t hs_len, const void *ne, size_t ne_len)
+__memmem_avx2 (const void *hs,
+	       size_t hs_len,
+	       const void *ne,
+	       size_t ne_len)
 {
   if (ne_len == 1)
     return (void *) memchr (hs, *(unsigned char *) ne, hs_len);
   if (__glibc_unlikely (ne_len == 0))
     return (void *) hs;
-  if (__glibc_unlikely (hs_len == ne_len))
-    return !memcmp (hs, ne, ne_len) ? (void *) hs : NULL;
   if (__glibc_unlikely (hs_len < ne_len))
     return NULL;
-  const __m256i nv = _mm256_set1_epi8 (*(char *) ne);
   const unsigned char *h = (const unsigned char *) hs;
-  const unsigned char *n = (const unsigned char *) ne;
   const unsigned char *const end = h + hs_len - ne_len;
-  const int c1 = *(n + 1);
-  n += 2, ne_len -= 2;
-  __m256i hv;
-  uint32_t i, m;
-  if (!PTR_IS_ALIGNED (h))
-    {
-      hv = _mm256_loadu_si256 ((const __m256i *) h);
-      m = (uint32_t) _mm256_movemask_epi8 (_mm256_cmpeq_epi8 (hv, nv));
-      for (; m; m = _blsr_u32 (m))
-	{
-	  i = _tzcnt_u32 (m);
-	  if (__glibc_unlikely (h + i > end))
-	    return NULL;
-	  if (*(h + i + 1) == c1 && !memcmp (h + i + 2, n, ne_len))
-	    return (char *) h + i;
-	}
-      h += sizeof (__m256i);
-      if (__glibc_unlikely (h > end))
+  size_t shift = PTR_DIFF (__find_rarest_byte (ne, ne_len), ne);
+  if (shift == ne_len - 1)
+    --shift;
+  h += shift;
+  for (; !PTR_IS_ALIGNED (h, sizeof (__m256i)); ++h)
+    {
+      if (__glibc_unlikely (h - shift > end))
 	return NULL;
-      h = (const unsigned char *) PTR_ALIGN_UP (h, sizeof (__m256i));
-    }
-  for (;;)
-    {
+      if (*h == *((unsigned char *) ne + shift)
+	  && !memcmp (h - shift, ne, ne_len))
+	return (void *) (h - shift);
+    }
+  const __m256i nv = _mm256_set1_epi8 (*((char *) ne + shift));
+  const __m256i nv1 = _mm256_set1_epi8 (*((char *) ne + shift + 1));
+  __m256i hv, hv1;
+  uint32_t i, hm0, hm1, m;
+  for (; h - shift <= end; h += sizeof (__m256i))
+    {
       hv = _mm256_load_si256 ((const __m256i *) h);
-      m = (uint32_t) _mm256_movemask_epi8 (_mm256_cmpeq_epi8 (hv, nv));
-      for (; m; m = _blsr_u32 (m))
-	{
+      hv1 = _mm256_loadu_si256 ((const __m256i *) (h + 1));
+      hm0 = (uint32_t) _mm256_movemask_epi8 (_mm256_cmpeq_epi8 (hv, nv));
+      hm1 = (uint32_t) _mm256_movemask_epi8 (_mm256_cmpeq_epi8 (hv1, nv1));
+      m = hm0 & hm1;
+      while (m)
+	{
 	  i = _tzcnt_u32 (m);
-	  if (__glibc_unlikely (h + i > end))
+	  m = _blsr_u32 (m);
+	  if (__glibc_unlikely (h + i - shift > end))
 	    return NULL;
-	  if (*(h + i + 1) == c1 && !memcmp (h + i + 2, n, ne_len))
-	    return (char *) h + i;
-	}
-      h += sizeof (__m256i);
-      if (__glibc_unlikely (h > end))
-	return NULL;
+	  if (!memcmp (h + i - shift, ne, ne_len))
+	    return (char *) h + i - shift;
+	}
     }
   return NULL;
 }