From patchwork Thu Jan 23 13:43:02 2025
X-Patchwork-Submitter: Aleksandar Rakic
X-Patchwork-Id: 105299
From: Aleksandar Rakic
To: libc-alpha@sourceware.org
Cc: aleksandar.rakic@htecgroup.com, djordje.todorovic@htecgroup.com,
 cfu@mips.com, Faraz Shahbazker
Subject: [PATCH 06/11] Fix prefetching beyond copied memory
Date: Thu, 23 Jan 2025 14:43:02 +0100
Message-Id: <20250123134308.1785777-8-aleksandar.rakic@htecgroup.com>
In-Reply-To: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>
References: <20250123134308.1785777-1-aleksandar.rakic@htecgroup.com>

GTM18-287/PP118771: memcpy prefetches beyond copied memory.

Fix prefetching in the core loop to avoid exceeding the operated-upon
memory region.
Revert the accidentally changed prefetch hint back to streaming mode.
Refactor various bits and provide pre-processor checks to allow
parameters to be overridden from the compiler command line.

Cherry-picked 132e0bbbbed01f95ec88b68b5f7f2056f6125531
from https://github.com/MIPS/glibc

Signed-off-by: Faraz Shahbazker
Signed-off-by: Aleksandar Rakic
---
 sysdeps/mips/memcpy.c | 188 +++++++++++++++++++++++++-----------------
 1 file changed, 111 insertions(+), 77 deletions(-)

diff --git a/sysdeps/mips/memcpy.c b/sysdeps/mips/memcpy.c
index 8c3aec7b36..798e991f6d 100644
--- a/sysdeps/mips/memcpy.c
+++ b/sysdeps/mips/memcpy.c
@@ -1,37 +1,29 @@
-/*
- * Copyright (C) 2024 MIPS Tech, LLC
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions are met:
- *
- * 1. Redistributions of source code must retain the above copyright notice,
- *    this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright notice,
- *    this list of conditions and the following disclaimer in the documentation
- *    and/or other materials provided with the distribution.
- * 3. Neither the name of the copyright holder nor the names of its
- *    contributors may be used to endorse or promote products derived from this
- *    software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
- * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
-*/
+/* Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Wave Computing
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
 
 #ifdef __GNUC__
 
 #undef memcpy
 
 /* Typical observed latency in cycles in fetching from DRAM.  */
-#define LATENCY_CYCLES 63
+#ifndef LATENCY_CYCLES
+  #define LATENCY_CYCLES 63
+#endif
 
 /* Pre-fetch performance is subject to accurate prefetch ahead,
    which in turn depends on both the cache-line size and the amount
@@ -48,30 +40,42 @@
     #define LATENCY_CYCLES 150
 #elif defined(_MIPS_TUNE_I6400) || defined(_MIPS_TUNE_I6500)
   #define CACHE_LINE 64
-  #define BLOCK_CYCLES 16
+  #define BLOCK_CYCLES 15
 #elif defined(_MIPS_TUNE_P6600)
   #define CACHE_LINE 32
-  #define BLOCK_CYCLES 12
+  #define BLOCK_CYCLES 15
 #elif defined(_MIPS_TUNE_INTERAPTIV) || defined(_MIPS_TUNE_INTERAPTIV_MR2)
   #define CACHE_LINE 32
   #define BLOCK_CYCLES 30
 #else
-  #define CACHE_LINE 32
-  #define BLOCK_CYCLES 11
+  #ifndef CACHE_LINE
+    #define CACHE_LINE 32
+  #endif
+  #ifndef BLOCK_CYCLES
+    #ifdef __nanomips__
+      #define BLOCK_CYCLES 20
+    #else
+      #define BLOCK_CYCLES 11
+    #endif
+  #endif
 #endif
 
 /* Pre-fetch look ahead = ceil (latency / block-cycles) */
 #define PREF_AHEAD (LATENCY_CYCLES / BLOCK_CYCLES \
                     + ((LATENCY_CYCLES % BLOCK_CYCLES) == 0 ? 0 : 1))
 
-/* Unroll-factor, controls how many words at a time in the core loop. */
-#define BLOCK (CACHE_LINE == 128 ? 16 : 8)
+/* The unroll-factor controls how many words at a time in the core loop.  */
+#ifndef BLOCK_SIZE
  +  #define BLOCK_SIZE (CACHE_LINE == 128 ? 16 : 8)
+#elif BLOCK_SIZE != 8 && BLOCK_SIZE != 16
+  #error "BLOCK_SIZE must be 8 or 16"
+#endif
 
 #define __overloadable
 
 #if !defined(UNALIGNED_INSTR_SUPPORT)
 /* does target have unaligned lw/ld/ualw/uald instructions? */
   #define UNALIGNED_INSTR_SUPPORT 0
-#if (__mips_isa_rev < 6 && !defined(__mips1))
+#if (__mips_isa_rev < 6 && !defined(__mips1)) || defined(__nanomips__)
   #undef UNALIGNED_INSTR_SUPPORT
   #define UNALIGNED_INSTR_SUPPORT 1
 #endif
@@ -79,17 +83,35 @@
 #if !defined(HW_UNALIGNED_SUPPORT)
 /* Does target have hardware support for unaligned accesses?  */
   #define HW_UNALIGNED_SUPPORT 0
-  #if __mips_isa_rev >= 6
+  #if __mips_isa_rev >= 6 && !defined(__nanomips__)
     #undef HW_UNALIGNED_SUPPORT
     #define HW_UNALIGNED_SUPPORT 1
   #endif
 #endif
 
-#define ENABLE_PREFETCH 1
+
+#ifndef ENABLE_PREFETCH
+  #define ENABLE_PREFETCH 1
+#endif
+
+#ifndef ENABLE_PREFETCH_CHECK
+  #define ENABLE_PREFETCH_CHECK 0
+#endif
+
 #if ENABLE_PREFETCH
-  #define PREFETCH(addr) __builtin_prefetch (addr, 0, 0)
-#else
+  #if ENABLE_PREFETCH_CHECK
+#include <assert.h>
+static char *limit;
+#define PREFETCH(addr)                      \
+  do {                                      \
+    assert ((char *)(addr) < limit);        \
+    __builtin_prefetch ((addr), 0, 1);      \
+  } while (0)
+#else /* ENABLE_PREFETCH_CHECK */
+  #define PREFETCH(addr) __builtin_prefetch (addr, 0, 1)
+  #endif /* ENABLE_PREFETCH_CHECK */
+#else /* ENABLE_PREFETCH */
   #define PREFETCH(addr)
-#endif
+#endif /* ENABLE_PREFETCH */
 
 #include <string.h>
@@ -99,17 +121,18 @@ typedef struct
 {
   reg_t B0:8, B1:8, B2:8, B3:8, B4:8, B5:8, B6:8, B7:8;
 } bits_t;
-#else
+#else /* __mips64 */
 typedef unsigned long reg_t;
 typedef struct
 {
   reg_t B0:8, B1:8, B2:8, B3:8;
 } bits_t;
-#endif
+#endif /* __mips64 */
 
-#define CACHE_LINES_PER_BLOCK ((BLOCK * sizeof (reg_t) > CACHE_LINE) ? \
-                               (BLOCK * sizeof (reg_t) / CACHE_LINE)   \
-                               : 1)
+#define CACHE_LINES_PER_BLOCK                       \
+  ((BLOCK_SIZE * sizeof (reg_t) > CACHE_LINE)       \
+   ? (BLOCK_SIZE * sizeof (reg_t) / CACHE_LINE)     \
+   : 1)
 
 typedef union
 {
@@ -120,7 +143,7 @@ typedef union
 #define DO_BYTE(a, i)   \
   a[i] = bw.b.B##i;     \
   len--;                \
-  if(!len) return ret;  \
+  if (!len) return ret; \
 
 /* This code is called when aligning a pointer, there are remaining bytes
    after doing word compares, or architecture does not have some form
@@ -148,7 +171,7 @@ do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret)
 {
   unsigned char *x = (unsigned char *) a;
   bitfields_t bw;
-  if(len > 0)
+  if (len > 0)
     {
       bw.v = *(reg_t *)b;
       DO_BYTE(x, 0);
@@ -159,7 +182,7 @@ do_bytes_remaining (void *a, const void *b, unsigned long len, void *ret)
       DO_BYTE(x, 4);
       DO_BYTE(x, 5);
       DO_BYTE(x, 6);
-#endif
+#endif /* __mips64 */
     }
   return ret;
 }
@@ -170,7 +193,7 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
 {
   /* Use a set-back so that load/stores have incremented addresses in
      order to promote bonding.  */
-  int off = (BLOCK - words);
+  int off = (BLOCK_SIZE - words);
   a -= off;
   b -= off;
   switch (off)
@@ -182,7 +205,7 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
     case 5: a[5] = b[5];    // Fall through
     case 6: a[6] = b[6];    // Fall through
     case 7: a[7] = b[7];    // Fall through
-#if BLOCK==16
+#if BLOCK_SIZE==16
     case 8: a[8] = b[8];    // Fall through
     case 9: a[9] = b[9];    // Fall through
     case 10: a[10] = b[10];  // Fall through
@@ -191,9 +214,9 @@ do_words_remaining (reg_t *a, const reg_t *b, unsigned long words,
     case 13: a[13] = b[13];  // Fall through
     case 14: a[14] = b[14];  // Fall through
     case 15: a[15] = b[15];
-#endif
+#endif /* BLOCK_SIZE==16 */
   }
-  return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret);
+  return do_bytes_remaining (a + BLOCK_SIZE, b + BLOCK_SIZE, bytes, ret);
 }
 
 #if !HW_UNALIGNED_SUPPORT
@@ -210,7 +233,7 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
 {
   /* Use a set-back so that load/stores have incremented addresses in
      order to promote bonding.  */
-  int off = (BLOCK - words);
+  int off = (BLOCK_SIZE - words);
   a -= off;
   b -= off;
   switch (off)
@@ -222,7 +245,7 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
     case 5: a[5].uli = b[5];  // Fall through
     case 6: a[6].uli = b[6];  // Fall through
     case 7: a[7].uli = b[7];  // Fall through
-#if BLOCK==16
+#if BLOCK_SIZE==16
     case 8: a[8].uli = b[8];  // Fall through
     case 9: a[9].uli = b[9];  // Fall through
     case 10: a[10].uli = b[10];  // Fall through
@@ -231,9 +254,9 @@ do_uwords_remaining (struct ulw *a, const reg_t *b, unsigned long words,
     case 13: a[13].uli = b[13];  // Fall through
     case 14: a[14].uli = b[14];  // Fall through
     case 15: a[15].uli = b[15];
-#endif
+#endif /* BLOCK_SIZE==16 */
   }
-  return do_bytes_remaining (a + BLOCK, b + BLOCK, bytes, ret);
+  return do_bytes_remaining (a + BLOCK_SIZE, b + BLOCK_SIZE, bytes, ret);
 }
 
 /* The first pointer is not aligned while second pointer is.  */
@@ -242,13 +265,19 @@ unaligned_words (struct ulw *a, const reg_t * b,
                  unsigned long words, unsigned long bytes, void *ret)
 {
   unsigned long i, words_by_block, words_by_1;
-  words_by_1 = words % BLOCK;
-  words_by_block = words / BLOCK;
+  words_by_1 = words % BLOCK_SIZE;
+  words_by_block = words / BLOCK_SIZE;
+
   for (; words_by_block > 0; words_by_block--)
     {
-      if (words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK)
+      /* This condition is deliberately conservative. One could theoretically
+         pre-fetch another time around in some cases without crossing the page
+         boundary at the limit, but checking for the right conditions here is
+         too expensive to be worth it.  */
+      if (words_by_block > PREF_AHEAD)
        for (i = 0; i < CACHE_LINES_PER_BLOCK; i++)
-         PREFETCH (b + (BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i));
+         PREFETCH (b + ((BLOCK_SIZE / CACHE_LINES_PER_BLOCK)
+                        * (PREF_AHEAD + i)));
 
       reg_t y0 = b[0], y1 = b[1], y2 = b[2], y3 = b[3];
       reg_t y4 = b[4], y5 = b[5], y6 = b[6], y7 = b[7];
@@ -260,7 +289,7 @@ unaligned_words (struct ulw *a, const reg_t * b,
       a[5].uli = y5;
       a[6].uli = y6;
       a[7].uli = y7;
-#if BLOCK==16
+#if BLOCK_SIZE==16
       y0 = b[8], y1 = b[9], y2 = b[10], y3 = b[11];
       y4 = b[12], y5 = b[13], y6 = b[14], y7 = b[15];
       a[8].uli = y0;
@@ -271,16 +300,16 @@ unaligned_words (struct ulw *a, const reg_t * b,
       a[13].uli = y5;
       a[14].uli = y6;
       a[15].uli = y7;
-#endif
-      a += BLOCK;
-      b += BLOCK;
+#endif /* BLOCK_SIZE==16 */
+      a += BLOCK_SIZE;
+      b += BLOCK_SIZE;
     }
 
   /* Mop up any remaining bytes.  */
   return do_uwords_remaining (a, b, words_by_1, bytes, ret);
 }
 
-#else
+#else /* !UNALIGNED_INSTR_SUPPORT */
 /* No HW support or unaligned lw/ld/ualw/uald instructions.  */
 static void *
@@ -320,13 +349,15 @@ aligned_words (reg_t * a, const reg_t * b,
                unsigned long words, unsigned long bytes, void *ret)
 {
   unsigned long i, words_by_block, words_by_1;
-  words_by_1 = words % BLOCK;
-  words_by_block = words / BLOCK;
+  words_by_1 = words % BLOCK_SIZE;
+  words_by_block = words / BLOCK_SIZE;
+
   for (; words_by_block > 0; words_by_block--)
     {
-      if(words_by_block >= PREF_AHEAD - CACHE_LINES_PER_BLOCK)
+      if (words_by_block > PREF_AHEAD)
        for (i = 0; i < CACHE_LINES_PER_BLOCK; i++)
-         PREFETCH (b + ((BLOCK / CACHE_LINES_PER_BLOCK) * (PREF_AHEAD + i)));
+         PREFETCH (b + ((BLOCK_SIZE / CACHE_LINES_PER_BLOCK)
+                        * (PREF_AHEAD + i)));
 
       reg_t x0 = b[0], x1 = b[1], x2 = b[2], x3 = b[3];
       reg_t x4 = b[4], x5 = b[5], x6 = b[6], x7 = b[7];
@@ -338,7 +369,7 @@ aligned_words (reg_t * a, const reg_t * b,
       a[5] = x5;
       a[6] = x6;
       a[7] = x7;
-#if BLOCK==16
+#if BLOCK_SIZE==16
       x0 = b[8], x1 = b[9], x2 = b[10], x3 = b[11];
       x4 = b[12], x5 = b[13], x6 = b[14], x7 = b[15];
       a[8] = x0;
@@ -349,9 +380,9 @@ aligned_words (reg_t * a, const reg_t * b,
       a[13] = x5;
       a[14] = x6;
       a[15] = x7;
-#endif
-      a += BLOCK;
-      b += BLOCK;
+#endif /* BLOCK_SIZE==16 */
+      a += BLOCK_SIZE;
+      b += BLOCK_SIZE;
     }
 
   /* mop up any remaining bytes.  */
@@ -363,13 +394,16 @@ memcpy (void *a, const void *b, size_t len) __overloadable
 {
   unsigned long bytes, words, i;
   void *ret = a;
+#if ENABLE_PREFETCH_CHECK
+  limit = (char *)b + len;
+#endif /* ENABLE_PREFETCH_CHECK */
 
   /* shouldn't hit that often.  */
   if (len <= 8)
     return do_bytes (a, b, len, a);
 
   /* Start pre-fetches ahead of time.  */
-  if (len > CACHE_LINE * (PREF_AHEAD - 1))
-    for (i = 1; i < PREF_AHEAD - 1; i++)
+  if (len > CACHE_LINE * PREF_AHEAD)
+    for (i = 1; i < PREF_AHEAD; i++)
       PREFETCH ((char *)b + CACHE_LINE * i);
   else
     for (i = 1; i < len / CACHE_LINE; i++)
@@ -400,12 +434,12 @@ memcpy (void *a, const void *b, size_t len) __overloadable
 #if HW_UNALIGNED_SUPPORT
   /* treat possible unaligned first pointer as aligned.  */
   return aligned_words (a, b, words, bytes, ret);
-#else
+#else /* !HW_UNALIGNED_SUPPORT */
   if (((unsigned long) a) % sizeof (reg_t) == 0)
     return aligned_words (a, b, words, bytes, ret);
   /* need to use unaligned instructions on first pointer.  */
   return unaligned_words (a, b, words, bytes, ret);
-#endif
+#endif /* HW_UNALIGNED_SUPPORT */
 }
 libc_hidden_builtin_def (memcpy)