From patchwork Thu Jun 5 17:58:30 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Maciej W. Rozycki" X-Patchwork-Id: 113747 Return-Path: X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B73303856DEF for ; Thu, 5 Jun 2025 17:59:52 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B73303856DEF X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from angie.orcam.me.uk (angie.orcam.me.uk [IPv6:2001:4190:8020::34]) by sourceware.org (Postfix) with ESMTP id ED5543856951 for ; Thu, 5 Jun 2025 17:58:30 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org ED5543856951 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=orcam.me.uk Authentication-Results: sourceware.org; spf=none smtp.mailfrom=orcam.me.uk ARC-Filter: OpenARC Filter v1.0.0 sourceware.org ED5543856951 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:4190:8020::34 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1749146311; cv=none; b=BmFT+fRZ44HGCavqLQxOnab/sTPUpcN5CCHOX126JhcLuJJgMlKVzWCZ2IoLjMRwESzjim7t847EGepkG0pQRMb5kcg1jGtdWgc9CbeBi92A26W816qbUXapbclcCSn6UgcxqLcSWf242jW5Ic1kMIG1v0FXCiivgHLr+zobm9k= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1749146311; c=relaxed/simple; bh=mpMctS0no5LJOWEQIKCYcNUOJ6YE3+gwctI/9IE05Ew=; h=Date:From:To:Subject:Message-ID:MIME-Version; b=cOx1Jqi9XuFZo1cwIIM/f3KY78ns5xeu3V5itOLTLNaT588rRZ92KaMsiT+sDrUQWxdFvzHhJQQBXX/7HW6R0mA/+YMcRS83jDei1PW5l9JqsgZzm9j4zsFJ/0ezbNOXaKAda6kH6IwoOyOx+GPyEj9lI5WE52gIhNjyqhvvx1c= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by angie.orcam.me.uk (Postfix, from userid 500) id 6B9CA92009E; Thu, 5 Jun 2025 19:58:30 +0200 (CEST) Received: from localhost (localhost [127.0.0.1]) by angie.orcam.me.uk (Postfix) with ESMTP id 6831392009C; Thu, 5 Jun 2025 18:58:30 +0100 (BST) Date: Thu, 5 Jun 2025 18:58:30 +0100 (BST) From: "Maciej W. Rozycki" To: libc-alpha@sourceware.org cc: "Maciej W. Rozycki" Subject: [PATCH v2 01/14] stdio-common: Don't read real input beyond the field width in scanf [BZ #13988] In-Reply-To: Message-ID: References: User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 X-Spam-Status: No, score=-3486.9 required=5.0 tests=BAYES_00, KAM_ASCII_DIVIDERS, KAM_DMARC_STATUS, KAM_INFOUSMEBIZ, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: "Maciej W. Rozycki" Errors-To: libc-alpha-bounces~patchwork=sourceware.org@sourceware.org From: Maciej W. Rozycki Fix a code pattern that repeats across '__vfscanf_internal' where the remaining field width of 0 is incorrectly interpreted as no width limit, which in turn results in reading input beyond the limit requested. The lack of width limit is indicated by the field width of -1 rather than 0, set earlier on in the function. The problematic code pattern is used for both integer and floating-point conversions, but in the former case a corresponding conditional earlier on prevents the field width from being 0 when executing the pattern. It does trigger in the latter case, where the decimal point is a multibyte character or for multibyte digit characters. Fix the code pattern by using 'width > 0' comparison, and apply the fix throughout even to code handling integer conversions so as to interpret the field width consistently and avoid people's confusion even if width cannot be 0 at those places. For multibyte digit characters there is an additional issue that causes code to push back a partially fetched multibyte character multiple times as execution proceeds through matching data retrieved against individual digits that have to be rejected due to the field width limit preventing the rest of the multibyte character from being retrieved. It is because code relies on 'ungetc' ignoring a request to push back EOF, however in the out-of-limit field width condition the data held is not EOF but the previously retrieved character byte instead. Fix this issue by artificially assigning EOF to the character byte storage variable where the out-of-limit field width condition prevents further processing, and also apply the fix throughout except for the decimal point/thousands separator case, which uses different code. Add test cases accordingly. Referring BZ #13988 as a class bug rather than the specific issue. Reviewed-by: Adhemerval Zanella --- Changes from v1: - Avoid adding an assignment in a condition expression by rewriting the expression in terms of an equivalent macro. --- localedata/Makefile | 2 + localedata/tst-scanf-width-digit.c | 58 +++++++++++++++++++++++++++++++++++++ localedata/tst-scanf-width-point.c | 50 +++++++++++++++++++++++++++++++ stdio-common/vfscanf-internal.c | 31 ++++++++++++------- 4 files changed, 130 insertions(+), 11 deletions(-) glibc-scanf-bz13988-number-width.diff Index: glibc/localedata/Makefile =================================================================== --- glibc.orig/localedata/Makefile +++ glibc/localedata/Makefile @@ -249,6 +249,8 @@ tests = \ tst-mbswcs4 \ tst-mbswcs5 \ tst-mbswcs6 \ + tst-scanf-width-digit \ + tst-scanf-width-point \ tst-setlocale \ tst-setlocale2 \ tst-setlocale3 \ Index: glibc/localedata/tst-scanf-width-digit.c =================================================================== --- /dev/null +++ glibc/localedata/tst-scanf-width-digit.c @@ -0,0 +1,58 @@ +/* Verify multibyte digit extending beyond scanf field width. + Copyright (C) 2025 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include + +#include +#include + +#define P1 "\xdb\xb1" +#define P2 "\xdb\xb2" + +static int +do_test (void) +{ + if (setlocale (LC_ALL, "fa_IR.UTF-8") == NULL) + FAIL_EXIT1 ("setlocale (LC_ALL, \"fa_IR.UTF-8\")"); + + char s[] = P1 P2; + FILE *f = fmemopen (s, strlen (s), "r"); + + /* Avoid: "warning: 'I' flag used with '%f' gnu_scanf format [-Wformat=]"; + cf. GCC PR c/119514. */ + DIAG_PUSH_NEEDS_COMMENT; + DIAG_IGNORE_NEEDS_COMMENT (4.9, "-Wformat"); + + /* This should succeed parsing a floating-point number, and leave '\xdb', + '\xb2' in the input. */ + double d; + int c; + TEST_VERIFY_EXIT (fscanf (f, "%I3lf%n", &d, &c) == 1); + TEST_VERIFY_EXIT (d == 1.0); + TEST_VERIFY_EXIT (c == 2); + TEST_VERIFY_EXIT (fgetc (f) == 0xdb); + TEST_VERIFY_EXIT (fgetc (f) == 0xb2); + + DIAG_POP_NEEDS_COMMENT; + + return 0; +} + +#include Index: glibc/localedata/tst-scanf-width-point.c =================================================================== --- /dev/null +++ glibc/localedata/tst-scanf-width-point.c @@ -0,0 +1,50 @@ +/* Verify multibyte decimal point extending beyond scanf field width. + Copyright (C) 2025 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include + +#include +#include + +#define PD "\xd9\xab" + +static int +do_test (void) +{ + if (setlocale (LC_ALL, "ps_AF.UTF-8") == NULL) + FAIL_EXIT1 ("setlocale (LC_ALL, \"ps_AF.UTF-8\")"); + + char s[] = "1" PD; + FILE *f = fmemopen (s, strlen (s), "r"); + + /* This should succeed parsing a floating-point number, and leave '\xd9', + '\xab' in the input. */ + double d; + int c; + TEST_VERIFY_EXIT (fscanf (f, "%2lf%n", &d, &c) == 1); + TEST_VERIFY_EXIT (d == 1.0); + TEST_VERIFY_EXIT (c == 1); + TEST_VERIFY_EXIT (fgetc (f) == 0xd9); + TEST_VERIFY_EXIT (fgetc (f) == 0xab); + + return 0; +} + +#include Index: glibc/stdio-common/vfscanf-internal.c =================================================================== --- glibc.orig/stdio-common/vfscanf-internal.c +++ glibc/stdio-common/vfscanf-internal.c @@ -119,6 +119,15 @@ (void) (c != EOF \ ? ++read_in \ : (size_t) (inchar_errno = errno)), c)) +/* Same as INCHAR, but stop upon field exhaustion according to AVAIL. */ +# define inchar_in_field(avail) \ +({ \ + if (avail == 0) \ + c = EOF; \ + else \ + inchar (); \ + c; \ +}) # define ISSPACE(Ch) __isspace_l (Ch, loc) # define ISDIGIT(Ch) __isdigit_l (Ch, loc) # define ISXDIGIT(Ch) __isxdigit_l (Ch, loc) @@ -1639,7 +1648,7 @@ __vfscanf_internal (FILE *s, const char ++wcdigits[n]; #else const char *cmpp; - int avail = width > 0 ? width : INT_MAX; + int avail = width >= 0 ? width : INT_MAX; if (__glibc_unlikely (map != NULL)) mbdigits[n] = digits_extended[n]; @@ -1657,7 +1666,7 @@ __vfscanf_internal (FILE *s, const char break; else { - if (avail == 0 || inchar () == EOF) + if (inchar_in_field (avail) == EOF) break; --avail; } @@ -1701,7 +1710,7 @@ __vfscanf_internal (FILE *s, const char ++wcdigits[n]; #else const char *cmpp; - int avail = width > 0 ? width : INT_MAX; + int avail = width >= 0 ? width : INT_MAX; cmpp = mbdigits[n]; while ((unsigned char) *cmpp == c && avail >= 0) @@ -1710,7 +1719,7 @@ __vfscanf_internal (FILE *s, const char break; else { - if (avail == 0 || inchar () == EOF) + if (inchar_in_field (avail) == EOF) break; --avail; } @@ -1757,7 +1766,7 @@ __vfscanf_internal (FILE *s, const char break; #else const char *cmpp = thousands; - int avail = width > 0 ? width : INT_MAX; + int avail = width >= 0 ? width : INT_MAX; while ((unsigned char) *cmpp == c && avail >= 0) { @@ -1766,7 +1775,7 @@ __vfscanf_internal (FILE *s, const char break; else { - if (avail == 0 || inchar () == EOF) + if (inchar_in_field (avail) == EOF) break; --avail; } @@ -1837,7 +1846,7 @@ __vfscanf_internal (FILE *s, const char break; #else const char *cmpp = thousands; - int avail = width > 0 ? width : INT_MAX; + int avail = width >= 0 ? width : INT_MAX; while ((unsigned char) *cmpp == c && avail >= 0) { @@ -1846,7 +1855,7 @@ __vfscanf_internal (FILE *s, const char break; else { - if (avail == 0 || inchar () == EOF) + if (inchar_in_field (avail) == EOF) break; --avail; } @@ -2225,7 +2234,7 @@ __vfscanf_internal (FILE *s, const char } #else const char *cmpp = decimal; - int avail = width > 0 ? width : INT_MAX; + int avail = width >= 0 ? width : INT_MAX; if (! got_dot) { @@ -2463,14 +2472,14 @@ __vfscanf_internal (FILE *s, const char } #else const char *cmpp = mbdigits[n]; - int avail = width > 0 ? width : INT_MAX; + int avail = width >= 0 ? width : INT_MAX; while ((unsigned char) *cmpp == c && avail >= 0) if (*++cmpp == '\0') break; else { - if (avail == 0 || inchar () == EOF) + if (inchar_in_field (avail) == EOF) break; --avail; }