glibc.git/localedata/Makefile, branch master

stdio-common: Reject insufficient character data in scanf [BZ #12701]

2025-08-23T00:02:46+00:00

Reject invalid formatted scanf character data with the 'c' conversion
where there is not enough input available to satisfy the field width
requested.  It is required by ISO C that this conversion matches a
sequence of characters of exactly the number specified by the field
width and it is also already documented as such in our own manual:

"It reads precisely the next N characters, and fails if it cannot get
that many."

Currently, when EOF is encountered before the required number of
characters has been retrieved, the conversion incorrectly reports a
matching success and the characters actually obtained are stored in the
buffer provided.

Add test cases accordingly and remove placeholders from 'c' conversion
input data for the existing scanf tests.
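The required behaviour can be modelled with a minimal sketch. 'match_exact_chars' is a hypothetical helper, not glibc code: it applies the "%Nc" rule to a NUL-terminated string, treating the terminating NUL as EOF, and it succeeds only if exactly WIDTH characters are available.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the ISO C rule for a "%Nc" conversion applied to
   a NUL-terminated string, where the terminating NUL plays the role of
   EOF.  Returns 1 (one item assigned) if exactly WIDTH characters are
   available, or 0 (matching failure) otherwise.  Not glibc's code.  */
static int
match_exact_chars (const char *input, size_t width, char *buf)
{
  assert (width > 0);
  if (strlen (input) < width)
    return 0;                   /* EOF before WIDTH characters: reject.  */
  memcpy (buf, input, width);   /* Like %c, no NUL terminator is stored.  */
  return 1;
}
```

With enough input, 'sscanf ("abcde", "%5c", buf)' returns 1 on any conforming C library.  With a C library that includes this fix, 'sscanf ("abc", "%5c", buf)' no longer returns 1, whereas older versions report success and store the three characters obtained.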

Reviewed-by: Adhemerval Zanella

stdio-common: Don't read real input beyond the field width in scanf

2025-08-11T16:42:12+00:00

Fix a code pattern that repeats across '__vfscanf_internal' where the
remaining field width of 0 is incorrectly interpreted as no width limit,
which in turn results in reading input beyond the limit requested.  The
absence of a width limit is indicated by a field width of -1 rather than
0, as set earlier in the function.

The problematic code pattern is used for both integer and floating-point
conversions, but in the former case a corresponding conditional earlier
on prevents the field width from being 0 when executing the pattern.  It
does trigger in the latter case, where the decimal point is a multibyte
character or where multibyte digit characters are used.

Fix the code pattern by using a 'width > 0' comparison, and apply the
fix throughout, even to code handling integer conversions, so that the
field width is interpreted consistently and to avoid confusion even
where width cannot be 0.
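The -1/0 width convention can be illustrated with a small sketch.  'consumable' and 'NO_WIDTH_LIMIT' are hypothetical names and do not reproduce '__vfscanf_internal'.  The point is that only 'width > 0' means "a limit remains", so a remaining width of 0 stops the scan instead of being mistaken for "unlimited".

```c
#include <assert.h>
#include <stddef.h>

#define NO_WIDTH_LIMIT (-1)   /* Hypothetical name for the -1 sentinel.  */

/* Count how many characters of S a conversion may consume when the
   remaining field width is WIDTH: NO_WIDTH_LIMIT means unlimited and
   0 means the limit has already been reached.  */
static size_t
consumable (const char *s, int width)
{
  assert (width >= NO_WIDTH_LIMIT);
  size_t n = 0;
  while (s[n] != '\0')
    {
      if (width > 0)
        --width;              /* A limit applies: spend one unit.  */
      else if (width == 0)
        break;                /* Limit exhausted: stop reading.  */
      /* Otherwise width == NO_WIDTH_LIMIT: keep going.  */
      ++n;
    }
  return n;
}
```

A check that treated any width <= 0 as unlimited would return 6 rather than 0 for 'consumable ("abcdef", 0)', which is the kind of over-read described above.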

For multibyte digit characters there is an additional issue that causes
code to push back a partially fetched multibyte character multiple times
as execution proceeds through matching data retrieved against individual
digits that have to be rejected due to the field width limit preventing
the rest of the multibyte character from being retrieved.  This is
because the code relies on 'ungetc' ignoring a request to push back EOF;
however, in the out-of-limit field width condition the data held is not
EOF but the previously retrieved character byte.

Fix this issue by artificially assigning EOF to the character byte
storage variable where the out-of-limit field width condition prevents
further processing, and also apply the fix throughout except for the
decimal point/thousands separator case, which uses different code.
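The EOF assignment relies on a standard C guarantee: 'ungetc (EOF, fp)' fails and leaves the stream unchanged.  The sketch below is hypothetical ('match_bytes' is not a glibc function).  It matches the bytes of a multibyte sequence under a width limit and sets the byte variable to EOF when the limit stops it, so the final push-back does nothing.  POSIX 'fmemopen' is assumed only for exercising it.

```c
#define _POSIX_C_SOURCE 200809L /* For fmemopen when exercising the sketch.  */
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: match the bytes of MB (e.g. a multibyte digit)
   against FP, reading at most *WIDTH bytes, where *WIDTH == -1 means no
   limit.  Returns 1 on a full match.  On failure at most one byte is
   pushed back, and none when the width limit stopped the match.  */
static int
match_bytes (FILE *fp, const char *mb, int *width)
{
  int c = EOF;
  assert (*width >= -1);
  for (const char *p = mb; *p != '\0'; ++p)
    {
      if (*width == 0)
        {
          c = EOF;            /* Out of width: make the ungetc below a no-op.  */
          break;
        }
      c = getc (fp);
      if (*width > 0)
        --*width;
      if (c != (unsigned char) *p)
        break;                /* Mismatch or real EOF.  */
      if (p[1] == '\0')
        return 1;             /* Whole sequence matched.  */
    }
  ungetc (c, fp);             /* ungetc (EOF, fp) leaves FP unchanged.  */
  return 0;
}
```

With input bytes D9 A1 (U+0661 in UTF-8) and a width of 1, the first byte is consumed and not pushed back, so the next 'getc' returns 0xA1.  Without the 'c = EOF' assignment, 0xD9 would be pushed back instead.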

Add test cases accordingly.

Reviewed-by: Adhemerval Zanella

stdio-common: Also reject exp char w/o significand in i18n scanf [BZ #13988]

2025-03-28T12:35:53+00:00

Fix the handling of real 'scanf' input such as "+.e", where the 'e'
character is incorrectly consumed from input, for the i18n case as well
as per BZ #13988, complementing commit 6ecec3b616ae ("Don't accept exp
char without preceding digits in scanf float parsing").  Add a test case
matching stdio-common/bug26.c, with bits from localedata/tst-sscanf.c.
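The rule can be sketched as a prefix scanner that only accepts an exponent character once at least one significand digit has been seen.  'float_prefix_len' is hypothetical and ignores locale-specific decimal points, grouping and hexadecimal input.  Like 'scanf', it models characters already read as consumed, so for "+.e" the sign and point are consumed but the 'e' is left in the input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: return how many leading characters of S a
   decimal floating-point scanner may consume, accepting 'e'/'E' only
   after at least one significand digit.  */
static size_t
float_prefix_len (const char *s)
{
  assert (s != NULL);
  size_t i = 0;
  int digits = 0;
  if (s[i] == '+' || s[i] == '-')
    ++i;
  while (isdigit ((unsigned char) s[i]))
    {
      ++i;
      ++digits;
    }
  if (s[i] == '.')
    {
      ++i;
      while (isdigit ((unsigned char) s[i]))
        {
          ++i;
          ++digits;
        }
    }
  if (digits == 0)
    return i;                 /* No significand: leave 'e' unread.  */
  if (s[i] == 'e' || s[i] == 'E')
    {
      ++i;
      if (s[i] == '+' || s[i] == '-')
        ++i;
      while (isdigit ((unsigned char) s[i]))
        ++i;
    }
  return i;
}
```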

Reviewed-by: Joseph Myers

Update copyright dates with scripts/update-copyrights

2025-01-01T19:22:09+00:00

Define ISO 639-3 "ltg" (Latgalian) and add ltg_LV locale

2024-06-17T08:53:16+00:00

Resolves: BZ # 31411

References:
https://iso639-3.sil.org/code/ltg
https://en.wikipedia.org/wiki/Latgalian_language
https://github.com/unicode-org/cldr/blob/main/common/main/ltg.xml

localedata: add mdf_RU locale

2024-05-08T12:27:40+00:00

Resolves: BZ # 31530

locale: Handle loading a missing locale twice (Bug 14247)

2024-04-22T20:03:00+00:00

Delay setting file->decided until the data has been successfully loaded
by _nl_load_locale().  If the function fails to load the data then we
must return an error and leave decided untouched to allow the caller to
attempt to load the data again at a later time.  We should not set
decided to 1 early in the function since doing so may prevent attempting
to load it again. We want to try loading it again because that allows an
open to fail and set errno correctly.

The other side of this problem is that if we are called again with the
same inputs, we fetch the cached version of the object and carry out no
open syscalls, so errno is not set; we must therefore set errno to
ENOENT in that case.  A second code path, where the name of the locale
matches but the codeset does not, has to be handled as well.

These changes ensure that errno is correctly set on failure in all the
return paths in _nl_find_locale().
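The two rules (mark an entry as decided only after a successful load, and set errno explicitly on paths that return without attempting an open) can be modelled with a small sketch.  'struct cache_entry', 'load_data' and 'find_locale' are hypothetical stand-ins, not glibc's 'struct loaded_l10nfile', '_nl_load_locale' or '_nl_find_locale'.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache entry.  */
struct cache_entry
{
  bool decided;   /* Set only once the data has been loaded.  */
  void *data;
};

/* Hypothetical loader: fails with ENOENT, as a failed open would,
   when the file is missing.  */
static void *
load_data (bool file_exists)
{
  static int dummy;
  if (!file_exists)
    {
      errno = ENOENT;
      return NULL;
    }
  return &dummy;
}

/* Hypothetical lookup.  E is NULL when there is no usable cache entry,
   standing in for the paths that return without opening anything.  */
static void *
find_locale (struct cache_entry *e, bool file_exists)
{
  if (e == NULL)
    {
      errno = ENOENT;   /* No open was attempted, so set errno here.  */
      return NULL;
    }
  assert (e->decided == (e->data != NULL));
  if (!e->decided)
    {
      void *data = load_data (file_exists);   /* Sets errno on failure.  */
      if (data == NULL)
        return NULL;    /* Leave DECIDED clear so a later call retries.  */
      e->data = data;
      e->decided = true;
    }
  return e->data;
}
```

Because a failed load leaves 'decided' clear, a second call for the same missing locale retries the load and reports ENOENT again, and a later call succeeds once the data is available.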

Add tst-locale-loadlocale to cover the bug.

No regressions on x86_64.

Co-authored-by: Jeff Law 
Reviewed-by: Adhemerval Zanella

localedata: Sort Makefile variables.

2024-01-10T19:08:26+00:00

Sort Makefile variables using scripts/sort-makefile-lines.py.

No regressions on x86_64.

Update copyright dates with scripts/update-copyrights

2024-01-01T18:53:40+00:00

Adapt collation in th_TH locale to use the iso14651_t1_common file and sync the collation with CLDR

2023-09-21T08:34:35+00:00

I made it agree as much as possible with the rules from CLDR (see:
https://github.com/unicode-org/cldr/blob/main/common/collation/th.xml).

It seems to be impossible to follow the CLDR rules

  &[before 1]๚<ฯ # should be "variable"

and

  &๛<ๆ # should be "variable"

exactly though. These ask for a primary difference in punctuation
characters whose primary weight should be "IGNORE". But using a
secondary difference instead still sorts the test data correctly and
the previously used collation in th_TH used tertiary differences for
these characters.
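Why a secondary difference still orders such strings can be seen with a simplified multi-level comparison: all primary weights are compared first, with ignorable (zero) weights skipped, and lower levels only break ties.  This is a hypothetical sketch in the spirit of ISO 14651, not glibc's 'strcoll', and the weight values used with it are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-character weights; a weight of 0 means IGNORE at
   that level.  */
struct weights { int level[3]; };

/* Compare two strings of weights one level at a time: all primary
   weights first, then secondary, then tertiary.  Returns <0, 0 or >0.  */
static int
collate (const struct weights *a, size_t na,
         const struct weights *b, size_t nb)
{
  assert ((a != NULL || na == 0) && (b != NULL || nb == 0));
  for (int lv = 0; lv < 3; ++lv)
    {
      size_t i = 0, j = 0;
      for (;;)
        {
          while (i < na && a[i].level[lv] == 0)
            ++i;                          /* Skip ignorable weights.  */
          while (j < nb && b[j].level[lv] == 0)
            ++j;
          if (i == na || j == nb)
            {
              if (i < na)
                return 1;                 /* A has more weights here.  */
              if (j < nb)
                return -1;
              break;                      /* Equal at this level.  */
            }
          if (a[i].level[lv] != b[j].level[lv])
            return a[i].level[lv] < b[j].level[lv] ? -1 : 1;
          ++i;
          ++j;
        }
    }
  return 0;
}
```

With a letter weighted {10, 5, 1} followed by one of two punctuation characters weighted {0, 5, 1} and {0, 6, 1}, the primary level ties because the punctuation is ignored, and the secondary level decides the order.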

There was old localedata/th_TH.in test data in TIS-620 encoding which
was not used (it was not in the localedata/Makefile). I converted this
to UTF-8 and moved it to localedata/th_TH.UTF-8.in and added it to
localedata/Makefile.
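The commit does not say which tool performed the conversion; the 'iconv' utility ('iconv -f TIS-620 -t UTF-8') is one option.  The mapping itself is simple: TIS-620 encodes Thai characters in bytes 0xA1 to 0xDA and 0xDF to 0xFB, which Unicode places at U+0E01 to U+0E3A and U+0E3F to U+0E5B.  'tis620_to_utf8' is a hypothetical helper that, as a simplification, treats every byte outside ASCII and those ranges as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical converter from TIS-620 to UTF-8.  ASCII bytes are copied;
   Thai bytes map to code point byte + 0x0D60 and are encoded as three
   UTF-8 bytes.  Returns the number of bytes written, or (size_t) -1 for
   an unsupported byte.  OUT must hold 3 bytes per input byte.  */
static size_t
tis620_to_utf8 (const unsigned char *in, size_t n, unsigned char *out)
{
  assert ((in != NULL || n == 0) && out != NULL);
  size_t o = 0;
  for (size_t i = 0; i < n; ++i)
    {
      unsigned b = in[i];
      if (b < 0x80)
        out[o++] = (unsigned char) b;
      else if ((b >= 0xA1 && b <= 0xDA) || (b >= 0xDF && b <= 0xFB))
        {
          unsigned cp = b + 0x0D60;     /* U+0E01 .. U+0E5B.  */
          out[o++] = (unsigned char) (0xE0 | (cp >> 12));
          out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
          out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
      else
        return (size_t) -1;
    }
  return o;
}
```

For example, byte 0xA1 (THAI CHARACTER KO KAI, U+0E01) becomes the UTF-8 sequence E0 B8 81.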

Using the existing collation rules in the th_TH locale did not sort that
test file completely correctly; I think my new collation rules based on
iso14651_t1 are better.