Update to Unicode 17.0.0

The following patch updates GCC from Unicode 16.0.0 to 17.0.0.

I've followed what the README says and also updated one script from glibc. That script needs another Unicode file, HangulSyllableType.txt, so I'm adding it as well. I've added one new test to named-universal-char-escape-1.c for a randomly chosen character from the new CJK block.

Note that the Unicode 17.0.0 authors forgot to adjust the 4-8 table. I've filed bug reports about that, but the UnicodeData.txt changes for the range ends and the new range seem to match e.g. what is in the glyph tables, so the patch follows UnicodeData.txt and not the 4-8 table here.

Another problem was that makeuname2c.cc mishandled the case where the size of the generated string table modulo 77 was 76 or 0 (i.e. the last chunk held 76 or 77 characters). In that case it forgot to emit a semicolon after the string literal, so the generated header failed to compile.

And as can be seen in the emoji-data.txt diff, some properties like Extended_Pictographic have been removed from certain characters, e.g. from the Mahjong tile characters except U+1F004, and one libstdc++ test was testing that property on exactly U+1F000. I don't know why that was changed, but U+1F004 is the only colored one among many black and white ones.

2025-10-08  Jakub Jelinek  <jakub@redhat.com>

contrib/
	* unicode/README: Add HangulSyllableType.txt file to the list
	as newest utf8_gen.py from glibc now needs it.  Adjust git commit
	hash and change unicode 16 version to 17.
	* unicode/from_glibc/utf8_gen.py: Updated from glibc.
	* unicode/DerivedCoreProperties.txt: Updated from Unicode 17.0.0.
	* unicode/emoji-data.txt: Likewise.
	* unicode/PropList.txt: Likewise.
	* unicode/GraphemeBreakProperty.txt: Likewise.
	* unicode/DerivedNormalizationProps.txt: Likewise.
	* unicode/NameAliases.txt: Likewise.
	* unicode/UnicodeData.txt: Likewise.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/DerivedGeneralCategory.txt: Likewise.
	* unicode/HangulSyllableType.txt: New file.
gcc/testsuite/
	* c-c++-common/cpp/named-universal-char-escape-1.c: Add test for
	\N{CJK UNIFIED IDEOGRAPH-3340E}.
libcpp/
	* makeucnid.cc (write_copyright): Adjust copyright year.
	* makeuname2c.cc (generated_ranges): Adjust end points for a couple
	of ranges based on UnicodeData.txt Last changes and add a whole new
	CJK UNIFIED IDEOGRAPH- entry.  None of these changes are in the 4-8
	table, but clearly it has just been forgotten.
	(write_copyright): Adjust copyright year.
	(write_dict): Fix up condition when to print semicolon.
	* generated_cpp_wcwidth.h: Regenerate.
	* ucnid.h: Regenerate.
	* uname2c.h: Regenerate.
libstdc++-v3/
	* include/bits/unicode-data.h: Regenerate.
	* testsuite/ext/unicode/properties.cc: Test __is_extended_pictographic
	on U+1F004 rather than U+1F000.
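
As a rough illustration of why the new \N{CJK UNIFIED IDEOGRAPH-3340E} test depends on the generated_ranges change, here is a minimal sketch of a prefix-plus-hex name lookup. It is not the libcpp implementation: the name_range struct, the ranges array and lookup_generated_name are hypothetical, keep only the prefix and the inclusive range from each table entry, and ignore the remaining fields and the stricter digit-count rules.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical, simplified stand-in for entries of generated_ranges in
// makeuname2c.cc: just the name prefix and the inclusive code point range.
struct name_range
{
  const char *prefix;
  unsigned long low, high;
};

static const name_range ranges[] = {
  { "CJK UNIFIED IDEOGRAPH-", 0x31350, 0x323af },
  { "CJK UNIFIED IDEOGRAPH-", 0x323b0, 0x33479 }, // range added by this patch
  { "TANGUT IDEOGRAPH-", 0x17000, 0x187ff },
  { "TANGUT IDEOGRAPH-", 0x18d00, 0x18d1e },      // end point moved by this patch
};

// Return the code point named by NAME, or -1 if NAME is not a known prefix
// followed by upper-case hex digits that fall inside that prefix's range.
long
lookup_generated_name (const char *name)
{
  for (const name_range &r : ranges)
    {
      std::size_t len = std::strlen (r.prefix);
      if (std::strncmp (name, r.prefix, len) != 0)
        continue;
      const char *hex = name + len;
      // Require a non-empty suffix made only of upper-case hex digits.
      if (*hex == '\0'
          || std::strspn (hex, "0123456789ABCDEF") != std::strlen (hex))
        continue;
      unsigned long cp = std::strtoul (hex, nullptr, 16);
      if (cp >= r.low && cp <= r.high)
        return (long) cp;
    }
  return -1;
}
```

With this table, "CJK UNIFIED IDEOGRAPH-3340E" resolves only because the 0x323b0 to 0x33479 entry exists; a name just past the range end, such as "CJK UNIFIED IDEOGRAPH-3347A", is rejected.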
author: Jakub Jelinek <jakub@redhat.com> 2025-10-08 17:54:11 +0200
committer: Jakub Jelinek <jakub@gcc.gnu.org> 2025-10-08 18:02:39 +0200
commit: 0c0847158caa5d3bbfe3c5457b046d3d6b7e00a5 (patch)
tree: 05bc30e87adf84a269890617c9cbe4a7840c8b5d /libcpp/makeuname2c.cc
parent: d77b548fb647d52817d0c44d45bb817d166b7a19 (diff)
1 files changed, 7 insertions, 6 deletions
diff --git a/libcpp/makeuname2c.cc b/libcpp/makeuname2c.cc
index f9b6957b711..b05d589b980 100644
--- a/libcpp/makeuname2c.cc
+++ b/libcpp/makeuname2c.cc
@@ -83,16 +83,17 @@ static struct generated generated_ranges[] =
   { "CJK UNIFIED IDEOGRAPH-", 0x3400, 0x4dbf, 0, 1, 0 }, /* NR2 rules */
   { "CJK UNIFIED IDEOGRAPH-", 0x4e00, 0x9fff, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x20000, 0x2a6df, 0, 1, 0 },
-  { "CJK UNIFIED IDEOGRAPH-", 0x2a700, 0x2b739, 0, 1, 0 },
+  { "CJK UNIFIED IDEOGRAPH-", 0x2a700, 0x2b73f, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x2b740, 0x2b81d, 0, 1, 0 },
-  { "CJK UNIFIED IDEOGRAPH-", 0x2b820, 0x2cea1, 0, 1, 0 },
+  { "CJK UNIFIED IDEOGRAPH-", 0x2b820, 0x2cead, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x2ceb0, 0x2ebe0, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x2ebf0, 0x2ee5d, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x30000, 0x3134a, 0, 1, 0 },
   { "CJK UNIFIED IDEOGRAPH-", 0x31350, 0x323af, 0, 1, 0 },
+  { "CJK UNIFIED IDEOGRAPH-", 0x323b0, 0x33479, 0, 1, 0 },
   { "EGYPTIAN HIEROGLYPH-", 0x13460, 0x143fa, 0, 2, 0 },
-  { "TANGUT IDEOGRAPH-", 0x17000, 0x187f7, 0, 3, 0 },
-  { "TANGUT IDEOGRAPH-", 0x18d00, 0x18d08, 0, 3, 0 },
+  { "TANGUT IDEOGRAPH-", 0x17000, 0x187ff, 0, 3, 0 },
+  { "TANGUT IDEOGRAPH-", 0x18d00, 0x18d1e, 0, 3, 0 },
   { "KHITAN SMALL SCRIPT CHARACTER-", 0x18b00, 0x18cd5, 0, 4, 0 },
   { "NUSHU CHARACTER-", 0x1b170, 0x1b2fb, 0, 5, 0 },
   { "CJK COMPATIBILITY IDEOGRAPH-", 0xf900, 0xfa6d, 0, 6, 0 },
@@ -671,7 +672,7 @@ write_copyright (void)
    <http://www.gnu.org/licenses/>.\n\
 \n\
 \n\
-   Copyright (C) 1991-2024 Unicode, Inc.  All rights reserved.\n\
+   Copyright (C) 1991-2025 Unicode, Inc.  All rights reserved.\n\
    Distributed under the Terms of Use in\n\
    http://www.unicode.org/copyright.html.\n\
 \n\
@@ -717,7 +718,7 @@ write_dict (void)
 
   printf ("static const char uname2c_dict[%ld] =\n", (long) (dict_size + 1));
   for (i = 0; i < dict_size; i += 77)
-    printf ("\"%.77s\"%s\n", dict + i, i + 76 > dict_size ? ";" : "");
+    printf ("\"%.77s\"%s\n", dict + i, i + 77 >= dict_size ? ";" : "");
   puts ("");
 }
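
To make the write_dict fix concrete, the sketch below replays the same loop (step 77 over dict_size) and counts how many chunks would get a trailing semicolon under the old and the new condition. The helper count_semicolons is hypothetical and exists only for this illustration; the two conditions are copied from the hunk above.

```cpp
#include <cstddef>

// Condition used before the patch: misses the last chunk when it holds
// exactly 76 or 77 characters.
static bool
old_cond (std::size_t i, std::size_t dict_size)
{
  return i + 76 > dict_size;
}

// Condition used after the patch: true exactly for the last chunk.
static bool
new_cond (std::size_t i, std::size_t dict_size)
{
  return i + 77 >= dict_size;
}

// Walk the chunks the way write_dict does and count how many of them
// would be followed by ";".  A well-formed initializer needs exactly one.
int
count_semicolons (std::size_t dict_size, bool use_new)
{
  int n = 0;
  for (std::size_t i = 0; i < dict_size; i += 77)
    if (use_new ? new_cond (i, dict_size) : old_cond (i, dict_size))
      ++n;
  return n;
}
```

For example, count_semicolons (153, false) is 0 because 153 modulo 77 is 76, and count_semicolons (154, false) is 0 because 154 is a multiple of 77. With the new condition every non-empty size yields exactly one semicolon.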