<feed xmlns='http://www.w3.org/2005/Atom'>
<title>glibc.git/localedata/th_TH.UTF-8.in, branch master</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/'/>
<entry>
<title>Adapt collation in th_TH locale to use the iso14651_t1_common file and sync the collation with CLDR</title>
<updated>2023-09-21T08:34:35+00:00</updated>
<author>
<name>Mike FABIAN</name>
<email>mfabian@redhat.com</email>
</author>
<published>2023-06-01T15:02:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.belthelziquor.com/glibc.git/commit/?id=aceda10bd5131cf716830827d66da9c671dec649'/>
<id>aceda10bd5131cf716830827d66da9c671dec649</id>
<content type='text'>
I made it to agree as much as possible with the rules from CLDR (see:
https://github.com/unicode-org/cldr/blob/main/common/collation/th.xml).

It seems to be impossible to follow the CLDR rules

  &amp;[before 1]๚&lt;ฯ # should be "variable"

and

  &amp;๛&lt;ๆ # should be "variable"

exactly though. These ask for a primary difference in punctuation
characters whose primary weight should be "IGNORE". But using a
secondary differnence instead still sorts the test data correctly and
the previously used collation in th_TH used tertiary differences for
these characters.

There was old localedata/th_TH.in test data in TIS-620 encoding which
was not used (it was not in the localedata/Makefile). I converted this
to UTF-8 and moved it to localedata/th_TH.UTF-8.in and added it to
localedata/Makefile.

Using the existing collation rules in the th_TH locale did not sort that
test file completely correct, I think my new collation rules based on
iso14651_t1 are better.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I made it to agree as much as possible with the rules from CLDR (see:
https://github.com/unicode-org/cldr/blob/main/common/collation/th.xml).

It seems to be impossible to follow the CLDR rules

  &amp;[before 1]๚&lt;ฯ # should be "variable"

and

  &amp;๛&lt;ๆ # should be "variable"

exactly though. These ask for a primary difference in punctuation
characters whose primary weight should be "IGNORE". But using a
secondary differnence instead still sorts the test data correctly and
the previously used collation in th_TH used tertiary differences for
these characters.

There was old localedata/th_TH.in test data in TIS-620 encoding which
was not used (it was not in the localedata/Makefile). I converted this
to UTF-8 and moved it to localedata/th_TH.UTF-8.in and added it to
localedata/Makefile.

Using the existing collation rules in the th_TH locale did not sort that
test file completely correct, I think my new collation rules based on
iso14651_t1 are better.
</pre>
</div>
</content>
</entry>
</feed>
