path: root/src/unicode
Age  Commit message  Author
2025-10-03  unicode: fix lookup table generation  Mitchell Hashimoto
2025-09-30  nuke ziglyph from orbit  Mitchell Hashimoto
Since we now use uucode, we don't need ziglyph anymore. Ziglyph was kept around as a test-only dep so we can verify matching but this is complicating our Zig 0.15 upgrade because ziglyph doesn't support Zig 0.15. Let's just drop it.
2025-09-23  move tests over to _uucode.zig files to avoid needing deps for vt tests  Jacob Sandlund
2025-09-23  Fix merge  Jacob Sandlund
2025-09-23  Merge remote-tracking branch 'upstream/main' into jacob/uucode  Jacob Sandlund
2025-09-20  unicode: delete props.zig and clean up symbols deps too  Mitchell Hashimoto
Follow up to #8810. Same reasoning.
2025-09-20  unicode: isolate properties, tables, and ziglyph into separate files  Mitchell Hashimoto
This makes it cleaner to add new sources of table generation and also avoids inadvertently depending on different modules (despite Zig's lazy analysis). This also fixes up terminal to only use our lookup tables, which avoids bringing ziglyph in for the terminal module.
2025-09-19  Remove comment above test. it's not too slow  Jacob Sandlund
2025-09-19  pr feedback: `get`, remove todos for case_folding_simple  Jacob Sandlund
2025-09-18  set max for unicode grapheme executable  Jacob Sandlund
2025-09-18  fix up diff from benchmarks, and add tests against ziglyph  Jacob Sandlund
2025-09-18  update uucode and cleanups  Jacob Sandlund
2025-09-11  benchmark sources  Jacob Sandlund
2025-09-09  changes after benchmarking  Jacob Sandlund
2025-09-06  fast getX(.is_symbol)  Jacob Sandlund
2025-09-06  Merge remote-tracking branch 'upstream/main' into jacob/uucode  Jacob Sandlund
2025-09-06  trying a bunch of things to get performance to match  Jacob Sandlund
2025-09-05  render: address review feedback  Jeffrey C. Ollie
1. `inline` the table get.
2. Delete unused functions on the LUT table.
3. Disable the isSymbol test under valgrind
2025-09-05  drop the new LUT type as no performance advantage detected  Jeffrey C. Ollie
2025-09-05  add two LUT-based implementations of isSymbol  Jeffrey C. Ollie
2025-08-21  update for new grapheme_break  Jacob Sandlund
2025-08-17  removing all ziglyph imports (aside from unicode/grapheme.zig)  Jacob Sandlund
2025-08-12  update after refactor (string field config, etc)  Jacob Sandlund
2025-08-05  using just `get`  Jacob Sandlund
2025-05-26  style: use decl literals  Qwerasd
This commit changes a LOT of areas of the code to use decl literals instead of redundantly referring to the type. These changes were mostly driven by some regex searches and then manual adjustment on a case-by-case basis. I almost certainly missed quite a few places where decl literals could be used, but this is a good first step in converting things, and other instances can be addressed when they're discovered. I tested GLFW+Metal and building the framework on macOS and tested a GTK build on Linux, so I'm 99% sure I didn't introduce any syntax errors or other problems with this. (fingers crossed)
2025-03-12  Lots of 0.14 changes  Mitchell Hashimoto
2025-01-20  unigen: Remove libc dependency, use ArenaAllocator  Ryan Liptak
Not linking libc avoids potential problems when compiling from/for certain targets (see https://github.com/ghostty-org/ghostty/discussions/3218), and using an ArenaAllocator makes unigen run just as fast (in both release and debug modes) while also taking less memory.

Benchmark 1 (3 runs): ./zig-out/bin/unigen-release-c
  measurement        mean ± σ           min … max          outliers   delta
  wall_time          1.75s ± 15.8ms     1.73s … 1.76s      0 ( 0%)    0%
  peak_rss           2.23MB ± 0         2.23MB … 2.23MB    0 ( 0%)    0%
  cpu_cycles         7.22G ± 62.8M      7.16G … 7.29G      0 ( 0%)    0%
  instructions       11.5G ± 16.0       11.5G … 11.5G      0 ( 0%)    0%
  cache_references   436M ± 6.54M       430M … 443M        0 ( 0%)    0%
  cache_misses       310K ± 203K        134K … 532K        0 ( 0%)    0%
  branch_misses      1.03M ± 29.9K      997K … 1.06M       0 ( 0%)    0%
Benchmark 2 (3 runs): ./zig-out/bin/unigen-release-arena
  measurement        mean ± σ           min … max          outliers   delta
  wall_time          1.73s ± 6.40ms     1.72s … 1.73s      0 ( 0%)    - 1.0% ± 1.6%
  peak_rss           1.27MB ± 75.7KB    1.18MB … 1.31MB    0 ( 0%)    ⚡- 43.1% ± 5.4%
  cpu_cycles         7.16G ± 26.5M      7.13G … 7.18G      0 ( 0%)    - 0.9% ± 1.5%
  instructions       11.4G ± 28.2       11.4G … 11.4G      0 ( 0%)    - 0.8% ± 0.0%
  cache_references   441M ± 2.89M       439M … 444M        0 ( 0%)    + 1.2% ± 2.6%
  cache_misses       152K ± 102K        35.2K … 220K       0 ( 0%)    - 50.8% ± 117.8%
  branch_misses      1.05M ± 13.4K      1.04M … 1.06M      0 ( 0%)    + 2.0% ± 5.1%

Benchmark 1 (3 runs): ./zig-out/bin/unigen-debug-c
  measurement        mean ± σ           min … max          outliers   delta
  wall_time          1.75s ± 32.4ms     1.71s … 1.77s      0 ( 0%)    0%
  peak_rss           2.23MB ± 0         2.23MB … 2.23MB    0 ( 0%)    0%
  cpu_cycles         7.23G ± 136M       7.08G … 7.34G      0 ( 0%)    0%
  instructions       11.5G ± 37.9       11.5G … 11.5G      0 ( 0%)    0%
  cache_references   448M ± 1.03M       447M … 449M        0 ( 0%)    0%
  cache_misses       148K ± 42.6K       99.3K … 180K       0 ( 0%)    0%
  branch_misses      987K ± 5.27K       983K … 993K        0 ( 0%)    0%
Benchmark 2 (3 runs): ./zig-out/bin/unigen-debug-arena
  measurement        mean ± σ           min … max          outliers   delta
  wall_time          1.76s ± 4.12ms     1.76s … 1.76s      0 ( 0%)    + 0.4% ± 3.0%
  peak_rss           1.22MB ± 75.7KB    1.18MB … 1.31MB    0 ( 0%)    ⚡- 45.1% ± 5.4%
  cpu_cycles         7.27G ± 17.1M      7.26G … 7.29G      0 ( 0%)    + 0.6% ± 3.0%
  instructions       11.4G ± 3.79       11.4G … 11.4G      0 ( 0%)    - 0.8% ± 0.0%
  cache_references   440M ± 4.52M       435M … 444M        0 ( 0%)    - 1.7% ± 1.7%
  cache_misses       43.6K ± 19.2K      26.5K … 64.3K      0 ( 0%)    ⚡- 70.5% ± 50.8%
  branch_misses      1.04M ± 2.25K      1.04M … 1.05M      0 ( 0%)    💩+ 5.8% ± 0.9%
2024-12-12  unicode: emoji modifier requires emoji modifier base preceding to not break  Mitchell Hashimoto
Fixes #2941

This fixes the rendering of the text below. For those that can't see it, it is the following in UTF-32: `0x22 0x1F3FF 0x22`.

```
"🏿"
```

`0x1F3FF` is the Fitzpatrick modifier for dark skin tone. It has the Unicode property `Emoji_Modifier`. Emoji modifiers are defined in UTS #51 and are only valid based on ED-13:

```
emoji_modifier_sequence := emoji_modifier_base emoji_modifier
emoji_modifier_base := \p{Emoji_Modifier_Base}
emoji_modifier := \p{Emoji_Modifier}
```

Additional quote from UTS #51:

> To have an effect on an emoji, an emoji modifier must immediately follow
> that base emoji character. Emoji presentation selectors are neither needed
> nor recommended for emoji characters when they are followed by emoji
> modifiers, and should not be used in newly generated emoji modifier
> sequences; the emoji modifier automatically implies the emoji presentation
> style.

Our precomputed grapheme break table was mistakenly not following this rule. This commit fixes that by adding a check that every `Emoji_Modifier` character must be preceded by an `Emoji_Modifier_Base`. This only has a cost during compilation (table generation). The runtime cost is identical; the table size didn't increase since we had leftover bits we could use.
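
The rule described in this commit message can be illustrated with a minimal sketch. This is not Ghostty's table generator: it is written in Python for brevity, all names (`Class`, `classify`, `compute_break`, `BREAK_TABLE`, `segment`) are hypothetical, the set of modifier bases is a tiny illustrative subset of `\p{Emoji_Modifier_Base}`, and every other grapheme break rule is ignored. It only shows the idea of deciding breaks once, at table generation time, and doing a plain lookup at runtime.

```python
from enum import IntEnum
from itertools import product


class Class(IntEnum):
    """Hypothetical, heavily reduced set of boundary classes."""
    OTHER = 0
    EMOJI_MODIFIER_BASE = 1
    EMOJI_MODIFIER = 2


# Emoji_Modifier covers the five Fitzpatrick modifiers U+1F3FB..U+1F3FF.
EMOJI_MODIFIERS = range(0x1F3FB, 0x1F400)

# Assumption: a tiny subset of Emoji_Modifier_Base, for illustration only.
EMOJI_MODIFIER_BASES = {0x261D, 0x1F44B, 0x1F44D}


def classify(cp: int) -> Class:
    """Map a code point to its (simplified) boundary class."""
    if cp in EMOJI_MODIFIERS:
        return Class.EMOJI_MODIFIER
    if cp in EMOJI_MODIFIER_BASES:
        return Class.EMOJI_MODIFIER_BASE
    return Class.OTHER


def compute_break(prev: Class, cur: Class) -> bool:
    """Generation-time rule: a modifier only joins a preceding base.

    Every other pair is treated as a break in this sketch.
    """
    if cur is Class.EMOJI_MODIFIER:
        return prev is not Class.EMOJI_MODIFIER_BASE
    return True


# "Precomputed" table: built once, so the runtime check is a single lookup.
BREAK_TABLE = {(p, c): compute_break(p, c) for p, c in product(Class, Class)}


def segment(cps: list[int]) -> list[list[int]]:
    """Split code points into clusters using only the precomputed table."""
    clusters: list[list[int]] = []
    for cp in cps:
        if clusters and not BREAK_TABLE[(classify(clusters[-1][-1]), classify(cp))]:
            clusters[-1].append(cp)
        else:
            clusters.append([cp])
    return clusters
```

Under these assumptions, `segment([0x22, 0x1F3FF, 0x22])` keeps the lone modifier in its own cluster between the two quotes, while `segment([0x1F44B, 0x1F3FF])` joins the modifier to its base.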
2024-02-10  terminal: only apply VS15/16 to emoji  Mitchell Hashimoto
Fixes #1482
2024-02-09  unicode: precompute grapheme break data  Mitchell Hashimoto
2024-02-09  unicode: use packed struct for break state  Mitchell Hashimoto
2024-02-09  unicode: remove unused  Mitchell Hashimoto
2024-02-09  unicode: direct port of ziglyph to start  Mitchell Hashimoto
2024-02-09  unicode: get grapheme boundary class  Mitchell Hashimoto
2024-02-08  unicode: generate our own lookup tables  Mitchell Hashimoto