Add UnicodeRange for font query #377

dhardy · 2025-06-07T14:11:18Z

This is a partial solution to #371.
My testing shows improvements: for example, arrows now use the most appropriate font instead of the first one that happens to contain a match, which makes styles much more consistent.

- Add enum UnicodeRange. New code because I couldn't find an appropriate impl on crates.io (though there are some things touching on this in ttf_parser and in read-fonts).
- Fill in remaining ranges
- Add struct UnicodeRanges
- Filter Query by matching ranges
- Extend query when no matches are available
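
As a rough illustration of the items above (not the PR's actual code), here is a minimal sketch. It assumes UnicodeRange uses the OS/2 ulUnicodeRange bit index as its discriminant and that UnicodeRanges is a 128-bit set stored as four u32 words. Only four ranges are listed, from_char handles only each range's principal block, and FontEntry and filter_by_range are hypothetical names.

```rust
/// A few of the OS/2 `ulUnicodeRange` bits; the discriminant is the bit index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[repr(u8)]
pub enum UnicodeRange {
    BasicLatin = 0,
    Hebrew = 11,
    Arrows = 37,
    Dingbats = 47,
}

impl UnicodeRange {
    /// Classify a char by block (only the principal block of each range).
    pub fn from_char(c: char) -> Option<Self> {
        match c as u32 {
            0x0000..=0x007F => Some(Self::BasicLatin),
            0x0590..=0x05FF => Some(Self::Hebrew),
            0x2190..=0x21FF => Some(Self::Arrows),
            0x2700..=0x27BF => Some(Self::Dingbats),
            _ => None,
        }
    }
}

/// Set of ranges a font declares functional: 128 bits in four words.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct UnicodeRanges([u32; 4]);

impl UnicodeRanges {
    pub fn insert(&mut self, r: UnicodeRange) {
        let bit = r as usize;
        self.0[bit / 32] |= 1 << (bit % 32);
    }

    pub fn contains(&self, r: UnicodeRange) -> bool {
        let bit = r as usize;
        self.0[bit / 32] & (1 << (bit % 32)) != 0
    }
}

/// Hypothetical stand-in for a font known to the collection.
pub struct FontEntry {
    pub family: &'static str,
    pub ranges: UnicodeRanges,
}

/// Keep only fonts functional over `range`; `None` (unclassified) keeps all.
pub fn filter_by_range(fonts: &[FontEntry], range: Option<UnicodeRange>) -> Vec<&'static str> {
    fonts
        .iter()
        .filter(|f| range.map_or(true, |r| f.ranges.contains(r)))
        .map(|f| f.family)
        .collect()
}
```

Passing None through unfiltered mirrors the behaviour one would expect for characters that no listed range covers.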

The missing part (last item) is to add a second-stage fallback (all fonts functional over the range) for when there are no matches. This is not quite so straightforward since Collection doesn't have a function to list all available families; I'd like some guidance on the best way to do this within fontique. Should Query::matches_with automatically do this when it has no other matches, or should it be up to the caller to call something else like Query::all_fonts_for_range(range: UnicodeRange) in this case?
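
To make the second option concrete, a sketch of the caller-side variant follows. Everything in it is hypothetical: Family, matches_in_range and query_with_fallback are illustrative names, all_fonts_for_range only borrows the name floated above, and ranges are plain OS/2 bit indices rather than fontique types.

```rust
/// Hypothetical stand-in for a font family known to the collection.
pub struct Family {
    pub name: &'static str,
    /// Bit indices of the ranges the family declares functional.
    pub ranges: Vec<u8>,
}

/// Stage 1: the requested families that are functional over `range`.
pub fn matches_in_range<'a>(requested: &[&'a Family], range: u8) -> Vec<&'a Family> {
    requested
        .iter()
        .copied()
        .filter(|f| f.ranges.contains(&range))
        .collect()
}

/// Stage 2: every family in the collection that is functional over `range`.
pub fn all_fonts_for_range<'a>(all: &'a [Family], range: u8) -> Vec<&'a Family> {
    all.iter().filter(|f| f.ranges.contains(&range)).collect()
}

/// Caller-side policy: use stage 2 only when stage 1 finds nothing.
pub fn query_with_fallback<'a>(
    all: &'a [Family],
    requested: &[&'a Family],
    range: u8,
) -> Vec<&'a Family> {
    let primary = matches_in_range(requested, range);
    if primary.is_empty() {
        all_fonts_for_range(all, range)
    } else {
        primary
    }
}
```

Folding the same logic into the query itself would save callers a step, at the cost of hiding which stage produced a match.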

Parley changes are needed to use this. kas-text is updated here: kas-gui/kas-text#97.

dhardy · 2025-06-07T14:15:42Z

The test input from #371; also some Hebrew characters:

Font matches


[2025-06-07T14:11:48Z DEBUG kas_text::fonts::resolver] select: Script::Latn, Some(BasicLatin), GenericFamily::SystemUi, FontWeight(400), FontWidth(256), Normal
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Cantarell
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: DejaVu Sans
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::resolver] select: Script::Zyyy, Some(BasicLatin), GenericFamily::SystemUi, FontWeight(400), FontWidth(256), Normal
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Cantarell
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::resolver] select: Script::Hebr, Some(Hebrew), GenericFamily::SystemUi, FontWeight(400), FontWidth(256), Normal
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Noto Sans Hebrew
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::resolver] select: Script::Latn, Some(BasicLatin), GenericFamily::Serif, FontWeight(400), FontWidth(256), Normal
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Noto Serif
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Nimbus Roman
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: URW Bookman
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: C059
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: P052
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Standard Symbols PS
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Caladea
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Symbola
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: STIX
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Bengali
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Gujarati
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Marathi
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Tamil
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Kannada
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Telugu
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: DejaVu Sans
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::resolver] select: Script::Zyyy, Some(BasicLatin), GenericFamily::Serif, FontWeight(400), FontWidth(256), Normal
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Noto Serif
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Nimbus Roman
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: URW Bookman
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: C059
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: P052
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Standard Symbols PS
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Caladea
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Symbola
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: STIX
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Bengali
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Gujarati
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Marathi
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Tamil
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Kannada
[2025-06-07T14:11:48Z DEBUG kas_text::fonts::library] match: Lohit Telugu
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::resolver] select: Script::Zyyy, Some(Dingbats), GenericFamily::Serif, FontWeight(400), FontWidth(256), Normal
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Symbola
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: STIX
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::resolver] select: Script::Zyyy, None, GenericFamily::Serif, FontWeight(400), FontWidth(256), Normal
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Noto Serif
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Nimbus Roman
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: URW Bookman
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: C059
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: P052
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Standard Symbols PS
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Caladea
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Symbola
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: STIX
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Lohit Bengali
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Lohit Gujarati
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Lohit Marathi
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Lohit Tamil
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Lohit Kannada
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Lohit Telugu
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::resolver] select: Script::Zyyy, Some(Arrows), GenericFamily::Serif, FontWeight(400), FontWidth(256), Normal
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: Symbola
[2025-06-07T14:11:56Z DEBUG kas_text::fonts::library] match: STIX

Note that the Hebrew and Arrows ranges now match only one or two fonts; before this PR they would match many more (inappropriately). The None case (the second-last select entry in the log) occurs because UnicodeRange is incomplete.

nicoburns · 2025-06-07T15:27:08Z

Hmm... I believe that for CSS matching we need support for arbitrary numeric ranges (not just "standard ranges") specified as part of the FontInfoOverride. See https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face/unicode-range. So perhaps it would make sense to have some variant on Vec<Range<u32>> (SmallVec<[Range<u32>; 4]>?) as the core type for unicode ranges within Fontique?
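
For reference, a small sketch of what arbitrary numeric ranges could look like. It assumes a plain Vec of RangeInclusive<u32> (inclusive, because CSS ranges include both end points) instead of the SmallVec suggested above, and it parses the three CSS unicode-range token forms. The function names are hypothetical and validation is deliberately loose (for example, it does not check that `?` appears only at the end of a token).

```rust
use std::ops::RangeInclusive;

/// Parse one CSS `unicode-range` token: `U+26`, `U+0025-00FF` or `U+4??`.
pub fn parse_token(s: &str) -> Option<RangeInclusive<u32>> {
    let s = s.trim();
    let body = s.strip_prefix("U+").or_else(|| s.strip_prefix("u+"))?;
    if let Some((lo, hi)) = body.split_once('-') {
        let lo = u32::from_str_radix(lo, 16).ok()?;
        let hi = u32::from_str_radix(hi, 16).ok()?;
        return if lo <= hi { Some(lo..=hi) } else { None };
    }
    if body.contains('?') {
        // `?` stands for any hex digit, so `4??` covers 0x400..=0x4FF.
        let lo = u32::from_str_radix(&body.replace('?', "0"), 16).ok()?;
        let hi = u32::from_str_radix(&body.replace('?', "F"), 16).ok()?;
        return Some(lo..=hi);
    }
    let cp = u32::from_str_radix(body, 16).ok()?;
    Some(cp..=cp)
}

/// Parse a comma-separated list; fails if any token is malformed.
pub fn parse_unicode_range(list: &str) -> Option<Vec<RangeInclusive<u32>>> {
    list.split(',').map(parse_token).collect()
}

/// True if any range contains the code point of `c`.
pub fn covers(ranges: &[RangeInclusive<u32>], c: char) -> bool {
    ranges.iter().any(|r| r.contains(&(c as u32)))
}
```

A list like this answers "might this font cover this char?" in the same way as the standard-range bits, only with caller-defined granularity.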

Reading this PR it looks like the 128 standard ranges thing is part of an OpenType standard? So I guess we may still need that code (not my area of expertise), although presumably one can also check which codepoints a font actually supports?

dhardy · 2025-06-07T16:15:38Z

> although presumably one can also check which codepoints a font actually supports?

Yes, of course. That's largely orthogonal to this PR. I had assumed that Parley would already do this, but maybe not; this may be why kas-text already did better than Parley in #371.

The way this works in kas-text is that, per glyph, I create a hash of the font selector (family & attributes), the script and (after this PR) the unicode_range; this hash is used to select a matching font list and to cache it. Then, per glyph, I take the first font face from that list which contains the char (with the quirk that it prefers to reuse the previous char's face if possible).
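
A minimal sketch of that scheme, under stated assumptions: this is not kas-text's code, all type and function names are hypothetical, the key is hashed by a HashMap rather than by hand, and a face's coverage is a plain list of chars instead of a real character map.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: font selector, script and OS/2 range bit.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct FontListKey {
    pub family: String,
    pub weight: u16,
    pub script: [u8; 4],
    pub unicode_range: Option<u8>,
}

/// Stand-in for a font face; `chars` plays the role of the character map.
pub struct Face {
    pub name: &'static str,
    pub chars: Vec<char>,
}

/// Caches one resolved face list (indices into a face table) per key.
#[derive(Default)]
pub struct FontListCache {
    lists: HashMap<FontListKey, Vec<usize>>,
}

impl FontListCache {
    /// Return the cached list for `key`, calling `resolve` only on a miss.
    pub fn get_or_resolve(
        &mut self,
        key: FontListKey,
        resolve: impl FnOnce(&FontListKey) -> Vec<usize>,
    ) -> &[usize] {
        self.lists.entry(key).or_insert_with_key(resolve).as_slice()
    }
}

/// Per-char selection: prefer the previous char's face, else first match.
pub fn select_face(list: &[usize], faces: &[Face], c: char, prev: Option<usize>) -> Option<usize> {
    if let Some(p) = prev {
        if list.contains(&p) && faces[p].chars.contains(&c) {
            return Some(p);
        }
    }
    list.iter().copied().find(|&i| faces[i].chars.contains(&c))
}
```

The expensive font query runs once per distinct key, while the per-char work is a short scan of an already filtered list.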

kas-text docs describe both FontId (the list) and FaceId (one font face), though that's a little dated (there's no longer a "default font").

This is per-char fallback, and the PR here mostly makes it more efficient (those font lists are smaller, omitting fonts that likely wouldn't cover the current char).

Using this PR without per-char fallback would probably mostly work, but likely not everywhere.

The second (missing) part of this PR would help by providing some likely font matches when no suitable fonts are found otherwise. E.g. if someone uses an arrow then the family won't identify a suitable font (though some contain a few common arrow glyphs anyway) and the script ("common") won't either, but the UnicodeRange will.

> I believe that for CSS matching we need support for arbitrary numeric ranges

Hmm, UnicodeRange is intended to mirror ulUnicodeRange from the OS/2 table, which most fonts provide. I guess the CSS unicode-range is designed to do a similar thing, but (a) is more flexible and (b) relies on a web page's CSS to specify available fonts (with the ranges they cover).

For fonts found on the system, I think we normally only have the info from the OS/2 table.

The other reason I chose to mirror the OS/2 specification precisely is how kas-text caches a set of fonts as a FontId: I want a finite, small-ish number of categories here (far fewer than the number of possible char codes, and independent of fonts for simplicity).

I guess there are other possible approaches; e.g. building a (very large) map from (FontFamily, FontWeight, FontWidth, FontStyle, char) to a font face directly, or caching maps from char to a font face for each (FontFamily, FontWeight, FontWidth, FontStyle) or just performing the whole look-up chain for each char (probably too slow for reactive UI).

I'm not sure if you want to copy the approach I've adopted for kas-text in Parley since it's not exactly compatible with CSS's unicode-range — but I don't see a good alternative that allows fast cached look-ups (and is not horribly complex / needing a huge hash map).

I would like this to be adopted by fontique anyway, since it's not incompatible with using another approach for Parley.

dhardy · 2025-06-09T08:31:18Z

Combined with #378 (which in effect just massively increases the number of font matches), this PR is effectively a fast pre-filter for callbacks. E.g. with this PR, matching for a font supporting Hebrew (Script and UnicodeRange properties) yields 6 matches; without this PR (Script property only), it yields 61 (of which only 17 and 50 are obviously Hebrew fonts).

That is because the Script property is only used for fallbacks, not to prune other matches. I'm not sure whether that should change (not if we also have this PR, possibly otherwise though I don't know if it would be problematic), but even if so it still wouldn't work for arrows (which have a UnicodeRange but not a specific Script).

nicoburns · 2025-06-09T09:26:34Z

> this PR is effectively a fast pre-filter for callbacks

Am I understanding correctly that the invariant this PR relies on (and takes advantage of) is "if a font doesn't list a 'standard unicode range' then it doesn't contain any glyphs for any codepoints within that range"? If a font does support a given "standard range", we still need to check whether the font actually contains the specific glyph we are looking for, but this allows fast rejection when a font doesn't cover a given "standard range"?
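
The two-step check being asked about can be sketched as follows. All names are hypothetical, the declared ranges are held in a u128 with one bit per OS/2 range, and a set of chars stands in for the font's character map.

```rust
use std::collections::BTreeSet;

/// Stand-in font: declared range bits plus an explicit character set.
pub struct Font {
    /// Bit i set means OS/2 range i is declared functional.
    pub functional: u128,
    /// Plays the role of the font's character map.
    pub cmap: BTreeSet<char>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Coverage {
    /// Range bit not set: rejected without consulting the character map.
    Rejected,
    /// Range declared (or char unclassified), but no glyph for this char.
    Missing,
    /// Range declared (or char unclassified) and the glyph exists.
    Present,
}

/// `range_bit` is the OS/2 bit for `c`, or `None` if `c` is unclassified.
pub fn check(font: &Font, range_bit: Option<u32>, c: char) -> Coverage {
    if let Some(bit) = range_bit {
        let declared = bit < 128 && font.functional & (1u128 << bit) != 0;
        if !declared {
            return Coverage::Rejected;
        }
    }
    if font.cmap.contains(&c) {
        Coverage::Present
    } else {
        Coverage::Missing
    }
}
```

Note that a font which has the glyph but does not set the bit ends up Rejected, so the pre-filter is only lossless if the stated invariant actually holds for the font.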

dhardy · 2025-06-09T10:39:30Z

Honestly, the only specification I found for behaviour is this one:

> This field is used to specify the Unicode blocks or ranges encompassed by the font file in 'cmap' subtables for platform 3, encoding ID 1 (Microsoft platform, Unicode BMP) and platform 3, encoding ID 10 (Microsoft platform, Unicode full repertoire). If a bit is set (1), then the Unicode ranges assigned to that bit are considered functional. If the bit is clear (0), then the range is not considered functional. Each of the bits is treated as an independent flag and the bits can be set in any combination. The determination of “functional” is left up to the font designer, although character set selection should attempt to be functional by ranges if at all possible.
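
For concreteness, a sketch of reading those bits directly from raw OS/2 table bytes. It relies on the OS/2 layout in which ulUnicodeRange1 to ulUnicodeRange4 are four big-endian u32 fields starting at byte offset 42, with bit 0 being the least significant bit of ulUnicodeRange1. The function names are hypothetical; a real implementation would more likely go through a font-parsing crate.

```rust
/// Read `ulUnicodeRange1..4` from raw OS/2 table bytes.
/// Returns `None` if the table is too short to contain them.
pub fn read_unicode_ranges(os2: &[u8]) -> Option<[u32; 4]> {
    let mut words = [0u32; 4];
    for (i, w) in words.iter_mut().enumerate() {
        let start = 42 + 4 * i;
        let b = os2.get(start..start + 4)?;
        *w = u32::from_be_bytes([b[0], b[1], b[2], b[3]]);
    }
    Some(words)
}

/// Test whether range `bit` (0..128) is marked functional.
pub fn is_functional(words: &[u32; 4], bit: u32) -> bool {
    bit < 128 && words[(bit / 32) as usize] & (1 << (bit % 32)) != 0
}
```

For example, Hebrew is bit 11 (in the first word) and Arrows is bit 37, which is bit 5 of the second word.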

I fully expect that there are some chars contained by some fonts which do not indicate that they are "functional" over the corresponding input range, and thus are rejected by the new filter in this PR.

But does that matter? Let's consider → ("Arrows" range), using a Sans-Serif font family:

- Without this PR, → would be picked from the first matching font, the Sans-Serif font. This may be more consistent with other text in that font.
- With this PR and a suitable "Arrows" font, → would be picked from the arrow font. This is (according to my testing) more consistent with other, less common arrows like ↷, but maybe less consistent with ordinary Sans-Serif text.
- With this PR but without a suitable "Arrows" font, → would not be matched. That is a problem if a system doesn't have such a font.

The last point can be addressed by changing how glyph fallback works: instead of checking one long list of fonts, use at least two: a pre-filtered list of preferred fonts (possibly also using Script for filtering) and a longer (unfiltered) list. This would require further changes.
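
A sketch of that two-list lookup, assuming hypothetical Face and pick_face names and a string of covered chars in place of a real character map:

```rust
/// Stand-in font face; `chars` lists the characters it covers.
pub struct Face {
    pub name: &'static str,
    pub chars: &'static str,
}

impl Face {
    fn has(&self, c: char) -> bool {
        self.chars.contains(c)
    }
}

/// Try the pre-filtered (preferred) list first, then the unfiltered list.
pub fn pick_face<'a>(
    preferred: &[&'a Face],
    unfiltered: &[&'a Face],
    c: char,
) -> Option<&'a Face> {
    preferred
        .iter()
        .chain(unfiltered.iter())
        .copied()
        .find(|f| f.has(c))
}
```

With a suitable arrow font in the preferred list, → comes from that font; with an empty preferred list it falls back to whichever unfiltered font contains it, so the third case above no longer fails outright.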
