Affected page
https://www.php.net/manual/en/function.grapheme-strpos.php
Current issue
The grapheme_strpos() page documents a locale parameter, but it does not
mention that the locale may include Unicode locale extension keys.
In particular, users may not realize that the Unicode locale extension key
ks selects a collation strength, as in the locale en_US-u-ks-identic.
Suggested improvement
Expand the description of the locale parameter and add an example showing
that ks-identic affects matching.
Suggested wording for the locale parameter:
Locale to use for matching. The locale may include Unicode locale extension
keys, such as `ks` for collation strength.
Also consider adding a reference to Collator::setStrength() for users who
need to understand collation strength levels.
Example:
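<?php
$textStyle = "\u{263A}\u{FE0E}"; // U+263A with VS15: text presentation
$emojiStyle = "\u{263A}\u{FE0F}"; // U+263A with VS16: emoji presentation

// Default matching: no locale given.
var_dump(grapheme_strpos($textStyle, $emojiStyle));

// Identical strength requested through the ks locale extension key.
var_dump(grapheme_strpos(
    $textStyle,
    $emojiStyle,
    locale: 'en_US-u-ks-identic'
));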
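Expected output:
Without identical strength, the variation selectors are ignored and the two
sequences match, so the first call should return int(0). With
en_US-u-ks-identic, the variation selectors are significant, so the search
does not match and the second call should return bool(false).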
Additional context (optional)
The addition of the locale parameter makes grapheme_strpos() closer to
ICU collation-based matching.
Collator::compare() is appropriate when comparing two complete strings.
grapheme_strpos() is appropriate when searching for a substring and
returning its position in grapheme units.
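To illustrate what "position in grapheme units" means, here is a minimal sketch contrasting grapheme_strpos() with the byte-based strpos(). The sample string is an assumption chosen for illustration, and the locale parameter is deliberately left out, so this only shows the unit in which the offset is reported.

```php
<?php
// "e" followed by U+0301 (combining acute accent) is one grapheme cluster,
// but it occupies three bytes in UTF-8 (1 for "e", 2 for U+0301).
$haystack = "e\u{0301}abc";
$needle = "abc";

// Offset counted in bytes.
$bytePos = strpos($haystack, $needle);

// Offset counted in grapheme clusters (requires the intl extension).
$graphemePos = grapheme_strpos($haystack, $needle);

var_dump($bytePos, $graphemePos);
```

The two offsets differ because the accented letter counts as a single unit for grapheme_strpos() but as several bytes for strpos().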
Since collation strength affects matching, Collator::setStrength() is a
useful reference for understanding values such as primary, secondary,
tertiary, quaternary, and identical strength.
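As a rough illustration of how the strength level changes what counts as equal, the following sketch calls Collator::compare() at primary, secondary, and tertiary strength. The compareAt() helper, the sample strings, and the en_US locale are assumptions made for this example, and exact results depend on the ICU data shipped with the intl extension.

```php
<?php
// Hypothetical helper: compare two strings at a given collation strength.
function compareAt(int $strength, string $a, string $b): int|false
{
    $collator = new Collator('en_US');
    $collator->setStrength($strength);

    // 0 means the strings are equal at this strength.
    return $collator->compare($a, $b);
}

// Primary strength looks at base letters only: case and accents are ignored.
var_dump(compareAt(Collator::PRIMARY, 'cote', "C\u{F4}te") === 0);

// Secondary strength also takes accents into account.
var_dump(compareAt(Collator::SECONDARY, 'cote', "c\u{F4}te") === 0);

// Secondary strength still ignores case; tertiary strength distinguishes it.
var_dump(compareAt(Collator::SECONDARY, 'cote', 'Cote') === 0);
var_dump(compareAt(Collator::TERTIARY, 'cote', 'Cote') === 0);
```

Identical strength goes one step further and uses the code points themselves as a final tie-breaker, which is the level that ks-identic requests.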
Collator::setStrength
https://www.php.net/manual/en/collator.setstrength.php
Specification reference:
https://www.unicode.org/reports/tr35/dev/tr35-collation.html#Setting_Options