
Add an example for locale collation strength in grapheme_strpos() #5560

@masakielastic


Affected page

https://www.php.net/manual/en/function.grapheme-strpos.php

Current issue

The grapheme_strpos() documentation describes the locale parameter, but does
not mention that the locale may include Unicode locale extension keys.

In particular, users may not realize that the Unicode locale extension key
ks can be used to request a collation strength, such as
en_US-u-ks-identic.

Suggested improvement

Expand the description of the locale parameter and add an example showing
that ks-identic affects matching.

Suggested wording for the locale parameter:

Locale to use for matching. The locale may include Unicode locale extension
keys, such as `ks` for collation strength.

Also consider adding a reference to Collator::setStrength() for users who
need to understand collation strength levels.

Example:

<?php

$textStyle = "\u{263A}\u{FE0E}";  // text presentation
$emojiStyle = "\u{263A}\u{FE0F}"; // emoji presentation

var_dump(grapheme_strpos($textStyle, $emojiStyle));

var_dump(grapheme_strpos(
    $textStyle,
    $emojiStyle,
    locale: 'en_US-u-ks-identic'
));

Expected output:

int(0)
bool(false)

The first call does not request identical strength, so the variation
selectors are ignored and the two sequences match at position 0. With
en_US-u-ks-identic, the variation selectors are significant, so the search
finds no match and returns false.

Additional context (optional)

The addition of the locale parameter makes grapheme_strpos() closer to
ICU collation-based matching.

Collator::compare() is appropriate when comparing two complete strings.
grapheme_strpos() is appropriate when searching for a substring and
returning its position in grapheme units.

Since collation strength affects matching, Collator::setStrength() is a
useful reference for understanding values such as primary, secondary,
tertiary, quaternary, and identical strength.
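
For comparison, here is a minimal sketch of the same strength setting on the Collator side. It assumes the intl extension is loaded and reuses the two strings from the example above. The variable names $atDefault and $atIdentical are only illustrative. Collator::compare() returns 0 when the two strings are considered equal at the current strength, so the two results can be compared against the grapheme_strpos() results.

```php
<?php

// Requires the intl extension.
$textStyle = "\u{263A}\u{FE0E}";  // text presentation
$emojiStyle = "\u{263A}\u{FE0F}"; // emoji presentation

$collator = new Collator('en_US');

// Compare whole strings at the collator's default strength.
$atDefault = $collator->compare($textStyle, $emojiStyle);

// Switch to identical strength, the level that "ks-identic" requests
// in a locale identifier, and compare again.
$collator->setStrength(Collator::IDENTICAL);
$atIdentical = $collator->compare($textStyle, $emojiStyle);

// 0 means "equal at this strength"; a non-zero value means "different".
var_dump($atDefault, $atIdentical);
```

This only compares two complete strings. It does not return a position, which is why grapheme_strpos() with a locale is the better fit for substring search.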

Collator::setStrength reference:
https://www.php.net/manual/en/collator.setstrength.php

Specification reference:
https://www.unicode.org/reports/tr35/dev/tr35-collation.html#Setting_Options
