unicode-emoji

[Emoji 15.1] A small Ruby library which provides Unicode Emoji data and regexes, incorporating the latest Unicode and Emoji standards. Includes a categorized list of recommended Emoji.


Keywords
emoji, emoji-unicode, hacktoberfest, regex, ruby, sequence, unicode, unicode-data
License
MIT
Install
gem install unicode-emoji -v 3.4.0

Documentation

Unicode::Emoji [version] [ci]

Provides Unicode Emoji data and regexes, incorporating the latest Unicode and Emoji standards.

Also includes a categorized list of recommended Emoji.

Emoji version: 15.0 (September 2022)

CLDR version (used for sub-region flags): 43 (April 2023)

Supported Rubies: 3.2, 3.1, 3.0

No longer supported Rubies, but might still work: 2.7, 2.6, 2.5, 2.4, 2.3

If you are stuck on an older Ruby version, checkout the latest 0.9 version of this gem.

Gemfile

gem "unicode-emoji"

Usage

Regex

The gem includes a bunch of Emoji regexes, which are compiled out of various Emoji Unicode data sources.

require "unicode/emoji"

string = "String which contains all kinds of emoji:

- Singleton Emoji: ๐Ÿ˜ด
- Textual singleton Emoji with Emoji variation: โ–ถ๏ธ
- Emoji with skin tone modifier: ๐Ÿ›Œ๐Ÿฝ
- Region flag: ๐Ÿ‡ต๐Ÿ‡น
- Sub-Region flag: ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ
- Keycap sequence: 2๏ธโƒฃ
- Sequence using ZWJ (zero width joiner): ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ

"

string.scan(Unicode::Emoji::REGEX) # => ["๐Ÿ˜ด", "โ–ถ๏ธ", "๐Ÿ›Œ๐Ÿฝ", "๐Ÿ‡ต๐Ÿ‡น", "๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ", "2๏ธโƒฃ", "๐Ÿคพ๐Ÿฝโ€โ™€๏ธ"]

Main Regexes

Matches (non-textual) Emoji of all kinds:

Regex Description Example Matches Example Non-Matches
Unicode::Emoji::REGEX Use this if unsure! Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of recommended Emoji sequences ๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ ๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿป, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿค โ€๐Ÿคข
Unicode::Emoji::REGEX_VALID Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of valid Emoji sequences ๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿค โ€๐Ÿคข ๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿป, ๐Ÿ‡ต๐Ÿ‡ต
Unicode::Emoji::REGEX_WELL_FORMED Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji) and all kind of well-formed Emoji sequences ๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿค โ€๐Ÿคข, ๐Ÿ‡ต๐Ÿ‡ต ๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿป
Picking the Right Emoji Regex
  • Usually you just want REGEX (RGI set)
  • If you want broader matching (e.g. more sub-regions), choose REGEX_VALID
  • If you even want to match for invalid sequences, too, use REGEX_WELL_FORMED

Please see the standard for details.

Property REGEX (RGI / Recommended) REGEX_VALID (Valid) REGEX_WELL_FORMED (Well-formed)
Region "๐Ÿ‡ต๐Ÿ‡น" Yes Yes Yes
Region "๐Ÿ‡ต๐Ÿ‡ต" No No Yes
Tag Sequence "๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ" Yes Yes Yes
Tag Sequence "๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ" No Yes Yes
Tag Sequence "๐Ÿ˜ด๓ ง๓ ข๓ ก๓ ก๓ ก๓ ฟ" No No Yes
ZWJ Sequence "๐Ÿคพ๐Ÿฝโ€โ™€๏ธ" Yes Yes Yes
ZWJ Sequence "๐Ÿค โ€๐Ÿคข" No Yes Yes

More info about valid vs. recommended Emoji in this blog article on Emojipedia.

Singleton Regexes

Matches only simple one-codepoint (+ optional variation selector) Emoji:

Regex Description Example Matches Example Non-Matches
Unicode::Emoji::REGEX_BASIC Matches (non-textual) singleton Emoji (except for singleton components, like a skin tone modifier without base Emoji), but no sequences at all ๐Ÿ˜ด, โ–ถ๏ธ ๐Ÿ˜ด๏ธŽ, โ–ถ, ๐Ÿป, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, ๐Ÿ‡ต๐Ÿ‡ต,2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿค โ€๐Ÿคข
Unicode::Emoji::REGEX_TEXT Matches only textual singleton Emoji (except for singleton components, like digit 1) ๐Ÿ˜ด๏ธŽ, โ–ถ ๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿป, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, ๐Ÿ‡ต๐Ÿ‡ต,2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿค โ€๐Ÿคข

Include Textual Emoji

By default, textual Emoji (emoji characters with text variation selector or those that have a default text presentation) will not be included in the default regexes. However, if you wish to match for them too, you can include them in your regex by appending the _INCLUDE_TEXT suffix:

Regex Description Example Matches Example Non-Matches
Unicode::Emoji::REGEX_INCLUDE_TEXT REGEX + REGEX_TEXT ๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿ˜ด๏ธŽ, โ–ถ ๐Ÿป, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿค โ€๐Ÿคข
Unicode::Emoji::REGEX_VALID_INCLUDE_TEXT REGEX_VALID + REGEX_TEXT ๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿค โ€๐Ÿคข, ๐Ÿ˜ด๏ธŽ, โ–ถ ๐Ÿป, ๐Ÿ‡ต๐Ÿ‡ต
Unicode::Emoji::REGEX_WELL_FORMED_INCLUDE_TEXT REGEX_WELL_FORMED + REGEX_TEXT ๐Ÿ˜ด, โ–ถ๏ธ, ๐Ÿ›Œ๐Ÿฝ, ๐Ÿ‡ต๐Ÿ‡น, 2๏ธโƒฃ, ๐Ÿด๓ ง๓ ข๓ ณ๓ ฃ๓ ด๓ ฟ, ๐Ÿด๓ ง๓ ข๓ ก๓ ง๓ ข๓ ฟ, ๐Ÿคพ๐Ÿฝโ€โ™€๏ธ, ๐Ÿค โ€๐Ÿคข, ๐Ÿ‡ต๐Ÿ‡ต, ๐Ÿ˜ด๏ธŽ, โ–ถ ๐Ÿป

Extended Pictographic Regex

Unicode::Emoji::REGEX_PICTO matches single codepoints with the Extended_Pictographic property. For example, it will match โœ€ BLACK SAFETY SCISSORS.

Unicode::Emoji::REGEX_PICTO_NO_EMOJI matches single codepoints with the Extended_Pictographic property, but excludes Emoji characters.

See character.construction/picto for a list of all non-Emoji pictographic characters.

Partial Regexes

Matches potential Emoji parts (often, this is not what you want):

Regex Description Example Matches Example Non-Matches
Unicode::Emoji::REGEX_ANY Matches any Emoji-related codepoint (but no variation selectors, tags, or zero-width joiners). Please not that this will match Emoji-parts rather than complete Emoji, for example, single digits! ๐Ÿ˜ด, โ–ถ, ๐Ÿป, ๐Ÿ›Œ, ๐Ÿฝ, ๐Ÿ‡ต, ๐Ÿ‡น, 2, ๐Ÿด, ๐Ÿคพ, โ™€, ๐Ÿค , ๐Ÿคข -

List

Use Unicode::Emoji::LIST or the list method to get a grouped (and ordered) list of Emoji:

Unicode::Emoji.list.keys
# => ["Smileys & Emotion", "People & Body", "Component", "Animals & Nature", "Food & Drink", "Travel & Places", "Activities", "Objects", "Symbols", "Flags"]

Unicode::Emoji.list("Food & Drink").keys
# => ["food-fruit", "food-vegetable", "food-prepared", "food-asian", "food-marine", "food-sweet", "drink", "dishware"]

Unicode::Emoji.list("Food & Drink", "food-asian")
=> ["๐Ÿฑ", "๐Ÿ˜", "๐Ÿ™", "๐Ÿš", "๐Ÿ›", "๐Ÿœ", "๐Ÿ", "๐Ÿ ", "๐Ÿข", "๐Ÿฃ", "๐Ÿค", "๐Ÿฅ", "๐Ÿฅฎ", "๐Ÿก", "๐ŸฅŸ", "๐Ÿฅ ", "๐Ÿฅก"]

Please note that categories might change with future versions of the Emoji standard. This gem will issue warnings when attempting to retrieve old categories using the #list method.

A list of all Emoji can be found at character.construction.

Properties

Allows you to access the codepoint data form Unicode's emoji-data.txt file:

require "unicode/emoji"

Unicode::Emoji.properties "โ˜" # => ["Emoji", "Emoji_Modifier_Base"]

Also See

MIT