twitter-text
...This code is used at Twitter to tokenize and parse text so that it meets the platform's expectations for what can be posted. This repository is a collection of libraries and conformance tests that standardize the parsing of Tweet text. It coordinates development, testing, issues, and pull requests across twitter-text's implementations and specification. These libraries count the characters in a Tweet and identify and link any URL, @username, #hashtag, or $cashtag. Emoji supported by twemoji always count as two characters, regardless of combining modifiers. ...
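To make the entity identification described above concrete, here is a minimal Python sketch. It is not twitter-text's API or its actual grammar: the regular expressions, the `ENTITY_PATTERNS` table, and the `extract_entities` function are all hypothetical and heavily simplified (ASCII only, no TLD validation, no Unicode hashtags). The real rules are defined by the repository's conformance tests.

```python
import re

# Simplified, illustrative patterns. These are assumptions for this sketch,
# NOT the patterns used by twitter-text.
ENTITY_PATTERNS = {
    "url": re.compile(r"https?://[^\s]+"),
    "mention": re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})"),
    "hashtag": re.compile(r"(?<![A-Za-z0-9_&])#([A-Za-z0-9_]*[A-Za-z][A-Za-z0-9_]*)"),
    "cashtag": re.compile(r"(?<![A-Za-z0-9_])\$([A-Za-z]{1,6})(?![A-Za-z0-9_])"),
}


def extract_entities(text):
    """Return (kind, value, start, end) tuples sorted by start offset."""
    found = []
    url_spans = []

    # Collect URLs first so symbols inside them are not reported separately.
    for m in ENTITY_PATTERNS["url"].finditer(text):
        found.append(("url", m.group(0), m.start(), m.end()))
        url_spans.append((m.start(), m.end()))

    for kind in ("mention", "hashtag", "cashtag"):
        for m in ENTITY_PATTERNS[kind].finditer(text):
            # Skip matches that begin inside a URL (for example "#top" in a fragment).
            if any(start <= m.start() < end for start, end in url_spans):
                continue
            found.append((kind, m.group(1), m.start(), m.end()))

    return sorted(found, key=lambda entity: entity[2])


if __name__ == "__main__":
    sample = "Hi @jack, see https://example.com/#top about #python and $TWTR"
    for entity in extract_entities(sample):
        print(entity)
```

Returning character offsets alongside each value is a deliberate choice in this sketch: a linking step can then replace each span in place without searching the text a second time.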