Update hashtag regex

If you stare into the abyss long enough, you find that Twitter's hashtag regex is exceedingly complex because it tries to catch edge cases, such as forgetting to put a space after a sentence's closing punctuation: e.g., "hey guys this is cool.#awesome"

That's crazy. But you can find more details about it here: https://github.com/twitter/twitter-text/

The reality is that Twitter's hashtags have evolved over the years from a base ruleset to include some Unicode characters and other things to be friendly to non-English-speaking users. But it's really, really complicated. If we go back to Old School Twitter rules, or "what English-speaking users encounter", the rules are pretty simple:

  • Letters, digits, and underscores are allowed (no dashes or other punctuation!)
  • The hashtag must start with a letter

This regex change should match these expectations. This will ensure that #100 doesn't get recognized as a hashtag, but #go100 will.
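
As a rough illustration of the two rules above, here is a minimal Python sketch. It is not the pattern changed in this merge request; the names `HASHTAG_RE` and `find_hashtags` are hypothetical, and the sketch assumes ASCII letters only and does not address what may precede the `#`.

```python
import re

# Hypothetical pattern for illustration only, not the project's actual regex.
# Rule 1: the first character after '#' must be an ASCII letter.
# Rule 2: the remaining characters may be ASCII letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#([A-Za-z][A-Za-z0-9_]*)")


def find_hashtags(text):
    """Return the hashtag names (without the leading '#') found in text."""
    return HASHTAG_RE.findall(text)
```

Under these assumptions, `find_hashtags("#100")` finds nothing, while `find_hashtags("#go100")` finds `go100`. A dash ends the tag, so `#go-100` yields only `go`.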

This fixes #1 (closed)

Edited by feld
