
Update hashtag regex

feld requested to merge feld/auto_linker:fix/hashtags into master

If you stare into the abyss long enough, you find that Twitter's hashtag regex is exceedingly complex because it tries to catch edge cases like forgetting to put a space after a sentence's closing punctuation, e.g., "hey guys this is cool.#awesome"

That's crazy. But you can find more details about it here: https://github.com/twitter/twitter-text/

The reality is that Twitter's hashtags have evolved over the years from a base ruleset to include some Unicode characters and other things to be friendly to non-English-speaking users. But it's really, really complicated. If we go back to Old School Twitter rules, or "what English-speaking users encounter", the rules are pretty simple:

  • Alpha characters, numeric characters, and underscores are allowed (no dashes or other punctuation!)
  • The hashtag must start with an Alpha character

This regex change should match these expectations: #100 will not be recognized as a hashtag, but #go100 will.
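
For illustration only, here is a minimal Python sketch of the two rules above. It is not the pattern from this merge request's diff: the names `HASHTAG_RE` and `find_hashtags` are hypothetical, and it assumes "Alpha" means ASCII letters, per the "what English-speaking users encounter" framing.

```python
import re

# Hypothetical pattern, NOT the one from this merge request's diff.
# Assumption: "Alpha" means ASCII letters only.
#   #               literal hash sign
#   [A-Za-z]        first character must be a letter
#   [A-Za-z0-9_]*   then any run of letters, digits, or underscores
HASHTAG_RE = re.compile(r"#[A-Za-z][A-Za-z0-9_]*")


def find_hashtags(text):
    """Return every substring of text that matches the simple ruleset."""
    return HASHTAG_RE.findall(text)


if __name__ == "__main__":
    print(find_hashtags("#100 is not a hashtag, but #go100 is"))
```

Because dashes are not allowed, a tag in this sketch simply ends at the first dash or other punctuation character, so "#with-dash" would yield "#with".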

This fixes #1 (closed)

Edited by feld
