
Update hashtag regex

Closed feld requested to merge feld/auto_linker:fix/hashtags into master

If you stare into the abyss long enough, you'll find that Twitter's hashtag regex is exceedingly complex because it tries to catch edge cases, such as a missing space after a sentence's closing punctuation: e.g., "hey guys this is cool.#awesome"

That's crazy. But you can find more details about it here: https://github.com/twitter/twitter-text/

The reality is that Twitter's hashtags have evolved over the years from a base ruleset to include some Unicode characters and other things to be friendly to non-English-speaking users. But it's really, really complicated. If we go back to Old School Twitter rules, or "what English-speaking users encounter", the rules are pretty simple:

  • Alphabetic characters, digits, and underscores are allowed (no dashes or other punctuation!)
  • The hashtag must start with an Alpha character

This regex change should match these expectations. It ensures that #100 isn't recognized as a hashtag, but #go100 is.
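
For illustration, the two rules above can be sketched as a regular expression. This is a minimal Python sketch, not the pattern from this merge request: the names `HASHTAG_RE` and `find_hashtags` are hypothetical, and it assumes "Alpha" means ASCII letters only.

```python
import re

# Hypothetical pattern illustrating the two rules, assuming ASCII-only input:
#   1. the first character after '#' must be a letter
#   2. the remaining characters may be letters, digits, or underscores
HASHTAG_RE = re.compile(r"#([A-Za-z][A-Za-z0-9_]*)")


def find_hashtags(text):
    """Return the hashtag names (without the leading '#') found in text."""
    return HASHTAG_RE.findall(text)


print(find_hashtags("#100 is not a tag, but #go100 is"))  # ['go100']
print(find_hashtags("dashes end the tag: #foo-bar"))      # ['foo']
```

Note that this sketch does not check what comes before the `#`, so it would also pick up `#awesome` in "cool.#awesome". Whether that is desirable is a separate decision from the two rules listed above.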

This fixes #1 (closed)

Edited by feld
