
Enhance parens stripping logic

feld requested to merge fix/parens-stripping into master

#27 (closed)

I didn't feel like creating a ton of functions at the moment, so rather than matching on custom return codes, I'm hacking it onto true/false and inverting the logic where necessary to get what I want.

It's bad, but it works, and it will be easy to clean up incrementally.

Now it's split into functions so it's both clean AND confusing! 😹🎉

Rules:

  • During checks, always use the buffer with the leading ( stripped, as it can't be part of a valid URL anyway
  • Short circuit: if no trailing ) exists, only strip the leading (
  • If the buffer is a valid email address once the trailing ) is stripped, strip the trailing ) and return
  • If the buffer is a valid URL once the trailing ) is stripped, continue checks; else just return
  • If query parameters are detected, strip the trailing ), as a ) at the end of the query params should have been encoded as %29 anyway [1]
  • If there is a / in the valid URL, the trailing ) could be part of the URL, so continue checks; else strip both
  • If there is at least one ( in the URI.path, continue checks; else assume the ) is not part of the URL and strip both [2]
  • If we have an equal count of ( and ) chars with the leading ( already stripped, we can be confident they are intentional, so we only strip the leading (; else strip both as a last resort [3]

[1] https://foo.com/bar/baz?q=ran0mch@r$) feels quite improbable

[2] https://blog.soykaf.com/post/encryption/) is extremely unlikely

[3] https://en.wikipedia.org/(fake_path)/wiki/Frame_(networking) has balanced parens, so high confidence
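Read top to bottom, the rules form a single cascade of early returns. The following is a minimal Python sketch of that cascade, not the merge request's actual code. The names `strip_parens`, `is_email` and `_split` are hypothetical, and so are the validity checks: a simple regex stands in for the email validator, and `urllib.parse.urlsplit` with an http(s) scheme and a dotted host stands in for the URL validator (bare domains get `http://` prepended before parsing). It also assumes "else just return" in the URL rule means returning the buffer untouched. Writing each rule as an early return sidesteps the inverted true/false logic mentioned above.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-in for a real email validator: local@domain.tld with
# no slashes, parens, or whitespace anywhere.
_EMAIL_RE = re.compile(r"[^@\s/()]+@[^@\s/()]+\.[A-Za-z]{2,}")


def is_email(text):
    return _EMAIL_RE.fullmatch(text) is not None


def _split(text):
    """Parse text as a URL (assuming http:// if no scheme); None if invalid."""
    candidate = text if "://" in text else "http://" + text
    try:
        parts = urlsplit(candidate)
        host = parts.hostname
    except ValueError:  # e.g. an unbalanced [ in the host part
        return None
    # Hypothetical validity rule: http(s) scheme plus a dotted host name.
    if parts.scheme in ("http", "https") and host and "." in host:
        return parts
    return None


def strip_parens(buffer):
    # A leading ( can't be part of a URL, so every check runs without it.
    work = buffer[1:] if buffer.startswith("(") else buffer

    # Short circuit: no trailing ), so only the leading ( is stripped.
    if not work.endswith(")"):
        return work

    trimmed = work[:-1]  # the buffer with both parens stripped

    # Valid email once the ) is gone: the ) was just punctuation.
    if is_email(trimmed):
        return trimmed

    # Not a valid URL even without the ): return the buffer untouched
    # (assumed meaning of "else just return").
    parts = _split(trimmed)
    if parts is None:
        return buffer

    # A ) ending the query params should have been encoded as %29.
    if parts.query:
        return trimmed

    # No / anywhere, so there is no path the ) could belong to.
    if "/" not in trimmed:
        return trimmed

    # No ( in the path, so the ) is not part of the URL.
    if "(" not in parts.path:
        return trimmed

    # Balanced parens (leading ( already stripped) look intentional.
    if work.count("(") == work.count(")"):
        return work
    return trimmed  # last resort: strip both


if __name__ == "__main__":
    print(strip_parens("https://foo.com/bar/baz?q=ran0mch@r$)"))
    # https://foo.com/bar/baz?q=ran0mch@r$
    print(strip_parens("https://blog.soykaf.com/post/encryption/)"))
    # https://blog.soykaf.com/post/encryption/
    print(strip_parens("(https://en.wikipedia.org/(fake_path)/wiki/Frame_(networking)"))
    # https://en.wikipedia.org/(fake_path)/wiki/Frame_(networking)
    print(strip_parens("(https://en.wikipedia.org/wiki/Frame_(networking))"))
    # https://en.wikipedia.org/wiki/Frame_(networking)
```

The first three calls use the footnote examples, with a leading ( added to [3]. The last one shows the last-resort branch: after the leading ( is stripped there is one ( but two ), so both outer parens are removed.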

Edited by feld
