URL-catching regexp

Posted Jan 13, 2014

By Yossi Ittach


Not exactly a brilliant piece of engineering, but it's a useful dirty hack. If you want to strip URLs out of a string, you can match them with these regexps:

For http:// and https:// URLs:

```
// Remove http:// and https:// URLs; the bracketed set lists URL-legal characters
text = text.replaceAll(/(https|http):\/\/[a-zA-Z0-9\-\._~:\/\?#\[\]@!\u0024&'\(\)\*\+,;=]+/, "")
```

For the www. kind:

```
// Remove URLs that start with www.
text = text.replaceAll(/www\.[a-zA-Z0-9\-\._~:\/\?#\[\]@!\u0024&'\(\)\*\+,;=]+/, "")
```
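If the snippets above are Groovy (the /.../ slashy-string literals suggest so, though the post doesn't say), replaceAll hands the pattern to Java's java.util.regex engine, so the same character set can be tried from plain Java. The sketch below is a minimal, hypothetical UrlStripper class (the class and method names are made up for illustration) that applies both patterns, the http one first so that an http://www... URL is removed whole instead of leaving "http://" behind.

```java
import java.util.regex.Pattern;

public class UrlStripper {
    // Same character set as the post's regexps: RFC 3986 unreserved characters
    // (letters, digits, - . _ ~) plus the reserved delimiters.
    private static final String URL_CHARS = "[a-zA-Z0-9\\-._~:/?#\\[\\]@!$&'()*+,;=]+";

    // http:// and https:// URLs
    private static final Pattern HTTP_URL = Pattern.compile("(https|http)://" + URL_CHARS);
    // URLs written without a scheme, starting at "www."
    private static final Pattern WWW_URL = Pattern.compile("www\\." + URL_CHARS);

    // Removes both kinds of URL. The http pattern runs first, so "http://www.x.com"
    // is consumed entirely before the www pattern gets a chance to match part of it.
    public static String strip(String text) {
        String withoutHttp = HTTP_URL.matcher(text).replaceAll("");
        return WWW_URL.matcher(withoutHttp).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "docs at https://example.com/a?b=1 or www.example.org/x today";
        // prints [docs at  or  today]
        System.out.println("[" + strip(input) + "]");
    }
}
```

Only the URL itself is removed, so the spaces on either side of it stay behind, which is why the output has doubled spaces; collapse whitespace afterwards if that matters.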

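The main point here is that the bracketed set is RFC 3986's unreserved characters (letters, digits, - . _ ~) plus its reserved delimiters, so once the pattern finds the http:// or www. prefix, it keeps matching until it reaches a character that can't appear in a URL, such as a space or a double quote. That doesn't make every match a valid URL: a trailing full stop gets swallowed, and since % is missing from the set, a percent-encoded URL is cut off at its first %. It also doesn't catch bare domains like io.com, which have neither prefix.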
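To see where a match starts and stops, here is a small Java sketch that lists what the http(s) pattern would remove from a few edge cases. It assumes, as above, that the post's regexps run on java.util.regex; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlMatchBoundaries {
    // The post's http(s) pattern: a scheme prefix followed by a run of URL-legal characters.
    private static final Pattern HTTP_URL =
            Pattern.compile("(https|http)://[a-zA-Z0-9\\-._~:/?#\\[\\]@!$&'()*+,;=]+");

    // Returns every substring the pattern would remove from the text, in order.
    public static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = HTTP_URL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Stops at the space and at the double quote, neither of which is URL-legal.
        System.out.println(matches("a http://x.com/p b \"https://y.org\" c")); // [http://x.com/p, https://y.org]
        // A trailing full stop is URL-legal, so it is swallowed.
        System.out.println(matches("See http://x.com."));                     // [http://x.com.]
        // '%' is not in the set, so percent-encoded URLs are cut short.
        System.out.println(matches("http://x.com/a%20b"));                    // [http://x.com/a]
        // No scheme, no match.
        System.out.println(matches("io.com"));                                // []
    }
}
```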

This post is licensed under CC BY 4.0 by the author.
