Web Hosting Talk







Can anyone help me with this regular expression problem?


monkey junkie
07-03-2007, 07:54 PM
Hello

If anyone can help me with this I would greatly appreciate it.

I have a web form which can take a URL in three formats -

http://www.domain.com/Profile.jsp?MID=367137231&MemberId=500306117
http://user.domain.com/
http://www.domain.com/user

I would like to be able to check that the URL was entered in one of the above formats. The regular expression should be forgiving: for example, it should still match if the http:// or http://www. prefix was not entered, and it should accept both http://user.domain.com/ and http://user.domain.com (note the lack of a trailing slash).

Do any of you know how to do this?

Maybe it's very difficult...

Thanks for reading, and thanks in advance to anyone who can provide assistance.

Cheers.

foobic
07-03-2007, 08:47 PM
The way to approach it would be to split the URL into components (e.g. parse_url if you're working in PHP) and then validate each component as you see fit: you might accept only http, check for a valid domain, and use a regexp on the path and query string.
HTH
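
A minimal sketch of that split-and-validate approach, written in Python rather than PHP so it lines up with the examples further down the thread. The function name is_acceptable_url, the allowed user name characters, and the rule of at most one subdomain label are assumptions made for illustration, not something the original poster specified.

```python
import re
from urllib.parse import urlsplit, parse_qs

BASE_DOMAIN = "domain.com"  # placeholder domain from the question
# Assumption: at most one subdomain label (www, or a user name) before the domain.
HOST_RE = re.compile(r"(?:[a-z0-9-]+\.)?" + re.escape(BASE_DOMAIN))
# Assumption: user names consist of letters, digits, "_" and "-".
USER_PATH_RE = re.compile(r"/[A-Za-z0-9_-]*")
DIGITS_RE = re.compile(r"[0-9]+")


def is_acceptable_url(url):
    """Return True if url looks like one of the three formats in the question."""
    url = url.strip()
    # Be forgiving: urlsplit only finds the host if a scheme is present.
    if "://" not in url:
        url = "http://" + url
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    if parts.scheme not in ("http", "https"):
        return False
    if not HOST_RE.fullmatch((parts.hostname or "").lower()):
        return False
    if parts.path == "/Profile.jsp":
        # Both IDs must be present exactly once and be purely numeric.
        query = parse_qs(parts.query)
        return all(
            len(query.get(key, [])) == 1 and DIGITS_RE.fullmatch(query[key][0])
            for key in ("MID", "MemberId")
        )
    # Otherwise accept an empty path, "/", or "/user", with no query string.
    return parts.query == "" and (
        parts.path == "" or USER_PATH_RE.fullmatch(parts.path) is not None
    )
```

Each component gets its own small check, so loosening one rule (say, allowing more characters in user names) does not mean rewriting one large pattern.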

mwatkins
07-03-2007, 09:37 PM
Splitting it up might make sense; it depends on how many different combinations are valid for a particular end point. Otherwise, you may find that you need more than one regex (likely), or a bunch of tests if parsing the URL.

Take the long one:

http://www.domain.com/Profile.jsp?MID=367137231&MemberId=500306117

If the only combos you are likely to run into are with or without www, then a regex like this (Python, but similar in other languages) will match two of the four URLs below (an interactive Python session, great for testing):

->> import re
->> urls = ['http://www.domain.com/Profile.jsp?MID=367137231&MemberId=500306117',
'http://user.domain.com/', 'http://www.domain.com/user',
'http://domain.com/Profile.jsp?MID=367137231&MemberId=500306117']
/>> for url in urls:
|..     match = re.match(r'^(http(s|)://(www\.|)domain\.com/Profile\.jsp\?MID=[0-9]+&MemberId=[0-9]+$)', url)
|..     if match:
|..         print('%s\n%s\n\n' % (url, match.groups()[0]))
\__
http://www.domain.com/Profile.jsp?MID=367137231&MemberId=500306117
http://www.domain.com/Profile.jsp?MID=367137231&MemberId=500306117


http://domain.com/Profile.jsp?MID=367137231&MemberId=500306117
http://domain.com/Profile.jsp?MID=367137231&MemberId=500306117


The other two URL types can be matched with one regex. Have a stab at it...

mwatkins
07-04-2007, 10:35 AM
OK, let's add a couple of other URI combinations and change the regex a little:

->> urls = ['http://www.domain.com/Profile.jsp?MID=367137231&MemberId=500306117',
'http://user.domain.com/', 'http://www.domain.com/user',
'http://domain.com/Profile.jsp?MID=367137231&MemberId=500306117',
'http://domain.com/', 'http://domain.com/user', 'http://domain.com']
/>> for url in urls:
|..     match = re.match(r'^(http(s|)://(www\.|user\.|)domain\.com(/|/user|)$)', url)
|..     if match:
|..         print('%s\n%s\n\n' % (url, match.groups()[0]))
\__
http://user.domain.com/
http://user.domain.com/

http://www.domain.com/user
http://www.domain.com/user

http://domain.com/
http://domain.com/

http://domain.com/user
http://domain.com/user

http://domain.com
http://domain.com
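
The regex above hard-codes the literal word "user" and still requires the http:// prefix. A possible generalization, offered only as a sketch: it makes the scheme optional, accepts any user name, and covers all three formats from the original question in one pattern. The names NAME, URL_RE and matches are made up for illustration, and the set of characters allowed in a user name is an assumption.

```python
import re

# Assumption: user names consist of letters, digits, "_" and "-".
NAME = r"[A-Za-z0-9_-]+"

SCHEME = r"(?:https?://)?"  # forgiving: http:// or https:// may be omitted
PROFILE = r"(?:www\.)?domain\.com/Profile\.jsp\?MID=[0-9]+&MemberId=[0-9]+"
PATH_FORM = r"(?:www\.)?domain\.com/" + NAME + r"/?"
# The lookahead stops "www" from being treated as a user name.
SUBDOMAIN_FORM = r"(?!www\.)" + NAME + r"\.domain\.com/?"

URL_RE = re.compile(
    SCHEME + "(?:" + "|".join([PROFILE, PATH_FORM, SUBDOMAIN_FORM]) + ")"
)


def matches(url):
    """Return True if the whole string is one of the three accepted formats."""
    return URL_RE.fullmatch(url.strip()) is not None


for url in ["user.domain.com", "www.domain.com/user", "http://domain.com"]:
    print("%s -> %s" % (url, matches(url)))
```

Unlike the session above, this sketch rejects a bare http://domain.com or http://www.domain.com/, since neither names a user. Whether that is the right call depends on what the form should accept.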

monkey junkie
07-05-2007, 06:51 AM
mwatkins, you're great :) Thank you