
Stripping domain from URL, what do you think is more solid?
yegorpb 02-13-2007, 03:13 PM I have 2 methods that I came up with. I personally like the first, since it's much simpler and works with all TLD extensions, but I might not be seeing all sides of this.
function getDomainName($full_url) {
    $ref = parse_url($full_url);
    // parse_url() returns false on malformed input and omits 'host' when none is found
    if (is_array($ref) && !empty($ref['host'])) {
        $host = $ref['host'];
        // check for www
        if (substr($host, 0, 4) == "www.") {
            $host = substr($host, 4);
        }
    } else {
        $host = "BAD_URL";
    }
    // Output filtered host
    return $host;
}
vs.
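function getDomainName($url) {
    $url = strtolower($url);
    // drop the scheme (only http:// is handled)
    if (substr($url, 0, 7) == 'http://')
        $url = substr($url, 7);
    // cut everything from the first slash onward
    if (strpos($url, '/') !== false)
        $url = substr($url, 0, strpos($url, '/'));
    // keep only the last two dot-separated labels
    while (strpos($url, '.') != strrpos($url, '.'))
        $url = substr($url, strpos($url, '.')+1);
    return $url;
}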
Xenatino 02-13-2007, 05:38 PM EDIT: Sorry, misread original post
Engelmacher 02-13-2007, 06:37 PM URLs don't have to have "www" in them. Your code dies if you feed it something like "http://slurp.junk.co.jp".
yegorpb 02-13-2007, 09:15 PM My code doesn't need a www. It strips the www if it's there. The 2nd piece of code only works with normal TLDs like .com/.net and doesn't work with 2-part TLDs like co.uk.
First one should work with everything.
Engelmacher 02-13-2007, 09:47 PM
> My code doesn't need a www. It strips the www if it's there. The 2nd piece of code only works with normal TLDs like .com/.net and doesn't work with 2-part TLDs like co.uk.
> First one should work with everything.
Test it yourself. It clearly returns "BAD_URL" if it doesn't find "www." in the array.
yegorpb 02-13-2007, 10:35 PM
> Test it yourself. It clearly returns "BAD_URL" if it doesn't find "www." in the array.
No, it doesn't. It returns it if it's not a valid URL.
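A disagreement like this can be settled by running the function on a few inputs. The sketch below uses a lightly adapted version of the first function from the opening post. The name getDomainNameViaParseUrl and the list of sample inputs are assumptions made for illustration, not something posted in the thread. Note that parse_url() treats a string without a scheme as a path, so the last input has no host component.

```php
<?php
// Adapted from the first function in the opening post.
// The function name is hypothetical, chosen so it does not clash with the originals.
function getDomainNameViaParseUrl($full_url)
{
    $ref = parse_url($full_url);
    if (is_array($ref) && !empty($ref['host'])) {
        $host = $ref['host'];
        // strip a leading "www." if present
        if (substr($host, 0, 4) == "www.") {
            $host = substr($host, 4);
        }
    } else {
        // no host component was found
        $host = "BAD_URL";
    }
    return $host;
}

// Sample inputs (assumed for this sketch): with "www.", without "www.", without a scheme.
$samples = array('http://www.php.net/index.html', 'http://slurp.junk.co.jp', 'slurp.junk.co.jp');
foreach ($samples as $input) {
    echo $input, ' => ', getDomainNameViaParseUrl($input), "\n";
}
```

With a scheme present, a host without "www." is returned unchanged. The "BAD_URL" branch is reached only when parse_url() finds no host at all, which includes input typed without "http://".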
Teh_Winnar 02-13-2007, 11:54 PM I personally like this
<?php
// get host name from URL
preg_match('@^(?:http://)?([^/]+)@i',
"http://www.php.net/index.html", $matches);
$host = $matches[1];
// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
plumsauce 02-14-2007, 05:35 AM The regex is pretty good, and I like the "until slash" part, but it does not check whether the host contains any periods. For example, it would also match http://localhost. It will also match http://192.168.0.1
The second part will fall apart if you have country-code TLDs. For example, blah.co.uk comes back as co.uk.
This is fine, until someone fat-fingers something that gets past the regex. Users will do the most insane things to data input screens. And then there are my own coding errors :)
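The points above (dotless hosts, IP addresses, two-part country suffixes) could be handled roughly as follows. This is only a sketch: the function name registrableDomain and the three-entry suffix list are hypothetical and purely illustrative. A real deployment would likely need a complete list such as the Public Suffix List. filter_var() with FILTER_VALIDATE_IP is standard PHP (5.2 and later).

```php
<?php
// Hypothetical helper, not from the thread. The default suffix list is a tiny sample.
function registrableDomain($host, array $multiPartSuffixes = array('co.uk', 'co.jp', 'com.au'))
{
    $host = strtolower(rtrim($host, '.'));

    // Reject IP addresses and dotless names such as "localhost".
    if (filter_var($host, FILTER_VALIDATE_IP) !== false || strpos($host, '.') === false) {
        return null;
    }

    $labels = explode('.', $host);

    // Default: keep the last two labels (example.com).
    $keep = 2;

    // If the host ends in a known two-part suffix, keep three labels (example.co.uk).
    $lastTwo = implode('.', array_slice($labels, -2));
    if (in_array($lastTwo, $multiPartSuffixes, true)) {
        $keep = 3;
    }

    // A host that is itself just a suffix ("co.uk") has no registrable part.
    if (count($labels) < $keep) {
        return null;
    }

    return implode('.', array_slice($labels, -$keep));
}
```

For example, registrableDomain('blah.co.uk') keeps all three labels, while registrableDomain('192.168.0.1') and registrableDomain('localhost') both return null, so the caller can treat them as invalid input.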
Engelmacher 02-14-2007, 06:25 AM
> No, it doesn't. It returns it if it's not a valid URL.
Well then I guess the PHP installations on my test machines must be horribly broken because that's what it does on every single one of them. I don't know what else to say other than "don't quit your day job".
brendandonhu 02-14-2007, 03:08 PM Why don't you just use parse_url()?
yegorpb 02-14-2007, 03:26 PM
> Well then I guess the PHP installations on my test machines must be horribly broken because that's what it does on every single one of them. I don't know what else to say other than "don't quit your day job".
Right back at you, buddy. I also suggest you get a pair of eyeglasses. :rolleyes: There is a } you are not seeing.
> Why don't you just use parse_url()?
I did, in the first example... I just wanted to see if it's sound code.
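For reference, parse_url() can also return just the host when given the PHP_URL_HOST component constant (available since PHP 5.1.2), which avoids indexing into the result array. A minimal sketch follows. The wrapper name hostFromUrl is hypothetical, and lowercasing the result is a choice made here for illustration.

```php
<?php
// Hypothetical wrapper name; parse_url() and PHP_URL_HOST are standard PHP.
function hostFromUrl($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() gives null when there is no host component and false on malformed URLs.
    if (!is_string($host) || $host === '') {
        return null;
    }
    return strtolower($host);
}
```

Called as hostFromUrl('http://www.php.net/index.html'), this yields the full host including "www.". Stripping the prefix or reducing the host to a domain is left to the caller.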
Xenatino 02-14-2007, 08:11 PM
>> Well then I guess the PHP installations on my test machines must be horribly broken because that's what it does on every single one of them. I don't know what else to say other than "don't quit your day job".
> Right back at you, buddy. I also suggest you get a pair of eyeglasses. There is a } you are not seeing.
This might shed some light in a test situation: http://www.1921681100.com/public/623.php