Web Hosting Talk

View Full Version : Detect if url is a redirect or offline?


lexington
02-27-2008, 05:29 PM
Hello, is there some code I could use to check a url that a user enters into a field to see if that url is a direct to another site? Also, another error check to see if the site is online? Either one that you could help me with is fine. If you could post a working example that would be great. Thanks!

azizny
02-27-2008, 09:45 PM
What do you mean "url is a direct to another site"? You mean if it redirects to another website using header redirect?

Peace,

lexington
02-27-2008, 09:50 PM
Hmm, I suppose those sites are using a header redirect. I am referring to when you visit a URL and it redirects to another URL. There should be a way for a script to detect that by comparing the first URL to the changed URL or something?

azizny
02-27-2008, 11:22 PM
There are two steps:

Step one: fetch the redirect URL from the original URL



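// $url = "www.site.com" (host name only, no "http://")
// $getvar = "/page?urlid=2"
function get_url($url, $getvar) {
    $fp = fsockopen($url, 80, $errno, $errstr, 30);
    if (!$fp) {
        return NULL; // could not connect: the host is offline or does not resolve
    }

    // "Connection: close" makes the server close the socket after replying,
    // so the feof() loop below ends instead of waiting on a keep-alive connection
    $query  = "GET $getvar HTTP/1.1\r\n";
    $query .= "Host: $url\r\n";
    $query .= "Connection: close\r\n\r\n";
    fputs($fp, $query);

    $buf = ''; // initialize before appending
    while (!feof($fp)) {
        $buf .= fgets($fp, 128);
    }
    fclose($fp);

    // $url is reused for the match array: [0] is the whole header line, [1] is the target URL
    preg_match('/^Location:[ \t]*(.+?)\r?$/mi', $buf, $url);
    print_r($url);
    return $url;
}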


Step two: use fopen to check whether the URL exists.
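
For step two, here is a minimal sketch of an "is it online" check. It assumes allow_url_fopen is enabled and uses PHP's get_headers() instead of fopen(), so only the headers are inspected. The function names status_code_from_line and url_is_online are made up for illustration, and treating any 2xx or 3xx status as "online" is an assumption you may want to change.

```php
<?php
// Extract the numeric status code from a status line such as "HTTP/1.1 200 OK".
// Returns 0 when the line is not a valid HTTP status line.
function status_code_from_line($statusLine) {
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Hypothetical helper: true when the URL answers with a 2xx or 3xx status.
function url_is_online($url) {
    // get_headers() returns false when the host cannot be reached
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return false;
    }
    // $headers[0] is the status line of the first response
    $code = status_code_from_line($headers[0]);
    return $code >= 200 && $code < 400;
}
```

You would call it as url_is_online("http://www.site.com/"), with the full scheme included.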

Peace,

lexington
02-27-2008, 11:48 PM
Meaning fopen(get_url($url));

?

azizny
02-28-2008, 11:50 AM
Depends on what the function is returning, which is why I used print_r($url). The captured URL should be in $url[1] ($url[0] holds the whole matched "Location: ..." line); if you return that, you can pass the result of get_url() to fopen().
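
To make the return value less ambiguous, the header parsing can be separated from the socket code. This is only a sketch: extract_location is a hypothetical name, and it assumes you already have the raw response text (headers plus body) in a string.

```php
<?php
// Hypothetical helper: return the Location header value from a raw HTTP
// response, or NULL when the response headers contain no Location header.
function extract_location($rawResponse) {
    // Only look at the header section, which ends at the first blank line
    $parts = preg_split('/\r?\n\r?\n/', $rawResponse, 2);
    $headers = $parts[0];

    // Header names are case-insensitive; trim spaces, tabs and the trailing \r
    if (preg_match('/^Location:[ \t]*(.*?)[ \t]*\r?$/mi', $headers, $m)) {
        return $m[1];
    }
    return NULL;
}
```

With this, a redirect check becomes extract_location($buf) !== NULL, and the returned string is the target URL itself rather than a match array.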

You are not trying to scrape a scripts directory. Are you?

Peace,

lexington
02-28-2008, 08:36 PM
Not sure what that means. I am adding this to my own site: some people submit URLs that look normal but then redirect to a spam site, so I want a check that allows a URL only if it stays the same after the request.

azizny
02-28-2008, 09:21 PM
What if a site doing a header redirect is actually fine (e.g. the site moved)?
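
One possible way to tell the two cases apart, as a sketch only: allow a redirect when it stays on the same host (for example http to https, or adding "www."), and flag it when it leaves the site. The function names and the "same host is fine" policy are assumptions for illustration, not something anyone in this thread has settled on.

```php
<?php
// Normalize a host name for comparison: lower-case it and drop a leading "www."
function normalize_host($host) {
    $host = strtolower($host);
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    return $host;
}

// Hypothetical policy: a redirect is acceptable when it stays on the same host.
function redirect_stays_on_site($originalUrl, $targetUrl) {
    $from = parse_url($originalUrl, PHP_URL_HOST);
    $to   = parse_url($targetUrl, PHP_URL_HOST);
    if (!$from || $to === false) {
        return false; // missing host in the original, or a malformed target
    }
    if ($to === null) {
        return true;  // relative Location such as "/new-page" stays on the same host
    }
    return normalize_host($from) === normalize_host($to);
}
```

A site that moved to a completely new domain would still be flagged by this check, so it narrows the problem rather than solving it.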

Peace,

Xeentech
02-29-2008, 04:28 PM
Could we please avoid reimplementing an incomplete version of HTTP every time someone has a question like this? There are many, many HTTP implementations available to reuse.

Here's an example using cURL that I made in about a second.

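<?php

$ch = curl_init("http://domain.tld/");
curl_exec($ch);

// Treat any 3xx status as a redirect (301, 302, 303, 307, 308), not only 302
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($code >= 300 && $code < 400) {
    print("URL was an HTTP redirect\n");
}

?>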

lexington
02-29-2008, 04:40 PM
Thanks I will try that :)

lexington
02-29-2008, 06:31 PM
This works, but when the URL is not a redirect it displays the website on my own page. Is there a way to prevent that? If the site is a redirect, it displays the printed message, which works fine.

EDIT

I believe it is the curl_exec function that is displaying the other site on my page.

Xeentech
02-29-2008, 08:29 PM
I guess you could write the output to null like this:

$fp = fopen("/dev/null", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
fclose($fp);

lexington
02-29-2008, 09:25 PM
Is that the full code to use? Because I do not see where it checks the site url in your new code.
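
For reference, here is a sketch of how the pieces from this thread might fit together in one place. It is not Xeentech's exact code: check_url and classify_status are hypothetical names, and it uses CURLOPT_RETURNTRANSFER and CURLOPT_NOBODY instead of writing to /dev/null, which is an assumption about what is wanted here. Some servers answer HEAD requests differently from GET, so treat the result as a hint.

```php
<?php
// Classify an HTTP status code; 0 means cURL got no response at all.
function classify_status($code) {
    if ($code === 0) {
        return 'offline';
    }
    if ($code >= 300 && $code < 400) {
        return 'redirect';
    }
    if ($code >= 400) {
        return 'error';
    }
    return 'ok';
}

// Hypothetical wrapper: returns the status class and the redirect target, if any.
function check_url($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return output instead of printing it
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, the body is not needed
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first response
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // seconds to wait for the connection
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);           // seconds for the whole request
    curl_exec($ch);

    $code   = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL); // empty when there is no redirect

    return array('status' => classify_status($code), 'target' => $target);
}
```

A call such as check_url("http://domain.tld/") would then give you 'redirect', 'offline', 'error' or 'ok' in the 'status' key without printing the remote page.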