Web Hosting Talk







How to take the address of all links present in a file?


wayiran
08-14-2006, 04:17 AM
I have done this much of it:


<?php
$text=htmlspecialchars(file_get_contents("welcome.txt"), ENT_NOQUOTES);
$start = strpos($text,"&lt;a href=\"");
$end = strpos($text,"\"&gt;");
echo substr($text, ($start+12), ($end - $start-12));
?>


But it only gets the first link in the file. How can I change it to get the addresses of all the links?

Thank you.

UK-Networks
08-14-2006, 06:26 AM
<?
$text=htmlspecialchars(file_get_contents("welcome.txt"), ENT_NOQUOTES);

$start=0;
$end=0;
$test=0;
while (strpos($text,"&lt;a href=\"")) {
$start = strpos($text,"&lt;a href=\"",$end);
$end = strpos($text,"\"&gt;",$start);
echo substr($text, ($start+12), ($end - $start-12));
}

?>

wayiran
08-14-2006, 09:04 AM
Thanks, but the while loop repeats forever and gives 1000 pages of:

hhhvvvsss; <body> <p><a href="hhhvvvsss; <body> <p><a href="hhhvvvsss; <body> <p><a href="
(the same fragment repeats for the rest of the output)

Do you have a correction? Why does this happen?

while (strpos($text,"&lt;a href=\"")) {
}

I think it starts over from the beginning of $text when it reaches the end of it! When I removed <a href="..."> from my text file, it didn't show anything, not even other things....

My welcome.txt file contains:
</head>
<body>
<p><a href="hhh">aaa</a></p>
</body>
</html>
<p><a href="vvv">aaa</a></p>
<p><a href="sss">aaa</a></p>
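
A note on why that loop never ends: the while condition calls strpos() without an offset, so it searches from the start of $text on every pass and stays true as long as the file contains at least one link. After the last link, strpos($text, ..., $end) returns false, and false is then treated as 0 both as the next offset and in the substr() arithmetic. That is consistent with the '; <body> <p><a href="hhh' fragment in the output above, after which the whole cycle starts again. Below is a minimal sketch of one way to repair it, assuming the same htmlspecialchars() approach: let the offset search itself drive the loop and compare its result with false. The helper name hrefs_from_escaped and the inlined sample are illustrative, not from the thread.

```php
<?php
// Hypothetical helper (not from the thread): collect every href value
// from text that has already been passed through htmlspecialchars().
function hrefs_from_escaped(string $text): array
{
    $open  = '&lt;a href="';   // 12 characters, hence the +12 in the thread
    $close = '"&gt;';
    $urls  = [];
    $pos   = 0;

    // Search from $pos each time; strpos() returns false when nothing is left.
    // Use !== false, because a match at position 0 is also "falsy".
    while (($start = strpos($text, $open, $pos)) !== false) {
        $start += strlen($open);               // first character of the URL
        $end = strpos($text, $close, $start);  // closing quote plus escaped ">"
        if ($end === false) {
            break;                             // unterminated tag: stop
        }
        $urls[] = substr($text, $start, $end - $start);
        $pos = $end;                           // continue after this link
    }
    return $urls;
}

// Same markup as the welcome.txt sample, inlined so the sketch is self-contained.
$html = "</head>\n<body>\n<p><a href=\"hhh\">aaa</a></p>\n</body>\n</html>\n"
      . "<p><a href=\"vvv\">aaa</a></p>\n<p><a href=\"sss\">aaa</a></p>\n";
$text = htmlspecialchars($html, ENT_NOQUOTES);

echo implode("\n", hrefs_from_escaped($text)), "\n";
```

With the welcome.txt sample this should print hhh, vvv and sss on separate lines. To read the real file, replace the inlined $html with file_get_contents("welcome.txt").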

UK-Networks
08-14-2006, 09:22 AM
<?
$text=htmlspecialchars(file_get_contents("welcome.txt"), ENT_NOQUOTES);

$start=0;
$end=0;
$test=0;
while ((strpos($text,"&lt;a href=\"")) && ($start>=$test)) {
$start = strpos($text,"&lt;a href=\"",$end);
$end = strpos($text,"\"&gt;",$start);
$test=$end;
echo substr($text, ($start+12), ($end - $start-12));
}

?>

Give that a shot

tiamak
08-14-2006, 11:00 AM
Why play with strpos and substr instead of the preg_match_all() function?

$html = file_get_contents('welcome.txt');
$urlpattern = '/<a[^>]+href="([^"]+)/i';
preg_match_all($urlpattern, $html, $matches);
foreach ($matches[1] as $u) {
    echo $u."\n";
}


I guess it looks much better :D
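
For anyone unsure what that pattern does, here is a hedged breakdown wrapped in a small function. By default preg_match_all() fills $matches[0] with the full matched text and $matches[1] with the first capture group, which is why the loop above reads $matches[1]. The function name hrefs_by_regex and the sample string are made up for illustration.

```php
<?php
// Hypothetical wrapper around the pattern from the post above.
function hrefs_by_regex(string $html): array
{
    // <a        literal start of an anchor tag
    // [^>]+     one or more characters inside the tag (stops at ">")
    // href="    the attribute, double-quoted only
    // ([^"]+)   capture group 1: everything up to the closing quote
    // i         case-insensitive, so <A HREF="..."> also matches
    $pattern = '/<a[^>]+href="([^"]+)/i';
    preg_match_all($pattern, $html, $matches);

    // $matches[0] holds the full matched text, $matches[1] the captured URLs.
    return $matches[1];
}

// Illustrative input: lower case, upper case with an extra attribute,
// and a single-quoted href.
$sample = "<p><a href=\"hhh\">aaa</a></p>\n"
        . "<P><A class=\"x\" HREF=\"vvv\">aaa</A></P>\n"
        . "<a href='sss'>aaa</a>";

print_r(hrefs_by_regex($sample));
```

On this sample the sketch should return hhh and vvv only: the single-quoted href is skipped because the pattern looks for a double quote. Since the pattern does not require whitespace after "<a", it would also pick up href values on tags such as <area>.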

wayiran
08-14-2006, 04:10 PM
<?
$text=htmlspecialchars(file_get_contents("welcome.txt"), ENT_NOQUOTES);

$start=0;
$end=0;
$test=0;
while ((strpos($text,"&lt;a href=\"")) && ($start>=$test)) {
$start = strpos($text,"&lt;a href=\"",$end);
$end = strpos($text,"\"&gt;",$start);
$test=$end;
echo substr($text, ($start+12), ($end - $start-12));
}

?>


It gives:
hhh

for:
</head>
<body>
<p><a href="hhh">aaa</a></p>
</body>
</html>
<p><a href="vvv">aaa</a></p>
<p><a href="sss">aaa</a></p>

That means it takes just the first URL and leaves the rest!

Any new ideas?
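
Why this version stops after one link, as far as the code shows: the added guard ($start >= $test) is true only on entry, when both are 0. After the first pass $test holds $end, which always lies past $start, so the guard fails and the loop exits. The sketch below only replays that loop and records the variables on each pass; it does not fix anything. trace_second_attempt is a hypothetical name, and the sample is inlined instead of being read from welcome.txt.

```php
<?php
// Hypothetical trace helper: replays the loop from the post above and
// records the variables at the end of each pass. It does not fix the loop.
function trace_second_attempt(string $text): array
{
    $needle = '&lt;a href="';
    $start  = 0;
    $end    = 0;
    $test   = 0;
    $log    = [];

    while (strpos($text, $needle) && ($start >= $test)) {
        $start = strpos($text, $needle, $end);
        $end   = strpos($text, '"&gt;', $start);
        $test  = $end;   // from here on $test is greater than $start
        $log[] = [
            'start' => $start,
            'end'   => $end,
            'url'   => substr($text, $start + 12, $end - $start - 12),
        ];
    }
    return $log;
}

// Same markup as the welcome.txt sample.
$html = "</head>\n<body>\n<p><a href=\"hhh\">aaa</a></p>\n</body>\n</html>\n"
      . "<p><a href=\"vvv\">aaa</a></p>\n<p><a href=\"sss\">aaa</a></p>\n";
$log = trace_second_attempt(htmlspecialchars($html, ENT_NOQUOTES));

print_r($log);
```

The log should contain a single entry with url "hhh" and start smaller than end, which matches the result reported above.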

wayiran
08-14-2006, 04:29 PM
tiamak wrote:
Why play with strpos and substr instead of the preg_match_all() function?

$html = file_get_contents('welcome.txt');
$urlpattern = '/<a[^>]+href="([^"]+)/i';
preg_match_all($urlpattern, $html, $matches);
foreach ($matches[1] as $u) {
    echo $u."\n";
}

I guess it looks much better :D


Thanks, it worked.
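
If the links are not always written exactly as <a href="...">, an HTML parser may be more forgiving than either approach in this thread. This is a swapped-in technique, not something anyone posted here: a minimal sketch using PHP's DOM extension (DOMDocument), assuming that extension is enabled. hrefs_by_dom and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: read href attributes with PHP's DOM extension
// instead of string searching. Assumes ext-dom is available.
function hrefs_by_dom(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally so sloppy markup stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $urls[] = $a->getAttribute('href');
        }
    }
    return $urls;
}

// Illustrative input: double-quoted, single-quoted and unquoted href values.
$sample = "<p><a href=\"hhh\">aaa</a></p>\n"
        . "<p><a href='vvv'>aaa</a></p>\n"
        . "<p><a class=\"x\" href=sss>aaa</a></p>";

print_r(hrefs_by_dom($sample));
```

On this sample the sketch should return hhh, vvv and sss, because the parser handles the quoting for you.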