Web Hosting Talk







php preg_match


getbusy
09-14-2004, 10:21 AM
Hello,

I'm having a small problem with preg_match. Here's an HTML example:

<div class=headline>Black</div>


<B>Publisher</B><BR>
<A HREF="http://games.ign.com/objects/025/025025.html?ui=GameDetailBox">Electronic Arts</A><br>


<B>Developer</B><BR>

<A HREF="http://games.ign.com/objects/026/026357.html?ui=GameDetailBox">Criterion Games</A><br>

<B>Genre</B><BR>

First-Person


<A HREF="http://ps2.ign.com/genre_23.html?ui=GameDetailBox">Shooter</A>


<B>Release Date</B><BR>Winter 2005<br>

<B>ESRB Rating:</B> <A HREF="http://www.esrb.com/esrb.asp?ui=GameDetailBox" target="_new">RP</A><br>


This works: $title = preg_match("|<div class=headline>(.*?)</div>|",$buffer);

but this does not: $developer = preg_match("|<B>Developer</B><BR>(.*?)<br>|",$buffer);

I think it's because of the blank line between <B>Developer... and the <br>, but I really don't know. Does anyone have an idea?

Thanks a lot!,
Cedric

getbusy
09-14-2004, 10:24 AM
Hm, another question: is it possible to get only the a-z and 0-9 characters from a phrase? For example, I need "hello123" from "h:e ll . o 123", via preg_match or something else?

Thanks,
Cedric

coo_t2
09-14-2004, 01:19 PM
Originally posted by getbusy
This works: $title = preg_match("|<div class=headline>(.*?)</div>|",$buffer);

but this does not: $developer = preg_match("|<B>Developer</B><BR>(.*?)<br>|",$buffer);

I think it's because of the blank line between <B>Developer... and the <br>, but I really don't know. Does anyone have an idea?

Thanks a lot!,
Cedric

Try this:

"#<B>Developer</B><BR>(.*?)<br>#s"

The s modifier makes the dot match newline characters as well, so (.*?) can span several lines. Also, I don't like using a metacharacter as a delimiter, so I changed | to #, but that might just be me.

--ed

getbusy
09-14-2004, 01:27 PM
damn didn't work :(

coo_t2
09-14-2004, 01:30 PM
Originally posted by getbusy
Hm, another question: is it possible to get only the a-z and 0-9 characters from a phrase? For example, I need "hello123" from "h:e ll . o 123", via preg_match or something else?

Thanks,
Cedric

This seems to work:



<?php

$str = "h:e ll . o 123";

$replacementStr = preg_replace('/[^a-zA-Z0-9]/', '', $str);

var_dump('$replacementStr', $replacementStr);

?>



--ed

coo_t2
09-14-2004, 01:36 PM
Originally posted by getbusy
damn didn't work :(

It works for me:



<?php


$str = '<div class=headline>Black</div>


<B>Publisher</B><BR>
<A HREF="http://games.ign.com/objects/025/025025.html?ui=GameDetailBox">Electronic Arts</A><br>


<B>Developer</B><BR>

<A HREF="http://games.ign.com/objects/026/026357.html?ui=GameDetailBox">Criterion Games</A><br>

<B>Genre</B><BR>

First-Person


<A HREF="http://ps2.ign.com/genre_23.html?ui=GameDetailBox">Shooter</A>


<B>Release Date</B><BR>Winter 2005<br>

<B>ESRB Rating:</B> <A HREF="http://www.esrb.com/esrb.asp?ui=GameDetailBox" target="_new">RP</A><br>' ;


if (preg_match("#<B>Developer</B><BR>(.*?)<br>#s", $str) )
{
echo "it matches !!! \n";
}

?>



If I take the s modifier off it doesn't work.
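
To actually pull the developer name out rather than just test for a match, preg_match takes a third argument that receives the captured groups. A rough sketch, assuming a shortened copy of the HTML above; $html, $matches, $withS and $withoutS are just names I picked for illustration:

```php
<?php

// Shortened copy of the sample HTML; the blank line after <BR> is the part that matters.
$html = "<B>Developer</B><BR>\n\n<A HREF=\"http://games.ign.com/objects/026/026357.html?ui=GameDetailBox\">Criterion Games</A><br>";

// Without the s modifier the dot stops at a newline, so this returns 0.
$withoutS = preg_match('#<B>Developer</B><BR>(.*?)<br>#', $html, $matches);

// With the s modifier the dot also matches newlines, so this returns 1.
$withS = preg_match('#<B>Developer</B><BR>(.*?)<br>#s', $html, $matches);

// $matches[0] holds the whole match, $matches[1] the first (...) group.
// strip_tags() drops the <A> tag and trim() drops the surrounding line breaks.
$developer = trim(strip_tags($matches[1]));

echo $developer, "\n";
```

Note that preg_match itself only returns 1 or 0, so assigning its return value to $developer gives you the match count, not the text.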

--ed

getbusy
09-14-2004, 01:49 PM
This is my script (I know it's crappily coded, but it only has to run once):



<?php
include("../functions_external_connect.php");
$limit123 = "0";
if (!get_cfg_var('safe_mode')) {
set_time_limit($limit123);
}
else {
echo "****!";
die();
}




$sql = "SELECT * FROM zgrabber_1 order by id limit 5";
$resultaat = mysql_query($sql) or die(mysql_error());
while ($row = mysql_fetch_object($resultaat)) {

$title = "";
$description = "";
$publisher = "";
$esrb = "";
$platform = "";
$avatar = "";
$category = "";
$newid = "";



$tempdir = "/homepages/28/d98158917/htdocs/gamercentric/zgrabber/tempdir/";//temp directory, with ending slash


$url = $row->url."\?fromint=1";
echo "aa $url <br>";

if (isset($tempdir))
{
exec('wget --user-agent="Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC)"'.
' --output-document='.$tempdir."gameinfo.html ".
"$url"
, $gay);

$fp = fopen($tempdir.'gameinfo.html','r');

while (!feof($fp)) {

$buffer = fgets($fp, 4096);
$buffer = trim($buffer);
//$buffer = eregi_replace(" - Pre-Played","",$buffer);





$timer = "0";
if (preg_match("|<div class=headline>(.*?)</div>|",$buffer,$array_links))
{
foreach ($array_links as $title)
{
}
}


$timer = "0";
if (preg_match("|<B>Release Date</B><BR>(.*?)<br>|",$buffer,$array_links))
{
foreach ($array_links as $releasedate)
{
}
}



$timer = "0";
if (preg_match("#<B>Developer</B><BR>(?)<br>#s",$buffer,$array_links))
{
foreach ($array_links as $publisher)
{
$timer++;
if ($timer == 2 ) {
echo "publisher: $publisher </a></a><br>";
}
}
}



$timer = "0";
if (preg_match("|<B>Developer</B><BR>(?)<br>|s",$buffer,$array_links))
{
foreach ($array_links as $publisher2)
{
}
}

$timer = "0";
if (preg_match("#<B>Developer</B><BR>(.*?)<br>#s",$buffer,$array_links))
{
foreach ($array_links as $publisher3)
{
}
}









}
fclose($fp);

echo "
publisher: $publisher<br>
publisher2: $publisher2<br>
publisher3: $publisher3<br><br>
";


}





//echo "<b>NEW ID FOR THIS GAME: $newid2</B> <br><br>";


$description = addslashes($description);
$title = addslashes($title);
$publisher = addslashes($publisher);
$category = addslashes($category);

$platform = "ps2";


$sql3 = "INSERT INTO `games` ( `id` , `title` , `platform` , `genre` , `homeinfo` , `publisher` , `developer` , `esrb` , `releasedate`) VALUES ('', '$title', '$platform', '$category', '', '$publisher', '$developer', '$esrb', '$releasedate')";
//$resultaat3 = mysql_query($sql3) or die(mysql_error());
echo "
('', '$title', '$platform', '$category', '', '$publisher', '$developer', '$esrb', '$releasedate')<br><br>
";

}








$str = "h:e ll . o 123";

$replacementStr = preg_replace('/[^a-zA-Z0-9]/', '', $str);

echo "<br>1: $replacementStr <br>";



?>



Damn, it's so weird... it really doesn't work. Take a look at http://gamercentric.synchronized-1and1.com/zgrabber/3.php (3.php is the file shown above). The release date and title work, but not the developer :s (Oh yeah, you can see the page the content comes from at http://gamercentric.synchronized-1and1.com/zgrabber/tempdir/gameinfo.html)

Sorry that I keep bothering you :blush:
Cedric

coo_t2
09-14-2004, 06:15 PM
Originally posted by getbusy
This is my script (I know it's crappily coded, but it only has to run once):


<?php



$timer = "0";
if (preg_match("#<B>Developer</B><BR>(?)<br>#s",$buffer,$array_links))
{
foreach ($array_links as $publisher)
{
$timer++;
if ($timer == 2 ) {
echo "publisher: $publisher </a></a><br>";
}
}
}

$timer = "0";
if (preg_match("|<B>Developer</B><BR>(?)<br>|s",$buffer,$array_links))
{
foreach ($array_links as $publisher2)
{
}
}

$timer = "0";
if (preg_match("#<B>Developer</B><BR>(.*?)<br>#s",$buffer,$array_links))
{
foreach ($array_links as $publisher3)
{
}
}

?>





The pattern in the first two calls here is wrong: you have (?) instead of (.*?).
Why are you calling preg_match three times?
What exactly are you trying to do with the matches?

--ed

getbusy
09-14-2004, 06:18 PM
Never mind, I found it: I put all the content on one line, and then it works.

$filename = $tempdir."gameinfo.html";
$handle = fopen($filename, "r");
$buffer = fread($handle, filesize($filename));

$buffer = str_replace("\t", " ", $buffer);
$buffer = str_replace("\n", " ", $buffer);
$buffer = str_replace("\r", " ", $buffer);
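
An alternative sketch, on the assumption that the original failure came from fgets() handing preg_match one line at a time: read the file in one go and let the s modifier deal with the line breaks, so the whitespace doesn't have to be replaced first. The temp file here is only a stand-in for gameinfo.html, and its contents are a cut-down guess at the real page:

```php
<?php

// Stand-in for $tempdir."gameinfo.html", written here only so the sketch runs on its own.
$filename = tempnam(sys_get_temp_dir(), 'gameinfo');
$fp = fopen($filename, 'w');
fwrite($fp, "<B>Developer</B><BR>\n\n<A HREF=\"#\">Criterion Games</A><br>\n");
fclose($fp);

// Read the whole file into one string instead of looping over fgets().
$buffer = file_get_contents($filename);
unlink($filename);

// The s modifier lets (.*?) run across the blank line between <BR> and <A ...>.
$developer = '';
if (preg_match('#<B>Developer</B><BR>(.*?)<br>#s', $buffer, $matches)) {
    $developer = trim(strip_tags($matches[1]));
}

echo "developer: $developer\n";
```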

What is the difference between (?) and (.*?)?