KenCoble
02-28-2010, 01:57 PM
Ok, so I'm scraping content for my site. (It's from NHL 10 for Xbox, nothing devious.) I want to have our stats scraped and then exported to CSV twice a day. If I can figure out how to do the export, I'm sure a cron job could be set up to run it twice a day.
What I have so far is this:
<html>
<head>
<title>PHP Scrape</title>
</head>
<body>
<?php
// Read html file to be processed into $data variable
$data = file_get_contents('http://www.easportsworld.com/en_US/clubs/partial/762A0001/129082/members-list');
// Commented regex to extract contents from <div class="scrolling">contents</div>
// where "contents" may contain nested <div>s.
// Regex uses PCRE's recursive (?1) subpattern syntax to recurse into group 1
$pattern_long = '{ # recursive regex to capture contents of "scrolling" DIV
<div\s+class="scrolling"\s*> # match the "scrolling" class DIV opening tag
( # capture "scrolling" DIV contents into $1
(?: # non-cap group for nesting * quantifier
(?: (?!<div[^>]*>|</div>). )++ # possessively match all non-DIV tag chars
| # or
<div[^>]*>(?1)</div> # recursively match nested <div>xyz</div>
)* # loop however deep as necessary
) # end group 1 capture
</div> # match the "scrolling" class DIV closing tag
}six'; // single-line (dot matches all), ignore case and free spacing modes ON
// short version of same regex
$pattern_short = '{<div\s+class="scrolling"\s*>((?:(?:(?!<div[^>]*>|</div>).)++|<div[^>]*>(?1)</div>)*)</div>}si';
$matchcount = preg_match_all($pattern_long, $data, $matches);
// $matchcount = preg_match_all($pattern_short, $data, $matches);
echo("<pre>\n");
if ($matchcount > 0) {
echo("$matchcount matches found.\n");
// print_r($matches);
for($i = 0; $i < $matchcount; $i++) {
echo("\nMatch #" . ($i + 1) . ":\n");
echo($matches[1][$i]); // print 1st capture group for match number i
}
} else {
echo('No Matches');
}
echo("\n</pre>");
?>
</body>
</html>
The output can be seen on my personal server at http://74.117.63.249/test.php
If anyone could help me create a script that takes this output and exports it to a CSV, I would greatly appreciate it. I want to keep a historical record of our stats, so the CSV should be appended to each time the cron job runs.
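For the export step, something along these lines might work. It is only a rough sketch: it assumes the scraped fragment contains ordinary table rows (<tr> with <td>/<th> cells), which I can't confirm from the post, and the function names extract_rows() and append_rows_to_csv() are made up for illustration. It uses DOMDocument to pull the cell text out and fputcsv() in append mode so each run adds to the history, with a timestamp in the first column. The type hints need PHP 7 or later.

```php
<?php
// Sketch only: assumes the scraped fragment holds <tr> rows with <td>/<th> cells.
// extract_rows() and append_rows_to_csv() are hypothetical helper names.

/**
 * Turn an HTML fragment into an array of rows,
 * each row being an array of trimmed cell texts.
 */
function extract_rows(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            // Skip whitespace text nodes; keep only td/th elements.
            if ($cell instanceof DOMElement
                && in_array($cell->tagName, ['td', 'th'], true)) {
                // Collapse runs of whitespace and trim the cell text.
                $cells[] = trim(preg_replace('/\s+/', ' ', $cell->textContent));
            }
        }
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

/**
 * Append rows to a CSV file, prefixing each row with a timestamp.
 * Returns the number of rows written.
 */
function append_rows_to_csv(string $path, array $rows, string $timestamp): int
{
    $fh = fopen($path, 'a'); // 'a' = append, creates the file if missing
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for appending");
    }
    foreach ($rows as $row) {
        fputcsv($fh, array_merge([$timestamp], $row), ',', '"', '\\');
    }
    fclose($fh);
    return count($rows);
}
```

In the script above, the echo inside the for loop could then be swapped for something like append_rows_to_csv('/path/to/stats.csv', extract_rows($matches[1][$i]), date('c')); where the CSV path is a placeholder. A crontab entry such as "0 6,18 * * * php /path/to/test.php" (path again a placeholder) would run it twice a day. If the header row gets scraped too, it will be appended on every run, so it may be worth skipping the first row.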
