  1. #1

    Grab content from webpage using PHP

    I am trying to get content from an external site and then grab certain content and place them in variables.

    For example, I wrote this to grab content in between title:' and '.

    $url = "websiteaddress";

    $str = file_get_contents($url);

    function get_string_between($string, $start, $end){
        // Pad the string so a match at position 0 is not mistaken for "not found".
        $string = " ".$string;
        $ini = strpos($string,$start);
        if ($ini == 0) return "";
        $ini += strlen($start);
        $len = strpos($string,$end,$ini) - $ini;
        return substr($string,$ini,$len);
    }

    $var1 = get_string_between($str, "title:'", "'");

    I want the script to continue finding the first five pieces of content after title:' and place them in variables (var1, var2, etc.); any way to get the script to repeat? Any help or insight is very much appreciated, thanks!

  2. #2
    Join Date
    May 2009


    I've coded this from scratch, but hopefully it's what you were looking for.

    $title = "<title>Free Proxy List - Socks List - Private Proxy Lists - Private Proxies</title>";
    if (preg_match_all("/<title.+?<\/title[^>]*>/ims", $title, $matches, PREG_PATTERN_ORDER)) {
        // Strip the tags from the first match and split the text on spaces.
        $title = explode(' ', strip_tags($matches[0][0]));
        print_r($title);
    }
    The output:

    [0] => Free
    [1] => Proxy
    [2] => List
    [3] => -
    [4] => Socks
    [5] => List
    [6] => -
    [7] => Private
    [8] => Proxy
    [9] => Lists
    [10] => -
    [11] => Private
    [12] => Proxies

    So $var1 = $title[0]; and so on ...
    Last edited by GameFrame; 05-12-2011 at 08:54 PM.
    NiX API - A powerful Anti-Proxy/Anti-Fraud and IP Reputation Lookup API

  3. #3
    Join Date
    Apr 2008
    Montreal, Qc., Canada
    I believe this is closer to OP's requirements :

    PHP Code:

    $subject = "This is a first title:'title1', a second title : 'title2' with spaces, and here's a third one : Title: ' title3'...";

    if (preg_match_all("/title( )?:( )?'[^']*./i", $subject, $matches)) {
        for ($i = 0; $i <= 4; $i++) {
            if (isset($matches[0][$i])) {
                $match = explode("'", $matches[0][$i]);
                $titles[] = $match[1];
            }
        }
        print_r($titles); // Array ( [0] => title1 [1] => title2 [2] => title3 )
    } else {
        echo 'Pattern not found.';
    }
    Note that it should be possible to simplify this, notably with lookahead assertions.
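
    One possible simplification, sketched here with a capture group rather than a lookahead: the group returns the text between the quotes directly, so the explode() step goes away. This assumes every title has a closing single quote (the pattern requires one instead of the trailing wildcard), and the five-item cap via array_slice() is an illustrative choice.

```php
<?php
// Sample input borrowed from the post above.
$subject = "This is a first title:'title1', a second title : 'title2' with spaces, and here's a third one : Title: ' title3'...";

// ([^']*) captures everything between the two single quotes;
// " ?" allows an optional single space on either side of the colon.
$titles = [];
if (preg_match_all("/title ?: ?'([^']*)'/i", $subject, $matches)) {
    // $matches[1] holds the captured text; keep at most the first five.
    $titles = array_slice($matches[1], 0, 5);
}

print_r($titles);
```

    The captured values keep any whitespace inside the quotes, so the third entry still starts with a space; trim() can be mapped over the array if that matters.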
    Heymman - Beefy servers, tiny price !

  4. #4
    WootWoot, that's exactly what I was looking for!

    Now I need to decipher what all this means so I can modify:
    HTML Code:
    preg_match_all("/title( )?:( )?'[^']*./i"
    Also, would this work for an XML feed as well, so I could grab the information between the <title></title> and <description></description> tags?

    Last edited by monorail_driver; 05-17-2011 at 03:12 PM. Reason: disable smilies

  5. #5
    Join Date
    Apr 2008
    Montreal, Qc., Canada
    This is a regular expression (regex). Here's a great tutorial :

    /title( )?:( )?'[^']*./i
    I'm not a regex guru, but I'll try to shed some light on these modern hieroglyphs.

    In PHP's preg functions, the pattern is enclosed in a pair of delimiters, here slashes.
    The "i" modifier after the closing slash makes the match case-insensitive.
    "title" stands for the keyword "title" you were searching for.
    "( )?" matches either no space or a single space.
    "[^']*" matches as long as the engine hasn't reached a single quote: the caret negates the character class enclosed in the square brackets, and the asterisk stands for "zero or more".
    The final dot matches any single character except a line break (wildcard).

    So, all together, this expression can be translated to: match anything that starts with "title", followed by 0 or 1 space, followed by a colon (":"), followed by 0 or 1 space, followed by a single quote ('), and then match as many characters as possible until you reach another single quote. Since "[^']*" stops just before the closing quote, we include that quote in the match with the wildcard "." at the end. Note that "[^']*" can run across line breaks; only the final dot cannot match one.

    The second part of the code explodes the string where the quotation marks are to recover what was between them.
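
    To make the breakdown easier to follow, here is a sketch of the same pattern written with the "x" (extended) modifier, which allows whitespace and # comments inside the pattern. The shortened $subject string and the variable names are made up for illustration; because "x" ignores plain spaces, the literal space is written as [ ].

```php
<?php
// Same pattern as above, spread over several lines with the "x" modifier.
$pattern = <<<'REGEX'
/
    title      # the literal keyword (case-insensitive because of the i modifier)
    ([ ])?     # an optional single space
    :          # a colon
    ([ ])?     # an optional single space
    '          # the opening single quote
    [^']*      # zero or more characters that are not a single quote
    .          # one more character, normally the closing quote
/ix
REGEX;

// Hypothetical sample text, shortened from the earlier example.
$subject = "first title:'title1', second title : 'title2'";
preg_match_all($pattern, $subject, $matches);

$titles = [];
foreach ($matches[0] as $fullMatch) {
    // explode() splits on the single quotes; index 1 is the text between them.
    $parts = explode("'", $fullMatch);
    $titles[] = $parts[1];
}

print_r($titles);
```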

    While it's possible, you wouldn't want to use such a regex to match any tag in an XML feed.
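
    For a feed, an XML parser is usually the safer route. Below is a minimal sketch using PHP's SimpleXML extension, assuming an RSS-style layout with <item> elements under <channel>; the feed contents are invented for the example, and a real feed would be fetched first (for instance with file_get_contents()).

```php
<?php
// A made-up RSS-style feed used only for illustration.
$xml = <<<'XML'
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First item</title>
      <description>First description</description>
    </item>
    <item>
      <title>Second item</title>
      <description>Second description</description>
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($xml);
if ($feed === false) {
    exit('Could not parse the feed.');
}

$items = [];
foreach ($feed->channel->item as $item) {
    // Casting to string extracts the element's text content.
    $items[] = [
        'title'       => (string) $item->title,
        'description' => (string) $item->description,
    ];
    if (count($items) === 5) {
        break; // stop after the first five items
    }
}

print_r($items);
```

    Feeds in other formats (Atom, for example) use different element names, so the property path would need adjusting.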

