  1. #1
    Join Date
    May 2004
    Location
    Twickenham, UK
    Posts
    273

    [PHP] Is it possible to detect the SE spiders?

    Hello,

    I'm using a PHP script that detects the browser's language and redirects users to the pages in their language.

    Is there a way to detect search engine spiders using a PHP function?

    Thanks in advance
    Ludo
    Regards,
    Ludovic Coumétou
    Founder of the MythTV Community Website :: MythTVtalk.com :: Latest MythTV News

  2. #2
    Join Date
    May 2003
    Location
    Bayreuth, Bavaria, Germany
    Posts
    175

    Re: [PHP] Is it possible to detect the SE spiders?

    > Is there a way to detect search engine spiders using a PHP function?

    You could use the information provided by www.robotstxt.org; in particular, have a look at http://www.robotstxt.org/wc/active.html

    You can also take advantage of patSpiderizer: http://www.php-tools.de/site.php?fil...r/overview.xml


    Michael

  3. #3
    Join Date
    Apr 2004
    Location
    East Anglia, UK
    Posts
    79
    The web page in the above post might cover this, but I am too lazy to read it, so I will give you a direct answer.

    Use $_SERVER["HTTP_USER_AGENT"].

    Search engines, as a rule, set their user agent to something other than what a standard browser sends.
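
    A minimal sketch of that check, assuming PHP 7.1 or later. The function name isSearchEngineSpider() and the short signature list are made up for illustration (Googlebot and MSIECrawler are just the names that come up in this thread), so extend the list from whichever robots list you trust:

```php
<?php
// Hypothetical helper: the name and the signature list are illustrative
// assumptions, not part of PHP or of any library.
function isSearchEngineSpider(?string $userAgent): bool
{
    // Substrings to look for in the User-Agent header.
    $signatures = ['Googlebot', 'MSIECrawler'];

    if ($userAgent === null || $userAgent === '') {
        return false; // no header sent, so there is nothing to match
    }

    foreach ($signatures as $signature) {
        // stripos() is case-insensitive and returns false when there is no match
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }

    return false;
}

// The header may be missing (for example when run from the command line).
$agent = $_SERVER['HTTP_USER_AGENT'] ?? null;

if (!isSearchEngineSpider($agent)) {
    // The existing language-based redirect would go here.
}
```

    Keep in mind that the user agent is whatever the client chooses to send, so this only catches spiders that identify themselves.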

    Adios.

  4. #4
    Join Date
    May 2004
    Location
    Twickenham, UK
    Posts
    273
    Thanks to both of you.
    Now I just need to find the list of available HTTP_USER_AGENT values.

    Ludo

  5. #5
    Join Date
    May 2003
    Location
    Bayreuth, Bavaria, Germany
    Posts
    175
    Originally posted by leight
    The web page in the above post might cover this, but I am too lazy to read it, ...
    Hehe, and you're right.
    It was in my mind when writing about robotstxt.org, but it seemed so obvious to me that I didn't think to mention it.
    So thanks for the reminder; it's always the simple things in life ...

    Michael

  6. #6
    Join Date
    May 2003
    Location
    Bayreuth, Bavaria, Germany
    Posts
    175
    Originally posted by lubox.com
    Now I just need to find the list of available HTTP_USER_AGENT values.
    And that's what you'll find at www.robotstxt.org ...

    Michael

  7. #7
    Join Date
    Apr 2004
    Location
    East Anglia, UK
    Posts
    79
    lubox: Since I'm not feeling quite so lazy now, a quick check on Google and bish bash bosh, here we are:

    http://www.jafsoft.com/searchengines/webbots.html

    edit:

    m-b:

    The list on that site seems out of date. I couldn't see things like Googlebot or IECrawler.

  8. #8
    Join Date
    May 2003
    Location
    Bayreuth, Bavaria, Germany
    Posts
    175
    well, http://www.robotstxt.org/wc/active/all.txt seems to be more complete ...


    Michael
    Edit: @leight
    Googlebot is there, but not IECrawler ...
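
    If you want to build your own user-agent list from a text file like that one, here is a rough sketch. It assumes each record has a line of the form "robot-useragent: ..."; that field name is an assumption about the file layout, so check the actual file before relying on it. The function name extractUserAgents() and the sample record are made up for illustration:

```php
<?php
// Hypothetical helper: collects the value of every "robot-useragent:" line.
// The field name is an assumption about the list's layout; adjust it to
// match the file you actually download.
function extractUserAgents(string $databaseText): array
{
    $agents = [];

    // \R matches any line ending (\n, \r\n or \r)
    foreach (preg_split('/\R/', $databaseText) as $line) {
        if (preg_match('/^robot-useragent:\s*(.+)$/i', $line, $matches)) {
            $agent = trim($matches[1]);
            if ($agent !== '') {
                $agents[] = $agent;
            }
        }
    }

    // Drop duplicates and reindex from 0
    return array_values(array_unique($agents));
}

// Made-up sample record, for illustration only.
$sample = "robot-id: examplebot\n"
        . "robot-name: Example Bot\n"
        . "robot-useragent: ExampleBot/1.0\n";

print_r(extractUserAgents($sample));
```

    The resulting array can then be compared against $_SERVER["HTTP_USER_AGENT"] with a case-insensitive substring search.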

  9. #9
    Join Date
    Apr 2004
    Location
    East Anglia, UK
    Posts
    79
    Damn my laziness, I didn't manage to navigate my way as far as that file. Also, it's MSIECrawler, my mistake.

    I stand corrected

  10. #10
    Join Date
    May 2004
    Location
    Twickenham, UK
    Posts
    273
    Thanks again; once I'm back home I will start reading everything.

    But after a quick review, it seems that this:

    You can also take advantage of patSpiderizer: http://www.php-tools.de/site.php?fi...er/overview.xml

    Is the perfect tool!!!!!!

    Ludo
