  1. #1

    How does Googlebot crawl and index pages?


    My site is one of the sites that was dropped after the Bourbon update. Before the update I had 11,000-18,000 pages indexed by Googlebot; after it the count dropped to 513 and has been increasing by only 10-20 pages each day over the last two months.

    During the last 4-5 days Google has made 7,600-8,000 hits on my site (according to the AWStats logs; I also put a small script in place that detects Googlebot visits and sends me a notification email with the time and URL of each page it crawls).
    I got something like 4-5 KB of email listing the different URLs checked by Gbot.
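    Roughly, that kind of detector can look like the sketch below (Python; the original script isn't posted, so the notification address, the local SMTP relay, and the CGI-style environment variables are assumptions):

[code]
# Minimal sketch: flag Googlebot requests and email the time and URL.
# Assumes it runs inside a web request handler with a mail relay on localhost.
import os
import smtplib
from datetime import datetime
from email.message import EmailMessage

NOTIFY_ADDRESS = "webmaster@example.com"   # placeholder address

def notify_if_googlebot(user_agent: str, url: str) -> None:
    """Email a notification when the visitor's user agent looks like Googlebot."""
    if "googlebot" not in (user_agent or "").lower():
        return
    msg = EmailMessage()
    msg["Subject"] = "Googlebot visit"
    msg["From"] = NOTIFY_ADDRESS
    msg["To"] = NOTIFY_ADDRESS
    msg.set_content(f"{datetime.now().isoformat()}  {url}\nUser-Agent: {user_agent}")
    with smtplib.SMTP("localhost") as smtp:    # assumes a local SMTP relay
        smtp.send_message(msg)

# Example call with CGI-style environment variables:
notify_if_googlebot(os.environ.get("HTTP_USER_AGENT", ""),
                    os.environ.get("REQUEST_URI", "/"))
[/code]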

    My question is: how long will it take to see these pages indexed in Google's database? I check Google every day, but I don't see any real progress (as I said before, at most +/-10 pages).

    For example, in Yahoo I have 23,700 indexed pages (the real number of pages on my site is 18,000-19,000).
    And the second question:

    How does Google crawl pages during one update?

    What I mean is: if Google started crawling my site yesterday, for example, can it visit the same page 2-3 times during the same day or the next day?

    Why do I ask this? Because during these 3-4 months I have had a problem with Googlebot: it caused overloading every time it visited my dynamic site {here is my post about googlebot overloading}.

    One week ago I made some changes on my server, and the big thing I did was cache pages [the cache is refreshed once a week]. So when anybody views a page, the system makes a cached copy of it, and anyone who requests that page afterwards gets the cached version instead of the script querying the database again.
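    In rough terms the caching works like the sketch below (Python; the real site is database-driven and probably not Python, so the file-based cache location, the render_page() placeholder, and the one-week TTL are assumptions):

[code]
# Minimal sketch of the cache-on-first-view idea: build a page once, store the
# HTML on disk, and serve the stored copy until it is about a week old.
import time
from pathlib import Path

CACHE_DIR = Path("cache")          # assumed cache location
CACHE_TTL = 7 * 24 * 3600          # refresh the cache once a week (seconds)

def render_page(page_id: str) -> str:
    """Placeholder for the expensive database-driven page build."""
    return f"<html><body>page {page_id}</body></html>"

def get_page(page_id: str) -> str:
    CACHE_DIR.mkdir(exist_ok=True)
    cached = CACHE_DIR / f"{page_id}.html"
    # Serve the cached copy if it exists and is younger than one week.
    if cached.exists() and time.time() - cached.stat().st_mtime < CACHE_TTL:
        return cached.read_text()
    html = render_page(page_id)    # first view (or stale cache): hit the database
    cached.write_text(html)
    return html
[/code]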

    So, when Googlebot started crawling my site (yesterday), I saw that the server load started to increase (normally it is 0.1-0.3 on 4 CPUs). During the first hours the load climbed to 50-80 and even 105, but the server stayed up.

    Today, in the morning, Googlebot took a short pause (2-3 hours) and then resumed crawling at the same intensity (1,000 pages per hour), but now my server load is 0.1-0.5!

    Why? What changed? My guess is that one reason could be that Gbot is revisiting pages it already crawled yesterday, so it is being served the cached versions and can't overload the server.

    What do you think about this? Does Gbot, during one update, visit the same pages several times, even pages it crawled only 2-3 hours earlier?

    Your Health Encyclopedia
    Medical and health consumer information resources containing comprehensive and unbiased information in patient-friendly language

  2. #2
    I notice that on my site I have about 8 or 9 different Googlebots all spidering at the same time. I think this happens because Google runs several simultaneous bots that are each just following links around the net, so a few different bots end up following URLs to your site from other sites at the same time.

    Googlebot definitely spiders pages on my site more than once; it's usually on my site for a few hours a day.

    Here is Googlebot's help page:

  3. #3

    Why does Gbot visit my site but not index it?


    My site was dropped from the Google SERPs 2.5-3 months ago (IMHO, when the Bourbon update started).

    1st month
    1. My website disappeared from Google search results.
    2. But 15-20k results from my site were still listed (most of them with title and description, but marked "supplemental result").

    2nd month
    1. 10-11k results, but most of them without title and description, and without the supplemental result mark.
    2. Searching returned a 2-3 year old version of the homepage, without title and description.

    3rd month (May 15 - June 15)
    1. 500-600 results (+/-20 every day), 50% of them without title and description.
    For example, I checked my homepage in these results: it was visited on 13 June, but the cache still shows the version from 30 May!
    2. The homepage is listed without www, without title and description.


    Now please check my AWStats numbers:

    January (when everything was OK):
    Googlebot: 30752+168 hits, 1.93 GB

    April (when my site was dropped) [BTW, shortly before this I added roughly 20 MB of content]:
    Googlebot: 5440+65 hits, 380.26 MB

    May:
    Googlebot: 8709+47 hits, 648.51 MB

    June (first 15 days):
    Googlebot: 11145+31 hits, 351.84 MB

    As you can see from these last stats, Gbot has started crawling (visiting) my pages again, but you can also see that it is not indexing (caching) them.

    BTW, I update the site every day, and 10-13 days ago I changed the site structure and design, so even the old files should have changed.

    What do you think? What's the reason?
    Why doesn't Google want to index or cache my pages?
    Or, if it doesn't want to cache them, why does it visit so much? (BTW, every time Gbot visits my site the server load climbs to 80-90.)

    So I don't understand Google's strategy.

    Your Health Encyclopedia
    Medical and health consumer information resources containing comprehensive and unbiased information in patient-friendly language

  4. #4


    You can try the beta version of Google Sitemaps. It is an XML file that tells Googlebot how often your website changes, and so on.
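    For reference, a sitemap file can be generated with something like the sketch below (Python; the URLs, change frequencies, and priorities are placeholders, and the 0.84 schema namespace is assumed to be the one the beta documentation used at the time):

[code]
# Minimal sketch: write a sitemap.xml that lists a few pages with how often
# they change. Upload the resulting file to the site root and submit it.
from datetime import date

PAGES = [
    ("http://www.example.com/", "daily", "1.0"),
    ("http://www.example.com/articles/", "weekly", "0.5"),
]

entries = "\n".join(
    f"  <url>\n"
    f"    <loc>{loc}</loc>\n"
    f"    <lastmod>{date.today().isoformat()}</lastmod>\n"
    f"    <changefreq>{freq}</changefreq>\n"
    f"    <priority>{prio}</priority>\n"
    f"  </url>"
    for loc, freq, prio in PAGES
)

sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">\n'
    f"{entries}\n"
    "</urlset>\n"
)

with open("sitemap.xml", "w") as f:
    f.write(sitemap)
[/code]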


  5. #5
    Google is complex, as is any search engine. If you really want to know everything, go buy a book or do a lot of research.

  6. #6
    They scan various websites, and the indexing bot basically records the text, the images, and a snapshot of the page for the cached copy, and then it can add the page to its big index.
