  1. #1
    Join Date
    Apr 2004
    Location
    United Kingdom
    Posts
    375

    Interesting conundrum for you clever folk

    Can anybody answer this question?

    The Wayback Machine claims to index 30 billion archived pages, whilst Google claims to index approximately 4 billion.

    Does this mean that the Wayback Machine is almost 10 times bigger and more complex than Google? If so, why is the Wayback Machine not as widely known and as big as Google?

    See www.archive.org for the Wayback Machine.

  2. #2
    The Wayback Machine will list the same site more than once because it holds multiple archived copies of each site from different points in time.
    -TheOneVader
    AIM: SCREENsaverCRAZY
    E-Mail: [email protected]

  3. #3
    Join Date
    Jul 2003
    Location
    Goleta, CA
    Posts
    5,550
    yep indeed this is the reason.
    Patron: I'd like my free lunch please.
    Cafe Manager: Free lunch? Did you read the fine print stating it was an April Fool's joke.
    Patron: I read the same way I listen, I ignore the parts I don't agree with. I'm suing you for false advertising.
    Cafe Owner: Is our lawyer still working pro bono?

  4. #4
    Join Date
    Feb 2003
    Location
    Leigh on Sea, Essex, England
    Posts
    41
    Just because it is bigger does not mean it is more complex. The Wayback search function does not work as well as Google's, and it's also worth noting how slow a Wayback search is compared to a Google search.

  5. #5
    Yes, it's the different eras - it'll have multiple copies of the same site and obviously keep hold of page copies long since gone.

    In fact, if I remember right, last year someone posted something on how WHT used to look when Matt (now of Spenix Hosting) used to run the place - it's gone through some changes.

  6. #6
    This is what WHT looked like on July 6, 2000: http://web.archive.org/web/200007062...stingtalk.com/
    -TheOneVader
    AIM: SCREENsaverCRAZY
    E-Mail: [email protected]

  7. #7
    Join Date
    Oct 2002
    Location
    North America
    Posts
    1,229
    Not to mention: they're different tools, with different purposes. One is an archive. The other is a search index.

    One deals with what was, while the other focuses on what is (and is most current, since you can pull cached pages from Google.)
    Lesli Schauf, TLM Network
    Linux and Windows Hosting: Scribehost

  8. #8
    Join Date
    Jul 2002
    Location
    UK
    Posts
    2,026
    i always wonder how the hell they store all that content...

    must have a LOT of hard disks there...
    Gone.

  9. #9
    Join Date
    Oct 2002
    Location
    EU - east side
    Posts
    21,913
    I've always wondered that same thing... I guess they do have a LOT of hard disks. (or maybe some other storage solutions?)

  10. #10
    Join Date
    Apr 2004
    Location
    United Kingdom
    Posts
    375
    I mean, what kind of storage space are we talking about to host billions of pages? It's mind boggalling/boggeling/boggoleing.

  11. #11
    Join Date
    May 2003
    Posts
    598
    Boggling

    I only visit the wayback machine like once every 6 months, if that. I visit google several times each day.

  12. #12
    Join Date
    Oct 2002
    Location
    State of Disbelief
    Posts
    22,951
    Originally posted by phision.com
    i always wonder how the hell they store all that content...

    must have a LOT of hard disks there...
    Or one of them there *unlimited* hosting accounts I've heard tell of?

  13. #13
    Join Date
    Mar 2004
    Location
    San Diego, CA
    Posts
    540
    That website is pretty amazing. I once had a site that for whatever reason, we could not find the previous version that we had backed up. We ended up getting almost everything we needed from the wayback machine. It came in very handy.

  14. #14
    Join Date
    Apr 2004
    Location
    United Kingdom
    Posts
    375
    Originally posted by bear
    Or one of them there *unlimited* hosting accounts I've heard tell of?
    ROFL out loud. Brilliant point Bear.

  15. #15
    Join Date
    Aug 2000
    Location
    NYC
    Posts
    6,627
    Originally posted by I, Brian
    In fact, if I remember right, last year someone posted something on how WHT used to look when Matt (now of Spenix Hosting) used to run the place - it's gone through some changes.
    Last year? Last week! And about every six months prior to that over the past three or four years.
    Originally posted by iblive
    I once had a site that for whatever reason, we could not find the previous version that we had backed up. We ended up getting almost everything we needed from the wayback machine. It came in very handy.
    Yep, very handy for your competitors to see how your prices have evolved, for site rippers to get copy that you're not currently using and so probably not checking for duplicates of, for customers to look back and see if there's a sale they've missed that they can ask for...

    I exclude a lot of sites from archive.org through robots.txt.
    Specializing in SEO and PPC management.
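
For anyone wondering what the robots.txt exclusion mentioned above might look like, here is a rough sketch rather than a definitive recipe. It assumes the archive's crawler identifies itself as `ia_archiver` (the user-agent name long associated with archive.org exclusions; check their current documentation before relying on it). The rules, the `is_allowed` helper and the example.com URL are made up for illustration. The Python standard library's `urllib.robotparser` is used only to check how a well-behaved crawler would read the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: block the crawler assumed to be
# archive.org's ("ia_archiver") and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/pricing.html"  # illustrative URL only
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Note that robots.txt is only a request: it works as long as the crawler chooses to honour it, and it does nothing against site rippers who ignore it.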

  16. #16
    Join Date
    Apr 2004
    Location
    Kansas City, MO
    Posts
    53
    You make a good point, JayC. However, you can also use the Wayback Machine to help you when your content has been stolen. It shows outside parties exactly how long you have had your content available, and proves that you used it before someone else did.

    I guess the positives and negatives cancel each other out.
    Take care,
    Brad Birmingham
    http://www.bluevirtual.com

  17. #17
    Join Date
    Apr 2004
    Location
    United Kingdom
    Posts
    375
    Rather interesting points!! I hadn't thought about it from that aspect.

  18. #18
    Join Date
    Apr 2004
    Location
    United Kingdom
    Posts
    375
    Oh! Another interesting point to the already interesting point!!
