Web Hosting Talk

View Full Version : How does search engines work so fast?


Oakii
12-31-2004, 05:41 PM
After using the Windows search tool ... I began to wonder how search engines are able to return results, on the fly, in so little time compared to the Windows search.

Anyone care to explain?

sid007
12-31-2004, 05:46 PM
http://www.google.com/technology/pigeonrank.html

websterworld
12-31-2004, 05:47 PM
Windows is crap. You can try installing the Google Desktop Search program and see how much faster it works.

RossH
12-31-2004, 05:51 PM
Well I don't know if I can help that much but I'll tell you what I saw from Google's cage last time I was out at the datacenter.

Google has massive numbers of cabinets on wheels. In these cabinets are just motherboards and power supplies sitting on trays, 2 for each tray. These motherboards have dual gigabit ethernet controllers on them, both of which are hooked up.

So my guess is Google is having these boards boot off some server and using them to serve their content. It is a very cheap method if you think about it. Anyway, they probably have some sort of amazing load balancing in front of all these boards that are serving content. Then they probably have some awesome storage back end for the actual data & searches. They also have 6 gig-e's all going to their space, and this is just one of 5 DCs they occupy...

Oakii
12-31-2004, 05:57 PM
So instead of the result coming from one server, it's being worked on by the entire cluster? (like a rendering farm...?)

RossH
12-31-2004, 06:00 PM
Originally posted by Diaga
So instead of the result coming from one server, it's being worked on by the entire cluster? (like a rendering farm...?)

By a cluster that, in their one spot in this DC, was bigger than most houses and filled with just motherboards on rack trays.

azizny
12-31-2004, 06:08 PM
Because Windows search is "live"..
Peace,

sid007
12-31-2004, 06:09 PM
Here's a ZDnet article about Google Hardware: http://insight.zdnet.co.uk/hardware/servers/0,39020445,39175560,00.htm

Over four billion Web pages, each an average of 10KB, all fully indexed

Up to 2,000 PCs in a cluster

Over 30 clusters

104 interface languages including Klingon and Tagalog

One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue

Sustained transfer rates of 2Gbps in a cluster

An expectation that two machines will fail every day in each of the larger clusters

No complete system failure since February 2000

It is one of the largest computing projects on the planet, arguably employing more computers than any other single, fully managed system (we're not counting distributed computing projects here), some 200 computer science PhDs, and 600 other computer scientists.
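
A rough back-of-the-envelope check on the error-rate bullet above. This is only a sketch under two assumptions the article does not state: that the rate means about one unrecoverable error per 10^15 bits read, and that a petabyte is 10^15 bytes. The names are made up for illustration.

```python
PETABYTE_BYTES = 10**15      # assumption: decimal petabyte
BITS_PER_BYTE = 8
ERRORS_PER_BIT = 1e-15       # assumption: the quoted rate is per bit read


def expected_errors(num_bytes, errors_per_bit=ERRORS_PER_BIT):
    """Expected number of read errors when reading num_bytes once."""
    return num_bytes * BITS_PER_BYTE * errors_per_bit


# Reading one full petabyte once gives roughly 8 expected errors.
print(expected_errors(PETABYTE_BYTES))
```

Under those assumptions, one full pass over a single cluster's data would be expected to hit several bad reads, which is why a rate that is negligible on one desktop disk matters at this scale.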

TR Seeks
12-31-2004, 06:12 PM
Google is very, very big... you can't even compare them to Windows...

Oakii
12-31-2004, 06:15 PM
Originally posted by LessHost
Here's a ZDnet article about Google Hardware: http://insight.zdnet.co.uk/hardware/servers/0,39020445,39175560,00.htm

Thanks for the link


Because windows search is "Live"..
Peace,

Google's results are generated "live" also

websterworld
12-31-2004, 06:21 PM
Well, those are two different things: an application that searches only one or a few hard drives for files, locally... or a search engine with 5 huge clusters.

The reason the Windows search is slow is simple. It's inefficient. It's inefficient because it was poorly designed and programmed.

To compare apples to apples and not apples to oranges, try a different application that searches your computer locally, such as the Google Desktop Search.

Hell, even try DOS.

TR Seeks
12-31-2004, 06:23 PM
Originally posted by websterworld


The reason the Windows search is slow is simple. It's inefficient. It's inefficient because it was poorly designed and programmed.



Pretty much sums up Windows.

JayC
12-31-2004, 07:10 PM
Google isn't searching through files on a hard drive, it's searching tokenized items in a highly optimized index. That is, it's not searching through the millions of documents in the index before responding to a search, it's searching through a list of search terms and their positions within documents.
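
To make that concrete, here is a minimal sketch of an inverted index in Python. It is a toy illustration of the general technique, not Google's actual design; the function names and sample documents are made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def search(index, query):
    """Return sorted ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # One dictionary lookup per term; the documents are never re-read.
    postings = [set(index.get(term, {})) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents.
docs = {
    1: "cheap web hosting with fast servers",
    2: "how search engines work so fast",
    3: "windows search scans files on the hard drive",
}
index = build_index(docs)
print(search(index, "search fast"))
```

The slow work (reading every document) happens once, when the index is built. Answering a query is then a few lookups plus a set intersection, instead of a scan through every file the way a non-indexed local search does.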

RossH
12-31-2004, 07:18 PM
Originally posted by LessHost
Here's a ZDnet article about Google Hardware: http://insight.zdnet.co.uk/hardware/servers/0,39020445,39175560,00.htm

If they call the people that I saw computer scientists, then that's real funny.

Oakii
12-31-2004, 08:01 PM
Computer science - spiffy word for the study of programming
Computer scientist - programmer

I think it's referring to the people who write Google's engine

Jasber
12-31-2004, 08:12 PM
I remember reading a theory a while ago that Google stores the entire Internet in RAM.

I'm not the best with hardware, and while it does seem extremely difficult, I think it's definitely a possibility.

JayC
12-31-2004, 08:40 PM
Originally posted by Jasber
I remember reading a theory a while ago that Google stores the entire Internet in RAM.

Like I said, they don't have to use "the entire Internet" for the purpose of answering queries; just the index to the term vector database. It's possible that copies of that are stored in RAM. Once the query's answered there, they grab snippets with which to fill out the result pages -- simply by following the pointers (specifying the exact offset within the indicated file at which the searched term appears) in the term vector database to the cached copies of pages, from which the result pages are built. Because they're following those pointers not just to specific documents but to specific points within the documents (actually the original design had the entire document database in a single file, but I don't know if that's still the case), there's none of the searching that's necessary in a "Windows search."
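
The pointer-following step can be sketched roughly like this. It is only an illustration of the idea, assuming each index entry stores a page id plus a character offset into a cached copy of the page; the names and sample pages are hypothetical, and the real on-disk layout is not described here.

```python
import re
from collections import defaultdict


def build_offset_index(pages):
    """Map each term to a list of (page_id, character offset) pointers."""
    index = defaultdict(list)
    for page_id, text in pages.items():
        for match in re.finditer(r"\w+", text.lower()):
            index[match.group()].append((page_id, match.start()))
    return index


def snippets(index, pages, term, width=20):
    """Jump straight to each stored offset and slice out the nearby text."""
    results = []
    for page_id, offset in index.get(term.lower(), []):
        text = pages[page_id]
        start = max(0, offset - width)
        end = min(len(text), offset + len(term) + width)
        results.append((page_id, text[start:end]))
    return results


# Hypothetical cached pages.
cached_pages = {
    "a.html": "Google keeps an index of terms and where they occur.",
    "b.html": "A local file search reads every file to find a term.",
}
offset_index = build_offset_index(cached_pages)
for page_id, snippet in snippets(offset_index, cached_pages, "index", width=5):
    print(page_id, repr(snippet))
```

Building the snippet is a slice at a known offset, so no text is searched at query time.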

BigBison
12-31-2004, 09:13 PM
Originally posted by Diaga
Computer science - spiffy word for the study of programming
Computer scientist - programmer

Programming is just one field within Computer Science (http://en.wikipedia.org/wiki/Computer_science). Some computer scientists don't program at all, for instance "Hardware Jocks".