Thread: pages indexed
02-12-2005, 11:49 AM   #7
Dave A
Purple Mole
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
The problem starts when search times begin to stretch out. It appears Google got around this by using a cluster of computers, each holding part of the database. So I imagine they have one front-end machine that routes queries on to machines holding smaller databases -- perhaps one database covering A-B entries, another for C-D entries -- so each machine only carries part of the load.
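The range-partitioning idea above can be sketched in a few lines. This is just an illustration of the concept, not Google's actual design -- the shard names and letter ranges here are made up:

```python
# Hypothetical sketch of range-based index sharding: a front-end
# routes each search term to the machine holding that slice of the
# alphabet, so no single machine carries the whole database.
SHARDS = {
    ("a", "b"): "shard-ab",   # holds index entries starting A-B
    ("c", "d"): "shard-cd",   # holds index entries starting C-D
    ("e", "z"): "shard-ez",   # catch-all for the rest of the alphabet
}

def route(term: str) -> str:
    """Return the name of the shard responsible for a term's first letter."""
    first = term[0].lower()
    for (lo, hi), shard in SHARDS.items():
        if lo <= first <= hi:
            return shard
    raise KeyError(f"no shard covers term {term!r}")

print(route("apple"))  # shard-ab
print(route("duck"))   # shard-cd
```

The front end only needs the routing table; the heavy lifting (and the storage) lives on the shards, which is what lets each machine stay small.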

Over four billion Web pages, each an average of 10KB, all fully indexed.
Up to 2,000 PCs in a cluster.
Over 30 clusters.
104 interface languages including Klingon and Tagalog.
One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
Sustained transfer rates of 2Gbps in a cluster.
An expectation that two machines will fail every day in each of the larger clusters.
No complete system failure since February 2000.
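It's worth seeing why a 10^-15 bit error rate suddenly matters at petabyte scale. The figures come from the list above; the back-of-the-envelope arithmetic is mine:

```python
# Why a 10^-15 error rate is a real issue at petabyte scale:
# reading a full petabyte means reading 8 x 10^15 bits, so even a
# one-in-10^15 bit error rate yields several errors per full pass.
petabyte_bits = 1e15 * 8      # one petabyte expressed in bits
error_rate = 1e-15            # unrecoverable bit error rate, per bit read
expected_errors = petabyte_bits * error_rate
print(expected_errors)        # ~8 expected bit errors per full read
```

In other words, errors stop being a freak event and become a certainty you have to engineer around, which fits the expectation above of two machine failures per day in the larger clusters.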

More information can be found at:
http://www.zdnet.com.au/insight/soft...9168647,00.htm

It is a good read to see how they have managed to get around the same problems we are facing.
On a passing note, it's cloudy this morning in New Zealand, and if the sunburn times are under twenty minutes I may have to increase my search capacity and see if I can catch some fish.
Never forget that running a search engine takes time spent searching out ever greater places to extend the fishing rod: back of the boat, a glass of wine... indexing the ocean is a really important thing. It helps you find common answers to heaps of searches... Grin..