http://www.eclipsecon.org/sessions.php



The Google File System is optimized for handling large, 64-megabyte blocks of data.
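
To make the block size concrete, here is a minimal sketch of the arithmetic a fixed 64 MB block size implies: how many blocks a file needs, and which block a given byte offset falls in. The names `CHUNK_SIZE`, `chunk_count`, and `locate` are made up for illustration, and treating "64 megabyte" as 64 * 1024 * 1024 bytes is an assumption; none of this is taken from Google's code.

```python
# Assumption: "64 megabyte" means 64 * 1024 * 1024 bytes.
CHUNK_SIZE = 64 * 1024 * 1024


def chunk_count(file_size_bytes: int) -> int:
    """Number of fixed-size blocks needed to hold a file (ceiling division)."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    return -(-file_size_bytes // CHUNK_SIZE)


def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset to (block index, offset inside that block)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(chunk_count(one_gib))       # blocks needed for a 1 GiB file
    print(locate(3 * CHUNK_SIZE + 5))  # block index and in-block offset
```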

"If you have thousands of PCs, you can expect one (failure) a day," he said. "So you better deal with that in an automated way, or you will have service outages."
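
A back-of-the-envelope sketch of why "one failure a day" follows from fleet size. It assumes machines fail independently and that each has the same mean time between failures; the 1000-day figure used below is an illustrative assumption, not a number from the talk, and the function names are hypothetical.

```python
def expected_failures_per_day(machines: int, mtbf_days: float) -> float:
    """Expected number of machine failures per day across the fleet."""
    return machines / mtbf_days


def prob_at_least_one_failure(machines: int, mtbf_days: float) -> float:
    """Probability that at least one machine fails on a given day,
    assuming each machine fails independently with probability 1/mtbf_days."""
    p_single = 1.0 / mtbf_days
    return 1.0 - (1.0 - p_single) ** machines


if __name__ == "__main__":
    # Assumed values: 1000 PCs, each failing about once every 1000 days.
    print(expected_failures_per_day(1000, 1000.0))
    print(prob_at_least_one_failure(1000, 1000.0))
```

Under these assumptions a machine that fails only once in roughly three years still yields about one failure a day across a thousand machines, which is the case for automated recovery.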

http://www.eweek.com/article2/0,1759,1772204,00.asp
Google uses an index, similar to a book's index, which takes several days on hundreds of machines to compile, Hoelzle said. It has more than 8 billion Web documents and 1.1 billion images.

Then Google uses its PageRank system for ranking and ordering the Web pages, he said. "Then we split them into pieces called shards, small enough to put on various machines. And we replicate the shards."
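
The sharding and replication idea can be sketched as follows. The talk does not say how documents are assigned to shards or how replicas are placed, so the hash-based assignment and round-robin placement here are assumptions chosen for simplicity, and `shard_for` and `place_replicas` are hypothetical names.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard for a document using a stable hash (assumed scheme)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def place_replicas(num_shards: int, machines: list[str], replicas: int) -> dict[int, list[str]]:
    """Assign each shard to `replicas` distinct machines, round-robin."""
    if replicas > len(machines):
        raise ValueError("need at least as many machines as replicas")
    placement = {}
    for shard in range(num_shards):
        placement[shard] = [
            machines[(shard + i) % len(machines)] for i in range(replicas)
        ]
    return placement


if __name__ == "__main__":
    layout = place_replicas(4, ["m0", "m1", "m2"], replicas=2)
    for shard, hosts in layout.items():
        print(shard, hosts)
    print(shard_for("http://example.com/", 4))
```

With every shard on more than one machine, losing a single machine leaves each shard still reachable on another.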

So an incoming query would hit the Google Web server and then the index server and eventually a document server that contains copies of the Web pages Google downloads.
The Web is 10 billion pages; Google has 8.5 billion.
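
The query path noted above (Web server, then index server, then document server) might look roughly like this toy, single-process sketch. All class and method names are hypothetical, the index is a plain in-memory inverted index, and ranking is left out: results come back sorted by document ID as a placeholder, not in PageRank order.

```python
from collections import defaultdict


class IndexServer:
    """Maps each word to the set of document IDs containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for word in text.lower().split():
            self._postings[word].add(doc_id)

    def search(self, query: str) -> list[str]:
        words = query.lower().split()
        if not words:
            return []
        # Documents that contain every query word.
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits)  # placeholder order; no ranking here


class DocumentServer:
    """Holds the stored copy of each downloaded page."""

    def __init__(self):
        self._docs = {}

    def store(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text

    def fetch(self, doc_id: str) -> str:
        return self._docs[doc_id]


class WebServer:
    """Front end: asks the index for matches, then fetches the pages."""

    def __init__(self, index: IndexServer, docs: DocumentServer):
        self.index = index
        self.docs = docs

    def handle(self, query: str) -> list[tuple[str, str]]:
        return [(d, self.docs.fetch(d)) for d in self.index.search(query)]


if __name__ == "__main__":
    index, docs = IndexServer(), DocumentServer()
    pages = {"d1": "google builds an index", "d2": "index of a book"}
    for doc_id, text in pages.items():
        index.add(doc_id, text)
        docs.store(doc_id, text)
    print(WebServer(index, docs).handle("google index"))
```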