PhpDig.net


jmitchell 01-31-2005 07:10 AM

pages indexed
 
So, how many pages have people indexed? NewTangent and I have indexed about 30,000 pages with our installation of PhpDig, and we are wondering what other people have indexed (anyone over a million?).

JMitchell

Charter 01-31-2005 10:07 AM

I index just enough pages to test PhpDig or run the online demo, nowhere near 30,000 pages.

jmitchell 02-01-2005 10:55 AM

OK, I was wondering about PhpDig speeds when you have a huge database, and whether there is a way to have several databases that PhpDig searches through. That way I could spread multiple databases over multiple servers and maximize my speed. Is this correct thinking, and can it be done?
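The idea in the question above (ask several databases at once, then combine the answers) can be sketched in a few lines. This is only an illustration of the general fan-out-and-merge pattern, not something PhpDig is known to provide: the `SHARDS` data, the `search_shard` and `search_all` names, and the example URLs are all hypothetical, and in-memory dictionaries stand in for databases on separate servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for separate databases on separate servers.
# Each maps a keyword to (url, weight) rows, loosely like a keyword index.
SHARDS = [
    {"fishing": [("http://a.example/rod", 12), ("http://a.example/boat", 5)]},
    {"fishing": [("http://b.example/reel", 9)], "wine": [("http://b.example/red", 7)]},
]

def search_shard(shard, keyword):
    """Look up one keyword in one shard (a real version would run SQL here)."""
    return shard.get(keyword, [])

def search_all(keyword, shards=SHARDS, limit=10):
    """Query every shard in parallel and merge the hits by descending weight."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda s: search_shard(s, keyword), shards)
        hits = [hit for part in partials for hit in part]
    return sorted(hits, key=lambda h: h[1], reverse=True)[:limit]

print(search_all("fishing"))
```

With this layout the total search time is roughly that of the slowest database rather than the sum of all of them, which is where any speed gain would come from.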

jmitchell 02-02-2005 10:13 AM

I also remember that others were talking about huge databases a few weeks/months ago, and was wondering if they had any results.

jmitchell

Dave A 02-03-2005 01:37 AM

Size of database
 
My database stands at
Hosts : 105050 Entries
Pages : 3430752 Entries
Index : 14938699 Entries
Keywords : 392470 Entries
Temporary table : 0 Entries

That is around 280 MB of web space used, but I am aware that some form of clipping may occur when the host's free space starts to get small, so I increase it by another fifty MB.
Search times have shortened since I recently went over to broadband.

I have found that files left in the temp spider clog up the system and slow searches down a heap.

jmitchell 02-03-2005 10:05 AM

What's your search site, Dave?

I'd like to see how you are using it.

JMitchell

Dave A 02-12-2005 10:49 AM

The problem starts when search times begin to grow. It would appear that Google got around the problem by using a cluster of computers, each holding part of the database. I would imagine they have one front-end machine that passes each query on to machines holding smaller databases, maybe one database that covers the A-B entries and another for the C-D entries, so each one carries only some of the load.
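The front-end guess above, with one database per letter range, could look something like the following sketch. It is only an illustration of routing keywords by their first letter: the range boundaries, the back-end names (`db-1` and so on), and the function names are hypothetical, and nothing here reflects how Google or PhpDig actually splits its data.

```python
# Hypothetical letter ranges, each owned by one back-end database.
RANGES = [("a", "b", "db-1"), ("c", "d", "db-2"), ("e", "z", "db-3")]
FALLBACK = "db-other"  # digits and anything else outside a-z

def shard_for(keyword):
    """Return the name of the back end that owns this keyword's first letter."""
    first = keyword.strip().lower()[:1]
    for low, high, name in RANGES:
        if first and low <= first <= high:
            return name
    return FALLBACK

def plan_query(query):
    """Group the words of a query by the back end that must be asked for each."""
    plan = {}
    for word in query.split():
        plan.setdefault(shard_for(word), []).append(word)
    return plan

print(plan_query("Auckland boat charter 2005"))
```

A split by letter is easy to reason about, but some letters are far more common than others, so the ranges would need tuning to keep the load even across machines.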

Over four billion Web pages, each an average of 10KB, all fully indexed.
Up to 2,000 PCs in a cluster.
Over 30 clusters.
104 interface languages including Klingon and Tagalog.
One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
Sustained transfer rates of 2Gbps in a cluster.
An expectation that two machines will fail every day in each of the larger clusters.
No complete system failure since February 2000.

More information can be found at:
http://www.zdnet.com.au/insight/soft...9168647,00.htm

It is a good read to see how they have managed to get around the same problems we are facing.
On a passing note it is cloudy this morning in New Zealand and if the sunburn times are less than twenty minutes I may have to increase my search capacity and see if I can catch some fish.
Never forget that running a search engine needs time spent searching out ever greater places where someone can extend the fishing rod, back of the boat, a glass of wine, indexing the ocean is a really important thing... It helps you find common answers to heaps of searches... Grin..

Dave A 02-13-2005 02:03 AM

Hi, please check out my user profile for my search site and contact details.
This forum isn't the correct place to post things that can be indexed by major search engines and may appear to be adverts. Grin...!
I am more than glad to assist anyone using this software to the best of my ability.
Many regards
Dave A

Dave A 02-15-2005 12:23 PM

Quote:

Originally Posted by jmitchell
What's your search site, Dave?

I'd like to see how you are using it.

JMitchell

Hi, it can be found at www.linknz.co.nz


All times are GMT -8.