PhpDig.net

General Forums > The Mole Hole

01-31-2005, 07:10 AM   #1
jmitchell
Orange Mole
 
Join Date: Dec 2004
Location: Tennessee
Posts: 60
Pages indexed

So, how many pages have people indexed? NewTangent and I have indexed about 30,000 pages with our installation of PhpDig, and we are wondering how many other people have indexed (anyone over a million?).

JMitchell
__________________
60,000 pages indexed!!!!! http://www.sharemylink.com
01-31-2005, 10:07 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
I index just enough pages to test PhpDig or run the online demo, nowhere near 30,000 pages.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
02-01-2005, 10:55 AM   #3
jmitchell
Orange Mole
 
Join Date: Dec 2004
Location: Tennessee
Posts: 60
OK, I was wondering about PhpDig's speed once the database gets huge. Is there a way to have several databases that PhpDig searches through? That way I could put multiple databases on multiple servers and maximize my speed. Is this correct thinking, and can it be done?
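One common way to get the several-databases setup described in this post is a scatter-gather search: the front end sends the same query to every database at once, then merges the partial result lists. Below is a minimal Python sketch, not PhpDig code. It assumes each server holds a hypothetical `idx(keyword, url, weight)` table, and in-memory SQLite databases stand in for the separate MySQL servers.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_shard(rows):
    """Create an in-memory database standing in for one server's index.

    Hypothetical schema idx(keyword, url, weight); not PhpDig's real tables.
    """
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE idx (keyword TEXT, url TEXT, weight INTEGER)")
    conn.executemany("INSERT INTO idx VALUES (?, ?, ?)", rows)
    return conn


def search_shard(conn, keyword):
    """Return (url, weight) hits for one keyword from a single shard."""
    return conn.execute(
        "SELECT url, weight FROM idx WHERE keyword = ?", (keyword,)
    ).fetchall()


def federated_search(shards, keyword, limit=10):
    """Query every shard in parallel, then merge hits by descending weight."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda conn: search_shard(conn, keyword), shards))
    merged = [hit for part in partials for hit in part]
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:limit]


# Usage: two "servers", each holding part of the index.
shard_a = make_shard([("fishing", "http://a.example/1", 5), ("boat", "http://a.example/2", 3)])
shard_b = make_shard([("fishing", "http://b.example/1", 9)])
print(federated_search([shard_a, shard_b], "fishing"))
```

Each server only scans its own, smaller index, but the overall answer is only as fast as the slowest server, and merging by weight assumes the weights are computed the same way on every server.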
02-02-2005, 10:13 AM   #4
jmitchell
Orange Mole
 
Join Date: Dec 2004
Location: Tennessee
Posts: 60
I also remember that others were talking about huge databases a few weeks or months ago, and I was wondering whether they had any results.

jmitchell
02-03-2005, 01:37 AM   #5
Dave A
Purple Mole
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
Size of database

My database stands at
Hosts : 105050 Entries
Pages : 3430752 Entries
Index : 14938699 Entries
Keywords : 392470 Entries
Temporary table : 0 Entries

That uses around 280 MB of web space. I am aware that some clipping may occur when the host's free space starts to run low, so when that happens I increase it by another 50 MB.
Search times have shortened since I recently switched to broadband.

I have found that files left in the temporary spider table clog up the system and slow searches down a lot.
02-03-2005, 10:05 AM   #6
jmitchell
Orange Mole
 
Join Date: Dec 2004
Location: Tennessee
Posts: 60
What's your search site, Dave?

I'd like to see how you are using it.

JMitchell
02-12-2005, 10:49 AM   #7
Dave A
Purple Mole
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
The problem starts when search times get longer. It would appear that Google got around this problem by using a cluster of computers, each holding part of the database. So I would imagine they have one front-end machine that passes queries on to machines with smaller databases, maybe one database covering A-B entries and another covering C-D entries, so that each one carries only part of the load.
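The A-B / C-D idea is range partitioning: the front end looks at each keyword's first letter and forwards it only to the machine that holds that range. Here is a minimal Python sketch of just the routing step. The shard names `db-ab`, `db-cd` and `db-ez` and the split points are hypothetical, chosen only for illustration.

```python
import bisect
from collections import defaultdict

# Hypothetical split points: each entry is the (exclusive) upper bound of a range.
# Keywords starting before "c" go to db-ab, "c"-"d" to db-cd, everything else to db-ez.
SPLITS = ["c", "e"]
SHARDS = ["db-ab", "db-cd", "db-ez"]


def shard_for(keyword):
    """Pick the shard whose letter range contains the keyword's first character."""
    first = keyword[:1].lower()
    return SHARDS[bisect.bisect_right(SPLITS, first)]


def route_query(keywords):
    """Group a multi-word query so each shard receives only its own keywords."""
    plan = defaultdict(list)
    for kw in keywords:
        plan[shard_for(kw)].append(kw)
    return dict(plan)


print(route_query(["boat", "cod", "Wine", "drift"]))
# prints {'db-ab': ['boat'], 'db-cd': ['cod', 'drift'], 'db-ez': ['Wine']}
```

Because words are not spread evenly across the alphabet, in practice the split points would be picked by how much data falls in each range rather than by equal letter counts.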

Over four billion Web pages, each an average of 10KB, all fully indexed.
Up to 2,000 PCs in a cluster.
Over 30 clusters.
104 interface languages including Klingon and Tagalog.
One petabyte of data in a cluster -- so much that hard disk error rates of 10^-15 begin to be a real issue.
Sustained transfer rates of 2Gbps in a cluster.
An expectation that two machines will fail every day in each of the larger clusters.
No complete system failure since February 2000.
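To see why a 10^-15 error rate begins to matter at this scale, here is some back-of-the-envelope arithmetic in Python. It assumes the rate means one bad bit per 10^15 bits read, that 1 PB = 10^15 bytes and 1 KB = 1,000 bytes, and that errors are independent (Poisson); these are assumptions, not figures from the article.

```python
import math

BITS_PER_ERROR = 10**15  # assumption: one bit error per 10^15 bits read
PETABYTE = 10**15        # decimal petabyte, in bytes
KILOBYTE = 1000          # decimal kilobyte, in bytes


def expected_errors(bytes_read):
    """Expected number of bit errors when reading this many bytes once."""
    return bytes_read * 8 / BITS_PER_ERROR


def p_at_least_one_error(bytes_read):
    """Poisson approximation of the chance that a full read hits any error."""
    return 1 - math.exp(-expected_errors(bytes_read))


corpus_bytes = 4_000_000_000 * 10 * KILOBYTE  # four billion pages at 10 KB each
print(corpus_bytes / 10**12, "TB of page text")
print(expected_errors(PETABYTE), "expected errors per full read of one cluster")
print(round(p_at_least_one_error(PETABYTE), 4), "chance of at least one error")
```

Under these assumptions a single pass over one cluster's petabyte expects about eight bad bits, so at least one error is nearly certain, while reading one 10 KB page is almost never affected.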

More information can be found at:
http://www.zdnet.com.au/insight/soft...9168647,00.htm

It is a good read to see how they have managed to get around the same problems we are facing.
On a passing note, it is cloudy this morning in New Zealand, and if the sunburn times are less than twenty minutes I may have to increase my search capacity and see if I can catch some fish.
Never forget that running a search engine needs time spent searching out ever greater places to extend the fishing rod: the back of the boat, a glass of wine... Indexing the ocean is a really important thing... It helps you find common answers to heaps of searches... Grin..
02-13-2005, 02:03 AM   #8
Dave A
Purple Mole
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
Hi, please check out my user profile for my search site and contact details.
This forum isn't the right place to post things that can be indexed by major search engines and may look like adverts. Grin...! So please view my user profile, where my contact details are available.
I am more than glad to assist anyone using this software to the best of my ability.
Many regards
Dave A
02-15-2005, 12:23 PM   #9
Dave A
Purple Mole
 
Join Date: Aug 2004
Location: North Island New Zealand
Posts: 170
Quote:
Originally Posted by jmitchell
whats your search site Dave?

I'd like to see how you are using it.

JMitchell
Hi, it can be found at www.linknz.co.nz

All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.