View Full Version : hello / directories / phpdig & others

01-11-2005, 03:19 AM
I've been a fan of phpDig for a long time now. I had it installed for small-scale testing about a year ago.

I've got to the point where I would like to build a large search engine. I'm a bit concerned about a few people mentioning effective size limits of 35k-70k indexed pages (slow search performance).

I would be looking at an index larger than that. Is this something that phpDig can search quickly enough? (i.e. results that are not perceived as instant would not be good enough)

Or am I better off going with something like mnoGoSearch? (Depending on the answer, I will be setting up an indexer here on my dev server this week to give it a good test thrashing.)

I am looking at doing something interesting with whatever I end up going with - I'll post details on exactly what later...

Also - can somebody recommend directory software (à la ODP)?


01-11-2005, 05:59 AM
As I have not tried mnoGoSearch, I cannot give you any comparison information. If you wish to make a large-scale search engine, then you should consider that you'll probably need a cluster of servers to process requests. Also, you'll probably want to run precompiled code rather than parse code on each run, use a caching system, send compressed output, and so on. Having a server and a script is not enough to go large scale. As for a script directory, there used to be something called "PHP Script Index", but I'm not sure if it's still available.
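
To make the caching and compressed-output points a bit more concrete, here is a rough sketch in Python. It is not PhpDig code; `run_search`, `render_page`, `cached_search_page` and the 300-second lifetime are all made-up names and values for illustration. The idea is just that a repeated query is answered from memory, already gzip-compressed, instead of hitting the index again.

```python
import gzip
import time

CACHE_TTL_SECONDS = 300  # assumed lifetime of a cached result page
_cache = {}              # normalized query -> (expiry time, compressed page)
stats = {"backend_calls": 0}  # counts how often the slow path runs


def run_search(query):
    """Hypothetical stand-in for the expensive index lookup."""
    stats["backend_calls"] += 1
    return ["result %d for %s" % (i, query) for i in range(1, 4)]


def render_page(query, results):
    """Build a plain-text result page and encode it as bytes."""
    lines = ["Results for: " + query] + results
    return "\n".join(lines).encode("utf-8")


def cached_search_page(query, now=None):
    """Return a gzip-compressed result page, reusing a fresh cached copy."""
    now = time.time() if now is None else now
    # Normalize case and whitespace so trivially different queries share a key.
    key = " ".join(query.lower().split())
    entry = _cache.get(key)
    if entry is not None and entry[0] > now:
        return entry[1]  # cache hit: no search, no recompression
    body = gzip.compress(render_page(key, run_search(key)))
    _cache[key] = (now + CACHE_TTL_SECONDS, body)
    return body


if __name__ == "__main__":
    page = cached_search_page("PHP  Dig")
    cached_search_page("php dig")  # served from the cache
    print(stats["backend_calls"], len(page))
```

A real setup would more likely put the cache in something shared between servers and let the web server handle compression, but the trade-off is the same: slightly stale results in exchange for far fewer index lookups.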

01-12-2005, 11:38 PM
You have confirmed what I suspected from my own research. Pity.

It would seem that the best performance - a few million pages indexed, searched in <2 sec - is achieved with DataPark, followed by mnoGoSearch.

I do have a question though - it seems like a lot of the "grunt" work for search engines is done by scripts/binaries outside of the DB, instead of by the database server? I had thought that the DB would do the hard work.

Why is that?


01-13-2005, 12:18 AM
Maybe this (http://www.phpdig.net/forum/showthread.php?t=708) thread can answer your DB question, at least WRT PhpDig.