04-14-2004, 01:07 PM
Greetings Dig Board Members.

I've just started working with dig. Overall, I am happy to find such a fine search engine tool available open source for PHP.

I run a couple of larger sites (8,000 - 80,000 pages of content) that I would like to index with a search engine. These sites add about 20 new pages of content per day.

I've noticed that, while it is possible to index these pages with dig, it can sometimes be a slow process, and a load-intensive one as well.

What I want to accomplish is to make incremental builds of the dig database.

First, I will build the existing sites. Then afterwards, I would like to index the new files that are added to the site - perhaps every few hours.

Can someone suggest a procedure for indexing only the files that were recently added to the site?

My thought is to write a script that collects the URIs of the new pages into a file, and then feed this file to spider.php when I run it via cron every few hours.

Is this a common procedure for using Dig?



04-14-2004, 07:59 PM
Hi, Danny, and welcome to the forum! We're glad you could join us. :)

I haven't done spidering myself via cron, but what you've outlined will work very well with phpDig. Sounds like you may already know how to do this, but there is some discussion about indexing like you're talking about here (http://www.phpdig.net/navigation.php?action=doc#toc8) in the documentation. I hope you'll find it useful.
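
For the collection step outlined above, here is a rough sketch in Python of one way to gather the URIs of recently added pages into a file. It is only an illustration under several assumptions: that the pages exist as files under a document root, that a file's modification time is a good enough signal for "new", and that file paths map directly onto URLs. The function names, the extension list, and the timestamp file are all made up for this example and are not part of phpDig. How the resulting list is handed to spider.php is not shown; the documentation linked above is the place to check for that.

```python
import os
import time
from urllib.parse import quote


def collect_new_urls(doc_root, base_url, since, extensions=(".html", ".htm", ".php")):
    """Return sorted URLs for files under doc_root modified after `since`.

    `since` is a Unix timestamp. Modification time is used as a stand-in
    for "newly added" (an assumption of this sketch).
    """
    urls = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            # Skip anything that does not look like a page.
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                # Assumes the URL path mirrors the path below doc_root.
                rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
                urls.append(base_url.rstrip("/") + "/" + quote(rel))
    return sorted(urls)


def read_last_run(stamp_path):
    """Return the timestamp of the previous run, or 0.0 if none is recorded."""
    try:
        with open(stamp_path) as fh:
            return float(fh.read().strip())
    except (OSError, ValueError):
        return 0.0


def run_once(doc_root, base_url, list_path, stamp_path):
    """Write URLs changed since the last run to list_path, one per line."""
    # Record the start time first so files added during the scan
    # are picked up by the next run rather than missed.
    started = time.time()
    urls = collect_new_urls(doc_root, base_url, read_last_run(stamp_path))
    with open(list_path, "w", encoding="utf-8") as fh:
        for url in urls:
            fh.write(url + "\n")
    with open(stamp_path, "w") as fh:
        fh.write(repr(started))
    return urls
```

A cron job could call `run_once` with the site's document root and base URL every few hours, then start the spider on the list it produces. The very first run has no stored timestamp, so it lists every page; that fits the plan of building the full index once and updating incrementally afterwards.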