#1 |
Green Mole
Join Date: Apr 2004
Posts: 3
|
incremental building of dig database
Greetings Dig Board Members.
I've just started working with dig. Overall, I am happy to find such a fine open-source search engine tool available for PHP.

I run a couple of larger sites, with 8,000 to 80,000 pages of content each, that I would like to index with a search engine. These sites add about 20 new pages of content per day. I've noticed that, while it is possible to index these pages with dig, it can be a slow and load-intensive process.

What I want to accomplish is incremental builds of the dig database. First, I will index the existing sites. Afterwards, I would like to index only the new files that are added to the site, perhaps every few hours.

Can someone suggest a procedure for indexing only the files that were recently added to the site? My thought is to write a script that collects the URIs of the new pages into a file, and then feed that file to spider.php when I run it via cron every few hours. Is this a common procedure for using Dig?

thanks!
Danny
#2 |
Purple Mole
Join Date: Jan 2004
Posts: 694
|
Hi, Danny, and welcome to the forum! We're glad you could join us.
I haven't done spidering via cron myself, but what you've outlined will work very well with phpDig. It sounds like you may already know how to do this, but there is some discussion of the kind of indexing you're describing here in the documentation. I hope you'll find it useful.
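For anyone wanting a starting point for the "collect the new URIs into a file" step, here is a minimal sketch in Python. It assumes the pages live as files under a document root and that a file's modification time is a good enough signal for "new"; the function names, the stamp file, and the extension list are all made up for illustration and are not part of phpDig.

```python
import time
from pathlib import Path
from urllib.parse import quote


def collect_new_uris(doc_root, base_url, since,
                     extensions=(".html", ".htm", ".php")):
    """Return URLs for files under doc_root modified after `since` (epoch seconds)."""
    root = Path(doc_root)
    uris = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        if path.stat().st_mtime > since:
            # Map the file path to a URL path (assumes a 1:1 layout).
            rel = path.relative_to(root).as_posix()
            uris.append(base_url.rstrip("/") + "/" + quote(rel))
    return uris


def run_once(doc_root, base_url, stamp_file, out_file):
    """Write URLs changed since the previous run to out_file, one per line."""
    stamp = Path(stamp_file)
    # First run: no stamp yet, so every matching file counts as new.
    since = float(stamp.read_text()) if stamp.exists() else 0.0
    # Take the timestamp before scanning so files changed during the scan
    # are picked up again next time rather than missed.
    now = time.time()
    uris = collect_new_uris(doc_root, base_url, since)
    Path(out_file).write_text("".join(u + "\n" for u in uris))
    stamp.write_text(repr(now))
    return uris
```

A cron job could call `run_once(...)` every few hours and then hand the resulting file to spider.php. How spider.php expects to receive that list is something to confirm in the phpDig documentation; this sketch only produces the file.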
Thread | Thread Starter | Forum | Replies | Last Post |
dig for certain words | nmott | How-to Forum | 1 | 02-26-2005 08:02 PM |
is 'dig 1.8.3 slower than 'dig 1.8.0 | JÿGius³ | Troubleshooting | 1 | 08-02-2004 07:32 PM |
Big dig database - spidering question | JWSmythe | How-to Forum | 3 | 05-14-2004 10:52 PM |
how to dig only 1 page | zaartix | Troubleshooting | 8 | 05-11-2004 01:23 AM |
Why Can't I dig this site???? | lighthouse | Troubleshooting | 4 | 03-12-2004 12:09 AM |