PhpDig.net


airplay 03-09-2004 01:48 PM

Very Slow Indexing
 
Hi, this is my first attempt at using PhpDig.
I have a site that has around 32,000 pages.
When I first ran the spider it ran overnight and had around 10,000 pages indexed when I woke up.
I noticed there was a long pause between pages (around 6 seconds). I found the "sleep" in your code, so I went ahead and changed it to 2 seconds.

I then cleared out all of the tables and started over.
This sped up the process at first, but now it has been 24 hours and the darn thing is still running.
Is it normal for it to take this long to index 32,000 pages?

Any ideas?

Thanks!
Airplay....

Charter 03-09-2004 02:18 PM

Hi airplay, and welcome to PhpDig.net!

For a site with that many pages, try setting the following in the config.php file, where X is one or two:
PHP Code:

define('LIMIT_DAYS',0);              //default number of days before a page is reindexed
define('SPIDER_MAX_LIMIT',X);        //max recursion levels in spider
define('SPIDER_DEFAULT_LIMIT',X);    //default value
define('RESPIDER_LIMIT',X);          //recursion limit for update

and then crawl your site in chunks.
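
For illustration only, here is a minimal sketch of those settings with X filled in as 2. The value 2 is just an assumption for a first pass; use 1 if the crawl still drags, and adjust to whatever suits your site.

```php
<?php
// Illustrative values only: X = 2 is an assumed choice, not a recommendation.
define('LIMIT_DAYS', 0);            // reindex pages regardless of their age
define('SPIDER_MAX_LIMIT', 2);      // highest depth the spider may be given
define('SPIDER_DEFAULT_LIMIT', 2);  // depth used by default
define('RESPIDER_LIMIT', 2);        // depth used when updating the index
```

With a shallow depth like this, each run only follows links a couple of levels down from its starting URL, so covering a large site means starting separate runs from different sections rather than one deep crawl from the home page.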

One thing I've noticed is that users tend to set the search depth to the highest possible value and then let the robot run. This tends to pick up a lot of repeat documents, leading to a longer index time.

Also, when you want to start over, it might be better to delete the site from the admin panel, as this will empty the tables (except for keywords and logs) and delete the TXT files. The clean dictionary link will empty the keywords table, but it is probably faster to do that from the shell, and the logs tables would need to be emptied from the shell or phpMyAdmin.

airplay 03-09-2004 02:20 PM

Charter
Excellent! Thanks for the quick reply! I'll give that a try and let you know how it goes!

Airplay....

