PhpDig.net


rayvd 10-09-2003 06:19 AM

Typical run times...
 
Yes, I know message boards have search features built in to them... :)

Nevertheless, we have been setting up phpdig and, as a test, have had it spider several message boards hosted on our server. One such board has about 9000 posts and, I realize, probably a lot of links that loop round and round... I set the recursion depth to 2 and let phpdig go. 16 hours later it was still at it!

Is this typical? What are some runtimes some of you have experienced, and on what size of a site? Not necessarily looking for other message board crawling times, just anything in general that I can compare against.

Since some of these sites are "our" sites and on a local connection, it might be prudent to remove the sleep(2) call in spider.php to speed things up...
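
As a rough check on how much that pause could matter, here is a back-of-the-envelope sketch. It assumes one sleep(2) per fetched page and, purely for illustration, about one fetched page per post on the 9000-post board; neither figure is measured, and the function name is made up for this example.

```python
def delay_overhead_hours(pages, delay_seconds):
    """Hours spent purely in the per-page pause, ignoring fetch and indexing time."""
    return pages * delay_seconds / 3600.0

# Assumed figure: roughly one fetched page per post on a 9000-post board.
print(delay_overhead_hours(9000, 2))  # 5.0
```

Under those assumptions the pause alone would account for about 5 of the 16 hours, so removing it on a local connection should help but would not explain the whole run time.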

Charter 10-09-2003 03:43 PM

Hi. You might try using the PhpDig include and exclude comments for the header and footer, and if not already done, try running PhpDig from shell rather than from a browser.

Another idea, off the top of my head, would be to write a quick script to export the post URLs to a file, and then just crawl that file at level one.
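
A minimal sketch of such a script, in Python. The base URL, the `?t=` query pattern and the ID range are assumptions for illustration only; on a real board the thread IDs would more likely come from the forum's database, and the function names here are hypothetical.

```python
from pathlib import Path

def build_post_urls(base_url, thread_ids):
    """Return one URL per thread ID, de-duplicated and in ascending ID order."""
    return [f"{base_url}?t={tid}" for tid in sorted(set(thread_ids))]

def write_url_file(urls, path):
    """Write one URL per line and return how many were written."""
    Path(path).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)

if __name__ == "__main__":
    # Hypothetical board address and ID range.
    urls = build_post_urls("http://forum.example.com/showthread.php", range(1, 6))
    write_url_file(urls, "post_urls.txt")
```

The resulting file has one URL per line, which is the shape a level-one crawl over a list needs.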

renehaentjens 11-19-2003 01:17 PM

For those sites that take a looong time to index, it might be nice to have interruptible indexing (stop for a while, I'll tell you when to continue) - but that's a mod request and should be placed elsewhere I guess?

Anyway, if I can influence the design of the site to be indexed, what advice can I get from the gurus? What are the DOs and the DON'Ts for quick indexing? Where does PhpDig lose a lot of time when indexing sites? ...

rayvd 11-19-2003 01:20 PM

What I ended up doing was breaking my list of URLs into 7, 8, 9 or 10 sublists and then starting a crawler for each of them.

Pseudo-threading!

Doesn't make any individual site crawl faster, but the whole gets completed quicker.
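
The splitting step could look something like the sketch below. It is not the script actually used here; the round-robin split and the function name are assumptions, chosen so that neighbouring URLs from the same site end up spread across different crawlers.

```python
def split_round_robin(urls, n_lists):
    """Deal urls into up to n_lists sublists, one URL at a time per list."""
    if n_lists < 1:
        raise ValueError("n_lists must be at least 1")
    urls = list(urls)
    sublists = [urls[i::n_lists] for i in range(n_lists)]
    # Drop empty sublists when there are fewer URLs than lists.
    return [s for s in sublists if s]

if __name__ == "__main__":
    # Hypothetical URLs; each sublist would go to its own crawler.
    sample = [f"http://site{i}.example.com/" for i in range(1, 8)]
    for number, sublist in enumerate(split_round_robin(sample, 3), start=1):
        print(number, sublist)
```

Each sublist can then be written to its own file and handed to a separate crawler process. How the crawler is started depends on the install, so that part is left out.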


All times are GMT -8.
