PhpDig.net

10-09-2003, 06:19 AM   #1
rayvd
Green Mole
 
Join Date: Oct 2003
Location: Mesa, AZ
Posts: 15
Typical run times...

Yes, I know message boards have search features built into them...

Nevertheless, we have been setting up PhpDig and, as a test, have had it spider several message boards hosted on our server. One such board has about 9000 posts and, I realize, probably a lot of links that loop round and round. I set recursion at 2 and let PhpDig go. 16 hours later it was still at it!

Is this typical? What run times have some of you experienced, and on what size of site? I'm not necessarily looking for other message board crawl times, just anything in general that I can compare against.

Since some of these sites are "our" sites and on a local connection, it might be prudent to remove the sleep(2) call in spider.php to speed things up...
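
A rough way to judge how much that delay matters, assuming sleep(2) runs once per fetched page (an assumption, not checked against spider.php): the pause alone costs two seconds per page. A minimal sketch of the arithmetic, with illustrative page counts rather than measured ones:

```python
SLEEP_SECONDS = 2  # the sleep(2) delay; assumed to run once per fetched page


def sleep_overhead_hours(pages, delay=SLEEP_SECONDS):
    """Hours spent purely sleeping while fetching `pages` pages."""
    return pages * delay / 3600


def pages_covered_by_sleep(hours, delay=SLEEP_SECONDS):
    """Number of pages whose sleep time alone would fill `hours`."""
    return int(hours * 3600 / delay)


print(sleep_overhead_hours(9000))   # 5.0
print(pages_covered_by_sleep(16))   # 28800
```

Under that assumption, a crawl that fetched roughly one page per post would spend about 5 of the 16 hours sleeping. The delay would only account for the full 16 hours if close to 29,000 pages were fetched.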
10-09-2003, 03:43 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. You might try using the PhpDig include and exclude comments for the header and footer, and if not already done, try running PhpDig from the shell rather than from a browser.

Another idea, off the top of my head, would be to write a quick script to export the post URLs to a file, and then just crawl that file at level one.
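
The "quick script" idea could look something like the sketch below. It is only an illustration: the URL pattern, the host name, and the helper names are hypothetical, and in practice the post IDs would come from the board's own database. How the resulting file is then handed to PhpDig is left out here.

```python
from pathlib import Path

# Hypothetical URL pattern; adjust to the board software actually in use.
URL_TEMPLATE = "http://www.example.com/forum/showthread.php?p={post_id}"


def build_post_urls(post_ids, template=URL_TEMPLATE):
    """Return one URL per post ID, de-duplicated, in first-seen order."""
    seen = set()
    urls = []
    for post_id in post_ids:
        if post_id in seen:
            continue
        seen.add(post_id)
        urls.append(template.format(post_id=post_id))
    return urls


def write_url_list(urls, path):
    """Write the URLs one per line and return how many were written."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return len(urls)


# Small demonstration with made-up IDs; the duplicate 101 is dropped.
print(build_post_urls([101, 102, 101]))
```

Calling `write_url_list(build_post_urls(ids), "post_urls.txt")` would then produce a flat list where every post is reachable directly, so no deeper recursion through looping board links is needed.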
11-19-2003, 01:17 PM   #3
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
For those sites that take a long time to index, it might be nice to have interruptible indexing (stop for a while, and I'll tell you when to continue), but that's a mod request and should be placed elsewhere, I guess?

Anyway, if I can influence the design of the site to be indexed, what advice can I get from the gurus? What are the DOs and the DON'Ts for quick indexing? Where does PhpDig lose a lot of time when indexing sites? ...
__________________
René Haentjens, Ghent University
11-19-2003, 01:20 PM   #4
rayvd
Green Mole
 
Join Date: Oct 2003
Location: Mesa, AZ
Posts: 15
What I ended up doing was breaking my list of URLs into 7, 8, 9 or 10 sublists and then starting a crawler for each of them.

Pseudo-threading!

Doesn't make any individual site crawl faster, but the whole gets completed quicker.
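
A minimal sketch of that splitting step, for anyone who wants to try the same thing. The function names are made up for illustration, and the crawler command line is an assumption that should be checked against the PhpDig documentation before use:

```python
import subprocess

# Assumed invocation, not verified; check the PhpDig documentation for
# the real command-line syntax before relying on it.
CRAWLER_COMMAND = ["php", "-f", "spider.php"]


def split_round_robin(urls, parts):
    """Deal URLs into `parts` sublists whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return [urls[i::parts] for i in range(parts)]


def crawl_in_parallel(list_files, command=CRAWLER_COMMAND):
    """Start one crawler process per list file, then wait for all of them."""
    procs = [subprocess.Popen(command + [name]) for name in list_files]
    return [proc.wait() for proc in procs]


# Ten placeholder sites dealt into four sublists.
sites = [f"http://www.example.com/site{i}/" for i in range(10)]
sublists = split_round_robin(sites, 4)
print([len(s) for s in sublists])  # [3, 3, 2, 2]
```

Each sublist would be written to its own file and passed to `crawl_in_parallel`. Dealing the URLs out round-robin keeps the sublists close to equal in length, though one very large site can still dominate the total run time.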


All times are GMT -8.