Spidering **VERY** Slow


Niall Fernie
07-12-2004, 04:24 AM
Hi, first post here, as this is the first problem I've come across. I had been using 1.4x until now and decided to go for the new version (the shiny new features were too much to resist).

Now, though, I find that my site is impossible to spider. It takes forever, or more precisely about 4.5 seconds per page. I have no shell access to check whether spidering from there would be any faster. 1.4x used to spider the site in about 10 minutes (1300-1400 pages), but at 4.5 seconds per page I cannot get it to finish.

I do have the time to waste, and on several occasions I've left the spider running for around 2 hours, but for one reason or another my browser ends up saying "done" long before the spider process is finished, and the spider page doesn't show the list of pages it found.

Is there something new that's causing things to take so long? I will have to check with my hosts for any server info you might need to help, and I would like to offer my thanks in advance.

(If this has already been covered in another thread, please delete this and PM me the link.)

allergie
07-12-2004, 04:27 AM
Ten minutes for 1400 pages? Ouch... I've never had that!

Indexing a 1900-page website for the first time took 4 hours on my own web server.

I'm surprised it was ever possible to do it that quickly.

Niall Fernie
07-12-2004, 07:55 AM
Finally finished!!! w00t!!!

Grabbed the end of the spidering output to illustrate the delay between pages.


Meta Robots = NoIndex, or already indexed : No content indexed
2495:http://www.caithness-business.co.uk/business_print.php?id=1012
(time : 04:01:52)

Meta Robots = NoIndex, or already indexed : No content indexed
2496:http://www.caithness-business.co.uk/business_print.php?id=1013
(time : 04:01:57)

Meta Robots = NoIndex, or already indexed : No content indexed
2497:http://www.caithness-business.co.uk/business_print.php?id=1064
(time : 04:02:03)

No link in temporary table

links found : 1247


From the last post it would seem this is normal? Has some kind of delay been added since the old version? If so, is there some way to adjust it, so that those of us who have to wait for a "4 hour webpage" can maybe trim a little of that time off?
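
To put a number on the delay, here is a small sketch that works out the gap between the three timestamps in the output above. It is plain Python rather than anything from phpDig, and the function name gaps_in_seconds is just something made up for the illustration.

```python
from datetime import datetime

# Timestamps copied from the "(time : ...)" lines in the spider output above.
stamps = ["04:01:52", "04:01:57", "04:02:03"]


def gaps_in_seconds(times):
    """Return the gap in seconds between consecutive HH:MM:SS stamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


gaps = gaps_in_seconds(stamps)
print(gaps)  # [5, 6]
```

So each page is taking roughly 5 to 6 seconds, even though nothing is being indexed for these URLs.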

bloodjelly
07-12-2004, 09:02 AM
Hi Niall -

There is a bit of "sleep" code in spider.php that prevents phpDig from requesting pages too quickly from web hosts. You can find this line and change the sleep time of 5 seconds between links to a value of your choosing:
    // Spidering ...
    while($level <= $limit) {
        sleep(5);
This will make spidering faster, as long as the site you're spidering doesn't mind the increased load.
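
As a rough guide to how much a shorter sleep might save, the sketch below multiplies the link count by the delay. It assumes one sleep per link, as described above, and takes 2497 as the link count because that is the last link number in the log posted earlier in this thread. It is plain Python, not phpDig code, and the function names are made up for the illustration. Time spent actually fetching pages is not included.

```python
def sleep_overhead_seconds(links, sleep_seconds):
    """Time spent purely in sleep(), assuming one pause per link."""
    return links * sleep_seconds


def as_hms(total_seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


LINKS = 2497  # assumption: last link number shown in the posted log

for delay in (5, 2, 1, 0):
    print(f"sleep({delay}): {as_hms(sleep_overhead_seconds(LINKS, delay))}")
# sleep(5): 3:28:05
# sleep(2): 1:23:14
# sleep(1): 0:41:37
# sleep(0): 0:00:00
```

With the default of 5 seconds, the sleep alone would account for about three and a half hours of the run, so lowering it should bring the total down roughly in proportion.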

Niall Fernie
07-13-2004, 12:45 AM
Brilliant!!!

Thanks for that.

I'll check the stats to see when the server is quiet before I make it busy :)