Spider Problem
Hey, I'm having a problem trying to spider my site. I've read through the other forum topics that match my symptoms, but while the blame is mostly aimed at safe_mode being on, my host has it off.
Basically, when I run spider.php it starts indexing as I'd expect, but then hangs after about 10 seconds, with IE displaying the 'done' alert in the status bar. On the admin index page it says the page is locked as the spider is still running, but nothing else is added.
I downloaded the latest version of phpdig (1.6.5 I believe) and am on PHP Version 4.2.3 (to see my PHP settings if it helps, look here).
The work the spider manages before the hanging bug is what I'd expect .. I can search the pages it's indexed and am pleased with the results, I just need the whole site done! ;) The search results page can be found here.
Thanks for any help :)
--Edit--
I've just added this screenshot of what happens while running spider.php, in case this is of any help: Spider.php Screenshot
--2nd Edit--
And thought I'd add my robots.txt file too
User-agent: PhpDig
Disallow: /forum
Disallow: /phpMyAdmin
Disallow: /sql
Disallow: /templates
Disallow: /templates_c
Allow: /forum/index.php
User-agent: *
Disallow: / |
Hi. PhpDig is restrictive when it parses a robots.txt file. Try applying the code in this thread and then set the robots.txt file as so:
Code:
User-agent: PhpDig |
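For anyone wanting to see how a strict parser reads the robots.txt from the first post, here is a minimal sketch using Python's standard urllib.robotparser. It is not PhpDig's parser and PhpDig may decide differently; the path /forum/viewtopic.php and the agent name OtherBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as posted at the start of this thread.
ROBOTS_TXT = """\
User-agent: PhpDig
Disallow: /forum
Disallow: /phpMyAdmin
Disallow: /sql
Disallow: /templates
Disallow: /templates_c
Allow: /forum/index.php
User-agent: *
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# (agent, path) pairs to check; OtherBot and viewtopic.php are hypothetical.
checks = [
    ("PhpDig", "/archive.php"),
    ("PhpDig", "/forum/viewtopic.php"),
    ("PhpDig", "/forum/index.php"),
    ("OtherBot", "/archive.php"),
]
for agent, path in checks:
    print(agent, path, parser.can_fetch(agent, path))
```

The answer for /forum/index.php depends on how a given parser ranks an Allow line that comes after a broader Disallow line, which is one reason a simpler robots.txt is easier to reason about while debugging.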
Quote:
I've changed the code as suggested in the thread you linked to and modified the robots.txt file as you said, and I'm getting the same problem each time .. namely that spider.php freezes during the indexing process and locks the site without indexing any further. I should also mention that I have tried completely removing the robots.txt file, with no success. As for the tempspider table, here are the phpMyAdmin dumps in CSV and XML |
Hi. Adding css to the FORBIDDEN_EXTENSIONS in the config file should prevent errors from appearing in the tempspider table. Anyway, this seems like a timeout issue. What is the time limit in httpd.conf?
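As a rough illustration of what an extension blacklist does, the sketch below filters URLs with a regular expression before they would be queued. The pattern is a made-up stand-in, not PhpDig's actual FORBIDDEN_EXTENSIONS value, and the style.css URL is hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist; the real FORBIDDEN_EXTENSIONS setting may differ.
FORBIDDEN_EXTENSIONS = re.compile(r"\.(css|ico|zip|exe)$", re.IGNORECASE)


def is_crawlable(url):
    """Return False when the URL path ends in a blacklisted extension."""
    # Match against the path only, so query strings do not hide the extension.
    path = urlparse(url).path
    return FORBIDDEN_EXTENSIONS.search(path) is None


urls = [
    "http://liveinglasgow.com/archive.php?start=21&sort=&mode=&size=",
    "http://liveinglasgow.com/style.css",  # hypothetical stylesheet URL
]
for url in urls:
    print(url, is_crawlable(url))
```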
|
ok i've changed the config line to
Quote:
as for httpd.conf, I don't have access to this file on my host :/ my max_execution_time is set to 50000 if this helps |
Hi. What happens if you try to crawl using a different browser?
|
Quote:
|
Hi. I'm thinking that this is a timeout issue with your PHP being in CGI mode. The max_execution_time says 50000 but it seems like the timeout is 30 seconds. What errors, if any, are showing in your PHP error log?
|
i'm afraid that log_errors in php.ini is set to Off by my host and I don't have access to php.ini in order to change this
|
Hi. I crawled your site using PHP with Server API as Apache and as CGI. Both were successful and below is the output using PHP in CGI mode. Maybe your host can shed some light on this issue as I'm not sure of the problem. :(
Spidering in progress...
--------------------------------------------------------------------------------
SITE : http://liveinglasgow.com/
Exclude paths :
- @NONE@
1:http://liveinglasgow.com/archive.php (time : 00:00:07)
+ + + + + + +
level 1...
2:http://liveinglasgow.com/privacy.php (time : 00:00:14)
+
3:http://liveinglasgow.com/archive.php?start=21&sort=&mode=&size= (time : 00:00:19)
+ + + + + +
4:http://liveinglasgow.com/archive.php?start=&sort=date&mode=desc&size= (time : 00:00:25)
5:http://liveinglasgow.com/archive.php?start=&sort=date&mode=asc&size= (time : 00:00:29)
6:http://liveinglasgow.com/archive.php?start=&sort=title&mode=desc&size= (time : 00:00:33)
7:http://liveinglasgow.com/archive.php?start=&sort=title&mode=asc&size= (time : 00:00:38)
8:http://liveinglasgow.com/index.php (time : 00:00:42)
level 2...
9:http://liveinglasgow.com/archive.php?start=41&sort=&mode=&size= (time : 00:00:46)
+ + + +
10:http://liveinglasgow.com/archive.php?start=1&sort=&mode=&size= (time : 00:00:52)
+ + + +
11:http://liveinglasgow.com/archive.php?start=21&sort=date&mode=desc&size= (time : 00:00:58)
12:http://liveinglasgow.com/archive.php?start=21&sort=date&mode=asc&size= (time : 00:01:04)
13:http://liveinglasgow.com/archive.php?start=21&sort=title&mode=desc&size= (time : 00:01:10)
14:http://liveinglasgow.com/archive.php?start=21&sort=title&mode=asc&size= (time : 00:01:15)
15:http://liveinglasgow.com/ (time : 00:01:17)
level 3...
16:http://liveinglasgow.com/archive.php?start=41&sort=title&mode=desc&size= (time : 00:01:20)
17:http://liveinglasgow.com/archive.php?start=41&sort=title&mode=asc&size= (time : 00:01:24)
18:http://liveinglasgow.com/archive.php?start=41&sort=date&mode=asc&size= (time : 00:01:28)
19:http://liveinglasgow.com/archive.php?start=41&sort=date&mode=desc&size= (time : 00:01:32)
20:http://liveinglasgow.com/archive.php?start=1&sort=title&mode=asc&size= (time : 00:01:35)
21:http://liveinglasgow.com/archive.php?start=1&sort=title&mode=desc&size= (time : 00:01:39)
22:http://liveinglasgow.com/archive.php?start=1&sort=date&mode=asc&size= (time : 00:01:44)
23:http://liveinglasgow.com/archive.php?start=1&sort=date&mode=desc&size= (time : 00:01:48)
No link in temporary table
--------------------------------------------------------------------------------
links found : 23
http://liveinglasgow.com/archive.php
http://liveinglasgow.com/privacy.php
http://liveinglasgow.com/archive.php?start=21&sort=&mode=&size=
http://liveinglasgow.com/archive.php?start=&sort=date&mode=desc&size=
http://liveinglasgow.com/archive.php?start=&sort=date&mode=asc&size=
http://liveinglasgow.com/archive.php?start=&sort=title&mode=desc&size=
http://liveinglasgow.com/archive.php?start=&sort=title&mode=asc&size=
http://liveinglasgow.com/index.php
http://liveinglasgow.com/archive.php?start=41&sort=&mode=&size=
http://liveinglasgow.com/archive.php?start=1&sort=&mode=&size=
http://liveinglasgow.com/archive.php?start=21&sort=date&mode=desc&size=
http://liveinglasgow.com/archive.php?start=21&sort=date&mode=asc&size=
http://liveinglasgow.com/archive.php?start=21&sort=title&mode=desc&size=
http://liveinglasgow.com/archive.php?start=21&sort=title&mode=asc&size=
http://liveinglasgow.com/
http://liveinglasgow.com/archive.php?start=41&sort=title&mode=desc&size=
http://liveinglasgow.com/archive.php?start=41&sort=title&mode=asc&size=
http://liveinglasgow.com/archive.php?start=41&sort=date&mode=asc&size=
http://liveinglasgow.com/archive.php?start=41&sort=date&mode=desc&size=
http://liveinglasgow.com/archive.php?start=1&sort=title&mode=asc&size=
http://liveinglasgow.com/archive.php?start=1&sort=title&mode=desc&size=
http://liveinglasgow.com/archive.php?start=1&sort=date&mode=asc&size=
http://liveinglasgow.com/archive.php?start=1&sort=date&mode=desc&size=
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface. |
Okay, I'll email my host and start chatting to them about this, to see if they can help at all. Just out of curiosity .. if I had SSH access to a shell on my host and could run the spidering script there, do you think that would work? Or would the same constraints apply as when executing it via my browser?
Thanks a lot for the help Charter, believe me it's very much appreciated :) Cam |
Hi. My understanding is that program execution via SSH bypasses the web application server, but the program execution would still be subject to the PHP configuration itself.
|