|
12-28-2003, 12:50 PM | #1 |
Green Mole
Join Date: Dec 2003
Posts: 6
|
Spider Problem
Hey, I'm having a problem trying to spider my site. I've read through the other forum topics that match my symptoms, but while the blame is mostly aimed at safe_mode being on, my host has it off.
Basically, when I run spider.php it starts indexing as I'd expect, but then hangs after about 10 seconds, with IE displaying the 'done' alert in the status bar. The admin index page says the page is locked because the spider is still running, but nothing else is added. I downloaded the latest version of PhpDig (1.6.5, I believe) and am on PHP Version 4.2.3 (to see my PHP settings, if it helps, look here).
The work the spider manages before it hangs is what I'd expect: I can search the pages it has indexed and am pleased with the results, I just need the whole site done! The search results page can be found here.
Thanks for any help
--Edit-- I've just added this screenshot of what happens while running spider.php, in case it is of any help: Spider.php Screenshot
--2nd Edit-- And I thought I'd add my robots.txt file too:
User-agent: PhpDig
Disallow: /forum
Disallow: /phpMyAdmin
Disallow: /sql
Disallow: /templates
Disallow: /templates_c
Allow: /forum/index.php

User-agent: *
Disallow: /
Last edited by i_am_cam; 12-28-2003 at 01:07 PM. |
12-28-2003, 02:27 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. PhpDig is restrictive when it parses a robots.txt file. Try applying the code in this thread and then set up the robots.txt file like so:
Code:
User-agent: PhpDig
Disallow:

User-agent: *
Disallow: /
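As a rough cross-check, the two robots.txt files from this thread can be run through a generic parser to see which paths each one leaves open to the PhpDig user agent. The sketch below uses Python's standard urllib.robotparser, which is not PhpDig's own parser and may read the rules differently; the /templates/page.html path is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as posted in the first message of the thread.
ORIGINAL = """\
User-agent: PhpDig
Disallow: /forum
Disallow: /phpMyAdmin
Disallow: /sql
Disallow: /templates
Disallow: /templates_c
Allow: /forum/index.php

User-agent: *
Disallow: /
"""

# robots.txt as suggested in the reply: PhpDig may fetch everything.
SUGGESTED = """\
User-agent: PhpDig
Disallow:

User-agent: *
Disallow: /
"""


def parse_rules(text):
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


original = parse_rules(ORIGINAL)
suggested = parse_rules(SUGGESTED)

SITE = "http://liveinglasgow.com"
# The last path is hypothetical, chosen only to hit the /templates rule.
for path in ("/archive.php", "/forum/index.php", "/templates/page.html"):
    url = SITE + path
    print(path,
          "original:", original.can_fetch("PhpDig", url),
          "suggested:", suggested.can_fetch("PhpDig", url))
```

Python's parser applies the first matching rule, so in the original file the Disallow: /forum line wins over the later Allow: /forum/index.php. Whether PhpDig 1.6.5 resolves that case the same way is not something this sketch can show.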
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
12-28-2003, 02:39 PM | #3 | |
Green Mole
Join Date: Dec 2003
Posts: 6
|
Quote:
I've changed the code as suggested in the thread you linked to and modified the robots.txt file as you said, but I'm getting the same problem each time: spider.php freezes during the indexing process and locks the site without indexing any further. I should also mention that I have tried removing the robots.txt file completely, with no success. As for the tempspider table, here are the phpMyAdmin dumps in csv and xml
Last edited by i_am_cam; 12-28-2003 at 02:42 PM. |
|
12-28-2003, 02:56 PM | #4 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Adding css to the FORBIDDEN_EXTENSIONS in the config file should prevent errors from appearing in the tempspider table. Anyway, this seems like a timeout issue. What is the time limit in httpd.conf?
12-28-2003, 03:05 PM | #5 | |
Green Mole
Join Date: Dec 2003
Posts: 6
|
OK, I've changed the config line to
Quote:
As for httpd.conf, I don't have access to this file on my host :/ My max_execution_time is set to 50000, if this helps.
Last edited by i_am_cam; 12-28-2003 at 03:23 PM. |
|
12-28-2003, 04:45 PM | #6 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. What happens if you try to crawl using a different browser?
12-29-2003, 02:04 AM | #7 | |
Green Mole
Join Date: Dec 2003
Posts: 6
|
Quote:
|
|
12-29-2003, 05:47 AM | #8 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. I'm thinking that this is a timeout issue with your PHP being in CGI mode. The max_execution_time says 50000 but it seems like the timeout is 30 seconds. What errors, if any, are showing in your PHP error log?
12-29-2003, 06:27 AM | #9 |
Green Mole
Join Date: Dec 2003
Posts: 6
|
I'm afraid that log_errors in php.ini is set to Off by my host, and I don't have access to php.ini to change it.
|
12-29-2003, 07:04 AM | #10 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. I crawled your site using PHP with Server API as Apache and as CGI. Both were successful and below is the output using PHP in CGI mode. Maybe your host can shed some light on this issue as I'm not sure of the problem.
Spidering in progress...
--------------------------------------------------------------------------------
SITE : http://liveinglasgow.com/
Exclude paths :
- @NONE@
1:http://liveinglasgow.com/archive.php (time : 00:00:07)
+ + + + + + +
level 1...
2:http://liveinglasgow.com/privacy.php (time : 00:00:14)
+
3:http://liveinglasgow.com/archive.php?start=21&sort=&mode=&size= (time : 00:00:19)
+ + + + + +
4:http://liveinglasgow.com/archive.php?start=&sort=date&mode=desc&size= (time : 00:00:25)
5:http://liveinglasgow.com/archive.php?start=&sort=date&mode=asc&size= (time : 00:00:29)
6:http://liveinglasgow.com/archive.php?start=&sort=title&mode=desc&size= (time : 00:00:33)
7:http://liveinglasgow.com/archive.php?start=&sort=title&mode=asc&size= (time : 00:00:38)
8:http://liveinglasgow.com/index.php (time : 00:00:42)
level 2...
9:http://liveinglasgow.com/archive.php?start=41&sort=&mode=&size= (time : 00:00:46)
+ + + +
10:http://liveinglasgow.com/archive.php?start=1&sort=&mode=&size= (time : 00:00:52)
+ + + +
11:http://liveinglasgow.com/archive.php?start=21&sort=date&mode=desc&size= (time : 00:00:58)
12:http://liveinglasgow.com/archive.php?start=21&sort=date&mode=asc&size= (time : 00:01:04)
13:http://liveinglasgow.com/archive.php?start=21&sort=title&mode=desc&size= (time : 00:01:10)
14:http://liveinglasgow.com/archive.php?start=21&sort=title&mode=asc&size= (time : 00:01:15)
15:http://liveinglasgow.com/ (time : 00:01:17)
level 3...
16:http://liveinglasgow.com/archive.php?start=41&sort=title&mode=desc&size= (time : 00:01:20)
17:http://liveinglasgow.com/archive.php?start=41&sort=title&mode=asc&size= (time : 00:01:24)
18:http://liveinglasgow.com/archive.php?start=41&sort=date&mode=asc&size= (time : 00:01:28)
19:http://liveinglasgow.com/archive.php?start=41&sort=date&mode=desc&size= (time : 00:01:32)
20:http://liveinglasgow.com/archive.php?start=1&sort=title&mode=asc&size= (time : 00:01:35)
21:http://liveinglasgow.com/archive.php?start=1&sort=title&mode=desc&size= (time : 00:01:39)
22:http://liveinglasgow.com/archive.php?start=1&sort=date&mode=asc&size= (time : 00:01:44)
23:http://liveinglasgow.com/archive.php?start=1&sort=date&mode=desc&size= (time : 00:01:48)
No link in temporary table
--------------------------------------------------------------------------------
links found : 23
http://liveinglasgow.com/archive.php
http://liveinglasgow.com/privacy.php
http://liveinglasgow.com/archive.php?start=21&sort=&mode=&size=
http://liveinglasgow.com/archive.php?start=&sort=date&mode=desc&size=
http://liveinglasgow.com/archive.php?start=&sort=date&mode=asc&size=
http://liveinglasgow.com/archive.php?start=&sort=title&mode=desc&size=
http://liveinglasgow.com/archive.php?start=&sort=title&mode=asc&size=
http://liveinglasgow.com/index.php
http://liveinglasgow.com/archive.php?start=41&sort=&mode=&size=
http://liveinglasgow.com/archive.php?start=1&sort=&mode=&size=
http://liveinglasgow.com/archive.php?start=21&sort=date&mode=desc&size=
http://liveinglasgow.com/archive.php?start=21&sort=date&mode=asc&size=
http://liveinglasgow.com/archive.php?start=21&sort=title&mode=desc&size=
http://liveinglasgow.com/archive.php?start=21&sort=title&mode=asc&size=
http://liveinglasgow.com/
http://liveinglasgow.com/archive.php?start=41&sort=title&mode=desc&size=
http://liveinglasgow.com/archive.php?start=41&sort=title&mode=asc&size=
http://liveinglasgow.com/archive.php?start=41&sort=date&mode=asc&size=
http://liveinglasgow.com/archive.php?start=41&sort=date&mode=desc&size=
http://liveinglasgow.com/archive.php?start=1&sort=title&mode=asc&size=
http://liveinglasgow.com/archive.php?start=1&sort=title&mode=desc&size=
http://liveinglasgow.com/archive.php?start=1&sort=date&mode=asc&size=
http://liveinglasgow.com/archive.php?start=1&sort=date&mode=desc&size=
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
[Back] to admin interface.
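The time stamps in spider output like the above can be pulled out with a few lines of Python to see how long a full crawl runs, which is useful when comparing against a suspected server-side limit. This is only a minimal sketch: the function names and the shortened sample string are illustrative, and the 30-second figure is just the timeout guessed at earlier in the thread.

```python
import re

# Matches the "(time : HH:MM:SS)" stamp printed after each spidered URL.
TIME_STAMP = re.compile(r"\(time : (\d{2}):(\d{2}):(\d{2})\)")


def elapsed_seconds(spider_output):
    """Return each page's elapsed time in seconds, in output order."""
    return [int(h) * 3600 + int(m) * 60 + int(s)
            for h, m, s in TIME_STAMP.findall(spider_output)]


def pages_within(spider_output, limit_seconds):
    """Count the pages that finished at or before the given limit."""
    return sum(1 for t in elapsed_seconds(spider_output) if t <= limit_seconds)


# Three entries copied from the crawl output; the rest is left out for brevity.
SAMPLE = (
    "1:http://liveinglasgow.com/archive.php (time : 00:00:07) + + + "
    "level 1... 2:http://liveinglasgow.com/privacy.php (time : 00:00:14) + "
    "23:http://liveinglasgow.com/archive.php?start=1&sort=date&mode=desc&size= "
    "(time : 00:01:48)"
)

stamps = elapsed_seconds(SAMPLE)
print("last stamp (s):", stamps[-1])
print("pages done within 30 s:", pages_within(SAMPLE, 30))
```

Pasting the complete output into SAMPLE gives the figures for the whole crawl. If the last stamp is well past the suspected limit, a crawl that stops early on another server is at least consistent with a timeout there.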
12-29-2003, 07:07 AM | #11 |
Green Mole
Join Date: Dec 2003
Posts: 6
|
Okay, I'll email my host and start chatting to them about this, to see if they can help at all. Just out of curiosity: if I had SSH access to a shell on my host and could run the spidering script there, do you think that would work? Or would the same constraints apply as when executing it via my browser?
Thanks a lot for the help, Charter. Believe me, it's very much appreciated.
Cam |
12-29-2003, 07:45 AM | #12 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. My understanding is that program execution via SSH bypasses the web application server, but the program execution would still be subject to the PHP configuration itself.