Incomplete spidering


bocephalus
03-28-2006, 04:12 PM
I have not been able to find any other posts that answer my question.

The spidering stops with no error message after anywhere from about 25 seconds to about 4 minutes. Each time I spider, I eventually click 'stop spider', and the number of pages in the database has not gone up.

I am indexing via browser (Firefox) with network.http.keep-alive.timeout = 600, and I have a php.ini file in the phpdig main directory with the following settings:
max_execution_time = 600
max_input_time = 600
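
One thing worth checking: if I understand correctly, a per-directory php.ini is generally only picked up when PHP runs as a CGI/FastCGI binary; under mod_php, the values come from the server-wide php.ini (or .htaccess php_value lines). A minimal sketch to confirm what limits the spider actually runs under (the file name check_limits.php is just a placeholder):

<?php
// check_limits.php (hypothetical name) -- drop this into the phpdig
// directory and load it in the browser to see the limits PHP actually
// applies there; if these do not show 600, the per-directory php.ini
// is not being read.
echo 'max_execution_time: ' . ini_get('max_execution_time') . "<br>\n";
echo 'max_input_time: ' . ini_get('max_input_time') . "<br>\n";
echo 'memory_limit: ' . ini_get('memory_limit') . "<br>\n";
?>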

What happens is that Firefox stops loading the page, and there is no error message or anything. At that point, if I check the tempspider table, it has a bunch of data in it. Then I go to the top of the spidering page and click "stop spider". It stops, I go back to the admin interface, and the number of pages has not gone up, even though the number of pages spidered was in the 200s and the page I selected had a bunch of links to pages that still are not indexed. I don't know whether any of those 200 were new pages, but there are definitely new pages linked.
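
I don't know the exact schema, but assuming the default phpdig_ table prefix and that indexed pages live in the spider table, a quick count like the following would show whether anything ever makes it out of tempspider into the main index (the file name and credentials are placeholders):

<?php
// count_tables.php (hypothetical helper) -- compares the temp table
// against the main page table. Table names assume the default
// "phpdig_" prefix; adjust to match your install.
mysql_connect('localhost', 'db_user', 'db_pass'); // your DB credentials
mysql_select_db('phpdig');                        // your phpdig database
foreach (array('phpdig_tempspider', 'phpdig_spider') as $table) {
    $row = mysql_fetch_row(mysql_query("SELECT COUNT(*) FROM $table"));
    echo "$table: $row[0] rows<br>\n";
}
?>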

I entered 1 link
search depth: 1
links per: 0
I have also tried a search depth of 20, with the same results.

Some other settings:
define('SPIDER_MAX_LIMIT',20); //max recurse levels in spider
define('RESPIDER_LIMIT',5); //recurse respider limit for update
define('LINKS_MAX_LIMIT',20); //max links per each level
define('RELINKS_LIMIT',5); //recurse links limit for an update

//for limit to directory, URL format must either have file at end or ending slash at end
//e.g., http://www.domain.com/dirs/ (WITH ending slash) or http://www.domain.com/dirs/dirs/index.php
define('LIMIT_TO_DIRECTORY',false); //limit index to given (sub)directory, no sub dirs of dirs are indexed

define('LIMIT_DAYS',0); //default days before reindex a page
define('SMALL_WORDS_SIZE',2); //words to not index - must be 2 or more


If something is killing the spider, what would it be, and how can I avoid it?