03-28-2006, 04:12 PM   #1
bocephalus
Green Mole
 
Join Date: Jan 2006
Posts: 1
Incomplete spidering

I have not been able to find any other posts that answer my question.

The spidering stops with no error message after anywhere from about 25 seconds to about 4 minutes. Each time this happens I usually click 'stop spider', and the number of pages in the database has not gone up.

I am indexing via the browser (Firefox), where I have set
Code:
network.http.keep-alive.timeout = 600
I also have a php.ini file in the PhpDig main directory with the following settings:
Code:
max_execution_time = 600 
max_input_time = 600
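I have not yet confirmed that these values are actually in effect, since whether a php.ini placed in the application directory is read depends on how PHP is set up on the server. A minimal sketch of a check, assuming it is saved under a hypothetical name such as check_ini.php next to the script that runs the spider and then opened in the browser:

```php
<?php
// check_ini.php (hypothetical file name): report the limits PHP is actually using.

// php_ini_loaded_file() only exists in PHP 5.2.4 and later, so guard the call.
if (function_exists('php_ini_loaded_file')) {
    $loaded = php_ini_loaded_file();
    echo 'Loaded php.ini: ' . ($loaded ? $loaded : '(none)') . "<br>\n";
}

// ini_get() returns the effective value of a setting as a string.
echo 'max_execution_time = ' . ini_get('max_execution_time') . "<br>\n";
echo 'max_input_time = ' . ini_get('max_input_time') . "<br>\n";
```

If the values printed are not 600, the php.ini shown above is probably not the one being applied to the spider.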
What happens is that Firefox stops loading the page, with no error message or anything. At this point, if I check the tempspider table, it has a bunch of data in it. Then I go to the top of the spidering page and click "stop spider". It stops, and when I go back to the admin interface the number of pages has not gone up, even though the number of pages spidered was in the 200s and the page I selected had a bunch of links to pages that still are not indexed. I don't know whether any of those 200 were new pages, but there are definitely new pages linked.

I entered 1 link with these settings:
Code:
search depth: 1
links per: 0
I have also tried a search depth of 20, with the same results.

Some other settings:
Code:
define('SPIDER_MAX_LIMIT',20);          //max recurse levels in spider
define('RESPIDER_LIMIT',5);             //recurse respider limit for update
define('LINKS_MAX_LIMIT',20);           //max links per each level
define('RELINKS_LIMIT',5);              //recurse links limit for an update

//for limit to directory, URL format must either have file at end or ending slash at end
//e.g., http://www.domain.com/dirs/ (WITH ending slash) or http://www.domain.com/dirs/dirs/index.php
define('LIMIT_TO_DIRECTORY',false);     //limit index to given (sub)directory, no sub dirs of dirs are indexed

define('LIMIT_DAYS',0);                 //default days before reindex a page
define('SMALL_WORDS_SIZE',2);           //words to not index - must be 2 or more
If something is killing the spider, what would it be, and how would I avoid it?