03-28-2006, 04:12 PM   #1
bocephalus
Green Mole
 
Join Date: Jan 2006
Posts: 1
Incomplete spidering

I have not been able to find any other posts that answer my question.

The spidering stops, with no error message, after anywhere from roughly 25 seconds to 4 minutes. This happens every time I spider; I usually end up clicking 'stop spider', and the number of pages in the database has not gone up.

I am indexing via the browser (Firefox) and have set
Code:
network.http.keep-alive.timeout = 600
I also have a php.ini file in the PhpDig main directory with the following settings:
Code:
max_execution_time = 600 
max_input_time = 600
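I am not certain those values actually take effect, though; as far as I know a per-directory php.ini is only read when PHP runs as CGI/FastCGI, while under mod_php you would need php_value lines in .htaccess instead. Something like this little test script (check_limits.php is just a name I made up) should show which limits the phpdig directory really gets:
Code:
<?php
// check_limits.php -- drop into the phpdig directory and load it in the
// browser to see which limits are actually in effect there.
echo 'max_execution_time: ' . ini_get('max_execution_time') . "\n";
echo 'max_input_time: ' . ini_get('max_input_time') . "\n";

// If the local php.ini is being ignored, the limit could also be raised
// at runtime (works unless safe_mode is on):
// set_time_limit(600);
?>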
What happens is that Firefox stops loading the page, with no error message or anything. At that point, if I check the tempspider table, it has a bunch of data in it. I then go to the top of the spidering page and click "stop spider"; the spider stops, and when I go back to the admin interface the number of pages has not gone up, even though the number of pages spidered was in the 200s and the page I selected has a bunch of links to pages that still are not indexed. I don't know whether any of those 200 were new pages, but there are definitely new pages linked.
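In case it matters, here is roughly the check I am doing on the tables (the connection details and the "spider" table name are just my guesses at the defaults, so treat it as a sketch):
Code:
<?php
// compare_counts.php -- rough check: pending links vs. indexed pages.
$db = new mysqli('localhost', 'phpdig_user', 'password', 'phpdig');

$pending = $db->query('SELECT COUNT(*) AS c FROM tempspider')->fetch_assoc();
$indexed = $db->query('SELECT COUNT(*) AS c FROM spider')->fetch_assoc();

echo 'links waiting in tempspider: ' . $pending['c'] . "\n";
echo 'pages indexed in spider: ' . $indexed['c'] . "\n";
?>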

I entered 1 link with:
Code:
search depth: 1
links per: 0
I have also tried a search depth of 20, with the same results.

Some other settings:
Code:
define('SPIDER_MAX_LIMIT',20);          //max recurse levels in spider
define('RESPIDER_LIMIT',5);             //recurse respider limit for update
define('LINKS_MAX_LIMIT',20);           //max links per each level
define('RELINKS_LIMIT',5);              //recurse links limit for an update

//for limit to directory, URL format must either have file at end or ending slash at end
//e.g., http://www.domain.com/dirs/ (WITH ending slash) or http://www.domain.com/dirs/dirs/index.php
define('LIMIT_TO_DIRECTORY',false);     //limit index to given (sub)directory, no sub dirs of dirs are indexed

define('LIMIT_DAYS',0);                 //default days before reindex a page
define('SMALL_WORDS_SIZE',2);           //words to not index - must be 2 or more
If something is killing the spider, what would it be and how can I avoid it?
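Is there something I could add at the top of the spider script to see what is cutting it off? I was thinking of a heartbeat along these lines (just a rough idea; the log message format is my own):
Code:
<?php
// Log how long the spider request ran and the connection status at the
// moment PHP shuts down, so an abort shows up in the error log.
$GLOBALS['spider_start'] = time();
ignore_user_abort(true); // keep running even if the browser stops waiting

function spider_heartbeat()
{
    error_log('spider ended after ' . (time() - $GLOBALS['spider_start'])
        . 's, connection_status=' . connection_status());
}
register_shutdown_function('spider_heartbeat');
?>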