Default Depth


tryangle
04-16-2004, 08:19 PM
Hi,

What is the default depth of spidering when it's run as a cronjob from the shell? I didn't find any option to set it.

Thanks in advance,
Randy

catchme
04-17-2004, 06:24 AM
To add to tryangle's question: how can I limit the spider to only the pages that I list in a text file fed to it?

I've already tried setting the default level of spider recursion to 0, but the spider is still very eager to index everything it finds!

bloodjelly
04-17-2004, 03:45 PM
I think the default level when running via cron is the same as in the browser, i.e. SPIDER_DEFAULT_LIMIT. Catchme, are you talking about URLs in the text file? It should just spider those. If you're talking about limiting the spidering of pages during an update, you could lower RESPIDER_LIMIT.

tryangle
04-17-2004, 04:56 PM
As far as I can tell, the default limit in the browser is set to zero, because that's the one that is (pre)selected on the select dropdown. But, I'm wondering if it's possible to set the depth when running it from the shell, or in crontab.

TIA for your help.

bloodjelly
04-17-2004, 04:59 PM
If you open the includes/config.php file, you'll see the setting for the default depth. I believe that's the one used when the spider is run through the shell or crontab.
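
For illustration, here is a rough sketch of what those lines might look like. The constant names are the ones mentioned earlier in this thread; that they are set with define(), and the values shown, are assumptions on my part, so check your own copy of the file.

```php
<?php
// Assumed form of the depth settings in includes/config.php.
// The names come from this thread; the define() syntax and the values
// are illustrative assumptions, not copied from the real file.
define('SPIDER_DEFAULT_LIMIT', 0); // default depth (reportedly also used from shell/cron)
define('RESPIDER_LIMIT', 0);       // recursion limit when updating already-indexed pages
```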

catchme
04-17-2004, 11:10 PM
bloodjelly, at the moment I have the respider limit and the default limit both set to 0. Even so, when I set the spider off again, I find that it continues to visit the old pages that have already been spidered.

I think it just passes over them quickly, but it's taking 1 or 2 seconds just to look over these pages, which I don't want to index.

The other setting I've customized is the default reindex period, which I've set to 500 days. Basically, once something has been indexed, I don't want to return to that page again.

tryangle
04-19-2004, 04:54 AM
Hi,

Thanks for your reply. I guess I missed it because I was looking for Search Depth.

Charter
04-20-2004, 10:48 AM
Originally posted by catchme
bloodjelly, at the moment I have the respider limit and the default limit both set to 0. Even so, when I set the spider off again, I find that it continues to visit the old pages that have already been spidered.

I think it just passes over them quickly, but it's taking 1 or 2 seconds just to look over these pages, which I don't want to index.

The other setting I've customized is the default reindex period, which I've set to 500 days. Basically, once something has been indexed, I don't want to return to that page again.

Hi. Try the following. In spider.php find:

$andmore_tempspider = 'AND upddate < now()';

and replace with:

$andmore_tempspider = 'AND upddate < DATE_SUB(now(), INTERVAL 500 DAY)';
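
As I read it, the original condition matches any page whose upddate is already in the past, while the replacement only matches pages whose upddate is more than 500 days old. If you would rather not hard-code the 500, here is a minimal sketch that builds the same clause from a number of days. The helper name build_respider_clause is made up for illustration and is not part of spider.php; only $andmore_tempspider and upddate come from the snippet above.

```php
<?php
// Hypothetical helper (not part of spider.php): builds the SQL fragment
// that only matches pages whose upddate is more than $days days old.
function build_respider_clause(int $days): string
{
    if ($days < 0) {
        throw new InvalidArgumentException('days must be zero or greater');
    }
    // The int type plus the %d specifier keep anything but a number out of the SQL.
    return sprintf('AND upddate < DATE_SUB(now(), INTERVAL %d DAY)', $days);
}

// Same value as the hard-coded replacement above.
$andmore_tempspider = build_respider_clause(500);
```

Calling it with 500 gives the same string as the replacement line, so you could keep the number of days in one place next to your reindex period setting.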