PhpDig.net


tryangle 04-16-2004 08:19 PM

Default Depth
 
Hi,

What is the default depth of spidering when it's run as a cronjob from the shell? I didn't find any option to set it.

Thanks in advance,
Randy

catchme 04-17-2004 06:24 AM

An addition to tryangle's question: how can I limit the spider to only the pages that I list in a text file fed to the spider?

I've already tried setting the default spider recursion level to 0, but the spider is still very eager to index everything it finds!

bloodjelly 04-17-2004 03:45 PM

I think the default level when running via cron is the same as in the browser, i.e. SPIDER_DEFAULT_LIMIT. Catchme, are you talking about URLs in the text file? It should just spider those. If you're talking about limiting the spidering of pages during an update, you could lower RESPIDER_LIMIT.

tryangle 04-17-2004 04:56 PM

Thanks for your reply...
 
As far as I can tell, the default limit in the browser is zero, because that's the value preselected in the dropdown. But I'm wondering whether it's possible to set the depth when running the spider from the shell or from crontab.

TIA for your help.

bloodjelly 04-17-2004 04:59 PM

If you open the includes/config.php file, you'll see the setting for the default depth (SPIDER_DEFAULT_LIMIT). I believe that's the one used when the spider is run from the shell or crontab.
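
For illustration, here is a rough sketch of what the relevant lines in includes/config.php might look like. It assumes the settings are plain PHP define() constants. The values and comments are illustrative assumptions, not PhpDig's shipped defaults, so check your own copy of the file:

```php
<?php
// Hypothetical excerpt of includes/config.php. Only the two constant names
// come from this thread; the values and the define() form are assumptions.

// Depth (levels of links followed) used when no depth is chosen explicitly,
// which is reportedly what a shell or crontab run falls back to.
define('SPIDER_DEFAULT_LIMIT', 1);

// Depth used when pages are re-spidered during an update.
define('RESPIDER_LIMIT', 0);
```

With both values at 0 the spider should not follow links beyond the pages it is given, though as the rest of this thread shows, it may still revisit pages it has already indexed.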

catchme 04-17-2004 11:10 PM

bloodjelly, at the moment I have the respider limit and the default limit both set to 0. Yet when I set the spider off again, I find that it continues to visit the old pages that have already been spidered.

I think it just passes over them quickly, but it takes 1 or 2 seconds to look over each of these pages, which I don't want to index.

The other setting I've customized is the default reindex period, which I've set to 500 days. Basically, once something has been indexed, I don't want to return to that page again.

tryangle 04-19-2004 04:54 AM

SPIDER_DEFAULT_LIMIT
 
Hi,

Thanks for your reply. I guess I missed it because I was looking for Search Depth.

Charter 04-20-2004 10:48 AM


Quote:

Originally posted by catchme
bloodjelly, at the moment I have the respider limit and the default limit both set to 0. Yet when I set the spider off again, I find that it continues to visit the old pages that have already been spidered.

I think it just passes over them quickly, but it takes 1 or 2 seconds to look over each of these pages, which I don't want to index.

The other setting I've customized is the default reindex period, which I've set to 500 days. Basically, once something has been indexed, I don't want to return to that page again.

Hi. Try the following. In spider.php find:
PHP Code:

$andmore_tempspider = 'AND upddate < now()'

and replace with:
PHP Code:

$andmore_tempspider = 'AND upddate < DATE_SUB(now(), INTERVAL 500 DAY)'



All times are GMT -8.
