
Custom search depth per site from the command line (including URL list)


b2l_grefix
07-17-2004, 12:49 PM
Hi,

Since I use phpdig to spider several sites, I wanted to be able to set a custom search depth per site in a URL list when spidering from a cron job.

So I made a few modifications to spider.php and robot_functions.php.

I thought this might be useful for some of you, so I'd like to share it :)

IMPORTANT: these modifications are made for version 1.8.3!

You need to add an extra column (site_limit) to the sites table in the phpdig database:

ALTER TABLE PREFIXsites ADD site_limit SMALLINT( 6 ) ;
------------------------------------------------------------------------------

Example of list.txt:
http://www.site1.com 2
http://www.site2.com 0
http://www.site3.com
http://www.site4.com 5

Leaving out the depth or setting it to 0 makes the spider use the default depth set in config.php.
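The depth rule for list.txt can be sketched as a small shell function, which makes it easy to check what depth each line would get before running the spider. This is not the author's PHP patch to spider.php or robot_functions.php; the function names `resolve_depth` and `parse_list` are hypothetical, and `DEFAULT_DEPTH=3` is only a stand-in for whatever depth is set in config.php.

```shell
#!/bin/sh
# Stand-in for the default depth set in config.php (assumed value).
DEFAULT_DEPTH=3

# resolve_depth DEPTH: print DEPTH when it is a positive integer,
# otherwise (missing, 0 or non-numeric) print the default depth.
resolve_depth() {
    case "$1" in
        ''|*[!0-9]*) echo "$DEFAULT_DEPTH" ;;
        *) if [ "$1" -gt 0 ]; then echo "$1"; else echo "$DEFAULT_DEPTH"; fi ;;
    esac
}

# parse_list: read "URL [DEPTH]" lines from stdin and print "URL DEPTH".
# read splits on whitespace, so a line without a depth leaves $depth empty;
# the '|| [ -n "$url" ]' also keeps a last line that has no trailing newline.
parse_list() {
    while read -r url depth || [ -n "$url" ]; do
        [ -z "$url" ] && continue      # skip blank lines
        echo "$url $(resolve_depth "$depth")"
    done
}

# The example list from this post:
parse_list <<'EOF'
http://www.site1.com 2
http://www.site2.com 0
http://www.site3.com
http://www.site4.com 5
EOF
```

With the stand-in default of 3 this prints site1 with depth 2, site2 and site3 with depth 3, and site4 with depth 5. To check a real file, run `parse_list < list.txt`.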

Shell command:

#php -f [PHPDIG_DIR]/admin/spider.php list.txt

site1.com will be spidered with a depth of 2.
site2.com and site3.com will use the default value specified in config.php.
site4.com will be spidered with a depth of 5.

It can also be used to spider a single site.

Shell command:

#php -f [PHPDIG_DIR]/admin/spider.php http://host.mydomain.com depth

Examples:

#php -f [PHPDIG_DIR]/admin/spider.php http://host.mydomain.com 5

This will spider with a depth of 5.

#php -f [PHPDIG_DIR]/admin/spider.php http://host.mydomain.com
#php -f [PHPDIG_DIR]/admin/spider.php http://host.mydomain.com 0

If the depth is not specified or is 0, the default depth set in config.php will be used.
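How the arguments of the single-site form might be interpreted can be sketched in shell as well. The rule that an existing file is read as a URL list and anything else as one site URL is an assumption for illustration, not taken from the 1.8.3 source; `describe_run` is a hypothetical name, and `DEFAULT_DEPTH=3` stands in for the value in config.php.

```shell
#!/bin/sh
# Stand-in for the default depth set in config.php (assumed value).
DEFAULT_DEPTH=3

# describe_run TARGET [DEPTH]
# Assumption: TARGET is treated as a URL list when it names an existing file,
# otherwise as a single site URL with an optional DEPTH.
describe_run() {
    if [ -f "$1" ]; then
        echo "list $1"
        return
    fi
    depth=${2:-0}                      # a missing depth behaves like 0
    case "$depth" in
        *[!0-9]*) depth=0 ;;           # non-numeric input falls back too (assumption)
    esac
    [ "$depth" -eq 0 ] && depth=$DEFAULT_DEPTH   # 0 means "use the default"
    echo "site $1 $depth"
}

describe_run http://host.mydomain.com 5   # prints: site http://host.mydomain.com 5
describe_run http://host.mydomain.com     # prints: site http://host.mydomain.com 3
describe_run http://host.mydomain.com 0   # prints: site http://host.mydomain.com 3
```

It only prints what would be spidered; it does not call spider.php.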