Stop spidering site after using an amount of bandwidth


Grefix
01-14-2004, 06:58 AM
This may seem like an odd question, but when crawling a site (for instance http://www.xxx.com), is it possible to stop the spider once it has used a certain amount of the site's bandwidth?

I ask this because my site spiders sites hosted by a free webhost with a limited amount of bandwidth. A few days ago the spider got hung on one of these and used about 56MB of its bandwidth. You can imagine the owner of that site wasn't very happy with that.

Charter
01-14-2004, 10:45 AM
Hi. The code below is untested, but you might try making the following changes in the spider.php file. Another alternative is to avoid crawling such sites, or to use a search depth of zero or one.

$sum_of_tempfilesize = 0;
// Spidering ...
while(($level <= $limit) && ($sum_of_tempfilesize <= X)) {
// Note: $tempfilesize is the text file size, not the actual page size
// Set X to the maximum number of bytes allowed
...
$sum_of_tempfilesize = $sum_of_tempfilesize + $tempfilesize;
//Retrieve meta-tags for this page
...
// clean the tempspider table
$query = "DELETE FROM ".PHPDIG_DB_PREFIX."tempspider WHERE site_id=$site_id";
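
For anyone who wants to see the byte-budget idea on its own, outside of spider.php, here is a minimal standalone sketch. It is not PhpDig code: the function name crawl_with_budget, the $fetch callable, and the fake fetcher are all made up for illustration. Like the while condition above, it only checks the budget before each page, so the total can overshoot the limit by up to one page. It also counts only the body bytes returned by the fetcher, not HTTP headers.

```php
<?php
// Hypothetical helper, NOT part of PhpDig: visit URLs in order until a
// byte budget is used up. $fetch is any callable that returns a page body.
function crawl_with_budget(array $urls, int $max_bytes, callable $fetch): array
{
    $bytes_used = 0;
    $crawled = [];
    foreach ($urls as $url) {
        // Check the budget before requesting another page
        if ($bytes_used >= $max_bytes) {
            break;
        }
        $body = $fetch($url);
        $bytes_used += strlen($body); // strlen() counts bytes, not characters
        $crawled[] = $url;
    }
    return ['crawled' => $crawled, 'bytes_used' => $bytes_used];
}

// Fake fetcher for demonstration: every "page" is 400 bytes
$fake_fetch = function (string $url): string {
    return str_repeat('x', 400);
};

$result = crawl_with_budget(['a', 'b', 'c', 'd'], 1000, $fake_fetch);
```

With the fake fetcher above, the third page is still fetched because only 800 bytes had been used when the check ran, and the fourth page is skipped. If overshooting matters, one option is to set the limit a bit lower than the real allowance.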