LIMIT_TO_DIRECTORY bug from shell


indeh
11-30-2004, 08:24 AM
I was experiencing the same problem as Ensim, namely an attempt to spider from the command line would always return "No link in temporary table". I took the time to trace spider.php, and found that setting LIMIT_TO_DIRECTORY to false in config.php solved the problem.
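
For anyone wanting to try the same workaround, the change amounts to a single line in config.php. This is only a sketch: it assumes your config.php sets the constant with define(), and the surrounding settings in your copy may look different.

```php
<?php
// config.php (fragment): workaround only, not a proper fix.
// Assumption: the constant is set with define() in your copy of config.php.
// With false, the guarded block in spider.php runs and fills tempspider.
define('LIMIT_TO_DIRECTORY', false);
```

Note that this turns off directory limiting altogether, so the spider may index more of the site than it did before.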

The code in question is at or about line 270 of spider.php (my line numbers may be a little off since I tidied the code up a bit to aid in reading). Specifically:


if (!(LIMIT_TO_DIRECTORY)) {
    if ($links_per_lev == 0) {
        $query_tempspider = "INSERT INTO ".PHPDIG_DB_PREFIX."tempspider (site_id,file,path) SELECT site_id,file,path FROM ".PHPDIG_DB_PREFIX."spider WHERE site_id=$site_id $andmore_tempspider";
        mysql_query($query_tempspider,$id_connect);
    }
    else {
        $query_count_lev = mysql_query("SELECT COUNT(*) as cnt FROM ".PHPDIG_DB_PREFIX."tempspider WHERE site_id = $site_id and level = 0",$id_connect);
        $query_count_arr = mysql_fetch_array($query_count_lev);
        $query_count_num = $query_count_arr['cnt'];
        if ($query_count_num > $links_per_lev) {
            $level_lim = $query_count_num - $links_per_lev;
            $query_tempspider = "DELETE FROM ".PHPDIG_DB_PREFIX."tempspider WHERE level = 0 LIMIT $level_lim";
            mysql_query($query_tempspider,$id_connect);
            $flag_for_inserts_check1 = 1;
        }
        elseif (($links_per_lev > $query_count_num) &&
                ($flag_for_inserts_check1 == 0)) {
            $level_lim = $links_per_lev - $query_count_num;
            $query_tempspider = "INSERT INTO ".PHPDIG_DB_PREFIX."tempspider (site_id,file,path) SELECT site_id,file,path FROM ".PHPDIG_DB_PREFIX."spider WHERE site_id=$site_id $andmore_tempspider LIMIT $level_lim";
            mysql_query($query_tempspider,$id_connect);
        }
    }
}

It seems that if LIMIT_TO_DIRECTORY is set to true, this whole block is skipped, so the tempspider table is never populated and spidering never begins. Please correct me if I'm wrong, though, since I only studied it enough to get it working for me ;).
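
To make the reasoning concrete, here is a stripped-down sketch of the guard on its own. The function name tempspiderWouldBePopulated is made up for illustration and is not part of spider.php; the database work is replaced by a flag, so this only demonstrates the control flow as I read it.

```php
<?php
// Hypothetical helper, not part of spider.php: it mirrors only the outer
// "if (!(LIMIT_TO_DIRECTORY))" guard and replaces the INSERT with a flag.
function tempspiderWouldBePopulated(bool $limitToDirectory): bool
{
    $populated = false;
    if (!($limitToDirectory)) {
        // In spider.php this is where the INSERT INTO ...tempspider runs.
        $populated = true;
    }
    return $populated;
}

var_dump(tempspiderWouldBePopulated(true));   // guard skipped
var_dump(tempspiderWouldBePopulated(false));  // guard entered
```

If some other part of spider.php fills tempspider when LIMIT_TO_DIRECTORY is true, this sketch does not capture it; I only traced the section quoted above.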

For the record, I'm running spider.php as follows:

php -f /path/to/my/site/dig/admin/spider.php all

I have a single site in the database, with the site_url formatted as 'http://www.domain.com/' (with the trailing slash).