I'm interested in getting the spider function, not just the search function, to treat a subdirectory of a URL as the site root.
For example, someone might want to spider http://www.geocities.com/website as its own site, without scanning the true root (www.geocities.com).
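For reference, this is roughly what PHP's parse_url() gives for such a URL (illustration only; I'm assuming the spider's $pu array is built this way, since that's what the changes below key off of):
PHP Code:
// Illustration: how parse_url() splits a start URL into components
// (assumption: $pu in robot_functions.php holds a result like this)
$pu = parse_url("http://www.geocities.com/website");
// $pu['scheme'] => "http"
// $pu['host']   => "www.geocities.com"
// $pu['path']   => "/website"
// For a bare host such as "http://www.geocities.com", 'path' is not
// set at all, which is why the isset() checks below are needed.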
So far I changed this bit of code in robot_functions.php:
PHP Code:
$url = $pu['scheme']."://".$pu['host']."/";
to this:
PHP Code:
$url = $pu['scheme']."://".$pu['host'];
if (isset($pu['path'])) {
    $url .= $pu['path']."/";
} else {
    $url .= "/";
}
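As a quick sanity check, running that new base-URL logic on its own gives the subdirectory as the base (just a sketch; the start URL and the assumption that $pu comes from parse_url() are mine):
PHP Code:
// Standalone test of the new base-URL logic
// (assumption: $pu is the parse_url() result for the start URL)
$pu = parse_url("http://www.geocities.com/website");
$url = $pu['scheme']."://".$pu['host'];
if (isset($pu['path'])) {
    $url .= $pu['path']."/";
} else {
    $url .= "/";
}
echo $url; // prints http://www.geocities.com/website/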
I also changed this:
PHP Code:
$subpu = phpdigRewriteUrl($pu['path']."?".$pu['query']);
to this:
PHP Code:
if (isset($pu['path'])) {
    $subpu = phpdigRewriteUrl("?".$pu['query']);
} else {
    $subpu = phpdigRewriteUrl($pu['path']."?".$pu['query']);
}
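For the same example site, this is the string the isset() branch now builds and hands to phpdigRewriteUrl() (illustration only; the example link is made up, I'm again assuming $pu is a parse_url() result, and phpdigRewriteUrl() itself is untouched):
PHP Code:
// Illustration: the argument built by the new isset() branch
// (assumption: $pu is the parse_url() result for a link being rewritten)
$pu = parse_url("http://www.geocities.com/website/links.html?page=2");
if (isset($pu['path'])) {
    echo "?".$pu['query'];  // prints ?page=2 -- the path part is dropped
} else {
    echo $pu['path']."?".$pu['query'];
}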
With these changes the end directory is stored correctly in the table, but spidering reports "0 links found". Has anyone tried to do this yet? I'm not sure whether I'm on the right track. Thanks.