07-08-2004, 06:19 PM   #1
bloodjelly
Purple Mole
Join Date: Dec 2003
Posts: 106
Spidering sub-directories as the root

I'm interested in getting the spider function, not just the search function, to treat a subdirectory of a URL as the site root.

For example, someone might want to spider http://www.geocities.com/website as its own site, without scanning the true root (www.geocities.com).
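(For reference, here's how PHP's parse_url() splits that example URL into the scheme/host/path pieces used below; this is plain PHP, nothing phpDig-specific:)
PHP Code:
<?php
// parse_url() on the example start URL:
print_r(parse_url("http://www.geocities.com/website"));
// Array
// (
//     [scheme] => http
//     [host] => www.geocities.com
//     [path] => /website
// )
?>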

So far I've changed this bit of code in robot_functions.php:
PHP Code:
$url = $pu['scheme']."://".$pu['host']."/";
to this:
PHP Code:
$url = $pu['scheme']."://".$pu['host'];
if (isset($pu['path'])) {
    $url .= $pu['path']."/";
}
else {
    $url .= "/";
}
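Here's a standalone sketch of what that change does to the stored root (plain PHP outside of phpDig, using the example URL from above):
PHP Code:
<?php
// Build the spider root from a parsed URL, keeping the path component.
$pu = parse_url("http://www.geocities.com/website");
$url = $pu['scheme']."://".$pu['host'];
if (isset($pu['path'])) {
    $url .= $pu['path']."/";   // new behavior: keep the subdirectory
}
else {
    $url .= "/";               // old behavior: host only
}
echo $url; // prints: http://www.geocities.com/website/
?>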
and this:
PHP Code:
$subpu = phpdigRewriteUrl($pu['path']."?".$pu['query']);
to this:
PHP Code:
if (isset($pu['path'])) {
    $subpu = phpdigRewriteUrl("?".$pu['query']);
}
else {
    $subpu = phpdigRewriteUrl($pu['path']."?".$pu['query']);
}
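My thinking here, in case it helps: once the path is folded into the root above, the sub-URL for the start page only needs the query string. A standalone sketch of the isset() branch, with phpdigRewriteUrl() stubbed out since it's a phpDig internal:
PHP Code:
<?php
// Stub standing in for phpDig's phpdigRewriteUrl(), for this sketch only.
function phpdigRewriteUrl($str) { return $str; }

$pu = parse_url("http://www.geocities.com/website?page=2");
// The path (/website) is already part of the root built earlier, so only
// the query string is kept for the sub-URL:
$subpu = phpdigRewriteUrl("?".$pu['query']);
echo $subpu; // prints: ?page=2
?>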
Those changes make the end directory store correctly in the table, but I get a "0 links found" message when spidering. Has anyone tried to do this yet? I'm not sure I'm on the right track. Thanks.