I'm interested in getting the spider function, not just the search function, to treat a subdirectory of a URL as the site root.
For example, someone might want to spider http://www.geocities.com/website as its own site, without scanning the true root (www.geocities.com).
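For reference, this is roughly what PHP's parse_url() gives for such a URL (illustration only; I'm assuming the spider's $pu array is built this way, since that's what the changes below key off of):
PHP Code:
// Illustration: how parse_url() splits a start URL into components
// (assumption: $pu in robot_functions.php holds a result like this)
$pu = parse_url("http://www.geocities.com/website");
// $pu['scheme'] => "http"
// $pu['host']   => "www.geocities.com"
// $pu['path']   => "/website"
// For a bare host such as "http://www.geocities.com", 'path' is not
// set at all, which is why the isset() checks below are needed.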
So far I changed this bit of code in robot_functions.php:
PHP Code:
$url = $pu['scheme']."://".$pu['host']."/";
to this:
PHP Code:
$url = $pu['scheme']."://".$pu['host'];
if (isset($pu['path'])) {
    $url .= $pu['path']."/";
} else {
    $url .= "/";
}
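As a quick sanity check, running that new base-URL logic on its own gives the subdirectory as the base (just a sketch; the start URL and the assumption that $pu comes from parse_url() are mine):
PHP Code:
// Standalone test of the new base-URL logic
// (assumption: $pu is the parse_url() result for the start URL)
$pu = parse_url("http://www.geocities.com/website");
$url = $pu['scheme']."://".$pu['host'];
if (isset($pu['path'])) {
    $url .= $pu['path']."/";
} else {
    $url .= "/";
}
echo $url; // prints http://www.geocities.com/website/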
I also changed this:
PHP Code:
$subpu = phpdigRewriteUrl($pu['path']."?".$pu['query']);
to this:
PHP Code:
if (isset($pu['path'])) {
    $subpu = phpdigRewriteUrl("?".$pu['query']);
} else {
    $subpu = phpdigRewriteUrl($pu['path']."?".$pu['query']);
}
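For the same example site, this is the string the isset() branch now builds and hands to phpdigRewriteUrl() (illustration only; the example link is made up, I'm again assuming $pu is a parse_url() result, and phpdigRewriteUrl() itself is untouched):
PHP Code:
// Illustration: the argument built by the new isset() branch
// (assumption: $pu is the parse_url() result for a link being rewritten)
$pu = parse_url("http://www.geocities.com/website/links.html?page=2");
if (isset($pu['path'])) {
    echo "?".$pu['query'];  // prints ?page=2 -- the path part is dropped
} else {
    echo $pu['path']."?".$pu['query'];
}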
With these changes the end directory is stored correctly in the table, but spidering reports "0 links found". Has anyone tried to do this yet? I'm not sure whether I'm on the right track. Thanks.