Hi. The file robot_functions.php contains a function called phpdigExplore.
In that function, replace the following:
PHP Code:
else {
$file_content = @file($tempfile);
}
with the following:
PHP Code:
else {
    $file_content = @file($tempfile);
    // Look for a <base href> tag inside <head>; if one is found, override $path.
    $my_file_base_content = implode("",$file_content);
    if (eregi("<head>(.*)</head>",$my_file_base_content,$base_regs1)) {
        $base_regs1 = $base_regs1[1];
        if (eregi("<base href[[:space:]]*=[[:space:]]*['\"]*([a-z]{3,5}://[.a-z0-9-]+[^'\"]*)['\"]*[[:space:]]*[/]?>",$base_regs1,$base_regs2)) {
            $new_base_path = parse_url($base_regs2[1]);
            // No path, or just "/": the base is the site root.
            if ((!isset($new_base_path["path"])) || ($new_base_path["path"] == "/")) {
                $path = "";
            }
            else {
                // Drop the leading slash so the path is relative to the site root.
                $new_base_path = eregi_replace("^/","",$new_base_path["path"]);
                if (eregi("/$",$new_base_path)) {
                    // The base already names a directory.
                    $path = $new_base_path;
                }
                else {
                    // The base names a file: keep only its directory.
                    $path = dirname($new_base_path)."/";
                }
            }
        }
    }
}
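For anyone who wants to check the path logic outside of PhpDig, here is a rough standalone sketch of the same idea. It is not part of PhpDig: the function name base_path_from_html is made up for illustration, and it uses preg_match instead of eregi because the ereg functions were removed in PHP 7. It also returns an empty path rather than "./" when the base file sits in the site root, since dirname() gives "." for a bare filename.

```php
<?php
// Hypothetical helper, NOT part of PhpDig: returns the crawl path implied by
// a <base href> tag in the document head, or null when no such tag is found.
function base_path_from_html($html)
{
    // Only look inside <head>...</head>; the "s" flag lets "." span newlines.
    if (!preg_match('#<head[^>]*>(.*?)</head>#is', $html, $head)) {
        return null;
    }
    // Capture an absolute URL from <base href=...>, quoted or unquoted.
    if (!preg_match('#<base\s[^>]*href\s*=\s*[\'"]?([a-z]{3,5}://[^\'"\s>]+)#i', $head[1], $m)) {
        return null;
    }
    $parts = parse_url($m[1]);
    // No path, or just "/": the base is the site root.
    if (!isset($parts['path']) || $parts['path'] === '/') {
        return '';
    }
    // Make the path relative to the site root.
    $path = ltrim($parts['path'], '/');
    if (substr($path, -1) === '/') {
        return $path;               // the base already names a directory
    }
    $dir = dirname($path);          // the base names a file: keep its directory
    return ($dir === '.') ? '' : $dir . '/';
}
```

Called on a page whose head contains a base tag pointing at http://www.domain.com/dir2/file.html, this should return "dir2/", which is the value the patch above assigns to $path.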
Only minimal testing was done on this, but it seems to work in the following situation, where the HTML file is located at http://www.domain.com/dir1/index1.html:
Code:
<HTML>
<HEAD>
<BASE HREF="http://www.domain.com/dir2/file.html">
</HEAD>
<BODY>
<A HREF="index2.html">test</A>
</BODY>
</HTML>
Both http://www.domain.com/dir1/index1.html and http://www.domain.com/dir2/index2.html should be crawled. It should also work with the following tags:
Code:
<BASE HREF="http://www.domain.com/file.html">
<BASE HREF="http://www.domain.com/dir2/dir3/file.html">
<A HREF="index2.html">test</A>
<!--- or the following tags --->
<BASE HREF="http://www.domain.com/dir2/file.html">
<A HREF="/index2.html">test</A>
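To make it clearer which URLs those tag combinations should lead to, here is a small sketch that resolves a link against a base URL. The function resolve_against_base is hypothetical and only for illustration; PhpDig builds its URLs in its own way, and this sketch ignores ports, query strings, "../" segments and links that are already absolute.

```php
<?php
// Hypothetical helper for illustration only; not how PhpDig itself builds URLs.
function resolve_against_base($base, $href)
{
    $p = parse_url($base);
    $root = $p['scheme'] . '://' . $p['host'];
    if (substr($href, 0, 1) === '/') {
        // Root-relative link: the directory of the base is ignored.
        return $root . $href;
    }
    $path = isset($p['path']) ? $p['path'] : '/';
    // Keep everything up to and including the last slash of the base path.
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $href;
}

echo resolve_against_base('http://www.domain.com/file.html', 'index2.html'), "\n";
// prints http://www.domain.com/index2.html
echo resolve_against_base('http://www.domain.com/dir2/dir3/file.html', 'index2.html'), "\n";
// prints http://www.domain.com/dir2/dir3/index2.html
echo resolve_against_base('http://www.domain.com/dir2/file.html', '/index2.html'), "\n";
// prints http://www.domain.com/index2.html
```

So in the last case the link starts with a slash and the base directory plays no part in the result.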
Remember to remove any word wrapping (line breaks added by the forum) in the above code.