01-08-2004, 03:08 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. The file robot_functions.php contains a function called phpdigExplore.

In this function, replace the following:
PHP Code:
else {
    $file_content = @file($tempfile);

with the following:
PHP Code:
else {
    $file_content = @file($tempfile);
    // Join the lines so the <head> block can be searched as a single string
    $my_file_base_content = implode("",$file_content);
    if (eregi("<head>(.*)</head>",$my_file_base_content,$base_regs1)) {
      $base_regs1 = $base_regs1[1];
      // Look for <base href="scheme://host/..."> inside the head
      if (eregi("<base href[[:space:]]*=[[:space:]]*['\"]*([a-z]{3,5}://[.a-z0-9-]+[^'\"]*)['\"]*[[:space:]]*[/]?>",$base_regs1,$base_regs2)) {
        $new_base_path = parse_url($base_regs2[1]);
        if ((!isset($new_base_path["path"])) || ($new_base_path["path"] == "/")) {
          // Base points at the site root
          $path = "";
        }
        else {
          // Drop the leading slash, then keep only the directory part
          $new_base_path = eregi_replace("^/","",$new_base_path["path"]);
          if (eregi("/$",$new_base_path)) {
            $path = $new_base_path;
          }
          else {
            $path = dirname($new_base_path)."/";
          }
        }
      }
   }
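
For anyone who wants to try the path logic outside of phpDig, here is a rough standalone sketch of the same steps. The function name base_href_to_path is made up for this example and is not part of phpDig. It also swaps in preg_match/preg_replace for eregi/eregi_replace, because the ereg family was removed in PHP 7, and it assumes PHP 7.1 or later for the nullable return type. Treat it as an illustration of the idea, not a drop-in replacement for the patch.

```php
<?php
// Hypothetical helper (not part of phpDig): returns the directory path that a
// <base href="..."> tag implies, or null when no usable base tag is found.
function base_href_to_path(string $html): ?string
{
    // Only look inside <head>, as the patch does (case-insensitive, spans lines).
    if (!preg_match('#<head>(.*)</head>#is', $html, $head)) {
        return null;
    }

    // The patch's pattern, translated from POSIX to PCRE syntax.
    $pattern = '#<base href\s*=\s*[\'"]*([a-z]{3,5}://[.a-z0-9-]+[^\'"]*)[\'"]*\s*/?>#i';
    if (!preg_match($pattern, $head[1], $base)) {
        return null;
    }

    $url = parse_url($base[1]);
    if (!isset($url['path']) || $url['path'] === '/') {
        return '';  // base points at the site root
    }

    // Drop one leading slash, then keep only the directory part.
    $path = preg_replace('#^/#', '', $url['path']);
    if (substr($path, -1) === '/') {
        return $path;
    }
    // Note: dirname() returns "." when there is no directory component,
    // so a base of http://host/file.html gives "./" here, as in the patch.
    return dirname($path) . '/';
}
```

With a page whose head contains <BASE HREF="http://www.domain.com/dir2/file.html">, this returns dir2/, which is the value the patch assigns to $path.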

Only minimal testing was done on the phpdigExplore change above, but it seems to work in the following situations, where the HTML file is located at http://www.domain.com/dir1/index1.html:
Code:
<HTML>
<HEAD>
<BASE HREF="http://www.domain.com/dir2/file.html">
</HEAD>
<BODY>
<A HREF="index2.html">test</A>
</BODY>
</HTML>
Both http://www.domain.com/dir1/index1.html and http://www.domain.com/dir2/index2.html should be crawled. It should also work with the following tags:
Code:
<BASE HREF="http://www.domain.com/file.html">
<BASE HREF="http://www.domain.com/dir2/dir3/file.html">
<A HREF="index2.html">test</A>
<!--- or the following tags --->
<BASE HREF="http://www.domain.com/dir2/file.html">
<A HREF="/index2.html">test</A>
Remember to remove any "word" wrapping in the above code.
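
To make the expected results for the tag combinations above concrete, here is a small sketch of how a link would be resolved once the base directory is known. The function resolve_link and its parameters are hypothetical names for this example. It follows ordinary relative-URL rules in simplified form and is not phpDig's own link handling: it does not deal with ../ segments, //host links, or query-only links.

```php
<?php
// Hypothetical helper: resolve an <a href> against the directory path taken
// from <base href>. $site is scheme + host with a trailing slash; $path is
// either '' or a directory ending in '/'.
function resolve_link(string $site, string $path, string $href): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;                      // already an absolute URL
    }
    if ($href !== '' && $href[0] === '/') {
        return $site . ltrim($href, '/');  // root-relative: base directory is ignored
    }
    return $site . $path . $href;          // document-relative: appended to base directory
}

$site = 'http://www.domain.com/';
echo resolve_link($site, 'dir2/', 'index2.html'), "\n";       // http://www.domain.com/dir2/index2.html
echo resolve_link($site, 'dir2/dir3/', 'index2.html'), "\n";  // http://www.domain.com/dir2/dir3/index2.html
echo resolve_link($site, 'dir2/', '/index2.html'), "\n";      // http://www.domain.com/index2.html
```

The last case shows why the leading slash matters: a root-relative link lands at the top of the site no matter what directory the BASE tag names.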
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.