03-23-2004, 05:34 AM   #7
Charter
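Hi. Was your robots.txt file created on a Mac? It reads in as only one key-value pair, most likely because the file uses Mac-style (carriage return only) line endings, so the whole file comes back as a single line. If you have PHP 4.3.0+, the auto_detect_line_endings setting can be switched on with the ini_set function to correct this.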
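If auto_detect_line_endings is not available on your setup, one alternative is to read the file as a single string and split it on any line-ending style yourself. The following is only a minimal sketch: the helper name splitRobotsLines is hypothetical (it is not part of PhpDig) and the robots.txt body is made up for illustration.

```php
<?php
// Hypothetical helper, not part of PhpDig: split text on CRLF, a lone CR
// (classic Mac), or LF, dropping empty lines along the way.
function splitRobotsLines($text) {
    return preg_split('/\r\n|\r|\n/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Made-up robots.txt body saved with CR-only line endings.
$macText = "User-agent: *\rDisallow: /admindeco/\rDisallow: /info.php\r";

// Splitting on "\n" alone approximates what happens when CR-only
// endings are not detected: everything stays in one piece.
$naive = explode("\n", $macText);
$lines = splitRobotsLines($macText);

echo count($naive), "\n"; // number of pieces when only "\n" is recognized
echo count($lines), "\n"; // number of directive lines after splitting on any ending
```

The resulting array could then be walked line by line in the same way the function below walks the result of file().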

Try replacing the whole phpdigReadRobotsTxt function with the following, as you see it onscreen:

Code:
//=================================================
//search robots.txt in a site
function phpdigReadRobotsTxt($site) { // $site must end with a trailing slash
  if (phpdigTestUrl($site.'robots.txt') == 'PLAINTEXT') {
    @ini_set("auto_detect_line_endings","1"); // needs PHP 4.3.0+
    $robots = @file($site.'robots.txt');
    while (list($id,$line) = @each($robots)) {
      if ((strpos(trim($line),"#") === 0) || (trim($line) == ""))
        continue;
      if (ereg('^user-agent:[ ]*([a-z0-9*]+)',strtolower($line),$regs)) {
        if ($regs[1] == "*") {
          $user_agent = "'$regs[1]'";
        }
        else {
          $user_agent = $regs[1];
        }
      }
      if (eregi('[[:blank:]]*disallow:[[:blank:]]*([/]?([a-z0-9_/*+%.-]*))',$line,$regs)) {
          if ($regs[1] == '/') {
             $exclude[$user_agent]['@ALL@'] = 1;
          }
          else {
             $exclude[$user_agent][str_replace('*','.*',str_replace('+','\+',str_replace('.','\.',$regs[2])))] = 1;
          }
      }
      elseif (($user_agent == 'phpdig') && (eregi('[[:blank:]]*disallow:[[:blank:]]*',$line,$regs))) {
        $exclude['@NONE@'] = 1;
        return $exclude;
      }
    }
    if (isset($exclude['phpdig']) && is_array($exclude['phpdig']))
      return $exclude['phpdig'];
    elseif (isset($exclude['\'*\'']) && is_array($exclude['\'*\'']))
      return $exclude['\'*\''];
  }
  $exclude['@NONE@'] = 1;
  return $exclude;
}

With PHP 4.3.0+, the robots.txt file should now read in as multiple key-value pairs, and the other changes in the function should let you index your site. At a search depth of one, you should get the following:


SITE : http://www.deco-dreams.com/
Exclude paths :
- unused
- admindeco
- decoMyAdmin
- Connections
- FX_DataCounter
- gallery
- rcdstnav
- rv
- rayform11
- rayform11b
- mm
- uploads
- index2\.php
- info\.php
- results1\.php
- results\.php
- results2\.php
- sp_images\.php
1:http://www.deco-dreams.com/
(time : 00:00:10)
+ + + + + + + +
level 1...
2:http://www.deco-dreams.com/privacy.php
(time : 00:00:28)

3:http://www.deco-dreams.com/links.php
(time : 00:00:36)

4:http://www.deco-dreams.com/aboutus.php
(time : 00:00:44)

5:http://www.deco-dreams.com/index.php?pageNum_Recordset1=123&
(time : 00:00:53)

6:http://www.deco-dreams.com/index.php?pageNum_Recordset1=1&
(time : 00:01:02)

7:http://www.deco-dreams.com/buy.php?vartab1_id=676
(time : 00:01:10)

8:http://www.deco-dreams.com/ordering.php
(time : 00:01:18)

9:http://www.deco-dreams.com/contactus.php
(time : 00:01:29)

No link in temporary table

--------------------------------------------------------------------------------

links found : 9
http://www.deco-dreams.com/
http://www.deco-dreams.com/privacy.php
http://www.deco-dreams.com/links.php
http://www.deco-dreams.com/aboutus.php
http://www.deco-dreams.com/index.php?pageNum_Recordset1=123&
http://www.deco-dreams.com/index.php?pageNum_Recordset1=1&
http://www.deco-dreams.com/buy.php?vartab1_id=676
http://www.deco-dreams.com/ordering.php
http://www.deco-dreams.com/contactus.php
Optimizing tables...
Indexing complete !


Remember to remove any word wrapping from the phpdigReadRobotsTxt code above.
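In case the backslashes in the exclude paths listed above look odd (index2\.php and so on): they come from the nested str_replace calls in the function, which turn each Disallow path into a regex-style key. Here is a rough sketch of that step on its own. Both helper names are hypothetical, and the way the keys are matched against a path (anchored at the start) is my assumption for illustration, not necessarily how PhpDig applies them.

```php
<?php
// Hypothetical helper mirroring the nested str_replace() calls in
// phpdigReadRobotsTxt: escape dots and plus signs, then expand "*".
function robotsPathToKey($path) {
    $key = str_replace('.', '\.', $path); // literal dots
    $key = str_replace('+', '\+', $key);  // literal plus signs
    return str_replace('*', '.*', $key);  // robots.txt wildcard to regex wildcard
}

// Assumption: each key is treated as a regex anchored at the start of the path.
function pathIsExcluded($path, $keys) {
    foreach ($keys as $key) {
        if (preg_match('#^' . $key . '#', $path)) {
            return true;
        }
    }
    return false;
}

$keys = array(robotsPathToKey('admindeco'), robotsPathToKey('index2.php'));

echo $keys[1], "\n";                            // the escaped key for index2.php
var_dump(pathIsExcluded('index2.php', $keys));  // listed path
var_dump(pathIsExcluded('privacy.php', $keys)); // path not in the list
```

The dots are escaped first and the asterisk is expanded last, so the dot introduced by ".*" is not escaped again.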