Old 03-21-2004, 09:05 AM   #1
boomboom100
Green Mole
 
Join Date: Mar 2004
Posts: 6
more indexing problems: No link in temporary table

Hi.

I've installed PhpDig 1.8.0, but when I attempt to index my site, I get:

SITE : http://www.deco-dreams.com/
Exclude paths :
-
- @NONE@
No link in temporary table

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

My remote host is running PHP 4.3.3 and MySQL 3.23.49

Any help or suggestions?

Many thanks.
Old 03-21-2004, 09:43 AM   #2
boomboom100
Green Mole
 
Join Date: Mar 2004
Posts: 6
I've since discovered that when I have my robots.txt in place, indexing fails. If I remove it, indexing works. My robots.txt has these contents:

# /robots.txt file for http://www.deco-dreams.com/
# mail webmaster@deco-dreams.com for constructive criticism


User-agent: *
Disallow: /unused
Disallow: /admindeco
Disallow: /decoMyAdmin
Disallow: /Connections
Disallow: /FX_DataCounter
Disallow: /gallery
Disallow: /rcdstnav
Disallow: /rv
Disallow: /rayform11
Disallow: /rayform11b
Disallow: /mm
Disallow: /rv
Disallow: /uploads
Disallow: index2.php
Disallow: info.php
Disallow: results1.php
Disallow: results.php
Disallow: results2.php
Disallow: sp_images.php

Am I making an obvious mistake?

Thanks
Old 03-23-2004, 12:08 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Perhaps try the following.

In robot_functions.php is the phpdigReadRobotsTxt function.

In this function, replace:
PHP Code:
$user_agent = $regs[1];
with the following:
PHP Code:
if ($regs[1] == "*") {
    $user_agent = "'$regs[1]'";
}
else {
    $user_agent = $regs[1];
}

Old 03-23-2004, 01:22 AM   #4
boomboom100
Green Mole
 
Join Date: Mar 2004
Posts: 6
Hi.

Thanks for the reply, Charter.

I tried replacing the line, but I'm getting parse errors. Here is the function from the original robot_functions.php:

PHP Code:
//=================================================
//search robots.txt in a site
function phpdigReadRobotsTxt($site)  //don't forget the end backslash
{
    if (phpdigTestUrl($site.'robots.txt') == 'PLAINTEXT')
    {
        $robots = file($site.'robots.txt');
        while (list($id,$line) = each($robots))
        {
            if (ereg('^user-agent:[ ]*([a-z0-9*]+)',strtolower($line),$regs))
            {
                $user_agent = $regs[1];
            }
            if (eregi('[[:blank:]]*disallow:[[:blank:]]*(/([a-z0-9_/*+%.-]*))',$line,$regs))
            {
                if ($regs[1] == '/')
                {
                    $exclude[$user_agent]['@ALL@'] = 1;
                }
                else
                {
                    $exclude[$user_agent][str_replace('*','.*',str_replace('+','\+',str_replace('.','\.',$regs[2])))] = 1;
                }
            }
            elseif (($user_agent == 'phpdig') && (eregi('[[:blank:]]*disallow:[[:blank:]]*',$line,$regs)))
            {
                $exclude['@NONE@'] = 1;
                return $exclude;
            }
        }
        if (isset($exclude['phpdig']) && is_array($exclude['phpdig']))
            return $exclude['phpdig'];
        elseif (isset($exclude['*']) && is_array($exclude['*']))
            return $exclude['*'];
    }
    $exclude['@NONE@'] = 1;
    return $exclude;
}

//=================================================
How should I edit this block?

Thanks again.
Old 03-23-2004, 01:50 AM   #5
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. In the block of code you posted, replace:
PHP Code:
$user_agent = $regs[1];
with the following:
PHP Code:
if ($regs[1] == "*") {
    $user_agent = "'$regs[1]'";
}
else {
    $user_agent = $regs[1];
}
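
As a rough illustration (an assumed example, not code to add anywhere): with that change, rules found under "User-agent: *" end up keyed by the quoted string, and each Disallow path is stored regex-escaped by the nested str_replace calls in the function you posted.

Code:
// Assumed shape of the $exclude array once the robots.txt above parses:
$exclude["'*'"]['unused']  = 1;   // from "Disallow: /unused"
$exclude["'*'"]['gallery'] = 1;   // from "Disallow: /gallery"
// "." is escaped to "\." and "*" expands to ".*" in the stored path patterns.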

Old 03-23-2004, 02:35 AM   #6
boomboom100
Green Mole
 
Join Date: Mar 2004
Posts: 6
Sorry you had to repeat yourself. The first time I was getting errors, but when I got the formatting right, the page loaded OK (no parse error). Unfortunately, the output is still:

SITE : http://www.deco-dreams.com/
Exclude paths :
-
- @NONE@
No link in temporary table


links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
Old 03-23-2004, 05:34 AM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Did you make your robots.txt file on a Mac? It reads in as only one key-value pair. If you have PHP 4.3.0+, the ini_set function can be used to correct this.
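
For illustration only, here is a minimal sketch of what that setting changes (this is not PhpDig code, and the temp-file path is just an example):

Code:
// A robots.txt saved with classic Mac line endings uses bare "\r",
// so file() normally returns the whole file as a single "line".
$fp = fopen('/tmp/robots_mac.txt', 'wb');           // example test file only
fwrite($fp, "User-agent: *\rDisallow: /unused\r");  // CR-only line endings
fclose($fp);

$lines = file('/tmp/robots_mac.txt');
echo count($lines) . "\n";                          // 1 -> one key-value pair

@ini_set("auto_detect_line_endings", "1");          // needs PHP 4.3.0+
$lines = file('/tmp/robots_mac.txt');
echo count($lines) . "\n";                          // 2 -> one element per rule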

Try replacing the whole phpdigReadRobotsTxt function with the following, as you see it onscreen:

Code:
//=================================================
//search robots.txt in a site
function phpdigReadRobotsTxt($site) { //don't forget the end backslash
  if (phpdigTestUrl($site.'robots.txt') == 'PLAINTEXT') {
    @ini_set("auto_detect_line_endings","1"); // needs PHP 4.3.0+
    $robots = @file($site.'robots.txt');
    while (list($id,$line) = @each($robots)) {
      if ((strpos(trim($line),"#") === 0) || (trim($line) == ""))
        continue;
      if (ereg('^user-agent:[ ]*([a-z0-9*]+)',strtolower($line),$regs)) {
        if ($regs[1] == "*") {
          $user_agent = "'$regs[1]'";
        }
        else {
          $user_agent = $regs[1];
        }
      }
      if (eregi('[[:blank:]]*disallow:[[:blank:]]*([/]?([a-z0-9_/*+%.-]*))',$line,$regs)) {
          if ($regs[1] == '/') {
             $exclude[$user_agent]['@ALL@'] = 1;
          }
          else {
             $exclude[$user_agent][str_replace('*','.*',str_replace('+','\+',str_replace('.','\.',$regs[2])))] = 1;
          }
      }
      elseif (($user_agent == 'phpdig') && (eregi('[[:blank:]]*disallow:[[:blank:]]*',$line,$regs))) {
        $exclude['@NONE@'] = 1;
        return $exclude;
      }
    }
    if (isset($exclude['phpdig']) && is_array($exclude['phpdig']))
      return $exclude['phpdig'];
    elseif (isset($exclude["'*'"]) && is_array($exclude["'*'"]))
      return $exclude["'*'"];
  }
$exclude['@NONE@'] = 1;
return $exclude;
}

With PHP 4.3.0+ the robots.txt file should now read in as multiple key-value pairs, and the other changes in the function should let you index your site, so at a search depth of one you should get the following:


SITE : http://www.deco-dreams.com/
Exclude paths :
- unused
- admindeco
- decoMyAdmin
- Connections
- FX_DataCounter
- gallery
- rcdstnav
- rv
- rayform11
- rayform11b
- mm
- uploads
- index2\.php
- info\.php
- results1\.php
- results\.php
- results2\.php
- sp_images\.php
1:http://www.deco-dreams.com/
(time : 00:00:10)
+ + + + + + + +
level 1...
2:http://www.deco-dreams.com/privacy.php
(time : 00:00:28)

3:http://www.deco-dreams.com/links.php
(time : 00:00:36)

4:http://www.deco-dreams.com/aboutus.php
(time : 00:00:44)

5:http://www.deco-dreams.com/index.php?pageNum_Recordset1=123&
(time : 00:00:53)

6:http://www.deco-dreams.com/index.php?pageNum_Recordset1=1&
(time : 00:01:02)

7:http://www.deco-dreams.com/buy.php?vartab1_id=676
(time : 00:01:10)

8:http://www.deco-dreams.com/ordering.php
(time : 00:01:18)

9:http://www.deco-dreams.com/contactus.php
(time : 00:01:29)

No link in temporary table

--------------------------------------------------------------------------------

links found : 9
http://www.deco-dreams.com/
http://www.deco-dreams.com/privacy.php
http://www.deco-dreams.com/links.php
http://www.deco-dreams.com/aboutus.php
http://www.deco-dreams.com/index.php?pageNum_Recordset1=123&
http://www.deco-dreams.com/index.php?pageNum_Recordset1=1&
http://www.deco-dreams.com/buy.php?vartab1_id=676
http://www.deco-dreams.com/ordering.php
http://www.deco-dreams.com/contactus.php
Optimizing tables...
Indexing complete !


Remember to remove any word wrapping in the above code.
Old 03-23-2004, 09:02 AM   #8
boomboom100
Green Mole
 
Join Date: Mar 2004
Posts: 6
Well spotted!

I am indeed using a Mac (MacOS 10.3.3 Server), and when I looked at my robots.txt I saw that, as you suggested, I had created the file with Macintosh line endings.

I've now saved it with Unix line endings and indexing is working correctly. Superb!
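
For anyone else who hits this, the conversion amounts to something like the following (a rough sketch, not PhpDig code; the path is just an example):

Code:
// Rewrite robots.txt with Unix (LF) line endings.
$path = 'robots.txt';                      // example path only
$fp = fopen($path, 'rb');
$text = fread($fp, filesize($path));
fclose($fp);
$text = str_replace("\r\n", "\n", $text);  // Windows CRLF -> LF first
$text = str_replace("\r", "\n", $text);    // then bare CR (classic Mac) -> LF
$fp = fopen($path, 'wb');
fwrite($fp, $text);
fclose($fp);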

Thanks Charter.
Old 03-23-2004, 09:04 AM   #9
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Glad it's working. Are you using the new phpdigReadRobotsTxt function from a couple of posts back?
Old 03-23-2004, 09:33 AM   #10
boomboom100
Green Mole
 
Join Date: Mar 2004
Posts: 6
I didn't need to, because as soon as I saved the file in Unix style (I use BBEdit, a Mac/Unix text editor), it started indexing OK.

But I've now tested it with the new phpdigReadRobotsTxt function. I created a different robots.txt saved with Macintosh-style line endings, and it seems that your new function does the trick. It's now indexing perfectly.

Thank you so much for considering the needs of us Mac heads and our eccentric line endings!