View Full Version : more indexing problems: No link in temporary table


boomboom100
03-21-2004, 10:05 AM
Hi.

I've installed phpdig 1.8.0, but when I attempt to index my site, I get:

SITE : http://www.deco-dreams.com/
Exclude paths :
-
- @NONE@
No link in temporary table

links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

My remote host is running PHP 4.3.3 and MySQL 3.23.49

any help or suggestions?

many thanks

boomboom100
03-21-2004, 10:43 AM
I've since discovered that indexing fails when my robots.txt is in place. If I remove it, indexing works. My robots.txt has these contents:

# /robots.txt file for http://www.deco-dreams.com/
# mail webmaster@deco-dreams.com for constructive criticism


User-agent: *
Disallow: /unused
Disallow: /admindeco
Disallow: /decoMyAdmin
Disallow: /Connections
Disallow: /FX_DataCounter
Disallow: /gallery
Disallow: /rcdstnav
Disallow: /rv
Disallow: /rayform11
Disallow: /rayform11b
Disallow: /mm
Disallow: /rv
Disallow: /uploads
Disallow: index2.php
Disallow: info.php
Disallow: results1.php
Disallow: results.php
Disallow: results2.php
Disallow: sp_images.php

Am I making an obvious mistake?

Thanks

Charter
03-23-2004, 01:08 AM
Hi. Perhaps try the following.

In robot_functions.php is the phpdigReadRobotsTxt function.

In this function, replace:

$user_agent = $regs[1];

with the following:

if ($regs[1] == "*") {
    $user_agent = "'$regs[1]'";
}
else {
    $user_agent = $regs[1];
}
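
(As an aside for anyone on a newer PHP version: ereg was removed in PHP 7, so if you ever port this change, the equivalent test with preg_match would look roughly like the following sketch — this is not part of phpDig itself.)

```php
<?php
// Sketch of the user-agent match using preg_match instead of the
// (now removed) ereg. Quoting the '*' stores the wildcard agent
// under the distinct key "'*'" rather than a bare '*'.
$line = "User-agent: *";
if (preg_match('/^user-agent:[ ]*([a-z0-9*]+)/', strtolower($line), $regs)) {
    $user_agent = ($regs[1] == "*") ? "'$regs[1]'" : $regs[1];
}
// $user_agent is now "'*'"
```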

boomboom100
03-23-2004, 02:22 AM
Hi

Thanks for the reply, Charter.

I tried replacing the line but am getting parse errors. Here is the function from the original robot_functions.php:

//=================================================
//search robots.txt in a site
function phpdigReadRobotsTxt($site) //don't forget the end backslash
{
    if (phpdigTestUrl($site.'robots.txt') == 'PLAINTEXT')
    {
        $robots = file($site.'robots.txt');
        while (list($id,$line) = each($robots))
        {
            if (ereg('^user-agent:[ ]*([a-z0-9*]+)',strtolower($line),$regs))
            {
                $user_agent = $regs[1];
            }
            if (eregi('[[:blank:]]*disallow:[[:blank:]]*(/([a-z0-9_/*+%.-]*))',$line,$regs))
            {
                if ($regs[1] == '/')
                {
                    $exclude[$user_agent]['@ALL@'] = 1;
                }
                else
                {
                    $exclude[$user_agent][str_replace('*','.*',str_replace('+','\+',str_replace('.','\.',$regs[2])))] = 1;
                }
            }
            elseif (($user_agent == 'phpdig') && (eregi('[[:blank:]]*disallow:[[:blank:]]*',$line,$regs)))
            {
                $exclude['@NONE@'] = 1;
                return $exclude;
            }
        }
        if (isset($exclude['phpdig']) && is_array($exclude['phpdig']))
            return $exclude['phpdig'];
        elseif (isset($exclude['*']) && is_array($exclude['*']))
            return $exclude['*'];
    }
    $exclude['@NONE@'] = 1;
    return $exclude;
}

//=================================================

How should I edit this block?

Thanks again.

Charter
03-23-2004, 02:50 AM
Hi. In the block of code you posted, replace:

$user_agent = $regs[1];

with the following:

if ($regs[1] == "*") {
    $user_agent = "'$regs[1]'";
}
else {
    $user_agent = $regs[1];
}
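
As an aside, the nested str_replace calls in the Disallow branch of that function just turn a Disallow value into a regular-expression fragment: escape '.' and '+', then turn '*' into '.*'. A minimal sketch of that step in isolation (phpdigDisallowToRegex is a hypothetical name, not part of phpDig):

```php
<?php
// Hypothetical helper mirroring the nested str_replace calls in
// phpdigReadRobotsTxt: escape '.' and '+' first, then turn '*' into '.*'.
function phpdigDisallowToRegex($path)
{
    return str_replace('*', '.*',
           str_replace('+', '\+',
           str_replace('.', '\.', $path)));
}

$pattern = phpdigDisallowToRegex('index2.php'); // 'index2\.php', as in the exclude list
```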

boomboom100
03-23-2004, 03:35 AM
Sorry you had to repeat yourself... the first time I was getting errors, but once I got the formatting right the page loaded OK (no parse error). Unfortunately, the output is still:

SITE : http://www.deco-dreams.com/
Exclude paths :
-
- @NONE@
No link in temporary table


links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !

Charter
03-23-2004, 06:34 AM
Hi. Do you make your robots.txt file on a Mac? It reads in as only one key-value pair. If you have PHP 4.3.0+, the ini_set function can be used to correct this.
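
To see the problem in isolation: classic Mac files end lines with "\r" alone, so a split on "\n" returns the whole file as one chunk. A minimal sketch (not part of phpDig; normalizing the endings by hand like this is a portable alternative for hosts below PHP 4.3.0):

```php
<?php
// Classic Mac text uses "\r" as the line ending, so splitting on "\n"
// returns the whole file as one line. Normalizing CRLF and CR to LF
// first restores the expected key-value pairs.
function normalizeLineEndings($text)
{
    return str_replace("\r", "\n", str_replace("\r\n", "\n", $text));
}

$macStyle = "User-agent: *\rDisallow: /unused\rDisallow: /uploads";
$broken = explode("\n", $macStyle);                       // one element
$fixed  = explode("\n", normalizeLineEndings($macStyle)); // three lines
```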

Try replacing the whole phpdigReadRobotsTxt function with the following, as you see it onscreen:


//=================================================
//search robots.txt in a site
function phpdigReadRobotsTxt($site) { //don't forget the end backslash
    if (phpdigTestUrl($site.'robots.txt') == 'PLAINTEXT') {
        @ini_set("auto_detect_line_endings","1"); // needs PHP 4.3.0+
        $robots = @file($site.'robots.txt');
        while (list($id,$line) = @each($robots)) {
            if ((strpos(trim($line),"#") === 0) || (trim($line) == ""))
                continue;
            if (ereg('^user-agent:[ ]*([a-z0-9*]+)',strtolower($line),$regs)) {
                if ($regs[1] == "*") {
                    $user_agent = "'$regs[1]'";
                }
                else {
                    $user_agent = $regs[1];
                }
            }
            if (eregi('[[:blank:]]*disallow:[[:blank:]]*([/]?([a-z0-9_/*+%.-]*))',$line,$regs)) {
                if ($regs[1] == '/') {
                    $exclude[$user_agent]['@ALL@'] = 1;
                }
                else {
                    $exclude[$user_agent][str_replace('*','.*',str_replace('+','\+',str_replace('.','\.',$regs[2])))] = 1;
                }
            }
            elseif (($user_agent == 'phpdig') && (eregi('[[:blank:]]*disallow:[[:blank:]]*',$line,$regs))) {
                $exclude['@NONE@'] = 1;
                return $exclude;
            }
        }
        if (isset($exclude['phpdig']) && is_array($exclude['phpdig']))
            return $exclude['phpdig'];
        elseif (isset($exclude["'*'"]) && is_array($exclude["'*'"]))
            return $exclude["'*'"];
    }
    $exclude['@NONE@'] = 1;
    return $exclude;
}


With PHP 4.3.0+ the robots.txt file should now read in as multiple key-value pairs, and the other changes in the function should let you index your site, so at a search depth of one you should get the following:


SITE : http://www.deco-dreams.com/
Exclude paths :
- unused
- admindeco
- decoMyAdmin
- Connections
- FX_DataCounter
- gallery
- rcdstnav
- rv
- rayform11
- rayform11b
- mm
- uploads
- index2\.php
- info\.php
- results1\.php
- results\.php
- results2\.php
- sp_images\.php
1:http://www.deco-dreams.com/
(time : 00:00:10)
+ + + + + + + +
level 1...
2:http://www.deco-dreams.com/privacy.php
(time : 00:00:28)

3:http://www.deco-dreams.com/links.php
(time : 00:00:36)

4:http://www.deco-dreams.com/aboutus.php
(time : 00:00:44)

5:http://www.deco-dreams.com/index.php?pageNum_Recordset1=123&
(time : 00:00:53)

6:http://www.deco-dreams.com/index.php?pageNum_Recordset1=1&
(time : 00:01:02)

7:http://www.deco-dreams.com/buy.php?vartab1_id=676
(time : 00:01:10)

8:http://www.deco-dreams.com/ordering.php
(time : 00:01:18)

9:http://www.deco-dreams.com/contactus.php
(time : 00:01:29)

No link in temporary table

--------------------------------------------------------------------------------

links found : 9
http://www.deco-dreams.com/
http://www.deco-dreams.com/privacy.php
http://www.deco-dreams.com/links.php
http://www.deco-dreams.com/aboutus.php
http://www.deco-dreams.com/index.php?pageNum_Recordset1=123&
http://www.deco-dreams.com/index.php?pageNum_Recordset1=1&
http://www.deco-dreams.com/buy.php?vartab1_id=676
http://www.deco-dreams.com/ordering.php
http://www.deco-dreams.com/contactus.php
Optimizing tables...
Indexing complete !


Remember to remove any "word" wrapping in the above code.

boomboom100
03-23-2004, 10:02 AM
Well spotted!

I am indeed using a Mac (MacOS 10.3.3 Server), and when I looked at my robots.txt I saw that I had created a .txt file with Macintosh line endings, as you suggested.

I've now saved it with Unix line endings and indexing is working correctly. Superb!

Thanks Charter.
:D

Charter
03-23-2004, 10:04 AM
Hi. Glad it's working. Are you using the new phpdigReadRobotsTxt function from a couple of posts before this post?

boomboom100
03-23-2004, 10:33 AM
I didn't need to: as soon as I saved with Unix-style line endings (I use BBEdit, a Mac/Unix text editor), it started indexing OK.

But I've now tested the new phpdigReadRobotsTxt function: I created a different robots.txt saved with Macintosh-style line endings, and your new function does the trick. It's now indexing perfectly.

Thank you so much for considering the needs of us mac heads and our eccentric line endings! :D