PhpDig.net


renehaentjens 12-05-2003 01:40 AM

robots.txt
 
With the following robots.txt, nothing gets indexed; I always get "links found: 0, ... Was recently indexed":

User-agent: phpdig
Disallow:

User-agent: *
Disallow: /

After removing this robots.txt, all goes fine.

My intention was to allow PhpDig to index, but tell the others to go away. Did I get the syntax wrong?
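
As a cross-check on the syntax question, the same rules can be fed to an independent parser. The sketch below uses Python's standard urllib.robotparser; the host example.com and the agent name "otherbot" are made up for illustration, and it is an assumption that the crawler identifies itself with the token "phpdig". It only shows how one standard-following parser reads the file, not how PhpDig itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post above, unchanged.
ROBOTS_TXT = """\
User-agent: phpdig
Disallow:

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# "otherbot" and example.com are hypothetical names used only for this check.
print("phpdig:  ", allowed("phpdig", "http://example.com/index.html"))
print("otherbot:", allowed("otherbot", "http://example.com/index.html"))
```

If this parser lets "phpdig" through and blocks the other agent, the file expresses the stated intention, and the difference would lie in how PhpDig reads an empty Disallow line.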

fr :: anonymus 12-05-2003 03:28 AM

I think it's impossible.

Look here:

http://www.robotstxt.org/wc/norobots-rfc.html

Anonymus.

renehaentjens 12-05-2003 05:01 AM

I've taken this example from the quoted source, fr. Anonymus. In my opinion it shows that it should be possible (sorry for the lost alignment):

# /robots.txt for http://www.fict.org/
# comments to webmaster@fict.org

User-agent: unhipbot
Disallow: /

User-agent: webcrawler
User-agent: excite
Disallow:

User-agent: *
Disallow: /org/plans.html
Allow: /org/
Allow: /serv
Allow: /~mak
Disallow: /

The following matrix shows which robots are allowed to access URLs:

URL                                      unhipbot  webcrawler & excite  other
http://www.fict.org/                     No        Yes                  No
http://www.fict.org/index.html           No        Yes                  No
http://www.fict.org/robots.txt           Yes       Yes                  Yes
http://www.fict.org/server.html          No        Yes                  Yes
http://www.fict.org/services/fast.html   No        Yes                  Yes
http://www.fict.org/services/slow.html   No        Yes                  Yes
http://www.fict.org/orgo.gif             No        Yes                  No
http://www.fict.org/org/about.html       No        Yes                  Yes
http://www.fict.org/org/plans.html       No        Yes                  No
http://www.fict.org/%7Ejim/jim.html      No        Yes                  No
http://www.fict.org/%7Emak/mak.html      No        Yes                  Yes
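
The matrix can be compared against a parser that applies the first matching rule, as the quoted draft describes. This is a minimal sketch using Python's standard urllib.robotparser; the agent name "somebot" stands in for the "other" column and is an assumption. The /robots.txt row is left out because the draft treats that URL as always fetchable, which this parser does not special-case.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt quoted above.
ROBOTS_TXT = """\
# /robots.txt for http://www.fict.org/
# comments to webmaster@fict.org

User-agent: unhipbot
Disallow: /

User-agent: webcrawler
User-agent: excite
Disallow:

User-agent: *
Disallow: /org/plans.html
Allow: /org/
Allow: /serv
Allow: /~mak
Disallow: /
"""

# Matrix rows as path -> (unhipbot, webcrawler/excite, other); /robots.txt omitted.
EXPECTED = {
    "/": (False, True, False),
    "/index.html": (False, True, False),
    "/server.html": (False, True, True),
    "/services/fast.html": (False, True, True),
    "/services/slow.html": (False, True, True),
    "/orgo.gif": (False, True, False),
    "/org/about.html": (False, True, True),
    "/org/plans.html": (False, True, False),
    "/%7Ejim/jim.html": (False, True, False),
    "/%7Emak/mak.html": (False, True, True),
}

# "somebot" is a hypothetical agent representing the "other" column.
AGENTS = ("unhipbot", "webcrawler", "somebot")


def mismatches():
    """Return (agent, path) pairs where the parser disagrees with the matrix."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    found = []
    for path, row in EXPECTED.items():
        for agent, expected in zip(AGENTS, row):
            if parser.can_fetch(agent, "http://www.fict.org" + path) != expected:
                found.append((agent, path))
    return found


print("mismatches:", mismatches())
```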

Charter 12-05-2003 02:40 PM

Hi. I haven't tested the code below, but it should get around the following case:

Quote:

Originally posted by renehaentjens
With the following robots.txt, nothing gets indexed; I always get "links found: 0, ... Was recently indexed":

User-agent: phpdig
Disallow:

User-agent: *
Disallow: /

After removing this robots.txt, all goes fine.

My intention was to allow PhpDig to index, but tell the others to go away. Did I get the syntax wrong?

In robot_functions.php find the phpdigReadRobotsTxt function and in this function find:
PHP Code:

            if (eregi('[[:blank:]]*disallow:[[:blank:]]*(/([a-z0-9_/*+%.-]*))',$line,$regs))
                {
                if ($regs[1] == '/')
                     $exclude[$user_agent]['@ALL@'] = 1;
                else
                     {
                     $exclude[$user_agent][str_replace('*','.*',str_replace('+','\\+',str_replace('.','\\.',$regs[2])))] = 1;
                     }
                }

and replace with the following:
PHP Code:

            if (eregi('[[:blank:]]*disallow:[[:blank:]]*(/([a-z0-9_/*+%.-]*))',$line,$regs))
                {
                if ($regs[1] == '/')
                     $exclude[$user_agent]['@ALL@'] = 1;
                else
                     {
                     $exclude[$user_agent][str_replace('*','.*',str_replace('+','\\+',str_replace('.','\\.',$regs[2])))] = 1;
                     }
                }
            elseif (($user_agent == 'phpdig') && (eregi('[[:blank:]]*disallow:[[:blank:]]*',$line,$regs)))
                {
                $exclude['@NONE@'] = 1;
                return $exclude;
                }

Remember to remove any "word" wrapping in the above code.
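
Why the first branch never sees an empty Disallow line can be read off the pattern itself: it requires a "/" after "disallow:". The sketch below reproduces that with Python's re module, using [ \t] in place of the POSIX class [[:blank:]]. It is an approximate translation of the eregi() pattern for illustration only, not PhpDig code, and the helper name disallowed_path is hypothetical.

```python
import re

# Approximate Python equivalent of the eregi() pattern in the snippet above:
# case-insensitive, unanchored, and requiring a leading "/" in the path.
DISALLOW_WITH_PATH = re.compile(
    r"[ \t]*disallow:[ \t]*(/([a-z0-9_/*+%.-]*))", re.IGNORECASE
)


def disallowed_path(line: str):
    """Return the captured path, or None when the line carries no "/" path."""
    match = DISALLOW_WITH_PATH.search(line)
    return match.group(1) if match else None


for line in ("Disallow: /", "Disallow: /org/plans.html", "Disallow:"):
    print(repr(line), "->", disallowed_path(line))
```

A line that fails this pattern is what the added elseif branch is meant to catch when the current user agent is phpdig.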


All times are GMT -8.