PhpDig.net

12-05-2003, 01:40 AM   #1
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
robots.txt

With the following robots.txt, nothing gets indexed. I always get "links found: 0, ... Was recently indexed":

User-agent: phpdig
Disallow:

User-agent: *
Disallow: /

After removing this robots.txt, everything works fine.

My intention was to allow PhpDig to index, but tell the others to go away. Did I get the syntax wrong?
__________________
René Haentjens, Ghent University
12-05-2003, 03:28 AM   #2
fr :: anonymus
Green Mole
 
Join Date: Dec 2003
Location: Lyon, France
Posts: 17
I think it's impossible.

Take a look here:

http://www.robotstxt.org/wc/norobots-rfc.html

Anonymus.
12-05-2003, 05:01 AM   #3
renehaentjens
Orange Mole
 
Join Date: Nov 2003
Posts: 69
I've taken this example from the source you quoted, fr :: anonymus. In my opinion it shows that it should be possible:

# /robots.txt for http://www.fict.org/
# comments to webmaster@fict.org

User-agent: unhipbot
Disallow: /

User-agent: webcrawler
User-agent: excite
Disallow:

User-agent: *
Disallow: /org/plans.html
Allow: /org/
Allow: /serv
Allow: /~mak
Disallow: /

The following matrix shows which robots are allowed to access URLs:

URL                                       unhipbot  webcrawler-excite  other

http://www.fict.org/                      No        Yes                No
http://www.fict.org/index.html            No        Yes                No
http://www.fict.org/robots.txt            Yes       Yes                Yes
http://www.fict.org/server.html           No        Yes                Yes
http://www.fict.org/services/fast.html    No        Yes                Yes
http://www.fict.org/services/slow.html    No        Yes                Yes
http://www.fict.org/orgo.gif              No        Yes                No
http://www.fict.org/org/about.html        No        Yes                Yes
http://www.fict.org/org/plans.html        No        Yes                No
http://www.fict.org/%7Ejim/jim.html       No        Yes                No
http://www.fict.org/%7Emak/mak.html       No        Yes                Yes
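
The "other" column of the matrix can be reproduced with a small first-match check. This is only a sketch, assuming PHP 7.1 or later: the function name isPathAllowed and the rule-array layout are made up for illustration and are not part of PhpDig or of the quoted draft. It assumes the lines of a record are tried in file order, that the first line whose path is a prefix of the URL path decides, and that a URL matched by no line is allowed. Percent-decoding is simplified to a plain rawurldecode().

```php
<?php
// Hypothetical helper for illustration; not PhpDig code.
// $rules holds one record's lines in file order:
// [['disallow' | 'allow', path prefix], ...]
function isPathAllowed(array $rules, string $path): bool
{
    if ($path === '/robots.txt') {
        return true;                    // the matrix lists robots.txt as always accessible
    }
    $path = rawurldecode($path);        // so that %7E compares equal to ~ (simplified)
    foreach ($rules as [$type, $prefix]) {
        $prefix = rawurldecode($prefix);
        if ($prefix === '') {
            continue;                   // an empty value matches nothing
        }
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return $type === 'allow';   // the first matching line decides
        }
    }
    return true;                        // no line matched: access is allowed
}

// The "User-agent: *" record from the example above.
$other = [
    ['disallow', '/org/plans.html'],
    ['allow', '/org/'],
    ['allow', '/serv'],
    ['allow', '/~mak'],
    ['disallow', '/'],
];

var_dump(isPathAllowed($other, '/org/about.html'));   // bool(true)
var_dump(isPathAllowed($other, '/orgo.gif'));         // bool(false)
```

Under the same assumptions, a record whose only line is an empty "Disallow:" (the webcrawler/excite case, and the phpdig case from the first post) never matches anything, so every path is allowed.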
12-05-2003, 02:40 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I haven't tested the code below, but it should work around the following case:

Quote:
Originally posted by renehaentjens
With the following robots.txt, nothing gets indexed. I always get "links found: 0, ... Was recently indexed":

User-agent: phpdig
Disallow:

User-agent: *
Disallow: /

After removing this robots.txt, all goes fine.

My intention was to allow PhpDig to index, but tell the others to go away. Did I get the syntax wrong?

In robot_functions.php find the phpdigReadRobotsTxt function and in this function find:
PHP Code:
            if (eregi('[[:blank:]]*disallow:[[:blank:]]*(/([a-z0-9_/*+%.-]*))',$line,$regs))
                {
                if ($regs[1] == '/')
                     $exclude[$user_agent]['@ALL@'] = 1;
                else
                     {
                     $exclude[$user_agent][str_replace('*','.*',str_replace('+','\\+',str_replace('.','\\.',$regs[2])))] = 1;
                     }
                }
and replace with the following:
PHP Code:
            if (eregi('[[:blank:]]*disallow:[[:blank:]]*(/([a-z0-9_/*+%.-]*))',$line,$regs))
                {
                if ($regs[1] == '/')
                     $exclude[$user_agent]['@ALL@'] = 1;
                else
                     {
                     $exclude[$user_agent][str_replace('*','.*',str_replace('+','\\+',str_replace('.','\\.',$regs[2])))] = 1;
                     }
                }
            elseif (($user_agent == 'phpdig') && (eregi('[[:blank:]]*disallow:[[:blank:]]*',$line,$regs)))
                {
                $exclude['@NONE@'] = 1;
                return $exclude;
                }
Remember to remove any "word" wrapping in the above code.
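
For anyone who wants to check what a robots.txt like the one in the first post should mean before patching PhpDig, here is a minimal standalone sketch. It is not PhpDig's phpdigReadRobotsTxt function: the names parseRobotsTxt and disallowedFor are hypothetical, and it uses preg_match because eregi is no longer available in PHP 7 and later. It assumes an empty "Disallow:" excludes nothing, that a record naming the crawler takes precedence over the "*" record, and it simplifies by matching user agents exactly and ignoring Allow lines.

```php
<?php
// Hypothetical standalone sketch; not PhpDig code.
// Returns: lower-cased user agent => list of disallowed path prefixes.
function parseRobotsTxt(string $text): array
{
    $records = [];
    $agents  = [];      // agents named by the record currently being read
    $inRules = false;   // true once the current record has seen a rule line

    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line));    // drop comments
        if ($line === '') {
            continue;
        }
        if (preg_match('/^user-agent:\s*(.*)$/i', $line, $m)) {
            if ($inRules) {             // a User-agent line after rules starts a new record
                $agents  = [];
                $inRules = false;
            }
            $agent    = strtolower($m[1]);
            $agents[] = $agent;
            if (!isset($records[$agent])) {
                $records[$agent] = [];
            }
        } elseif (preg_match('/^disallow:\s*(.*)$/i', $line, $m)) {
            $inRules = true;
            if ($m[1] !== '') {         // an empty Disallow excludes nothing
                foreach ($agents as $agent) {
                    $records[$agent][] = $m[1];
                }
            }
        }
    }
    return $records;
}

// A record naming the agent wins over the "*" record (exact match only).
function disallowedFor(array $records, string $agent): array
{
    $agent = strtolower($agent);
    if (array_key_exists($agent, $records)) {
        return $records[$agent];
    }
    return $records['*'] ?? [];
}

$robots = <<<'TXT'
User-agent: phpdig
Disallow:

User-agent: *
Disallow: /
TXT;

$records = parseRobotsTxt($robots);
var_dump(disallowedFor($records, 'phpdig'));     // empty array: nothing is excluded
var_dump(disallowedFor($records, 'otherbot'));   // one entry: "/"
```

Under these assumptions the file from the first post excludes nothing for phpdig and everything for every other agent, which is the behavior the patch above is meant to restore.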
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.