PhpDig.net

02-20-2004, 01:56 PM   #1
roy
Green Mole
 
Join Date: Feb 2004
Posts: 3
robots.txt ignored

I have a site where my robots.txt acts as a honeypot to block bots that don't obey the standard.

This is what my robots.txt looks like:

User-agent: *
Disallow: /elguapo/index.php

I even added this to it now:

User-agent: PhpDig
Disallow: /elguapo

Still, PhpDig ends up ripping through all the pages, ignores my robots.txt, and gets banned. I see nothing in the excludes table.

Is PhpDig supposed to be writing to the excludes table? How do I get it to obey the robots.txt directives?
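
For reference, here is a minimal sketch of what a standards-following parser would decide for the rules above. It uses Python's standard urllib.robotparser, not PhpDig's own code, so it only shows the expected outcome, not where PhpDig goes wrong. The host example.com, the page other.php, and the agent name SomeOtherBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post, with both groups present.
ROBOTS_TXT = """\
User-agent: *
Disallow: /elguapo/index.php

User-agent: PhpDig
Disallow: /elguapo
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` (example.com is a placeholder host)."""
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for agent in ("PhpDig", "SomeOtherBot"):  # SomeOtherBot is a hypothetical agent
        for path in ("/elguapo/index.php", "/elguapo/other.php", "/index.php"):
            verdict = "allowed" if allowed(agent, path) else "blocked"
            print(f"{agent:12} {path:22} {verdict}")
```

With this parser, an agent named PhpDig is kept out of everything under /elguapo, while other agents are only kept out of /elguapo/index.php. A spider that still requests the honeypot page is therefore not applying either group.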
02-20-2004, 02:02 PM   #2
roy
One thing I left out, and I'm not sure if it makes a difference: I'm running the spider from my home server (IIS) against my website. The idea is to later move my database and the search feature to my hosting server, but just run the indexing from home.

Not sure if that messes up the paths for exclusion.
02-20-2004, 05:27 PM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. My initial guess is that the phpdigReadRobotsTxt function and/or the phpdigDetectDir function, both in the robot_functions.php file, need some reworking.

In the meantime you might try just using:

User-agent: PhpDig
Disallow: /elguapo

and then restore your original robots.txt when indexing is done.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
02-20-2004, 08:02 PM   #4
roy
It's cool. I disable my honeypot while I spider the site, and then turn it back on. It's not that big of a deal, as long as I remember to do it.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.