
robots.txt ignored


roy
02-20-2004, 01:56 PM
I have a site where my robots.txt acts as a honeypot to block bots that don't obey the standard.

This is what my robots.txt looks like:

User-agent: *
Disallow: /elguapo/index.php

I even added this to it now:

User-agent: PhpDig
Disallow: /elguapo

Still, PhpDig ends up ripping through all the pages, ignores my robots.txt, and gets banned. I see nothing in the excludes table.

Is PhpDig supposed to be writing to that table? How do I get PhpDig to obey the robots.txt directives?
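
For comparison, here is a rough sketch of how a standards-following parser reads the two records above. It uses Python's standard urllib.robotparser, not PhpDig's own code, and the agent name "SomeOtherBot" and the path /elguapo/other.php are made up for illustration. The point is only to show what a compliant spider would be expected to skip.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post, with both records.
ROBOTS_TXT = """\
User-agent: *
Disallow: /elguapo/index.php

User-agent: PhpDig
Disallow: /elguapo
"""


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    checks = [
        ("PhpDig", "/elguapo/index.php"),        # covered by the PhpDig record
        ("PhpDig", "/index.php"),                # not covered by any rule
        ("SomeOtherBot", "/elguapo/index.php"),  # hypothetical bot, falls back to *
        ("SomeOtherBot", "/elguapo/other.php"),  # hypothetical path, not covered
    ]
    for agent, path in checks:
        print(agent, path, allowed(agent, path))
```

If a spider identifying itself as PhpDig still requests /elguapo/index.php with this file in place, the rules themselves are probably fine and the problem is more likely in how the spider reads them.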

roy
02-20-2004, 02:02 PM
One thing I left out, and I'm not sure if it makes a difference: I'm running the spider from my home server (IIS) against my website. The idea is to later move my database and the search feature to my hosting server, but just run the indexing from home.

Not sure if that messes up the paths for exclusion.

Charter
02-20-2004, 05:27 PM
Hi. My initial guess is that the phpdigReadRobotsTxt function and/or the phpdigDetectDir function, both in the robot_functions.php file, need some reworking.

In the meantime you might try just using:

User-agent: PhpDig
Disallow: /elguapo

and then replace your original robots.txt when indexing is done.
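
To illustrate the kind of logic involved, here is a minimal sketch of how a spider might pick the record that applies to it and collect the Disallow prefixes it should skip. This is not the code in robot_functions.php; the function name disallowed_prefixes is hypothetical, and the parsing is simplified (no Allow lines, no wildcards). It only shows why a file with a single PhpDig record leaves less room for a record-selection mistake.

```python
def disallowed_prefixes(robots_txt, agent):
    """Return the Disallow prefixes from the record that best matches `agent`.

    Simplified: a record naming the agent wins over the "*" record.
    """
    records = []                      # list of (agent_names, prefixes)
    names, prefixes, in_rules = [], [], False

    def flush():
        # Store the record collected so far, if any, and start a new one.
        nonlocal names, prefixes, in_rules
        if names:
            records.append((names, prefixes))
        names, prefixes, in_rules = [], [], False

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            flush()                           # a blank line ends a record
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:                      # new record without a blank line
                flush()
            names.append(value.lower())
        elif field == "disallow" and names:
            in_rules = True
            if value:                         # empty Disallow means "allow all"
                prefixes.append(value)
    flush()

    wanted = agent.lower()
    for rec_names, rec_prefixes in records:
        if any(n != "*" and n in wanted for n in rec_names):
            return rec_prefixes               # specific record takes priority
    for rec_names, rec_prefixes in records:
        if "*" in rec_names:
            return rec_prefixes               # fall back to the wildcard record
    return []


ROBOTS_TXT = """\
User-agent: *
Disallow: /elguapo/index.php

User-agent: PhpDig
Disallow: /elguapo
"""

if __name__ == "__main__":
    print(disallowed_prefixes(ROBOTS_TXT, "PhpDig"))
    print(disallowed_prefixes(ROBOTS_TXT, "SomeOtherBot"))
```

Under these assumptions, the prefixes returned for PhpDig are roughly what one would expect a spider to record as paths to exclude.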

roy
02-20-2004, 08:02 PM
It's cool. I disable my honeypot while I spider the site, and then turn it back on. It's not that big of a deal, as long as I remember to do it.