
PhpDig Ignoring Something in robots.txt


Destroyer X
06-18-2004, 12:44 PM
Hi everyone! As I'm trying to configure PhpDig for my own needs on my Web site, I created a robots.txt file so that PhpDig (and other search engines, for that matter) will ignore certain folders. Here's what my robots.txt file looks like:

# robots.txt for http://www.destroyerx.net/

User-agent: *
Disallow: /cgi-bin
Disallow: /chris
Disallow: /errors
Disallow: /forum
Disallow: /images
Disallow: /poll
Disallow: /search
Disallow: /screenshots
Disallow: /stats
Disallow: /Templates
Disallow: /thumbs
Disallow: /formerror.php
Disallow: /formmail.php

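For reference, my understanding is that robots.txt matching is just a prefix test against the URL path, so a rule like "Disallow: /errors" should also cover /errors/404.php. A rough sketch of that kind of check (just an illustration, not PhpDig's actual code):

<?php
// Rough sketch of a standard robots.txt prefix test (illustration only).
function is_disallowed($path, $disallow_rules)
{
    foreach ($disallow_rules as $rule) {
        // A Disallow rule blocks any URL path that starts with it.
        if ($rule !== '' && strpos($path, $rule) === 0) {
            return true;
        }
    }
    return false;
}

$rules = array('/cgi-bin', '/errors', '/forum');
var_dump(is_disallowed('/errors/404.php', $rules)); // bool(true) -- should be skipped
?>
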
Anyway, I ran the spider, and while it ignored almost all of the folders and files I specified, it indexed my Error 404 page (a file in a folder I told it not to index). Here's the relevant output:

level 1...
2:http://www.destroyerx.net/errors/404.php
(time : 00:00:16)

level 3...
Duplicate of an existing document
15:http://www.destroyerx.net/errors/404.php
(time : 00:01:52)

Duplicate of an existing document
26:http://www.destroyerx.net/errors/404.php
(time : 00:03:09)

etc., etc., etc.

While it didn't index the 400.php, 401.php, 403.php, and 500.php pages in my errors folder, it did index my 404.php error page. Now, is there something wrong with the syntax of my robots.txt file that would make it index that one error page and somehow not the others? Thanks, everyone, for your time. Ciao for now!

Charter
06-18-2004, 01:06 PM
Hi. Maybe there is a bad link? If not, you can delete the 404.php page from the search in the admin panel. Otherwise, maybe try replacing the phpdigReadRobotsTxt function in robot_functions.php with the function contained in the zip found here (http://www.phpdig.net/showthread.php?threadid=942).
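For anyone curious what a function like that deals with: I can't speak for the exact code in the zip, but a robots.txt reader of this kind generally fetches the file, finds the section that applies to the crawler (or "User-agent: *"), and collects the Disallow paths so the spider can skip matching URLs. A simplified sketch along those lines (illustration only, not PhpDig's implementation):

<?php
// Illustration only -- not the actual PhpDig code from the linked zip.
function read_robots_txt($site_url, $agent = 'phpdig')
{
    $excludes = array();
    $lines = @file($site_url . '/robots.txt');
    if ($lines === false) {
        return $excludes; // no robots.txt, so nothing to exclude
    }
    $applies = false;
    foreach ($lines as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '') {
            continue;
        }
        if (preg_match('/^User-agent:\s*(.*)$/i', $line, $m)) {
            // Start of a record: does it apply to this crawler?
            $ua = strtolower(trim($m[1]));
            $applies = ($ua === '*' || strpos(strtolower($agent), $ua) !== false);
        } elseif ($applies && preg_match('/^Disallow:\s*(.*)$/i', $line, $m)) {
            $path = trim($m[1]);
            if ($path !== '') {
                $excludes[] = $path; // e.g. "/errors"
            }
        }
    }
    return $excludes;
}
?>

If the reader returns the full list correctly, the prefix check done during spidering decides whether a URL such as /errors/404.php gets skipped, so a problem in either step could explain what you are seeing.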

Destroyer X
06-18-2004, 01:57 PM
Well, I don't know why it kept indexing my 404.php page, but I removed my "errors" folder from the index. Anyway, thanks for the help, everyone.

Ciao for now!