06-18-2004, 12:44 PM | #1 |
Green Mole
Join Date: Jun 2004
Location: Oklahoma, U.S.A.
Posts: 19
|
PhpDig Ignoring Something in robots.txt
Hi everyone! As I'm trying to configure PhpDig for my own needs on my Web site, I created a robots.txt so PhpDig, and other search engines for that matter, will ignore certain folders. Here's what my robots.txt file looks like:
# robots.txt for http://www.destroyerx.net/
User-agent: *
Disallow: /cgi-bin
Disallow: /chris
Disallow: /errors
Disallow: /forum
Disallow: /images
Disallow: /poll
Disallow: /search
Disallow: /screenshots
Disallow: /stats
Disallow: /Templates
Disallow: /thumbs
Disallow: /formerror.php
Disallow: /formmail.php

Anyway, I managed to run the spider, and while it ignored almost all the folders and files I specified, it indexed my Error 404 page (a file in a folder I specified not to index). Here's what the spider output says:

level 1...
2:http://www.destroyerx.net/errors/404.php (time : 00:00:16)
level 3...
Duplicate of an existing document
15:http://www.destroyerx.net/errors/404.php (time : 00:01:52)
Duplicate of an existing document
26:http://www.destroyerx.net/errors/404.php (time : 00:03:09)
etc.

While it didn't index the 400.php, 401.php, 403.php, and 500.php pages in my errors folder, it did index my 404.php error page. Is there something wrong with the syntax of my robots.txt file that would make the spider index that error page but not the others?

Thanks everyone for your time. Ciao for now!
__________________
Visit the Destroyer X Network at http://www.destroyerx.net/ |
06-18-2004, 01:06 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Maybe there is a bad link? If not, you can delete the 404.php page from the search in the admin panel. Otherwise, maybe try replacing the phpdigReadRobotsTxt function in robot_functions.php with the function contained in the zip found here.
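To rule out a syntax problem, the rules from the robots.txt quoted above can be checked independently of PhpDig. Below is a minimal sketch using Python's standard urllib.robotparser module. It says nothing about how phpdigReadRobotsTxt itself reads the file. The user-agent string "PhpDig" and the /index.php URL are assumptions for illustration only. Because the file uses "User-agent: *", any agent name gets the same answer.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt posted in this thread.
ROBOTS_TXT = """\
# robots.txt for http://www.destroyerx.net/
User-agent: *
Disallow: /cgi-bin
Disallow: /chris
Disallow: /errors
Disallow: /forum
Disallow: /images
Disallow: /poll
Disallow: /search
Disallow: /screenshots
Disallow: /stats
Disallow: /Templates
Disallow: /thumbs
Disallow: /formerror.php
Disallow: /formmail.php
"""


def is_allowed(url, user_agent="PhpDig"):
    """Return True if the rules above permit user_agent to fetch url.

    "PhpDig" is an assumed agent name, not taken from the PhpDig source.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/errors/404.php", "/errors/400.php", "/index.php"):
        url = "http://www.destroyerx.net" + path
        print(path, "allowed" if is_allowed(url) else "disallowed")
```

If a standards-following parser reports /errors/404.php as disallowed, the robots.txt syntax is probably fine, and the cause is more likely a bad link or the way the spider reads the file.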
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
06-18-2004, 01:57 PM | #3 |
Green Mole
Join Date: Jun 2004
Location: Oklahoma, U.S.A.
Posts: 19
|
Well, I don't know why it keeps indexing my 404.php page, but I removed my "errors" folder from being indexed. Anyway, thanks for everyone's help.
Ciao for now!
__________________
Visit the Destroyer X Network at http://www.destroyerx.net/ |