PhpDig.net

09-09-2003, 07:53 AM   #1
Dragonfly
Green Mole

Join Date: Sep 2003
Posts: 1
PhpDig ignores robots.txt

Hi everyone,
while searching for a suitable alternative to the PostNuke search engine (which can't be used for a multisite setup), I stumbled across yours.
So far it works nicely, but there are some things I can't resolve:

I've set PhpDig to index http://www.subdomain.domain.com/html/ and put a robots.txt in the html directory. But PhpDig keeps ignoring it, even when the file contains

User-agent: PhpDig
Disallow: /

it continues to spider into the subdirectories.
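For reference, the robots exclusion standard (RFC 9309) defines how a file like the one above is meant to be read. The minimal PHP sketch below is not PhpDig's implementation; the function name robotsDisallows is hypothetical, and it handles only User-agent and Disallow lines with plain prefix matching (Allow lines are recognised but otherwise ignored, and wildcards are not supported). Under those assumptions it shows that the two-line file should block every path for a crawler identifying itself as PhpDig, provided the crawler actually reads that file.

```php
<?php
// Sketch only (hypothetical helper, not PhpDig code): does a robots.txt body
// disallow $path for the crawler token $agent? Assumes plain prefix matching;
// Allow lines only end a group here, and wildcards are not supported.
function robotsDisallows(string $robotsTxt, string $agent, string $path): bool
{
    $groups = [];         // lower-cased user-agent => list of Disallow prefixes
    $currentAgents = [];  // user-agents named by the group being read
    $inRules = false;     // true once the current group has a rule line

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if ($inRules) {            // a User-agent after rules starts a new group
                $currentAgents = [];
                $inRules = false;
            }
            $ua = strtolower($value);
            $currentAgents[] = $ua;
            $groups[$ua] = $groups[$ua] ?? [];
        } elseif ($field === 'disallow' || $field === 'allow') {
            $inRules = true;
            if ($field === 'disallow' && $value !== '') { // empty Disallow = allow all
                foreach ($currentAgents as $ua) {
                    $groups[$ua][] = $value;
                }
            }
        }
    }

    // Prefer the group that names this crawler; otherwise fall back to "*".
    $rules = $groups[strtolower($agent)] ?? $groups['*'] ?? [];
    foreach ($rules as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

$robots = "User-agent: PhpDig\nDisallow: /\n";
var_dump(robotsDisallows($robots, 'PhpDig', '/html/modules/'));    // bool(true)
var_dump(robotsDisallows($robots, 'Googlebot', '/html/modules/')); // bool(false)
```

This only covers how the rules are interpreted once the file has been fetched; whether PhpDig fetches it from the expected URL is a separate question.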

Is there any other way to exclude individual directories?

The update form says "Warning ! Erase is permanent", but it isn't. It would be neat if I could simply erase all unwanted pages there, but as soon as I reindex the rest, PhpDig spiders the just-erased pages again. Adding the exclude tag to a single file didn't work either; that page is indexed again.

Maybe this is due to the PostNuke CMS, I have no idea. It's a modular system, and I wanted to limit access to some of the modules; otherwise PhpDig would index without limits. So I need to restrict access to the directories.

Another problem is that each spidering run damages the PostNuke MySQL tables, and I have to reinstall all of the site's tables. This is strange; maybe it's due to the server configuration (Apache 2.0) rather than to PhpDig.

Any ideas on how to control this tool?

Thanks for your input!

Drag

Last edited by Dragonfly; 09-09-2003 at 08:09 AM.
09-12-2003, 07:54 AM   #2
Charter
Head Mole

Join Date: May 2003
Posts: 2,539
Hi. The "Warning ! Erase is permanent" message is shown because there is no lock, i.e., $locked = 0.

If you have access to the raw log files, check whether the URL requested for robots.txt is correct. Otherwise, robot_functions.php contains a function called phpdigReadRobotsTxt; in that function, you might try echoing $site.'robots.txt' to see whether it is correct.

I'm not sure why the PostNuke tables are damaged. I didn't see any conflicting tables, even with the PhpDig table prefix set to nuke. What kind of damage is done to the PostNuke tables?
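As a point of comparison for the echo check: under the robots exclusion standard (RFC 9309), a crawler looks for robots.txt only at the root of the host, so for a start URL of http://www.subdomain.domain.com/html/ the standard location is http://www.subdomain.domain.com/robots.txt, not /html/robots.txt. The helper below is a hypothetical sketch, not PhpDig code, and it does not know what $site actually holds inside phpdigReadRobotsTxt; it only computes the root location with PHP's parse_url so the echoed value can be compared against it.

```php
<?php
// Hypothetical helper (not part of PhpDig): compute where a standards-following
// crawler looks for robots.txt, namely the root of the host, for a start URL.
function rootRobotsUrl(string $startUrl): string
{
    $parts = parse_url($startUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $startUrl");
    }
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    // The path of the start URL (e.g. /html/) is deliberately discarded.
    return $parts['scheme'] . '://' . $parts['host'] . $port . '/robots.txt';
}

echo rootRobotsUrl('http://www.subdomain.domain.com/html/'), "\n";
// prints http://www.subdomain.domain.com/robots.txt
```

If PhpDig requests only the root file, a robots.txt kept in /html/ would never be read, and the rules would need to go into the root file instead, for example as Disallow: /html/modules/ (a hypothetical path).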
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.



Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.