|09-09-2003, 06:53 AM||#1|
Join Date: Sep 2003
phpDig ignores robots.txt
While searching for a suitable alternative to the PostNuke search engine (which can't be used for a multisite setup), I stumbled upon yours.
So far it works nicely; there are just some things I can't resolve:
I've told the machine to index http://www.subdomain.domain.com/html/ and put a robots.txt in the html directory, but phpDig keeps ignoring it. Even with exclusions stated,
it continues to spider into the subdirectories.
Is there any other way to exclude single directories? The update form says "Warning! Erase is permanent", but it isn't. It would be neat if I could just erase all unwanted pages there, but when I start reindexing the rest, it spiders the just-erased pages again. Adding the exclude tag to a single file didn't work either; that page gets indexed again.
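One thing worth checking (a guess based on the robots exclusion standard, not on your setup): crawlers that honor robots.txt only request it from the root of the host, so a robots.txt placed inside the html directory would never be read. A root-level file like the following (paths are placeholders for your actual directories) is what a spider would see:

```
# Must live at http://www.subdomain.domain.com/robots.txt,
# not in a subdirectory such as /html/.
User-agent: *
Disallow: /html/private/
Disallow: /html/modules/
```

For excluding parts of a single page rather than whole directories, PhpDig also recognizes exclusion comments in the HTML itself (wrap the section you want skipped between `<!-- phpdigExclude -->` and `<!-- phpdigInclude -->`); whether that suits your PostNuke templates depends on how the modules render their output.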
Maybe this is due to the PostNuke CMS, no idea. It's a modular system, and I wanted to limit access to some of the modules; otherwise the spider would index without limits, so I need to restrict access to those directories.
Another problem is that each spidering run damages the PostNuke MySQL tables; I have to reinstall all of the site's tables. This is weird; maybe it's due to the server configuration (Apache 2.0) rather than phpDig.
Any ideas on how to control this tool?
Thanks for your input!
Last edited by Dragonfly; 09-09-2003 at 07:09 AM.
|09-12-2003, 06:54 AM||#2|
Join Date: May 2003
Hi. The "Warning! Erase is permanent" message appears because there is no lock, i.e., $locked = 0. If you have access to the raw log files, check whether the URL to robots.txt is correct. Otherwise, robot_functions.php contains a function called phpdigReadRobotsTxt; in that function, you might try echoing $site.'robots.txt' to see whether the URL is correct. I'm not sure why the PostNuke tables get damaged. I didn't see any conflicting tables, even when the PhpDig prefix is set to nuke. What kind of damage is done to the PostNuke tables?
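The debugging step above can be sketched roughly as follows. This is only an illustration of where the echo would go, not the actual PhpDig source; the function name phpdigReadRobotsTxt and the $site variable come from the post, but the surrounding code varies by PhpDig version:

```php
<?php
// Sketch of the debug echo suggested above, placed near the top of
// phpdigReadRobotsTxt() in robot_functions.php (hypothetical wrapper
// shown here so the snippet is self-contained).
function phpdig_debug_robots_url($site)
{
    // $site is assumed to end with a slash, e.g. "http://www.example.com/".
    // Per the robots exclusion standard, robots.txt is fetched from the
    // host root, so this is the URL the spider actually requests.
    $robots_url = $site . 'robots.txt';
    echo "Requesting robots.txt from: " . $robots_url . "\n";
    return $robots_url;
}
?>
```

If the echoed URL points somewhere other than the root of the host (for example into the /html/ subdirectory), that would explain why the file is never honored.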
Responses are offered on a voluntary, as-time-permits basis, with no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email. Thank you for your understanding.
|Thread||Thread Starter||Forum||Replies||Last Post|
|robots.txt seems to be ignored :?||galacticvoyager||Bug Tracker||1||11-12-2005 12:52 PM|
|PhpDig Ignoring Something in robots.txt||Destroyer X||Troubleshooting||2||06-18-2004 01:57 PM|
|robots.txt versus robotsxx.txt||Charter||IPs, SEs, & UAs||0||03-11-2004 06:00 PM|
|robots.txt ignored||roy||Troubleshooting||3||02-20-2004 08:02 PM|
|robots.txt||renehaentjens||Troubleshooting||3||12-05-2003 02:40 PM|