Old 09-09-2003, 06:53 AM   #1
Dragonfly
Green Mole
 
Join Date: Sep 2003
Posts: 1
phpDig ignores robots.txt

Hi everyone,
while searching for a suitable alternative to the PostNuke search engine (which can't be used in a multisite setup), I stumbled upon yours.
So far it works nicely; there are just a few things I can't resolve:

I've told the spider to index http://www.subdomain.domain.com/html/ and put a robots.txt in the html directory, but phpDig keeps ignoring it. Even when stating

User-agent: PhpDig
Disallow: /

it continues to spider into the subdirectories...
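(If it matters: as far as I understand the robots exclusion standard, crawlers only ever request /robots.txt from the host root, so a copy sitting inside the /html/ subdirectory may never be fetched at all. Moved to the root, with the path relative to the root, it would look like this:)

```
# Served as http://www.subdomain.domain.com/robots.txt (host root, not /html/)
User-agent: PhpDig
Disallow: /html/
```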

Is there any other way to exclude single directories? The update form says "Warning! Erase is permanent", but it isn't. It would be neat if I could simply erase all the unwanted pages there, yet as soon as I start reindexing the rest, the spider picks up the just-erased pages again. Adding the exclude tag to a single file didn't work either; that page gets indexed all the same.
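In case it helps to diagnose this: a standard robots meta tag in the page head should also keep a single page out of the index (I'm assuming phpDig honors it; this is the generic HTML robots convention, not anything phpDig-specific):

```html
<!-- In the <head> of the one page that should stay out of the index -->
<meta name="robots" content="noindex,nofollow">
```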

Maybe this is due to the PostNuke CMS, I have no idea. It's a modular system, and I wanted to limit access to some of the modules; otherwise the spider starts indexing without limits, so I need to restrict access to those directories.
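As a workaround I'm considering blocking the spider at the web-server level instead, per directory via .htaccess (a sketch, assuming Apache with mod_setenvif enabled; the environment variable name is made up):

```apache
# Hypothetical .htaccess in a module directory:
# deny any request whose User-Agent contains "PhpDig".
SetEnvIfNoCase User-Agent "PhpDig" block_phpdig
Order Allow,Deny
Allow from all
Deny from env=block_phpdig
```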

Another problem is that each spidering run damages the PostNuke MySQL tables, and I have to reinstall all the tables of the site. This is weird; maybe it's due to the server config (Apache 2.0) rather than phpDig.

Any ideas how to control this tool?

Thanks for your input!

Drag

Last edited by Dragonfly; 09-09-2003 at 07:09 AM.