Hi, everyone,
while searching for a suitable alternative to the PostNuke search engine (which can't be used in a multisite setup), I stumbled across yours.
So far it works nicely; there are just a few things I can't resolve:
I've told the spider to index
http://www.subdomain.domain.com/html/ and put a robots.txt in the html directory, but phpDig keeps ignoring it. Even with
User-agent: PhpDig
Disallow: /
it continues to spider into the subdirectories...
Is there any other way to exclude single directories? Also, the update form page says "Warning! Erase is permanent", but it isn't. It would be neat if I could simply erase all the unwanted pages there, but as soon as I start reindexing the rest, it spiders the just-erased pages again. Adding the exclude tag to a single file didn't work either; that page gets indexed again as well.
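For reference, this is how I marked up the file I wanted excluded (assuming the phpdigExclude/phpdigInclude comment syntax from the phpDig docs is the right one):

```html
<!-- phpdigExclude -->
...module output that should not be indexed...
<!-- phpdigInclude -->
```

Maybe PostNuke strips or rewrites these comments before the page is served? I haven't checked the generated HTML yet.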
Maybe this is due to the PostNuke CMS, no idea... it's a modular system, and I wanted to limit access to some of the modules; otherwise the spider starts indexing without limits, so I need to restrict access to those directories.
Another problem is that each spidering run corrupts the PostNuke MySQL tables, and I have to reinstall all of the site's tables. This is weird; maybe it's due to the server config (Apache 2.0) rather than phpDig.
Any ideas on how to control this tool?
Thanks for your input!
Drag