Old 10-06-2003, 10:27 PM   #7
Gecko
Green Mole
 
Join Date: Oct 2003
Location: Netherlands
Posts: 5
Quote:
Originally posted by Charter
Hi. PhpDig is set to crawl any links it encounters at the given level. Not sure if "called by the php script" means that the PHP script is feeding the a.htm files via a.htm links. Does setting up a robots.txt file in web root so PhpDig doesn't crawl a.htm type files work?
I have over 800 *.htm snippets in several subdirectories. No HTML page links to any of them; PHP scripts pull them in with the include function and merge them with the HTML and CSS templates.

I have already excluded most of these files by disallowing their directories in robots.txt, which works wherever the PHP script does not live in the same directory as the *.htm snippets. That solves about 60% of the problem. Disallowing the remaining files by naming each one in robots.txt is still a lot of work, and it is a workaround for a problem that should not exist in the first place.
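
For the remaining files, the per-file entries could be generated rather than typed by hand. The following is only a rough sketch, not anything that ships with PhpDig: the function name disallow_lines, the web root path and the directory names are all made up for illustration, and it assumes the crawler honours per-file Disallow lines in robots.txt.

```python
import os


def disallow_lines(web_root, subdirs, suffix=".htm"):
    """Build robots.txt Disallow lines for every *.htm file under the given subdirs.

    web_root and subdirs are hypothetical; adjust them to the real site layout.
    """
    lines = []
    for subdir in subdirs:
        top = os.path.join(web_root, subdir)
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                # Only the snippets are listed, so the PHP scripts stay crawlable.
                if name.lower().endswith(suffix):
                    full = os.path.join(dirpath, name)
                    rel = os.path.relpath(full, web_root)
                    # robots.txt paths are URL paths, so use forward slashes.
                    lines.append("Disallow: /" + rel.replace(os.sep, "/"))
    return sorted(lines)


if __name__ == "__main__":
    # Hypothetical web root and directories that hold both scripts and snippets.
    print("User-agent: *")
    for line in disallow_lines("/var/www/html", ["articles", "news"]):
        print(line)
```

The output can be pasted under the existing directory rules in robots.txt. Since Disallow values are matched as path prefixes, an entry for /news/a.htm would also cover a file named /news/a.html, should one exist.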

I still find it curious that, if PhpDig spiders only HTML links, it still finds files that no HTML link points to. Previously I used search services such as Atomz and Freefind (which became too limited for my rapidly expanding site); they just spidered the links and nothing else.
__________________
--
Life is wasted on the living