OK, that probably kept me from asking you 9 more questions...
http://www.horse-riding.net/cgi-bin/guestbook/book.cgi?url=anything&mode=show&refresh=yes
I added it to ignore.
I put guestbook in BANNED (it was a no-go for some reason).
I also added cgi to FORBIDDEN_EXTENSIONS to have
the cgi file type ignored (it was a no-go as well).
This is after a fresh crawl with nothing else done.
Code:
// regular expression to ban useless external links in index
define('BANNED','^ad\.|banner|banners|doubleclick|links|forum|guestbook|geocities|8m|directory|affiliate|groups|');
// regexp forbidden extensions - return sometimes text/html mime-type !!!
define('FORBIDDEN_EXTENSIONS','\.(cgi|php|asp|pl|rm|ico|cab|swf|css|gz|z|tar|zip|tgz|msi|arj|zoo|rar|r[0-9]+|exe|bin|pkg|rpm|deb|bz2)$');
Any ideas?
The code box above makes geocities and tar look weird, with a space in them, but they are right in the config. Maybe it's my browser playing tricks on me.
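
For reference, here is a minimal sketch for checking both patterns against that URL outside the crawler. It assumes the patterns are matched case-insensitively against the whole URL string, which may not be how the crawler actually applies them; the `~` delimiters, the `i` flag, the `example.com` URL and the variable names are only illustrative.

```php
<?php
// Patterns copied from the config above (the trailing "|" in BANNED is kept as posted).
define('BANNED', '^ad\.|banner|banners|doubleclick|links|forum|guestbook|geocities|8m|directory|affiliate|groups|');
define('FORBIDDEN_EXTENSIONS', '\.(cgi|php|asp|pl|rm|ico|cab|swf|css|gz|z|tar|zip|tgz|msi|arj|zoo|rar|r[0-9]+|exe|bin|pkg|rpm|deb|bz2)$');

$url = 'http://www.horse-riding.net/cgi-bin/guestbook/book.cgi?url=anything&mode=show&refresh=yes';

// Assumption: each pattern is tested against the full URL, ignoring case.
$banned_hit   = preg_match('~' . BANNED . '~i', $url);
$ext_hit_full = preg_match('~' . FORBIDDEN_EXTENSIONS . '~i', $url);

// Same extension test, but on the path only (query string stripped).
$path         = parse_url($url, PHP_URL_PATH);
$ext_hit_path = preg_match('~' . FORBIDDEN_EXTENSIONS . '~i', $path);

// An unrelated URL (hypothetical), to see what the trailing "|" in BANNED does.
$banned_hit_other = preg_match('~' . BANNED . '~i', 'http://example.com/');

var_dump($banned_hit, $ext_hit_full, $ext_hit_path, $banned_hit_other);
```

Under those assumptions, the extension pattern does not match the full URL, because the `$` anchor needs `.cgi` to be the very end of the string and here the query string comes after it. It does match once only the path is tested. In this PCRE test the trailing `|` in BANNED acts as an empty alternative, so the pattern matches any URL, including the unrelated one.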