Thanks for your help, which is much appreciated, but I still can't get this to work. Here is what I did:
I commented out the old FORBIDDEN_EXTENSIONS line and replaced it with
PHP Code:
define('FORBIDDEN_EXTENSIONS','(*index*|guestbook|\.(html|cgi|php|asp|pl|rm|ico|cab|swf|css|gz|z|tar|zip|tgz|msi|arj|zoo|rar|r[0-9]+|exe|bin|pkg|rpm|deb|bz2)$)');
The difference between this and the version in your post in this thread:
http://www.phpdig.net/forum/showthread.php?t=1659 is that I added *index* before the \ and html after it. I also deleted the space.
Then, in the robot_functions.php file, I changed the 3 instances of regs[5] around the !eregi(BANNED,$regs[5]) piece to regs[2].
However, the spider still indexes files named index123.html.
Where am I going wrong?