Old 10-03-2003, 04:56 AM   #1
Gecko
Green Mole
 
Join Date: Oct 2003
Location: Netherlands
Posts: 5
Index update question

Hi, I recently found PhpDig while searching for a good site search engine for my remotely hosted website, and I am currently configuring it to suit my needs. I still have a couple of questions, and I hope someone here can help me along:

- A full index takes hours, and besides spidering and indexing all links, it also finds and indexes all files in the subdirectories. I would like it to spider the links only, because the files themselves are not complete: when served through the website, they are embedded in PHP- and CSS-driven templates that turn them into complete HTML pages. Is there any way to tell PhpDig not to find and index files directly, but only to spider and index the links found in HTML pages?

- What happens when I place links between the <!-- phpdigExclude --> and <!-- phpdigInclude --> tags? Are hyperlinks between those tags ignored for spidering as well? I hope so!
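For clarity, here is the kind of markup I mean. This is my guess at how the tags are supposed to be used; the URL and text are just made-up examples from my site:

```html
<p>This paragraph should be indexed normally.</p>

<!-- phpdigExclude -->
<!-- What I am hoping: the link below is neither indexed nor followed -->
<a href="/news/archive.php">News archive</a>
<!-- phpdigInclude -->

<p>Indexing should resume here.</p>
```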

- Right now, PhpDig also indexes words and picks up links that sit inside HTML comment tags (<!-- and -->). Too bad.

- Wouldn't it be a good idea if you could configure PhpDig with a list of files and directories to ignore? Then the spider would not have to fetch every page just to discover from its META ROBOTS tag that the page should not be indexed.
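To illustrate the point: as far as I can tell, the only way to keep a page out of the index right now is the standard robots META tag in the page itself, which the spider can only see after it has already downloaded the page:

```html
<head>
  <!-- The spider must fetch the whole page before it can read this -->
  <meta name="robots" content="noindex,nofollow">
</head>
```

An ignore list in the configuration would let PhpDig skip the fetch entirely.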

- Is there a possibility to add separate files to the index through the web interface? I have a news service on my site which is driven by a single PHP file. Right now it looks like I have to re-spider the entire news directory whenever I want to add new pages to the index. That makes PhpDig spider 900+ pages right now, and over 1200 next year, etc.
__________________
--
Life is wasted on the living