Old 10-03-2003, 06:50 AM   #2
Rolandks
Purple Mole
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
Re: Index update question

Quote:
Originally posted by Gecko
- Is there any way that I can tell PhpDig not to find and index files, but just to spider and index all links in html pages?
Yes, wrap those parts in <!-- phpdigExclude --> ... <!-- phpdigInclude --> comments. PhpDig is a search engine; how should it know which parts you want indexed and which not? Only by excluding them!

Quote:

- What happens when I embed links in the <!-- phpdigExclude --> and <!-- phpdigInclude --> tags? Are hyperlinks placed between those tags being ignored for spidering? Hope so!
Yes, it works IMHO; links between those tags are ignored. Try it with a single page first.
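To illustrate, a page using the phpdigExclude/phpdigInclude markers might look like this (a minimal sketch; the link targets are hypothetical):

```html
<html>
<body>
<p>This paragraph is indexed, and its links are spidered.</p>
<a href="indexed-page.html">This link is followed</a>

<!-- phpdigExclude -->
<p>Words in this section are not indexed.</p>
<a href="ignored-page.html">This link should not be followed</a>
<!-- phpdigInclude -->
</body>
</html>
```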
Quote:

- Right now, PhpDig also indexes words and picks up links that are embedded in HTML remark tags <!-- and -->. Too bad.
If you are using PHP > 4.3.2, this is a bug; see: Indexing HTML-Comments
Quote:

- Wouldn't it be an idea if you could configure PhpDig with a list of files and directories to ignore? Then the spider does not have to spider everything in order to find out that certain pages are not to be indexed when the META ROBOTS tag tells it.
That is a feature request. But: PhpDig tries to read a robots.txt file at the server root, and it honors meta robots tags too. Another workaround: create a robots.txt listing all directories to ignore (Disallow: /my_dir/), dig the site, then delete the robots.txt.
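For that workaround, the temporary robots.txt at the server root might look like this (the directory names are just examples):

```
User-agent: *
Disallow: /my_dir/
Disallow: /archive/
```

Dig the site while this file is in place, then delete it so other crawlers are not blocked permanently.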
Quote:

- Is there a possiblility that I can add seperate files to the index through the web interface? I have a news service on my site which is driven by a single php file. Right now it looks like that if I have to add new files to the index, I have to spider the entire news directory. This causes PhpDig to spider 900+ pages right now, and over 1200 next year etc.
Create an index file containing links to all the new pages, index that one file, and afterwards delete the index file via the Update form.
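For example, such a throwaway index file could be a plain HTML page listing the news items (the filename news.php and its parameter are hypothetical, matching the single-file news setup described above):

```html
<html>
<body>
<a href="news.php?article=901">Article 901</a>
<a href="news.php?article=902">Article 902</a>
<a href="news.php?article=903">Article 903</a>
</body>
</html>
```

Spidering only this file lets PhpDig pick up just the new links instead of re-spidering all 900+ news pages.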