10-03-2003, 08:37 AM   #3
Gecko
Green Mole
 
Join Date: Oct 2003
Location: Netherlands
Posts: 5
Re: Re: Index update question

Roland, thank you for your advice! I think this might just be the trick to speed up the spider process and avoid having to remove hundreds of files by hand every time.

I have one question about your answer to my first question, though:

Quote:
Originally posted by Rolandks
Yes, <!-- exclude --> this parts. PhpDig is a search-engine, how should it know what you will index and which part not index ? Only by excluding !
I don't think this will work. Let me give you an example. One of my subdirectories contains a PHP script called show.php. It loads the files in that subdirectory and merges them with my template files (show.php?link=a etc.) to produce complete HTML output. That directory and its deeper subdirectories also contain files called a.htm, b.htm etc. These files are only loaded by the PHP script; no other HTML page on my site links to them directly. Yet they ARE found and indexed by PhpDig (as is show.php).

In other words: I just want PhpDig to index the URL
.../show.php?link=a (which incorporates a.htm)
but I do not want PhpDig to index the a.htm file itself, because it is not a web page, just a part of one.
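
To make the rule concrete, here is a rough sketch of the behaviour I am after. It is written in Python purely as an illustration and is not PhpDig code or PhpDig's API; the function name should_index and the example.com URLs are made up, and I am assuming the check would only be applied to URLs inside that one subdirectory.

```python
from urllib.parse import urlsplit, parse_qs


def should_index(url: str) -> bool:
    """Return True if the URL is a complete page that belongs in the index.

    Hypothetical helper: it only illustrates the rule described above.
    """
    parts = urlsplit(url)
    # Last path segment, e.g. "show.php" or "a.htm".
    filename = parts.path.rsplit("/", 1)[-1]

    # Complete pages are produced by show.php with a link parameter.
    if filename == "show.php":
        return "link" in parse_qs(parts.query)

    # Bare fragments such as a.htm are only building blocks, so skip them.
    if filename.endswith(".htm"):
        return False

    # Anything else is left to the spider's normal rules.
    return True


if __name__ == "__main__":
    for candidate in (
        "http://example.com/sub/show.php?link=a",
        "http://example.com/sub/a.htm",
    ):
        print(candidate, should_index(candidate))
```

So the first URL should end up in the index and the second should not, even though the first one contains the text of a.htm.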

Your suggestion to put PhpDig exclude and include tags into a.htm would not work, because then the contents would also be skipped when the spider indexes show.php?link=a!

If PhpDig spiders the site from the root URL, it should never encounter a.htm, only show.php?link=a. But that is not what happens. It not only follows the links and indexes the pages found that way, it also reads the remote filesystem and indexes every single file it finds. And that is not what I want it to do.
__________________
--
Life is wasted on the living