Old 03-06-2006, 05:19 PM   #2
jimurl@montanai
Green Mole
 
Join Date: Feb 2006
Posts: 1
excluding pages

That's a good question. I had the same question, and the same problem finding a response to it. I did run across this thread:

http://www.phpdig.net/forum/showthread.php?t=691

which included a link to yet another thread... but that link was broken.

But I also found a link to this thread:
http://www.phpdig.net/forum/showthread.php?t=1416

which, to cut to the chase, says that if you make a "robots.txt" file and put this in it:

User-agent: PhpDig
Disallow: /path/file1.php
Disallow: /path/file2.html

then, when you index, PhpDig will skip those files. Put the robots.txt file at the root level of your site.

I haven't yet actually re-indexed with this robots.txt file in place, but I bet it'll work... the guy at the link above says it will.
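If you want to sanity-check the rules before re-indexing, Python's standard-library robot parser will tell you whether a given user agent is allowed to fetch a URL. This is just a quick sketch; the example.com domain and the paths are placeholders, not my real site:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from above (domain and paths are placeholders)
rules = [
    "User-agent: PhpDig",
    "Disallow: /path/file1.php",
    "Disallow: /path/file2.html",
]

rp = RobotFileParser()
rp.parse(rules)

# PhpDig is blocked from the listed files...
print(rp.can_fetch("PhpDig", "http://example.com/path/file1.php"))   # False
# ...but is still allowed everywhere else
print(rp.can_fetch("PhpDig", "http://example.com/other/page.html"))  # True
# Other crawlers are unaffected by the PhpDig-specific block
print(rp.can_fetch("SomeOtherBot", "http://example.com/path/file1.php"))  # True
```

Handy for catching a typo in a Disallow line before you kick off a long re-index.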

I already had the pages I wanted to exclude in my index, and actually I just wanted to get rid of them altogether. I played around and discovered a pair of MySQL DELETE statements that would do that. It goes more or less like this:

select * from digengine where spider_id in (select spider_id from digspider where file like '%article.php?article_id=%' and path ='press/');

and

select spider_id from digspider where file like '%article.php' and path ='press/'

where "press/article.php" was the page that I wanted to remove from the search index. Also, you have to replace the "select..." with the corresponding "delete", but I would recommend playing with the SELECTs first, to make sure you are getting rid of the right stuff. You have to run both statements, in that order, or you can really screw things up. But I was ready to blow away my database and start over with re-indexing, had I really FUBARed things.
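To see why the order matters, here's a runnable sketch of the two-step delete, using sqlite3 in place of MySQL. The digspider/digengine schemas here are stripped-down guesses (the real PhpDig tables have more columns), and I've used the same LIKE pattern in both steps:

```python
import sqlite3

# Minimal stand-in schemas for PhpDig's tables: digspider maps a spider_id to a
# crawled page, digengine holds the per-page index entries keyed by spider_id.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE digspider (spider_id INTEGER PRIMARY KEY, path TEXT, file TEXT)")
cur.execute("CREATE TABLE digengine (spider_id INTEGER, keyword TEXT)")
cur.executemany("INSERT INTO digspider VALUES (?, ?, ?)", [
    (1, "press/", "article.php?article_id=7"),
    (2, "press/", "article.php?article_id=8"),
    (3, "news/",  "index.php"),
])
cur.executemany("INSERT INTO digengine VALUES (?, ?)", [
    (1, "foo"), (2, "bar"), (3, "baz"),
])

# Step 1: delete the index rows first, while digspider still holds the
# spider_ids that the subquery needs to find them.
cur.execute("""DELETE FROM digengine WHERE spider_id IN
               (SELECT spider_id FROM digspider
                WHERE file LIKE '%article.php?article_id=%' AND path = 'press/')""")

# Step 2: only now is it safe to delete the page rows themselves. Doing this
# first would orphan the digengine rows with no way to select them by page.
cur.execute("""DELETE FROM digspider
               WHERE file LIKE '%article.php?article_id=%' AND path = 'press/'""")

# Only the news/ page survives, in both tables
print(cur.execute("SELECT spider_id FROM digspider").fetchall())  # [(3,)]
print(cur.execute("SELECT spider_id FROM digengine").fetchall())  # [(3,)]
```

If you run step 2 first, the subquery in step 1 matches nothing and the digengine rows are stranded in your index, which is the "really screw things up" case.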

I hope this helps.