Exclude list?


antun
03-10-2004, 08:42 AM
First of all, PhpDig looks like an awesome product. I've been looking for a new search engine for ages, and I think I've found it!!

One question:

In the docs for phpdig 1.8.0, it says:

At least, the robot compare the URI with the exclude list.

This is the only mention of an exclude list I could find in any of the files in the distribution. I also searched these forums, but the only thing I turned up was this line in config.php:
define('BANNED','^ad\.|banner|doubleclick');

Is the BANNED constant the recommended way to exclude paths? What I'd like to do is exclude, say:

/developers/community/forums/...

-Antun

Charter
03-10-2004, 09:29 AM
Hi antun, and welcome to PhpDig.net!

The BANNED constant is meant to prevent PhpDig from following certain links in pages that get crawled. To prevent certain directories from being crawled altogether, put a robots.txt file in your web root. If a directory has already been crawled and you want to exclude it, just click the red circle (the "no way" symbol) in the admin panel.
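To see why BANNED does not cover a path like /developers/community/forums/, it helps to test the pattern against a few links. The sketch below assumes PhpDig applies BANNED as a case-insensitive regular-expression search on each link it finds; that is an assumption, and `is_banned` and `BANNED_WITH_FORUMS` are hypothetical names, not PhpDig code.

```python
import re

# BANNED pattern as quoted from PhpDig's config.php in this thread.
BANNED = r"^ad\.|banner|doubleclick"

# Hypothetical extension: also refuse to follow links into the forums.
BANNED_WITH_FORUMS = BANNED + r"|/developers/community/forums/"


def is_banned(link: str, pattern: str = BANNED) -> bool:
    """Return True if `pattern` matches anywhere in `link`, ignoring case.

    Assumption: this mimics a case-insensitive regex search applied to each
    link found on a crawled page; it is not PhpDig's own implementation.
    """
    return re.search(pattern, link, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    links = [
        "ad.example.com/click",
        "/images/banner_top.gif",
        "http://ad.doubleclick.net/x",
        "/developers/community/forums/thread1.html",
        "/products/index.html",
    ]
    for link in links:
        default = "skip" if is_banned(link) else "follow"
        extended = "skip" if is_banned(link, BANNED_WITH_FORUMS) else "follow"
        print(f"{link:45} BANNED={default:6} extended={extended}")
```

The shipped pattern only catches ad and banner links, so forum links are still followed. Under the assumption above, adding a path alternative would stop links into the forums from being followed, but robots.txt remains the way Charter recommends for excluding whole directories.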

antun
03-10-2004, 10:11 AM
Thanks, but if I use a robots.txt file to exclude certain directories, won't that also prevent those directories from being indexed by public search engines (e.g., Google)?

I'm only trying to fine-tune our search. For example, I'd like to exclude our forums from all searches, and I'd like to remove our Developers area from all non-tech-related searches.

Should I be excluding different directories and running separate indexes, or should I be running one large index and (if possible) excluding parts of the site at search time?

-Antun

Charter
03-10-2004, 10:32 AM
Hi. A robots.txt file with the following should keep PhpDig from indexing these directories:

User-agent: PhpDig
Disallow: /developers/
Disallow: /developers/community/forums/
Disallow: /lps/
Disallow: /lps-2.0/docs/lzx-developers-guide/

If a directory that you don't want indexed has already been indexed, just click the red circle to delete and exclude it, making sure that the tempspider table is empty before reindexing.
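antun's concern that robots.txt would also hide these directories from Google can be checked with Python's standard `urllib.robotparser`. This sketch parses the rules Charter posted and asks whether two user agents may fetch a few paths; it assumes PhpDig's crawler identifies itself with a user-agent string containing "PhpDig", as the `User-agent: PhpDig` line implies. The rules apply only to agents matching that record, so an agent with no matching record (and no `User-agent: *` record present) is allowed everywhere.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules suggested above, addressed only to the PhpDig user agent.
ROBOTS_TXT = """\
User-agent: PhpDig
Disallow: /developers/
Disallow: /developers/community/forums/
Disallow: /lps/
Disallow: /lps-2.0/docs/lzx-developers-guide/
"""

SITE = "http://www.laszlosystems.com"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    for agent in ("PhpDig", "Googlebot"):
        for path in ("/developers/community/forums/", "/lps/index.html", "/products/"):
            allowed = parser.can_fetch(agent, SITE + path)
            print(f"{agent:10} {path:32} {'allowed' if allowed else 'blocked'}")
```

Note that `Disallow: /developers/` already covers `/developers/community/forums/`, because Disallow rules match by path prefix; the second line is redundant but harmless.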

antun
03-10-2004, 11:24 AM
Got it! That will work for the "/developers/community/forums/", which I never want indexed.

However, in my case, I'd like to have separate configurations:

- The entire website (excluding /developers/).
- All of /developers/, but nothing in the rest of the site.
- Just /lps-2.0/docs/lzx-reference/, but nothing else.

I presume the best way would be to treat each one as a separate website, right? You see, I want to give people a choice of what to search, most likely with a pull-down. (You can see what I mean here: http://www.laszlosystems.com/developers/).

-Antun
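The "one large index, exclude at search time" option antun asked about can be sketched as a path-prefix filter applied to result URLs, with one scope per pull-down choice. This is a hypothetical illustration, not a PhpDig feature: `Scope`, `SCOPES`, `filter_results` and the pull-down values are invented names, and the sketch assumes each search result exposes its URL path.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Scope:
    """Hypothetical search scope: a path is in scope if it starts with at
    least one `include` prefix and with no `exclude` prefix."""
    include: Tuple[str, ...]
    exclude: Tuple[str, ...] = ()

    def contains(self, path: str) -> bool:
        return (any(path.startswith(p) for p in self.include)
                and not any(path.startswith(p) for p in self.exclude))


# The three configurations listed above, keyed by hypothetical pull-down values.
SCOPES: Dict[str, Scope] = {
    "site": Scope(include=("/",), exclude=("/developers/",)),
    "developers": Scope(include=("/developers/",)),
    "lzx-reference": Scope(include=("/lps-2.0/docs/lzx-reference/",)),
}


def filter_results(paths: List[str], scope_name: str) -> List[str]:
    """Keep only the result paths that fall inside the chosen scope."""
    scope = SCOPES[scope_name]
    return [p for p in paths if scope.contains(p)]


if __name__ == "__main__":
    results = [
        "/index.html",
        "/developers/intro.html",
        "/lps-2.0/docs/lzx-reference/canvas.html",
    ]
    for name in SCOPES:
        print(name, filter_results(results, name))
```

With separate indexes instead, the same prefixes would go into each crawl's configuration (for example, one robots.txt-style exclusion set per site), and no filtering would be needed at search time.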

Charter
03-10-2004, 11:38 AM
Hi. Perhaps this thread (http://www.phpdig.net/showthread.php?threadid=597) might help.