PhpDig.net

Old 03-10-2004, 08:42 AM   #1
antun
Green Mole
 
Join Date: Mar 2004
Location: San Francisco
Posts: 4
Exclude list?

First of all, PhpDig looks like an awesome product. I've been looking for a new search engine for ages, and I think I've found it!!

One question:

In the docs for PhpDig 1.8.0, it says:

Quote:
At least, the robot compare the URI with the exclude list.
This is the only mention of an exclude list I could find in any of the files in the distribution. I also searched these forums, but the only thing I turned up was this line in config.php:
PHP Code:
define('BANNED','^ad\.|banner|doubleclick');
Is the BANNED constant the recommended way to exclude paths? What I'd like to do is exclude, say:

/developers/community/forums/...

-Antun
Old 03-10-2004, 09:29 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi antun, and welcome to PhpDig.net!

The BANNED constant is meant to keep the robot from following certain links in pages that get crawled. To prevent certain directories from being crawled altogether, place a robots.txt file in your web root. If a directory has already been crawled and you want to exclude it, just click the red circle "no way" symbol in the admin panel.
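
A minimal sketch of what a BANNED-style pattern would and would not match, assuming the robot applies it to each link as a case-insensitive regular expression. The link_is_banned helper and the sample links are hypothetical and not part of PhpDig; only the pattern itself comes from the config.php line quoted earlier in this thread.

```php
<?php
// Pattern copied from the config.php line quoted earlier in the thread.
define('BANNED', '^ad\.|banner|doubleclick');

// Hypothetical helper (not a PhpDig function): reports whether a link
// matches the BANNED pattern. Case-insensitive matching is an assumption.
function link_is_banned(string $link): bool
{
    // '~' is used as the delimiter so slashes in a pattern need no escaping.
    return preg_match('~' . BANNED . '~i', $link) === 1;
}

var_dump(link_is_banned('ad.example.com/page.html'));              // bool(true)
var_dump(link_is_banned('images/banner_top.gif'));                 // bool(true)
var_dump(link_is_banned('developers/community/forums/index.php')); // bool(false)
```

Under these assumptions the default pattern only catches ad-style links; a path such as the forums directory is not matched by it.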
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 03-10-2004, 10:11 AM   #3
antun
Green Mole
 
Join Date: Mar 2004
Location: San Francisco
Posts: 4
Thanks, but if I use a robots.txt file to exclude certain directories, won't that prevent those dirs from being indexed by public search engines too (e.g. Google)?

I'm only trying to fine-tune our search. For example, I'd like to exclude our forums from all searches, and I'd like to remove our Developers area from all non-tech-related searches.

Should I be excluding different directories and running separate indexes, or should I be running one large index and (if possible) excluding parts of the site at search time?

-Antun
Old 03-10-2004, 10:32 AM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. A robots.txt file with the following should exclude the directories from PhpDig prior to indexing:
Code:
User-agent: PhpDig
Disallow: /developers/
Disallow: /developers/community/forums/
Disallow: /lps/
Disallow: /lps-2.0/docs/lzx-developers-guide/
If a directory that you don't want indexed has already been indexed, just click the red circle to delete and exclude it, and make sure that the tempspider table is empty before reindexing.
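
On the earlier concern about public search engines: under the robots.txt convention, rules in a named User-agent group apply only to crawlers that identify themselves by that name. A sketch along these lines, assuming PhpDig matches the PhpDig group as in the block above, should leave other crawlers such as Google unrestricted. The catch-all group is an illustration, not something taken from a real site:

```
# Applies only to the crawler that identifies itself as PhpDig
User-agent: PhpDig
Disallow: /developers/community/forums/

# All other crawlers: an empty Disallow blocks nothing
User-agent: *
Disallow:
```

If the file has no catch-all group at all, crawlers that match no group are likewise unrestricted, so the second group mainly makes the intent explicit.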
Old 03-10-2004, 11:24 AM   #5
antun
Green Mole
 
Join Date: Mar 2004
Location: San Francisco
Posts: 4
Got it! That will work for "/developers/community/forums/", which I never want indexed.

However, in my case, I'd like to have separate configurations:

- The entire website (excluding /developers/).
- All of /developers/, but nothing in the rest of the site.
- Just /lps-2.0/docs/lzx-reference/, but nothing else.

I presume the best way would be to set up each one as a separate website, right? You see, I want to give people a choice of what to search, most likely using a pull-down. (You can see what I mean here: http://www.laszlosystems.com/developers/).

-Antun
Old 03-10-2004, 11:38 AM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Perhaps this thread might help.