PhpDig.net

Old 01-13-2004, 01:08 AM   #1
siliconkibou
Green Mole
 
Join Date: Dec 2003
Posts: 11
'Duplicate' Search Results

Hi,

I've noticed that PhpDig doesn't seem able to differentiate between nearly identical documents on a website (I say nearly because they appear identical to my human eyes).

If one document is located in, say, /worldwide/ and another in /about_us/, they both come up in the search results with identical percentages.

Additionally, dynamically generated documents that are identical also produce multiple duplicate results.

For example:

http://www.issa.com/worldwide/index....pe=news&id=153

and

http://www.issa.com/worldwide/index....pe=news&id=153

Both are listed as results (they differ by the region variable in the URL).

This behavior is understandable, since they are slightly different (from a machine's perspective).

However, is there a way to broaden the criteria used to detect duplicate documents so that highly similar documents are filtered out as well?

Say, if they share 90% of the same content?
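
To make the 90% idea concrete, here is a rough sketch of one common way to score how similar two pages are: split each page into overlapping word groups ("shingles") and compare the two sets. This is not PhpDig code; the function names, the 3-word shingle size, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty documents count as identical
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.9, k=3):
    """True when the texts share at least `threshold` of their shingles."""
    return similarity(a, b, k) >= threshold
```

With something like this, two pages that differ only in a small region-specific fragment would score close to 1.0 and could be treated as one result, while unrelated pages would score near 0.0.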

Thanks in advance,

-Paul

For reference, you can see this behavior for yourself at:

http://search.custodialadvisorsnetwork.org

Search for "cleaning standards" as a good example.

Several pages into the search, you'll see some examples of pseudo-duplicates.

Last edited by siliconkibou; 01-13-2004 at 01:10 AM.
Old 01-13-2004, 09:00 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. You might try modifying the $md5 variable discussed in this thread.
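
As a rough illustration of why an exact hash lets near-duplicates through, the sketch below assumes that $md5 holds an MD5 digest of the page content (the name suggests this, but the linked thread is not reproduced here, so treat it as an assumption). It is written in Python rather than PhpDig's PHP, and the helper names are hypothetical.

```python
import hashlib
import re


def content_digest(text):
    """MD5 hex digest of the text exactly as given."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def normalized_digest(text):
    """MD5 hex digest after collapsing whitespace and lowercasing.

    Normalizing before hashing makes pages that differ only in
    spacing or letter case produce the same digest.
    """
    collapsed = re.sub(r"\s+", " ", text).strip().lower()
    return content_digest(collapsed)
```

Any difference in the hashed text, even a single character, gives a different digest, so two pages only count as duplicates if whatever is fed into the hash is identical. Changing what goes into the hash (for example, stripping the parts that vary between regions) is one way to make more pages collide.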
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.