'Duplicate' Search Results
Hi,
I've noticed that PHPDig does not seem to be able to differentiate between nearly identical documents on a website (I say "nearly" because they appear identical to my human eyes). If one document is located in, say, /worldwide/ and another in /about_us/, both come up in the search results with identical percentages.

Additionally, documents that are generated dynamically but are otherwise identical also produce multiple duplicate results. For example:

http://www.issa.com/worldwide/index....pe=news&id=153
and
http://www.issa.com/worldwide/index....pe=news&id=153

Both are listed as results (they differ by the region variable in the URL). This behavior is understandable, since the pages are slightly different from a machine's perspective. However, is there a way to loosen the criteria used to judge duplicate documents so that highly similar documents are filtered out as well? Say, if they share 90% of the same content?

Thanks in advance,
-Paul

For reference, you can see this behavior for yourself at: http://search.custodialadvisorsnetwork.org
Search for "cleaning standards" as a good example. Several pages into the results, you'll see some examples of pseudo-duplicates.
Hi. You might try modifying the $md5 variable talked about in this thread.
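
To illustrate the "90% of the same content" idea from the question, here is a minimal sketch in Python (not PHPDig code, and not how PHPDig itself works). It assumes the duplicate check compares an exact hash of the page text, which the $md5 suggestion hints at but this thread does not confirm. An exact hash changes completely when a single word differs, so the sketch instead compares overlapping word triples ("shingles") and treats two pages as duplicates when their Jaccard similarity reaches a threshold. All function names and the 0.9 threshold are hypothetical.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences vanish."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def exact_fingerprint(text):
    """Exact-match fingerprint: any one-word difference yields a different digest."""
    return hashlib.md5(normalize(text).encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def filter_near_duplicates(results, threshold=0.9):
    """Keep a (url, text) result only if it is below the threshold
    against every result already kept. The first copy seen wins."""
    kept = []
    for url, text in results:
        if all(similarity(text, kept_text) < threshold for _, kept_text in kept):
            kept.append((url, text))
    return kept
```

Comparing every result against every kept result is quadratic, so a sketch like this only suits filtering one page of results at a time; doing it at index time over a whole site would need something like MinHash or SimHash instead.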