01-13-2004, 01:08 AM   #1
Green Mole
Join Date: Dec 2003
Posts: 11
'Duplicate' Search Results


I've noticed that PHPDig does not seem able to differentiate between nearly identical documents located on a website. (I say "nearly" because they appear identical to my human eyes.)

If one document is located in, say, /worldwide/ and another in /about_us/, they both come up in the search results with identical percentages.

Additionally, documents that are generated dynamically but are identical also give multiple duplicate results.

For example:


Both are listed as results (they differ only by the region variable in the URL).

This behavior is understandable, since they are slightly different (from a machine's perspective).

However, is there a way to tighten the criteria used to judge duplicate documents so that highly similar documents are filtered out as well?

Say, if they share 90% of their content?
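
As a rough illustration of what such a 90% filter could look like, here is a minimal Python sketch. It is not PHPDig code and uses no PHPDig API. The function names (shingles, similarity, filter_near_duplicates), the 5-word shingle size, and the idea of comparing page text at result time are all assumptions made for the example. It measures Jaccard similarity over overlapping word sequences, which is one common way to approximate "shares 90% of the same content", and keeps only the first page of each near-duplicate group.

```python
import re


def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in text (case-folded)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty documents count as identical
    return len(sa & sb) / len(sa | sb)


def filter_near_duplicates(results, threshold=0.9, k=5):
    """Keep the first of each group of near-duplicate pages.

    results is a list of (url, text) pairs in ranked order (hypothetical
    input format). A page is dropped if its similarity to any page already
    kept is at or above threshold.
    """
    kept = []
    for url, text in results:
        if all(similarity(text, kept_text, k) < threshold
               for _, kept_text in kept):
            kept.append((url, text))
    return kept
```

Note that Jaccard similarity over shingles is stricter than a plain "90% of the words match" test, so the threshold would probably need tuning against real pages. Comparing every result against every kept result is also quadratic, which is fine for one page of results but not for a whole index.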

Thanks in advance,


For reference, you can see this behavior for yourself at:

Search for "cleaning standards" as a good example.

Several pages into the search, you'll see some examples of pseudo-duplicates.

Last edited by siliconkibou; 01-13-2004 at 01:10 AM.