Duplicate Documents Problem...
For some reason, when I run the spider, it flags documents as duplicates that are not in fact duplicates.
It indexes this:

Code:
mambo104/index.php?option=com_weblinks&Itemid=4

Code:
mambo104/index.php?option=com_weblinks&Itemid=1&catid=2
Hi. It is the robot_functions.php file that determines whether a page is a duplicate, specifically the phpdigTestDouble function.

In this function, the following query determines whether a page is a duplicate:

PHP Code:

Making the page title dynamic depending on the query string, or changing CHUNK_SIZE in the config file, would be a couple of ways to avoid the duplicates.
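The actual PhpDig query isn't shown above, but the general idea of chunk-based duplicate detection can be sketched in Python (this is an illustrative sketch, not PhpDig's PHP code; the names `chunk_key` and `is_duplicate` are made up for the example):

```python
import hashlib

# Illustrative value; in PhpDig, CHUNK_SIZE is set in the config file.
CHUNK_SIZE = 1024

# Maps a content fingerprint to the first URL seen with it.
seen = {}

def chunk_key(page_text: str) -> str:
    """Fingerprint a page by hashing only its first CHUNK_SIZE bytes.

    If two pages (e.g. the same Mambo template rendered for different
    query strings) share an identical first chunk, they get the same
    key and are treated as duplicates of each other.
    """
    chunk = page_text.encode("utf-8")[:CHUNK_SIZE]
    return hashlib.md5(chunk).hexdigest()

def is_duplicate(url: str, page_text: str) -> bool:
    """Return True if another URL already produced the same chunk key."""
    key = chunk_key(page_text)
    if key in seen and seen[key] != url:
        return True
    seen[key] = url
    return False
```

This is why two Mambo pages that share the same boilerplate header can collide: if their first CHUNK_SIZE bytes are identical, the differences further down the page are never compared.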
Hmmm. I'll have to look into $tempfilesize. Could there be some type of bug in there?

document 1 = 3.22 KB
document 2 = 3.4 KB

I'm thinking the temp file size should be the same as the actual file size, no? If so, I would think the different file sizes would prevent them from being tagged as dupes.

Thanks for your other suggestion regarding titles. Unfortunately, I am building this as a plug-in component for Mambo OS, and its titles are not dynamic out of the box. So I need to come up with a better solution that works with a stock install of Mambo. Any more info would be appreciated... maybe I can modify the function so that it bases duplicates on the actual URL.
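Basing the duplicate test on the actual URL, as suggested above, could be sketched like this (a hypothetical modification in Python, not PhpDig's real function; `page_key` and `make_dup_checker` are invented names for the example):

```python
import hashlib

def page_key(url: str, page_text: str, chunk_size: int = 1024) -> str:
    """Combine the URL with a content fingerprint.

    Hypothetical change: a page only counts as a duplicate if both the
    content chunk AND the URL match, so distinct query strings
    (Itemid=4 vs. Itemid=1&catid=2) are never collapsed together.
    """
    chunk = page_text.encode("utf-8")[:chunk_size]
    fingerprint = hashlib.md5(chunk).hexdigest()
    return f"{url}|{fingerprint}"

def make_dup_checker():
    """Return an is_dup(url, page_text) closure with its own seen-set."""
    seen = set()

    def is_dup(url: str, page_text: str) -> bool:
        key = page_key(url, page_text)
        if key in seen:
            return True
        seen.add(key)
        return False

    return is_dup
```

The trade-off is that this no longer catches true duplicates served at different URLs, which is exactly what the original test was designed to do.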
Hi. The $tempfilesize variable is created in the phpdigTempFile function in the robot_functions.php file and is set to the filesize of the temporary file. Do those two pages still show as duplicates if you increase CHUNK_SIZE or add some amount of random text to the end of one of the pages?
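A quick way to see why both of those suggestions matter: text appended past the compared chunk does not change the chunk fingerprint, while enlarging the chunk does. A minimal Python sketch (PhpDig itself is PHP; `fingerprint` is an invented helper for the demonstration):

```python
import hashlib

def fingerprint(text: str, chunk_size: int) -> str:
    """Hash only the first chunk_size bytes of the page text."""
    return hashlib.md5(text.encode("utf-8")[:chunk_size]).hexdigest()

base = "shared template output " * 200   # ~4.6 KB of identical markup
padded = base + "some random trailing text"

# With a small chunk, the appended text is never compared,
# so the two pages still look identical.
small_chunk_same = fingerprint(base, 1024) == fingerprint(padded, 1024)

# With a chunk large enough to reach the appended text,
# the fingerprints finally differ.
large_chunk_diff = fingerprint(base, 10_000) != fingerprint(padded, 10_000)
```

So if CHUNK_SIZE is smaller than the shared boilerplate at the top of each Mambo page, appending text to the end of one page changes its filesize but not its fingerprint, and only a larger CHUNK_SIZE will tell the two pages apart.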
Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.