PhpDig.net


BulForce 01-15-2005 01:28 PM

XDuplicate of an existing document Not working!
 
I have found that there is some kind of error when the digger compares URLs (probably only some kinds of URLs). I cannot post more than a few examples from the log file generated by the digger. However, I would be very happy if someone could help me ASAP.

--- -- -
+205:http://www.site.com/static/blackebonyteens/index.php?q=adultzone,1,1,2,0,0,0,0,0,0(time : 00:44:31)

+206:http://www.site.com/static/blondeparade/index.php?q=adultzone,1,1,1,0,0,0,0,0,0(time : 00:44:43)XDuplicate of an existing document

207:http://www.site.com/static/blondeparade/index.php?q=adultzone,1,2,1,0,0,0,0,0,0(time : 00:45:10)

+208:http://www.site.com/static/blondeparade/index.php?q=adultzone,1,1,2,0,0,0,0,0,0(time : 00:45:24)XDuplicate of an existing document
--- -- -

I know that these aren't normal links (with all these weird characters after the php?), but the URL comparison also did not work for this type of address:

--- -- -
nttp://www.site.com/m/gloryholestation-001/index.html?t1/revs=adultzone
--- -- -

Thanks for reading this post.

*nttp is actually http (I changed it only in this post).

BulForce 01-15-2005 02:29 PM

Sorry for my stupid post. I have just figured out that a page is compared not only by name but by its content too. The pages that I indexed have the same text content and, in some cases, no text content at all (only pictures).
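
To illustrate the point above, here is a minimal sketch of a content-based duplicate check. It is not PhpDig's actual code: the function `content_fingerprint`, the array layout, and the example URLs are hypothetical, and it assumes a page is flagged as a duplicate whenever its fingerprint matches one already seen during the crawl.

```php
<?php
// Hypothetical helper: builds a fingerprint from page content and size,
// loosely shaped like an md5-plus-filesize check. Names are illustrative.
function content_fingerprint(string $title, string $description, string $text, int $size): string
{
    return md5($title . $description . $text) . '_' . $size;
}

// Two different URLs whose pages carry the same (here: empty) text content.
$pages = [
    'http://www.site.com/a/index.php?q=1' => ['title' => 'Gallery', 'desc' => '', 'text' => '', 'size' => 2048],
    'http://www.site.com/b/index.php?q=2' => ['title' => 'Gallery', 'desc' => '', 'text' => '', 'size' => 2048],
];

$seen = [];    // fingerprint => first URL that produced it
$results = []; // URL => 'indexed' or 'duplicate'
foreach ($pages as $url => $p) {
    $fp = content_fingerprint($p['title'], $p['desc'], $p['text'], $p['size']);
    $results[$url] = isset($seen[$fp]) ? 'duplicate' : 'indexed';
    if (!isset($seen[$fp])) {
        $seen[$fp] = $url;
    }
}

print_r($results);
```

Because the URL never enters the fingerprint in this sketch, picture-only pages with identical titles and sizes collide even though their addresses differ.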

Moderators feel free to erase this post if you want.

BulForce 01-15-2005 03:01 PM

However, it would be fine for me if this duplicate check could somehow be turned off.

If somebody knows how to avoid this duplicate check, please help me.

Thanks

Charter 01-16-2005 12:44 PM

See this and think rand. ;)

BulForce 01-16-2005 03:59 PM

Thanks for your support.

I have edited one line in robot_functions.php

Line:1323 Was $md5 = md5($titre_resume.$page_desc['content'].$text[$max_chunk]).'_'.$tempfilesize;

And I have made it look this way:

Line:1323 Now $md5 = md5(rand().$titre_resume.$page_desc['content'].$text[$max_chunk]).'_'.$tempfilesize; //moded line - Turn off duplicate chk


I have run a small test spidering and everything went fine. I hope there will be no more problems.
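
For anyone wondering why the change works, here is a small sketch of the effect of prepending `rand()` before hashing. The function `salted_fingerprint` and the sample values are hypothetical stand-ins, not PhpDig code. The sketch only shows that identical content stops producing identical fingerprints.

```php
<?php
// Hypothetical stand-in for the modified line; names are illustrative only.
function salted_fingerprint(string $content, int $size): string
{
    // rand() returns a random integer, so the hashed string differs per call.
    return md5(rand() . $content) . '_' . $size;
}

$content = 'same title same text';

// Unsalted: identical content always gives the identical fingerprint.
$plainA = md5($content) . '_100';
$plainB = md5($content) . '_100';

// Salted: the random prefix makes a match extremely unlikely.
$saltedA = salted_fingerprint($content, 100);
$saltedB = salted_fingerprint($content, 100);

var_dump($plainA === $plainB);   // identical content, identical hash
var_dump($saltedA === $saltedB); // almost always false because of the random prefix
```

The trade-off is that genuinely identical pages reached through different URLs will no longer be recognised as duplicates either, so the index may grow with repeated content.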


All times are GMT -8.
