View Full Version : "XDuplicate of an existing document" - not working!


BulForce
01-15-2005, 01:28 PM
:bang:
I have found that there is some kind of error when the digger compares the URLs (probably only some kinds of URLs). I cannot post more than a few examples from the log file generated by the digger, but I would be very happy if someone could help me ASAP.

--- -- -
+205:http://www.site.com/static/blackebonyteens/index.php?q=adultzone,1,1,2,0,0,0,0,0,0(time : 00:44:31)

+206:http://www.site.com/static/blondeparade/index.php?q=adultzone,1,1,1,0,0,0,0,0,0(time : 00:44:43)XDuplicate of an existing document

207:http://www.site.com/static/blondeparade/index.php?q=adultzone,1,2,1,0,0,0,0,0,0(time : 00:45:10)

+208:http://www.site.com/static/blondeparade/index.php?q=adultzone,1,1,2,0,0,0,0,0,0(time : 00:45:24)XDuplicate of an existing document
--- -- -

I know that these aren't normal links (with all those weird characters after the php?), but the URL comparison also did not work for this type of address:

--- -- -
nttp://www.site.com/m/gloryholestation-001/index.html?t1/revs=adultzone
--- -- -

Thanks for reading this post.

*nttp is actually http (I changed it only in this post).

BulForce
01-15-2005, 02:29 PM
Sorry for my stupid post; I have just figured out that pages are compared not only by name but by their content too. The pages that I have indexed have the same text content, and in some cases even no text content (only pictures).
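
If I understand it correctly, the check works roughly like this (a rough sketch of the idea in my own words, not the actual PhpDig code; the variable names here are just made up for the example):

<?php
// Rough sketch: the spider fingerprints each page by hashing its text,
// so two pages with identical text collide even when their URLs differ.
// $title, $description, $body_text, $filesize and $seen are made-up names.
$fingerprint = md5($title . $description . $body_text) . '_' . $filesize;

if (isset($seen[$fingerprint])) {
    // this is what gets logged as "XDuplicate of an existing document"
} else {
    $seen[$fingerprint] = true; // first time this content is seen
}
?>

So my gallery pages with identical (or no) text all end up with the same fingerprint, and everything after the first one gets skipped.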

Moderators, feel free to erase this post if you want.

BulForce
01-15-2005, 03:01 PM
However, it would be okay for me if this duplicate check could somehow be turned off.

If somebody knows how to avoid this duplicate check, please help me.

Thanks

Charter
01-16-2005, 12:44 PM
See this (http://www.phpdig.net/forum/showthread.php?t=242) and think rand (http://www.php.net/manual/en/function.rand.php). ;)

BulForce
01-16-2005, 03:59 PM
Thanks for your support.

I have edited one line in robot_functions.php.

Line 1323 was:
$md5 = md5($titre_resume.$page_desc['content'].$text[$max_chunk]).'_'.$tempfilesize;

And I have made it look this way:

Line 1323 now:
$md5 = md5(rand().$titre_resume.$page_desc['content'].$text[$max_chunk]).'_'.$tempfilesize; // modded line - turn off duplicate check
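
To double-check what the change does, I tried this little standalone test (again just a sketch, not the spider code): salting the hash with rand() means the same content almost never produces the same fingerprint twice, so the duplicate test can no longer match.

<?php
// Standalone test: identical content, two different fingerprints,
// because each hash is salted with a fresh rand() value.
$content = 'exactly the same page text';
$a = md5(rand() . $content);
$b = md5(rand() . $content);
var_dump($a === $b); // bool(false) -- barring a freak rand() collision
?>

The trade-off is that genuinely identical pages now all get indexed, which in my case is exactly what I need.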


I have run a little test spidering and everything went fine; I hope there will be no more problems.