03-23-2007, 07:14 AM   #1
marco
Crawler speed improvement (although affects limit)

I had the problem that phpdigExplore() returns too many duplicate links. This caused the spider to check hundreds of duplicate URLs, which slowed it down, and the 1000-page limit was hit quite fast.

Finally I added the following code at the end of phpDigExplore():
PHP Code:
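// Remember every link already handed back during this spidering session.
if (!isset($_SESSION["links"])) $_SESSION["links"] = array();
$resultlinks = array();
foreach ($links as $link) {
    // in_array() instead of !array_search(): a match at index 0 is falsy,
    // so the first stored link would otherwise never count as a duplicate.
    if (!in_array($link, $_SESSION["links"])) {
        $_SESSION["links"][] = $link;
        $resultlinks[] = $link;
    }
}
return $resultlinks;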
I don't know whether this modification is useful or harms other components. But for the moment, it works.
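
If the list of seen links grows large, scanning it for every new link gets slow in itself. A possible variant, only a sketch and not part of PhpDig, keys the seen-list by the link so each check is a single lookup. The helper name filterNewLinks() and the $seen parameter are made up for illustration, and it assumes each link is either a string or an array that can be serialized:

```php
<?php
// Hypothetical helper (not a PhpDig function): returns only links not seen before.
// $seen is keyed by link, so each check is a hash lookup instead of a linear scan.
function filterNewLinks(array $links, array &$seen): array
{
    $fresh = array();
    foreach ($links as $link) {
        // Assumption: links are strings or arrays; non-strings are serialized to build a key.
        $key = is_string($link) ? $link : serialize($link);
        if (!isset($seen[$key])) {
            $seen[$key] = true;   // remember the link for later calls
            $fresh[] = $link;     // keep the first occurrence, in original order
        }
    }
    return $fresh;
}

// Usage: duplicates are dropped within one call and across calls.
$seen = array();
$first = filterNewLinks(array("a.html", "b.html", "a.html"), $seen);
$second = filterNewLinks(array("b.html", "c.html"), $seen);
```

Inside phpdigExplore() the same idea would presumably use $_SESSION["links"] as the $seen array, after initialising it to an empty array as above. Like the original change, this drops links that were already returned earlier in the session, so it has the same possible side effects on other components.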