11-18-2006, 04:07 PM   #2
CentaurAtlas
Multiple spiders

I've been looking at the multiple spider issue, and for me the impetus for running multiple spiders is to spider more pages faster (obviously, I think).

After looking at the performance of the software, I found that the slowness isn't in waiting for pages to download, but in processing them.

The time goes into the code that processes the URLs found on each page. Perhaps everyone already knew this, but it's worth showing where the bottleneck is.

In particular, if you profile the spider you will see that the loop

foreach ($urls as $lien) {
    ...
}

takes the majority of the time, around 50-60 seconds per page.
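
To confirm where the time goes, you can wrap the page fetch and the URL loop with microtime() calls and compare. This is just a minimal, self-contained sketch, not actual phpDig code: $pageUrl, process_url(), and the sample $urls array are placeholders I made up for illustration; only $urls and $lien match the names in the snippet above.

<?php
// Illustrative timing sketch: measure the page fetch and the URL-processing
// loop separately to see which one dominates.

function process_url($lien) {
    // stand-in for the real per-URL work (parsing, de-duping, DB inserts, etc.)
    usleep(1000);
}

$pageUrl = 'http://www.example.com/';
$fetchStart = microtime(true);
$page = @file_get_contents($pageUrl);          // fetching the page
$fetchTime = microtime(true) - $fetchStart;

$urls = array('http://www.example.com/a', 'http://www.example.com/b');
$loopStart = microtime(true);
foreach ($urls as $lien) {                     // processing the URLs found
    process_url($lien);
}
$loopTime = microtime(true) - $loopStart;

printf("fetch: %.3fs, url loop: %.3fs\n", $fetchTime, $loopTime);
?>

On a run like that it's easy to see whether the download or the per-URL processing is eating the time.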

In addition to (or instead of) throwing 50 or 60 spiders at the problem, improving the performance of this section of code would greatly improve indexing performance.

More in a bit!