Old 04-14-2004, 03:32 AM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. I've figured it out; here's the deal.

Short version: Empty the tempspider table between runs.

Long version: When wetcanvas.com was previously indexed, the spider reached level two before the cap was met. Once the cap was met, the remaining links at level two and deeper were not deleted from the tempspider table. Later, graphics.com was indexed from a file, and because that leftover wetcanvas.com data (and possibly data from other sites) was still in the table, the $list_sites variable in spider.php was set to array(graphics.com info, wetcanvas.com info,...,wetcanvas.com info, domain.com info,...,domain.com info).

The join query used to form the $list_sites variable returns one row per matching tempspider row, so the wetcanvas.com info was entered into the array once for every wetcanvas.com link at level two or deeper still left in the tempspider table. That is why your logfile contains 'no links, level 1..., no links, level 2..., links' for wetcanvas.com and why the wetcanvas.com index starts over, indexing links from the tempspider table.
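To see the duplication effect in isolation, here is a rough sketch in Python with an in-memory SQLite database rather than the actual PHP/MySQL code. The table layout (sites, tempspider) and the column names are my own guesses for illustration, not the real schema, and the query is not the one in spider.php; it only shows that a join yields one row per leftover link.

```python
import sqlite3

# Hypothetical schema: these tables and columns are assumptions made for
# illustration only, not the spider's real layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (site_id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE tempspider (site_id INTEGER, link TEXT, level INTEGER);
    INSERT INTO sites VALUES (1, 'wetcanvas.com');
    -- Three links left behind after the cap was hit on an earlier run.
    INSERT INTO tempspider VALUES (1, 'wetcanvas.com/a', 2);
    INSERT INTO tempspider VALUES (1, 'wetcanvas.com/b', 2);
    INSERT INTO tempspider VALUES (1, 'wetcanvas.com/c', 3);
""")


def joined_sites(db):
    """Return site URLs from an inner join against the leftover links."""
    rows = db.execute(
        "SELECT s.url FROM sites AS s "
        "JOIN tempspider AS t ON t.site_id = s.site_id"
    ).fetchall()
    return [url for (url,) in rows]


# The site shows up once per leftover link, not once per site.
duplicated = joined_sites(conn)
print(duplicated)
```

With three leftover rows the site is listed three times, which mirrors the repeated wetcanvas.com entries described above.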

Had the tempspider table been empty between runs, the join query would have produced just array(graphics.com info) for the $list_sites variable, assuming graphics.com was the only site in the file. Just be sure to empty the tempspider table between runs and you should be fine. To empty the tempspider table from the admin panel, click the delete button without selecting a site.
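If you would rather script the cleanup than click through the admin panel, the idea boils down to a plain DELETE before each run. The sketch below again uses Python and SQLite as a stand-in; the helper name clear_tempspider and the column layout are hypothetical, and against the real database you would run the equivalent statement on your own tempspider table.

```python
import sqlite3


def clear_tempspider(db):
    """Remove all leftover rows and return how many were removed."""
    # Count first so the return value does not depend on driver rowcount behavior.
    (leftover,) = db.execute("SELECT COUNT(*) FROM tempspider").fetchone()
    db.execute("DELETE FROM tempspider")
    db.commit()
    return leftover


# Hypothetical table layout, for illustration only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tempspider (site_id INTEGER, link TEXT, level INTEGER)")
db.execute("INSERT INTO tempspider VALUES (1, 'wetcanvas.com/a', 2)")
db.execute("INSERT INTO tempspider VALUES (1, 'wetcanvas.com/b', 3)")

removed = clear_tempspider(db)
(remaining,) = db.execute("SELECT COUNT(*) FROM tempspider").fetchone()
print(removed, remaining)
```

Running something like this between index runs leaves nothing for the join to pick up, so only the sites you actually asked for end up in the list.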