Hi. Part of this could be solved by adding DISTINCT to the following query (or by rewriting it as a join query) in the spider.php file:
PHP Code:
$query = "SELECT DISTINCT(".PHPDIG_DB_PREFIX."sites.site_id),".PHPDIG_DB_PREFIX."sites.site_url,"
.PHPDIG_DB_PREFIX."sites.username as user,".PHPDIG_DB_PREFIX."sites.password as pass,"
.PHPDIG_DB_PREFIX."sites.port FROM ".PHPDIG_DB_PREFIX."sites,".PHPDIG_DB_PREFIX."tempspider WHERE "
.PHPDIG_DB_PREFIX."sites.site_id = ".PHPDIG_DB_PREFIX."tempspider.site_id";
This should ensure that if file1 contains domainA and domainB, the bot1 array contains only one instance of each domain. (Note that DISTINCT applies to the whole selected row, not just site_id, despite the parentheses.) I say partly solved because once bot1 has run on domainA and domainB, there will be rows in the tempspider table, so when bot2 runs file2 containing domainC and domainD, the bot2 array will be domainA, domainB, domainC, domainD.
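To make the leftover-rows problem concrete, here is a toy in-memory model of it. This is not PhpDig code: the function name distinctSitesInTempspider and the arrays are made up for illustration, and the site_ids are arbitrary.

```php
<?php
// Toy model only: $sites maps site_id => site_url, and each entry in
// $tempspider is the site_id of one queued link (hypothetical data).
function distinctSitesInTempspider(array $sites, array $tempspider): array
{
    $result = [];
    foreach ($tempspider as $siteId) {
        if (isset($sites[$siteId])) {
            // Keyed by site_id, so repeated rows collapse like DISTINCT.
            $result[$siteId] = $sites[$siteId];
        }
    }
    return array_values($result);
}

$sites = [1 => 'domainA', 2 => 'domainB', 3 => 'domainC', 4 => 'domainD'];

// bot1 queues several links for domainA and domainB.
$tempspider = [1, 1, 2, 2, 2];
$bot1 = distinctSitesInTempspider($sites, $tempspider);

// bot2 starts while bot1's rows are still in the table.
$tempspider = array_merge($tempspider, [3, 4, 4]);
$bot2 = distinctSitesInTempspider($sites, $tempspider);

print_r($bot1); // domainA, domainB
print_r($bot2); // domainA, domainB, domainC, domainD
```

So the duplicates within one bot go away, but bot2 still picks up bot1's sites as long as their rows remain in the table.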
I suppose AND ".PHPDIG_DB_PREFIX."sites.locked = 0 could be added to the WHERE part of the above query, but that still doesn't guarantee unique arrays across bots unless you make sure that each bot gets a chance to lock its sites before the next bot is fired up, and that the next bot starts before the earlier bots unlock their sites. Even then, the tempspider table would need to be emptied after all bots are done.
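For reference, a rough sketch of what the query might look like with the locked condition added. The fallback define is only there so the snippet runs on its own (PhpDig normally defines PHPDIG_DB_PREFIX itself), the $sites, $temp and $cleanup variables are my own shorthand, and I haven't tested this against a live install.

```php
<?php
// Assumption: PhpDig normally defines this constant; the empty fallback
// exists only so this sketch can run standalone.
if (!defined('PHPDIG_DB_PREFIX')) {
    define('PHPDIG_DB_PREFIX', '');
}

// Shorthand for the two prefixed table names (hypothetical variable names).
$sites = PHPDIG_DB_PREFIX . 'sites';
$temp  = PHPDIG_DB_PREFIX . 'tempspider';

// Same query as above, plus the filter that skips locked sites.
$query = "SELECT DISTINCT {$sites}.site_id, {$sites}.site_url, "
       . "{$sites}.username AS user, {$sites}.password AS pass, {$sites}.port "
       . "FROM {$sites}, {$temp} "
       . "WHERE {$sites}.site_id = {$temp}.site_id "
       . "AND {$sites}.locked = 0";

// Emptying tempspider once every bot has finished would be a separate step.
$cleanup = "DELETE FROM {$temp}";

echo $query, "\n";
```

The cleanup statement should only run after the last bot is done; running it earlier would wipe links that a still-running bot has queued.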