Command Line Spider spiders all sites


Wayne McBryde
01-27-2004, 01:50 PM
I’m still working to install 1.8.0. I’m building a new database and have a LOT of sites to spider. I created 9 text files with domain names, url_list_1.txt through url_list_9.txt.
When I entered “php -f spider.php url_list_1.txt” the spider spidered the sites in the text file. When I enter “php -f spider.php url_list_2.txt” the spider spiders the sites in list 2, then respiders the sites from list 1. Is this normal, or am I doing something wrong?

Charter
01-27-2004, 02:57 PM
Hi. If you are still using version 1.6.5, then PhpDig will spider in a similar way. Once you upgrade to 1.8.0, only the URLs in each file will be crawled.

Wayne McBryde
01-27-2004, 05:37 PM
It is 1.8.0 that I am having this problem with.

Charter
01-27-2004, 06:15 PM
Hi. Between runs, check that the tempspider table is empty, and empty it if it is not. You can do this by clicking the delete button in the admin panel without selecting a site, or by running the following query:

DELETE FROM tempspider;

Sometimes entries get left in the tempspider table when there is no error but the corresponding page hasn't been indexed. This can happen if the spidering process is terminated prematurely.
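
If you want to script this for all nine lists, something along these lines might work as a rough sketch. It empties tempspider before each run and then calls the spider on one list at a time. The database name (phpdig), the assumption that the mysql client reads its credentials from your own config, and the helper names run, spider_list and spider_all are all placeholders of mine, not part of PhpDig. If your tables use a prefix, adjust the table name as well. By default it only prints the commands, so you can check them first.

```shell
#!/bin/sh
# Hypothetical settings: adjust to your own installation.
DB_NAME="${DB_NAME:-phpdig}"   # assumed database name, not a PhpDig default
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands, 0 = also run them

# Print a command, and execute it unless DRY_RUN is 1.
run() {
    echo "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Empty the tempspider table, then spider a single URL list.
spider_list() {
    run mysql "$DB_NAME" -e "DELETE FROM tempspider;" || return 1
    run php -f spider.php "$1" || return 1
}

# Process url_list_1.txt through url_list_9.txt in order,
# stopping at the first command that fails.
spider_all() {
    i=1
    while [ "$i" -le 9 ]; do
        spider_list "url_list_$i.txt" || return 1
        i=$((i + 1))
    done
}
```

Call spider_all at the end of the script (or from a shell after sourcing it) to see the planned commands, then run it again with DRY_RUN=0 from the directory that contains spider.php once they look right.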