paullind
01-24-2004, 12:05 PM
Hi
Sometimes I spider a site that no longer exists, but the request gets forwarded to a default directory page with an enormous number of links on it, and the spider starts crawling that page instead.
Here's an example page I enter to spider:
http://www.jacksondiamondkats.com
I am using Windows 2000 Server and entering this page in a .bat file like this:
php.exe -f "C:\InetPub\wwwroot\phpdig\admin\spider.php" http://www.jacksondiamondkats.com
I have 3 questions after dealing with this site:
1] Can I prevent my spider from being redirected?
PHPDIG_IN_DOMAIN is already set to false.
2] Can I limit the number of links spidered in case there are too many links on a page?
- There is a variable in config.php called NUMBER_OF_RESULTS_PER_SITE, which I set to 10 for example, but the spider still tries to follow every link on the page above, i.e. more than 70.
Any recommendations on how to deal with a site like:
http://www.jacksondiamondkats.com
3] I just want to spider the main/home page of a site and only the links from that page. Are these variables set correctly?
SPIDER_MAX_LIMIT 1
SPIDER_DEFAULT_LIMIT 1
RESPIDER_LIMIT 1
Any assistance would be appreciated,
Paul L