Too many results per site spidered and redirections


paullind
01-24-2004, 01:05 PM
Hi

Sometimes I spider a site that no longer exists, but the request gets forwarded to a default directory page with an enormous number of links on it, and the spider begins spidering that page.

Here's an example page I enter to spider:
http://www.jacksondiamondkats.com

I am using Windows 2000 Server and entering this page in a .bat file like this:
php.exe -f "C:\InetPub\wwwroot\phpdig\admin\spider.php" http://www.jacksondiamondkats.com

I have 3 questions after dealing with this site:

1] Can I prevent my spider from being redirected?
PHPDIG_IN_DOMAIN is already set to false.

2] Can I limit the number of sites spidered in case there are too many links on a page?
- There is a variable in config.php called NUMBER_OF_RESULTS_PER_SITE, which I set to 10, for example, but it still tries to spider however many links are on the page above, i.e. more than 70.

Any recommendations on how to deal with a site like:
http://www.jacksondiamondkats.com

3] I just want to spider the main page (homepage) of a site and the links from this page only. Are these variables set correctly?
SPIDER_MAX_LIMIT 1
SPIDER_DEFAULT_LIMIT 1
RESPIDER_LIMIT 1

Any assistance would be appreciated,

Paul L

Charter
01-24-2004, 10:04 PM
Hi. For one, perhaps try modifying some code in the phpdigTestUrl function. For two, NUMBER_OF_RESULTS_PER_SITE is the maximum number of results to display per site in a search, not a limit on spidering, but perhaps this thread (http://www.phpdig.net/showthread.php?threadid=300) might help. For three, yes, a limit of one indexes a given page and the links from that page.
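
For the redirect question, here is a rough sketch of a standalone pre-check that could run before a URL is handed to spider.php. It is not PhpDig code and does not show what phpdigTestUrl does internally; both function names (redirects_off_host and location_headers) are hypothetical helpers made up for illustration. The idea is to collect any Location headers the server returns and skip the URL when one of them points to a different host. It assumes allow_url_fopen is enabled so that get_headers() can make the request.

```php
<?php
// Hypothetical helper, not part of PhpDig: returns true when any redirect
// target in $locations points to a host other than the one requested.
function redirects_off_host($requestedUrl, array $locations)
{
    $origHost = strtolower((string) parse_url($requestedUrl, PHP_URL_HOST));
    foreach ($locations as $location) {
        $host = parse_url($location, PHP_URL_HOST);
        // Relative Location values have no host, so they stay on the same site.
        if ($host !== null && $host !== false && strtolower($host) !== $origHost) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: returns the Location headers seen while get_headers()
// follows the redirects for $url (assumes allow_url_fopen is enabled).
function location_headers($url)
{
    $headers = @get_headers($url, true);
    if ($headers === false) {
        return array(); // request failed, nothing to report
    }
    // Header names are case-insensitive, so normalise the array keys.
    $headers = array_change_key_case($headers, CASE_LOWER);
    return isset($headers['location']) ? (array) $headers['location'] : array();
}
```

A wrapper script could call redirects_off_host($url, location_headers($url)) and only run spider.php when the result is false. Note that this simple host comparison treats a redirect from example.com to www.example.com as leaving the site, so it may need loosening for sites that redirect to their own www host.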