01-24-2004, 12:05 PM | #1 |
Orange Mole
Join Date: Jan 2004
Posts: 30
Too many results per site spidered and redirections
Hi
Sometimes I spider a site that doesn't exist anymore, but the request gets forwarded to a default directory page with an enormous number of links on it, and the spider begins indexing that page instead. Here's an example page I enter to spider: http://www.jacksondiamondkats.com

I am using Windows 2000 Server and entering this page in a bat file like this:

php.exe -f "C:\InetPub\wwwroot\phpdig\admin\spider.php" http://www.jacksondiamondkats.com

I have 3 questions after dealing with this site:

1] Can I prevent my spider from being redirected? PHPDIG_IN_DOMAIN is already set to false.

2] Can I limit the number of links spidered in case there are too many on a page? There is a variable in config.php called NUMBER_OF_RESULTS_PER_SITE, which I set to 10 for example, but the spider still tries to follow however many links are on the page above, i.e. more than 70. Any recommendations on how to deal with a site like http://www.jacksondiamondkats.com?

3] I just want to spider the main page/homepage of a site and the links from this page only. Are these variables set correctly?
SPIDER_MAX_LIMIT 1
SPIDER_DEFAULT_LIMIT 1
RESPIDER_LIMIT 1

Any assistance would be appreciated,
Paul L
01-24-2004, 09:04 PM | #2 |
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. For one, perhaps try modifying some code in the phpdigTestUrl function. For two, NUMBER_OF_RESULTS_PER_SITE sets the maximum number of results to display per site in a search, not the number of links spidered, but perhaps this thread might help. For three, yes, a limit of one indexes a given page and the links from that page.
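For question one, another option that leaves PHPDig untouched is to check each URL before it goes into the bat file. The following is only a minimal sketch in Python, not PHPDig code: it sends a single HEAD request without following redirects and reports whether the site answers with a redirect to a different host. The function names (is_offsite_redirect, fetch_status, safe_to_spider) are made up for illustration, and treating "www.example.com" and "example.com" as the same host is an assumption you may want to change.

```python
import http.client
from urllib.parse import urlsplit, urljoin

# HTTP status codes that carry a Location header pointing elsewhere.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def _host(url):
    """Return the lower-cased host name, ignoring a leading 'www.' (assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_offsite_redirect(url, status, location):
    """True if the response sends `url` to a different host."""
    if status not in REDIRECT_CODES or not location:
        return False
    target = urljoin(url, location)  # Location may be a relative path
    return _host(target) != _host(url)


def fetch_status(url, timeout=10):
    """Send one HEAD request; http.client never follows redirects itself."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def safe_to_spider(url):
    """Hypothetical pre-check: skip URLs that redirect to another host."""
    status, location = fetch_status(url)
    return not is_offsite_redirect(url, status, location)
```

A URL for which safe_to_spider returns False could simply be left out of the bat file. Note that some servers answer HEAD requests differently from GET, and a parked domain that serves its link page directly with a 200 status would not be caught this way.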
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.