PhpDig.net

02-07-2005, 01:10 AM #1
WebSpider
Green Mole
 
Join Date: Feb 2005
Posts: 16
Break the depth limit of 20?

Is the depth limit of 20 a script limitation, a resource limitation, or some sort of loop avoidance?

I ask because I tried to spider a directory where each new page of results counts as a new level, and some categories have more than 20 pages.
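To see why paginated results hit the limit, here is a minimal PHP sketch of a depth-limited, breadth-first crawl. It assumes each results page links only to the next one; `paginated_links()` and `crawl()` are hypothetical names, and PhpDig's own spider may count levels differently (for example, off by one at the entry page).

```php
<?php
// Hypothetical model of a paginated category: page N links only to page N+1,
// so page N is N-1 links away from the entry page.
function paginated_links(int $pages): array {
    $links = [];
    for ($i = 1; $i <= $pages; $i++) {
        $links["page$i"] = $i < $pages ? ["page" . ($i + 1)] : [];
    }
    return $links;
}

// Breadth-first crawl that stops following links beyond $maxDepth.
// Returns every page reached, mapped to the depth at which it was first found.
function crawl(array $links, string $start, int $maxDepth): array {
    $depth = [$start => 0];
    $queue = [$start];
    while ($queue) {
        $url = array_shift($queue);
        if ($depth[$url] >= $maxDepth) {
            continue; // pages at the limit are indexed, but their links are not followed
        }
        foreach ($links[$url] ?? [] as $next) {
            if (!isset($depth[$next])) {
                $depth[$next] = $depth[$url] + 1;
                $queue[] = $next;
            }
        }
    }
    return $depth;
}

$site = paginated_links(30);
echo count(crawl($site, 'page1', 20)), "\n"; // 21 pages: the entry page plus 20 levels
echo count(crawl($site, 'page1', 60)), "\n"; // 30 pages: the whole category
```

With a 30-page category, a depth of 20 stops at page 21, while any depth of 29 or more reaches every page.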

Can we break this limit somehow?

Thanks!
02-07-2005, 02:32 AM #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Just change it in the config file:
Code:
define('SPIDER_MAX_LIMIT',20);                   // max (re)index search depth - used for shell and admin panel dropdown
define('RESPIDER_LIMIT',5);                      // max update search depth - only used for browser, not used for shell

define('LINKS_MAX_LIMIT',20);                    // max (re)index links per - used for shell and admin panel dropdown
define('RELINKS_LIMIT',5);                       // max update links per - only used for browser, not used for shell
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
02-07-2005, 06:47 AM #3
WebSpider
Green Mole
 
Join Date: Feb 2005
Posts: 16
Thanks a bunch, Charter!

On a side note, are you the only developer behind PhpDig? Do you take donations?
02-07-2005, 12:16 PM #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Antoine was the previous developer, releasing everything from the initial version through v.1.6.2, and I have been the developer since then. There have also been contributions posted in the forums and/or listed in the CREDITS, CHANGELOG, and README files. Some history about the change in developers can be found here.
02-08-2005, 02:19 PM #5
WebSpider
Green Mole
 
Join Date: Feb 2005
Posts: 16
Thanks.

I changed the depth limit to 60 and reran the spider on the same domain so it would add the links it had not reached within the initial 20 hops. However, it indexes only the very first page and then stops.

Ideas?
02-08-2005, 06:06 PM #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Check the values in the update sites table via the admin panel.
02-08-2005, 11:11 PM #7
WebSpider
Green Mole
 
Join Date: Feb 2005
Posts: 16
They match what I set: depth 60 and links 0 (i.e., all).
02-09-2005, 11:30 AM #8
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Some thoughts...

- Try using the textbox, 60, 0, no.
- View the robots.txt file for changes.
- Look for meta revisit-after/robots tags.
- Enter the site at a different location.
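For the meta tag check, a minimal PHP sketch that lists the robots and revisit-after meta tags found in a page's HTML. `crawler_meta_tags()` is a hypothetical helper; it uses simple regular expressions rather than a full HTML parser, handles quoted attribute values only, and PhpDig's own tag handling may differ.

```php
<?php
// Hypothetical helper: return the robots and revisit-after meta tags in an HTML
// string, keyed by lowercased tag name, with lowercased content values.
function crawler_meta_tags(string $html): array {
    $found = [];
    preg_match_all('/<meta\b[^>]*>/i', $html, $tags);
    foreach ($tags[0] as $tag) {
        // Attribute order and quote style vary, so collect name="value" pairs first.
        preg_match_all('/(\w[\w-]*)\s*=\s*(["\'])(.*?)\2/s', $tag, $attrs, PREG_SET_ORDER);
        $a = [];
        foreach ($attrs as $m) {
            $a[strtolower($m[1])] = $m[3];
        }
        $name = strtolower($a['name'] ?? '');
        if ($name === 'robots' || $name === 'revisit-after') {
            $found[$name] = strtolower(trim($a['content'] ?? ''));
        }
    }
    return $found;
}

$page = '<html><head><META NAME="robots" CONTENT="noindex,nofollow">'
      . '<meta name="description" content="Widgets"></head></html>';
print_r(crawler_meta_tags($page)); // lists only the robots tag; the description tag is ignored
```

In practice you would pass in the HTML of the entry page. A robots value containing `nofollow` would be one explanation for a spider that indexes the first page and then stops.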
02-09-2005, 02:18 PM #9
WebSpider
Green Mole
 
Join Date: Feb 2005
Posts: 16
- I used both the textbox and the combo box.
- No robots.txt is present.
- No revisit-after tags in the code.
- That's the only thing left to try. However, does it make sense to index both www.domain.com and domain.com when 99% of the time they are the same site? Shouldn't the spider handle this, even as an optional switch?
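A sketch of the kind of switch proposed above, assuming the rule is simply to strip a leading `www.` from the host before comparing two URLs. `normalized_host()` and `same_site()` are hypothetical names, not part of PhpDig.

```php
<?php
// Hypothetical normalization: lowercase the host and drop a leading "www."
function normalized_host(string $url): string {
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    return preg_replace('/^www\./', '', $host);
}

// Two URLs count as the same site when their normalized hosts match.
function same_site(string $a, string $b): bool {
    return normalized_host($a) === normalized_host($b);
}

var_dump(same_site('http://www.example.com/dir/', 'http://example.com/dir/')); // bool(true)
var_dump(same_site('http://www.example.com/', 'http://shop.example.com/'));   // bool(false)
```

Making this a switch rather than the default is the safer design, since a www host and a bare host can occasionally serve different content.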
02-09-2005, 02:21 PM #10
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Set PHPDIG_IN_DOMAIN to true in the config file.
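The edited line would look something like this, assuming it uses the same `define()` form as the settings quoted earlier in the thread; check the comment beside it in your own config file for its exact effect.

```php
define('PHPDIG_IN_DOMAIN',true);   // set to true as suggested; see the comment beside this line in your config file
```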
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Plus character(+) converted to (%20) in urls | raymerica | Troubleshooting | 2 | 05-31-2006 12:19 PM
Spaces (%20) in URLs | FaberFedor | How-to Forum | 2 | 02-08-2005 10:02 AM
Problem spidering sites at in .txt over 20 address | joshuag200 | Troubleshooting | 3 | 01-30-2004 08:13 PM
Add search depth limit to the sites table | peter | Mod Requests | 0 | 01-03-2004 09:14 PM


All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.