How to index other pages but not farther from them?


WebSpider
02-06-2005, 02:25 AM
I'll try to explain it as clearly as possible:

I spider www.domainA.com, which has links to www.domainX.com, www.domainY.com and www.domainZ.com.

How do I set up the digger to spider ALL links on domainA.com (domainX, domainY and domainZ), PLUS enter and spider each of those sites, but not go any farther than the sites they link to?

So:

www.domainA.com
|
|\- www.domainX.com: grab links to domain1, domain2 and domain3.com
|
|\- www.domainY.com: grab links to domain4.com
|
\-- www.domainZ.com: grab links to domain5 and domain6.com

In this example, my DB would contain domainA, domainX, domainY, domainZ and domain1 to domain6, but nothing beyond domain1 to domain6.

Is it clear enough?
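
The request above amounts to a crawl limited to two site hops from domainA. Here is a minimal sketch of that idea in Python. It is not PhpDig code: the `LINKS` graph, the `crawl` function and the extra `domain7.com` entry are hypothetical names made up to mirror the figure.

```python
from collections import deque

# Hypothetical link graph mirroring the figure: site -> sites it links to.
LINKS = {
    "domainA.com": ["domainX.com", "domainY.com", "domainZ.com"],
    "domainX.com": ["domain1.com", "domain2.com", "domain3.com"],
    "domainY.com": ["domain4.com"],
    "domainZ.com": ["domain5.com", "domain6.com"],
    "domain1.com": ["domain7.com"],  # assumed extra link; must never be reached
}

def crawl(start, max_hops):
    """Breadth-first crawl that stops following links after max_hops hops."""
    seen = {start: 0}  # site -> number of hops from the start site
    queue = deque([start])
    while queue:
        site = queue.popleft()
        hops = seen[site]
        if hops == max_hops:
            continue  # index this site, but do not follow its links
        for target in LINKS.get(site, []):
            if target not in seen:
                seen[target] = hops + 1
                queue.append(target)
    return seen

indexed = crawl("domainA.com", max_hops=2)
print(sorted(indexed))
```

With `max_hops=2` the result holds domainA, domainX to domainZ and domain1 to domain6; `domain7.com` is left out because domain1 sits at the hop limit.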

Charter
02-07-2005, 01:23 AM
Set PHPDIG_IN_DOMAIN to true in config.php, find the phpdigCompareDomains function in robot_functions.php and set the else part to true, and set a counter in spider.php so that the phpdigSpiderAddSite function, and related code, is executed a max of X times. Note that phpdigSpiderAddSite appears twice in the spider.php file.
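
The counter idea can be illustrated generically. The sketch below is Python, not the actual spider.php code, and `SiteBudget` and `try_add` are hypothetical names. It only shows the shape of the change: every attempt to add a new site goes through one gate that refuses once a cap is reached.

```python
class SiteBudget:
    """Allow at most max_sites new sites to be added during one crawl."""

    def __init__(self, max_sites):
        self.max_sites = max_sites
        self.added = []

    def try_add(self, site):
        if site in self.added:
            return True  # already registered; does not use up the budget
        if len(self.added) >= self.max_sites:
            return False  # cap reached; the caller should skip this site
        self.added.append(site)
        return True

budget = SiteBudget(max_sites=3)
candidates = ["domainX.com", "domainY.com", "domainX.com",
              "domainZ.com", "domain1.com"]
results = [budget.try_add(site) for site in candidates]
print(results)  # [True, True, True, True, False]
```

Because the add-site call appears in two places, both call sites would need to share the same counter for the cap to hold.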

WebSpider
02-07-2005, 01:57 AM
Thanks for the answer.

Do you have the line numbers for the two changes: "find the phpdigCompareDomains function in robot_functions.php and set the else part to true" and "set a counter in spider.php so that the phpdigSpiderAddSite function [...] is executed a max of X times"?

I'm still new to the script and I'm not a programmer; I'm only used to applying hacks to VB.

Also, "executed a max of X times"? In my case, what should X be?

Thanks for your help.

Charter
02-07-2005, 03:39 AM
Setting a counter would require a modification to the code. If you set PHPDIG_IN_DOMAIN to true and set the else part of the phpdigCompareDomains function to true, PhpDig should crawl on/off-site links, assuming your resources can handle it. Maybe you would rather just list the URIs in the textbox in the admin panel for those sites you'd like to index?

WebSpider
02-07-2005, 03:43 AM
The thing is:

I own a directory made up of pages full of links to other sites.

Since it's a thematic directory with about four thousand links, I can't add them all to the spider by hand.

Instead, I'd prefer the spider to index my own directory PLUS each of the four thousand linked pages, but without following ANY of the links found on those four thousand pages.

Could you provide step-by-step instructions (paid, if you want) to achieve that?
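
One way to avoid typing the links by hand is to pull the off-site links out of the directory pages with a small script and paste the result into the admin textbox. The Python sketch below assumes the pages are available as HTML strings. `LinkCollector`, `external_links` and the sample markup are hypothetical and not part of PhpDig.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

def external_links(html, base_url):
    """Return unique http(s) links that point away from base_url's host."""
    own_host = urlsplit(base_url).hostname
    collector = LinkCollector(base_url)
    collector.feed(html)
    unique = []
    for url in collector.links:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        if parts.hostname and parts.hostname != own_host and url not in unique:
            unique.append(url)
    return unique

# Assumed sample markup standing in for one directory page.
SAMPLE = """
<a href="/category/page2.html">next</a>
<a href="http://www.domainX.com/">X</a>
<a href="http://www.domainY.com/">Y</a>
<a href="http://www.domainX.com/">X again</a>
<a href="mailto:me@example.com">mail</a>
"""
urls = external_links(SAMPLE, "http://www.domainA.com/index.html")
print("\n".join(urls))
```

The output is one URL per line. Whether each listed URL is then indexed as a single page without following its links still depends on how the spider's depth settings are configured.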

Charter
02-07-2005, 07:18 PM
Moved to Mod Requests...