Spidering sub-directories as the root
I'm interested in getting the spider function, not just the search function, to treat subdirectories of URLs as the root.
For example, someone might want to spider http://www.geocities.com/website as its own site, without scanning the true root (www.geocities.com). So far I have changed this bit of code in robot_functions.php: PHP Code:
|
Hello bloodjelly,
I had the same problem, and I solved it by adding this code: PHP Code:
|
Thanks for the help, caco, but what I need is a mod that adds links to the database exactly as entered, with or without a subdirectory. In other words, if I wanted to spider "http://www.mysite.com/directory" as a root, I could do it, and if I wanted to spider "http://www.mysite.com" as a root, I could do that too.
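A rough sketch of what such a check could look like, for illustration only. This is not PhpDig code: the function name `in_scope` and its behaviour are assumptions, and it treats the entered root as a directory rather than a file.

```php
<?php
// Hypothetical helper, not part of PhpDig: decide whether a discovered link
// falls under the root URL exactly as it was entered.
// Assumption: the root names a directory (or a bare host), not a file.
function in_scope(string $root, string $link): bool
{
    $r = parse_url($root);
    $l = parse_url($link);
    if ($r === false || $l === false || !isset($r['host'], $l['host'])) {
        return false;
    }
    // Hosts must match (case-insensitively).
    if (strcasecmp($r['host'], $l['host']) !== 0) {
        return false;
    }
    // Normalise the root path so it always ends in a slash:
    // "" becomes "/", "/directory" becomes "/directory/".
    $rootPath = rtrim($r['path'] ?? '', '/') . '/';
    $linkPath = $l['path'] ?? '/';
    // In scope if the link is the root itself or lies beneath it.
    return $linkPath . '/' === $rootPath
        || strncmp($linkPath, $rootPath, strlen($rootPath)) === 0;
}
```

With "http://www.mysite.com/directory" as the root, pages under /directory/ pass the check and the rest of the host does not; with "http://www.mysite.com" as the root, every page on that host passes.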
|
Hi. Perhaps upgrade to PhpDig version 1.8.2... :D
|
You are awesome.
|
FYI: version 1.8.3 has been released. Among other changes, it makes the 'limit to directory' option consistent with the other control panel options.
|
Hi charter -
I'm not sure if I'm using the limit to directory feature correctly (I have it set to "true"). When I enter a website (www.geocities.com/psychology_x/main.html, for example) it spiders correctly, but the listing in the "sites" table is only for geocities. Is there a way to have each separate directory treated as its own site? Or am I missing something? Thanks.
|
Hi. The issue is that foo.com/bar/ is not a separate domain from foo.com/ but a subdirectory of that domain. Spidering can now be limited to subdirectories, but the domain is still the domain. The bar.foo.com/ subdomain, on the other hand, is a third-level domain: it can point to the foo.com/bar/ subdirectory, but it can also be treated as a separate site on a separate server. The database storage scheme is domain based, which is why subdomains are stored separately but subdirectories are not.
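As a rough illustration of that point, here is a minimal sketch of how a domain-based scheme ends up with one entry per host. It is not PhpDig's actual code or schema; `site_key` is a made-up name, and lowercasing the host and adding a default `http://` scheme are assumptions made for the example.

```php
<?php
// Hypothetical illustration, not PhpDig's actual schema or code:
// derive the key a domain-based "sites" table would store for a URL.
function site_key(string $url): ?string
{
    // Add a scheme when it is missing so parse_url() can find the host.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    // The path is ignored on purpose: only the host identifies a site.
    return is_string($host) ? strtolower($host) : null;
}

// foo.com/ and foo.com/bar/ share one key; bar.foo.com/ gets its own.
$urls = ['http://foo.com/', 'http://foo.com/bar/', 'http://bar.foo.com/'];
foreach ($urls as $url) {
    echo $url, ' => ', site_key($url), PHP_EOL;
}
```

Under this kind of keying, www.geocities.com/psychology_x/main.html can only ever be listed under www.geocities.com, whatever the 'limit to directory' setting is.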
|
Got it. Thanks for the explanation.
|