PhpDig.net

05-24-2004, 05:59 AM   #1
ciaran@clissman
Green Mole
 
Join Date: May 2004
Posts: 10
Keeping the spider in the search directory and its subdirs

Hi,
I typically want to spider just a subdirectory and its subdirs, so I don't want the spider to go up into the parent directory of the URL that I specify.

e.g. I want to index all of www.myplace.com/searchme

The starting point is www.myplace.com/searchme/index.html
I want all the other stuff in /searchme to be indexed, but I don't want www.myplace.com/donttouch, EVEN THOUGH there is a link from /searchme/index.html to /donttouch/index.html.

Is there a way to tell PhpDig not to 'go up' in the directory hierarchy?

Thanks a lot !
05-24-2004, 06:17 AM   #2
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Welcome to the forum, ciaran@clissman.

In the includes/config.php file, find the following statement:
PHP Code:
define('PHPDIG_IN_DOMAIN',false); 
and replace it with this:
PHP Code:
define('PHPDIG_IN_DOMAIN',true); 
05-24-2004, 06:42 AM   #3
ciaran@clissman
Green Mole
 
Join Date: May 2004
Posts: 10
Thanks, Pat,

but it's not doing what I expect.

I ask it to index http://www.waterfordcity.ie/library/ballybricken.htm with a search depth of 3 and with
define('PHPDIG_IN_DOMAIN',true);


and the first few results are

SITE : http://www.waterfordcity.ie/
Exclude paths :
- @NONE@
1:http://www.waterfordcity.ie/library/ballybricken.htm
(time : 00:00:07)
+ + + + + +
level 1...
2:http://www.waterfordcity.ie/library/
(time : 00:00:20)

3:http://www.waterfordcity.ie/gallery.htm
(time : 00:00:28)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
4:http://www.waterfordcity.ie/environment/index.htm
(time : 00:00:37)
+
5:http://www.waterfordcity.ie/planning/index.htm
(time : 00:00:46)
+ +

What I really want is for everything in http://www.waterfordcity.ie/library to be indexed, and nothing else.

Any thoughts ?

Thanks again !

Ciaran
05-24-2004, 07:22 AM   #4
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Sorry, I misunderstood what you were asking. You would like for phpdig to stay in a specific directory when spidering, right? In that case, this thread has what you need.
05-24-2004, 07:28 AM   #5
ciaran@clissman
Green Mole
 
Join Date: May 2004
Posts: 10
Hmm, we're not there yet.

The sites I'm crawling aren't mine, so I can't put robots.txt files into them.

Is there not a function someplace that says
' if the directory of the page you are thinking about indexing is the parent directory of the page you were started at, leave it alone (or not, depending on the config variable)' ?
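
A minimal sketch of the kind of check described above, in PHP. It is not PhpDig code: the function names scope_dir() and in_start_scope() are hypothetical, and wiring such a test into the spider would mean editing PhpDig's own link-handling code, which this thread does not cover. It assumes PHP 7 or later and absolute, already-normalized URLs (no ".." segments).

```php
<?php
// Hypothetical helpers, NOT part of PhpDig. They decide whether a candidate
// URL lies in the starting page's directory or one of its subdirectories.

// Return the directory part of a URL's path, always ending in "/".
function scope_dir(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return '/';
    }
    $lastSlash = strrpos($path, '/');
    if ($lastSlash === false) {
        return '/';
    }
    // "/library/ballybricken.htm" -> "/library/"; "/library/" is unchanged.
    return substr($path, 0, $lastSlash + 1);
}

// True when $candidateUrl is on the same host as $startUrl and its path
// begins with the starting page's directory.
function in_start_scope(string $startUrl, string $candidateUrl): bool
{
    $startHost = strtolower((string) parse_url($startUrl, PHP_URL_HOST));
    $candHost  = strtolower((string) parse_url($candidateUrl, PHP_URL_HOST));
    if ($startHost !== $candHost) {
        return false;
    }

    $prefix   = scope_dir($startUrl);
    $candPath = parse_url($candidateUrl, PHP_URL_PATH);
    if (!is_string($candPath) || $candPath === '') {
        $candPath = '/';
    }

    // Plain prefix comparison; the prefix ends in "/" on purpose.
    return strncmp($candPath, $prefix, strlen($prefix)) === 0;
}

// Usage: the starting page and the first URLs from the spider log above.
$start = 'http://www.waterfordcity.ie/library/ballybricken.htm';
$seen  = [
    'http://www.waterfordcity.ie/library/ballybricken.htm',
    'http://www.waterfordcity.ie/library/',
    'http://www.waterfordcity.ie/gallery.htm',
    'http://www.waterfordcity.ie/environment/index.htm',
    'http://www.waterfordcity.ie/planning/index.htm',
];
foreach ($seen as $url) {
    echo (in_start_scope($start, $url) ? 'index ' : 'skip  '), $url, "\n";
}
```

Run against the URLs from the spider log earlier in the thread, this keeps the two pages under /library/ and skips gallery.htm, environment/index.htm and planning/index.htm. The trailing slash on the prefix is deliberate, so that a sibling directory (say, a made-up /library-old/) is not matched by accident.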

thanks again

Ciaran
05-24-2004, 07:40 AM   #6
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
To my knowledge, the method that I gave you is the only way you can have phpdig stay within the directory you specify. I'm not sure what new features may end up in version 1.8.1, but I know Charter is working on that right now. Perhaps he'll consider adding this as a feature. I know it's a subject that comes up fairly often around here.
05-24-2004, 07:44 AM   #7
ciaran@clissman
Green Mole
 
Join Date: May 2004
Posts: 10
Good enough. Thanks for the tips !

Ciaran [sunny Dublin, Ireland, quarter to five in the afternoon]
All times are GMT -8.