PhpDig.net

Old 10-21-2003, 07:56 AM   #1
alexp
Green Mole
 
Join Date: Oct 2003
Posts: 5
problem indexing password-protected directories

Hi all,

I am not able to spider a directory protected by .htaccess.

I have set up a test here:

http://testt:testt@www.php-web-devel...hpdig/main.php

...but the script just shows:

Code:
SITE : http://www.php-web-development.com/
Exclude paths :
- @NONE@
1:http://www.php-web-development.com/testphpdig/main.php
(time : 00:00:00)
No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://www.php-web-development.com/testphpdig/main.php
Optimizing tables...
Indexing complete !
Indexing the same content with the .htaccess removed is no problem at all...

I'm using vanilla 1.6.2 and have tried setting PHPDIG_DEFAULT_INDEX to both true and false.

I'd be grateful for any suggestions.

TIA

Alex
Old 10-21-2003, 09:11 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Hmm. Instead of passing the username and password via the URL, it might work to open the sites table in, say, phpMyAdmin and add the username and password to the row for the protected site.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 10-21-2003, 09:58 AM   #3
alexp
Green Mole
 
Join Date: Oct 2003
Posts: 5
Hi Charter,

Thanks for your reply. I checked in phpMyAdmin and the user:pass combination had already been correctly parsed by the script and entered into the DB.

Any other ideas? If you try to index:
Code:
http://testt:testt@www.php-web-development.com/testphpdig/main.php
..on your installation, do you get any links?

Thanks again,
Alex
Old 10-21-2003, 08:50 PM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Perhaps this is related to the problem posted here. Can you try setting self.parent.location to the absolute URL instead of the relative URL and see if that works?
Old 10-22-2003, 04:04 AM   #5
alexp
Green Mole
 
Join Date: Oct 2003
Posts: 5
Hi Charter,

I think I've worked out the problem....

It's not related to relative META and JS links - the same "site" spiders fine without the .htaccess

In fact, this is now spidering fine:

http://testt:testt@www.php-web-development.com/testphpdig/main.php

BUT this isn't:
http://test%40domain.com:test@www.php-web-development.com/testphpdig/main.php

and nor is this:

http://test@domain.com:test@www.php-web-development.com/testphpdig/main.php

The first version sends the username with a literal "%40" instead of "@", so the server answers "access denied" for what it sees as the wrong user. The second version is parsed with "domain.com" as the host.

....so is there no way of sending an @ sign as part of a username?

Thanks for all your help...

Alex
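
A side note on the @ question: in URL syntax an "@" inside a username has to be written as "%40", and whatever builds the Authorization header is expected to decode it back to "@" before sending. The sites table contents quoted later in this thread show the username stored as test%40domain.com, though nothing here confirms where in PhpDig that happens. The following is only a minimal sketch of the decoding step, assuming PHP 7 or later; split_credentials() is a hypothetical helper and not part of PhpDig.

```php
<?php
// Hypothetical helper, not part of PhpDig: pull the credentials out of a URL
// and percent-decode them so that "%40" becomes "@" again.
function split_credentials(string $url): array
{
    // parse_url() returns the components without URL-decoding them.
    $parts = parse_url($url);
    if ($parts === false) {
        return ['user' => '', 'pass' => '', 'host' => ''];
    }
    return [
        'user' => rawurldecode($parts['user'] ?? ''),
        'pass' => rawurldecode($parts['pass'] ?? ''),
        'host' => $parts['host'] ?? '',
    ];
}

$c = split_credentials(
    'http://test%40domain.com:test@www.php-web-development.com/testphpdig/main.php'
);

// HTTP Basic auth sends base64("user:pass"), built from the decoded values.
$header = 'Authorization: Basic ' . base64_encode($c['user'] . ':' . $c['pass']);

echo $c['user'], ' on ', $c['host'], "\n";
```

With the escaped form the host stays www.php-web-development.com and the decoded username contains the "@", which is the combination the unescaped URL could not express.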
Old 10-22-2003, 05:09 AM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Ooh, do tell what you did to get it to work.

I can understand the %40 not working, but the @ sounds like a regex issue. With http://test@domain.com:test@www.php-web-development.com/testphpdig/main.php, are the username and password in the sites table now blank?
Old 10-22-2003, 05:24 AM   #7
alexp
Green Mole
 
Join Date: Oct 2003
Posts: 5
Hi,

Quote:
Ooh, do tell what you did to get it to work.
Hmm, wish I knew. I tried it again and it worked. Sorry.

Trying this in the spider box:

http://test@domain.com:test@php-web-development.com/testphpdig/main.php

...attempts to spider http://domain.com/

phpMyAdmin shows these rows in the sites table:

15 http://www.php-web-development.com/ 20031022121122 test%40domain.com test 0 0

16 http://domain.com/ 20031022132030 test 0 0


(so "test" is interpreted as the username, with the pw blank)

Cheers,
Alex
Old 10-22-2003, 05:38 AM   #8
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Ah, okay, that'll help me track it down. I'll keep you posted.
Old 10-22-2003, 05:41 AM   #9
alexp
Green Mole
 
Join Date: Oct 2003
Posts: 5
I appreciate it...


You're welcome to use my test site to test with if you want.

http://www.php-web-development.com/testphpdig/main.php

the two valid user/pass combos are:

testt:testt

and

test@domain.com:test


The site is identical to the root domain, except for the .htaccess.

Thanks again,
Alex
Old 11-26-2003, 12:56 AM   #10
Loewenherz
Green Mole
 
Join Date: Sep 2003
Posts: 14
Hi,

I have a problem with protected sites too. Maybe I don't understand the tips above (my English is not the best). PhpDig 1.6.4 says:

Warning: file( http://...@www.vdoh.de/robots.txt): failed to open stream: No such file or directory in /is/htdocs/xyz/www.vdoh.de/inc/search/admin/robot_functions.php on line 553

Warning: Variable passed to each() is not an array or object in /is/htdocs/30981/www.vdoh.de/inc/search/admin/robot_functions.php on line 554
SITE : http://www.vdoh.de/
Exclude paths :
- @NONE@
(time : 00:00:00)
No link in temporary table
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete ! [Back] to admin interface.

The URL to index is like:
http://username:password@www.vdoh.de/index.php

username and password are in the .htaccess
Old 11-26-2003, 09:27 AM   #11
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. What do the .htaccess and .htpasswd files look like?

The .htaccess file should have something in it like so:
Code:
AuthUserFile /full/path/to/.htpasswd
AuthGroupFile /dev/null
AuthName "Restricted Area"
AuthType Basic

require user Username
The .htpasswd file should have something in it like so:
Code:
Username:a1b2c3d4e5f6g
Old 11-27-2003, 01:08 PM   #12
Loewenherz
Green Mole
 
Join Date: Sep 2003
Posts: 14
Quote:
username and password are in the .htaccess
Oh sorry, username and password are in the .htpasswd, of course.

Okay, what can be the problem?
Old 11-27-2003, 01:25 PM   #13
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. What HTML source output do you get when you run the following script?
PHP Code:
<?php
// Fetch robots.txt without credentials and print it line by line.
$site = 'http://www.vdoh.de/';
$robots = file($site.'robots.txt');
for ($i = 0; $i < count($robots); $i++) {
    echo $robots[$i].'<br>';
}
?>
I get the following:
Code:
User-agent:*
<br>
<br>Allow: /
<br>
<br>
<br>
Old 11-27-2003, 02:15 PM   #14
Loewenherz
Green Mole
 
Join Date: Sep 2003
Posts: 14
Yes, this was a test today.

The real content of robots.txt is:
User-agent: *
Disallow:
Old 11-27-2003, 02:56 PM   #15
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Please run the following. It may help me determine the problem.
PHP Code:
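<?php
// Fetch robots.txt with the credentials embedded in the URL
// (replace user:pass with the real values) and print it line by line.
$site = 'http://user:pass@www.vdoh.de/';
$robots = file($site.'robots.txt');
for ($i = 0; $i < count($robots); $i++) {
    echo $robots[$i].'<br>';
}
?>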
What do you get? I get the following when viewing the HTML source:
Code:
User-agent: *
<br>Disallow:<br>
Also, do the username and password that you are using in the URL match those that are in the sites table for this site?
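
For reference, the two warnings quoted earlier are consistent with file() returning false for the robots.txt URL and that value then being handed to each(). Below is a minimal sketch of a more defensive fetch, assuming PHP 7 or later. basic_auth_context() and read_lines() are hypothetical helpers for illustration, not code from PhpDig's robot_functions.php, and user/pass are placeholders.

```php
<?php
// Hypothetical helpers for illustration; not PhpDig's robot_functions.php.

// Build a stream context that sends HTTP Basic credentials explicitly,
// instead of relying on user:pass embedded in the URL.
function basic_auth_context(string $user, string $pass)
{
    $header = 'Authorization: Basic ' . base64_encode($user . ':' . $pass);
    return stream_context_create(['http' => ['header' => $header]]);
}

// Read a file or URL into an array of lines. On failure, return an empty
// array instead of false so later loops always receive an array.
function read_lines(string $target, $context): array
{
    $lines = @file($target, FILE_IGNORE_NEW_LINES, $context);
    return $lines === false ? [] : $lines;
}

// Placeholder credentials, as in the test script above.
$context = basic_auth_context('user', 'pass');

// Real use would look like:
//   $robots = read_lines('http://www.vdoh.de/robots.txt', $context);
// A path that does not exist stands in for the failure case here.
$robots = read_lines('/no/such/robots.txt', $context);

foreach ($robots as $line) {
    echo htmlspecialchars($line), "<br>\n";
}
echo count($robots), " line(s) read\n";
```

Returning an empty array on failure means a missing or unreadable robots.txt is treated as "no rules" rather than raising a second warning further down.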