PhpDig.net - View Single Post - Can't get PHPDig to index an htaccess protected site

mlerch@mac.com · 02-21-2004, 11:20 AM

Hi Charter,

So I did some more detailed looking into the problem. Here is what I found.

When spidering the URL that doesn't work (it stalls), I have traced it to:

In robot_functions.php

1. function phpdigDetectDir

In this function it parses the URL into the variable $test, then it goes through an if { then } else { then } statement. In my case it takes the else path, because apparently $test['query'] is set.

Since it is taking the else { then } path, the very first line there in robot_functions.php tries to define the following variable:

$status = phpdigTestUrl($link['url'].$link['path'].$link['file'],'date',$cookies);

This is where it seems to stall, so I checked into this function.

2. function phpdigTestUrl

It runs all the way through the "while" routine and ends up where:
$status = "NOFILE";

At the very end of that function, $mode does not seem to be 'date', so it is supposed to:

return $status;

I guess that is where it hangs.
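To make the check in phpdigDetectDir from step 1 concrete, here is a minimal sketch of how a query string shows up once a URL is parsed. It assumes the parsing is done with PHP's built-in parse_url(), which I have not confirmed from the PhpDig source, and the query string `example=1` is a made-up placeholder, not the real one from my site.

```php
<?php
// Hypothetical URL: "example=1" is a placeholder query string,
// not the actual string the site appends.
$url = 'http://www.mydomain.com/index.php?example=1';

// parse_url() splits a URL into its components (scheme, host, path, query, ...).
$test = parse_url($url);

// The 'query' key exists here because the URL carries a query string,
// so a check like this one would take the "else" path described above.
if (!isset($test['query'])) {
    echo "no query string\n";
} else {
    echo "query is set: " . $test['query'] . "\n";
}
```

So any URL that already carries the appended variable string would be expected to take the same branch.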

Here are some details about the URL/website that I am trying to spider:

http://www.mydomain.com/index.php

index.php actually has, at the very beginning, a piece of script that checks whether a variable string is appended to index.php and whether it is formatted correctly.

If the script finds a formatting problem, or finds that there is no variable string at the end of .../index.php, it will grab the correct string and redirect to a URL like this:

http://www.mydomain.com/index.php?na...,1,1,1,1,1,0,0

Essentially, when you type in the URL http://www.mydomain.com or http://www.mydomain.com/index.php, it will redirect you to:

http://www.mydomain.com/index.php?na...,1,1,1,1,1,0,0
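One way to confirm what the server actually answers, independent of PhpDig, might be to look at the raw response headers. The helper below is only a sketch: summarizeHeaders() is a hypothetical function of my own (not part of PhpDig), and the sample headers and the `example=1` query string are made up for illustration.

```php
<?php
// Hypothetical helper, not part of PhpDig: extracts the first HTTP status
// code and the first Location header from a list of raw header lines.
function summarizeHeaders(array $headerLines)
{
    $summary = ['status' => null, 'location' => null];
    foreach ($headerLines as $line) {
        if ($summary['status'] === null
            && preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $line, $m)) {
            // Keep only the first status line, i.e. the answer to the original request.
            $summary['status'] = (int) $m[1];
        } elseif ($summary['location'] === null
            && stripos($line, 'Location:') === 0) {
            // Keep only the first redirect target.
            $summary['location'] = trim(substr($line, strlen('Location:')));
        }
    }
    return $summary;
}

// Made-up sample response; a real one would come from the server.
$sample = [
    'HTTP/1.1 302 Found',
    'Location: http://www.mydomain.com/index.php?example=1',
    'Content-Type: text/html',
];

$result = summarizeHeaders($sample);
echo $result['status'] . ' -> ' . $result['location'] . "\n";
```

Against the live site, the header lines could be fetched with PHP's get_headers($url), which returns them as an array. That array may contain more than one response when redirects are followed, which is why the sketch keeps only the first status line and the first Location header.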

Do you think that this is causing the problem? Please advise.

Oh yes, I actually tried entering the URL into the PhpDig interface exactly as it would be redirected, but it still hangs with a NOFILE status.

Oh yes, why is $path always /robots.txt?
I guess I don't really understand it well enough.

Thank you very much,

Mr. L

Post #13 · mlerch@mac.com · Green Mole · Join Date: Feb 2004 · Location: North Las Vegas, Nevada · Posts: 18 · Last edited by mlerch@mac.com; 02-21-2004 at 11:38 AM.