02-20-2004, 11:38 AM
hey all,

I think I have found a bug in PHPDig where it is not passing cookies for all requests.

When PHPDig does a HEAD request to a file, it passes all the cookies properly, but when it actually goes to read the file it uses the PHP file(...) function instead of doing a proper GET request. (admin/robot_functions.php, phpdigTempFile(...) function, line 682)

Why does this matter? Well, I have a multilingual site that uses the PHP session to find the current language setting to render the appropriate text. By not sending cookies, I cannot restore my session to find the proper language setting and thus always get the default text.

I have coded a fix for myself to solve my problem by creating a phpdigfile(...) function that executes a proper GET request instead of using the file() function.

Has anyone else run into this problem?
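
For readers who want to see the difference in isolation, here is a minimal sketch of a file()-style fetch that does send cookies. It is not the fix described in this thread. It assumes PHP 5 or later, where file() accepts a stream context, and the helper names buildCookieHeader and fileWithCookies and the cookie values are all made up for illustration.

```php
<?php
// Hypothetical helper: turn array('name' => 'value') pairs into one Cookie header line.
function buildCookieHeader(array $cookies)
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . rawurlencode($value);
    }
    return count($pairs) ? 'Cookie: ' . implode('; ', $pairs) . "\r\n" : '';
}

// Hypothetical helper: like file($url), but the GET request carries the cookies.
function fileWithCookies($url, array $cookies)
{
    $context = stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            'header' => buildCookieHeader($cookies),
        ),
    ));
    // file() takes a stream context as its third argument (PHP 5 and later).
    return @file($url, 0, $context);
}

// Made-up cookie values, only to show the header that would be sent.
echo buildCookieHeader(array('PHPSESSID' => 'abc123', 'lang' => 'fr'));
```

A plain file($url) call sends no Cookie header at all, so a session-based language setting cannot be restored on the server side.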

02-20-2004, 05:25 PM
Hi. I haven't personally come across this issue.

>> I have coded a fix for myself to solve my problem...

Mind sharing?

02-21-2004, 10:10 AM
Sure, no problem. Here is the function I wrote. I cut and pasted most of the code from your other functions. I'm sure some of this stuff can be broken out into utility functions to avoid the redundancy.

I don't think this is the perfect solution. I have not had time to debug further yet, but the spidering is awfully slow. Somewhere around 20 seconds per page - yikes!

I also noticed that on my platform, (Linux + Apache) the phpDigSetHeaders(...) function doesn't seem to be working properly. The function ini_set's user_agent but if I error_log(ini_get('user_agent')) on the line after it comes up blank... strange... maybe this is why I'm experiencing this whole cookie problem in the first place...
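
A quick way to narrow this down is to set the value and read it straight back in a standalone script. This is only a sketch of such a check; the agent string is made up and it does not call phpDigSetHeaders(...) itself.

```php
<?php
// Hypothetical agent string, used only for this check.
$agent = 'PhpDig-test/1.0';

// Set the value, then read it back immediately.
ini_set('user_agent', $agent);
$actual = ini_get('user_agent');

if ($actual === $agent) {
    error_log('user_agent is now: ' . $actual);
} else {
    error_log('user_agent did not stick, ini_get returned: ' . var_export($actual, true));
}
```

If the value sticks here but not inside the spider, the problem is more likely in how or when the setting is applied there than in the platform.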

Anyhoot, here is the function. I put it in the robot_functions.php file, and I replaced line 682 of that same file to use it:

line 682:
$file_content = @phpdigfile($uri,$result_test['cookies']);

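//Retrieves a remote page into an array. This is
//better than the PHP file(...) function because
//it passes proper headers, including cookies.
function phpdigfile($_url, $_cookies=array())
{
    $retfile = array();
    $auth_string = '';
    $components = parse_url($_url);

    if (isset($components['host'])) {
        $host = $components["host"];
        if (isset($components['user']) && isset($components['pass']) &&
            $components['user'] && $components['pass']) {
            $auth_string = 'Authorization: Basic '.base64_encode($components['user'].':'.$components['pass']).END_OF_LINE_MARKER;
        }
    }
    else {
        $host = '';
    }

    if (isset($components['port'])) {
        $port = (int)$components["port"];
    }
    else {
        $port = 80;
    }

    //$sport is used in the Host header but was never set in the posted code;
    //assumed here to be the ":port" suffix for non-default ports
    $sport = ($port != 80) ? ':'.$port : '';

    if (isset($components['path'])) {
        $path = $components["path"];
    }
    else {
        $path = '';
    }

    if (isset($components['query'])) {
        $query = $components["query"];
    }
    else {
        $query = '';
    }

    $fp = @fsockopen($host,$port);

    if (!$fp) {
        error_log('Failed to open socket');
        return $retfile;
    }

    $path = str_replace("//","/",$path);

    $cookiestosend = phpDigMakeCookies($_cookies, $path);

    //the request line was lost from the posted code; this is a reconstruction
    $get = ($path != '') ? $path : '/';
    if ($query != '') {
        $get .= '?'.$query;
    }

    //complete get
    //$cookiestosend and $auth_string are assumed to be complete header lines
    //that already end with END_OF_LINE_MARKER (or to be empty).
    //"Connection: close" is an addition so that the read loop below ends when
    //the server closes the socket instead of waiting on a kept-alive connection.
    $request =
        "GET $get HTTP/1.1".END_OF_LINE_MARKER
        ."Host: $host$sport".END_OF_LINE_MARKER
        .$cookiestosend
        .$auth_string
        ."Accept: */*".END_OF_LINE_MARKER
        ."Accept-Encoding: identity".END_OF_LINE_MARKER
        ."Connection: close".END_OF_LINE_MARKER
        ."User-Agent: PhpDig/".PHPDIG_VERSION." (+http://www.phpdig.net/robot.php)".END_OF_LINE_MARKER.END_OF_LINE_MARKER;

    fputs($fp, $request);

    $bHeaderDone = false;

    while (!feof($fp)) {
        $str = fgets($fp, 8192);
        if ($str === false) {
            break;
        }

        if (!$bHeaderDone) {
            //the first line without any alphanumerics (the blank line) ends the headers
            if (!preg_match('/[a-z0-9]+/i',$str)) {
                $bHeaderDone = true;
            }
            continue;
        }

        //everything after the headers is page content
        $retfile[] = $str;
    }

    fclose($fp);

    return $retfile;
}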
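
The URL handling at the top of phpdigfile(...) can be tried on its own. The sketch below runs the same parse_url() defaults on a made-up URL (host, credentials, and path are hypothetical) so the values that end up in the request are easy to inspect.

```php
<?php
// Hypothetical URL, chosen to exercise every component the function reads.
$components = parse_url('http://user:secret@example.com:8080/docs//page.php?lang=fr');

// Same defaults as in the function: empty host/path/query, port 80.
$host  = isset($components['host'])  ? $components['host']      : '';
$port  = isset($components['port'])  ? (int)$components['port'] : 80;
$path  = isset($components['path'])  ? $components['path']      : '';
$query = isset($components['query']) ? $components['query']     : '';

// Collapse doubled slashes in the path, as the function does.
$path = str_replace('//', '/', $path);

echo "$host $port $path $query\n"; // example.com 8080 /docs/page.php lang=fr
```

Note that the query string comes back separately from the path, so it has to be appended again when the GET line is built.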

02-21-2004, 04:30 PM
Hi there,

I think you forgot to escape a few "'s :)

."User-Agent: PhpDig/".PHPDIG_VERSION." (+<a href=\"http://www.phpdig.net/robot.php\" target=\"_blank\">http://www.phpdig.net/robot.php</a>)".END_OF_LINE_MARKER.END_OF_LINE_MARKER;

I think this would work better. Otherwise I am getting a parse error.

Mr. L

02-21-2004, 07:47 PM
heh good catch -- it appears that this forum software took the liberty of inserting some code on me. There actually shouldn't be any href around the url at all.

To make things easier, I can send a patch against v1.8 to anyone interested. Just let me know.


05-04-2004, 10:20 AM
Hi fredh, what's the current status of this mod? Have a file to attach?

05-04-2004, 06:47 PM
you bet! I have attached a patch for the robot_functions.php file and I have attached the snoopy GPL library that I used to implement the fix.

Download the file here: phpdig1.8-fredh-patch.zip (http://www.harbellinternet.com/phpdig1.8-fredh-patch.zip)

Simply copy the Snoopy.class.php file into the /includes dir and apply the patch.

Let me know if I can be of any more help, thanks!