Sure, no problem, here is the function I wrote. I cut and pasted most of the code from your other functions, so I'm sure some of it could be broken out into utility functions to avoid the redundancy.
I don't think this is the perfect solution. I have not had time to debug further yet, but the spidering is awfully slow. Somewhere around 20 seconds per page - yikes!
I also noticed that on my platform (Linux + Apache), the phpDigSetHeaders(...) function doesn't seem to be working properly. The function calls ini_set() on user_agent, but if I error_log(ini_get('user_agent')) on the very next line, it comes up blank... strange... maybe this is why I'm experiencing this whole cookie problem in the first place...
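One way to narrow down the blank user_agent would be to look at what ini_set() itself returns, since it gives back the old value on success and false on failure. The snippet below is only a sketch for testing outside phpDig; the `$agent` string is a made-up example value, not the one phpDigSetHeaders(...) uses.

```php
<?php
// Hypothetical check, not part of phpDig: confirm that the user_agent
// setting really changes at runtime.
$agent = 'PhpDig-test/0.0 (example value)';

// ini_set() returns the previous value (possibly an empty string) on
// success and false on failure, so the comparison must be strict.
$previous = ini_set('user_agent', $agent);

if ($previous === false) {
    error_log('ini_set(user_agent) was refused');
} elseif (ini_get('user_agent') !== $agent) {
    error_log('user_agent did not stick: "' . ini_get('user_agent') . '"');
} else {
    error_log('user_agent is now: ' . ini_get('user_agent'));
}
```

If the first branch fires, the setting is being blocked by the configuration; if the last branch fires here but not inside the spider, something between the two calls is probably resetting it.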
Anyhoo, here is the function. I put it in the robot_functions.php file, and I changed line 682 of that same file to use it:
line 682:
PHP Code:
$file_content = @phpdigfile($uri,$result_test['cookies']);
PHP Code:
//=================================================
//Retrieves a remote page into an array. This is
//better than the PHP file(...) function because
//it passes proper headers, including cookies.
function phpdigfile($_url, $_cookies=array())
{
    $retfile = array();
    //stays empty unless the URL carries credentials
    $auth_string = '';
    $components = parse_url($_url);
    if (isset($components['host'])) {
        $host = $components['host'];
        if (isset($components['user']) && isset($components['pass']) &&
            $components['user'] && $components['pass']) {
            $auth_string = 'Authorization: Basic '.base64_encode($components['user'].':'.$components['pass']).END_OF_LINE_MARKER;
        }
    }
    else {
        $host = '';
    }
    if (isset($components['port'])) {
        $port = (int)$components['port'];
    }
    else {
        $port = 80;
    }
    if (isset($components['path'])) {
        $path = $components['path'];
    }
    else {
        $path = '';
    }
    if (isset($components['query'])) {
        $query = $components['query'];
    }
    else {
        $query = '';
    }
    $fp = @fsockopen($host,$port);
    if (!$fp)
    {
        error_log('Failed to open socket');
        return $retfile;
    }
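    $path = str_replace("//","/",$path);
    //an empty path is not a valid request target
    if ($path == '') {
        $path = '/';
    }
    $cookiestosend = phpDigMakeCookies($_cookies, $path);
    //the query string belongs in the request line too
    if ($query != '') {
        $path .= '?'.$query;
    }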
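    //complete get
    //the Host header only needs the port when it is not the default
    $sport = ($port != 80) ? ":$port" : '';
    //"Connection: close" asks the server to close the socket after the
    //response; on a keep-alive connection the read loop would sit
    //waiting until the server times out
    $request =
        "GET $path HTTP/1.1".END_OF_LINE_MARKER
        ."Host: $host$sport".END_OF_LINE_MARKER
        .$auth_string
        .$cookiestosend
        ."Accept: */*".END_OF_LINE_MARKER
        ."Accept-Charset: ".PHPDIG_ENCODING.END_OF_LINE_MARKER
        ."Accept-Encoding: identity".END_OF_LINE_MARKER
        ."Connection: close".END_OF_LINE_MARKER
        ."User-Agent: PhpDig/".PHPDIG_VERSION." (+http://www.phpdig.net/robot.php)".END_OF_LINE_MARKER.END_OF_LINE_MARKER;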
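    fputs($fp,$request);
    $bHeaderDone = false;
    while (!feof($fp))
    {
        $str = fgets($fp, 8192);
        if ($str === false)
        {
            break;
        }
        //the first blank line marks the end of the headers
        if (!$bHeaderDone)
        {
            if (trim($str) == '')
            {
                $bHeaderDone = true;
            }
            continue;
        }
        //everything after that is body, blank lines included
        $retfile[] = $str;
    }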
    @fclose($fp);
    return $retfile;
}
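The read loop in the function separates headers from body at the first blank line. As a rough sketch, that step can be pulled out and tried without a socket; `splitResponseLines` is a hypothetical name, not something that exists in phpDig, and the sample response lines are made up.

```php
<?php
// Hypothetical helper, not part of phpDig: split raw response lines
// into header lines and body lines at the first blank line.
function splitResponseLines(array $lines)
{
    $headers = array();
    $body = array();
    $inBody = false;
    foreach ($lines as $line) {
        if ($inBody) {
            $body[] = $line;                    // blank lines in the body are kept
        } elseif (trim($line) === '') {
            $inBody = true;                     // first blank line ends the headers
        } else {
            $headers[] = rtrim($line, "\r\n");  // header line without its line ending
        }
    }
    return array($headers, $body);
}

// Made-up response, shaped like what fgets() would hand back line by line.
$raw = array(
    "HTTP/1.1 200 OK\r\n",
    "Content-Type: text/html\r\n",
    "\r\n",
    "<p>one</p>\n",
    "\n",
    "<p>two</p>\n",
);
list($headers, $body) = splitResponseLines($raw);
echo count($headers), ' header lines, ', count($body), " body lines\n";
```

Keeping the headers around, rather than throwing them away, would also leave room to pick up Set-Cookie lines later if that turns out to be needed.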