pepevilluela
08-28-2006, 02:24 AM
PHPDig is not indexing links that contain accented characters. I'm using Apache on Windows XP (XAMPP from apachefriends) and I've set up PHPDig 1.8.8 in Spanish (es).
Example link: http://localhost/Informatica/Documentacion/DESpool,%20códigos%20de%20departamento/index.html
Note the word "códigos" and its accent.
Internet Explorer can load this page, but PHPDig cannot: the response is 403 Forbidden.
I have seen that Internet Explorer encodes ó (oacute) as %C3%B3, whereas PHP's fputs sends only %B3.
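To make the encoding difference concrete, here is a minimal sketch of how the same character percent-encodes in each charset. It assumes the mbstring extension is available; the variable names are only for illustration. Note that a lone %B3 matches the second byte of the UTF-8 form, while the ISO-8859-1 byte for ó is F3.

```php
<?php
// "ó" as a single ISO-8859-1 byte, written as an escape so the
// encoding of this source file does not matter.
$latin1 = "\xF3";

// The same character converted to UTF-8 (two bytes: C3 B3).
// Assumes the mbstring extension is loaded.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo rawurlencode($latin1), "\n"; // %F3
echo rawurlencode($utf8), "\n";   // %C3%B3
```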
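I tried changing the code that builds the HTTP request, like this:
// Keep the original path so it can be restored after the request is built
$pathant=$path;
// Split the path from the query string; only the path part is re-encoded
$separados=explode("?",$path,2);
// Convert the path to UTF-8 and URL-encode it, then restore "/" and ":"
$separados[0]=str_replace("%3A",":",str_replace("%2F","/",urlencode(utf8_encode($separados[0]))));
$path=implode("?",$separados);
// Build the complete HEAD request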
$request =
"HEAD $path $http_scheme/1.1".END_OF_LINE_MARKER
."Host: $host$sport".END_OF_LINE_MARKER
.$cookiesSendString
.$auth_string
."Accept: */*".END_OF_LINE_MARKER
."Accept-Charset: ".PHPDIG_ENCODING.END_OF_LINE_MARKER
."Accept-Encoding: identity".END_OF_LINE_MARKER
."Connection: close".END_OF_LINE_MARKER
."User-Agent: PhpDig/".PHPDIG_VERSION." (+http://www.phpdig.net/robot.php)".END_OF_LINE_MARKER.END_OF_LINE_MARKER;
// Restore the original path
$path=$pathant;
With this change I no longer get the 403 Forbidden error, but the spider then stops with "No links in temporary table".
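One thing the snippet above may run into is that urlencode() also escapes the "%" of sequences that are already encoded, so "%20" becomes "%2520", and it would turn a literal space into "+". A possible variant, only a sketch, decodes each path segment first and then re-encodes it with rawurlencode(). The helper encode_path_utf8() is hypothetical and not part of PHPDig; it assumes the mbstring extension and that the incoming path is ISO-8859-1 (a path that is already UTF-8 would be converted twice). Whether this has anything to do with the "No links in temporary table" message is not something this sketch can confirm.

```php
<?php
// Hypothetical helper, not part of PHPDig: re-encode the path part of a
// request target as percent-encoded UTF-8, leaving the query string alone.
function encode_path_utf8(string $path, string $sourceCharset = 'ISO-8859-1'): string
{
    // Separate the path from the query string (if any).
    $parts = explode('?', $path, 2);

    // Encode segment by segment so the "/" separators are never escaped.
    $segments = explode('/', $parts[0]);
    foreach ($segments as $i => $segment) {
        // Decode existing escapes first so "%20" does not become "%2520".
        $decoded = rawurldecode($segment);
        // Assumption: the decoded bytes are in $sourceCharset.
        $utf8 = mb_convert_encoding($decoded, 'UTF-8', $sourceCharset);
        $segments[$i] = rawurlencode($utf8);
    }

    $parts[0] = implode('/', $segments);
    return implode('?', $parts);
}

// Example with a Latin-1 "ó" (\xF3) in the path.
echo encode_path_utf8("/a%20b/c\xF3d.html?q=%20"), "\n";
// /a%20b/c%C3%B3d.html?q=%20
```

Because rawurlencode() escapes everything except letters, digits and "-_.~", the comma in the example link would be sent as %2C, which servers normally treat as equivalent to a literal comma.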