PhpDig.net

pepevilluela 08-28-2006 02:24 AM

accent in links
 
PHPDig is not indexing links that contain accented characters. I'm using Apache on Windows XP (XAMPP from apachefriends) and I've set up PHPDig 1.8.8 in Spanish (es).

Example link: http://localhost/Informatica/Documen...nto/index.html

Pay attention to the word código and the accent.

Microsoft Internet Explorer loads this page, but PHPDig does not: the server answers 403 Forbidden.

I have seen that Internet Explorer replaces ó (oacute) with %C3%B3, whereas PHP's fputs sends only %B3.
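For reference, the difference between the two encodings can be reproduced in a few lines. This is only a sketch, assuming the link text is held as ISO-8859-1 and that the mbstring extension is available; the variable names are made up for the example. In ISO-8859-1 the letter ó is the single byte 0xF3, while in UTF-8 it is the two bytes 0xC3 0xB3, which is where %C3%B3 comes from.

```php
<?php
// Assumption: the link is stored as ISO-8859-1, where "ó" is the single byte 0xF3.
$latin1 = "c\xF3digo";

// Convert to UTF-8; "ó" becomes the two bytes 0xC3 0xB3.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// rawurlencode() percent-encodes each non-ASCII byte separately.
echo rawurlencode($latin1), "\n"; // c%F3digo
echo rawurlencode($utf8), "\n";   // c%C3%B3digo
```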

I tried changing the code that builds the HTTP request, like this:

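// keep the original path so it can be restored after the request is built
$pathant=$path;
// split off the query string and UTF-8 percent-encode only the path part
$separados=explode("?",$path,2);
$separados[0]=str_replace("%3A",":",str_replace("%2F","/",urlencode(utf8_encode($separados[0]))));
$path=implode("?",$separados);
// build the complete HEAD request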
$request =
"HEAD $path $http_scheme/1.1".END_OF_LINE_MARKER
."Host: $host$sport".END_OF_LINE_MARKER
.$cookiesSendString
.$auth_string
."Accept: */*".END_OF_LINE_MARKER
."Accept-Charset: ".PHPDIG_ENCODING.END_OF_LINE_MARKER
."Accept-Encoding: identity".END_OF_LINE_MARKER
."Connection: close".END_OF_LINE_MARKER
."User-Agent: PhpDig/".PHPDIG_VERSION." (+http://www.phpdig.net/robot.php)".END_OF_LINE_MARKER.END_OF_LINE_MARKER;
$path=$pathant;


With this change I no longer get the 403 Forbidden error, but then the spider stops with "No links in temporary table".

pepevilluela 08-29-2006 01:56 AM

accents solved
 
I have solved my problem with accents in links. I'm Spanish and I use accents, ñ and Ñ in links. I think I've fixed the problem by replacing line 218 in robot_functions.php:

$eval = str_replace(" ","%20",$eval);

with

$separados=explode("?",$eval,2);
$separados[0]=str_replace("%25","%",str_replace("%3A",":",str_replace("%2F","/",rawurlencode(utf8_encode($separados[0])))));
$eval=implode("?",$separados);

and by changing the variable $allowed_link_chars in config.php:

$allowed_link_chars = "[:%/?=&;\\,._a-zA-Z0-9áÁéÉíÍóÓúÚüÜñÑ|+ ()~-]*"; // includes space and () - not good with javascript; also accented characters and hyphens

Please disregard my previous post.

This approach should handle any special character: just include it in $allowed_link_chars.
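To check quickly whether a given link would pass a character class like this one, something along these lines may help. It is only a sketch: link_is_allowed() is a hypothetical helper, not a PHPDig function, the class is modelled on the one above, and the "u" modifier assumes the file is saved as UTF-8. How PHPDig itself applies $allowed_link_chars is not shown here.

```php
<?php
// Hypothetical helper, not part of PHPDig: reports whether a link consists
// only of characters from an allowed class like the one in config.php.
function link_is_allowed(string $link, string $allowedClass): bool
{
    // Anchor the class so the whole link must match; the "u" modifier treats
    // pattern and subject as UTF-8 (assumes this file is saved as UTF-8).
    return preg_match('#^' . $allowedClass . '$#u', $link) === 1;
}

// Character class modelled on the config.php value discussed above.
$allowed = "[:%/?=&;\\,._a-zA-Z0-9áÁéÉíÍóÓúÚüÜñÑ|+ ()~-]*";

var_dump(link_is_allowed("http://localhost/código/index.html", $allowed)); // bool(true)
var_dump(link_is_allowed("http://localhost/cödigo/index.html", $allowed)); // bool(false), "ö" is not listed
```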

I hope this helps other Spanish speakers and other non-English users.

pepevilluela 08-29-2006 03:02 AM

Results page
 
Don't forget to change search_function.php at line 518 as well, or the links on the results page will be wrong:

$timer->stop('Extracts');

$separados=explode("?",$url,2);
$separados[0]=str_replace("%25","%",str_replace("%3A",":",str_replace("%2F","/",rawurlencode(utf8_encode($separados[0])))));
$separados[0]=utf8_decode($separados[0]);
$url=implode("?",$separados);

$table_results[$n] = array (
'weight' => $weight,
'img_tag' => '<img border="0" src="'.WEIGHT_IMGSRC.'" width="'.ceil(WEIGHT_WIDTH*$weight/100).'" height="'.WEIGHT_HEIGHT.'" alt="" />',
'page_link' => "<a class=\"phpdig\" href=\"".$url."\" onmousedown=\"return clickit(".$n.",'".$js_url."')\" target=\"".LINK_TARGET."\" >".$title."</a>",
'limit_links' => phpdigMsg('limit_to')." ".$l_site.$l_path,
'filesize' => sprintf('%.1f',(ereg_replace('.*_([0-9]+)$','\1',$content['md5']))/1024),
'update_date' => ereg_replace('^([0-9]{4})[-]?([0-9]{2})[-]?([0-9]{2}).*',PHPDIG_DATE_FORMAT,$content['last_modified']),
'complete_path' => $url,
'link_title' => $title
);
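The path-only encoding used in these changes can also be written as one small helper. This is a sketch, not PHPDig code: encode_path_only() is a hypothetical name, the input is assumed to be ISO-8859-1, and mb_convert_encoding() (mbstring extension) stands in for utf8_encode(), which is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper (not part of PHPDig): percent-encode only the path
// part of a URL, leaving any query string after the first "?" untouched.
function encode_path_only(string $url): string
{
    $parts = explode('?', $url, 2);
    // Assumption: the input is ISO-8859-1; convert it to UTF-8 before encoding.
    $path = mb_convert_encoding($parts[0], 'UTF-8', 'ISO-8859-1');
    // rawurlencode() also escapes "/", ":" and "%"; restore them afterwards,
    // in the same order as the nested str_replace() calls above.
    $parts[0] = str_replace(['%2F', '%3A', '%25'], ['/', ':', '%'], rawurlencode($path));
    return implode('?', $parts);
}

echo encode_path_only("http://localhost/c\xF3digo/index.html?q=a b"), "\n";
// http://localhost/c%C3%B3digo/index.html?q=a b
```

Restoring %25 last means a path that already contains an escape such as %20 comes out unchanged instead of being encoded twice.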

