Old 10-13-2003, 03:39 PM   #16
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi Rolandks. The bug is that strip_tags is more lenient than before, meaning that certain things that used to be stripped no longer are. With preg_replace('/<.*>/sU', '', $text); and eregi_replace("<[^>]*>","",$text); everything from a < up to the next > is stripped. My personal preference is eregi_replace("<[^>]*>","",$text); over preg_replace('/<.*>/sU', '', $text); but either way I don't want to keep using strip_tags($text); because of the problems encountered.
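To make the comparison concrete, here is a minimal sketch of the two patterns side by side. It assumes a PHP version where the ereg functions are no longer available (they were removed in PHP 7.0), so the eregi_replace pattern is written as its preg_replace counterpart; the sample string is made up for illustration.

```php
<?php
// Made-up sample input; the <a> tag deliberately spans two lines.
$text = "<p>Hello <b>world</b></p>\n<a\nhref=\"x\">link</a>";

// Ungreedy dot (U) with the s modifier, so a tag may contain a newline.
$a = preg_replace('/<.*>/sU', '', $text);

// Negated character class, the same idea as eregi_replace("<[^>]*>","",$text).
// [^>] already matches newlines, so no modifier is needed.
$b = preg_replace('/<[^>]*>/', '', $text);

var_dump($a === $b); // both leave "Hello world\nlink"
```

Both forms stop at the first > after a <, so they should behave the same on ordinary markup; the character-class form just says so more directly.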

Hi manute. What version of PhpDig are you running? In robot_functions.php, the phpdigCleanHtml function in version 1.6.2 is as follows:
PHP Code:
function phpdigCleanHtml($text) {
//htmlentities
global $spec;

//replace blank characters by spaces
$text = ereg_replace("[\r\n\t]+"," ",$text);

//extracts title
if ( eregi("<title *>([^<>]*)</title *>",$text,$regs) ) {
    $title = $regs[1];
}
else {
    $title = "";
}
//delete content of head, script, and style tags
$text = eregi_replace("<head[^<>]*>.*</head>"," ",$text);
$text = eregi_replace("<script[^>]*>.*</script>"," ",$text);
$text = eregi_replace("<style[^>]*>.*</style>"," ",$text);
// clean tags
$text = eregi_replace("(</?[a-z0-9 ]+>)",'\\1 ',$text);

//tries to replace htmlentities by ascii equivalent
foreach ($spec as $entity => $char) {
    $text = eregi_replace ($entity."[;]?",$char,$text);
    $title = eregi_replace ($entity."[;]?",$char,$title);
}
$text = ereg_replace('&#([0-9]+);',chr('\\1').' ',$text);

//replace blank characters by spaces
$text = eregi_replace("--|[{}();\"]+|</[a-z0-9]+>|[\r\n\t]+",' ',$text);

//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\\1\\2',$text);

//replace any group of blank characters by an unique space
$text = ereg_replace("[[:blank:]]+"," ",eregi_replace("<[^>]*>","",$text));

$retour['content'] = $text;
$retour['title'] = $title;
return $retour;
}

and in config.php, the $spec array in version 1.6.2 is as follows:
PHP Code:
//----------HTML ENTITIES
$spec = array( "&amp" => "&",
               "&agrave" => "à",
               "&egrave" => "è",
               "&ugrave" => "ù",
               "&oacute;" => "ó",
               "&eacute" => "é",
               "&icirc" => "î",
               "&ocirc" => "ô",
               "&ucirc" => "û",
               "&ecirc" => "ê",
               "&ccedil" => "ç",
               "&#156" => "oe",
               "&gt" => " ",
               "&lt" => " ",
               "&deg" => " ",
               "&apos" => "'",
               "&quot" => " ",
               "&acirc" => "â",
               "&iuml" => "ï",
               "&euml" => "ë",
               "&auml" => "ä",
               "&ouml" => "ö",
               "&uuml" => "ü",
               "&nbsp" => " ",
               "&szlig" => "ß",
               "&iacute" => "í",
               "&reg" => " ",
               "&copy" => " ",
               "&aacute" => "á",
               "&Aacute" => "Á",
               "&eth" => "ð",
               "&ETH" => "Ð",
               "&Eacute" => "É",
               "&Iacute" => "Í",
               "&Oacute" => "Ó",
               "&uacute" => "ú",
               "&Uacute" => "Ú",
               "&THORN" => "Þ",
               "&thorn" => "þ",
               "&Ouml" => "Ö",
               "&aelig" => "æ",
               "&AELIG" => "Æ",
               "&aring" => "å",
               "&Aring" => "Å",
               "&oslash" => "ø",
               "&Oslash" => "Ø"
               );
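For anyone reading this on a PHP without the ereg functions, here is a rough sketch of how the foreach loop in phpdigCleanHtml consumes this array, rewritten with preg_replace. The function name replaceEntities and the three-entry array are hypothetical and only for illustration; this is not code from PhpDig itself.

```php
<?php
// Hypothetical three-entry subset of $spec, for illustration only.
$spec = array("&amp" => "&", "&eacute" => "é", "&nbsp" => " ");

// Hypothetical helper mirroring the foreach loop in phpdigCleanHtml.
function replaceEntities($text, $spec) {
    foreach ($spec as $entity => $char) {
        // preg_quote escapes any regex metacharacters in the key;
        // ";?" plays the role of "[;]?" and the i flag mirrors eregi_replace.
        $text = preg_replace('/' . preg_quote($entity, '/') . ';?/i', $char, $text);
    }
    return $text;
}

$out = replaceEntities("caf&eacute; &amp; th&eacute&nbsp;ok", $spec);
echo $out, "\n";
```

Because the trailing semicolon is optional in the pattern, entities written without it (such as the second &eacute above) are replaced as well.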