Problems with HTML comments <!-- -->


uruloki
10-23-2003, 10:52 PM
When I index pages with HTML comments like

<!-- #begintemplate="algo" -->

the spider replaces it with

< begintemplate algo >

This is a problem because I have comments containing paths to the connection entries for my databases.
The regular expression in the robot functions that matches this kind of string is:

//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\1\2',$text);
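Looking at that pattern in isolation helps. The sketch below is an assumed translation, not PhpDig code. The ereg* functions were deprecated in PHP 5.3 and removed in PHP 7.0, so it uses preg_replace() with `$1$2` backreferences in place of `\1\2`. The helper name `unbang()` is hypothetical. It rewrites `<!X` to `<X` whenever X is not a hyphen, so a `<!DOCTYPE` declaration loses its `!`, while an intact `<!--` is left alone.

```php
<?php
// Hypothetical helper: an assumed preg_replace translation of
// eregi_replace('(<)!([^-])', '\1\2', $text). The pattern contains no letters,
// so the case-insensitive behaviour of eregi is not needed here.
function unbang(string $text): string
{
    // "<!" followed by any character other than "-" becomes "<" plus that character
    return preg_replace('/(<)!([^-])/', '$1$2', $text);
}

echo unbang('<!DOCTYPE html><!-- note -->'), "\n";
// prints: <DOCTYPE html><!-- note -->
```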

Sorry for my English, and thanks for any suggestions.

BYE


Rolandks
10-24-2003, 12:47 AM
It is a problem with PHP > 4.3.2. The following should work as a possible solution (see this thread: http://www.phpdig.net/showthread.php?s=&threadid=140).

Change ONLY this in robot_functions.php, line 160:

//replace any group of blank characters by an unique space
$text = ereg_replace("[[:blank:]]+"," ",strip_tags($text));

to

//remove anything between < and > (ungreedy), including HTML comments
$text = preg_replace('/<.*>/U', '', $text);


It works with PHP 4.3.2 and PhpDig 1.6.2.
NO HTML comments are indexed!
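Here is a minimal sketch of the same idea for current PHP. It assumes you also want to keep the blank-collapsing that the original line 160 performed, which the replacement line drops. The `clean_text()` name is hypothetical, and the `s` modifier is an addition not in Roland's version. Without it, `.` does not match newlines, so a comment spanning several lines would survive. Note also that `<.*>` removes any plain text that happens to sit between a `<` and a `>`.

```php
<?php
// Hypothetical helper, not PhpDig code.
function clean_text(string $text): string
{
    // Drop everything from "<" to the nearest ">" (U = ungreedy),
    // including HTML comments; s lets "." match newlines as well.
    $text = preg_replace('/<.*>/Us', '', $text);
    // Collapse runs of spaces/tabs into a single space, as line 160 did before.
    return preg_replace('/[[:blank:]]+/', ' ', $text);
}

echo clean_text("before <!-- #begintemplate=\"algo\" --> after"), "\n";
// prints: before after
echo clean_text("a <!--\nmulti-line\n--> b"), "\n";
// prints: a b
```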


-Roland-

uruloki
10-24-2003, 04:40 AM
The real problem is that when I index an internet domain, the comments appear, but when I index the intranet domain it works well (no comments). We have PHP 4.3.2. I changed the order of the eregi_replace calls in robot_functions.php:

BEFORE:
//replace blank characters by spaces
$text = eregi_replace("--|[{}();\"]+|</[a-z0-9]+>|[\r\n\t]+",' ',$text);

//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\1\2',$text);

AFTER:
//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\1\2',$text);

//replace blank characters by spaces
$text = eregi_replace("--|[{}();\"]+|</[a-z0-9]+>|[\r\n\t]+",' ',$text);

I tested the change and it seems to work fine. I will reindex everything today and post another comment with the results.
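A small experiment shows why swapping the two calls helps. This is a sketch under stated assumptions. Both eregi_replace calls are translated to preg_replace(), since the ereg* functions no longer exist in PHP 7+. The helper names are hypothetical. strip_tags() stands in for the later tag-stripping step, and any further cleanup PhpDig applies is left out.

With the old order, `--` becomes a space first, so `<!` is followed by a space and is rewritten to `< `. strip_tags() does not treat a `<` followed by whitespace as a tag, so the comment text survives. With the new order, the first call skips `<!--`, and strip_tags() still sees a `<!` declaration and drops it.

```php
<?php
// Hypothetical helpers: assumed preg_replace translations of the two
// eregi_replace calls quoted above.
function blank_to_space(string $text): string
{
    // "--", runs of {}();" characters, closing tags and CR/LF/TAB become a space
    return preg_replace('/--|[{}();"]+|<\/[a-z0-9]+>|[\r\n\t]+/i', ' ', $text);
}

function unbang(string $text): string
{
    // "<!X" becomes "<X" unless X is "-"
    return preg_replace('/(<)!([^-])/', '$1$2', $text);
}

$html = '<!-- #begintemplate="algo" -->';

// BEFORE: blanks first, so "<!--" turns into "<! " and then into "< ..."
$old = strip_tags(unbang(blank_to_space($html)));
// AFTER: unbang() skips "<!--", so strip_tags() still sees a "<!" declaration
$new = strip_tags(blank_to_space(unbang($html)));

echo "old: [$old]\n";
echo "new: [$new]\n";
```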

Again, sorry for my English... :D