PhpDig.net


uruloki 10-23-2003 10:52 PM

Problems with HTML comments <!-- -->
 
When I index pages with HTML comments like

<!-- #begintemplate="algo" -->

the spider replaces it with

< begintemplate algo >

This is a problem because I have comments that contain paths to the connection entries for my databases.
The regular expression in the robot functions that matches this kind of text is:

//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\1\2',$text);
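To see what this pattern does on its own, here is a minimal sketch. It assumes a current PHP, where eregi_replace() no longer exists (the ereg extension was removed in PHP 7.0), so it uses preg_replace() with the /i modifier as a stand-in. The function name strip_bang is made up for illustration and is not part of PhpDig.

```php
<?php
// Hypothetical helper (not a PhpDig function): a preg_replace() stand-in
// for the eregi_replace('(<)!([^-])','\1\2',$text) call quoted above.
function strip_bang($text)
{
    // Drop the "!" after "<" unless the next character is "-".
    return preg_replace('/(<)!([^-])/i', '$1$2', $text);
}

// A declaration such as <!DOCTYPE ...> loses its "!".
echo strip_bang('<!DOCTYPE html>'), "\n";   // <DOCTYPE html>

// An intact HTML comment starts with "<!-", so [^-] cannot match
// and the comment is left alone.
echo strip_bang('<!-- #begintemplate="algo" -->'), "\n";
```

Taken by itself the pattern does not touch a comment that still has its "--", so the mangled output presumably comes from something that runs before it.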

Sorry for my English, and thanks for any suggestions.

BYE


Rolandks 10-24-2003 12:47 AM

This is a problem with PHP > 4.3.2. The following should work as a possible solution. See this thread here:

Change ONLY this in robot_functions.php, line 160:
Code:

//replace any group of blank characters by an unique space
$text = ereg_replace("[[:blank:]]+"," ",strip_tags($text));

to
Code:

//remove every tag, including HTML comments (ungreedy match)
$text = preg_replace('/<.*>/U', '', $text);

It works with PHP 4.3.2 and PhpDig 1.6.2.
NO HTML comments are indexed!
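A small sketch of how the ungreedy pattern behaves, for anyone who wants to try it outside PhpDig. The function name strip_markup and its $dotAll flag are invented for this example. One thing to watch: "." in PCRE does not match a newline unless the /s modifier is set, so a comment that spans several lines is only removed if newlines were already turned into spaces earlier, or if /s is added.

```php
<?php
// Hypothetical helper (not a PhpDig function) wrapping the suggested
// preg_replace('/<.*>/U', '', $text) call.
function strip_markup($text, $dotAll = false)
{
    // /U makes ".*" ungreedy, so each match stops at the first ">".
    // /s (optional here) lets "." match newlines as well.
    $pattern = $dotAll ? '/<.*>/Us' : '/<.*>/U';
    return preg_replace($pattern, '', $text);
}

// Tags and single-line comments are both removed.
echo strip_markup('a <!-- #begintemplate="algo" --> b <b>c</b>'), "\n"; // "a  b c"

// A comment spanning two lines survives without /s ...
echo strip_markup("x <!-- line1\nline2 --> y"), "\n";

// ... and is removed when /s is added.
echo strip_markup("x <!-- line1\nline2 --> y", true), "\n";             // "x  y"
```

Because the match ends at the first ">", a comment that itself contains a ">" would only be removed up to that character.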


-Roland-

uruloki 10-24-2003 04:40 AM

What about that
 
The real problem I have is that when I index an internet domain the comments appear, but when I index the intranet domain it works well (no comments). We have PHP 4.3.2. I changed the order of the eregi_replace calls in robot_functions.php.

BEFORE:
//replace blank characters by spaces
$text = eregi_replace("--|[{}();\"]+|</[a-z0-9]+>|[\r\n\t]+",' ',$text);

//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\1\2',$text);

AFTER:
//f..k <!SOMETHING tags !!
$text = eregi_replace('(<)!([^-])','\1\2',$text);

//replace blank characters by spaces
$text = eregi_replace("--|[{}();\"]+|</[a-z0-9]+>|[\r\n\t]+",' ',$text);

I tested the change and it seems to work fine. I will reindex everything today and post another comment with the results.
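Here is a rough sketch of why the order of the two calls matters. It uses preg_replace() equivalents of the two eregi_replace() calls (an assumption, since eregi_replace() is gone from PHP 7.0 onward), and the function names are invented for the example.

```php
<?php
// Hypothetical helpers (not PhpDig functions) that run the same two
// replacements in the BEFORE order and in the AFTER order.
function clean_original_order($text)
{
    // 1. "--", brackets, quotes, closing tags and line breaks become spaces.
    $text = preg_replace('~--|[{}();"]+|</[a-z0-9]+>|[\r\n\t]+~i', ' ', $text);
    // 2. "<!" followed by anything but "-" loses its "!".
    return preg_replace('/(<)!([^-])/i', '$1$2', $text);
}

function clean_swapped_order($text)
{
    // Same two steps, swapped.
    $text = preg_replace('/(<)!([^-])/i', '$1$2', $text);
    return preg_replace('~--|[{}();"]+|</[a-z0-9]+>|[\r\n\t]+~i', ' ', $text);
}

$comment = '<!-- #begintemplate="algo" -->';

// BEFORE: the "--" is gone by the time step 2 runs, so "<!" is followed
// by a space and the "!" is stripped.
echo clean_original_order($comment), "\n"; // <  #begintemplate= algo   >

// AFTER: step 2 still sees "<!-" and leaves it alone; the "<!" survives.
echo clean_swapped_order($comment), "\n";  // <!  #begintemplate= algo   >
```

With the swapped order the text still begins with "<!", which presumably lets the later strip_tags() call treat it as markup instead of as plain text starting with "< ".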

Again, sorry for my English... :D


All times are GMT -8.