12-10-2004, 12:19 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
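If you set your encoding to iso-8859-1, then you should only crawl pages that use the same encoding. Replacing ereg_replace with str_replace is not advisable: str_replace only matches literal strings, so it cannot handle the character classes these patterns rely on. Try the test below.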
PHP Code:
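<?php
error_reporting(E_ALL);
$text = "abc 123 '\"._~@#$:&%/;,=- [abc] AÀÁÂÃÄÅ [123] ðþßµ";
define('PHPDIG_ENCODING','iso-8859-1');
// Characters treated as part of a word for this encoding
$phpdig_words_chars['iso-8859-1'] = '[:alnum:]ðþßµ';
$encoding = PHPDIG_ENCODING;
// Replace every run of characters that are neither word characters nor allowed punctuation with a space
$text = ereg_replace('[^'.$phpdig_words_chars[$encoding].' \'._~@#$:&%/;,=-]+',' ',$text);
// Strip punctuation that trails a word character at the end of the text or before the next word
$text = ereg_replace('(['.$phpdig_words_chars[$encoding].'])[\'._~@#$:&%/;,=-]+($|[[:space:]]$|[[:space:]]['.$phpdig_words_chars[$encoding].'])','\1\2',$text);
echo $text; // prints abc 123 ' ._~@#$:&%/;,=-  abc  A   123  ðþßµ
?>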
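To see why str_replace is not a drop-in substitute, here is a minimal sketch that passes the same pattern to a literal replace and to a regex replace. The sample string and the simplified pattern are made up for illustration and are not taken from PhpDig. It uses preg_replace as a stand-in because the ereg functions were removed in PHP 7, so treat it as an illustration of the difference rather than a replacement for the test above.

```php
<?php
// Hypothetical sample input and simplified pattern, for illustration only.
$text = "abc [123] def";

// str_replace() looks for the literal string "[^a-z ]+".
// That string does not occur in $text, so nothing is replaced.
$literal = str_replace('[^a-z ]+', ' ', $text);

// preg_replace() treats the same characters as a regular expression:
// every run of characters other than a-z or space becomes one space.
// (Assumption: preg_replace stands in here for ereg_replace.)
$regex = preg_replace('/[^a-z ]+/', ' ', $text);

var_dump($literal === $text); // the literal replace left the text untouched
var_dump($regex);             // the regex replace removed "[123]"
?>
```

With str_replace the bracket expression is just text to search for, so the cleanup silently does nothing.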
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.