01-15-2004, 01:25 PM   #23
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. Try using mb_eregi_replace in place of eregi_replace, but note that some of the PHP multi-byte functions are experimental.
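To give an idea of what the swap looks like, here is a minimal sketch rather than actual PhpDig code. It assumes the mbstring extension is loaded and that the text is UTF-8; the sample string and the replacement are made up for illustration.

```php
<?php
// Sketch only: assumes mbstring is available and the file is saved as UTF-8.
mb_regex_encoding('UTF-8');   // encoding used by the mb_ereg* family

// Made-up sample text mixing multi-byte characters with ASCII.
$text = '日本語 PHP と php';

// Case-insensitive replacement that treats the string as multi-byte.
$result = mb_eregi_replace('php', 'PhpDig', $text);

echo $result, "\n";
```

The call takes the same first three arguments as eregi_replace (pattern, replacement, string), so in most places it should be a drop-in change once the regex encoding has been set.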

As for a dictionary, you might try the following. In spider.php add:
PHP Code:
$my_dictionary = phpdigComWords("$relative_script_path/includes/my_dictionary.ext");
after the following:
PHP Code:
$common_words = phpdigComWords("$relative_script_path/includes/common_words.txt");
In robot_functions.php replace:
PHP Code:
if (strlen($key) > SMALL_WORDS_SIZE and strlen($key) <= MAX_WORDS_SIZE and !isset($common_words[$key]) and ereg('^[0-9a-zßðþ]',$key)) 
with the following:
PHP Code:
if (mb_strlen($key) > SMALL_WORDS_SIZE and mb_strlen($key) <= MAX_WORDS_SIZE and !isset($common_words[$key]) and isset($my_dictionary[$key]) and mb_ereg('^['.$phpdig_words_chars[PHPDIG_ENCODING].']',$key)) 
Also apply any other changes given in this thread and use multi-byte functions in place of their single-byte counterparts.
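If it helps to test the new condition outside of robot_functions.php, the same checks can be pulled into a small function. This is only a sketch: keep_keyword() is a hypothetical helper, the two size constants are stand-in values, and the arrays are assumed to be keyed by word, which is what the isset() checks above imply.

```php
<?php
// Stand-in values for illustration; PhpDig defines its own.
define('SMALL_WORDS_SIZE', 2);
define('MAX_WORDS_SIZE', 30);

// Sketch only: assumes mbstring is loaded and everything is UTF-8.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// Hypothetical helper mirroring the replaced if-condition.
// $word_chars plays the role of $phpdig_words_chars[PHPDIG_ENCODING].
function keep_keyword($key, $common_words, $my_dictionary, $word_chars) {
    $len = mb_strlen($key);   // length in characters, not bytes
    return $len > SMALL_WORDS_SIZE
        && $len <= MAX_WORDS_SIZE
        && !isset($common_words[$key])    // not a stop word
        && isset($my_dictionary[$key])    // present in the dictionary
        && mb_ereg('^[' . $word_chars . ']', $key) !== false;
}

// Made-up sample data, keyed by word.
$common_words  = array('the' => 1);
$my_dictionary = array('けんさく' => 1, 'the' => 1);
$word_chars    = '0-9a-zぁ-ん';

var_dump(keep_keyword('けんさく', $common_words, $my_dictionary, $word_chars));
```

Comparing the mb_ereg result against false avoids depending on whether it returns a match length or a boolean, which differs between PHP versions.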

The main thing, of course, is to make sure that everything that was treated as single-byte is now treated as multi-byte. The $phpdig_words_chars and $phpdig_string_subst variables may need to be handled differently too, so that their characters are seen as multi-byte rather than single-byte.
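A quick way to see why this matters is to compare the single-byte and multi-byte length functions on the same string. A minimal sketch, assuming mbstring is loaded and the file is saved as UTF-8 (the sample word is made up):

```php
<?php
// Sketch only: assumes mbstring is available and the source is UTF-8.
$key = '日本語';                      // made-up sample keyword

$bytes = strlen($key);               // counts bytes
$chars = mb_strlen($key, 'UTF-8');   // counts characters

echo "bytes: $bytes, characters: $chars\n";
```

With single-byte functions, a check like SMALL_WORDS_SIZE or MAX_WORDS_SIZE ends up measuring bytes, so a short multi-byte word can look much longer than it really is.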

PhpDig was originally written for single-byte use. In theory it could probably be converted to multi-byte use, but in practice that is going to take time and tweaking, and hopefully it works in the end.