Hi. Try using mb_eregi_replace in place of eregi_replace, but note that some of the PHP multi-byte functions are experimental.
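For illustration, here is a minimal sketch of what the swap looks like. It assumes the mbstring extension (with its regex support) is loaded, that your data and this file are UTF-8, and the sample text is made up for the example, not taken from PhpDig:

```php
<?php
// Assumption: mbstring with mbregex is available and everything is UTF-8.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// Made-up sample text mixing multi-byte and ASCII words.
$text = '東京 Tokyo TOKYO';

// Case-insensitive replace that works on characters rather than bytes.
// The old call would have been eregi_replace('tokyo', 'X', $text).
$result = mb_eregi_replace('tokyo', 'X', $text);

echo $result, "\n";
```

The multi-byte characters pass through untouched while both spellings of the ASCII word are replaced.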
As for a dictionary, you might try the following. In spider.php add:
PHP Code:
$my_dictionary = phpdigComWords("$relative_script_path/includes/my_dictionary.ext");
after the following:
PHP Code:
$common_words = phpdigComWords("$relative_script_path/includes/common_words.txt");
In robot_functions.php replace:
PHP Code:
if (strlen($key) > SMALL_WORDS_SIZE and strlen($key) <= MAX_WORDS_SIZE and !isset($common_words[$key]) and ereg('^[0-9a-zßðþ]',$key))
with the following:
PHP Code:
if (mb_strlen($key) > SMALL_WORDS_SIZE and mb_strlen($key) <= MAX_WORDS_SIZE and !isset($common_words[$key]) and isset($my_dictionary[$key]) and mb_ereg('^['.$phpdig_words_chars[PHPDIG_ENCODING].']',$key))
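To see what the changed condition does on its own, here is a rough standalone sketch. The function name keep_keyword, the size limits, the character class and the word lists are all hypothetical stand-ins for PhpDig's own constants and arrays, and it assumes mbstring with regex support plus UTF-8 data and a UTF-8 source file:

```php
<?php
// Assumption: mbstring with mbregex is available and everything is UTF-8.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// Hypothetical stand-ins for SMALL_WORDS_SIZE, MAX_WORDS_SIZE,
// $phpdig_words_chars[PHPDIG_ENCODING], $common_words and $my_dictionary.
$small = 2;
$max = 30;
$chars = '0-9a-zßäöü';
$common_words = array('und' => 1);
$my_dictionary = array('über' => 1, 'straße' => 1, 'und' => 1);

// Same tests as the replaced line: length counted in characters,
// not a common word, present in the dictionary, allowed first character.
function keep_keyword($key, $small, $max, $common, $dict, $chars)
{
    $len = mb_strlen($key);
    return $len > $small
        && $len <= $max
        && !isset($common[$key])
        && isset($dict[$key])
        && mb_ereg('^[' . $chars . ']', $key) !== false;
}

// strlen counts bytes, mb_strlen counts characters.
$bytes = strlen('über');
$letters = mb_strlen('über');
```

With these made-up lists, 'über' is kept, 'und' is dropped as a common word, and 'haus' is dropped because it is not in the dictionary.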
Also apply any other changes given in this thread and use multi-byte functions in place of their single-byte counterparts.
The thing is, of course, to make sure that everything that was treated as single-byte is now treated as multi-byte. The $phpdig_words_chars and $phpdig_string_subst variables may need to be treated differently too, so that their characters are seen as multi-byte rather than single-byte.
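As an example of the kind of problem to look for, here is a small sketch of a character substitution going wrong byte by byte. It assumes the substitution is done with strtr and two parallel strings, which is only a guess at how a table like $phpdig_string_subst might be applied, so check the actual code. The sample characters are made up, and the file is assumed to be saved as UTF-8:

```php
<?php
// Assumption: this file and the data are UTF-8, so 'é' and 'è' are two bytes each.
$word = 'café';

// Three-argument strtr maps byte to byte, so the two bytes of 'é'
// are translated separately and the result is not 'cafe'.
$bytewise = strtr($word, 'éè', 'ee');

// The array form replaces whole substrings, so each multi-byte
// character is matched as a unit.
$safe = strtr($word, array('é' => 'e', 'è' => 'e'));

echo $bytewise, "\n", $safe, "\n";
```

If the real code works the first way, rewriting the table as an array of whole characters is one possible approach.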
PhpDig was originally written for single-byte use. In theory it looks like it could be converted to multi-byte use, but in practice it's going to take time and tweaking before, hopefully, it works.