Hi. Thank you for the help, I'll test the code you submitted. Though it might not work on some rare cases (when ¤ is actually the character before ¢¤) I think I'll go for it.
BTW, other search engines are based on a dictionnary for mutli-byte encodings. The dictionnary is a txt file that contains a word per line. The script extract the longest matching word from the page text and index it.
My question is: Would it be possible to implement such a dictionnary tool in phpdig?
If so, I would be happy to build a Japanese dictionnary.
|