Old 01-15-2004, 08:26 AM   #22
Edomondo
Orange Mole
 
 
Join Date: Jan 2004
Location: In outer space
Posts: 37
Quote:
Originally posted by Charter
Hi. There is a ¤¢¤ combo in the $string variable where ¢¤ is not replaced with a space. Did you mean something else?
Errrr... I'm not sure I understand.
The script you submitted uses a regular expression to avoid replacing ¢¤ when the character before it is ¤, right?
What I meant is that when the character before ¢¤ really is a complete multi-byte character ending in ¤, the ¢¤ is not replaced either. But I think this is unlikely to happen.
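To make the corner case concrete, here is a minimal sketch in Python (not the PHP script from this thread). It assumes two-byte characters and takes the byte values straight from how the characters display here (¤ as 0xA4, ¢ as 0xA2 in Latin-1); the real encoding and the real pattern may differ, and the function name is made up for the example.

```python
import re

# Assumed byte values, read off the Latin-1 display of the post:
# "¢¤" -> 0xA2 0xA4, and the guard byte "¤" -> 0xA4.
# Negative lookbehind: do not match 0xA2 0xA4 when the previous byte is 0xA4.
PATTERN = re.compile(rb"(?<!\xa4)\xa2\xa4")

def replace_with_space(data: bytes) -> bytes:
    """Replace the two-byte sequence with a space unless 0xA4 precedes it."""
    return PATTERN.sub(b" ", data)

# Normal case: a character ending in 0xA1, then the target. It is replaced.
normal = replace_with_space(b"\xb0\xa1\xa2\xa4")

# Protected case: two characters 0xA4 0xA2 | 0xA4 0xA2. The bytes 0xA2 0xA4
# only appear across the character boundary, so leaving them alone is right.
straddle = replace_with_space(b"\xa4\xa2\xa4\xa2")

# Corner case: a complete character 0xB0 0xA4, then a real target character.
# The lookbehind sees 0xA4 and skips it, so the target is NOT replaced.
missed = replace_with_space(b"\xb0\xa4\xa2\xa4")
```

A lookbehind on one byte cannot tell the last two cases apart. To separate them the script would have to walk the string character by character instead of byte by byte.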

Quote:
Originally posted by Charter
>> The script extracts the longest matching word from the page text and indexes it.

With the multi-byte dictionary, is it that only the longest matching word from a page gets indexed?
No, of course not. It extracts all the words, comparing the page content against the longest dictionary words first. For example, in English it wouldn't extract "nation" from "internationalization" if "internationalization" is in the dictionary.
But the dictionary must be as complete as possible for this to do a good job.
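Roughly, the longest-first idea looks like this. It is only a sketch in Python to show the principle, not the actual script; the function name and the way matched text is blanked out are assumptions made for the example.

```python
def extract_words(text, dictionary):
    """Return dictionary words found in text, trying the longest words first."""
    found = []
    remaining = text
    # Longest entries first, so a long word is taken before any shorter
    # word that happens to be contained in it.
    for word in sorted(dictionary, key=len, reverse=True):
        if not word:
            continue  # ignore empty dictionary entries
        count = remaining.count(word)
        if count:
            found.extend([word] * count)
            # Blank out the matched text so shorter words inside it
            # are not extracted again.
            remaining = remaining.replace(word, " ")
    return found

# "nation" is not extracted because the longer word is matched first.
both = extract_words("internationalization", {"nation", "internationalization"})

# With an incomplete dictionary, the shorter word is picked up instead.
short_only = extract_words("internationalization", {"nation"})
```

The second call shows why the dictionary has to be complete: when the long word is missing, the text gets cut into whatever shorter words happen to match. Note that the result comes back ordered by word length, not by position in the text.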
Can it be integrated into phpdig?