Hi Charter. Thanks for your reply, but I still don't understand it (I don't have much experience...)
Correct me if I say something wrong... The Delta character has decimal value 196, so the phpdig spider reads 196 from the ISO-8859-7 encoded HTML page, writes it as 196 in the text file, and then converts it to the corresponding Latin character (below 127) according to the phpdig_string_subst table. I cannot understand why you do not store the 196 character in the MySQL table...
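To make clear what I mean, here is a minimal Python sketch of the steps as I understand them. It is not phpdig code: the `GREEK_TO_LATIN` table and the `decode_greek` and `transliterate` functions are hypothetical names I made up, standing in for whatever phpdig really does with phpdig_string_subst.

```python
# Hypothetical stand-in for a Greek-to-Latin substitution table.
# This is NOT phpdig's actual phpdig_string_subst data.
GREEK_TO_LATIN = {"Δ": "D", "δ": "d"}


def decode_greek(raw: bytes) -> str:
    """Decode bytes read from an ISO-8859-7 page into text."""
    return raw.decode("iso8859_7")


def transliterate(text: str) -> str:
    """Replace each character that has a table entry; leave the rest unchanged."""
    return "".join(GREEK_TO_LATIN.get(ch, ch) for ch in text)


raw = bytes([196])           # the single byte with decimal value 196 (0xC4)
text = decode_greek(raw)     # Greek capital Delta in ISO-8859-7
latin = transliterate(text)  # mapped to a Latin character below 127
stored_as_is = raw           # what keeping the byte unconverted would look like
```

What I am asking for is the last line: keep the 196 byte as it is instead of running the `transliterate` step.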
I want to index only ISO-8859-7 pages, so I am not interested in other encodings. In the text_content directory I can read the txt files perfectly, but when the Greek-to-Latin conversion takes place something goes wrong. I have tried many combinations of the phpdig_string_subst and phpdig_words_chars variables, but the results are still not good.
So I have concluded that the only solution is to bypass the Greek-to-Latin conversion. Can you help me with this? (I cannot easily find this conversion part in the phpdig code.)