Hi. Thanks. When PhpDig spiders an ISO-8859-7 page, it sees characters like the following:
Code:
english test spider åëëçíéêÜ ôåóô áñÜ÷íç áâãäåæçèéêëìíïðñóôõö÷øù áâã
Because PhpDig currently supports only ISO-8859-1 and ISO-8859-2, it does not know how to convert the above (extended) ASCII characters to the following characters, which are what the browser displays:
Code:
english test spider ελληνικά τεστ αράχνη αβγδεζηθικλμνοπρστυφχψω αβγ
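To make the difference concrete, here is a small sketch (not PhpDig code) that decodes the same three bytes twice with PHP's iconv() function. It assumes the iconv extension is available; the variable names are only for illustration.

PHP Code:
```php
<?php
// The same three bytes, as they would appear in an ISO-8859-7 page.
$bytes = "\xE1\xE2\xE3";

// Decoded with the page's real encoding, they are Greek letters.
$asGreek = iconv('ISO-8859-7', 'UTF-8', $bytes);

// Decoded as ISO-8859-1, they come out as accented Latin letters.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $bytes);

echo $asGreek, "\n";   // αβγ
echo $asLatin1, "\n";  // áâã
```

The bytes never change; only the table used to interpret them does.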
The $phpdig_string_subst and $phpdig_words_chars variables can be used to set up another ISO-8859 encoding, but only if the language's characters can be mapped one-to-one to Latin counterparts.
Of course, this one-to-one mapping cannot be done for a variety of languages, so PhpDig does not convert those languages correctly.
Just as a test, if you are using PhpDig on ISO-8859-7 pages only, set the following in the config.php file and then do a crawl:
PHP Code:
define('PHPDIG_ENCODING','iso-8859-7');
// give functions something trivial to do
$phpdig_string_subst['iso-8859-7'] = 'A:A,a:a';
// remove word wrapping in the below line
$phpdig_words_chars['iso-8859-7'] = '[:alnum:]µ¶¸¹º¼¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ';
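If your editor saves config.php as UTF-8, each of the pasted high characters becomes two bytes and the list no longer matches the single bytes in the pages. A possible workaround, as a sketch only: build the same list from byte values with chr(). This assumes the list is meant to cover 0xB5, 0xB6, 0xB8 to 0xBA, 0xBC, 0xBE, 0xBF and the whole 0xC0 to 0xFF range; $highBytes is a name made up for the example.

PHP Code:
```php
<?php
// Byte values assumed from the pasted character list.
$highBytes = array_merge(
    array(0xB5, 0xB6, 0xB8, 0xB9, 0xBA, 0xBC, 0xBE, 0xBF),
    range(0xC0, 0xFF)
);

// chr() yields exactly one byte per value, whatever encoding this file is saved in.
$phpdig_words_chars['iso-8859-7'] = '[:alnum:]' . implode('', array_map('chr', $highBytes));
```

Use either this or the pasted line, not both.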
With ISO-8859-7 set, the browser should pass the search query to PhpDig as (extended) ASCII characters. The other thing to check is how the MySQL Client characterset and Server characterset are set.
This can be done via the shell: go to the MySQL prompt, type status, and MySQL will output the info. What are your Client characterset and Server characterset set to?
If you are not able to check the setting of Client characterset and Server characterset, then take a look at the new table entries via phpMyAdmin after doing a crawl with the above changes. Are the words and characters stored as (extended) ASCII? Also, how are the new search results?
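If it is hard to tell by eye in phpMyAdmin, a rough check like the one below can help. This is a hypothetical helper, not part of PhpDig: it looks at the raw bytes of a word and guesses whether they are plain ASCII, UTF-8, or single-byte (extended ASCII). It is only a heuristic, since a short single-byte string can happen to be valid UTF-8.

PHP Code:
```php
<?php
/**
 * Hypothetical helper (not part of PhpDig): roughly classify how a word
 * copied from the database is encoded.
 */
function classify_bytes($word)
{
    // No byte above 0x7F means plain 7-bit ASCII.
    if (!preg_match('/[\x80-\xFF]/', $word)) {
        return 'plain ASCII';
    }
    // With the /u modifier, preg_match() fails on invalid UTF-8.
    if (preg_match('//u', $word) === 1) {
        return 'UTF-8 (multi-byte)';
    }
    return 'single-byte (extended ASCII)';
}

echo classify_bytes("spider"), "\n";                   // plain ASCII
echo classify_bytes("\xE1\xE2\xE3"), "\n";             // single-byte (extended ASCII)
echo classify_bytes("\xCE\xB1\xCE\xB2\xCE\xB3"), "\n"; // UTF-8 (multi-byte)
```

For the test crawl above, the Greek words should come back as single-byte.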