PhpDig.net


mkst 10-08-2003 05:43 AM

iso-8859-7
 
Hello there!
I would like to know how I can change the encoding to iso-8859-7 (Greek). I read the documentation but could not understand how to set the

$phpdig_string_subst['iso-8859-7'] and

$phpdig_words_chars['iso-8859-7'] values.

Any help please??

Rolandks 10-08-2003 12:51 PM

You must define all of the characters in this string:
$phpdig_string_subst['iso-8859-7'] ='......here is iso-8859-7 chr ...........'

see:
http://www.softlab.ntua.gr/~sivann/xgrk/iso8859-7.html

and set:
define('PHPDIG_ENCODING','iso-8859-7');

Perhaps you can find the complete string, already on one line, with Google?

-Roland-
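
Putting the two steps from this reply together, a minimal sketch of the relevant settings might look like the block below. The substitution value is an incomplete placeholder borrowed from the Alpha example given later in this thread, not a verified Greek mapping. The bytes are written as \xHH escapes in double-quoted strings, which is an assumption made here so the values do not depend on the encoding the config file is saved in.

```php
<?php
// Sketch of the three iso-8859-7 settings discussed in this thread.
// The values are incomplete placeholders, not a verified Greek mapping.

// Select which encoding entry PhpDig should use.
define('PHPDIG_ENCODING', 'iso-8859-7');

// Latin key, a colon, then the iso-8859-7 bytes folded into that key.
// B6 and C1 are capital Alpha with and without tonos; DC and E1 are
// the matching lowercase forms.
$phpdig_string_subst['iso-8859-7'] = "A:\xB6\xC1,a:\xDC\xE1";

// Word characters: the POSIX class name [:alnum:], with any extra
// word-character bytes appended directly after it.
$phpdig_words_chars['iso-8859-7'] = "[:alnum:]";
```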

mkst 10-09-2003 01:19 AM

Thanks for your reply Rolandks! :)

OK, I think I got it...
What about the:

$phpdig_words_chars['iso-8859-2'] = '[:alnum:]ðþß';

What is it used for? Will I have to change it?

Regards,
Mike

Charter 10-09-2003 04:56 PM

Hi. The $phpdig_words_chars['iso-8859-2'] = '[:alnum:]ðþß'; setting is for non-accented 'lowercase' letters such as the German ß (pronounced 'ess set' if I remember correctly). Roughly speaking, anything that doesn't go in $phpdig_string_subst['iso-8859-2'] might go in $phpdig_words_chars['iso-8859-2']. Once you get your 'iso-8859-7' set working, please post it in the Mod Submissions forum in case others might want to use it. Thanks. :)

mkst 11-26-2003 08:04 AM

Unfortunately I cannot make it work. :cry: :cry:

I have used something like:

$phpdig_string_subst['iso-8859-7'] = 'Á:¢,Å:¸,Ç:¹,É:ºÚ,Ï:¼,Õ:¾,Ù:¿,Ü:á,å:Ý,ç:Þ,é:ßúÀ,ï :ü,õ:ýû*,ù:þ';

I have changed the encoding to: define ('PHPDIG_ENCODING','iso-8859-7');

I think that the problem is with the $phpdig_words_chars['iso-8859-1']='[:alnum:]ðþß' string. What letters do I put within the [::] characters and what letters after them?

The script searches some of the English pages that I have on the site, but does not search any Greek pages. The 'keywords' table only contains English words.

I would really appreciate some help!
PS. I am using version 1.6.2.

Charter 11-26-2003 10:06 AM

Hi. I found the below ASCII representation of iso-8859-7 at http://www.gar.no/home/mats/8859-7.htm.
Code:

80-9F: unassigned
// note A0 is a space
A0-BF: _¡¢£¤¥¦§¨©ª«¬_®¯°±²³´µ¶·¸¹º»¼½¾¿
C0-DF: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
E0-FF: *áâãäåæçèéêëì*îïðñòóôõö÷øùúûüýþÿ
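
To check which Greek letter a given byte in that table stands for, one option is to convert the byte to UTF-8. This is a minimal sketch, assuming PHP's iconv extension is available; greek_letter() is a hypothetical helper name and is not part of PhpDig.

```php
<?php
// Hypothetical helper: decode one iso-8859-7 byte to a UTF-8 character.
// Returns null when iconv cannot convert the byte (unassigned positions).
function greek_letter(int $byte): ?string
{
    $utf8 = @iconv('ISO-8859-7', 'UTF-8', chr($byte));
    return $utf8 === false ? null : $utf8;
}

// The four byte values used in the Alpha examples of this post.
foreach ([0xB6, 0xC1, 0xDC, 0xE1] as $byte) {
    printf("%02X => %s\n", $byte, greek_letter($byte) ?? '(unassigned)');
}
```

Viewing the output in a UTF-8 terminal shows the real Greek letters rather than the Latin-1 look-alikes that appear when iso-8859-7 bytes are pasted into a page with a different encoding.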

When making $phpdig_string_subst['iso-8859-7'], it's like making a key value set. For example, if the Latin A is like the Greek Ά (hex B6) then the $phpdig_string_subst['iso-8859-7'] variable would start like the following:
PHP Code:

$phpdig_string_subst['iso-8859-7'] = 'A:¶'

If Greek uses Á (hex C1) also like the Latin A, then $phpdig_string_subst['iso-8859-7'] would start like the following:
PHP Code:

$phpdig_string_subst['iso-8859-7'] = 'A:¶Á'

The same type of thing goes for Latin a. If Greek uses ά (hex DC) and á (hex E1) like the Latin a, then $phpdig_string_subst['iso-8859-7'] would start like the following:
PHP Code:

$phpdig_string_subst['iso-8859-7'] = 'A:¶Á,a:Üá'

The $phpdig_string_subst['iso-8859-7'] variable is for all accented or diacritic characters (basically all accented characters and those characters that do not copy paste into ASCII as the characters themselves but rather copy paste as ASCII representations of the characters).
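
The key value idea described above can be sketched as follows. This is an illustration of the format only, under the assumption that each entry is "Latin key, colon, bytes to fold"; build_subst_table() is a hypothetical helper and is not PhpDig's own code.

```php
<?php
// Illustrative sketch only; build_subst_table() is a hypothetical helper.

// Turn 'A:<bytes>,a:<bytes>' into a lookup table of byte => Latin key.
function build_subst_table(string $subst): array
{
    $table = [];
    foreach (explode(',', $subst) as $pair) {
        $parts = explode(':', $pair, 2);
        if (count($parts) < 2 || $parts[1] === '') {
            continue; // skip malformed or empty entries
        }
        foreach (str_split($parts[1]) as $byte) {
            $table[$byte] = $parts[0];
        }
    }
    return $table;
}

// The 'A:...,a:...' example from the post, written as iso-8859-7 byte escapes.
$subst = "A:\xB6\xC1,a:\xDC\xE1";
$table = build_subst_table($subst);

// strtr() with an array replaces every listed byte with its Latin key.
$folded = strtr("\xC1\xE1\xB6\xDC", $table);
echo $folded, "\n"; // AaAa
```

Each byte should appear under exactly one key; in this sketch a byte listed twice would simply take the last key that mentions it.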

The $phpdig_words_chars['iso-8859-7'] variable is for lowercase non-accented characters (basically those lowercase non-accented characters that copy paste into ASCII as the characters themselves). An example of this would be Greek µ, so it could be added to $phpdig_words_chars['iso-8859-7'] like so:
PHP Code:

$phpdig_words_chars['iso-8859-7'] = '[:alnum:]ðþßµ'

Note that it is possible to have an ASCII representation of a character be in $phpdig_string_subst['iso-8859-7'] and also have the ASCII character itself be in $phpdig_words_chars['iso-8859-7'].
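
For context, [:alnum:] is the name of a POSIX character class (letters and digits), so nothing goes inside it; extra characters are appended after it. How PhpDig applies the value internally is not shown in this thread, so the sketch below is an assumption: it drops the value into a bracket expression and uses preg_split to show how the listed bytes decide what counts as part of a word. split_words() is a hypothetical helper.

```php
<?php
// Sketch only; split_words() is a hypothetical helper, not PhpDig code.
function split_words(string $text, string $wordChars): array
{
    // $wordChars is placed inside a negated bracket expression, so any
    // run of bytes NOT listed there acts as a word separator.
    $pattern = '/[^' . $wordChars . ']+/';
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Bytes EB DC E4 E9 are four Greek letters in iso-8859-7; "oil" is ASCII.
$text = "\xEB\xDC\xE4\xE9 oil";

// With [:alnum:] alone (default "C" locale) the Greek bytes are separators.
$asciiOnly = split_words($text, '[:alnum:]');

// Listing the bytes after [:alnum:] makes them part of a word.
$withGreek = split_words($text, "[:alnum:]\xEB\xDC\xE4\xE9");

echo count($asciiOnly), ' ', count($withGreek), "\n"; // 1 2
```

If bytes are neither folded to Latin letters nor listed as word characters, a tokenizer of this kind drops them, which would be consistent with a keywords table that only contains English words.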

mkst 11-27-2003 05:21 AM

Thanks for your reply Charter!
...but I am still confused!! :confused: :confused: :confused: :confused:

Quote:

Originally posted by Charter
For example, if the Latin A is like the Greek Ά (hex B6) then the $phpdig_string_subst['iso-8859-7'] variable would start like the following:
PHP Code:

$phpdig_string_subst['iso-8859-7'] = 'A:¶'

If Greek uses A (hex C1) also like the Latin A, then $phpdig_string_subst['iso-8859-7'] would start like the following:
PHP Code:

$phpdig_string_subst['iso-8859-7'] = 'A:¶A'

.......
The $phpdig_words_chars['iso-8859-7'] variable is for lowercase non-accented characters (basically those lowercase non-accented characters that copy paste into ASCII as the characters themselves). An example of this would be Greek µ.....

What exactly do you mean by 'is like'? I know that the Latin capital A looks like the Greek capital Á, but this is not the case for the lowercase letters or some other capital letters.

And what exactly do you mean by '(basically those lowercase non-accented characters that copy paste into ASCII as the characters themselves)' ?

I have tried something like this:
PHP Code:

$phpdig_string_subst['iso-8859-7'] = 'A:¶A,a:Üá,E:Ÿ,e:åÝ,H:ǹ,h:çÞ,I:ɺÚ,i:éßúÀ,O:ϼ,o:ïü,Y:Õ¾Û,y:õýû*,L:Ë,l:ë,N:Í,n:*,V:Ù,v:ùþ,M:Ì,m:ì,P:Ð,p:ð,X:×,x:÷,K:Ê,k:ê,B:Â,b:â,C:Ø,c:ø,G:Ã,g:ã,D:Ä,d:ä,Z:Æ,z:æ,U:È,u:è,K:Ê,k:ê,J:Î,j:î,R:Ñ,r:ñ,S:Ó,s:óò,T:Ô,t:ô,F:Ö,f:ö'

and
PHP Code:

$phpdig_words_chars['iso-8859-7'] = '[:alnum:]ðþßìòñôèóäöãîêëæ÷øâ*ð'

I have also tried different variations of the above but still could not make it work correctly.

The engine indexes the site all right but only recognizes and prints results for part of the keyword.
Also, the 'keywords' table contains words with Latin letters only. I guess this is all right?

Thank you for your time Charter, and I hope I am not too much trouble :angel:

Charter 11-27-2003 07:10 AM

Hi. I'll use a German word as an example of what I mean by the 'is like' phrase. The German word Gästebuch means Guestbook. The ä in Gästebuch 'is like' the Latin a. Such characters like ä are stored as their Latin counterparts in the database for searching. When you copy paste a character into a text editor, it will either show up as the character or some ASCII equivalent of the character. The characters that show up as the actual character are the ones that go in $phpdig_words_chars['iso-8859-7'] but no accented characters should go in $phpdig_words_chars['iso-8859-7']. All accented or diacritic characters should go in $phpdig_string_subst['iso-8859-7'].
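
The Gästebuch example can be sketched in a few lines. This is an illustration under the assumption that the page is iso-8859-1 and that folding is a simple byte-for-byte replacement; fold_word() is a hypothetical helper, not PhpDig code.

```php
<?php
// Sketch only; fold_word() is a hypothetical helper, not PhpDig code.
// iso-8859-1 bytes: E4 is a-umlaut, F6 is o-umlaut, FC is u-umlaut.
function fold_word(string $word): string
{
    return strtr($word, "\xE4\xF6\xFC", "aou");
}

$indexed = fold_word("G\xE4stebuch"); // what would be stored for searching
$query   = fold_word("Gastebuch");     // what a visitor might type

echo $indexed, "\n";                   // Gastebuch
var_dump($indexed === $query);         // bool(true)
```

Because both the indexed word and the query pass through the same folding, a search matches whether or not the visitor types the accent.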

mkst 11-28-2003 05:11 AM

Thank you for your reply Charter. It seems that I managed to create the right $phpdig_string_subst and $phpdig_words_chars.

However, I still have one problem regarding words that start with a capital letter. I can only find words that start with certain capital letters; otherwise I get zero matches. The search works OK for lowercase words.

Do you have any idea why this is happening?

Regards,
Mike

Charter 11-28-2003 05:17 AM

Hi. What are $phpdig_string_subst['iso-8859-7'] and $phpdig_words_chars['iso-8859-7'] currently set to? What capital letters are not working? Maybe there is a mismatched key value type pairing.
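
One way to look for a missing or mismatched pairing is to list which bytes in a range are not covered by any key. This is a minimal sketch; uncovered_bytes() is a hypothetical helper, and the sample string is a shortened placeholder rather than anyone's real setting.

```php
<?php
// Hypothetical helper, not part of PhpDig: return the hex codes of all
// bytes in [$from, $to] that no key in the substitution string covers.
function uncovered_bytes(string $subst, int $from, int $to): array
{
    $covered = [];
    foreach (explode(',', $subst) as $pair) {
        $parts = explode(':', $pair, 2);
        if (count($parts) < 2 || $parts[1] === '') {
            continue; // skip malformed or empty entries
        }
        foreach (str_split($parts[1]) as $byte) {
            $covered[ord($byte)] = true;
        }
    }

    $missing = [];
    for ($b = $from; $b <= $to; $b++) {
        if (!isset($covered[$b])) {
            $missing[] = sprintf('%02X', $b);
        }
    }
    return $missing;
}

// Placeholder string covering only Alpha and Beta.
$subst = "A:\xB6\xC1,a:\xDC\xE1,B:\xC2";

// Check the first four capital-letter positions, C1 through C4.
print_r(uncovered_bytes($subst, 0xC1, 0xC4)); // lists C3 and C4
```

Running the same check over the full capital range of the real setting would show whether the capitals that fail in searches are simply absent from the string.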

mkst 11-28-2003 05:25 AM

PHP Code:

$phpdig_string_subst['iso-8859-7'] = 'A:Á¶,a:Üá,B:Â,b:â,G:Ã,g:ã,D:Ä,d:ä,E:Ÿ,e:åÝ,Z:Æ,z:æ,H:ǹ,h:çÞ,U:È,u:è,I:ɺÚ,i:éßúÀ,K:Ê,k:ê,L:Ë,l:ë,M:Ì,m:ì,N:Í,n:*,J:Î,j:î,O:ϼ,o:ïü,P:Ð,p:ð,R:Ñ,r:ñ,S:Ó,s:óò,T:Ô,t:ô,Y:Õ¾Û,y:õýû*,F:Ö,f:ö,X:×,x:÷,C:Ø,c:ø,V:Ù,v:ùþ'

and
PHP Code:

$phpdig_words_chars['iso-8859-7'] = '[:alnum:]áâãäåæçèéêëì*îïðñóôõö÷øù'

I have double-checked for typing errors and don't think that this is the case.
Words starting with Á, ¶, Ð, Ì have no problem.

Charter 11-28-2003 05:51 AM

Hi. Of áâãäåæçèéêëì*îïðñóôõö÷øù the only ones that should be in the $phpdig_words_chars['iso-8859-7'] variable are æçðø like so:
PHP Code:

$phpdig_words_chars['iso-8859-7'] = '[:alnum:]æçðø'

These áâãäåèéêëì*îïñóôõö÷ù are accented/diacritic characters and need to be matched up to their Latin counterparts in $phpdig_string_subst['iso-8859-7'].

mkst 11-28-2003 06:39 AM

Thanks Charter, but there is no improvement. :(
It is now worse than before....

Charter 11-28-2003 09:37 AM

Hi. I am not very familiar with the Greek alphabet beyond mathematical usage. Below is what I came up with assuming that Latin A is like Greek Alpha, Latin a is like Greek alpha, and so forth. I make no claims of correctness. ;)
PHP Code:

$phpdig_string_subst['iso-8859-7'] = 'A:¶Á,a:Üá,B:Â,G:Ã,g:ã,D:Ä,d:ä,E:¸Å,e:Ýå,Z:Æ,z:æ,I:ºÉÚ,i:Àßéú,K:Ê,k:ê,L:Ë,l:ë,M:Ì,N:Í,n:*,X:Î,x:î,O:¼Ï,o:ïü,P:Ð,p:ð,R:Ñ,r:ñ,S:Ó,s:òó,T:Ô,t:ô,Y:¾ÕÛ,y:*õûý';

$phpdig_words_chars['iso-8859-7'] = '[:alnum:]ßµ'

I was not sure what to do with the following characters: Eta, eta, Theta, theta, Phi, phi, Chi, chi, Psi, psi, Omega, omega.

I also made the following assumptions: Latin G is like Greek Gamma, Latin g is like Greek gamma, Latin R is like Greek Rho, Latin r is like Greek rho, Latin Y is like Greek Upsilon, Latin y is like Greek upsilon.

As I am not very familiar with the Greek language, this is the best that I can offer. :(

mitsoskitsos 12-24-2003 01:41 AM

Hi.
I am also trying to index Greek pages with encoding 8859-7 and I have some problems.
I think that the origin of the problem is that Greek characters are converted to Latin and then put in the keywords table.
Why is it necessary to convert the Greek characters to Latin?
I think that the engine would work much better and more accurately without this conversion.
Is there a hack that I could apply so Greek characters won't be converted to Latin?


All times are GMT -8.
