PhpDig.net


07-28-2005, 09:45 AM   #1
pascalp
Green Mole
 
Join Date: Jul 2005
Posts: 14
Capitals and accents

Hi,

I work with 'iso-8859-1' encoding.
Is it possible to configure a 'match case' option?

For instance, when I search for "Truck", I would like to find only pages containing "Truck", not "truck".

I have the same question for accents.

Thank you for your answer.
07-29-2005, 09:08 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
This is untested, so make a backup first, then try running the following query:
Code:
ALTER TABLE keywords MODIFY keyword VARCHAR(64) BINARY;
Then also try replacing eregi with ereg in the search_function.php file.
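For anyone reading this later: ereg and eregi were deprecated in PHP 5.3 and removed in PHP 7.0. A minimal sketch of the same swap with preg_match, assuming PCRE is available: the /i modifier plays the role of eregi, and leaving it off gives ereg's case-sensitive behaviour. The keyword_matches helper is hypothetical, not PhpDig code.

```php
<?php
// Hypothetical helper (not part of PhpDig): does $keyword occur in $text?
// $caseSensitive = false mimics eregi(); true mimics ereg().
function keyword_matches(string $keyword, string $text, bool $caseSensitive): bool
{
    // preg_quote() escapes regex metacharacters in the search term;
    // the second argument also escapes the '/' delimiter.
    $pattern = '/' . preg_quote($keyword, '/') . '/' . ($caseSensitive ? '' : 'i');
    return preg_match($pattern, $text) === 1;
}

$page = 'We sell every kind of truck.';
var_dump(keyword_matches('Truck', $page, false)); // bool(true)  (eregi-style)
var_dump(keyword_matches('Truck', $page, true));  // bool(false) (ereg-style)
```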
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
07-30-2005, 01:21 AM   #3
pascalp
Green Mole
 
Join Date: Jul 2005
Posts: 14
That might be part of the problem,
but the issue may also come from indexing, because the keywords stored in MySQL don't contain any capitals or accents.

How would you suggest indexing while keeping accents and capitals?

Thanks
07-30-2005, 07:12 AM   #4
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Try the following in the config file and run a test index:
Code:
define('PHPDIG_ENCODING','iso-8859-1');
$phpdig_string_subst['iso-8859-1'] = 'Q:Q,q:q';
$phpdig_words_chars['iso-8859-1'] = '[:alnum:]ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿµ';
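The $phpdig_words_chars value lists which characters count as part of a word; everything else acts as a separator. A rough sketch of that idea, assuming the list is used as a regular-expression character class (as its [:alnum:] prefix suggests) and that the text is UTF-8 rather than raw ISO-8859-1 bytes. split_words is a hypothetical helper, not PhpDig's own function, and it keeps only the letters from the list (× and ÷ are left out).

```php
<?php
// Word characters: ASCII letters/digits plus the accented letters from the
// config line above, written here as UTF-8 (an assumption for this sketch).
const WORD_CHARS = '[:alnum:]ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿµ';

// Hypothetical tokenizer: any run of characters outside the class is a separator.
// The /u modifier makes PCRE treat pattern and subject as UTF-8, so each
// accented letter stays a single character instead of two stray bytes.
function split_words(string $text): array
{
    $parts = preg_split('/[^' . WORD_CHARS . ']+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

echo implode('|', split_words('Le camion Rouge, à Paris!')), "\n"; // Le|camion|Rouge|à|Paris
```

Because capitals and accents survive tokenizing here, whether they end up in the keywords table depends on what later steps (lowercasing, accent stripping) do to each word.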
PhpDig 1.8.7 strips accents from characters by default. Adjusting PHPDIG_ENCODING, $phpdig_string_subst, and $phpdig_words_chars may let accents through, but if you meet its requirements, you might want to try PhpDig 1.8.8 RC1, as it supports more encodings:
Code:
cp037   cp856  cp875        iso-8859-4  symbol           windows-1257
cp1006  cp857  gsm0338      iso-8859-5  turkish          windows-1258
cp1026  cp860  iso-8859-1   iso-8859-6  us-ascii         x-mac-ce
cp424   cp861  iso-8859-10  iso-8859-7  us-ascii-quotes  x-mac-cyrillic
cp437   cp862  iso-8859-11  iso-8859-8  windows-1250     x-mac-greek
cp500   cp863  iso-8859-13  iso-8859-9  windows-1251     x-mac-icelandic
cp737   cp864  iso-8859-14  koi8-r      windows-1252     x-mac-roman
cp775   cp865  iso-8859-15  koi8-u      windows-1253     zdingbat
cp850   cp866  iso-8859-16  mazovia     windows-1254
cp852   cp869  iso-8859-2   nextstep    windows-1255
cp855   cp874  iso-8859-3   stdenc      windows-1256

ucs-4      utf-16le      byte2be         euc-tw
ucs-4be    utf-7         byte2le         cp950
ucs-4le    utf7-imap     byte4be         big-5
ucs-2      utf-8         byte4le         euc-kr
ucs-2be    ascii         base64          uhc
ucs-2le    euc-jp        html-entities   iso-2022-kr
utf-32     sjis          7bit
utf-32be   eucjp-win     8bit
utf-32le   sjis-win      euc-cn
utf-16     iso-2022-jp   cp936
utf-16be   jis           hz
07-31-2005, 08:59 AM   #5
pascalp
Green Mole
 
Join Date: Jul 2005
Posts: 14
Thanks for your reply.
I tried reindexing with these new settings:
- it works with accents
- it doesn't work with capitals...

Any ideas?
07-31-2005, 09:06 AM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Try the following query:
Code:
ALTER TABLE keywords MODIFY keyword VARCHAR(64) BINARY;
Also change eregi to ereg in both search_function.php and phpdig_functions.php, then run another test index.
07-31-2005, 09:23 AM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
One other thing...

In phpdig_functions.php, in the phpdigEpureText function, find:
Code:
$text = phpdigStripAccents(strtolower ($text));
//no-latin upper to lowercase - now islandic
switch (PHPDIG_ENCODING) {
   case 'iso-8859-1':
   $text = strtr( $text,'ÐÞ','ðþ');
   break;
}
And replace with:
Code:
$text = phpdigStripAccents($text);
//no-latin upper to lowercase - now islandic
/*
switch (PHPDIG_ENCODING) {
   case 'iso-8859-1':
   $text = strtr( $text,'ÐÞ','ðþ');
   break;
}
*/
Also remove the other instances of strtolower from search_function.php and phpdig_functions.php, then run another test index.
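To illustrate why these edits matter, here is a simplified before/after sketch. It assumes $phpdig_string_subst holds comma-separated 'target:variants' pairs, so the 'Q:Q,q:q' value from the config above maps Q to itself and strips nothing. subst_map, epure_old, and epure_new are hypothetical stand-ins for phpdigStripAccents and phpdigEpureText, not the real code (the Icelandic ÐÞ step is left out).

```php
<?php
// Build a strtr() map from an assumed 'target:variants,target:variants' string,
// e.g. 'e:éèêë' maps each of é, è, ê, ë to 'e'.
function subst_map(string $subst): array
{
    $map = [];
    foreach (explode(',', $subst) as $pair) {
        [$target, $variants] = explode(':', $pair, 2);
        // preg_split('//u') splits a UTF-8 string into single characters.
        foreach (preg_split('//u', $variants, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            $map[$char] = $target;
        }
    }
    return $map;
}

// Before the change: lowercase first, then strip accents.
function epure_old(string $text, array $map): string
{
    return strtr(strtolower($text), $map);
}

// After the change: strip accents only, so capitals survive.
function epure_new(string $text, array $map): string
{
    return strtr($text, $map);
}

echo epure_old('Un Truck léger', subst_map('e:éèêë')), "\n";  // un truck leger
echo epure_new('Un Truck léger', subst_map('Q:Q,q:q')), "\n"; // Un Truck léger
```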
07-31-2005, 10:52 AM   #8
pascalp
Green Mole
 
Join Date: Jul 2005
Posts: 14
Thanks for everything.
It works fine now; I didn't need to change eregi.
Why change the type of the 'keyword' field to BINARY?
07-31-2005, 11:34 AM   #9
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
BINARY makes comparisons on that column case-sensitive: http://dev.mysql.com/doc/mysql/en/case-sensitivity.html
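Roughly speaking, a non-binary column compares values with a case-insensitive collation by default (the _ci collations), so looking up "Truck" also matches "truck", while BINARY compares the characters' byte values. A PHP analogy, not MySQL's actual implementation: strcasecmp stands in for the default comparison and strcmp for BINARY. The find_keywords helper and the sample keyword list are hypothetical.

```php
<?php
// Hypothetical keyword list, as it might be stored in the keywords table.
$stored = ['Truck', 'truck', 'camion'];

// Return stored keywords equal to $term. $binary = false roughly mimics a
// case-insensitive collation; true mimics a BINARY column.
function find_keywords(array $stored, string $term, bool $binary): array
{
    $hits = array_filter(
        $stored,
        fn(string $kw): bool => ($binary ? strcmp($kw, $term) : strcasecmp($kw, $term)) === 0
    );
    return array_values($hits); // reindex 0..n-1
}

echo implode(',', find_keywords($stored, 'Truck', false)), "\n"; // Truck,truck
echo implode(',', find_keywords($stored, 'Truck', true)), "\n";  // Truck
```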
All times are GMT -8.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.