Old 08-07-2005, 05:20 AM   #4
Charter
PhpDig 1.8.8 RC1 supports multiple and multibyte encodings, provided you meet these requirements. The page you linked (http://www.dbmaker.com.tw/webprog/ap/discuss/index.php3) declares charset=zh-tw. zh-tw is not on the encoding list below, but it denotes Traditional Chinese, which is commonly encoded as Big5, so you can try to index your forum with the following changes:
Code:
// in config.php file
define('DETECT_ORDER','BIG-5,ASCII');

// in robot_functions.php file
if (mb_eregi("zh-tw",$charset_name)) {
  $charset_name = "BIG-5";
}
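If you want to try the remapping outside of PhpDig first, here is a minimal sketch. The helper names (normalize_charset_name, looks_like_encoding) are hypothetical and not part of PhpDig, the lookup is an exact match rather than the substring match mb_eregi does, and it requires the mbstring extension. It builds a small Big5 sample and checks that it validates and converts under the remapped name:

```php
<?php
// Sketch only: hypothetical helpers, not part of PhpDig.
// Assumption: pages labelled "zh-tw" are really Big5-encoded.
function normalize_charset_name(string $charsetName): string
{
    // Map non-standard labels to names that mbstring understands.
    $aliases = ['zh-tw' => 'BIG-5'];
    $key = strtolower(trim($charsetName));
    return $aliases[$key] ?? $charsetName;
}

function looks_like_encoding(string $bytes, string $encoding): bool
{
    // True when the byte string is valid in the given encoding.
    return mb_check_encoding($bytes, $encoding);
}

// Build a Big5 sample from UTF-8 so the example is self-contained.
$big5Sample = mb_convert_encoding('中文', 'BIG-5', 'UTF-8');

$charset = normalize_charset_name('zh-tw');
var_dump($charset);
var_dump(looks_like_encoding($big5Sample, $charset));
var_dump(mb_convert_encoding($big5Sample, 'UTF-8', $charset) === '中文');
```

To test against your real pages, replace $big5Sample with the raw bytes of a fetched forum page; if the check fails, the pages are probably in a different encoding.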
This of course assumes that your zh-tw pages are actually Big5-encoded. One other thing to note is that your robots.txt file (http://www.dbmaker.com.tw/robots.txt) contains the following:
Code:
# /robots.txt for http://www.automatrix.com/
# See http://web.nexor.co.uk/mak/doc/robots/norobots.html

# by default
User-agent: *
Disallow: /cgi-bin/             # dynamic
Disallow: /demos/               # not for general consumption
Disallow: /images/              # useless images
Disallow: /icons/               # useless images
Disallow: /pic/			# useless images
Disallow: /service/discuss-area/ # dynamic
Disallow: /webprog/		# dynamic

#
#Disallow: /~skip/volkswagen/    # going away
#Disallow: /moec/        # gone
#Disallow: /panzl/       # gone
#Disallow: /old-conferences/                     # deprecated
#
#User-agent: Musi-Cal-Tour-Fetcher
#Disallow:
#
#User-agent: WebFast-robot/0.1
#Disallow: /~skip/volkswagen/
#Disallow: /~skip/volkswagen/
#Disallow: /~skip/python/
#
#This example indicates that no robots should visit this site further: 
#
# go away
#User-agent: *
#Disallow: /
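To see which rule catches the forum URL, here is a rough sketch of a prefix check against the "User-agent: *" rules. The function names are hypothetical, this is not PhpDig's own robots.txt handling, and it ignores finer points of the format (grouped User-agent lines, Allow rules, wildcards). Only a few of the rules above are copied into the sample:

```php
<?php
// Sketch only: hypothetical helpers, not PhpDig's robots.txt code.
// Collect the Disallow prefixes that apply to "User-agent: *".
function disallowed_prefixes(string $robotsTxt): array
{
    $prefixes = [];
    $applies = false;
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Drop trailing comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            $applies = ($value === '*');
        } elseif ($field === 'disallow' && $applies && $value !== '') {
            $prefixes[] = $value;
        }
    }
    return $prefixes;
}

// A path is blocked when it starts with any disallowed prefix.
function is_path_blocked(string $path, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

$robotsTxt = <<<TXT
User-agent: *
Disallow: /cgi-bin/             # dynamic
Disallow: /service/discuss-area/ # dynamic
Disallow: /webprog/             # dynamic
TXT;

$prefixes = disallowed_prefixes($robotsTxt);
var_dump(is_path_blocked('/webprog/ap/discuss/index.php3', $prefixes)); // bool(true)
var_dump(is_path_blocked('/index.html', $prefixes));                    // bool(false)
```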
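So you are disallowing all spiders from the /webprog/ directory, which is where your forum (/webprog/ap/discuss/) lives. A spider that honors robots.txt will skip the forum until you change that rule. Anyway, here is the encoding list: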
Code:
cp037   cp856  cp875        iso-8859-4  symbol           windows-1257
cp1006  cp857  gsm0338      iso-8859-5  turkish          windows-1258
cp1026  cp860  iso-8859-1   iso-8859-6  us-ascii         x-mac-ce
cp424   cp861  iso-8859-10  iso-8859-7  us-ascii-quotes  x-mac-cyrillic
cp437   cp862  iso-8859-11  iso-8859-8  windows-1250     x-mac-greek
cp500   cp863  iso-8859-13  iso-8859-9  windows-1251     x-mac-icelandic
cp737   cp864  iso-8859-14  koi8-r      windows-1252     x-mac-roman
cp775   cp865  iso-8859-15  koi8-u      windows-1253     zdingbat
cp850   cp866  iso-8859-16  mazovia     windows-1254
cp852   cp869  iso-8859-2   nextstep    windows-1255
cp855   cp874  iso-8859-3   stdenc      windows-1256

ucs-4      utf-16le      byte2be         euc-tw
ucs-4be    utf-7         byte2le         cp950
ucs-4le    utf7-imap     byte4be         big-5
ucs-2      utf-8         byte4le         euc-kr
ucs-2be    ascii         base64          uhc
ucs-2le    euc-jp        html-entities   iso-2022-kr
utf-32     sjis          7bit
utf-32be   eucjp-win     8bit
utf-32le   sjis-win      euc-cn
utf-16     iso-2022-jp   cp936
utf-16be   jis           hz
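If you are unsure whether a given name is usable on your own server, a quick sketch like the one below asks mbstring directly. The function name mb_knows_encoding is hypothetical, and the check only reflects what your local mbstring build reports; names on the list above that mbstring does not know may be handled by other means.

```php
<?php
// Sketch only: hypothetical helper, requires the mbstring extension.
// Returns true when $name matches an mbstring encoding name or alias
// (compared case-insensitively).
function mb_knows_encoding(string $name): bool
{
    $wanted = strtolower(trim($name));
    foreach (mb_list_encodings() as $encoding) {
        if (strtolower($encoding) === $wanted) {
            return true;
        }
        $aliases = mb_encoding_aliases($encoding);
        if (!is_array($aliases)) {
            continue; // older PHP versions return false on failure
        }
        foreach ($aliases as $alias) {
            if (strtolower($alias) === $wanted) {
                return true;
            }
        }
    }
    return false;
}

foreach (['big-5', 'utf-8', 'zh-tw'] as $candidate) {
    echo $candidate, ': ', mb_knows_encoding($candidate) ? 'yes' : 'no', "\n";
}
```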