
If PhpDig can correctly search for a word in a *.php or *.asp page


tian
08-06-2005, 06:35 PM
then I will pay USD 50.
If not, it is not a good fit for what I want.
Can someone answer me?
I tried it, but the results do not seem to be what I want.

tian
08-06-2005, 06:37 PM
Oh! It seems to be useful only on English sites,
and of no use on Chinese sites.

tian
08-06-2005, 06:42 PM
OK, here is a good test!

Please add the address "http://www.dbmaker.com.tw/webprog/ap/discuss/index.php3"

and then see whether PhpDig shows a keyword result from that page.


I'm sorry, but I did not manage to install PhpDig successfully.

Charter
08-07-2005, 05:20 AM
PhpDig 1.8.8 RC1 supports multiple and multibyte encodings, assuming you meet these (http://www.phpdig.net/forum/showthread.php?t=1789) requirements. The link you posted (http://www.dbmaker.com.tw/webprog/ap/discuss/index.php3) declares charset=zh-tw. While zh-tw is not in the encoding list below, it refers to Traditional Chinese, so you can try to index your forum by setting the following:

// in config.php file
define('DETECT_ORDER','BIG-5,ASCII');

// in robot_functions.php file: treat a zh-tw charset label as BIG-5
if (mb_eregi("zh-tw",$charset_name)) {
    $charset_name = "BIG-5";
}

This of course assumes that zh-tw and big-5 are equivalent. One other thing to note is that your robots.txt file (http://www.dbmaker.com.tw/robots.txt) contains the following:

# /robots.txt for http://www.automatrix.com/
# See http://web.nexor.co.uk/mak/doc/robots/norobots.html

# by default
User-agent: *
Disallow: /cgi-bin/ # dynamic
Disallow: /demos/ # not for general consumption
Disallow: /images/ # useless images
Disallow: /icons/ # useless images
Disallow: /pic/ # useless images
Disallow: /service/discuss-area/ # dynamic
Disallow: /webprog/ # dynamic

#
#Disallow: /~skip/volkswagen/ # going away
#Disallow: /moec/ # gone
#Disallow: /panzl/ # gone
#Disallow: /old-conferences/ # deprecated
#
#User-agent: Musi-Cal-Tour-Fetcher
#Disallow:
#
#User-agent: WebFast-robot/0.1
#Disallow: /~skip/volkswagen/
#Disallow: /~skip/volkswagen/
#Disallow: /~skip/python/
#
#This example indicates that no robots should visit this site further:
#
# go away
#User-agent: *
#Disallow: /

So in general you are disallowing spiders from the webprog directory. Anyway, here is the encoding list...
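
To see why the posted URL would be skipped, the robots.txt rules above can be checked by hand. The following is only a rough sketch in modern PHP (7 or later), not PhpDig's own robots.txt handling. The function names disallowed_prefixes and is_path_allowed are made up for illustration, and the parser is simplified: it reads only the "User-agent: *" group and does plain prefix matching.

```php
<?php
// Hypothetical helper, not part of PhpDig: collect the Disallow prefixes
// that apply to every robot ("User-agent: *") in a robots.txt body.
function disallowed_prefixes(string $robotsTxt): array
{
    $prefixes = [];
    $applies = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        // Drop trailing "# ..." comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '') {
            continue;
        }
        if (preg_match('/^User-agent:\s*(.*)$/i', $line, $m)) {
            // Simplification: only the wildcard group is honoured.
            $applies = (trim($m[1]) === '*');
        } elseif ($applies && preg_match('/^Disallow:\s*(.*)$/i', $line, $m)) {
            $path = trim($m[1]);
            if ($path !== '') {
                $prefixes[] = $path;
            }
        }
    }
    return $prefixes;
}

// Hypothetical helper: a path is allowed unless it starts with a
// disallowed prefix.
function is_path_allowed(string $path, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return false;
        }
    }
    return true;
}

// Two of the active rules quoted from the robots.txt above.
$robots = "User-agent: *\nDisallow: /cgi-bin/ # dynamic\nDisallow: /webprog/ # dynamic\n";
$rules = disallowed_prefixes($robots);

var_dump(is_path_allowed('/webprog/ap/discuss/index.php3', $rules)); // bool(false)
var_dump(is_path_allowed('/index.html', $rules));                    // bool(true)
```

With the /webprog/ rule in place, a spider that honours robots.txt will not fetch the discussion page, whatever encoding settings are used.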

cp037 cp856 cp875 iso-8859-4 symbol windows-1257
cp1006 cp857 gsm0338 iso-8859-5 turkish windows-1258
cp1026 cp860 iso-8859-1 iso-8859-6 us-ascii x-mac-ce
cp424 cp861 iso-8859-10 iso-8859-7 us-ascii-quotes x-mac-cyrillic
cp437 cp862 iso-8859-11 iso-8859-8 windows-1250 x-mac-greek
cp500 cp863 iso-8859-13 iso-8859-9 windows-1251 x-mac-icelandic
cp737 cp864 iso-8859-14 koi8-r windows-1252 x-mac-roman
cp775 cp865 iso-8859-15 koi8-u windows-1253 zdingbat
cp850 cp866 iso-8859-16 mazovia windows-1254
cp852 cp869 iso-8859-2 nextstep windows-1255
cp855 cp874 iso-8859-3 stdenc windows-1256

ucs-4 utf-16le byte2be euc-tw
ucs-4be utf-7 byte2le cp950
ucs-4le utf7-imap byte4be big-5
ucs-2 utf-8 byte4le euc-kr
ucs-2be ascii base64 uhc
ucs-2le euc-jp html-entities iso-2022-kr
utf-32 sjis 7bit
utf-32be eucjp-win 8bit
utf-32le sjis-win euc-cn
utf-16 iso-2022-jp cp936
utf-16be jis hz
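
The zh-tw to BIG-5 substitution suggested earlier can also be written as a small lookup table, which is easier to extend if other sites use labels that are not on the list. This is only a sketch in modern PHP (7 or later), not code from PhpDig. CHARSET_ALIASES, normalize_charset and to_utf8 are hypothetical names, and the single zh-tw entry still assumes the page really is encoded as Big5.

```php
<?php
// Hypothetical alias table: maps charset labels found in pages to names
// from the encoding list. Only the zh-tw entry comes from this thread,
// and it assumes that zh-tw pages are actually Big5-encoded.
const CHARSET_ALIASES = [
    'zh-tw' => 'BIG-5',
];

// Return the listed encoding name for a label, or the label unchanged
// when no alias is known.
function normalize_charset(string $label): string
{
    $key = strtolower(trim($label));
    return CHARSET_ALIASES[$key] ?? $label;
}

// Convert page bytes to UTF-8 using the normalized charset name.
// mb_convert_encoding() requires the mbstring extension.
function to_utf8(string $bytes, string $label): string
{
    return mb_convert_encoding($bytes, 'UTF-8', normalize_charset($label));
}

echo normalize_charset(' ZH-TW '), "\n"; // BIG-5
echo normalize_charset('utf-8'), "\n";   // utf-8
```

Unlike the mb_eregi check, which matches zh-tw anywhere in the string, the table lookup only matches the whole label, so a stricter match is a deliberate design choice here.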