PhpDig.net How-to Forum (http://www.phpdig.net/forum/forumdisplay.php?f=33)

Thread: Can PhpDig find a word inside a *.php or *.asp page? (http://www.phpdig.net/forum/showthread.php?t=2104)

tian 08-06-2005 06:35 PM

Can PhpDig find a word inside a *.php or *.asp page?

If it can, I will pay the USD 50.
If not, it is not what I am looking for.
Can someone answer this?
I have tried it, and the results do not seem to be what I want.

tian 08-06-2005 06:37 PM

Oh! It seems to be useful only on English sites,
not on Chinese sites.

tian 08-06-2005 06:42 PM

OK, here is a good test!

Please add the address "http://www.dbmaker.com.tw/webprog/ap/discuss/index.php3"

and see whether PhpDig returns keyword results from that page.


Sorry, but I did not manage to install PhpDig successfully.

Charter 08-07-2005 05:20 AM

PhpDig 1.8.8 RC1 supports multiple and multibyte encodings, assuming you meet these requirements. The link you posted (http://www.dbmaker.com.tw/webprog/ap/discuss/index.php3) declares charset=zh-tw. While zh-tw is not on the encoding list below, it denotes Traditional Chinese, so you can try to index your forum with the following settings:
Code:

// in config.php file
define('DETECT_ORDER','BIG-5,ASCII');

// in robot_functions.php file
if (mb_eregi("zh-tw",$charset_name)) {
  $charset_name = "BIG-5";
}

This of course assumes that zh-tw and big-5 are equivalent. One other thing to note is that your robots.txt file (http://www.dbmaker.com.tw/robots.txt) contains the following:
Code:

# /robots.txt for http://www.automatrix.com/
# See http://web.nexor.co.uk/mak/doc/robots/norobots.html

# by default
User-agent: *
Disallow: /cgi-bin/              # dynamic
Disallow: /demos/                # not for general consumption
Disallow: /images/               # useless images
Disallow: /icons/                # useless images
Disallow: /pic/                  # useless images
Disallow: /service/discuss-area/ # dynamic
Disallow: /webprog/              # dynamic

#
#Disallow: /~skip/volkswagen/    # going away
#Disallow: /moec/        # gone
#Disallow: /panzl/      # gone
#Disallow: /old-conferences/                    # deprecated
#
#User-agent: Musi-Cal-Tour-Fetcher
#Disallow:
#
#User-agent: WebFast-robot/0.1
#Disallow: /~skip/volkswagen/
#Disallow: /~skip/volkswagen/
#Disallow: /~skip/python/
#
#This example indicates that no robots should visit this site further:
#
# go away
#User-agent: *
#Disallow: /
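To see how these rules apply to the posted URL, here is a minimal sketch of prefix matching against the "User-agent: *" group. The functions disallowed_prefixes() and is_disallowed() are hypothetical helpers written for this illustration, not part of PhpDig, and the parsing is simplified (no Allow lines, no wildcards, only the "*" group). The sample input is shortened to a few lines of the file above.

```php
<?php
// Hypothetical helper, not part of PhpDig: collects the Disallow path
// prefixes that apply to every robot ("User-agent: *").
function disallowed_prefixes(string $robotsTxt): array
{
    $prefixes = [];
    $applies = false;
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        // Drop trailing comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*$/', '', $line));
        if ($line === '') {
            continue;
        }
        if (preg_match('/^User-agent:\s*(.*)$/i', $line, $m)) {
            $applies = (trim($m[1]) === '*');
        } elseif ($applies && preg_match('/^Disallow:\s*(.*)$/i', $line, $m)) {
            $path = trim($m[1]);
            if ($path !== '') { // an empty Disallow blocks nothing
                $prefixes[] = $path;
            }
        }
    }
    return $prefixes;
}

// Hypothetical helper: true if the URL's path starts with a disallowed prefix.
function is_disallowed(string $url, array $prefixes): bool
{
    $path = parse_url($url, PHP_URL_PATH) ?: '/';
    foreach ($prefixes as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

// Shortened sample of the robots.txt quoted above.
$robotsTxt = <<<TXT
User-agent: *
Disallow: /cgi-bin/            # dynamic
Disallow: /webprog/                # dynamic
#User-agent: *
#Disallow: /
TXT;

$prefixes = disallowed_prefixes($robotsTxt);
var_dump(is_disallowed('http://www.dbmaker.com.tw/webprog/ap/discuss/index.php3', $prefixes));
```

The commented-out "Disallow: /" at the end of the file is ignored, but the live "Disallow: /webprog/" rule is a prefix of the posted URL's path, so the check reports it as blocked.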

So in general you are disallowing spiders from the /webprog/ directory, which is where the page you posted lives. Anyway, here is the encoding list...
Code:

cp037   cp856  cp875        iso-8859-4  symbol           windows-1257
cp1006  cp857  gsm0338      iso-8859-5  turkish          windows-1258
cp1026  cp860  iso-8859-1   iso-8859-6  us-ascii         x-mac-ce
cp424   cp861  iso-8859-10  iso-8859-7  us-ascii-quotes  x-mac-cyrillic
cp437   cp862  iso-8859-11  iso-8859-8  windows-1250     x-mac-greek
cp500   cp863  iso-8859-13  iso-8859-9  windows-1251     x-mac-icelandic
cp737   cp864  iso-8859-14  koi8-r      windows-1252     x-mac-roman
cp775   cp865  iso-8859-15  koi8-u      windows-1253     zdingbat
cp850   cp866  iso-8859-16  mazovia     windows-1254
cp852   cp869  iso-8859-2   nextstep    windows-1255
cp855   cp874  iso-8859-3   stdenc      windows-1256

ucs-4     utf-16le     byte2be        euc-tw
ucs-4be   utf-7        byte2le        cp950
ucs-4le   utf7-imap    byte4be        big-5
ucs-2     utf-8        byte4le        euc-kr
ucs-2be   ascii        base64         uhc
ucs-2le   euc-jp       html-entities  iso-2022-kr
utf-32    sjis         7bit
utf-32be  eucjp-win    8bit
utf-32le  sjis-win     euc-cn
utf-16    iso-2022-jp  cp936
utf-16be  jis          hz
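
The robot_functions.php override earlier in this post special-cases a single label. As a rough sketch of the same idea in a more general form, the snippet below maps a declared charset to a name from the encoding list, falling back to an alias table for labels that are not listed. Everything here is hypothetical and for illustration only: resolve_charset(), CHARSET_ALIASES and SUPPORTED_ENCODINGS are not PhpDig names, the supported list is a small excerpt, and the zh-tw to BIG-5 entry carries the same assumption stated above.

```php
<?php
// Hypothetical alias table: maps charset labels that are not on the
// encoding list to a listed encoding. The zh-tw entry assumes that
// zh-tw pages are encoded as BIG-5.
const CHARSET_ALIASES = [
    'zh-tw' => 'BIG-5',
];

// A few names copied from the encoding list, for illustration only.
const SUPPORTED_ENCODINGS = ['big-5', 'cp950', 'euc-tw', 'utf-8', 'ascii', 'iso-8859-1'];

// Hypothetical helper, not part of PhpDig: returns the encoding name to
// use for a page's declared charset, or null if the label is unknown.
function resolve_charset(string $declared): ?string
{
    $name = strtolower(trim($declared));
    if (array_key_exists($name, CHARSET_ALIASES)) {
        return CHARSET_ALIASES[$name];
    }
    return in_array($name, SUPPORTED_ENCODINGS, true) ? strtoupper($name) : null;
}

var_dump(resolve_charset('zh-tw'));   // alias lookup
var_dump(resolve_charset('UTF-8'));   // listed encoding
var_dump(resolve_charset('klingon')); // unknown label
```

Returning null for unknown labels leaves it to the caller to decide whether to skip the page or fall back to the DETECT_ORDER setting.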


