#1 |
Green Mole
Join Date: Sep 2005
Posts: 2
|
Can phpDig do the job?
Hi guys,
I'm sorry for this generic post, but I'm under a tight deadline and trying to find the appropriate solution before I get all the purchasing done...
I need to index a customer's website. I must index several different "versions" of the website: the main page of each version has a different ID, the spider must start from that page, and each version must be searchable separately. Basically the same website is called from different URLs, each with slightly customized content, and the searches must be separate (the versions do not link to each other in the sites themselves).
The problem I am having (I installed phpDig, but it doesn't follow the links): 99.9% of the links on these websites are generated through JavaScript (yeah, I know...). They basically use 2 forms:
1 - an inline script, a JavaScript function inside the HTML, with calls like this:
Code:
mp[0] = new mPes('/home/peq/index.htm?clique=Geneuenas_Empres','Pequitas','');
mp[1] = new mPes('/home/medias/index.htm?cliq=Generi_Emprs','Médiapresas','');
mp[2] = new mPes('/home/grandes/index.htm?cliq=Generi','Gran','');
Can anyone tell me whether some spidering option in phpDig, or another spider, can handle such weird website code?
Thanks
Roy
#2 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
You can try editing the following line in the phpdigExplore function of the robot_functions.php file:
Code:
// this line is in PhpDig v.1.8.7
while (eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*[\'\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*($allowed_link_chars\[?$allowed_link_chars\]?$allowed_link_chars))(#[.a-zA-Z0-9-]*)?[\'\" ]?",$eval,$regs)) {
// this line is in PhpDig v.1.8.8 RC1
while (mb_eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*['\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*(($allowed_link_chars\[?$allowed_link_chars\]?$allowed_link_chars)+))(#[.a-zA-Z0-9-]*)?['\"]?",$eval,$regs)) {
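The alternation at the start of that pattern lists the prefixes treated as link markers (href=, window.location=, window.open( and so on), so a prefix for the menu calls from the first post would be one more alternative, something like new[[:blank:]]+mPes[[:blank:]]*[(] (untested, and it assumes $allowed_link_chars accepts the characters in those URLs). Before editing robot_functions.php, the idea can be checked outside PhpDig. Below is a minimal sketch in JavaScript, assuming every menu entry has the form new mPes('url','label',''); extractMenuLinks is a hypothetical helper name, not part of PhpDig.

```javascript
// Hypothetical helper (not part of PhpDig): collect the first argument,
// i.e. the URL, of every `new mPes('url', 'label', '')` call in a page.
function extractMenuLinks(html) {
  // Matches `new mPes(` followed by a quoted string; group 1 is the
  // opening quote, group 2 is the URL, \1 requires the same closing quote.
  const pattern = /new\s+mPes\s*\(\s*(['"])([^'"]+)\1/g;
  const links = [];
  for (const match of html.matchAll(pattern)) {
    links.push(match[2]);
  }
  return links;
}

// Sample input taken from the first post in this thread.
const sample = `
mp[0] = new mPes('/home/peq/index.htm?clique=Geneuenas_Empres','Pequitas','');
mp[1] = new mPes('/home/medias/index.htm?cliq=Generi_Emprs','Médiapresas','');
`;

console.log(extractMenuLinks(sample));
```

If the sketch picks up every menu URL on a saved copy of the page, the same prefix is probably worth trying in the PhpDig pattern; if some links are built by string concatenation in the script, a regex alone will likely not be enough.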
#3 |
Green Mole
Join Date: Sep 2005
Posts: 2
|
Must I pay to see a response about whether or not phpDig is right for me?