PhpDig.net

09-15-2005, 11:23 AM   #1
RoyN
Green Mole
 
Join Date: Sep 2005
Posts: 2
Can phpDig do the job?

Hi guys,

I'm sorry for this generic post, but I'm under a tight deadline and trying to find the right solution before I get all the purchasing done...

I need to index a customer's website.
I must index several different "versions" of the website. The main page of each version will have a different ID, and the spider must crawl from there, with each version searchable on its own. Basically, the same website is called from different URLs, each with slightly customized content, and the searches must be kept separate (the versions do not link to each other in the sites themselves).
The problem I am having (I installed phpDig, but it doesn't follow the links): 99.9% of the links on these websites are generated through JavaScript (yeah, I know...). They basically take 2 forms:
1 - An inline script: a JavaScript function inside the HTML pages, with calls like this:
Code:
mp[0] = new mPes('/home/peq/index.htm?clique=Geneuenas_Empres','Pequitas','');
mp[1] = new mPes('/home/medias/index.htm?cliq=Generi_Emprs','Médiapresas','');
mp[2] = new mPes('/home/grandes/index.htm?cliq=Generi','Gran','');
2 - Other pages have similar code, but in an external JS file, so the page calls something like <javascript src=/bla/bla/links.js>

Can anyone tell me whether some spidering option in phpDig, or another spider, can handle such weirdly written websites?

Thanks
Roy
09-15-2005, 12:03 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
You can try editing the following line in the phpdigExplore function of the robot_functions.php file:
Code:
         // this line is in PhpDig v.1.8.7
         while (eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*[\'\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*($allowed_link_chars\[?$allowed_link_chars\]?$allowed_link_chars))(#[.a-zA-Z0-9-]*)?[\'\" ]?",$eval,$regs)) {

         // this line is in PhpDig v.1.8.8 RC1
         while (mb_eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*['\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*(($allowed_link_chars\[?$allowed_link_chars\]?$allowed_link_chars)+))(#[.a-zA-Z0-9-]*)?['\"]?",$eval,$regs)) {
Note that src= is already covered by this line of code (as part of the <frame alternative), but try adding something like mPes[[:blank:]]*[(]| right before <frame and see if PhpDig indexes those links.
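Before editing robot_functions.php, the added alternative can be tried in isolation. What follows is only a minimal sketch, not PhpDig's code: it assumes a current PHP where the eregi family is no longer available, so it uses preg_match_all with a much shorter pattern (just the href and mPes alternatives). The helper name extractLinks and the character class standing in for $allowed_link_chars are assumptions made for illustration.

```php
<?php
// Sketch only: a simplified stand-in for the link pattern in phpdigExplore.
// extractLinks() is a hypothetical helper, not part of PhpDig.
function extractLinks(string $html): array
{
    // Assumed stand-in for $allowed_link_chars (PhpDig's real value may differ).
    $linkChars = '[-a-zA-Z0-9_./?=&%]+';

    // "mPes\s*\(" is the added alternative; "href\s*=" is one of the originals.
    // An optional quote may precede the URL, as in the original pattern.
    $pattern = '#(?:mPes\s*\(|href\s*=)\s*[\'"]?(' . $linkChars . ')#i';

    preg_match_all($pattern, $html, $matches);

    // $matches[1] holds the captured URL of every match, in document order.
    return $matches[1];
}

// Sample page mixing a normal link with the JavaScript-generated ones.
$sample = <<<'HTML'
<a href="/contact.htm">Contact</a>
<script>
mp[0] = new mPes('/home/peq/index.htm?clique=Geneuenas_Empres','Pequitas','');
mp[1] = new mPes('/home/medias/index.htm?cliq=Generi_Emprs','Médiapresas','');
</script>
HTML;

$links = extractLinks($sample);
print_r($links);
```

If the short pattern picks up the mPes URLs from your own pages, the same idea is worth trying in the real line, written in the POSIX syntax that line uses (mPes[[:blank:]]*[(] rather than mPes\s*\().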
09-15-2005, 01:05 PM   #3
RoyN
Green Mole
 
Join Date: Sep 2005
Posts: 2
Must I pay to get an answer on whether or not phpDig is right for me? (If so, I'd pay/donate, since this is for commercial use.)
All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.