Can phpDig do the job?


RoyN
09-15-2005, 11:23 AM
Hi guys,

I'm sorry for this generic post, but I'm under a tight deadline and trying to find the right solution before I get all the purchasing done...

I need to index a customer's website.
I must index several different "versions" of the website. Basically, the same website will be called from different URLs, and each will have slightly customized content. The main page of each version will have a different ID, the spider must crawl from there on, and each version must be searchable separately (the versions do not link to each other in the sites themselves).
The problem I am having (I installed phpDig, but it doesn't follow the links): 99.9% of the links on these websites are generated through JavaScript (yeah, I know...). They basically use 2 forms:
1 - an inline JavaScript function inside the HTML, with calls like this:

mp[0] = new mPes('/home/peq/index.htm?clique=Geneuenas_Empres','Pequitas','');

mp[1] = new mPes('/home/medias/index.htm?cliq=Generi_Emprs','Médiapresas','');
mp[2] = new mPes('/home/grandes/index.htm?cliq=Generi,'Gran,'');

2 - other pages have similar code, but in an external JS file, so the page calls a <script src=/bla/bla/links.js>

Can anyone tell me whether some spidering option in phpDig, or in another spider, can handle such weird website writing?

Thanks
Roy

Charter
09-15-2005, 12:03 PM
You can try editing the following line in the phpdigExplore function of the robot_functions.php file:

// this line is in PhpDig v.1.8.7
while (eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*[\'\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*($allowed_link_chars\[?$allowed_link_chars\]?$allowed_link_chars))(#[.a-zA-Z0-9-]*)?[\'\" ]?",$eval,$regs)) {

// this line is in PhpDig v.1.8.8 RC1
while (mb_eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;[[:blank:]]*url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*['\"]?((([a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*(($allowed_link_chars\[?$allowed_link_chars\]?$allowed_link_chars)+))(#[.a-zA-Z0-9-]*)?['\"]?",$eval,$regs)) {

Note that src= is already encompassed in the line of code, but try adding something like mPes[[:blank:]]*[(]| right before <frame and see if PhpDig indexes those links.
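
As a rough sketch of the idea, here is a much simplified stand-in for that pattern, not PhpDig's actual code. It assumes PCRE (preg_match_all) instead of the POSIX eregi call used above, and the pattern and sample HTML are made up for illustration. It only shows that adding an mPes( alternative to the alternation lets the URL inside the call be captured like an ordinary href:

```php
<?php
// Simplified, hypothetical link pattern (NOT the real phpdigExplore regex).
// The "mPes\s*\(" alternative is the addition; href and window.open are
// stand-ins for the alternatives PhpDig already has.
$pattern = '/(?:mPes\s*\(|href\s*=|window\.open\s*\()\s*[\'"]?([^\'"\s>)]+)/i';

// Sample page: one normal link plus two JavaScript-built links from the post.
$html = <<<'HTML'
<a href="/about.htm">About</a>
<script>
mp[0] = new mPes('/home/peq/index.htm?clique=Geneuenas_Empres','Pequitas','');
mp[1] = new mPes('/home/medias/index.htm?cliq=Generi_Emprs','Médiapresas','');
</script>
HTML;

// Capture group 1 holds the URL that follows each recognized prefix.
preg_match_all($pattern, $html, $matches);
$links = $matches[1];

print_r($links);
```

With the sample above, $links should hold the href target followed by the two mPes URLs. This only covers the inline case; links defined in an external .js file would only be found if the spider also fetches and scans that file.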

RoyN
09-15-2005, 01:05 PM
Must I pay to see a response about whether or not phpDig is right for me? :rolleyes: (In which case I'd pay/donate, since this is for commercial use.)