Space in URL


JPS
02-03-2004, 05:53 AM
I have a site with URL rewriting, and some of its URLs contain a space, like:

http://cartouche-epson.1000cartouches.com/imprimantes_EPSON_Stylus%20Color_670.html

or

http://cartouche-epson.1000cartouches.com/imprimantes_EPSON_Stylus Color_670.html

This is no problem for Google, which indexes these pages without trouble, but PhpDig doesn't. Is there something to change to get it to index them?

Regards

JPS

Charter
02-05-2004, 07:54 AM
Hi. Untested, but perhaps try the following.

In robot_functions.php add the following to the phpdigRewriteUrl function:

$eval = str_replace(" ","%20",$eval);

Also in robot_functions.php add the following to the phpdigUpdSpiderRow function:

$path = str_replace(" ","%20",$path);
$file = str_replace(" ","%20",$file);
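
As a standalone illustration of what these replacements do, here is a minimal sketch. It is not PhpDig code: the helper name encodeSpaces is hypothetical and exists only for this example.

```php
<?php
// Hypothetical helper (not part of PhpDig): percent-encode literal spaces
// in a URL string, mirroring the str_replace calls above.
function encodeSpaces(string $url): string
{
    return str_replace(" ", "%20", $url);
}

$url = "http://cartouche-epson.1000cartouches.com/imprimantes_EPSON_Stylus Color_670.html";

// Prints the same URL with the space written as %20.
echo encodeSpaces($url), "\n";
```

A URL that already uses %20 passes through unchanged, so both forms of the link end up identical.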

JPS
02-05-2004, 03:12 PM
Actually, I added those lines but nothing changed!

Still not working.

Thanks for your help.

Charter
02-05-2004, 03:34 PM
Hi. Did you reindex, or index new pages, after the changes were made?

JPS
02-05-2004, 11:31 PM
I deleted the whole domain and reindexed it. For example, this domain http://cartouche-epson.1000cartouches.com/ has 14 links, although normally it should have more than 200 or 300.

Regards

JPS

vinyl-junkie
02-06-2004, 03:33 AM
Or instead of this:

$eval = str_replace(" ","%20",$eval);

try this:

$eval = str_replace(" ","",$eval);

Don't know if that will work, but it's worth a shot.

Also, and I have to ask this, how practical would it be for you to modify those URLs so there is no embedded space?

Charter
02-06-2004, 08:44 AM
Hi. In robot_functions.php are two functions to edit.

First, in phpdigExplore find:

while (eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*[\\\\'\"]?((([[a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*([:%/?=&;\\\\,._a-zA-Z0-9\\|+-]*))(#[.a-zA-Z0-9-]*)?[\\\\'\" ]?",$eval,$regs)) {

and replace with:

while (eregi("(<frame[^>]*src[[:blank:]]*=|href[[:blank:]]*=|http-equiv=['\"]refresh['\"] *content=['\"][0-9]+;url[[:blank:]]*=|window[.]location[[:blank:]]*=|window[.]open[[:blank:]]*[(])[[:blank:]]*[\\\\'\"]?((([[a-z]{3,5}://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*([:%/?=&;\\\\,._a-zA-Z0-9\\|+ ()-]*))(#[.a-zA-Z0-9-]*)?[\\\\'\" ]?",$eval,$regs)) {

Second, in phpdigIndexFile find:

while (eregi("<a([^>]*href[[:blank:]]*=[[:blank:]]*[\\\\'\"]?(((http://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*([:%/?=&;\\\\,._a-zA-Z0-9-]*))[#\\\\'\" ]?)",$line,$regs)) {

and replace with:

while (eregi("<a([^>]*href[[:blank:]]*=[[:blank:]]*[\\\\'\"]?(((http://)+(([.a-zA-Z0-9-])+(:[0-9]+)*))*([:%/?=&;\\\\,._a-zA-Z0-9\\|+ ()-]*))[#\\\\'\" ]?)",$line,$regs)) {

Now try another reindex. What are the results?

Remember to remove any "word" wrapping in the above code.
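
To see why widening the character class matters, here is a rough sketch. It uses much simplified stand-in patterns rather than the full PhpDig expressions, and preg_match rather than eregi (which was removed in PHP 7). The names extractHref, $oldPattern and $newPattern are hypothetical.

```php
<?php
// Hypothetical helper: return the first href path captured by $pattern,
// or null when nothing matches.
function extractHref(string $pattern, string $html): ?string
{
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Simplified stand-ins for the path character class before and after the edit.
// The second class additionally allows "|", "+", space, "(" and ")".
$oldPattern = '~href="([:%/?=&;,._a-zA-Z0-9-]*)~i';
$newPattern = '~href="([:%/?=&;,._a-zA-Z0-9|+ ()-]*)~i';

$html = '<a href="imprimantes_EPSON_Stylus Color_670.html">';

// The old class stops at the space, so the link is cut short.
echo extractHref($oldPattern, $html), "\n"; // imprimantes_EPSON_Stylus

// The new class accepts the space, so the whole file name is captured.
echo extractHref($newPattern, $html), "\n"; // imprimantes_EPSON_Stylus Color_670.html
```

With the old class the spider would follow a truncated link, which would explain why so few pages were found.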

JPS
02-06-2004, 09:33 AM
Thank you, Charter, it's working fine now.

JPS

Charter
02-06-2004, 10:09 AM
Great, glad it's working. BTW, did you leave in or take out the code in this (http://www.phpdig.net/showthread.php?s=&postid=2101#post2101) post?

JPS
02-06-2004, 10:29 AM
Yes, I left it in. At first I tried without it, but it did not work.

Regards :)

Charter
02-06-2004, 10:36 AM
Okay, thanks. :)