PhpDig.net


Konstantine 03-12-2004 08:43 PM

Bug in grabbing urls from the page
 
Hello again! I found that if a link looks like this:

<a href=next/index.html>next/index.html</a>

i.e. without quotes around the href value, PhpDig doesn't follow it!
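For reference, here is a minimal sketch of matching all three attribute styles (double-quoted, single-quoted and unquoted) with one regular expression. The function name extractHrefs() is hypothetical, and this is not PhpDig's own link-extraction code.

```php
<?php
// Hypothetical helper (not PhpDig's own code): returns the href of every
// <a> tag, whether the value is double-quoted, single-quoted or unquoted.
function extractHrefs(string $html): array
{
    // Group 1: "value", group 2: 'value', group 3: bare value up to
    // whitespace or ">". The optional (?:[^>]*?\s)? skips earlier attributes.
    $pattern = '/<a\s(?:[^>]*?\s)?href\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $hrefs = [];
    foreach ($matches as $m) {
        // Only one of the three groups matched; keep its non-empty capture.
        $value = '';
        for ($i = 1; $i <= 3; $i++) {
            if (isset($m[$i]) && $m[$i] !== '') {
                $value = $m[$i];
                break;
            }
        }
        $hrefs[] = $value;
    }
    return $hrefs;
}

print_r(extractHrefs('<a href=next/index.html>next/index.html</a>'));
```

The unquoted branch stops at whitespace or ">", which is where an unquoted HTML attribute value ends.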

Konstantine 03-13-2004 08:54 AM

Sorry, no bug found :angel:

Charter 03-13-2004 09:15 AM

Hi Konstantine, and welcome to PhpDig.net!

Thanks for the contributions too. It's good to have other people review the code and offer input. :)

Konstantine 03-13-2004 09:18 AM

I had that problem at work, but I can't verify it now or tell what really happened.

Konstantine 03-15-2004 07:12 AM

Hi again. So there IS a bug. I can't explain why it happens, but you can see it: try to index http://madboard.ru and look at the page http://madboard.ru/index.html?act=do&code=43. You will not find links such as http://madboard.ru/index.html?act=do&code=45. Try it. Any suggestions?

Konstantine 03-17-2004 08:30 AM

I found it (the BUG)!!! And it's not in PhpDig :angel: It's in a PHP function :D

I use PHP version 4.3.3 on Linux, and the bug is in the parse_url function :D

You can see it on madboard.ru: if you try to index it, you'll find only about 42 pages (if your version of PHP has the bug).

So, in robot_functions.php, in the function phpdigRewriteUrl($eval), replace this code:

PHP Code:

$url = @parse_url(str_replace('\'"','',$eval));
if (!isset($url['path'])) {
     $url['path'] = '';
}


with the following code:

PHP Code:

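$url = @parse_url(str_replace('\'"','',$eval));
// Turn HTML-escaped ampersands ("&amp;") in the query back into plain "&"
if (isset($url['query'])) {
     $url['query'] = str_replace('&amp;','&',$url['query']);
}
if (!isset($url['path'])) {
     $url['path'] = '';
}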


After that, try to index madboard.ru again :angel:

You'll find about 400 pages!!!

The bug is:

if you parse the URL http://madboard.ru/index.html?act=do&code=43, you get act=do&amp;code=43 in the 'query' element instead of act=do&code=43.
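A minimal sketch of the query clean-up being discussed, assuming the URL was taken verbatim from an href attribute where "&" is written as the HTML entity "&amp;". parse_url() splits the URL into parts but does not decode HTML entities, so the escape is still present in 'query'. The function name parseHref() is hypothetical.

```php
<?php
// Hypothetical helper: parse a URL copied verbatim from an href attribute,
// where "&" is usually written as the HTML entity "&amp;".
function parseHref(string $href): array
{
    $url = parse_url($href);
    if ($url === false) {
        return [];
    }
    // parse_url() leaves HTML entities untouched, so "&amp;" is still in
    // the query; turn it back into a plain "&" separator.
    if (isset($url['query'])) {
        $url['query'] = str_replace('&amp;', '&', $url['query']);
    }
    return $url;
}

$url = parseHref('http://madboard.ru/index.html?act=do&amp;code=43');
echo $url['query'], "\n"; // act=do&code=43
```

Decoding the whole href with html_entity_decode() before calling parse_url() would be a broader variant of the same idea, since it also handles other entities.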

Any questions? :angel:

If you try it and see the same result, please reply to this message :angel:

Charter 03-25-2004 05:34 AM

Hi. Speaking of &amp; versus &, there is a small bug in version 1.8.0 when PHPDIG_SESSID_REMOVE is set to true. To fix it, do the following.

In robot_functions.php find:
PHP Code:

$eval = str_replace("&&","&",$eval);
$eval = eregi_replace("[?][&]","?",$eval);
$eval = eregi_replace("&$","",$eval);

and replace with:
PHP Code:

$eval = str_replace("&&","&",$eval);
$eval = str_replace("?&","?",$eval);
$eval = eregi_replace("&$","",$eval);

Also, in robot_functions.php find:
PHP Code:

$file = str_replace("&&","&",$file);
$file = eregi_replace("[?][&]","?",$file);
$file = eregi_replace("&$","",$file);

and replace with:
PHP Code:

$file = str_replace("&&","&",$file);
$file = str_replace("?&","?",$file);
$file = eregi_replace("&$","",$file);
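A rough sketch of the separator clean-up these lines perform once a session parameter has been cut out of a URL. The removal step itself is not shown in the thread, so it is an assumption here (a name=alphanumeric-value pair), as are the function name removeSessionId() and the example parameter name PHPSESSID. The sketch uses preg_replace() because the ereg*/eregi* functions were removed in PHP 7.

```php
<?php
// Hypothetical sketch: strip "<name>=<value>" for a session variable from a
// URL, then tidy the "&" and "?" separators the way the lines above do.
function removeSessionId(string $url, string $sessVar): string
{
    // Assumed step: remove the session parameter (alphanumeric value).
    $url = preg_replace('/' . preg_quote($sessVar, '/') . '=[a-z0-9]*/i', '', $url);
    // Collapse the "&&" left where a middle parameter used to be.
    $url = str_replace('&&', '&', $url);
    // Remove an "&" that now directly follows "?".
    $url = str_replace('?&', '?', $url);
    // Drop a trailing "&" left when the parameter was last.
    $url = preg_replace('/&$/', '', $url);
    return $url;
}

echo removeSessionId('http://example.com/a.php?PHPSESSID=abc123&act=do', 'PHPSESSID'), "\n";
// http://example.com/a.php?act=do
```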



All times are GMT -8.

Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.