PhpDig.net

marb 03-26-2004 11:32 PM

unable to parse url
 
Hi,
I'm spidering a page and get the error notice below. What can I do about it?
I use the loop option and have had no trouble before when spidering other pages.
The spider indexes a page and the message shows up when another URL is found, not for the page that is being spidered at that moment.



[quote]
+ + + + + + + + + + + + + + + + + +
63:http://www.wetcanvas.com/MediaKit/
(time : 00:54:20)

Warning: parse_url(http://www.heritageglass.com?amp;zon...itageglass.com) [function.parse-url]: Unable to parse url in /opt/guide/www.artrefer.com/HTML/web/s3/admin/robot_functions.php on line 372
+ + + + + + + +
64:http://www.wetcanvas.com/web/
(time : 00:54:51)
+ +
65:http://www.wetcanvas.com/colormixer/
(time : 00:55:21)
+ + + +
[/quote]

Marten :)

Charter 03-27-2004 02:06 AM

Hi. There is a 1.8.0 fix in this post that should be applied.

However, even with the fix, I'm not sure parse_url will handle a URL in the query string. See below.
PHP Code:

<?php
$url="http://www.heritageglass.com?amp;zoneid=0&source=&dest=http://www.heritageglass.com";
print_r(parse_url($url)); // without fix and with url
echo "\n<br>\n";
$url="http://www.heritageglass.com?zoneid=0&source=&dest=http://www.heritageglass.com";
print_r(parse_url($url)); // with fix and with url
echo "\n<br>\n";
$url="http://www.heritageglass.com?zoneid=0&source=&dest=";
print_r(parse_url($url)); // with fix and without url
?>

The output is as follows:

Array
(
[scheme] => http
[host] => www.heritageglass.com?amp;zoneid=0&source=&dest=http
[path] => //www.heritageglass.com
)

Array
(
[scheme] => http
[host] => www.heritageglass.com?zoneid=0&source=&dest=http
[path] => //www.heritageglass.com
)

Array
(
[scheme] => http
[host] => www.heritageglass.com
[query] => zoneid=0&source=&dest=
)


Untested, but in robot_functions.php you might try the following code:
PHP Code:

$newurl = parse_url($newpath);

// add this chunk of code here
// if parse_url() left a "?" in the host, move the query part out of the host
if (isset($newurl["host"]) && strpos($newurl["host"], "?") !== false) {
  if (!isset($newurl["path"])) { $newurl["path"] = ""; }
  if (!isset($newurl["query"])) { $newurl["query"] = ""; }
  $newurl["query"] = substr(strstr($newurl["host"],"?"),1).$newurl["path"].$newurl["query"];
  unset($newurl["path"]);
  $newurl["host"] = substr($newurl["host"],0,strpos($newurl["host"],"?"));
}

//search if relocation is absolute or relative 

Remember to remove any "word" wrapping in the above code.
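
To see what the chunk above does without running the spider, here is a minimal standalone sketch of the same logic. The function name repair_host_query is hypothetical and not part of PhpDig; the input is the second array from the output above, typed in by hand so the result does not depend on how a given PHP version's parse_url behaves.

```php
<?php
// Hypothetical helper (not part of PhpDig): repairs a parse_url()-style
// array whose "host" element has swallowed the query string.
function repair_host_query(array $parts)
{
    if (!isset($parts["host"])) {
        return $parts; // no host, nothing to repair
    }
    $pos = strpos($parts["host"], "?");
    if ($pos === false) {
        return $parts; // host contains no "?", leave the array alone
    }
    $path  = isset($parts["path"])  ? $parts["path"]  : "";
    $query = isset($parts["query"]) ? $parts["query"] : "";

    // Everything after "?" in the host, plus the stray path and any
    // existing query, is treated as the query string.
    $parts["query"] = substr($parts["host"], $pos + 1) . $path . $query;
    unset($parts["path"]);
    $parts["host"] = substr($parts["host"], 0, $pos);
    return $parts;
}

// Second array from the output above, used as test input.
$broken = array(
    "scheme" => "http",
    "host"   => "www.heritageglass.com?zoneid=0&source=&dest=http",
    "path"   => "//www.heritageglass.com",
);
$fixed = repair_host_query($broken);
print_r($fixed);
```

With that input the host comes back as www.heritageglass.com and the path element is dropped. The rebuilt query ends in dest=http//www.heritageglass.com: the colon after http is not recovered, because it is already missing from the array shown above.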


All times are GMT -8.