PhpDig.net

03-30-2004, 11:32 PM #1
cybercox
Green Mole
 
Join Date: Jan 2004
Location: Italy
Posts: 11
Bug when spidering subdomains

Hi Charter!
I have found the following bug:

1) I spider a site (for example http://www.jobnetwork.it/foodir/foo.htm) that contains a link to a subdomain. The link must consist of the hostname only. In the example there is a link to http://piemonte.jobnetwork.it

2) The spider finds the link and, since I have define('PHPDIG_IN_DOMAIN',true); in my configuration, adds it to the tempspider and sites tables. It adds the site correctly, but in tempspider it stores the path of the current page. In this case it adds http://piemonte.jobnetwork.it/foodir/

3) I did some digging and found the following code in the phpdigExplore function in robot_functions.php:


if (substr($regs[8],0,1) == "/") {
    $links[$index] = phpdigRewriteUrl($regs[8]);
}
else {
    $links[$index] = phpdigRewriteUrl($path.$regs[8]);
}

The "else" branch is also executed when the link has no path or file name at all ($regs[8] is empty), so if I link to http://subdomain.jobnetwork.it, the current path is appended to the link!
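A minimal sketch of why a bare host link picks up /foodir/: the function below mimics the if/else quoted above. It is a hypothetical reconstruction, not PhpDig code. naiveResolve() is an invented name, PHP's built-in parse_url() replaces PhpDig's own link regex, and $host and $linkPath only approximate $regs[5] and $regs[8].

```php
<?php
// Hypothetical reconstruction of the branch quoted above (not PhpDig code).
// $host and $linkPath approximate $regs[5] and $regs[8].
function naiveResolve(string $currentPath, string $link): string
{
    $parts    = parse_url($link);
    $host     = $parts['host'] ?? '';
    $linkPath = $parts['path'] ?? '';   // a bare host has no 'path' element

    if (substr($linkPath, 0, 1) === '/') {
        // absolute path: used unchanged
        $resolved = $linkPath;
    } else {
        // anything else, including an EMPTY path, gets the current dir prepended
        $resolved = $currentPath . $linkPath;
    }
    return ($host !== '' ? 'http://' . $host : '') . $resolved;
}

echo naiveResolve('/foodir/', 'http://piemonte.jobnetwork.it'), "\n";
// http://piemonte.jobnetwork.it/foodir/
echo naiveResolve('/foodir/', 'bar.htm'), "\n";
// /foodir/bar.htm
```

For a bare host, parse_url() returns no 'path' element, so $linkPath is empty, the first test fails, and the else branch prepends the current directory.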

My solution is the following:

if (substr($regs[8],0,1) == "/") {
    $links[$index] = phpdigRewriteUrl($regs[8]);
}
elseif ($regs[5] == "" or $url == 'http://'.$regs[5].'/') {
    // we are on the same host, or the host information is not provided
    $links[$index] = phpdigRewriteUrl($path.$regs[8]);
}
elseif ($regs[5] != "" && $url != 'http://'.$regs[5].'/') {
    // host information is provided, but we are not on the same host
    $links[$index] = phpdigRewriteUrl($regs[8]);
}
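The decision order of this fix can be sketched as a standalone function. Assumptions: proposedPath() is a hypothetical name; $linkHost, $linkPath and $currentUrl stand in for $regs[5], $regs[8] and $url (taken here to be the site root, e.g. http://www.jobnetwork.it/); and the function returns the string that would be handed to phpdigRewriteUrl() instead of calling it.

```php
<?php
// Sketch of the proposed branch order (hypothetical, not PhpDig code).
// $linkHost/$linkPath/$currentUrl stand in for $regs[5]/$regs[8]/$url.
// Returns the string that would be passed to phpdigRewriteUrl().
function proposedPath(string $currentUrl, string $currentPath,
                      string $linkHost, string $linkPath): string
{
    if (substr($linkPath, 0, 1) === '/') {
        // absolute path: keep it unchanged
        return $linkPath;
    }
    if ($linkHost === '' || $currentUrl === 'http://' . $linkHost . '/') {
        // no host given, or same host: resolve relative to the current dir
        return $currentPath . $linkPath;
    }
    // a different host was given: do not prepend the current path
    return $linkPath;
}
```

Because the condition of the last elseif is exactly the negation of the one before it, that branch behaves like a plain else, which is how the sketch writes it. With a current path of /foodir/, a bare link to piemonte.jobnetwork.it now yields an empty path instead of /foodir/.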


Charter, what do you think? I am not sure whether the solution is correct, or whether it leaves the handling of other links unchanged.
Regards
Simone Capra

capra__nospam__@erweb.it
E.R.WEB - s.r.l.
http://www.erweb.it
03-31-2004, 11:48 PM #2
Charter
Head Mole
 
 
Join Date: May 2003
Posts: 2,539
Hi. Good eye! Yes, I see the problem when a link like http://sub.domain.com is encountered without a trailing slash. This is untested, but an alternative solution might be the following:
PHP Code:
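if (($regs[5] != "") && ($regs[8] == "")) {
    // host given but no path/file: the link points to the root of that host
    $links[$index] = array("path" => "", "file" => "");
}
elseif (substr($regs[8],0,1) == "/") {
    // absolute path on the host
    $links[$index] = phpdigRewriteUrl($regs[8]);
}
else {
    // relative link: resolve it against the current path
    $links[$index] = phpdigRewriteUrl($path.$regs[8]);
}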

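The reply's branch order can be sketched the same way. Assumptions: charterResolve() and splitPathFile() are hypothetical names, splitPathFile() is a much simplified stand-in for phpdigRewriteUrl() that only splits a path into its directory and file name, and $linkHost/$linkPath stand in for $regs[5]/$regs[8].

```php
<?php
// Hypothetical, simplified stand-in for phpdigRewriteUrl():
// only splits "dir/file" into its directory part and file name.
function splitPathFile(string $p): array
{
    $pos = strrpos($p, '/');
    if ($pos === false) {
        return ['path' => '', 'file' => $p];
    }
    return ['path' => substr($p, 0, $pos + 1), 'file' => substr($p, $pos + 1)];
}

// Sketch of the reply's branch order (hypothetical names).
// $linkHost/$linkPath stand in for $regs[5]/$regs[8].
function charterResolve(string $currentPath, string $linkHost, string $linkPath): array
{
    if ($linkHost !== '' && $linkPath === '') {
        // bare host such as http://sub.domain.com: point at its root
        return ['path' => '', 'file' => ''];
    }
    if (substr($linkPath, 0, 1) === '/') {
        // absolute path
        return splitPathFile($linkPath);
    }
    // relative link: resolve against the current path
    return splitPathFile($currentPath . $linkPath);
}
```

Checking for a bare host first means the current path is never consulted for such links, and no host comparison is needed. For example, charterResolve('/foodir/', 'piemonte.jobnetwork.it', '') returns ['path' => '', 'file' => ''], while charterResolve('/foodir/', '', 'bar.htm') returns ['path' => '/foodir/', 'file' => 'bar.htm'].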
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Bug when spidering | julien | Troubleshooting | 3 | 03-01-2005 10:21 PM
index subdomains | AllKnightAccess | How-to Forum | 3 | 09-26-2004 01:01 PM
digging subdomains | b2l_grefix | How-to Forum | 6 | 05-10-2004 02:34 PM
Problems with Subdomains | herberth | Troubleshooting | 8 | 04-02-2004 06:42 AM
Logical bug and stopping spidering | Konstantine | Bug Tracker | 0 | 03-14-2004 12:03 AM


All times are GMT -8.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.