PhpDig.net

10-19-2003, 08:40 AM   #1
druesome
Orange Mole
 
Join Date: Oct 2003
Posts: 30
Choosy about domains?

Hi, for the last few days I've been spidering without a single hitch, until today. The last website I tried to spider is on the .ph domain, and I wonder whether that is why it could not be spidered. If you could try it out for me, the URL is http://www.birdwatch.ph.

Also, I noticed that when I spider a site hosted on Geocities, the site_url becomes www.geocities.com without the folder where the site actually lives (e.g. www.geocities.com/mysite). Is there a way around this? It may seem like a weird request, but I really need it to work this way because I'm working on a hack that will benefit from it. Thanks in advance!!
10-19-2003, 11:31 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. What message did you get when you tried to crawl birdwatch.ph? Does setting PHPDIG_DEFAULT_INDEX to false in the config file have any effect?
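For reference, the change asked about here is a one-line edit in phpDig's config file. This sketch assumes the option is declared as a PHP constant with define(), as its all-caps name suggests; check your own copy and edit the existing line rather than adding a second one.

```php
<?php
// Assumption: PHPDIG_DEFAULT_INDEX is set with define() in the phpDig config
// file. Change the existing line's value to false; defining the same constant
// twice makes PHP emit a "Constant ... already defined" warning.
define('PHPDIG_DEFAULT_INDEX', false);
```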
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
10-19-2003, 09:26 PM   #3
druesome
Orange Mole
 
Join Date: Oct 2003
Posts: 30
I already tried that yesterday, but it didn't work. When I try to spider the site, it times out and it looks as if nothing has happened. When I refresh the admin page, the URL has been added to the list, but no pages have been crawled.

Any ideas about my other question? Thanks.
04-19-2004, 05:47 PM   #4
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
I'm curious about druesome's second question as well; I found this thread while searching for the answer, but there's no answer yet. Why does phpDig drop the folder name from a site's URL when it stores it? I just spidered http://gino.go-gaia.com/forum and it worked well, sticking to that directory, but in the admin panel the link appears with the forum directory removed. Sorry if this is an easy question, but can I make phpDig keep the URL exactly as I entered it, so that if I spider http://gino.go-gaia.com/forum, that URL is what ends up in the sites table? Thanks.
04-20-2004, 12:49 PM   #5
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. As for birdwatch.ph, what do you see on screen when you uncomment //print $answer."<br>\n"; in the robot_functions.php file?
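To see what the server sends back independently of phpDig, a standalone probe can also help. The sketch below is not phpDig code: it opens a plain HTTP connection (http:// on port 80 by default; https is not handled), sends a HEAD request, and prints the raw response headers, or reports a connection failure or read timeout. The helper names build_head_request(), parse_status_code() and probe() are hypothetical.

```php
<?php
// Hypothetical standalone probe (not part of phpDig): send a HEAD request and
// print the raw response headers to check whether a host answers at all.

function build_head_request(string $host, string $path): string
{
    // HTTP/1.0 with "Connection: close" keeps the exchange simple.
    return "HEAD {$path} HTTP/1.0\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

function parse_status_code(string $statusLine): ?int
{
    // e.g. "HTTP/1.1 301 Moved Permanently" -> 301
    if (preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

function probe(string $url, int $timeout = 10): void
{
    $parts = parse_url($url);
    $host  = $parts['host'] ?? '';
    $path  = $parts['path'] ?? '/';
    $port  = $parts['port'] ?? 80;

    // Connection timeout.
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        echo "Connection failed: $errstr ($errno)\n";
        return;
    }
    // Read timeout.
    stream_set_timeout($fp, $timeout);
    fwrite($fp, build_head_request($host, $path));

    $first = true;
    while (($line = fgets($fp)) !== false) {
        if ($first) {
            $code = parse_status_code($line);
            echo "Status code: " . ($code ?? 'unparseable') . "\n";
            $first = false;
        }
        echo $line;            // raw header line, as received
        if (rtrim($line) === '') {
            break;             // a blank line ends the headers
        }
    }
    $meta = stream_get_meta_data($fp);
    if ($meta['timed_out']) {
        echo "Read timed out after {$timeout}s\n";
    }
    fclose($fp);
}

// probe('http://www.birdwatch.ph/');
```

A redirect (3xx) or an error status here would point at the server's response rather than at the .ph domain itself.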

As for the admin index page, it shows only the site, domain or subdomain as the case may be. This is based on parse_url (see the code below). To view the directories/branches for a specific (sub)domain, just click the site and then click the update button.
PHP Code:
<?php

// Split a sample URL into its components with parse_url()
$link = "http://foo.domain.com/dir1/dir2/dir3/file.php?a=b&c=d#anchor";
print_r(parse_url($link));

/* start output
Array
(
    [scheme] => http
    [host] => foo.domain.com
    [path] => /dir1/dir2/dir3/file.php
    [query] => a=b&c=d
    [fragment] => anchor
)
end output */

// foo.domain.com gets stored as http://foo.domain.com/

?>
04-20-2004, 04:52 PM   #6
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
What if I wanted to store the directory information in the spider's "sites" table exactly as entered in the spider script? Or am I missing something...
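As a starting point, here is a hypothetical helper, site_url_with_dir() (not part of phpDig), that turns the URL typed into the spider form into a site URL that keeps its directory. It relies on a simple heuristic: a last path segment containing a dot is treated as a file and dropped, anything else is treated as a directory. Wiring it into phpDig would mean changing the code around parse_url in robot_functions.php, and probably any place that later rebuilds page URLs from the stored site; that part is not shown.

```php
<?php
// Hypothetical helper (not phpDig code): build a site URL that keeps the
// directory the crawl was started from, e.g.
//   http://gino.go-gaia.com/forum          -> http://gino.go-gaia.com/forum/
//   http://www.geocities.com/mysite/a.html -> http://www.geocities.com/mysite/

function site_url_with_dir(string $url): ?string
{
    $p = parse_url($url);
    if ($p === false || !isset($p['host'])) {
        return null;               // not an absolute URL
    }
    $scheme = $p['scheme'] ?? 'http';
    $port   = isset($p['port']) ? ':' . $p['port'] : '';
    $path   = $p['path'] ?? '/';

    // Heuristic: a last segment containing a dot is a file; drop it.
    $last = substr($path, strrpos($path, '/') + 1);
    if ($last !== '' && $last !== false && strpos($last, '.') !== false) {
        $path = substr($path, 0, strrpos($path, '/') + 1);
    }
    // Normalize to exactly one trailing slash.
    $path = rtrim($path, '/') . '/';

    return strtolower($scheme) . '://' . strtolower($p['host']) . $port . $path;
}

// echo site_url_with_dir('http://gino.go-gaia.com/forum');  // http://gino.go-gaia.com/forum/
```

The heuristic misreads directories whose names contain a dot when no trailing slash is given, so entering such URLs with a trailing slash avoids the ambiguity.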
04-20-2004, 05:28 PM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Hi. To get a feel for how it works, look through the tables and see how the domain is stored in the sites table and path/file info is stored in the spider/tempspider/excludes tables, and then search the robot_functions.php file for the parse_url function.
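The split described here (the domain in the sites table, path and file information in the other tables) can be illustrated with parse_url alone. This is a hypothetical split_url() helper, not phpDig's code; in particular, keeping the path without a leading slash, keeping the query string with the file, and discarding the fragment are assumptions, so compare against what you actually find in your tables.

```php
<?php
// Hypothetical illustration (not phpDig code) of splitting a URL into a
// site part, a directory part and a file part.

function split_url(string $url): ?array
{
    $p = parse_url($url);
    if ($p === false || !isset($p['host'])) {
        return null;
    }

    // Site part: scheme + host (+ port), always ending in "/".
    $site = strtolower($p['scheme'] ?? 'http') . '://'
          . strtolower($p['host'])
          . (isset($p['port']) ? ':' . $p['port'] : '')
          . '/';

    // Everything between the leading "/" and the last "/" is the directory;
    // the rest is the file. The (string) casts cover older PHP versions
    // where substr() could return false for an empty result.
    $fullPath = $p['path'] ?? '/';
    $slash    = strrpos($fullPath, '/');
    $path     = (string) substr($fullPath, 1, $slash);
    $file     = (string) substr($fullPath, $slash + 1);

    // Assumption: the query string stays with the file; the fragment is dropped.
    if (isset($p['query'])) {
        $file .= '?' . $p['query'];
    }

    return ['site' => $site, 'path' => $path, 'file' => $file];
}

print_r(split_url("http://foo.domain.com/dir1/dir2/dir3/file.php?a=b&c=d#anchor"));

/* output
Array
(
    [site] => http://foo.domain.com/
    [path] => dir1/dir2/dir3/
    [file] => file.php?a=b&c=d
)
*/
```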
04-30-2004, 12:39 AM   #8
bloodjelly
Purple Mole
 
Join Date: Dec 2003
Posts: 106
Thanks, Charter. My host lost all MySQL service for about a week (with no explanation why), so I haven't been able to try this yet, but I will ASAP. Thanks for pointing me in the right direction.
Similar Threads
Thread Thread Starter Forum Replies Last Post
Banned Domains JLutterklas How-to Forum 0 09-05-2006 10:38 AM
Blocking particular hosts or domains? cewyattjr How-to Forum 0 06-09-2006 11:49 AM
Blocking domains richwilson How-to Forum 0 03-29-2006 06:02 AM
Newbie on Domains: Yes or No Answer Please :) new2dev How-to Forum 1 03-01-2005 11:24 PM
Working with Domains bazarin How-to Forum 1 02-28-2004 03:28 PM


All times are GMT -8.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.