PhpDig.net

Old 01-12-2005, 12:29 PM   #1
CBJim
Green Mole
 
Join Date: Jan 2005
Posts: 9
PhpDig not identifying itself on every page access

I'm in the process of setting up PhpDig and it works quite well.

But there is one minor problem: PhpDig does not identify itself on every page it accesses. Here's a sample from my logs on a test site...

69.64.40.48 - - [12/Jan/2005:11:32:58 -0500] "HEAD /robots.txt HTTP/1.1" 200 - "-" "PhpDig/1.8.6 (+http://www.phpdig.net/robot.php)"
69.64.40.48 - - [12/Jan/2005:11:32:58 -0500] "GET /robots.txt HTTP/1.0" 200 321 "-" "PhpDig/1.8.6 (+http://www.phpdig.net/robot.php)"
...
...
69.64.40.48 - - [12/Jan/2005:11:33:00 -0500] "HEAD /contact_us.php HTTP/1.1" 200 - "-" "-"

As you can see it does fine when requesting robots.txt, but when it requests an actual page it doesn't identify itself.
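
To see how many requests are affected, the access log can be scanned for entries whose user-agent field is empty. The following is a minimal sketch, assuming Apache's combined log format as in the excerpt above. The function name entries_without_user_agent is hypothetical, and the type declarations need PHP 7 or later.

```php
<?php
// Return the request lines of log entries whose User-Agent field is "-".
// The pattern assumes the "combined" log format: host, ident, user, [time],
// "request", status, size, "referer", "user-agent".
function entries_without_user_agent(array $lines): array
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/';
    $missing = [];
    foreach ($lines as $line) {
        // $m[3] is the request line, $m[7] is the user-agent field.
        if (preg_match($pattern, trim($line), $m) && $m[7] === '-') {
            $missing[] = $m[3];
        }
    }
    return $missing;
}

// Two lines copied from the log excerpt above.
$sample = [
    '69.64.40.48 - - [12/Jan/2005:11:32:58 -0500] "HEAD /robots.txt HTTP/1.1" 200 - "-" "PhpDig/1.8.6 (+http://www.phpdig.net/robot.php)"',
    '69.64.40.48 - - [12/Jan/2005:11:33:00 -0500] "HEAD /contact_us.php HTTP/1.1" 200 - "-" "-"',
];
print_r(entries_without_user_agent($sample));
```

For a real log, the lines could be read with file('/path/to/access_log'), where the path is whatever your server uses.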
Old 01-12-2005, 01:08 PM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Check that the user agent you are using is set to not block referring information.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 01-13-2005, 02:46 AM   #3
CBJim
Nope, that's not the problem. Google, MSN, and others show up with no problem. As do browser user agents Mozilla, IE, Firefox, etc.

Interesting that the user agent shows up for the robots.txt query, but for the head and get queries for actual HTML pages it vanishes.

Last edited by CBJim; 01-13-2005 at 02:48 AM.
Old 01-13-2005, 03:21 AM   #4
CBJim
Interesting addendum...

PhpDig identifies itself without a problem on the root "/" head request, then loses it on the root "/" get request. All head and get statements after the first root head request lose the user-agent.
Old 01-13-2005, 03:24 AM   #5
Charter
PhpDig passes its user-agent on every request and does nothing to block referrer, so I wouldn't think this issue is related to PhpDig. Note that, in the following line, not even the page size is given. Perhaps send an email to server4you and see if they have an idea.
Code:
69.64.40.48 - - [12/Jan/2005:11:33:00 -0500] "HEAD /contact_us.php HTTP/1.1" 200 - "-" "-"
Old 01-13-2005, 04:03 AM   #6
CBJim
One last possibility/question:

This is the root IP on the server that is hosting the site being spidered. Could it be excluding the user-agent because of that?
Old 01-13-2005, 05:14 AM   #7
Charter
Yes, the IP is from the machine running the spider, but I don't see why that would cause PhpDig to block out information.
Old 01-13-2005, 05:23 AM   #8
CBJim
I'm puzzled myself, but I have been experimenting a little...

If you remove..

.$cookiesSendString
.$auth_string

from function phpdigTestUrl, its identity is revealed again for every request.

Change the HEAD to GET and the file size is correctly shown in the logs again, which also allows PhpDig to be excluded from the spider killer in osCommerce.

Unfortunately, put the cookie info back in and you lose the file size and PhpDig's user-agent again.

And it's not PhpDig that's blocking the info: I made it echo every request and the info is being sent; it's just not being shown in the server logs.
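
One hypothesis that would fit these symptoms, though nothing in this thread confirms it, is that the cookie or auth string carries an extra line break, so the server sees the end of the headers before it reaches User-Agent. The sketch below checks an echoed raw request for that. The header order, the host name, the sample cookie value, and the function name user_agent_in_header_block are all assumptions made for illustration, not taken from PhpDig's source.

```php
<?php
// Report whether a raw HTTP request keeps User-Agent inside the header block,
// i.e. before the first blank line (CRLF CRLF) that ends the headers.
function user_agent_in_header_block(string $rawRequest): bool
{
    $parts = explode("\r\n\r\n", $rawRequest, 2);
    $headerBlock = $parts[0];
    return (bool) preg_match('/^User-Agent:/mi', $headerBlock);
}

// Hypothetical cookie string with a stray extra CRLF (assumed, for illustration).
$cookiesSendString = "Cookie: osCsid=abc123\r\n\r\n";

$good = "HEAD /contact_us.php HTTP/1.1\r\nHost: www.example.com\r\n"
      . "Cookie: osCsid=abc123\r\n"
      . "User-Agent: PhpDig/1.8.6\r\n\r\n";
$bad  = "HEAD /contact_us.php HTTP/1.1\r\nHost: www.example.com\r\n"
      . $cookiesSendString
      . "User-Agent: PhpDig/1.8.6\r\n\r\n";

var_dump(user_agent_in_header_block($good)); // bool(true)
var_dump(user_agent_in_header_block($bad));  // bool(false)
```

If the echoed request passes this check, the extra-line-break hypothesis can be ruled out and the cause lies elsewhere.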

If you could help me rest better, could you spider www.candlerock.com (search depth 1, links 0)? This way I can compare the log entries and see if they are different/correct from an external IP.

Last edited by CBJim; 01-13-2005 at 05:32 AM.
Old 01-13-2005, 08:49 AM   #9
Charter
Okay, but don't post my IP. Here's the admin log for the 30-odd links indexed before I stopped the spider. Check your access log a few minutes before the time of this post. Do you see correct UA and referrer info?

Spidering in progress... [Stop spider]
SITE : http://www.candlerock.com/
Exclude paths :
- _fpclass
- _private
- _themes
- _vti_cnf
- _vti_log
- _vti_pvt
- _vti_script
- _vti_txt
- download
- wmail
- CVS
- cgi-bin
- candles/admin
1:http://www.candlerock.com/
(time : 00:00:09)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://www.candlerock.com/ordering.php
(time : 00:00:34)

3:http://www.candlerock.com/privacy.php
(time : 00:00:41)

4:http://www.candlerock.com/advanced_search.php
(time : 00:00:48)

5:http://www.candlerock.com/contact_us.php
(time : 00:00:54)

Meta Robots = NoIndex, or already indexed : No content indexed
6:http://www.candlerock.com/shopping_cart.php
(time : 00:01:01)

7:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Accessories&cPath=1
(time : 00:01:07)

8:http://www.candlerock.com/candle_making_supplies.php?c=Decorative_Candle_Making_Accessories&cPath=1_37
(time : 00:01:15)

9:http://www.candlerock.com/candle_making_supplies.php?c=Wax_Melting_Pots&cPath=1_39
(time : 00:01:22)

10:http://www.candlerock.com/candle_making_supplies.php?c=Miscellaneous_Candle_Making_Accessories&cPath=1_42
(time : 00:01:30)

11:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Mold_Accessories&cPath=1_38
(time : 00:01:37)

12:http://www.candlerock.com/candle_making_supplies.php?c=Scales_for_Candle_Making&cPath=1_40
(time : 00:01:44)

13:http://www.candlerock.com/candle_making_supplies.php?c=Wick_Tabs&cPath=1_36
(time : 00:01:51)

14:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Additives&cPath=2
(time : 00:01:58)

15:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Coloring&cPath=21
(time : 00:02:05)

16:http://www.candlerock.com/candle_making_supplies.php?c=Color_Chips&cPath=21_28
(time : 00:02:12)

17:http://www.candlerock.com/candle_making_supplies.php?c=Liquid_Candle_Coloring&cPath=21_29
(time : 00:02:20)

18:http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Kits&cPath=22
(time : 00:02:28)

19:http://www.candlerock.com/candle_making_supplies.php?c=Metal_Candle_Molds&cPath=31
(time : 00:02:35)

20:http://www.candlerock.com/candle_making_supplies.php?c=Miscellaneous_Metal_Candle_Molds&cPath=31_57
(time : 00:02:42)

21:http://www.candlerock.com/candle_making_supplies.php?c=Oval_Metal_Candle_Molds&cPath=31_55
(time : 00:02:50)

22:http://www.candlerock.com/candle_making_supplies.php?c=Pyramid_Metal_Candle_Molds&cPath=31_56
(time : 00:02:58)

23:http://www.candlerock.com/candle_making_supplies.php?c=Round_Metal_Candle_Molds&cPath=31_52
(time : 00:03:06)

24:http://www.candlerock.com/candle_making_supplies.php?c=Square_Metal_Candle_Molds&cPath=31_53
(time : 00:03:13)

25:http://www.candlerock.com/candle_making_supplies.php?c=Star_Metal_Candle_Molds&cPath=31_54
(time : 00:03:22)

26:http://www.candlerock.com/candle_making_supplies.php?c=2_Piece_Plastic_Candle_Molds&cPath=30
(time : 00:03:29)

27:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Animal_Candle_Molds&cPath=30_44
(time : 00:03:36)

28:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Christmas_Candle_Molds&cPath=30_49
(time : 00:03:44)

29:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Column_and_Taper_Candle_Molds&cPath=30_50
(time : 00:03:52)

30:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Floating_Candle_Molds&cPath=30_43
(time : 00:03:59)

31:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Food_and_Fruit_Candle_Molds&cPath=30_47
(time : 00:04:08)

32:http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Halloween_Candle_Molds&cPath=30_48
(time : 00:04:15)

No link in temporary table
links found : 31
http://www.candlerock.com/
http://www.candlerock.com/ordering.php
http://www.candlerock.com/privacy.php
http://www.candlerock.com/advanced_search.php
http://www.candlerock.com/contact_us.php
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Accessories&cPath=1
http://www.candlerock.com/candle_making_supplies.php?c=Decorative_Candle_Making_Accessories&cPath=1_37
http://www.candlerock.com/candle_making_supplies.php?c=Wax_Melting_Pots&cPath=1_39
http://www.candlerock.com/candle_making_supplies.php?c=Miscellaneous_Candle_Making_Accessories&cPath=1_42
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Mold_Accessories&cPath=1_38
http://www.candlerock.com/candle_making_supplies.php?c=Scales_for_Candle_Making&cPath=1_40
http://www.candlerock.com/candle_making_supplies.php?c=Wick_Tabs&cPath=1_36
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Additives&cPath=2
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Coloring&cPath=21
http://www.candlerock.com/candle_making_supplies.php?c=Color_Chips&cPath=21_28
http://www.candlerock.com/candle_making_supplies.php?c=Liquid_Candle_Coloring&cPath=21_29
http://www.candlerock.com/candle_making_supplies.php?c=Candle_Making_Kits&cPath=22
http://www.candlerock.com/candle_making_supplies.php?c=Metal_Candle_Molds&cPath=31
http://www.candlerock.com/candle_making_supplies.php?c=Miscellaneous_Metal_Candle_Molds&cPath=31_57
http://www.candlerock.com/candle_making_supplies.php?c=Oval_Metal_Candle_Molds&cPath=31_55
http://www.candlerock.com/candle_making_supplies.php?c=Pyramid_Metal_Candle_Molds&cPath=31_56
http://www.candlerock.com/candle_making_supplies.php?c=Round_Metal_Candle_Molds&cPath=31_52
http://www.candlerock.com/candle_making_supplies.php?c=Square_Metal_Candle_Molds&cPath=31_53
http://www.candlerock.com/candle_making_supplies.php?c=Star_Metal_Candle_Molds&cPath=31_54
http://www.candlerock.com/candle_making_supplies.php?c=2_Piece_Plastic_Candle_Molds&cPath=30
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Animal_Candle_Molds&cPath=30_44
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Christmas_Candle_Molds&cPath=30_49
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Column_and_Taper_Candle_Molds&cPath=30_50
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Floating_Candle_Molds&cPath=30_43
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Food_and_Fruit_Candle_Molds&cPath=30_47
http://www.candlerock.com/candle_making_supplies.php?c=Plastic_Halloween_Candle_Molds&cPath=30_48
Optimizing tables...
Indexing complete ! [Back] to admin interface.
Old 01-13-2005, 10:53 AM   #10
CBJim
Well, nope...still the same situation. Here's a partial of the log...

xx.xx.xxx.15 - - [13/Jan/2005:12:37:33 -0500] "HEAD / HTTP/1.1" 200 - "-" "PhpDig/1.8.6 (+http://www.phpdig.net/robot.php)"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:33 -0500] "GET / HTTP/1.1" 200 19200 "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:34 -0500] "HEAD /nstylesheet.css HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:35 -0500] "HEAD /ordering.php HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:35 -0500] "HEAD /privacy.php HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:35 -0500] "HEAD /advanced_search.php HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:35 -0500] "HEAD /contact_us.php HTTP/1.1" 200 - "-" "-"
xx.xx.xxx.15 - - [13/Jan/2005:12:37:36 -0500] "HEAD /shopping_cart.php HTTP/1.1" 200 - "-" "-"

Also, since you clicked the link before you spidered that site... nice to see that someone is using Firefox.
Old 01-13-2005, 03:23 PM   #11
Charter
When I test on my server, the UA and referrer come through okay. Maybe there is some protocol issue and/or your server is not able to understand the headers. What type of OS/setup are you using? Anyway, if you don't need cookies or authentication sent with the requests, just remove $cookiesSendString and/or $auth_string from the two HEAD and one GET requests in the robot_functions.php file. It's not an ideal solution, but I can't figure out what's going on, especially since I can't reproduce it. BTW, I do like Firefox.
Old 01-14-2005, 02:20 AM   #12
CBJim
Linux Fedora - Core 2
Apache - 2.57
HTTPD - 2.0.51

I'll do some research on this, something is amiss. All other IDs are recognized.

I like Firefox myself, I just need to get out of the habit of clicking the IE icon.
Old 01-14-2005, 03:29 AM   #13
CBJim
Well, I tried spidering another site on the server and phpDig was recognized as the user-agent in the logs.

Being a little perplexed, I wrote a quick script with apache_request_headers() and ran it on the site that hasn't been recognizing PhpDig. There appears to be a "ghost" cookie being sent that isn't being picked up and echoed back to the server. It's a phpbb2mysql_data cookie. I'm not even sure why it's being set on that site, since phpBB hasn't been on that site...ever.

Now I don't even know where to start looking for that to fix it.

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Cookie: lang=english; phpbb2mysql_data=s%3A0%3A%22%22%3B; osCsid=%cookie deleted%

The osCsid is being echoed back by phpDig, the phpbb2 is not.
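
A header dump like the one above can be produced with a short script. This is only a sketch, not the script used in this thread: apache_request_headers() is a real PHP function but is only available under some server APIs, so the hypothetical helper headers_from_server() rebuilds the headers from $_SERVER when it is missing.

```php
<?php
// Rebuild request headers from the HTTP_* entries in a $_SERVER-style array.
// For example, HTTP_USER_AGENT becomes "User-Agent".
function headers_from_server(array $server): array
{
    $headers = [];
    foreach ($server as $key => $value) {
        if (strpos((string) $key, 'HTTP_') === 0) {
            $words = strtolower(str_replace('_', ' ', substr($key, 5)));
            $name = str_replace(' ', '-', ucwords($words));
            $headers[$name] = $value;
        }
    }
    return $headers;
}

// Prefer the server-provided function; fall back to $_SERVER otherwise.
$headers = function_exists('apache_request_headers')
    ? apache_request_headers()
    : headers_from_server($_SERVER);

if (!headers_sent()) {
    header('Content-Type: text/plain');
}
foreach ($headers as $name => $value) {
    echo "$name: $value\n";
}
```

Requesting this script once from a browser and once with the spider makes it easy to compare which cookies each client actually sends.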

Last edited by CBJim; 01-14-2005 at 03:36 AM.
Old 01-14-2005, 08:34 AM   #14
Charter
WRT phpbb2mysql_data:
PHP Code:
echo urldecode("phpbb2mysql_data=s%3A0%3A%22%22%3B");
// prints phpbb2mysql_data=s:0:"";
// looks like serialized data containing nothing 
In Firefox: Tools > Options > Privacy > Cookies > View Cookies > Remove Cookie > OK > OK

Remove any phpbb2mysql_* type cookie. Still see the ghost cookie?

PhpDig tip: add osCsid to PHPDIG_SESSID_VAR in config file to remove session from links when indexing.
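
As a rough illustration of the tip above: assuming PHPDIG_SESSID_VAR in the config file is a comma-separated list of parameter names (check the existing value and its comment in your own copy before editing), the change might look like this. The value 'PHPSESSID,s' shown as the starting point is an assumption, not something stated in this thread.

```php
// Assumed existing value: 'PHPSESSID,s' (verify in your own config file).
// Appending osCsid asks PhpDig to strip that parameter from links when indexing.
define('PHPDIG_SESSID_VAR', 'PHPSESSID,s,osCsid');
```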
Old 01-14-2005, 08:56 AM   #15
CBJim
LOL, never thought to remove the cookie from my system. Still doesn't explain how it got there though. But, that's a different problem.

So, that means I've run out of ideas for why the user-agent isn't showing.

Thanks for the tip!
All times are GMT -8.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.