PhpDig.net

Version 1.8.1 Alpha (http://www.phpdig.net/forum/showthread.php?t=942)

vinyl-junkie 07-04-2004 07:01 PM

Don't know if this helps at all, but I found a few pages that were indexed with 1.8.0 but not with 1.8.1. These are all from www.napathon.net:

/AlbumID1107.php - which can be found on: /Rock12.php
/AlbumID1113.php - on /Rock23.php
/AlbumID1114.php & /AlbumID1115.php - on /Rock28.php

Doesn't make sense why these pages wouldn't be spidered. I don't have a complete spider log or anything, but I could go spider again and make that if you need it.

Charter 07-04-2004 07:34 PM

Hi. Try comparing the version 1.8.0 config file against the version 1.8.1 config file. Perhaps something there is causing the difference?

vinyl-junkie 07-04-2004 08:20 PM

Yes, there are some differences in my config file between 1.8.0 and 1.8.1. I'm not sure whether they'd make that much difference though. Here are the ones that have anything to do with spidering:

SPIDER_MAX_LIMIT 20 (1.8.0) vs. 10 (1.8.1)
SPIDER_DEFAULT_LIMIT 3 (1.8.0) vs. 5 (1.8.1)

Everything else is the same in both versions.
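For anyone who wants to compare two config files without reading them side by side, here is a minimal sketch in Python. It assumes the settings are written as define('NAME', value); lines, which is how PHPDIG_ADM_AUTH appears later in this thread. The two config excerpts are made-up stand-ins that hold only the values quoted above, and the function names read_defines and diff_defines are hypothetical.

```python
import re

# Matches lines such as: define('SPIDER_MAX_LIMIT',20);
DEFINE_RE = re.compile(r"""define\(\s*['"](\w+)['"]\s*,\s*(.+?)\s*\)\s*;""")

def read_defines(php_source):
    """Return {constant name: raw value text} for every define() call found."""
    return {name: value for name, value in DEFINE_RE.findall(php_source)}

def diff_defines(old, new):
    """Map each constant whose value differs to (old value, new value)."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}

# Illustrative excerpts only, not the real config.php contents.
old_cfg = """
define('SPIDER_MAX_LIMIT',20);
define('SPIDER_DEFAULT_LIMIT',3);
define('PHPDIG_ADM_AUTH','1');
"""
new_cfg = """
define('SPIDER_MAX_LIMIT',10);
define('SPIDER_DEFAULT_LIMIT',5);
define('PHPDIG_ADM_AUTH','1');
"""

changes = diff_defines(read_defines(old_cfg), read_defines(new_cfg))
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

To use it on real files, read each config.php into a string first and pass those strings to read_defines.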

I'll copy the 1.8.1 config file to my server, re-spider and let you know what happens.

Charter 07-05-2004 05:04 AM

Hi. Try checking CHUNK_SIZE in the config file too.

vinyl-junkie 07-05-2004 05:33 AM

CHUNK_SIZE in the 1.8.1 config file I have is 1024 vs. 2048 in my original config file.

When I re-spidered last night after copying the 1.8.1 config file to the server, I got 2 more pages indexed this time. Still far short of what I should have.

Charter, when you get a chance, could you put together a complete 1.8.1 zip file with all the latest stuff? I want to re-download again to make absolute certain I'm using all files from that. Then I'll re-spider again and let you know what happens.

Nice to have plenty of bandwidth to do that. ;)

Charter 07-05-2004 06:05 AM

PhpDig: 1.8.1 alpha <removed> and two replacement files <removed>. Manual install still required as install.php not yet included.

EDIT: PhpDig version 1.8.1 released.

vinyl-junkie 07-05-2004 07:02 AM

Are you aware that the 1.8.1 alpha zip has nothing separated into the appropriate folders? :( I don't remember it being like that when I downloaded it the first time.

Charter 07-05-2004 07:07 AM

Hi. It should be separated as before; it's the same file. Maybe an unzip program option needs to be un/checked?

vinyl-junkie 07-05-2004 09:58 AM

You were correct. I had "Use Folder Names" un-checked. :o

OK, this time I emptied all my folders and started from scratch with those two zip files. Search depth = 10. Links per = 0.

Here's the spider log:

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://www.napathon.net/
Exclude paths :
- test/
- phpdig181/
- BW-Original/
- Joe_and_Eddie/
1:http://www.napathon.net/
(time : 00:00:06)

No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://www.napathon.net/
Optimizing tables...
Indexing complete !



==========================
Hosts: 1 Pages
Entries: 1 Pages
Index: 177 Entries
Keywords: 177 Entries
Temporary Table: 0 Entries

Charter 07-05-2004 10:47 AM

Hi. When indexing http://www.napathon.net/ with search depth of ten and links_per of zero, and hand stopping after 100 links, below is the output. Do you not get this? Does anything show in your error logs?

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://www.napathon.net/
Exclude paths :
- test/
- phpdig181/
- BW-Original/
- Joe_and_Eddie/
1:http://www.napathon.net/
(time : 00:00:08)
+ + + + + + +
level 1...
2:http://www.napathon.net/miscmenu.php
(time : 00:00:20)
+ + + + + + + + + + + +
3:http://www.napathon.net/musicmenu.php
(time : 00:00:28)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
4:http://www.napathon.net/SearchMenu.php
(time : 00:00:42)
Ok for http://search.napathon.net/search.php (site_id:469)
+
5:http://www.napathon.net/sitemap.php
(time : 00:00:50)

6:http://www.napathon.net/FAQ.php
(time : 00:00:57)

7:http://www.napathon.net/ContactMe.php
(time : 00:01:03)

8:http://www.napathon.net/Privacy.php
(time : 00:01:09)

level 2...
9:http://www.napathon.net/1219AshlandIntro.php
(time : 00:01:21)

10:http://www.napathon.net/1219AshlandSlideShow.php
(time : 00:01:27)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

11:http://www.napathon.net/BeeGeesMobile.php
(time : 00:01:34)

12:http://www.napathon.net/BoganReunion2003SlideShow.php
(time : 00:01:40)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

13:http://www.napathon.net/EstherEscortToHeaven.php
(time : 00:01:48)

14:http://www.napathon.net/MeAtWork.php
(time : 00:01:54)

15:http://www.napathon.net/RekkidRoom.php
(time : 00:02:00)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

16:http://www.napathon.net/WongFamily.php
(time : 00:02:08)

17:http://www.napathon.net/Wonglets.php
(time : 00:02:14)

18:http://www.napathon.net/BillsRecordsSlideShow.php
(time : 00:02:20)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

19:http://www.napathon.net/JohnnyGimbleSlideShow.php
(time : 00:02:27)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

20:http://www.napathon.net/CDTrusteeReview.php
(time : 00:02:33)

21:http://www.napathon.net/MusicIntro.php
(time : 00:02:39)

Meta Robots = NoIndex, or already indexed : No content indexed
22:http://www.napathon.net/MyCollection1.php
(time : 00:02:45)

23:http://www.napathon.net/NewArrivals1.php
(time : 00:02:52)
+ + + + +
24:http://www.napathon.net/BeeGees1.php
(time : 00:03:00)
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
25:http://www.napathon.net/Blues1.php
(time : 00:03:11)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
26:http://www.napathon.net/Corrs1.php
(time : 00:03:24)
+ + + + + + + + + + + + + + + + + + + + +
27:http://www.napathon.net/Country1.php
(time : 00:03:34)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
28:http://www.napathon.net/EasyListening1.php
(time : 00:03:46)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
29:http://www.napathon.net/Folk1.php
(time : 00:03:59)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
30:http://www.napathon.net/Jazz1.php
(time : 00:04:10)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
31:http://www.napathon.net/Miscellaneous1.php
(time : 00:04:23)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
32:http://www.napathon.net/Reggae1.php
(time : 00:04:34)
+ + + + + + + + + + + + + + + + + + +
33:http://www.napathon.net/Rock1.php
(time : 00:04:44)
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
34:http://www.napathon.net/SavageGarden2.php
(time : 00:04:55)
+ + + + + + + + + + + + + + + + + + + + + + +
35:http://www.napathon.net/TradeList.php
(time : 00:05:05)
+ + + + + + + + + + + + + + + + + +
36:http://www.napathon.net/WantList.php
(time : 00:05:14)

37:http://www.napathon.net/BadTrader.php
(time : 00:05:21)

38:http://www.napathon.net/LPtoCD.php
(time : 00:05:28)
+
39:http://www.napathon.net/ABeeGeesChristmas.php
(time : 00:05:34)

40:http://www.napathon.net/BeeGeeTastic.php
(time : 00:05:40)

41:http://www.napathon.net/KerrvilleEarlyYearsReview.php
(time : 00:05:48)

42:http://www.napathon.net/GottaGetReview.php
(time : 00:05:54)

43:http://www.napathon.net/ThreeBees.php
(time : 00:06:00)

44:http://www.napathon.net/BeeGees6.php
(time : 00:06:07)
+ + + + + + + + + + + + + + + + + + + + + +
45:http://www.napathon.net/WeLoveTheBeeGees.php
(time : 00:06:18)

46:http://www.napathon.net/BillsRecordsArticle.php
(time : 00:06:24)
+
47:http://www.napathon.net/IStartedAJoke.php
(time : 00:06:31)

48:http://www.napathon.net/JohnnyGimbleConcert.php
(time : 00:06:38)

49:http://www.napathon.net/CliveAnderson.php
(time : 00:06:44)

50:http://www.napathon.net/RustyWier.php
(time : 00:06:50)

51:http://www.napathon.net/InternetCollecting.php
(time : 00:06:58)

52:http://www.napathon.net/BWStevenson.php
(time : 00:07:05)

53:http://www.napathon.net/BWStevensonOnRhino.php
(time : 00:07:12)

54:http://www.napathon.net/BW-Lyrics.php
(time : 00:07:18)
+ + + + + + + + + +
55:http://www.napathon.net/BWStevenson/bw_intro.php
(time : 00:07:26)

56:http://www.napathon.net/BWStevenson/bw_page_1.php
(time : 00:07:32)

57:http://www.napathon.net/BWStevenson/bw_discography.php
(time : 00:07:39)

HTTP/1.1 404 Not Found - http://www.napathon.net/BWStevenson/top/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/BWStevenson/top
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
+
58:http://www.napathon.net/BWStevenson/bw_memories.php
(time : 00:07:47)

59:http://www.napathon.net/BWStevenson/bw_tv.php
(time : 00:07:55)

Meta Robots = NoIndex, or already indexed : No content indexed
60:http://www.napathon.net/MusicDBSearch.php
(time : 00:08:01)

level 3...
Meta Robots = NoIndex, or already indexed : No content indexed
61:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=74
(time : 00:08:12)

62:http://www.napathon.net/AlbumID1552.php
(time : 00:08:18)

63:http://www.napathon.net/AlbumID1553.php
(time : 00:08:24)

Meta Robots = NoIndex, or already indexed : No content indexed
64:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=55
(time : 00:08:30)

65:http://www.napathon.net/AlbumID1557.php
(time : 00:08:36)

Meta Robots = NoIndex, or already indexed : No content indexed
66:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=338
(time : 00:08:43)

67:http://www.napathon.net/AlbumID1063.php
(time : 00:08:49)

Meta Robots = NoIndex, or already indexed : No content indexed
68:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=354
(time : 00:08:55)

69:http://www.napathon.net/AlbumID1153.php
(time : 00:09:01)

Meta Robots = NoIndex, or already indexed : No content indexed
70:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=39
(time : 00:09:07)

71:http://www.napathon.net/AlbumID42.php
(time : 00:09:13)

72:http://www.napathon.net/AlbumID43.php
(time : 00:09:20)

Meta Robots = NoIndex, or already indexed : No content indexed
73:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=350
(time : 00:09:26)

74:http://www.napathon.net/AlbumID1116.php
(time : 00:09:32)

Meta Robots = NoIndex, or already indexed : No content indexed
75:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=51
(time : 00:09:38)

76:http://www.napathon.net/AlbumID1546.php
(time : 00:09:44)

77:http://www.napathon.net/AlbumID1547.php
(time : 00:09:50)

78:http://www.napathon.net/AlbumID534.php
(time : 00:09:56)

79:http://www.napathon.net/AlbumID1456.php
(time : 00:10:03)

80:http://www.napathon.net/AlbumID80.php
(time : 00:10:09)

81:http://www.napathon.net/AlbumID82.php
(time : 00:10:15)

82:http://www.napathon.net/AlbumID91.php
(time : 00:10:22)

83:http://www.napathon.net/AlbumID537.php
(time : 00:10:28)

84:http://www.napathon.net/AlbumID83.php
(time : 00:10:34)

85:http://www.napathon.net/AlbumID1336.php
(time : 00:10:40)

86:http://www.napathon.net/AlbumID1077.php
(time : 00:10:47)

87:http://www.napathon.net/AlbumID92.php
(time : 00:10:53)

88:http://www.napathon.net/AlbumID802.php
(time : 00:11:00)

89:http://www.napathon.net/AlbumID93.php
(time : 00:11:06)

90:http://www.napathon.net/AlbumID826.php
(time : 00:11:13)

91:http://www.napathon.net/BeeGees2.php
(time : 00:11:19)
+ + + + + + + + + + + + +
92:http://www.napathon.net/BeeGees18.php
(time : 00:11:28)
+ + + + + + + + + + + + + +
Meta Robots = NoIndex, or already indexed : No content indexed
93:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=92
(time : 00:11:36)

94:http://www.napathon.net/AlbumID242.php
(time : 00:11:42)

Meta Robots = NoIndex, or already indexed : No content indexed
95:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=97
(time : 00:11:48)

96:http://www.napathon.net/AlbumID1296.php
(time : 00:11:55)

97:http://www.napathon.net/AlbumID253.php
(time : 00:12:01)

Meta Robots = NoIndex, or already indexed : No content indexed
98:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=119
(time : 00:12:07)

99:http://www.napathon.net/AlbumID297.php
(time : 00:12:13)

Meta Robots = NoIndex, or already indexed : No content indexed
100:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=400
(time : 00:12:19)

vinyl-junkie 07-05-2004 11:28 AM

Here's what shows up in my error log:
Quote:

[Mon Jul 5 18:32:07 2004] [error] [client 69.59.195.15] File does not exist: /home/napathon/public_html/BWStevenson/top
[Mon Jul 5 18:32:06 2004] [error] [client 69.59.195.15] File does not exist: /home/napathon/public_html/BWStevenson/top/
[Mon Jul 5 18:26:54 2004] [error] [client 69.59.195.15] File does not exist: /home/napathon/public_html/theimage
[Mon Jul 5 18:26:54 2004] [error] [client 69.59.195.15] File does not exist: /home/napathon/public_html/theimage/

The last pair of error messages appears 4 other times.

Wonder why spidering works for you and not for me. :confused: All I did was delete the contents of my folders on the server, delete and re-create my phpdig database and the user for it, make the necessary changes to config.php and connect.php, then re-spider.

Charter 07-05-2004 11:39 AM

Hi. Those errors are just 404s, created when PhpDig thinks it found a link that isn't really a link. The 404s shouldn't cause a problem with spidering though. Was version 1.8.1 alpha working for you before, and now it's not? Are you using the two replacement files with the alpha version?

vinyl-junkie 07-05-2004 03:45 PM

Well, I did indeed discover that I had made a mistake with those two replacement files. True, I had unzipped them, but they didn't end up where I thought they were. Consequently, my previous spidering was with the wrong files.

However, letting the spider run its course with the correct 1.8.1 files, I still only have 1264 pages spidered, which is about 200 or so pages short of what I end up with for 1.8.0. I have no idea why. :(
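One way to pin down which pages went missing is to save the spider output from both versions and compare the URL lists. The Python sketch below assumes each spidered page shows up in the log as a line of the form number:URL, as in the logs posted in this thread. The two sample logs use made-up example.com URLs, and the function names spidered_urls and missing_from_new are hypothetical.

```python
import re

# A spidered page is logged as "<counter>:<url>", e.g. "62:http://host/page.php".
PAGE_RE = re.compile(r"^\d+:(https?://\S+)\s*$")

def spidered_urls(log_text):
    """Collect the URLs of all numbered page lines in a spider log."""
    urls = set()
    for line in log_text.splitlines():
        match = PAGE_RE.match(line.strip())
        if match:
            urls.add(match.group(1))
    return urls

def missing_from_new(old_log, new_log):
    """Return the URLs present in the old log but absent from the new one."""
    return sorted(spidered_urls(old_log) - spidered_urls(new_log))

# Hypothetical log excerpts in the same layout as the spider output.
old_log = """SITE : http://www.example.com/
1:http://www.example.com/
(time : 00:00:06)
2:http://www.example.com/AlbumID1.php
(time : 00:00:12)
3:http://www.example.com/AlbumID2.php
(time : 00:00:18)
"""
new_log = """SITE : http://www.example.com/
1:http://www.example.com/
(time : 00:00:06)
HTTP/1.1 404 Not Found - http://www.example.com/theimage/
2:http://www.example.com/AlbumID1.php
(time : 00:00:12)
"""

for url in missing_from_new(old_log, new_log):
    print(url)
```

The 404 and SITE lines are skipped because they do not start with a counter followed by a colon.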

Charter 07-05-2004 08:02 PM

Hi. Actually, I think it is working as it should.

Before respidering anything, do an exact phrase search on "rock collection: page" (without the quotes) in version 1.8.1 and see how many Rock Collection: Page X titles show up in the search results. Along with some other titles, you'll see the following:

Rock Collection: Page 1
Rock Collection: Page 2
Rock Collection: Page 3
Rock Collection: Page 4
Rock Collection: Page 5
Rock Collection: Page 6
Rock Collection: Page 7
Rock Collection: Page 8
Rock Collection: Page 9
Rock Collection: Page 34
Rock Collection: Page 35
Rock Collection: Page 36
Rock Collection: Page 37
Rock Collection: Page 38
Rock Collection: Page 39
Rock Collection: Page 40

Seeing as how you kept SPIDER_MAX_LIMIT at ten, the index process worked as follows (using RockX.php links as example, omitting other links from the example):

Level Zero:

http://www.napathon.net/ (links to musicmenu.php)

Level One:

http://www.napathon.net/musicmenu.php (links to Rock1.php)

Level Two:

http://www.napathon.net/Rock1.php (links to Rock2.php and Rock41.php)

Level Three:

http://www.napathon.net/Rock2.php (links to Rock3.php and Rock41.php)
http://www.napathon.net/Rock41.php (links to Rock1.php and Rock40.php)

Level Four:

http://www.napathon.net/Rock3.php (links to Rock4.php and Rock41.php)
http://www.napathon.net/Rock40.php (links to Rock1.php and Rock39.php)

Level Five:

http://www.napathon.net/Rock4.php (links to Rock5.php and Rock41.php)
http://www.napathon.net/Rock39.php (links to Rock1.php and Rock38.php)

Level Six:

http://www.napathon.net/Rock5.php (links to Rock6.php and Rock41.php)
http://www.napathon.net/Rock38.php (links to Rock1.php and Rock37.php)

Level Seven:

http://www.napathon.net/Rock6.php (links to Rock7.php and Rock41.php)
http://www.napathon.net/Rock37.php (links to Rock1.php and Rock36.php)

Level Eight:

http://www.napathon.net/Rock7.php (links to Rock8.php and Rock41.php)
http://www.napathon.net/Rock36.php (links to Rock1.php and Rock35.php)

Level Nine:

http://www.napathon.net/Rock8.php (links to Rock9.php and Rock41.php)
http://www.napathon.net/Rock35.php (links to Rock1.php and Rock34.php)

Level Ten:

http://www.napathon.net/Rock9.php (links to Rock10.php and Rock41.php)
http://www.napathon.net/Rock34.php (links to Rock1.php and Rock33.php)

So, with SPIDER_MAX_LIMIT at ten, PhpDig won't go further than ten levels. Applied to the above example, this means Rock10.php through Rock33.php were not indexed.

Solution: Increase SPIDER_MAX_LIMIT in the config file and then select a higher search depth to index your site.
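The level-by-level walk above can be reproduced with a small breadth-first crawl that stops following links at a maximum depth. This is only a toy model in Python, not PhpDig's actual code. It assumes 41 Rock pages, each linking to the first, previous, next and last page, which matches the links listed above but ignores every other link on the site. The names build_links and crawl are hypothetical.

```python
from collections import deque

def build_links(pages=41):
    """Toy link graph: home -> music menu -> Rock1, plus pager links."""
    links = {"/": ["musicmenu.php"], "musicmenu.php": ["Rock1.php"]}
    for n in range(1, pages + 1):
        # Assumed pager: first, previous, next and last page (no self link).
        targets = {1, pages, max(1, n - 1), min(pages, n + 1)} - {n}
        links[f"Rock{n}.php"] = [f"Rock{t}.php" for t in sorted(targets)]
    return links

def crawl(links, start="/", max_depth=10):
    """Breadth-first crawl; pages found at max_depth are not expanded."""
    level = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if level[page] == max_depth:
            continue  # depth limit reached, do not follow this page's links
        for target in links.get(page, []):
            if target not in level:
                level[target] = level[page] + 1
                queue.append(target)
    return level

levels = crawl(build_links(), max_depth=10)
reached = sorted(int(p[4:-4]) for p in levels if p.startswith("Rock"))
print(reached)  # Rock page numbers reached within ten levels
```

With max_depth=10 the model reaches Rock1 through Rock9 and Rock34 through Rock41, and leaves out Rock10 through Rock33, the same gap described above. Passing a larger max_depth to crawl shows the gap shrinking, though the real site has more links than this model, so the exact depth needed may differ.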

Charter 07-05-2004 08:23 PM

Quote:

Originally posted by shinji
another bug(?) i've found:

whenever someone searched for something and clicks on one of the results he gets the htpassword-prompt "Administration-1736 PhpDig" and the result

Hi, and thanks. This happens when define('PHPDIG_ADM_AUTH','1'); is set in the config file. Just comment out the following code in the clickstats.php file:
PHP Code:

if (is_file("$relative_script_path/libs/auth.php")) {
    include "$relative_script_path/libs/auth.php";
}
else {
    die("Cannot find auth.php file.\n");
}



