Version 1.8.1 Alpha


Charter
05-16-2004, 12:52 PM
Hi. Download PhpDig 1.8.1 alpha <removed>. What do you think?

A couple of notes on feedback:

Please, no mod requests here, just feedback on what's done so far in the alpha version itself. Also, please, no "what else are you going to add?" questions, as I'm not sure yet, but read on for possibilities.

What's in the alpha version:

Some things from this thread (http://www.phpdig.net/showthread.php?threadid=894) include "did you mean X instead" suggestions, different keyword storage, search by site or directory, click tracking, cron job management, a spider limit of at most Y links per depth per site, and other config options.

Other changes include removal of '-' index pages, RSS feeds by search, updated robots.txt reading, reading of base href tags for indexing, added tis-620 support, allowance for some extra characters in URLs, bug fixes, and possible https support.

Remember this is an alpha version - some things might not work as expected. Suggestion: install in a test directory rather than overwriting your current version.

Some things that may or may not be added:

Banner abilities, simultaneous spiders, add your site form, GET request modification, admin panel changes, different authentication method, thumbnail support, different searching options, allow for more than one encoding, treat directories within a site as different domains, and whatever else.

Something that probably won't be added:

Multi-byte support - Why? See this (http://www.php.net/manual/en/function.mb-eregi-replace.php) and other (http://www.php.net/manual/en/ref.mbstring.php) functions where it says, "This function is EXPERIMENTAL. The behaviour of this function, the name of this function, and anything else documented about this function may change without notice in a future release of PHP. Use this function at your own risk."

Note: version 1.8.1 alpha contains three new tables (clicks, site_page, and sites_days_upd) that you will need to create. See the init_db.sql file for these tables, and remember to add your PhpDig prefix to the table names if you used one.
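As a small illustration of the prefix note, the sketch below builds the prefixed names of the three new tables. The prefix value `pd_` and the function name `prefixedTables` are assumptions made up for this example; the actual table definitions still have to come from the SQL file shipped with PhpDig.

```php
<?php
// Hypothetical helper, not part of PhpDig: prepend a table prefix
// to the three table names introduced by the 1.8.1 alpha.
function prefixedTables(string $prefix): array
{
    $tables = ['clicks', 'site_page', 'sites_days_upd'];
    return array_map(function ($table) use ($prefix) {
        return $prefix . $table;
    }, $tables);
}

// 'pd_' is an assumed example prefix; use an empty string if none was set.
foreach (prefixedTables('pd_') as $name) {
    echo $name, "\n";
}
```

With an empty prefix the names come back unchanged, so the same check works for installs that never used one.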

Special thanks to all who made suggestions and contributions!!!

EDIT: PhpDig version 1.8.1 released.

sktest
05-16-2004, 01:53 PM
Hi,

I have downloaded the alpha version.
While spidering, I get the following errors:

Warning: parse_url(http://?modul=): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=aboutme): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=projekte): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=bilder): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=sonstiges): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=wohnort): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=gastbuch): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=verkaufe): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=impressum): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=kontakt): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skdownloader): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=simpleamp): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=simpleampskins): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skcoverdesigner): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=vbruntime): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skdeineip): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skcam): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skscreenmatrix): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skscreenhypnotic): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=sknetsender): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skgta2cheater): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skvirtualdrive): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skspider): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=sksendlater): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=skclassroom): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=mjh-pong): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=uebersicht): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=leer): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=leer): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Warning: parse_url(http://?modul=verkaufe): Unable to parse url in /home/pub_hogus_de/www/search/admin/robot_functions.php on line 1479

Sorry for my bad English. I come from Germany :-)

Charter
05-16-2004, 04:36 PM
Hi. I see the issue. Until I get a fix, just set define('PHPDIG_IN_DOMAIN',false); in the config file.
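Every URL in the warnings above has an empty host (the text goes straight from `http://` to `?modul=...`), which is what `parse_url()` objects to. The following is only a sketch of a defensive check, not the fix that went into PhpDig; the helper name `hasUsableHost` and the example.com URL are assumptions for illustration.

```php
<?php
// Hypothetical helper, not part of PhpDig: reject URLs without a host
// before they reach any host-dependent crawl logic.
function hasUsableHost(string $url): bool
{
    // Older PHP versions emit a warning on malformed URLs, so suppress it
    // and inspect the return value instead.
    $parts = @parse_url($url);
    return is_array($parts) && !empty($parts['host']);
}

var_dump(hasUsableHost('http://?modul=aboutme'));                 // bool(false)
var_dump(hasUsableHost('http://www.example.com/?modul=aboutme')); // bool(true)
```

A check like this only hides the symptom; a link such as `?modul=aboutme` is relative and would still need to be resolved against the page it was found on.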

vinyl-junkie
05-16-2004, 08:10 PM
Originally posted by Charter
Hi. I see the issue. Until I get a fix, just set define('PHPDIG_IN_DOMAIN',false); in the config file.

When I did what you suggested here, even with a search depth of 10, I only got the root indexed. When I set this value to true, as sktest was doing, I get those parse errors.

Isn't alpha testing fun? :D

sktest
05-17-2004, 03:57 AM
Yes, I have the same problem as vinyl-junkie.

Wayne McBryde
05-19-2004, 07:39 PM
All I get is:
Unable to connect to database : Check the connection script.

I had installed 1.8.1 on a new clean website. All it has is index.html. I uploaded the .zip file and unzipped it. I tried to run http://domain.com/search/admin/install.php, but there is no install.php file. I ran the index.php file and got the error above.

Then I removed all PhpDig files and repeated the above process with 1.8.0. I ran http://domain.com/search/admin/install.php and it worked fine, no errors. I spidered 2 websites, no errors. Then I uploaded 1.8.1 again and unzipped it. I ran http://domain.com/search/admin/ and got the error again.

What am I missing?

Charter
05-19-2004, 08:54 PM
Hi. Looks like I forgot to include the install file in the zip. :eek:

I'll post an update once I get some kinks worked out of the alpha version.

bloodjelly
05-21-2004, 12:13 PM
I was wondering about that.:)

shinji
06-06-2004, 07:57 AM
Hi,

On some sites I get this error:

HTTP/1.1 404 Not Found See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.

HTTP/1.1 404 Not Found See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.

This appears between the "Spidering in progress..." line and the spidering itself.
(I tried it with many domains; such errors appear on about half of the sites.)

For example, one domain: http://www.otaku-forum.net/

sktest
06-08-2004, 05:26 AM
@Charter: When will the next release come out?

shinji
06-08-2004, 08:41 AM
Another bug(?) I've found:

Whenever someone searches for something and clicks on one of the results, they get the htpassword prompt "Administration-1736 PhpDig" and then the result.

Charter
07-04-2004, 03:13 PM
Hi. In the <removed> file are two replacement files: the spider.php and robot_functions.php files. If you are alpha testing PhpDig version 1.8.1, then just copy over the old alpha files with those in the attached file, and let me know how it goes. Thanks.

EDIT: PhpDig version 1.8.1 released.

vinyl-junkie
07-04-2004, 04:01 PM
I didn't get any errors (well, one minor one - see below) but it only indexed 28 pages. I have almost 1,500 pages that get indexed with 1.8.0! FWIW, I made sure my tables were totally empty before I started.

The minor error:
I had been getting some strange 404 errors in my server log which I hadn't been able to figure out: a missing location.href file. I figured out where that was on my pages (based on the message that comes out on this new version of phpdig - thanks for that!), so I did this to my pages:

<!-- phpdigExclude -->
<script><!--
<!-- Get me out of this frame
if (window!=window.top)
top.location.href=location.href;
// -->
// -->
</script>
<!-- phpdigInclude -->

Except the new 1.8.1 is still trying to spider that and gives me a 404.

Charter
07-04-2004, 04:11 PM
Hi. A 'links_per' (links per depth) set to zero means to crawl all links at each search depth. A 'links_per' set to ten means to crawl at most ten links per depth. To try to crawl all pages, just set 'search depth' to ten and 'links_per' to zero. The phpdigExclude/phpdigInclude comments work as in this (http://www.phpdig.net/showthread.php?postid=1982#post1982) post, meaning that PhpDig will follow whatever it deems a link. With version 1.8.1, the 404s now show onscreen, whereas they didn't before. Thanks for the feedback.
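The 'links_per' rule can be pictured with a short sketch. This is not PhpDig's own code: the function name `capLinksPerDepth` is made up, and keeping the first links in discovery order is an assumption, since the thread does not say which links are kept when the cap applies.

```php
<?php
// Hypothetical illustration of the links_per rule; not PhpDig's own code.
// Zero means "no cap"; a positive value keeps at most that many links.
function capLinksPerDepth(array $links, int $linksPer): array
{
    if ($linksPer <= 0) {
        return $links; // crawl every link found at this depth
    }
    // Assumption: the first links in discovery order are the ones kept.
    return array_slice($links, 0, $linksPer);
}

// Build 25 example links for one depth level.
$found = [];
for ($i = 1; $i <= 25; $i++) {
    $found[] = "page$i.php";
}

echo count(capLinksPerDepth($found, 0)), "\n";  // 25
echo count(capLinksPerDepth($found, 10)), "\n"; // 10
```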

vinyl-junkie
07-04-2004, 07:10 PM
OK, I did as you suggested. First I cleared out my tables, set the "links-per" and search depth as you suggested, then re-spidered my site. It might be looking for the proverbial needle in the haystack to find the differences between 1.8.0 and 1.8.1, but I'm still a bit short of the pages that should have been indexed.

1.8.0 - 1,514 pages
1.8.1 - 1,262 pages

If it would help, I can nose around and find some pages that weren't picked up by 1.8.1.

vinyl-junkie
07-04-2004, 08:01 PM
Don't know if this helps at all, but I found a few pages that were indexed with 1.8.0 and weren't with 1.8.1. These are all from www.napathon.net

/AlbumID1107.php - which can be found on: /Rock12.php
/AlbumID1113.php - on /Rock23.php
/AlbumID1114.php & /AlbumID1115.php - on /Rock28.php

Doesn't make sense why these pages wouldn't be spidered. I don't have a complete spider log or anything, but I could go spider again and make that if you need it.

Charter
07-04-2004, 08:34 PM
Hi. Try comparing the version 1.8.0 config file against the version 1.8.1 config file. Perhaps something there is causing the difference?

vinyl-junkie
07-04-2004, 09:20 PM
Yes, there are some differences in my config file between 1.8.0 and 1.8.1. I'm not sure whether they'd make that much difference though. Here are the ones that have anything to do with spidering:

SPIDER_MAX_LIMIT 20 (1.8.0) vs. 10 (1.8.1)
SPIDER_DEFAULT_LIMIT 3 (1.8.0) vs. 5 (1.8.1)

Everything else is the same in both versions.

I'll copy the 1.8.1 config file to my server, re-spider and let you know what happens.

Charter
07-05-2004, 06:04 AM
Hi. Try checking CHUNK_SIZE in the config file too.

vinyl-junkie
07-05-2004, 06:33 AM
CHUNK_SIZE in the 1.8.1 config file I have is 1024 vs. 2048 in my original config file.

When I re-spidered last night after copying the 1.8.1 config file to the server, I got 2 more pages indexed this time. Still far short of what I should have.

Charter, when you get a chance, could you put together a complete 1.8.1 zip file with all the latest stuff? I want to re-download again to make absolute certain I'm using all files from that. Then I'll re-spider again and let you know what happens.

Nice to have plenty of bandwidth to do that. ;)

Charter
07-05-2004, 07:05 AM
PhpDig: 1.8.1 alpha <removed> and two replacement files <removed>. Manual install still required as install.php not yet included.

EDIT: PhpDig version 1.8.1 released.

vinyl-junkie
07-05-2004, 08:02 AM
Are you aware that the 1.8.1 alpha zip has nothing separated into the appropriate folders? :( I don't remember it being like that when I downloaded it the first time.

Charter
07-05-2004, 08:07 AM
Hi. It should be separated as before; it's the same file. Maybe an unzip program option needs to be un/checked?

vinyl-junkie
07-05-2004, 10:58 AM
You were correct. I had "Use Folder Names" un-checked. :o

OK, this time I emptied all my folders and started from scratch with those two zip files. Search depth = 10. Links per = 0.

Here's the spider log:

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://www.napathon.net/
Exclude paths :
- test/
- phpdig181/
- BW-Original/
- Joe_and_Eddie/
1:http://www.napathon.net/
(time : 00:00:06)

No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://www.napathon.net/
Optimizing tables...
Indexing complete !



==========================
Hosts: 1 Pages
Entries: 1 Pages
Index: 177 Entries
Keywords: 177 Entries
Temporary Table: 0 Entries

Charter
07-05-2004, 11:47 AM
Hi. When indexing http://www.napathon.net/ with search depth of ten and links_per of zero, and hand stopping after 100 links, below is the output. Do you not get this? Does anything show in your error logs?

Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://www.napathon.net/
Exclude paths :
- test/
- phpdig181/
- BW-Original/
- Joe_and_Eddie/
1:http://www.napathon.net/
(time : 00:00:08)
+ + + + + + +
level 1...
2:http://www.napathon.net/miscmenu.php
(time : 00:00:20)
+ + + + + + + + + + + +
3:http://www.napathon.net/musicmenu.php
(time : 00:00:28)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
4:http://www.napathon.net/SearchMenu.php
(time : 00:00:42)
Ok for http://search.napathon.net/search.php (site_id:469)
+
5:http://www.napathon.net/sitemap.php
(time : 00:00:50)

6:http://www.napathon.net/FAQ.php
(time : 00:00:57)

7:http://www.napathon.net/ContactMe.php
(time : 00:01:03)

8:http://www.napathon.net/Privacy.php
(time : 00:01:09)

level 2...
9:http://www.napathon.net/1219AshlandIntro.php
(time : 00:01:21)

10:http://www.napathon.net/1219AshlandSlideShow.php
(time : 00:01:27)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

11:http://www.napathon.net/BeeGeesMobile.php
(time : 00:01:34)

12:http://www.napathon.net/BoganReunion2003SlideShow.php
(time : 00:01:40)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

13:http://www.napathon.net/EstherEscortToHeaven.php
(time : 00:01:48)

14:http://www.napathon.net/MeAtWork.php
(time : 00:01:54)

15:http://www.napathon.net/RekkidRoom.php
(time : 00:02:00)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

16:http://www.napathon.net/WongFamily.php
(time : 00:02:08)

17:http://www.napathon.net/Wonglets.php
(time : 00:02:14)

18:http://www.napathon.net/BillsRecordsSlideShow.php
(time : 00:02:20)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

19:http://www.napathon.net/JohnnyGimbleSlideShow.php
(time : 00:02:27)

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/theimage
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

20:http://www.napathon.net/CDTrusteeReview.php
(time : 00:02:33)

21:http://www.napathon.net/MusicIntro.php
(time : 00:02:39)

Meta Robots = NoIndex, or already indexed : No content indexed
22:http://www.napathon.net/MyCollection1.php
(time : 00:02:45)

23:http://www.napathon.net/NewArrivals1.php
(time : 00:02:52)
+ + + + +
24:http://www.napathon.net/BeeGees1.php
(time : 00:03:00)
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
25:http://www.napathon.net/Blues1.php
(time : 00:03:11)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
26:http://www.napathon.net/Corrs1.php
(time : 00:03:24)
+ + + + + + + + + + + + + + + + + + + + +
27:http://www.napathon.net/Country1.php
(time : 00:03:34)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
28:http://www.napathon.net/EasyListening1.php
(time : 00:03:46)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
29:http://www.napathon.net/Folk1.php
(time : 00:03:59)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
30:http://www.napathon.net/Jazz1.php
(time : 00:04:10)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
31:http://www.napathon.net/Miscellaneous1.php
(time : 00:04:23)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
32:http://www.napathon.net/Reggae1.php
(time : 00:04:34)
+ + + + + + + + + + + + + + + + + + +
33:http://www.napathon.net/Rock1.php
(time : 00:04:44)
+ + + + + + + + + + + + + + + + + + + + + + + + + + +
34:http://www.napathon.net/SavageGarden2.php
(time : 00:04:55)
+ + + + + + + + + + + + + + + + + + + + + + +
35:http://www.napathon.net/TradeList.php
(time : 00:05:05)
+ + + + + + + + + + + + + + + + + +
36:http://www.napathon.net/WantList.php
(time : 00:05:14)

37:http://www.napathon.net/BadTrader.php
(time : 00:05:21)

38:http://www.napathon.net/LPtoCD.php
(time : 00:05:28)
+
39:http://www.napathon.net/ABeeGeesChristmas.php
(time : 00:05:34)

40:http://www.napathon.net/BeeGeeTastic.php
(time : 00:05:40)

41:http://www.napathon.net/KerrvilleEarlyYearsReview.php
(time : 00:05:48)

42:http://www.napathon.net/GottaGetReview.php
(time : 00:05:54)

43:http://www.napathon.net/ThreeBees.php
(time : 00:06:00)

44:http://www.napathon.net/BeeGees6.php
(time : 00:06:07)
+ + + + + + + + + + + + + + + + + + + + + +
45:http://www.napathon.net/WeLoveTheBeeGees.php
(time : 00:06:18)

46:http://www.napathon.net/BillsRecordsArticle.php
(time : 00:06:24)
+
47:http://www.napathon.net/IStartedAJoke.php
(time : 00:06:31)

48:http://www.napathon.net/JohnnyGimbleConcert.php
(time : 00:06:38)

49:http://www.napathon.net/CliveAnderson.php
(time : 00:06:44)

50:http://www.napathon.net/RustyWier.php
(time : 00:06:50)

51:http://www.napathon.net/InternetCollecting.php
(time : 00:06:58)

52:http://www.napathon.net/BWStevenson.php
(time : 00:07:05)

53:http://www.napathon.net/BWStevensonOnRhino.php
(time : 00:07:12)

54:http://www.napathon.net/BW-Lyrics.php
(time : 00:07:18)
+ + + + + + + + + +
55:http://www.napathon.net/BWStevenson/bw_intro.php
(time : 00:07:26)

56:http://www.napathon.net/BWStevenson/bw_page_1.php
(time : 00:07:32)

57:http://www.napathon.net/BWStevenson/bw_discography.php
(time : 00:07:39)

HTTP/1.1 404 Not Found - http://www.napathon.net/BWStevenson/top/
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.

HTTP/1.1 404 Not Found - http://www.napathon.net/BWStevenson/top
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for explanation.
404s are either dead links or something looked like a link to PhpDig so PhpDig tried to crawl it.
+
58:http://www.napathon.net/BWStevenson/bw_memories.php
(time : 00:07:47)

59:http://www.napathon.net/BWStevenson/bw_tv.php
(time : 00:07:55)

Meta Robots = NoIndex, or already indexed : No content indexed
60:http://www.napathon.net/MusicDBSearch.php
(time : 00:08:01)

level 3...
Meta Robots = NoIndex, or already indexed : No content indexed
61:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=74
(time : 00:08:12)

62:http://www.napathon.net/AlbumID1552.php
(time : 00:08:18)

63:http://www.napathon.net/AlbumID1553.php
(time : 00:08:24)

Meta Robots = NoIndex, or already indexed : No content indexed
64:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=55
(time : 00:08:30)

65:http://www.napathon.net/AlbumID1557.php
(time : 00:08:36)

Meta Robots = NoIndex, or already indexed : No content indexed
66:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=338
(time : 00:08:43)

67:http://www.napathon.net/AlbumID1063.php
(time : 00:08:49)

Meta Robots = NoIndex, or already indexed : No content indexed
68:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=354
(time : 00:08:55)

69:http://www.napathon.net/AlbumID1153.php
(time : 00:09:01)

Meta Robots = NoIndex, or already indexed : No content indexed
70:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=39
(time : 00:09:07)

71:http://www.napathon.net/AlbumID42.php
(time : 00:09:13)

72:http://www.napathon.net/AlbumID43.php
(time : 00:09:20)

Meta Robots = NoIndex, or already indexed : No content indexed
73:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=350
(time : 00:09:26)

74:http://www.napathon.net/AlbumID1116.php
(time : 00:09:32)

Meta Robots = NoIndex, or already indexed : No content indexed
75:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=51
(time : 00:09:38)

76:http://www.napathon.net/AlbumID1546.php
(time : 00:09:44)

77:http://www.napathon.net/AlbumID1547.php
(time : 00:09:50)

78:http://www.napathon.net/AlbumID534.php
(time : 00:09:56)

79:http://www.napathon.net/AlbumID1456.php
(time : 00:10:03)

80:http://www.napathon.net/AlbumID80.php
(time : 00:10:09)

81:http://www.napathon.net/AlbumID82.php
(time : 00:10:15)

82:http://www.napathon.net/AlbumID91.php
(time : 00:10:22)

83:http://www.napathon.net/AlbumID537.php
(time : 00:10:28)

84:http://www.napathon.net/AlbumID83.php
(time : 00:10:34)

85:http://www.napathon.net/AlbumID1336.php
(time : 00:10:40)

86:http://www.napathon.net/AlbumID1077.php
(time : 00:10:47)

87:http://www.napathon.net/AlbumID92.php
(time : 00:10:53)

88:http://www.napathon.net/AlbumID802.php
(time : 00:11:00)

89:http://www.napathon.net/AlbumID93.php
(time : 00:11:06)

90:http://www.napathon.net/AlbumID826.php
(time : 00:11:13)

91:http://www.napathon.net/BeeGees2.php
(time : 00:11:19)
+ + + + + + + + + + + + +
92:http://www.napathon.net/BeeGees18.php
(time : 00:11:28)
+ + + + + + + + + + + + + +
Meta Robots = NoIndex, or already indexed : No content indexed
93:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=92
(time : 00:11:36)

94:http://www.napathon.net/AlbumID242.php
(time : 00:11:42)

Meta Robots = NoIndex, or already indexed : No content indexed
95:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=97
(time : 00:11:48)

96:http://www.napathon.net/AlbumID1296.php
(time : 00:11:55)

97:http://www.napathon.net/AlbumID253.php
(time : 00:12:01)

Meta Robots = NoIndex, or already indexed : No content indexed
98:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=119
(time : 00:12:07)

99:http://www.napathon.net/AlbumID297.php
(time : 00:12:13)

Meta Robots = NoIndex, or already indexed : No content indexed
100:http://www.napathon.net/MusicDBSearch.php?BrowseArtistID=400
(time : 00:12:19)

vinyl-junkie
07-05-2004, 12:28 PM
Here's what shows up in my error log: [Mon Jul 5 18:32:07 2004] [error] [client 69.59.195.15] File does not exist: /home/napathon/public_html/BWStevenson/top
[Mon Jul 5 18:32:06 2004] [error] [client 69.59.195.15] File does not exist: /home/napathon/public_html/BWStevenson/top/
[Mon Jul 5 18:26:54 2004] [error] [client 69.59.195.15] File does not exist: /home/napathon/public_html/theimage
[Mon Jul 5 18:26:54 2004] [error] [client 69.59.195.15] File does not exist: /home/napathon/public_html/theimage/
The last pair of error messages appears 4 other times.

Wonder why spidering works for you and not for me. :confused: All I did was delete the contents of my folders on the server, deleted and re-created my phpdig database and the user for it, made the necessary changes to config.php and connect.php, then re-spidered.

Charter
07-05-2004, 12:39 PM
Hi. Those errors are just 404s, created when PhpDig thinks it found a link, but really it's not a link. The 404s shouldn't cause a problem with spidering though. Was version 1.8.1 alpha working for you before, but not now? Are you using the two replacement files with the alpha version?
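To see how something that is not a link can end up being crawled, here is a sketch with a deliberately naive pattern. The regular expression is an assumption for illustration and is not the one PhpDig uses; it simply shows that matching on `href=` alone also catches a JavaScript assignment like the one in the frame-breaking script posted earlier.

```php
<?php
// Illustration only: a naive href matcher, NOT PhpDig's actual pattern.
$html = <<<'HTML'
<a href="Rock1.php">Rock</a>
<script>
if (window!=window.top)
top.location.href=location.href;
</script>
HTML;

// Capture whatever follows "href=", with or without quotes.
preg_match_all('/href\s*=\s*["\']?([^"\'\s>]+)/i', $html, $matches);

// Prints the real link first, then the JavaScript expression
// that was mistaken for a link.
foreach ($matches[1] as $candidate) {
    echo $candidate, "\n";
}
```

A crawler working this way would request the second candidate as if it were a relative URL, and the server would answer with a 404.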

vinyl-junkie
07-05-2004, 04:45 PM
Well, I did indeed discover that I had made a mistake with those two replacement files. True, I had unzipped them, but they didn't end up where I thought they were. Consequently, my previous spidering was with the wrong files.

However, letting the spider run its course with the correct 1.8.1 files, I still only have 1,264 pages spidered, which is about 250 pages short of the 1,514 I end up with for 1.8.0. I have no idea why. :(

Charter
07-05-2004, 09:02 PM
Hi. Actually, I think it is working as it should.

Before respidering anything, using version 1.8.1, if you do an exact phrase search on "rock collection: page" (without the quotes) and see how many Rock Collection: Page X titles show up in the search results, you'll see the following, along with some other titles:

Rock Collection: Page 1
Rock Collection: Page 2
Rock Collection: Page 3
Rock Collection: Page 4
Rock Collection: Page 5
Rock Collection: Page 6
Rock Collection: Page 7
Rock Collection: Page 8
Rock Collection: Page 9
Rock Collection: Page 34
Rock Collection: Page 35
Rock Collection: Page 36
Rock Collection: Page 37
Rock Collection: Page 38
Rock Collection: Page 39
Rock Collection: Page 40

Seeing as how you kept SPIDER_MAX_LIMIT at ten, the index process worked as follows (using RockX.php links as example, omitting other links from the example):

Level Zero:

http://www.napathon.net/ (links to musicmenu.php)

Level One:

http://www.napathon.net/musicmenu.php (links to Rock1.php)

Level Two:

http://www.napathon.net/Rock1.php (links to Rock2.php and Rock41.php)

Level Three:

http://www.napathon.net/Rock2.php (links to Rock3.php and Rock41.php)
http://www.napathon.net/Rock41.php (links to Rock1.php and Rock40.php)

Level Four:

http://www.napathon.net/Rock3.php (links to Rock4.php and Rock41.php)
http://www.napathon.net/Rock40.php (links to Rock1.php and Rock39.php)

Level Five:

http://www.napathon.net/Rock4.php (links to Rock5.php and Rock41.php)
http://www.napathon.net/Rock39.php (links to Rock1.php and Rock38.php)

Level Six:

http://www.napathon.net/Rock5.php (links to Rock6.php and Rock41.php)
http://www.napathon.net/Rock38.php (links to Rock1.php and Rock37.php)

Level Seven:

http://www.napathon.net/Rock6.php (links to Rock7.php and Rock41.php)
http://www.napathon.net/Rock37.php (links to Rock1.php and Rock36.php)

Level Eight:

http://www.napathon.net/Rock7.php (links to Rock8.php and Rock41.php)
http://www.napathon.net/Rock36.php (links to Rock1.php and Rock35.php)

Level Nine:

http://www.napathon.net/Rock8.php (links to Rock9.php and Rock41.php)
http://www.napathon.net/Rock35.php (links to Rock1.php and Rock34.php)

Level Ten:

http://www.napathon.net/Rock9.php (links to Rock10.php and Rock41.php)
http://www.napathon.net/Rock34.php (links to Rock1.php and Rock33.php)

So, with SPIDER_MAX_LIMIT at ten, PhpDig won't go further than ten levels. Applied to the above example, this means Rock10.php through Rock33.php were not indexed.

Solution: Increase SPIDER_MAX_LIMIT in the config file and then select a higher search depth to index your site.
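The walkthrough above can be reproduced with a small breadth-first sketch. It is a toy model, not PhpDig code: it assumes each RockN.php page links to the first, previous, next, and last page of a 41-page series, that Rock1.php sits at level two as in the example, and the function names `pagerLinks` and `crawlLevels` are made up for illustration.

```php
<?php
// Toy model of the RockN.php pager; not PhpDig code.
// Assumption: every page links to the first, previous, next and last page.
function pagerLinks(int $n, int $last): array
{
    $links = [1, $last];
    if ($n > 1) {
        $links[] = $n - 1;
    }
    if ($n < $last) {
        $links[] = $n + 1;
    }
    return $links;
}

// Breadth-first crawl that stops following links at $maxLevel.
// Returns page number => level at which the page was reached.
function crawlLevels(int $last, int $startLevel, int $maxLevel): array
{
    $level = [1 => $startLevel]; // Rock1.php is reached at $startLevel
    $queue = [1];
    while ($queue) {
        $page = array_shift($queue);
        if ($level[$page] >= $maxLevel) {
            continue; // depth limit reached: index the page, follow nothing
        }
        foreach (pagerLinks($page, $last) as $next) {
            if (!isset($level[$next])) {
                $level[$next] = $level[$page] + 1;
                $queue[] = $next;
            }
        }
    }
    ksort($level);
    return $level;
}

$reached = array_keys(crawlLevels(41, 2, 10));
$missed = array_values(array_diff(range(1, 41), $reached));
echo "Reached: ", implode(', ', $reached), "\n";
echo "Missed: Rock", min($missed), ".php through Rock", max($missed), ".php\n";
// Missed: Rock10.php through Rock33.php
```

In this model a limit of 22 levels is enough to reach all 41 pages; the real figure for a site depends on how its pages actually link to each other.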

Charter
07-05-2004, 09:23 PM
Originally posted by shinji
Another bug(?) I've found:

Whenever someone searches for something and clicks on one of the results, they get the htpassword prompt "Administration-1736 PhpDig" and then the result.

Hi, and thanks. This happens when define('PHPDIG_ADM_AUTH','1'); is set in the config file. Just comment out the following code in the clickstats.php file:

if (is_file("$relative_script_path/libs/auth.php")) {
    include "$relative_script_path/libs/auth.php";
}
else {
    die("Cannot find auth.php file.\n");
}
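After the change, that part of clickstats.php would look roughly like this. It is the same block with each line commented out; the first comment line is added here as a reminder and is not from the original file.

```php
// Disabled so that clicking a search result no longer triggers the admin login prompt.
// if (is_file("$relative_script_path/libs/auth.php")) {
//     include "$relative_script_path/libs/auth.php";
// }
// else {
//     die("Cannot find auth.php file.\n");
// }
```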

Charter
07-05-2004, 09:30 PM
Hi. To test PhpDig 1.8.1 alpha, download <removed> and the two replacement files <removed>. Manual install still required as install.php not yet included.

Please read through this thread for possible solutions to any problems encountered. If you continue to have problems, please open a new thread in the PhpDig Forums (http://www.phpdig.net/forumdisplay.php?forumid=21).

Some 'alpha' kinks have been fixed, some mods are still in the works, but this thread is now closed. Thanks.

EDIT: PhpDig version 1.8.1 released.