
Understanding logs from web indexing


kenazo
03-15-2004, 10:09 AM
Thanks for your help on my earlier question, Charter. Well, the spider is crawling away now and hopefully working. I am wondering how to interpret the log that shows up on the web page while I'm indexing. Here's an example:

19:http://www.nt-gateway.com/bookgosp.htm
(time : 01:19:28)
+ + + + + + + + Ok for http://www.thehungersite.com/ (site_id:30)
Ok for http://www.amazon.co.uk/exec/obidos/ASIN/080284653X/thenewtestamen07 (site_id:58)
Ok for http://www.amazon.de/exec/obidos/ASIN/080284653X/thenewtestamen0e (site_id:59)
+ + + + + + + Ok for http://www.groups.yahoo.com/group/ntgateway (site_id:35)
Ok for http://pub17.bravenet.com/guestbook/show.asp?usernum=1432383579&cpv=1 (site_id:26)

So the first URL is obviously the root site I started on in this instance (it's not my main site, where the search box will be). The "+" means it has discovered a link for the next level, right? My question is: what does "Ok for http://etc." mean? It seems to have added those domains to my main index page, but they are not appearing in searches from the search box.

Hopefully this is going to work :) The plan is to create a search database of biblical studies sites (you can see my main search page at deinde.org: http://www.deinde.org/resources.htm). Do you think PhpDig can handle that much data? I already have ~650 hosts in my admin page!!

Charter
03-15-2004, 11:06 AM
Hi. The "+" means that PhpDig is indexing a page, and "Ok for" means that PhpDig found a new site. The process is basically: index a site to the search depth, then index the next site to the search depth, and so on. Because you are letting PhpDig crawl across domains, each new site gets added to an array and is indexed when its turn comes. As for handling that much data, you may find this thread useful: http://www.phpdig.net/showthread.php?threadid=369
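For anyone trying to picture the crawl order described in the reply, here is a minimal Python sketch of a per-site queue. It is not PhpDig's code (PhpDig is written in PHP). The link graph `LINKS`, the function `crawl`, and the sequential site_id numbering are all hypothetical, and the output strings only mimic the "+" and "Ok for ... (site_id:N)" lines quoted in the question.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for the live web: page URL -> links found on it.
LINKS = {
    "http://a.example/": ["http://a.example/p1", "http://b.example/"],
    "http://a.example/p1": ["http://c.example/x"],
    "http://b.example/": ["http://a.example/"],
}


def crawl(start_url, max_depth, links):
    """Index one site at a time to max_depth, queueing newly found sites.

    Returns (site_ids, log): site_ids maps host -> hypothetical site_id in
    discovery order; log mimics the "+" and "Ok for ..." lines.
    """
    site_ids = {}
    site_queue = deque()  # the "array" of sites waiting to be indexed
    log = []

    def register(url):
        # Give an unseen host the next site_id and queue its URL; report if new.
        host = urlparse(url).netloc
        if host in site_ids:
            return False
        site_ids[host] = len(site_ids) + 1
        site_queue.append(url)
        return True

    register(start_url)
    while site_queue:
        root = site_queue.popleft()
        host = urlparse(root).netloc
        seen = {root}
        frontier = [root]
        # Breadth-first within one site: depth 0 is the root page.
        for _ in range(max_depth + 1):
            next_frontier = []
            for page in frontier:
                log.append("+")  # one "+" per page indexed
                for link in links.get(page, []):
                    link_host = urlparse(link).netloc
                    if link_host == host:
                        if link not in seen:
                            seen.add(link)
                            next_frontier.append(link)
                    elif register(link):
                        # A new host was found: it is queued, not indexed yet.
                        log.append(f"Ok for {link} (site_id:{site_ids[link_host]})")
            frontier = next_frontier
    return site_ids, log


if __name__ == "__main__":
    ids, log = crawl("http://a.example/", 1, LINKS)
    print("\n".join(log))
    print(ids)
```

With `max_depth=1` the sketch logs four "+" lines and two "Ok for" lines, and b.example is only indexed after a.example is finished. In this sketch a site gets its id as soon as it is found, well before any of its pages are indexed. With `max_depth=0`, c.example is never found at all, because the only link to it sits on a depth-1 page.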