Old 05-18-2005, 01:11 AM   #1
dhorwitz
spidering in domain 2 problems

Hi All,

While spidering a domain using PHPDIG_IN_DOMAIN, I have noticed 2 problems (in phpdig-1.8.7):

1) Spidering a domain (in my case a large university domain) from the main institutional site results in some sites not being recognized as part of the domain. For instance, if the spider starts at:
www.uct.ac.za then web.uct.ac.za is recognised as part of the domain while www.ched.uct.ac.za is not (i.e. to check domains it seems to strip the first part of the host name rather than check the end of the domain).
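
To make the difference concrete, here is a minimal sketch in Python (not phpdig's actual PHP code) of the two ways of comparing host names. The function names are hypothetical, and the "strip the first part" version only models my guess at the behaviour; the suffix version assumes the base domain (uct.ac.za here) is supplied by the caller.

```python
def strip_first_label(host: str) -> str:
    """Drop the leftmost label: 'www.uct.ac.za' -> 'uct.ac.za'."""
    return host.lower().rstrip(".").split(".", 1)[-1]


def same_domain_by_stripping(start_host: str, candidate: str) -> bool:
    """Guessed current behaviour: compare hosts after removing one label."""
    return strip_first_label(start_host) == strip_first_label(candidate)


def same_domain_by_suffix(base_domain: str, candidate: str) -> bool:
    """Suffix check: the candidate is the base domain or ends with '.<base>'."""
    base = base_domain.lower().rstrip(".")
    host = candidate.lower().rstrip(".")
    # The leading dot prevents 'notuct.ac.za' from matching 'uct.ac.za'.
    return host == base or host.endswith("." + base)


if __name__ == "__main__":
    for host in ("web.uct.ac.za", "www.ched.uct.ac.za"):
        print(host,
              same_domain_by_stripping("www.uct.ac.za", host),
              same_domain_by_suffix("uct.ac.za", host))
```

With the stripping comparison, www.ched.uct.ac.za reduces to ched.uct.ac.za and so fails to match; the suffix comparison accepts any depth of subdomain under the base.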

2) When it encounters a new site, the site is recorded in the temp file as at / rather than at the page that was linked. So sites whose pages can't be reached from the root folder don't get indexed.
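
As a rough illustration of problem 2, this Python sketch (again not phpdig code) contrasts recording a newly found site at its root with recording it at the page that was actually linked. The function names and the example path /projects/index.html are made up for illustration.

```python
from urllib.parse import urlsplit


def seed_at_root(link: str) -> str:
    """Observed behaviour: keep only scheme and host, start the crawl at '/'."""
    parts = urlsplit(link)
    return f"{parts.scheme}://{parts.netloc}/"


def seed_at_linked_page(link: str) -> str:
    """Alternative: keep the linked path (and query) as the starting point."""
    parts = urlsplit(link)
    path = parts.path or "/"  # a bare host still falls back to the root
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme}://{parts.netloc}{path}{query}"


if __name__ == "__main__":
    link = "http://www.ched.uct.ac.za/projects/index.html"  # hypothetical URL
    print(seed_at_root(link))
    print(seed_at_linked_page(link))
```

If the root page of such a site has no links into /projects/, only the second form gives the spider a starting point that leads anywhere.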

I'll have a look in the code and see what I can find...

David