All sorted now
The strangest thing was happening. On inspecting the first_words field of the digSpider table, I found that the spider wasn't actually crawling each page. It took ages to find out why. To make the pages W3C compliant, I was writing the dynamic URLs like:
index.php?page=articleView&amp;articleId=336. A browser knows to decode &amp; as & before it requests the link, but the spider does not, so it requested the URL with the entity still in it and just got my built-in 'Page cannot be found' error for every article.
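For anyone hitting the same thing, here is a rough sketch of the kind of fix that applies: decode HTML entities in each href the spider pulls out of the markup before requesting it. This is not the spider's actual code. It is written in Python purely for illustration, and the function name normalize_href is made up for the example.

```python
from html import unescape
from urllib.parse import urlsplit, parse_qs


def normalize_href(raw_href: str) -> str:
    """Decode HTML entities in an href taken straight from page markup.

    Hypothetical helper for illustration: a crawler would call this on
    every extracted link before building the request URL.
    """
    return unescape(raw_href)


# The href exactly as it appears in the W3C-compliant HTML source.
raw = "index.php?page=articleView&amp;articleId=336"

# What a browser would actually request.
fixed = normalize_href(raw)
print(fixed)

# With the entity decoded, both query parameters parse cleanly.
print(parse_qs(urlsplit(fixed).query))
```

The markup stays valid because the pages still contain &amp;; only the spider's copy of the link is decoded, which is the same step a browser performs.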