Yet Another indexing question
I have one server running RH 9.0 with Apache, MySQL, and 5 virtual web sites. I am able to index 4 of the sites successfully. The last site will only index 3-4 pages, then quits with no error or completion message. I suspect the failure is caused by HTML page content: it might be an HTML coding error, an obsolete style, etc.
My question is: are there any known coding styles, tags, comments, etc. in HTML that will cause the spider to terminate abnormally? The pages on which the spider fails display and behave correctly in MSIE, Netscape 7.1, and Firefox 1.0.
Given that your 'last site' works across browsers, I doubt it's an HTML issue. Without knowing more about this last site, all I can suggest is to select the 'no' radio button, set 'search depth' to a large value, set 'links per' to zero, and give it a whirl. Depending on this last site, you might try setting LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true, both in the config file.
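For reference, the two config-file settings mentioned above would look roughly like this. This is only a sketch, assuming config.php declares these options with PHP define() calls; the values shown are the ones suggested here, not necessarily the shipped defaults, so check them against your own copy of the file.

```php
<?php
// Fragment of PhpDig's config.php (assumed layout; verify in your install).

// Do not restrict the spider to the directory of the starting URL.
define('LIMIT_TO_DIRECTORY', false);

// Allow the spider to follow links to other hosts in the same domain.
define('PHPDIG_IN_DOMAIN', true);
```

After changing either value, re-run the spider on the failing site so the new settings take effect.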
Thanks for the feedback
Thanks for the interest in the question. I have read a number of the other posts looking for clues to the problem. I have tried all the options you mention, including making the changes to config.php.
Does line length in the HTML files have any effect on the spider? A buffer overflow, perhaps?
How many MB is the max-sized page? What's the link to the site?
Web address
None of the pages is particularly large; none is over 50 KB. The output below shows the result of the indexing process. This happens every time.
Spidering in progress... [Stop spider]
SITE : http://tulare.homelinux.net/
Exclude paths :
- @NONE@
1:http://tulare.homelinux.net/index.html
(time : 00:00:05)
+ + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://tulare.homelinux.net/Chance_Phelps.html
(time : 00:00:24)
3:http://tulare.homelinux.net/underway.html
(time : 00:00:29)

The status line at the bottom of the browser screen shows "Done".
Thanks for the interest.