Yet Another indexing question
I have one server running RH 9.0 with Apache, MySQL, and 5 virtual web sites. I am able to index 4 of the sites successfully. The last site will only index 3-4 pages, then quits with no error or completion message. I suspect the failure is caused by HTML page content, such as an HTML coding error or an obsolete style.
My question is: Are there any known coding styles/tags, comments etc. in HTML that will cause the spider to terminate abnormally? My failing (spider) pages display and behave correctly with MSIE, Netscape 7.1, and Firefox 1.0.
Charter
01-22-2005, 12:21 PM
Given that your 'last site' works across browsers, I doubt it's an HTML issue. Without knowing more about this last site, all I can suggest is to select the 'no' radio button, set 'search depth' to a large value, set 'links per' to zero, and give it a whirl. Depending on this last site, you might try setting LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true, both in the config file.
Thanks for the interest in the question. I have read a number of the other posts looking for clues to the problem. I have tried all the options you mention, including making the changes to config.php.
Does line length in the HTML files have any effect on the spider? Like a buffer overflow, perhaps?
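Whether the spider is sensitive to line length is not established anywhere in this thread, but the suspicion can at least be checked against the files. The following is a minimal sketch, assuming the site's pages are available as files under a document root. The function names `longest_line` and `report` are made up for illustration and have nothing to do with PhpDig itself.

```python
from pathlib import Path


def longest_line(path):
    """Return (line_number, length_in_bytes) of the longest line in a file.

    The file is read in binary mode so that odd encodings cannot raise
    decoding errors; line endings are not counted in the length.
    """
    best_no, best_len = 0, 0
    with open(path, "rb") as fh:
        for no, line in enumerate(fh, start=1):
            n = len(line.rstrip(b"\r\n"))
            if n > best_len:
                best_no, best_len = no, n
    return best_no, best_len


def report(doc_root, pattern="*.html"):
    """List (length, line_number, path) for every matching file, longest first."""
    rows = []
    for p in sorted(Path(doc_root).rglob(pattern)):
        no, n = longest_line(p)
        rows.append((n, no, str(p)))
    rows.sort(reverse=True)
    return rows


if __name__ == "__main__":
    import sys

    # The document root is passed on the command line; "." is only a default.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for n, no, p in report(root):
        print(f"{n:8d} bytes  line {no:5d}  {p}")
```

If the pages that index cleanly and the pages where the spider stops show similar maximum line lengths, line length is unlikely to be the cause.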
Charter
01-22-2005, 01:41 PM
How many MB is the max-sized page? What's the link to the site?
None of the pages are particularly large; none is over 50 KB. The output of the indexing process is shown below. This happens every time.
Spidering in progress... [Stop spider]
SITE : http://tulare.homelinux.net/
Exclude paths :
- @NONE@
1:http://tulare.homelinux.net/index.html
(time : 00:00:05)
+ + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://tulare.homelinux.net/Chance_Phelps.html
(time : 00:00:24)
3:http://tulare.homelinux.net/underway.html
(time : 00:00:29)
The status line at the bottom of the browser screen shows "Done".
Thanks for the interest.
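Since the log above stops after the third page, one independent check is to list the same-host links that each of those pages actually contains, and compare that list with what the spider followed. The following is a rough sketch using only the Python standard library. It is not how PhpDig extracts links, and the names `LinkCollector` and `same_host_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href/src targets from link-like tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in ("a", "area", "frame", "iframe"):
            for name, value in attrs:
                if name in ("href", "src") and value:
                    absolute = urljoin(self.base_url, value)
                    # Drop "#fragment" so one page is not listed twice.
                    self.links.append(urldefrag(absolute).url)


def same_host_links(html, base_url):
    """Return unique links on the same host as base_url, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    seen = []
    for link in collector.links:
        if urlparse(link).netloc == host and link not in seen:
            seen.append(link)
    return seen


if __name__ == "__main__":
    import sys
    from urllib.request import urlopen

    # The page URL is passed on the command line.
    url = sys.argv[1]
    with urlopen(url) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    for link in same_host_links(text, url):
        print(link)
```

If a page lists links that never show up in the spider output, the markup around those links (for example script-generated navigation or frames) is a reasonable place to look next.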