PhpDig.net


ffe 01-22-2005 12:00 PM

Yet Another indexing question
 
I have one server. RH 9.0 runs Apache, MySQL, and 5 virtual web sites. I am able to index 4 of the sites successfully. The last site will only index 3-4 pages, then quits with no error or completion message. I suspect the failure is caused by HTML page content. It might be an HTML coding error, an obsolete style, etc.

My question is: Are there any known HTML coding styles, tags, comments, etc. that will cause the spider to terminate abnormally? The pages on which the spider fails display and behave correctly with MSIE, Netscape 7.1, and Firefox 1.0.

Charter 01-22-2005 12:21 PM

Given that your 'last site' works across browsers, I doubt it's an HTML issue. Without knowing more about this last site, all I can suggest is to select the 'no' radio button, set 'search depth' to a large value, set 'links per' to zero, and give it a whirl. Depending on this last site, you might try setting LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true, both in the config file.

ffe 01-22-2005 12:42 PM

Thanks for the feedback
 
Thanks for the interest in the question. I have read a number of the other posts looking for clues to the problem. I have tried all the options you mention, including making changes to config.php.

Does line length in the HTML files have any effect on the spider? Like a buffer overflow, perhaps?

Charter 01-22-2005 01:41 PM

How many MB is the max-sized page? What's the link to the site?
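
One rough way to answer both the page-size and line-length questions is to measure each page directly. The sketch below is only an illustration and is not part of PhpDig: the helper names `page_stats` and `fetch_stats` are made up for this example, and it assumes Python 3 is available on some machine that can reach the site.

```python
from urllib.request import urlopen


def page_stats(html: bytes) -> dict:
    """Return the total size, line count, and longest line of a page, in bytes."""
    lines = html.splitlines()
    return {
        "size_bytes": len(html),
        "line_count": len(lines),
        # default=0 covers an empty page, which has no lines at all
        "longest_line_bytes": max((len(line) for line in lines), default=0),
    }


def fetch_stats(url: str, timeout: float = 30.0) -> dict:
    """Download one page and measure it (hypothetical helper; needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return page_stats(response.read())
```

Calling `fetch_stats()` on the last page that indexed and on the pages it links to would show whether any of them stands out in size or line length. It cannot tell you why the spider stops, only whether those two suspicions are plausible.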

ffe 01-23-2005 07:16 AM

Web address
 
None of the pages are particularly large; none is over 50 KB. The output of the indexing process is shown below. This happens every time.

Spidering in progress... [Stop spider]
SITE : http://tulare.homelinux.net/
Exclude paths :
- @NONE@
1:http://tulare.homelinux.net/index.html
(time : 00:00:05)
+ + + + + + + + + + + + + + + + + + + + +
level 1...
2:http://tulare.homelinux.net/Chance_Phelps.html
(time : 00:00:24)

3:http://tulare.homelinux.net/underway.html
(time : 00:00:29)



The status line at the bottom of the browser screen shows "Done".

Thanks for the interest.


All times are GMT -8.
