PhpDig.net


dogfish 02-23-2005 02:56 PM

Problem Spidering...
 
Hello. I am fairly new to PhpDig and am very happy with it on the whole, but I have a couple of questions.
I am currently trying to spider wireworld.com for a client. I have set up a robots.txt file and started spidering the root directory. The issue is this: there are about 1500 files in the root, and the spider gets about 200-400 files in before I start getting all sorts of 404 errors.
Could this be a browser issue? I am running it in Mozilla.
I have no experience with the shell at all, and I don't even know if I can get shell access. The server is an hour and a half away.

Any suggestions?

Thanks.

Charter 02-26-2005 06:38 PM

Are the 1500 files linked from other pages? PhpDig indexes by following links, so pages that nothing links to (orphan pages) will not be reached. If you have orphan pages, check this thread. Also, try setting search depth to a large number and links per set to zero, select no, and set LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true. I'm not sure about the 404s, as I cannot see the requests. The shell is accessible remotely, assuming you have permission: SSH/Telnet, cPanel, etc. can be used to access it.
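
To illustrate the orphan-page point, here is a rough sketch in Python (not PhpDig's own code) of how a link-following spider decides what it can reach. The page names and the `site_links` map are made up for the example, and the helper names `reachable_pages` and `orphan_pages` are hypothetical. It only shows the general idea: anything not reachable by links from the start page never gets visited.

```python
from collections import deque


def reachable_pages(links, start):
    """Return the set of pages a link-following crawler can reach from start.

    links maps each page to the list of pages it links to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def orphan_pages(all_pages, links, start):
    """Return, sorted, the pages on disk that no chain of links leads to."""
    return sorted(set(all_pages) - reachable_pages(links, start))


# Hypothetical site: five files exist, but nothing links to old-promo.html.
site_links = {
    "index.html": ["products.html", "contact.html"],
    "products.html": ["cable-a.html"],
}
site_pages = [
    "index.html",
    "products.html",
    "contact.html",
    "cable-a.html",
    "old-promo.html",
]

orphans = orphan_pages(site_pages, site_links, "index.html")
print(orphans)
```

If a comparison like this turned up many orphans among the 1500 files, that would explain a spider stopping well short of the full count, though it would not by itself explain the 404s.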

