Problem Spidering...


dogfish
02-23-2005, 02:56 PM
Hello. I am fairly new to PhpDig and very happy with it on the whole, but I have a couple of questions.
I am currently trying to spider wireworld.com for a client. I have set up a robots.txt file and started spidering the root directory. The problem is that there are about 1500 files in the root, and the spider gets about 200-400 files in before I start getting all sorts of 404 errors.
Could this be a browser issue? I am running the spider from Mozilla.
I have no experience with the shell at all. I don't even know if I can get shell access. The server is an hour and a half away.

Any suggestions?

Thanks.

Charter
02-26-2005, 06:38 PM
Are the 1500 files linked to from other pages? PhpDig indexes by following links. If you have orphan pages, check this (http://www.phpdig.net/forum/showthread.php?t=1139) thread. Also, try setting search depth to a large number and links per to zero, select no, and set LIMIT_TO_DIRECTORY to false and PHPDIG_IN_DOMAIN to true. I'm not sure about the 404s, as I cannot see the requests. The shell can be accessed remotely, assuming you have permission: SSH/Telnet, cPanel, etc. can all be used to reach it.
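
For reference, the two constants mentioned above might look roughly like this once changed. This is only a sketch: it assumes they are already defined in your PhpDig config file (includes/config.php in a typical install, but check your own copy), and the comments reflect my understanding of what they do rather than the official wording.

```php
<?php
// Excerpt only, not a complete config file.
// Assumption: both constants already exist in includes/config.php,
// so change the existing lines instead of adding duplicates.

// Assumed effect: do not restrict the spider to the starting directory.
define('LIMIT_TO_DIRECTORY', false);

// Assumed effect: let the spider follow links to other hosts in the same domain.
define('PHPDIG_IN_DOMAIN', true);
```

Search depth and links per are chosen on the admin spidering form, so they are not part of this excerpt.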