
Rate of spidering: is it determined by the server?


misterbearcom
06-07-2004, 03:51 PM
I was wondering if anyone happened to know what determines the speed of spidering. Currently, I am spidering at an average rate of 375 URLs per hour, which seems rather slow. Would that have anything to do with the server's processor speed? Or would it be a combination of several different factors, such as:

1.) Server processor speed.
2.) Server OS.
3.) Internet bandwidth of server.
4.) My client script on my browser.

I tend to think it's 1-3 and not #4. But if anyone else has some feedback about how fast they are able to spider I would appreciate it. Thanks.
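For scale, 375 URLs per hour works out to 3600 / 375 = 9.6 seconds per URL. One rough way to tell whether that time goes to the network (factor 3) or to the server's own processing is to time the downloads alone and compare. This is a minimal Python sketch, not PhpDig's PHP code; the function names are hypothetical, and it measures only fetch time, not parsing or MySQL inserts.

```python
import time
import urllib.request
from typing import Callable, Iterable

def seconds_per_url(urls_per_hour: float) -> float:
    """Average time spent on each URL at a given crawl rate."""
    return 3600.0 / urls_per_hour

def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one page; this covers only network and remote-server time."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def average_fetch_seconds(urls: Iterable[str],
                          fetcher: Callable[[str], bytes] = fetch) -> float:
    """Mean wall-clock download time per URL, excluding any parsing or indexing."""
    timings = []
    for url in urls:
        start = time.perf_counter()
        fetcher(url)
        timings.append(time.perf_counter() - start)
    if not timings:
        raise ValueError("no URLs given")
    return sum(timings) / len(timings)

print(f"375 URLs/hour = {seconds_per_url(375):.1f} s per URL")  # 9.6 s
```

For example, average_fetch_seconds(["http://www.example.com/"]) with a placeholder URL from the site being indexed. If the average download is far below 9.6 s, most of the per-URL time is probably being spent on the server side rather than on the transfer itself.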

vinyl-junkie
06-07-2004, 05:48 PM
When I spidered my site recently, starting around noon, it took over 2 hours to spider about 1,500 pages. I re-spidered the same site close to midnight that same day, and it took about 45 minutes. I tend to think the spidering process is faster during lighter traffic times on your website. Just a guess, though.
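Putting those two runs side by side as rates makes the difference concrete. This small Python calculation treats "over 2 hours" as 120 minutes, so the noon figure is an upper bound on that run's rate.

```python
def pages_per_hour(pages: int, minutes: float) -> float:
    """Crawl throughput in pages per hour."""
    return pages * 60.0 / minutes

# "Over 2 hours" is taken as 120 minutes, so the noon rate is an upper bound.
noon = pages_per_hour(1500, 120)
midnight = pages_per_hour(1500, 45)

print(f"noon:     at most {noon:.0f} pages/hour")
print(f"midnight: about {midnight:.0f} pages/hour")
print(f"speed-up: at least {midnight / noon:.1f}x")
```

This prints at most 750 pages/hour at noon, about 2000 pages/hour at midnight, and a speed-up of at least 2.7x. Even the slower noon run is about twice the 375 URLs per hour reported at the start of the thread, though the sites and servers involved are different.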

synnalagma
06-08-2004, 01:56 AM
1.) Server processor speed.
Of course this will affect indexing speed.
2.) Server OS.
Linux should be faster (it is for MySQL)
3.) Internet bandwidth of server.
If you index sites that aren't on the server, of course it matters.
4.) My client script on my browser.
No, this shouldn't change anything.

5.) PHP configuration
A low memory limit and similar settings can slow down the indexing process.
6.) Load on your server
Resource-intensive scripts on your server can slow down indexing, and on a shared server these may be your neighbours' scripts. Find out where your server is physically located (a lot of European servers, for instance, are located in the USA) so you can choose the right hour to do the job.
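On a Linux server, one rough way to act on point 6 is to check the load average before starting a crawl and wait while the machine is busy. This is a hypothetical Python wrapper, not part of PhpDig; the 0.7-per-CPU threshold and the polling interval are arbitrary assumptions, os.getloadavg is Unix-only, and on a shared server the load average includes your neighbours' processes as well as your own.

```python
import os
import time
from typing import Callable, Optional, Tuple

LoadFn = Callable[[], Tuple[float, float, float]]

def load_per_cpu(loadavg: Optional[LoadFn] = None, cpus: Optional[int] = None) -> float:
    """1-minute load average divided by the number of CPUs (os.getloadavg is Unix-only)."""
    one_minute = (loadavg or os.getloadavg)()[0]
    return one_minute / (cpus or os.cpu_count() or 1)

def wait_for_quiet_server(threshold: float = 0.7,  # assumed "quiet" level per CPU
                          poll_seconds: float = 60.0,
                          max_wait_seconds: float = 3600.0,
                          loadavg: Optional[LoadFn] = None,
                          cpus: Optional[int] = None,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Block until the load drops to the threshold; return False if we gave up waiting."""
    waited = 0.0
    while load_per_cpu(loadavg, cpus) > threshold:
        if waited >= max_wait_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return True
```

A scheduled job could call wait_for_quiet_server() and only start the spider when it returns True; the loadavg and sleep parameters exist so the logic can be exercised without a real server.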

misterbearcom
06-08-2004, 10:53 AM
Thanks, guys. It's very much appreciated. I had a feeling that there would be a few contributing factors. I bet my server is pretty bogged down, given the number of databases in use.

In the future, I suppose I will have to consider renting my own server somewhere. If anyone knows of any great rates for hosting with PHP 4.3+ and MySQL, I'd appreciate it. Otherwise, I was thinking about a local server company, http://www.serverbeach.com, which I believe has a good rate ($99/month) for Red Hat Linux. But I'm still debating this.

Again, thanks. I really appreciate the info. I'll have to do some more brainstorming about what would be the best thing to do.

vinyl-junkie
06-08-2004, 05:44 PM
You don't say how much disk space you require. That makes a big difference in what web host anyone could recommend. My web host is MindStormHosting (http://www.mindstormhosting.com/). I've been with them for about six months and have been very happy with their service. There are several hosting packages to choose from. You might want to check them out to see if they'd have what you need.

misterbearcom
06-08-2004, 06:20 PM

Currently, I use Neureal.com, who are really great. However, when I log on via CocoaMySQL, I can see that there must be at least a hundred MySQL databases on the same server, all running at the same time, so it gets a bit bogged down.

I am not sure how much storage space I would need. I am looking to grow a PhpDig-based website by collecting as many URLs as possible, but I am currently on a limited budget, so I really do not know yet. Still, more of anything in terms of hardware and software would always be better, methinks. :D

robertDouglass
06-09-2004, 05:39 AM
I was wondering if there are any optimizations one can make when spidering a site hosted on the same server (same domain). In particular, if I tell PhpDig to spider www.mydomain.com, doesn't this involve the DNS server and a round trip to the internet? I tried localhost, but that didn't work (shared hosting). Any suggestions?
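One way to check whether the DNS lookup matters is to time it directly. This minimal Python sketch uses the standard socket.getaddrinfo call; time_dns_lookup is a hypothetical helper name. Resolvers usually cache answers, so the first lookup may be slower than later ones. If a lookup takes milliseconds while each page takes seconds, DNS is unlikely to be the bottleneck, and if the address returned belongs to the server itself, the request typically does not leave the machine.

```python
import socket
import time
from typing import List, Tuple

def time_dns_lookup(host: str, port: int = 80) -> Tuple[float, List[str]]:
    """Return (seconds taken, resolved IP addresses) for one lookup of host."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    elapsed = time.perf_counter() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses
```

For example, time_dns_lookup("www.mydomain.com"), using the placeholder domain from the post, returns the lookup time and the addresses the name resolves to, which can then be compared with the server's own IP.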