PhpDig.net

-   -   Shell Spidering Quits After Indexing A Few Pages (http://www.phpdig.net/forum/showthread.php?t=1404)

vinyl-junkie 09-30-2004 08:41 PM

Shell Spidering Quits After Indexing A Few Pages
 
I'm spidering from a shell command for the first time and having some problems. The process ran for around 5 minutes yesterday, and indexed about 50 pages, then said it was complete. I launched the shell command again, phpdig indexed a few more pages, then quit again. Same thing on a third try.

I have 1,500+ pages on my site, which indexes just fine when I run it from the secure web page, so why won't it do the same when I run it from a shell command? :confused:

Charter 09-30-2004 09:58 PM

What are these set to in the config file?
PHP Code:

define('SPIDER_MAX_LIMIT',20);          //max recurse levels in spider
define('RESPIDER_LIMIT',5);             //recurse respider limit for update
define('LINKS_MAX_LIMIT',20);           //max links per each level
define('RELINKS_LIMIT',5);              //recurse links limit for an update 


vinyl-junkie 10-01-2004 04:25 AM

Here's what I have:
Code:

define('SPIDER_MAX_LIMIT',40);          //max recurse levels in spider
define('RESPIDER_LIMIT',40);            //recurse respider limit for update
define('LINKS_MAX_LIMIT',30);           //max links per each level
define('RELINKS_LIMIT',40);             //recurse links limit for an update


vinyl-junkie 10-01-2004 11:04 PM

Here's a little more information, for whatever it's worth. I've just moved my website to a new host, and am trying to rebuild my phpdig search engine from scratch. The same performance issues are happening when I run phpdig as a secured web page as when I run via shell. Any idea what the problem could be?

Charter 10-01-2004 11:36 PM

Does it actually complete the index, or does it just stop after five minutes?

vinyl-junkie 10-02-2004 05:25 AM

It says indexing is complete, then stops.

I noticed when I ran phpdig one more time last night, the process ran to completion. Very strange. I sense this has been some sort of timeout issue. I wrote my web host to confirm, and will let you know if that was the problem after all.

vinyl-junkie 10-02-2004 05:50 PM

Well, here is my web host's reply:
Quote:

There haven't been any configuration changes to the server over the last couple days. I'm not familiar with the script you're using, but perhaps it relies on the domain name being fully resolved to function properly. What exactly does it return when it 'quits'? My suggestion would be to try again next time you need to re-index the site and let me know at that point if the same issues return. Other than that, any additional info (e.g. when you started the failed attempts) would be helpful in looking for any clues on the server side.

I tried phpdig several times before I got it to properly index my site, so now I'm really confused as to why it did work the one time and not the other dozen or so times. The problem in this situation is that you guys know the software but not my server. My web host is the other way around.

Any suggestions as to where I should go from here with this? :confused:

Wayne McBryde 10-03-2004 07:06 AM

I don’t know if this will help but:

I have had the spidering stop when testing from my test server but never from my production server. The 2 servers are almost identical except the test server is in my house on a DSL line and the production server is co-located about 100 feet from where Level 3 comes into Charlotte, NC. The test server has at most 2 websites that have VERY little traffic and no e-mail running. The production server has over 100 websites with lots of e-mail and traffic (but the server load is light). It looks to me like my problem is related to the slower internet connection, not the server. I would expect a server that is overloaded (or at least has a heavy load) could have the same problem.

vinyl-junkie 10-03-2004 08:18 AM

I have no idea what kind of server load there might be, don't know how one measures that. I've just moved my site to a new hosting company, and server response time in terms of page loads seems to be pretty quick. Don't know if that would be an indicator of server load necessarily.

I did discover one thing last night which might possibly be related to this issue. I hadn't been able to get my phpdig search page to work properly since the move. I'd enter a search term and click Go, but then I'd get the same page back. I went through my site to make sure there weren't any missing files of any kind, and after doing that, my search page worked properly. All this happened even though none of the files I had to upload (or re-upload) had anything to do with phpdig searches. :confused:

Fixing all the little stuff that was wrong seems to also have fixed spidering from a secured web page, but I still had the spidering process just hang at a time of 9:53 into spidering around midnight last night. Go figure.

I guess I'll work with it a little more and see how it goes.

Charter 10-04-2004 10:01 AM

>> I have no idea what kind of server load there might be, don't know how one measures that.

From the shell prompt, type top or uptime and hit return. You should see the load average with three numbers showing the average load over the last 1, 5, and 15 minutes.

>> It says indexing is complete, then stops.
>> I still had the spidering process just hang...

Sometimes it hangs and sometimes it completes? Does anything unusual show in your raw access or error logs?
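
For reference, the three load-average figures can be pulled out of an uptime line with a few lines of shell. This is only a rough sketch: parse_load is a made-up helper name, and the sample line is the uptime output quoted in this thread, not a live reading.

```shell
#!/bin/sh
# parse_load is a hypothetical helper: it prints the 1, 5, and 15 minute
# load averages found in a line of `uptime` output.
parse_load() {
    # Drop everything up to "load average:" (some systems print
    # "load averages:"), then remove the commas between the numbers.
    printf '%s\n' "$1" | sed -e 's/.*load averages\{0,1\}: *//' -e 's/,//g'
}

# Sample line copied from the uptime output quoted in this thread.
sample='up 18 days, 38 minutes, 2 users, load average: 0.11, 0.31, 0.33'
parse_load "$sample"    # prints: 0.11 0.31 0.33

# On a live system, pass the real output instead:
#   parse_load "$(uptime)"
```

As a rough rule of thumb, numbers that stay well below the number of CPU cores suggest the machine is not busy, so figures like the ones above would point away from server load as the cause.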

vinyl-junkie 10-04-2004 07:48 PM

Quote:

Originally Posted by Charter
>> I have no idea what kind of server load there might be, don't know how one measures that.

From the shell prompt, type top or uptime and hit return. You should see the load average with three numbers showing the average load over the last 1, 5, and 15 minutes.

Thanks. I'll try that next time and let you know the results.

Quote:

>> It says indexing is complete, then stops.
>> I still had the spidering process just hang...

Sometimes it hangs and sometimes it completes? Does anything unusual show in your raw access or error logs?

When spidering completes, it doesn't index very many pages at all through shell. However, I've spidered now a couple of times through the secure web page since moving my site, and it functioned just as I would have expected.

This whole thing really bugs me, because I would eventually like to have this run as a cron job and dispense with the other two methods. However, I don't have a lot of confidence at this point that a cron job would do any different than shell. This is all so strange.

Nothing unusual at all in the server log. My provider is at a loss to explain why it would just hang, too.

I've been chewing up lots of bandwidth messing with this. Still have plenty to play with for now, but need to watch it.
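
One way to get more to go on from a shell or cron run is to wrap the command so that its output, error messages, and exit status all end up in a log file. The sketch below is only an illustration under assumptions: run_logged is a made-up helper name, and the PHP binary, script path, and URL in the usage comment are placeholders rather than values taken from this thread.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, appends its stdout and stderr to
# LOGFILE together with start/end timestamps and the exit status, and
# returns the command's own exit status.
run_logged() {
    log=$1
    shift
    printf '=== start %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >>"$log"
    "$@" >>"$log" 2>&1
    status=$?
    printf '=== end %s: exit status %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$status" >>"$log"
    return "$status"
}

# Placeholder usage (binary, path, and URL are assumptions):
#   run_logged "$HOME/spider.log" php -f /path/to/phpdig/admin/spider.php http://www.example.com/
```

A non-zero exit status or a PHP error message in the log would at least show whether the run died early or genuinely believed it had finished.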

vinyl-junkie 10-09-2004 06:08 PM

Just a follow-up on this thread. I created a test phpdig database so I could mess with this a little more without clobbering production. Tried to populate the test database initially from shell, and phpdig wouldn't index any pages at all. My saved spider log was totally empty, too. I checked my server log, and nothing shows up for phpdig at all there.

I went ahead and populated my test database using the secure web page, and although it took about 3.5 hours to spider, it still indexed everything I would have expected. So why doesn't it work from shell?

Just now, I tried to update the index via shell, and the same thing happened as it did initially: nothing indexed, empty spider log, nothing in the server log.

When I type uptime from the shell, here's what I get:
up 18 days, 38 minutes, 2 users, load average: 0.11, 0.31, 0.33

Any suggestions as to where I go from here? I've hit a brick wall... :bang:

