PhpDig.net


Slider 01-04-2005 10:50 PM

Locking using Cron
 
I have searched and read all the posts that have to do with "Locked" sites.

I use cron to start the spider. After a while of spidering, it locks a site and the spider stops. After I unlock the site, the spider does not restart unless I catch it as soon as it happens.
That probably sounds familiar, as most of the posts I read said the same.
As a note: the host I am on is not limiting anything and is very dependable.


1. What is the list of reasons a site gets locked?
2. How much time is allotted between the moment a site is locked and the moment the spider quits trying? I have to ask since the spider doesn't seem to start back up on its own after a certain amount of time. This will also help if I need to write a custom addition to unlock a site automatically.
3. Am I going to have the same problem with sites locking when a scheduled update runs a week or a month from now?

Just trying to automate a solution so it's not a manual problem.
Most posts talked about spidering with the admin page. I don't use the admin to spider. I only use cron.

Charter 01-05-2005 01:06 AM

Here is a tutorial I wrote to help you auto unlock and restart the process.

This tutorial assumes a *nix working environment with one spider process.

You will need to Google and/or mod the code as appropriate for your OS/setup.

First make a file containing '.' at /full/path/to/file.txt and set its permissions to 777.

Next in spider.php find:
PHP Code:

if (USE_RENICE_COMMAND == 1) {
    print @exec('renice 18 '.getmypid()).$br;
}


And afterwards add:
PHP Code:

$my_loc = "/full/path/to/file.txt";
$my_file = fopen($my_loc, "w+");
fputs($my_file, getmypid());
fclose($my_file);

Then set the following script in a cron job and run it every so often.
PHP Code:

<?php
$my_loc = "/full/path/to/file.txt";
$my_pid1 = file_get_contents($my_loc);
$my_pid2 = exec("ps -p $my_pid1 | grep \$? | awk '{print \$1}'");
if ($my_pid1 != $my_pid2) {
/*
- Spider is either dead or index is completed
- Query the tempspider table or query the sites table
- Find num rows in tempspider or locked val in sites
- If num rows or locked equal zero index is completed
- Once completed there is nothing more to be done
- Otherwise the spider is dead so unlock the site
- Then restart the spidering process via cron
- You can do the code for this part ;)
*/
}
?>
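
The commented steps above boil down to a three-way decision. Here is a minimal sketch of just that decision, kept free of database and exec() calls so it can be adapted to any setup. The function name decide_action() and its arguments are hypothetical, not part of PhpDig; $pending stands for whichever count you query (rows in tempspider, or locked sites), and the type declarations assume PHP 7 or later.

```php
<?php
// Sketch only: decide_action() is a hypothetical helper, not part of PhpDig.

/**
 * Decide what the cron watchdog should do next.
 *
 * @param bool $pidAlive true if the PID stored in file.txt still exists
 * @param int  $pending  rows left in tempspider, or number of locked sites
 * @return string 'wait', 'done', or 'unlock_and_restart'
 */
function decide_action(bool $pidAlive, int $pending): string
{
    if ($pidAlive) {
        // The spider is still running, so leave it alone.
        return 'wait';
    }
    if ($pending === 0) {
        // Nothing left to index and nothing locked: the index completed.
        return 'done';
    }
    // The process is gone but work remains: the spider died.
    return 'unlock_and_restart';
}
```

Only the 'unlock_and_restart' result should lead to unlocking the site and calling exec() on spider.php; the other two results mean the cron script can simply exit.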

Asking me why the spider dies is like asking me why there are dropped packets.

Maybe the MySQL connection hung, a server timed out somewhere, and so forth.

Something somewhere burped...

jmitchell 01-05-2005 04:47 PM

charter, what do we put in the txt file?

Slider 01-05-2005 07:47 PM

Thank you Charter. That is exactly what I was looking for. I'll see what I can do with the coding part I would have to make.
Charter: "Then restart the spidering process via cron" <-- this is the only part I will have to figure out now. I have seen quite a few posts talking about exec(). It will take me a bit to work out how to start the cron session automatically. I will get it eventually, though.

Thanks again for the in-depth information.

djavet 01-12-2005 12:16 PM

Hello,

I have exactly the same problem. I have a site on Linux and tried this code:
Code:

<?php
$my_loc = "/home/www/web330/html/search/admin/temp/status.txt";
$my_pid1 = file_get_contents($my_loc);
$my_pid2 = exec("ps -p $my_pid1 | grep \$? | awk '{print \$1}'");
if ($my_pid1 != $my_pid2) {
/*
- Spider is either dead or index is completed
- Query the tempspider table or query the sites table
- Find num rows in tempspider or locked val in sites
- If num rows or locked equal zero index is completed
- Once completed there is nothing more to be done
- Otherwise the spider is dead so unlock the site
- Then restart the spidering process via cron
- You can do the code for this part ;)
*/

exec("/usr/bin/php -f /home/www/web330/html/search/admin/spider.php http://www.john-howe.com > /home/www/web330/html/search/admin/temp/spider.log");

}
?>

I don't know whether my code is OK or not. One thing is sure: it doesn't work on my site.
Is it supposed to write something into file.txt (status.txt for me) when I run spider.php from the admin area? It doesn't do anything...
I'm not a pro in PHP and coding; I try to do my best.

Thanks a lot for your help and time.

Regards, Dominique

Charter 01-12-2005 01:56 PM

@jmitchell: Stick a '.' in the file. It doesn't matter, as it will be overwritten anyway.

@djavet: Like I said, "you will need to Google and/or mod the code as appropriate for your OS/setup," as I have no idea what $my_pid* will contain when the code is run on your machine. Also, exec(...) is not enough to do in the if statement, unless you want to keep initiating a spider process for no reason.

djavet 01-12-2005 10:28 PM

Hello,

Sorry to bother you, Charter, with my newbie reply. But I'm still learning more every day ;)
What I don't understand (and it's a little complicated for me at this point in my programming) is what to do in the code. My exec() command works, and my status.txt is written/updated with some info when I'm spidering: one and only one number, like 7358, and then, when updated, 21753, etc.
Nothing happens when *locked* is written into spider.log.

Maybe a full working code sample for unlocking and restarting the cron spidering?
I would really appreciate learning more about it, but I can't figure out how to do that.
I don't understand what you mean when you write:
Code:

Also, exec(...) is not enough to do in the if statement, unless you want to keep initiating a spider process for no reason.

@Slider and @Slider:
Do you have working code to show us?

A lot of thx for your patience with us Charter!
Regards, Dominique

Charter 01-13-2005 12:26 AM

Processes in a *nix environment get assigned a PID (process ID number), so $my_pid1 contains that PID when the spider is run, and $my_pid2 looks to see if $my_pid1 still exists. PIDs are not unique over time, since the OS reuses them, but when the spider process terminates, $my_pid2 probably won't return $my_pid1, at least in the short term, indicating that the spider process has ended. If $my_pid1 does not equal $my_pid2, then you'd need to determine whether the spider completed its index correctly or whether the spider ended prematurely. The comments in the code suggest a way to do this check, as I don't always have the time, patience, or energy to write complete code. If you simply do an exec() when $my_pid1 is not equal to $my_pid2, then you restart the spider regardless of how the process ended.
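
The "does this PID still exist" check can also be done without parsing ps output. Below is a minimal sketch, assuming a *nix host that has either PHP's posix extension or a /proc filesystem. The function name pid_is_running() is hypothetical, not part of PhpDig, and the same caveat about PID reuse applies.

```php
<?php
// Sketch only: pid_is_running() is a hypothetical helper, not part of PhpDig.
// Assumes the cron script runs as the same user as the spider process.

function pid_is_running($pid)
{
    // The PID file may hold stray whitespace, or still hold the '.' placeholder.
    $pid = (int) trim((string) $pid);
    if ($pid <= 0) {
        return false;
    }
    if (function_exists('posix_kill')) {
        // Signal 0 delivers nothing; it only tests whether the PID can be
        // signalled. It returns false for a process owned by another user.
        return posix_kill($pid, 0);
    }
    // Fallback for hosts without the posix extension (Linux-style /proc).
    return file_exists("/proc/$pid");
}
```

For example, pid_is_running(file_get_contents($my_loc)) could stand in for the comparison of $my_pid1 and $my_pid2. A true result only says that some process currently has that PID, not that it is the spider.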

djavet 01-13-2005 12:32 AM

Hello,

Thanks for the explanation.
But I'm lost in the code :o

Uh, I don't know how to do that. Can anyone help?

Regards, Dominique


All times are GMT -8.