Locking using Cron
I have searched and read all the posts that have to do with "Locked" sites.
I use cron to start the spider. After a while of spidering, it locks a site and the spider stops. After I unlock the site, the spider does not restart unless I catch it just as soon as it happens. This probably sounds familiar, as most of the posts I read said the same. As a note: the host I am on is not limiting anything and is very dependable.
1. What are the reasons that a site gets locked?
2. How much time is allotted between the time a site is locked and when the spider quits trying? I have to ask this since the spider doesn't seem to start back up on its own after a certain amount of time. This will also help if I need to write some custom addition to unlock a site automatically.
3. Am I going to have the same problem with sites locking when a scheduled update is done a week or a month from now?
Just trying to automate a solution so it's not a manual problem. Most posts talked about spidering with the admin page. I don't use the admin to spider. I only use cron. |
Here is a tutorial I wrote to help you auto unlock and restart the process.
This tutorial assumes a *nix working environment with one spider process. You will need to Google and/or mod the code as appropriate for your OS/setup. First, make a file containing '.' at /full/path/to/file.txt and set its permissions to 777. Next, in spider.php find: PHP Code:
PHP Code:
PHP Code:
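As a rough illustration of the idea (not the code from the tutorial itself), the spider could record its own process ID in that file when it starts. The helper name `record_spider_pid` is made up for this sketch, it assumes PHP 7 or later, and where exactly it would be called inside spider.php is left open:

```php
<?php
// Hypothetical helper (not part of spider.php): write the PID of the
// current PHP process to the status file so a later check can find it.
function record_spider_pid(string $status_file): int
{
    $pid = getmypid(); // PID of this PHP process, or false on failure
    if ($pid === false) {
        throw new RuntimeException('Could not determine the process ID');
    }
    // Overwrites whatever was in the file (e.g. the initial '.').
    // LOCK_EX keeps a concurrent reader from seeing a half-written file.
    if (file_put_contents($status_file, (string) $pid, LOCK_EX) === false) {
        throw new RuntimeException("Could not write to $status_file");
    }
    return $pid;
}
```

Called once when the spider starts, e.g. `record_spider_pid('/full/path/to/file.txt');`, this would leave a single number in the file, which is the value a separate check can compare against later.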
Maybe the MySQL connection hung, a server timed out somewhere, and so forth. Something somewhere burped... |
Charter, what do we put in the txt file?
|
Thank you Charter. That is exactly what I was looking for. I'll see what I can do with the coding part I would have to make.
Charter: "Then restart the spidering process via cron" <-- this is the only part I will have to figure out now. I have seen quite a few posts talking about exec(). It will take me a bit to work out how to start the cron session automatically. I will get it eventually, though. Thanks again for the in-depth information. |
Hello,
I have exactly the same problem. I have a site on Linux and tried this code: Code:
<?php
Is it supposed to write something into file.txt (status.txt in my case) when I run spider.php from the admin area? It doesn't do anything... I'm not a pro in PHP and coding; I try to do my best. Thanks a lot for your help and time. Regards, Dominique |
@jmitchell: Stick a '.' in the file. It doesn't matter, as it will be overwritten anyway.
@djavet: Like I said, "you will need to Google and/or mod the code as appropriate for your OS/setup," as I have no idea what $my_pid* will contain when the code is run on your machine. Also, exec(...) is not enough to do in the if statement, unless you want to keep initiating a spider process for no reason. |
Hello,
Sorry to bother you, Charter, with my newbie reply, but I'm still learning more every day ;) What I don't understand (and it's a little complicated for me at this stage of my programming) is what to do in the code. My exec() command works, and my status.txt is written/updated with some info when I'm spidering: one and only one number, like 7358, and then, when updated, 21753, etc. Nothing happens when *locked* is written to spider.log. Could you post a full working sample for unlocking and restarting the cron spidering? I would really appreciate learning more about it, but I can't figure out how to do that. I don't understand what you mean when you write: Code:
Also, exec(...) is not enough to do in the if statement, unless you want to keep initiating a spider process for no reason. @Slider and @Slider: Do you have working code to show us? Thanks a lot for your patience with us, Charter! Regards, Dominique |
Processes in a *nix environment get assigned a PID (process ID number) so $my_pid1 contains that PID when the spider is run, and $my_pid2 looks to see if $my_pid1 still exists. Now PIDs are not unique, but when the spider process terminates, $my_pid2 probably won't return $my_pid1, at least in the short term, indicating that the spider process has ended. If $my_pid1 does not equal $my_pid2, then you'd need to determine whether the spider completed its index correctly or whether the spider ended prematurely. The comments in the code suggest a way to do this check, as I don't always have the time, patience, or energy to write complete code. If you simply do an exec() when $my_pid1 is not equal to $my_pid2, then you restart the spider regardless of how the process ended.
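A minimal sketch of that check, under a few assumptions: the status file holds the PID the spider wrote when it started, the `posix` extension is available (PHP 7+), and the log contains some recognizable text once an index has finished. The function names and the `$done_marker` string are hypothetical; look in your own spider.log for what a completed run actually prints.

```php
<?php
// Hypothetical helper: is a process with this PID currently running?
// Signal 0 delivers nothing; it only asks the kernel whether the PID exists.
// Note: posix_kill() also returns false for processes owned by another user,
// so run the checker as the same user as the spider's cron job.
function pid_is_running(int $pid): bool
{
    if ($pid <= 0) {
        return false; // 0 and negatives address process groups, not one PID
    }
    return posix_kill($pid, 0);
}

// Hypothetical helper: true only when the recorded spider process is gone
// AND the log shows no sign that it finished its index.
function should_restart_spider(string $status_file, string $log_file, string $done_marker): bool
{
    // $my_pid1: the PID the spider recorded; a missing file or '.' becomes 0.
    $my_pid1 = (int) trim((string) @file_get_contents($status_file));

    if (pid_is_running($my_pid1)) {
        return false; // still spidering, leave it alone
    }

    // The process has ended. Did it finish, or did it stop prematurely?
    // ASSUMPTION: a completed run leaves $done_marker somewhere in the log.
    $log = is_readable($log_file) ? (string) file_get_contents($log_file) : '';
    return strpos($log, $done_marker) === false;
}
```

Only when `should_restart_spider(...)` returns true would a cron-driven checker unlock the site and exec() the spider again; an unconditional exec() would relaunch the spider even after a clean finish. Because PIDs get reused, a long gap between checks could let an unrelated process pass for the spider, so the check is best run frequently.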
|
Hello,
Thanks for the explanation, but I'm lost in the code :o Hmm, I don't know how to do that. Can anyone help? Regards, Dominique |
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.