PhpDig.net

Old 01-04-2005, 10:50 PM   #1
Slider
Orange Mole
 
Join Date: Jan 2004
Posts: 30
Locking using Cron

I have searched and read all the posts that have to do with "Locked" sites.

I use cron to start the spider. After a while of spidering, it locks a site and the spider stops. After I unlock the site, the spider does not restart unless I catch it as soon as it happens.
This probably sounds familiar, as most of the posts I read describe the same thing.
As a note: The host I am on is not limiting anything and is very dependable.


1. What is the list of reasons that a site gets locked?
2. How much time is allotted between the moment a site is locked and the moment the spider quits trying? I have to ask this since the spider doesn't seem to start back up on its own after a certain amount of time. This will also help if I need to write some custom addition to unlock a site automatically.
3. Am I going to have the same problem with sites locking when a scheduled update runs a week or a month from now?

Just trying to automate a solution so it's not a manual problem.
Most posts talked about spidering with the admin page. I don't use the admin to spider. I only use cron.
__________________
Horse Search Engine
Old 01-05-2005, 01:06 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Here is a tutorial I wrote to help you auto unlock and restart the process.

This tutorial assumes a *nix working environment with one spider process.

You will need to Google and/or mod the code as appropriate for your OS/setup.

First make a file containing '.' at /full/path/to/file.txt and set to 777 permission.

Next in spider.php find:
PHP Code:
if (USE_RENICE_COMMAND == 1) {
    print @exec('renice 18 '.getmypid()).$br;

And afterwards add:
PHP Code:
$my_loc = "/full/path/to/file.txt";
$my_file = fopen($my_loc, "w+");
fputs($my_file, getmypid());
fclose($my_file);
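
As a side note, the same PID-file write can be sketched as a small helper. This is only an illustration, not part of PhpDig: the function name `write_pid_file` is made up here, and it assumes the file path is writable by the user the spider runs as.

```php
<?php
// Hypothetical helper (not part of PhpDig): store the current PID in a file.
function write_pid_file(string $path): bool
{
    // file_put_contents() overwrites the old contents; LOCK_EX takes an
    // exclusive lock so a reader does not see a half-written file.
    return file_put_contents($path, (string) getmypid(), LOCK_EX) !== false;
}
```

Calling `write_pid_file("/full/path/to/file.txt")` at the same spot in spider.php would leave only the PID in the file, which is what the cron script reads back.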
Then set the following script in a cron job and run it every so often.
PHP Code:
<?php
$my_loc = "/full/path/to/file.txt";
$my_pid1 = file_get_contents($my_loc);
$my_pid2 = exec("ps -p $my_pid1 | grep \$? | awk '{print \$1}'");
if ($my_pid1 != $my_pid2) {
/*
- Spider is either dead or index is completed
- Query the tempspider table or query the sites table
- Find num rows in tempspider or locked val in sites
- If num rows or locked equal zero index is completed
- Once completed there is nothing more to be done
- Otherwise the spider is dead so unlock the site
- Then restart the spidering process via cron
- You can do the code for this part ;)
*/
}
?>
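
The decision described in the comments above could be sketched roughly as follows. This is a minimal sketch under assumptions, not the tutorial author's code: the function name and the returned strings are hypothetical, and `$pendingRows` stands for whichever count you query (the number of rows in tempspider, or the number of sites whose locked value is non-zero). The actual queries depend on your table prefix and setup and are left out.

```php
<?php
// Hypothetical helper (not part of PhpDig): decide what the cron watchdog
// should do, given whether the stored PID is still running and how many
// rows are still pending (tempspider rows, or locked sites).
function spider_next_action(bool $pidAlive, int $pendingRows): string
{
    if ($pidAlive) {
        // The spider is still running; leave it alone.
        return 'wait';
    }
    if ($pendingRows === 0) {
        // The process ended and nothing is pending: the index is completed.
        return 'done';
    }
    // The process ended with work left over: the spider died.
    return 'unlock_and_restart';
}
```

For example, `spider_next_action(false, 12)` returns `'unlock_and_restart'`, and only in that case would the script unlock the site and start the spider again.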
Asking me why the spider dies is like asking me why there are dropped packets.

Maybe the MySQL connection hung, a server timed out somewhere, and so forth.

Something somewhere burped...
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 01-05-2005, 04:47 PM   #3
jmitchell
Orange Mole
 
Join Date: Dec 2004
Location: Tennessee
Posts: 60
charter, what do we put in the txt file?
__________________
60,000 pages indexed!!!!! http://www.sharemylink.com
Old 01-05-2005, 07:47 PM   #4
Slider
Orange Mole
 
Join Date: Jan 2004
Posts: 30
Thank you, Charter. That is exactly what I was looking for. I'll see what I can do with the coding part I have to write.
Charter: "Then restart the spidering process via cron" <-- this is the only part I will have to figure out now. I have seen quite a few posts talking about exec(). It will take me a bit to learn how to start the cron session automatically. I will get it eventually, though.

Thanks again for the in-depth information.
Old 01-12-2005, 12:16 PM   #5
djavet
Orange Mole
 
Join Date: Jan 2005
Posts: 31
Hello,

I have exactly the same problem. My site is on Linux, and I tried this code:
Code:
<?php
$my_loc = "/home/www/web330/html/search/admin/temp/status.txt";
$my_pid1 = file_get_contents($my_loc);
$my_pid2 = exec("ps -p $my_pid1 | grep \$? | awk '{print \$1}'");
if ($my_pid1 != $my_pid2) {
/*
- Spider is either dead or index is completed
- Query the tempspider table or query the sites table
- Find num rows in tempspider or locked val in sites
- If num rows or locked equal zero index is completed
- Once completed there is nothing more to be done
- Otherwise the spider is dead so unlock the site
- Then restart the spidering process via cron
- You can do the code for this part ;)
*/

exec("/usr/bin/php -f /home/www/web330/html/search/admin/spider.php http://www.john-howe.com > /home/www/web330/html/search/admin/temp/spider.log");

}
?>
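
A side note on the exec() call above: PHP's exec() waits for the command to finish unless the output is redirected and the command is put in the background. A possible way to build such a command is sketched below. The helper name `build_spider_command` is hypothetical, and the paths and URL passed to it are whatever applies on your own server.

```php
<?php
// Hypothetical helper (not part of PhpDig): build a shell command that runs
// the spider in the background and sends all output to a log file.
function build_spider_command(string $php, string $script, string $url, string $log): string
{
    // escapeshellarg() quotes each value so spaces or special characters
    // in a path or URL cannot break the command line.
    return sprintf(
        '%s -f %s %s > %s 2>&1 &',
        escapeshellarg($php),
        escapeshellarg($script),
        escapeshellarg($url),
        escapeshellarg($log)
    );
}
```

The resulting string would then be passed to exec(), but only once the script has decided that a restart is actually needed.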
I don't know if my code is OK or not. One thing is sure: it doesn't work on my site.
Is it supposed to write something into file.txt (status.txt in my case) when I run spider.php from the admin area? It doesn't do anything...
I'm not a pro at PHP and coding; I try to do my best.

Thanks a lot for your help and time.

Regards, Dominique
Old 01-12-2005, 01:56 PM   #6
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
@jmitchell: Stick a '.' in the file. It doesn't matter, as it will be overwritten anyway.

@djavet: Like I said, "you will need to Google and/or mod the code as appropriate for your OS/setup," as I have no idea what $my_pid* will contain when the code is run on your machine. Also, exec(...) is not enough to do in the if statement, unless you want to keep initiating a spider process for no reason.
Old 01-12-2005, 10:28 PM   #7
djavet
Orange Mole
 
Join Date: Jan 2005
Posts: 31
Hello,

Sorry to bother you, Charter, with my newbie reply, but I'm still learning more every day.
What I don't understand (and it's a little complicated for me at this point in my programming) is what to do in the code. My exec() command works, and my status.txt is written/updated with some info when I'm spidering: one and only one number, like 7358, and then, when updated, 21753, etc.
Nothing happens when *locked* is written into spider.log.

Could you maybe post a full working code sample for unlocking and restarting the cron spidering?
I would very much appreciate learning more about this, but I can't figure out how to do it.
I don't understand what you mean when you write:
Code:
Also, exec(...) is not enough to do in the if statement, unless you want to keep initiating a spider process for no reason.

@Slider and @Slider:
Do you have working code to show us?

Thanks a lot for your patience with us, Charter!
Regards, Dominique
Old 01-13-2005, 12:26 AM   #8
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Processes in a *nix environment get assigned a PID (process ID number) so $my_pid1 contains that PID when the spider is run, and $my_pid2 looks to see if $my_pid1 still exists. Now PIDs are not unique, but when the spider process terminates, $my_pid2 probably won't return $my_pid1, at least in the short term, indicating that the spider process has ended. If $my_pid1 does not equal $my_pid2, then you'd need to determine whether the spider completed its index correctly or whether the spider ended prematurely. The comments in the code suggest a way to do this check, as I don't always have the time, patience, or energy to write complete code. If you simply do an exec() when $my_pid1 is not equal to $my_pid2, then you restart the spider regardless of how the process ended.
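
The "does this PID still exist" check described above can also be done without the ps pipeline. The following is a minimal sketch, assuming a *nix host; the function name `pid_is_alive` is hypothetical, and the posix extension may not be enabled everywhere, so it falls back to looking in /proc (which exists on Linux). As noted above, PIDs can be reused, so a positive answer only means that some process has that PID right now.

```php
<?php
// Hypothetical helper (not part of PhpDig): true if a process with the
// given PID currently exists on this machine.
function pid_is_alive(int $pid): bool
{
    if ($pid <= 0) {
        // 0 and negative values address process groups, not one process.
        return false;
    }
    if (function_exists('posix_kill')) {
        // Signal 0 delivers nothing; it only checks that the PID exists.
        if (posix_kill($pid, 0)) {
            return true;
        }
        // errno 1 (EPERM) means the process exists but belongs to
        // another user, so it still counts as alive.
        return posix_get_last_error() === 1;
    }
    // Fallback for Linux: every running process has a /proc/<pid> entry.
    return file_exists("/proc/$pid");
}
```

In the cron script, `pid_is_alive((int) trim(file_get_contents($my_loc)))` would play the role of the `$my_pid1 != $my_pid2` comparison.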
Old 01-13-2005, 12:32 AM   #9
djavet
Orange Mole
 
Join Date: Jan 2005
Posts: 31
Hello,

Thanks for the explanation.
But I'm lost in the code.

Uh, I don't know how to do that. Can anyone help?

Regards, Dominique
All times are GMT -8.