Automatic spider
Hi there,
I want to automate adding links to the search engine. Has anyone else played with this idea?

My setup: I created a table that stores all new links (a lot of links). I call this table my linkspool. A cron job runs every 3 minutes and checks whether a new job (link) is in the spool table. If there is one, the script locks the link and spiders it. After spidering, the script deletes the link from the spool. Finished!

But I have two problems.

First: if a spider run lasts longer than 3 minutes, the next cron run takes the next link from the spool and starts a new spider. That's okay. The script checks how many spiders are running, and if there are more than 5, it exits and waits until a thread is free. This doesn't work very well. How can I check with PHP how many PHP spider threads are running?

Second: with the cron job, the spider machine runs and runs. But if a spider job hangs for any reason, it blocks the thread. How can I use PHP to kill a spider PHP PID that is older than 20 minutes, and how do I remove the link from the search engine DB?

Sorry for my bad English :) jdc |
Hi jdc -
If you have a main script (the one that looks at the linkspool and runs spider processes), keeping track of the number of spiders is easy. Just increment a counter every time a spider is started, and when the counter reaches 5, sleep the script for a period of time and then check again.

To kill the process, check out this thread: http://www.phpdig.net/showthread.php...&highlight=PID

Instead of using a cron job, you could also use the exec() or system() functions in PHP. |
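A minimal sketch of the "count, then sleep and check again" idea, done from the shell because the spiders in this thread run as separate processes. It assumes every spider shows up in ps as "php -f spider.php" (the command line quoted later in this thread). The function names, the limit variable and the 10-second sleep are made up for illustration, not something PhpDig provides.

```shell
#!/bin/sh
# Sketch only: limit the number of spiders that run at the same time.
# Assumption: spiders are started as "php -f spider.php".

MAX_SPIDERS=5

# Read command lines on stdin and print how many contain the text in $1.
count_matching() {
    awk -v pat="$1" 'index($0, pat) > 0 { n++ } END { print n + 0 }'
}

# Succeed (exit status 0) while fewer than MAX_SPIDERS spiders are running.
can_start_spider() {
    # Capture the process list first, so the awk helper (whose own command
    # line contains the pattern) is not part of the list being counted.
    ps_out=$(ps -e -o args=)
    running=$(printf '%s\n' "$ps_out" | count_matching "php -f spider.php")
    [ "$running" -lt "$MAX_SPIDERS" ]
}

# Block until a slot is free, checking again every 10 seconds.
wait_for_free_slot() {
    until can_start_spider; do
        sleep 10
    done
}
```

A launcher script could call wait_for_free_slot right before it starts the next spider. Counting real processes instead of keeping a counter variable means spiders that have already finished are not counted.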
Okay, that's cool.
With cron I can kill the spider, but the link that the spider was working on is still locked in the DB. I need a search and destroy session :) After killing the spider, how can I pass a parameter (e.g. the site_id) to another script that deletes all DB entries for this link? Thanks :) |
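One possible way to hand the site_id over, sketched under assumptions: the script that starts a spider writes down which site_id belongs to which PID, and the script that kills a spider reads that record back and passes it on. The directory /tmp/spider-run, the function names and cleanup.php are all hypothetical. cleanup.php stands for whatever script deletes the DB entries for a site; it is not part of PhpDig.

```shell
#!/bin/sh
# Sketch only: remember which site each spider PID is working on, so a
# later cleanup step knows what to unlock. All names here are hypothetical.

RUN_DIR="${RUN_DIR:-/tmp/spider-run}"

# Record "PID -> site_id" right after a spider has been started.
register_spider() {   # usage: register_spider PID SITE_ID
    mkdir -p "$RUN_DIR" && printf '%s\n' "$2" > "$RUN_DIR/$1.site"
}

# Print the site_id recorded for PID, then delete the record.
# Fails (non-zero status) when no record exists for that PID.
release_spider() {    # usage: release_spider PID
    [ -f "$RUN_DIR/$1.site" ] || return 1
    cat "$RUN_DIR/$1.site" && rm -f "$RUN_DIR/$1.site"
}

# Kill one spider and pass its site_id to the (hypothetical) cleanup script,
# which would see the value as $argv[1].
kill_and_clean() {    # usage: kill_and_clean PID
    kill "$1" 2>/dev/null
    site_id=$(release_spider "$1") || return 1
    php cleanup.php "$site_id"
}
```

A launcher would call register_spider with the PID of the spider it has just put in the background (in sh that PID is available as $!) and the site_id it took from the spool.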
Hmm... after thinking about it, cron is not really a good fit for my problem:

10 * * * * ps -ef | grep 'php -f spider.php' | awk '{print $2}' | xargs kill -9

I start a new spider every 3 minutes and each one should be killed after 10 minutes, so more than one spider is running at a time. This cron entry kills all my spiders, which is no good. Can I use the shell to kill only the PHP spiders that have been running for 10 minutes or more? |
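A sketch of one way to kill only the old spiders: ask ps for each process's elapsed running time and filter on it before killing. This assumes a ps that supports the "etimes" column (elapsed time in seconds), as procps-ng on Linux does; other systems may only offer "etime" in [[dd-]hh:]mm:ss form, which would need extra parsing. The function names and the MAX_AGE variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: kill spiders that have been running longer than MAX_AGE.
# Assumption: "ps -o etimes=" is available (procps-ng on Linux).

MAX_AGE=600   # 10 minutes, in seconds

# Read "PID ELAPSED_SECONDS COMMAND..." lines on stdin and print the PIDs
# whose line contains the text in $1 and whose elapsed time exceeds $2.
old_pids() {
    awk -v pat="$1" -v max="$2" \
        '$2 + 0 > max + 0 && index($0, pat) > 0 { print $1 }'
}

# Send SIGTERM to every spider older than MAX_AGE; younger ones are left alone.
kill_old_spiders() {
    # Capture the process list first, so the awk helper does not see itself.
    ps_out=$(ps -e -o pid= -o etimes= -o args=)
    printf '%s\n' "$ps_out" | old_pids "php -f spider.php" "$MAX_AGE" |
        while read -r pid; do
            kill "$pid"
        done
}
```

Saved as a script that calls kill_old_spiders, this can be run from cron. Note that "10 * * * *" fires once an hour at minute 10, while "*/10 * * * *" fires every ten minutes. The sketch uses a plain kill (SIGTERM) so PHP gets a chance to shut down; kill -9 can be kept as a fallback for spiders that ignore it.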