06-30-2004, 09:37 AM
jdc32
Automatic spider

Hi there,

I want to automate adding links to the search engine (SE).
Has anyone else played with this idea?

Setup:
I created a table that stores every new link (a lot of links). I call this table my link spool.

Then I have a cron job that runs every 3 minutes and checks whether a new job (link) is in the spool table. If there is one, the script locks the link and spiders it. After spidering, the script deletes the link from the spool. Done!
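
To make the workflow above concrete, here is a minimal sketch of one cron run: claim a link, spider it, delete it. It is written in Python with SQLite only to keep it self-contained; the table name `linkspool`, the columns `locked_by` and `locked_at`, and the `spider` callback are all assumptions for illustration, not the real schema or spider.

```python
import os
import sqlite3
import time


def open_spool(path=":memory:"):
    """Open the spool database and create the (hypothetical) linkspool table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS linkspool ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT NOT NULL,"
        " locked_by INTEGER,"  # PID of the worker that claimed the link
        " locked_at REAL)"     # Unix time of the claim
    )
    return db


def claim_next_link(db):
    """Lock the oldest unclaimed link; return (id, url) or None."""
    with db:  # commit on success, roll back on error
        row = db.execute(
            "SELECT id, url FROM linkspool"
            " WHERE locked_by IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The extra "locked_by IS NULL" guard makes the claim fail cleanly
        # if another worker grabbed the same row in the meantime.
        cur = db.execute(
            "UPDATE linkspool SET locked_by = ?, locked_at = ?"
            " WHERE id = ? AND locked_by IS NULL",
            (os.getpid(), time.time(), row[0]),
        )
        return row if cur.rowcount == 1 else None


def run_once(db, spider):
    """One cron run: claim a link, spider it, remove it from the spool."""
    job = claim_next_link(db)
    if job is None:
        return None  # spool is empty or the claim was lost
    link_id, url = job
    spider(url)
    with db:
        db.execute("DELETE FROM linkspool WHERE id = ?", (link_id,))
    return url
```

Storing the PID and the lock time with the claim is a design choice made here so that a later cleanup step can tell which process holds a link and for how long.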

But I have two problems!

First:
If a spider run lasts longer than 3 minutes, the next cron run takes the next link from the spool and starts a new spider. That's okay. The script checks how many spiders are running, and if there are more than 5, it exits and waits until a slot is free.
This doesn't work very well. How can I check from PHP how many PHP spider processes are currently running?
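
One possible way to enforce the limit of 5 without counting processes at all is to give each spider a lock file "slot". The sketch below is in Python for illustration and assumes a Unix system; the directory and file names are made up. In PHP the same pattern should be possible with `flock()` and `LOCK_EX | LOCK_NB`.

```python
import fcntl
import os

MAX_SPIDERS = 5  # the limit mentioned in the post


def acquire_slot(lock_dir, max_slots=MAX_SPIDERS):
    """Try to take one of max_slots lock files.

    Returns the open file object holding the lock, or None if every
    slot is already held by another spider.
    """
    os.makedirs(lock_dir, exist_ok=True)
    for n in range(max_slots):
        handle = open(os.path.join(lock_dir, f"spider-{n}.lock"), "w")
        try:
            # Non-blocking exclusive lock: fails at once if the slot is taken.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            handle.close()
            continue
        return handle  # keep this open for the lifetime of the spider
    return None
```

The spider would call `acquire_slot()` first and exit if it returns `None`. Because the operating system releases the lock when the process ends, even after a crash, the number of busy slots cannot drift away from the number of spiders that are really running.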

Second:
With the cron job, the spider machine runs and runs and runs. But if a spider job hangs for any reason, it blocks its slot.
How can I kill, from PHP, a spider PHP process (PID) that is older than 20 minutes, and how do I remove its link from the SE database?
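
A rough sketch of the kind of cleanup meant here, again in Python and for Unix only. It assumes the spool table records the PID and lock time of each claimed link (hypothetical columns `locked_by` and `locked_at`), and it leaves the SE database cleanup to a caller-supplied `cleanup(url)` function because that schema is not shown here. In PHP the kill itself would presumably be `posix_kill()`.

```python
import os
import signal
import sqlite3
import time

MAX_AGE = 20 * 60  # 20 minutes, in seconds

# Hypothetical spool schema, for illustration only.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS linkspool ("
    " id INTEGER PRIMARY KEY, url TEXT NOT NULL,"
    " locked_by INTEGER, locked_at REAL)"
)


def reap_stale_jobs(db, cleanup, max_age=MAX_AGE, now=None):
    """Kill spiders whose lock is older than max_age and drop their links.

    Returns the list of URLs that were reaped.
    """
    now = time.time() if now is None else now
    stale = db.execute(
        "SELECT id, url, locked_by FROM linkspool"
        " WHERE locked_by IS NOT NULL AND locked_at < ?",
        (now - max_age,),
    ).fetchall()
    for link_id, url, pid in stale:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process is already gone
        cleanup(url)  # e.g. remove partial results from the SE database
        with db:
            db.execute("DELETE FROM linkspool WHERE id = ?", (link_id,))
    return [url for _, url, _ in stale]
```

One caveat with this approach: PIDs get reused, so a careful version would also check that the PID still belongs to a spider process before killing it.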

Sorry for my bad English.

jdc