Spider From A File Thru Web Interface


vinyl-junkie
11-29-2004, 05:58 PM
When run from the shell, phpdig can spider from a file of URLs. I'd like to be able to do the same thing when spidering from the web interface. Thanks in advance for giving this idea some consideration.

leonardburton
12-01-2004, 05:44 PM
Here is the way I did it.
It is very simple, mind you, but it works.

I have a "submit site" page with a form where others [or I] can submit pages to be reviewed for indexing. The form saves the submitted links to a MySQL table.

I have a review script whose SQL statement uses a LEFT JOIN to fetch only the submitted links that have not yet been added. The script lists those links (you can display each link in an input box so the reviewer can edit it [to add a trailing slash, http://www, or whatever]) with two check boxes per link, one for add and one for deny.

Deny deletes the row from the submissions table; add inserts the link into the site table with upddate=0.
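
The review step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's code: it is written in Python with an in-memory SQLite database instead of PHP and MySQL so that it runs on its own, and the names `submissions`, `sites`, `site_url`, `pending`, and `review` are hypothetical. Only the `upddate=0` convention comes from the post.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE submissions (
            id  INTEGER PRIMARY KEY,
            url TEXT NOT NULL
        );
        CREATE TABLE sites (
            site_id  INTEGER PRIMARY KEY,
            site_url TEXT NOT NULL,
            upddate  INTEGER NOT NULL
        );
    """)
    return db


def pending(db):
    """Return submitted links that have no matching row in sites yet."""
    return db.execute("""
        SELECT s.id, s.url
        FROM submissions AS s
        LEFT JOIN sites AS t ON t.site_url = s.url
        WHERE t.site_id IS NULL
        ORDER BY s.id
    """).fetchall()


def review(db, submission_id, action, edited_url=None):
    """Apply the reviewer's choice: 'add' or 'deny'."""
    row = db.execute(
        "SELECT url FROM submissions WHERE id = ?", (submission_id,)
    ).fetchone()
    if row is None:
        raise KeyError(submission_id)
    if action == "deny":
        # Deny: drop the submission entirely.
        db.execute("DELETE FROM submissions WHERE id = ?", (submission_id,))
    elif action == "add":
        # Add: keep the (possibly edited) link and queue it with upddate=0.
        url = edited_url or row[0]
        db.execute(
            "UPDATE submissions SET url = ? WHERE id = ?", (url, submission_id)
        )
        db.execute(
            "INSERT INTO sites (site_url, upddate) VALUES (?, 0)", (url,)
        )
    else:
        raise ValueError(action)
    db.commit()
```

Saving the edited URL back to the submissions row keeps the LEFT JOIN consistent, so a link the reviewer corrected does not reappear in the pending list.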

I have another script (which could be launched from a link on the first script) that runs spider.php via exec(). It finds all the sites where upddate=0 and loops through them, calling exec() once per site. I put a limit of 10 on it just so I can have a little more control over it. [Once you run spider.php for a site, it sets upddate to the current timestamp when it locks and unlocks the tables.]
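
A rough sketch of that batch step, again in Python rather than the PHP exec() loop the post describes. The table layout, the function names, and the `php -f spider.php <url>` command line are assumptions made for illustration, so check how your phpdig install expects spider.php to be invoked before relying on it.

```python
import sqlite3
import subprocess

BATCH_LIMIT = 10  # the post caps each run at 10 sites


def spider_command(url, php="php", script="spider.php"):
    """Build the command as an argument list (assumed invocation)."""
    return [php, "-f", script, url]


def run_spider(url):
    """Run the spider for one URL and return its exit code."""
    return subprocess.run(spider_command(url), check=False).returncode


def spider_pending(db, runner=run_spider, limit=BATCH_LIMIT):
    """Spider up to `limit` sites that still have upddate=0."""
    rows = db.execute(
        "SELECT site_url FROM sites WHERE upddate = 0 "
        "ORDER BY site_id LIMIT ?",
        (limit,),
    ).fetchall()
    results = []
    for (url,) in rows:
        # upddate is not touched here; per the post, spider.php updates it.
        results.append((url, runner(url)))
    return results
```

Passing the URL as a separate list element instead of building a shell string matters here, because the links come from a public submission form. In PHP the comparable precaution would be escapeshellarg() around the URL before exec().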

What I did may not be the best solution, but I got it working quickly. I will make a better solution sometime in the next couple of weeks.

vinyl-junkie
12-01-2004, 05:54 PM
Would you be willing to post your code? I sure would appreciate it, and I'm sure others would as well. Thanks.

Hoek
12-15-2004, 03:15 AM
I use a Perl script (tree.pl) to feed the spider. It is very easy to use, and you get the URLs (htm(l), doc, pdf, etc.) which you want.