PhpDig.net

Old 11-29-2004, 05:58 PM   #1
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Spider From A File Thru Web Interface

PhpDig can spider from a file of URLs when it is run from the shell. I'd like to see the same option when spidering from the web interface. Thanks in advance for giving this idea some consideration.
Old 12-01-2004, 05:44 PM   #2
leonardburton
Green Mole
 
Join Date: Dec 2004
Posts: 10
Here is how I did it.
It is very simple, mind you, but it works.

I have a submit-site page with a form for others [or myself] to submit pages to be reviewed for indexing. It submits these links to a MySQL table.

I have a script with an SQL statement that uses a left join to get only the items not yet added. This script lists the links (you can display each link in an input box so the reviewer can edit it [to add a trailing slash or http://www or whatever]) with two check boxes, one for add and one for deny.

Deny deletes the row, and add inserts it into the site table with upddate=0.
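
For anyone who wants to see the review step spelled out, here is a minimal sketch of the left join and the add/deny handling. It is written in Python with an in-memory SQLite database so it runs on its own; it is not the code described above, which was PHP against MySQL. The `submissions` table, its columns, and the function names are hypothetical, and the `sites` columns (`site_url`, `upddate`) are assumed to match your PhpDig install.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway database with a few sample rows (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE submissions (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
        CREATE TABLE sites (site_id INTEGER PRIMARY KEY,
                            site_url TEXT NOT NULL,
                            upddate INTEGER NOT NULL);
        INSERT INTO submissions (url) VALUES
            ('http://a.example/'), ('http://b.example/'), ('http://c.example/');
        -- a.example has already been added and spidered
        INSERT INTO sites (site_url, upddate) VALUES ('http://a.example/', 1101945600);
        """
    )
    return conn


def pending_links(conn):
    """Return (id, url) for submitted links that are not yet in the sites table."""
    return conn.execute(
        "SELECT s.id, s.url FROM submissions AS s "
        "LEFT JOIN sites AS t ON t.site_url = s.url "
        "WHERE t.site_url IS NULL ORDER BY s.id"
    ).fetchall()


def review(conn, submission_id, approve):
    """Add: copy the link into sites with upddate=0. Deny: delete the submission."""
    row = conn.execute(
        "SELECT url FROM submissions WHERE id = ?", (submission_id,)
    ).fetchone()
    if row is None:
        return  # unknown id, nothing to do
    if approve:
        conn.execute(
            "INSERT INTO sites (site_url, upddate) VALUES (?, 0)", (row[0],)
        )
    else:
        conn.execute("DELETE FROM submissions WHERE id = ?", (submission_id,))
    conn.commit()
```

An approved link stays in `submissions`, but the left join no longer returns it because it now has a match in `sites`.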

I have another script (it could be launched from a link on the first script) that runs spider.php via exec(). It finds all the sites where upddate=0 and loops through them, calling exec() once per site. I put a limit of 10 on it just so I can have a little more control over it. [Once you run spider.php for a site, it sets upddate to the current timestamp when it locks and unlocks the tables.]
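
The spidering loop described above could look roughly like the following. Again, this is a Python sketch under assumptions, not the actual script: the `php -f spider.php <url>` command line, the `site_id` column, and the function names are assumed, so adjust the path and arguments to match how you run the spider from the shell.

```python
import sqlite3
import subprocess


def build_commands(conn, limit=10, spider="spider.php"):
    """Return one spider command per site that has never been spidered (upddate=0)."""
    rows = conn.execute(
        "SELECT site_url FROM sites WHERE upddate = 0 ORDER BY site_id LIMIT ?",
        (limit,),
    ).fetchall()
    # Assumed CLI form: php -f spider.php <url>
    return [["php", "-f", spider, url] for (url,) in rows]


def spider_pending(conn, limit=10, run=subprocess.run):
    """Run the spider once per pending site and return how many were started.

    `run` is injectable so the loop can be tried out without PHP installed.
    """
    commands = build_commands(conn, limit)
    for cmd in commands:
        # The argument list is passed without a shell, so the URL is not
        # interpreted by one.
        run(cmd, check=False)
    return len(commands)
```

The sketch does not touch upddate itself, since spider.php is described as updating it. Capping the batch at 10 means a long queue is worked off over several runs.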

What I did may not be the best solution, but I got it working quickly. I will make a better solution sometime in the next couple of weeks.
Old 12-01-2004, 05:54 PM   #3
vinyl-junkie
Purple Mole
 
Join Date: Jan 2004
Posts: 694
Would you be willing to post your code? I sure would appreciate it, and I'm sure others would as well. Thanks.
Old 12-15-2004, 03:15 AM   #4
Hoek
Green Mole
 
Join Date: Feb 2004
Posts: 17
I use a Perl script (tree.pl) to feed the spider. It is very easy to use, and you get the URLs (htm(l), doc, pdf, etc.) that you want.