multiple crawlers
Does anyone think this is a good way to get phpDig to run multiple crawlers for the sites in the database? When I run spider.php, I noticed that it would only do one site at a time, and when it came across a site such as dmoz.org it would take days, if not weeks, to index all of it.
http://rbhs.ath.cx/~reza/phpdig/wrapper.php
PHP Code:
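The wrapper's code did not survive in this archive. Judging from the transcript later in the thread, it launches one detached `screen` session per site, each running spider.php. A minimal shell sketch of that idea, assuming a helper `spider_cmd` and a hard-coded URL list (both illustrative; the real wrapper presumably reads the sites from phpDig's database):

```shell
#!/bin/sh
# Hypothetical sketch, NOT the original wrapper.php.
# Builds a detached-screen command for one site, mirroring the session
# naming seen later in the thread: <host>_phpdig.
spider_cmd() {
    # Strip the scheme and any path to get a bare hostname for the session name.
    host=$(echo "$1" | sed -e 's|^http://||' -e 's|/.*$||')
    echo "screen -A -m -d -S ${host}_phpdig php -f spider.php $1"
}

# Launch one detached crawler per site URL (list is a placeholder here).
for url in http://freebsd.org/ http://openbsd.org/; do
    cmd=$(spider_cmd "$url")
    echo "launching: $cmd"
    # $cmd    # uncomment to actually start the detached screen session
done
```

Because each crawler runs in its own detached screen session, slow sites like dmoz.org no longer block the rest of the queue.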
fix
I made the following fix.
PHP Code:
I think it's a nice option. But where must the file be installed? And how do I get it to work? Marten :)
How does it work?
Hey, I usually just run it in the same directory as spider.php
%pwd
/usr/home/reza/public_html/phpdig/admin
%php -f wrapper.php
screen -A -m -d -S freebsd.org_phpdig php -f spider.php http://freebsd.org/
screen -A -m -d -S openbsd.org_phpdig php -f spider.php http://openbsd.org/
%screen -list
There are screens on:
    13219.daily.daemonnews.org_phpdig   (Detached)
    10700.freebsd.org_phpdig            (Detached)
    13241.openbsd.org_phpdig            (Detached)
    88053.staff.daemonnews.org_phpdig   (Detached)
    88057.seclists.org_phpdig           (Detached)
6 Sockets in /tmp/screens/S-reza.
%

And you can add it to crontab to have it run whenever you want.
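For the crontab suggestion, a hypothetical entry could look like the following (the 02:00 nightly schedule is an assumption; the directory is the one shown in the transcript):

```shell
# Hypothetical crontab line: run the wrapper nightly at 02:00 from the
# phpdig admin directory, so spider.php is found alongside wrapper.php.
0 2 * * * cd /usr/home/reza/public_html/phpdig/admin && php -f wrapper.php
```

Since each spider runs inside a detached screen session, cron only needs to fire the wrapper; the crawls themselves keep running after the cron job exits.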
I guess I'm not totally understanding how this works...
If I'm correct, you install this script and run it via cron jobs, and it will see if there are sites to be indexed and add multiple spiders to handle them - right? jmitchell