muppet
01-31-2006, 02:30 AM
Hi Everybody, thanks for reading this.
I'm trying to work out the best way to re-index my site on a daily basis with minimal CPU impact. The site in question runs a couple of email lists which get about two dozen messages daily, and these are saved via pipermail.
In the PhpDig control panel I've deleted (clicked on the red cross) those paths that I don't need to index at all, i.e. the listinfo pages from Mailman, the settings pages, and the TXT versions of the archives.
When I run the indexing manually from the control panel (by clicking on the green tick), the whole reindexing process takes less than a minute, and new posts are found without any problem. What I'd like to do is set up a crontab to do this for me on a daily basis, but when I follow the instructions given on here I still end up with a total spidering time of between 25 minutes and 6 hours! This is far too long. Can anybody tell me what I'm doing wrong?
The pages I need to index are:
http://www.uk-bandits.co.uk/pipermail/social/
http://www.uk-bandits.co.uk/pipermail/tech/
http://www.uk-bandits.co.uk/pipermail/test-list/
... and so I've set up my crontab thus:
0 1 1-31/1 * * /usr/bin/php -f /usr/local/phpdig/admin/spider.php /usr/local/phpdig/admin/temp/cronlist.txt >> /usr/local/phpdig/admin/temp/spider.log
The file cronlist.txt contains:
http://www.uk-bandits.co.uk/pipermail/test-list/
http://www.uk-bandits.co.uk/pipermail/social/
http://www.uk-bandits.co.uk/pipermail/tech/
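For what it's worth, the day-of-month field 1-31/1 in the crontab entry above matches every day of the month, so as far as I can tell it behaves the same as a plain *. A simpler equivalent entry might look like the sketch below. The 2>&1 at the end is an addition that is not in my current setup; it would send any error output to the same log file so that it isn't lost.

```
# Run at 01:00 every day; a day-of-month of 1-31/1 is equivalent to *
# 2>&1 (an addition, not in the original entry) also appends stderr to spider.log
0 1 * * * /usr/bin/php -f /usr/local/phpdig/admin/spider.php /usr/local/phpdig/admin/temp/cronlist.txt >> /usr/local/phpdig/admin/temp/spider.log 2>&1
```

This only tidies the schedule and the logging; it does not by itself explain the long spidering time.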
Does PhpDig know that the sites I'm passing in via cronlist.txt are the same as those I've set up via the control panel, and does it therefore apply the same exclusions I've already specified there?
I guess what I really need is a high-level overview of how the whole thing works and how the different areas tie together ... without spending weeks and weeks on this.
I'm sure I'm not the only one using PhpDig with Mailman and Pipermail. What do other people do in my position?
Many thanks in advance for your time - I really appreciate any help on this!
Muppet.