PhpDig.net

01-31-2006, 02:30 AM   #1
muppet
Green Mole
 
Join Date: Jan 2006
Posts: 2
Daily indexing using CRON? (PhpDig, MailMan, Pipermail)

Hi Everybody, thanks for reading this.

I'm trying to work out the best way to re-index my site on a daily basis with minimal CPU impact. The site in question runs a couple of email lists which get about two dozen messages daily, and these are saved via pipermail.

In the PhpDig control panel I've deleted (clicked on the red cross) those paths that I don't need to index at all, i.e. the listinfo pages from mailman, the settings pages, and the TXT versions of the archives.

When I run the indexing manually from the control panel (by clicking on the green tick), the whole reindexing process takes less than a minute, and new posts are found without problems. I'd like to set up a crontab to do the same thing for me daily, but when I follow the instructions given on this site, the total spidering time ends up between 25 minutes and 6 hours! That is far too long. Can anybody tell me what I'm doing wrong?

The pages I need to index are:
http://www.uk-bandits.co.uk/pipermail/social/
http://www.uk-bandits.co.uk/pipermail/tech/
http://www.uk-bandits.co.uk/pipermail/test-list/
... and so I've set up my crontab thus:
0 1 1-31/1 * * /usr/bin/php -f /usr/local/phpdig/admin/spider.php /usr/local/phpdig/admin/temp/cronlist.txt >> /usr/local/phpdig/admin/temp/spider.log
The file cronlist.txt contains:
http://www.uk-bandits.co.uk/pipermail/test-list/
http://www.uk-bandits.co.uk/pipermail/social/
http://www.uk-bandits.co.uk/pipermail/tech/
Does PhpDig know that the sites I'm passing in via cronlist.txt are the same as those I've set up via the control panel, and does it therefore apply the exclusions I've already specified there?
I guess what I really need is a high-level overview of how the whole thing works and how the different areas tie together, without spending weeks and weeks on this.
I'm sure I'm not the only one using PhpDig with MailMan and Pipermail. What do other people do in my position?

Many thanks in advance for your time - I really appreciate any help on this!

Muppet.
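
Editor's note: to see how long each cron run really takes, the spider command from the crontab above could be wrapped in a small logging function. This is only a sketch, not part of PhpDig. The name `run_timed` is hypothetical, and it assumes a `date` that supports `+%s` (GNU and BSD both do).

```shell
#!/bin/sh
# Hypothetical helper (not part of PhpDig): run a command, append its
# output and its duration to a log file, and pass on its exit status.
run_timed() {
    log_file=$1
    shift
    start=$(date +%s)
    echo "started: $(date)" >> "$log_file"
    # Capture stderr too, so PHP warnings end up in the same log.
    "$@" >> "$log_file" 2>&1
    status=$?
    end=$(date +%s)
    echo "finished: exit status $status after $((end - start)) seconds" >> "$log_file"
    return "$status"
}

# Example call, using the paths from the post above:
# run_timed /usr/local/phpdig/admin/temp/spider.log \
#     /usr/bin/php -f /usr/local/phpdig/admin/spider.php \
#     /usr/local/phpdig/admin/temp/cronlist.txt
```

If this were saved as a script with the example call uncommented, the crontab line would only need to invoke that script. As a side note, `1-31/1` in the day-of-month field selects every day, so `0 1 * * *` gives the same schedule.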
02-03-2006, 02:58 AM   #2
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Try clicking the 'noway' icon (circle with dash) to delete a branch and exclude it from future indexing. When indexing from the shell, try decreasing SPIDER_MAX_LIMIT and LINKS_MAX_LIMIT, and also try setting LIMIT_TO_DIRECTORY to true. There is also a mod here that may be of use.
Code:
define('SPIDER_MAX_LIMIT',20);                   // max (re)index search depth - used for shell and admin panel dropdown
define('RESPIDER_LIMIT',5);                      // max update search depth - only used for browser, not used for shell

define('LINKS_MAX_LIMIT',20);                    // max (re)index links per - used for shell and admin panel dropdown
define('RELINKS_LIMIT',5);                       // max update links per - only used for browser, not used for shell

define('LIMIT_TO_DIRECTORY',false);              // limit index to given (sub)directory where (sub)directories of given (sub)directory are NOT indexed
                                                 // for limit to directory, URL format must either have file at end or ending slash at end
                                                 // e.g., http://www.domain.com/dirs/ (WITH ending slash) or http://www.domain.com/dirs/dirs/index.php

define('ALLOW_SUBDIRECTORIES',false);            // limit index to given (sub)directory where (sub)directories of given (sub)directory are indexed
                                                 // if set to true, LIMIT_TO_DIRECTORY must also be set to true
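
Editor's note: before the next cron run, it may help to confirm which values the shell spider will actually pick up. The sketch below prints the relevant `define` lines from a config file. The function name `show_limits` is hypothetical, and the config path in the example call is an assumption, so adjust it to wherever your PhpDig config file lives.

```shell
#!/bin/sh
# Hypothetical helper (not part of PhpDig): print the define() lines for
# the settings mentioned above from the given config file.
show_limits() {
    config_file=$1
    grep -E "^[[:space:]]*define\('(SPIDER_MAX_LIMIT|LINKS_MAX_LIMIT|LIMIT_TO_DIRECTORY|ALLOW_SUBDIRECTORIES)'" "$config_file"
}

# Example call (the path is an assumption about the install layout):
# show_limits /usr/local/phpdig/includes/config.php
```

RESPIDER_LIMIT and RELINKS_LIMIT are left out on purpose, since the comments above say they are not used for shell runs.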
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.