antiword tweaking code
I am wrestling with antiword. In short, MS Word documents are uploaded to the site and handed to antiword in a temp directory, where the text is extracted and its characters are counted. The script then divides the character count by 5 and outputs a "word" count.
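For anyone following along, the pipeline described above might look roughly like the sketch below. The helper names are hypothetical and this part of the script is not shown in the post. The `-w 0` (no line wrapping) and `-m UTF-8.txt` (UTF-8 character mapping) options are taken from the antiword man page and should be checked against the installed version.

```php
<?php
// Hypothetical helper names; the original post does not show this part of the script.

// Build the shell command that converts one uploaded .doc to plain text.
// -w 0 asks antiword not to wrap lines and -m UTF-8.txt selects its UTF-8
// mapping file; verify both against the man page of the installed version.
function antiword_command(string $docPath): string
{
    return 'antiword -w 0 -m UTF-8.txt ' . escapeshellarg($docPath);
}

// Estimate "words" as characters divided by 5, as described in the post.
function words_from_chars(int $charCount): float
{
    return $charCount / 5;
}

// Usage (needs antiword on the PATH, so it is left commented out):
// $content = (string) shell_exec(antiword_command('/tmp/upload.doc'));
// echo words_from_chars(strlen($content));
```

Keeping the command builder separate from the arithmetic makes it possible to check the divide-by-5 step on known input without running antiword at all.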
Less than 1 percent variance is desired, compared to what Word reports when its Tools menu is used to count characters. I have code in place to remove any whitespace beyond two spaces after end-of-sentence punctuation, and to strip tabs and returns as well:
} // closes an earlier block not shown here
$content = str_replace('[pic]', '', $content); // drop the [pic] image placeholder
$content = preg_replace('/[\r\n\t]/', '', $content); // strip returns and tabs
// Remove runs of spaces that follow anything other than . ! ? " or '
$content = preg_replace('/([^\.\!\?"\'])[ ]+/', '$1', $content);
// Cap spaces after sentence-ending punctuation at two, keeping the punctuation mark
$content = preg_replace('/([.!?])[ ]{3,}/', '$1  ', $content);
echo 'Total character count for ' . $file . ': ' . strlen($content) . '<br/>';
$total_chars += strlen($content);
But I get anything from near perfect to 5% under or over. Does anyone have any ideas on how to tweak this code for antiword output into something more reliable?
TIA,
Sarah
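One possible variation on the cleanup described above, offered as a sketch and not a tested fix. It makes two assumptions that are not in the post: that antiword's output is UTF-8 and PHP's mbstring extension is available, and that the figure being compared is Word's "Characters (with spaces)", so single spaces between words are left in place. `strlen()` counts bytes, so any multi-byte character inflates the total; `mb_strlen()` counts characters. All function names are hypothetical.

```php
<?php
// Hypothetical helper names, not the poster's actual script.

// Normalize antiword text before counting, following the rules in the post.
function normalize_for_count(string $text): string
{
    // Drop the [pic] image placeholder.
    $text = str_replace('[pic]', '', $text);
    // Remove returns and tabs.
    $text = preg_replace('/[\r\n\t]/', '', $text);
    // Cap spaces after sentence-ending punctuation at two,
    // keeping the punctuation mark itself.
    return preg_replace('/([.!?])[ ]{3,}/', '$1  ', $text);
}

// Count characters, not bytes (assumes UTF-8 input and the mbstring extension).
function count_characters(string $text): int
{
    return mb_strlen($text, 'UTF-8');
}

// Signed percentage difference between the script's count and Word's count.
function percent_difference(float $measured, float $reference): float
{
    return ($measured - $reference) / $reference * 100;
}
```

Running `percent_difference()` against Word's own figure for a handful of test documents should show whether the drift tracks accented characters, images, or spacing.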