09-07-2006, 06:29 AM
SABsearch2
Join Date: Jul 2006
Posts: 3
Use Antiword instead of catdoc on Wintel

I've been integrating phpdig on a Windows 2003 server.

There's a problem with catdoc on this platform:
- The official catdoc distribution supports DOS, not Windows.
- The unofficial Windows port is based on an older version of the program.

I don't know about the newer version, but the old one has these problems:
- Images embedded in the document are not skipped. They are converted into text, so the output is huge and the document is indexed wrongly.
- Performance is poor because the program only writes to standard output and offers no file-output option. phpdig runs faster when it can use file output.
- Put the two points above together and the indexing process never finishes (and your server climbs to 99% CPU and you lose contact with it).

I tried antiword instead. Images are skipped correctly and it runs about 10 times faster.
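phpdig itself is written in PHP and the post doesn't show its configuration, so the following is only a hypothetical Python illustration of the points above: running antiword as an external command and sending its standard output straight into a file instead of reading it through a pipe. antiword printing plain text to standard output is its default behaviour; the function name `extract_doc_text`, the `command` parameter and the 60-second timeout (a safeguard against a never-ending extraction) are assumptions for illustration, not phpdig settings.

```python
import subprocess
from pathlib import Path


def extract_doc_text(doc_path, out_path, command=("antiword",), timeout=60):
    """Convert a Word file to plain text with an external extractor.

    Hypothetical helper: the extractor (antiword by default) prints text on
    standard output, which is redirected directly into out_path. The child
    process is killed and subprocess.TimeoutExpired is raised if it runs
    longer than `timeout` seconds (an assumed safeguard, not a phpdig option).
    """
    with open(out_path, "wb") as out:
        subprocess.run(
            [*command, str(doc_path)],
            stdout=out,                  # write the extracted text to the file
            stderr=subprocess.DEVNULL,   # ignore the extractor's warnings
            check=True,                  # raise if the extractor fails
            timeout=timeout,
        )
    return Path(out_path).read_text(errors="replace")
```

Usage would look like `text = extract_doc_text("report.doc", "report.txt")`; swapping the `command` tuple is how the same code would call a different extractor such as catdoc.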