Can anyone help me? I have been trying to get Word documents and Excel files to index.
I am using Apache on a Windows XP system. Indexing works for text files only. This is how my config settings look:
define('USE_IS_EXECUTABLE_COMMAND','0'); //use is_executable for external binaries
// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
define('PHPDIG_PARSE_MSWORD','C:\catdoc\catdoc');
define('PHPDIG_OPTION_MSWORD','-s 8859-1');
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','-cork');
define('PHPDIG_INDEX_MSEXCEL',true);
define('PHPDIG_PARSE_MSEXCEL','C:\catdoc\xls2csv');
define('PHPDIG_OPTION_MSEXCEL','');
//---------EXTERNAL TOOLS EXTENSIONS
// if external binary is not STDOUT or different extension is needed
// for example, use '.txt' if external binary writes to filename.txt
define('PHPDIG_MSWORD_EXTENSION','');
define('PHPDIG_PDF_EXTENSION','');
define('PHPDIG_MSEXCEL_EXTENSION','');
I have tried the xls2csv and catdoc programs from the MS-DOS command prompt and they work fine. When I try to submit a URI with a .doc or .xls file, this is what I get:
SITE :
http://localhost/
Exclude paths :
- @NONE@
No link in temporary table
--------------------------------------------------------------------------------
links found : 0
...Was recently indexed
Optimizing tables...
Indexing complete !
Any advice would be much appreciated.
-Rich