catdoc


Tanasja
10-23-2003, 06:02 AM
Hi,

I know that you don't give support on catdoc, but...
I am having trouble getting it installed and am looking for help.

I searched all the relevant sites with Google, and also
http://www.45.free.net/~vitus/ice/catdoc/
but they give only brief information.

Where can I find a good manual or support?

Thanx,
Tanasja

Charter
10-24-2003, 05:17 PM
Hi. Just download the package that contains the executable, FTP the executable over to your site in binary mode, and then set define('PHPDIG_PARSE_MSWORD','/full/path/to/catdoc'); in the config file.

Tanasja
10-25-2003, 07:20 AM
Hi Charter,

Can I put catdoc in any directory?
I ask because I have no access to the suggested usr/local/bin directory.

And what does "in binary mode" mean?

Thnx. Tanasja

Charter
10-25-2003, 07:40 AM
Hi. FTP can allow files to be transferred in ASCII mode (e.g., for text files like HTML files) or BINARY mode (e.g., for graphic files like JPG files). Just FTP the executable catdoc like you would a graphics file.
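
To illustrate why the transfer mode matters for an executable, here is a small PHP sketch that imitates the line-ending translation an ASCII-mode transfer may apply. The function name simulate_ascii_transfer and the sample bytes are hypothetical, and the LF-to-CRLF rewrite is only one possible translation; real FTP clients and servers vary.

```php
<?php
// Hypothetical helper: imitates an ASCII-mode transfer that rewrites
// every LF byte as CRLF. Real clients and servers may translate differently.
function simulate_ascii_transfer(string $bytes): string
{
    return str_replace("\n", "\r\n", $bytes);
}

// Stand-in for a few bytes of an executable; note the 0x0A byte in the middle.
$binary = "\x4D\x5A\x0A\x00\xFF";

$asciiCopy  = simulate_ascii_transfer($binary);
$binaryCopy = $binary; // binary mode copies the bytes unchanged

echo 'original:    ', bin2hex($binary), PHP_EOL;
echo 'ASCII mode:  ', bin2hex($asciiCopy), PHP_EOL;
echo 'BINARY mode: ', bin2hex($binaryCopy), PHP_EOL;
```

The ASCII-mode copy gains an extra byte, which is harmless in a text file but would corrupt a program such as catdoc.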

Assuming that your host allows the execution of catdoc, you can put catdoc in any of your directories and call it from there using define('PHPDIG_PARSE_MSWORD','/full/path/to/catdoc'); in the config file.
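
A later post in this thread mentions that phpdigTempFile tests the configured path with file_exists() and is_executable(). As a rough sketch, the same two checks can be run by hand before indexing. The helper name check_helper_binary is hypothetical and not part of PhpDig, and the path is the placeholder from the post above.

```php
<?php
// Hypothetical helper (not part of PhpDig): reports why a configured
// helper binary might be skipped.
function check_helper_binary(string $path): string
{
    if (!file_exists($path)) {
        return 'missing';
    }
    if (!is_executable($path)) {
        return 'not executable';
    }
    return 'ok';
}

// Placeholder path from the post above; replace it with the real location.
$catdocPath = '/full/path/to/catdoc';
echo $catdocPath, ': ', check_helper_binary($catdocPath), PHP_EOL;
```

If this prints anything other than "ok" for the real path, the path or the file permissions are worth checking first.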

Tanasja
10-28-2003, 02:58 AM
Hi Charter,

PhpDig works OK, and I did as you told me, but catdoc is still not working.

This is what happens:
- When I re-index, no errors are given, only the message: no link in temporary table.
- When I change PHPDIG_INDEX_MSWORD from false to true, the spider also shows the doc files. (Conclusion: the spider finds and recognizes doc files.)
- When I change the catdoc directory name to a non-existent one, no error is given. (Conclusion: the spider does not call catdoc, so something goes wrong here.)

I tried it on my external host and locally.

What can it be?

greetx, T

Charter
10-28-2003, 08:07 PM
Hi. Are there any files in the temp directory? If so, what's the extension?

Tanasja
10-29-2003, 06:13 AM
Hi Charter,

Yes, I can see temp files with the extension .tmp2 being created and unlinked in admin/temp.

Here is more information:

I run PhpDig locally on my PC.

In config.php I changed
define('PHPDIG_PARSE_MSWORD','c://apache/htdocs/catdoc/catdoc');
into
define('PHPDIG_PARSE_MSWORD','c://apache/htdocs/catdoc/catdoc/catdoc.exe');
This was necessary to make the following conditions in the function phpdigTempFile in robot_functions.php true:
&& file_exists(PHPDIG_PARSE_MSWORD)
&& is_executable(PHPDIG_PARSE_MSWORD)
... but I am not sure whether that causes problems elsewhere...

I also changed
return array('tempfile'=>$tempfile,'tempfilesize'=>$tempfilesize);
as suggested in the 1.6.2 fix to crawl binary files.

At the end of the function phpdigTempFile there is this code:
rename($tempfile,$tempfile.'2');
exec($command,$result,$retval);
unlink($tempfile.'2');
if (!$retval)
I can see that rename and unlink work OK.
In admin/temp, files are created and unlinked like:
9883dacfac81cb7b7830b3d1b09ea72c.tmp2
These files contain the words from the doc file.
So catdoc seems to work fine.
But the condition in if (!$retval) is false, so exec() seems not to work.

Thanx again so far,
T
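
For reference, the third argument of exec() receives the exit status of the command, and 0 conventionally means success, so the if (!$retval) branch is taken only when the command reports success. The sketch below shows one way to inspect both the output and the status. The wrapper name run_command is hypothetical, and "echo hello" is only a stand-in for the real catdoc command line.

```php
<?php
// Hypothetical wrapper around exec() that keeps the output lines and the
// exit status together so both can be inspected.
function run_command(string $command): array
{
    $output = [];
    $status = -1;
    exec($command, $output, $status);
    return ['output' => $output, 'status' => $status];
}

// "echo" is only a stand-in; substitute the real catdoc command line.
$result = run_command('echo hello');

if ($result['status'] === 0) {
    echo 'command succeeded: ', implode("\n", $result['output']), PHP_EOL;
} else {
    echo 'command failed with exit status ', $result['status'], PHP_EOL;
}
```

Printing the status and output this way for the actual $command may show whether the program was not found, was not allowed to run, or ran and reported an error.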

Charter
11-07-2003, 02:55 PM
Hi. The catdoc.exe binary should create files with txt extensions. The tmp2 extensions are from the rename command.

Your post contained the following two items:

define('PHPDIG_PARSE_MSWORD','c://apache/htdocs/catdoc/catdoc');
define('PHPDIG_PARSE_MSWORD','c://apache/htdocs/catdoc/catdoc/catdoc.exe');

The first item says that catdoc.exe is located in the c://apache/htdocs/catdoc/ directory, whereas the second item says that catdoc.exe is in the c://apache/htdocs/catdoc/catdoc/ directory. The path to catdoc.exe should be like this: c://apache/htdocs/catdoc/catdoc/catdoc (assuming the first two catdocs are directories and the last catdoc is for the catdoc.exe file).