no msword to txt parsing


lolodev
07-09-2004, 11:22 AM
hello

(I have versions 1.8.1 and 1.8.0 on my site.)

I made a simple test page:

<a href="http://quito.citipo.fr/modules/documents/rep2/DocUtil.doc">Docutilisateur</a><br>

I index it ... a temporary file is created in admin/temp/xxxx.tmp for this .doc, but it seems that this file is not parsed as a text file by phpdig.

I don't know why.

thanks

lolodev
07-09-2004, 12:15 PM
hello

I continued my tests.

I put an echo at line 461 of the spider.php script.

The page I index is test.php:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Sans titre</title>
</head>
<body>
<a href="http://quito.citipro.fr/modules/documents/rep2/DocUtil.doc">Docutilisateur</a><br>
</body>
</html>


The result is:

SITE : http://quito.citipro.fr/
Exclude paths :
- @NONE@
Resource id #5**../admin/temp/81475511.tmp**245**15********
test.php**HTML**20040709211142**20040709211125**Array**
1:http://quito.citipro.fr/test.php
(time : 00:00:22)
+
level 1...
Resource id #5**0**0**15******modules/documents/rep2/**
DocUtil.doc**MSWORD**20040709211152**20040708082318****
2:http://quito.citipro.fr/modules/documents/rep2/DocUtil.doc
(time : 00:00:32)

No link in temporary table

There is no temporary file for the MS Word document ...

thanks

Charter
07-09-2004, 12:20 PM
Hi. There is a checklist here (http://www.phpdig.net/showthread.php?threadid=799) to help with troubleshooting.

lolodev
07-10-2004, 02:23 PM
hello

Thank you for pointing me to that thread. I went through your checklist and everything on it checks out, but ...

When I index my .doc, the response is:

Command is: /home/mutualiseweb/catdoc-0.93.3/catdoc -s 8859-1 ../admin/temp/44148632.tmp
Result contains: Array ( )
Return value is: 127

but nothing is recorded in the database.

I tried catdoc from the command line on my Linux system, and it handles my MS Word file fine.

What happened?

Are there any French users on this forum?

Charter
07-10-2004, 02:33 PM
Hi. In robot_functions.php find:

$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2;

and replace with:

$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';

to see what issue occurs.
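
The effect of appending 2>&1 can be tried in isolation. What follows is only a minimal sketch, not phpdig's own code: run_parser is a hypothetical helper, and /nonexistent/catdoc and example.tmp are made-up names assumed not to exist on the machine. Only exec() and escapeshellarg() are standard PHP functions.

```php
<?php
// Hypothetical helper (not part of phpdig): run an external parser and
// capture stdout and stderr together, so shell errors appear in the result.
function run_parser(string $binary, string $options, string $file): array
{
    // 2>&1 redirects stderr to stdout. Without it, exec() only collects
    // stdout, so a command that fails to start yields an empty array.
    $command = $binary . ' ' . $options . ' ' . escapeshellarg($file) . ' 2>&1';
    $output = [];
    $status = 0;
    exec($command, $output, $status);
    return ['command' => $command, 'output' => $output, 'status' => $status];
}

// Assumed example: a binary path that does not exist on this machine.
$run = run_parser('/nonexistent/catdoc', '-s 8859-1', 'example.tmp');
echo 'Command is: ' . $run['command'] . "\n";
echo 'Return value is: ' . $run['status'] . "\n";
print_r($run['output']);
```

With the redirect in place, the shell's own error message ends up in the output array instead of being lost, which is what makes an empty "Array ( )" result diagnosable.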

lolodev
07-10-2004, 02:44 PM
hi (23:44 in France)

Here is the response with the code modification:

Command is: /home/mutualiseweb/catdoc-0.93.3 -s 8859-1 ../admin/temp/38346732.tmp 2>&1
Result contains: Array ( [0] => sh: line 1: /home/mutualiseweb/catdoc-0.93.3: is a directory )
Return value is: 126

Strange: when I run /home/mutualiseweb/catdoc -s 8859-1 mymsword.doc from the command line, catdoc runs. But when I change define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc-0.93.3');

to define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc');, phpdig does not recognize my MS Word file.

Charter
07-10-2004, 02:47 PM
Hi. Does this work?

define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc-0.93.3/catdoc');

lolodev
07-10-2004, 02:49 PM
lol, I tried this before your post.

No! It doesn't work.

Charter
07-10-2004, 02:51 PM
Hi. What does

define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc-0.93.3/catdoc');

give you when you use

$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';

lolodev
07-10-2004, 02:53 PM
Command is: /home/mutualiseweb/catdoc-0.93.3/catdoc -s 8859-1 ../admin/temp/39511712.tmp 2>&1
Result contains: Array ( [0] => sh: line 1: /home/mutualiseweb/catdoc-0.93.3/catdoc: No such file or directory )
Return value is: 127
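
The two return values seen in this thread follow the usual POSIX shell convention: 127 means the command was not found, and 126 means the path was found but could not be executed. A minimal sketch of that mapping, where describe_status is a hypothetical helper and not something phpdig provides:

```php
<?php
// Hypothetical helper: translate a shell exit status into a hint.
function describe_status(int $status): string
{
    switch ($status) {
        case 0:
            return 'command ran successfully';
        case 126:
            // e.g. the path is a directory, or the file lacks execute permission
            return 'path was found but could not be executed';
        case 127:
            // e.g. a wrong directory in the configured parser path
            return 'command not found';
        default:
            return 'command exited with non-zero status ' . $status;
    }
}

echo describe_status(126), "\n";
echo describe_status(127), "\n";
```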

Charter
07-10-2004, 02:56 PM
Hi. What does

define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc');

give you when you use

$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';

Also, does catdoc have 755 permissions?
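
Both questions (is the path right, and is the file executable) can also be checked from PHP before running the indexer. This is a rough sketch under assumptions: check_parser_path is a hypothetical helper, not part of phpdig, and the path passed in is just the one from the define above. It relies only on the standard file_exists(), is_dir() and is_executable() functions.

```php
<?php
// Hypothetical helper: report why a configured parser path cannot be run.
function check_parser_path(string $path): string
{
    clearstatcache();                 // avoid stale results after chmod or moves
    if (!file_exists($path)) {
        return 'missing';             // wrong directory or file name
    }
    if (is_dir($path)) {
        return 'directory';           // points at the source folder, not the binary
    }
    if (!is_executable($path)) {
        return 'not executable';      // permissions, e.g. needs chmod 755
    }
    return 'ok';
}

// Path taken from the define above; adjust to wherever catdoc really lives.
echo check_parser_path('/home/mutualiseweb/catdoc'), "\n";
```

The check runs as the same user as the PHP process, so it can give a different answer from a test made in a login shell.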

lolodev
07-10-2004, 03:02 PM
OK!!
It's all my fault.

My catdoc is under /home/mutualiseweb/catdoc-0.93.3/src/ MY GOD

A little question about .pdf files: is it necessary to install GHOST ??

:))) sorry

lolodev
07-10-2004, 03:03 PM
THANKS A LOT

Charter
07-10-2004, 03:11 PM
LOL, paths and permissions. ;)

For PDFs, perhaps try getting a precompiled pdftotext. Directions are in this (http://www.phpdig.net/showthread.php?postid=4582#post4582) post.