No MSWORD-to-text parsing
lolodev
07-09-2004, 10:22 AM
hello
(I have versions 1.8.1 and 1.8.0 on my site.)
I made a simple test page containing:
<a href="http://quito.citipo.fr/modules/documents/rep2/DocUtil.doc">Docutilisateur</a><br>
--
I index it... a temporary file is created in admin/temp/xxxx.tmp for this .doc,
but it seems that this file is not parsed as a text file by PhpDig.
---
I don't know why.
thanks
lolodev
07-09-2004, 11:15 AM
hello
I continued my test.
I put an echo at line 461 of the spider.php script.
The page I index is test.php:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Sans titre</title>
</head>
<body>
<a href="http://quito.citipro.fr/modules/documents/rep2/DocUtil.doc">Docutilisateur</a><br>
</body>
</html>
The result is:
SITE : http://quito.citipro.fr/
Exclude paths :
- @NONE@
Resource id #5**../admin/temp/81475511.tmp**245**15********
test.php**HTML**20040709211142**20040709211125**Array**
1:http://quito.citipro.fr/test.php
(time : 00:00:22)
+
level 1...
Resource id #5**0**0**15******modules/documents/rep2/**
DocUtil.doc**MSWORD**20040709211152**20040708082318****
2:http://quito.citipro.fr/modules/documents/rep2/DocUtil.doc
(time : 00:00:32)
No link in temporary table
There is no temporary file for the MSWORD document...
thanks
Charter
07-09-2004, 11:20 AM
Hi. There is a checklist here (http://www.phpdig.net/showthread.php?threadid=799) to help with troubleshooting.
lolodev
07-10-2004, 01:23 PM
hello
Thank you for posting the thread. I checked your list and everything on it is fine, but...
When I index my .doc, the response is:
Command is: /home/mutualiseweb/catdoc-0.93.3/catdoc -s 8859-1 ../admin/temp/44148632.tmp
Result contains: Array ( )
Return value is: 127
but nothing is recorded in the database.
I tried catdoc from the command line on my Linux OS, and catdoc converts my MSWORD file fine.
What happened??
Are there any French users in this forum??
Charter
07-10-2004, 01:33 PM
Hi. In robot_functions.php find:
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2;
and replace with:
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';
to see what issue occurs.
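Appending `2>&1` redirects the shell's standard error into the captured output, so the shell's own error message shows up under "Result contains". In a POSIX shell, exit status 127 (the value reported earlier) means "command not found", while 126 means the file was found but could not be executed. The sketch below assumes the parser is run through PHP's `exec()`; the helpers `run_parser` and `explain_status` are hypothetical and not part of PhpDig, and quoting the temp file with `escapeshellarg()` is an addition of this sketch rather than what robot_functions.php does.

```php
<?php
// Hypothetical helper, not part of PhpDig: builds the command the way the
// modified robot_functions.php line does, appending "2>&1" so the shell's
// error messages land in $output alongside the parser's own output.
function run_parser(string $binary, string $options, string $file): array
{
    // escapeshellarg() is an addition of this sketch; PhpDig concatenates the path directly.
    $command = $binary . ' ' . $options . ' ' . escapeshellarg($file) . ' 2>&1';
    $output = [];
    $status = 0;
    exec($command, $output, $status);
    return [$command, $output, $status];
}

// Maps the exit codes seen in this thread to their usual POSIX shell meaning.
function explain_status(int $status): string
{
    switch ($status) {
        case 0:
            return 'ok';
        case 126:
            return 'found but not executable (for example, a directory)';
        case 127:
            return 'command not found (check the path)';
        default:
            return 'parser exited with status ' . $status;
    }
}

// Demo with a deliberately wrong binary path.
[$command, $output, $status] = run_parser('/nonexistent/catdoc', '-s 8859-1', __FILE__);
echo 'Command is: ', $command, "\n";
echo 'Result contains: ', print_r($output, true), "\n";
echo 'Return value is: ', $status, ' (', explain_status($status), ")\n";
```

With the wrong path, the shell's "not found" message appears in the captured array instead of an empty `Array ( )`, which is exactly what makes the problem visible.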
lolodev
07-10-2004, 01:44 PM
hi (23:44 in France)
Here is the response with the code modification:
Command is: /home/mutualiseweb/catdoc-0.93.3 -s 8859-1 ../admin/temp/38346732.tmp 2>&1
Result contains: Array ( [0] => sh: line 1: /home/mutualiseweb/catdoc-0.93.3: is a directory )
Return value is: 126
Strange: when I run /home/mutualiseweb/catdoc -s 8859-1 mymsword.doc from the command line, catdoc works, but when I change define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc-0.93.3');
to define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc');, PhpDig does not recognize my MSWORD file.
Charter
07-10-2004, 01:47 PM
Hi. Does this work?
define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc-0.93.3/catdoc');
lolodev
07-10-2004, 01:49 PM
LOL, I tried this before your post.
No! It doesn't work.
Charter
07-10-2004, 01:51 PM
Hi. What does
define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc-0.93.3/catdoc');
give you when you use
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';
lolodev
07-10-2004, 01:53 PM
Command is: /home/mutualiseweb/catdoc-0.93.3/catdoc -s 8859-1 ../admin/temp/39511712.tmp 2>&1
Result contains: Array ( [0] => sh: line 1: /home/mutualiseweb/catdoc-0.93.3/catdoc: No such file or directory )
Return value is: 127
Charter
07-10-2004, 01:56 PM
Hi. What does
define('PHPDIG_PARSE_MSWORD','/home/mutualiseweb/catdoc');
give you when you use
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';
Also, does catdoc have 755 permissions?
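The errors in this thread come down to three possible problems with the configured path: it points at a directory (exit 126, "is a directory"), it does not exist (exit 127, "No such file or directory"), or the file lacks execute permission. A minimal pre-flight check in PHP could test the path with `is_dir()`, `file_exists()` and `is_executable()` before an indexing run. The helper `check_parser_path` is hypothetical and not part of PhpDig; only the constant name `PHPDIG_PARSE_MSWORD` comes from the thread.

```php
<?php
// Hypothetical pre-flight check, not part of PhpDig: tests the configured
// parser path for the three failures discussed in this thread.
function check_parser_path(string $path): string
{
    clearstatcache(); // avoid stale results from PHP's stat cache
    if (is_dir($path)) {
        // The shell reported "is a directory" with exit status 126.
        return "$path is a directory; point at the catdoc binary itself";
    }
    if (!file_exists($path)) {
        // The shell reported "No such file or directory" with exit status 127.
        return "$path does not exist; check where catdoc was built (e.g. its src/ directory)";
    }
    if (!is_executable($path)) {
        return "$path is not executable; try chmod 755";
    }
    return 'ok';
}

// Path tried earlier in the thread; replace with your own catdoc location.
define('PHPDIG_PARSE_MSWORD', '/home/mutualiseweb/catdoc-0.93.3/catdoc');
echo check_parser_path(PHPDIG_PARSE_MSWORD), "\n";
```

Running a check like this once from the command line separates a wrong path from a permissions problem without having to re-index.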
lolodev
07-10-2004, 02:02 PM
OK!!
It was all my fault:
my catdoc is under /home/mutualiseweb/catdoc-0.93.3/src/. MY GOD
A little question about .pdf files: is it necessary to install Ghostscript??
:))) sorry
lolodev
07-10-2004, 02:03 PM
THANKS A LOT
Charter
07-10-2004, 02:11 PM
LOL, paths and permissions. ;)
For PDFs perhaps try getting pdftotext already compiled. Directions are in this (http://www.phpdig.net/showthread.php?postid=4582#post4582) post.