catdoc problem with WinXP


xperienss
12-08-2004, 03:12 AM
Hi all

I am using phpdig 1.8.4 on WinXP (Windows NT SERVER 5.1 build 2600) with EasyPHP 1.7 (PHP Version 4.3.3).

I am trying to index .doc files (to start with) with the spider, but so far no luck...

When I run catdoc from the command line, I get this:
---
catdoc ./test.doc
Banane
Fruit
Abricot
---
Those are the words in my .doc file, so I guess catdoc.exe is working.

But when I try to index the file using phpdig, here is what I get:
---
SITE : http://server/
Chemins exclus :
- @NONE@
1:http://server/moteur/catdoc/test.doc
(temps : 00:00:07)
Pas de liens dans la table temporaire
liens trouvés : 1
http://server/moteur/catdoc/test.doc
Optimizing tables...
Indexation terminée !
---
It looks like it's not indexing that file.

Here is my config file:


define('LIMIT_DAYS',0); //default days before reindex a page

//---------EXTERNAL TOOLS SETUP
// if set to true is_executable used - set to '0' if is_executable is undefined
define('USE_IS_EXECUTABLE_COMMAND','0'); //use is_executable for external binaries

// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
//define('PHPDIG_PARSE_MSWORD','D:\\serveur\\www\\moteur\\catdoc\\catdoc.exe');
//define('PHPDIG_PARSE_MSWORD','D:\serveur\www\moteur\catdoc\catdoc.exe');
//define('PHPDIG_PARSE_MSWORD','D:\\\\serveur\\\\www\\\\moteur\\\\catdoc\\\\catdoc.exe');
define('PHPDIG_PARSE_MSWORD','D:/serveur/www/moteur/catdoc/catdoc.exe');
define('PHPDIG_OPTION_MSWORD','');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','D:\\serveur\\www\\moteur\\catdoc\\pdftotext.exe');
define('PHPDIG_OPTION_PDF','');

define('PHPDIG_INDEX_MSEXCEL',true);
define('PHPDIG_PARSE_MSEXCEL','D:\\serveur\\www\\moteur\\catdoc\\xls2csv.exe');
define('PHPDIG_OPTION_MSEXCEL','');

define('PHPDIG_INDEX_MSPOWERPOINT',false);
define('PHPDIG_PARSE_MSPOWERPOINT','/usr/local/bin/ppt2text');
define('PHPDIG_OPTION_MSPOWERPOINT','');
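
For reference, a small sketch of how PHP itself reads the path variants tried above. In a single-quoted string only `\\` and `\'` are escape sequences, so the doubled-backslash and single-backslash forms give the same string, while the quadruple-backslash form keeps doubled backslashes. The variable names are made up for illustration; this only shows string handling, not what phpdig does with the value.

```php
<?php
// Path variants from the config above, as PHP single-quoted strings.
$doubled   = 'D:\\serveur\\www\\moteur\\catdoc\\catdoc.exe';
$single    = 'D:\serveur\www\moteur\catdoc\catdoc.exe';
$quadruple = 'D:\\\\serveur\\\\www\\\\moteur\\\\catdoc\\\\catdoc.exe';
$forward   = 'D:/serveur/www/moteur/catdoc/catdoc.exe';

// Doubled and single backslashes produce the same string.
var_dump($doubled === $single);                            // bool(true)
// Four backslashes leave two in the resulting string, so this one differs.
var_dump($quadruple === $doubled);                         // bool(false)
// The forward-slash form is the same path with the separators swapped.
var_dump(str_replace('\\', '/', $doubled) === $forward);   // bool(true)
```

So the first two commented-out variants are equivalent as far as PHP is concerned; switching between them should not change the result.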


---
PHP INFO :
Safe_mode OFF
allow_url_fopen ON
---

robot_functions.php :

case 'MSWORD':
$usetool = true;
//$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2;
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';
break;

Is there anything else I can try to make it work? :squint:
Thanks for your help... :bang:

Charter
12-08-2004, 10:58 AM
Post the info that gets printed from this (http://www.phpdig.net/forum/showthread.php?t=799) thread.

xperienss
12-08-2004, 11:17 AM
Thanks for replying, Charter...

I have already tried all the code changes, and here is what I get now when trying to index a PDF file:

---
SITE : http://10.1.0.181/
Chemins exclus :
- @NONE@


Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: d:\serveur\www\moteur\xpdf\pdftotext.exe
Does parse pdf exist: 1
---

And it stops there... nothing happens after that line... :confused:

But when I try from the command line, it's OK: I get the .txt file all right.
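
The "Does parse pdf exist" style of check in the debug output above could be reproduced on its own along these lines. This is only a sketch: `tool_status()` is a made-up helper, not phpdig's code, and the `function_exists()` guard is there because the config comment says is_executable may be undefined on some setups.

```php
<?php
// Hypothetical helper, not phpdig's own code: reports whether an external
// tool path looks usable before the spider tries to run it.
function tool_status($path, $useIsExecutable)
{
    $status = array('exists' => file_exists($path), 'executable' => null);
    // Only call is_executable() when asked to and when PHP actually defines it.
    if ($useIsExecutable && function_exists('is_executable')) {
        $status['executable'] = is_executable($path);
    }
    return $status;
}

// Path taken from the debug output above; adjust to your own install.
print_r(tool_status('d:\\serveur\\www\\moteur\\xpdf\\pdftotext.exe', false));
```

If 'exists' comes back true here but the spider still stops, the path is probably fine and the problem is more likely in running the tool or reading its output.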

Charter
12-08-2004, 11:27 AM
Maybe one of the following links might help?

http://www.phpdig.net/forum/showthread.php?t=1407
http://www.phpdig.net/forum/showthread.php?t=534

xperienss
12-08-2004, 10:15 PM
Hi again,
I've been to http://www.phpdig.net/forum/showthread.php?t=1407
and I've made the same change,
and still no luck...

@ Charter
Is there any way for me to contact mleray via the forum? She has exactly the same config as mine (EasyPHP 1.7, WinXP) and it looks like she found the solution.
I can try to write a reply to her post, but the last time she came around was in October 2004 (2 months ago)...

Charter
12-09-2004, 02:35 AM
When you use the following, what does it print out?

$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';

Oh, and I've bumped* this (http://www.phpdig.net/forum/showthread.php?t=1407) thread so you should now be able to respond.


* Just a general comment, not directed to anyone in particular: This bump is the exception, not the rule, so don't expect me to bump old threads even if asked. Thanks.

xperienss
12-10-2004, 02:58 AM
Here is where I stand for now:

PDF files are indexing, but still no luck with Word or Excel files.


For those waiting for an answer:
My config is WinXP SP2, EasyPHP 1.7 (PHP 4.3.3)
EasyPHP is installed in 'd:\serveur'
Phpdig is installed in 'd:\serveur\www\moteur'

My config file for phpdig

//---------EXTERNAL TOOLS SETUP
// if set to true is_executable used - set to '0' if is_executable is undefined
define('USE_IS_EXECUTABLE_COMMAND','1'); //use is_executable for external binaries

// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
define('PHPDIG_PARSE_MSWORD','d:\\serveur\\www\\moteur\\catdoc\\catdoc.exe');
define('PHPDIG_OPTION_MSWORD','');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','d:\\serveur\\www\\moteur\\xpdf\\pdftotext.exe');
define('PHPDIG_OPTION_PDF','');

define('PHPDIG_INDEX_MSEXCEL',true);
define('PHPDIG_PARSE_MSEXCEL','d:\\serveur\\www\\moteur\\catdoc\\xls2csv.exe');
define('PHPDIG_OPTION_MSEXCEL','');

//---------EXTERNAL TOOLS EXTENSIONS
// if external binary is not STDOUT or different extension is needed
// for example, use '.txt' if external binary writes to filename.txt
define('PHPDIG_MSWORD_EXTENSION','.txt');
define('PHPDIG_PDF_EXTENSION','.txt');
define('PHPDIG_MSEXCEL_EXTENSION','.txt');
define('PHPDIG_MSPOWERPOINT_EXTENSION','');


For PDF reading:
I am using Xpdf/pdftotext, available here: ftp://ftp.foolabs.com/pub/xpdf/ (http://www.foolabs.com/xpdf/download.html)
Get 'xpdf-3.00-win32.zip' (1.08 MB).
Warning: it cannot index PDF files which are password protected!
Also: shut down ALL firewalls on your machine before indexing.

As soon as I've got the answer for .doc and .xls files, I'll post it.

Hope this helps.

Xperienss

Charter
12-10-2004, 12:29 PM
Change:

define('PHPDIG_MSWORD_EXTENSION','.txt');
define('PHPDIG_MSEXCEL_EXTENSION','.txt');

To:

// two single quotes, no space in between
define('PHPDIG_MSWORD_EXTENSION','');
define('PHPDIG_MSEXCEL_EXTENSION','');
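
The reasoning, going by the config comments: an empty extension means the tool's text is taken from standard output, while '.txt' means a file written by the tool is expected instead. A rough sketch of that decision follows. The function name, its arguments, and the "temp file name plus extension" naming are assumptions made for illustration, not phpdig's actual code.

```php
<?php
// Illustrative only: picks the source of extracted text based on the
// *_EXTENSION setting. $stdoutLines is what exec() captured from the tool.
function extracted_text($stdoutLines, $tempfile, $extension)
{
    if ($extension === '') {
        // Tool prints to standard output (the catdoc behaviour seen in this thread).
        return implode("\n", $stdoutLines);
    }
    // Tool wrote a separate file; assumed here to be named tempfile + extension.
    $written = $tempfile . $extension;
    return is_file($written) ? file_get_contents($written) : '';
}

// With '.txt' set but no such file on disk, an empty string comes back:
var_dump(extracted_text(array('Banane', 'Fruit'), 'no_such_temp', '.txt'));
// With '' the captured output is used:
var_dump(extracted_text(array('Banane', 'Fruit'), 'no_such_temp', ''));
```

Under that assumption, a tool that only prints to the screen would give nothing to index while the extension is set to '.txt'.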

xperienss
12-11-2004, 04:35 AM
I tried that already, and no change:

(checked)431:http://xxx/budgetTresorerie.pdf
(temps : 00:58:01)
(not checked)432:http://xxx/budgetTreso.doc
(temps : 00:58:07)
Still not indexing .doc and .xls files.

But I'm not giving up, and I'll find the solution sooner or later ;)

xperienss
12-12-2004, 01:08 AM
Okay, I see what's wrong now.
When I try to index a .doc file, catdoc.exe seems to see the file but doesn't create the output file and store it in the right directory.

It's the same when I run catdoc in MS-DOS:
catdoc reads the info from the .doc file but doesn't write out any file.
I can see the text inside my MS-DOS window, but no file is created.

Does anyone have any idea what command we need to use?
catdoc manual:
http://www.45.free.net/~vitus/ice/catdoc/catdoc.man.html

I tried:
-------
catdoc -s 8859-1 -f ascii ../../test/test.doc
Test

Fichier

Word
--------
It reads the text from the .doc file but doesn't create any file.
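
Note: the output above shows catdoc printing the text to the console (standard output) rather than writing a file. On the command line that output can usually be sent to a file with shell redirection, e.g. `catdoc test.doc > test.txt`. From PHP, one way to get the same effect is to capture what the tool prints and save it yourself. A minimal sketch, assuming exec() is allowed on the server; `save_tool_output()` is a made-up name, not part of phpdig or catdoc:

```php
<?php
// Sketch: run a command-line tool, collect what it prints, save it to a file.
// save_tool_output() is a hypothetical helper for illustration.
function save_tool_output($command, $outFile)
{
    $lines = array();
    $exitCode = 0;
    // 2>&1 also captures error messages, which helps when the tool fails.
    exec($command . ' 2>&1', $lines, $exitCode);
    file_put_contents($outFile, implode("\n", $lines));
    return $exitCode; // 0 normally means the tool ran without error
}

// Example call with the command tried above (adjust paths to your setup):
// save_tool_output('catdoc -s 8859-1 -f ascii ../../test/test.doc', 'test.txt');
```

A non-zero return value, or an error message in the saved file, would point at the tool itself rather than at the indexing step.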