PhpDig.net


xperienss 12-08-2004 03:12 AM

catdoc problem with WinXP
 
Hi all

I am using PhpDig 1.8.4 on WinXP (Windows NT SERVER 5.1 build 2600) with EasyPHP 1.7 (PHP version 4.3.3).

I am trying to index .doc files (to start with) with the spider, but so far with no luck.

When I run catdoc from the command line, I get this:
---
catdoc ./test.doc
Banane
Fruit
Abricot
---
Those are the words in my .doc file, so I guess catdoc.exe is working.

But when I try to index the file using PhpDig, here is what I get:
---
SITE : http://server/
Excluded paths:
- @NONE@
1:http://server/moteur/catdoc/test.doc
(time: 00:00:07)
No links in the temporary table
Links found: 1
http://server/moteur/catdoc/test.doc
Optimizing tables...
Indexing complete!
---
It looks like it is not indexing that file.

Here is my config file:

PHP Code:

define('LIMIT_DAYS',0);                 //default days before reindex a page

//---------EXTERNAL TOOLS SETUP
// if set to true is_executable used - set to '0' if is_executable is undefined
define('USE_IS_EXECUTABLE_COMMAND','0'); //use is_executable for external binaries

// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
//define('PHPDIG_PARSE_MSWORD','D:\\serveur\\www\\moteur\\catdoc\\catdoc.exe');
//define('PHPDIG_PARSE_MSWORD','D:\serveur\www\moteur\catdoc\catdoc.exe');
//define('PHPDIG_PARSE_MSWORD','D:\\\\serveur\\\\www\\\\moteur\\\\catdoc\\\\catdoc.exe');
define('PHPDIG_PARSE_MSWORD','D:/serveur/www/moteur/catdoc/catdoc.exe');
define('PHPDIG_OPTION_MSWORD','');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','D:\\serveur\\www\\moteur\\catdoc\\pdftotext.exe');
define('PHPDIG_OPTION_PDF','');

define('PHPDIG_INDEX_MSEXCEL',true);
define('PHPDIG_PARSE_MSEXCEL','D:\\serveur\\www\\moteur\\catdoc\\xls2csv.exe');
define('PHPDIG_OPTION_MSEXCEL','');

define('PHPDIG_INDEX_MSPOWERPOINT',false);
define('PHPDIG_PARSE_MSPOWERPOINT','/usr/local/bin/ppt2text');
define('PHPDIG_OPTION_MSPOWERPOINT',''); 
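The commented-out lines above try several spellings of the same Windows path. In a PHP single-quoted string only `\\` and `\'` are escape sequences, so the `\\` spelling and the single-backslash spelling produce the identical string, while the `\\\\` spelling ends up with doubled backslashes. A small sketch to check this (the variable names are illustrative only):

```php
<?php
// The spellings tried in the config above, plus the forward-slash one.
$escaped = 'D:\\serveur\\www\\moteur\\catdoc\\catdoc.exe';           // each \\ becomes one backslash
$literal = 'D:\serveur\www\moteur\catdoc\catdoc.exe';                // \s, \w, \m, \c are kept as-is
$doubled = 'D:\\\\serveur\\\\www\\\\moteur\\\\catdoc\\\\catdoc.exe'; // each \\\\ becomes two backslashes
$forward = 'D:/serveur/www/moteur/catdoc/catdoc.exe';

var_dump($escaped === $literal);                         // bool(true)
var_dump($escaped === $doubled);                         // bool(false)
var_dump(str_replace('/', '\\', $forward) === $escaped); // bool(true)
echo $doubled, PHP_EOL; // D:\\serveur\\www\\moteur\\catdoc\\catdoc.exe
```

So the `\\` and single-backslash spellings are interchangeable; only the `\\\\` one actually produces a different path.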

---
PHP info:
Safe_mode OFF
allow_url_fopen ON
---

robot_functions.php:
PHP Code:

case 'MSWORD':
$usetool = true;
//$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2;
// 2>&1 redirects stderr into stdout so the tool's error messages are captured too
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';
break;
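The `2>&1` on the active line sends the tool's error messages (stderr) into the same stream as its normal output, so they are captured instead of vanishing. Below is a sketch of a variant that also quotes the binary and file paths with `escapeshellarg()` and runs the command with `exec()`, keeping the exit status. The helper names are hypothetical, not PhpDig functions; the code is written for a current PHP version (PHP 4.3 has no scalar type declarations or short array syntax), and the quoting behaviour under cmd.exe should be checked on the actual Windows server.

```php
<?php
/**
 * Hypothetical helpers, not PhpDig's code. build_tool_command() assembles the
 * same "binary options file 2>&1" string as the MSWORD case above, but quotes
 * the binary and the file with escapeshellarg() so spaces cannot split them.
 */
function build_tool_command(string $binary, string $options, string $file): string
{
    $parts = [escapeshellarg($binary)];
    if (trim($options) !== '') {
        $parts[] = $options; // options are trusted config text, passed unquoted
    }
    $parts[] = escapeshellarg($file);
    // 2>&1 merges stderr into stdout, so the tool's error messages are captured too.
    return implode(' ', $parts) . ' 2>&1';
}

/** Runs the command and returns its exit status plus everything it printed. */
function run_tool(string $command): array
{
    $lines = [];
    $status = 0;
    exec($command, $lines, $status); // one array element per output line
    return ['status' => $status, 'output' => implode("\n", $lines)];
}
```

With PhpDig's own constants, `echo build_tool_command(PHPDIG_PARSE_MSWORD, PHPDIG_OPTION_MSWORD, $tempfile2);` would print the exact command line, which can then be pasted into an MS-DOS window to compare with what the spider does.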

Is there anything else I can try to make it work?
Thanks for your help...

Charter 12-08-2004 10:58 AM

Post the info that gets printed from this thread.

xperienss 12-08-2004 11:17 AM

Thanks, Charter, for replying.

I have already tried all the code changes, and here is what I get now when trying to index a PDF file:

---
SITE : http://10.1.0.181/
Excluded paths:
- @NONE@


Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: d:\serveur\www\moteur\xpdf\pdftotext.exe
Does parse pdf exist: 1
---

And it stops there; nothing happens after that line.

But when I try it from the command line, it's OK: I get the .txt file correctly.
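The debug lines above only show that the configured path exists. A minimal sketch of that kind of pre-flight check, honouring the USE_IS_EXECUTABLE_COMMAND switch from the config, is below. `check_tool()` is a hypothetical name, not a PhpDig function, and passing these checks still does not prove that the tool runs correctly when it is started by the web server rather than from a command prompt.

```php
<?php
/**
 * Hypothetical pre-flight check for an external tool, not PhpDig code.
 * Mirrors the debug lines above: does the binary exist, is it a regular file,
 * and (only when the is_executable switch is on) does PHP consider it executable?
 * Returns a list of problems; an empty list means the checks passed.
 */
function check_tool(string $path, bool $useIsExecutable): array
{
    $problems = [];
    if (!file_exists($path)) {
        $problems[] = "not found: $path";
    } elseif (!is_file($path)) {
        $problems[] = "not a regular file: $path";
    } elseif ($useIsExecutable && !is_executable($path)) {
        $problems[] = "not executable: $path";
    }
    return $problems;
}
```

For example, `check_tool('d:\\serveur\\www\\moteur\\xpdf\\pdftotext.exe', true)` should return an empty array on the machine from this thread.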

Charter 12-08-2004 11:27 AM

Maybe one of the following links might help?

http://www.phpdig.net/forum/showthread.php?t=1407
http://www.phpdig.net/forum/showthread.php?t=534

xperienss 12-08-2004 10:15 PM

Hi again,
I've been to http://www.phpdig.net/forum/showthread.php?t=1407
and made the same change,
and still no luck.

@Charter:
Is there any way for me to contact mleray via the forum? She has exactly the same config as mine (EasyPHP 1.7, WinXP), and it looks like she found the solution.
I could reply to her post, but the last time she came around was in October 2004 (two months ago).

Charter 12-09-2004 02:35 AM

When you use the following, what does it print out?
PHP Code:

$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';

Oh, and I've bumped* this thread so you should now be able to respond.


* Just a general comment, not directed to anyone in particular: This bump is the exception, not the rule, so don't expect me to bump old threads even if asked. Thanks.

xperienss 12-10-2004 02:58 AM

Here is where I stand for now:

PDF files are being indexed, but Word and XLS files still are not.


For those waiting for an answer:
My config is WinXP SP2, EasyPHP 1.7 (PHP 4.3.3).
EasyPHP is installed in 'd:\serveur'.
PhpDig is installed in 'd:\serveur\www\moteur'.

My PhpDig config file:
PHP Code:

//---------EXTERNAL TOOLS SETUP
// if set to true is_executable used - set to '0' if is_executable is undefined
define('USE_IS_EXECUTABLE_COMMAND','1'); //use is_executable for external binaries

// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
define('PHPDIG_PARSE_MSWORD','d:\\serveur\\www\\moteur\\catdoc\\catdoc.exe');
define('PHPDIG_OPTION_MSWORD','');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','d:\\serveur\\www\\moteur\\xpdf\\pdftotext.exe');
define('PHPDIG_OPTION_PDF','');

define('PHPDIG_INDEX_MSEXCEL',true);
define('PHPDIG_PARSE_MSEXCEL','d:\\serveur\\www\\moteur\\catdoc\\xls2csv.exe');
define('PHPDIG_OPTION_MSEXCEL','');

//---------EXTERNAL TOOLS EXTENSIONS
// if external binary is not STDOUT or different extension is needed
// for example, use '.txt' if external binary writes to filename.txt
define('PHPDIG_MSWORD_EXTENSION','.txt');
define('PHPDIG_PDF_EXTENSION','.txt');
define('PHPDIG_MSEXCEL_EXTENSION','.txt');
define('PHPDIG_MSPOWERPOINT_EXTENSION',''); 
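According to the comment above, the extension setting distinguishes tools that print their text to STDOUT (leave it empty) from tools that write their own output file (for example '.txt'). A hypothetical sketch of that decision rule follows; the function name and the output-file naming (input path with the extension appended) are assumptions for illustration, not taken from PhpDig's source.

```php
<?php
/**
 * Hypothetical illustration of the comment above, not PhpDig's code.
 * Assumption: with a non-empty extension, the tool's output file is the
 * input path with that extension appended (e.g. "/tmp/abc" -> "/tmp/abc.txt").
 */
function extracted_text(string $inputFile, string $extension, string $stdout): ?string
{
    if ($extension === '') {
        // The tool printed its text to STDOUT (e.g. catdoc, xls2csv): use what was captured.
        return $stdout;
    }
    $outFile = $inputFile . $extension;
    if (!is_file($outFile)) {
        // A '.txt' setting for a STDOUT-only tool ends up here: there is nothing to read.
        return null;
    }
    $text = file_get_contents($outFile);
    return $text === false ? null : $text;
}
```

Under this reading, setting '.txt' for catdoc would make the indexer look for a file that catdoc never writes.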

For PDF reading:
I am using Xpdf's pdftotext, available here: ftp://ftp.foolabs.com/pub/xpdf/ -- (http://www.foolabs.com/xpdf/download.html)
Get 'xpdf-3.00-win32.zip' (1.08 MB).
Warning: it cannot index password-protected PDF files!
Also: shut down ALL firewalls on your machine before indexing.

As soon as I have the answer for .doc and .xls files, I'll post it.

Hope this helps.

Xperienss

Charter 12-10-2004 12:29 PM

Change:
PHP Code:

define('PHPDIG_MSWORD_EXTENSION','.txt');
define('PHPDIG_MSEXCEL_EXTENSION','.txt'); 

To:
PHP Code:

// two single quotes, no space in between
define('PHPDIG_MSWORD_EXTENSION','');
define('PHPDIG_MSEXCEL_EXTENSION',''); 


xperienss 12-11-2004 04:35 AM

I already tried that, with no change:

(checked)431:http://xxx/budgetTresorerie.pdf
(time: 00:58:01)
(not checked)432:http://xxx/budgetTreso.doc
(time: 00:58:07)
It is still not indexing .doc and .xls files.

But I won't give up, and I'll find the solution sooner or later ;)

xperienss 12-12-2004 01:08 AM

OK, I see what's wrong now.
When I try to index a .doc file, catdoc.exe seems to read the file but doesn't create the output file or store it in the right directory.

It's the same when I run catdoc in an MS-DOS window:
catdoc reads the text from the .doc file but doesn't write out any file.
I can see the text in my MS-DOS window, but no file is created.

Does anyone have any idea which command we need to use?
catdoc manual:
http://www.45.free.net/~vitus/ice/ca...atdoc.man.html

I tried:
-------
catdoc -s 8859-1 -f ascii ../../test/test.doc
Test

Fichier

Word
--------
It reads the text from the .doc file but doesn't create any file.
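The console output above is consistent with catdoc writing the extracted text to standard output and never creating a file itself, which is why no file appears. If a file is wanted anyway, the output has to be redirected, just as `catdoc test.doc > test.txt` would do at the MS-DOS prompt. A minimal PHP sketch (the helper name is hypothetical):

```php
<?php
/**
 * Hypothetical helper: run a command and send its STDOUT into $outFile.
 * The '>' redirection works in both cmd.exe and POSIX shells and creates
 * (or overwrites) the target file. Returns the command's exit status.
 */
function redirect_to_file(string $command, string $outFile): int
{
    $full = $command . ' > ' . escapeshellarg($outFile);
    $lines = [];   // stays empty: STDOUT now goes to the file instead
    $status = 0;
    exec($full, $lines, $status);
    return $status;
}
```

On the machine from this thread, `redirect_to_file('d:\\serveur\\www\\moteur\\catdoc\\catdoc.exe ' . escapeshellarg($doc), $doc . '.txt')`, with `$doc` being the path of the .doc file, would be the PHP equivalent of that MS-DOS redirection. For PhpDig itself this is presumably unnecessary when the extension setting is left empty, since the text is then taken from STDOUT.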


All times are GMT -8.

Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.