PhpDig.net

12-08-2004, 04:12 AM   #1
xperienss
Green Mole
 
Join Date: Dec 2004
Location: Geneva Switzerland
Posts: 8
catdoc problem with WinXP

Hi all

I am using PhpDig 1.8.4 on WinXP (Windows NT SERVER 5.1 build 2600) with EasyPHP 1.7 (PHP Version 4.3.3).

I am trying to index .doc files (to start with) with the spider, but so far no luck...

When I run catdoc from the command line, I get this:
---
catdoc ./test.doc
Banane
Fruit
Abricot
---
Those are the words in my .doc file,
so I guess catdoc.exe is working.

But when I try to index the file using PhpDig, here is what I get:
---
SITE : http://server/
Excluded paths:
- @NONE@
1:http://server/moteur/catdoc/test.doc
(time: 00:00:07)
No links in the temporary table
links found: 1
http://server/moteur/catdoc/test.doc
Optimizing tables...
Indexing complete!
---
It looks like it's not indexing that file.

Here is my config file:

PHP Code:
define('LIMIT_DAYS',0);                 //default days before reindex a page

//---------EXTERNAL TOOLS SETUP
// if set to true is_executable used - set to '0' if is_executable is undefined
define('USE_IS_EXECUTABLE_COMMAND','0'); //use is_executable for external binaries

// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
//define('PHPDIG_PARSE_MSWORD','D:\\serveur\\www\\moteur\\catdoc\\catdoc.exe');
//define('PHPDIG_PARSE_MSWORD','D:\serveur\www\moteur\catdoc\catdoc.exe');
//define('PHPDIG_PARSE_MSWORD','D:\\\\serveur\\\\www\\\\moteur\\\\catdoc\\\\catdoc.exe');
define('PHPDIG_PARSE_MSWORD','D:/serveur/www/moteur/catdoc/catdoc.exe');
define('PHPDIG_OPTION_MSWORD','');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','D:\\serveur\\www\\moteur\\catdoc\\pdftotext.exe');
define('PHPDIG_OPTION_PDF','');

define('PHPDIG_INDEX_MSEXCEL',true);
define('PHPDIG_PARSE_MSEXCEL','D:\\serveur\\www\\moteur\\catdoc\\xls2csv.exe');
define('PHPDIG_OPTION_MSEXCEL','');

define('PHPDIG_INDEX_MSPOWERPOINT',false);
define('PHPDIG_PARSE_MSPOWERPOINT','/usr/local/bin/ppt2text');
define('PHPDIG_OPTION_MSPOWERPOINT',''); 
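Side note on the four `PHPDIG_PARSE_MSWORD` variants tried above: in a single-quoted PHP string only `\\` and `\'` are escape sequences, so the double-backslash and single-backslash versions produce exactly the same path, while the quadruple-backslash version leaves doubled backslashes in the string. A minimal sketch that just prints what PHP makes of each variant (the `$variants` array exists only for this illustration):

```php
<?php
// Print the literal string PHP builds from each path variant tried above.
// In single quotes, only \\ and \' are escapes; \s, \w, \m and \c stay as typed.
$variants = array(
    'double backslash'    => 'D:\\serveur\\www\\moteur\\catdoc\\catdoc.exe',
    'single backslash'    => 'D:\serveur\www\moteur\catdoc\catdoc.exe',
    'quadruple backslash' => 'D:\\\\serveur\\\\www\\\\moteur\\\\catdoc\\\\catdoc.exe',
    'forward slash'       => 'D:/serveur/www/moteur/catdoc/catdoc.exe',
);

foreach ($variants as $label => $path) {
    echo str_pad($label, 20), ' => ', $path, "\n";
}
```

So switching between the first two forms changes nothing; only the quadruple form actually produces a different string.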
---
PHP info:
Safe_mode OFF
allow_url_fopen ON
---

robot_functions.php:
PHP Code:
case 'MSWORD':
$usetool = true;
//$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2;
// 2>&1 sends catdoc's error messages into the captured output as well
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';
break; 
Is there anything else I can try to make it work?
Thanks for your help...
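For reference, a minimal sketch of what the `MSWORD` branch above boils down to: the binary path, the options and the temporary file are joined with spaces, and `2>&1` sends catdoc's error messages into the same captured output. The helpers `build_tool_command()` and `run_tool()` are hypothetical, not PhpDig functions, and the code sticks to PHP 4 compatible syntax.

```php
<?php
// Hypothetical helper: join binary, options and input file the same way as
// the robot_functions.php excerpt, redirecting stderr into stdout.
function build_tool_command($binary, $options, $file)
{
    return $binary . ' ' . $options . ' ' . $file . ' 2>&1';
}

// Hypothetical helper: run the command and return the exit code plus
// everything it printed (stdout and, thanks to 2>&1, stderr).
function run_tool($command)
{
    $lines = array();
    $exitCode = 0;
    exec($command, $lines, $exitCode);
    return array($exitCode, implode("\n", $lines));
}

// Print the exact command first, so a bad path or option is visible.
$command = build_tool_command('D:/serveur/www/moteur/catdoc/catdoc.exe', '', 'test.doc');
echo $command, "\n";
```

With `PHPDIG_OPTION_MSWORD` set to `''` the command contains two spaces in a row, which the shell ignores. Printing `$command` like this is essentially what Charter asks for later in the thread; if a path ever contains spaces, wrapping it in `escapeshellarg()` would be safer.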
12-08-2004, 11:58 AM   #2
Charter
Head Mole
 
 
Join Date: May 2003
Posts: 2,539
Post the info that gets printed from this thread.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
12-08-2004, 12:17 PM   #3
xperienss
Thanks for replying, Charter...

I already tried all the code changes, and here is what I get now when trying to index a PDF file:

---
SITE : http://10.1.0.181/
Excluded paths:
- @NONE@


Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: d:\serveur\www\moteur\xpdf\pdftotext.exe
Does parse pdf exist: 1
---

And it stops there... nothing happens after that line.

But when I try it from the command line, it's OK: I get the .txt file correctly.
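The debug output above stops right after the existence check. A minimal sketch for repeating those checks by hand, outside the spider, assuming you can run a small PHP script on the same server. `check_external_tool()` is a hypothetical helper that mirrors the `Does parse pdf exist` line and the `USE_IS_EXECUTABLE_COMMAND` switch. According to the PHP manual, `is_executable()` only became available on Windows in PHP 5.0.0, which is presumably why the config comment says to use `'0'` when it is undefined; the sketch therefore also checks `function_exists()`.

```php
<?php
// Hypothetical helper (not PhpDig code): repeat the checks from the debug
// output for one external binary and return them as an array.
function check_external_tool($binary, $useIsExecutable)
{
    $report = array();
    $report['exists'] = file_exists($binary);
    // Like USE_IS_EXECUTABLE_COMMAND: only call is_executable() when asked,
    // and only if this PHP build actually defines it.
    if ($useIsExecutable && function_exists('is_executable')) {
        $report['executable'] = is_executable($binary);
    } else {
        $report['executable'] = null;
    }
    return $report;
}

// Path taken from the debug output above.
var_dump(check_external_tool('d:\\serveur\\www\\moteur\\xpdf\\pdftotext.exe', false));
```

If both checks pass and the spider still stops, the problem is more likely in running the tool itself than in finding it.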

Last edited by xperienss; 12-08-2004 at 12:25 PM.
12-08-2004, 12:27 PM   #4
Charter
Maybe one of the following threads will help:

http://www.phpdig.net/forum/showthread.php?t=1407
http://www.phpdig.net/forum/showthread.php?t=534
12-08-2004, 11:15 PM   #5
xperienss
Hi again,
I've been to http://www.phpdig.net/forum/showthread.php?t=1407
and made the same change,
but still no luck...

@Charter
Is there any way for me to contact mleray via the forum? She has exactly the same config as mine (EasyPHP 1.7, WinXP), and it looks like she found the solution.
I could reply to her post, but the last time she came around was in October 2004 (two months ago)...
12-09-2004, 03:35 AM   #6
Charter
When you use the following, what does it print out?
PHP Code:
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';
Oh, and I've bumped* this thread so you should now be able to respond.


* Just a general comment, not directed to anyone in particular: This bump is the exception, not the rule, so don't expect me to bump old threads even if asked. Thanks.
12-10-2004, 03:58 AM   #7
xperienss
Here is where I stand for now:

PDF files are being indexed, but Word and XLS files still are not.


For those waiting for an answer:
My config is WinXP SP2, EasyPHP 1.7 (PHP 4.3.3)
EasyPHP is installed in 'd:\serveur'
Phpdig is installed in 'd:\serveur\www\moteur'

My config file for PhpDig:
PHP Code:
//---------EXTERNAL TOOLS SETUP
// if set to true is_executable used - set to '0' if is_executable is undefined
define('USE_IS_EXECUTABLE_COMMAND','1'); //use is_executable for external binaries

// if set to true, full path to external binary required
define('PHPDIG_INDEX_MSWORD',true);
define('PHPDIG_PARSE_MSWORD','d:\\serveur\\www\\moteur\\catdoc\\catdoc.exe');
define('PHPDIG_OPTION_MSWORD','');

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','d:\\serveur\\www\\moteur\\xpdf\\pdftotext.exe');
define('PHPDIG_OPTION_PDF','');

define('PHPDIG_INDEX_MSEXCEL',true);
define('PHPDIG_PARSE_MSEXCEL','d:\\serveur\\www\\moteur\\catdoc\\xls2csv.exe');
define('PHPDIG_OPTION_MSEXCEL','');

//---------EXTERNAL TOOLS EXTENSIONS
// if external binary is not STDOUT or different extension is needed
// for example, use '.txt' if external binary writes to filename.txt
define('PHPDIG_MSWORD_EXTENSION','.txt');
define('PHPDIG_PDF_EXTENSION','.txt');
define('PHPDIG_MSEXCEL_EXTENSION','.txt');
define('PHPDIG_MSPOWERPOINT_EXTENSION',''); 
For PDF reading:
I am using Xpdf/pdftotext, available here: ftp://ftp.foolabs.com/pub/xpdf/ (http://www.foolabs.com/xpdf/download.html)
Get 'xpdf-3.00-win32.zip' (1.08 MB).
Warning: it cannot index password-protected PDF files!
Also: shut down ALL firewalls on your machine before indexing.
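This is also why `PHPDIG_PDF_EXTENSION` can stay at `'.txt'`: per the pdftotext documentation, when no output file is given it converts `file.pdf` to `file.txt` instead of printing to STDOUT. A minimal sketch of that naming rule; the helper name is hypothetical, and the handling of a `.pdf`/`.PDF` suffix reflects my understanding of xpdf's behaviour.

```php
<?php
// Hypothetical helper: the text file pdftotext writes when it is called
// with only an input file (no text-file argument).
function pdftotext_default_output($pdfFile)
{
    $suffix = substr($pdfFile, -4);
    if ($suffix === '.pdf' || $suffix === '.PDF') {
        // "budget.pdf" becomes "budget" before ".txt" is added.
        $pdfFile = substr($pdfFile, 0, -4);
    }
    return $pdfFile . '.txt';
}

echo pdftotext_default_output('test.pdf'), "\n";
```

Either way the result ends in `.txt`, which is what the `'.txt'` extension setting expects. catdoc and xls2csv work differently, as the rest of the thread shows.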

As soon as I have the answer for .doc and .xls files, I'll post it.

Hope this helps.

Xperienss

Last edited by xperienss; 12-10-2004 at 04:32 AM.
12-10-2004, 01:29 PM   #8
Charter
Change:
PHP Code:
define('PHPDIG_MSWORD_EXTENSION','.txt');
define('PHPDIG_MSEXCEL_EXTENSION','.txt'); 
To:
PHP Code:
// two single quotes, no space in between
define('PHPDIG_MSWORD_EXTENSION','');
define('PHPDIG_MSEXCEL_EXTENSION',''); 
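Why the empty string matters, as a minimal sketch: going by the config comment ("if external binary is not STDOUT or different extension is needed"), an empty extension means the tool's printed output is the text, while `'.txt'` means the text is read from `<tempfile>.txt`. catdoc and xls2csv print to STDOUT (see the Banane / Fruit / Abricot test in the first post), so with `'.txt'` there is no file to read. `read_tool_text()` is a hypothetical helper, not PhpDig's actual code.

```php
<?php
// Hypothetical helper illustrating the *_EXTENSION settings:
//   ''     -> the tool printed the text to STDOUT; use the captured output
//   '.txt' -> the tool wrote <tempfile>.txt; read that file instead
function read_tool_text($capturedOutput, $tempFile, $extension)
{
    if ($extension === '') {
        return $capturedOutput;
    }
    $outFile = $tempFile . $extension;
    if (!is_file($outFile)) {
        // catdoc never creates this file, so nothing would be indexed.
        return '';
    }
    return file_get_contents($outFile);
}
```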
12-11-2004, 05:35 AM   #9
xperienss
I already tried that, and nothing changed:

(checked)431:http://xxx/budgetTresorerie.pdf
(time: 00:58:01)
(not checked)432:http://xxx/budgetTreso.doc
(time: 00:58:07)
It's still not indexing .doc and .xls files.

But I won't give up; I'll find the solution sooner or later.
12-12-2004, 02:08 AM   #10
xperienss
OK, I see what's wrong now.
When I try to index a .doc file, catdoc.exe seems to find the file, but it doesn't create the output file or store it in the right directory.

The same happens when I run catdoc in MS-DOS:
catdoc reads the text from the .doc file but doesn't write it to any file.
I can see the text in my MS-DOS window, but no file is created.

Does anyone have any idea what command we need to use?
catdoc manual:
http://www.45.free.net/~vitus/ice/ca...atdoc.man.html

I tried:
-------
catdoc -s 8859-1 -f ascii ../../test/test.doc
Test

Fichier

Word
--------
It reads the text from the .doc file but doesn't create any file.
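That is normal catdoc behaviour: it does not create a file, it prints the text to STDOUT (the console), just like in the test above. If you do want a file, you have to save that output yourself, which is what the shell redirection `catdoc file.doc > file.txt` does. A minimal sketch in PHP; `doc_to_text_file()` is a hypothetical helper, and the catdoc path and file names you pass in are assumptions.

```php
<?php
// Hypothetical helper: run catdoc (or any tool that prints to STDOUT) and
// save what it prints to $txtFile. Returns true on success.
function doc_to_text_file($catdoc, $options, $docFile, $txtFile)
{
    $command = $catdoc . ' ' . $options . ' ' . escapeshellarg($docFile);
    $text = shell_exec($command);
    if (!is_string($text)) {
        // shell_exec() gives null (or false) when no output could be read.
        return false;
    }
    $handle = fopen($txtFile, 'w');
    if ($handle === false) {
        return false;
    }
    fwrite($handle, $text);
    fclose($handle);
    return true;
}
```

For example, `doc_to_text_file('catdoc', '-s 8859-1 -f ascii', '../../test/test.doc', 'test.txt')` would save the same text that appeared in the console above. For indexing, though, going by the config comment, no file should be needed as long as `PHPDIG_MSWORD_EXTENSION` is `''`.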

Last edited by xperienss; 12-12-2004 at 02:37 AM.

All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.