PhpDig.net

Old 12-01-2004, 01:29 AM   #1
sofos
Green Mole
 
Join Date: Dec 2004
Posts: 5
Win 98 + EasyPHP & binary problem

Hello,
Has anyone managed to set up PDF indexing with Win 98 + EasyPHP (PHP version 4.3.3) + the latest PhpDig + the "pdftotext" binary?
It is difficult to work through the recommended "checklist", because the PHP function "is_executable" does not seem to work on PHP 4.3.3 (?)

I get this:


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: C:\Web\cgi-bin\pdftotext.exe
Does parse pdf exist: 1

Fatal error: Call to undefined function: is_executable() in c:\web\www\phpdig\admin\robot_functions.php on line 963


If I remove this line (to see what happens next), I get:


Is result test http an array: 1
What is result test http status: HTML

Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: C:\Web\cgi-bin\pdftotext.exe
Does parse pdf exist: 1
Duplicate of an existing document
1:http://localhost/Documentation/Revues/
(time: 00:00:06)
+
level 1...


Is result test http an array: 1
What is result test http status: PDF

Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: C:\Web\cgi-bin\pdftotext.exe
Does parse pdf exist: 1

Command is: C:\Web\cgi-bin\pdftotext.exe -cork ../admin/temp/62981782.tmp2>&1
Result contains: Array ( )
Return value is: 0

2:http://localhost/Documentation/Revues/Lisezmoi.pdf
(time: 00:00:18)

No links in the temporary table


Thanks for your help.
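
The fatal error above indicates that is_executable() is not defined in this PHP build. As far as I know, the PHP manual lists the function as available on Windows only from PHP 5.0.0, which would match a PHP 4.3.3 install on Win 98. Below is a minimal sketch of a guard, not PhpDig's own code: the helper name binary_looks_usable() is hypothetical, and falling back to is_file() is an assumption about what is acceptable when the real check is missing.

```php
<?php
// Hypothetical helper (not part of PhpDig): use is_executable() when the
// running PHP build defines it, otherwise fall back to a weaker check.
function binary_looks_usable($path)
{
    if (function_exists('is_executable')) {
        return is_executable($path);
    }
    // Assumption: without is_executable(), an existing regular file is
    // treated as usable. This cannot detect permission problems.
    return is_file($path);
}

// Path taken from the debug output in this post.
var_dump(binary_looks_usable('C:\\Web\\cgi-bin\\pdftotext.exe'));
```

With a guard like this, the check degrades to "does the file exist" instead of stopping the script with a fatal error.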
Old 12-02-2004, 11:12 PM   #2
sofos
Green Mole
 
Join Date: Dec 2004
Posts: 5
OK, after some more reading, I have solved the PDF problem (apparently I wasn't the only one, and it was just a problem with the option line passed to the binary).

I am now stuck on a similar problem with Word documents (I use Doc2txt for the text conversion). Whenever I try to index those documents, indexing does not happen, but two files are left behind:
a 98183891.tmp file in the admin/temp directory and a 98183892.txt file in the admin directory. Incidentally, the latter is the correct text conversion of the original Word document. The one in the temp directory contains only one line, something like:
C:Web\www\phpdig\admin\temp\98183891.tmp --> "" "" \admin\98183892.txt

Has anyone experienced this?

Thanks

Last edited by sofos; 12-02-2004 at 11:19 PM.
Old 12-03-2004, 03:43 AM   #3
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Try setting define('PHPDIG_MSWORD_EXTENSION',''); to define('PHPDIG_MSWORD_EXTENSION','.txt'); in the config file, making sure to have the period on the .txt extension.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 12-03-2004, 05:34 AM   #4
sofos
Green Mole
 
Join Date: Dec 2004
Posts: 5
Hi Charter, that was already set that way (in fact, I duplicated the 'pdf' settings, except for the name of the binary, of course).
Old 12-03-2004, 05:46 AM   #5
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
Try running Doc2Txt from shell and verify that filename.doc is output to filename.txt - maybe Doc2Txt outputs to filename.doc.txt or something else?
Old 12-03-2004, 06:10 AM   #6
sofos
Green Mole
 
Join Date: Dec 2004
Posts: 5
I'll try this.
Just to make sure I have been clear enough: the text conversion seems to work just fine, since the 98183892.txt file in admin (which, I guess, was previously in admin/temp) is the correct text extraction of the right Word doc.
Anyway, I'll try your advice and get back to you on Monday.
Have a good weekend.
Old 12-03-2004, 09:18 PM   #7
Charter
Head Mole
 
Join Date: May 2003
Posts: 2,539
OIC, so try changing:
PHP Code:
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2;
to the following:
PHP Code:
$command = PHPDIG_PARSE_MSWORD.' '.PHPDIG_OPTION_MSWORD.' '.$tempfile2.' 2>&1';
and see what it says when you try to index a *.doc file.
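
For anyone unsure what the added ' 2>&1' does: it redirects the command's error stream into its standard output, so any message the converter prints on failure shows up in the captured result instead of being lost. Here is a minimal sketch, assuming the command is run through PHP's exec(). The helper name run_and_capture() is hypothetical and is not PhpDig code, and 'echo hello' only stands in for the real converter command.

```php
<?php
// Hypothetical helper, not PhpDig code: runs a command with stderr merged
// into stdout so that error messages end up in the captured output.
function run_and_capture($command)
{
    $output = array();
    $status = null;
    exec($command . ' 2>&1', $output, $status);
    return array('output' => $output, 'status' => $status);
}

// 'echo hello' is a placeholder for the real converter command line.
$result = run_and_capture('echo hello');
print_r($result['output']);
echo 'Return value is: ', $result['status'], "\n";
```

An empty output array together with a return value of 0, as in the pdftotext debug output earlier in the thread, usually means the binary ran and printed nothing, so the next thing to check is where it wrote its output file.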
Old 12-06-2004, 12:33 AM   #8
sofos
Green Mole
 
Join Date: Dec 2004
Posts: 5
Hi, it works now. I'll explain it here in case someone else uses Doc2txt on a Windows + EasyPHP configuration:
When using Doc2txt, it is necessary to specify three flags in the options ("Doc2txt -q -o /admin/temp -E txt filename.doc"):
"-q" to make it quiet,
"-o /admin/temp" to force the generated text file into the right directory,
"-E txt" to force the right extension. This was the main problem: PhpDig expects "filename.tmp" (the local copy of the given "filename.doc") to be converted by Doc2txt into "filename.tmp.txt". But if the -E flag is omitted, Doc2txt generates "filename.txt" instead of "filename.tmp.txt".
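
The naming mismatch described above can be illustrated with a short sketch. This is not PhpDig's code: the variable names and the sample path are assumptions chosen to mirror the file names mentioned in this thread. It only shows the difference between appending the extension and swapping it.

```php
<?php
// Illustrative only: variable names and the sample path are assumptions.
$tempfile  = 'admin/temp/98183892.tmp'; // local copy of the .doc file
$extension = '.txt';                    // value of PHPDIG_MSWORD_EXTENSION

// Name the indexer is described as looking for: extension appended.
$expected = $tempfile . $extension;

// Name produced when the converter replaces ".tmp" instead of appending.
$swapped = dirname($tempfile) . '/' . basename($tempfile, '.tmp') . $extension;

echo $expected, "\n"; // admin/temp/98183892.tmp.txt
echo $swapped, "\n";  // admin/temp/98183892.txt
```

If the two names differ, the indexer looks for a file that was never written, which matches the symptom of a correct .txt file left behind and nothing indexed.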

So it's working, and I really enjoy using PhpDig!
Next I'll try to set up cron jobs to schedule the indexing process. I hope Windows will let me do that...

Thanks for your help,
All times are GMT -8.