PhpDig.net

12-07-2003, 01:35 AM   #1
phil_ballard
Green Mole
 
Join Date: Dec 2003
Posts: 9
Indexing MS Word docs under Windows

I've had no success trying to index MS Word (.DOC) documents under Windows. In the config file I have:
Code:
define('PHPDIG_INDEX_MSWORD',true);
define('PHPDIG_PARSE_MSWORD','C:\\Program Files\\EasyPHP1-7\\www\\k3\\catdoc');
define('PHPDIG_OPTION_MSWORD','-s 8859-1');
Can anyone tell me why PhpDig makes no attempt to index Word docs?

Any help appreciated.

Phil
12-07-2003, 10:06 AM   #2
Charter
Head Mole
 
 
Join Date: May 2003
Posts: 2,539
Hi. Is USE_IS_EXECUTABLE_COMMAND set to true (one) or false (zero) in the config file?
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
12-08-2003, 05:05 AM   #3
phil_ballard
Green Mole
 
Join Date: Dec 2003
Posts: 9

USE_IS_EXECUTABLE_COMMAND is set to the default value of 1. But things have got worse...
I decided to ditch 1.6.2 and try 1.6.5, so I removed all the code and the DB tables and reinstalled 1.6.5. The install seemed to go OK, but now I can't get past this point:

Code:
Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://[whatever]
Exclude paths :
- @NONE@

Fatal error: Call to undefined function: is_executable() in c:\program files\easyphp1-7\www\k3\phpdig\admin\robot_functions.php on line 633
Any offers/advice gratefully received.

Phil
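For anyone hitting the same fatal error: the PHP manual notes that is_executable() only became available on Windows in PHP 5.0.0, which is consistent with the "undefined function" message above. If you need a similar check in your own code, a minimal sketch is to test function_exists() first. binary_is_usable() is a hypothetical name, and this is not PhpDig's robot_functions.php code:

```php
<?php
// Hypothetical helper (not part of PhpDig): decide whether an external
// binary such as catdoc looks usable. is_executable() is missing from
// Windows builds of PHP before 5.0.0, so fall back to is_file() there.
function binary_is_usable($path)
{
    if (function_exists('is_executable')) {
        return is_executable($path);
    }
    // Weaker fallback: only confirms that a regular file exists at $path,
    // not that it can actually be run.
    return is_file($path);
}
```

The fallback is deliberately permissive: a file that exists but cannot run will still fail later, when the command is actually executed, rather than being rejected up front.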
12-08-2003, 06:59 AM   #4
Rolandks
Purple Mole
 
 
Join Date: Sep 2003
Location: Kassel, Germany
Posts: 119
Take a look here: http://www.phpdig.net/showthread.php?s=&threadid=272

-Roland-
12-08-2003, 07:58 AM   #5
phil_ballard
Green Mole
 
Join Date: Dec 2003
Posts: 9
That seemed to work - thanks!
12-08-2003, 09:14 AM   #6
phil_ballard
Green Mole
 
Join Date: Dec 2003
Posts: 9
We-ell, we're improving... Now it spiders without errors, but it still isn't indexing the contents of the .doc files. I tried pointing the spider directly at the URL of a .doc file I knew existed:

Code:
Spidering in progress...

--------------------------------------------------------------------------------
SITE : http://[mysite IP]/
Exclude paths :
- @NONE@
1:http://[mysite IP]/k3/CVs/4.doc
(time : 00:00:03)
No link in temporary table

--------------------------------------------------------------------------------

links found : 1
http://[mysite IP]/k3/CVs/4.doc
Optimizing tables...
Indexing complete !
--------------------------------------------------------------------------------
 [Back] to admin interface.
...but there are still no keywords indexed.

Any more help welcome.
12-08-2003, 01:07 PM   #7
Charter
Head Mole
 
 
Join Date: May 2003
Posts: 2,539
Hi. From the command line what does the following produce?


"C:\Program Files\EasyPHP1-7\www\k3\catdoc" -s 8859-1 change-me-4.doc
12-09-2003, 12:44 AM   #8
phil_ballard
Green Mole
 
Join Date: Dec 2003
Posts: 9
"Cannot load charset cp1251 - file not found"
12-09-2003, 01:50 AM   #9
phil_ballard
Green Mole
 
Join Date: Dec 2003
Posts: 9
OK, I've sorted out the charset paths. catdoc now seems to extract the text correctly from the command line, but still not via the web interface...
12-09-2003, 03:02 AM   #10
phil_ballard
Green Mole
 
Join Date: Dec 2003
Posts: 9
OK, it's all working. It seems that it didn't like the path name having a space in it, at C:\\Program Files\\.......
Once I moved catdoc (and its config subdirectories) to a path without a space (C:\\, for instance), all was well.
Many thanks for your help, guys. (Though I'm sure I'll be back with more dopey questions.)
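Phil's workaround was to move catdoc to a path without spaces. If you assemble such a command yourself, the usual PHP way to keep a path containing spaces together as one argument is to quote it with escapeshellarg(). A minimal sketch, assuming you build the command string in your own code; build_catdoc_command() is a hypothetical name, and this is not how PhpDig itself constructs its catdoc call:

```php
<?php
// Hypothetical helper (not PhpDig's code): assemble a catdoc command line.
// escapeshellarg() quotes each path for the current platform's shell, so
// "C:\Program Files\..." stays one argument instead of splitting at the space.
// $options is left unquoted on purpose because it may hold several flags.
function build_catdoc_command($binary, $options, $docFile)
{
    return escapeshellarg($binary) . ' ' . $options . ' ' . escapeshellarg($docFile);
}

// Illustrative values taken from this thread.
$cmd = build_catdoc_command('C:\\Program Files\\EasyPHP1-7\\www\\k3\\catdoc', '-s 8859-1', '4.doc');
// exec($cmd, $output) would then collect catdoc's text output, one line per array element.
```

On Windows, cmd.exe applies its own quote-handling rules, so check the resulting command there before relying on it; moving catdoc to a path without spaces, as above, avoids the issue altogether.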
BTW, my own requirement is for index searching on a single local directory full of MS Word files. To make this easy, I have a file index.php in that directory which gives the spider a link to every Word file in it:
Code:
<HTML>
<HEAD></HEAD>
<BODY>
<?php
// Return a file's extension in lower case ('' if it has none)
function gfext($filename)
{
    $pathinfo = pathinfo($filename);
    return isset($pathinfo['extension']) ? strtolower($pathinfo['extension']) : '';
}

// Read this directory and link every Word file in it
if ($handle = opendir('.')) {
    while (false !== ($file = readdir($handle))) {
        if (gfext($file) == "doc") {   // we only want the Word files
            // Encode the URL and escape the link text so file names with
            // spaces or special characters still produce valid links
            echo '<a href="' . rawurlencode($file) . '">' . htmlspecialchars($file) . "</a><br>\n";
        }
    }
    closedir($handle);
}
?>
</BODY>
</HTML>
On this page the spider encounters a list of href links, one to each Word document. Simple stuff, I know, but it may help someone.

All the best

Phil

Last edited by phil_ballard; 12-09-2003 at 03:05 AM.

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Indexing MS Word without binaries | phil_ballard | External Binaries | 2 | 05-08-2006 02:25 AM
Indexing "<word>-<word>"? | FaberFedor | How-to Forum | 23 | 02-28-2005 03:35 AM
Indexing word docs | javajaga | External Binaries | 1 | 03-30-2004 08:19 AM
Indexing word doc's OK search through files don't work | dapuse | External Binaries | 3 | 02-05-2004 07:38 AM
Can PhpDig index OpenOffice Docs? | veggie2u | How-to Forum | 1 | 12-08-2003 01:52 PM


All times are GMT -8.


Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.