|
02-12-2004, 11:23 AM | #1 |
Green Mole
Join Date: Feb 2004
Posts: 13
|
Not indexing pdf files
I am using pdftotext to index my PDF files. It converts each PDF to a txt file, and I can do this successfully from the command prompt. However, when I index my site with phpdig, it does not index the txt file. I have the following set in my config.php file:
define('PHPDIG_PDF_EXTENSION','.txt');
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','F:/internet/search/pdftotext/pdftotext');
define('PHPDIG_OPTION_PDF','');
Any suggestions? |
02-13-2004, 08:13 AM | #2 |
Green Mole
Join Date: Feb 2004
Posts: 13
|
Any ideas? Anyone?
|
02-13-2004, 10:57 AM | #3 |
Orange Mole
Join Date: Feb 2004
Posts: 47
|
Hi jayhhawk,
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/path/to/your/pdftotext');
define('PHPDIG_OPTION_PDF','');
//---------EXTERNAL TOOLS EXTENSIONS
define('PHPDIG_PDF_EXTENSION','.txt');
These settings should work. Please make sure that in define('PHPDIG_OPTION_PDF',''); there are two single quotes after the comma!
Hope this helps :-)
tomas |
02-13-2004, 11:50 AM | #4 |
Green Mole
Join Date: Feb 2004
Posts: 13
|
They are two single quotes. I have been trying to track down the problem. One thing to note is that when I index the site it lists the url for the pdf, but it does not have a green checkmark next to it. Does that provide any clues to the problem I am having?
|
02-14-2004, 12:43 PM | #5 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Perhaps check that the permissions are 755 for the directories to pdftotext and also for the pdftotext file.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension. |
02-16-2004, 01:58 PM | #6 |
Green Mole
Join Date: Feb 2004
Posts: 13
|
Permissions are full control (just to see if I can get it to work). Still no luck.
|
02-16-2004, 02:48 PM | #7 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. What version of PHP are you running? Perhaps you are experiencing the same problem as in this thread.
|
02-16-2004, 03:09 PM | #8 |
Green Mole
Join Date: Feb 2004
Posts: 13
|
I'm running PHP version 4.3.4.
|
02-16-2004, 03:18 PM | #9 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Try echoing out the statements, as was done in this thread. What do you get?
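For anyone following along, a minimal sketch of this kind of echo check might look like the following. It assumes only the PHPDIG_PARSE_PDF constant from config.php; the describe_converter() helper and the placeholder path are hypothetical and not part of PhpDig. Note that file_exists() needs the full file name, so on Windows a path without the .exe extension will not match a file named pdftotext.exe.

```php
<?php
// Hypothetical helper (not part of PhpDig): reports whether the path
// configured for an external converter points at a real file.
function describe_converter($path) {
    $exists = file_exists($path);
    return array(
        'path'    => $path,
        'exists'  => $exists,
        'is_file' => $exists && is_file($path),
    );
}

// Placeholder value for illustration; config.php normally defines this.
if (!defined('PHPDIG_PARSE_PDF')) {
    define('PHPDIG_PARSE_PDF', '/path/to/your/pdftotext');
}

$report = describe_converter(PHPDIG_PARSE_PDF);
echo 'Parse the pdf is set to: ' . $report['path'] . "\n";
echo 'Does parse pdf exist: ' . ($report['exists'] ? '1' : '0') . "\n";
echo 'Is it a regular file: ' . ($report['is_file'] ? '1' : '0') . "\n";
```

If the second line prints 0, the path in config.php is the first thing to check before looking at permissions or options.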
|
02-16-2004, 03:43 PM | #10 |
Green Mole
Join Date: Feb 2004
Posts: 13
|
Here is what I get:
Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
1:http://dhi-internet/ (time : 00:00:08)
+ + + + +
2: http://dhi-internet/ Was recently indexed (time : 00:00:14)
3: http://dhi-internet/ Was recently indexed (time : 00:00:19)
4: http://dhi-internet/ Was recently indexed (time : 00:00:24)
level 1...

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
5:http://dhi-internet/index.php?=PHPB8...9-4C7B08C10000 (time : 00:00:35)

Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
6:http://dhi-internet/test/acobook.pdf (time : 00:00:40)

Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
7:http://dhi-internet/docs/seanresume0204.pdf (time : 00:00:45)

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
8:http://dhi-internet/test/regs.html (time : 00:00:53)

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
9:http://dhi-internet/test/dhi.html (time : 00:01:00)
No link in temporary table |
02-16-2004, 04:16 PM | #11 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. It looks like the following is not returning a value:
PHP Code:
PHP Code:
|
02-17-2004, 07:23 AM | #12 |
Green Mole
Join Date: Feb 2004
Posts: 13
|
I fixed the path problem, but it is still not indexing the pdfs. Here is what it displays:
SITE : http://dhi-internet/
Exclude paths : - @NONE@

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
1:http://dhi-internet/ (time : 00:00:08)
+ + + + +
Error: Couldn't open file '.txt'
Error: Couldn't open file '.txt'
level 1...

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
2:http://dhi-internet/index.php?=PHPB8...9-4C7B08C10000 (time : 00:00:20)

Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Hello PDFRValue 1
3:http://dhi-internet/docs/seanresume0204.pdf (time : 00:00:25)

Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Hello PDFRValue 1
4:http://dhi-internet/test/acobook.pdf (time : 00:00:59)

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
5:http://dhi-internet/test/regs.html (time : 00:01:07)

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
6:http://dhi-internet/test/dhi.html (time : 00:01:14)
No link in temporary table |
02-17-2004, 12:10 PM | #13 |
Head Mole
Join Date: May 2003
Posts: 2,539
|
Hi. Do you have the following in the config file?
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','F:\\dhi-internet\\search\\pdftotext\\pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt');
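As a side note on the doubled backslashes, here is a small sketch of how PHP reads Windows paths in single-quoted strings. The variable names are only for illustration; the path is the one from this thread. In single quotes, \\ becomes one backslash, and a single backslash followed by an ordinary letter is kept as it is, so both spellings produce the same string.

```php
<?php
// Three ways to spell the same Windows path in single-quoted PHP strings.
$escaped = 'F:\\dhi-internet\\search\\pdftotext\\pdftotext'; // doubled backslashes
$plain   = 'F:\dhi-internet\search\pdftotext\pdftotext';     // single backslashes
$forward = 'F:/dhi-internet/search/pdftotext/pdftotext';     // forward slashes

// The doubled and single spellings are the same string.
var_dump($escaped === $plain); // bool(true)

// Converting backslashes to forward slashes gives the third spelling.
var_dump(str_replace('\\', '/', $escaped) === $forward); // bool(true)
```

Doubling is still the safer habit, because a backslash placed directly before the closing quote would escape the quote itself.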
|
02-18-2004, 06:13 AM | #14 |
Green Mole
Join Date: Feb 2004
Posts: 13
|
I got this to work finally. Thanks for all of your help!
|