Not indexing pdf files
I am using pdftotext to index my pdf files. It converts the pdf to a txt file. I can do this successfully from the command prompt. However, when I try to index my site with phpdig it does not index the txt file. I have the following set in my config.php file:
define('PHPDIG_PDF_EXTENSION','.txt');
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','F:/internet/search/pdftotext/pdftotext');
define('PHPDIG_OPTION_PDF','');

Any suggestions? |
Any ideas? Anyone?
:bang: |
hi jayhhawk,
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/path/to/your/pdftotext');
define('PHPDIG_OPTION_PDF','');
//---------EXTERNAL TOOLS EXTENSIONS
define('PHPDIG_PDF_EXTENSION','.txt');

These settings should work. Please make sure that in define('PHPDIG_OPTION_PDF',''); there are two single quotes after the comma!
Hope this helps :-)
tomas |
They are two single quotes. I have been trying to track down the problem. One thing to note is that when I index the site it lists the url for the pdf, but it does not have a green checkmark next to it. Does that provide any clues to the problem I am having?
|
Hi. Perhaps check that the permissions are 755 for the directories to pdftotext and also for the pdftotext file.
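A quick way to see what PHP itself thinks of the path is a small standalone check like the sketch below. It is not part of phpDig: `check_parser_path()` is a hypothetical helper, the path is the placeholder from the config example above, and it assumes PHP 7 or later. How `is_executable()` behaves depends on the platform, so treat the last line of output as a hint only.

```php
<?php
// Hypothetical helper (not a phpDig function): reports how PHP sees a parser path.
function check_parser_path(string $path): array
{
    return [
        'exists'     => file_exists($path),   // true for files and directories
        'is_file'    => is_file($path),       // true only for regular files
        'executable' => is_executable($path), // platform-dependent
    ];
}

// Placeholder path taken from the config example in this thread; replace with your own.
$path = '/path/to/your/pdftotext';

foreach (check_parser_path($path) as $check => $ok) {
    echo $check, ': ', $ok ? 'yes' : 'no', "\n";
}
```

If `exists` comes back as "no", the value given to PHPDIG_PARSE_PDF probably does not point at the binary, whatever the permissions are.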
|
Permissions are full control (just to see if I can get it to work). Still no luck.
|
Hi. What version of PHP are you running? Perhaps you are experiencing the same problem as in this thread.
|
I'm running PHP version 4.3.4.
|
Hi. Try echoing out the statements as was done in this thread. What do you get?
|
Here is what I get:
Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
1:http://dhi-internet/ (time : 00:00:08)
+ + + + +
2: http://dhi-internet/ Was recently indexed (time : 00:00:14)
3: http://dhi-internet/ Was recently indexed (time : 00:00:19)
4: http://dhi-internet/ Was recently indexed (time : 00:00:24)
level 1...

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
5:http://dhi-internet/index.php?=PHPB8...9-4C7B08C10000 (time : 00:00:35)

Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
6:http://dhi-internet/test/acobook.pdf (time : 00:00:40)

Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
7:http://dhi-internet/docs/seanresume0204.pdf (time : 00:00:45)

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
8:http://dhi-internet/test/regs.html (time : 00:00:53)

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\Ghostgum\pstotxt\pstotxt3
Does parse pdf exist:
9:http://dhi-internet/test/dhi.html (time : 00:01:00)
No link in temporary table |
Hi. It looks like the following is not returning a value:
PHP Code:
PHP Code:
|
I fixed the path problem, but it is still not indexing the pdfs. Here is what it displays:
SITE : http://dhi-internet/
Exclude paths : - @NONE@

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
1:http://dhi-internet/ (time : 00:00:08)
+ + + + +
Error: Couldn't open file '.txt'
Error: Couldn't open file '.txt'
level 1...

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
2:http://dhi-internet/index.php?=PHPB8...9-4C7B08C10000 (time : 00:00:20)

Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Hello PDFRValue 1
3:http://dhi-internet/docs/seanresume0204.pdf (time : 00:00:25)

Is result test http an array: 1
What is result test http status: PDF
Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
Hello PDFRValue 1
4:http://dhi-internet/test/acobook.pdf (time : 00:00:59)

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
5:http://dhi-internet/test/regs.html (time : 00:01:07)

Is result test http an array: 1
What is result test http status: HTML
Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 0
Index the pdf is set to: 1
Parse the pdf is set to: F:\dhi-internet\search\pdftotext\pdftotext.exe
Does parse pdf exist: 1
6:http://dhi-internet/test/dhi.html (time : 00:01:14)
No link in temporary table |
Hi. Do you have the following in the config file?
define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','F:\\dhi-internet\\search\\pdftotext\\pdftotext');
define('PHPDIG_OPTION_PDF','');
define('PHPDIG_PDF_EXTENSION','.txt'); |
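A side note on the doubled backslashes in the PHPDIG_PARSE_PDF line above: in a single-quoted PHP string only `\\` and `\'` are escape sequences, so for this particular path the single-backslash and double-backslash spellings should yield the same string. The sketch below only illustrates that string behaviour. The variable names are made up for the example, and it says nothing about how phpDig uses the value afterwards.

```php
<?php
// Both literals are single-quoted, so only \\ and \' are treated as escapes.
$single  = 'F:\dhi-internet\search\pdftotext\pdftotext';
$doubled = 'F:\\dhi-internet\\search\\pdftotext\\pdftotext';

$same = ($single === $doubled);
echo $same ? "identical\n" : "different\n"; // prints "identical"
echo $doubled, "\n";
```

Doubling the backslashes is still the safer habit, since a path ending in a backslash or written in double quotes would behave differently.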
I got this to work finally. Thanks for all of your help!:D
|