I wrote a mod for indexing pdf without an external binary!!!
Hello people
I have written a modification that lets me index PDF files. The special thing is: you don't need an external binary like ps2txt or another UNIX tool. The mod sends the PDF to Adobe, which converts it to HTML. After that, the HTML is indexed by phpDig. For more information, please visit my homepage <removed> |
Is your robot_functions.php meant to completely replace the one that comes with phpdig? It's hard to tell since your site isn't in English. ;)
|
I had to change code at 4 or 5 positions in the existing file robot_functions.php.
So the easiest way is to replace this file (if you haven't made changes to it yourself; otherwise make a backup!). In the header of the file, I have listed all the changes I made. The English part of my homepage is coming soon... (Or does anyone want to do that?) Sorry for my bad English ;) |
Please download and use only the current version from my site.
(The older version has a bug.) I made it for phpDig v1.8.1. It won't work with older versions of phpDig. |
Hi. From the Adobe Terms of Use located here:
Quote:
|
Hello charter
;( Sorry, I didn't read Adobe's terms. I was very happy to have a solution for this sch*** PDF problem. Oh, I really hate Adobe!!! Because I can't install ps2txt or pdf2html on my webspace, I have to look for another solution. Could I send the PDF to another server (a friend's, for example) that converts it for me with pdf2html and then sends the result back to me? I don't have enough UNIX experience, so I'm not sure. Or does anybody know a PDF-to-text converter written in Perl (CGI)? Another solution is sending it to Adobe once and then saving the result in a database until the PDF's mtime changes. With this, I think Adobe couldn't object!!! P.S. I really like phpDig, but without PDF support I can only half use it. Greets, CaCO3 |
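The caching idea at the end of this post (convert once, then reuse the result until the PDF's mtime changes) does not depend on which converter is used. A rough sketch of it follows, with two assumptions that are not from the thread: the converted text is stored in a plain directory instead of a database, and the converter is any command that prints text to stdout when given a PDF path as its last argument. The function name `cached_convert` is made up for illustration, and `-nt` is a common shell extension (bash, dash, ksh) rather than strict POSIX.

```shell
#!/bin/sh
# cached_convert PDF CACHE_DIR CONVERTER [ARGS...]
# Hypothetical helper: prints the text of PDF, re-running CONVERTER only
# when no cached copy exists or the PDF is newer than the cached copy.
cached_convert() {
    pdf="$1"
    cache_dir="$2"
    shift 2
    mkdir -p "$cache_dir" || return 1
    cache="$cache_dir/$(basename "$pdf").txt"
    # "-nt" compares modification times (mtime) of the two files.
    if [ ! -f "$cache" ] || [ "$pdf" -nt "$cache" ]; then
        # Write to a temporary file first so a failed conversion
        # never replaces a good cached copy.
        "$@" "$pdf" > "$cache.tmp" && mv "$cache.tmp" "$cache" || return 1
    fi
    cat "$cache"
}
```

With a local pdftotext, a small wrapper such as `to_text() { pdftotext "$1" -; }` followed by `cached_convert gpl.pdf ./cache to_text` would be one way to call it, assuming the installed pdftotext accepts `-` as "write to stdout".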
I know nothing about Perl, but a quick search on Google yielded this. If that's not a workable solution for you, just do a search on "pdf to text perl script" (without the quotes).
Hope this helps. :) |
Hi. At FooLabs there is a mirror link to PlanetMirror, where you can find compiled versions of pdftotext.
Go to PlanetMirror and download xpdf-3.00-linux.tar.gz (assuming Linux is your operating system). Unpack xpdf-3.00-linux.tar.gz and extract only the pdftotext file (it is already compiled, so it is a binary file). FTP just the pdftotext file to your account in binary mode. Once the file has been uploaded, change its permissions to rwxr-xr-x (755). Now in the PhpDig config file, set the following: PHP Code:
PHP Code:
From the admin panel of PhpDig version 1.8.1, just type in the link to a PDF file, and set search depth to zero and set links per to one, to test pdftotext on the one PDF file. |
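For anyone doing the unpacking step from a command line, the extraction and permission change described above might look roughly like this. It is only a sketch: the function name `extract_pdftotext` and the destination directory are made up for illustration, the archive is assumed to be downloaded already, and the FTP upload itself is not covered.

```shell
#!/bin/sh
# extract_pdftotext ARCHIVE DEST_DIR
# Hypothetical helper: copies only the pdftotext binary out of an xpdf
# tarball into DEST_DIR and marks it rwxr-xr-x (755).
extract_pdftotext() {
    archive="$1"   # e.g. xpdf-3.00-linux.tar.gz
    dest="$2"      # local directory to upload from
    mkdir -p "$dest" || return 1
    # Find the archive member whose last path component is exactly
    # "pdftotext" (this skips files such as man pages with a suffix).
    member=$(tar -tzf "$archive" | grep -E '(^|/)pdftotext$' | head -n 1)
    [ -n "$member" ] || { echo "pdftotext not found in $archive" >&2; return 1; }
    # -O writes the member to stdout, so the directory layout inside
    # the archive does not matter.
    tar -xzOf "$archive" "$member" > "$dest/pdftotext" || return 1
    chmod 755 "$dest/pdftotext"
}
```

Something like `extract_pdftotext xpdf-3.00-linux.tar.gz ./upload` would then leave a single executable file in `./upload`, ready to be transferred in binary mode. Some FTP servers reset permissions on upload, so the 755 mode may need to be set again on the server.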
Thanks for your tips, charter
I followed your explanations. (First I restored all phpDig files to their originals.) ;) Then I changed config.php like you said. For the path, I used /home/ruinelli/public_html/cgi-bin, into which I also moved the file pdftotext (1 MB). But I think I don't have permission to execute binaries in this directory!!! Then I ran the spider on http://testdomain.ruinelli.ch/gpl.pdf. It spiders, but no keyword is put into the database. ;( I think the problem is that the file pdftotext has to be in a bin folder like /bin or /usr/local/bin, to which I don't have access. You can test it at: http://www.ruinelli.ch/phpdig/admin/index.php @vinyl-junkie: read about the problem at: http://forums.devshed.com/archive/t-121054 |
Quote:
|
Hi. Make a new directory called binaries and move the pdftotext to this directory. Make sure pdftotext still has 755 permission. Then set the following in the PhpDig config file:
PHP Code:
|
yeeeees, it works!!!!!
I had forgotten the filename pdftotext in the path ;( But now it works. Thanks a lot!!! I read so many explanations, but I couldn't get it to work with any of them. Now I can send my mod to /dev/null ;) I think it would be nice if the documentation for phpDig explained more. Greets, CaCO3 [a really happy man with a brilliant search engine on his page ;) ] |
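The mistake found here (a configured path that names the directory but not the binary itself) is easy to check from a shell before running the spider. A minimal sketch, with a made-up function name `check_parser_path`; it only inspects the path and does not read the phpDig config.

```shell
#!/bin/sh
# check_parser_path PATH
# Hypothetical helper: reports why PATH would not work as the location
# of a converter binary such as pdftotext.
check_parser_path() {
    p="$1"
    if [ -d "$p" ]; then
        # The situation from this thread: the path stops at the directory.
        echo "directory: append the binary's file name"
        return 1
    elif [ ! -e "$p" ]; then
        echo "missing: no such file"
        return 1
    elif [ ! -x "$p" ]; then
        echo "not executable: try chmod 755"
        return 1
    fi
    echo "ok"
}
```

Called as `check_parser_path /home/ruinelli/public_html/cgi-bin`, this would report a directory, while the same path with `/pdftotext` appended should pass as long as the file exists and still has its 755 mode.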