Problems with pstotext - path problem?
I installed PhpDig yesterday (after trying several other search engines) for a friend's web project, and we are really impressed by this great tool!
Indexing .html and .doc files works fine. Unfortunately, there are problems with PDF conversion that I could not figure out after several hours yesterday. This morning I was able to reproduce the error, but I cannot understand this very strange behaviour, so perhaps you have an idea.

The operating system is Debian. The spider finds the .pdf files without problems but adds no information to the database. I checked manually on the command line over SSH:

"pstotext originalpdf.pdf" works without problems.
"pstotext mydirectory/originalpdf.pdf" also works fine.

But if I change to another directory and have to access the file with "../", text generation fails. With "pstotext ../mydirectory/originalpdf.pdf" (correct path) I get:

Code:
gs -r72 -dNODISPLAY -dFIXEDMEDIA -dDELAYBIND -dWRITESYSTEMDICT -dNOPAUSE -dSAFER /tmp/ps2tQYryQK -- '../doctest/Suchmaschinentest2.pdf'

The same problem occurs when I access the generated (and not deleted) temp files from the command line (/usr/bin/pstotext -cork ../admin/temp/49389132.tmp).

Is this a problem with pstotext? I also tried full paths starting from "/", but got the same error.

Any help or suggestions would be greatly appreciated.

Thank you, kind regards,
Jens
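Since the call with a bare file name works and the calls with "../" or a full path fail, one thing that might be worth testing is to change into the file's directory first and pass only the file name. The following is a minimal sketch of that idea, not a fix for the underlying cause; run_in_file_dir is a hypothetical helper name, and it assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper: run a tool from inside the file's own directory
# and pass only the bare file name, so no "../" or "/" reaches the tool.
run_in_file_dir() {
    tool=$1                       # e.g. pstotext
    file=$2                       # e.g. ../mydirectory/originalpdf.pdf
    dir=$(dirname -- "$file")     # directory part of the path
    base=$(basename -- "$file")   # bare file name
    # The subshell keeps the caller's working directory unchanged.
    ( cd -- "$dir" && "$tool" "$base" )
}

# Example call, using the path from the post:
# run_in_file_dir pstotext ../mydirectory/originalpdf.pdf
```

If this works where the "../" call fails, the problem is probably in how the path is handled rather than in the PDF itself.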
Changing pstotext to pdftotext - spider hangs
My next steps...
I have now installed pdftotext and changed PhpDig to use it. Parsing PDFs works now, but unfortunately the parser hangs after parsing the first few PDFs. On the next try it reads one or two more PDFs and hangs again. Unfortunately, I cannot see which PDF triggers the error. The file xxxxxxxx.tmp in admin/temp is 0 bytes. All PDFs in the web directory can be converted with pdftotext directly.

Hmm, any suggestions?

Kind regards,
Jens
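To narrow down which PDF is involved, a rough sketch like the one below could run the converter on every file with a time limit and print only the files that fail or time out. check_files and LIMIT are hypothetical names, the 30-second default is an assumed value rather than a PhpDig setting, and the timeout command from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: run a converter on each file with a time limit
# and report the files on which it fails or times out.
# LIMIT is in seconds; 30 is an assumed default.
LIMIT=${LIMIT:-30}

check_files() {
    tool=$1
    shift
    for f in "$@"; do
        # timeout (GNU coreutils) exits with status 124 when the limit is hit.
        timeout "$LIMIT" "$tool" "$f" >/dev/null 2>&1
        status=$?
        case $status in
            0)   ;;                                # converted fine, stay quiet
            124) echo "TIMEOUT $f" ;;
            *)   echo "FAILED($status) $f" ;;
        esac
    done
}

# Example call; the directory is a placeholder. With only an input file,
# pdftotext writes a .txt file next to each PDF:
# check_files pdftotext /path/to/webdirectory/*.pdf
```

If every file passes, the hang may lie in how the spider invokes the tool rather than in a particular PDF.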