problems with pstotext - path-problem?


jmeyerdo
01-25-2006, 10:02 PM
I installed PhpDig yesterday (after trying several other search engines) for a friend's web project, and we are really impressed with this great tool!

Indexing of .html and .doc files works without problems.
Unfortunately there are problems with PDF indexing, and I could not figure them out after several hours yesterday.
This morning I was able to reproduce the error, but I cannot understand this very strange behaviour. So perhaps you have an idea...

The operating system is Debian.
The spider finds the .pdf files without problems, but adds no information to the database.

I checked manually on the command line over SSH.
"pstotext originalpdf.pdf" works without problems.
"pstotext mydirectory/originalpdf.pdf" also works fine.
But if I change to a different directory, so that I have to access the file with "../", the conversion fails.
With "pstotext ../mydirectory/originalpdf.pdf" (a correct path) I get:

gs -r72 -dNODISPLAY -dFIXEDMEDIA -dDELAYBIND -dWRITESYSTEMDICT -dNOPAUSE -dSAFER /tmp/ps2tQYryQK -- '../doctest/Suchmaschinentest2.pdf'
GPL Ghostscript 8.01 (2004-01-30)
Copyright (C) 2004 artofcode LLC, Benicia, CA. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
QI 100 0 0 -100 0 84200
Error: /invalidfileaccess in --.libfile--
Operand stack:
(../doctest/Suchmaschinentest2.pdf)
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push --nostringval-- 1 3 %oparray_pop --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
--dict:1051/1123(ro)(G)-- --dict:0/20(G)-- --dict:71/200(L)--
Current allocation mode is local
Last OS error: 2
GPL Ghostscript 8.01: Unrecoverable error, exit code 1


I cannot understand this behaviour.
The same problem occurs when I access the generated (and not yet deleted) temp files from the command line (/usr/bin/pstotext -cork ../admin/temp/49389132.tmp).

Is this a problem with pstotext?
I also tried full paths starting from "/", but got the same error.
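
A possible workaround, sketched here only from the observation above that a bare file name works: change into the file's own directory and pass just the file name, so the converter never sees a "../" or a directory prefix. The helper name run_from_file_dir is hypothetical, and whether this avoids the Ghostscript error in every setup is an assumption, not something verified in this thread.

```python
import os
import subprocess


def run_from_file_dir(command, file_path):
    """Run `command` on a file using the file's own directory as the working directory.

    `command` is a list such as ["pstotext"]; only the bare file name is
    appended to it, so no "../" or directory prefix reaches the converter.
    (Hypothetical helper, not part of PhpDig or pstotext.)
    """
    abs_path = os.path.abspath(file_path)      # collapses any "../" segments
    directory, name = os.path.split(abs_path)  # split into directory and bare name
    result = subprocess.run(
        command + [name],
        cwd=directory,        # run inside the file's directory
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


# Example call (requires pstotext to be installed):
# code, text, err = run_from_file_dir(["pstotext"], "../mydirectory/originalpdf.pdf")
```

The return code and stderr are handed back to the caller so that a failing conversion can be logged rather than silently producing an empty result.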

Any help or suggestions would be greatly appreciated...
Thank you, kind regards,
Jens

jmeyerdo
01-26-2006, 11:30 AM
My next steps...

I have now installed pdftotext and changed PhpDig to use this tool.
Parsing PDFs works now, but unfortunately the parser hangs after parsing the first few PDFs. On the next try it reads 1-2 more PDFs and hangs again.
Unfortunately I cannot see which PDF triggers the error.
The file xxxxxxxx.tmp in admin/temp is 0 bytes.

All PDFs in the web directory can be read with pdftotext directly.
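
To find out which PDF is involved, one option is to run the converter on each file with a time limit and note every file that hangs, fails, or yields no text. The following is a minimal sketch under assumptions: the function name find_problem_files and the 30-second default are made up for illustration, and it relies on "pdftotext file.pdf -" writing the text to standard output. It checks the files outside of PhpDig, so it may not reproduce a hang that only happens inside the spider.

```python
import subprocess


def find_problem_files(paths, command=("pdftotext",), suffix=("-",), timeout=30):
    """Return (path, reason) pairs for files that time out, fail, or give no text.

    Each file is converted with `command + [path] + suffix`; with the defaults
    this is "pdftotext <path> -". (Hypothetical helper for illustration.)
    """
    problems = []
    for path in paths:
        try:
            result = subprocess.run(
                list(command) + [path] + list(suffix),
                capture_output=True,  # keep the extracted text in memory
                timeout=timeout,      # give up on a file that hangs
            )
        except subprocess.TimeoutExpired:
            problems.append((path, "timeout"))
            continue
        if result.returncode != 0:
            problems.append((path, "exit code %d" % result.returncode))
        elif not result.stdout.strip():
            problems.append((path, "empty output"))
    return problems


# Example call (requires pdftotext to be installed):
# import glob
# print(find_problem_files(glob.glob("/path/to/webdirectory/*.pdf")))
```

A file reported with "empty output" would match the 0-byte temp file, while a "timeout" entry would point to a PDF on which the converter itself stalls.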

Hmm, any suggestions?
Kind regards,
Jens