jguert
08-17-2006, 07:39 AM
I have some problems with correct MIME type detection on our Linux server. The documents are PDF and Word (doc) files, uploaded with a form and saved without a file extension. Normally PhpDig should read the header and spider the file with the correct external binary. The files are named like 22_upload, 23_upload ...
I'm using catdoc and pstotext with PhpDig version 1.8.5. The binary installation should be correct, because
catdoc /path to file/file and
pstotext -cork /path to file/file
both return the text content, and
file -i /path to file/file shows the MIME type:
application/pdf or application/msword
The spider is running, but the files in text_content (*.txt) and the first_words column in the database contain the binary code of the files, not the text content. I'm running the spider with: # php -f /path/spider.php http://path/documents/ >> /var/log/phpdig.log
So it seems that robot_functions.php does not recognise the MIME type of the documents and does not know which external binary to use. As a result, binary code is written to the database.
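To make clear what behaviour I expect, here is a minimal sketch of detection by file content instead of by extension. It is not PhpDig code: the function names extractor_for_mime and extract_text are hypothetical, and the sketch only assumes the file, catdoc and pstotext commands already mentioned above.

```shell
#!/bin/sh
# Hypothetical helper, not part of PhpDig: map a MIME type string
# (as printed by "file -i") to the extractor command used in this post.
extractor_for_mime() {
    case "$1" in
        # The trailing * tolerates a suffix such as "; charset=binary".
        application/pdf*)    echo "pstotext -cork" ;;
        application/msword*) echo "catdoc" ;;
        *)                   echo "" ;;
    esac
}

# Hypothetical helper: detect the type from the file content and
# run the matching extractor, printing the text to stdout.
extract_text() {
    mime=$(file -b -i "$1")           # -b: omit the file name from the output
    cmd=$(extractor_for_mime "$mime")
    if [ -z "$cmd" ]; then
        echo "no extractor for $1 ($mime)" >&2
        return 1
    fi
    # $cmd is left unquoted on purpose so "pstotext -cork" splits into
    # the command and its option.
    $cmd "$1"
}
```

Running extract_text on one of the uploaded files (for example 22_upload) should print the text content if detection by content works outside the spider.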
Thanks for any suggestions,
Joe