Sorry for the delay, I couldn't test right away. I had forgotten the period in '.txt'. Now that the period is there, the temp dir no longer fills up with files after an indexing attempt.
But indexing of PDFs stops after the fifth PDF when I try to index a directory with 130 PDFs. I get 7 files: one index (html), the index of the parent directory (html) and 5 indexed PDFs. The directory "text_content" contains 7 txt files (1.txt, 2.txt etc.). The first 6 are readable, but the last file, "7.txt", is full of unreadable junk. Only its first line is readable, and then it continues with "° ¢£§! ¢£ ©¢ §£ ¶ ¶ ¶ "3 # $ &' •• ß ® ®¶ ©¢ ¢£ %
@3 @ 3F )01)12A021B2" etc.
I didn't get anywhere with the debugging line. This is what I tried:
PHP Code:
if ($usetool) {
    rename($tempfile1,$tempfile2);
    exec($command,$result,$retval);
    unlink($tempfile2);
    echo "[h1]$retval[H1]";
    if (!$retval) {
        // the replacement if ö is for unbreaking spaces
        // returned by catdoc parsing msword files
        // and '0xAD' "tiret quadratin" returned by pstotext
        // in iso-8859-1
        // Adjust with your encoding and/or your tools
        if ((is_array($result)) && (count($result) > 0)) {
            $f_handler = fopen($tempfile1,'wb');
            fwrite($f_handler,str_replace('ö',' ',str_replace(chr(0xad),'-',implode(' ',$result))));
            fclose($f_handler);
        }
    }
    else {
        return array('tempfile'=>0,'tempfilesize'=>0);
    }
}
But I never saw a "0" or a "1" in the output. Does that mean the return value is 0 all the time?
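In case the echo output simply gets swallowed somewhere before it reaches the page, one thing I could try is writing the return value to a log file instead. This is only a rough sketch: log_tool_result() is a hypothetical helper I made up for this, it is not part of the indexer, and the log path in the comment is just an example.

```php
<?php
// Hypothetical debugging helper (not part of the indexer): appends the
// command, its return value and the number of output lines to a log file.
function log_tool_result($logfile, $command, $retval, $result) {
    // exec() fills $result with one array entry per output line
    $lines = is_array($result) ? count($result) : 0;
    $entry = date('Y-m-d H:i:s') . " retval=$retval lines=$lines cmd=$command\n";
    // FILE_APPEND keeps earlier entries, LOCK_EX avoids interleaved writes
    file_put_contents($logfile, $entry, FILE_APPEND | LOCK_EX);
    return $entry;
}

// It would be called right after the exec() line, for example:
// log_tool_result('/tmp/index_debug.log', $command, $retval, $result);
```

If the log then shows a non-zero retval or lines=0 for the sixth PDF, that would at least tell me whether the external tool itself fails on that file.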
/thanks Aryan