Thread: PDF indexing
11-26-2003, 02:07 PM   #7
aryan
Green Mole
 
Join Date: Nov 2003
Posts: 6
Sorry for the delay, I couldn't test right away. I had forgotten the period in '.txt'. Now that the period is in place, the temp dir is no longer full of files after an indexing attempt.

But indexing stops after the fifth PDF when I try to index a directory with 130 PDFs. I get 7 files: one index (html), the index of the parent directory (html) and 5 indexed PDFs. The directory "text_content" contains 7 txt files (1.txt, 2.txt etc.). The first 6 are readable, but the last file, "7.txt", is full of unreadable junk. Only its first line is readable, and then it continues with "° ¢£§! ¢£ ©¢ §£ ¶ ¶ ¶ "3 # $ &' •• ß ® ®¶ ©¢ ¢£ %
@3 @ 3F )01)12A021B2" etc.

I didn't have any luck with the debugging line. I tried:

PHP Code:
if ($usetool) {
    rename($tempfile1, $tempfile2);
    exec($command, $result, $retval);
    unlink($tempfile2);
    // debugging line: print the exit status of $command
    echo "[h1]$retval[H1]";
    if (!$retval) {
        // the replacement of 'ö' is for unbreaking spaces
        // returned by catdoc parsing msword files
        // and '0xAD' "tiret quadratin" returned by pstotext
        // in iso-8859-1
        // Adjust with your encoding and/or your tools
        if ((is_array($result)) && (count($result) > 0)) {
            $f_handler = fopen($tempfile1, 'wb');
            fwrite($f_handler, str_replace('ö', ' ', str_replace(chr(0xad), '-', implode(' ', $result))));
            fclose($f_handler);
        }
    }
    else {
        return array('tempfile' => 0, 'tempfilesize' => 0);
    }
}
but I never saw a "0" or a "1" in the output. Does that mean the return value is 0 all the time?
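
One possible explanation, offered as a guess rather than a confirmed diagnosis: inside a double-quoted PHP string, "$retval[H1]" is read as the array element $retval['H1'], not as $retval followed by the literal text [H1]. Since exec() stores an integer in $retval, that lookup would print nothing at all. The sketch below compares the line as posted with a variant that uses braces. The hard-coded value of $retval and the [/h1] closing tag are assumptions made only for illustration.

```php
<?php
// Hypothetical stand-in for the exit status that exec() stores in $retval.
$retval = 0;

// As posted. PHP parses "$retval[H1]" as the array element $retval['H1'].
// An integer has no such element, so the lookup yields nothing. Warnings are
// silenced here only to keep the demonstration output clean.
$previousLevel = error_reporting(0);
$asPosted = "[h1]$retval[H1]";
error_reporting($previousLevel);

// Braces end the variable name explicitly, so the status itself is printed.
$withBraces = "[h1]{$retval}[/h1]";

// var_export() makes the value unambiguous even when it is 0 or empty.
$exported = var_export($retval, true);

echo $asPosted, "\n";
echo $withBraces, "\n";
echo "retval = ", $exported, "\n";
```

If the first line comes out without any number in it, that would account for never seeing a "0" or a "1", and it would say nothing about the actual exit status. Printing the value with braces or var_export() should show what the tool really returned.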

/thanks Aryan

Last edited by aryan; 11-26-2003 at 02:11 PM.