Check box from spider.php


adtphpDig
02-04-2004, 02:59 AM
I am still trying to get phpDig to index PDFs, as they do not appear in the search results.

Since the fonts were installed in a separate directory, I had to set the following in my config.php:

define('PHPDIG_INDEX_PDF',true);
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');
define('PHPDIG_OPTION_PDF','-gs "gs -I/usr/ghostscript/fonts/default/Type1"');

Is the above acceptable?

Similar to some of the other postings, I see the PDFs in the list, but not in any of the search results. I also noticed that there is no green check box next to these PDFs. Does the absence of a check box or an x mark mean something?

Anyway I'm using 1.6.5 with the critical change to the config.php file. Also, I've attached my spider.php file.

Thanks,
Anand

Charter
02-05-2004, 08:44 AM
Hi. What do you get when you run the following from shell?

/usr/local/bin/pstotext -gs "gs -I/usr/ghostscript/fonts/default/Type1" filename.pdf
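
When running a converter by hand like this, it can help to see both its exit status and whether any text came out at all. A minimal sketch, assuming a POSIX shell; check_converter is a hypothetical helper name, not part of phpDig or pstotext:

```shell
# check_converter CMD [ARGS...]
# Runs the given command, then prints its exit status and the number of
# bytes it wrote to stdout. Hypothetical helper, for illustration only.
check_converter() {
    out=$(mktemp) || return 1
    "$@" > "$out" 2>/dev/null
    status=$?
    bytes=$(wc -c < "$out" | tr -d ' ')
    rm -f "$out"
    echo "status=$status bytes=$bytes"
}

# Example call using the command above (not run here):
# check_converter /usr/local/bin/pstotext -gs "gs -I/usr/ghostscript/fonts/default/Type1" filename.pdf
```

A status of 0 with a byte count of 0 would suggest that the converter ran but extracted no text.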

adtphpDig
02-05-2004, 09:50 AM
To answer your question: I get the text from the PDF file.

I've moved on, believing that pstotext was not installed correctly, and have tried pdftotext instead. With it I actually get green check marks, as well as a 0 result from the exec() call in robot_functions.php.

The results I get from exec() are 0, 1, and 3. What do 1 and 3 represent?
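
For reference, the pdftotext man page from the Xpdf tools lists its exit codes, and the values above can be looked up against that list. A minimal sketch, assuming the installed pdftotext follows that man page (check `man pdftotext` on your system); describe_pdftotext_status is a hypothetical helper name:

```shell
# Map a pdftotext exit status to the meaning given in the Xpdf man page.
# Assumption: the installed pdftotext uses these documented codes.
describe_pdftotext_status() {
    case "$1" in
        0)  echo "no error" ;;
        1)  echo "error opening the PDF file" ;;
        2)  echo "error opening the output file" ;;
        3)  echo "error related to PDF permissions" ;;
        99) echo "other error" ;;
        *)  echo "unknown status $1" ;;
    esac
}

# Example call (not run here): convert one file to stdout, then report.
# pdftotext filename.pdf - > /dev/null; describe_pdftotext_status $?
```

By that list, a 1 would point at a file that could not be opened, and a 3 at a PDF whose permissions restrict text extraction.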

Charter
02-05-2004, 11:00 AM
Hi. What happens if you copy /usr/local/bin/pstotext to your account and then change PHPDIG_PARSE_PDF to the full path of the new copy, making sure that the directories and the file all have 755 permissions?
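
The copy-and-chmod step can be sketched as follows, assuming a POSIX shell. install_private_copy is a hypothetical helper name, and the destination directory in the example call is only a placeholder:

```shell
# install_private_copy SRC DEST_DIR
# Copies SRC into DEST_DIR, sets 755 on the directory and the copy, and
# prints the full path of the copy. Hypothetical helper, for illustration.
# Note: parent directories above DEST_DIR are not touched here and may
# need their permissions checked separately.
install_private_copy() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/$name" || return 1
    chmod 755 "$dest_dir" "$dest_dir/$name" || return 1
    echo "$dest_dir/$name"
}

# Example call (not run here; "$HOME/bin" is a placeholder location):
# install_private_copy /usr/local/bin/pstotext "$HOME/bin"
```

The printed path is the value that would then go into PHPDIG_PARSE_PDF in config.php.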