Charter
04-09-2004 09:24 AM
README before posting
External Binaries Problem Checklist
This checklist covers most external-binary issues in PhpDig version 1.6.4+, but it is not exhaustive. If you are experiencing a problem related to external binaries, read through this checklist.
- If you receive a "call to undefined function: is_executable" error, or are using PHP < 5.0.0 on a Win system, set define('USE_IS_EXECUTABLE_COMMAND','0'); in the config file.
- Check that the directories to the external binary and the external binary itself are set to 755 permissions if applicable.
- Check that the following directories are set to 777 permissions if applicable:
- [PHPDIG_DIR]/text_content
- [PHPDIG_DIR]/includes (can be set to 755 after connect.php is configured)
- [PHPDIG_DIR]/admin/temp
- If using PHP version 4.2.2/3, check this thread or upgrade your PHP.
- If using pdftotext, for example, make sure define('PHPDIG_PDF_EXTENSION','.txt'); includes the period in the .txt extension.
- If using pstotext, for example, make sure Ghostscript is installed correctly, version 3.33+ for PS files or version 3.51+ for PDF files.
- Set the correct path, for example define('PHPDIG_PARSE_PDF','/path/to/pdftotext'); on *nix or define('PHPDIG_PARSE_PDF','C:\\path\\to\\pdftotext'); on Win (may need .exe extension on Win).
- If not sure of the path, run the external binary from command line first and try that path.
- Use a path that does not include spaces, periods, or other 'special' characters.
- Check to make sure that safe_mode is set to off and allow_url_fopen is set to on.
- If an open_basedir restriction is in place, make sure the external binary and the files it needs are located in a directory that the restriction allows.
- If indexing from command line, make sure register_argc_argv is on or check this thread.
- If not sure about safe_mode, allow_url_fopen, open_basedir, or register_argc_argv, check your phpinfo page.
- Set define('LIMIT_DAYS',0); to allow for immediate reindex or check this thread.
- Contact the authors of the external binaries if you have trouble compiling and/or installing those programs.
- Still having problems...
Try the code below, modifying it for other binaries if necessary, then reindex and post the results in your own thread:
First try the following and then reindex.
In robot_functions.php, find the appropriate $command variable:
PHP Code:
// it can have _PDF or _MSWORD or _MSEXCEL depending on binary
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2;
And change to the following to see if the issue is displayed upon reindex:
PHP Code:
// it can have _PDF or _MSWORD or _MSEXCEL depending on binary
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2.' 2>&1';
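For anyone wondering what the added 2>&1 does: it tells the shell to send the binary's error output (stderr) to the same place as its normal output (stdout), so that exec() can capture the error message instead of silently losing it. Below is a minimal sketch of that behavior, assuming a *nix system where sh is available. The function name runCapturingStderr is made up for this illustration and is not part of PhpDig.

```php
<?php
// Hypothetical helper for illustration only (not part of PhpDig).
// Runs a command with stderr redirected to stdout so exec() captures both.
function runCapturingStderr($command) {
    $output = array();
    $retval = 0;
    exec($command . ' 2>&1', $output, $retval);
    return array('output' => $output, 'retval' => $retval);
}

// Stand-in for a failing binary: writes a message to stderr and exits with 3.
// Assumes a *nix shell; a real test would use the PHPDIG_PARSE_PDF command.
$run = runCapturingStderr("sh -c 'echo oops >&2; exit 3'");

echo "Result contains: ";
print_r($run['output']);
echo "Return value is: " . $run['retval'] . "\n";
```

Without the redirect, the message written to stderr would not show up in the captured output at all, which is why the binary can appear to fail silently during indexing.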
If that didn't help, then try the following and reindex.
In spider.php, add the following echo statements:
PHP Code:
// sets $tempfile and $tempfilesize
/*****/
echo "<br><br>Is result test http an array: " . is_array($result_test_http) . "<br>";
echo "What is result test http status: " . $result_test_http['status'] . "<br>";
/*****/
extract(phpdigTempFile($url_indexing,$result_test_http,$relative_script_path.'/admin/temp/'));
In robot_functions.php, add the following echo statements:
PHP Code:
function phpdigTempFile($uri,$result_test,$prefix='temp/',$suffix1='1.tmp',$suffix2='2.tmp') {
/*****/
echo "<br>Is result test an array: " . is_array($result_test) . "<br>";
echo "What is result test status: " . $result_test['status'] . "<br>";
echo "Use is executable is set to: " . USE_IS_EXECUTABLE_COMMAND . "<br>";
// in the next four lines change _PDF to either _MSWORD or _MSEXCEL for those binaries
echo "Index the pdf is set to: " . PHPDIG_INDEX_PDF . "<br>";
echo "Parse the pdf is set to: " . PHPDIG_PARSE_PDF . "<br>";
echo "Does parse pdf exist: " . file_exists(PHPDIG_PARSE_PDF) . "<br>";
echo "Is parse pdf executable: " . is_executable(PHPDIG_PARSE_PDF) . "<br>";
/*****/
// $temp_filename = md5(time()+getmypid()).$suffix;
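A note on reading the output of the echo statements above: when PHP converts a boolean to a string, true prints as 1 and false prints as an empty string. So a line such as "Is parse pdf executable: " followed by nothing means the check returned false. If you would rather see the words spelled out, something like the sketch below could be used. The helper name showBool and the path in the example are made up for illustration and are not part of PhpDig.

```php
<?php
// Hypothetical helper for illustration only (not part of PhpDig).
// Turns a boolean into a readable word instead of "1" or an empty string.
function showBool($value) {
    return $value ? 'true' : 'false';
}

// Placeholder path that is assumed not to exist on the system.
$binary = '/no/such/dir/pdftotext';

// Plain echo: a false result prints nothing after the colon.
echo "Does parse pdf exist: " . file_exists($binary) . "\n";

// With the helper: a false result prints the word "false".
echo "Does parse pdf exist: " . showBool(file_exists($binary)) . "\n";
```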
Also in robot_functions.php, add the following echo/print statements:
PHP Code:
exec($command,$result,$retval);
/*****/
echo "<br>Command is: " . $command . "<br>";
echo "Result contains: ";
print_r($result);
echo "<br>Return value is: " . $retval . "<br><br>";
/*****/
unlink($tempfile2);
Remember to remove any "word" wrapping in the above code.
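Several of the checklist items (the php.ini settings and the path to the binary) can also be checked from a small standalone script before touching any PhpDig files. The following is only a sketch under stated assumptions: the function names checklistIniReport and checkBinaryPath are made up for this post, the path in the example is a placeholder, and only spaces are tested out of the 'special' characters mentioned in the checklist. It relies on ini_get(), file_exists(), and is_executable(), and skips is_executable() where that function is not defined.

```php
<?php
// Hypothetical helpers for illustration only (not part of PhpDig).

// Reports the php.ini settings named in the checklist.
function checklistIniReport() {
    $names = array('safe_mode', 'allow_url_fopen', 'open_basedir', 'register_argc_argv');
    $report = array();
    foreach ($names as $name) {
        $value = ini_get($name);
        if ($value === false) {
            // ini_get() returns false when the setting does not exist.
            $report[$name] = '(not available in this PHP version)';
        } elseif ($value === '') {
            $report[$name] = '(empty)';
        } else {
            $report[$name] = $value;
        }
    }
    return $report;
}

// Returns a list of problems found with the path to an external binary.
function checkBinaryPath($path) {
    $problems = array();
    if (strpos($path, ' ') !== false) {
        $problems[] = 'path contains a space';
    }
    if (!file_exists($path)) {
        $problems[] = 'file does not exist';
    } elseif (function_exists('is_executable') && !is_executable($path)) {
        $problems[] = 'file is not executable';
    }
    return $problems;
}

foreach (checklistIniReport() as $name => $value) {
    echo $name . ' = ' . $value . "\n";
}

// Placeholder path; replace it with the value used for PHPDIG_PARSE_PDF.
$problems = checkBinaryPath('/no such dir/pdftotext');
echo "Problems found: ";
print_r($problems);
```

The phpinfo page remains the authoritative place to look. This just gathers the relevant values in one spot so they can be pasted into a thread.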