README before posting


Charter
04-09-2004, 09:24 AM
External Binaries Problem Checklist

This checklist covers most external binary issues in PhpDig version 1.6.4+ but is not meant to be exhaustive. If you are experiencing a problem with an external binary, read through this checklist first.

If you receive a "call to undefined function: is_executable" error, or you are running PHP < 5.0.0 on a Windows system, set define('USE_IS_EXECUTABLE_COMMAND','0'); in the config file.
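
A quick way to see which value applies to your setup is to ask PHP whether is_executable() can be called at all. This is only a sketch: suggest_use_is_executable() is a made-up helper, not part of PhpDig, and it assumes '1' is the value you want whenever the function is available.

```php
<?php
// Hypothetical helper (not part of PhpDig): suggests a value for
// USE_IS_EXECUTABLE_COMMAND depending on whether is_executable() is callable.
function suggest_use_is_executable() {
    // Assumption: '1' is appropriate whenever the function exists.
    return function_exists('is_executable') ? '1' : '0';
}

// Prints a define() line you can compare against your config file.
echo "define('USE_IS_EXECUTABLE_COMMAND','" . suggest_use_is_executable() . "');\n";
```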

Check that the external binary itself, and each directory in the path leading to it, is set to 755 permissions if applicable.

Check that the following directories are set to 777 permissions if applicable:
- [PHPDIG_DIR]/text_content
- [PHPDIG_DIR]/includes (can be set to 755 after connect.php is configured)
- [PHPDIG_DIR]/admin/temp
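
If you cannot run ls -l on the server, the permissions listed above can be read from PHP instead. A minimal sketch, assuming a *nix host; octal_perms() is a made-up helper and /path/to/phpdig is a placeholder for your own [PHPDIG_DIR].

```php
<?php
// Hypothetical helper: returns the permission bits of $path as an octal
// string such as "755", or null if the path does not exist.
function octal_perms($path) {
    clearstatcache(); // avoid reading stale cached stat data
    if (!file_exists($path)) {
        return null;
    }
    // fileperms() also encodes the file type; the last three octal digits
    // are the owner/group/other permission bits.
    return substr(sprintf('%o', fileperms($path)), -3);
}

// Placeholder install location; replace with your own [PHPDIG_DIR].
$phpdig_dir = '/path/to/phpdig';
foreach (array('/text_content', '/includes', '/admin/temp') as $sub) {
    $perms = octal_perms($phpdig_dir . $sub);
    echo $sub . ': ' . ($perms === null ? 'not found' : $perms) . "\n";
}
```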

If using PHP version 4.2.2 or 4.2.3, check this (http://www.phpdig.net/showthread.php?threadid=570) thread or upgrade your PHP.

If using pdftotext, for example, make sure define('PHPDIG_PDF_EXTENSION','.txt'); includes the period in the .txt extension.

If using pstotext, for example, make sure Ghostscript is installed correctly: version 3.33+ for PS files or version 3.51+ for PDF files.

Set the correct path, for example define('PHPDIG_PARSE_PDF','/path/to/pdftotext'); on *nix or define('PHPDIG_PARSE_PDF','C:\\path\\to\\pdftotext'); on Win (may need .exe extension on Win).

If you are not sure of the path, run the external binary from the command line first to confirm that it works, then use that same full path in the config file.
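
Once you have a candidate path, you can also check it from PHP itself, since the web server user may see the file differently than your shell login does. A rough sketch: describe_binary() is a made-up helper and /path/to/pdftotext is the placeholder path from the example above.

```php
<?php
// Hypothetical helper: reports what PHP can see at $path.
function describe_binary($path) {
    clearstatcache(); // avoid reading stale cached stat data
    if (!file_exists($path)) {
        return 'missing';
    }
    if (!is_file($path)) {
        return 'not a file';
    }
    if (!is_executable($path)) {
        return 'not executable';
    }
    return 'ok';
}

// Placeholder path; substitute the value you put in PHPDIG_PARSE_PDF.
echo describe_binary('/path/to/pdftotext') . "\n";
```

Run it through the web server rather than from your own shell if that is how you index, so the check uses the same user and the same restrictions.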

Use a path that does not include spaces, periods, or other 'special' characters.

Check to make sure that safe_mode is set to off and allow_url_fopen is set to on.

If an open_basedir restriction is in place, make sure the files PhpDig needs to reach are placed in a directory that the restriction allows.

If indexing from command line, make sure register_argc_argv is on or check this (http://www.phpdig.net/showthread.php?threadid=547) thread.

If not sure about safe_mode, allow_url_fopen, open_basedir, or register_argc_argv, check your phpinfo (http://www.php.net/manual/en/function.phpinfo.php) page.
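
If you would rather not read through the whole phpinfo page, the four settings can be printed directly with ini_get(). A small sketch: checklist_settings() is a made-up helper. Note that ini_get() returns false for a directive the running PHP does not know, which is what you will see for safe_mode on PHP versions that have removed it.

```php
<?php
// Hypothetical helper: collects the four settings named in this checklist.
function checklist_settings() {
    $names = array('safe_mode', 'allow_url_fopen', 'open_basedir', 'register_argc_argv');
    $out = array();
    foreach ($names as $name) {
        $value = ini_get($name);
        // false means the directive does not exist in this PHP build.
        $out[$name] = ($value === false) ? '(not available)' : (string) $value;
    }
    return $out;
}

foreach (checklist_settings() as $name => $value) {
    echo $name . ' = "' . $value . '"' . "\n";
}
```

An empty string for open_basedir means no restriction is set; for the on/off settings, "1" means on and "" or "0" means off.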

Set define('LIMIT_DAYS',0); to allow for immediate reindex or check this (http://www.phpdig.net/showthread.php?threadid=513) thread.

Contact the authors of the external binaries if you have trouble compiling and/or installing those programs.

Still having problems?

Try the code below, modifying it for other binaries if necessary, then reindex and post the results in your own thread:

First try the following and then reindex.

In robot_functions.php, find the appropriate $command variable:
// it can have _PDF or _MSWORD or _MSEXCEL depending on binary
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2;

And change to the following to see if the issue is displayed upon reindex:

// it can have _PDF or _MSWORD or _MSEXCEL depending on binary
$command = PHPDIG_PARSE_PDF.' '.PHPDIG_OPTION_PDF.' '.$tempfile2.' 2>&1';
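
To see what the 2>&1 suffix buys you, here is a standalone sketch you can run outside PhpDig. Without the redirect, exec() only captures standard output, so an error message from the binary is lost; with it, the message lands in the result array. run_with_stderr() and the program name no_such_binary_for_demo are made up for illustration.

```php
<?php
// Hypothetical helper: runs $command with stderr redirected into stdout
// and returns array(captured output lines, exit status).
function run_with_stderr($command) {
    $output = array();
    $status = 0;
    exec($command . ' 2>&1', $output, $status);
    return array($output, $status);
}

// Deliberately nonexistent program name, used only to provoke an error.
list($output, $status) = run_with_stderr('no_such_binary_for_demo');
print_r($output);                        // the shell's error message, if any
echo "Return value: " . $status . "\n";  // nonzero when the command failed
```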

If that didn't help, then try the following and reindex.

In spider.php, add the following echo statements:

// sets $tempfile and $tempfilesize

/*****/
echo "<br><br>Is result test http an array: " . is_array($result_test_http) . "<br>";
echo "What is result test http status: " . $result_test_http['status'] . "<br>";
/*****/

extract(phpdigTempFile($url_indexing,$result_test_http,$relative_script_path.'/admin/temp/'));

In robot_functions.php, add the following echo statements:

function phpdigTempFile($uri,$result_test,$prefix='temp/',$suffix1='1.tmp',$suffix2='2.tmp') {

/*****/
echo "<br>Is result test an array: " . is_array($result_test) . "<br>";
echo "What is result test status: " . $result_test['status'] . "<br>";
echo "Use is executable is set to: " . USE_IS_EXECUTABLE_COMMAND . "<br>";
// in the next four lines change _PDF to either _MSWORD or _MSEXCEL for those binaries
echo "Index the pdf is set to: " . PHPDIG_INDEX_PDF . "<br>";
echo "Parse the pdf is set to: " . PHPDIG_PARSE_PDF . "<br>";
echo "Does parse pdf exist: " . file_exists(PHPDIG_PARSE_PDF) . "<br>";
echo "Is parse pdf executable: " . is_executable(PHPDIG_PARSE_PDF) . "<br>";
/*****/

// $temp_filename = md5(time()+getmypid()).$suffix;
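
A note on reading the output of the echo statements above: when PHP echoes a boolean, true prints as 1 and false prints as nothing at all, so a blank after "Does parse pdf exist:" means false, not a missing value. If you prefer explicit text, something like the following works; bool_text() is a made-up helper, not part of PhpDig.

```php
<?php
// Hypothetical helper: turns a boolean into readable text.
function bool_text($value) {
    return $value ? 'true' : 'false';
}

echo "Raw echo of false: [" . false . "]\n";          // prints: Raw echo of false: []
echo "Raw echo of true: [" . true . "]\n";            // prints: Raw echo of true: [1]
echo "Readable: " . bool_text(is_array('x')) . "\n";  // prints: Readable: false
```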

Also in robot_functions.php, add the following echo/print statements:

exec($command,$result,$retval);

/*****/
echo "<br>Command is: " . $command . "<br>";
echo "Result contains: ";
print_r($result);
echo "<br>Return value is: " . $retval . "<br><br>";
/*****/

unlink($tempfile2);

Remember to remove any line wrapping that was introduced into the above code when it was posted.
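
When you post your results, the return value printed above is often the quickest clue. The sketch below maps the common values to a hint. It assumes a *nix host where exec() goes through a POSIX-style shell, which by convention reports 127 for a command it cannot find and 126 for a file it cannot execute; other values come from the binary itself. explain_retval() is a made-up helper.

```php
<?php
// Hypothetical helper: gives a rough hint for an exec() return value,
// assuming the command was run through a POSIX-style shell.
function explain_retval($retval) {
    if ($retval === 0) {
        return 'command reported success';
    }
    if ($retval === 126) {
        return 'file was found but could not be executed (check permissions)';
    }
    if ($retval === 127) {
        return 'command was not found (check the path)';
    }
    return 'command failed with its own exit code ' . $retval;
}

echo explain_retval(127) . "\n";
```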