PhpDig.net

01-08-2004, 06:38 AM   #16
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. In the very last else of the phpdigTempFile function, add the following:
PHP Code:
else {

      // add the two echo lines
      echo "Evaluation is false for URI: " . $uri . "<br>";
      echo "Result test contains: " . print_r($result_test) . "<br>";

      return array('tempfile'=>0,'tempfilesize'=>0);
}

__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
01-08-2004, 07:05 AM   #17
zevince
Green Mole
Join Date: Dec 2003
Posts: 26
OK, here is the output:

Quote:
HTML <--- Status
Duplicate of an existing document
43:http://umvf.cochin.univ-paris5.fr/ru...id_rubrique=91
(time: 00:00:12)
File date unchanged
44:http://umvf.cochin.univ-paris5.fr/ru...id_rubrique=99
(time: 00:00:12)
HTML <--- Status
45:http://umvf.cochin.univ-paris5.fr/avare3.html
(time: 00:00:12)
+
level 1...
Evaluation is false for URI: http://umvf.cochin.univ-paris5.fr/avare2.pdf
Array ( [status] => PDF [lm_date] => Wed, 07 Jan 2004 13:39:43 GMT [path] => /avare2.pdf [host] => umvf.cochin.univ-paris5.fr [cookies] => Array ( ) ) Result test contains: 1
46:http://umvf.cochin.univ-paris5.fr/avare2.pdf
(time: 00:00:12)
01-08-2004, 07:47 AM   #18
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. Okay, the condition below is what is failing, given that the else branch ran and print_r($result_test) output 1:
PHP Code:
if (is_array($result_test)
     && $result_test['status'] == 'HTML'
     || $result_test['status'] == 'PLAINTEXT'
     || $result_test['status'] == 'MSWORD' && PHPDIG_INDEX_MSWORD == true && file_exists(PHPDIG_PARSE_MSWORD) && $is_exec_command_msword
     || $result_test['status'] == 'MSEXCEL' && PHPDIG_INDEX_MSEXCEL == true && file_exists(PHPDIG_PARSE_MSEXCEL) && $is_exec_command_msexcel
     || $result_test['status'] == 'PDF' && PHPDIG_INDEX_PDF == true && file_exists(PHPDIG_PARSE_PDF) && $is_exec_command_pdf

There was another strange occurrence here that involved cookies. If cookies are not the issue, try crawling again and copy and paste the entries for the crawl from the raw Apache logs.
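
For reference, here is a minimal sketch of how PHP groups a condition shaped like the one quoted in this post. Because && binds more tightly than ||, each || alternative forms its own group, so the PDF alternative is true only when every one of its && terms is true. The function shouldFetch and its parameters are hypothetical stand-ins for the constants and variables used in phpdigTempFile; this is not PhpDig code.

```php
<?php
// Hypothetical stand-in for two alternatives of the condition above.
//   $indexPdf     plays the role of PHPDIG_INDEX_PDF
//   $parserExists plays the role of file_exists(PHPDIG_PARSE_PDF)
//   $parserIsExec plays the role of $is_exec_command_pdf
function shouldFetch($result_test, $indexPdf, $parserExists, $parserIsExec)
{
    // && binds tighter than ||, so this reads as (A && B) || (C && D && E && F).
    return is_array($result_test)
        && $result_test['status'] == 'HTML'
        || $result_test['status'] == 'PDF' && $indexPdf == true && $parserExists && $parserIsExec;
}

$pdf = array('status' => 'PDF');
var_dump(shouldFetch($pdf, true, true, true));   // bool(true): every PDF term holds
var_dump(shouldFetch($pdf, true, true, false));  // bool(false): one failing term sinks the branch
```

So a PDF status on its own is not enough; a single false term in that alternative sends execution to the final else.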
01-09-2004, 01:48 AM   #19
zevince
Green Mole
Join Date: Dec 2003
Posts: 26
Hmm, I've tried setting cookies for http://umvf.cochin.univ-paris5.fr/ to always be permitted, but it doesn't seem to change anything in the crawl:

Quote:
File date unchanged
41:http://umvf.cochin.univ-paris5.fr/ru...id_rubrique=99
(time: 00:00:10)

HTML <--- Status
42:http://umvf.cochin.univ-paris5.fr/avare3.html
(time: 00:00:10)
+
level 1...
Evaluation is false for URI: http://umvf.cochin.univ-paris5.fr/avare2.pdf
Array ( [status] => PDF [lm_date] => Wed, 07 Jan 2004 13:39:43 GMT [path] => /avare2.pdf [host] => umvf.cochin.univ-paris5.fr [cookies] => Array ( ) ) Result test contains: 1
43:http://umvf.cochin.univ-paris5.fr/avare2.pdf
(time: 00:00:10)

HTML <--- Status
Duplicate of an existing document
44:http://umvf.cochin.univ-paris5.fr/spip_login.php3
(time: 00:00:10)

No links in the temporary table

And here is the output of the Apache "access_combined_log":

Quote:
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /css/style_uol.css HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /css/habillage.css HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /css/impression.css HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /css/lien.css HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /article.php3?id_article=177 HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "GET /article.php3?id_article=177 HTTP/1.0" 302 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "GET /desole.html HTTP/1.0" 200 1036 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /menu.css HTTP/1.1" 404 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /css/spip_style.css HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /rubrique.php3?id_rubrique=99 HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /menu.css HTTP/1.1" 404 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /css/spip_style.css HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "HEAD /avare3.html HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:37 +0100] "GET /avare3.html HTTP/1.0" 200 61 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:38 +0100] "HEAD /avare2.pdf HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:38 +0100] "HEAD /avare2.pdf HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:38 +0100] "HEAD /ecrire/ HTTP/1.1" 302 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:38 +0100] "HEAD //../spip_login.php3 HTTP/1.1" 200 0 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
umvf.cochin.univ-paris5.fr - - [09/Jan/2004:11:36:38 +0100] "GET /spip_login.php3 HTTP/1.0" 200 2222 "-" "PhpDig/1.6.4 (+http://www.phpdig.net/robot.php)"
nestor.lurt-cochin.prd.fr - admin [09/Jan/2004:11:36:38 +0100] "POST /recherche/admin/spider.php HTTP/1.1" 200 8691 "http://umvf.cochin.univ-paris5.fr/recherche/admin/index.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
01-09-2004, 03:26 AM   #20
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. It really seems like the following returns false: $result_test['status'] == 'PDF' && PHPDIG_INDEX_PDF == true && file_exists(PHPDIG_PARSE_PDF) && $is_exec_command_pdf

However, it looks like you echo $result_test_http which says status is PDF but then later $result_test says one: Array ( [status] => PDF [lm_date] => Wed, 07 Jan 2004 13:39:43 GMT [path] => /avare2.pdf [host] => umvf.cochin.univ-paris5.fr [cookies] => Array ( ) ) Result test contains: 1
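
A side note on the trailing "1": called with one argument, print_r() prints the array immediately and returns true, and true concatenates as "1". That would explain why the Array ( ... ) dump appears before "Result test contains: 1". Below is a small sketch, assuming a PHP version in which print_r() accepts the second (return) argument; the sample array is made up.

```php
<?php
$result_test = array('status' => 'PDF'); // made-up sample data

// One argument: print_r() echoes the dump right away and returns true.
// Output buffering is used here only to capture what it printed.
ob_start();
$line = "Result test contains: " . print_r($result_test) . "<br>";
$printed = ob_get_clean();

// Second argument true: print_r() returns the dump as a string instead.
$fixed = "Result test contains: " . print_r($result_test, true) . "<br>";

echo $line, "\n";   // Result test contains: 1<br>
echo $fixed, "\n";  // the label followed by the array dump
```

In other words, the "1" says nothing about the contents of $result_test; the array itself is what was printed just before it.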

Let's echo the items below right before the phpdigTempFile function is called and again at the start of the function, then try another index:

In spider.php, add the following echo statements:
PHP Code:
// sets $tempfile and $tempfilesize

/*****/
echo "<br><br>Is result test http an array: " . is_array($result_test_http) . "<br>";
echo "What is result test http status: " . $result_test_http['status'] . "<br><br>";
/*****/

extract(phpdigTempFile($url_indexing,$result_test_http,$relative_script_path.'/admin/temp/'));

In robot_functions.php, add the following echo statements:
PHP Code:
function phpdigTempFile($uri,$result_test,$prefix='temp/',$suffix1='1.tmp',$suffix2='2.tmp') {

/*****/
echo "<br><br>Is result test an array: " . is_array($result_test) . "<br>";
echo "What is result test status: " . $result_test['status'] . "<br>";
echo "Use is executable is set to: " . USE_IS_EXECUTABLE_COMMAND . "<br>";
echo "Index the pdf is set to: " . PHPDIG_INDEX_PDF . "<br>";
echo "Parse the pdf is set to: " . PHPDIG_PARSE_PDF . "<br>";
echo "Does parse pdf exist: " . file_exists(PHPDIG_PARSE_PDF) . "<br>";
echo "Is parse pdf executable: " . is_executable(PHPDIG_PARSE_PDF) . "<br><br>";
/*****/

// $temp_filename = md5(time()+getmypid()).$suffix;
01-12-2004, 03:40 AM   #21
zevince
Green Mole
Join Date: Dec 2003
Posts: 26
OK, I've added the echo statements and respidered avare3.html:

Quote:
HTML <--- Status
Duplicate of an existing document
40:http://umvf.cochin.univ-paris5.fr/ar...id_article=177
(time: 00:00:50)

File date unchanged
41:http://umvf.cochin.univ-paris5.fr/ru...id_rubrique=99
(time: 00:00:50)



Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable:

HTML <--- Status
42:http://umvf.cochin.univ-paris5.fr/avare3.html
(time: 00:00:50)
+
level 1...


Is result test http an array: 1
What is result test http status: PDF



Is result test an array: 1
What is result test status: PDF
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable:

Evaluation is false for URI: http://umvf.cochin.univ-paris5.fr/avare2.pdf
Array ( [status] => PDF [lm_date] => Wed, 07 Jan 2004 13:39:43 GMT [path] => /avare2.pdf [host] => umvf.cochin.univ-paris5.fr [cookies] => Array ( ) ) Result test contains: 1
43:http://umvf.cochin.univ-paris5.fr/avare2.pdf
(time: 00:00:50)



Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable:

HTML <--- Status
Duplicate of an existing document
44:http://umvf.cochin.univ-paris5.fr/ar...id_article=131
(time: 00:00:51)



Is result test http an array: 1
What is result test http status: HTML



Is result test an array: 1
What is result test status: HTML
Use is executable is set to: 1
Index the pdf is set to: 1
Parse the pdf is set to: /usr/local/bin/pstotext
Does parse pdf exist: 1
Is parse pdf executable:

HTML <--- Status
Duplicate of an existing document
45:http://umvf.cochin.univ-paris5.fr/ar...id_article=134
(time: 00:00:52)
01-12-2004, 04:19 AM   #22
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
PHP Code:
define('PHPDIG_PARSE_PDF','/usr/local/bin/pstotext');

echo "Is parse pdf executable: " . is_executable(PHPDIG_PARSE_PDF) . "<br><br>";

// Output: "Is parse pdf executable: " (empty, meaning false)

Hi. The "Is parse pdf executable" line prints no value, which means is_executable() returned false. This is why the expression in the if statement evaluates to false.
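
The empty value is ordinary PHP behaviour: false converts to an empty string when concatenated, while true converts to "1". Here is a short sketch of making the value visible with var_export(); the $isExec variable is only a stand-in for the is_executable() result.

```php
<?php
$isExec = false; // stand-in for is_executable(PHPDIG_PARSE_PDF)

// Plain concatenation: false becomes "", so nothing follows the label.
$plain = "Is parse pdf executable: " . $isExec;

// var_export() with true as second argument returns the literal word.
$explicit = "Is parse pdf executable: " . var_export($isExec, true);

echo $plain, "<br>";
echo $explicit, "<br>";
```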

From php.net: "If a directory is not executable, then you cannot get details on the files in the directory - this includes the permissions."

Try checking that the usr, local, and bin directories, as well as the pstotext file, all have 755 permissions.
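
One way to check this is from PHP itself, so that the result reflects the user the web server runs as. This is only a sketch: describePathPermissions is a hypothetical helper, not part of PhpDig, and it simply reports fileperms() and is_executable() for the file and each of its parent directories.

```php
<?php
// Hypothetical helper (not part of PhpDig): walk from the given path up to
// the filesystem root and record the permission bits of every component.
function describePathPermissions($path)
{
    $report = array();
    $current = $path;
    while (true) {
        // @ suppresses the warning raised when the path cannot be stat'ed.
        $perms = @fileperms($current);
        $report[$current] = array(
            // Keep only the rwx bits and format them as octal, e.g. "755".
            'mode' => ($perms === false) ? null : sprintf('%03o', $perms & 0777),
            'executable' => is_executable($current),
        );
        $parent = dirname($current);
        if ($parent === $current) {
            break; // reached the top ("/" or "."), nothing left to climb
        }
        $current = $parent;
    }
    return $report;
}

foreach (describePathPermissions('/usr/local/bin/pstotext') as $item => $info) {
    $mode = ($info['mode'] === null) ? 'n/a' : $info['mode'];
    echo $item, ' mode=', $mode,
         ' executable=', var_export($info['executable'], true), "<br>";
}
```

Any component that shows a mode other than 755, or "n/a", is the first place to look.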
01-12-2004, 04:51 AM   #23
zevince
Green Mole
Join Date: Dec 2003
Posts: 26
OK, it's working. The directories in the /usr/local/bin/pstotext path and the binaries were set to chmod 751 rather than 755.

Sorry it took so long to solve this, and thank you very much for your help!