PhpDig.net


tomas 02-16-2004 12:22 PM

running out of memory
 
hello list,

spidering a bunch of pdf-files (about 250) in one directory -
spider.php runs out of mem (8k in php.ini) after file 50 -
setting php.ini to 32k error after file 110 -
setting to 128k error after file 220 -

i think there is a bug in spider.php with freeing mem ???

any ideas???

tomas

Charter 02-16-2004 01:42 PM

Hi. Is it kb or mb? Maybe try breaking the list into smaller lists and/or index from shell.
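
A minimal sketch of the "smaller lists" idea, assuming the PDF URLs are already available as a plain array. The helper name split_into_batches, the batch size of 50 and the example.com URLs are made up for illustration and are not part of PhpDig. How each batch is then handed to spider.php is left out.

```php
<?php
// Hypothetical helper (not part of PhpDig): split a long list of URLs
// into smaller batches so that each spider run handles fewer files.
function split_into_batches(array $urls, $batchSize)
{
    // array_chunk() returns consecutive slices of at most $batchSize items.
    return array_chunk($urls, $batchSize);
}

// Example input: 250 made-up PDF URLs, matching the count in the question.
$urls = array();
for ($i = 1; $i <= 250; $i++) {
    $urls[] = "http://www.example.com/docs/file" . $i . ".pdf";
}

$batches = split_into_batches($urls, 50);
foreach ($batches as $n => $batch) {
    printf("batch %d: %d urls\n", $n + 1, count($batch));
}
```

Each batch can then be indexed in its own run, so memory used by one run is released before the next one starts.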

tomas 02-16-2004 02:47 PM

sorry charter,

of course 8mb -> 32mb ->128mb

why a smaller list - is spider.php eating my memory :-)
when it is called from the browser ???

tomas

Charter 02-16-2004 03:35 PM

Hi. Using shell bypasses the web server. What version of PHP are you using and what's your OS? Maybe this is a timeout issue? What are the actual errors that you are receiving?

tomas 02-16-2004 03:47 PM

hi charter,

php-4.3.3
fedora core_1
apache 2

by the way - a trick that may be helpful for others who open pdf-files via javascript, which is not recognized by spider.php:

1) on one of the websites make a dummy-link eg. <a href="pdf.php"></a>
2) setup pdf.php in website-root:

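<?php
// List everything below the web root and print an empty link for each PDF,
// so the spider can find files that are otherwise only opened via JavaScript.
$files = explode("\n", trim(`find .|sort`));
foreach ($files as $file) {
    // strpos() returns false when ".pdf" is absent, so compare strictly.
    if (!is_dir($file) && strpos($file, ".pdf") !== false) {
        printf("<a href=\"%s\"></a><br>\n", htmlspecialchars($file));
    }
}
?>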

regards
tomas

Charter 02-16-2004 08:40 PM

Hi. I'm not sure if the issue is related to pdftotext and/or PhpDig. Maybe try memory_get_usage and get_defined_vars within the spider.php file to see if anything unusual shows.
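
A rough sketch of how memory_get_usage and get_defined_vars could be combined for this kind of check. The helper report_memory and the demo function are hypothetical and not part of PhpDig, and the serialized length is only an approximate measure of how much memory a variable holds. The try/catch assumes PHP 5 or later.

```php
<?php
// Hypothetical debugging helper (not part of PhpDig): print current memory
// use and the variables that are largest when serialized.
function report_memory($label, array $vars, $top = 3)
{
    $sizes = array();
    foreach ($vars as $name => $value) {
        // strlen(serialize()) is only a rough size estimate. serialize()
        // throws for some values (e.g. closures), so those are skipped.
        try {
            $sizes[$name] = strlen(serialize($value));
        } catch (Exception $e) {
            continue;
        }
    }
    arsort($sizes);                               // largest first, keys kept
    $sizes = array_slice($sizes, 0, $top, true);  // keep the top entries
    printf("[%s] memory_get_usage() = %d bytes\n", $label, memory_get_usage());
    foreach ($sizes as $name => $bytes) {
        printf("  \$%s ~ %d bytes\n", $name, $bytes);
    }
    return $sizes;
}

// Demo: call it from inside a function so get_defined_vars() only
// returns that function's local variables.
function demo()
{
    $small = 'abc';
    $big = str_repeat('x', 100000);
    return report_memory('demo', get_defined_vars());
}

$largest = demo();
```

Calling something like this once per indexed file would show whether usage keeps growing and which variable is responsible.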

tomas 02-18-2004 02:22 PM

hello charter,

setting php.ini back to 8mb and running spider.php with bash/cron:

spider dies - and his last words were :-)

Fatal error: Allowed memory size of 8388608 bytes exhausted (tried to allocate 653 bytes) in /var/www/html/search/admin/robot_functions.php on line 707


?
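
For reference, the 8388608 bytes in the message is the 8 MB limit expressed in bytes (8 * 1024 * 1024). A small sketch of that conversion follows. The helper name ini_size_to_bytes is hypothetical, and it only handles the K, M and G suffixes.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand size such as "8M"
// into bytes. In php.ini, K, M and G are powers of 1024.
function ini_size_to_bytes($value)
{
    $value = trim($value);
    $number = (int) $value;                  // leading digits, e.g. 8 from "8M"
    $unit = strtoupper(substr($value, -1));  // last character, e.g. "M"
    switch ($unit) {
        case 'G':
            return $number * 1024 * 1024 * 1024;
        case 'M':
            return $number * 1024 * 1024;
        case 'K':
            return $number * 1024;
        default:
            return $number;                  // plain byte count
    }
}

printf("%d\n", ini_size_to_bytes('8M'));     // prints 8388608
```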

Charter 02-19-2004 06:04 AM

Hi. In the phpdigTempFile function of robot_functions.php, perhaps replace the following:
PHP Code:

$f_handler = fopen($tempfile1,'wb');
if (is_array($file_content)) {
    fwrite($f_handler,implode('',$file_content));
}
fclose($f_handler);
$tempfilesize = filesize($tempfile1);

with the following:
PHP Code:

$f_handler = fopen($tempfile1,'wb');
if (is_array($file_content)) {
    fwrite($f_handler,implode('',$file_content));
}
fclose($f_handler);
unset($file_content);
$tempfilesize = filesize($tempfile1);
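
A standalone sketch of the effect the added unset() is meant to have. It is not PhpDig code: the array here is filled with made-up data that only stands in for $file_content, and the exact byte counts will differ between PHP versions.

```php
<?php
// Build a large array of lines, similar in spirit to $file_content.
$before = memory_get_usage();
$file_content = array();
for ($i = 0; $i < 20000; $i++) {
    $file_content[] = str_repeat('x', 100) . $i;
}
$withData = memory_get_usage();

// Release the array once it is no longer needed.
unset($file_content);
$afterUnset = memory_get_usage();

printf("before: %d, with data: %d, after unset: %d\n",
    $before, $withData, $afterUnset);
```

If other variables still reference the same data, unset() on one of them does not free it, which may be why a single unset() is not always enough.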


tomas 02-19-2004 07:52 AM

hi charter,

i tried and tested a bit -
now i'm sure the cause is pdfs larger than 2 or 3 mb
with lots of vector-graphics inside.

so - how could we setup spider.php - to go on
spidering the next files even if one or more files
are too big for allowed memory setting in php.ini.

thanks
tomas

Charter 02-19-2004 09:10 AM

Hi. If you are asking to do something like "if fatal error, no more memory, so skip this file and go to next file" I doubt this can be done because, by the time PHP encounters the fatal error, no more memory, there isn't room to do anything else.

Untested but what you might try though is the following. In the phpdigTempFile function, add the following:
PHP Code:

if (memory_get_usage() + 2000000 > 8000000) {
    return array('tempfile'=>0,'tempfilesize'=>0);
}


right before the following line:
PHP Code:

$f_handler = fopen($tempfile1,'wb');

That way at least if the current memory being used (in bytes) plus 2MB is greater than 8MB then the function will end, the file shouldn't be indexed, and the index process should continue.
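
The same check can be written as a small function so the numbers are easier to change. This is only a sketch: memory_is_tight is a hypothetical name, not a PhpDig function, and the third parameter exists only so the logic can be tried with fixed values.

```php
<?php
// Hypothetical guard (not part of PhpDig): true when the memory in use
// plus the headroom we want to keep free would exceed the limit.
function memory_is_tight($headroomBytes, $limitBytes, $usedBytes = null)
{
    if ($usedBytes === null) {
        $usedBytes = memory_get_usage();   // current usage in bytes
    }
    return $usedBytes + $headroomBytes > $limitBytes;
}

// With 5 MB in use, 2 MB headroom still fits under an 8 MB limit.
var_dump(memory_is_tight(2000000, 8000000, 5000000));   // bool(false)
// With 6.5 MB in use it does not, so the file would be skipped.
var_dump(memory_is_tight(2000000, 8000000, 6500000));   // bool(true)
```

Inside phpdigTempFile the call would be memory_is_tight(2000000, 8000000), followed by the same return array('tempfile'=>0,'tempfilesize'=>0) as above.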

tomas 02-19-2004 10:52 AM

sorry charter - to bother you again and again,
but nothing works.

tomas

Charter 02-19-2004 11:12 AM

Hi. Try changing the numbers as in the code below, or just make a list of only those PDFs that are smaller than the 2 or 3 MB ones that are using so much memory.
PHP Code:

if (memory_get_usage() + 1000000 > 3000000) {
    return array('tempfile'=>0,'tempfilesize'=>0);
}
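
For the second suggestion, a list of the smaller PDFs could be built along these lines. This is a sketch only: files_smaller_than is a hypothetical helper, the 2000000 byte threshold is just an example, and the demo uses temporary files in place of real PDFs.

```php
<?php
// Hypothetical helper (not part of PhpDig): keep only the files whose
// size on disk is below $maxBytes.
function files_smaller_than(array $paths, $maxBytes)
{
    $keep = array();
    foreach ($paths as $path) {
        if (is_file($path) && filesize($path) < $maxBytes) {
            $keep[] = $path;
        }
    }
    return $keep;
}

// Demo with two temporary files standing in for PDFs.
$small = tempnam(sys_get_temp_dir(), 'pdf');
$large = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($small, str_repeat('a', 1000));
file_put_contents($large, str_repeat('a', 3000000));

$selected = files_smaller_than(array($small, $large), 2000000);
printf("%d of 2 files selected\n", count($selected));

// Clean up the temporary files.
unlink($small);
unlink($large);
```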



