Indexing by command line interface
Hi,
I installed PhpDig 1.6.2 on a Linux machine and now I'm trying to index from the command line.
/usr/bin/php4 -f [path]/search/admin/spider.php forceall >> /tmp/phpdig.log
Nothing happened! The phpdig.log contains something like:
848: old priority 0, new priority 18
and indexing (re-indexing of existing hosts) doesn't work.
Any ideas?
Thanks a lot.
JS
Charter
09-18-2003, 07:19 AM
Hi. Here are some suggestions.
If PHP is running in CGI mode, perhaps try the following:
#!/usr/bin/php4 -f [path]/search/admin/spider.php forceall >> /tmp/phpdig.log
If not in CGI mode, and PHP can be run from any directory, cd to the search dir and try the following:
php -f admin/spider.php forceall > phpdig.log
If this is the first time indexing, change forceall to http://www.domain.com
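The two cases above (forceall for re-indexing, a URL for a first index) can be wrapped in a small helper so the same command works from cron. This is a minimal sketch, assuming the PHP CLI binary is called php (override with PHP_BIN, e.g. /usr/bin/php4) and that you pass your own search directory; the function name run_spider is hypothetical, not part of PhpDig.

```shell
#!/bin/sh
# Hypothetical wrapper for the non-CGI invocation above.
# Assumptions: $1 is your PhpDig "search" directory; the PHP binary is
# "php" unless PHP_BIN says otherwise. PHP_BIN=echo gives a dry run
# that only prints the arguments that would be passed to PHP.

run_spider() {
    search_dir=$1
    # "forceall" re-indexes hosts already in the database;
    # pass a URL instead for a first-time index.
    target=${2:-forceall}
    (
        # Run from the search dir, as suggested above, inside a subshell
        # so the caller's working directory is left unchanged.
        cd "$search_dir" || exit 1
        "${PHP_BIN:-php}" -f admin/spider.php "$target"
    )
}

# Usage (paths are placeholders):
#   run_spider /path/to/search > phpdig.log
#   run_spider /path/to/search http://www.domain.com > phpdig.log
```

Running in a subshell keeps the cd local, which matters if the wrapper is called from a longer maintenance script.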
In the config file, change the following value to 1 if you are updating before seven days have passed:
define('LIMIT_DAYS',7); //default days before reindex a page
To start over and index from scratch, do the following:
empty all the PhpDig database tables
delete all files that may be in the temp dir
delete all files in the text_content dir except keepalive.txt
run spider.php from a browser or command prompt
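The file-cleanup steps in the list above can be scripted. This is a sketch under assumptions: the directory paths are placeholders for your own install, the function name clear_dir is hypothetical, and emptying the database tables is left to your MySQL client because the table names are not given here.

```shell
#!/bin/sh
# Sketch of the file-cleanup part of "start over".
# Paths in the usage lines are placeholders (assumptions).

# clear_dir DIR [KEEP]: delete everything in DIR except the file named KEEP.
# Hidden (dot) files are not matched by the glob and are left alone.
clear_dir() {
    dir=$1
    keep=${2:-}
    for f in "$dir"/*; do
        [ -e "$f" ] || continue        # glob matched nothing: dir is empty
        case ${f##*/} in
            "$keep") ;;                # leave the file to keep in place
            *) rm -rf -- "$f" ;;
        esac
    done
}

# Usage (placeholder paths):
#   clear_dir /path/to/phpdig/temp_dir
#   clear_dir /path/to/search/text_content keepalive.txt
```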
Before running spider.php from the command prompt, change the following values in the config file to 1 if only one level is wanted:
define('SPIDER_MAX_LIMIT',20); //max recurse levels in spider
define('SPIDER_DEFAULT_LIMIT',3); //default value
define('RESPIDER_LIMIT',4); //recurse limit for update
Originally posted by Charter
php -f admin/spider.php forceall > phpdig.log
If this is the first time indexing, change forceall to http://www.domain.com
Nothing, nothing happened. I took a look at the spider.php source, and I think the program hangs on line 80:
print @exec('renice 18 '.getmypid()).$br;
I also tried cleaning the tables etc. as you wrote, but the DB stays empty and spider.php doesn't work.
Thanks a lot.
Rolandks
09-19-2003, 03:29 AM
Hmm,
The command line is tricky. It also took me many attempts before it worked.
I think it should be changed in one of the next versions to work better on all operating systems, because that matters when you index content sites frequently, e.g. daily, with cron jobs or Windows scheduled tasks.
Read this:
http://www.phpdig.net/showthread.php?s=&threadid=56
-Roland-
[...]
Read this:
http://www.phpdig.net/showthread.php?s=&threadid=56
-Roland-
I read this, but unfortunately it doesn't help me ;) Now I'll try to hack the code a little... If you have other ideas, I'm here! :D
Thanks a lot
JS
Charter
09-19-2003, 07:06 AM
Hi. It looks like the renice command is working, since "848: old priority 0, new priority 18" appears in the log file, but you could try commenting that line out. The renice command sets the priority of the spidering process.
Are there any files besides keepalive.txt in the text_content dir?
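If the renice line is commented out, one option is to lower the priority from outside instead, using the standard nice utility when launching the spider. This is a sketch, not PhpDig's own mechanism: the helper name low_priority is hypothetical, and niceness 18 simply mirrors the value in the log line above.

```shell
#!/bin/sh
# Sketch: start a command at lowered priority with nice(1), as an
# alternative to the renice call inside spider.php.

# low_priority LEVEL COMMAND [ARGS...]
low_priority() {
    level=$1
    shift
    nice -n "$level" "$@"
}

# Usage, from the PhpDig search directory (PHP binary name is an assumption):
#   low_priority 18 php -f admin/spider.php forceall > phpdig.log
```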
Originally posted by Charter
[...]
The renice command is for setting the priority of the spidering process.
Are there any files besides keepalive.txt in the text_content dir?
I commented out this line, but as I wrote, nothing happened.
The text_content dir is empty (except keepalive.txt, 2 bytes).
For now I have this solution: I use lynx to call update.php:
lynx -dump -auth=yourlogin:yourpwd '[URL]/pathtosearch/admin/update.php?site_id=XXX&exp=1' >/tmp/uotput 2>/tmp/erroroutput
this works :cool:
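The lynx workaround lends itself to the cron jobs mentioned earlier in the thread. A minimal sketch, assuming hypothetical settings BASE_URL, SITE_IDS, PHPDIG_USER and PHPDIG_PASS that you replace with your own URL, site ids and admin login; only the lynx -dump and -auth options from the command above are used.

```shell
#!/bin/sh
# Sketch of the lynx workaround as a cron-friendly script.
# BASE_URL, SITE_IDS, PHPDIG_USER, PHPDIG_PASS are hypothetical settings.
BASE_URL=${BASE_URL:-http://localhost/pathtosearch}
SITE_IDS=${SITE_IDS:-1}

# Build the update.php URL for one site_id (same query string as above).
update_url() {
    printf '%s/admin/update.php?site_id=%s&exp=1\n' "$1" "$2"
}

# Fetch the update page once per site id, appending output and errors to logs.
update_sites() {
    for id in $SITE_IDS; do
        lynx -dump -auth="${PHPDIG_USER:?set PHPDIG_USER}:${PHPDIG_PASS:?set PHPDIG_PASS}" \
            "$(update_url "$BASE_URL" "$id")" \
            >>/tmp/phpdig_output 2>>/tmp/phpdig_errors
    done
}

# Only fetch when called as "script.sh run", e.g. from a crontab line like:
#   0 3 * * * /path/to/phpdig_update.sh run
if [ "${1:-}" = run ]; then
    update_sites
fi
```

Keeping the login in environment variables rather than on the crontab line avoids exposing it in the crontab itself, though it is still visible in the lynx process arguments while the job runs.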
JS
Charter
09-19-2003, 07:50 AM
Originally posted by Skop
lynx -dump -auth=yourlogin:yourpwd '[URL]/pathtosearch/admin/update.php?site_id=XXX&exp=1' >/tmp/uotput 2>/tmp/erroroutput
Great! Glad it's working. Interesting that lynx will work but php won't. Are you able to do the following from the command prompt?
php -f test.php
where test.php contains the following:
<?php
echo "test";
?>
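The same check can be run as one command that creates test.php in a temporary directory and compares the output. A sketch only: the function name check_php_cli is hypothetical. Note that a CGI build of PHP may also print HTTP headers unless told not to, in which case this check reports a mismatch, which is itself a useful hint about which binary you are calling.

```shell
#!/bin/sh
# check_php_cli [PHP_BINARY]: succeed if the binary runs test.php and prints "test".
check_php_cli() {
    php_bin=${1:-php}
    if ! command -v "$php_bin" >/dev/null 2>&1; then
        echo "no PHP binary named $php_bin found" >&2
        return 1
    fi
    tmp=$(mktemp -d) || return 1
    # Same content as test.php above.
    printf '<?php\necho "test";\n?>\n' > "$tmp/test.php"
    out=$("$php_bin" -f "$tmp/test.php" 2>&1) || out=""
    rm -rf "$tmp"
    [ "$out" = "test" ]
}

# Usage:
#   check_php_cli /usr/bin/php4 && echo "PHP CLI OK"
```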
Originally posted by Charter
php -f test.php
Hi, sorry for the late answer. I tried what you suggested, and it works. I think the problem is in spider.php and how it gets its input from STDIN.