
Indexing by command line interface


Skop
09-17-2003, 11:59 PM
Hi,

I installed PhpDig 1.6.2 on a Linux machine and now I'm trying to index from the command line.

/usr/bin/php4 -f [path]/search/admin/spider.php forceall >> /tmp/phpdig.log

Nothing happened! The phpdig.log file contains something like

848: old priority 0, new priority 18

and indexing (reindexing of existing hosts) doesn't work.

Any ideas?

Thanks a lot.
JS

Charter
09-18-2003, 07:19 AM
Hi. Here are some suggestions.


If PHP is running in CGI mode, perhaps try the following:

#!/usr/bin/php4 -f [path]/search/admin/spider.php forceall >> /tmp/phpdig.log

If PHP is not in CGI mode and the php binary can be run from any directory (it is on your PATH), cd to the search dir and try the following:

php -f admin/spider.php forceall > phpdig.log

If this is the first time indexing, change forceall to http://www.domain.com
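For cron jobs, the non-CGI suggestion can be wrapped in a small shell function. This is a minimal sketch, not part of PhpDig: `run_spider` and the `PHP_BIN` variable are hypothetical names. It appends to the log instead of overwriting it and also captures stderr, so PHP error messages show up in the log.

```shell
#!/bin/sh
# run_spider SEARCH_DIR [TARGET] [LOG]
# Hypothetical helper (not part of PhpDig): runs admin/spider.php from inside
# the search dir. TARGET defaults to "forceall" (reindex existing hosts); pass
# a start URL such as http://www.domain.com for a first-time index.
# PHP_BIN is an assumed variable: set it when the CLI binary is not on PATH
# as "php" (for example PHP_BIN=/usr/bin/php4).
run_spider() {
  search_dir=$1
  target=${2:-forceall}
  log=${3:-phpdig.log}
  # The subshell keeps the caller's working directory unchanged. The log file
  # is opened before the cd, so a relative LOG is relative to the caller.
  # stderr goes to the log too, so PHP errors are not lost.
  (cd "$search_dir" && "${PHP_BIN:-php}" -f admin/spider.php "$target") >> "$log" 2>&1
}
```

For example, `PHP_BIN=/usr/bin/php4 run_spider [path]/search forceall /tmp/phpdig.log` is roughly the first command in this thread, but run from inside the search dir.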


In the config file, change the following value to 1 if you are updating before seven days have passed:

define('LIMIT_DAYS',7); //default days before reindex a page

To start over and index from scratch, do the following:

empty all the PhpDig database tables
delete all files that may be in the temp dir
delete all files in the text_content dir except keepalive.txt
run spider.php from a browser or command prompt
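The file-cleanup steps in that list can be scripted. A minimal sketch, assuming a Linux `find` that supports `-mindepth`, `-maxdepth` and `-delete`: the helper name `clear_dir_except` is hypothetical, and the directory paths are passed in because the temp dir location depends on your install. Emptying the database tables is not covered here.

```shell
#!/bin/sh
# clear_dir_except DIR KEEP
# Hypothetical helper (not part of PhpDig): deletes every regular file
# directly inside DIR except the one named KEEP. Subdirectories are kept.
clear_dir_except() {
  dir=$1
  keep=$2
  # Refuse to run on a missing directory rather than silently doing nothing.
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  # -mindepth/-maxdepth/-delete are GNU/BSD find extensions (fine on Linux).
  find "$dir" -mindepth 1 -maxdepth 1 -type f ! -name "$keep" -delete
}
```

For example, `clear_dir_except [path]/search/text_content keepalive.txt` would empty text_content but leave keepalive.txt in place.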

Before running spider.php from the command prompt, change the following values in the config file to 1 if you want only one level:

define('SPIDER_MAX_LIMIT',20); //max recurse levels in spider
define('SPIDER_DEFAULT_LIMIT',3); //default value
define('RESPIDER_LIMIT',4); //recurse limit for update

Skop
09-19-2003, 01:17 AM
Originally posted by Charter


php -f admin/spider.php forceall > phpdig.log

If this is the first time indexing, change forceall to http://www.domain.com



Nothing, nothing happened. I took a look at the spider.php source, and I think the program hangs on line 80:

print @exec('renice 18 '.getmypid()).$br;

I also tried cleaning the tables etc. as you wrote, but the DB stays empty and spider.php doesn't work.

Thanks a lot.

Rolandks
09-19-2003, 03:29 AM
Hmm,
The command line is tricky. It also took me many attempts before it worked.
I think this should be changed in one of the next versions so that it works better on all operating systems. That matters when you want to index content sites frequently, e.g. daily, with cron jobs or Windows scheduled tasks.

Read this:
http://www.phpdig.net/showthread.php?s=&threadid=56

-Roland-

Skop
09-19-2003, 05:20 AM
[...]
Read this:
http://www.phpdig.net/showthread.php?s=&threadid=56

-Roland-

I read this, but unfortunately it doesn't help me ;) Now I'll try to hack the code a little... If you have other ideas, I'm here! :D

Thanks a lot
JS

Charter
09-19-2003, 07:06 AM
Hi. It looks like the renice command is working, since "848: old priority 0, new priority 18" appears in the log file, but you could try commenting that line out. The renice command sets the priority of the spidering process.

Are there any files besides keepalive.txt in the text_content dir?

Skop
09-19-2003, 07:37 AM
Originally posted by Charter
[...]
The renice command is for setting the priority of the spidering process.

Are there any files besides keepalive.txt in the text_content dir?

I commented out this line, but as I wrote, nothing happened.

The text_content dir is empty (except for keepalive.txt, 2 bytes).

For now I have this workaround: I use lynx to call update.php:


lynx -dump -auth=yourlogin:yourpwd '[URL]/pathtosearch/admin/update.php?site_id=XXX&exp=1' >/tmp/output 2>/tmp/erroroutput

This works :cool:
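The lynx workaround can be wrapped so a cron job only has to pass a site_id. A minimal sketch, not part of PhpDig: the function names and the `BASE_URL` and `PHPDIG_AUTH` variables are hypothetical placeholders for the `[URL]/pathtosearch` part and for `yourlogin:yourpwd`. Only `-dump` and `-auth=ID:PASSWD`, both standard lynx options, are used, and output is appended rather than overwritten.

```shell
#!/bin/sh
# Hypothetical wrapper around the lynx workaround (not part of PhpDig).
# BASE_URL stands for "[URL]/pathtosearch" and PHPDIG_AUTH for
# "yourlogin:yourpwd"; both are placeholders you must set yourself.

# Build the update.php address for one site_id.
phpdig_update_url() {
  printf '%s/admin/update.php?site_id=%s&exp=1\n' "$1" "$2"
}

# Ask the web admin to update one site; output and errors are appended
# to the same files as in the original command.
phpdig_update() {
  url=$(phpdig_update_url "$BASE_URL" "$1")
  # The URL must stay quoted: an unquoted & would end the command and
  # put lynx in the background.
  lynx -dump -auth="$PHPDIG_AUTH" "$url" >> /tmp/output 2>> /tmp/erroroutput
}
```

With the variables set, a nightly cron job could run a script that calls `phpdig_update` once per site_id.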

JS

Charter
09-19-2003, 07:50 AM
Originally posted by Skop

lynx -dump -auth=yourlogin:yourpwd '[URL]/pathtosearch/admin/update.php?site_id=XXX&exp=1' >/tmp/output 2>/tmp/erroroutput



Great! Glad it's working. Interesting that lynx will work but php won't. Are you able to do the following from the command prompt?

php -f test.php

where test.php is the below:

<?php
echo "test";
?>

Skop
10-14-2003, 02:23 AM
Originally posted by Charter

php -f test.php



Hi, sorry for the late answer. I tried what you suggested, and it works. I think the problem is in spider.php and how it gets its input from STDIN.