PhpDig.net - Can I close Putty during command line indexing?

Nosmada

01-22-2004 08:54 PM

Can I close Putty during command line indexing?

Can I close Putty during command line indexing or will it stop indexing? This is what I do with perl scripts:

nohup perl nph-build.cgi --all > log.txt &

[nohup keeps the script running after you hang up, so you can come back later; '> log.txt' is where the output goes, and the trailing '&' runs it in the background]

tail -n50 -f log.txt

[this line gets you back into the log to see what is going on in realtime]

Can I use the same commands in php? Do I need to? How does it work?

Nosmada

01-22-2004 09:31 PM

I just closed Putty and am not sure if it is still running. I have 45,000 pages. If I run it continuously it will take almost 4 days.

What should I do? I guess I should adjust some of the settings below, but I don't quite understand what they do or what the impact will be on the search results.

define('SPIDER_MAX_LIMIT',20); //max recurse levels in spider
define('SPIDER_DEFAULT_LIMIT',3); //default value
define('RESPIDER_LIMIT',4); //recurse limit for update

define('LIMIT_DAYS',7); //default days before reindex a page

Charter

01-23-2004 11:05 AM

Hi. The $limit variable is set to the number selected in the drop down box via the admin panel for search depth. The numbers in this drop down box go from zero to SPIDER_MAX_LIMIT. If running from shell or updating a site via the admin panel, $limit is set according to the following code:

PHP Code:

if (!isset($limit) or (int)$limit > SPIDER_MAX_LIMIT) {
    if ($run_mode != 'cgi') {
        $limit = RESPIDER_LIMIT;
    }
    else {
        $limit = SPIDER_MAX_LIMIT;
    }
}

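The $run_mode variable is set to cgi if indexing from shell and is set to http if indexing from the browser interface. In other words, when $limit is not set or exceeds SPIDER_MAX_LIMIT, a shell run falls back to SPIDER_MAX_LIMIT and a browser run falls back to RESPIDER_LIMIT. The SPIDER_DEFAULT_LIMIT is currently not used in the code other than as a defined constant. The LIMIT_DAYS is the number of days that should pass before a page is reindexed.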

You can use shell commands to run PhpDig, but if you wish to use shell commands via PHP, check out this page for various methods.

Nosmada

01-23-2004 11:18 AM

Okay, I looked through that whole page and found the following shell command to make it run in the background.

php -q foobar.php >/dev/null 2>&1

When I log back in, where do I go to get the realtime output log, or do I have to add more code onto the end? What are those parameters on the end, like '>/dev/null 2>&1'?

Charter

01-23-2004 11:46 AM

The '> /dev/null 2>&1' redirects STDOUT and STDERR to /dev/null, which discards them, so nothing can be read back from /dev/null afterwards. To background something, add a trailing '&', as in '> /dev/null 2>&1 &'. If you want to see the output, try using 'php -f spider.php [option] > phpdiglog.txt &' from the admin directory and then check the phpdiglog.txt file in the admin directory for output. Available options are listed here.
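
To make the redirection parts concrete, here is a minimal sketch in plain sh. The noisy function is a hypothetical stand-in for an indexer, not part of PhpDig, and the file names are made up for the example. It shows that output sent to /dev/null is gone, that '>' alone captures only STDOUT, and that adding '2>&1' after the file redirect captures STDERR as well.

```shell
#!/bin/sh
# Hypothetical stand-in for an indexer: one line to STDOUT, one to STDERR.
noisy() {
    echo "indexed page"
    echo "warning: slow page" >&2
}

workdir=$(mktemp -d)

# Both streams go to /dev/null and are discarded; nothing is saved anywhere.
noisy > /dev/null 2>&1

# Only STDOUT goes to the file; STDERR still goes to the terminal.
noisy > "$workdir/stdout_only.txt"

# '2>&1' after the file redirect sends STDERR to the same file as STDOUT.
noisy > "$workdir/both.txt" 2>&1

# Count the lines captured in each file.
stdout_lines=$(grep -c . "$workdir/stdout_only.txt")
both_lines=$(grep -c . "$workdir/both.txt")
echo "stdout only: $stdout_lines line(s); both streams: $both_lines line(s)"

rm -rf "$workdir"
```

The order matters: '2>&1' has to come after '> file', because it means "send STDERR wherever STDOUT points right now".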

Nosmada

01-23-2004 11:56 AM

...and then check the phpdiglog.txt file in the admin directory for output.

How do I call this from the command line when I log back in? With the perl script it would be something like tail -n50 -f log.txt, so would it be tail -n50 -f phpdiglog.txt here?

And thanks for explaining the depth parameters, but just one more suggestion from you would help. With 45,000 pages, how would you deal with this? Probably comment out some or all of the shared borders (header, footer, left side, right side)? And maybe limit the depth? I know where to set the limit now from your explanation; I just don't know what to set it to, or what exactly is gained and what is lost.

Charter

01-23-2004 02:23 PM

Hi. Perhaps try using 'php -f spider.php [option] > phpdiglog.txt &' from the admin directory and then 'tail -f phpdiglog.txt' from the admin directory for output.
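
As a rough sketch of the whole round trip (start the job, disconnect, come back and read the log), the following plain sh script uses nohup as in the perl example earlier in the thread. The 'sh -c' loop is a hypothetical stand-in for 'php -f spider.php [option]', and the temporary log file stands in for phpdiglog.txt. Whether a plain '&' job survives closing the terminal depends on the shell and its settings, so nohup is the cautious choice.

```shell
#!/bin/sh
logfile=$(mktemp)

# Hypothetical stand-in for 'php -f spider.php [option]': prints three lines.
# nohup makes the job ignore the hangup signal; '&' puts it in the background.
nohup sh -c 'for n in 1 2 3; do echo "page $n indexed"; done' > "$logfile" 2>&1 &
job_pid=$!

# In a real session you could log out here and later follow the log
# with 'tail -f' on the log file. This sketch just waits for the job.
wait "$job_pid"

last_line=$(tail -n 1 "$logfile")
echo "last log line: $last_line"

rm -f "$logfile"
```

After logging back in, 'ps' with the saved process ID is one way to check whether the job is still running.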

As for what to set in the config file, that is up to you, but below is an example of search depth, assuming a simple five-page site with the following link structure:

Code:

pageA.html
-- pageA1.html
   -- pageA11.html
-- pageA2.html
   -- pageA21.html

Search Depth Zero: pageA.html indexed
Search Depth One: pageA.html, pageA1.html, pageA2.html indexed
Search Depth Two: pageA.html, pageA1.html, pageA2.html, pageA11.html, pageA21.html indexed
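
The same depth rule can be tried out with a small plain sh sketch. This is only an illustration of the idea, not PhpDig's spider code: the links list and the crawl function are hypothetical, and the sketch does not track visited pages, so it assumes a link structure without loops like the one above.

```shell
#!/bin/sh
# Links of the five-page example site, one "parent child" pair per line.
links="pageA.html pageA1.html
pageA.html pageA2.html
pageA1.html pageA11.html
pageA2.html pageA21.html"

# Print page $1 and every page reachable from it within $2 levels of links.
# The body runs in a subshell, so each recursive call has its own variables.
crawl() (
    page=$1
    depth=$2
    echo "$page"
    [ "$depth" -gt 0 ] || exit 0
    echo "$links" | while read -r parent child; do
        if [ "$parent" = "$page" ]; then
            crawl "$child" $((depth - 1))
        fi
    done
)

# Count the pages that would be indexed at each search depth.
for d in 0 1 2; do
    count=$(crawl pageA.html "$d" | grep -c .)
    echo "depth $d: $count page(s)"
done
```

On a real site each extra level can multiply the number of pages fetched, which is why the depth setting has such a large effect on indexing time.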

Nosmada

01-23-2004 03:46 PM

Thanks, Charter. I will give your command syntax a go and play around with the depth.
