PhpDig.net

Old 01-22-2004, 08:54 PM   #1
Nosmada
Orange Mole
Join Date: Dec 2003
Posts: 32
Can I close Putty during command line indexing?

Can I close PuTTY during command-line indexing, or will that stop the indexing? This is what I do with Perl scripts:


nohup perl nph-build.cgi --all > log.txt &

[nohup keeps the command running even after you hang up (log out), so you can come back later; '> log.txt' is where the output goes, and the trailing '&' runs it in the background]

tail -n50 -f log.txt

[this line lets you follow the log in real time to see what is going on]

Can I use the same commands in php? Do I need to? How does it work?
__________________
Nosmada
Old 01-22-2004, 09:31 PM   #2
Nosmada
Orange Mole
Join Date: Dec 2003
Posts: 32
I just closed Putty and am not sure if it is still running. I have 45,000 pages. If I run it continuously it will take almost 4 days.

What should I do? I guess I should adjust some of the settings below, but I don't quite understand what they do or what impact they will have on the search results.

define('SPIDER_MAX_LIMIT',20); //max recurse levels in spider
define('SPIDER_DEFAULT_LIMIT',3); //default value
define('RESPIDER_LIMIT',4); //recurse limit for update

define('LIMIT_DAYS',7); //default days before reindex a page
Old 01-23-2004, 11:05 AM   #3
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. The $limit variable is set to the search depth selected in the drop-down box in the admin panel. The numbers in this drop-down box go from zero to SPIDER_MAX_LIMIT. If running from the shell or updating a site via the admin panel, $limit is set according to the following code:
PHP Code:
if (!isset($limit) or (int)$limit > SPIDER_MAX_LIMIT) {
    // $run_mode is 'cgi' when run from the shell, 'http' from the browser
    if ($run_mode != 'cgi') {
        $limit = RESPIDER_LIMIT;
    }
    else {
        $limit = SPIDER_MAX_LIMIT;
    }
}

The $run_mode variable is set to cgi if indexing from shell and is set to http if indexing from the browser interface. The SPIDER_DEFAULT_LIMIT is currently not used in the code other than as a defined constant. The LIMIT_DAYS is the number of days that should pass before a page is reindexed.
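As a sketch only, here is the same rule restated as a small POSIX shell function, so the branches can be tried with different inputs. It is not PhpDig code. The function name effective_limit is made up for illustration. The values 20 and 4 come from the SPIDER_MAX_LIMIT and RESPIDER_LIMIT lines quoted earlier in the thread, and an empty second argument stands in for an unset $limit.

```shell
#!/bin/sh
# Values taken from the config lines quoted earlier in the thread.
SPIDER_MAX_LIMIT=20
RESPIDER_LIMIT=4

# effective_limit RUN_MODE [LIMIT]   (hypothetical helper, illustration only)
# If no limit was given, or it exceeds SPIDER_MAX_LIMIT, shell runs ("cgi")
# fall back to SPIDER_MAX_LIMIT and browser runs ("http") fall back to
# RESPIDER_LIMIT; otherwise the given limit is kept unchanged.
effective_limit() {
  run_mode=$1
  limit=${2-}
  if [ -z "$limit" ] || [ "$limit" -gt "$SPIDER_MAX_LIMIT" ]; then
    if [ "$run_mode" != cgi ]; then
      limit=$RESPIDER_LIMIT
    else
      limit=$SPIDER_MAX_LIMIT
    fi
  fi
  echo "$limit"
}
```

For example, effective_limit cgi prints 20 (a shell run with no depth chosen). effective_limit http 25 prints 4, because 25 exceeds SPIDER_MAX_LIMIT and the run did not come from the shell.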

You can use shell commands to run PhpDig, but if you wish to use shell commands via PHP, check out this page for various methods.
__________________
Responses are offered on a voluntary if/as time is available basis, no guarantees. Double posting or bumping threads will not get your question answered any faster. No support via PM or email, responses not guaranteed. Thank you for your comprehension.
Old 01-23-2004, 11:18 AM   #4
Nosmada
Orange Mole
Join Date: Dec 2003
Posts: 32
Okay, I looked through that whole page and found the following shell command to make it run in the background:

php -q foobar.php >/dev/null 2>&1

When I log back in, where do I go to see the real-time output log, or do I have to add more to the end of the command? And what are those parameters on the end, like >/dev/null 2>&1?
Old 01-23-2004, 11:46 AM   #5
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
The '> /dev/null 2>&1' redirects STDOUT and STDERR to /dev/null, so if you want to background something, use '> /dev/null 2>&1 &' and then look in /dev/null for the results. Alternatively, you can try using 'php -f spider.php [option] > phpdiglog.txt &' from the admin directory and then check the phpdiglog.txt file in the admin directory for output. Available options are listed here.
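Here is what each piece of '> /dev/null 2>&1 &' does:

- '>' redirects STDOUT (file descriptor 1).
- '2>&1' sends STDERR (descriptor 2) to wherever STDOUT is currently going.
- A trailing '&' runs the command in the background.

One caveat: /dev/null discards everything written to it, so nothing can be read back from it afterwards. To check progress later, redirect to a regular file instead, as in the phpdiglog.txt command. The sketch below uses a hypothetical stand-in function, noisy, in place of a real indexing run, and demo file names chosen only for illustration.

```shell
#!/bin/sh
# noisy: hypothetical stand-in for a job that writes to both output streams.
noisy() {
  echo "progress: page indexed"          # STDOUT, file descriptor 1
  echo "warning: page not fetched" >&2   # STDERR, file descriptor 2
}

# Discard both streams; nothing can be recovered from /dev/null later.
noisy > /dev/null 2>&1

# Keep both streams: '>' points STDOUT at the file, then '2>&1' points
# STDERR at wherever STDOUT now goes (the same file).
noisy > demo-both.txt 2>&1

# Order matters: '2>&1' is applied first, while STDOUT is still the original
# output (normally the terminal), so only STDOUT ends up in the file.
noisy 2>&1 > demo-stdout.txt
```

Adding '&' at the end of either of the first two lines would run that command in the background.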
Old 01-23-2004, 11:56 AM   #6
Nosmada
Orange Mole
Join Date: Dec 2003
Posts: 32
...and then look in /dev/null for the results or phpdig.log.

How do I view this from the command line when I log back in? For the CGI script it would be something like tail -n50 -f phpdig.log?

And thanks for explaining the depth parameters, but one more suggestion from you should help. With 45,000 pages, how would you deal with this? Probably comment out some or all of the shared borders (header, footer, left side, right side)? And maybe limit the depth. I don't know what to limit it to or what I will lose. I know where to set the limit now from your explanation; I just don't know what value to use and what exactly is gained and what is lost.
Old 01-23-2004, 02:23 PM   #7
Charter
Head Mole
Join Date: May 2003
Posts: 2,539
Hi. Perhaps try using 'php -f spider.php [option] > phpdiglog.txt &' from the admin directory and then 'tail -f phpdiglog.txt' from the admin directory for output.
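The command above puts the job in the background with '&' but does not use nohup. Whether a plain background job survives closing PuTTY depends on the shell. Bash, for instance, resends the hangup signal to its jobs when the session ends. Combining the command with nohup, as in the first post, is therefore the safer route.

Below is a minimal sketch assuming a POSIX shell. start_detached, still_running and the file phpdig.pid are hypothetical names, not part of PhpDig.

```shell
#!/bin/sh
# start_detached LOGFILE PIDFILE COMMAND [ARGS...]   (hypothetical helper)
# Starts COMMAND in the background under nohup so it keeps running after the
# SSH session is closed. STDOUT and STDERR both go to LOGFILE, STDIN comes
# from /dev/null, and the background job's PID is saved to PIDFILE.
start_detached() {
  logfile=$1
  pidfile=$2
  shift 2
  nohup "$@" > "$logfile" 2>&1 < /dev/null &
  echo "$!" > "$pidfile"
}

# still_running PIDFILE   (hypothetical helper)
# Succeeds if the process whose PID is stored in PIDFILE still exists;
# signal 0 performs the existence check without actually signalling it.
still_running() {
  kill -0 "$(cat "$1")" 2>/dev/null
}
```

To use it, run these commands from the admin directory:

1. Start the spider with start_detached phpdiglog.txt phpdig.pid php -f spider.php [option].
2. Watch the output with tail -f phpdiglog.txt. Ctrl+C stops only tail, not the spider.
3. After logging back in, the functions are no longer defined, but kill -0 "$(cat phpdig.pid)" && echo still running performs the same check.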

As for what to set in the config file, that is up to you, but below is an example of search depth, assuming a simple five-page site with the following link structure:
Code:
pageA.html
-- pageA1.html
   -- pageA11.html
-- pageA2.html
   -- pageA21.html
  • Search Depth Zero: pageA.html indexed
  • Search Depth One: pageA.html, pageA1.html, pageA2.html indexed
  • Search Depth Two: pageA.html, pageA1.html, pageA2.html, pageA11.html, pageA21.html indexed
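The bullets above can be reproduced with a small breadth-first walk over the example link structure, one link level per pass. This is a sketch of what "search depth" means for the five example pages only. It is not PhpDig's spider code, and pages_at_depth is a hypothetical name. It also ignores real-world details such as external links and robots.txt.

```shell
#!/bin/sh
# Link structure from the example, one "page linked_page" pair per line.
LINKS='pageA.html pageA1.html
pageA.html pageA2.html
pageA1.html pageA11.html
pageA2.html pageA21.html'

# pages_at_depth DEPTH: print the pages indexed when spidering from
# pageA.html with the given search depth.
pages_at_depth() {
  depth=$1
  frontier='pageA.html'   # pages reached at the current level
  indexed='pageA.html'    # every page indexed so far
  level=0
  while [ "$level" -lt "$depth" ] && [ -n "$frontier" ]; do
    next=''
    for page in $frontier; do
      # Follow each link out of $page, skipping pages already indexed.
      for child in $(printf '%s\n' "$LINKS" | awk -v p="$page" '$1 == p { print $2 }'); do
        case " $indexed " in
          *" $child "*) ;;
          *) indexed="$indexed $child"; next="$next $child" ;;
        esac
      done
    done
    frontier=$next
    level=$((level + 1))
  done
  printf '%s\n' "$indexed"
}
```

Calling pages_at_depth with 0, 1 and 2 lists the same pages as the three bullets. Each extra level of depth adds the next layer of linked pages, which is why depth has such a large effect on the total page count of a big site.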
Old 01-23-2004, 03:46 PM   #8
Nosmada
Orange Mole
Join Date: Dec 2003
Posts: 32
Thanks, Charter. I will give your command syntax a go and play around with the depth.

Similar Threads
Thread Thread Starter Forum Replies Last Post
command line indexing that actually works carlaron Troubleshooting 0 11-06-2006 08:48 PM
Excellent results using Putty SSH client and nohup command claudiomet How-to Forum 2 09-30-2004 02:17 PM
Command line vs. admin indexing wx3 Troubleshooting 8 09-08-2004 12:31 AM
Indexing by command line... Canadian How-to Forum 4 01-04-2004 06:44 PM
Indexing by command line interface Skop Troubleshooting 8 10-14-2003 02:23 AM


All times are GMT -8.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright © 2001 - 2005, ThinkDing LLC. All Rights Reserved.