Bruce - Postgres Replication
<h3>Monitoring Guide</h3>
<p>This document outlines how to use <a href="http://www.nagios.org/">Nagios</a> to monitor the health of the Bruce replication system. Familiarity with administering Nagios is assumed.</p>
<h5>Monitoring for the Bruce daemon</h5>
<p>The check_procs Nagios plugin is used to verify that one (and only one) Bruce daemon is running on a server. The check_procs plugin is included with the "Official Nagios Plugins", and is likely already installed on a running Nagios system.</p>
<p>To monitor a Bruce daemon running on the same host as Nagios:</p>
<ul> <li>Set up the 'check_bruce_daemon' command.</li> </ul>
<table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>
# 'check_bruce_daemon' command definition
# -w 1:1 -c 1:1 raises an alert unless exactly one matching process is found
define command {
    command_name            check_bruce_daemon
    command_line            $USER1$/check_procs -w 1:1 -c 1:1 -a com.netblue.bruce.Main
}
</pre></td></tr></table>
<ul> <li>Set up a service that checks for the replication daemon.</li> </ul>
<table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>
# Define a service that will check for the replication daemon
define service {
    use                     local-service       ; service template to use
    host_name               localhost
    service_description     bruce daemon
    check_command           check_bruce_daemon
}
</pre></td></tr></table>
<h5>Monitoring for a database</h5>
<p>This example monitors for the presence of a bruce_config database, using the check_pgsql plugin included with the distribution:</p>
<ul> <li>Set up the check_pgserver command.</li> </ul>
<table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>
# 'check_pgserver' command definition
define command {
    command_name            check_pgserver
    command_line            $USER1$/check_pgsql -H $HOSTADDRESS$ -P $ARG1$ -d $ARG2$
}
</pre></td></tr></table>
<ul> <li>Set up a service to monitor for the database.</li> </ul>
<table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>
# Define a service to monitor for the bruce_config postgresql server and database
define service {
    use                     local-service       ; service template to use
    host_name               localhost
    service_description     bruce_config DB
                            ;              port database name
    check_command           check_pgserver!5432!bruce_config
}
</pre></td></tr></table>
<h5>Monitoring for replication lag</h5>
<p>This monitor uses the custom check_bruce_lag plugin.
To create this plugin, change to the Nagios plugins directory (often either /usr/lib/nagios/plugins or /usr/local/nagios/libexec) and run the following commands:</p>
<table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>
# The quoted 'EOF' keeps the shell from expanding $1, $? and the backticks
# while the plugin file is being written.
cat >check_bruce_lag <<'EOF'
#!/bin/bash
export HOST=$1
export TYPE=$2      # 'MASTER' or 'SLAVE'
export CLUSTER=$3
export PORT=$4
export DB=$5
declare -i warn crit lag
warn=$6             # warning threshold, in seconds
crit=$7             # critical threshold, in seconds
lag_rc=-1
if [ "$TYPE" == "MASTER" ] ; then
    lag=`psql -t -h $HOST -c "select round(extract(epoch from now()) - extract(epoch from update_time)) from bruce.snapshotlog where current_xaction = (select max(current_xaction) from bruce.snapshotlog)" -p $PORT $DB`
    lag_rc=$?
fi
if [ "$TYPE" == "SLAVE" ] ; then
    lag=`psql -t -h $HOST -c "select round(extract(epoch from now()) - extract(epoch from update_time)) from bruce.slavesnapshotstatus where clusterid = $CLUSTER" -p $PORT $DB`
    lag_rc=$?
fi
# psql failed, or TYPE was neither MASTER nor SLAVE
if [ "$lag_rc" != "0" ] ; then
    printf "UNKNOWN error $lag_rc\n"
    exit 3
fi
if (($lag>=$crit)) ; then
    printf "CRIT Bruce Replication Lag %ds\n" $lag
    exit 2
fi
if (($lag>=$warn)) ; then
    printf "WARN Bruce Replication Lag %ds\n" $lag
    exit 1
fi
printf "OK Bruce Replication Lag %ds\n" $lag
exit 0
EOF
chmod a+x check_bruce_lag
</pre></td></tr></table>
<ul> <li>Set up a command for monitoring.</li> </ul>
<table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>
# 'check_bruce_lag' command definition
# The trailing 60 and 300 are the warning and critical thresholds, in seconds
define command {
    command_name            check_bruce_lag
    command_line            $USER1$/check_bruce_lag $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ 60 300
}
</pre></td></tr></table>
<ul> <li>Set up a service to monitor a MASTER database.</li> </ul>
<table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>
# Define a service to monitor for snapshot lag on the bruce_master database
define service {
    use                     local-service       ; service template to use
    host_name               localhost
    service_description     bruce_master lag
                            ;                  clusterid
                            ;               type   | port database name
    check_command           check_bruce_lag!MASTER!1!5432!bruce_master
}
</pre></td></tr></table>
<ul> <li>Set up a service to monitor a SLAVE database.</li> </ul>
<table class="wysiwyg-macro" data-macro-name="noformat" style="background-image: url(/plugins/servlet/confluence/placeholder/macro-heading?definition=e25vZm9ybWF0fQ&locale=en_GB&version=2); background-repeat: no-repeat;" data-macro-body-type="PLAIN_TEXT"><tr><td class="wysiwyg-macro-body"><pre>
# Define a service to monitor for snapshot lag on the bruce_slave_02 database
define service {
    use                     local-service       ; service template to use
    host_name               localhost
    service_description     bruce_slave_02 lag
                            ;                 clusterid
                            ;               type  | port database name
    check_command           check_bruce_lag!SLAVE!1!5432!bruce_slave_02
}
</pre></td></tr></table>
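<p>The way check_bruce_lag turns a lag value into a Nagios state can be tried out without a database. The following is a minimal sketch only: classify_lag is a hypothetical helper, not part of Bruce or Nagios, that reproduces the threshold comparison and return codes of the plugin (0 = OK, 1 = WARNING, 2 = CRITICAL) for a lag value passed on the command line.</p>

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not shipped with Bruce):
# maps a lag in seconds to a status line and a Nagios return code, using the
# same ">= threshold" comparisons as the check_bruce_lag plugin.
classify_lag() {
    local -i lag=$1 warn=$2 crit=$3
    if (( lag >= crit )); then
        printf 'CRIT Bruce Replication Lag %ds\n' "$lag"
        return 2
    fi
    if (( lag >= warn )); then
        printf 'WARN Bruce Replication Lag %ds\n' "$lag"
        return 1
    fi
    printf 'OK Bruce Replication Lag %ds\n' "$lag"
    return 0
}

# Try the 60/300 thresholds used in the check_bruce_lag command definition.
classify_lag 12  60 300 || true   # below both thresholds
classify_lag 75  60 300 || true   # at or above warn, below crit
classify_lag 300 60 300 || true   # at or above crit
```

<p>Because the comparisons use &gt;=, a lag exactly equal to a threshold already triggers that state, so with thresholds of 60 and 300 a lag of 60 seconds is a WARNING and a lag of 300 seconds is CRITICAL.</p>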