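Configuring Nagios to monitor the whole cluster
Topology diagram
Nagios checks are defined by scripts, which makes it very flexible: a check can be written in any language to monitor whatever you need, as long as it accepts arguments and returns an exit code.
Nagios has 4 main states:
0: normal, shown as OK
1: warning, shown as WARNING
2: critical, shown as CRITICAL
3: unknown error, shown as UNKNOWN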
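The exit-code contract can be sketched as a small shell function. This is only an illustration: the name check_value and the thresholds are hypothetical and are not part of Nagios or its plugins.

```shell
#!/bin/bash
# Hypothetical helper: map a numeric value to a Nagios status line and exit code.
# Usage: check_value VALUE WARN CRIT
check_value() {
    local value=$1 warn=$2 crit=$3
    # Anything that is not a non-negative integer is reported as UNKNOWN (3).
    case "$value" in
        ''|*[!0-9]*) echo "UNKNOWN - invalid value '$value'"; return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"; return 1
    fi
    echo "OK - value is $value"; return 0
}

# Example with arbitrary thresholds: 7 is at or above warn (5) but below crit (10).
check_value 7 5 10 || echo "exit code: $?"
```

A real plugin would end with `exit $?` after the call, so that Nagios receives the code as the state of the check.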
15-1 Installing Nagios
15-1-1 Installing Nagios on the server
#Upload the Nagios packages from the firewall to the Nagios server on the internal network
[root@nagios ~]# mkdir nagios
[root@firewall ~]# scp nagios-* 192.168.1.100:/root/nagios/
nagios-3.5.1.tar.gz 100% 1722KB 1.7MB/s 00:00
nagios-plugins-2.1.1.tar.gz 100% 2615KB 2.6MB/s 00:00
#Install dependencies
[root@nagios ~]# yum install gcc glibc glibc-common php gd gd-devel libpng libmng libjpeg zlib httpd -y
#Create the user and add it to the extra group (-a appends, so existing supplementary groups are kept)
[root@nagios ~]# useradd nagios
[root@nagios ~]# groupadd nagcmd
[root@nagios ~]# usermod -a -G nagcmd nagios
[root@nagios ~]# usermod -a -G nagcmd apache
#Start the build
[root@nagios ~]# cd nagios/
[root@nagios nagios]# tar xf nagios-3.5.1.tar.gz
[root@nagios nagios]# cd nagios
[root@nagios nagios]# ./configure --with-command-group=nagcmd
.......
General Options:
-------------------------
Nagios executable: nagios
Nagios user/group: nagios,nagios
Command user/group: nagios,nagcmd
Embedded Perl: no
Event Broker: yes
Install ${prefix}: /usr/local/nagios
Lock file: ${prefix}/var/nagios.lock
Check result directory: ${prefix}/var/spool/checkresults
Init directory: /etc/rc.d/init.d
Apache conf.d directory: /etc/httpd/conf.d
Mail program: /bin/mail
Host OS: linux-gnu
Web Interface Options:
------------------------
HTML URL: http://localhost/nagios/
CGI URL: http://localhost/nagios/cgi-bin/
Traceroute (used by WAP): /bin/traceroute
Review the options above for accuracy. If they look okay,
type 'make all' to compile the main program and CGIs.
#The summary above means configure succeeded; run make all as prompted
[root@nagios nagios]# make all
.......
*** Support Notes *******************************************
If you have questions about configuring or running Nagios,
please make sure that you:
- Look at the sample config files
- Read the documentation on the Nagios Library at:
Nagios Library
before you post a question to one of the mailing lists.
Also make sure to include pertinent information that could
help others help you. This might include:
- What version of Nagios you are using
- What version of the plugins you are using
- Relevant snippets from your config files
- Relevant error messages from the Nagios log file
For more information on obtaining support for Nagios, visit:
Home
*************************************************************
Enjoy.
#Once the message above appears, you can install
[root@nagios nagios]# make install && make install-init && make install-commandmode && make install-config && make install-webconf
#To check MySQL status the MySQL client and development package must be installed first; after that, extract the plugins
[root@nagios ~]# yum -y install mysql mysql-devel
[root@nagios nagios]# tar xf nagios-plugins-2.1.1.tar.gz -C /usr/local/src/
[root@nagios nagios]# cd !$
cd /usr/local/src/
[root@nagios src]# cd nagios-plugins-2.1.1/
[root@nagios nagios-plugins-2.1.1]# ./configure --with-nagios-user=nagios --with-nagios-group=nagcmd
.........
config.status: creating po/POTFILES
config.status: creating po/Makefile
--with-apt-get-command:
--with-ping6-command: /bin/ping6 -n -U -w %d -c %d %s
--with-ping-command: /bin/ping -n -U -w %d -c %d %s
--with-ipv6: yes
--with-mysql: /usr/bin/mysql_config
--with-openssl: yes
--with-gnutls: no
--enable-extra-opts: yes
--with-perl: /usr/bin/perl
--enable-perl-modules: no
--with-cgiurl: /nagios/cgi-bin
--with-trusted-path: /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
--enable-libtap: no
[root@nagios nagios-plugins-2.1.1]# make && make install
[root@nagios nagios-plugins-2.1.1]# ls /usr/local/nagios/libexec/
check_apt check_file_age check_load check_ntp check_sensors check_uptime
check_breeze check_flexlm check_log check_ntp_peer check_simap check_users
check_by_ssh check_ftp check_mailq check_ntp_time check_smtp check_wave
check_clamd check_http check_mrtg check_nwstat check_spop negate
check_cluster check_icmp check_mrtgtraf check_oracle check_ssh urlize
check_dhcp check_ide_smart check_mysql check_overcr check_ssmtp utils.pm
check_dig check_ifoperstatus check_mysql_query check_ping check_swap utils.sh
check_disk check_ifstatus check_nagios check_pop check_tcp
check_disk_smb check_imap check_nntp check_procs check_time
check_dns check_ircd check_nntps check_real check_udp
check_dummy check_jabber check_nt check_rpc check_ups
#Installation complete; set up password login (-c creates the password file on first use)
[root@nagios nagios-plugins-2.1.1]# htpasswd -c /usr/local/nagios/etc/htpasswd.users mafei0728
....
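#The physical host cannot reach the internal network, so to keep the lab simple we expose Nagios through a virtual IP. The DedeCMS site is moved to port 8080 (details omitted) to avoid a conflict.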
[root@nagios nagios-plugins-2.1.1]# ifconfig eth2:1 192.168.1.105
#On the firewall
[root@firewall /]# iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to 192.168.1.105
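#From the physical host, log in through the external address and enter the username and password
The server is now configured.
15-1-2 Installing Nagios on the clients
Because this assignment only uses techniques covered in class, deployment is done over ssh; a script would work too, and a one-step install script is included with the packages. Each client needs nagios-plugins-2.1.1.tar.gz and nrpe-2.15.tar.gz.
#From the firewall, copy nagios-plugins-2.1.1.tar.gz and nrpe-2.15.tar.gz to every client (details omitted). The steps are the same on every client; they are shown here on the firewall.
#Install dependencies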
[root@firewall ~]# yum install -y openssl openssl-devel gcc glibc glibc-common php gd gd-devel libpng libmng libjpeg zlib
#Create the user and group
[root@firewall ~]# useradd -s /sbin/nologin nagios
[root@firewall ~]# groupadd nagcmd
[root@firewall ~]# usermod -G nagcmd nagios
[root@firewall ~]# id nagios
uid=500(nagios) gid=500(nagios) groups=500(nagios),501(nagcmd)
#Install xinetd (not needed on the Nagios server)
[root@firewall ~]# yum install xinetd -y
#Extract the packages
[root@firewall ~]# tar xf nagios-plugins-2.1.1.tar.gz -C /usr/local/src/
[root@firewall ~]# tar xf nrpe-2.15.tar.gz -C /usr/local/src/
[root@firewall ~]# cd !$
cd /usr/local/src/
#The usual three build steps
[root@firewall nagios-plugins-2.1.1]# cd nagios-plugins-2.1.1/
[root@firewall nagios-plugins-2.1.1]# ./configure && make && make install
[root@firewall nagios-plugins-2.1.1]# cd ../nrpe-2.15
[root@firewall nrpe-2.15]# ./configure && make && make install
#On the Nagios server, stop here
#Install the configuration files
[root@firewall ~]# cd /usr/local/src/nrpe-2.15
[root@firewall nrpe-2.15]# make install-daemon-config && make install-xinetd
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/etc
/usr/bin/install -c -m 644 -o nagios -g nagios sample-config/nrpe.cfg /usr/local/nagios/etc
/usr/bin/install -c -m 644 sample-config/nrpe.xinetd /etc/xinetd.d/nrpe
#Edit the configuration file
[root@firewall ~]# vim /etc/xinetd.d/nrpe
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
flags = REUSE
socket_type = stream
port = 5666
wait = no
user = nagios
group = nagios
server = /usr/local/nagios/bin/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
only_from = 127.0.0.1 192.168.1.100 --------------------------------->add the Nagios server address (entries are separated by spaces)
}
wq!
#Copy the configuration file to every client
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.50:/etc/xinetd.d/
nrpe 100% 476 0.5KB/s 00:00
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.51:/etc/xinetd.d/
nrpe 100% 476 0.5KB/s 00:00
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.52:/etc/xinetd.d/
nrpe 100% 476 0.5KB/s 00:00
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.53:/etc/xinetd.d/
nrpe 100% 476 0.5KB/s 00:00
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.201:/etc/xinetd.d/
nrpe 100% 476 0.5KB/s 00:00
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.202:/etc/xinetd.d/
nrpe 100% 476 0.5KB/s 00:00
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.203:/etc/xinetd.d/
nrpe 100% 476 0.5KB/s 00:00
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.204:/etc/xinetd.d/
nrpe 100% 476 0.5KB/s 00:00
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.205:/etc/xinetd.d/
nrpe 100% 476 0.5KB/s 00:00
[root@firewall ~]# scp /etc/xinetd.d/nrpe 192.168.1.140:/etc/xinetd.d/
nrpe
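The repeated scp commands above could also be written as a loop. The following is a minimal sketch, assuming the same ten client addresses; the function name push_to_clients and the DRY_RUN switch are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Client addresses, copied from the scp commands above.
CLIENTS="192.168.1.50 192.168.1.51 192.168.1.52 192.168.1.53 192.168.1.201 192.168.1.202 192.168.1.203 192.168.1.204 192.168.1.205 192.168.1.140"

# Hypothetical helper: copy FILE to DEST_DIR on every client.
# With DRY_RUN=1 (the default here) the scp commands are only printed.
push_to_clients() {
    local file=$1 dest=$2 ip
    for ip in $CLIENTS; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "scp $file $ip:$dest"
        else
            scp "$file" "$ip:$dest"
        fi
    done
}

# Dry run: prints ten scp commands without copying anything.
push_to_clients /etc/xinetd.d/nrpe /etc/xinetd.d/
```

Setting DRY_RUN=0 before the call would perform the copies for real.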
#Register the port and start the service
[root@firewall ~]# echo "nrpe 5666/tcp # NRPE" >> /etc/services
[root@dns nrpe-2.15]# /etc/init.d/xinetd start
Starting xinetd: [ OK ]
[root@dns nrpe-2.15]# netstat -anput |grep 5666
tcp 0 0 :::5666 :::* LISTEN 29620/xinetd
[root@dns nrpe-2.15]# chkconfig xinetd on
#Final step on the Nagios server
[root@nagios ~]# cd /usr/local/src/nrpe-2.15/
[root@nagios nrpe-2.15]# make install-plugin && make install-daemon
cd ./src/ && make install-plugin
make[1]: Entering directory `/usr/local/src/nrpe-2.15/src'
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/libexec
/usr/bin/install -c -m 775 -o nagios -g nagios check_nrpe /usr/local/nagios/libexec
make[1]: Leaving directory `/usr/local/src/nrpe-2.15/src'
cd ./src/ && make install-daemon
make[1]: Entering directory `/usr/local/src/nrpe-2.15/src'
/usr/bin/install -c -m 775 -o nagios -g nagios -d /usr/local/nagios/bin
/usr/bin/install -c -m 775 -o nagios -g nagios nrpe /usr/local/nagios/bin
make[1]: Leaving directory `/usr/local/src/nrpe-2.15/src'
[root@nagios nrpe-2.15]# ls /usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/libexec/check_nrpe
#Define the command used to monitor remote hosts; add the following block
[root@nagios objects]# vim commands.cfg
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
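#Installation complete
15-2 Configuring cluster monitoring
- General checks:
- CPU load, memory, ping, SSH, disk space, total processes
- Role-specific checks depend on each server's role
Strictly follow the three steps:
- define the hosts, define the services, define the commands
#Add the host, service and host group configuration files to the main Nagios configuration
[root@nagios ~]# vim /usr/local/nagios/etc/nagios.cfg
#Add the following three lines
#Third stage cluster project evaluation monitoring
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/service.cfg
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg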
15-2-1 Host definitions for the cluster
#Create the host definition file
[root@nagios ~]# vim /usr/local/nagios/etc/objects/hosts.cfg
#firewall.mafei0728.cn
define host{
use linux-server
hostgroups generic_server
host_name firewall
address 192.168.1.254
}
#dns.mafei0728.cn
define host{
use linux-server
hostgroups generic_server
host_name dns
address 192.168.1.140
}
#lvs1.mafei0728.cn
define host{
use linux-server
host_name lvs1
hostgroups lvs_keepalived,generic_server
address 192.168.1.201
}
#lvs2.mafei0728.cn
define host{
use linux-server
host_name lvs2
hostgroups lvs_keepalived,generic_server
address 192.168.1.202
}
#nfs.mafei0728.cn
define host{
use linux-server
hostgroups generic_server
host_name nfs
address 192.168.1.205
}
#apache1.mafei0728.cn
define host{
use linux-server
host_name apache1
hostgroups web_server,generic_server
address 192.168.1.203
}
#apache2.mafei0728.cn
define host{
use linux-server
host_name apache2
hostgroups web_server,generic_server
address 192.168.1.204
}
#atlas.mafei0728.cn
define host{
use linux-server
hostgroups generic_server
host_name atlas
address 192.168.1.53
}
#master.mafei0728.cn
define host{
use linux-server
host_name master
hostgroups mysql_server,generic_server
address 192.168.1.50
}
#slave1.mafei0728.cn
define host{
use linux-server
host_name slave1
hostgroups mysql_server,generic_server
address 192.168.1.51
}
#slave2.mafei0728.cn
define host{
use linux-server
host_name slave2
hostgroups mysql_server,generic_server
address 192.168.1.52
}
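Since every host block has the same shape, the file could also be generated from a short list. This is a sketch only: gen_host is a hypothetical helper, and the two sample rows are taken from the definitions above.

```shell
#!/bin/bash
# Hypothetical helper: print one host definition in the layout used above.
# Usage: gen_host NAME ADDRESS HOSTGROUPS
gen_host() {
    printf 'define host{\n'
    printf 'use linux-server\n'
    printf 'host_name %s\n' "$1"
    printf 'hostgroups %s\n' "$3"
    printf 'address %s\n' "$2"
    printf '}\n'
}

# Read "name address groups" rows and emit one definition per row.
while read -r name addr groups; do
    gen_host "$name" "$addr" "$groups"
done <<'EOF'
firewall 192.168.1.254 generic_server
lvs1 192.168.1.201 lvs_keepalived,generic_server
EOF
```

Redirecting the output to a file and running the Nagios pre-flight check on it would be the natural next step.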
15-2-2 Host groups for service monitoring
#keepalived + LVS group
define hostgroup{
hostgroup_name lvs_keepalived
members lvs1,lvs2
}
#Web server group
define hostgroup{
hostgroup_name web_server
members apache1,apache2
}
#Database group
define hostgroup{
hostgroup_name mysql_server
members master,slave1,slave2
}
#General monitoring group
define hostgroup{
hostgroup_name generic_server
members firewall,dns,lvs1,lvs2,master,slave1,slave2,atlas,apache1,apache2,nfs
}
15-3 Client-side monitoring configuration (there is a lot of it)
15-3-1 General checks (all clients)
#Edit the configuration file (same on every client)
[root@firewall ~]# vim /usr/local/nagios/etc/nrpe.cfg
......
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda ------------------>the VM has only one partition, so the whole disk is monitored
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
.......
#Copy it to every machine and restart the service
[root@firewall ~]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]
15-3-2 Monitoring keepalived and LVS
15-3-2-1 Monitoring keepalived
#Idea: keepalived runs 3 processes once started, so alert if the count is not exactly 3
[root@lvs1 ~]# vim /usr/local/nagios/etc/nrpe.cfg
# The following examples use hardcoded command arguments...
......
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 40% -c 20%
command[check_keepalived]=/usr/local/nagios/libexec/check_procs -c 3:3 -C keepalived ----------------------->judge by the process count
15-3-2-2 Monitoring LVS connection counts
#This only needs to be installed on lvs1
#Install the web server
[root@lvs1 ~]# yum install httpd php -y
#Install the graphing tool
[root@lvs1 ~]# yum install rrdtool -y
#Upload the lvs-rrd package and extract it into the httpd document root
[root@lvs1 ~]# tar xf lvs-rrd-v0.7.tar.gz -C /var/www/html
#Edit the configuration file
[root@lvs1 ~]# cd /var/www/html/lvs/
[root@lvs1 lvs]# ls
Changelog graph-lvs.sh graphs index.php lvs-rrd.php lvs.rrd.update README rrd
[root@lvs1 lvs]# vim lvs.rrd.update
# User set variables.
# Change these to match your system config.
RRDTOOL="/usr/bin/rrdtool"
IPVSADM="/sbin/ipvsadm"
WORKDIR="/var/www/html/lvs/rrd"
wq!
#Edit the graphing script
[root@lvs1 lvs]# vim graph-lvs.sh
#!/bin/bash
# WORKDIR must match the directory used in the update script.
WORKDIR="/var/www/html/lvs/rrd"
RRDTOOL="/usr/bin/rrdtool"
# Where to put the graphs.
GRAPHS="/var/www/html/lvs/graphs"
WEBPATH="/lvs/graphs"
wq!
#Edit the PHP file
[root@lvs1 lvs]# vim lvs-rrd.php
<?php
header("Cache-Control: max-age=300, must-revalidate");
system("/var/www/html/lvs/graph-lvs.sh -H");
?>
wq!
#Collect data on a schedule
[root@lvs1 lvs]# crontab -e
*/20 * * * * /usr/sbin/ntpdate dns.mafei0728.cn >/dev/null 2>&1
* * * * * sh /var/www/html/lvs/lvs.rrd.update >/dev/null 2>&1
wq!
#Start the service and enable it at boot
[root@lvs1 lvs]# service httpd start
Starting httpd:
[root@lvs1 lvs]# chkconfig httpd on
#Configure on the Nagios monitoring host
[root@nagios objects]# vim hosts.cfg
#lvs1.mafei0728.cn
define host{
use linux-server
host_name lvs1
alias lvs1
hostgroups lvs_keepalived,generic_server
address 192.168.1.201
notes_url http://192.168.1.201/lvs
}
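Configuration done; check the result
keepalived
LVS connection monitoring
15-3-3 Monitoring NFS
#For NFS monitoring, a plugin can be downloaded from the official website; upload the package from the firewall to the Nagios and NFS servers (details omitted)
#The installation is shown on the Nagios server and is identical on the NFS server (installing it on the Nagios server is optional)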
[root@nagios ~]# tar -xf monitoringplug-0.16.tar.gz -C /usr/local/src/
[root@nagios ~]# cd !$
cd /usr/local/src/
[root@nagios src]# cd monitoringplug-0.16/
[root@nagios monitoringplug-0.16]# ./configure --prefix=/usr/local/nagiosextend
[root@nagios monitoringplug-0.16]# make && make install
[root@nagios monitoringplug-0.16]# cd /usr/local/nagiosextend/lib/nagios/plugins/
[root@nagios plugins]# ls
check_bonding check_enforce check_mem check_mysql check_nrped check_sebool notify_mail
check_dhcp check_file check_memcached check_mysql_rows check_redis check_sockets notify_sms
check_dummy check_gsm_signal check_multipath check_nfs check_rpc_ping check_timeout notify_stdout
[root@nagios plugins]# cp check_nfs /usr/local/nagios/libexec/
#Configure on the NFS server
[root@nfs libexec]# vim ../etc/nrpe.cfg
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 40% -c 20%
command[check_nfs]=/usr/local/nagios/libexec/check_nfs -H 127.0.0.1 -w 10s -c 3s
#Configure on the Nagios server
#The test succeeds
[root@nagios libexec]# ./check_nrpe -H 192.168.1.205 -c check_nfs
OK - mountd export by udp:mountdv3, tcp:mountdv3
#Add the following to service.cfg
[root@nagios objects]# vim service.cfg
#Check NFS
define service{
use local-service
host_name nfs
service_description nfs
check_command check_nrpe!check_nfs
}
#Verify the configuration
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 3.5.1
.....
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
#Restart the services
[root@nagios objects]# service httpd restart && service nagios restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
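The verify-then-restart routine is repeated after every change, so it could be wrapped in a small function. A minimal sketch, assuming the paths used in this walkthrough; reload_if_valid is a hypothetical name, and the real invocation is left commented out so the sketch has no side effects.

```shell
#!/bin/bash
NAGIOS_BIN=/usr/local/nagios/bin/nagios
NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg

# Hypothetical helper: run RESTART_CMD only when VERIFY_CMD succeeds.
# Both arguments are simple command strings (no quoting inside them).
reload_if_valid() {
    local verify=$1 restart=$2
    if $verify >/dev/null 2>&1; then
        $restart
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}

# Real invocation, commented out on purpose:
# reload_if_valid "$NAGIOS_BIN -v $NAGIOS_CFG" "service nagios restart"

# Harmless demonstration: "true" stands in for a passing pre-flight check.
reload_if_valid true "echo restart would run here"
```

The point of the wrapper is that a broken configuration never reaches the restart step.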
#Check the result
#Stop the service and see what happens
[root@nfs libexec]# service nfs stop
Shutting down NFS daemon: [ OK ]
Shutting down NFS mountd: [ OK ]
Shutting down NFS quotas: [ OK ]
Shutting down NFS services: [ OK ]
Shutting down RPC idmapd: [ OK ]
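NFS monitoring is configured.
15-3-4 Monitoring the web servers
#Monitoring the web servers is very simple; it only has to be defined on the Nagios server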
[root@nagios objects]# vim commands.cfg
# 'check_http' command definition
define command{
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
[root@nagios objects]# vim service.cfg
# 'check_http' command definition
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
use local-service
hostgroup_name web_server
service_description HTTP
check_command check_http!-p 8080 ------------------>the port was changed earlier
notifications_enabled 0
}
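#Check the result
Configuration done.
15-3-4 Monitoring MHA and Atlas
15-3-4-1 Monitoring Atlas
#For Atlas it is enough to monitor two ports: 1234 and 2345
#On the Nagios server, add the following lines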
[root@nagios objects]# vim service.cfg
#Check Atlas
define service{
use local-service
host_name atlas
service_description atlas_listen
check_command check_tcp!1234
}
define service{
use local-service
host_name atlas
service_description atlas_manager
check_command check_tcp!2345
}
#Verify, then restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
[root@nagios objects]# service httpd restart && service nagios restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.
15-3-4-2 Monitoring MHA
#Custom script
[root@alats libexec]# vim ma.sh
#!/bin/bash
# Last octet of the current master's IP, taken from the masterha_check_status output
_status=$(masterha_check_status --conf=/etc/masterha/app1.cnf | awk '{print $5}' | awk -F. '{print $4}')
if [[ -z "$_status" ]]
then
# no master reported, so the manager is not running
echo "ha is not running"
exit 2
elif [[ "$_status" == 50 ]]
then
echo "ha is running,the master is 192.168.1.50"
exit 0
elif [[ "$_status" == 51 ]]
then
echo "ha is running,the master is 192.168.1.51"
exit 0
else
echo "unknown error!!"
exit 3
fi
wq!
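The parsing step of this check can be tried without MHA installed. The sketch below assumes the status line carries the master address after the text "master:", which is what the awk pipeline in ma.sh relies on; the sample line and the function names mha_master and mha_check are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper: print the master address found in one status line, or nothing.
mha_master() {
    # Keep only the address that follows "master:"; other lines print nothing.
    printf '%s\n' "$1" | sed -n 's/.*master:\([0-9.]*\).*/\1/p'
}

# Hypothetical helper: turn one status line into a Nagios message and exit code.
mha_check() {
    local master
    master=$(mha_master "$1")
    if [ -n "$master" ]; then
        echo "ha is running, the master is $master"
        return 0
    fi
    echo "ha is not running"
    return 2
}

# Assumed sample line, for illustration only.
mha_check "app1 (pid:1234) is running(0:PING_OK), master:192.168.1.50"
```

Reporting whatever address follows "master:" avoids hardcoding one branch per possible master.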
#Edit the configuration file
[root@alats libexec]# vim ../etc/nrpe.cfg
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 40% -c 20%
command[check_mha]=/usr/local/nagios/libexec/ma.sh
#Define the service on the Nagios server
#Check MHA
[root@nagios objects]# vim service.cfg
define service{
use local-service
host_name atlas
service_description mha
check_command check_nrpe!check_mha
}
#Verify and restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@nagios objects]# service httpd restart && service nagios restart
#Check the status
15-3-5 Monitoring MySQL master-slave
15-3-5-1 MySQL connection status
#Monitor the MySQL status
#Command
[root@nagios objects]# vim commands.cfg
# 'check_mysql'command definition
define command{
command_name check_mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -P $ARG3$ -d $ARG4$
}
#Service
#Check the MySQL connection
define service{
use local-service
hostgroup_name mysql_server
service_description check_mysql
check_command check_mysql!mafei0728!mafei0728!3306!mafei0728
}
#Verify and restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
[root@nagios objects]# service httpd restart && service nagios restart
#The result is shown later, together with replication
15-3-5-2 MySQL master-slave replication
#This needs a custom script
[root@nagios libexec]# vim check_mysql_slave_status.sh
#!/bin/bash
slave_status=($(mysql -umafei0728 -pmafei0728 -h $1 -e "show slave status\G"|grep "Running:" |awk '{print $2}'))
if [[ ${slave_status[0]} = Yes ]] && [[ ${slave_status[1]} = Yes ]]
then
echo "OK slave is running"
exit 0
else
echo "slave is error"
exit 2
fi
wq!
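The decision logic of this script can be tested against captured output instead of a live server. A minimal sketch, assuming input in the `show slave status\G` layout; the function name slave_threads_ok and the sample text are illustrative.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if both replication threads report "Yes".
# Reads the output of "show slave status\G" on standard input.
slave_threads_ok() {
    local yes
    # Count the Slave_IO_Running / Slave_SQL_Running lines whose value is Yes.
    yes=$(awk '/Slave_(IO|SQL)_Running:/ && $2 == "Yes" {n++} END {print n+0}')
    [ "$yes" -eq 2 ]
}

# Assumed sample input: the SQL thread is stopped, so the check fails.
if slave_threads_ok <<'EOF'
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
EOF
then
    echo "OK slave is running"
else
    echo "slave is error"
fi
```

Matching the two field names explicitly keeps unrelated lines that happen to contain "Running" from affecting the result.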
#Add the command
[root@nagios objects]# vim commands.cfg
# 'check_mysql_slave_status' command definition
define command{
command_name check_mysql_slave_status
command_line $USER1$/check_mysql_slave_status.sh $HOSTADDRESS$
}
#Add the service
[root@nagios objects]# vim service.cfg
#Check MySQL master-slave replication
define service{
use local-service
host_name slave1,slave2
service_description check_mysql_slave_status
check_command check_mysql_slave_status
}
#Verify and restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
[root@nagios objects]# service httpd restart && service nagios restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
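#Check the result
15-3-6 Monitoring the NTP and DNS servers
#These are very simple to monitor; checking the services is enough
#Configure the commands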
[root@nagios objects]# vim commands.cfg
# 'check_ntp' command definition
define command{
command_name check_ntp
command_line $USER1$/check_ntp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
}
# 'check_dns' command definition
define command{
command_name check_dns
command_line $USER1$/check_dns -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
}
#Configure the services
[root@nagios objects]# vim service.cfg
#Monitor the DNS server
define service{
use local-service
host_name dns
service_description check_dns
check_command check_dns!1!3
}
#Monitor NTP
define service{
use local-service
host_name dns
service_description check_ntp
check_command check_ntp!1!3
}
#Verify and restart
[root@nagios objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
[root@nagios libexec]# service httpd restart && service nagios restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.
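#Check the result; the local DNS server is a bit slow
Configuration is complete. Below are the full command and service configurations, including the general checks (disk, CPU load, memory, users, total processes).
#service.cfg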
# Define a service to "ping" the local machine
define service{
use local-service
hostgroup_name generic_server
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
# < 10% free space on partition.
define service{
use local-service
hostgroup_name generic_server
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
# if > 50 users.
define service{
use local-service
hostgroup_name generic_server
service_description Current Users
check_command check_local_users!20!50
}
# > 400 processes.
define service{
use local-service
hostgroup_name generic_server
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
# Define a service to check the load on the local machine
define service{
use local-service
hostgroup_name generic_server
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
use local-service
hostgroup_name generic_server
service_description Swap Usage
check_command check_local_swap!20!10
}
# 'check_http' command definition
# Disable notifications for this service by default, as not all users may have HTTP enabled.
define service{
use local-service
hostgroup_name web_server
service_description HTTP
check_command check_http!-p 8080
notifications_enabled 0
}
###############################################################################
#Check the keepalived servers
define service{
use local-service
hostgroup_name lvs_keepalived
service_description keepalived
check_command check_nrpe!check_keepalived
}
#Check NFS
define service{
use local-service
host_name nfs
service_description nfs
check_command check_nrpe!check_nfs
}
#Check Atlas
define service{
use local-service
host_name atlas
service_description atlas_listen
check_command check_tcp!1234
}
define service{
use local-service
host_name atlas
service_description atlas_manager
check_command check_tcp!2345
}
#Check MHA
define service{
use local-service
host_name atlas
service_description mha
check_command check_nrpe!check_mha
}
#Check the MySQL connection
define service{
use local-service
hostgroup_name mysql_server
service_description check_mysql
check_command check_mysql!mafei0728!mafei0728!3306!mafei0728
}
#Check MySQL master-slave replication
define service{
use local-service
host_name slave1,slave2
service_description check_mysql_slave_status
check_command check_mysql_slave_status
}
#Monitor the DNS server
define service{
use local-service
host_name dns
service_description check_dns
check_command check_dns!1!3
}
#Monitor NTP
define service{
use local-service
host_name dns
service_description check_ntp
check_command check_ntp!1!3
}
#commands.cfg
# 'check_local_disk' command definition
define command{
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
# 'check_local_load' command definition
define command{
command_name check_local_load
command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
}
# 'check_local_procs' command definition
define command{
command_name check_local_procs
command_line $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
}
# 'check_local_users' command definition
define command{
command_name check_local_users
command_line $USER1$/check_users -w $ARG1$ -c $ARG2$
}
# 'check_local_swap' command definition
define command{
command_name check_local_swap
command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
}
# 'check_local_mrtgtraf' command definition
define command{
command_name check_local_mrtgtraf
command_line $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$
}
################################################################################
# NOTE: The following 'check_...' commands are used to monitor services on
# both local and remote hosts.
################################################################################
# 'check_ftp' command definition
define command{
command_name check_ftp
command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
}
# 'check_hpjd' command definition
define command{
command_name check_hpjd
command_line $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
}
# 'check_snmp' command definition
define command{
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}
# 'check_http' command definition
define command{
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
# 'check_ssh' command definition
define command{
command_name check_ssh
command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
# 'check_dhcp' command definition
define command{
command_name check_dhcp
command_line $USER1$/check_dhcp $ARG1$
}
# 'check_ping' command definition
define command{
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
# 'check_pop' command definition
define command{
command_name check_pop
command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
}
# 'check_imap' command definition
define command{
command_name check_imap
command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
}
# 'check_smtp' command definition
define command{
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
}
# 'check_tcp' command definition
define command{
command_name check_tcp
command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
# 'check_udp' command definition
define command{
command_name check_udp
command_line $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
# 'check_nt' command definition
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}
# 'check_ntp' command definition
define command{
command_name check_ntp
command_line $USER1$/check_ntp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
}
# 'check_dns' command definition
define command{
command_name check_dns
command_line $USER1$/check_dns -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$
}
#
# SAMPLE PERFORMANCE DATA COMMANDS
#
# These are sample performance data commands that can be used to send performance
# data output to two text files (one for hosts, another for services). If you
# plan on simply writing performance data out to a file, consider using the
# host_perfdata_file and service_perfdata_file options in the main config file.
#
################################################################################
# 'process-host-perfdata' command definition
define command{
command_name process-host-perfdata
command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
}
# 'process-service-perfdata' command definition
define command{
command_name process-service-perfdata
command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
}
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
# 'check_mysql'command definition
define command{
command_name check_mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -P $ARG3$ -d $ARG4$
}
# 'check_mysql_slave_status' command definition
define command{
command_name check_mysql_slave_status
command_line $USER1$/check_mysql_slave_status.sh $HOSTADDRESS$
}
15-3-7 Configuring SMS alerts
#Check the default mail program; we install sendmail to send the messages
[root@nagios ~]# yum install sendmail -y
[root@nagios ~]# service postfix stop
Shutting down postfix: [ OK ]
[root@nagios ~]# chkconfig postfix off
[root@nagios ~]# service sendmail restart
Shutting down sm-client: [ OK ]
Shutting down sendmail: [FAILED]
Starting sendmail: [ OK ]
Starting sm-client: [ OK ]
[root@nagios ~]# alternatives --config mta
There are 2 programs which provide 'mta'.
Selection Command
-----------------------------------------------
1 /usr/sbin/sendmail.postfix
*+ 2 /usr/sbin/sendmail.sendmail
Enter to keep the current selection[+], or type selection number: 2
#Test that email and SMS delivery work
[root@nagios ~]# echo "hello,this is a test message" >11.11
[root@nagios ~]# mail -s 'hello mafei0728' 1768921xxxxx@wo.cn <11.11
#Edit the configuration file
define contact{
contact_name nagiosadmin ; Short name of user
use generic-contact ; Inherit default values from generic-contact template (defined above)
alias Nagios Admin ; Full name of user
email 1768921xxxxx@wo.cn ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
} (my phone mailbox address is masked here)
wq!
#Verify and restart
[root@nagios ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Checking misc settings...
Total Warnings: 0
Total Errors: 0
[root@nagios ~]# service httpd restart && service nagios restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
[root@nagios ~]#
#To test alerting, stop the ntp service
[root@dns ~]# service ntpd stop
Shutting down ntpd: [ OK ]
#After the SMS arrives, start it again
[root@dns ~]# service ntpd restart
Shutting down ntpd: [FAILED]
Starting ntpd: [ OK ]
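Screenshot: NTP server SMS alert and recovery
Screenshot: NTP server email alert
Nagios alerting is configured, which completes the project of monitoring the whole cluster with Nagios.
Monitoring is fully configured; here are the result screenshots.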