IPMI Sensors - Nagios Plugin

Basic information

Installation

- include_tasks: ipmi-sensors-plugin.yml
  when: not ipmi_sensor_plugin.stat.exists
- include_tasks: ipmi-plugin-status.yml

The ipmi-sensors-plugin.yml taskfile is included only when the file /usr/local/nagios/libexec/check_ipmi_sensor does not exist. This state is registered in nagios-plugins-installed.yml using the stat module.

See the section Installing Nagios Plugins for more information.

The installation process consists of cloning the git repository and copying the check_ipmi_sensor file to the Nagios plugins directory.

---
- name: Assure existence of tempDir
  file:
    path: "{{ temp_dir }}/check_ipmi_sensor_v3"
    state: directory

- name: Clone git repo in temp dir
  git:
    repo: "{{ ipmi_plugin_url }}"
    clone: yes
    dest: "{{ temp_dir }}/check_ipmi_sensor_v3"

- name: Copy the IPMI sensors plugin
  copy:
    src: "{{ temp_dir }}/check_ipmi_sensor_v3/check_ipmi_sensor"
    dest: /usr/local/nagios/libexec/check_ipmi_sensor
    remote_src: yes  # the repository was cloned on the managed host, not on the controller

Whether or not the plugin was just installed, the ipmi-plugin-status.yml taskfile is then executed to enforce the owner, group, and permissions of the plugin.

---
- name: Correct setup of the executable
  file:
    path: /usr/local/nagios/libexec/check_ipmi_sensor
    owner: nagios
    group: nagcmd
    mode: 0550
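The install-then-fix-permissions flow above can be sketched in Python to make its idempotence explicit: the copy happens only when the plugin is missing, while the mode is enforced on every run. This is a minimal illustration, not part of the role; the function name ensure_plugin is hypothetical, and ownership (nagios:nagcmd) is left out because os.chown requires root.

```python
import os
import shutil

PLUGIN_NAME = "check_ipmi_sensor"
PLUGIN_MODE = 0o550  # mode 0550: r-x for owner and group, no access for others


def ensure_plugin(clone_dir: str, dest: str) -> bool:
    """Install the plugin only if it is missing, then enforce its mode.

    Returns True when the file was copied, False when it already existed.
    """
    copied = False
    # Equivalent of `when: not ipmi_sensor_plugin.stat.exists`.
    if not os.path.exists(dest):
        shutil.copy(os.path.join(clone_dir, PLUGIN_NAME), dest)
        copied = True
    # Equivalent of ipmi-plugin-status.yml: runs on every call.
    # Setting owner nagios / group nagcmd would need os.chown() as root.
    os.chmod(dest, PLUGIN_MODE)
    return copied
```

For example, ensure_plugin("/tmp/check_ipmi_sensor_v3", "/usr/local/nagios/libexec/check_ipmi_sensor") copies the file on the first run and only resets its mode on later runs, which mirrors why the role keeps the permission task outside the conditional include.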

Configuration

---
- name: Assures existence of ipmi-config Directory
  file:
    path: /etc/ipmi-config/
    state: directory

- name: Synchronize IPMI configuration
  template:
    src: "etc/ipmi-config/{{ item }}.j2"
    dest: "/etc/ipmi-config/{{ item }}"
    owner: root
    group: nagcmd
    mode: 0640
  with_items:
    - ipmi-ilo.cfg
    - ipmi-dell.cfg
  notify:
    - nagios_restart

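These tasks synchronize the ipmi-config files (ipmi-ilo.cfg and ipmi-dell.cfg) with the templates stored in the repository and notify the nagios_restart handler when they change. The passwords are encrypted with Ansible Vault.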
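The template loop can be illustrated with a short Python sketch. It assumes, hypothetically, that each .j2 template renders to a FreeIPMI file like the one shown in step 4 of Usage, and it uses hard-coded placeholder credentials where the role uses Vault-encrypted variables. Like the template module, it reports a file as changed only when its content differs, which is the situation in which nagios_restart would be notified.

```python
import os

# Placeholder credentials, for illustration only: in the role, the real
# values are Vault-encrypted variables, never plain text in code.
CREDENTIALS = {
    "ipmi-ilo.cfg": {"username": "user", "password": "passw0rd"},
    "ipmi-dell.cfg": {"username": "user", "password": "passw0rd"},
}
CONFIG_MODE = 0o640  # mode 0640: rw for owner, r for group (nagcmd)


def render_freeipmi_config(username: str, password: str) -> str:
    """Assumed template output, modelled on the file in step 4 of Usage."""
    return (
        f"username {username}\n"
        f"password {password}\n"
        "privilege-level user\n"
        "ipmi-sensors-interpret-oem-data on\n"
    )


def sync_configs(config_dir: str) -> list:
    """Write one file per item; return the paths whose content changed."""
    os.makedirs(config_dir, exist_ok=True)  # like `state: directory`
    changed = []
    for name, creds in CREDENTIALS.items():  # like `with_items`
        path = os.path.join(config_dir, name)
        content = render_freeipmi_config(**creds)
        current = None
        if os.path.exists(path):
            with open(path) as f:
                current = f.read()
        if current != content:
            with open(path, "w") as f:
                f.write(content)
            changed.append(path)
        # Owner root / group nagcmd would need os.chown() as root.
        os.chmod(path, CONFIG_MODE)
    return changed
```

A non-empty return value corresponds to a run in which Ansible would report the task as changed; a second run with the same credentials returns an empty list.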

Usage

The following steps are required to set up this plugin on a specific host:

  1. Add the _ipmi_ip attribute to the host definition. This attribute is required by the check_ipmi_sensor plugin. The _ipmi_excluded_sensors attribute is needed only when the error IPMI Status: Critical [Presence = Critical, Presence = Critical] occurs.

    define host{
      host_name                host_1
      address                  192.168.1.1
      _ipmi_ip                 192.168.1.1
      _ipmi_excluded_sensors   56
    }

    Note

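    The names of these custom variables start with an underscore and are written in lowercase. Nagios exposes them as uppercase macros prefixed with the object type, so _ipmi_ip is referenced as $_HOSTIPMI_IP$ in the command definition. See [1] for more information about custom object variables.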

  2. Add the command definition. In this implementation, the command is added to /usr/local/nagios/etc/objects/commands.cfg.

    define command {
      command_name  check_ipmi_sensor
      command_line  $USER1$/check_ipmi_sensor -H $_HOSTIPMI_IP$ -f $ARG1$ -x $_HOSTIPMI_EXCLUDED_SENSORS$ $ARG2$ $ARG3$
    }
    
  3. Add the service definition. In this implementation, the service is added to /usr/local/nagios/etc/objects/common-services.cfg.

    Note

    To ignore the warning about system event log (SEL) entries, add the --nosel flag to the check_command field (see the example below).

    The plugin can be configured to check each sensor type independently:

    define service{
      use                  generic-service
      host_name            host1
      service_description  IPMI
      check_command        check_ipmi_sensor!/etc/ipmi-config/ipmi.cfg!--nosel!-T <sensor_type>
    }
    

    Note

    The available sensor types are listed on the IPMI Sensor Types page [2].

    Alternatively, it can be configured to check all sensors in a single service definition:

    define service{
      use                  generic-service
      host_name            host1
      service_description  IPMI
      check_command        check_ipmi_sensor!/etc/ipmi-config/ipmi.cfg
    }
    

    Note

    If the IPMI plugin monitors multiple nodes that do not share a common user and password, define one ipmi-config file per set of credentials and one service per file:

    define service{
      use                  generic-service
      host_name            host1
      service_description  IPMI
      check_command        check_ipmi_sensor!/etc/ipmi-config/file1.cfg
    }
    
    define service{
      use                  generic-service
      host_name            host2
      service_description  IPMI
      check_command        check_ipmi_sensor!/etc/ipmi-config/file2.cfg
    }
    

    Note

    The user account used for IPMI monitoring does not need special privileges; the credentials file below uses privilege-level user.

  4. Create the credentials file with the following content, ownership, and permissions:

    username user
    password passw0rd
    privilege-level user
    ipmi-sensors-interpret-oem-data on
    
    • Owner: nagios

    • Group: nagcmd

    • Mode: 0640

    Note

    See [3] for more information about the FreeIPMI configuration file format.
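To see how the host, command, and service definitions fit together, the following Python sketch performs a simplified version of Nagios macro substitution for the check_ipmi_sensor command. It only handles $USER1$, host custom variables, and $ARGn$; the value of $USER1$ (/usr/local/nagios/libexec) is an assumption based on the plugin path used in this document, and the helper names are hypothetical.

```python
import re

# Command line from the command definition in step 2.
COMMAND_LINE = ("$USER1$/check_ipmi_sensor -H $_HOSTIPMI_IP$ -f $ARG1$ "
                "-x $_HOSTIPMI_EXCLUDED_SENSORS$ $ARG2$ $ARG3$")


def custom_var_macro(obj_type: str, var_name: str) -> str:
    """Map a custom variable to its macro, e.g. ('host', '_ipmi_ip') -> '$_HOSTIPMI_IP$'."""
    # Drop the leading underscore, uppercase, and prefix with _<OBJECTTYPE>.
    return f"$_{obj_type.upper()}{var_name[1:].upper()}$"


def expand(command_line: str, check_command: str, host_vars: dict,
           user1: str = "/usr/local/nagios/libexec") -> str:
    """Simplified substitution of $USER1$, host custom variables and $ARGn$."""
    # check_command is 'command_name!ARG1!ARG2!...'; the first field is the name.
    args = check_command.split("!")[1:]
    command_line = command_line.replace("$USER1$", user1)
    for name, value in host_vars.items():
        command_line = command_line.replace(custom_var_macro("host", name), value)

    def arg_value(match):
        index = int(match.group(1))
        # Arguments not supplied in check_command expand to an empty string here.
        return args[index - 1] if index <= len(args) else ""

    command_line = re.sub(r"\$ARG(\d+)\$", arg_value, command_line)
    # Collapse the extra spaces left behind by empty arguments.
    return " ".join(command_line.split())
```

Each ! in check_command starts a new $ARGn$, so --nosel becomes $ARG2$ and -T <sensor_type> becomes $ARG3$. Collapsing whitespace at the end is a simplification that would break quoted arguments containing spaces.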

Troubleshooting

IPMI Status: Critical [X system event log (SEL) entries present]

  1. Read the system event log (SEL) entries before deleting them. It is important to check whether these logs record any faulty behavior.

    ipmiutil sel -N (host_ip|hostname) -F lan2 -U user -P passwd

  2. Clear the SEL using the credentials of a user with sufficient privileges:

    ipmiutil sel -d -N (host_ip|hostname) -F lan2 -U user -P passwd

Note

If the password contains special characters, enclose it in single quotes ('...').
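When these commands are built from a script, Python's shlex.quote applies the same single-quote rule automatically and also copes with passwords that themselves contain a single quote. The sketch below only builds the command string from the options shown in the steps above (it does not run ipmiutil); the helper name sel_command is hypothetical.

```python
import shlex


def sel_command(host: str, user: str, password: str, clear: bool = False) -> str:
    """Build an ipmiutil sel command line with shell-safe quoting."""
    argv = ["ipmiutil", "sel"]
    if clear:
        argv.append("-d")  # clear the SEL, as in step 2
    argv += ["-N", host, "-F", "lan2", "-U", user, "-P", password]
    # shlex.quote wraps any argument containing special characters in single quotes.
    return " ".join(shlex.quote(a) for a in argv)
```

For example, a password such as p@ss!w0rd is emitted as 'p@ss!w0rd', so the shell passes it to ipmiutil unchanged.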

IPMI Status: Critical [Presence = Critical, Presence = Critical]

  1. Execute the following command to identify which sensors are absent.

    check_ipmi_sensor -H <Host-IP> -f <configuration file> -vvv | grep Critical

    Example of the standard output (stdout):

    ID  | Name     | Type            | State    | Reading | Units | Lower NR  | Lower C | Lower NC | Upper NC | Upper C | Upper NR | Event
    56  | Presence | Entity Presence | Critical |   N/A   |  N/A  |    N/A    |   N/A   |    N/A   |    N/A   |   N/A   |    N/A   | 'Entity Absent'
    58  | Presence | Entity Presence | Critical |   N/A   |  N/A  |    N/A    |   N/A   |    N/A   |    N/A   |   N/A   |    N/A   | 'Entity Absent'
    IPMI Status: Critical [Presence = Critical ('Entity Absent'), Presence = Critical ('Entity Absent')] | 'Inlet Temp'=17.00;3.00:42.00;-7.00:47.00 'CPU Usage'=100.00;~:101.00; 'IO Usage'=0.00;~:101.00;
    'MEM Usage'=0.00;~:101.00; 'SYS Usage'=100.00;~:101.00; 'Pwr Consumption'=320.00;~:452.00;~:540.00 'Current'=1.50 'Temp'=80.00 'Temp'=65.00
    Presence = 'Entity Absent' (Status: Critical)
    Presence = 'Entity Absent' (Status: Critical)
    
  2. Add the _ipmi_excluded_sensors attribute to the host definition. Its value is a comma-separated list of the IDs of the absent sensors found in the previous step.

    Example:

    define host{
      host_name               host-example
      address                 0.0.0.0
      _ipmi_ip                0.0.0.0
      _ipmi_excluded_sensors  56,58
    }
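Building the _ipmi_excluded_sensors value by hand is error-prone when many sensors are absent. The Python sketch below assumes the pipe-separated column layout of the example output above and collects the IDs of rows whose State is Critical and whose Event is 'Entity Absent'; the function name is hypothetical, and other plugin versions may format the table differently.

```python
def absent_sensor_ids(output: str) -> str:
    """Return a comma-separated list of IDs of Critical 'Entity Absent' sensors."""
    ids = []
    for line in output.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Table rows look like: ID | Name | Type | State | ... | Event
        if (len(cols) >= 5 and cols[0].isdigit()
                and cols[3] == "Critical" and "Entity Absent" in cols[-1]):
            ids.append(cols[0])
    return ",".join(ids)
```

The function skips the header row and the IPMI Status summary line, so it works on either the full -vvv output or the output filtered with grep Critical; its result can be pasted directly into the host definition.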
    

References