User Guide for GridDB Monitoring Template for Zabbix

Revision: 3890



1 Introduction

1.1 Purpose of this Document

This document explains how to set up GridDB Monitoring Template for Zabbix and how to monitor applying the Template

1.2 Note

GridDB Monitoring Template for Zabbix is a template that helps you monitor GridDB clusters Customize the Template to use it for your system to operate.

Note questions concerning operation and actions of a monitoring system using the Template as well as Zabbix are beyond the scope of our GridDB support services.

2 What is GridDB Monitoring Template for Zabbix?

2.1 Overview

GridDB Monitoring Template for Zabbix is a template that helps you monitor GridDB in Zabbix in a number of ways including alive monitoring, resource monitoring, and performance monitoring.

2.2 File configuration

GridDB Monitoring Template is found in the directory /misc/zabbix-template in the installation media.

This Template consists of the following file:

2.3 Working environment

GridDB Monitoring Template requires the following software for you to use: Zabbix version 4.0 or higher

The following instructions assume the above software is installed and servers to be monitored are registered as hosts in Zabbix.

[note]

2.4 How to install

2.4.1 Setting the Zabbix agent

This Monitoring Template uses ActiveCheck to retrieve GridDB event logs. To activate ActiveCheck, enter the following in the settings file (The default file is /etc/zabbix/zabbix_agentd.conf for the Zabbix agent

setup value description
Server address of the Zabbix server
Hostname host name set in Zabbix
ServerActive address of the Zabbix server

After changing the settings, restart the Zabbix agent.

2.4.2 Importing a template

Log in to the Zabbix frontend and perform the following steps to import the Monitoring Template:

  1. Select Configuration →Templates →Import.
  2. Specify griddb_templates.xml as the import file.
  3. Click on Import with the given default rules.

If the Template is successfully imported, the template [Template GridDB] will be added to the list.

2.4.3 Setting template macros

Select [Template GridDB] →Macros tab and change the default for each macro to match GridDB settings.

macro default description
{$GSHOME} /var/lib/gridstore GridDB home directory
{$GSLOG} /var/lib/gridstore/log GridDB event log storage directory
{$GSHOSTGROUP} GridDB nodes Zabbix host group name
{$GSHOSTPORT} 10040 port number for the operational management of GridDB nodes
{$GSUSER} admin administrative user of GridDB clusters
{$GSPASS} admin password for an administrative user of GridDB clusters
{$GSPARTITIONNUM} 128 number of partitions
{$GSWEBAPIURL} http://localhost:8081/griddb/v2/myCluster/dbs/public URI for the GridDB WebAPI

[note]

2.4.4 Setting a template to a host

Monitoring starts once the Monitoring Template is set to a host where the Zabbix agent and the GridDB server are installed.

Follow the steps below to set the Template:

  1. Go to Configuration → Hosts and select the target host and open the configuration screen for the host.
  2. Switch to the Templates tab and click on Link new templates.
  3. Select Template GridDB and Add → Save.

Saving template settings will automatically start monitoring. To view the results of monitoring, go to Monitoring → Latest data section and select the target host in the list displayed.

2.5 What the Monitoring Template monitors

2.5.1 Applications

The following applications are currently available:

name Overview
gs_stat set of items for performance information that can be obtained using the gs_stat command
gs_logs set of items concerning the GridDB server log
gs_aggregation set of items that aggregates data for host group
gs_webapi set of items that retrieve information through the GridDBWebAPI.

2.5.2 Dependent item

This section describes items in each application.

2.5.2.1 gs_stat

name type monitoring interval Overview
[GridDB] gs_stat master HTTP agent 30 sec. retrieves performance information from nodes
used as a master file of miscellaneous performance information items
[GridDB] (JSON Path) dependent item - Miscellaneous performance information items with /node/stat as the master file.
[GridDB] (JSON Path).diff dependent item - items that calculate the differences between the previous and current cumulative values, from among miscellaneous performance information.

For details about gs_stat items, see the GridDB Features Reference.

2.5.2.2 gs_logs

name type monitoring interval Overview
[GridDB] Event logs Zabbix agent (active) 1 sec. collects event log files
[GridDB] Event logs INFO Zabbix agent (active) 1 sec. collects INFO logs
[GridDB] Event logs WARNING Zabbix agent (active) 1 sec. collects WARNING logs
[GridDB] Event logs ERROR Zabbix agent (active) 1 sec. collects ERROR logs
[GridDB] Periodic checkpoint elapsed time Zabbix agent (active) 10 sec. retrieves from logs the elapsed time for periodic checkpoint execution
[GridDB] Slow query logs Zabbix agent (active) 1 sec. collects slow query logs

2.5.2.3 gs_aggregation

gs_aggregation items perform aggregation operations for the host group specified in {$GSHOSTSGROUP} to collect cluster-level information.

name type monitoring interval Overview
[GridDB] Owner partition count Zabbix aggregate 30 sec. number of cluster owner partitions
[GridDB] Backup partition count Zabbix aggregate 30 sec. number of cluster backup partitions
[GridDB] Store total use Zabbix aggregate 30 sec. capacity (in bytes) of all the data owned by a cluster

2.5.3 Triggers

Set a trigger for a monitoring item to detect and report incidents and events related to GridDB. For events other than information, create failure events.

name severity requirements Overview
[GridDB] OWNER_LOSS partition has been detected. High patitionStatus has transitioned to OWNER LOSS. reports problems with partitions.
[GridDB] ABNORMAL node has been detected. High nodeStatus has transitioned to ABNORMAL. reports problems with nodes.
[GridDB] Log duplication has been stopped by some kind of error. Average duplicateLog has changed to -1. reports automatic backups have stopped due to some error.
[GridDB] Some error has been detected on {HOST.NAME}. Average Logs containing the string ERROR have been detected. reports an error output in an event log.
[GridDB] REPLICA_LOSS partition has been detected. Warning partitionStatus has transitioned to REPLICA_LOSS. reports changes in partition status.
[GridDB] Node has been left from the cluster. Warning clusterStatus has transitioned to SUB_CLUSTER. reports a node has left a cluster.
[GridDB] Cluster status has been stable. Information activeCount has changed to be equivalent to designatedCount. reports cluster status has become stable.
[GridDB] Number of all nodes in the cluster has been decreased. Information designatedCount has decreased. reports reduction in cluster size.
[GridDB] Number of all nodes in the cluster has been increased. Information designatedCount has increased. reports expansion in cluster size.

2.5.4 Graphs

The Monitoring Template provides custom graphs that summarize multiple items to position them on a screen and a dashboard.

name type use
[GridDB] Cluster health Pie for checking whether a cluster is stable; only master nodes are displayed.
[GridDB] Store memory usage Exploded for grasping the amount of memory used divided by the memory limit for data management, as a percentage.
[GridDB] Checkpoint memory usage Exploded for grasping the amount of memory used divided by the memory limit for the checkpoint, as a percentage.
[GridDB] storeDetail.*** Exploded for grasping detailed store information.
[GridDB] Network status Normal for displaying the current network conditions.
[GridDB] Total read and write operation Normal for displaying the (total) number of data Reads/Writes
[GridDB] Total checkpoint and backup Normal for displaying the (total) number of data Reads/Writes
[GridDB] memoryDetail total Stacked for grasping the breakdown of the total amount of memory attached.
[GridDB] memoryDetail cached Stacked for grasping the breakdown of the amount of cached memory.

2.5.5 Screens

The Monitoring Template also provides screens that summarize items and graphs on node information. These screens can be viewed from a host screen for each host.

name Overview
[GridDB] Node status displays items and custom graphs concerning node status.
[GridDB] Store details displays custom graphs for detailed store information on one screen.

3 Applications

3.1 Adding a monitoring item using GridDB WebAPI

GridDB WebAPI enables you to execute any SELECT statement to obtain the result in JSON format. You can also monitor the status of GridDB clusters by aggregating data in miscellaneous meta tables using SQL.

In Zabbix, create monitoring items as indicated in the table below to monitor the status of GridDB cluster, using GridDB WebAPI and meta tables.

item type item to create
HTTP agent executes an SQL statement through the API.
dependent item treats the above HTTP agent item as a master item and extracts a parameter using a JSON Path in preprocessing.

These two items allow you to perform more flexible monitoring in Zabbix; a separate JDBC application specifically for status collection is not needed.

The Monitoring Template comes with the following application as a reference. These items store cluster-level information; activate these items for one host only.

Applications name item type description
gs_webapi [GridDB] Query count master HTTP agent aggregates the total number of meta table #sqls queries
gs_webapi [GridDB] Query count dependent item total number of running queries
total number of running queries
total number of running queries

3.2 Creating a dashboard

Dashboards cannot be included in a template. To fully utilize the Monitoring Template in the actual monitoring system, you need to create a widget or position template graphs on a dashboard, among others.

Moreover, many of the items included in the Monitoring Template display node-level information. To see cluster-level information, utilize a dashboard widget.

Below are the configuration examples of a widget and a dashboard which includes this widget.

3.2.1 Example of widget configuration

3.2.1.1 Process memory history

item name [GridDB] processMemory
widget type Graphs
use for grasping a summary of memory usage
process memory
process memory

3.2.1.2 Overview of node status

item name (any item)
widget type plain text
use for grasping node status
node status
node status

3.2.1.3 Number of reference rows

item name [GridDB] totalRowRead.diff
widget type Graphs
use for grasping changes in load due to disk reads
number of reference rows
number of reference rows

3.2.1.4 Number of registration rows

item name [GridDB] totalRowWrite.diff
widget type Graphs
use for grasping changes in load due to disk writes
number of registration rows
nmber of registration rows

3.2.1.5 Elapsed time for periodic checkpoint execution

item name [GridDB] Periodic checkpoint elapsed time
widget type Graph
use for grasping changes in load due to disk writes
elapsed time for periodic checkpoint execution
elapsed time for periodic checkpoint execution

3.2.1.6 Slow query logs

item name [GridDB] Slow query logs
widget type plain text
use for analyzing causes of a slowdown if there is one.
slow query logs
slow query logs

3.2.2 Example of dashboard configuration

3.2.2.1 For cluster monitoring

Example of dashboard configuration (cluster monitoring)
Example of dashboard configuration (cluster monitoring)

To monitor an entire cluster, display graphs that aggregate information on each node, information on event logs, load on each node, and resource usage.

Moreover, you could also use items included in the Template OS Linux and item keys for the Zabbix agent to display information on OS resources together in addition to the information above, which will be useful for identifying bottlenecks.

Additionally, it is recommended to configure a cluster in such a way that gives you a visual representation of incident status at a glance by fully utilizing various Zabbix features including action logs, incident information, and maps.

3.2.2.2 For node monitoring

Example of dashboard configuration (node monitoring)
Example of dashboard configuration (node monitoring)

It is also recommended to create a dash board for node monitoring in addition to cluster monitoring.

A dashboard for node monitoring aggregates and displays more detailed information about nodes, including node event logs, breakdowns of memory usage, and disk space. Such information will be useful for cause analysis when specific nodes are highly loaded or a node failure occurs.

Set each widget as a Dynamic item; this will allow you to switch nodes to display by selecting Host on the upper right-hand side of the window.

4 Trademarks