GridDB Database Administrators Guide

Revision: 3935



1 Introduction

This document describes the design, building, and operation of systems that use GridDB.

It is for designers involved in system development with GridDB and administrators involved in operations management of GridDB.

This document is organized as follows:

 

2 Overview

2.1 Features of GridDB

GridDB is a high-performance, distributed NoSQL database that delivers scalability and availability, featuring:

 

Data models and supporting interfaces

GridDB has a data model that evolved out of a key-value database. It stores data in containers, which correspond to tables in an RDB.

Data model

It supports client APIs and an SQL-like query language called TQL as the NoSQL interface, and JDBC/ODBC and SQL as the NewSQL interface. *The NewSQL interface is supported only by the Advanced Edition (henceforth referred to as AE).

NoSQL/NewSQL interfaces

 

Availability

GridDB operates as a cluster consisting of multiple nodes (server processes).

When a node fails, the cluster automatically detects the failure and moves the role of the failed node to another node, allowing the cluster to continue running. Further, within a cluster, GridDB replicates data across multiple nodes. As such, even after a failure has occurred, the cluster automatically relocates replicas (autonomous data placement), which enables continuous data access.

Availability

 

Scalability

One way to improve the performance of a cluster in GridDB is to scale up the server where the nodes are running; another is to perform a scale-out operation by adding nodes to a cluster.

In a scale-out operation, the system can be expanded online. Nodes added to the system during a scale-out operation are easy to operate; data is automatically placed on them according to the system load.

Scalability

 

High speed

To minimize the latency of database processing, GridDB allocates the memory and database files that each thread occupies per CPU core thread, which eliminates latency caused by resource assignment, mutual exclusion, and synchronization.

High speed

To achieve high speed, GridDB additionally has the following features:

For details, see the GridDB Features Reference (GridDB_FeaturesReference.html).

 

2.2 GridDB: Architectural overview

This section gives an overview of the architecture of GridDB required for its design, build, and operation.

 

2.2.1 Thread configuration

Nodes contain multiple threads (called services) which execute various types of processing. This section describes thread configuration and what each thread does.

Service configuration
- Transaction services: Execute transaction processing, including row registration and search operations, and database file write operations, among others.
- Checkpoint services: Execute checkpoint processing.
- Cluster services: Execute cluster configuration processing, including heartbeat and node participation and departure.
- Sync services: During partition relocation, execute long-term and short-term synchronization operations for data synchronization.
- System services: Accept requests for production and management operations.
- SQL services (AE only): Execute SQL processing.

 

 

2.2.2 Data management

Database files and transaction services, both of which manage data, have a one-to-one relationship.

These database files have an internal structure called a partition, which manages multiple containers, and another internal structure called a partition group, which merges multiple partitions into one. Each transaction service operates on only one partition group, so it can run without requiring mutual exclusion with other transaction services.

Relationship of partitions, threads, and files managed by one node

 

 

2.2.3 File configuration

Files that nodes manage include definition files and database files. This section describes file configuration and content.

File configuration managed by nodes

 

As mentioned above, each partition group has its own database files consisting of one checkpoint file and three transaction files.

In transaction log processing, updated data is written sequentially to transaction log files in synchronization with updates, which guarantees transactions. In checkpoint processing, updated blocks in memory are regularly saved to checkpoint files, which provides data persistence.

 

2.2.4 Memory structure

GridDB is provided with various memory areas to reduce disk I/O and make processing efficient.

Memory
- Database buffer: I/O buffer for caching retrieved data. It caches data images (blocks) read from checkpoint files in memory. A larger database buffer size means blocks are more easily cached in memory, which improves search and registration performance. Each node has one database buffer area; as such, multiple transaction processing threads share in its use.
- Memory area for checkpoints: Area used for checkpoint processing. Updated blocks in the database buffer are copied to this area and written out to checkpoint files; it also temporarily stores data subject to checkpointing when that data is updated during checkpointing.
- Memory area for transaction processing: Area used by transaction processing threads that perform row registration and search processing.
- Buffer for storing intermediate SQL results (AE only): Area for storing intermediate results while SQL processing is taking place. If the size of the intermediate results exceeds the buffer memory size limit, those results are temporarily exported to swap files for intermediate SQL results.
- SQL work memory area (AE only): Area for temporarily storing intermediate results of SQL operations such as joins and aggregations.

GridDB is a hybrid-type database which uses both memory and disks. It accelerates access using memory as data cache and stores data in disks to manage large volumes of data. A larger memory size means larger volumes of data are cached in memory, which reduces disk I/O and improves performance.

2.2.5 Cluster management

GridDB configures a cluster with multiple nodes.

Nodes have two roles: "master" and "follower." A master manages the entire cluster. All non-master nodes act as "followers."

When a cluster starts, one of the nodes composing a cluster will always be a "master." Followers then perform cluster processing including synchronous processing according to the instructions from the master. A master node is automatically selected by GridDB. If a master node fails, a new master will be automatically selected from a set of follower nodes to keep the cluster running.

Further, within a cluster, GridDB replicates data and keeps the copies (replicas) on multiple nodes. Among the replicas, the master data is referred to as the owner, whereas the duplicated data is referred to as backups. Regardless of which node in the cluster fails, GridDB continues processing through the use of replicas. Moreover, the system automatically relocates data after a node failure (autonomous data placement) without the need for special production operations. Data placed on the failed node is restored from replicas and relocated until the set number of replicas is reached.

Replica

Replicas are created per partition. If nodes are either added to a cluster or deleted from a cluster, partitions will automatically be reassigned and relocated to distribute replicas to each node.

   

3 Physical design

This chapter describes the design items for GridDB needed to implement the non-functional requirements of the system.

3.1 Points to consider when designing

As non-functional requirements an established system should meet, IPA (Information-Technology Promotion Agency, Japan) has formulated the "Non-Functional Requirements Grades," part of which are cited below:

"(source: Non-Functional Requirements Grades Usage Guide [Description Manual], p. 7) (c) 2010 IPA

Non-functional requirements (major categories), example requirements, and example implementation methods:

Availability
  Example requirements: operation schedule (operating hours, outage plans, etc.); operation objectives for failure occurrences, disasters, etc.
  Example implementation methods: equipment redundancy, backup center establishment, etc.; restoration/recovery methods and restoration/recovery structure establishment
Performance and scalability
  Example requirements: business volume and estimates of future growth; attributes of the business to be systematized (peak times, normal operation, degraded operation, etc.)
  Example implementation methods: sizing based on performance objectives; capacity planning (future-oriented equipment and network sizing and allocation)
Operability and maintainability
  Example requirements: system operation levels required for system operation; response levels in the event that a problem occurs
  Example implementation methods: establishment of monitoring methods and backup methods; distribution of roles, organizational structure, training, and preparation of manuals in anticipation of problems that may arise
Migratability
  Example requirements: system migration period and schemes; types of assets to be migrated and volume of migration
  Example implementation methods: migration schedule establishment and migration tool development; migration organization establishment and migration rehearsal
Security
  Example requirements: usage restrictions; unauthorized access prevention
  Example implementation methods: access restrictions and data confidentiality; fraud tracking, monitoring, and detection; information security training for operators, etc.

To satisfy these non-functional requirements, GridDB needs the following design items:

Specific design varies considerably depending on the level and content of system requirements; this chapter focuses on points of design that should be considered to leverage "availability" and "performance and scalability" in the table above characterizing GridDB.

 

3.2 Data space design

In GridDB, data is managed per unit of block, container, table, row, partition, and partition group. Data space design using these data units will be a crucial factor in determining "availability" and "performance and scalability."

Partition
Relationship of processing threads, partition groups, and database files

 

[notes]

 

The next several subsections summarize points of design for the following items:

 

3.2.1 Block size

A block is a physical data unit that stores row data, container meta information, and other data, and is the smallest unit of disk I/O.

Block

Select the block size from 64KB, 1MB, 4MB, 8MB, 16MB and 32MB. The default is 64 KB. Normally, the default works fine; no change is necessary.

Note that block sizes of 64 KB and 1 MB differ in the limit of the number of columns in a container. To create more columns than the limit of the number of columns for the block size of 64 KB, change the block size. For details about upper limits, see the section "System limiting values" in the GridDB Feature Reference (GridDB_FeaturesReference.html).
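As an illustration only, the block size is chosen in the definition file before the database files are first created. The excerpt below assumes the parameter name /dataStore/storeBlockSize and that it belongs to the cluster definition file; verify both against the parameter reference for your version.

"dataStore":{
    "storeBlockSize":"1MB"
},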

Data stored in a block

Data stored in a block are of several types, including row data, meta information for containers, and index data. Data is categorically organized by the types of data and stored in a block.

Block types
- metaData: block for storing meta information for containers.
- rowData: block for storing row data for containers (without expiration settings).
- mapData: block for storing index data for containers (without expiration settings).
- batchFreeRowData: block for storing row data for containers (with expiration settings).
- batchFreeMapData: block for storing index data for containers (with expiration settings).

One block stores data for multiple containers. Data is stored in the order in which it was registered or updated by an application. Storing data that is close in time or of the same type in the same block improves data locality and memory efficiency, and allows search processing on time-series data filtered by elapsed time to run at high speed with fewer resources.

related parameters

[notes]

  

3.2.2 Number of partitions

As one way to manage data in a cluster, containers are managed in units called partitions.

Relationship of partitions, threads, and files managed by one node

The default number of partitions is 128. Normally, the default works fine. If, however, the following conditional expression is not met, increase the number of partitions.

number of partitions >= concurrency of transaction services x number of nodes configuring the cluster
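As a worked example of this check: with the default transaction service concurrency of 4 and a 3-node cluster, 4 x 3 = 12, so the default of 128 partitions is more than sufficient. If the condition were not met, the number of partitions could be raised in the definition file; the excerpt below is a sketch only, and the parameter name /dataStore/partitionNum and the file it belongs to should be confirmed for your version.

"dataStore":{
    "partitionNum":256
},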

related parameters

[notes]

 

3.2.3 Number of processing threads

A GridDB node consists of one process. Within a process are multiple threads (called services) running various operations.

This subsection describes how to determine the concurrency of "transaction services" and "SQL services (AE only)" among the services available.

Service configuration
- Transaction services: 4 threads by default (configurable). Execute transaction processing, including row registration and search operations, and database file write operations, among others.
- Checkpoint services: 1 thread by default (configurable). Execute checkpoint processing.
- Cluster services: 1 thread. Execute cluster configuration processing, including heartbeat and node participation and departure.
- Sync services: 1 thread. During partition relocation, execute long-term and short-term synchronization operations for data synchronization.
- System services: 1 thread. Accept requests for production and management operations.
- SQL services (AE only): 4 threads by default (configurable). Execute SQL processing.

A unit of processing handled by each service is called an "event." Each service has one event queue and sequentially processes the events registered in that queue. For one service A to ask another service B to process an event, service A registers the event in service B's event queue.

Events processed by services

[notes]

 

The threads that especially affect the parallel performance in search and registration are the threads that run transaction processing (transaction services) and threads that run SQL processing (SQL services). Set the concurrency of these processing threads to match the number of CPU cores of the machine that runs nodes.
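As a hedged sketch only: on a machine with 8 CPU cores dedicated to one node, the concurrency might be split evenly between transaction and SQL services, as in the gs_node.json excerpt below. The parameter /dataStore/concurrency appears elsewhere in this guide; /sql/concurrency is an assumption to verify against your version.

"dataStore":{
    "concurrency":4
},
"sql":{
    "concurrency":4
},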

Set the concurrency considering the following points:

[notes]

related parameters

[notes]

 

3.2.4 Checkpoint processing

Checkpoint processing refers to processing in which updated blocks in the database buffer are written to checkpoint files.

In checkpoint processing, parameters are available to set the run cycle of checkpoints and whether to enable or disable parallel execution. Normally, default parameters work fine and no change is necessary. If there is a large amount of updated data and high-speed checkpoint processing is needed, enable parallel execution. Enabling parallel execution reduces the time to perform checkpoint processing. But this will increase the CPU load during processing.
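A minimal sketch of adjusting the checkpoint cycle in the node definition file follows. The parameter name /checkpoint/checkpointInterval and the value format are assumptions to verify against the parameter reference; the setting that enables parallel checkpoint execution is version-dependent and is therefore not shown here.

"checkpoint":{
    "checkpointInterval":"1200s"
},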

Executing checkpoint processing

A checkpoint is run on the following occasions:

- Regular run: automatically run at regularly scheduled intervals. (Specify the cycle using a parameter. Regular runs can be temporarily deactivated.)
- Manual run: run when the user executes the gs_checkpoint command.
- Node start/stop: automatically run after the recovery process during node startup, or when a node is stopped normally.
- Long-term synchronization start/end: automatically run when long-term synchronization starts or ends.

Additionally, it is possible to specify whether to compress checkpoint files. Compressing checkpoint files can reduce the storage cost, which increases in proportion to the amount of data. For details on the data compression feature, see the section on "data block compression" in the GridDB Features Reference (GridDB_FeaturesReference.html).

related parameters

[notes]

    

3.2.5 File configuration

Files that GridDB nodes create or output while running include database files and backup files.

Database files and backup files are characterized by large file sizes, and their disk I/O has a great impact on performance; specify an appropriate deployment directory for these files, considering storage capacity and I/O performance. This section describes the file configuration for a "normal configuration" and for a "large-data configuration."

3.2.5.1 Normal configuration

Database files and other output files

The default deployment directory is the directory under the GridDB home directory (/var/lib/gridstore).

list of output files

related parameters

[notes]

3.2.5.2 Large-data configuration

Database files and other output files for large-data configuration

This is a configuration that splits checkpoint files and deploys them. This subsection highlights some differences with a normal configuration.

list of output files

same as a "normal configuration."

related parameters

  

3.3 Memory space design

GridDB is a hybrid-type database which uses both memory and disks. It accelerates access using memory as data cache and stores data in disks to manage large volumes of data.

As shown in the figure below, a larger memory size means larger volumes of data are cached in memory, which reduces disk I/O and improves performance. Memory size thus has a great impact on the performance of GridDB. Therefore, memory space should be designed with the system's performance requirements and the amount of data in mind.

Combined use of memory and disks

 

GridDB nodes have various types of buffers and memory areas for different uses. The main types of memory are:

Memory
- Database buffer: I/O buffer for caching retrieved data. It caches data images (blocks) read from checkpoint files in memory. A larger database buffer size means blocks are more easily cached in memory, which improves search and registration performance. Each node has one database buffer area; as such, multiple transaction processing threads share in its use.
- Memory area for checkpoints: Area used for checkpoint processing. Updated blocks in the database buffer are copied to this area and written out to checkpoint files; it also temporarily stores data subject to checkpointing when that data is updated during checkpointing.
- Memory area for transaction processing: Area used by transaction processing threads that perform row registration and search processing.
- Buffer for storing intermediate SQL results (AE only): Area for storing intermediate results while SQL processing is taking place. If the size of the intermediate results exceeds the buffer memory size limit, those results are temporarily exported to swap files for intermediate SQL results.
- SQL work memory area (AE only): Area for temporarily storing intermediate results of SQL operations such as joins and aggregations.

 

In designing memory space, it is necessary to design the upper size limit and other related parameters for each memory area. A key point here is the memory sizes of "database buffers" and "memory areas for SQL processing."

Memory

The size of "database buffers" has the greatest impact on the system's performance. If there is space in physical memory, it is recommended to assign as much memory size as possible to database buffers.

As for the size of a memory buffer for SQL processing, conduct its assessment with a typical SQL query used by the system and set the memory size.

 

The next several subsections summarize points of design for the following items:

 

3.3.1 Database buffer

This is an area for caching in memory the blocks read from a checkpoint file.

A larger database buffer size means data blocks reside in memory, which improves search and registration performance.

Specify the upper buffer size limit using the node definition file. When the buffer reaches capacity, old blocks are written out to the checkpoint file according to an LRU policy to free space, and the required blocks are read in from the files. These read and write operations between the buffer and the files are called swapping. If blocks are in use and cannot be written out to free space, the buffer size is temporarily expanded; when the processing ends and the extra space is no longer needed, the buffer size is reduced back to its upper limit.

Internally, the buffer is divided on a per-partition-group basis before use. The size allotted to each partition group is dynamically determined by the node according to the amount of data and the access status.

Database buffer

Immediately after a node starts up, no blocks have yet been read into the database buffer. Therefore, in search and registration processing immediately after startup, swapping occurs frequently as blocks are read in, which may degrade performance. To accelerate processing immediately after startup, GridDB provides a warmup feature that reads blocks into the database buffer upon node startup.
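The excerpt below is a sketch of where the upper limit of the database buffer is typically set, assuming the parameter /dataStore/storeMemoryLimit in the node definition file (an assumption to verify for your version); size the value against the physical memory available to the node after the other memory areas are accounted for.

"dataStore":{
    "storeMemoryLimit":"4096MB"
},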

related parameters

How to check related information

 

3.3.2 Memory area for checkpoints

This is a memory area used in checkpoint processing.

In checkpoint processing, blocks in a database buffer are copied to a memory area for checkpoints to write them to a checkpoint file. Of all the blocks in a database buffer, those blocks that have been updated by the time checkpoint processing starts will be subject to checkpoint processing.

Memory area for checkpoints

Further, if concurrently executed transaction processing is to update those blocks that are subject to checkpoint processing, they are copied to a memory area for checkpoints before updating. For example, if row registration is performed during checkpoint processing, those blocks that store index data that are set in the rows will be updated. If those blocks are already subject to checkpoint processing, they will be copied to a memory area for checkpoints and the original blocks in a database buffer will be updated.

For these reasons, if transaction processing that involves block updates occurs frequently, usage of a memory area for checkpoints may increase.

Transaction processing executed during checkpoint processing

The size of this memory area does not have much impact on the performance of checkpoint processing; therefore, there is no need to set a large size for this area.

Specify the upper size limit of a memory area using a node definition file. If the size to be used in one checkpoint process is not large enough, the size will be expanded temporarily. After processing ends, the size is reduced to its upper limit.

related parameters

How to check related information

 

3.3.3 Memory area for transaction processing

This is a memory area used in transaction services where row registration and search processing are performed. Each transaction service retrieves from a memory area the portion of memory needed for processing before use. After one transaction process ends, the memory that has been made available is released back to the memory area.

The number of transaction services (concurrency) is equivalent to /dataStore/concurrency (4 by default).

Memory area for transaction processing

If all the memory in the memory area is in use and the memory required by a transaction service cannot be acquired, an error will occur.

A transaction may use a large amount of memory in processing such as TQL queries that hit tens of millions of rows, registration of huge BLOBs, or bulk registration of huge rows using MultiPut. Set the upper size limit of the memory area according to the content of transaction processing and the number of transaction services (concurrency: /dataStore/concurrency).

related parameters

How to check related information

 

3.3.4 Memory area for SQL processing (AE only)

There are two different types of memory areas for SQL processing: the buffer for storing intermediate SQL results and SQL work memory areas.

The buffer for storing intermediate SQL results is a memory for storing the data in the tables for intermediate results of such tasks in SQL processing as scan and join. If the size of the intermediate results exceeds the buffer memory size limit, those results are temporarily exported in swap files for intermediate SQL results.

When executing analytical queries on large-volume data using SQL, it is recommended to set the value of this buffer as high as possible, considering the balance with the database buffer.

SQL work memory is memory used in processing tasks in SQL processing. There is no need to change the default value for work memory.

However, be sure to set the sizes of a buffer for storing intermediate results and of work memory so as to satisfy the following expression:

Unless the above expression is satisfied, exporting to swap files for intermediate results takes place often.

related parameters

How to check related information

 

3.3.5 Memory for other purposes

Memory is used not only for processing buffers but also to retain management information on containers and files. The sizes of the following two types of memory (memory for container management and memory for data management) increase relative to the number of containers and the amount of data, respectively.

 

3.4 Container design

The optimal container design largely depends on system and application requirements. This section describes the basics that help you with container design and provides related information.

3.4.1 Recommended design

Store data generated daily in time-series containers.

Containers are of two types: collections and time-series containers.

First, see whether time-series containers can be used to store the data; consider using collections only when time-series containers are not suitable.

Use time-series containers when data such as device sensor readings and logs is generated minute by minute and keeps growing, as in IoT systems. Time-series containers are optimized for storing and referencing time-series data; as such, the database buffer is used more efficiently than with collections.

The features of time-series containers include compression, operation, and row expiration features. Gaining a better understanding of these features specific to time-series data helps you with container design.

[References]

Use a large number of containers rather than a small number.

Processing for one container is handled by one processing thread on one node server. As such, a design that stores a large amount of data in only a few containers cannot take advantage of a node's processing concurrency (multiple cores). Such a design also concentrates access on particular nodes, which means performance does not scale out when nodes are added.

For example, for time-series data, create time-series containers for each data source as below:

Even in case of the same kind of data, it is recommended to split the container into as many containers as the number of cluster nodes multiplied by the processing concurrency per node.
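For example, under this guideline, a 3-node cluster with the default transaction service concurrency of 4 per node suggests splitting the data into at least 3 x 4 = 12 containers (for instance, 12 time-series containers grouped by data source), so that all processing threads in the cluster can work in parallel.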

Running operations of multiple containers serially can degrade performance. Process split containers in parallel as much as possible using batch processing.

[References]

Split containers in a way that suits data.

Three methods are available to split containers: row splitting, column splitting, and table partitioning.

Select the appropriate method to match search criteria or data characteristics.

[References]

Minimal indexing maximizes performance.

Creating indexes appropriately to match the system's data search criteria will improve search performance. Multiple indexes can be created in one container, but the number of indexes must be minimized to what is required.

This is because index data stresses database buffers. In systems where the memory size is not sufficient, excessive indexing decreases the buffer hit ratio and increases swapping, thereby leading to performance degradation.

It is possible to remove unnecessary indexes afterwards, but it may take a long time to complete removal, if a large amount of rows is already stored in the target container. For this reason, take time beforehand to fully design containers and try to create only necessary indexes.

Designing containers in such a way that retrieval by the primary key alone is enough to narrow down the data will naturally result in minimal indexing. Moreover, such a design implies splitting containers, which makes it possible to take advantage of each node's processing concurrency.

To create effective indexes in the Advanced Edition, refer to the SQL optimization rules in the GridDB Advanced Edition SQL Tuning Guide.

[References]

 

3.5 Cluster configuration design

In designing cluster configuration, it is necessary to design the following items, to fulfill the requirements for availability, including the system's utilization rates and RTO.

GridDB autonomously places replicas (data) in the clusters configured by multiple nodes (server processes) and automatically determines the master node that manages the entire clusters. Even if a node fails, processing from the client application (NoSQL interface) can continue using failover.

Failover

  

3.5.1 Number of nodes configuring a cluster

The maximum number of nodes whose simultaneous failure can be tolerated while cluster services continue varies depending on the number of nodes configuring the cluster.

To meet the required utilization rate of the system, determine the number of nodes configuring the cluster based on the maximum number of simultaneous node failures that must be tolerated.

[notes]

If a node goes down, the following recovery processing will automatically be performed:

Partitions (replicas) are relocated not only when a cluster is contracted due to a node down, but also when a cluster is extended.

Before relocation, if backup data is older than owner data, the cluster will forward the delta data to the backup node and synchronize the owner and backup data.

Synchronization in cluster reconfiguration

Synchronization described above is automatically performed by the server. There is no item that needs design work.

 

3.5.2 Failure detection (heartbeat)

A cluster detects failure through a heartbeat between nodes. To check the survival of nodes, the master sends a heartbeat to all followers at regularly scheduled intervals. Followers receiving a heartbeat send back their responses to the master.

Heartbeat

To check if a heartbeat sent from the master has arrived, followers check the received time of a heartbeat at regularly scheduled intervals.

To reflect the latest data between the master and followers, a heartbeat exchanges information on configured nodes and partition information management tables between the master and followers.

The length of the regularly scheduled interval for both the master and followers is 5 seconds by default (the value of /cluster/heartbeatInterval in the cluster definition file).
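For instance, to lengthen the interval to 10 seconds, the cluster definition file would be edited along the following lines; this is a sketch only, the value format with a unit suffix is an assumption, and the same value must normally be set on all nodes.

"cluster":{
    "heartbeatInterval":"10s"
},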

related parameters

3.5.3 Replica processing (number of replicas, processing mode)

For greater data availability, GridDB creates replicas and retains them by distributing them among multiple nodes. For this reason, even if a node fails, GridDB can continue data access using replicas in the rest of the nodes.

The next several subsections explain points of design for replicas; replication in which replicas are created; and the mechanism of synchronization where replicas are recovered from failure.

 

3.5.3.1 Replica

A replica has the following features:

Determine the number of replicas from the maximum number of nodes that can fail simultaneously while still ensuring data access, as dictated by the system's required utilization rate.

A multiple failure refers to a failure occurring in multiple nodes which causes them to go down all at once.

3 replicas and 2 node servers where multiple failures are occurring
3 replicas and 3 node servers where multiple failures are occurring

More replicas means greater availability with the following adverse effects:

  

Below is a guideline for the number of replicas:

 

related parameters

[notes]
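As a hedged illustration of how the chosen number of replicas is reflected in configuration, a 3-replica setup might look like the excerpt below; the parameter name /cluster/replicationNum in the cluster definition file is an assumption to confirm for your version.

"cluster":{
    "replicationNum":3
},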

 

3.5.3.2 Replication processing

Data updated in transaction processing, including data registration and deletion, can be written to disk and made persistent. This protects updated data from being lost in the event of a node failure. Further, availability can be increased by forwarding updated data to a node holding the backup data and replicating it on that node (replication).

Transaction log write and replication

In transaction processing, both log file writes and replication have two modes each: synchronous and asynchronous. Select a combination based on the system's availability and performance requirements.

The table below explains combinations of transaction logs and replication modes, and process flows and performance for each combination.

Combinations of transaction log and replication modes:

[1] log: asynchronous (1 sec.), replication: asynchronous (default)
    performance: high speed
    state of data upon notification of completion to the application: updated data is flushed within a second; data has been forwarded to a backup node (receipt not yet acknowledged)

[2] log: asynchronous (1 sec.), replication: semi-synchronous
    performance: moderately high speed
    state of data upon notification of completion to the application: updated data is flushed within a second; data has been forwarded to a backup node

[3] log: synchronous, replication: asynchronous
    performance: moderately low speed
    state of data upon notification of completion to the application: flushing of updated data is complete; data has been forwarded to a backup node (receipt not yet acknowledged)

[4] log: synchronous, replication: semi-synchronous
    performance: low speed
    state of data upon notification of completion to the application: flushing of updated data is complete; data has been forwarded to a backup node

Below is a guideline for setups for transaction log write and replication modes.

related parameters

[notes]
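The sketch below shows how the default combination [1] above (asynchronous log flushing every second, asynchronous replication) might appear in configuration. The parameter names /dataStore/logWriteMode (node definition file) and /transaction/replicationMode (cluster definition file), as well as the value encodings (1 for a 1-second delayed flush, 0 for asynchronous replication), are assumptions to check against the parameter reference.

gs_node.json (excerpt):
"dataStore":{
    "logWriteMode":1
},

gs_cluster.json (excerpt):
"transaction":{
    "replicationMode":0
},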

 

3.5.4 Client failover timeout

Even if a node fails, the application can continue accessing data thanks to failover functionality provided by a node and a client API (NoSQL interface).

The rest of this subsection gives a detailed explanation of how failover is performed when a node fails by following all steps ranging from an occurrence of a failure to recovery.

(1) Client's request for processing (figure: Flow of failover 1 when a failure occurs)
1. The client requests processing by indicating what to do with a container through a client API. The client API connects to node 1, which stores container A to be operated on.

(2) Failure occurrence (figure: Flow of failover 2 when a failure occurs)
2. While processing a request from the client, node 1 goes down due to a node failure.

(3) Automatic recovery 1 (figure: Flow of failover 3 when a failure occurs)
3. The client API loses the connection to node 1 and automatically retries processing.
4. The cluster automatically detects that node 1 has gone down and reconfigures itself using the remaining nodes.

(4) Automatic recovery 2 (figure: Flow of failover 4 when a failure occurs)
5. As a replacement for the owner data on node 1, the cluster changes the backup data on node 2 to owner data. In the background, the cluster creates backups on other nodes.

(5) Continuation of processing (figure: Flow of failover 5 when a failure occurs)
6. The client API establishes a connection to the new owner of container A, namely node 2, and automatically retries processing. The client continues processing without causing an error.

If the node holding the partition the client API is accessing fails, the client API automatically retries processing. If, thanks to the cluster's autonomous data placement, the partition recovers while retrying, the client API automatically continues processing. The retry period, or "failover timeout," can be changed by specifying a connection property in the application.

[notes]

related parameters

 

3.6 Network design

In GridDB, it is necessary to design two types of networks, namely nodes and clusters.

 

3.6.1 Node network configuration

GridDB nodes perform a variety of communication with the client and other nodes over the network. The communication paths used are of several types, as shown in the table below:

Communication to and from nodes
- A. Transaction processing (client-node, between nodes): communication for data operations over the NoSQL interface, and communication for replication processing of transactions.
- B. SQL processing (AE only) (client-node, between nodes): communication for data operations over the NewSQL interface, and communication for processing SQL by parallel and distributed computing.
- C. Production and management operations (client-node): communication for accepting operation requests for production and management.
- D. Cluster management (between nodes): communication for sending and receiving heartbeats used for checking node survival, as well as cluster management information.
- E. Synchronization (between nodes): communication for data synchronization through partition relocation.

Main operation tools provided by GridDB use multiple types of communication as illustrated in the figure below:

Network communication used by operation tools
- Integrated operations management GUI (gs_admin): A. transaction processing, B. SQL processing (AE only), C. production and management operations
- Interpreter (gs_sh): A. transaction processing, B. SQL processing (AE only), C. production and management operations
- Operation commands (including gs_joincluster and gs_stat): C. production and management operations
- Export/import (gs_export/gs_import): A. transaction processing, B. SQL processing (AE only)

[notes]

related parameters

[notes]

In performing data operations using TQL or SQL, the client connects to the cluster through the NoSQL/NewSQL interface. Note that data operations using SQL involve distributed processing between nodes and hence generate a large amount of inter-node communication.

Communication flow when performing data operations using a NoSQL/NewSQL interface

The points of design required for a node network are port numbers and bandwidth. The next several subsections explain points of design for each.  

 

3.6.1.1 Port number

Port numbers GridDB nodes use are of several types as shown in the tables below:

By default, the numbers in the two tables are used. If the port number that the system plans to use is used by another application or elsewhere, then the default port number must be changed.

Network communication ports
- A. Transaction processing: communication port for performing transaction processing. Port number: 10001
- B. SQL processing (AE only): communication port for performing SQL processing. Port number: 20001
- C. Production and management operations: communication port for accepting operation requests for production and management. Port number: 10040
- D. Cluster management: communication port for sending and receiving heartbeats used for checking node survival, as well as cluster management information. Port number: 10010
- E. Synchronization: communication port for data synchronization through partition relocation. Port number: 10020

To use multicast communication as a connection method, the following three additional port numbers in the table below are used:

- F. Multicast for transaction processing: 31999
- G. Multicast for SQL processing (AE only): 41999
- H. Multicast for cluster management: 20000

 

3.6.1.2 Bandwidth

GridDB communicates a large volume of data. Therefore, the recommended network bandwidth is 10 GbE.

The following three communication paths involve large data volumes:

Network communication with large data volumes

 

When there is a shortage of network bandwidth due to a large amount of data traffic, it is recommended to increase bandwidth using multiple network interface cards (NICs).

[Example] normal case

[Example] case of high traffic

 

3.6.2 Cluster network configuration

The following three connection methods are available to configure clusters between nodes and communicate with the client: a multicast method, a fixed list method, and a provider method. The following explains the benefits and drawbacks of each connection method. Select one of the three methods to suit the network environment.

Connection methods

 

Multicast method: uses multicast communication.
- Settings/build: benefit: the definition file is easy to describe.
- Adding nodes: benefit: no need to stop the cluster.
- Network environment: drawback: communication is not possible unless all nodes and the client are placed in the same subnet.

Fixed list method: sets a list of addresses of all nodes on the client and the nodes, and uses unicast communication within the cluster based on the address list.
- Settings/build: drawback: the definition file is harder to describe.
- Adding nodes: drawback: the cluster must be stopped.
- Network environment: benefit: no restrictions on the environment.

Provider method: sets a list of addresses of all nodes on the provider; the client and the nodes obtain the address list from the provider and use unicast communication.
- Settings/build: drawback: a provider must be implemented.
- Adding nodes: benefit: no need to stop the cluster.
- Network environment: benefit: no restrictions on the environment.

Where multicast communication can be used, the multicast method is recommended. (Specify the connection method using the cluster definition file. By default, the connection method is set to the multicast method.)

Keep in mind, however, that a multicast method has restrictions on the network environment.

In a multicast method, because of communication limitations in multicast communication, the client and all nodes should be placed in the same subnet. If they are placed in separate subnets, the multicast method cannot be used, unless multicast routing settings are configured on all routers in the network.

Multicast method

In the cloud, such as AWS, there may be cases where multicast communication is not available. In such cases, use a fixed list method or a provider method.

Between the fixed list method and the provider method, the fixed list method is easier in terms of settings and environment building. By comparison, the provider method requires building a Web service that provides address lists.

On the other hand, the provider method is not affected by node addition: all that is needed is to change the address list held by the provider, and there is no need to stop the cluster. In comparison, the fixed list method requires stopping the cluster to modify the cluster definition file, and also requires changing the connection settings specified on the client side. For this reason, the provider method is recommended for systems that expect to add nodes.

[notes]

The next three subsections give the details of each connection method.

 

3.6.2.1 Multicast method

A multicast method is a method that uses multicast for communication. Multicast refers to communication that sends from one sender the same data at the same time to given multiple destinations.

Multicast method

To use a multicast method, place the client and all nodes in the cluster in the same subnet. If they are placed in separate subnets, the multicast method cannot be used, unless multicast routing settings are configured on all routers in the network.

In a multicast method, the nodes poll the network and receive data sent at regularly scheduled intervals using multicast communication. The nodes do not receive data immediately; it takes time amounting to at most one regular interval before communication starts. The length of a regular interval can be specified in the cluster definition file.
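A minimal sketch of the multicast-related settings in the cluster definition file follows, using the multicast address and cluster management port that appear elsewhere in this guide; the key names notificationAddress, notificationPort, and notificationInterval are assumptions to verify for your version.

"cluster":{
    "notificationAddress":"239.0.0.1",
    "notificationPort":20000,
    "notificationInterval":"5s"
},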

related parameters

[notes]

 

3.6.2.2 Fixed list method

A fixed list method is a method to explicitly specify the IP address of all nodes in the cluster for the client and each of the nodes. It uses unicast communication to communicate in the cluster based on an address list.

Fixed list method
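A minimal two-node fixed list is sketched below; it follows the same notificationMember structure as the fuller example shown later in the section on isolating external and internal communication, with only the basic entries and one address per node.

"notificationMember": [
    {
        "cluster":     {"address":"192.168.10.11", "port":10010},
        "sync":        {"address":"192.168.10.11", "port":10020},
        "system":      {"address":"192.168.10.11", "port":10040},
        "transaction": {"address":"192.168.10.11", "port":10001},
        "sql":         {"address":"192.168.10.11", "port":20001}
    },
    {
        "cluster":     {"address":"192.168.10.12", "port":10010},
        "sync":        {"address":"192.168.10.12", "port":10020},
        "system":      {"address":"192.168.10.12", "port":10040},
        "transaction": {"address":"192.168.10.12", "port":10001},
        "sql":         {"address":"192.168.10.12", "port":20001}
    }
]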

 

related parameters

[notes]

  

3.6.2.3 Provider method

A provider method is a method to locate a Web service that delivers a list of the IP addresses of all nodes in the cluster. The client and each node first access the provider to obtain the IP addresses of all nodes.

Provider method

The nodes obtain an address list from the provider when they start. They continue to obtain the updated address list from the provider at regular intervals. This means that in such cases as node addition, once the provider's address list is updated, the new information is automatically available in the nodes.
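A minimal sketch of the provider setting in the cluster definition file follows; the key /cluster/notificationProvider/url is an assumption, the URL shown is hypothetical, and the Web service at that URL is expected to return an address list in the same format as the fixed list example.

"cluster":{
    "notificationProvider":{
        "url":"http://providerhost:8080/gridstore/provider"
    }
},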

 

related parameters

[notes]

 

3.6.3 Network configuration to isolate communication between external and internal networks

3.6.3.1 Objective

Of the communication done by GridDB nodes, communication for transaction processing and SQL processing uses two types of communication paths: client-to-node communication (external communication) and cluster internal communication between nodes. Both communicate through the same network interfaces.

As the system grows in scale, traffic increases in both external and internal communication, and the network can become a performance bottleneck.

In terms of system configuration, occasionally it might be necessary to isolate the interfaces for client communication into external networks and the interfaces for cluster internal communication into internal networks, respectively.

To deal with these cases, GridDB makes it possible to assign a network interface for client communication and a separate network interface for cluster internal communication, for both transaction processing and SQL processing. This creates a network configuration that isolates communication between external and internal networks.

Isolation of network interfaces
- A. Transaction processing (client-node, external): communication for data operations over the NoSQL interface.
- A'. Transaction processing (between nodes, internal): communication for replication processing of transactions.
- B. SQL processing (AE only) (client-node, external): communication for data operations over the NewSQL interface.
- B'. SQL processing (AE only) (between nodes, internal): communication for processing SQL by parallel and distributed processing.

3.6.3.2 Node network configuration

To create a network configuration that isolates communication between external and internal networks, first set up a node network configuration.

related parameters

[notes]

Specify the IP address of the network interfaces in serviceAddress and localServiceAddress, each using a different network interface. To configure a cluster with multiple nodes, all nodes must be set up.

Moreover, it is recommended to explicitly set serviceAddress for processing other than transaction processing and SQL processing, rather than omitting it.

[Example]

"cluster":{
    "serviceAddress":"192.168.10.11",
    "servicePort":10010
},
"sync":{
    "serviceAddress":"192.168.10.11",
    "servicePort":10020
},
"system":{
    "serviceAddress":"192.168.10.11",
    "servicePort":10040,
          :
},
"transaction":{
    "serviceAddress":"172.17.0.11",
    "localServiceAddress":"192.168.10.11",
    "servicePort":10001,
          :
},
"sql":{
    "serviceAddress":"172.17.0.11",
    "localServiceAddress":"192.168.10.11",
    "servicePort":20001,
          :
},

3.6.3.3 Cluster network configuration

Set up a cluster network configuration according to the node network configuration.

In the multicast method, no special setup is required. See the section "multicast method."

In the fixed list method, information on localServiceAddress must be added to the address list set in the cluster definition file.

[Example]

"notificationMember": [
    {
        "cluster":           {"address":"192.168.10.11", "port":10010},
        "sync":              {"address":"192.168.10.11", "port":10020},
        "system":            {"address":"192.168.10.11", "port":10040},
        "transaction":       {"address":"172.17.0.11", "port":10001},
        "sql":               {"address":"172.17.0.11", "port":20001},
        "transactionLocal":  {"address":"192.168.10.11", "port":10001},
        "sqlLocal":          {"address":"192.168.10.11", "port":20001}
    },
    {
        "cluster":           {"address":"192.168.10.12", "port":10010},
        "sync":              {"address":"192.168.10.12", "port":10020},
        "system":            {"address":"192.168.10.12", "port":10040},
        "transaction":       {"address":"172.17.0.12", "port":10001},
        "sql":               {"address":"172.17.0.12", "port":20001},
        "transactionLocal":  {"address":"192.168.10.12", "port":10001},
        "sqlLocal":          {"address":"192.168.10.12", "port":20001}
    },
          :
          :

In the provider method, information on localServiceAddress must be added to the address list returned by the provider.

[Example]

[
    {
        "cluster":           {"address":"192.168.10.11", "port":10010},
        "sync":              {"address":"192.168.10.11", "port":10020},
        "system":            {"address":"192.168.10.11", "port":10040},
        "transaction":       {"address":"172.17.0.11", "port":10001},
        "sql":               {"address":"172.17.0.11", "port":20001},
        "transactionLocal":  {"address":"192.168.10.11", "port":10001},
        "sqlLocal":          {"address":"192.168.10.11", "port":20001}
    },
    {
        "cluster":           {"address":"192.168.10.12", "port":10010},
        "sync":              {"address":"192.168.10.12", "port":10020},
        "system":            {"address":"192.168.10.12", "port":10040},
        "transaction":       {"address":"172.17.0.12", "port":10001},
        "sql":               {"address":"172.17.0.12", "port":20001},
        "transactionLocal":  {"address":"192.168.10.12", "port":10001},
        "sqlLocal":          {"address":"192.168.10.12", "port":20001}
    },
          :
          :

The connection is made from the client through the network interfaces specified in serviceAddress like a normal network configuration.

[note]

3.7 Security design

3.7.1 Access control

In security design, it is necessary to first examine common security requirements, including the number of users with access rights and the range of data access and, based on the result, design the following:

 

GridDB has "database" features which allow to separate the range of accessibility in order to set a different range of accessible data for each user.

A general user can have access to multiple databases but cannot execute a cross-database SQL query, for example, joining containers residing in different databases. Therefore, it is recommended to group all the containers that one user accesses into one database as much as possible.

Database and SQL search

User roles
- Administrative user: creates databases and general users, and grants general users access to databases.
- General user: accesses databases to which access has been granted and performs operations such as data registration and search.

3.7.1.1 Operational procedures

Below are the steps to create a general user and a database and to grant access, assuming that "ALL" is granted.

  1. Connect to the cluster through an administrative user account.
$ gs_sh
gs> setcluster cluster myCluster 239.0.0.1 31999 // Set information about the cluster to connect to
gs> setuser admin admin       // Specify the name and password of the administrative user.
gs> connect $cluster          // Connect to the cluster.
gs[public]>
  2. Create a general user.
gs[public]> createuser userA password
  3. Create a database.
gs[public]> createdatabase databaseA
  4. Grant the general user access to the database.
gs[public]> grantacl ALL databaseA userA

 

3.7.2 Encryption

In GridDB, data stored in the database files is not encrypted. The same applies to data communicated over the network between nodes and between the client and nodes.

It is recommended to place such data in a secure network that should never be directly accessed from the outside world.

 

3.8 Monitoring design

The following summarizes points to consider when designing monitoring for the GridDB cluster. Checking GridDB operations and resource usage is required to optimize performance and proactively avoid failures. Monitoring items are determined by the system's service level.

These three monitoring items can be monitored using Zabbix as shown in the figure below.

Example of a dashboard configuration for Zabbix

For information about monitoring templates for Zabbix, see the GridDB Zabbix Template Guide (GridDB_ZabbixTemplateGuide.html).

 

3.9 Backup design

3.9.1 Purposes of backup

Backups are taken to guard against data corruption resulting from multiple failures in hardware, malfunctioning applications, and other causes.

3.9.2 Points to consider when designing

Generally, backup and restore requirements are defined according to the system's service level. A list of review items is shown below:

GridDB provides the following backup/restore methods. Choose the method that satisfies the requirements defined above.

Backup methods, recovery points, and features:

- Automatic replica creation by cluster management
  Recovery point: point immediately before the failure
  Benefit: if the number of replicas is specified in the cluster configuration, the cluster automatically creates replicas across the nodes and manages them.
  Drawback: this method does not help when the number of pieces of hardware that crash simultaneously is equal to or greater than the number of replicas, or when data is lost because of a human error.
- Data export using the gs_export tool
  Recovery point: point at which data is exported (per container)
  Benefit: the backup size can be reduced by backing up only the required data.
  Drawback: running the export tool may put load on the running cluster.
  Drawback: data recovery requires re-importing the data.
  Drawback: exports are handled per container, so recovery points may differ for each container.
- Offline backup
  Recovery point: point at which the cluster stops
  Benefit: recovery points cannot differ from node to node.
  Drawback: the cluster must remain stopped until the backup copy is complete.
- Online backup by node (baseline with differential/incremental)
  Recovery point: point at which the backup is taken (per node)
  Benefit: backups can be taken online using the GridDB backup command.
  Drawback: recovery points may differ between nodes depending on when backups are completed.
- Online backup by node (automatic log)
  Recovery point: point immediately before the failure
  Benefit: backups can be taken online using the GridDB backup command; data can be restored to the point immediately before the failure.
  Drawback: transaction logs are used to recover data to the latest state, which can result in longer recovery time.
- File system level online backup (snapshot)
  Recovery point: point at which the snapshot is taken
  Benefit: working with OS or storage snapshots to take backups can shorten backup time.
  Drawback: even if snapshots are taken on all nodes simultaneously, a time difference of about one second may occur between nodes, given that the transaction log write mode is set to DELAYED_SYNC.
- File system level online backup (snapshot with automatic log)
  Recovery point: point immediately before the failure
  Benefit: working with OS or storage snapshots to take backups can shorten backup time.
  Drawback: transaction logs are used to recover data to the latest state, which can result in longer recovery time.

 

The following table shows a guideline for selecting an appropriate backup method:

Requirements and uses of backup → Use this backup method
A simultaneous crash of as many pieces of hardware as there are replicas (or more), or data loss caused by a human error, is acceptable. → automatic replica creation by cluster management
The data to be backed up can be specified. → data export using the gs_export tool
The system can be stopped while running backups. → offline backup
The system cannot be stopped while running backups. → online backup by node (automatic log backup)
Because of the huge size of the database files, online backups do not complete within the allotted time. → file system level online backup (snapshot with automatic log)

 

3.9.3 How to back up and restore

This subsection describes each backup method, including what it is and how it works, the benefits and drawbacks, and the amount of time needed to back up and restore. For details about operations, see the GridDB Features Reference (GridDB_FeaturesReference.html).

 

3.9.3.1 Offline backup

Taking backups offline

Stop the cluster and all nodes, and then copy the database files using OS commands, storage snapshots, or other means. After the copy has completed on all nodes, start the nodes and the cluster.

offline backup
Offline backup
Benefits: Backups are taken after stopping the cluster, which results in the same recovery point for all nodes.
- Backups are taken after stopping the nodes, which leads to faster restores.
Drawbacks: The method requires stopping the cluster and nodes; therefore, it is not applicable to systems whose services cannot be stopped.
Backup time: time taken to copy the database files
Recovery point: point at which the cluster stops
Recovery time: time taken to copy the backup files
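
The console transcript below is a minimal sketch of an offline backup, assuming the default admin user, the default file locations under /var/lib/gridstore, and a hypothetical backup destination /mnt/backup/20240101; the cluster name myCluster and node count 3 are also examples. Adjust the paths, credentials, and copy method (OS commands or storage snapshots) to your environment, and run the copy step on every node.

// Stop the cluster, then stop each node (run gs_stopnode on every node).
$ gs_stopcluster -u admin/admin
$ gs_stopnode -u admin/admin

// Copy the database files on every node (default paths; copy the transaction log directory too if it is separate).
$ cp -rp /var/lib/gridstore/data /mnt/backup/20240101/

// After the copy has completed on all nodes, restart the nodes and the cluster.
$ gs_startnode -u admin/admin -w
$ gs_joincluster -u admin/admin -c myCluster -n 3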

 

3.9.3.2 Online backup

Backups of each node are taken online while the cluster is operating.

The following two main methods are available for online backup in terms of how to create a backup file:

Online backup
Online backup

The main differences between the two methods are indicated in the table below:

Method: online backup by the node
Recovery point: point at which the backup per node is completed
→ The amount of data to back up differs for each node. This may result in a large time difference between nodes as to when backups are completed.
Method: file system level online backup
Recovery point: point at which the snapshot is completed
→ All nodes can have almost the same recovery point if an identical snapshot time is specified for every node.

 

3.9.3.3 Online backup by the node

This subsection describes how to back up online by the node.

3.9.3.3.1 Full backup

The entire database is backed up.

If you indicate that full backups be taken using a backup command on a per-node basis, each node copies all the data in its database to create a backup file.

Benefits: Faster restores than differential and incremental backups. (recovery time: full backup < differential backup < incremental backup)
- Backups can be taken while nodes are running.
Drawbacks: All data in the database files is copied; therefore, extra time is required for backups. (backup time: incremental backup < differential backup < full backup)
- To store multiple generations of backups, disk capacity equal to the size of the database files multiplied by the number of generations is required.
- Because the recovery point is the point at which the backup per node is completed, recovery points may differ between nodes.
Backup time: time for a node to create a backup file (depends on the size of the database files)
Recovery point: point at which the backup per node is completed
Recovery time: time taken to copy the backup files
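
As a sketch of how a per-node full backup might be invoked, the following uses the gs_backup and gs_backuplist commands with the default admin user and a hypothetical backup name 20240101_full; run it on every node, and see the GridDB Features Reference for the exact options and the backup destination settings in gs_node.json.

// Take a full backup on this node (the backup name 20240101_full is an example).
$ gs_backup -u admin/admin 20240101_full

// Check the backup status and history (backups complete asynchronously).
$ gs_backuplist -u admin/admin
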
3.9.3.3.2 Differential backup

A differential backup first completes a full backup (baseline) that would serve as the base (base point) of the differential and then backs up only the data that has changed since that full backup.

If you indicate that differential backups be taken using a backup command on a per-node basis, each node will copy only the data that has changed since that full backup to create a backup file.

Benefits: Faster backups than full backups. (backup time: incremental backup < differential backup < full backup)
- Faster restores than incremental backups. (recovery time: full backup < differential backup < incremental backup)
- Backups can be taken while nodes are running.
Drawbacks: More time is required for backups than for incremental backups.
- Because the recovery point is the point at which the backup per node is completed, recovery points may differ between nodes.
Backup time: time for a node to create a backup file (depends on the amount of data updated since the baseline)
Recovery point: point at which the backup per node is completed
Recovery time: time taken to copy the backup files, plus time taken to recover the blocks that differ
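
The following is a minimal sketch of a differential backup sequence on one node, assuming gs_backup supports the baseline and since modes as described in the GridDB Features Reference; the backup name 20240101 and the admin credentials are examples, so verify the exact option names for your version before use.

// Create the baseline (full) backup that subsequent differentials refer to.
$ gs_backup -u admin/admin --mode baseline 20240101

// Later, back up only the blocks that have changed since the baseline.
$ gs_backup -u admin/admin --mode since 20240101
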
3.9.3.3.3 Incremental backup

An incremental backup first completes a full backup (baseline) and then backs up only the data that has changed since the previous backup point. The previous backup point refers to a point at which a baseline backup, a differential backup, or an incremental backup is performed.

If you indicate that incremental backups be taken using a backup command on a per-node basis, each node will copy only the data that has changed since the previous backup to create a backup file.

Benefits: Faster backups than full and differential backups. (backup time: incremental backup < differential backup < full backup)
- Backups can be taken while nodes are running.
Drawbacks: Because all the incremental backups taken since the baseline or the last differential backup must be restored, more time is required for restores than for full and differential backups. (recovery time: full backup < differential backup < incremental backup)
- Because the recovery point is the point at which the backup per node is completed, recovery points may differ between nodes.
Backup time: time for a node to create a backup file (depends on the amount of data that has changed since the previous backup)
Recovery point: point at which the backup per node is completed
Recovery time: time taken to copy the backup files, plus time taken to recover the blocks that differ
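
Continuing the sketch above, an incremental backup might be taken with the incremental mode after a baseline exists; the mode name and the backup name 20240101 are assumptions, so check the Features Reference for your version.

// Back up only the blocks changed since the previous backup (baseline, differential, or incremental).
$ gs_backup -u admin/admin --mode incremental 20240101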

 

Automatic log backup

An automatic log backup first completes a full backup (baseline) and then collects transaction logs in a backup directory.

Benefits: Because the latest transaction logs are reflected in the backup files, this method makes it possible to restore data to the point immediately before the failure.
Drawbacks: Transaction logs are automatically copied during normal operation, which puts some load on operations.
- Extra time is required for restores.
Backup time: none (Once a baseline is created, logs are copied automatically, so there is no need to perform backups.)
Recovery point: point immediately before the failure
Recovery time: time taken to copy the backup files, plus time taken to recover the update logs stored in the transaction logs
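
The following sketch shows how an automatic log backup might be started on a node, assuming gs_backup supports the auto mode and using the default admin user; the backup name 20240101 is hypothetical, so confirm the option names in the Features Reference.

// Create a baseline and turn on automatic transaction log collection in the backup directory.
$ gs_backup -u admin/admin --mode auto 20240101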

 

3.9.3.4 File system level online backup

This subsection describes a file system level online backup.

A file system level online backup uses a snapshot feature of LVM or the storage, which makes it possible to process huge database files at high speed. It also significantly reduces the time required for backups and keeps the recovery points of the nodes in the cluster as close together as possible.

Consider using a file system level online backup, if database files are huge.

Benefits: Using a snapshot feature of LVM or the storage makes it possible to process huge database files at high speed.
- Together with automatic log backups, it also makes it possible to restore data to the latest state.
Drawbacks: A snapshot feature of LVM or the storage is required.
- Even if snapshots are run on each node simultaneously, a time difference of about one second may occur between nodes when the transaction log write mode is set to DELAYED_SYNC.
Backup time: none
Recovery point: point at which snapshots are taken
Recovery time: time taken to copy the backup files
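
Purely as an illustration of the snapshot approach (not a GridDB command), the transcript below assumes the database files live on a hypothetical LVM logical volume /dev/vg0/griddb and that a backup destination /mnt/backup/20240101 exists; an equivalent storage-array snapshot would serve the same purpose, and the snapshot size, mount point, and copy paths all depend on how the volume is laid out.

// Create an LVM snapshot of the volume holding the database files (names and size are examples).
# lvcreate --snapshot --name griddb_snap --size 10G /dev/vg0/griddb

// Mount the snapshot read-only and copy the database files to the backup destination.
# mount -o ro /dev/vg0/griddb_snap /mnt/snap
# cp -rp /mnt/snap/gridstore/data /mnt/backup/20240101/

// Remove the snapshot once the copy is complete.
# umount /mnt/snap
# lvremove -f /dev/vg0/griddb_snap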

 

4 Build

This chapter describes how to install and uninstall GridDB and configure environment settings.

 

4.1 Media and RPM configuration for GridDB

The GridDB installation media contains files under the following directory configuration:

   Linux/          RPM packages for GridDB
  misc/           How to use socat, which supports access from applications outside the subnet, and its RPM
                   How to integrate with the integrated monitoring software "Zabbix" and its sample templates
  Readme.txt      Readme
  Fixlist.pdf     module revision history

The RPM packages under the Linux directory are used to install GridDB. RPM packages come in the following types:

type package name file name description
server griddb-xx-server griddb-xx-server-X.X.X-linux.x86_64.rpm contains node modules and part of operation tools.
client (operation tool) griddb-xx-client griddb-xx-client-X.X.X-linux.x86_64.rpm contains a variety of operation tools.
Java libraries griddb-xx-java_lib griddb-xx-java_lib-X.X.X-linux.x86_64.rpm contains Java libraries.
C libraries griddb-xx-c_lib griddb-xx-c_lib-X.X.X-linux.x86_64.rpm contains C header files and libraries.
Web API griddb-xx-webapi griddb-xx-webapi-X.X.X-linux.x86_64.rpm contains Web API applications.
Python libraries griddb-xx-python_lib griddb-xx-python_lib-X.X.X-linux.x86_64.rpm contains Python libraries.
Node.js libraries griddb-xx-nodejs_lib griddb-xx-nodejs_lib-X.X.X-linux.x86_64.rpm contains Node.js libraries.
Go libraries griddb-xx-go_lib griddb-xx-go_lib-X.X.X-linux.x86_64.rpm contains Go libraries.
NewSQL libraries griddb-xx-newsql griddb-xx-newsql-X.X.X-linux.x86_64.rpm contains NewSQL interface libraries (Java). (AE only)

* xx indicates the GridDB edition (se or ae).
* X.X.X indicates the GridDB version.

 

4.2 Installation

4.2.1 Checking the environment before installation

Check that the GridDB hardware and software requirements are met. For information about the requirements, see the "Release notes."

4.2.2 Time synchronization for nodes

It is assumed that the OS times on the node servers that make up a cluster do not differ. Operating the cluster while OS times differ between node servers is not recommended. GridDB does not synchronize time between nodes.

If OS times differ between node servers, the following problems may occur:

For a multi-node cluster configuration, time synchronization using NTP is recommended. Use slew mode in NTP.

[note]

4.2.3 Selecting a package to install

Packages that must be installed differ depending on which module to execute as well as what the machine is used for.

purpose packages that must be installed
To run GridDB nodes server: griddb-xx-server
client (operation tool): griddb-xx-client
Java library: griddb-xx-java_lib
NewSQL library: griddb-xx-newsql (AE only)
To run an operation tool client (operation tool): griddb-xx-client
Java library: griddb-xx-java_lib
NewSQL library: griddb-xx-newsql (AE only)
To develop an application libraries that support the programming languages of the applications to be developed

[notes]

For example, to achieve the following machine configuration, the following packages must be installed on each machine:

Example machine configuration
Example machine configuration
machine use purpose packages to install
A. server machine that runs GridDB nodes To run GridDB nodes and operation tools. griddb-xx-server
griddb-xx-client
griddb-xx-java_lib
griddb-xx-newsql (AE only)
B. machine that runs operation tools remotely To remotely run operation tools on GridDB nodes and clusters. griddb-xx-client
griddb-xx-java_lib
griddb-xx-newsql (AE only)
C. application development machine To develop C applications. griddb-xx-c_lib

[notes]

 

4.2.4 Installation procedures

Depending on the packages installed, complete the applicable steps below:

step what to do machines on which to perform the step
1 Install the packages. all machines
2 Set up the OS user. machines on which the server package (griddb-xx-server) or the client package (griddb-xx-client) is installed
3 Set up the network environment. machines on which the server package (griddb-xx-server) is installed
4 Set up the node environment. machines on which the server package (griddb-xx-server) is installed
5 Set up the environment for operation tools. machines on which the client package (griddb-xx-client) is installed
6 Set up the library environment. machines on which library packages are installed

In the example above, complete the steps shown in the callouts for each of the three machines.

Installation procedure
Installation procedure

4.2.5 Installing the package

Install the packages needed on a machine where GridDB is used. Perform a package installation as a root user.

[example of executing installation commands]

# cd <mount paths for a CD-ROM or DVD-ROM>/Linux/rpm
# rpm -ivh griddb-xx-server-X.X.X-linux.x86_64.rpm
preparing...                ########################################### [100%]
User gsadm and group gridstore have been registered.
GridDB uses new user and group.
   1:griddb-xx-server          ########################################### [100%]
#
# rpm -ivh griddb-xx-client-X.X.X-linux.x86_64.rpm
preparing...                ########################################### [100%]
User and group has already been registered correctly.
GridDB uses existing user and group.
   1:griddb-xx-client          ########################################### [100%]
#
# rpm -ivh griddb-xx-java_lib-X.X.X-linux.x86_64.rpm
preparing...                ########################################### [100%]
   1:griddb-xx-java_lib        ########################################### [100%]
#
# rpm -ivh griddb-xx-c_lib-X.X.X-linux.x86_64.rpm
preparing...                ########################################### [100%]
   1:griddb-xx-c_lib           ########################################### [100%]
#
  :
  :
  :

[note]

Installing the server package (griddb-xx-server) or the client package (griddb-xx-client) will automatically set up the following:

4.2.6 Setting up the OS user

This setup is required for a machine on which the server package (griddb-xx-server) or the client package (griddb-xx-client) is installed.

Installing the server package or the client package will create the OS user "gsadm." No password is set for gsadm initially; set one.

[example of executing commands]

# passwd gsadm
Changing password for user gsadm
New UNIX password:
Retype new UNIX password:
passwd: all authentication tokens updated successfully.

4.2.7 Setting up the network environment

This setup is required for a machine on which the server package (griddb-xx-server) is installed.

To run GridDB nodes, the host name of each node server must correspond to an IP address.

First check how the host name corresponds to an IP address. Execute the command "hostname -i" to check the host name and IP address settings. If an IP address is already set as shown below, the network settings described in this section are not required.

The error message below or the loopback address 127.0.0.1 indicates that the IP address is not set yet. In that case, perform steps 1 to 3 below:

 

How to set the network

  1. Check the host name and IP address set by the OS.
  2. Set how the host name corresponds to an IP address (see the example below).
  3. Check that the correspondence is set correctly.
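
The transcript below is a sketch of these steps, assuming a hypothetical host name gsserver1 and IP address 192.168.0.10; in practice, use the values reported by your OS and edit /etc/hosts (or your DNS) accordingly.

// 1. Check the host name and the IP address currently resolved for it.
# hostname
gsserver1
# hostname -i
127.0.0.1    (the loopback address means the mapping is not set yet)

// 2. Map the host name to the real IP address (example entry in /etc/hosts).
# echo "192.168.0.10 gsserver1" >> /etc/hosts

// 3. Check that the correspondence is set correctly.
# hostname -i
192.168.0.10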

 

4.2.8 Setting up the node environment

This setup is required for a machine which runs nodes. For all other machines, no setup is needed.

Set up the definition files. Set up the environment in one of the machines that run nodes first; then, distribute the definition files in that machine to other machines.

Set up the node environment.
Setting up the node environment

4.2.8.1 Setting parameters

Definition files for a node are of two types: node definition files "gs_node.json" and cluster definition files "gs_cluster.json".

 

4.2.8.2 Setting up administrative users

An administrative user is used for running operation commands and performing operations only allowed to those with administrator authority.

The installation of a package creates the following administrative user by default:

administrative user: admin, password: admin
administrative user: system, password: manager

For increased security, change the default password for administrative users. Use the command gs_passwd to change the password for administrative users.

[example of executing commands]

# su - gsadm
$ gs_passwd admin
Password: (Enter the password)
Retype password: (Re-enter the password)

 

To add a new administrative user, use the command gs_adduser. The name of an administrative user should start with "gs#". The characters gs# should be followed by one or more characters which can contain only ASCII alphanumeric characters or underscores (_). The maximum number of characters allowed is 64.

[example of executing commands]

# su - gsadm
$ gs_adduser gs#newuser
Password: (Enter the password)
Retype password: (Re-enter the password)

  

Information on administrative users is stored in the user definition file (/var/lib/gridstore/conf/password).

[notes]

4.2.8.3 Setting services

Services automatically start nodes.

To perform cluster configuration at the same time as nodes start up, the setting of services is required. For details see the GridDB Operation Tools Reference (GridDB_OperationToolsReference.html).
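
As a minimal sketch, and assuming the service registered by the server package is named gridstore (confirm the service name and its settings file in the Operation Tools Reference), enabling the service on a systemd-based OS might look like this:

// Enable automatic start of the GridDB node at OS boot, start it now, and check its status.
# systemctl enable gridstore
# systemctl start gridstore
# systemctl status gridstore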

4.2.8.4 Distributing definition files

Set up the environment in one of the machines that run nodes first; then, distribute the definition files in that machine to other machines.
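
For example, assuming the default definition file location /var/lib/gridstore/conf and hypothetical host names node2 and node3, the files might be distributed as follows; run as gsadm and adjust paths and hosts to your environment.

// Copy the definition files edited on the first node to the other nodes.
$ scp /var/lib/gridstore/conf/gs_cluster.json node2:/var/lib/gridstore/conf/
$ scp /var/lib/gridstore/conf/gs_cluster.json node3:/var/lib/gridstore/conf/

// gs_node.json can also be copied when the nodes share the same node settings.
$ scp /var/lib/gridstore/conf/gs_node.json node2:/var/lib/gridstore/conf/
$ scp /var/lib/gridstore/conf/gs_node.json node3:/var/lib/gridstore/conf/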

 

4.2.9 Setting up the environment for operation tools

Set up the environment for each operation tool to be used. For details about how to set up each tool, see the GridDB Operation Tools Reference (GridDB_OperationToolsReference.html).

 

4.2.10 Setting up the library environment

4.2.10.1 Python libraries

To use the Python libraries, install the Python package (griddb_python) using the following command. Run the command as the gsadm user.

$ pip install /usr/griddb/lib/python

4.2.10.2 Node.js libraries

To use the Node.js libraries, add them to the following paths:

$ export LIBRARY_PATH=${LIBRARY_PATH}:/usr/griddb/lib
$ export NODE_PATH=/usr/griddb/lib/nodejs
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/griddb/lib/nodejs

4.2.10.3 Go libraries

To use the Go libraries, add them to the following paths and then build the library.

$ export LIBRARY_PATH=${LIBRARY_PATH}:/usr/griddb/lib
$ export GOPATH=$GOPATH:/usr/griddb/lib/go
$ export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/griddb/lib/go
$ go install griddb/go_client

4.2.10.4 Developing applications on the Windows environment

4.2.10.4.1 Developing applications using JDBC

Use the JDBC driver to develop applications using the Java language library while working in combination with applications and BI tools that access GridDB on the Windows environment.

Files for the JDBC driver are stored in the GridDB installation media for you to use. Copy the jar files in the \Windows\JDBC folder on the installation media into the folder referenced by the applications.

4.2.10.4.2 Developing applications using ODBC

Use the ODBC driver to develop applications using the C language library while working in combination with applications and BI tools that access GridDB on the Windows environment. See the GridDB Advanced Edition ODBC Driver User Guide (GridDB_AE_ODBC_Driver_UserGuide.html) to install the driver and set up its environment.

 

4.3 Uninstallation

If GridDB is no longer needed, uninstall all installed packages. To uninstall, complete the following steps:

  1. Check the installed packages.
  2. Uninstall them.

[example of executing uninstallation commands]

// Check the packages installed.
# rpm -qa | grep griddb
griddb-xx-server-X.X.X-linux.x86_64
griddb-xx-client-X.X.X-linux.x86_64
griddb-xx-java_lib-X.X.X-linux.x86_64

// uninstall
# rpm -e griddb-xx-server
# rpm -e griddb-xx-client
# rpm -e griddb-xx-java_lib

 

5 Operation

5.1 Responses when a failure occurs

If a problem occurs while building a system using GridDB or while using the system, determine what the symptoms of the problem are and under which conditions the problem has occurred based on action taken, the error codes, and other information. After checking the problem status, review possible solutions and apply them.

5.1.1 Responses to node failures

Node event log files record messages about events such as exceptions occurring within a node, as well as information about system activity.

If a problem exists with node behavior, check the node event logs for error or warning messages. Because a cluster is composed of multiple nodes, make sure to check event logs for each of the nodes for such messages.
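
As a sketch of such a check, and assuming the default event log directory /var/lib/gridstore/log (the actual directory and the file name pattern shown below depend on your gs_node.json settings, so treat them as assumptions), errors and warnings might be extracted on each node as follows:

// Search the node event logs for error and warning messages.
$ grep -E "ERROR|WARNING" /var/lib/gridstore/log/gridstore*.log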

  

5.1.2 Responses to application failures

If a problem occurs while running an application that uses a client API (NoSQL/New SQL interface), check the error codes and error messages returned by the client API.

Depending on which client API is used, the steps for retrieving error codes and error messages are different. For details, see the references for each API.

See the GridDB Error Codes (GridDB_ErrorCodes.html) to determine the meaning of an error code number, check its causes, and take measures.

If the error is caused by the node rather than the application, check the node event log around the time when the error occurred and check the status of the node at that time.

[Example] Java API

  

5.1.3 Responses to failures in operation tools

If an error occurs in an operation tool, see the log file of that operation tool to check the description of the error that has occurred.

The log file destinations for selected operation tools are shown below:

Operation tool: operation commands (such as gs_joincluster)
Output destination: /var/lib/gridstore/log
How to specify the output destination: environment variable GS_LOG for the gsadm user

Operation tool: integrated operations management GUI (gs_admin)
Output destination: /var/lib/gridstore/admin/log
How to specify the output destination: log settings file (WAS directory /webapps/gs_admin/WEB-INF/classes/logback.xml)

Operation tool: shell command gs_sh
Output destination: /var/lib/gridstore/log
How to specify the output destination: log settings file (/usr/griddb/prop/gs_sh_logback.xml), parameter logPath

Operation tool: export/import tool
Output destination: /var/lib/gridstore/log
How to specify the output destination: property file (/var/lib/gridstore/expimp/conf/gs_expimp.properties), parameter logPath

[Example] log file for the gs_startnode command (file name: gs_startnode.log)

2014-10-20 15:25:16,876 [25444] [INFO] <module>(56) /usr/bin/gs_startnode start.
2014-10-20 15:25:17,889 [25444] [INFO] <module>(156) wait for starting node. (node=127.0.0.1:10040 waitTime=0)
2014-10-20 15:25:18,905 [25444] [INFO] <module>(192) /usr/bin/gs_startnode end.

[notes]

 

5.2 Configuration management

5.2.1 Adding nodes

5.2.1.1 Points to consider when designing

One of the best practices to optimize resource placement is to start with a minimum number of machines at low cost in the initial operation of the system, and, if the system runs out of resources due to data growth, add nodes to scale out the system.

Estimate an increase in the amount of the system's data and plan the timing of adding nodes.

Node addition in proportion to the increase in data volume
Node addition in proportion to the increase in data volume

Consider scale-out if the system runs out of resources during its operation due to the following reasons:

 

5.2.1.2 Procedures for adding nodes

Before adding nodes, it is recommended to stop the cluster if possible. It is also possible to add nodes online while the system is operating.

  1. Build a GridDB environment on the machine to be added (this can be done online while the system is operating).
  2. Modify settings for applications, operation tools, and cluster configuration scripts.
  3. Add the nodes (see the example below).
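
A sketch of the final step under the assumption that the cluster keeps running: after installing GridDB and copying the definition files to the new machine, the node is started and then attached with gs_appendcluster. The address 192.168.0.10:10040 and the admin credentials are examples; confirm the exact options in the Operation Tools Reference.

// On the new machine: start the node.
$ gs_startnode -u admin/admin -w

// On the new machine: add the started node to the running cluster.
// --cluster takes the address and port of a node that is already part of the cluster.
$ gs_appendcluster --cluster 192.168.0.10:10040 -u admin/admin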

5.2.2 Changing parameters

Set the behavior of nodes and clusters using node definition files (gs_node.json) and cluster definition files (gs_cluster.json). Three cases are possible as to whether parameter change is allowed after starting nodes or while running the cluster.

allowed or not description
change not allowed Once nodes are started, setup values cannot be changed. To change these values, the database must be initialized.
change allowed (restart) Change the setup values in the definition files and restart the nodes to apply the new values.
change allowed (online) Parameters can be changed online while the nodes and the cluster are running. Use the gs_paramconf command to change them. Changes made online are not permanent; to make them permanent, manually modify the corresponding definition files. Otherwise, the parameters are reset to their original values after a restart.

Procedures for changing parameters online

  1. Change the parameters using the gs_paramconf command (see the example below).
  2. Verify that the parameters have been changed.
  3. Manually modify the definition files to reflect the parameters that have been changed online.
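
The following transcript is a sketch of these steps, assuming the parameter storeMemoryLimit (an example of a parameter that can be changed online), the default admin user, and the --set/--show options; check the Features Reference for which parameters support online change and for the exact option names in your version.

// 1. Change the parameter online (example: raise the store memory limit).
$ gs_paramconf -u admin/admin --set storeMemoryLimit 2048MB

// 2. Verify that the parameter has been changed.
$ gs_paramconf -u admin/admin --show storeMemoryLimit

// 3. Also edit gs_node.json so the new value survives a node restart.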

[notes]

 

5.3 Migrating data

5.3.1 How to migrate data to a new machine

This section describes how to build a new cluster having the same settings as the currently working cluster to replace a machine or for other purposes.

In such cases, data migration can be completed simply by stopping the nodes and physically copying the database files to the new machine (an example follows the steps below).

  1. Install GridDB on the new machine.
  2. Copy the definition files.
  3. Stop the currently working cluster.
  4. Copy the database files.
  5. Start the cluster on the new machine.
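
As a sketch of steps 3 to 5 for one node, the transcript below assumes the default file locations under /var/lib/gridstore, a hypothetical new host name newhost, the cluster name myCluster, and a single-node cluster; repeat on every node and adjust paths, credentials, cluster name, and node count to your environment.

// 3. Stop the cluster, then stop the node (run on the current machine).
$ gs_stopcluster -u admin/admin
$ gs_stopnode -u admin/admin

// 4. Copy the database files to the new machine (default paths; copy the transaction log directory too if it is separate).
$ scp -rp /var/lib/gridstore/data newhost:/var/lib/gridstore/

// 5. Start the node and the cluster on the new machine.
$ gs_startnode -u admin/admin -w
$ gs_joincluster -u admin/admin -c myCluster -n 1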

 

5.3.2 How to migrate data if the existing database is not compatible

If the current database becomes incompatible after upgrading GridDB or for other reasons, make sure to perform data migration by exporting and importing data.

Exporting and importing may take time depending on the amount of data and the performance of disk I/O.

  1. Export all data in the cluster (see the example below).
  2. Perform an upgrade installation of GridDB.
  3. Import all data.
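
A sketch of the export and import steps using the gs_export and gs_import tools, assuming the default admin user and a hypothetical output directory /tmp/expdata; verify the --all and -d options for your version in the Operation Tools Reference.

// 1. Export all containers in the cluster before the upgrade.
$ gs_export -u admin/admin --all -d /tmp/expdata

// 3. Import all exported data into the upgraded cluster.
$ gs_import -u admin/admin --all -d /tmp/expdata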