Monitoring Guide


Red Hat Gluster Storage 3.3

Monitoring Gluster Cluster

Rakesh Ghatvisave

Red Hat Customer Content Services

Abstract

This guide provides essential information on how to import a Gluster cluster using Red Hat Gluster Storage Web Administration and how to monitor your Gluster cluster health, performance, and status. The monitoring and metrics visualization is provided by the Grafana monitoring platform which is integrated into the Web Administration interface.

Chapter 1. Overview

Red Hat Gluster Storage Web Administration provides visual monitoring and metrics infrastructure for Red Hat Gluster Storage 3.3.1 and is the primary method to monitor your Red Hat Gluster Storage environment. Web Administration is based on the Tendrl upstream project and uses Ansible automation for installation. Its key goal is to provide deep metrics and visualization of Red Hat Gluster Storage clusters and the associated physical storage elements such as storage nodes, volumes, and bricks.

Key Features

  1. Monitoring dashboards for Clusters, Hosts, Volumes, and Bricks
  2. Top-level list views of Clusters, Hosts, and Volumes
  3. SNMPv3 Configuration and alerting
  4. User Management
  5. Importing Gluster cluster

Chapter 2. Import Cluster

2.1. Importing Cluster

The following procedure outlines the steps to import a Gluster cluster.

Procedure. Importing Cluster

  1. Log in to the Web Administration interface.

    Figure 2.1. Login Page

  2. In the default landing interface of the Web Administration, click Import Cluster.

    Figure 2.2. Import Cluster

  3. The available hosts are listed. By default, Volume profiling is enabled. Click Import to continue.

    Figure 2.3. Available Hosts

  4. The cluster import request is submitted. To view the task progress, click View Task Progress.

    Figure 2.4. Import Cluster Submitted

  5. The import cluster task is completed.

    Figure 2.5. Task Detail

  6. Navigate to the Clusters tab. The Cluster is successfully imported and ready for use.

Figure 2.6. Cluster Ready

Important

Before you install Web Administration using tendrl-ansible, create the Gluster cluster that you intend to import, if one does not already exist. If the Gluster cluster is created after Web Administration is installed, the cluster import operation may fail. If you install Web Administration manually, create the Gluster cluster first and then install tendrl-node-agent to avoid cluster import failure.

2.1.1. Volume Profiling

Volume profiling enables additional telemetry information to be collected on a per-volume basis for a given cluster, which helps in troubleshooting, capacity planning, and performance tuning.

Volume profiling can be enabled or disabled on a per-cluster basis when a cluster is actively managed and monitored using the Web Administration interface.

Note: Enabling volume profiling collects a richer set of metrics, which may degrade performance because system resources such as CPU and memory are used for profiling data collection.

Volume profiling is enabled by default and is seen on the Discovered Hosts interface after clicking the Import Cluster button.

To disable Volume Profiling after the cluster is imported, follow these instructions:

  1. Navigate to the Clusters menu from the navigation pane and locate the Cluster. At the right-hand side, click Disable Profiling.

    Figure 2.7. Disable Volume Profiling

  2. A notification appears confirming Volume Profiling is successfully disabled.

Figure 2.8. Volume Profiling Disabled


Chapter 3. Cluster Expansion

To expand an existing Gluster cluster already imported and managed by the Web Administration environment, perform the following sequence of actions:

  1. Unmanage the existing cluster from Web Administration.
  2. Expand the Gluster storage nodes.
  3. Install the Web Administration components via tendrl-ansible.
  4. Reimport the cluster in the Web Administration environment.

3.1. Unmanaging Cluster

Warning

Unmanaging one cluster unmanages all the clusters that are currently managed by Web Administration. You must reimport any cluster that you still need.

Unmanage a cluster from Web Administration

  1. Stop and uninstall all tendrl-* services and collectd on the storage nodes monitored by Web Administration. See the Commands for Stopping Tendrl Services section for the commands required to stop the services.
  2. Stop all tendrl-* services and related services such as Grafana and Graphite on the Tendrl server. See the Commands for Stopping Tendrl Services section for the commands required to stop the services.
  3. Back up and delete the data directories for the Graphite and Carbon services on the Web Administration server to ensure stale metrics do not persist. See the Commands for Deleting Database Files section for the commands required to remove database files.
  4. Back up and uninstall etcd from the Web Administration server and delete all data from etcd. See the Commands for Deleting Database Files section for the commands required to remove database files.

    Note

    Delete the etcd %data_dir from all members of the etcd cluster. For more details, see Data Directory Lifecycle documentation.

3.1.1. Commands for Stopping Tendrl Services

Stopping Services on Storage Nodes

Run the following commands to stop individual services on the Storage Nodes:

To stop the tendrl-node-agent service:

# service tendrl-node-agent stop

To stop the collectd service:

# service collectd stop

To stop the tendrl-gluster-integration service:

# service tendrl-gluster-integration stop

Stopping Services on Tendrl Server

Run the following commands to stop individual services on the Tendrl server.

To stop the etcd service:

# service etcd stop

To stop the tendrl-monitoring-integration service:

# service tendrl-monitoring-integration stop

To stop the tendrl-api service:

# service tendrl-api stop

To stop the tendrl-notifier service:

# service tendrl-notifier stop

To stop the carbon-cache service:

# service carbon-cache stop

To stop the grafana-server service:

# service grafana-server stop

To stop the tendrl-node-agent service:

# service tendrl-node-agent stop
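The individual commands in this section can be combined into a loop. The following is a minimal sketch, not part of the product tooling: the stop_services helper and the DRY_RUN switch are hypothetical names introduced here for illustration, and the service lists are simply the ones named in this section. With DRY_RUN left at its default of 1, the script only prints the commands it would run.

```shell
# Service names are taken from this section; the helper itself is illustrative.
STORAGE_NODE_SERVICES="tendrl-node-agent collectd tendrl-gluster-integration"
TENDRL_SERVER_SERVICES="etcd tendrl-monitoring-integration tendrl-api tendrl-notifier carbon-cache grafana-server tendrl-node-agent"

# stop_services NAME...: stop each named service in the order given.
# DRY_RUN=1 (default) prints the command; DRY_RUN=0 runs it (requires root).
stop_services() {
    for svc in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "service $svc stop"
        else
            service "$svc" stop
        fi
    done
}

# On a storage node:
stop_services $STORAGE_NODE_SERVICES
# On the Tendrl server, use instead:
# stop_services $TENDRL_SERVER_SERVICES
```

Review the printed commands first, then rerun with DRY_RUN=0 as root on the relevant node.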

3.1.2. Commands for Deleting Database Files

The default paths mentioned here are for etcd, Carbon, and Grafana, not for Tendrl services. Because data_dir can be set to any directory in each service's configuration, make sure that you remove the data from the path that is actually configured.

Commands for Removing the database files from Server Node:

To remove Etcd files:

# rm -rf /var/lib/etcd/*

To remove Carbon files, which removes the monitoring data:

# rm -rf /var/lib/carbon/whisper/*

To remove Grafana files:

# rm -rf /var/lib/grafana/grafana.db
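Because the unmanage procedure asks for a backup before deletion, it can help to archive each data directory first. The sketch below is one possible file-level approach, assuming the services are already stopped and the default paths apply; backup_dir and the /root/webadmin-backup destination are hypothetical names chosen for this example, not product conventions.

```shell
# backup_dir SRC DEST_DIR: archive SRC into DEST_DIR as a timestamped tarball
# and print the path of the archive that was written.
backup_dir() {
    src="$1"
    dest_dir="$2"
    name="$(basename "$src")"
    archive="$dest_dir/${name}-$(date +%Y%m%d%H%M%S).tar.gz"
    mkdir -p "$dest_dir" || return 1
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    echo "$archive"
}

# Example calls (destination directory is an assumption; run as root):
# backup_dir /var/lib/etcd /root/webadmin-backup
# backup_dir /var/lib/carbon/whisper /root/webadmin-backup
# backup_dir /var/lib/grafana /root/webadmin-backup
```

Run the rm commands only after confirming that each archive was created.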

3.2. Expanding Storage Nodes

To expand the cluster, see the Expanding Volumes section in the Red Hat Gluster Storage Administration Guide.

  1. After the cluster is expanded, install Web Administration. For detailed installation instructions, see the Installing Web Administration chapter in the Red Hat Gluster Storage Web Administration Quick Start Guide.
  2. Start the following services on the Storage nodes:

    To start the tendrl-node-agent service:

    # service tendrl-node-agent start

    To start the collectd service:

    # service collectd start

3.3. Reimporting Cluster

Once the services are started, import the cluster in Web Administration. For instructions on importing a cluster, see the Import Cluster chapter of this guide.

Chapter 4. Monitoring and Metrics

Gluster Web Administration provides deep metrics and visualization of Gluster clusters, the physical server nodes and the storage elements (disks) through the Grafana open-source monitoring platform.

Chapter 5. Monitoring Dashboard and Concepts

The Monitoring Dashboard provides high-level visual information on the health, performance, and utilization of cluster-wide resources.

5.1. Dashboard Selector

The Dashboard Selector is the primary navigation tool to move between different dashboards.

Figure 5.1. Dashboard Selector


5.2. Dashboard Panels

The Dashboard is composed of individual visualization blocks, termed Panels, that display different metrics and statistics. The panels exhibit different colors based on the current status of the metrics. Panels can be dragged, dropped, and rearranged on the Dashboard.

Figure 5.2. Dashboard Panels


The following types of panels are available to visualize monitoring data:

  • Graph: The Graph panel can plot any number of metrics. The Connection Trend and the Throughput Trend panels are examples of Graph panels.

Figure 5.3. Graph Panel Example

  • Singlestat: The Singlestat panel displays the aggregated value of a series as a single number. For example, the Health, Volumes, and Snapshots panels are Singlestat panels.

Figure 5.4. Singlestat Panel Example

5.3. Dashboard Rows

A row is a logical divider in a given Dashboard. The panels of the dashboard are arranged and organized in rows to give the dashboard a streamlined look.

5.4. Dashboard Color Codes

The text in the Dashboard panels uses the following color codes to represent health status information:

  • Green: Healthy
  • Orange: Degraded
  • Red: Unhealthy, Down, or Unavailable

Chapter 6. Monitoring Dashboard Features

6.2. Dashboard Time Range

The Grafana interface provides time range management for the data being visualized. You can change the time range for a graph to view the data at different points in time.

At the top right, you can access the master Dashboard time picker. It shows the current selected time range and the refresh interval.

time range menu

Clicking the master Dashboard time picker toggles a menu for time range controls.

time control

Time range

The time range filter lets you mix explicit and relative time ranges. The explicit time range format is YYYY-MM-DD HH:MM:SS.
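As a small illustration of the explicit format, the following sketch prints the current time as YYYY-MM-DD HH:MM:SS so that it can be pasted into the time picker. The explicit_now helper is a hypothetical name used only here. Relative values in Grafana are written as expressions such as now-6h, so a range could, for example, start at now-6h and end at an explicit timestamp.

```shell
# Print the current local time in the explicit format YYYY-MM-DD HH:MM:SS.
explicit_now() {
    date '+%Y-%m-%d %H:%M:%S'
}

explicit_now
```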

Quick Range

Quick ranges are preset values to choose a relative time.

Refreshing every

When enabled, auto-refresh reloads the dashboard at the specified interval.

6.3. Dashboard Sharing

The Dashboard Selector is the primary navigation tool to move between different dashboards.

Figure 6.2. Dashboard Selector


Chapter 7. Monitoring Dashboard Navigation

To access the Monitoring Dashboard, follow these steps:

  1. Log in to the Web Administration interface at http://web-admin-server.example.com.
  2. In the default Cluster view, locate the cluster and click Launch Dashboard.
  3. The Cluster dashboard showing the aggregated metrics view is opened in a new window.

7.1. Cluster View Dashboard

The Cluster view dashboard allows the Gluster Administrator to:

  • View at-a-glance information about the Gluster cluster, including health and status information, key performance indicators such as IOPS and throughput, and alerts that draw attention to potential issues in the cluster, host, volume, and brick.
  • Compare a metric such as IOPS, CPU, memory, or network load across hosts within the cluster.
  • Compare utilization, for example IOPS and capacity, across bricks within a volume.

Figure 7.1. Cluster View Dashboard

Cluster View Dashboard

7.2. Hosts View Dashboard

The Host view Dashboard allows the Gluster Administrator to:

  • View at-a-glance information about the Gluster host, including health and status information, key performance indicators such as IOPS and throughput, and alerts that draw attention to potential issues in the host, volume, brick, and disk.
  • Compare one or more metrics such as IOPS, CPU, memory, and network load across bricks within the host.
  • Compare utilization, such as IOPS and capacity, across bricks within a host.
host view dash

7.3. Volume View Dashboard

The Volume view Dashboard allows the Gluster Administrator to:

  • View at-a-glance information about the Gluster volume, including health and status information and key performance indicators that draw attention to potential issues in the volume, brick, and disk.
volume view dash

7.4. Brick View Dashboard

The Brick view dashboard allows the Gluster Administrator to:

  • View at-a-glance information about the Gluster brick, including health and status information, key performance indicators such as IOPS, throughput, and latency, and alerts that draw attention to potential issues in the brick and underlying disks.
  • View performance by brick to help diagnose RAID 6 disk failure, rebuild, or degradation that causes poor performance on one brick.
brick view dash

Chapter 8. Monitoring Cluster Metrics

8.1. Cluster Level Dashboard

This is the default dashboard of the Monitoring interface that shows the overview of the selected cluster.

8.1.1. Monitoring and Viewing Cluster Health

To monitor the Cluster health status and the metrics associated with it, view the panels in the Cluster Dashboard. For detailed panel descriptions and health indicators, see Table 8.1. Cluster Health Panel Descriptions.

8.1.1.1. Health and Snapshots

The Health panel displays the overall health of the selected cluster and the Snapshots panel shows the active number of snapshots.

health snapshot
8.1.1.2. Hosts, Volumes and Bricks

The Hosts, Volumes, and Bricks panels display status information. The following is an example screen displaying the respective status information.

hosts volumes bricks
  • Hosts: In total, there are 6 Hosts, out of which 1 is offline.
  • Volumes: In total, there are zero Volumes.
  • Bricks: In total, there are 12 Bricks, out of which 2 are offline.
8.1.1.3. Geo-Replication Session

The Geo-Replication Session panel displays geo-replication session information from a given cluster, including the total number of geo-replication sessions and a count of geo-replication sessions by status.

geo replication session
8.1.1.4. Health Panel Descriptions

The following table lists the Panels and the descriptions.

Table 8.1. Cluster Health Panel Descriptions

Panel | Description | Health Indicator
Health | The Health panel displays the overall health of the selected cluster, which is either Healthy or Unhealthy | Green: Healthy; Red: Unhealthy; Orange: Degraded
Snapshots | The Snapshots panel displays the count of the active snapshots |
Hosts | The Hosts panel displays host status information including the total number of hosts and a count of hosts by status |
Volumes | The Volumes panel displays volume status information for the selected cluster, including the total number of volumes and a count of volumes by status |
Bricks | The Bricks panel displays brick status information for the selected cluster, including the total number of bricks in the cluster, and a count of bricks by status |
Geo-Replication Session | The Geo-Replication Session panel displays geo-replication session information from a given cluster, including the total number of geo-replication sessions and a count of geo-replication sessions by status |

8.1.2. Monitoring and Viewing Cluster Performance

Cluster performance metrics can be monitored by the data displayed in the following panels.

Connection Trend

The Connection Trend panel displays the total number of client connections to bricks in the volumes for the selected cluster over a period of time. Typical statistics may look like this:

cluster connection trend

IOPS

The IOPS panel displays IOPS for the selected cluster over a period of time. IOPS is based on the aggregated brick level read and write operations collected using gluster volume profile info.

cluster iops
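To inspect the raw data behind the IOPS panel, the profile statistics can be queried from the Gluster CLI on a storage node. This is a minimal sketch that assumes profiling is already started on the volume; show_profile_info is a hypothetical helper name and myvol is a placeholder volume name.

```shell
# show_profile_info VOLNAME...: print the profile statistics for each volume.
# Returns 127 if the gluster CLI is not installed on this machine.
show_profile_info() {
    if ! command -v gluster >/dev/null 2>&1; then
        echo "gluster CLI not found" >&2
        return 127
    fi
    for vol in "$@"; do
        gluster volume profile "$vol" info
    done
}

# Example (placeholder volume name):
# show_profile_info myvol
```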

Capacity Utilization and Capacity Available

The Capacity Utilization panel displays the capacity utilized across all volumes for the selected cluster.

The Capacity Available panel displays the available capacity across all volumes for the selected cluster.

capacity uti available

Weekly Growth Rate

The Weekly Growth Rate panel displays the forecasted weekly growth rate for capacity utilization computed based on daily capacity utilization.

weekly growth rate

Weeks Remaining

The Weeks Remaining panel displays the estimated time remaining in weeks till volumes reach full capacity based on the forecasted Weekly Growth Rate.

weeks remaining
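The guide does not state the exact forecast formula. A plausible reading, shown here only as an assumption, is that Weeks Remaining equals the available capacity divided by the weekly growth rate. The weeks_remaining helper and the sample figures below are hypothetical.

```shell
# weeks_remaining AVAILABLE GROWTH_PER_WEEK (same unit for both, for example GB).
# Assumed formula: available capacity / weekly growth rate.
weeks_remaining() {
    awk -v avail="$1" -v growth="$2" 'BEGIN {
        if (growth <= 0) { print "n/a"; exit }   # no growth: no meaningful estimate
        printf "%.1f\n", avail / growth
    }'
}

weeks_remaining 500 25   # 500 GB available, growing 25 GB per week: prints 20.0
```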

Throughput Trend

The Throughput Trend panel displays the network throughput for the selected cluster over a period of time.

throughput trend

8.1.3. Top Consumers

The Top Consumers panels display the highest capacity utilization by the cluster resources.

To view the top consumers of the cluster:

  1. In the Cluster level dashboard, at the bottom, click Top Consumers to expand the menu.

    cluster status bottom

Top 5 Utilization By Bricks

The Top 5 Utilization By Bricks panel displays the bricks with the highest capacity utilization.

top bricks

Top 5 Utilization by Volume

The Top 5 Utilization By Volumes panel displays the volumes with the highest capacity utilization.

top volume

CPU Utilization by Host

The CPU Utilization by Host panel displays the CPU utilization of each node in the cluster.

top cpu host

Memory Utilization By Host

The Memory Utilization By Host panel displays the memory utilization of each node in the cluster.

top memory host

Ping Latency Trend

The Ping Latency Trend panel displays the ping latency for each host in a given cluster.

ping latency trend

8.1.4. Monitoring and Viewing Cluster Status

To view the status of the overall cluster:

  1. In the Cluster level dashboard, at the bottom, click Status to expand the menu.

    cluster status bottom
  2. The Volume, Host, and Brick status are displayed in the panels.
expanded cluster status

Volume Status

The Volume Status panel displays the status code of each volume for the selected cluster.

voume cluster status

The volume status is displayed as numeric codes and colors. The numeric codes correspond to the following statuses:

  • 0 = Up
  • 3 = Up (Degraded)
  • 4 = Up (Partial)
  • 5 = Unknown
  • 8 = Down

Host Status

The Host Status panel displays the status code of each host for the selected cluster.

host cluster status

The Host status is displayed in numeric codes:

  • 0 = Up
  • 8 = Down

Brick Status

The Brick Status panel displays the status code of each brick for the selected cluster.

brick cluster status

The Brick status is displayed in numeric codes:

  • 1 = Started
  • 10 = Stopped
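When reading these panels or exporting their values, it can be convenient to translate the numeric codes back into labels. The sketch below encodes only the codes listed in this section; status_label is a hypothetical helper name.

```shell
# status_label KIND CODE: map a numeric status code to its label.
# KIND is one of: volume, host, brick. Codes are those listed in this section.
status_label() {
    case "$1:$2" in
        volume:0) echo "Up" ;;
        volume:3) echo "Up (Degraded)" ;;
        volume:4) echo "Up (Partial)" ;;
        volume:5) echo "Unknown" ;;
        volume:8) echo "Down" ;;
        host:0)   echo "Up" ;;
        host:8)   echo "Down" ;;
        brick:1)  echo "Started" ;;
        brick:10) echo "Stopped" ;;
        *)        echo "unrecognized"; return 1 ;;
    esac
}

status_label volume 3   # prints: Up (Degraded)
```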

8.2. Host Level Dashboard

8.2.1. Monitoring and Viewing Health and Status

To monitor the Cluster Hosts status and the metrics associated with it, navigate to the Hosts Level Dashboard and view the panels.

Health

The Health panel displays the overall health for a given host.

host health

Bricks and Bricks Status

The Bricks panel displays brick status information for a given host, including the total number of bricks in the host, and a count of bricks by status.

host bricks

The Brick Status panel displays the status code of each brick for a given host.

hosts bricks status
  • 1 = Started
  • 10 = Stopped

8.2.2. Monitoring and Viewing Performance

8.2.2.1. Memory and CPU Utilization

Memory Available

The Memory Available panel displays the sum of memory free and memory cached.

memeory available

Memory Utilization

The Memory Utilization panel displays memory utilization percentage for a given host that includes buffers and caches used by the kernel over a period of time.

memeory utilization
  • Buffered: Amount of memory used for buffering, mostly for I/O operations
  • Cached: Memory used for caching disk data for reads, memory-mapped files, or tmpfs data
  • Slab Rec: Amount of reclaimable memory used for slab kernel allocations
  • Slab Unrecl: Amount of unreclaimable memory used for slab kernel allocations
  • Used: Amount of memory used, calculated as Total - Free (Unused Memory) - Buffered - Cached
  • Total: Total memory used
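The Used calculation above can be reproduced by hand on a Linux host. This sketch assumes that the values map to the MemTotal, MemFree, Buffers, and Cached fields of /proc/meminfo, which the guide does not confirm; mem_used_kb is a hypothetical helper name.

```shell
# mem_used_kb [FILE]: compute Used = Total - Free - Buffered - Cached in kB.
# FILE defaults to /proc/meminfo; any file in the same format also works.
mem_used_kb() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END { print total - free - buffers - cached }
    ' "${1:-/proc/meminfo}"
}

# On a Linux host:
# mem_used_kb
```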

Swap Free

The Swap Free panel displays the available swap space in percent for a given host.

swap free

Swap Utilization

The Swap Utilization panel displays the used swap space in percent for a given host.

swap utilization

CPU Utilization

The CPU utilization panel displays the CPU utilization for a given host over a period of time.

cpu uti host

IOPS

The IOPS panel displays IOPS for a given host over a period of time.  IOPS is based on the aggregated brick level read and write operations.

host iops
8.2.2.2. Capacity and Disk Load

Total Brick Capacity Utilization Trend

The Total Brick Capacity Utilization Trend panel displays the capacity utilization for all bricks on a given host over a period of time.

brick cap trend

Total Brick Capacity Utilization

The Total Brick Capacity Utilization panel displays the current percent capacity utilization for a given host.

brick cap uti

Total Brick Capacity Available

The Total Brick Capacity Available panel displays the current available capacity for a given host.

brick cap avail

Weekly Growth Rate

The Weekly Growth Rate panel displays the forecasted weekly growth rate for capacity utilization computed based on daily capacity utilization.

weekly growth

Weeks Remaining

The Weeks Remaining panel displays the estimated time remaining in weeks until the host reaches full capacity, based on the forecasted Weekly Growth Rate.

week remaining

Brick Utilization

The Brick Utilization panel displays the utilization of each brick for a given host.

host brick uti

Brick Capacity

The Brick Capacity panel displays the total capacity of each brick for a given host.

host bricks cap

Brick Capacity Used

The Brick Capacity Used panel displays the used capacity of each brick for a given host.

host brick used

Disk Load

The Disk Load panel shows the host’s aggregated reads from and writes to disks over a period of time.

disk load

Disk Operations

The Disk Operations panel shows the host’s aggregated read and write disk operations over a period of time.

disk ops

Disk IO

The Disk IO panel shows the host’s aggregated I/O time over a period of time.

disk io
8.2.2.3. Network

Throughput

The Throughput panel displays the network throughput for a given host over a period of time.

network trhoughout

Dropped Packets Per Second

The Dropped Packets Per Second panel displays dropped network packets for the host over a period of time. Typically, dropped packets indicate network congestion: for example, the queue on the switch port that your host is connected to is full, and packets are dropped because the port cannot transmit data fast enough.

dropped packet

Errors Per Second

The Errors Per Second panel displays network errors for a given host over a period of time. Typically, the errors indicate issues that occurred while transmitting packets, such as carrier errors (duplex mismatch, faulty cable), FIFO errors, heartbeat errors, window errors, CRC errors, and frames that are too short or too long. In short, errors typically result from faulty hardware or a speed mismatch.

errors per second
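To cross-check the dropped packet and error panels on a host, the kernel's cumulative interface counters can be read directly. This sketch parses the standard Linux /proc/net/dev layout; whether the dashboard derives its per-second rates from the same counters is an assumption, and net_error_counters is a hypothetical helper name.

```shell
# net_error_counters [FILE]: print cumulative error and drop counters per interface.
# FILE defaults to /proc/net/dev. After the interface name, fields 3-4 of the
# receive group and fields 3-4 of the transmit group are errs and drop.
net_error_counters() {
    awk 'NR > 2 {
        sub(/:/, " ")   # split "eth0:" (or "eth0:123") into separate fields
        printf "%s rx_errs=%s rx_drop=%s tx_errs=%s tx_drop=%s\n", $1, $4, $5, $12, $13
    }' "${1:-/proc/net/dev}"
}

# On a Linux host:
# net_error_counters
```

The counters are totals since boot, so a per-second rate requires two samples taken a known interval apart.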

8.2.3. Host Dashboard Metric Units

The following table shows the metrics and their corresponding measurement units.

Table 8.2. Host Dashboard Metric Units

Metrics | Units
Memory Available | Megabyte/Gigabyte/Terabyte
Memory Utilization | Percentage (%)
Swap Free | Percentage (%)
Swap Utilization | Percentage (%)
CPU Utilization | Percentage (%)
Total Brick Capacity Utilization | Percentage (%)
Total Brick Capacity | MB/GB/TB
Weekly Growth Rate | MB/GB/TB
Disk Load | kbps
Disk IO | Milliseconds (ms)
Network Throughput | kbps

8.3. Volume Level Dashboard

The Volume view dashboard allows the Gluster Administrator to:

  • View at-a-glance information about the Gluster volume, including health and status information, key performance indicators such as IOPS and throughput, and alerts that draw attention to potential issues in the volume, brick, and disk.
  • Compare one or more metrics such as IOPS, CPU, memory, and network load across bricks within the volume.
  • Compare utilization, such as IOPS and capacity, across bricks within a volume.
  • View performance metrics by brick (within a volume) to help diagnose failure, rebuild, degradation, and poor performance on one brick.

8.3.1. Monitoring and Viewing Health

Health

The Health panel displays the overall health for a given volume.

volume health

Snapshots

The Snapshots panel displays the count of active snapshots for the selected cluster.

volume snapshot

Brick Status

The Brick Status panel displays the status code of each brick for a given volume.

volume brick status
  • 1 = Started
  • 10 = Stopped

Bricks

The Bricks panel displays brick status information for a given volume, including the total number of bricks in the volume, and a count of bricks by status.

volume bricks

Subvolumes

The Subvolumes panel displays subvolume status information for a given volume.

subloume

Geo-Replication Sessions

The Geo-Replication Session panel displays geo-replication session information from a given volume, including the total number of geo-replication sessions and a count of geo-replication sessions by status.

geo rep status

Rebalance

The Rebalance panel displays rebalance progress information for a given volume, which is applicable when rebalancing is underway.

reblance

Rebalance Status

The Rebalance Status panel displays the status of rebalancing for a given volume, which is applicable when rebalancing is underway.

rebalance status

8.3.2. Monitoring and Viewing Performance

Capacity Utilization

The Capacity Utilization panel displays the used capacity for a given volume.

vol capacity uti

Capacity Available

The Capacity Available panel displays the available capacity for a given volume.

vol cap avail

Weekly Growth Rate

The Weekly Growth Rate panel displays the forecasted weekly growth rate for capacity utilization computed based on daily capacity utilization.

vol weekly growth rate

Weeks Remaining

The Weeks Remaining panel displays the estimated time remaining in weeks until the volume reaches full capacity, based on the forecasted Weekly Growth Rate.

weeks remaining

Capacity Utilization Trend

The Capacity Utilization Trend panel displays the volume capacity utilization over a period of time.

caputi

Inode Utilization

The Inode Utilization panel displays inodes used for bricks in the volume over a period of time.

inode uti

Inode Available

The Inode Available panel displays inodes free for bricks in the volume.

inode avail

Throughput

The Throughput panel displays volume throughput based on brick-level read and write operations fetched using gluster volume profile.

through

LVM Thin Pool Metadata %

The LVM Thin Pool Metadata % panel displays the utilization of LVM thin pool metadata for a given volume. Monitoring the utilization of LVM thin pool metadata and data usage is important to ensure they do not run out of space. If the data space is exhausted, I/O operations are either queued or fail, depending on the configuration. If the metadata space is exhausted, I/O errors occur until the LVM pool is taken offline and a repair is performed to fix potential inconsistencies. Moreover, because the metadata transaction is aborted and the pool performs caching, there might be I/O operations that were acknowledged to the upper storage layers (the file system) but not committed to disk, so those layers also need checks and repairs.

lvm metadata

LVM Thin Pool Data Usage %

The LVM Thin Pool Data Usage % panel displays the LVM thin pool data usage for a given volume. Monitoring the utilization of LVM thin pool metadata and data usage is important to ensure they do not run out of space. If the data space is exhausted, I/O operations are either queued or fail, depending on the configuration. If the metadata space is exhausted, I/O errors occur until the LVM pool is taken offline and a repair is performed to fix potential inconsistencies. Moreover, because the metadata transaction is aborted and the pool performs caching, there might be I/O operations that were acknowledged to the upper storage layers (the file system) but not committed to disk, so those layers also need checks and repairs.

lvm usage
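The same two percentages can be checked directly on a storage node with the lvs reporting fields data_percent and metadata_percent. The following is a sketch rather than a recommended alerting setup: flag_full_pools is a hypothetical helper, and the 80 percent limit is an arbitrary example threshold.

```shell
# flag_full_pools [LIMIT]: read "name,data_percent,metadata_percent" lines on
# stdin and print the pools whose data or metadata usage is at or above LIMIT.
# Lines with empty percentages (logical volumes that are not thin pools) are skipped.
flag_full_pools() {
    awk -F',' -v limit="${1:-80}" '
        { gsub(/^[ \t]+/, "", $1) }   # lvs indents its output
        $2 != "" && ($2 + 0 >= limit || $3 + 0 >= limit) {
            printf "%s data=%s%% metadata=%s%%\n", $1, $2, $3
        }
    '
}

# On a storage node (requires root and LVM):
# lvs --noheadings --separator ',' -o lv_name,data_percent,metadata_percent | flag_full_pools 80
```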

8.3.3. Monitoring File Operations

Top File Operations

The Top File Operations panel displays the top 5 file operations (FOPs) with the highest % latency, where % latency is the fraction of the total FOP response time that is consumed by that FOP.

top fop

File Operations for Locks Trend

The File Operations for Locks Trend panel displays the average latency, maximum latency, and call rate for each FOP for Locks over a period of time.

fop trend

File Operations for Read/Write

The File Operations for Read/Write panel displays the average latency, maximum latency, and call rate for each FOP for Read/Write Operations over a period of time.

fop wr

File Operations for Inode Operations

The File Operations for Inode Operations panel displays the average latency, maximum latency, and call rate for each FOP for Inode Operations over a period of time.

dop inode

File Operations for Entry Operations

The File Operations for Entry Operations panel displays the average latency, maximum latency, and call rate for each FOP for Entry Operations over a period of time.

fop entry

8.3.4. Volume Dashboard Metric Units

The following table shows the metrics and their corresponding measurement units.

Table 8.3. Volume Dashboard Metric Units

Metrics | Units
Capacity Utilization | Percentage (%)
Capacity Available | Megabyte/Gigabyte/Terabyte
Weekly Growth Rate | Megabyte/Gigabyte/Terabyte
Capacity Utilization Trend | Percentage (%)
Inode Utilization | Percentage (%)
LVM Thin Pool Metadata | Percentage (%)
LVM Thin Pool Data Usage | Percentage (%)
File Operations for Locks Trend | MB/GB/TB
File Operations for Read/Write | K
File Operations for Inode Operation Trend | K
File Operations for Entry Operations | K

8.4. Brick Level Dashboard

8.4.1. Monitoring and Viewing Brick Status

The Status panel displays the status for a given brick.

brick status

8.4.2. Monitoring and Viewing Brick Performance

Capacity Utilization

The Capacity Utilization panel displays the percentage of capacity utilization for a given brick.

br cap uti

Capacity Available

The Capacity Available panel displays the available capacity for a given brick.

brick cap uti

Capacity Utilization Trend

The Capacity Utilization Trend panel displays the brick capacity utilization over a period of time.

brick capa uti

Weekly Growth Rate

The Weekly Growth Rate panel displays the forecasted weekly growth rate for capacity utilization computed based on daily capacity utilization.

br week rate

Weeks Remaining

The Weeks Remaining panel displays the estimated time remaining in weeks until the brick reaches full capacity, based on the forecasted Weekly Growth Rate.

reaminig weeks

Healing

The Healing panel displays healing information for a given volume based on heal info.

volume healing
Note

The Healing panel will not show any data for volumes without replica.

Inode Utilization

The Inode Utilization panel displays inodes used for a given brick over a period of time.

inode util

Inode Available

The Inode Available panel displays inodes free for a given brick.

inode availa
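Inode usage for the file system that backs a brick can also be checked from the command line. This sketch assumes GNU df, whose -i option reports inodes and whose -P option keeps each entry on one line; inode_use_percent is a hypothetical helper, and the brick path in the comment is a placeholder.

```shell
# inode_use_percent PATH: print the inode usage (IUse%) of the file system
# that contains PATH, as reported by df.
inode_use_percent() {
    df -P -i "$1" | awk 'NR == 2 { print $5 }'
}

inode_use_percent /
# For a brick, pass its mount point, for example (placeholder path):
# inode_use_percent /gluster/brick1
```

Some file systems do not report inode counts, in which case df prints a dash instead of a percentage.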

LVM Thin Pool Metadata %

The LVM Thin Pool Metadata % panel displays the utilization of LVM thin pool metadata for a given brick. Monitoring the utilization of LVM thin pool metadata and data usage is important to ensure they do not run out of space. If the data space is exhausted, I/O operations are either queued or fail, depending on the configuration. If the metadata space is exhausted, I/O errors occur until the LVM pool is taken offline and a repair is performed to fix potential inconsistencies. Moreover, because the metadata transaction is aborted and the pool performs caching, there might be I/O operations that were acknowledged to the upper storage layers (the file system) but not committed to disk, so those layers also need checks and repairs.

lvm meta

LVM Thin Pool Data Usage %

The LVM Thin Pool Data Usage % panel displays the LVM thin pool data usage for a given brick. Monitoring both the metadata and the data usage of the LVM thin pool is important to ensure that the pool does not run out of space. If the data space is exhausted, I/O operations are either queued or fail, depending on the configuration. If the metadata space is exhausted, I/O errors occur until the LVM pool is taken offline and a repair is performed to fix potential inconsistencies. Moreover, because the metadata transaction is aborted and the pool performs caching, some I/O operations that were acknowledged to the upper storage layers (the file system) might not have been committed to disk, so those layers need to be repaired as well.

lvm pool
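
The same two percentages can also be read directly on a storage node with the LVM `lvs` command, using its `data_percent` and `metadata_percent` report fields. The sketch below parses that output and flags thin pools that are close to full. The 80% threshold, the function names, and the sample volume names are assumptions for illustration, and the parser assumes a locale that uses a period as the decimal separator.

```python
import subprocess
from typing import List, Tuple

# Report name, volume group, data usage, and metadata usage for every logical volume.
LVS_CMD = ["lvs", "--noheadings", "--separator", ",",
           "-o", "lv_name,vg_name,data_percent,metadata_percent"]


def pools_over_threshold(lvs_output: str,
                         threshold: float = 80.0) -> List[Tuple[str, float, float]]:
    """Return (vg/lv, data %, metadata %) for thin pools at or above the threshold."""
    flagged = []
    for line in lvs_output.splitlines():
        fields = [f.strip() for f in line.split(",")]
        # Only thin pools report both percentages; skip everything else.
        if len(fields) != 4 or not fields[2] or not fields[3]:
            continue
        name = f"{fields[1]}/{fields[0]}"
        data_pct, meta_pct = float(fields[2]), float(fields[3])
        if data_pct >= threshold or meta_pct >= threshold:
            flagged.append((name, data_pct, meta_pct))
    return flagged


def check_host(threshold: float = 80.0) -> List[Tuple[str, float, float]]:
    """Run lvs on the local host (requires root and the lvm2 tools)."""
    out = subprocess.run(LVS_CMD, check=True, capture_output=True, text=True).stdout
    return pools_over_threshold(out, threshold)


# Hypothetical lvs output: two thin pools and one thin volume.
sample = "  pool0,vg_bricks,85.10,12.30\n  brick1,vg_bricks,40.00,\n  pool1,vg_bricks,20.00,91.50\n"
print(pools_over_threshold(sample))
```

Keeping the parser separate from the `lvs` call makes it possible to try the logic against captured output before running it on a storage node.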

Throughput

The Throughput panel displays brick-level read and write operations fetched using the gluster volume profile command.

brick throuout

IOPS

The IOPS panel displays IOPS for a brick over a period of time. IOPS is based on brick-level read and write operations.

brick iops

Latency

The Latency panel displays latency for a brick over a period of time. Latency is based on the average amount of time a brick spends doing a read or write operation.

br latency
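
To make the IOPS and Latency definitions concrete, the sketch below computes both from operation counters collected over a sampling interval. The function names, the microsecond unit, and the sample numbers are assumptions for illustration; the guide does not specify how the dashboard aggregates the profile data.

```python
def iops(read_ops: int, write_ops: int, interval_seconds: float) -> float:
    """Read plus write operations completed per second over the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return (read_ops + write_ops) / interval_seconds


def average_latency_us(total_time_us: float, total_ops: int) -> float:
    """Average time spent per read or write operation, in microseconds."""
    if total_ops == 0:
        return 0.0
    return total_time_us / total_ops


# Hypothetical brick: 12,000 reads and 6,000 writes in a 60-second window,
# with 45 seconds (45,000,000 microseconds) spent servicing them in total.
print(iops(12_000, 6_000, 60.0))               # 300.0
print(average_latency_us(45_000_000, 18_000))  # 2500.0
```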

8.4.3. Brick Dashboard Metric Units

The following table shows the metrics and their corresponding measurement units.

Table 8.4. Brick Dashboard Metric Units

Metrics                       Units
Capacity Utilization          Percentage %
Capacity Available            Megabyte/Gigabyte/Terabyte
Weekly Growth Rate            Megabyte/Gigabyte/Terabyte
Capacity Utilization Trend    Percentage %
Inode Utilization             Percentage %
LVM Thin Pool Metadata        Percentage %
LVM Thin Pool Data Usage      Percentage %
Disk Throughput               Percentage %

Chapter 9. Users and Roles Administration

9.1. User Roles

There are three user roles available for Web Administration.

  1. Admin: The Admin role gives the user complete rights to manage all Web Administration operations.
  2. Normal User: The Normal User role authorizes the user to perform operations such as importing a cluster and enabling or disabling volume profiling, but does not allow managing users or performing other administrative operations.
  3. Read-only User: The Read-only User role authorizes the user only to view and monitor cluster-wide metrics and readable data. The user can launch Grafana dashboards from the Web Administration interface but cannot perform any storage operations. This role is suited for users performing monitoring tasks.

9.2. Configuring Roles

To add and configure a new user, follow these steps:

  1. Log in to the Web Administration interface and, in the navigation pane, click Admin > Users.
  2. The users list is displayed. To add a new user, click Add at the right-hand side.

    add user
  3. Enter the user information in the given fields. To enable or disable email notifications, toggle the ON-OFF button.

    add user1
  4. Select a Role from the three available roles and click Save.

    add user3
  5. The new user is successfully created.
add user4

9.2.1. Editing Users

To edit an existing user:

  1. Navigate to the user view by clicking Admin > Users from the interface navigation.
  2. Locate the user to be edited and click Edit at the right-hand side.

    add user4
  3. Edit the required information and click Save.
edit user

9.2.2. Disabling Notifications and Deleting User

Enabling and Disabling Notifications

To disable email notifications for a user:

  1. Navigate to the user view by clicking Admin > Users from the interface navigation.

    add user4
  2. Click the vertical ellipsis next to the Edit button and click Disable Email Notification from the callout menu.

    disable notif
  3. Email notification is successfully disabled for the user.
notification disabled

Deleting User

To delete an existing user:

  1. Navigate to the user view by clicking Admin > Users from the interface navigation.
  2. Locate the user to be deleted and click the vertical ellipsis next to the Edit button.

    disable notif
  3. From the callout menu, click Delete User.
  4. A confirmation box appears. Click Delete.

    delete use

Chapter 10. Alerts and Notifications

Alerts are current problems and critical conditions that occur in the system and are reported to the user. The Grafana monitoring platform generates alerts based on severity levels.

You can configure alerts via SMTP and SNMP protocols. SMTP configuration will send email alerts to users that have email notifications enabled. SNMPv3 configuration will send SNMP trap alerts to the Alerts notifications drawer of the Web Administration environment.

10.1. Types of Alerts

The alerts triggered by the dashboard are classified in the following categories:

  • Status alerts: Alerts arising when a cluster resource undergoes a change of state. For example, Healthy to Unhealthy.
  • Utilization alerts: Alerts arising after a cluster resource exceeds the set threshold and again after it reverts to the normal state. For example, when the host CPU utilization threshold is breached, an alert is triggered notifying the user about the event.
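
The utilization alert behavior described above (one alert when the threshold is exceeded, another when the resource returns to normal) can be sketched as a small state machine. The function name, the alert labels, the 80% threshold, and the sample readings are assumptions for illustration and do not reflect the actual alert rules configured in Grafana.

```python
from typing import Iterable, List, Tuple


def utilization_alerts(samples: Iterable[float], threshold: float) -> List[Tuple[int, str]]:
    """Emit an alert when utilization crosses the threshold in either direction."""
    alerts = []
    breached = False
    for index, value in enumerate(samples):
        if value > threshold and not breached:
            alerts.append((index, "breached"))  # resource exceeded the threshold
            breached = True
        elif value <= threshold and breached:
            alerts.append((index, "cleared"))   # resource reverted to normal
            breached = False
    return alerts


# Hypothetical host CPU utilization readings, in percent.
cpu = [40.0, 85.0, 92.0, 70.0, 60.0, 95.0]
print(utilization_alerts(cpu, 80.0))
# [(1, 'breached'), (3, 'cleared'), (5, 'breached')]
```

Tracking the breached state means that consecutive readings above the threshold produce a single alert rather than one per reading.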

10.2. List of Alerts

The Web Administration alerts are listed in the tables below.

Status Alerts

Table 10.1. Status Alerts

Alert                                              System Resource
volume status                                      Volume and Cluster
volume state                                       Volume and Cluster
brick status                                       Volume, Host, and Cluster
peer status                                        Cluster
rebalance status                                   Volume and Cluster
Geo-Replication status                             Cluster
quorum of volume lost                              Volume and Cluster
quorum of volume regained                          Volume and Cluster
svc connected                                      Cluster
svc disconnected                                   Cluster
minimum number of bricks not up in EC subvolume    Volume and Cluster
minimum number of bricks up in EC subvolume        Volume and Cluster
afr quorum met for subvolume                       Volume and Cluster
afr quorum fail for subvolume                      Volume and Cluster
afr subvolume up                                   Volume and Cluster
afr subvolume down                                 Volume and Cluster

Utilization Alerts

Table 10.2. Utilization Alerts

Alert                 System Resource
cpu utilization       Host
memory utilization    Host
swap utilization      Host
volume utilization    Volume and Cluster
brick utilization     Volume and Cluster

10.3. Alerts Notifications Drawer

The Alerts drawer is a notification delivery utility embedded in the Web Administration interface that displays system-wide alerts.

Accessing Alerts Drawer

  1. To access the Alerts drawer, log in to the Web Administration interface. In the default landing interface, locate and click the interactive bell icon on the header bar at the top right-hand side.

    headerbar
  2. The drawer is opened displaying the number of alerts generated.
alerts drawer

To filter alerts, click on the status icons at the right.

alerts filter

If the alert message is truncated and not viewable, hover over the alert message and a dialogue box will open displaying the complete alert message.

alerts truncated

10.4. SMTP Notifications Configuration

To configure SMTP email notifications, install and configure tendrl-notifier first.

  1. Install tendrl-notifier:

    yum install tendrl-notifier
  2. Open the /etc/tendrl/notifier/notifier.conf.yaml file and update:

    etcd_connection: <FQDN of etcd server>

    Note

    Ensure that you use FQDNs for volume creation because Web Administration does not support short hostnames. Volumes already created in Gluster clusters using short names or IP addresses will display inconsistent data in the Web Administration interface.

After the tendrl-notifier file is configured, configure SMTP email notifications:

  1. Open the /etc/tendrl/notifier/email.conf.yaml file.
  2. Update the parameters:

    email_id: <The sender email id>
    email_smtp_server: <The smtp server>
    email_smtp_port: <The smtp port>
  3. If the SMTP server supports only authenticated email, follow the template in the /etc/tendrl/notifier/email_auth.conf.yaml.sample file and enable the following parameters accordingly:

    auth: <ssl/tls>
    email_pass: <password corresponding to email_id for authenticating to smtp server>
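
Before restarting the notifier, it can help to sanity-check the values you intend to place in the email configuration. The sketch below is a hypothetical helper, not part of tendrl-notifier: it takes the parameters named above as a dictionary and reports obvious problems. The specific checks, including treating email_pass as required whenever auth is set, are assumptions based on the steps in this section.

```python
from typing import Dict, List


def check_email_settings(settings: Dict[str, object]) -> List[str]:
    """Return a list of problems found in the notifier email settings."""
    problems = []
    # These three parameters are always needed to send email.
    for key in ("email_id", "email_smtp_server", "email_smtp_port"):
        if not settings.get(key):
            problems.append(f"missing {key}")
    port = settings.get("email_smtp_port")
    if port and not (isinstance(port, int) and 1 <= port <= 65535):
        problems.append("email_smtp_port must be an integer between 1 and 65535")
    # auth and email_pass only apply to servers that require authentication.
    auth = settings.get("auth")
    if auth is not None:
        if auth not in ("ssl", "tls"):
            problems.append("auth must be ssl or tls")
        if not settings.get("email_pass"):
            problems.append("email_pass is required when auth is set")
    return problems


# Hypothetical settings for an authenticated SMTP server.
good = {
    "email_id": "admin@example.com",
    "email_smtp_server": "smtp.example.com",
    "email_smtp_port": 465,
    "auth": "ssl",
    "email_pass": "secret",
}
print(check_email_settings(good))  # []
```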

10.5. SNMPv3 Notification Configuration

Configure SNMP

To configure SNMPv3 trap notifications, install and configure tendrl-notifier first.

  1. Install tendrl-notifier:

    yum install tendrl-notifier
  2. Open the /etc/tendrl/notifier/notifier.conf.yaml file and update:

    etcd_connection: <FQDN of etcd server>

    Note

    Ensure that you use FQDNs for volume creation because Web Administration does not support short hostnames. Volumes already created in Gluster clusters using short names or IP addresses will display inconsistent data in the Web Administration interface.

After the tendrl-notifier file is configured, configure SNMPv3 trap notifications:

  1. Open the tendrl-notifier configuration file:

    # cat /etc/tendrl/notifier/snmp.conf.yaml
  2. Update the parameters in the file for v3 trap alerts:

    # For v3_endpoint:
    # For more hosts you can add more entries with endpoint2, endpoint3, etc.
    endpoint1:
        # Name or IP address of the remote SNMP host.
        host_ip: <Receiving machine ip>
        # Name of the user on the host that connects to the agent.
        username: <Username of receiver>
        # Enables the agent to receive packets from the host.
        auth_key: <md5 password>
        # The private user password
        priv_key: <des password>

    # For v2_endpoint:
    # For more hosts you can add more entries with endpoint2, endpoint3, etc.
    endpoint1:
        # Name or IP address of the remote SNMP host.
        host_ip: <Receiving machine ip>
        community: <community name>

Chapter 11. Troubleshooting

Importing Cluster

Scenario 1: Successive attempts to import the same cluster on the same Web Administration server fail

In this scenario, when you attempt to reimport a cluster whose import previously failed, the import continues to fail.

Resolution

To resolve this issue, clean up the Tendrl central store (etcd) by following the Unmanage Cluster procedure. For details, navigate to the Unmanaging Cluster section of this Guide.

Scenario 2: The Import cluster UI button is disabled after a failed cluster import operation.

In this scenario, when cluster import fails, the Import button is disabled.

Resolution

To resolve this issue, uninstall Web Administration and install it again. For uninstall instructions, navigate to the Unmanaging Cluster section of this Guide. For installation instructions, see the Installing Web Administration chapter in the Red Hat Gluster Storage Web Administration Quick Start Guide.

Common Scenario

For any cluster import failure scenario, the current troubleshooting method is to uninstall Web Administration, unmanage the cluster, and reinstall Web Administration using tendrl-ansible. For details, navigate to the Unmanaging Cluster section of this Guide.

Legal Notice

Copyright © 2017 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.