System notifications for disk and database health

Metric Insights sends several email notifications about system health. They are designed to warn system administrators about conditions that could cause a system to crash, and to notify them when a database has gone offline.

This help doc explains the settings that admins have available for these notifications.

Key notifications about system health are set up by default with every deployment. They address the following:

  1. Low disk space on primary web server
  2. Low inode space on primary web server
  3. [Version 6.1.x] Low disk space in dataset storage engine
  4. [Version 6.1.x] Dataset storage downtime

Prior to Release 6.2.1, these emails were sent to all Admins. In 6.2.1, new settings on an Admin's User Editor > Info tab allow each individual Admin to choose whether to receive messages about:

  1. Access Notifications
  2. System Notifications

See Section 3 below for more information.

Example email


1. System variables controlling Disk and Inode notification options

Over time, files can accumulate on your Metric Insights server and use up disk space and inodes. By default, the system notifies Support Admin users when the used disk space or used inodes exceed the percentages set via these variables:

  1. MAX_USED_DISK_SPACE_PERCENT - Set this to the maximum percentage of system disk space that can be used before Notifications are sent (controlled by #4)
  2. MAX_USED_INODE_PERCENT - Set this to the maximum percentage of inodes that can be used before Notifications are sent (controlled by #3)
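
To illustrate what these two percentages measure, here is a minimal Python sketch that computes used disk space and used inodes for a path and compares them with a threshold. It is not the product's own check: the threshold values, the function names, and the choice to alert only when usage is strictly above the limit are all assumptions made for the example.

```python
import os
import shutil

# Hypothetical example values; the real limits come from the system variables.
MAX_USED_DISK_SPACE_PERCENT = 90
MAX_USED_INODE_PERCENT = 90


def used_percent(used: float, total: float) -> float:
    """Return used/total as a percentage (0.0 if total is zero)."""
    return 100.0 * used / total if total else 0.0


def disk_used_percent(path: str = "/") -> float:
    """Percentage of disk space in use on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return used_percent(usage.used, usage.total)


def inode_used_percent(path: str = "/") -> float:
    """Percentage of inodes in use on the filesystem holding path (POSIX only)."""
    st = os.statvfs(path)
    return used_percent(st.f_files - st.f_ffree, st.f_files)


def exceeds_limit(value_percent: float, limit_percent: float) -> bool:
    """Assumed rule for this sketch: alert when usage is strictly above the limit."""
    return value_percent > limit_percent
```

On a Linux server, `exceeds_limit(disk_used_percent("/"), MAX_USED_DISK_SPACE_PERCENT)` would indicate whether the root filesystem has crossed the example limit. The output of `df -h` and `df -i` shows the same two figures from the command line.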

Example Disk Space Notification emails:

2. System variables controlling Storage Health notification options

As of the 6.x versions, notifications about database status (admins are notified if the database goes down) and low disk space in data storage are sent via email. Whether these emails are sent, and how often the check runs, can be set via these variables:

  1. SEND_STORAGE_HEALTH_EMAIL - Set to 'Y' for System Admins to receive notifications
  2. STORAGE_HEALTH_CHECK_INTERVAL - Set to the interval (in seconds) at which the system checks the health of all your Data Storage systems
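
The way these two variables interact can be sketched as follows. This is an illustration only, not the product's implementation: the example values, the function names, and the exact comparison rules are assumptions.

```python
# Hypothetical example values; the real ones are set via the system variables.
SEND_STORAGE_HEALTH_EMAIL = "Y"
STORAGE_HEALTH_CHECK_INTERVAL = 300  # seconds


def is_check_due(last_check: float, now: float, interval: float) -> bool:
    """A check is due once at least `interval` seconds have passed."""
    return now - last_check >= interval


def should_send_email(send_flag: str, storage_healthy: bool) -> bool:
    """Email only when sending is enabled ('Y') and a problem was found."""
    return send_flag == "Y" and not storage_healthy
```

With an interval of 300, a check that last ran at second 0 becomes due again at second 300. If the variable is set to anything other than 'Y', this sketch sends no email even when the storage is unhealthy.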

3. Admin Options: Admin User Editor > Info tab

  1. [Receive Access Notifications] When checked, the Admin receives emails about problems such as denied access to a page or missing Privileges, along with any related Access Request emails
  2. [Receive System Notifications] When checked, the Admin receives emails about problems such as low disk space, low inode space, and data storage health
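
The effect of the two checkboxes can be pictured as a per-Admin filter on the recipient list. The sketch below is a hypothetical model: the `Admin` class, its field names, and the `recipients_for` function are invented for illustration and do not reflect the product's actual data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Admin:
    # Hypothetical fields mirroring the two checkboxes on the Info tab.
    email: str
    receive_access_notifications: bool
    receive_system_notifications: bool


def recipients_for(admins: List[Admin], kind: str) -> List[str]:
    """Return the emails of Admins who opted in to the given notification kind."""
    if kind == "access":
        return [a.email for a in admins if a.receive_access_notifications]
    if kind == "system":
        return [a.email for a in admins if a.receive_system_notifications]
    raise ValueError(f"unknown notification kind: {kind}")


admins = [
    Admin("ann@example.com", True, False),
    Admin("bob@example.com", False, True),
    Admin("cy@example.com", True, True),
]
system_recipients = recipients_for(admins, "system")
```

In this example only the second and third Admins would receive a low disk space email, because the first has left [Receive System Notifications] unchecked.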

More help?

How do I check Health and Status online? See Status Monitor Page