Create Metric Template in Git

A metric template is a template used for metric analysis. You can create a metric template in GitHub by following the steps below:

You can create a metric template only after creating an application. If you haven’t created an application yet, click here.

Create the metric template in the GitHub repository where your deployment manifest files are saved. OpsMx provides two sample metric templates, minimal and extended, for each data source that supports metric analysis. You can use these sample YAML files instead of creating a new template.

Sample templates:

The sample templates for each data source are available here. For example, the sample metric templates for the Prometheus data source are as follows:

Minimal yaml file (prometheus-app-health-springboot-minimal.yaml)
Extended yaml file (prometheus-app-health-springboot-extended.yaml)

Minimal yaml file:

The following sample Prometheus minimal yaml file contains the mandatory parameters to create a metric template. You can use this sample minimal yaml file instead of creating a new template.

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-app-health-springboot-min
data:
  prometheus-app-health-generic-min: |
    accountName: prometheus-account-name
    metricType: ADVANCED
    advancedProvider: PROMETHEUS
    metricTemplateSetup:
      groups:
        - metrics:
            - name: "avg(container_memory_usage_bytes{namespace=\"${namespace_key}\", pod=~\"${pod_key}\"})"
              riskDirection: higher
          group: "Memory Usage By Pod Name"
        - metrics:
            - name: "avg(rate(container_cpu_usage_seconds_total{namespace=\"${namespace_key}\", pod=~\"${pod_key}\"}[1m]) * 100)"
              riskDirection: higher
          group: "CPU Usage By Pod Name"
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_sum{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m])) / sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higher
          group: "Application Latency"
        - metrics:
            - name: "avg(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higherOrLower
          group: "Application Request Rate"
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\",status=~\"^[4-5].*\"}[1m])) or vector(0)"
              riskDirection: higher
          group: "Application Error Rate"

Parameter details are as follows:

name: Name of the metric template. This name must be used in the OpsMx Provider ConfigMap file.
accountName: Account name of the metric provider. It must be the same account name that you gave while adding the Prometheus integrator in the ISD UI.
metricType: Type of the metric.
advancedProvider: Name of the provider. Set it to “PROMETHEUS”.
groups: The sets of metrics to be configured for analysis. Each group can contain multiple metrics and has a group name, which is specified in the group field.
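
To make the mandatory parameters concrete, the following fragment is a pared-down variant of the minimal sample with a single metric group. It is a sketch, not an OpsMx-provided file: the template name my-metric-template and the account name my-prometheus-account are placeholders invented for illustration, and using the same name for the ConfigMap and the data key is an assumption.

```yaml
# Hypothetical names: replace my-metric-template and my-prometheus-account
# with your own template name and the account name configured in the ISD UI.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-metric-template
data:
  my-metric-template: |
    accountName: my-prometheus-account
    metricType: ADVANCED
    advancedProvider: PROMETHEUS
    metricTemplateSetup:
      groups:
        - metrics:
            - name: "avg(container_memory_usage_bytes{namespace=\"${namespace_key}\", pod=~\"${pod_key}\"})"
              riskDirection: higher
          group: "Memory Usage By Pod Name"
```

Further groups can be added as extra items under groups, each with its own metrics list and group name, as in the sample above.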

Extended yaml file:

The following sample Prometheus extended YAML file contains all the available parameters, including the non-mandatory ones, for creating a metric template. You can use this sample extended YAML file instead of creating a new template.

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-app-health-springboot-ext
data:
  prometheus-app-health-springboot-ext: |
    accountName: isd312-saas-prom
    metricType: ADVANCED
    advancedProvider: PROMETHEUS
    metricWeight: 1
    criticality: normal
    nanStrategy: remove
    metricTemplateSetup:
      groups:
        - metrics:
            - name: "avg(container_memory_usage_bytes{namespace=\"${namespace_key}\", pod=~\".*${pod_key}.*\"})"
              riskDirection: higher
              criticality: mustHave
              customThresholdHigherPercentage: 50
              nanStrategy: replaceWithZero
          group: Memory Usage By Pod Name
        - metrics:
            - name: "avg(rate(container_cpu_usage_seconds_total{namespace=\"${namespace_key}\", pod=~\".*${pod_key}.*\"}[1m]) * 100)"
              riskDirection: higher
              criticality: mustHave
              customThresholdHigherPercentage: 50
          group: CPU Usage By Pod Name
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_sum{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m])) / sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higher
              customThresholdHigherPercentage: 50
              criticality: critical
              watchlist: true
          group: "Application Latency"
        - metrics:
            - name: "avg(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higherOrLower
              customThresholdHigherPercentage: 50
              customThresholdLowerPercentage: 50
          group: "Application Request Rate"
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\",status=~\"^[4-5].*\"}[1m])) or vector(0)"
              riskDirection: higher
              customThresholdHigherPercentage: 50
              criticality: critical
              watchlist: true
          group: "Application Error Rate"

Parameter details are as follows:

name: Name of the metric template. This name must be used in the OpsMx Provider ConfigMap file.
accountName: Account name of the metric provider. It must be the same account name that you gave while adding the Prometheus integrator in the ISD UI.
metricType: Type of the metric.
advancedProvider: Name of the provider. Set it to “PROMETHEUS”.
metricWeight: Numerical importance given to a metric. It can range from 0 (lowest) to 1 (highest).
criticality:
- Normal: Removes the metric from the metric group for score calculation if it has no data.
- Critical: Fails the entire analysis if this metric fails or has no data.
- MustHave: Fails the metric if data is missing.
nanStrategy: Specifies how NaN values are handled. NaN values can occur when there is no metric data in a particular interval. The sample above uses the values remove and replaceWithZero.
Note:
- If you specify metricWeight, criticality and nanStrategy at the global level, they apply to all the metric groups.
- If you specify metricWeight, criticality and nanStrategy at the local level, they apply only to that particular metric group and override the same parameters at the global level, if specified.
groups: The sets of metrics to be configured for analysis. Each group can contain multiple metrics and has a group name, which is specified in the group field. The following parameters can be set within a group:
- riskDirection: Direction in which the metric difference is allowed to expand. The possible values are Higher, Lower, and ‘Higher or Lower’; the sample above writes these as higher and higherOrLower.
- criticality, nanStrategy, metricWeight: Same meaning as the global parameters described above, applied at the local level.
- customThresholdHigherPercentage: Percentage difference on the higher side beyond which the metric is treated as failed.
- customThresholdLowerPercentage: Percentage difference on the lower side beyond which the metric is treated as failed.
- watchlist: Metrics marked in the watchlist are shown first in the metric analysis report.
- group: Name of the group.
Note: If you do not specify a value for “customThresholdHigherPercentage”, it is set to 10 by default.
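
The interplay between global and local settings described in the notes above can be illustrated with a shortened template body. This is a sketch based only on the parameter descriptions in this page: it shows the text stored under the ConfigMap data key, the account name my-prometheus-account is a placeholder, and the comments are explanatory only and can be removed before use.

```yaml
# Body of a metric template (the text stored under the ConfigMap data key).
accountName: my-prometheus-account   # hypothetical account name
metricType: ADVANCED
advancedProvider: PROMETHEUS
# Global settings: apply to all metric groups unless overridden locally.
metricWeight: 1
criticality: normal
nanStrategy: remove
metricTemplateSetup:
  groups:
    - metrics:
        - name: "avg(container_memory_usage_bytes{namespace=\"${namespace_key}\", pod=~\"${pod_key}\"})"
          riskDirection: higher
          # Local settings: override the global criticality and nanStrategy here only.
          criticality: mustHave
          nanStrategy: replaceWithZero
          # Treat the metric as failed beyond a 50 percent difference.
          customThresholdHigherPercentage: 50
      group: "Memory Usage By Pod Name"
    - metrics:
        - name: "avg(rate(container_cpu_usage_seconds_total{namespace=\"${namespace_key}\", pod=~\"${pod_key}\"}[1m]) * 100)"
          riskDirection: higher
          # No local settings: the global values above apply, and
          # customThresholdHigherPercentage falls back to its default of 10.
      group: "CPU Usage By Pod Name"
```

In this sketch the memory metric is evaluated with mustHave criticality and replaceWithZero, while the CPU metric keeps the global normal criticality and remove strategy.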

After creating the metric template in Git, you need to create the “OpsMx Provider Configmap”. To create it, refer here.


