Argo Rollouts analysis with ISD and Prometheus

This document explains the end-to-end flow for the analysis run with the Prometheus data source.

Follow the steps below to run an analysis end to end and get an analysis report.

Step 1: Add Integration

Note: You can add integrations only through the ISD UI.

To add Prometheus as a data source so that ISD can perform metric analysis, follow the steps given below:

  1. From the ISD dashboard, click "Setup", then click "Integrations", and then click the "+New Integration" button as shown in the image below:

  2. The list of available integrations and their respective fields appears. Select the Prometheus integration and fill in the fields that appear in the right pane of the screen.

    Update the following information on the above screen:

    • Account Name: User-defined account name for your Prometheus access.

    • End Point: Host address at which Prometheus is accessed.

    • User Name (Optional): Prometheus user name.

    • Password (Optional): Prometheus password.

    • Permissions: To restrict access to this account, select the relevant User Groups.

  3. Once you have updated all the information, click the "Save" button. The newly created Prometheus integrator appears as shown below:

Create a metric template after adding the Prometheus integrator.

Step 2: Create Metric Template

The following two modes of template creation are supported:

  • GitOps mode template creation

  • Template creation in ISD UI

GitOps mode Metric Template creation:

You can create a metric template in the GitHub repository where your deployment manifest files are saved. A sample Metric Template for the Prometheus data source is available here.

OpsMx provides two types of sample Metric Templates for Prometheus, as follows:

  1. Minimal yaml file (prometheus-app-health-springboot-minimal.yaml)

  2. Extended yaml file (prometheus-app-health-springboot-extended.yaml)

Minimal yaml file:

The following sample Prometheus minimal yaml file contains the mandatory parameters to create a metric template. You can use this sample yaml file instead of creating a new template.

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-app-health-springboot-min
data:
  prometheus-app-health-generic-min: |
    accountName: prometheus-account-name
    metricType: ADVANCED
    advancedProvider: PROMETHEUS
    metricTemplateSetup:
      groups:
        - metrics:
            - name: "avg(container_memory_usage_bytes{namespace=\"${namespace_key}\", pod=~\"${pod_key}\"})"
              riskDirection: higher
          group: "Memory Usage By Pod Name"
        - metrics:
            - name: "avg(rate(container_cpu_usage_seconds_total{namespace=\"${namespace_key}\", pod=~\"${pod_key}\"}[1m]) * 100)"
              riskDirection: higher
          group: "CPU Usage By Pod Name"
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_sum{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m])) / sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higher
          group: "Application Latency"
        - metrics:
            - name: "avg(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higherOrLower
          group: "Application Request Rate"
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\",status=~\"^[4-5].*\"}[1m])) or vector(0)"
              riskDirection: higher
          group: "Application Error Rate"
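The queries in this sample contain placeholders such as ${namespace_key} and ${pod_key}. As a rough illustration of how such placeholders could be filled with different values for the baseline and the new release, the Python sketch below uses the standard string.Template class. The namespace and pod values are hypothetical, and this is not ISD's actual substitution code.

```python
from string import Template

# Query copied from the "Memory Usage By Pod Name" group of the sample template.
QUERY = Template(
    'avg(container_memory_usage_bytes{namespace="${namespace_key}", pod=~"${pod_key}"})'
)


def render(query: Template, scope: dict) -> str:
    """Fill every ${...} placeholder; raises KeyError if a value is missing."""
    return query.substitute(scope)


# Hypothetical scope values for the two versions being compared.
baseline_query = render(QUERY, {"namespace_key": "demo", "pod_key": "rollout-app-6d5f.*"})
canary_query = render(QUERY, {"namespace_key": "demo", "pod_key": "rollout-app-7c9b.*"})

print(baseline_query)
print(canary_query)
```

Because substitute() fails on a missing key, a placeholder that is spelled differently in the query and in the scope shows up as an error instead of being sent to Prometheus unresolved.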

Parameter details are as follows:

  • name: Name of the Metric Template. (This name must also be used in the OpsMx Provider configmap file.)

  • accountName: Metric provider account name. (This must be the same account name that you gave while adding the Prometheus integrator in the ISD UI.)

  • metricType: Type of the metric.

  • advancedProvider: Set this to “PROMETHEUS”.

  • groups: Groups are the sets of metrics to be configured for analysis. Each group can carry multiple metrics and has a group name associated with it, to be specified in the group field.

    • riskDirection: Direction in which the metric difference is allowed to expand. Possible values are Higher, Lower, and ‘Higher or Lower’ (the sample above uses higher and higherOrLower).

    • group: Name of the group.
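Since the minimal file is described as carrying the mandatory parameters, a quick check before committing a template can catch a missing one early. The sketch below works on a template that has already been parsed into a Python dict (for example with a YAML library). The function name, the messages, and the shortened query are illustrative and not part of ISD; treating exactly these keys as required is an assumption drawn from the minimal sample.

```python
# Keys that appear in the minimal sample template (assumed to be mandatory).
REQUIRED_TOP_LEVEL = ("accountName", "metricType", "advancedProvider", "metricTemplateSetup")
REQUIRED_PER_METRIC = ("name", "riskDirection")


def find_missing_fields(template: dict) -> list:
    """Return a list of problems; an empty list means none were found."""
    problems = [f"missing '{key}'" for key in REQUIRED_TOP_LEVEL if key not in template]
    groups = (template.get("metricTemplateSetup") or {}).get("groups") or []
    if not groups:
        problems.append("no groups defined")
    for index, group in enumerate(groups):
        if "group" not in group:
            problems.append(f"group #{index} has no 'group' name")
        for metric in group.get("metrics", []):
            for key in REQUIRED_PER_METRIC:
                if key not in metric:
                    problems.append(f"group #{index}: metric is missing '{key}'")
    return problems


template = {
    "accountName": "prometheus-account-name",
    "metricType": "ADVANCED",
    "advancedProvider": "PROMETHEUS",
    "metricTemplateSetup": {
        "groups": [
            {
                "group": "Memory Usage By Pod Name",
                # Query shortened for readability.
                "metrics": [{"name": "avg(container_memory_usage_bytes)", "riskDirection": "higher"}],
            }
        ]
    },
}

print(find_missing_fields(template))
```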

Extended yaml file:

The following sample Prometheus extended yaml file contains all the available parameters, including the non-mandatory ones, for creating a metric template. You can use this sample yaml file instead of creating a new template.

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-app-health-springboot-ext
data:
  prometheus-app-health-springboot-ext: |
    accountName: isd312-saas-prom
    metricType: ADVANCED
    advancedProvider: PROMETHEUS
    metricWeight: 1
    criticality: normal
    nanStrategy: remove
    metricTemplateSetup:
      groups:
        - metrics:
            - name: "avg(container_memory_usage_bytes{namespace=\"${namespace_key}\", pod=~\".*${pod_key}.*\"})"
              riskDirection: higher
              criticality: mustHave
              customThresholdHigherPercentage: 50
              nanStrategy: replaceWithZero
          group: Memory Usage By Pod Name
        - metrics:
            - name: "avg(rate(container_cpu_usage_seconds_total{namespace=\"${namespace_key}\", pod=~\".*${pod_key}.*\"}[1m]) * 100)"
              riskDirection: higher
              criticality: mustHave
              customThresholdHigherPercentage: 50
          group: CPU Usage By Pod Name
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_sum{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m])) / sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higher
              customThresholdHigherPercentage: 50
              criticality: critical
              watchlist: true
          group: "Application Latency"
        - metrics:
            - name: "avg(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higherOrLower
              customThresholdHigherPercentage: 50
              customThresholdLowerPercentage: 50
          group: "Application Request Rate"
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\",status=~\"^[4-5].*\"}[1m])) or vector(0)"
              riskDirection: higher
              customThresholdHigherPercentage: 50
              criticality: critical
              watchlist: true
          group: "Application Error Rate"
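In this sample, nanStrategy is set to remove at the top level and to replaceWithZero on the first metric, and criticality is set both globally and on individual metrics. The Python sketch below illustrates the override rule stated in this document (a locally specified value takes precedence over the global one). It is only an illustration of that rule with hypothetical function and variable names, not ISD's implementation.

```python
# Parameters that the document says can be set globally and locally.
OVERRIDABLE_KEYS = ("metricWeight", "criticality", "nanStrategy")


def effective_settings(global_level: dict, local_level: dict) -> dict:
    """Start from the global values and let locally specified values override them."""
    settings = {key: global_level[key] for key in OVERRIDABLE_KEYS if key in global_level}
    settings.update({key: local_level[key] for key in OVERRIDABLE_KEYS if key in local_level})
    return settings


# Values taken from the extended sample template.
global_level = {"metricWeight": 1, "criticality": "normal", "nanStrategy": "remove"}
memory_entry = {"riskDirection": "higher", "criticality": "mustHave", "nanStrategy": "replaceWithZero"}
request_rate_entry = {"riskDirection": "higherOrLower"}

memory_settings = effective_settings(global_level, memory_entry)
request_rate_settings = effective_settings(global_level, request_rate_entry)

print(memory_settings)
print(request_rate_settings)
```

The memory entry ends up with its own criticality and nanStrategy, while the request rate entry, which sets neither, keeps the global values.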

Parameter details are as follows:

  • name: Name of the Metric Template. (This name must also be used in the OpsMx Provider configmap file.)

  • accountName: Metric provider account name. (This must be the same account name that you gave while adding the Prometheus integrator in the ISD UI.)

  • metricType: Type of the metric.

  • advancedProvider: Set this to “PROMETHEUS”.

  • metricWeight: Numerical importance given to a metric. It ranges from 0 (lowest) to 1 (highest).

  • criticality:

    • Normal: Removes the metric from the metric group for score calculation if it has no data.

    • Critical: Fails the entire analysis if this metric fails or has no data.

    • MustHave: Fails the metric if data is missing.

  • nanStrategy: Handles NaN values, which can occur if there is no data in a particular interval for the metric.

    Note:

    • If you specify metricWeight, criticality, and nanStrategy at the global level, they apply to all the metric groups.

    • If you specify metricWeight, criticality, and nanStrategy at the local level, they apply only to that particular metric group and override the same parameters at the global level, if specified there.

  • groups: Groups are the sets of metrics to be configured for analysis. Each group can carry multiple metrics and has a group name associated with it, to be specified in the group field.

    • riskDirection: Direction in which the metric difference is allowed to expand. Possible values are Higher, Lower, and ‘Higher or Lower’ (the sample above uses higher and higherOrLower).

  • customThresholdHigherPercentage: Percentage difference in the higher direction beyond which the metric is treated as failed.

  • customThresholdLowerPercentage: Percentage difference in the lower direction beyond which the metric is treated as failed.

  • watchlist: Metrics marked in watchlist will be shown first in the metric analysis report.

  • group: Name of the group.
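To make the interplay of riskDirection and the custom thresholds more concrete, here is a minimal Python sketch. It rests on assumptions that this document does not confirm: that the percentage difference is computed relative to the baseline value, that riskDirection names the direction in which a deviation counts against the metric (as the sample queries for latency and error rate suggest), and that the lower direction is spelled "lower". ISD's actual scoring may differ.

```python
def exceeds_threshold(baseline: float, canary: float, risk_direction: str,
                      higher_pct: float = 50.0, lower_pct: float = 50.0) -> bool:
    """Return True when the canary deviates from the baseline beyond the allowed percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a percentage difference")
    change_pct = (canary - baseline) / abs(baseline) * 100.0
    too_high = change_pct > higher_pct
    too_low = change_pct < -lower_pct
    if risk_direction == "higher":
        return too_high
    if risk_direction == "lower":  # assumed spelling, not shown in the samples
        return too_low
    if risk_direction == "higherOrLower":
        return too_high or too_low
    raise ValueError(f"unknown riskDirection: {risk_direction}")


# Hypothetical measurements with the 50 percent thresholds used in the sample.
latency_up = exceeds_threshold(200.0, 320.0, "higher")              # latency rises by 60 percent
latency_down = exceeds_threshold(200.0, 90.0, "higher")             # latency drops by 55 percent
request_rate_drop = exceeds_threshold(100.0, 40.0, "higherOrLower")  # rate drops by 60 percent

print(latency_up, latency_down, request_rate_drop)
```

Under these assumptions, a large drop in latency does not fail a metric whose riskDirection is higher, while a request rate with higherOrLower fails on a large swing in either direction.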

Create Metric Template in ISD UI

You can create a Metric Template on the “Analysis Templates” page under “Setup” in the ISD UI. To create a metric template in the ISD UI, follow the steps below:

  1. From the application dashboard, click "Setup", then click “Analysis Templates”, and then click the "+New Template" button. Refer to the image below.

  2. After clicking the “+New Template” button, two options appear for you to choose the type of template you want to create. Select the “Metric Template” from the available options as shown in the image below.

  3. The new Metric Template window appears. Update the necessary parameters and click the “Save” button. Refer to the image below.

Update the following parameters in the above screen:

  • Metric Template Name: Name of the Metric Template. (This name must also be used in the OpsMx Provider configmap file.)

  • Select Datasource: Select “PROMETHEUS” as the data source from the drop-down.

  • Select Accounts: Select the account of interest in the configured data source from the drop-down. (This must be the same account name that you gave while adding the Prometheus integrator in the ISD UI.)

  • Metric Scope Placeholder: Placeholder in the Metric Query that is replaced by the Baseline and New Release values.

  • Query Name: A meaningful name given to a query or a group of similar queries.

  • Query: Query to fetch the metric from the data source provider.

  • Risk Direction: Direction in which the metric difference is allowed to expand. Possible values are Higher, Lower, and ‘Higher or Lower’.

  • Threshold percentage: Percentage difference beyond which the metric is treated as failed.

  • Criticality:

    • Normal: Removes the metric from the metric group for score calculation if it has no data.

    • Critical: Fails the entire analysis if this metric fails or has no data.

    • MustHave: Fails the metric if data is missing.

  • Watchlist: Metrics marked in the watchlist are shown first in the metric analysis report.

  • NaN Strategy: Handles NaN values, which can occur if there is no data in a particular interval for the metric.

  • Weight: Numerical importance given to a metric. It ranges from 0 (lowest) to 1 (highest).
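The NaN strategy is easier to picture with a small series. The Python sketch below applies the two strategy values that appear in the extended sample template (remove and replaceWithZero) to a list of samples in which some intervals returned no data. The behaviour is inferred from the names of the values; the function and the sample numbers are hypothetical and do not come from ISD.

```python
import math


def apply_nan_strategy(values: list, strategy: str) -> list:
    """Clean NaN entries out of a metric series according to the chosen strategy."""
    if strategy == "remove":
        # Drop the intervals that have no data.
        return [v for v in values if not math.isnan(v)]
    if strategy == "replaceWithZero":
        # Keep every interval, treating missing data as zero.
        return [0.0 if math.isnan(v) else v for v in values]
    raise ValueError(f"unknown nanStrategy: {strategy}")


# Hypothetical latency samples; NaN marks an interval without data.
samples = [0.42, float("nan"), 0.40, float("nan"), 0.45]

removed = apply_nan_strategy(samples, "remove")
zero_filled = apply_nan_strategy(samples, "replaceWithZero")

print(removed)
print(zero_filled)
```

The choice matters for the comparison: replacing gaps with zero pulls an average down, which can make sense for an error count but would distort a latency metric.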

After you create the Metric Template, it appears in the list for an application on the “Analysis Templates” page as shown below:

Step 3: Create Application

To create an application, refer here. This application must be specified in the “OpsMx Provider Configmap”.

Step 4: Create OpsMx Provider Configmap

Create an “OpsMx Provider Configmap” that contains the OpsMx metric provider information, including the “metric template name” and the “application name”. A sample “OpsMx Provider Configmap” yaml file is available here. To create the “OpsMx Provider Configmap”, refer here.

Note: You can create metric and log templates in ISD, or maintain them as config maps via GitOps.

  • In the “OpsMx Provider Configmap”, if gitops is set to “true”, the metric provider first looks for the metric template as a config map and, if it is not found, tries to load it from ISD.

    gitops: true

  • In the “OpsMx Provider Configmap”, if gitops is set to “false”, the provider loads the metric template only from ISD.

    Note: If you want to load the metric template from ISD, do not specify the following parameters in the “Analysis template”.

         volumeMounts:
          - name: metric-config-volume
            mountPath: /etc/config/templates
    
         volumes:
          - name: metric-config-volume
            configMap:
              name: metrixtemplates

Note: The Metric Template name in the OpsMx Provider Configmap must be the same as the name you specified in the minimal yaml file, the extended yaml file, or the ISD UI.
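The lookup order that the gitops flag controls can be summarised in a few lines of Python. This is a sketch of the rule as described in this step, not the provider's code: the two loader callables and the template names are hypothetical stand-ins for "read the mounted config map" and "fetch from ISD".

```python
from typing import Callable, Optional

Loader = Callable[[str], Optional[dict]]


def resolve_template(name: str, gitops: bool,
                     load_from_configmap: Loader, load_from_isd: Loader) -> Optional[dict]:
    """Pick the metric template source according to the gitops flag."""
    if gitops:
        # gitops: true -> prefer the config map, fall back to ISD if it is absent.
        template = load_from_configmap(name)
        if template is not None:
            return template
    # gitops: false (or config map not found) -> load from ISD.
    return load_from_isd(name)


# Hypothetical in-memory stand-ins for the two sources.
configmap_templates = {"template-a": {"source": "configmap"}}
isd_templates = {"template-a": {"source": "isd"}, "template-b": {"source": "isd"}}

from_configmap = resolve_template("template-a", True, configmap_templates.get, isd_templates.get)
fallback_to_isd = resolve_template("template-b", True, configmap_templates.get, isd_templates.get)
isd_only = resolve_template("template-a", False, configmap_templates.get, isd_templates.get)

print(from_configmap, fallback_to_isd, isd_only)
```

With gitops set to false the config map is never consulted, which matches the advice above to leave out the volume mounts when templates are loaded from ISD.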

Step 5: Create an Analysis template

Specify the “OpsMx Provider Configmap” under the job section in the Analysis template. A sample "Analysis Template" is available here. To create an Analysis Template, refer here.

Step 6: Modify rollout.yaml

Update the image in rollout.yaml and include the analysis step by specifying the Analysis template that you have already created, as shown below. A sample rollout.yaml is available here.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: rollout-app
  revisionHistoryLimit: 2
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus_io_path: /mgmt/prometheus
        prometheus_io_port: '8088'
      labels:
        app: rollout-app
    spec:
      containers:
        - name: rollout-app
          image: quay.io/opsmxpublic/canary-issuegen:issue-canary-gen-1401
          imagePullPolicy: Always
          ports:
            - containerPort: 8088
          resources:
            requests:
              memory: 32Mi
              cpu: 5m
  strategy:
    canary:
      steps:
        - setWeight: 25
        - pause: { duration: 15s }
        - analysis:
            templates:
              - templateName: opsmx-analysis
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
              - name: baseline-hash
                valueFrom:
                  podTemplateHashValue: Stable
        - setWeight: 75
        - pause: { duration: 15s }
        - analysis:
            templates:
              - templateName: opsmx-analysis
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
              - name: baseline-hash
                valueFrom:
                  podTemplateHashValue: Stable

Step 7: Deploy application

To deploy the application, refer here.

Note: Make sure all the above configuration files are stored in the same folder as the rollout.yaml manifest file.

Step 8: Trigger analysis run

If a newer version of the application is deployed, the Rollouts strategy will be invoked and an analysis run will be triggered. Update the image version in rollout.yaml as shown below and sync the application to trigger the analysis run. To sync the application, refer here.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: rollout-app
  revisionHistoryLimit: 2
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus_io_path: /mgmt/prometheus
        prometheus_io_port: '8088'
      labels:
        app: rollout-app
    spec:
      containers:
        - name: rollout-app
          image: quay.io/opsmxpublic/canary-issuegen:issue-canary-gen-1402

Note: The Rollout strategy is not invoked for the first deployment of the application. It is followed only when a newer version of the application is deployed.
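The reason an image change triggers the strategy is that it changes the pod template, which the rollout.yaml above refers to through the Latest and Stable pod template hashes. The Python sketch below illustrates the idea of detecting such a change by fingerprinting the template. It is only an analogy: the helper names are hypothetical, and SHA-256 over JSON is a stand-in for whatever hashing the controller actually uses.

```python
import hashlib
import json


def template_fingerprint(pod_template: dict) -> str:
    """Fingerprint a pod template; any field change yields a different value."""
    canonical = json.dumps(pod_template, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:10]


def pod_template_with(image: str) -> dict:
    """Build a trimmed-down version of the pod template from rollout.yaml."""
    return {
        "metadata": {"labels": {"app": "rollout-app"}},
        "spec": {"containers": [{"name": "rollout-app", "image": image}]},
    }


stable = pod_template_with("quay.io/opsmxpublic/canary-issuegen:issue-canary-gen-1401")
updated = pod_template_with("quay.io/opsmxpublic/canary-issuegen:issue-canary-gen-1402")

# A different fingerprint means there is a new version to roll out and analyse.
new_version_detected = template_fingerprint(stable) != template_fingerprint(updated)
print(new_version_detected)
```

On the very first deployment there is no earlier template to compare against, which is consistent with the note above that the strategy is not invoked then.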

Step 9: View Analysis report

To view the Analysis report from Rollouts Dashboard, refer here.
