Argo Rollouts analysis with ISD and Prometheus
This document explains the end-to-end flow for running an analysis with Prometheus as the data source.
Follow the steps below to set up the workflow, trigger an analysis run, and view the analysis report.
Step 1: Add Integration
Note: You must add all integrations through the ISD UI.
To add Prometheus as a data source so that ISD can perform your metric analysis, follow the steps below:
- From the ISD dashboard, click "Setup", then "Integrations", and then click the "+New Integration" button, as shown in the image below: 
- The list of available integrators and their fields appears. Select the Prometheus integration and fill in its details in the right pane of the screen.
- Update the following information on the above screen:
  - Account Name: A user-defined account name for your Prometheus access.
  - End Point: The Prometheus host address used to access Prometheus.
  - User Name (Optional): Prometheus user name.
  - Password (Optional): Prometheus password.
  - Permissions: To restrict access to this account, select the User Groups.
 
- Once you have updated all the information, click the "Save" button. The newly created Prometheus integrator appears as shown below: 
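To confirm independently that the End Point and optional credentials reach Prometheus, you can send a query to Prometheus's standard instant-query HTTP API (/api/v1/query). The following Python sketch is not part of ISD; it queries the built-in `up` metric and checks that Prometheus reports `"status": "success"`. It assumes the optional User Name and Password are HTTP basic-auth credentials, and the function names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_prometheus_query(endpoint: str, promql: str,
                           user: Optional[str] = None,
                           password: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request against Prometheus's instant-query API."""
    url = (endpoint.rstrip("/") + "/api/v1/query?"
           + urllib.parse.urlencode({"query": promql}))
    request = urllib.request.Request(url, method="GET")
    if user:
        # Assumption: the optional User Name/Password are HTTP basic-auth credentials.
        token = base64.b64encode(f"{user}:{password or ''}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


def check_endpoint(endpoint: str, user: Optional[str] = None,
                   password: Optional[str] = None, timeout: float = 10.0) -> bool:
    """Return True if Prometheus answers the query `up` with status "success"."""
    request = build_prometheus_query(endpoint, "up", user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body.get("status") == "success"
```

For example, `check_endpoint("http://prometheus.example:9090", "user", "secret")` (placeholder host and credentials) returns True when Prometheus answers. Rejected credentials usually surface as `urllib.error.HTTPError`, because `urlopen` raises on non-2xx responses.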
Create a metric template after adding the Prometheus integrator.
Step 2: Create Metric Template
ISD supports the following two modes of template creation:
- GitOps mode template creation 
- Template creation in the ISD UI 
GitOps mode Metric Template creation:
You can create a metric template in the GitHub repository where your deployment manifest files are stored. The sample Metric Template for the Prometheus data source is available here.
OpsMx provides two sample Metric Templates for Prometheus, as follows:
- Minimal yaml file (prometheus-app-health-springboot-minimal.yaml) 
- Extended yaml file (prometheus-app-health-springboot-extended.yaml) 
Minimal yaml file:
The following sample Prometheus minimal yaml file contains the mandatory parameters to create a metric template. You can use this sample yaml file instead of creating a new template.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-app-health-springboot-min
data:
  prometheus-app-health-generic-min: |
    accountName: prometheus-account-name
    metricType: ADVANCED
    advancedProvider: PROMETHEUS
    metricTemplateSetup:
      groups:
        - metrics:
            - name: "avg(container_memory_usage_bytes{namespace=\"${namespace_key}\", pod=~\"${pod_key}\"})"
              riskDirection: higher
          group: "Memory Usage By Pod Name"
        - metrics:
            - name: "avg(rate(container_cpu_usage_seconds_total{namespace=\"${namespace_key}\", pod=~\"${pod_key}\"}[1m]) * 100)"
              riskDirection: higher
          group: "CPU Usage By Pod Name"
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_sum{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m])) / sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higher
          group: "Application Latency"
        - metrics:
            - name: "avg(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higherOrLower
          group: "Application Request Rate"
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\",status=~\"^[4-5].*\"}[1m])) or vector(0)"
              riskDirection: higher
          group: "Application Error Rate"
Parameter details are as follows:
- name: The name of the Metric Template. This name must be used in the OpsMx Provider Configmap file.
- accountName: The metric provider account name. It must be the same account name you entered when adding the Prometheus integrator in the ISD UI.
- metricType: Type of the metric. The samples use ADVANCED.
- advancedProvider: Set to "PROMETHEUS".
- groups: The set of metric groups to be configured for analysis. Each group can carry multiple metrics and has a group name, given in the group field.
  - riskDirection: The direction of change, relative to the baseline, that is treated as a risk: higher, lower, or higherOrLower (Higher, Lower, or "Higher or Lower").
  - group: The group name.
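The queries in the template contain placeholders such as ${namespace_key}, ${pod_key}, and ${app_name}, which are replaced by the Baseline and New Release values during analysis. The following Python sketch illustrates that substitution using `string.Template`, whose `${name}` syntax happens to match the template's. It is a hypothetical illustration, not ISD's implementation, and the namespace and pod-regex values are made up.

```python
from string import Template


def render_query(query: str, scope: dict[str, str]) -> str:
    """Fill ${...} placeholders in a metric query.

    Template.substitute raises KeyError if a placeholder has no value,
    which surfaces misspelled placeholder names early.
    """
    return Template(query).substitute(scope)


def render_for_analysis(query: str, common: dict[str, str],
                        baseline: dict[str, str],
                        new_release: dict[str, str]) -> dict[str, str]:
    """Render one query twice: once for the baseline, once for the new release."""
    return {
        "baseline": render_query(query, {**common, **baseline}),
        "new_release": render_query(query, {**common, **new_release}),
    }


if __name__ == "__main__":
    # Memory query from the minimal template, with the YAML escaping removed.
    memory_query = (
        'avg(container_memory_usage_bytes'
        '{namespace="${namespace_key}", pod=~"${pod_key}"})'
    )
    # Hypothetical values: one shared namespace, one pod regex per version.
    rendered = render_for_analysis(
        memory_query,
        common={"namespace_key": "demo"},
        baseline={"pod_key": "rollout-app-stable-.*"},
        new_release={"pod_key": "rollout-app-canary-.*"},
    )
    for version, promql in rendered.items():
        print(f"{version}: {promql}")
```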
 
Extended yaml file:
The following sample Prometheus extended yaml file contains all the available parameters, including the non-mandatory ones, to create a metric template. You can use this sample yaml file instead of creating a new template.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-app-health-springboot-ext
data:
  prometheus-app-health-springboot-ext: |
    accountName: isd312-saas-prom
    metricType: ADVANCED
    advancedProvider: PROMETHEUS
    metricWeight: 1
    criticality: normal
    nanStrategy: remove
    metricTemplateSetup:
      groups:
        - metrics:
            - name: "avg(container_memory_usage_bytes{namespace=\"${namespace_key}\", pod=~\".*${pod_key}.*\"})"
              riskDirection: higher
              criticality: mustHave
              customThresholdHigherPercentage: 50
              nanStrategy: replaceWithZero
          group: Memory Usage By Pod Name
        - metrics:
            - name: "avg(rate(container_cpu_usage_seconds_total{namespace=\"${namespace_key}\", pod=~\".*${pod_key}.*\"}[1m]) * 100)"
              riskDirection: higher
              criticality: mustHave
              customThresholdHigherPercentage: 50
          group: CPU Usage By Pod Name
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_sum{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m])) / sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higher
              customThresholdHigherPercentage: 50
              criticality: critical
              watchlist: true
          group: "Application Latency"
        - metrics:
            - name: "avg(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\"}[1m]))"
              riskDirection: higherOrLower
              customThresholdHigherPercentage: 50
              customThresholdLowerPercentage: 50
          group: "Application Request Rate"
        - metrics:
            - name: "sum(rate(http_server_requests_seconds_count{app=\"${app_name}\",kubernetes_pod_name=~\"${pod_key}\",status=~\"^[4-5].*\"}[1m])) or vector(0)"
              riskDirection: higher
              customThresholdHigherPercentage: 50
              criticality: critical
              watchlist: true
          group: "Application Error Rate"
Parameter details are as follows:
- name: The name of the Metric Template. This name must be used in the OpsMx Provider Configmap file.
- accountName: The metric provider account name. It must be the same account name you entered when adding the Prometheus integrator in the ISD UI.
- metricType: Type of the metric. The samples use ADVANCED.
- advancedProvider: Set to "PROMETHEUS".
- metricWeight: Numerical importance given to a metric, ranging from 0 (lowest) to 1 (highest).
- criticality: One of normal, critical, or mustHave:
  - Normal: The metric is removed from its group's score calculation if it has no data.
  - Critical: The entire analysis fails if this metric fails or has no data.
  - MustHave: The metric fails if its data is missing.
- nanStrategy: How to handle NaN values, which can occur when there is no metric data for a particular interval. The sample uses remove and replaceWithZero.
- groups: The set of metric groups to be configured for analysis. Each group can carry multiple metrics and has a group name, given in the group field. Each metric supports:
  - riskDirection: The direction of change, relative to the baseline, that is treated as a risk: higher, lower, or higherOrLower (Higher, Lower, or "Higher or Lower").
  - criticality: Per-metric criticality, with the same values as above (normal, critical, mustHave).
  - customThresholdHigherPercentage: Percentage difference above the baseline beyond which the metric is treated as failed.
  - customThresholdLowerPercentage: Percentage difference below the baseline beyond which the metric is treated as failed.
  - nanStrategy: Per-metric NaN handling, as described above.
  - watchlist: Metrics marked as watchlist are shown first in the metric analysis report.
  - metricWeight: Per-metric weight, from 0 (lowest) to 1 (highest).
  - group: The group name.
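This page does not spell out how ISD combines riskDirection, the custom thresholds, and nanStrategy, so the following Python sketch shows one plausible reading only. It assumes each metric is compared by the mean of its baseline and new-release samples, that the percentage change is measured relative to the baseline mean, and that remove drops NaN samples while replaceWithZero turns them into 0. The function names are hypothetical; the parameter names mirror the template fields.

```python
import math
from statistics import mean
from typing import Optional, Sequence


def apply_nan_strategy(values: Sequence[float], strategy: str) -> list[float]:
    """Handle NaN samples according to a nanStrategy value."""
    if strategy == "remove":
        return [v for v in values if not math.isnan(v)]
    if strategy == "replaceWithZero":
        return [0.0 if math.isnan(v) else v for v in values]
    raise ValueError(f"unknown nanStrategy: {strategy}")


def percent_change(baseline_mean: float, canary_mean: float) -> float:
    """Percentage change of the new release relative to the baseline."""
    if baseline_mean == 0:
        # Any move away from a zero baseline is treated as an unbounded change.
        return 0.0 if canary_mean == 0 else math.copysign(math.inf, canary_mean)
    return (canary_mean - baseline_mean) / abs(baseline_mean) * 100


def metric_passes(baseline: Sequence[float], canary: Sequence[float],
                  risk_direction: str,
                  higher_pct: Optional[float] = None,
                  lower_pct: Optional[float] = None,
                  nan_strategy: str = "remove") -> Optional[bool]:
    """Return True/False for pass/fail, or None if either side has no data.

    None is left for the criticality rules (normal, critical, mustHave).
    """
    b = apply_nan_strategy(baseline, nan_strategy)
    c = apply_nan_strategy(canary, nan_strategy)
    if not b or not c:
        return None
    change = percent_change(mean(b), mean(c))
    if risk_direction in ("higher", "higherOrLower") and higher_pct is not None:
        if change > higher_pct:
            return False
    if risk_direction in ("lower", "higherOrLower") and lower_pct is not None:
        if -change > lower_pct:
            return False
    return True
```

Under this reading, a memory metric with `riskDirection: higher` and `customThresholdHigherPercentage: 50` fails once the new release's mean exceeds the baseline mean by more than 50%, while a drop in memory never fails it.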
Create Metric Template in ISD UI
You can create a Metric Template on the Setup → Analysis Templates page in ISD UI. To create a metric template in ISD UI, follow the steps below:
- From the application dashboard, click "Setup", then "Analysis Templates", and then click the "+New Template" button. Refer to the image below. 
- After you click the "+New Template" button, two options appear for the type of template you want to create. Select "Metric Template", as shown in the image below. 
- The new Metric Template window appears. Update the necessary parameters and click the "Save" button. Refer to the image below. 
Update the following parameters in the above screen:
- Metric Template Name: The name of the Metric Template. This name must be used in the OpsMx Provider Configmap file.
- Select Datasource: Select "PROMETHEUS" as the data source from the drop-down.
- Select Accounts: From the drop-down, select the metric provider account of the configured data source. It must be the same account name you entered when adding the Prometheus integrator in the ISD UI.
- Metric Scope Placeholder: The placeholder in the metric query that is replaced by the Baseline and New Release values.
- Query Name: A meaningful name for a query or a group of similar queries.
- Query: The query that fetches the metric from the data source provider.
- Risk Direction: The direction of change, relative to the baseline, that is treated as a risk: Higher, Lower, or "Higher or Lower".
- Threshold percentage: The percentage difference beyond which the metric is treated as failed.
- Criticality:
  - Normal: The metric is removed from its group's score calculation if it has no data.
  - Critical: The entire analysis fails if this metric fails or has no data.
  - MustHave: The metric fails if its data is missing.
- Watchlist: Metrics marked as watchlist are shown first in the metric analysis report.
- NaN Strategy: How to handle NaN values, which can occur when there is no metric data for a particular interval.
- Weight: Numerical importance given to a metric, ranging from 0 (lowest) to 1 (highest).
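The Criticality and Weight settings above decide how individual metric results roll up into an analysis outcome. The following Python sketch encodes the three documented criticality rules; the weighted-average score, its 0 to 100 scale, and all names are hypothetical assumptions for illustration, not ISD's actual scoring formula.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MetricResult:
    name: str
    passed: Optional[bool]        # None means the metric had no data
    criticality: str = "normal"   # "normal", "critical", or "mustHave"
    weight: float = 1.0           # 0 (lowest) to 1 (highest)


def score_analysis(results: Sequence[MetricResult]) -> tuple[float, bool]:
    """Return (score, analysis_failed) using a hypothetical weighted average."""
    total_weight = 0.0
    passed_weight = 0.0
    for r in results:
        if not 0.0 <= r.weight <= 1.0:
            raise ValueError(f"{r.name}: weight must be between 0 and 1")
        passed = r.passed
        if passed is None:
            if r.criticality == "normal":
                continue          # Normal: no data, so leave it out of the score
            passed = False        # Critical/MustHave: missing data fails the metric
        if r.criticality == "critical" and not passed:
            return 0.0, True      # Critical: a failed metric fails the whole analysis
        total_weight += r.weight
        if passed:
            passed_weight += r.weight
    if total_weight == 0:
        return 0.0, False         # hypothetical choice: nothing left to score
    return 100.0 * passed_weight / total_weight, False
```

In this model, a Normal metric without data simply drops out of the denominator, whereas a MustHave metric without data stays in it as a failure and lowers the score.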
After you create the Metric Template, it appears in the list for the application on the "Analysis Templates" page, as shown below:
Step 3: Create Application
To create an application, refer here. This application must be specified in the “OpsMx Provider Configmap”.
Step 4: Create OpsMx Provider Configmap
Create an "OpsMx Provider Configmap" containing the OpsMx metric provider information, including the metric template name and the application name. A sample "OpsMx Provider Configmap" yaml file is available here. To create the "OpsMx Provider Configmap", refer here.
- In the "OpsMx Provider Configmap", if gitops is set to "true", the metric provider first looks for the metric template as a ConfigMap and, if it is not found, tries to load it from ISD:
  gitops: true
- In the "OpsMx Provider Configmap", if gitops is set to "false", the provider loads the metric template only from ISD.
The following volume configuration mounts the metrixtemplates ConfigMap at /etc/config/templates:
volumeMounts:
  - name: metric-config-volume
    mountPath: /etc/config/templates
volumes:
  - name: metric-config-volume
    configMap:
      name: metrixtemplates
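The gitops lookup order described for the "OpsMx Provider Configmap" can be modeled in Python as follows. This is a hypothetical sketch, not the provider's code: it relies on the standard Kubernetes behavior that a ConfigMap volume exposes each data key as a file in the mount directory (here /etc/config/templates, the mountPath used in this step), and it represents the ISD lookup as a caller-supplied `fetch_from_isd` function.

```python
import tempfile
from pathlib import Path
from typing import Callable

TEMPLATE_DIR = Path("/etc/config/templates")  # mountPath from the sample volume


def load_metric_template(name: str, gitops: bool,
                         fetch_from_isd: Callable[[str], str],
                         template_dir: Path = TEMPLATE_DIR) -> tuple[str, str]:
    """Return (source, template_text) following the documented lookup order."""
    if gitops:
        # A mounted ConfigMap exposes each data key as a file named after the key.
        candidate = template_dir / name
        if candidate.is_file():
            return "configmap", candidate.read_text()
    # gitops is false, or no mounted template was found: load it from ISD.
    return "isd", fetch_from_isd(name)


if __name__ == "__main__":
    # Demo with a temporary directory standing in for the ConfigMap mount.
    def fake_isd(template_name: str) -> str:
        return f"<template {template_name} from ISD>"

    with tempfile.TemporaryDirectory() as tmp:
        mount = Path(tmp)
        (mount / "prometheus-app-health-springboot-ext").write_text("accountName: demo\n")
        print(load_metric_template("prometheus-app-health-springboot-ext", True, fake_isd, mount))
        print(load_metric_template("missing-template", True, fake_isd, mount))
        print(load_metric_template("prometheus-app-health-springboot-ext", False, fake_isd, mount))
```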
Step 5: Create an Analysis template
Specify the “OpsMx Provider Configmap” under the job section in the Analysis template. Sample "Analysis Template" is available here. To create an Analysis Template, refer here.
Step 6: Modify rollout.yaml
Update rollout.yaml with your application image and add analysis steps that reference the Analysis Template you created, as shown below. A sample rollout.yaml is available here.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: rollout-app
  revisionHistoryLimit: 2
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus_io_path: /mgmt/prometheus
        prometheus_io_port: '8088'
      labels:
        app: rollout-app
    spec:
      containers:
        - name: rollout-app
          image: quay.io/opsmxpublic/canary-issuegen:issue-canary-gen-1401
          imagePullPolicy: Always
          ports:
            - containerPort: 8088
          resources:
            requests:
              memory: 32Mi
              cpu: 5m
  strategy:
    canary:
      steps:
        - setWeight: 25
        - pause: { duration: 15s }
        - analysis:
            templates:
              - templateName: opsmx-analysis
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
              - name: baseline-hash
                valueFrom:
                  podTemplateHashValue: Stable
        - setWeight: 75
        - pause: { duration: 15s }
        - analysis:
            templates:
              - templateName: opsmx-analysis
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
              - name: baseline-hash
                valueFrom:
                  podTemplateHashValue: Stable
Step 7: Deploy application
To deploy the application, refer here.
Step 8: Trigger analysis run
When a newer version of the application is deployed, the Rollout strategy is invoked and an analysis run is triggered. To trigger the analysis run, update the image version in rollout.yaml as shown below (from issue-canary-gen-1401 to issue-canary-gen-1402) and sync the application. To sync the application, refer here.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: rollout-app
  revisionHistoryLimit: 2
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus_io_path: /mgmt/prometheus
        prometheus_io_port: '8088'
      labels:
        app: rollout-app
    spec:
      containers:
        - name: rollout-app
          image: quay.io/opsmxpublic/canary-issuegen:issue-canary-gen-1402
Step 9: View Analysis report
To view the Analysis report from Rollouts Dashboard, refer here.
