Argo Rollouts analysis with ISD and Stackdriver

This document explains the end-to-end flow for the analysis run with the Stackdriver data source.

Follow the steps below to run an analysis end to end and obtain an analysis report.

Step 1: Add Integration

Note: Add all integrations through the ISD UI only.

To add Stackdriver as the data source that ISD uses for log analysis, follow these steps:

  1. From the ISD dashboard, click "Setup", then "Integrations", and then click the "+New Integration" button.

  2. The list of available integrators and their respective fields appears. Select the Stackdriver integration and fill in the fields that appear in the right pane of the screen.

    Provide the following information:

    • Account Name: Name of the Stackdriver account to operate on.

    • Encrypted Key File: Path to a Google service account JSON key file that has permission to publish metrics.

    • Permissions: To restrict access to this account, select the relevant User Groups.

  3. Once you have updated all the information, click the "Save" button. The newly created Stackdriver integrator appears as shown below:

Create a log template after adding the Stackdriver integrator.

Step 2: Create Log Template

We support the following two modes of template creation:

  • GitOps mode template creation

  • Template creation in ISD UI

GitOps mode Log Template creation:

You can create a log template in the GitHub repository where your deployment manifest files are stored. A sample Log Template for the Stackdriver data source is available here.

OpsMx provides two sample Log Templates for Stackdriver:

  1. Minimal yaml file (stackdriver-log-generic-minimal.yaml)

  2. Extended yaml file (stackdriver-log-generic-extended.yaml)

Minimal yaml file:

The following sample Stackdriver minimal yaml file contains the mandatory parameters to create a log template. You can use this sample yaml file instead of creating a new template.

apiVersion: v1
kind: ConfigMap
metadata:
  name: stackdriver-generic-minimal
data:
  stackdriver-generic-minimal: |
    monitoringProvider: STACKDRIVER
    accountName: stackdriver-account-name
    filterKey: resource.labels.pod_name
    responseKeywords: textPayload

Parameter details are as follows:

  • name: Name of the Log Template. Use this same name in the OpsMx Provider Configmap file.

  • monitoringProvider: Set to “STACKDRIVER”.

  • accountName: Log provider account name. It must match the account name you entered when adding the Stackdriver integrator in the ISD UI.

  • index: Index containing logs for processing

  • filterKey: Unique Key which identifies logs to be processed in the index

  • responseKeywords: Field name in the index containing logs to be processed.

Extended yaml file:

The following sample Stackdriver extended yaml file contains all the available parameters, including the optional ones, for creating a log template. You can use this sample yaml file instead of creating a new template.

apiVersion: v1
kind: ConfigMap
metadata:
  name: stackdriver-generic-ext
data:
  stackdriver-generic-ext: |
    monitoringProvider: STACKDRIVER
    accountName: stackdriver-account-name
    filterKey: resource.labels.pod_name
    responseKeywords: textPayload
    errorTopics: 
    - errorString: ArrayIndexOutOfBounds
      topic: ERROR
    - errorString: NullPointerException
      topic: ERROR
    tags:
    - errorString: FATAL
      tag: FatalErrors

Parameter details are as follows:

  • name: Name of the Log Template. Use this same name in the OpsMx Provider Configmap file.

  • monitoringProvider: Set to “STACKDRIVER”.

  • accountName: Log provider account name. It must match the account name you entered when adding the Stackdriver integrator in the ISD UI.

  • index: Index containing logs for processing

  • filterKey: Unique Key which identifies logs to be processed in the index

  • responseKeywords: Element in the Stackdriver record that holds the actual log line, for example log or message.

  • errorTopics: Error topics categorize logs into a severity level based on the errorString they contain. Each entry pairs an errorString with the topic (severity level) to assign to logs carrying that string. Four severity levels are available: CRITICAL, ERROR, WARN and INFO.

  • tags: Tags mark an issue captured during analysis for future reference. When a configured error string is found in a log cluster, the cluster is tagged, and comments can be associated with the tag.
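
As a rough illustration of the errorTopics and tags parameters, the fragment below shows one entry for each of the four severity levels plus a single tag. It is only a sketch of the body of a log template: the ConnectionTimeout, HealthCheckPassed and ImagePullBackOff strings and the Infrastructure tag name are hypothetical values chosen for the example, not values taken from the OpsMx samples.

```yaml
# Fragment of a log template body (the block under the ConfigMap data key).
# Error strings marked "hypothetical" are illustrative assumptions.
errorTopics:
- errorString: OutOfMemoryError        # show-stopper class of error
  topic: CRITICAL
- errorString: NullPointerException
  topic: ERROR
- errorString: ConnectionTimeout       # hypothetical string
  topic: WARN
- errorString: HealthCheckPassed       # hypothetical string
  topic: INFO
tags:
- errorString: ImagePullBackOff        # hypothetical string
  tag: Infrastructure                  # hypothetical tag name
```

Log clusters containing a listed errorString would be assigned the corresponding topic, and clusters containing the tag's errorString would be labelled with the tag name.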

Log Template creation in ISD UI:

You can create a log template on the Setup Analysis Templates page in the ISD UI. To do so, follow the steps below:

  1. From the application dashboard, click "Setup", then “Analysis Templates”, and then click the "+New Template" button.

  2. Two options appear for choosing the type of template to create. Select “Log Template”.

  3. The New Log Template window appears. It has three sections in which you set the necessary parameters:

    1. Log Provider: Select the data source for analysis and provide relevant parameters

    2. Log Topics: Strings that appear in logs with their characterization

    3. Log Tags: Create custom tags based on business logic.

Log Provider

Select the data source for analysis and update the relevant parameters as per the below instructions.

  • Log Template Name: Provide a unique name to the Log Template in the text box.

  • Provider: Select STACKDRIVER as the data source from the Provider drop-down. Additional options appear based on the selected provider; the options described below are those for Stackdriver.

  • Log Account: Select the account of the log provider from the “Log Account” drop-down. See the Integrations tab under Setup for the available log accounts.

  • Index Pattern: Index containing logs for processing.

  • Query Filter Key: Unique Key which identifies logs to be processed in the index

  • Response Keywords: Field name in the index containing logs to be processed

  • Timestamp Key (Optional): Unique Key which identifies the timestamp for the log. By default, it is the timestamp for ElasticSearch and Graylog.

  • Toggle buttons (turn on or off):

    • Custom Regex: Custom Regular Expression to filter the logs.

    • Autobaseline: ML based learning of the baseline from historic analysis.

    • Contextual Cluster: Enable/disable cluster of unexpected events in similar context.

    • Info Cluster Scoring: Enabling this option will include INFO clusters in scoring.

  • Sensitivity: Select the level of sensitivity from the drop-down. Sensitivity determines how heavily warnings and errors are penalized in the final risk score. With high sensitivity, any error or warning incurs a larger penalty; with medium or low sensitivity, the penalty is moderate or low respectively.

  • Scoring Algorithm: Select the type of algorithm from the Scoring Algorithm drop-down.

  • Click Next to update the Log Topics section.

Log Topics

The Log Topics screen is where you tell the application how to interpret log content. It lists some of the most common errors in the industry, categorized as Critical, Error, Warn and Ignore based on industry standards. For example, OutOfMemoryError is a show stopper. You can change the category of any entry to suit your requirements.

After you update the Log Provider section with the necessary parameters, the Log Topics screen appears.

On this screen you can do the following:

  1. Click the Characterization Topic drop-down to change the category of an error. For example, you can change OnOutOfMemoryError from CRITICAL to WARN.

  2. Click the Delete icon to delete a string pattern.

  3. Click the “+” icon to add a new log topic; a new row is added. Enter the string, set the category you want, and then click “Next”.

Log Tags

After you click Next, the Log Tags screen appears. Log tags let you give business-logic-related input to the analysis. On this screen you can add cluster tags, for example to pre-define issues such as infrastructure or build errors.

To add a cluster tag, follow the steps below:

  1. From the “Log Tags” screen, click the “+New Cluster Tag” button.

  2. Enter the Cluster Tag string and give the Cluster Tag a name.

  3. Click the “+New” button to add another Cluster Tag row, then enter its string and name. You can create multiple Cluster Tags this way.

  4. After adding the Cluster Tags, click the “Submit” button.

After you create the Log Template, it appears in the list for the application on the “Analysis Templates” page.

Step 3: Create Application

To create an application, refer here. This application must be specified in the “OpsMx Provider Configmap”.

Step 4: Create OpsMx Provider Configmap

Create the “OpsMx Provider Configmap” with the OpsMx metric provider information, including the “log template name” and the “application name”. A sample “OpsMx Provider Configmap” yaml file is available here. To create the “OpsMx Provider Configmap”, refer here.

Note: You can create metric and log templates in ISD, or maintain them as config maps via GitOps.

  • In the “OpsMx Provider Configmap”, if gitops is set to “true”, the metric provider first looks for the log template as a config map and, if it is not found, tries to load it from ISD.

    gitops: true

  • In the “OpsMx Provider Configmap”, if gitops is set to “false”, the provider loads the log template only from ISD.

    Note: If you want to load the log template from ISD, do not specify the following parameters in the “Analysis template”.

         volumeMounts:
          - name: metric-config-volume
            mountPath: /etc/config/templates
    
         volumes:
          - name: metric-config-volume
            configMap:
              name: metrixtemplates

Note: The Log Template name in the OpsMx Provider Configmap must be the same as the name you specified in the minimal yaml file, the extended yaml file, or the ISD UI.
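
To make the name matching concrete, the excerpt below sketches how the body of an OpsMx Provider Configmap might refer to the minimal sample template. Only the gitops key and the template name stackdriver-generic-minimal come from this document; the key names application, serviceList and logTemplateName, and the application name sample-app, are assumptions made for illustration, so check them against the sample OpsMx Provider Configmap before use.

```yaml
# Excerpt of an OpsMx Provider Configmap body (a sketch, not the full file).
# Key names marked "assumed" are not confirmed by this document.
application: sample-app                          # assumed key; hypothetical application name from Step 3
gitops: true                                     # look for the template as a config map first, then in ISD
serviceList:                                     # assumed key
- logTemplateName: stackdriver-generic-minimal   # assumed key; must equal the Log Template name
```

If the template was created in the ISD UI instead, the same field would carry the value entered as Log Template Name.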

Step 5: Create an Analysis template

Specify the “OpsMx Provider Configmap” under the job section in the Analysis template. A sample "Analysis Template" is available here. To create an Analysis Template, refer here.

Step 6: Modify rollout.yaml

Modify rollout.yaml to set the image and to include the analysis step, specifying the Analysis template that you have already created, as shown below. A sample rollout.yaml is available here.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: rollout-app
  revisionHistoryLimit: 2
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus_io_path: /mgmt/prometheus
        prometheus_io_port: '8088'
      labels:
        app: rollout-app
    spec:
      containers:
        - name: rollout-app
          image: quay.io/opsmxpublic/canary-issuegen:issue-canary-gen-1401
          imagePullPolicy: Always
          ports:
            - containerPort: 8088
          resources:
            requests:
              memory: 32Mi
              cpu: 5m
  strategy:
    canary:
      steps:
        - setWeight: 25
        - pause: { duration: 15s }
        - analysis:
            templates:
              - templateName: opsmx-analysis
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
              - name: baseline-hash
                valueFrom:
                  podTemplateHashValue: Stable
        - setWeight: 75
        - pause: { duration: 15s }
        - analysis:
            templates:
              - templateName: opsmx-analysis
            args:
              - name: canary-hash
                valueFrom:
                  podTemplateHashValue: Latest
              - name: baseline-hash
                valueFrom:
                  podTemplateHashValue: Stable

Step 7: Deploy application

To deploy the application, refer here.

Note: Make sure all the above configuration files are stored in the same folder as the rollout.yaml manifest file.

Step 8: Trigger analysis run

When a newer version of the application is deployed, the Rollout strategy is invoked and an analysis run is triggered. Update the image version in rollout.yaml as shown below and sync the application to trigger the analysis run. To sync the application, refer here.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: rollout-app
spec:
  replicas: 4
  selector:
    matchLabels:
      app: rollout-app
  revisionHistoryLimit: 2
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus_io_path: /mgmt/prometheus
        prometheus_io_port: '8088'
      labels:
        app: rollout-app
    spec:
      containers:
        - name: rollout-app
          image: quay.io/opsmxpublic/canary-issuegen:issue-canary-gen-1402

Note: The Rollout strategy is not invoked for the first deployment of the application. It is followed only when a newer version of the application is deployed.

Step 9: View Analysis report

To view the Analysis report from Rollouts Dashboard, refer here.
