Service cookbooks specify how to evaluate a service through critical metrics and watchlist metrics. The OpsMx machine learning algorithm starts with a default cookbook for all known services (e.g., Tomcat, MySQL). For custom services, a cookbook template is generated from the metric datastore (e.g., Datadog) based on the service name and system metrics. Application developers and operators can customize the cookbook further based on their understanding of the service.
Step 1: Click “SETUP” in the main menu.
Step 2: Click the “APPLICATION” tab.
Step 3: Click the “+” button.
While creating the metric template, there is an option to select Advanced metric analysis.
Step 4: Use the option to analyse any metric based on a custom query. The metric template can be configured with one or more queries for the analysis. The placeholders in the queries are replaced by the actual values supplied when the analysis is triggered.
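As a minimal sketch of the placeholder substitution described in Step 4, the Python below fills `${name}` placeholders in query strings with values supplied at trigger time. The `${...}` syntax, the example query text, and the names `fill_placeholders` and `trigger_values` are assumptions made for illustration; they are not Autopilot's actual query format.

```python
from string import Template


def fill_placeholders(queries, trigger_values):
    """Return the queries with every ${name} placeholder replaced.

    Template.substitute raises KeyError when a placeholder has no
    supplied value, so a missing value fails loudly instead of
    producing a half-filled query.
    """
    return [Template(query).substitute(trigger_values) for query in queries]


# Hypothetical query templates stored in the metric template.
queries = [
    'rate(http_requests_total{app="${app}"}[${window}])',
    'avg(cpu_usage{app="${app}"})',
]

# Hypothetical values supplied when the analysis is triggered.
trigger_values = {"app": "cart", "window": "5m"}

resolved = fill_placeholders(queries, trigger_values)
for query in resolved:
    print(query)
```

Keeping the placeholders in the template and the values in the trigger request means the same template can be reused for different services or time windows.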
Configure Template Name
To configure a new template, enter the Template-Name in the textbox, choose the Cloud-Provider (Kubernetes, AWS ... or Custom), and then continue by clicking the “NEXT” button.
After selecting the Autopilot application and the datasource, the user can enter one or more queries to fetch metrics. Each query can have one or more placeholders. In the above screen, Query list is a placeholder that can be specified when the analysis is triggered. The threshold for metric analysis can be configured with the “Threshold for metric failure” slider.
Step 5: To configure a new template, enter the Template-Name in the textbox, choose the Cloud-Provider (Kubernetes, AWS ... or Custom), and then continue by clicking the “NEXT” button. If no specific information from the Cloud-Provider is being used, you can use the generic cloud provider as depicted below.
Note: At least one of APM or Infrastructure metrics needs to be chosen.
APM - Refers to API-level monitoring and metrics.
Infrastructure - Refers to vital stats of the VM/container, such as memory and CPU usage.
Step 6: Choose the “Monitoring Provider” that was added previously in the “Monitoring Credentials” tab, selecting one at a time (New Relic, Prometheus, Datadog, etc.).
Step 7: Based on the selected Service Provider, the service name you provided is shown. Select that service and continue by clicking the “NEXT” button.
Configure Application and API
Based on the selected service, the list of applications is displayed. Choose one of them.
Based on the selected application, the list of APIs is displayed. Uncheck any API that is not required, and continue by clicking the “NEXT” button.
In the INFRA configuration, select the Service Provider. The service name you provided is shown; select that service and continue by clicking the “FINISH” button.
A success pop-up opens with the list of all selected APIs. You can edit the Watchlist metrics and Critical metrics using the show/hide option on the right side.
Choose weights for the selected metric groups. OpsMx uses these weights when computing the score.
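To illustrate how group weights could influence a score, here is a sketch of a weighted average over per-group scores. The group names, the weights, the 0-100 scale, and the function `weighted_score` are all hypothetical; the source does not describe the actual OpsMx scoring formula.

```python
def weighted_score(group_scores, weights):
    """Weighted average of per-group scores (assumed 0-100 scale)."""
    total_weight = sum(weights[group] for group in group_scores)
    if total_weight == 0:
        raise ValueError("at least one group needs a non-zero weight")
    weighted_sum = sum(group_scores[group] * weights[group] for group in group_scores)
    return weighted_sum / total_weight


# Hypothetical per-group scores and the weights chosen for each group.
group_scores = {"latency": 90.0, "errors": 60.0, "cpu": 80.0}
weights = {"latency": 2.0, "errors": 2.0, "cpu": 1.0}

overall = weighted_score(group_scores, weights)
print(overall)  # (90*2 + 60*2 + 80*1) / 5 = 76.0
```

In this sketch, doubling a group's weight doubles its pull on the overall score relative to a group with weight 1.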
Watchlist metrics are metrics that developers or operators typically monitor during the manual judgment phase. These metrics are shown on the scoring page for easier tracking.
For Red/Black ACA, a metric that represents the load on the server must be set to enable comparing service versions that are run at different times with potentially different environment and load conditions.
** Important: To allow load-based normalisation, at least one metric needs to be chosen as the global load, after which it is categorized under the load section. **
The remaining metrics are used in the final score, but the relative rank and weights are determined by OpsMx machine learning algorithms automatically.
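One simple way to picture load-based normalisation is to divide each metric sample by the load observed at the same time, so that runs under different load become comparable. The sketch below is only an illustration under that assumption; the function `normalise_by_load` and the sample data are hypothetical, and the source does not state how OpsMx performs the normalisation.

```python
def normalise_by_load(metric_samples, load_samples):
    """Divide each metric sample by the load at the same timestamp."""
    if len(metric_samples) != len(load_samples):
        raise ValueError("metric and load series must have the same length")
    # Points with zero load are skipped to avoid division by zero.
    return [m / load for m, load in zip(metric_samples, load_samples) if load > 0]


# Hypothetical data: CPU usage (%) and requests per second (the load metric).
baseline_cpu = [20.0, 40.0]
baseline_rps = [100.0, 200.0]
canary_cpu = [60.0, 30.0]
canary_rps = [300.0, 150.0]

baseline_per_request = normalise_by_load(baseline_cpu, baseline_rps)
canary_per_request = normalise_by_load(canary_cpu, canary_rps)
print(baseline_per_request, canary_per_request)
```

Here the raw CPU numbers differ between the two runs, but the CPU cost per request is the same, which is the kind of comparison a load metric makes possible.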
Step 8: Once the update to the cookbook is completed, click “Save” to save the cookbook. The cookbook will be used for analysis and diagnostics in future runs.
Log templates aid the classification of log messages by the Natural Language Processing (NLP) algorithm, using known error patterns as well as domain- or app-specific messages. A specified string pattern can be used either to classify a message as an error or warning, or to ignore the message completely. Once a template is set up, clone it as a starting point for creating modified templates for new services.
To create a new log template, follow the steps below.
To create a log template for a service, start by specifying a unique name. The next step is to select the platform or cloud environment for this service.
Index Pattern
For Autopilot to access data from Elasticsearch, you must specify an index pattern. An index pattern tells Autopilot which Elasticsearch indices contain the data that you want to work with.
For example, to tell Autopilot that you want to analyse data in indices that start with "fluentd-", specify the index pattern as "fluentd-*".
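To make the wildcard behaviour concrete, the sketch below uses Python's `fnmatch` module to show which index names a pattern such as "fluentd-*" would select. This only approximates the matching locally: Elasticsearch resolves index patterns on the server, and the index names and the helper `matching_indices` are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_indices(index_names, pattern):
    """Return the index names that match a shell-style wildcard pattern."""
    return [name for name in index_names if fnmatchcase(name, pattern)]


# Hypothetical index names in a cluster.
indices = [
    "fluentd-2021.01.01",
    "fluentd-2021.01.02",
    "metricbeat-2021.01.01",
]

selected = matching_indices(indices, "fluentd-*")
print(selected)
```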
Sample query to Elasticsearch:
Kibana Default Index
Autopilot uses Kibana to show the data on which the analysis was carried out. To see the logs using the "View Logs" button in the analysis screen, you must specify the default index.
Create the index pattern of interest in Kibana (Kibana -> Management -> Index Patterns).
Set the index pattern as the default index (click on * after selecting the index pattern in the Index Patterns screen).
Copy the defaultindex value from the Advanced settings screen in Kibana and paste it in Autopilot (Kibana -> Management -> Advanced settings -> defaultindex).
A regular expression can be used to filter the logs for analysis. This is a handy way to narrow the scope of a larger data set. If the regular expression is in the correct format and matches the log lines, the analysis is carried out only on the subset of log lines selected by the regular expression.
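The filtering behaviour described above can be sketched in Python as follows: the expression is validated first, and only lines that match are kept. The function `filter_log_lines` and the sample log lines are hypothetical, and Autopilot's own regular expression dialect may differ from Python's `re` module.

```python
import re


def filter_log_lines(lines, pattern):
    """Keep only the log lines that match the regular expression."""
    try:
        regex = re.compile(pattern)
    except re.error as exc:
        # An incorrectly formatted expression is rejected up front.
        raise ValueError(f"invalid regular expression: {exc}") from exc
    return [line for line in lines if regex.search(line)]


# Hypothetical log lines.
log_lines = [
    "2021-01-01 10:00:00 payment-service ERROR timeout calling bank API",
    "2021-01-01 10:00:01 cart-service INFO item added",
    "2021-01-01 10:00:02 payment-service WARN retrying request",
]

subset = filter_log_lines(log_lines, r"payment-service\s+(ERROR|WARN)")
print(len(subset))
```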
The next step is to specify the monitoring provider used for aggregating application logs for this service. The user can specify unique string patterns to match for error or warning classification. The user can also specify string patterns to be categorized as ignore (not used in risk scoring) if desired. Typically, domain- or app-specific string patterns are specified to aid the classification of log messages.
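As a rough illustration of string-pattern classification, the sketch below labels a log line as ignore, error, or warning based on plain substring matches. The pattern lists, the precedence order (ignore first, then error, then warning), and the function `classify` are assumptions for this example; they do not describe how the Autopilot NLP algorithm combines these patterns with its own analysis.

```python
def classify(line, patterns):
    """Return the first label whose patterns occur in the line."""
    # Assumed precedence: an ignore pattern wins over error and warning.
    for label in ("ignore", "error", "warning"):
        if any(pattern in line for pattern in patterns.get(label, [])):
            return label
    return "unclassified"


# Hypothetical app-specific string patterns from a log template.
patterns = {
    "error": ["OrderFailedException", "payment declined"],
    "warning": ["retrying request"],
    "ignore": ["health check"],
}

print(classify("GET /status health check OK", patterns))
print(classify("OrderFailedException: card expired", patterns))
print(classify("retrying request to inventory", patterns))
```

Lines labelled ignore in this sketch would simply be left out of risk scoring, while unmatched lines would be left to the general classification.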
The remaining steps in template creation are the same. The only difference is whether the template is created for an application or for a service.