ISD On-Prem Production Infrastructure requirements
Access: Admin access to ONE namespace
Compute: In general, nodes with higher memory are preferred because ISD (with Spinnaker) is memory intensive.
Minimum: 8 CPUs, 32 GB RAM, 3 nodes (6 nodes for the maximum configuration).
Preferred: nodes with 64/128 GB RAM.
Network: Is outbound internet access allowed, including HTTP and gRPC traffic, and HTTP traffic to all cloud endpoints and artifact repositories?
If Yes: Proceed with the normal installation.
If No: Choose the Air-Gapped installation.
If Yes, but only through a proxy (HTTP is allowed through the proxy, gRPC is not): Treat this the same as No, and configure the proxies as described below.
ISTIO/Service Mesh: If a service mesh is in use, additional configuration is required for external access, including databases, cloud endpoints, and artifact and data endpoints, to ensure seamless integration (see the ServiceEntry sketch below).
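For example, with Istio, egress to an external database can be declared with a ServiceEntry. A minimal sketch, assuming an external Aurora MySQL endpoint; the hostname and port are placeholders:

```yaml
# Illustrative ServiceEntry allowing mesh workloads to reach an external MySQL endpoint.
# The hostname and port are placeholders; use the actual RDS/Aurora endpoint.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: external-aurora-mysql
spec:
  hosts:
    - mydb.cluster-xxxx.us-east-1.rds.amazonaws.com
  ports:
    - number: 3306
      name: tcp-mysql
      protocol: TCP
  location: MESH_EXTERNAL
  resolution: DNS
```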
Aurora Postgres (e.g. RDS): Recommended cluster size is db.r6g.xlarge (4 CPUs, 32 GB); versions up to 13.3 have been tested. This database is used by Autopilot (aka OES). As a starting point, estimate about 20 GB of storage.
S3 Bucket(s): Required for Kayenta (or Verification).
Aurora MySQL 5.7 (2.07.2 or later) RDS: Recommended cluster size is db.r5.xlarge (4 CPUs, 32 GB). This is used by Clouddriver, Orca, and Front50; they can share a single instance, or use three separate instances for better performance.
Spinnaker Service Level Sizing requirements
For Front50, 100 MB is sufficient.
For Orca, the size depends on the number of executions and the number of months of data retained. Assuming about 5 executions/day (roughly 5 MB per pipeline per day), 360 days * 200 pipelines * 5 MB is approximately 360 Gi.
For Clouddriver, the size depends on the number of namespaces and the number of resources. Assuming 200 namespaces and 20 deployments/namespace * 5 MB/deployment, that is approximately 20 Gi. See the Best Practices for Aurora DBs document.
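For reference, a minimal sketch of how a Spinnaker service such as Clouddriver points at its MySQL database, following Spinnaker's standard SQL backend settings; the endpoint, database name, and users are placeholders, and the exact profile keys and how they are supplied should be verified against the Spinnaker documentation and the ISD helm chart:

```yaml
# Illustrative clouddriver-local.yml fragment; endpoint, database, and users are placeholders.
sql:
  enabled: true
  connectionPools:
    default:
      default: true
      jdbcUrl: jdbc:mysql://aurora-mysql.example.com:3306/clouddriver
      user: clouddriver_service
      # password should be supplied via a secret, not inline
  migration:
    jdbcUrl: jdbc:mysql://aurora-mysql.example.com:3306/clouddriver
    user: clouddriver_migrate
```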
ElastiCache Redis (5.0.6): Recommended cluster size is cache.r6g.large, used by Gate and other services. Typically, one Redis instance is adequate for all services (Gate, Fiat, and possibly Orca).
Identify the proxy configuration for accessing any resources. Example JAVA_OPTS values need to be defined for http.proxyHost and http.nonProxyHosts, and all ISD services must be added to nonProxyHosts.
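A minimal sketch of such proxy settings, assuming they are passed to the JVM via JAVA_OPTS; the key name javaOpts, the proxy host, and the port are placeholders rather than actual chart keys:

```yaml
# Illustrative JVM proxy settings; proxy.example.com and the port are placeholders.
# http.nonProxyHosts must include every in-cluster ISD/Spinnaker service so that
# service-to-service calls bypass the proxy.
javaOpts: >-
  -Dhttp.proxyHost=proxy.example.com
  -Dhttp.proxyPort=3128
  -Dhttps.proxyHost=proxy.example.com
  -Dhttps.proxyPort=3128
  -Dhttp.nonProxyHosts=localhost|*.svc|*.svc.cluster.local|spin-*|oes-*
```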
Identify the SSO used: SAML (e.g. Okta, JumpCloud), OAuth, or AD/LDAP (a SAML sketch follows this list).
Admin User: Create a service-account user that will act as an admin.
Admin group(s): Identify the groups that grant admin rights to users who belong to any one of them.
RBAC: Define the groups/roles needed for the organization.
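For the SAML option, a minimal sketch of the Gate settings in gate-local.yml style, following Spinnaker's standard SAML options; the metadata URL, keystore details, password, and hostname are placeholders, and how these values are supplied depends on the ISD helm chart:

```yaml
# Illustrative SAML settings for Gate; all values are placeholders.
saml:
  enabled: true
  issuerId: isd-gate                        # entity ID registered with the IdP (e.g. Okta)
  metadataUrl: https://example.okta.com/app/xxxx/sso/saml/metadata
  keyStore: /opt/spinnaker/config/saml.jks  # keystore holding the SAML signing key
  keyStoreAliasName: saml
  keyStorePassword: changeit                # supply via a secret in practice
  redirectHostname: isd-gate.domain-name.com
  redirectPort: 443
```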
Identify URLs for the application: three URLs are required at a minimum, e.g. isd, isd-gate, and spin under domain-name.com. Two additional URLs may be needed depending on whether a separate Spinnaker and agent-based deployments are used.
Decide how traffic from these URLs will be routed to the Kubernetes services: Ingress (nginx or other), Istio gateway, or LoadBalancer.
Decide where TLS termination will happen: Ingress, load balancer, or Gate+UI.
Decide how the TLS certificates will be created: cert-manager, cloud (e.g. AWS), or custom certificates. (An example Ingress with cert-manager is sketched after this list.)
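As an illustration of the nginx Ingress plus cert-manager option, a minimal sketch; the host, service name and port, and ClusterIssuer are placeholders, and the actual backend service names come from the ISD helm chart:

```yaml
# Illustrative Ingress for the ISD UI; host, backend service, and issuer are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: isd-ui
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - isd.domain-name.com
      secretName: isd-ui-tls
  rules:
    - host: isd.domain-name.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: oes-ui          # placeholder service name
                port:
                  number: 80
```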
Support for git-encrypted, Kubernetes, and Vault secrets is already built into ISD.
Decide where secrets will be stored: git-encrypted, Kubernetes, Vault, or other (e.g. AWS Secrets Manager, Azure Key Vault, CyberArk, etc.).
Should any customization be required, it needs to be included in the helm chart.
Decide where kubeconfig files and other secret "files" will be stored and how they will be made available to the application (a sketch follows this list).
Depending on the assessment of the number of cloud accounts, number of objects, etc., an initial sizing will be recommended. Note that in a microservices architecture, sizing each component involves some trial and error because it depends on the usage pattern. It is best to start with larger values and reduce them later to avoid surprises; if cost saving is a key consideration, start with lower values and increase them based on experience and usage metrics.
Prometheus-based alerting is available by enabling "central monitoring" in values.yaml. If any other monitoring tool is to be used, it will need to be configured separately; typically, the customer's infrastructure team takes care of this based on the Prometheus alerts that we provide.
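If the alerts need to be carried over to an existing Prometheus Operator setup, they can be expressed as PrometheusRule resources. A minimal sketch; the alert name, expression, namespace, and labels below are generic placeholders, not the rules shipped with ISD:

```yaml
# Illustrative PrometheusRule; the alert is a generic example, not an ISD-shipped rule.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: isd-example-alerts
  labels:
    release: prometheus          # match the Prometheus Operator's ruleSelector
spec:
  groups:
    - name: isd.rules
      rules:
        - alert: ISDPodRestarting
          expr: increase(kube_pod_container_status_restarts_total{namespace="opsmx-isd"}[15m]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting frequently"
```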
Data backup is typically done via the cloud provider. Should we decide to do it ourselves, pipelines need to be set up to back up the data at regular intervals (a sketch of such a backup job follows).
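A minimal sketch of what such a pipeline could run, assuming a Kubernetes Job that executes pg_dump against the Autopilot Postgres database and writes the dump to a persistent volume; the image, host, database, user, secret, and PVC names are all placeholders:

```yaml
# Illustrative backup Job; host, database, user, secret, and PVC names are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: autopilot-db-backup
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pg-dump
          image: postgres:13
          command: ["/bin/sh", "-c"]
          args:
            - >-
              pg_dump -h aurora-postgres.example.com -U opsmxuser -d opsmxdb
              -F c -f /backup/opsmxdb-$(date +%Y%m%d).dump
          env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: autopilot-db-credentials   # placeholder secret
                  key: password
          volumeMounts:
            - name: backup
              mountPath: /backup
      volumes:
        - name: backup
          persistentVolumeClaim:
            claimName: db-backup-pvc               # placeholder PVC
```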
Pipeline backup is done via a cron trigger for the syncToGit pipeline in the opsmx-gitops application.
If there are any custom or self-signed CAs that need to be honored, they need to be included in oes-cacerts, as described in the ISD documentation (a sketch follows).
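A minimal sketch of that secret, assuming oes-cacerts is a Kubernetes secret carrying a Java truststore that already contains the custom CA; the key name cacerts and the content are assumptions, so verify the expected format against the ISD documentation:

```yaml
# Illustrative oes-cacerts secret; the key name "cacerts" and the content are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: oes-cacerts
type: Opaque
data:
  cacerts: <base64-encoded Java truststore containing the custom/self-signed CA>
```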