Managing Workflow
- 1: Tuning Component Settings
- 2: Configure DNS
- 3: Deploy Hooks
- 4: Platform Logging
- 5: Platform Monitoring
- 6: Production Deployments
- 7: Upgrading Workflow
1 - Tuning Component Settings
After you add the Drycc Chart Repository, you can customize the chart with
helm inspect values drycc/workflow > values.yaml
before running helm install to complete the installation.
There are a few ways to customize a given component:
-
If the value is exposed in the values.yaml file as derived above, one may modify the relevant component section to tune these settings. The modified value(s) then take effect at chart installation or release upgrade time via either of the two respective commands:
$ helm install drycc oci://registry.drycc.cc/charts/workflow \
    --namespace drycc \
    -f values.yaml
$ helm upgrade drycc oci://registry.drycc.cc/charts/workflow \
    --namespace drycc \
    -f values.yaml
-
If the value hasn't yet been exposed in the values.yaml file, one may edit the component deployment with the tuned setting. Here we edit the drycc-controller deployment:
$ kubectl --namespace drycc edit deployment drycc-controller
Add or edit the setting via the appropriate environment variable and value under the env section and save. The updated deployment will recreate the component pod with the new/modified setting.
-
Lastly, one may also fetch and edit the chart as served by version control/the chart repository itself:
$ helm fetch oci://registry.drycc.cc/charts/workflow --untar
$ $EDITOR workflow/charts/controller/templates/controller-deployment.yaml
Then run helm install drycc ./workflow --namespace drycc to apply the changes, or helm upgrade drycc ./workflow if the cluster is already running.
Setting Resource limits
You can set resource limits on Workflow components by modifying the values.yaml file fetched
earlier. This file has a section for each Workflow component. To set limits for any Workflow
component, add a resources block in that component's section and set the values as appropriate.
Below is an example of how the builder section of values.yaml might look with CPU and memory
limits set:
builder:
  imageOrg: "drycc"
  imagePullPolicy: "Always"
  imageTag: "canary"
  resources:
    limits:
      cpu: 1000m
      memory: 2048Mi
    requests:
      cpu: 500m
      memory: 1024Mi
Customizing the Builder
The following environment variables are tunable for the Builder component:
Setting | Description |
---|---|
DEBUG | Enable debug log output (default: false) |
BUILDER_POD_NODE_SELECTOR | A node selector setting for builder jobs. As a build may sometimes consume a lot of node resources, one may want builder jobs to run only on specific nodes so they won't affect critical nodes. For example: pool:testing,disk:magnetic |
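For example, the node selector could be applied with the same environment-variable approach described above. A minimal sketch, assuming the builder Deployment is named drycc-builder (as suggested by the pod names later in this guide) and that the labels shown match your own node labels:
$ kubectl --namespace drycc set env deployment/drycc-builder \
    BUILDER_POD_NODE_SELECTOR="pool:testing,disk:magnetic"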
Customizing the Controller
The following environment variables are tunable for the Controller component:
Setting | Description |
---|---|
REGISTRATION_MODE | set registration to “enabled”, “disabled”, or “admin_only” (default: “admin_only”) |
GUNICORN_WORKERS | number of gunicorn workers spawned to process requests (default: CPU cores * 4 + 1) |
RESERVED_NAMES | a comma-separated list of names which applications cannot reserve for routing (default: “drycc, drycc-builder”) |
DRYCC_DEPLOY_HOOK_URLS | a comma-separated list of URLs to send deploy hooks to. |
DRYCC_DEPLOY_HOOK_SECRET_KEY | a secret key used to compute the HMAC signature for deploy hooks. |
DRYCC_DEPLOY_REJECT_IF_PROCFILE_MISSING | rejects a deploy if the previous build had a Procfile but the current deploy is missing it. A 409 is thrown in the API. Prevents accidental process types removal. (default: “false”, allowed values: “true”, “false”) |
DRYCC_DEPLOY_PROCFILE_MISSING_REMOVE | when turned on (default) any missing process type in a Procfile compared to the previous deploy is removed. When set to false will allow an empty Procfile to go through without removing missing process types, note that new images, configs and so on will get updated on all proc types. (default: “true”, allowed values: “true”, “false”) |
DRYCC_DEFAULT_CONFIG_TAGS | set tags for all applications by default, for example: ‘{“role”: “worker”}’. (default: ‘’) |
KUBERNETES_NAMESPACE_DEFAULT_QUOTA_SPEC | set a resource quota on each application namespace by providing a ResourceQuota spec, for example: {"spec":{"hard":{"pods":"10"}}} restricts the app owner from spawning more than 10 pods (default: “”, no quota will be applied to the namespace) |
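As a concrete sketch, deploy hook settings could be applied to the controller with the same environment-variable approach described above (the URL and key below are placeholders, not values shipped with Workflow):
$ kubectl --namespace drycc set env deployment/drycc-controller \
    DRYCC_DEPLOY_HOOK_URLS="https://hooks.example.com/drycc" \
    DRYCC_DEPLOY_HOOK_SECRET_KEY="my_secret_key"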
LDAP authentication settings
Configuration options for LDAP authentication are detailed here.
The following environment variables are available for enabling LDAP authentication of user accounts in the Passport component:
Setting | Description |
---|---|
LDAP_ENDPOINT | The URI of the LDAP server. If not specified, LDAP authentication is not enabled (default: “”, example: ldap://hostname ). |
LDAP_BIND_DN | The distinguished name to use when binding to the LDAP server (default: “”) |
LDAP_BIND_PASSWORD | The password to use with LDAP_BIND_DN (default: “”) |
LDAP_USER_BASEDN | The distinguished name of the search base for user names (default: “”) |
LDAP_USER_FILTER | The name of the login field in the users search base (default: “username”) |
LDAP_GROUP_BASEDN | The distinguished name of the search base for users' group names (default: “”) |
LDAP_GROUP_FILTER | The filter for user’s groups (default: “”, example: objectClass=person ) |
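Putting these together, here is a hedged sketch of enabling LDAP by setting the environment variables on the Passport component. The drycc-passport Deployment name, hostname, and DNs are assumptions; adjust them to your install and directory layout:
$ kubectl --namespace drycc set env deployment/drycc-passport \
    LDAP_ENDPOINT="ldap://ldap.example.com" \
    LDAP_BIND_DN="cn=admin,dc=example,dc=com" \
    LDAP_BIND_PASSWORD="********" \
    LDAP_USER_BASEDN="ou=people,dc=example,dc=com" \
    LDAP_USER_FILTER="uid" \
    LDAP_GROUP_BASEDN="ou=groups,dc=example,dc=com"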
Global and per application settings
Setting | Description |
---|---|
DRYCC_DEPLOY_BATCHES | the number of pods to bring up and take down sequentially during a scale (default: number of available nodes) |
DRYCC_DEPLOY_TIMEOUT | deploy timeout in seconds per deploy batch (default: 120) |
IMAGE_PULL_POLICY | the Kubernetes image pull policy for application images (default: “IfNotPresent”) (allowed values: “Always”, “IfNotPresent”) |
KUBERNETES_DEPLOYMENTS_REVISION_HISTORY_LIMIT | how many revisions Kubernetes keeps around for a given Deployment (default: all revisions) |
KUBERNETES_POD_TERMINATION_GRACE_PERIOD_SECONDS | how many seconds Kubernetes waits for a pod to finish work after a SIGTERM before sending SIGKILL (default: 30) |
See the Deploying Apps guide for more detailed information on those.
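The global defaults can be changed with the same environment-variable approach on the controller, for example (illustrative values only):
$ kubectl --namespace drycc set env deployment/drycc-controller \
    DRYCC_DEPLOY_BATCHES=2 DRYCC_DEPLOY_TIMEOUT=300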
Customizing the Database
The following environment variables are tunable for the Database component:
Setting | Description |
---|---|
BACKUP_FREQUENCY | how often the database should perform a base backup (default: “12h”) |
BACKUPS_TO_RETAIN | number of base backups the backing store should retain (default: 5) |
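A sketch of the same approach for the database component; the workload kind and the drycc-database name are assumptions based on the pod listing later in this guide, so adjust them to whatever your install actually runs:
$ kubectl --namespace drycc set env deployment/drycc-database \
    BACKUP_FREQUENCY="6h" BACKUPS_TO_RETAIN="10"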
Customizing Fluentbit
The following values can be changed in the values.yaml
file or by using the --values
flag with the Helm CLI.
Key | Description |
---|---|
config.service | The service section defines the global properties of the service. |
config.inputs | An input section defines a source (related to an input plugin). |
config.filters | A filter section defines a filter (related to a filter plugin) |
config.outputs | The outputs section specifies a destination that certain records should follow after a Tag match. |
For more information about the variables that can be set, please see the Fluent Bit documentation.
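As a rough sketch, an extra output could be appended under config.outputs in values.yaml. The exact key layout of the Fluent Bit subchart is an assumption here, so check the chart's values.yaml first:
fluentbit:
  config:
    outputs: |
      [OUTPUT]
          Name   stdout
          Match  kube.*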
Customizing the Monitor
Grafana
We have exposed some of the more useful configuration values directly in the chart. This allows them to be set using either the values.yaml
file or by using the --set
flag with the Helm CLI. You can see these options below:
Setting | Default Value | Description |
---|---|---|
user | “admin” | The first user created in the database (this user has admin privileges) |
password | “admin” | Password for the first user. |
allow_sign_up | “true” | Allows users to sign up for an account. |
For a list of other options you can set by using environment variables please see the configuration file in GitHub.
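For example, assuming these keys are exposed under the grafana section of the chart (verify against your values.yaml), they could be overridden at install or upgrade time:
$ helm upgrade drycc oci://registry.drycc.cc/charts/workflow \
    --namespace drycc \
    --set grafana.user=ops,grafana.password=changeme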
Victoriametrics
You can find a list of values that can be set using environment variables here.
Customizing the Registry
The Registry component can be tuned by following the distribution config doc.
2 - Configure DNS
For example, assuming example.com
were a cluster’s domain:
- The controller should be accessible at
drycc.example.com
- Applications should be accessible (by default) at
<application name>.example.com
Given that this is the case, the primary objective in configuring DNS is to direct traffic for all subdomains of a cluster’s domain to the cluster node(s) hosting the platform’s router component, which can direct traffic within the cluster to the correct endpoints.
With a Load Balancer
Generally, it is recommended that a load balancer be used to direct inbound traffic to one or more routers. In such a case, configuring DNS is as simple as defining a wildcard record in DNS that points to the load balancer.
For example, assuming a domain of example.com, the wildcard record for *.example.com can be either:
- An A record enumerating each of your load balancer(s) IPs (i.e. DNS round-robining)
- A CNAME record referencing an existing fully-qualified domain name for the load balancer
  - Per AWS' own documentation, this is the recommended strategy when using AWS Elastic Load Balancers, as ELB IPs may change over time.
DNS for any applications using a “custom domain” (a fully-qualified domain name that is not a subdomain of the cluster’s own domain) can be configured by creating a CNAME
record that references the wildcard record described above.
Although it depends on your distribution of Kubernetes and your underlying infrastructure, in many cases, the IP(s) or existing fully-qualified domain name of a load balancer can be determined directly using the kubectl
tool:
$ kubectl --namespace=istio-ingress describe service | grep "LoadBalancer"
LoadBalancer Ingress: a493e4e58ea0511e5bb390686bc85da3-1558404688.us-west-2.elb.amazonaws.com
The LoadBalancer Ingress
field typically describes an existing domain name or public IP(s). Note that if Kubernetes is able to automatically provision a load balancer for you, it does so asynchronously. If the command shown above is issued very soon after Workflow installation, the load balancer may not exist yet.
Without a Load Balancer
On some platforms (Minikube, for instance), a load balancer is not an easy or practical thing to provision. In these cases, one can directly identify the public IP of a Kubernetes node that is hosting a router pod and use that information to configure the local /etc/hosts
file.
Because wildcard entries do not work in a local /etc/hosts
file, using this strategy may result in frequent editing of that file to add fully-qualified subdomains of a cluster for each application added to that cluster. Because of this, a more viable option may be to utilize the xip.io service.
In general, for any IP, a.b.c.d
, the fully-qualified domain name any-subdomain.a.b.c.d.xip.io
will resolve to the IP a.b.c.d
. This can be enormously useful.
To begin, find the node(s) hosting router instances using kubectl
:
$ kubectl --namespace=istio-ingress describe pod | grep Node:
Node: ip-10-0-0-199.us-west-2.compute.internal/10.0.0.199
Node: ip-10-0-0-198.us-west-2.compute.internal/10.0.0.198
The command will display information for every router pod. For each, a node name and IP are displayed in the Node
field. If the IPs appearing in these fields are public, any of these may be used to configure your local /etc/hosts
file or may be used with xip.io. If the IPs shown are not public, further investigation may be needed.
You can list the IP addresses of a node using kubectl
:
$ kubectl describe node ip-10-0-0-199.us-west-2.compute.internal
# ...
Addresses: 10.0.0.199,10.0.0.199,54.218.85.175
# ...
Here, the Addresses
field lists all the node’s IPs. If any of them are public, again, they may be used to configure your local /etc/hosts
file or may be used with xip.io.
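For example, with the public IP 54.218.85.175 shown above, a local /etc/hosts entry might look like the following (myapp is a placeholder application name); alternatively, the same IP can be reached via xip.io as drycc.54.218.85.175.xip.io with no hosts entry at all:
54.218.85.175  drycc.example.com  myapp.example.com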
Tutorial: Configuring DNS with Google Cloud DNS
In this section, we’ll describe how to configure Google Cloud DNS for routing your domain name to your Drycc cluster.
We’ll assume the following in this section:
- Your Ingress service has a load balancer in front of it.
- The load balancer need not be cloud based, it just needs to provide a stable IP address or a stable domain name.
- You have the mystuff.com domain name registered with a registrar.
  - Substitute your own domain name for mystuff.com in the instructions that follow.
- Your registrar lets you alter the nameservers for your domain name (most registrars do).
Here are the steps for configuring cloud DNS to route to your Drycc cluster:
1. Get the load balancer IP or domain name
   - If you are on Google Container Engine, you can run kubectl get svc -n istio-ingress and look at the LoadBalancer Ingress column to get the IP address
2. Create a new Cloud DNS Zone (on the console: Networking => Cloud DNS, then click on Create Zone)
3. Name your zone, and set the DNS name to mystuff.com. (note the . at the end)
4. Click on the Create button
5. Click on the Add Record Set button on the resulting page
6. If your load balancer provides a stable IP address, enter the following fields in the resulting form:
   1. DNS Name: *
   2. Resource Record Type: A
   3. TTL: the DNS TTL of your choosing. If you're testing or you anticipate that you'll tear down and rebuild many drycc clusters over time, we recommend a low TTL
   4. IPv4 Address: the IP that you got in the very first step
   5. Click the Create button
7. If your load balancer provides the stable domain name lbdomain.com, enter the following fields in the resulting form:
   1. DNS Name: *
   2. Resource Record Type: CNAME
   3. TTL: the DNS TTL of your choosing. If you're testing or you anticipate that you'll tear down and rebuild many drycc clusters over time, we recommend a low TTL
   4. Canonical name: lbdomain.com. (note the . at the end)
   5. Click on the Create button
8. In your domain registrar, set the nameservers for your mystuff.com domain to the ones under the data column in the NS record on the same page. They'll often be something like the below (note the trailing . characters).
ns-cloud-b1.googledomains.com.
ns-cloud-b2.googledomains.com.
ns-cloud-b3.googledomains.com.
ns-cloud-b4.googledomains.com.
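The zone and wildcard record can also be created from the command line. A sketch using gcloud, where the zone name, domain, and IP address are placeholders (check the flags against your gcloud version):
$ gcloud dns managed-zones create mystuff-zone \
    --dns-name="mystuff.com." --description="Drycc Workflow cluster"
$ gcloud dns record-sets create "*.mystuff.com." \
    --zone="mystuff-zone" --type="A" --ttl="300" --rrdatas="1.2.3.4"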
Note: If you ever have to re-create your Drycc cluster, simply go back to step 6.4 or 7.4 (depending on your load balancer) and change the IP address or domain name to the new value. You may have to wait for the TTL you set to expire.
Testing
To test that traffic reaches its intended destination, a request can be sent to the Drycc controller like so (do not forget the trailing slash!):
curl http://drycc.example.com/v2/
Or:
curl http://drycc.54.218.85.175.xip.io/v2/
Since such requests require authentication, a response such as the following should be considered an indicator of success:
{"detail":"Authentication credentials were not provided."}
3 - Deploy Hooks
Deploy hooks help keep the development team informed about deploys, and they can also be used to integrate different systems together.
After one or more hooks are set up, hook output and errors appear in your drycc grafana app logs:
2011-03-15T15:07:29-07:00 drycc[api]: Deploy hook sent to http://drycc.rocks
Deploy hooks are a generic HTTP hook. An administrator can create and configure multiple deploy hooks by tuning the controller settings via the Helm chart.
HTTP POST Hook
The HTTP deploy hook performs an HTTP POST to a URL. The parameters included in the request are the
same as the variables available in the hook message: app
, release
, release_summary
, sha
and
user
. See below for their descriptions:
app=secure-woodland&release=v4&release_summary=gabrtv%20deployed%2035b3726&sha=35b3726&user=gabrtv
Optionally, if a deploy hook secret key is added to the controller through
tuning the controller settings, a new Authorization
header will be
present in the POST request. The value of this header is computed as the HMAC hex digest of the
request URL, using the secret as the key.
In order to authenticate that this request came from Workflow, use the secret key, the full URL and the HMAC-SHA1 hashing algorithm to compute the signature. In Python, that would look something like this:
import hashlib
import hmac

hmac.new(
    b"my_secret_key",
    b"http://drycc.rocks?app=secure-woodland&release=v4&release_summary=gabrtv%20deployed%2035b3726&sha=35b3726&user=gabrtv",
    digestmod=hashlib.sha1,
).hexdigest()
If the value of the computed HMAC hex digest and the value in the Authorization
header are
identical, then the request came from Workflow.
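A minimal verification sketch for the receiving side (a hypothetical helper, not part of Workflow itself); it hashes the full request URL with the shared secret and compares the result against the Authorization header in constant time:
import hashlib
import hmac

def hook_is_authentic(secret_key: bytes, request_url: str, authorization_header: str) -> bool:
    # HMAC-SHA1 hex digest of the full request URL, keyed with the shared secret.
    expected = hmac.new(secret_key, request_url.encode(), digestmod=hashlib.sha1).hexdigest()
    # Constant-time comparison against the Authorization header sent by Workflow.
    return hmac.compare_digest(expected, authorization_header)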
Note
When computing the signature, ensure that the URL parameters are in alphabetic order. This is critical: most web applications don't care about the order of HTTP parameters, but the cryptographic signature will not match if the order differs.
4 - Platform Logging
We’re working with Quickwit to bring you an application log cluster and search interface.
Architecture Diagram
┌───────────┐ ┌───────────┐
│ Container │ │ Grafana │
└───────────┘ └───────────┘
│ ^
log |
│ |
˅ │
┌───────────┐ ┌───────────┐
│ Fluentbit │─────otel/grpc────>│ Quickwit │
└───────────┘ └───────────┘
Default Configuration
Fluent Bit is based on a pluggable architecture where different plugins play a major role in the data pipeline, with more than 70 built-in plugins available. Please refer to the chart's values.yaml for specific configurations.
5 - Platform Monitoring
Description
We now include a monitoring stack for introspection on a running Kubernetes cluster. The stack includes 4 components:
- kube-state-metrics (KSM), a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.
- Node Exporter, a Prometheus exporter for hardware and OS metrics exposed by *NIX kernels.
- Victoriametrics, a fast, scalable time series database and monitoring solution.
- Grafana, a graphing tool for time series data.
Architecture Diagram
┌────────────────┐
│ HOST │
│ node-exporter │◀──┐ ┌──────────────────┐
└────────────────┘ │ │kube-state-metrics│
│ └──────────────────┘
┌────────────────┐ │ ▲
│ HOST │ │ ┌─────────────────┐ │
│ node-exporter │◀──┼────│ victoriametrics │─────────────┘
└────────────────┘ │ └─────────────────┘
│ ▲
┌───────────────┐ │ │
│ HOST │ │ ▼
│ node-exporter│◀───┘ ┌──────────┐
└───────────────┘ │ Grafana │
└──────────┘
Grafana
Grafana allows users to create custom dashboards that visualize the data captured by the running VictoriaMetrics component. By default Grafana is exposed using a service annotation through the router at the following URL: http://grafana.mydomain.com. The default login is admin/admin. If you are interested in changing these values please see Tuning Component Settings.
Grafana will preload several dashboards to help operators get started with monitoring Kubernetes and Drycc Workflow. These dashboards are meant as starting points and don’t include every item that might be desirable to monitor in a production installation.
Drycc Workflow monitoring by default does not write data to the host filesystem or to long-term storage. If the Grafana instance fails, modified dashboards are lost.
Production Configuration
A production install of Grafana should have the following configuration values changed if possible:
- Change the default username and password from
admin/admin
. The value for the password is passed in plain text so it is best to set this value on the command line instead of checking it into version control. - Enable persistence
- Use a supported external database such as mysql or postgres. You can find more information here
On Cluster Persistence
Enabling persistence will allow your custom configuration to persist across pod restarts. This means that the default SQLite database (which stores things like sessions and user data) will not disappear if you upgrade the Workflow installation.
If you wish to have persistence for Grafana you can set enabled
to true
in the values.yaml
file before running helm install
.
grafana:
  # Configure the following ONLY if you want persistence for on-cluster grafana
  # GCP PDs and EBS volumes are supported only
  persistence:
    enabled: true # Set to true to enable persistence
    size: 5Gi # PVC size
Off Cluster Grafana
If you wish to provide your own Grafana instance, set grafana.enabled to false in the values.yaml file before running helm install.
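A sketch of the corresponding values.yaml override, assuming enabled sits directly under the grafana section as in the persistence example above:
grafana:
  enabled: false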
VictoriaMetrics
VictoriaMetrics is a fast and scalable open source time series database and monitoring solution that lets users build a monitoring platform without scalability issues and with minimal operational burden. It is fully compatible with the Prometheus format.
On Cluster Persistence
You can set node-exporter
and kube-state-metrics
to true
or false
in the values.yaml
.
- If you wish to have persistence for VictoriaMetrics you can set
enabled
totrue
in thevalues.yaml
file before runninghelm install
.
victoriametrics:
  vmstorage:
    replicas: 3
    extraArgs:
      - --retentionPeriod=30d
    temporary:
      enabled: true
      size: 5Gi
      storageClass: "toplvm-ssd"
    persistence:
      enabled: true
      size: 10Gi
      storageClass: "toplvm-hdd"
node-exporter:
  enabled: true
kube-state-metrics:
  enabled: true
Off Cluster VictoriaMetrics
To use off-cluster VictoriaMetrics, provide the following values in the values.yaml file before running helm install:
victoriametrics.enabled=false
grafana.prometheusUrl="http://my.prometheus.url:9090"
controller.prometheusUrl="http://my.prometheus.url:9090"
6 - Production Deployments
Running Workflow without Drycc Storage
In production, persistent storage can be achieved by running an external object store. For users on AWS, GCE/GKE, or Azure, the convenience of Amazon S3, Google GCS, or Microsoft Azure Storage makes running a Storage-less Workflow cluster quite reasonable. For users who have restrictions on using external object storage, Swift object storage can be an option.
Running a Workflow cluster without Storage provides several advantages:
- Removes state from worker nodes
- Reduces resource usage
- Reduces complexity and operational burden of managing Workflow
See Configuring Object Storage for details on removing this operational complexity.
Review Security Considerations
There are some additional security-related considerations when running Workflow in production. See Security Considerations for details.
Registration is Admin-Only
By default, registration with the Workflow controller is in “admin_only” mode. The first user to run a drycc register
command becomes the initial “admin” user, and registrations after that are disallowed unless requested by an admin.
Please see the following documentation to learn about changing registration mode:
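The mode maps to the controller's REGISTRATION_MODE environment variable described in Tuning Component Settings; a sketch of changing it directly:
$ kubectl --namespace drycc set env deployment/drycc-controller REGISTRATION_MODE="disabled"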
Disable Grafana Signups
It is also recommended to disable signups for the Grafana dashboards.
Please see the following documentation to learn about disabling Grafana signups:
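This corresponds to the allow_sign_up setting listed under Customizing the Monitor; assuming it is exposed as grafana.allow_sign_up in the chart values, a sketch:
$ helm upgrade drycc oci://registry.drycc.cc/charts/workflow \
    --namespace drycc \
    --set grafana.allow_sign_up=false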
7 - Upgrading Workflow
This upgrade process requires:
- Helm version 2.1.0 or newer
- Configured Off-Cluster Storage
Upgrade Process
Note
If upgrading from a Helm Classic install, you'll need to migrate the cluster to a Kubernetes Helm installation. See Workflow-Migration for steps.
Step 1: Apply the Workflow upgrade
Helm will remove all components from the previous release. Traffic to applications deployed through Workflow will continue to flow during the upgrade. No service interruptions should occur.
If Workflow is not configured to use off-cluster Postgres, the Workflow API will experience a brief period of downtime while the database recovers from backup.
First, find the name of the release helm gave to your deployment with helm ls
, then run
$ helm upgrade <release-name> oci://registry.drycc.cc/charts/workflow
Note: If using off-cluster object storage on gcs and/or off-cluster registry using gcr and intending to upgrade from a pre-v2.10.0
chart to v2.10.0
or greater, the key_json
values will now need to be pre-base64-encoded. Therefore, assuming the rest of the custom/off-cluster values are defined in the existing values.yaml
used for previous installs, the following may be run:
$ B64_KEY_JSON="$(cat ~/path/to/key.json | base64 -w 0)"
$ helm upgrade <release_name> drycc/workflow -f values.yaml --set gcs.key_json="${B64_KEY_JSON}",registry-token-refresher.gcr.key_json="${B64_KEY_JSON}"
Alternatively, simply replace the appropriate values in values.yaml and do without the --set
parameter. Make sure to wrap it in single quotes as double quotes will give a parser error when
upgrading.
Step 2: Verify Upgrade
Verify that all components have started and passed their readiness checks:
$ kubectl --namespace=drycc get pods
NAME READY STATUS RESTARTS AGE
drycc-builder-2448122224-3cibz 1/1 Running 0 5m
drycc-controller-1410285775-ipc34 1/1 Running 3 5m
drycc-controller-celery-694f75749b-cmxxn 3/3 Running 0 5m
drycc-database-e7c5z 1/1 Running 0 5m
drycc-fluentbit-45h7j 1/1 Running 0 5m
drycc-fluentbit-4z7lw 1/1 Running 0 5m
drycc-fluentbit-k2wsw 1/1 Running 0 5m
drycc-fluentbit-skdw4 1/1 Running 0 5m
drycc-valkey-8nazu 1/1 Running 0 5m
drycc-grafana-tm266 1/1 Running 0 5m
drycc-registry-1814324048-yomz5 1/1 Running 0 5m
drycc-registry-proxy-4m3o4 1/1 Running 0 5m
drycc-registry-proxy-no3r1 1/1 Running 0 5m
drycc-registry-proxy-ou8is 1/1 Running 0 5m
drycc-registry-proxy-zyajl 1/1 Running 0 5m
Step 3: Upgrade the Drycc Client
Users of Drycc Workflow should now upgrade their drycc client to avoid getting WARNING: Client and server API versions do not match. Please consider upgrading.
warnings.
curl -sfL https://www.drycc.cc/install-cli.sh | bash - && sudo mv drycc $(which drycc)