Troubleshooting

Troubleshooting is a systematic approach to problem-solving.

1 - Troubleshooting Workflow

Common issues that users have run into when provisioning Workflow are detailed below.

A Component Fails to Start

For information on troubleshooting a failing component, see Troubleshooting with Kubectl.

An Application Fails to Start

For information on troubleshooting application deployment issues, see Troubleshooting Applications.

Permission denied (publickey)

The most common cause of this issue is forgetting to run drycc keys:add or to add the private key to the SSH agent. To do the latter, run ssh-add ~/.ssh/id_rsa and try git push drycc master again.

If you get a Could not open a connection to your authentication agent error after running the ssh-add command above, you may need to load the SSH agent environment variables first by issuing the eval "$(ssh-agent)" command.
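
A concrete recovery sequence looks like this (assuming the private key lives at ~/.ssh/id_rsa; adjust the path to match your key):

$ eval "$(ssh-agent)"     # start the agent if it is not already running
$ ssh-add ~/.ssh/id_rsa   # load the key into the agent
$ git push drycc master   # retry the push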

Other Issues

Running into something not detailed here? Please open an issue or hop into #community on Slack for help!

2 - Troubleshooting using Kubectl

Kubernetes provides kubectl, a command line tool for communicating with a Kubernetes cluster’s control plane using the Kubernetes API.

This document describes how one can use kubectl to debug any issues with the cluster.

Diving into the Components

Using kubectl, one can inspect the cluster’s current state. When installed with Helm, Workflow runs in the drycc namespace. To check whether Workflow is running, run:

$ kubectl --namespace=drycc get pods
NAME                          READY     STATUS              RESTARTS   AGE
drycc-builder-gqum7            0/1       ContainerCreating   0          4s
drycc-controller-h6lk6         0/1       ContainerCreating   0          4s
drycc-controller-celery-cmxxn  0/3       ContainerCreating   0          4s
drycc-database-56v39           0/1       ContainerCreating   0          4s
drycc-fluentbit-xihr1          0/1       Pending             0          2s
drycc-storage-c2exb            0/1       Pending             0          3s
drycc-grafana-9ccur            0/1       Pending             0          3s
drycc-registry-5bor6           0/1       Pending             0          3s

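If a component stays in Pending or ContainerCreating, kubectl describe shows the events that explain why. For example, using the fluentbit pod from the listing above:

$ kubectl --namespace=drycc describe pod drycc-fluentbit-xihr1

The Events section at the end of the output lists scheduling failures, image pull errors, and missing volumes or secrets.
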
To fetch the logs of a specific component, use kubectl logs:

$ kubectl --namespace=drycc logs drycc-controller-h6lk6
system information:
Django Version: 1.9.6
Python 3.5.1
addgroup: gid '0' in use
Django checks:
System check identified no issues (2 silenced).
[...]

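If a component is restarting, the logs of the previous container instance can be fetched with the --previous flag:

$ kubectl --namespace=drycc logs --previous drycc-controller-h6lk6
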
To dive into a running container to inspect its environment, use kubectl exec:

$ kubectl --namespace=drycc exec -it drycc-database-56v39 -- gosu postgres psql
psql (13.4 (Debian 13.4-1.pgdg100+1))
Type "help" for help.

postgres=# \l
                                                List of databases
     Name          |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-------------------+----------+----------+------------+------------+-----------------------
 drycc_controller  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 drycc_passport    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres          | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0         | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
                   |          |          |            |            | postgres=CTc/postgres
 template1         | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
                   |          |          |            |            | postgres=CTc/postgres
(5 rows)

postgres=# \connect drycc_controller
You are now connected to database "drycc_controller" as user "postgres".
drycc_controller=# \dt
                                 List of relations
 Schema |              Name              | Type  |      Owner
--------+--------------------------------+-------+-------------------
 public | api_app                        | table | drycc_controller
 public | api_build                      | table | drycc_controller
 public | api_certificate                | table | drycc_controller
 public | api_config                     | table | drycc_controller
 public | api_domain                     | table | drycc_controller
 public | api_key                        | table | drycc_controller
 public | api_push                       | table | drycc_controller
 public | api_release                    | table | drycc_controller
 public | auth_group                     | table | drycc_controller
 --More--
drycc_controller=# SELECT COUNT(*) from api_app;
 count
-------
     0
(1 row)

3 - Troubleshooting Applications

This document describes how one can troubleshoot common issues with applications that fail to deploy or start.

Application has a Dockerfile, but a Buildpack Deployment Occurs

When you deploy an application to Workflow using git push drycc master but the Builder deploys it with the Buildpack workflow instead, check the following:

  1. Are you deploying the correct project?
  2. Are you pushing the correct git branch (git push drycc <branch>)?
  3. Is the Dockerfile in the project’s root directory?
  4. Have you committed the Dockerfile to the project? (A quick check for points 3 and 4 is sketched below.)
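
To verify points 3 and 4 from the project root, git ls-tree prints the file name only if the Dockerfile is committed at the root of the currently checked-out branch:

$ git ls-tree --name-only HEAD -- Dockerfile
Dockerfile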

Application was Deployed, but is Failing to Start

If you deployed your application but it is failing to start, you can use Drycc Grafana to check why it fails to boot. Sometimes the application container fails to boot without logging any information about the error; this typically happens when the healthcheck configured for the application fails. In that case, you can start troubleshooting with kubectl by inspecting the current state of the pod deployed in the application’s namespace. To do that, run:

$ kubectl --namespace=myapp get pods
NAME                          READY     STATUS                RESTARTS   AGE
myapp-web-1585713350-3brbo    0/1       CrashLoopBackOff      2          43s

We can then describe the pod with kubectl describe to determine why it is failing to boot:

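$ kubectl --namespace=myapp describe pods myapp-web-1585713350-3brbo
[...]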
Events:
  FirstSeen     LastSeen        Count   From                            SubobjectPath                           Type            Reason          Message
  ---------     --------        -----   ----                            -------------                           --------        ------          -------
  43s           43s             1       {default-scheduler }                                                    Normal          Scheduled       Successfully assigned myapp-web-1585713350-3brbo to kubernetes-node-1
  41s           41s             1       {kubelet kubernetes-node-1}     spec.containers{myapp-web}              Normal          Created         Created container with container id b86bd851a61f
  41s           41s             1       {kubelet kubernetes-node-1}     spec.containers{myapp-web}              Normal          Started         Started container with container id b86bd851a61f
  37s           35s             1       {kubelet kubernetes-node-1}     spec.containers{myapp-web}              Warning         Unhealthy       Liveness probe failed: Get http://10.246.39.13:8000/healthz: dial tcp 10.246.39.13:8000: getsockopt: connection refused

In this instance, the healthcheck initial delay timeout for the application was set to 1 second, which is too aggressive: the application needs some time to set up its API server after the container boots. Increasing the initial delay timeout to 10 seconds gives the application enough time to boot and respond correctly.
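
For example, a command along the following lines raises the liveness probe's initial delay (the exact subcommand and flag names are an assumption here; consult the Custom Health Checks documentation or drycc healthchecks:set --help for the authoritative syntax):

$ drycc healthchecks:set liveness httpGet 8000 --path /healthz --initial-delay-timeout=10 --app myapp   # flag names assumed, verify before use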

See Custom Health Checks for more information on how to customize the application’s health checks to better suit the application’s needs.