Troubleshooting
1 - Troubleshooting Workflow
A Component Fails to Start
For information on troubleshooting a failing component, see Troubleshooting using Kubectl.
An Application Fails to Start
For information on troubleshooting application deployment issues, see Troubleshooting Applications.
Permission denied (publickey)
The most common cause of this error is forgetting to run drycc keys:add or to add your private key to your SSH agent. To do the latter, run ssh-add ~/.ssh/id_rsa and try running git push drycc master again.
If you get a Could not open a connection to your authentication agent error after running the ssh-add command above, you may need to load the SSH agent environment variables first by issuing eval "$(ssh-agent)".
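Putting these together, a typical recovery sequence looks like this (assuming your key is at the default ~/.ssh/id_rsa path):
$ eval "$(ssh-agent)"
$ ssh-add ~/.ssh/id_rsa
$ drycc keys:add
$ git push drycc master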
Other Issues
Running into something not detailed here? Please open an issue or hop into #community on Slack for help!
2 - Troubleshooting using Kubectl
This document describes how one can use kubectl to debug issues with the cluster.
Diving into the Components
Using kubectl, one can inspect the cluster’s current state. When Workflow is installed with helm, it is installed into the drycc namespace. To check whether Workflow is running, run:
$ kubectl --namespace=drycc get pods
NAME                            READY     STATUS              RESTARTS   AGE
drycc-builder-gqum7             0/1       ContainerCreating   0          4s
drycc-controller-h6lk6          0/1       ContainerCreating   0          4s
drycc-controller-celery-cmxxn   0/3       ContainerCreating   0          4s
drycc-database-56v39            0/1       ContainerCreating   0          4s
drycc-fluentbit-xihr1           0/1       Pending             0          2s
drycc-storage-c2exb             0/1       Pending             0          3s
drycc-grafana-9ccur             0/1       Pending             0          3s
drycc-registry-5bor6            0/1       Pending             0          3s
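If a pod stays in Pending or ContainerCreating for a long time, kubectl describe usually reveals why; for example, for the storage pod above:
$ kubectl --namespace=drycc describe pods drycc-storage-c2exb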
Note: To save precious keystrokes, alias kubectl --namespace=drycc to kd so it is easier to type in the future.
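For example, in a Bash shell:
$ alias kd="kubectl --namespace=drycc"
$ kd get pods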
To fetch the logs of a specific component, use kubectl logs:
$ kubectl --namespace=drycc logs drycc-controller-h6lk6
system information:
Django Version: 1.9.6
Python 3.5.1
addgroup: gid '0' in use
Django checks:
System check identified no issues (2 silenced).
[...]
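If a component is restarting, the logs of the previous container instance are often more useful; kubectl exposes them via the --previous flag:
$ kubectl --namespace=drycc logs --previous drycc-controller-h6lk6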
To dive into a running container to inspect its environment, use kubectl exec:
$ kubectl --namespace=drycc exec -it drycc-database-56v39 -- gosu postgres psql
psql (13.4 (Debian 13.4-1.pgdg100+1))
Type "help" for help.
postgres=# \l
                                    List of databases
       Name        |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-------------------+----------+----------+------------+------------+-----------------------
 drycc_controller  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 drycc_passport    | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 postgres          | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0         | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
                   |          |          |            |            | postgres=CTc/postgres
 template1         | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
                   |          |          |            |            | postgres=CTc/postgres
(4 rows)
postgres=# \connect drycc_controller
You are now connected to database "drycc_controller" as user "postgres".
drycc_controller=# \dt
                      List of relations
 Schema |              Name              | Type  |       Owner
--------+--------------------------------+-------+-------------------
 public | api_app                        | table | drycc_controller
 public | api_build                      | table | drycc_controller
 public | api_certificate                | table | drycc_controller
 public | api_config                     | table | drycc_controller
 public | api_domain                     | table | drycc_controller
 public | api_key                        | table | drycc_controller
 public | api_push                       | table | drycc_controller
 public | api_release                    | table | drycc_controller
 public | auth_group                     | table | drycc_controller
--More--
drycc_controller=# SELECT COUNT(*) from api_app;
count
-------
0
(1 row)
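The same approach works for any component. For example, to list the environment variables inside the controller container (using the pod name from the get pods output above):
$ kubectl --namespace=drycc exec drycc-controller-h6lk6 -- env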
3 - Troubleshooting Applications
Application has a Dockerfile, but a Buildpack Deployment Occurs
When you deploy an application to Workflow using git push drycc master and the Builder attempts to deploy it using the Buildpack workflow, check the following:
- Are you deploying the correct project?
- Are you pushing the correct git branch (git push drycc <branch>)?
- Is the Dockerfile in the project’s root directory?
- Have you committed the Dockerfile to the project?
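You can quickly verify the last two checks from the project’s root directory; git ls-files --error-unmatch exits with a non-zero status when the file is not tracked:
$ ls Dockerfile
$ git ls-files --error-unmatch Dockerfile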
Application was Deployed, but is Failing to Start
If you deployed your application but it is failing to start, you can use Drycc Grafana to check why the application fails to boot. Sometimes the application container fails to boot without logging any information about the error; this typically occurs when the healthcheck configured for the application fails.
In this case, start by troubleshooting with kubectl. You can inspect the application’s current state by examining the pods deployed in the application’s namespace. To do that, run:
$ kubectl --namespace=myapp get pods
NAME                         READY     STATUS             RESTARTS   AGE
myapp-web-1585713350-3brbo   0/1       CrashLoopBackOff   2          43s
We can then describe the pod and determine why it is failing to boot:
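$ kubectl --namespace=myapp describe pods myapp-web-1585713350-3brbo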
Events:
  FirstSeen   LastSeen   Count   From                          SubobjectPath                Type      Reason      Message
  ---------   --------   -----   ----                          -------------                --------  ------      -------
  43s         43s        1       {default-scheduler }                                       Normal    Scheduled   Successfully assigned myapp-web-1585713350-3brbo to kubernetes-node-1
  41s         41s        1       {kubelet kubernetes-node-1}   spec.containers{myapp-web}   Normal    Created     Created container with container id b86bd851a61f
  41s         41s        1       {kubelet kubernetes-node-1}   spec.containers{myapp-web}   Normal    Started     Started container with container id b86bd851a61f
  37s         35s        1       {kubelet kubernetes-node-1}   spec.containers{myapp-web}   Warning   Unhealthy   Liveness probe failed: Get http://10.246.39.13:8000/healthz: dial tcp 10.246.39.13:8000: getsockopt: connection refused
In this instance, we set the healthcheck initial delay timeout for the application to 1 second, which is too aggressive; the application needs some time to set up the API server after the container has booted. By increasing the healthcheck initial delay timeout to 10 seconds, the application is able to boot and responds correctly.
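To see what delay is currently configured, you can read it straight off the pod’s liveness probe (a minimal sketch using the pod name from above; the path assumes the probe is on the first container in the pod):
$ kubectl --namespace=myapp get pods myapp-web-1585713350-3brbo -o jsonpath='{.spec.containers[0].livenessProbe.initialDelaySeconds}'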
See Custom Health Checks for more information on how to customize the application’s health checks to better suit the application’s needs.