The Tale of a 504 Gateway Timeout

Anju
3 min read · Dec 3, 2024


It was a typical Monday morning when the mystery unfolded. A routine check of the application exposed through Kubernetes ingress greeted me with an unwelcome sight — a 504 Gateway Timeout error staring back at me. “Ah, the dreaded 504,” I thought. This error wasn’t just a glitch; it was a quest, a puzzle waiting to be solved.

And so began the journey of narrowing down the culprit. Here’s how it unfolded:

The Scene: A Kubernetes Cluster with an Ingress

The application was deployed in a Kubernetes cluster, designed to gracefully handle requests via an ingress. Everything appeared to be in order:

  • The ingress rules were configured.
  • The service and deployment were up and running.
  • DNS was pointing correctly to the ingress controller.

Yet, the application was stubbornly unresponsive, throwing a 504 Gateway Timeout.
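
A quick way to eyeball all of those pieces at once is a combined listing (the namespace and resource names here are placeholders):

kubectl get ingress,deploy,svc,pods -n <namespace>

If anything in that output is missing or unhealthy, the 504 usually explains itself right there. In this case it all looked fine, which only deepened the mystery.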

Step 1: Verifying the Ingress

The first suspect was the ingress controller. Was the request even reaching the backend? I started by tailing the ingress logs:

kubectl logs -n ingress-nginx <ingress-controller-pod>
...
2024/12/03 07:59:14 [error] 67#67: *3844 upstream timed out (110: Connection timed out) while connecting to upstream, client: 1.1.1.1, server: test.example.com, request: "GET / HTTP/2.0", upstream: "http://10.10.01.01:80/", host: "test.example.com"

The logs confirmed that the ingress was receiving the request but couldn’t connect to the backend. The error was clear:

upstream timed out (110: Connection timed out) while connecting to upstream.
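
A brief aside before moving on: 504s can also come from the ingress controller's own proxy timeouts when a backend is merely slow rather than unreachable. With ingress-nginx those can be raised per ingress through annotations; a sketch, with illustrative values:

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "10"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"

Here, though, the error occurred while connecting to the upstream, so the real problem had to be further back.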

Step 2: Checking the Service

With the ingress off the hook, the spotlight turned to the Kubernetes service. I ran:

kubectl describe svc <service-name>

Everything looked fine: the correct port, targetPort, and endpoints were listed. But something nagged at me: was the service actually forwarding traffic to the pods?
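
One quick way to answer that is to look at the Endpoints object the service maintains (the service name is a placeholder, as elsewhere):

kubectl get endpoints <service-name>

If no pod is passing its readiness probe, the ENDPOINTS column comes up empty and the ingress has nothing healthy to forward to, which would line up neatly with the upstream timeout.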

Step 3: Investigating Pod Health

The pods were up, or so it seemed. I dug deeper by describing the pods:

kubectl describe pod <pod-name>

The culprit revealed itself in the readiness probe logs:

Readiness probe failed: Get "http://10.10.01.01:80/": context deadline exceeded.

Bingo! The readiness probe was failing, signaling that the pod wasn’t ready to serve traffic.
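
This is also visible in a plain pod listing: a pod whose readiness probe keeps failing stays Running but is never marked ready. Illustrative output (the pod name is made up):

kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
my-app-5b7d9c6f4d-x7k2p   0/1     Running   0          12m

The 0/1 in the READY column is the tell: the container is alive, but Kubernetes will not send it service traffic.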

Step 4: Testing the Backend

To confirm, I manually tested the backend:

  1. Port-forwarded the service to access the pod directly:

kubectl port-forward svc/<service-name> 8080:80

  2. Sent a request:

curl http://localhost:8080/
...
200 OK

Got a response! The backend was reachable, which made things even trickier.
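
Since the probe only waits a limited number of seconds, the next useful number was how long that response actually took. curl can report it directly (same port-forward as above):

curl -o /dev/null -s -w "%{time_total}s\n" http://localhost:8080/

A total time hovering near or above the probe's timeout would explain why manual requests succeed while the probe keeps giving up.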

Step 5: Uncovering the Root Cause

The failing readiness probe hinted at two possible issues:

  1. The application was not responding on the configured port.
  2. It was taking too long to initialize.

To verify, I increased the readiness probe timeout in the deployment YAML:

readinessProbe:
  httpGet:
    path: /
    port: 80
  timeoutSeconds: 5
  periodSeconds: 10

Re-deploying this configuration allowed the probe to pass, but only after a significant delay. Logs from the application revealed it was initializing slowly due to insufficient resources.
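
One way to sanity-check the resource theory is the metrics API, assuming metrics-server is installed in the cluster:

kubectl top pod <pod-name>

A pod sitting at or near its CPU limit during startup is being throttled, which is exactly what slow initialization under tight limits looks like.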

The Fix: Scaling the Resources

The solution was to allocate more CPU and memory to the pod:

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1"

After deploying the updated configuration, the application was ready almost instantly, and the readiness probe passed without issue.
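
For completeness, here is how the two snippets fit together inside the Deployment's container spec. This is a trimmed, illustrative sketch (replicas, selector, and labels are omitted; the name and image are made up):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /
              port: 80
            timeoutSeconds: 5
            periodSeconds: 10
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1"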

Step 6: The Happy Ending

With the backend healthy, I tested the ingress again:

curl https://test.example.com/

And voilà! The application responded with a crisp, clean page. The 504 Gateway Timeout was banished.

Lessons Learned

  • 504 Gateway Timeouts are often caused by backend issues, not the ingress itself.
  • Always start troubleshooting from the ingress logs and move backward to the application.
  • Use readiness and liveness probes effectively to catch initialization issues early (a probe-tuning sketch follows this list).
  • Resource allocation can make or break your application’s performance.
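
On the probe point, the knobs worth knowing are initialDelaySeconds (how long to wait before the first check), timeoutSeconds (how long each check may take), and failureThreshold (how many consecutive failures mark the pod unready). A sketch of a more forgiving probe for a slow-starting app:

readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

For apps that are slow only at startup, a separate startupProbe is the cleaner tool, since it holds the other probes off until the application has come up once.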

As I wrapped up this mystery, I couldn’t help but feel a sense of accomplishment. Kubernetes may throw challenges our way, but with patience and a systematic approach, even the most stubborn errors can be resolved. Until the next adventure!

Written by Anju

A DevOps engineer who loves automating everything (almost), exploring new places, and finding peace in nature. Always looking for the next adventure!
