The Tale of a 504 Gateway Timeout

Anju
3 min read · Dec 3, 2024


It was a typical Monday morning when the mystery unfolded. A routine check of the application exposed through Kubernetes ingress greeted me with an unwelcome sight — a 504 Gateway Timeout error staring back at me. “Ah, the dreaded 504,” I thought. This error wasn’t just a glitch; it was a quest, a puzzle waiting to be solved.

And so began the journey of narrowing down the culprit. Here’s how it unfolded:

The Scene: A Kubernetes Cluster with an Ingress

The application was deployed in a Kubernetes cluster, designed to gracefully handle requests via an ingress. Everything appeared to be in order:

  • The ingress rules were configured.
  • The service and deployment were up and running.
  • DNS was pointing correctly to the ingress controller.

Yet, the application was stubbornly unresponsive, throwing a 504 Gateway Timeout.
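
A quick way to eyeball all of those pieces at once is a combined listing (the namespace and resource names here are placeholders):

kubectl get ingress,deploy,svc,pods -n <namespace>

If anything in that output is missing or unhealthy, the 504 usually explains itself right there. In this case it all looked fine, which only deepened the mystery.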

Step 1: Verifying the Ingress

The first suspect was the ingress controller. Was the request even reaching the backend? I started by tailing the ingress logs:

kubectl logs -n ingress-nginx <ingress-controller-pod>
...
2024/12/03 07:59:14 [error] 67#67: *3844 upstream timed out (110: Connection timed out) while connecting to upstream, client: 1.1.1.1, server: test.example.com, request: "GET / HTTP/2.0", upstream: "http://10.10.01.01:80/", host: "test.example.com"

The logs confirmed that the ingress was receiving the request but couldn’t connect to the backend. The error was clear:

upstream timed out (110: Connection timed out) while connecting to upstream.
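
A brief aside before moving on: 504s can also come from the ingress controller's own proxy timeouts when a backend is merely slow rather than unreachable. With ingress-nginx those can be raised per ingress through annotations; a sketch, with illustrative values:

metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "10"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"

Here, though, the error occurred while connecting to the upstream, so the real problem had to be further back.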

Step 2: Checking the Service

With the ingress off the hook, the spotlight turned to the Kubernetes service. I ran:

kubectl describe svc <service-name>

Everything looked fine: the correct port, targetPort, and endpoints were listed. But something nagged at me: was the service actually forwarding traffic to the pods?
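
One quick way to answer that is to look at the Endpoints object the service maintains (the service name is a placeholder, as elsewhere):

kubectl get endpoints <service-name>

If no pod is passing its readiness probe, the ENDPOINTS column comes up empty and the ingress has nothing healthy to forward to, which would line up neatly with the upstream timeout.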

Step 3: Investigating Pod Health

The pods were up, or so it seemed. I dug deeper by describing the pods:

kubectl describe pod <pod-name>

The culprit revealed itself in the readiness probe logs:

Readiness probe failed: Get "http://10.10.01.01:80/": context deadline exceeded.

Bingo! The readiness probe was failing, signaling that the pod wasn’t ready to serve traffic.
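
This is also visible in a plain pod listing: a pod whose readiness probe keeps failing stays Running but is never marked ready. Illustrative output (the pod name is made up):

kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
my-app-5b7d9c6f4d-x7k2p   0/1     Running   0          12m

The 0/1 in the READY column is the tell: the container is alive, but Kubernetes will not send it service traffic.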

Step 4: Testing the Backend

To confirm, I manually tested the backend:

  1. Port-forwarded the service to access the pod directly:

kubectl port-forward svc/<service-name> 8080:80

  2. Sent a request:

curl http://localhost:8080/
...
200 OK

Got a response! The backend was reachable, which made things even trickier.
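
Since the probe only waits a limited number of seconds, the next useful number was how long that response actually took. curl can report it directly (same port-forward as above):

curl -o /dev/null -s -w "%{time_total}s\n" http://localhost:8080/

A total time hovering near or above the probe's timeout would explain why manual requests succeed while the probe keeps giving up.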

Step 5: Uncovering the Root Cause

The failing readiness probe hinted at two possible issues:

  1. The application was not responding on the configured port.
  2. It was taking too long to initialize.

To verify, I increased the readiness probe timeout in the deployment YAML:

readinessProbe:
  httpGet:
    path: /
    port: 80
  timeoutSeconds: 5
  periodSeconds: 10

Re-deploying this configuration allowed the probe to pass, but only after a significant delay. Logs from the application revealed it was initializing slowly due to insufficient resources.
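
One way to sanity-check the resource theory is the metrics API, assuming metrics-server is installed in the cluster:

kubectl top pod <pod-name>

A pod sitting at or near its CPU limit during startup is being throttled, which is exactly what slow initialization under tight limits looks like.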

The Fix: Scaling the Resources

The solution was to allocate more CPU and memory to the pod:

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1"

After deploying the updated configuration, the application was ready almost instantly, and the readiness probe passed without issue.
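
For completeness, here is how the two snippets fit together inside the Deployment's container spec. This is a trimmed, illustrative sketch (replicas, selector, and labels are omitted; the name and image are made up):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          ports:
            - containerPort: 80
          readinessProbe:
            httpGet:
              path: /
              port: 80
            timeoutSeconds: 5
            periodSeconds: 10
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1"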

Step 6: The Happy Ending

With the backend healthy, I tested the ingress again:

curl https://test.example.com/

And voilà! The application responded with a crisp, clean page. The 504 Gateway Timeout was banished.

Lessons Learned

  • 504 Gateway Timeouts are often caused by backend issues, not the ingress itself.
  • Always start troubleshooting from the ingress logs and move backward to the application.
  • Use readiness and liveness probes effectively to catch initialization issues early (a probe-tuning sketch follows this list).
  • Resource allocation can make or break your application’s performance.
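
On the probe point, the knobs worth knowing are initialDelaySeconds (how long to wait before the first check), timeoutSeconds (how long each check may take), and failureThreshold (how many consecutive failures mark the pod unready). A sketch of a more forgiving probe for a slow-starting app:

readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 15
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3

For apps that are slow only at startup, a separate startupProbe is the cleaner tool, since it holds the other probes off until the application has come up once.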

As I wrapped up this mystery, I couldn’t help but feel a sense of accomplishment. Kubernetes may throw challenges our way, but with patience and a systematic approach, even the most stubborn errors can be resolved. Until the next adventure!

Written by Anju

A DevOps engineer who loves automating everything (almost), exploring new places, and finding peace in nature. Always looking for the next adventure!
