# Troubleshooting Guide

Common issues and solutions for API Gateway deployment.

## Gateway Issues

### Gateway Pods Not Starting

**Symptoms**:
- Gateway pods in `CrashLoopBackOff` or `Error` state
- Pods continuously restarting

**Diagnosis**:
```bash
# Check pod status
kubectl get pods -n <namespace> -l app.kubernetes.io/name=gateway

# View pod logs
kubectl logs -n <namespace> <gateway-pod-name>

# Describe pod for events
kubectl describe pod -n <namespace> <gateway-pod-name>
```

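When a namespace holds many pods, it can help to narrow the pod listing down to the ones that are not healthy. The following is a minimal sketch, not part of the gateway tooling: `unhealthy_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods` column layout (NAME, READY, STATUS, ...).

```bash
# Hypothetical helper: reads the default `kubectl get pods` table on stdin
# and prints "<pod> <status>" for every pod whose STATUS column is neither
# Running nor Completed. The header row (line 1) is skipped.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (replace NAMESPACE before running):
#   kubectl get pods -n NAMESPACE -l app.kubernetes.io/name=gateway | unhealthy_pods
```

An empty result suggests every listed pod is running; any line printed names a pod worth inspecting with `kubectl describe pod`.
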
**Common Causes & Solutions**:

**1. Configuration Store Connection Failure**
```bash
# Check Data Plane Manager is running
kubectl get pods -n <namespace> -l app=dp-manager

# Verify configuration endpoint
kubectl get configmap <gateway-configmap> -n <namespace> -o yaml | grep endpoint

# Expected: Configuration store endpoint URL
```

**2. TLS Certificate Issues**
```bash
# Verify TLS secret exists
kubectl get secret <gateway-tls-secret> -n <namespace>

# Check certificate validity
kubectl get secret <gateway-tls-secret> -n <namespace> -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates
```

**3. Configuration Error**
```bash
# Check gateway ConfigMap
kubectl get configmap <gateway-configmap> -n <namespace> -o yaml

# Validate YAML syntax
kubectl get configmap <gateway-configmap> -n <namespace> -o yaml | yq eval '.'
```

### Gateway Returns 404 for All Routes

**Symptoms**:
- All requests return HTTP 404
- Routes configured but not working

**Diagnosis**:
```bash
# Check if routes are published in Dashboard
# Navigate: Services → <service> → Routes
# Verify: Route status shows "Published"

# Check gateway logs for route loading
kubectl logs -n <namespace> -l app.kubernetes.io/name=gateway --tail=100 | grep -i route
```

**Solutions**:

**1. Routes Not Published**
- Routes synced via CLI are NOT active until published
- Publish each route in Dashboard:
  - Services → Select service → Routes tab
  - Click "Publish" for each route
  - Select appropriate gateway group

**2. Wrong Gateway Group**
```bash
# Verify gateway group
kubectl get deployment <gateway-deployment> -n <namespace> -o yaml | grep GATEWAY_GROUP

# Expected: Configured group name
```

**3. Host Header Mismatch**
```bash
# Test with correct Host header
curl -H "Host: app.domain.com" http://<gateway-ip>/

# Check route configuration
<cli-tool> dump --server <dashboard-url> --token <TOKEN>
```

### Gateway Service Unavailable (503)

**Symptoms**:
- Requests return HTTP 503
- Gateway is running but can't reach backends

**Diagnosis**:
```bash
# Check backend service exists
kubectl get svc -n <namespace> <backend-service-name>

# Check backend endpoints
kubectl get endpoints -n <namespace> <backend-service-name>

# Check gateway logs
kubectl logs -n <namespace> -l app.kubernetes.io/name=gateway --tail=100 | grep -i "503\|upstream"
```

**Solutions**:

**1. Backend Service Not Found**
```bash
# Verify service exists
kubectl get svc -n <namespace>

# Check service name in route config matches
<cli-tool> dump --server <URL> --token <TOKEN> | grep -A 5 upstream
```

**2. No Healthy Endpoints**
```bash
# Check if pods are running
kubectl get pods -n <namespace> -l app=<backend-app>

# Verify endpoints exist
kubectl get endpoints -n <namespace> <service-name>

# If empty, check service selector
kubectl get svc <service-name> -n <namespace> -o yaml | grep -A 3 selector
kubectl get pods -n <namespace> --show-labels | grep <label>
```

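An empty endpoints list usually means the Service selector does not match the pod labels. The comparison can be sketched as a small shell function; `labels_match_selector` is a hypothetical name, and both arguments are assumed to be comma-separated `key=value` strings, which is how `kubectl get svc -o wide` prints the SELECTOR column and how `--show-labels` prints pod labels.

```bash
# Hypothetical helper: succeeds (exit 0) only when every key=value pair of
# the selector (first argument) also appears in the pod's label string
# (second argument). Pairs are compared whole, so app=web does not match
# app=webserver.
labels_match_selector() {
  selector=$1
  labels=$2
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;                      # this pair is present, keep going
      *) IFS=$old_ifs; return 1 ;;         # missing pair: selector does not match
    esac
  done
  IFS=$old_ifs
  return 0
}

# Usage:
#   labels_match_selector "app=web" "app=web,pod-template-hash=5d9c" && echo "selector matches"
```
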
**3. Service Discovery Not Working**
- See [Service Discovery Issues](#service-discovery-issues)

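The two gateway failure modes above can be told apart by the HTTP status code alone. As a rough sketch, the hypothetical `triage_status` helper below maps a status code to the checks listed in those sections; the wording of the hints is a summary made for this example, not output from any tool.

```bash
# Hypothetical triage helper: maps an HTTP status code to the checks
# described in the 404 and 503 sections of this guide.
triage_status() {
  case $1 in
    404) echo "404: check route publication, gateway group, Host header" ;;
    503) echo "503: check backend service, endpoints, service discovery" ;;
    2??|3??) echo "$1: gateway answered normally" ;;
    *) echo "$1: not covered here, inspect gateway logs" ;;
  esac
}

# Usage (replace GATEWAY_IP and the Host value before running):
#   triage_status "$(curl -s -o /dev/null -w '%{http_code}' -H 'Host: app.domain.com' http://GATEWAY_IP/)"
```
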
## Service Discovery Issues

### Service Registry Not Found

**Error**: `service registry not found` or `discovery failed`

**Diagnosis**:
```bash
# Check service registry in Dashboard
# Navigate: Settings → Service Registry
# Verify: Status shows "Connected" or "Healthy"

# Check gateway logs
kubectl logs -n <namespace> -l app.kubernetes.io/name=gateway | grep -i discovery
```

**Solutions**:

**1. Service Registry Not Configured**
- Dashboard → Settings → Service Registry → Add Service Registry
- Type: Kubernetes
- Configure:
  ```
  Name: kubernetes-cluster
  API Server: https://kubernetes.default.svc.cluster.local:443
  Token Path: /var/run/secrets/kubernetes.io/serviceaccount/token
  ```

**2. RBAC Permissions Missing**
```bash
# Check permissions
kubectl auth can-i list endpoints --as=system:serviceaccount:<namespace>:default

# If "no", create ClusterRoleBinding
kubectl create clusterrolebinding gateway-discovery \
  --clusterrole=view \
  --serviceaccount=<namespace>:default
```

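The permission check can be repeated for several verbs in one pass. The sketch below assumes that discovery needs `get`, `list`, and `watch` on endpoints; only `list` is confirmed by this guide, so treat the other two verbs as an assumption. `check_discovery_rbac` is a hypothetical helper name.

```bash
# Hypothetical helper: asks the API server whether a service account may
# get, list, and watch endpoints. Prints each missing verb and returns 1
# if any is missing.
# Arguments: NAMESPACE [SERVICE_ACCOUNT] (service account defaults to "default").
check_discovery_rbac() {
  sa="system:serviceaccount:$1:${2:-default}"
  missing=0
  for verb in get list watch; do   # get/watch are assumed, see note above
    answer=$(kubectl auth can-i "$verb" endpoints --as="$sa" 2>/dev/null)
    if [ "$answer" != "yes" ]; then
      echo "missing: $verb endpoints for $sa"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (replace NAMESPACE before running):
#   check_discovery_rbac NAMESPACE || echo "create the ClusterRoleBinding"
```
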
**3. Service Port Not Named**
```bash
# Check service definition
kubectl get svc <service-name> -n <namespace> -o yaml

# Port MUST have a name:
ports:
- port: 80
  targetPort: 8000
  name: http  # ← Required for service discovery
```

### Endpoints Not Discovered

**Symptoms**:
- Service discovery configured but endpoints not updating
- Scaling pods doesn't update gateway

**Diagnosis**:
```bash
# Check service endpoints
kubectl get endpoints -n <namespace> <service-name>

# Scale pods and verify endpoints update
kubectl scale deployment <name> -n <namespace> --replicas=5
kubectl get endpoints -n <namespace> <service-name>

# Check gateway discovers endpoints
kubectl logs -n <namespace> -l app.kubernetes.io/name=gateway | grep -i endpoint
```

**Solutions**:

**1. Check Service Registry Connection**
```bash
# In Dashboard, verify registry status
# Settings → Service Registry → kubernetes-cluster
# Status should be "Connected"
```

**2. Verify Service Name Format**
```yaml
# Format: <namespace>/<service-name>:<port-name>
upstream:
  discovery_type: kubernetes
  service_name: <namespace>/web-service:http
  # NOT: web-service or web-service.<namespace>.svc.cluster.local
```

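The `<namespace>/<service-name>:<port-name>` shape can be checked before a configuration is synced. The function below is a loose sketch: `is_discovery_name` is a hypothetical name, and the pattern only checks the overall shape (lowercase letters, digits, and hyphens in each part), not the full Kubernetes naming rules.

```bash
# Hypothetical helper: succeeds when the argument looks like
# <namespace>/<service-name>:<port-name>. This is a shape check only.
is_discovery_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9-]+/[a-z0-9-]+:[a-z0-9-]+$'
}

# Usage:
#   is_discovery_name "demo/web-service:http" && echo "format looks right"
```

Both rejected forms from the comment above fail this check: a bare `web-service` has no namespace or port name, and a cluster DNS name contains neither `/` nor `:`.
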
**3. Restart Gateway Pods**
```bash
kubectl rollout restart deployment/<gateway-deployment> -n <namespace>
```

## Ingress & Certificate Issues

### Certificate Not Trusted / Invalid

**Symptoms**:
- Browser shows "Not Secure" warning
- Certificate errors in logs

**Diagnosis**:
```bash
# Check certificate
kubectl get certificate -n <namespace>

# Describe certificate for errors
kubectl describe certificate <cert-name> -n <namespace>

# Check cert-manager logs
kubectl logs -n cert-manager -l app.kubernetes.io/name=cert-manager --tail=50

# Test certificate
openssl s_client -connect demo.domain.com:443 -servername demo.domain.com < /dev/null 2>/dev/null | openssl x509 -noout -dates -issuer
```

**Solutions**:

**1. Certificate Not Ready**
```bash
# Check certificate status
kubectl get certificate -n <namespace>

# If not "True", check cert-manager logs
kubectl logs -n cert-manager -l app.kubernetes.io/name=cert-manager
```

**2. DNS Challenge Failed**
```bash
# Check ClusterIssuer
kubectl get clusterissuer

# Verify Cloudflare API token
kubectl get secret cloudflare-api-token-secret -n cert-manager

# Check challenge status
kubectl get challenge -A
```

**3. Manual Certificate Creation**
```bash
# If cert-manager fails, use acme.sh
export CF_Token="<CLOUDFLARE_TOKEN>"
~/.acme.sh/acme.sh --issue --dns dns_cf -d "*.domain.com" -d "domain.com"

# Create Kubernetes secret
# Paths are quoted and use $HOME: "~" is not expanded after "--cert=",
# and the acme.sh directory name contains a literal "*"
kubectl create secret tls wildcard-tls \
  --cert="$HOME/.acme.sh/*.domain.com_ecc/fullchain.cer" \
  --key="$HOME/.acme.sh/*.domain.com_ecc/*.domain.com.key" \
  -n <namespace>
```

### LoadBalancer Stuck in Pending

**Symptoms**:
- Ingress EXTERNAL-IP shows `<pending>`
- Cannot access services externally

**Diagnosis**:
```bash
# Check MetalLB
kubectl get pods -n metallb-system

# Check IPAddressPool
kubectl get ipaddresspool -A

# Check service
kubectl describe svc -n ingress-nginx nginx-ingress-lb-custom
```

**Solutions**:

**1. MetalLB Not Running**
```bash
# Check MetalLB pods
kubectl get pods -n metallb-system

# Restart if needed
kubectl rollout restart deployment -n metallb-system
```

**2. IP Pool Exhausted**
```bash
# Check IP pool configuration
kubectl get ipaddresspool -A -o yaml

# Check allocated IPs
kubectl get svc -A -o wide | grep LoadBalancer
```

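To judge whether a pool is exhausted, compare the number of LoadBalancer services with the number of addresses in the pool. MetalLB pools can list addresses as a `first-last` range, and the size of such a range is simple arithmetic. The sketch below handles only that dash form (not CIDR notation); `ip_to_int` and `range_size` are hypothetical helper names, and the addresses in the usage line are made-up examples.

```bash
# Hypothetical helper: converts a dotted IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: prints how many addresses a "first-last" range holds.
range_size() {
  start=${1%-*}        # text before the dash
  end=${1#*-}          # text after the dash
  echo $(( $(ip_to_int "$end") - $(ip_to_int "$start") + 1 ))
}

# Usage:
#   range_size 192.168.1.240-192.168.1.250   # prints 11
```

If the count of LoadBalancer services equals the range size, the next service has no address left to claim and stays in `<pending>`.
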
**3. Annotation Error**
```bash
# Check MetalLB annotation
kubectl get svc <name> -n <namespace> -o yaml | grep metallb

# Correct format:
metadata:
  annotations:
    metallb.universe.tf/loadBalancerIPs: "<external-ip>"
```

### Ingress Returns 503 Backend Unavailable

**Symptoms**:
- NGINX Ingress returns 503
- Backend service is running

**Diagnosis**:
```bash
# Check ingress
kubectl describe ingress <name> -n <namespace>

# Check NGINX logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx

# Check backend service
kubectl get svc <backend-service> -n <namespace>
```

**Solutions**:

**1. Wrong Backend Service**
```bash
# Verify ingress points to the gateway service, not the application
kubectl get ingress <name> -n <namespace> -o yaml | grep -A 5 backend

# Should be:
backend:
  service:
    name: <gateway-deployment>-gateway
    port:
      number: 80
```

**2. Service Port Mismatch**
```bash
# Check service ports
kubectl get svc <gateway-deployment>-gateway -n <namespace>

# Ingress should point to port 80, not 9080
```

## Control Plane Issues

### Dashboard Not Accessible

**Symptoms**:
- Cannot access `https://<dashboard-url>`
- Connection timeout or refused

**Diagnosis**:
```bash
# Check dashboard pod
kubectl get pods -n <namespace> -l app=<namespace>3-dashboard

# Check dashboard service
kubectl get svc -n <namespace> <namespace>3-0-1759339083-dashboard

# Check ingress
kubectl get ingress -n <namespace> <namespace>3-0-1759339083-dashboard
```

**Solutions**:

**1. Pod Not Running**
```bash
# Check pod status
kubectl get pods -n <namespace> -l app=<namespace>3-dashboard

# View logs
kubectl logs -n <namespace> -l app=<namespace>3-dashboard
```

**2. Port Forward as Workaround**
```bash
kubectl port-forward -n <namespace> svc/<namespace>3-0-1759339083-dashboard 7080:7080

# Access at http://localhost:7080
```

### PostgreSQL Connection Failed

**Symptoms**:
- Dashboard/Portal shows database errors
- Logs show "connection refused" to PostgreSQL

**Diagnosis**:
```bash
# Check PostgreSQL pod
kubectl get pods -n <namespace> -l app=postgresql

# Check PostgreSQL service
kubectl get svc -n <namespace> postgresql

# Test connection from dashboard pod
kubectl exec -n <namespace> -it <dashboard-pod> -- psql -h postgresql -U <namespace> -d <namespace>
```

**Solutions**:

**1. PostgreSQL Pod Not Running**
```bash
# Check pod status
kubectl get pods -n <namespace> postgresql-0

# View logs
kubectl logs -n <namespace> postgresql-0
```

**2. Credentials Mismatch**
```bash
# Check credentials in secret
kubectl get secret postgresql -n <namespace> -o jsonpath='{.data.postgres-password}' | base64 -d

# Compare with DSN in dashboard config
kubectl get configmap <namespace>3-0-1759339083-dashboard-config -n <namespace> -o yaml | grep dsn
```

**3. Storage Issues**
```bash
# Check PVC
kubectl get pvc -n <namespace> data-postgresql-0

# If storage full, expand PVC (if storage class supports it)
kubectl patch pvc data-postgresql-0 -n <namespace> -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
```

## Application Issues

### Image Pull Errors

**Error**: `ImagePullBackOff` or `ErrImagePull`

**Diagnosis**:
```bash
# Check pod status
kubectl get pods -n <namespace>

# Describe pod
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 Events

# Check image name
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].image}'
```

**Solutions**:

**1. Registry Authentication**
```bash
# Create registry secret
kubectl create secret docker-registry registry-secret \
  --docker-server=<registry-url> \
  --docker-username=<USERNAME> \
  --docker-password=<TOKEN> \
  -n <namespace>

# Add to deployment
spec:
  template:
    spec:
      imagePullSecrets:
      - name: registry-secret
```

**2. Image Does Not Exist**
```bash
# Verify image exists
docker pull <registry-url>/web:main

# Check available tags via Gitea UI or API
curl -u <username>:<token> https://<registry-url>/api/v1/packages/demos
```

**3. Wrong Image Name**
```bash
# Correct format:
<registry-url>/web:main

# NOT:
<registry-url>:main  # Missing /web
```

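The naming rule above (a repository path after the registry, then a tag) can be checked mechanically. This is a minimal sketch with a hypothetical helper name, `image_ref_ok`; it only checks that the reference has a `/` and that its last path component has the form `name:tag`, and `registry.example.com` is a made-up host.

```bash
# Hypothetical helper: succeeds when the image reference contains a
# repository path and its last path component looks like name:tag.
image_ref_ok() {
  case $1 in
    */*) ;;                 # must contain at least one "/"
    *) return 1 ;;
  esac
  last=${1##*/}             # component after the final "/"
  case $last in
    ?*:?*) return 0 ;;      # non-empty name and non-empty tag
    *) return 1 ;;
  esac
}

# Usage:
#   image_ref_ok "registry.example.com/web:main" || echo "check the image name"
```
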
### Application Crashing

**Symptoms**:
- Pods in `CrashLoopBackOff`
- Application logs show errors

**Diagnosis**:
```bash
# Check pod logs
kubectl logs -n <namespace> <pod-name>

# Check previous pod logs (if restarted)
kubectl logs -n <namespace> <pod-name> --previous

# Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
```

**Solutions**:

**1. Port Mismatch**
```bash
# Verify app runs on correct port
# Check Dockerfile CMD or deployment env vars

# Common issue: app listens on 8000, but the container port is set to 3000
```

**2. Missing Dependencies**
```bash
# Check application logs for import errors
kubectl logs -n <namespace> <pod-name>

# Rebuild image with correct requirements.txt
```

**3. Resource Limits**
```bash
# Check if OOMKilled
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 "Last State"

# If memory limit too low, increase:
resources:
  limits:
    memory: 512Mi  # Increase from 256Mi
```

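The OOMKilled check can be turned into a yes/no test for scripting. The sketch below assumes the usual `kubectl describe pod` layout, where a terminated container shows a `Reason:` line under `Last State`; `was_oom_killed` is a hypothetical helper name.

```bash
# Hypothetical helper: reads `kubectl describe pod` output on stdin and
# succeeds when any container reports "Reason: OOMKilled".
was_oom_killed() {
  grep -Eq 'Reason:[[:space:]]+OOMKilled'
}

# Usage (replace POD_NAME and NAMESPACE before running):
#   kubectl describe pod POD_NAME -n NAMESPACE | was_oom_killed && echo "raise the memory limit"
```
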
## CLI / Configuration Issues

### CLI Sync Fails

**Error**: `failed to sync configuration` or authentication errors

**Diagnosis**:
```bash
# Test CLI connection
<cli-tool> ping \
  --backend <namespace> \
  --server https://<dashboard-url> \
  --token <TOKEN> \
  --tls-skip-verify

# Validate configuration file
<cli-tool> validate -f config.yaml
```

**Solutions**:

**1. Invalid Token**
```bash
# Generate new token in Dashboard
# User → API Tokens → Generate Token

# Test with new token
<cli-tool> sync -f config.yaml --server <URL> --token <NEW_TOKEN>
```

**2. YAML Syntax Error**
```bash
# Validate YAML
<cli-tool> validate -f config.yaml

# Or use yq/yamllint
yq eval '.' config.yaml
```

**3. SSL Certificate Error**
```bash
# Use --tls-skip-verify flag (for self-signed certs)
<cli-tool> sync -f config.yaml --server <URL> --token <TOKEN> --tls-skip-verify
```

### Configuration Not Applied

**Symptoms**:
- CLI sync succeeds but changes not visible
- Routes not working as expected

**Diagnosis**:
```bash
# Dump current configuration
<cli-tool> dump --backend <namespace> --server <URL> --token <TOKEN> > current.yaml

# Compare with expected
diff config.yaml current.yaml
```

**Solutions**:

**1. Routes Not Published**
- Synced routes are NOT active until published
- Publish via Dashboard UI

**2. Wrong Gateway Group**
```bash
# Specify correct gateway group
<cli-tool> sync -f config.yaml --gateway-group default
```

**3. Cache Issue**
```bash
# Restart gateway pods to force reload
kubectl rollout restart deployment/<gateway-deployment> -n <namespace>
```

## Useful Debugging Commands

### Check All Resources

```bash
# All resources in namespace
kubectl get all -n <namespace>

# Wide output with more details
kubectl get all -n <namespace> -o wide

# All resource types including configmaps, secrets
kubectl get all,cm,secret,ingress,pvc -n <namespace>
```

### Log Collection

```bash
# All gateway logs
kubectl logs -n <namespace> -l app.kubernetes.io/name=gateway --all-containers=true --tail=200

# Dashboard logs
kubectl logs -n <namespace> -l app=<namespace>3-dashboard --tail=100

# Stream logs in real-time
kubectl logs -n <namespace> -l app.kubernetes.io/name=gateway -f
```

### Network Testing

```bash
# Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- sh

# Inside pod:
curl http://<gateway-deployment>-gateway.<namespace>.svc.cluster.local
curl http://web-service.<namespace>.svc.cluster.local
```

### Performance Analysis

```bash
# Check resource usage
kubectl top pods -n <namespace>
kubectl top nodes

# Describe for resource limits
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 Limits
```

---

*Comprehensive troubleshooting guide for API Gateway infrastructure deployment.*