Debugging Workflows
When things go wrong, follow a systematic approach. These workflows guide you from symptom to solution.
Workflow 1: Pod Won’t Start
Use this workflow when a pod with runtimeClassName: edera fails to start.
Step 1: Check Pod Status
kubectl get pod <pod-name> -n <namespace>
Note the pod state: Pending, ContainerCreating, CrashLoopBackOff, etc.
Step 2: Inspect Pod Events
kubectl describe pod <pod-name> -n <namespace>
Look at the Events section for error messages. Common events:
- Failed to create pod sandbox - CRI issue
- RuntimeClass not found - Missing RuntimeClass
- FailedScheduling - Node scheduling issue
- ImagePullBackOff - Image pull problem
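To see these events in order without the rest of the describe output, you can also query events directly (standard kubectl; the field selector filters to the pod by name):
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp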
Step 3: Verify RuntimeClass
kubectl get runtimeclass edera -o yaml
Ensure it exists and the handler is set to edera.
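If the RuntimeClass is missing entirely, a minimal manifest like the following recreates it. This is a sketch based on the handler name used in this guide; your Edera installation may also set scheduling or overhead fields, so prefer the manifest shipped with your install if you have it:
cat <<'EOF' | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: edera
handler: edera
EOF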
Step 4: Check Node Status
Find which node the pod is scheduled on (or should be):
kubectl get pod <pod-name> -n <namespace> -o wide
Check node health:
kubectl describe node <node-name>
kubectl top node
Look for:
- Node is Ready
- No resource pressure (MemoryPressure, DiskPressure, PIDPressure)
- Sufficient allocatable resources
Step 5: SSH to Node and Check Services
ssh root@<node-ip>
systemctl list-units --type=service | grep protect
All six Edera services must be active.
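To check them all in one pass, a small loop over the units works; this assumes every Edera service name starts with protect, as the names in this guide do:
for unit in $(systemctl list-units --type=service --all --plain --no-legend 'protect*' | awk '{print $1}'); do
  echo "$unit: $(systemctl is-active "$unit")"
done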
If any service is down:
systemctl status protect-cri
journalctl -u protect-cri -n 50
Step 6: Check CRI Logs
journalctl -u protect-cri --since "10 minutes ago"
Look for errors related to pod creation.
Step 7: Enable Debug Logging (if needed)
Edit the CRI service to enable debug logging:
nano /usr/lib/systemd/system/protect-cri.service
Find the Environment line and set:
Environment="RUST_LOG=debug"Reload and restart:
systemctl daemon-reload
systemctl restart protect-cri
Attempt to create the pod again and review logs:
journalctl -u protect-cri -f
Workflow 2: Node Not Ready
Use this workflow when a node shows NotReady status after Edera installation.
Step 1: Check Node Status
kubectl get nodes
kubectl describe node <node-name>
Look at the Conditions section for clues.
Step 2: SSH to Node and Check Kubelet
ssh root@<node-ip>
systemctl status kubelet
If the kubelet service has failed:
journalctl -u kubelet -n 100
Step 3: Verify Kubelet Configuration
cat /etc/default/kubelet
Ensure it points to the Edera CRI socket:
KUBELET_EXTRA_ARGS="--container-runtime-endpoint=unix:///var/lib/edera/protect/cri.socket"For LKE/Akamai, also include --cloud-provider=external.
Step 4: Verify CRI Socket Exists
ls -la /var/lib/edera/protect/cri.socket
The socket should exist. If not, check that the protect-cri service is running:
systemctl status protect-cri
Step 5: Restart Kubelet
systemctl restart kubelet
systemctl status kubelet
Step 6: Wait for Node to Rejoin
kubectl get nodes --watch
The node should transition to Ready within 1-2 minutes.
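Once the node is Ready, you can optionally confirm that the Edera CRI socket responds to CRI requests. This assumes crictl is installed on the node; it is standard CRI tooling, not part of Edera:
crictl --runtime-endpoint unix:///var/lib/edera/protect/cri.socket version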
Workflow 3: Service Failure
Use this workflow when Edera services fail to start or crash.
Step 1: Identify Failed Service
systemctl list-units --type=service | grep protect
Look for services not in active state.
Step 2: Check Service Status
systemctl status <service-name>
Note the exit code and any error messages.
Step 3: Review Logs
journalctl -u <service-name> -n 100 --no-pager
Look for:
- Crash messages
- Permission denied errors
- Missing dependencies
- Port conflicts
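A quick filter over the same logs can surface these patterns; the search terms below are illustrative, not an exhaustive list of Edera error strings:
journalctl -u <service-name> -n 100 --no-pager | grep -iE 'error|denied|failed|address already in use'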
Step 4: Check Dependencies
Edera services depend on each other. Ensure prerequisite services are running:
- protect-daemon - Core service; others depend on it
- protect-cri - Depends on daemon and networking
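systemd can also print the dependency tree it knows about for a unit, which helps confirm start ordering (standard systemctl; the exact tree depends on how the Edera units are declared on your node):
systemctl list-dependencies protect-cri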
If protect-daemon is down, start it first:
systemctl start protect-daemon
systemctl status protect-daemon
Then start other services:
systemctl start protect-cri
systemctl start protect-networking-daemon
Step 5: Verify File Permissions
Check Edera binary permissions:
ls -la /usr/bin/protect*
Binaries should be executable by root.
Step 6: Reinstall if Necessary
If services won’t start and logs indicate corruption or missing files, consider reinstalling Edera on the node.
Workflow 4: Zone Not Created
Use this workflow when a pod starts but doesn’t appear in protect zone list.
Step 1: Verify Pod is Running
kubectl get pod <pod-name> -n <namespace>
Ensure pod status is Running.
Step 2: Check RuntimeClass Assignment
kubectl describe pod <pod-name> -n <namespace> | grep -i runtime
Verify Runtime Class Name: edera is present.
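For a script-friendly check, jsonpath prints just the field (empty output means no RuntimeClass is set):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.runtimeClassName}'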
Step 3: SSH to Node and List Zones
ssh root@<node-ip>
protect zone list
If the pod is missing, it's not being handled by Edera.
Step 4: Check CRI Logs
journalctl -u protect-cri --since "5 minutes ago" | grep <pod-name>
Look for messages indicating the pod was created via the Edera CRI.
Step 5: Verify Kubelet Configuration
cat /etc/default/kubelet
Ensure kubelet is using the Edera CRI socket.
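You can also confirm the flag reached the running kubelet process. Note this only shows command-line flags; if the endpoint is set in a kubelet config file instead, it will not appear here:
ps -o args= -C kubelet | tr ' ' '\n' | grep -- --container-runtime-endpoint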
Step 6: Delete and Recreate Pod
kubectl delete pod <pod-name> -n <namespace>
kubectl apply -f <pod-definition.yaml>
Monitor zone creation:
watch protect zone list
Workflow 5: Performance Issues
Use this workflow when pods are slow or resource usage is high.
Step 1: Check Node Resources
kubectl top nodes
Identify nodes with high CPU or memory usage.
Step 2: Check Pod Resources
kubectl top pods --all-namespaces --sort-by=cpu
kubectl top pods --all-namespaces --sort-by=memory
Identify resource-hungry pods.
Step 3: Review Zone Resource Usage
SSH to the node and check Xen hypervisor metrics:
xl top
This shows CPU and memory usage for each Edera zone.
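For a one-shot snapshot instead of the interactive view, xl list prints each domain's memory and vCPU allocation (standard Xen tooling; availability may vary by install):
xl list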
Step 4: Check for Resource Limits
kubectl describe pod <pod-name> -n <namespace>
Look at resource requests and limits. If limits are too low, pods may be throttled.
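To print just the requests and limits per container, a jsonpath query keeps the output short (standard kubectl):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'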
Step 5: Review Application Logs
Application-level performance issues may not be Edera-related:
kubectl logs <pod-name> -n <namespace>
Check for slow queries, inefficient processing, or external dependencies.
Step 6: Scale Resources
If resource constraints are the issue:
Vertical scaling:
- Increase pod resource requests/limits
- Redeploy pods
Horizontal scaling:
- Add more pod replicas
- Scale the number of nodes
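For example, if the pods are managed by a Deployment, horizontal scaling is a single command (illustrative; substitute your workload's name and desired replica count):
kubectl scale deployment <deployment-name> -n <namespace> --replicas=3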
Debugging Tips
Start broad, then narrow:
- Check high-level status (nodes, pods) before diving into logs
- Eliminate obvious issues (missing RuntimeClass, service down) before deep debugging
Use timestamps:
- Correlate pod events with service logs
- Focus on logs around the time of failure
Enable verbose logging sparingly:
- Debug logging can generate huge volumes of data
- Enable only when necessary, disable after troubleshooting
Document your findings:
- Keep notes on what you check and what you find
- This helps when escalating to Edera support
Up next: Helpful Utilities →
