Debugging Workflows
When things go wrong, follow a systematic approach. These workflows guide you from symptom to solution.
Workflow 1: Pod Won’t Start
Use this workflow when a pod with runtimeClassName: edera fails to start.
Step 1: Check Pod Status
kubectl get pod <pod-name> -n <namespace>
Note the pod state: Pending, ContainerCreating, CrashLoopBackOff, etc.
Step 2: Inspect Pod Events
kubectl describe pod <pod-name> -n <namespace>
Look at the Events section for error messages. Common events:
- Failed to create pod sandbox - CRI issue
- RuntimeClass not found - Missing RuntimeClass
- FailedScheduling - Node scheduling issue
- ImagePullBackOff - Image pull problem
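To see these events in order without the rest of the describe output, you can also query events directly (standard kubectl; the field selector filters to the pod by name):
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp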
Step 3: Verify RuntimeClass
kubectl get runtimeclass edera -o yaml
Ensure it exists and the handler is set to edera.
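If the RuntimeClass is missing entirely, a minimal manifest like the following recreates it. This is a sketch based on the handler name used in this guide; your Edera installation may also set scheduling or overhead fields, so prefer the manifest shipped with your install if you have it:
cat <<'EOF' | kubectl apply -f -
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: edera
handler: edera
EOF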
Step 4: Check Node Status
Find which node the pod is scheduled on (or should be):
kubectl get pod <pod-name> -n <namespace> -o wide
Check node health:
kubectl describe node <node-name>
kubectl top node
Look for:
- Node is Ready
- No resource pressure (MemoryPressure, DiskPressure, PIDPressure)
- Sufficient allocatable resources
Step 5: SSH to Node and Check Services
ssh root@<node-ip>
systemctl list-units --type=service | grep protect
All six Edera services must be active.
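To check them all in one pass, a small loop over the units works; this assumes every Edera service name starts with protect, as the names in this guide do:
for unit in $(systemctl list-units --type=service --all --plain --no-legend 'protect*' | awk '{print $1}'); do
  echo "$unit: $(systemctl is-active "$unit")"
done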
If any service is down:
systemctl status protect-cri
journalctl -u protect-cri -n 50
Step 6: Check CRI Logs
journalctl -u protect-cri --since "10 minutes ago"
Look for errors related to pod creation.
Step 7: Enable Debug Logging (if needed)
Edit the CRI service to enable debug logging:
nano /usr/lib/systemd/system/protect-cri.service
Find the Environment line and set:
Environment="RUST_LOG=debug"Reload and restart:
systemctl daemon-reload
systemctl restart protect-cri
Attempt to create the pod again and review logs:
journalctl -u protect-cri -f
Workflow 2: Node Not Ready
Use this workflow when a node shows NotReady status after Edera installation.
Step 1: Check Node Status
kubectl get nodes
kubectl describe node <node-name>
Look at the Conditions section for clues.
Step 2: SSH to Node and Check Kubelet
ssh root@<node-ip>
systemctl status kubelet
If the kubelet service has failed:
journalctl -u kubelet -n 100
Step 3: Verify Kubelet Configuration
cat /etc/default/kubelet
Ensure it points to the Edera CRI socket:
KUBELET_EXTRA_ARGS="--container-runtime-endpoint=unix:///var/lib/edera/protect/cri.socket"For LKE/Akamai, also include --cloud-provider=external.
Step 4: Verify CRI Socket Exists
ls -la /var/lib/edera/protect/cri.socket
The socket should exist. If not, check that the protect-cri service is running:
systemctl status protect-cri
Step 5: Restart Kubelet
systemctl restart kubelet
systemctl status kubelet
Step 6: Wait for Node to Rejoin
kubectl get nodes --watch
The node should transition to Ready within 1-2 minutes.
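Once the node is Ready, you can optionally confirm that the Edera CRI socket responds to CRI requests. This assumes crictl is installed on the node; it is standard CRI tooling, not part of Edera:
crictl --runtime-endpoint unix:///var/lib/edera/protect/cri.socket version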
Workflow 3: Service Failure
Use this workflow when Edera services fail to start or crash.
Step 1: Identify Failed Service
systemctl list-units --type=service | grep protect
Look for services not in active state.
Step 2: Check Service Status
systemctl status <service-name>
Note the exit code and any error messages.
Step 3: Review Logs
journalctl -u <service-name> -n 100 --no-pager
Look for:
- Crash messages
- Permission denied errors
- Missing dependencies
- Port conflicts
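A quick filter over the same logs can surface these patterns; the search terms below are illustrative, not an exhaustive list of Edera error strings:
journalctl -u <service-name> -n 100 --no-pager | grep -iE 'error|denied|failed|address already in use'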
Step 4: Check Dependencies
Edera services depend on each other. Ensure prerequisite services are running:
- protect-daemon - Core service; others depend on it
- protect-cri - Depends on daemon and networking
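systemd can also print the dependency tree it knows about for a unit, which helps confirm start ordering (standard systemctl; the exact tree depends on how the Edera units are declared on your node):
systemctl list-dependencies protect-cri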
If protect-daemon is down, start it first:
systemctl start protect-daemon
systemctl status protect-daemon
Then start other services:
systemctl start protect-cri
systemctl start protect-networking-daemon
Step 5: Verify File Permissions
Check Edera binary permissions:
ls -la /usr/bin/protect*
Binaries should be executable by root.
Step 6: Reinstall if Necessary
If services won’t start and logs indicate corruption or missing files, consider reinstalling Edera on the node.
Workflow 4: Zone Not Created
Use this workflow when a pod starts but doesn’t appear in protect zone list.
Step 1: Verify Pod is Running
kubectl get pod <pod-name> -n <namespace>
Ensure pod status is Running.
Step 2: Check RuntimeClass Assignment
kubectl describe pod <pod-name> -n <namespace> | grep -i runtime
Verify Runtime Class Name: edera is present.
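For a script-friendly check, jsonpath prints just the field (empty output means no RuntimeClass is set):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.runtimeClassName}'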
Step 3: SSH to Node and List Zones
ssh root@<node-ip>
protect zone list
If the pod is missing, it's not being handled by Edera.
Step 4: Check CRI Logs
journalctl -u protect-cri --since "5 minutes ago" | grep <pod-name>
Look for messages indicating the pod was created via the Edera CRI.
Step 5: Verify Kubelet Configuration
cat /etc/default/kubelet
Ensure kubelet is using the Edera CRI socket.
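You can also confirm the flag reached the running kubelet process. Note this only shows command-line flags; if the endpoint is set in a kubelet config file instead, it will not appear here:
ps -o args= -C kubelet | tr ' ' '\n' | grep -- --container-runtime-endpoint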
Step 6: Delete and Recreate Pod
kubectl delete pod <pod-name> -n <namespace>
kubectl apply -f <pod-definition.yaml>
Monitor zone creation:
watch protect zone list
Workflow 5: Performance Issues
Use this workflow when pods are slow or resource usage is high.
Step 1: Check Node Resources
kubectl top nodes
Identify nodes with high CPU or memory usage.
Step 2: Check Pod Resources
kubectl top pods --all-namespaces --sort-by=cpu
kubectl top pods --all-namespaces --sort-by=memory
Identify resource-hungry pods.
Step 3: Review Zone Resource Usage
SSH to the node and check Xen hypervisor metrics:
xl top
This shows CPU and memory usage for each Edera zone.
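For a one-shot snapshot instead of the interactive view, xl list prints each domain's memory and vCPU allocation (standard Xen tooling; availability may vary by install):
xl list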
Step 4: Check for Resource Limits
kubectl describe pod <pod-name> -n <namespace>
Look at resource requests and limits. If limits are too low, pods may be throttled.
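To print just the requests and limits per container, a jsonpath query keeps the output short (standard kubectl):
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'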
Step 5: Review Application Logs
Application-level performance issues may not be Edera-related:
kubectl logs <pod-name> -n <namespace>
Check for slow queries, inefficient processing, or external dependencies.
Step 6: Scale Resources
If resource constraints are the issue:
Vertical scaling:
- Increase pod resource requests/limits
- Redeploy pods
Horizontal scaling:
- Add more pod replicas
- Scale the number of nodes
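For example, if the pods are managed by a Deployment, horizontal scaling is a single command (illustrative; substitute your workload's name and desired replica count):
kubectl scale deployment <deployment-name> -n <namespace> --replicas=3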
Debugging Tips
Start broad, then narrow:
- Check high-level status (nodes, pods) before diving into logs
- Eliminate obvious issues (missing RuntimeClass, service down) before deep debugging
Use timestamps:
- Correlate pod events with service logs
- Focus on logs around the time of failure
Enable verbose logging sparingly:
- Debug logging can generate huge volumes of data
- Enable only when necessary, disable after troubleshooting
Document your findings:
- Keep notes on what you check and what you find
- This helps when escalating to Edera support
Up next: Helpful Utilities →
