Setting Up the Monitoring Stack
Here’s the situation: you need metrics from every Edera-enabled node in your cluster. Manually configuring scrape targets? That doesn’t scale. Hard-coding node IPs? That breaks when nodes rotate.
The solution is Kubernetes service discovery with automatic node detection. Prometheus will discover every node in your cluster, scrape Edera metrics from port 3035, and store them with proper labels. Grafana will visualize everything with a purpose-built dashboard.
This isn’t a toy setup. You’re deploying a production-ready stack with:
- Persistent storage (50GB for Prometheus, 10GB for Grafana)
- Proper RBAC permissions
- Automatic service discovery
- Pre-configured dashboards
- Health checks and resource limits
Architecture Overview
The monitoring stack runs in a dedicated edera-monitoring namespace with these components:
┌────────────────────────────────────────────────────┐
│                    EKS Cluster                      │
│                                                     │
│  ┌──────────────────────────────────────────────┐  │
│  │          edera-monitoring namespace          │  │
│  │                                              │  │
│  │  ┌──────────────┐        ┌──────────────┐    │  │
│  │  │  Prometheus  │        │   Grafana    │    │  │
│  │  │              │───────▶│              │    │  │
│  │  │  Port: 9090  │        │  Port: 3000  │    │  │
│  │  │              │        │              │    │  │
│  │  │  Storage:    │        │  Storage:    │    │  │
│  │  │  50GB PVC    │        │  10GB PVC    │    │  │
│  │  └──────┬───────┘        └──────────────┘    │  │
│  │         │                                    │  │
│  └─────────┼────────────────────────────────────┘  │
│            │  Scrapes metrics                      │
│            │  via K8s service discovery            │
│            ▼                                       │
│  ┌──────────────────────────────────────────────┐  │
│  │          EKS Nodes (Edera-enabled)           │  │
│  │                                              │  │
│  │   Node 1:  :3035/metrics                     │  │
│  │   Node 2:  :3035/metrics                     │  │
│  │   Node N:  :3035/metrics                     │  │
│  └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────┘
How it works:
- Prometheus uses the Kubernetes API to discover all nodes (via kubernetes_sd_configs); a quick way to preview what it will find is sketched after this list
- A relabel configuration targets port 3035 on each node’s IP
- Metrics are scraped every 10 seconds
- Data is stored in persistent EBS volumes
- Grafana queries Prometheus and renders the dashboard
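If you want to preview the node discovery before deploying anything, the following kubectl query (one of several ways to pull this field) lists each node’s InternalIP. These are the addresses the relabel rule will rewrite to port 3035.
# List the InternalIP of every node -- the targets Prometheus will scrape on :3035
kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'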
Prerequisites
Before deploying, ensure you have:
- An EKS cluster with Edera installed - Nodes must be exposing metrics on port 3035
- kubectl configured - Access to your cluster with admin permissions
- Storage class configured - For automatic PVC provisioning (EBS gp3 recommended)
- The monitoring stack repository - Clone from https://github.com/edera-dev/edera-prom
Verify your cluster is ready:
# Check kubectl access
kubectl cluster-info
# Verify storage class exists
kubectl get storageclass
Deployment Steps
Step 1: Clone the Repository
git clone https://github.com/edera-dev/edera-prom.git
cd edera-prom
Step 2: Review the Configuration
The main configuration file is monitoring-stack.yaml. Let’s understand the key settings:
Prometheus Configuration (lines 9-47):
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'edera'
    scrape_interval: 10s
    scrape_timeout: 5s
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+)(?::\d+)?'
        replacement: '${1}:3035'
        target_label: __address__
What’s happening here:
- kubernetes_sd_configs with role: node discovers all cluster nodes automatically
- The relabel configuration rewrites discovered addresses to target port 3035
- Scrape interval is 10 seconds (configurable based on your scale)
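To make the relabeling concrete: a node discovered with role: node arrives with its Kubelet address in __address__ (something like 10.0.1.23:10250; the IP here is illustrative), and the rule above rewrites it to 10.0.1.23:3035. Once the stack is deployed in Step 3, you can confirm the rewritten scrape URLs through the Prometheus targets API:
# Confirm the rewritten targets (run after Step 3; stop the port-forward with kill %1)
kubectl port-forward -n edera-monitoring svc/prometheus 9090:9090 &
curl -s http://localhost:9090/api/v1/targets | grep -o '"scrapeUrl":"[^"]*"'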
Grafana Configuration (lines 239-333):
env:
  - name: GF_SECURITY_ADMIN_PASSWORD
    value: "feeltheteal"  # CHANGE THIS IN PRODUCTION
  - name: GF_USERS_ALLOW_SIGN_UP
    value: "false"
Important: The default password is feeltheteal. You should change this before deploying to production.
Step 3: Deploy the Stack
The repository includes an automated deployment script:
chmod +x edera-setup-script.sh
./edera-setup-script.sh
This script will (a rough manual equivalent is sketched after this list):
- Verify prerequisites (kubectl, cluster access)
- Create the edera-monitoring namespace
- Deploy Prometheus with RBAC permissions
- Deploy Grafana with dashboard provisioning
- Create the dashboard ConfigMap
- Wait for pods to become ready
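If you prefer to apply the manifests by hand rather than run the script, a rough equivalent of what it automates is sketched below. The dashboard file name and the app=grafana label are assumptions; check the repository for the exact names the script uses.
# Manual sketch of the script's steps (file and label names are assumptions)
# monitoring-stack.yaml is expected to create the edera-monitoring namespace and RBAC
kubectl apply -f monitoring-stack.yaml
kubectl create configmap grafana-dashboards -n edera-monitoring \
  --from-file=edera-dashboard.json \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl wait --for=condition=ready pod -l app=grafana -n edera-monitoring --timeout=300s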
Expected output from the script:
✓ kubectl is installed
✓ Cluster is accessible
✓ Configuration file exists
✓ Applying Kubernetes manifests...
✓ Creating dashboard ConfigMap...
✓ Waiting for pods to be ready...
✓ Prometheus is running
✓ Grafana is running
Setup complete! Access Grafana at:
http://<EXTERNAL-IP>:3000
Username: admin
Password: feeltheteal
Note: The LoadBalancer external IP can take 2-5 minutes to provision on AWS.
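If you would rather watch the LoadBalancer come up than poll for it, this works (service name and namespace as deployed above):
# Watch until AWS assigns the external hostname, then Ctrl-C
kubectl get svc grafana -n edera-monitoring -w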
Step 4: Verify Deployment
Check that all components are running:
# Check namespace
kubectl get ns edera-monitoring
# Check pods
kubectl get pods -n edera-monitoring
# Expected output:
# NAME                       READY   STATUS    RESTARTS   AGE
# prometheus-0               1/1     Running   0          2m
# grafana-xxxxxxxxxx-xxxxx   1/1     Running   0          2m
# Check services
kubectl get svc -n edera-monitoring
# Expected output:
# NAME         TYPE           CLUSTER-IP   EXTERNAL-IP        PORT(S)
# prometheus   ClusterIP      10.100.x.x   <none>             9090/TCP
# grafana      LoadBalancer   10.100.x.x   a1b2c3.us-west-2   3000:xxxxx/TCP
Step 5: Verify Metric Collection
The repository includes a test script to verify everything is working:
chmod +x edera-test-script.sh
./edera-test-script.sh
This checks:
- Namespace and pod status
- Edera metrics endpoint accessibility on each node
- Prometheus target scraping status
- Grafana health endpoint
- Dashboard availability
Manual verification:
# Port-forward to Prometheus
kubectl port-forward -n edera-monitoring svc/prometheus 9090:9090
# Open http://localhost:9090 and check:
# 1. Status → Targets → All nodes should be UP
# 2. Graph → Execute query: zones
# 3. You should see metrics with zone_id labels
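The same check can be scripted against the Prometheus HTTP API, assuming the port-forward above is still running in another terminal:
# Query the zones metric directly; a non-empty "result" array means metrics are flowing
curl -s 'http://localhost:9090/api/v1/query?query=zones'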
Access Methods
You have two options for accessing Grafana:
Option 1: LoadBalancer (Production)
Get the external IP and access directly:
kubectl get svc grafana -n edera-monitoring -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
# Access at: http://<EXTERNAL-IP>:3000
# Username: admin
# Password: feeltheteal
Option 2: Port-Forward (Development/Testing)
Forward the Grafana port to localhost:
kubectl port-forward -n edera-monitoring svc/grafana 3000:3000
# Access at: http://localhost:3000
# Username: admin
# Password: feeltheteal
Common Deployment Issues
Issue: Pods stuck in “Pending” state
Check PVC binding:
kubectl get pvc -n edera-monitoring
If PVCs are “Pending”, verify your storage class:
kubectl get storageclass
# Ensure a default storage class is set
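If no class is marked as default, you can designate one. The gp3 name below is only an example; substitute whatever kubectl get storageclass reports in your cluster.
# Mark an existing storage class as the default (class name is an example)
kubectl patch storageclass gp3 \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'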
Issue: No metrics appearing in Grafana
Verify Prometheus targets:
kubectl port-forward -n edera-monitoring svc/prometheus 9090:9090
# Navigate to: http://localhost:9090/targets
# All nodes should show as UP
If targets are down, check the Edera metrics endpoint:
# Get a node IP
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
# Test metrics endpoint
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://$NODE_IP:3035/metrics
Issue: Can’t access Grafana LoadBalancer
Check security groups:
kubectl describe svc grafana -n edera-monitoring
# Look for "LoadBalancer Ingress" events
Ensure your EKS node security groups allow ingress on port 3000.
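If the security group is the blocker, an AWS CLI call along these lines opens the port; the group ID and CIDR below are placeholders for your own values.
# Allow inbound 3000/tcp from a trusted CIDR (substitute your node security group ID)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 3000 \
  --cidr 203.0.113.0/24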
Issue: Dashboard not auto-loading
Verify the dashboard ConfigMap exists:
kubectl get configmap grafana-dashboards -n edera-monitoring
# Check dashboard provisioning:
kubectl logs -n edera-monitoring deployment/grafana | grep -i dashboard
Customization Options
Changing the admin password:
Edit monitoring-stack.yaml line 263:
- name: GF_SECURITY_ADMIN_PASSWORD
  value: "your-secure-password-here"
Then reapply:
kubectl apply -f monitoring-stack.yaml
kubectl rollout restart deployment/grafana -n edera-monitoring
Adjusting scrape intervals:
For large clusters (100+ nodes), you may want to reduce scrape frequency. Edit the Prometheus ConfigMap:
kubectl edit configmap prometheus-config -n edera-monitoring
# Change:
scrape_interval: 30s  # from 10s
Then reload Prometheus:
kubectl exec -n edera-monitoring prometheus-0 -- kill -HUP 1
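Alternatively, if your Prometheus was started with the --web.enable-lifecycle flag (not necessarily the case in this stack’s manifest), a reload can be triggered over HTTP:
# Only works when --web.enable-lifecycle is set on the Prometheus container
kubectl port-forward -n edera-monitoring svc/prometheus 9090:9090 &
curl -X POST http://localhost:9090/-/reload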
Increasing storage:
Edit PVC sizes in monitoring-stack.yaml lines 137 and 296 before deployment:
resources:
  requests:
    storage: 100Gi  # from 50Gi
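If the stack is already running, you may be able to grow the volume in place instead, provided the storage class sets allowVolumeExpansion: true. The PVC name below is an assumption; check kubectl get pvc -n edera-monitoring for the real one.
# Expand an existing PVC in place (PVC name is an assumption; requires allowVolumeExpansion)
kubectl patch pvc prometheus-data -n edera-monitoring \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'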
Adding HTTPS with Ingress:
Instead of LoadBalancer, use an Ingress with TLS:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: edera-monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - grafana.yourdomain.com
      secretName: grafana-tls
  rules:
    - host: grafana.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
Production Hardening
Before using this in production, address these security considerations:
1. Change default credentials
- Update GF_SECURITY_ADMIN_PASSWORD to a strong password
- Consider using Kubernetes secrets instead of environment variables (a minimal sketch follows this list)
2. Implement RBAC least privilege
- The Prometheus ClusterRole has broad read permissions
- Review and restrict to only necessary resources for your environment
3. Enable TLS
- Use an Ingress controller with certificate management
- Consider AWS ACM with Application Load Balancer
4. Configure network policies
- Restrict traffic to the monitoring namespace
- Only allow necessary ingress/egress
5. Set resource limits
- Current limits are generous; tune based on your cluster size
- Monitor Prometheus memory usage and adjust as needed
6. Configure retention policies
- Default retention is 30 days
- Adjust based on your compliance and storage requirements
7. Implement backup strategy
- Back up Prometheus data (PVC snapshots)
- Back up Grafana dashboards and datasources
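As a starting point for item 1, here is a minimal sketch of moving the Grafana admin password into a Kubernetes Secret; the secret and key names are illustrative.
# Create the secret once (names are illustrative)
kubectl create secret generic grafana-admin -n edera-monitoring \
  --from-literal=admin-password='your-secure-password-here'
# Then replace the literal GF_SECURITY_ADMIN_PASSWORD value in the Grafana Deployment
# with an env entry that uses valueFrom.secretKeyRef (name: grafana-admin, key: admin-password).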
Your monitoring stack is now running. Prometheus is discovering nodes and scraping metrics. Grafana is ready to visualize everything.
Next: Let’s dive into understanding Edera metrics →
