Setting Up the Monitoring Stack
Here’s the situation: you need metrics from every Edera-enabled node in your cluster. Manually configuring scrape targets? That doesn’t scale. Hard-coding node IPs? That breaks when nodes rotate.
The solution is Kubernetes service discovery with automatic node detection. Prometheus will discover every node in your cluster, scrape Edera metrics from port 3035, and store them with proper labels. Grafana will visualize everything with a purpose-built dashboard.
This isn’t a toy setup. You’re deploying a production-ready stack with:
- Persistent storage (50GB for Prometheus, 10GB for Grafana)
- Proper RBAC permissions
- Automatic service discovery
- Pre-configured dashboards
- Health checks and resource limits
Architecture Overview
The monitoring stack runs in a dedicated edera-monitoring namespace with these components:
┌────────────────────────────────────────────────────┐
│                    EKS Cluster                      │
│                                                     │
│  ┌──────────────────────────────────────────────┐  │
│  │          edera-monitoring namespace          │  │
│  │                                              │  │
│  │  ┌──────────────┐        ┌──────────────┐    │  │
│  │  │  Prometheus  │        │   Grafana    │    │  │
│  │  │              │───────▶│              │    │  │
│  │  │  Port: 9090  │        │  Port: 3000  │    │  │
│  │  │              │        │              │    │  │
│  │  │  Storage:    │        │  Storage:    │    │  │
│  │  │  50GB PVC    │        │  10GB PVC    │    │  │
│  │  └──────┬───────┘        └──────────────┘    │  │
│  │         │                                    │  │
│  └─────────┼────────────────────────────────────┘  │
│            │  Scrapes metrics                      │
│            │  via K8s service discovery            │
│            ▼                                       │
│  ┌──────────────────────────────────────────────┐  │
│  │          EKS Nodes (Edera-enabled)           │  │
│  │                                              │  │
│  │   Node 1:  :3035/metrics                     │  │
│  │   Node 2:  :3035/metrics                     │  │
│  │   Node N:  :3035/metrics                     │  │
│  └──────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────┘
How it works:
- Prometheus uses the Kubernetes API to discover all nodes (via kubernetes_sd_configs); a quick way to preview what it will find is sketched after this list
- A relabel configuration targets port 3035 on each node’s IP
- Metrics are scraped every 10 seconds
- Data is stored in persistent EBS volumes
- Grafana queries Prometheus and renders the dashboard
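If you want to preview the node discovery before deploying anything, the following kubectl query (one of several ways to pull this field) lists each node’s InternalIP. These are the addresses the relabel rule will rewrite to port 3035.
# List the InternalIP of every node -- the targets Prometheus will scrape on :3035
kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'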
Prerequisites
Before deploying, ensure you have:
- An EKS cluster with Edera installed - Nodes must be exposing metrics on port 3035
- kubectl configured - Access to your cluster with admin permissions
- Storage class configured - For automatic PVC provisioning (EBS gp3 recommended)
- The monitoring stack repository - Clone from https://github.com/edera-dev/edera-prom
Verify your cluster is ready:
# Check kubectl access
kubectl cluster-info
# Verify storage class exists
kubectl get storageclass
Deployment Steps
Step 1: Clone the Repository
git clone https://github.com/edera-dev/edera-prom.git
cd edera-prom
Step 2: Review the Configuration
The main configuration file is monitoring-stack.yaml. Let’s understand the key settings:
Prometheus Configuration (lines 9-47):
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'edera'
    scrape_interval: 10s
    scrape_timeout: 5s
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - source_labels: [__address__]
        regex: '([^:]+)(?::\d+)?'
        replacement: '${1}:3035'
        target_label: __address__
What’s happening here:
- kubernetes_sd_configs with role: node discovers all cluster nodes automatically
- The relabel configuration rewrites discovered addresses to target port 3035
- Scrape interval is 10 seconds (configurable based on your scale)
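To make the relabeling concrete: a node discovered with role: node arrives with its Kubelet address in __address__ (something like 10.0.1.23:10250; the IP here is illustrative), and the rule above rewrites it to 10.0.1.23:3035. Once the stack is deployed in Step 3, you can confirm the rewritten scrape URLs through the Prometheus targets API:
# Confirm the rewritten targets (run after Step 3; stop the port-forward with kill %1)
kubectl port-forward -n edera-monitoring svc/prometheus 9090:9090 &
curl -s http://localhost:9090/api/v1/targets | grep -o '"scrapeUrl":"[^"]*"'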
Grafana Configuration (lines 239-333):
env:
  - name: GF_SECURITY_ADMIN_PASSWORD
    value: "feeltheteal"  # CHANGE THIS IN PRODUCTION
  - name: GF_USERS_ALLOW_SIGN_UP
    value: "false"
Important: The default password is feeltheteal. You should change this before deploying to production.
Step 3: Deploy the Stack
The repository includes an automated deployment script:
chmod +x edera-setup-script.sh
./edera-setup-script.sh
This script will (a rough manual equivalent is sketched after this list):
- Verify prerequisites (kubectl, cluster access)
- Create the edera-monitoring namespace
- Deploy Prometheus with RBAC permissions
- Deploy Grafana with dashboard provisioning
- Create the dashboard ConfigMap
- Wait for pods to become ready
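If you prefer to apply the manifests by hand rather than run the script, a rough equivalent of what it automates is sketched below. The dashboard file name and the app=grafana label are assumptions; check the repository for the exact names the script uses.
# Manual sketch of the script's steps (file and label names are assumptions)
# monitoring-stack.yaml is expected to create the edera-monitoring namespace and RBAC
kubectl apply -f monitoring-stack.yaml
kubectl create configmap grafana-dashboards -n edera-monitoring \
  --from-file=edera-dashboard.json \
  --dry-run=client -o yaml | kubectl apply -f -
kubectl wait --for=condition=ready pod -l app=grafana -n edera-monitoring --timeout=300s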
Expected output from the script:
✓ kubectl is installed
✓ Cluster is accessible
✓ Configuration file exists
✓ Applying Kubernetes manifests...
✓ Creating dashboard ConfigMap...
✓ Waiting for pods to be ready...
✓ Prometheus is running
✓ Grafana is running
Setup complete! Access Grafana at:
http://<EXTERNAL-IP>:3000
Username: admin
Password: feeltheteal
Note: The LoadBalancer external IP can take 2-5 minutes to provision on AWS.
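If you would rather watch the LoadBalancer come up than poll for it, this works (service name and namespace as deployed above):
# Watch until AWS assigns the external hostname, then Ctrl-C
kubectl get svc grafana -n edera-monitoring -w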
Step 4: Verify Deployment
Check that all components are running:
# Check namespace
kubectl get ns edera-monitoring
# Check pods
kubectl get pods -n edera-monitoring
# Expected output:
# NAME                       READY   STATUS    RESTARTS   AGE
# prometheus-0               1/1     Running   0          2m
# grafana-xxxxxxxxxx-xxxxx   1/1     Running   0          2m
# Check services
kubectl get svc -n edera-monitoring
# Expected output:
# NAME         TYPE           CLUSTER-IP   EXTERNAL-IP        PORT(S)
# prometheus   ClusterIP      10.100.x.x   <none>             9090/TCP
# grafana      LoadBalancer   10.100.x.x   a1b2c3.us-west-2   3000:xxxxx/TCP
Step 5: Verify Metric Collection
The repository includes a test script to verify everything is working:
chmod +x edera-test-script.sh
./edera-test-script.sh
This checks:
- Namespace and pod status
- Edera metrics endpoint accessibility on each node
- Prometheus target scraping status
- Grafana health endpoint
- Dashboard availability
Manual verification:
# Port-forward to Prometheus
kubectl port-forward -n edera-monitoring svc/prometheus 9090:9090
# Open http://localhost:9090 and check:
# 1. Status → Targets → All nodes should be UP
# 2. Graph → Execute query: zones
# 3. You should see metrics with zone_id labels
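The same check can be scripted against the Prometheus HTTP API, assuming the port-forward above is still running in another terminal:
# Query the zones metric directly; a non-empty "result" array means metrics are flowing
curl -s 'http://localhost:9090/api/v1/query?query=zones'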
Access Methods
You have two options for accessing Grafana:
Option 1: LoadBalancer (Production)
Get the external IP and access directly:
kubectl get svc grafana -n edera-monitoring -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
# Access at: http://<EXTERNAL-IP>:3000
# Username: admin
# Password: feeltheteal
Option 2: Port-Forward (Development/Testing)
Forward the Grafana port to localhost:
kubectl port-forward -n edera-monitoring svc/grafana 3000:3000
# Access at: http://localhost:3000
# Username: admin
# Password: feeltheteal
Common Deployment Issues
Issue: Pods stuck in “Pending” state
Check PVC binding:
kubectl get pvc -n edera-monitoring
If PVCs are “Pending”, verify your storage class:
kubectl get storageclass
# Ensure a default storage class is set
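If no class is marked as default, you can designate one. The gp3 name below is only an example; substitute whatever kubectl get storageclass reports in your cluster.
# Mark an existing storage class as the default (class name is an example)
kubectl patch storageclass gp3 \
  -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'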
Issue: No metrics appearing in Grafana
Verify Prometheus targets:
kubectl port-forward -n edera-monitoring svc/prometheus 9090:9090
# Navigate to: http://localhost:9090/targets
# All nodes should show as UP
If targets are down, check the Edera metrics endpoint:
# Get a node IP
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
# Test metrics endpoint
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://$NODE_IP:3035/metrics
Issue: Can’t access Grafana LoadBalancer
Check security groups:
kubectl describe svc grafana -n edera-monitoring
# Look for "LoadBalancer Ingress" events
Ensure your EKS node security groups allow ingress on port 3000.
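If the security group is the blocker, an AWS CLI call along these lines opens the port; the group ID and CIDR below are placeholders for your own values.
# Allow inbound 3000/tcp from a trusted CIDR (substitute your node security group ID)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 3000 \
  --cidr 203.0.113.0/24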
Issue: Dashboard not auto-loading
Verify the dashboard ConfigMap exists:
kubectl get configmap grafana-dashboards -n edera-monitoring
# Check dashboard provisioning:
kubectl logs -n edera-monitoring deployment/grafana | grep -i dashboard
Customization Options
Changing the admin password:
Edit monitoring-stack.yaml line 263:
- name: GF_SECURITY_ADMIN_PASSWORD
  value: "your-secure-password-here"
Then reapply:
kubectl apply -f monitoring-stack.yaml
kubectl rollout restart deployment/grafana -n edera-monitoring
Adjusting scrape intervals:
For large clusters (100+ nodes), you may want to reduce scrape frequency. Edit the Prometheus ConfigMap:
kubectl edit configmap prometheus-config -n edera-monitoring
# Change:
scrape_interval: 30s  # from 10s
Then reload Prometheus:
kubectl exec -n edera-monitoring prometheus-0 -- kill -HUP 1
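Alternatively, if your Prometheus was started with the --web.enable-lifecycle flag (not necessarily the case in this stack’s manifest), a reload can be triggered over HTTP:
# Only works when --web.enable-lifecycle is set on the Prometheus container
kubectl port-forward -n edera-monitoring svc/prometheus 9090:9090 &
curl -X POST http://localhost:9090/-/reload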
Increasing storage:
Edit PVC sizes in monitoring-stack.yaml lines 137 and 296 before deployment:
resources:
  requests:
    storage: 100Gi  # from 50Gi
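If the stack is already running, you may be able to grow the volume in place instead, provided the storage class sets allowVolumeExpansion: true. The PVC name below is an assumption; check kubectl get pvc -n edera-monitoring for the real one.
# Expand an existing PVC in place (PVC name is an assumption; requires allowVolumeExpansion)
kubectl patch pvc prometheus-data -n edera-monitoring \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'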
Adding HTTPS with Ingress:
Instead of LoadBalancer, use an Ingress with TLS:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: edera-monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - grafana.yourdomain.com
      secretName: grafana-tls
  rules:
    - host: grafana.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana
                port:
                  number: 3000
Production Hardening
Before using this in production, address these security considerations:
1. Change default credentials
- Update GF_SECURITY_ADMIN_PASSWORD to a strong password
- Consider using Kubernetes secrets instead of environment variables (a minimal sketch follows this list)
2. Implement RBAC least privilege
- The Prometheus ClusterRole has broad read permissions
- Review and restrict to only necessary resources for your environment
3. Enable TLS
- Use an Ingress controller with certificate management
- Consider AWS ACM with Application Load Balancer
4. Configure network policies
- Restrict traffic to the monitoring namespace
- Only allow necessary ingress/egress
5. Set resource limits
- Current limits are generous; tune based on your cluster size
- Monitor Prometheus memory usage and adjust as needed
6. Configure retention policies
- Default retention is 30 days
- Adjust based on your compliance and storage requirements
7. Implement backup strategy
- Back up Prometheus data (PVC snapshots)
- Back up Grafana dashboards and datasources
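As a starting point for item 1, here is a minimal sketch of moving the Grafana admin password into a Kubernetes Secret; the secret and key names are illustrative.
# Create the secret once (names are illustrative)
kubectl create secret generic grafana-admin -n edera-monitoring \
  --from-literal=admin-password='your-secure-password-here'
# Then replace the literal GF_SECURITY_ADMIN_PASSWORD value in the Grafana Deployment
# with an env entry that uses valueFrom.secretKeyRef (name: grafana-admin, key: admin-password).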
Your monitoring stack is now running. Prometheus is discovering nodes and scraping metrics. Grafana is ready to visualize everything.
Next: Let’s dive into understanding Edera metrics →
