TraceMyPods - DevOps & Cloud Architecture Guide
Table of Contents
- Infrastructure Overview
- AWS Resources
- Kubernetes Architecture
- CI/CD Pipeline
- Observability Stack
- Security Considerations
- Scaling Strategy
- Disaster Recovery
- Cost Optimization
- Operational Procedures
Infrastructure Overview
TraceMyPods is a production-grade AI platform deployed on AWS EKS with GPU-powered nodes. The architecture follows cloud-native principles with microservices, service mesh, and comprehensive observability.
Key Components:
Internet
│
▼
Route 53 DNS
│
▼
AWS Application Load Balancer
│
▼
Istio Ingress Gateway
│
┌───────────┴───────────┐
│ │
▼ ▼
AI Frontend Service TraceMyPods Paid Service
│ │
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ │ │ │
▼ ▼ ▼ ▼
askapi Service tokenapi Service Other API Services
│ │ │
└─────────┬───────────┘ │
│ │
▼ ▼
Redis Cache <───────────────────> MongoDB
│
▼
┌─────────────────────┐
│ │
▼ ▼
TinyLlama (GPU Node)    Mistral (GPU Node)
AWS Resources
EKS Cluster Configuration
The application runs on an AWS EKS cluster with two node groups:
- Standard Node Group:
- Instance type: t3.large (recommended)
- Min/Max nodes: 2/5
- Auto-scaling enabled
- Used for: Frontend, backend APIs, Redis, monitoring
- GPU Node Group:
- Instance type: g4dn.xlarge (NVIDIA T4 GPU)
- Min/Max nodes: 1/3
- Auto-scaling enabled with longer cooldown periods
- Used for: AI model inference (TinyLlama, Mistral)
- Node taints: gpu=true:NoSchedule
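The taint above keeps general workloads off the GPU nodes, so the model Deployments need a matching toleration and an explicit GPU request. A minimal sketch, assuming the namespace, node-group label, and public ollama/ollama image (the real manifests in EKS-Deploy/app-deploy-K/llmmodels.yaml are authoritative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-tinyllama
  namespace: ai-assistant            # assumed namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama-tinyllama
  template:
    metadata:
      labels:
        app: ollama-tinyllama
    spec:
      tolerations:
        - key: "gpu"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      nodeSelector:
        node-group: gpu              # assumed label on the GPU node group
      containers:
        - name: ollama
          image: ollama/ollama:latest
          resources:
            limits:
              nvidia.com/gpu: 1      # requires the NVIDIA device plugin on the node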
Networking
- VPC: Dedicated VPC with public and private subnets
- ALB: Application Load Balancer with TLS termination
- Route 53: DNS management for domain routing
- Security Groups:
- EKS control plane: 443 inbound from worker nodes
- Worker nodes: Allow cluster internal communication
- ALB: 80/443 inbound from internet
Storage
- EBS: For Redis persistent storage via StorageClass
- S3: For model artifacts and backups (optional)
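The EBS-backed Redis volume is typically wired up through a StorageClass and a PVC. A rough sketch, assuming the EBS CSI driver is installed; names and the 10Gi size are illustrative:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com         # AWS EBS CSI driver
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-data
  namespace: ai-assistant
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ebs-gp3
  resources:
    requests:
      storage: 10Gi                  # illustrative size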
Kubernetes Architecture
Namespace Structure
ai-assistant/
├── frontend/
│ └── ai-frontend deployment & service
├── backend/
│ ├── tracemypods-paid deployment & service
│ ├── askapi, tokenapi, orderapi, etc.
│ └── redis deployment & service with PVC
├── ai-models/
│ ├── ollama-tinyllama deployment & service
│ └── ollama-mistral deployment & service
└── monitoring/
├── prometheus, grafana, loki
├── jaeger, kiali
└── service monitors
Service Mesh (Istio)
- Gateway: Routes external traffic to internal services
- VirtualService: Path-based routing (/api → backend, / → frontend)
- mTLS: Enabled for service-to-service communication
- Traffic Management: Supports canary deployments and A/B testing
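The path-based routing described above could be expressed roughly as follows; the host wildcard, port numbers, and default API destination are assumptions, and EKS-Deploy/Istiod/istio-gateway.yaml remains the authoritative version:
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: tracemypods-gateway
  namespace: ai-assistant
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts: ["*"]
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: tracemypods-routes
  namespace: ai-assistant
spec:
  hosts: ["*"]
  gateways: ["tracemypods-gateway"]
  http:
    - match:
        - uri:
            prefix: /api
      route:
        - destination:
            host: askapi             # assumed default backend destination
            port:
              number: 8080           # assumed service port
    - route:
        - destination:
            host: ai-frontend
            port:
              number: 80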
Resource Management
- Resource Requests/Limits:
- Frontend: 100m/200m CPU, 256Mi/512Mi memory
- Backend APIs: 200m/500m CPU, 512Mi/1Gi memory
- Redis: 100m/200m CPU, 512Mi/1Gi memory
- AI Models: 500m/2000m CPU, 2Gi/4Gi memory, 1 GPU
- HPA (Horizontal Pod Autoscaler):
- Frontend: Scale based on CPU utilization (target: 70%)
- Backend: Scale based on CPU utilization (target: 70%)
- PDB (Pod Disruption Budget):
- Ensures high availability during voluntary disruptions
- minAvailable: 1 for each critical service
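As a concrete illustration of the HPA and PDB settings above, the backend manifests likely contain something close to the following sketch; the askapi names and the autoscaling/v2 API are assumptions mirroring the services listed earlier:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: askapi
  namespace: ai-assistant
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: askapi
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70     # matches the 70% CPU target above
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: askapi-pdb
  namespace: ai-assistant
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: askapi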
CI/CD Pipeline
GitHub Actions Workflow
name: TraceMyPods CI/CD
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1
      - name: Login to DockerHub
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push frontend
        uses: docker/build-push-action@v2
        with:
          context: ./appcode/frontend
          push: true
          tags: noscopev6/tracemypods-frontend:latest,noscopev6/tracemypods-frontend:${{ github.sha }}
      # Similar steps for other components
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-south-1
      - name: Update kubeconfig
        run: aws eks update-kubeconfig --name tracemypods-cluster --region ap-south-1
      - name: Deploy to EKS
        run: |
          # Update image tags in Kubernetes manifests
          sed -i "s|noscopev6/tracemypods-frontend:.*|noscopev6/tracemypods-frontend:${{ github.sha }}|g" EKS-Deploy/app-deploy-K/frontend.yaml
          # Apply Kubernetes manifests
          kubectl apply -f EKS-Deploy/app-deploy-K/namespace.yaml
          kubectl apply -f EKS-Deploy/app-deploy-K/storage-class.yaml
          kubectl apply -f EKS-Deploy/app-deploy-K/redis-pod.yaml
          kubectl apply -f EKS-Deploy/app-deploy-K/backend.yaml
          kubectl apply -f EKS-Deploy/app-deploy-K/frontend.yaml
          kubectl apply -f EKS-Deploy/app-deploy-K/llmmodels.yaml
          kubectl apply -f EKS-Deploy/Istiod/istio-gateway.yaml
Observability Stack
Monitoring
- Prometheus (see the ServiceMonitor sketch after this list):
- Scrapes metrics from all services
- Custom metrics for token usage and AI model performance
- AlertManager for critical alerts
- Grafana:
- Dashboards for:
- Cluster health and resource utilization
- API performance and error rates
- Token usage and expiration metrics
- AI model inference times and GPU utilization
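One way the custom token-usage and model-performance metrics could be picked up is through a Prometheus Operator ServiceMonitor. A sketch, assuming the port name, labels, and release selector (none of these are taken from the actual manifests):
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: askapi-metrics
  namespace: monitoring
  labels:
    release: prometheus              # assumed label the Prometheus instance selects on
spec:
  namespaceSelector:
    matchNames: ["ai-assistant"]
  selector:
    matchLabels:
      app: askapi
  endpoints:
    - port: http                     # assumed service port name exposing /metrics
      path: /metrics
      interval: 30s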
Logging
- Loki:
- Centralized log aggregation
- Log retention policy: 14 days
- Structured logging with service, pod, and namespace labels
Tracing
- Jaeger (see the sampling sketch after this list):
- Distributed tracing across microservices
- Trace sampling rate: 10%
- Retention: 7 days
- Kiali:
- Service mesh visualization
- Traffic flow monitoring
- Health status of services
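The 10% trace sampling rate mentioned under Jaeger can be set mesh-wide with Istio's Telemetry API. A sketch, assuming a tracing provider named "jaeger" is registered in the mesh config (the provider name is an assumption):
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
    - providers:
        - name: jaeger               # assumed provider name from the mesh config
      randomSamplingPercentage: 10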
Security Considerations
Authentication & Authorization
- Token-based Access:
- Short-lived tokens (1 hour) for standard users
- Persistent tokens in MongoDB for premium users
- Token validation middleware in all API services
- Kubernetes RBAC:
- Namespace-scoped service accounts
- Principle of least privilege
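Namespace-scoped, least-privilege access usually boils down to small Roles bound to dedicated service accounts. An illustrative example; the service account name, resources, and verbs are assumptions:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backend-readonly
  namespace: ai-assistant
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backend-readonly-binding
  namespace: ai-assistant
subjects:
  - kind: ServiceAccount
    name: askapi-sa                  # assumed service account used by the backend pods
    namespace: ai-assistant
roleRef:
  kind: Role
  name: backend-readonly
  apiGroup: rbac.authorization.k8s.io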
Network Security
- Istio mTLS (see the sketch after this list):
- Encrypted service-to-service communication
- Certificate rotation every 24 hours
- Network Policies:
- Default deny all ingress/egress
- Explicit allow rules for required communication paths
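Together, the mTLS and network-policy items above might translate into a strict PeerAuthentication plus a default-deny NetworkPolicy with explicit allows. A sketch under assumed label selectors:
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: ai-assistant
spec:
  mtls:
    mode: STRICT
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ai-assistant
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-redis
  namespace: ai-assistant
spec:
  podSelector:
    matchLabels:
      app: redis
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: backend          # assumed label on the API pods
      ports:
        - protocol: TCP
          port: 6379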
Vulnerability Management
- Container Scanning (see the Trivy step sketch after this list):
- Trivy for image vulnerability scanning in CI pipeline
- Block deployment of images with critical vulnerabilities
- Kube-Bench:
- CIS Kubernetes benchmark compliance checking
- Weekly automated scans
- Falco:
- Runtime security monitoring
- Alerts on suspicious container activity
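The Trivy gate described above can sit in the build job right after the image is pushed. A hedged sketch using the public aquasecurity/trivy-action; exact placement and how the failing tag is handled depend on the real workflow:
      - name: Scan frontend image with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: noscopev6/tracemypods-frontend:${{ github.sha }}
          format: table
          exit-code: '1'             # fail the job when findings match the severity filter
          severity: CRITICAL         # block only critical vulnerabilities
          ignore-unfixed: true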
Scaling Strategy
Horizontal Scaling
- Frontend/Backend:
- HPA based on CPU utilization (target: 70%)
- Min/Max replicas: 2/10
- AI Models:
- Scale based on GPU utilization and queue depth
- Min/Max replicas: 1/3 per model
Vertical Scaling
- Node Groups:
- Standard nodes can be upgraded from t3.large to t3.xlarge
- GPU nodes can be upgraded from g4dn.xlarge to g4dn.2xlarge
Cluster Scaling
- Cluster Autoscaler:
- Automatically adjusts node count based on pod scheduling demands
- Scale-up threshold: Unable to schedule pods
- Scale-down threshold: Node utilization < 50% for 10 minutes
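The scale-down thresholds above correspond to standard Cluster Autoscaler flags. A fragment of the autoscaler container spec illustrating them; the image version and the auto-discovery tag are assumptions:
    spec:
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0   # illustrative version
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/tracemypods-cluster
            - --scale-down-utilization-threshold=0.5   # node utilization < 50%
            - --scale-down-unneeded-time=10m           # for 10 minutes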
Disaster Recovery
Backup Strategy
- Redis (see the backup CronJob sketch after this list):
- Automated snapshots to S3 every 6 hours
- Retention: 7 days
- MongoDB:
- Daily backups
- Point-in-time recovery enabled
- Retention: 30 days
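The 6-hourly Redis snapshot could be implemented as a simple CronJob that streams an RDB dump and ships it to S3. A sketch under the assumption that a helper image with redis-cli and the AWS CLI is available; the image name and bucket are placeholders:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: redis-backup
  namespace: ai-assistant
spec:
  schedule: "0 */6 * * *"            # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: redis-backup-helper:latest   # placeholder: any image with redis-cli + aws cli
              command: ["/bin/sh", "-c"]
              args:
                - >
                  redis-cli -h redis --rdb /tmp/dump.rdb &&
                  aws s3 cp /tmp/dump.rdb s3://tracemypods-backups/redis/dump-$(date +%s).rdb
The job's service account would also need S3 write access, for example via an IAM role for service accounts (IRSA).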
Recovery Procedures
- Service Disruption:
- Automatic pod rescheduling via Deployments
- Readiness/liveness probes ensure healthy services
- Node Failure:
- Pods automatically rescheduled to healthy nodes
- PVs remounted to new pods
- Cluster Failure:
- Infrastructure as Code (Terraform) for quick cluster recreation
- Automated deployment pipeline to restore services
- Redis and MongoDB data restored from backups
Cost Optimization
Resource Optimization
- Right-sizing:
- Regular review of resource requests/limits
- Adjust based on actual usage patterns
- Spot Instances:
- Consider using spot instances for non-critical workloads
- Not recommended for GPU nodes due to potential interruptions
GPU Optimization
- GPU Sharing (see the time-slicing sketch after this list):
- Multiple models can share a single GPU using time-slicing
- Consider NVIDIA MPS for improved utilization
- Auto-scaling:
- Scale down to zero GPU nodes during periods of low usage
- Implement warm-up procedures for cold starts
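GPU time-slicing as described above is configured through the NVIDIA device plugin. A sketch of the relevant ConfigMap; the replica count is illustrative, and the plugin must be deployed with this config referenced:
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 2              # advertise each T4 as two schedulable GPUs (illustrative)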
Cost Monitoring
- AWS Cost Explorer:
- Regular review of EKS and EC2 costs
- Tag-based cost allocation for different components
- Kubecost:
- Namespace and workload level cost visibility
- Recommendations for resource optimization
Operational Procedures
Deployment
- Infrastructure Provisioning:
  cd terraform
  terraform init
  terraform apply -var-file=prod.tfvars
- Cluster Configuration:
  aws eks update-kubeconfig --name tracemypods-cluster --region ap-south-1
  kubectl apply -f EKS-Deploy/Istiod/
- Application Deployment:
  kubectl apply -f EKS-Deploy/app-deploy-K/
Monitoring & Troubleshooting
- Access Grafana:
  kubectl port-forward svc/grafana 3000:3000 -n monitoring
  # Open http://localhost:3000 in browser
- Check Pod Logs:
  kubectl logs -f deployment/askapi -n ai-assistant
- Istio Service Mesh Visualization:
  kubectl port-forward svc/kiali 20001:20001 -n istio-system
  # Open http://localhost:20001 in browser
- Trace Requests:
  kubectl port-forward svc/jaeger-query 16686:16686 -n istio-system
  # Open http://localhost:16686 in browser
Maintenance
- Kubernetes Version Upgrades:
- Test in staging environment first
- Use managed EKS upgrades
- Plan for 1-2 hour maintenance window
- Application Updates:
- Use rolling updates (default deployment strategy)
- Monitor for errors during and after deployment
- Have rollback plan ready
- Certificate Rotation:
- ACM certificates auto-renewed
- Istio certificates rotated automatically
- Monitor certificate expiration alerts
References
- AWS EKS Documentation
- Istio Documentation
- Prometheus Operator
- Ollama Documentation
- Kubernetes Best Practices
Created by: Ahmad Raza - DevOps Engineer | Cloud Infra Specialist
Last Updated: May 24, 2025