TraceMyPods – Application Workflow (Dev Focused)
TraceMyPods is an AI chatbot with Ollama integration, enabling users to interact with AI models effortlessly. The platform is aimed at developers and DevOps engineers, offering a comprehensive solution for building, deploying, and managing microservice architectures.

Tech Stack: Kafka | Redis | MongoDB | Ollama (AI) | Node.js | Python | Go | VectorDB (Qdrant) | Microservices | Payments (Razorpay) | Invoices/Reports | Postman Collection | Load Testing | OpenTelemetry | Gemini AI | S3 (File Browser) | Image Generation API
Landing Page

AI Assistant (Chat Box)

Workflow Overview

Workflow Architecture
This document outlines the architecture and workflow of the TraceMyPods application, detailing how the components interact to provide a seamless AI chatbot experience. It covers:
- App-level communication
- How Kafka is used
- Why vector embeddings and OpenTelemetry
- Why Razorpay for payments
- Why MongoDB and Redis
- Why Ollama and a free model
Routes and APIs
- /api/ask: Handles AI queries and routes them to the appropriate model.
- /api/redis-data: Provides Redis analytics and active-user data.
- /api/db-data: Displays MongoDB analytics and user data.
- /api/s3-data: Fetches S3 analytics and file-browser data.
- /api/s3-page: Displays the S3 file browser with pagination.
- /api/s3-analytics: Provides S3 analytics and file statistics.
- /api/s3-presigned: Generates presigned URLs for S3 files.
- /api/s3-folders: Lists S3 folders for file organization.
- /api/s3-search: Implements advanced search over S3 files.
- /deliver: Handles invoice delivery via email.
- /order: Manages order processing and payment verification.
- /send-otp: Sends an OTP for email verification during order processing.
- /verify-otp: Verifies the OTP for email confirmation.
- /create-order: Creates an order after payment verification.
- /verify-payment: Verifies payment status with Razorpay.
- /api/token: Manages token generation and validation.
- /api/validate-premium-token: Validates premium tokens for model access.
- /api/services: Lists Jaeger services and their traces.
- /api/heatmap: Provides a heatmap of user interactions and application performance from Jaeger.
- /api/traces: Displays trace information for requests and their paths.
- /api/chat: Manages chat sessions and user interactions in oteldashapi.
- /api/optimize: Provides optimization suggestions for application performance.
- /otel (rewritten to /): Integrates OpenTelemetry for observability.
- / (default/fallback route): Serves the main application interface, including the chat box and admin dashboard.

Public APIs used:
- Cloudflare Workers AI for image generation
- Gemini AI assistant for Jaeger trace explanations
- Internal routes:
  - /api/embedding: Handles embedding generation for user queries.
  - /api/vector: Manages vector storage and retrieval using Qdrant.
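Of these routes, /api/ask is the one the chat box exercises most. A minimal client-side sketch of calling it follows; the request field names (prompt, token, model) and the default model are illustrative assumptions, not a confirmed API contract:

```javascript
// Hypothetical request builder for POST /api/ask — field names are assumed.
function buildAskRequest(prompt, token, model = "tinyllama") {
  if (!prompt || !token) throw new Error("prompt and token are required");
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, token, model }),
  };
}

// Usage in a browser or Node 18+ (needs a running askapi behind /api/ask):
// const res = await fetch("/api/ask", buildAskRequest("What is Kafka?", "free-token-123"));
const req = buildAskRequest("What is Kafka?", "free-token-123");
console.log(req.method, JSON.parse(req.body).model);
```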
Components Overview (Architecture)
1. Frontend Pod: ai-frontend
Simple browser UI serving as the landing page, built with plain HTML.
- Tech: HTML, CSS, JavaScript
- /chat.html: Frontend chat box with prompt and token input
- /image.html: Image generation UI with text input
- /admin.html: Admin dashboard for analytics and data management
  - S3 file browser
  - Active users
  - Redis and MongoDB analytics
  - Bucket invoices
  - S3 search
  - Total orders and revenue
- /otel: OpenTelemetry dashboard for observability, integrated with Jaeger and Gemini AI
- /pay.html: Payment page for buying premium model access
2. adminapi microservice
- Tech: Node.js, Express, MongoDB, Redis, S3
- Handles admin operations, analytics, and data management.
- Provides endpoints for viewing Redis, MongoDB, and S3 analytics.
- Shows active users, total orders, and revenue.
- Integrates with S3 for invoice preview with presigned URLs.
- Provides advanced search capabilities for S3 files.
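The file-browser pagination behind /api/s3-page can be sketched as a plain function over a list of object keys; the page size and response shape here are assumptions for illustration, not the service's actual contract:

```javascript
// Paginate a list of S3 object keys — a stand-in for the ListObjectsV2
// results the real service would page through.
function paginateS3Keys(keys, page = 1, pageSize = 10) {
  const totalPages = Math.max(1, Math.ceil(keys.length / pageSize));
  const current = Math.min(Math.max(1, page), totalPages); // clamp bad input
  const start = (current - 1) * pageSize;
  return {
    page: current,
    totalPages,
    totalFiles: keys.length,
    files: keys.slice(start, start + pageSize),
  };
}

const keys = Array.from({ length: 23 }, (_, i) => `invoices/inv-${i + 1}.pdf`);
console.log(paginateS3Keys(keys, 3, 10).files.length); // 3 files on the last page
```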
3. askapi microservice
- Tech: Node.js, Express, Redis, MongoDB, Ollama
- Handles AI queries and validates tokens against MongoDB and Redis.
- Routes requests to the appropriate AI model.
- Integrates with vectorapi to create embeddings, store them in the vector DB, and perform vector search for enhanced query handling.
- Uses OpenTelemetry for tracing; traces are exported to Jaeger and later visualized in the OTel dashboard.
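The cache-first token lookup (Redis before MongoDB) can be sketched with in-memory Maps standing in for both stores; this is an illustration of the lookup order, not the service's actual code:

```javascript
// Token lookup order in askapi: Redis cache first, MongoDB fallback.
const redisCache = new Map(); // stand-in for Redis
const mongoDb = new Map();    // stand-in for MongoDB (source of truth)

function validateToken(token) {
  if (redisCache.has(token)) return { valid: true, source: "redis" };
  if (mongoDb.has(token)) {
    redisCache.set(token, mongoDb.get(token)); // warm the cache for next time
    return { valid: true, source: "mongo" };
  }
  return { valid: false, source: null };
}

mongoDb.set("premium-abc", { tier: "premium" });
console.log(validateToken("premium-abc").source); // first hit comes from Mongo
console.log(validateToken("premium-abc").source); // second is served from cache
```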
4. deliverapi microservice
- Tech: Node.js, Express, S3, Kafka
- Manages invoice generation and email delivery to users.
- Integrates with S3 for invoice storage
- Uses Kafka for event-driven processing of order events.
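The event-driven handoff can be sketched as a producer building an order event and deliverapi consuming it; the event fields and the invoice key layout here are illustrative assumptions, not the actual topic schema:

```javascript
// Hypothetical order event as published to the Kafka topic by orderapi.
function buildOrderEvent(orderId, email, amount) {
  return {
    type: "order.created",
    orderId,
    email,
    amount,
    createdAt: new Date().toISOString(),
  };
}

// What deliverapi's consumer would do with it: derive the S3 invoice key
// and the recipient (real code would render a PDF, upload, and email it).
function handleOrderEvent(event) {
  if (event.type !== "order.created") return null; // ignore other events
  return { invoiceKey: `invoices/${event.orderId}.pdf`, to: event.email };
}

const evt = buildOrderEvent("ord_42", "user@example.com", 499);
console.log(handleOrderEvent(evt).invoiceKey);
```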
5. tokenapi microservice
- Tech: Node.js, Express, MongoDB, Redis
- Manages token generation and validation.
- Issues free tokens for basic model access and stores them in Redis with a 1-hour TTL.
- Generates and validates premium tokens against MongoDB for the paid model, caching them in Redis.
- Integrates with orderapi for premium token issuance after payment.
6. orderapi microservice
- Tech: Node.js, Express, MongoDB, Redis, Kafka
- Handles order processing, email verification via OTP, and payment verification.
- Integrates with paymentapi, which is backed by Razorpay, for payment processing.
- Integrates with tokenapi to generate premium tokens upon successful payment, storing them in Redis and MongoDB.
- Created orders and tokens are sent to a Kafka topic for further processing (e.g., by deliverapi).
7. paymentapi microservice
- Tech: Node.js, Express, Razorpay
- Manages payment operations using Razorpay (test keys).
- Handles OTP verification and payment completion.
- Integrates with orderapi to create premium tokens after payment.
8. oteldash microservice
- Tech: React, Gemini AI
- Provides observability dashboards for monitoring application traces and performance.
- Integrates with Jaeger for distributed tracing visualization.
- Displays metrics and traces from various microservices.
- Dashboard AI features include:
  - Trace visualization with detailed spans and logs.
  - Integration with Gemini AI for enhanced trace explanations.
  - Optimization recommendations based on trace data.
  - Trace explanation via Gemini AI for better understanding of complex traces.
9. otelapi microservice
- Tech: Go, OpenTelemetry, Jaeger
- Backend for OpenTelemetry metrics and traces.
- Fetches traces from Jaeger for visualization in oteldash.
- Provides APIs for querying and visualizing metrics.
- Handles AI explanation of traces using Gemini AI.
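Before a trace goes to Gemini for explanation, it is useful to condense it into a summary. The sketch below (in JavaScript for consistency with the other examples, though otelapi itself is written in Go) assumes Jaeger's JSON span shape, where `operationName` names the span and `duration` is in microseconds:

```javascript
// Condense a Jaeger trace into the facts an AI explanation would start from.
function summarizeTrace(spans) {
  const slowest = [...spans].sort((a, b) => b.duration - a.duration)[0];
  const totalUs = spans.reduce((sum, s) => sum + s.duration, 0);
  return {
    spanCount: spans.length,
    totalMs: totalUs / 1000,                               // µs -> ms
    slowestOperation: slowest ? slowest.operationName : null,
  };
}

const spans = [
  { operationName: "askapi:POST /api/ask", duration: 3000 },
  { operationName: "vectorapi:search", duration: 1000 },
];
console.log(summarizeTrace(spans).slowestOperation);
```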
10. vectorapi microservice
- Tech: Python, Qdrant, Embeddings
- Manages vector embeddings and similarity search.
- Integrates with Qdrant for efficient vector storage and retrieval.
- Integrates with embeddingapi to create embeddings for user queries and store them in Qdrant.
- Integrates with askapi to check the vector cache before querying AI models.
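The cache check boils down to a similarity search: if a stored query vector is close enough to the new one, the cached answer is reused. A brute-force cosine-similarity sketch (Qdrant does this with an ANN index instead; the 0.9 threshold is an assumption):

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the best cached entry above the threshold, or null on a cache miss
// (in which case askapi falls through to the AI model).
function nearest(query, entries, threshold = 0.9) {
  let best = null;
  for (const e of entries) {
    const score = cosine(query, e.vector);
    if (score >= threshold && (!best || score > best.score)) best = { ...e, score };
  }
  return best;
}
```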
11. embeddingapi microservice
- Tech: Python, OpenAI Embeddings, Qdrant
- Generates embeddings for user queries using OpenAI's embedding models.
- Handles embedding generation for text queries.
- Integrates with vectorapi for storing and querying embeddings.
12. Other Services
ollamapods
- Hosts various AI models using Ollama.
- Provides endpoints for querying AI models.
- Supports custom model hosting and management.
REDIS
- Caching layer for token storage and user sessions (TTL: 1 hour)
- Used for quick lookups and reducing database load
MongoDB
- Primary database for user data, order details, and premium tokens
- Stores persistent data with high availability
Qdrant (VectorDB)
- Specialized database for storing and querying vector embeddings
- Supports efficient similarity search and retrieval
Kafka (Event Streaming)
- Used to handle order and delivery events
- Ensures decoupled communication between services
S3 (AWS)
- Object storage for invoices and other files
- Provides scalable storage with presigned URLs for secure access
Cloudflare AI Workers
- Provides AI capabilities for text/image generation
- Integrates with the application for enhanced AI features
13. Load Testing
- Load testing is performed using k6 to ensure the application can handle high traffic.
- Simulates user interactions and measures performance metrics.
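A minimal k6 scenario against the /api/ask endpoint might look like the following; the VU count, duration, target URL, and request body are illustrative assumptions, and the script runs under `k6 run`, not plain Node:

```javascript
// Hypothetical k6 load-test script for /api/ask (run with: k6 run script.js).
import http from "k6/http";
import { check, sleep } from "k6";

export const options = { vus: 20, duration: "1m" }; // 20 virtual users for 1 minute

export default function () {
  const res = http.post(
    "http://localhost:8080/api/ask", // assumed local endpoint
    JSON.stringify({ prompt: "ping", token: "free-token" }),
    { headers: { "Content-Type": "application/json" } }
  );
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1); // think time between iterations
}
```

k6 reports request rate, error rate, and latency percentiles (e.g., p(95)) for the run.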
AI Models Overview
Top Models
| Model | Size (Quantized) | RAM (Min) | GPU (Optional) | Notes |
|---|---|---|---|---|
| TinyLlama | ~1.1 GB | 4 GB | None / 2 GB+ | Lightweight |
| Mistral-7B | ~4.2 GB | 8–16 GB | 8 GB+ VRAM | Powerful general-purpose |
| CodeLlama | 4.5–10 GB | 16–24 GB | 8–16 GB+ VRAM | Code-optimized |
| LLaMA 2 | 4.5–40 GB | 16–80 GB | 8–64 GB+ VRAM | Versatile but resource-heavy |
| Phi-2 | ~1.7 GB | 6–8 GB | 4 GB+ VRAM | Efficient and compact |
Mini Models
| Model Name | Approx. Size | RAM Required | Description |
|---|---|---|---|
| TinyLlama (1.1B) | ~1.1 GB | 2–3 GB | Extremely lightweight; suitable for simple QA/chat |
| Phi-1.5 / Phi-2 | ~1.5–1.7 GB | 3–4 GB | Compact model from Microsoft optimized for reasoning |
| Gemma-2B (Google) | ~2.1 GB (quantized) | ~4 GB | Lightweight open-source model focused on chat |
Author
Ahmad Raza – Sr. DevOps Engineer | Cloud Infra Specialist
ahmadraza.in
linkedin.com/in/ahmad-raza-devops
For more, visit ahmadraza.in
Detailed commands, manifests, and guides are available on my blog.

