TraceMyPods – Application Workflow (Dev Focused)
TraceMyPods is an AI chatbot with Ollama integration, enabling users to interact with AI models effortlessly. The platform is aimed at developers and DevOps engineers, offering a comprehensive solution for building, deploying, and managing microservice architectures.

Tech Stack: Kafka | Redis | MongoDB | Ollama (AI) | Node.js | Python | Go | VectorDB (Qdrant) | Microservices | Payments (Razorpay) | Invoices/Reports | Postman Collection | Load Testing | OpenTelemetry | Gemini AI | S3 (File Browser) | Image Generation API
Landing Page

AI Assistant (Chat Box)

Workflow Overview

Workflow Architecture
This document outlines the architecture and workflow of the TraceMyPods application, detailing how the components interact to provide a seamless AI chatbot experience. It covers:
- App-level communication
- How Kafka is used
- Why vector embeddings and OpenTelemetry
- Why Razorpay for payments
- Why MongoDB and Redis
- Why Ollama and a free model
Routes and APIs
- /api/ask: Handles AI queries and routes them to the appropriate model.
- /api/redis-data: Provides Redis analytics and active-user data.
- /api/db-data: Displays MongoDB analytics and user data.
- /api/s3-data: Fetches S3 analytics and file-browser data.
- /api/s3-page: Displays the S3 file browser with pagination.
- /api/s3-analytics: Provides S3 analytics and file statistics.
- /api/s3-presigned: Generates presigned URLs for S3 files.
- /api/s3-folders: Lists S3 folders for file organization.
- /api/s3-search: Implements advanced search over S3 files.
- /deliver: Handles invoice delivery via email.
- /order: Manages order processing and payment verification.
- /send-otp: Sends an OTP for email verification during order processing.
- /verify-otp: Verifies the OTP for email confirmation.
- /create-order: Creates an order after payment verification.
- /verify-payment: Verifies payment status with Razorpay.
- /api/token: Manages token generation and validation.
- /api/validate-premium-token: Validates premium tokens for model access.
- /api/services: Lists Jaeger services and their traces.
- /api/heatmap: Provides a heatmap of user interactions and application performance from Jaeger.
- /api/traces: Displays trace information for requests and their paths.
- /api/chat: Manages chat sessions and user interactions in oteldashapi.
- /api/optimize: Provides optimization suggestions for application performance.
- /otel (rewritten to /): Integrates OpenTelemetry for observability.
- / (default/fallback route): Serves the main application interface, including the chat box and admin dashboard.

Public APIs used:
- Cloudflare Workers AI for image generation
- Gemini AI assistant for Jaeger trace explanations
- Internal routes:
  - /api/embedding: Handles embedding generation for user queries.
  - /api/vector: Manages vector storage and retrieval using Qdrant.
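Of these routes, /api/ask is the one the chat box exercises most. A minimal client-side sketch of calling it follows; the request field names (prompt, token, model) and the default model are illustrative assumptions, not a confirmed API contract:

```javascript
// Hypothetical request builder for POST /api/ask — field names are assumed.
function buildAskRequest(prompt, token, model = "tinyllama") {
  if (!prompt || !token) throw new Error("prompt and token are required");
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, token, model }),
  };
}

// Usage in a browser or Node 18+ (needs a running askapi behind /api/ask):
// const res = await fetch("/api/ask", buildAskRequest("What is Kafka?", "free-token-123"));
const req = buildAskRequest("What is Kafka?", "free-token-123");
console.log(req.method, JSON.parse(req.body).model);
```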
Components Overview (Architecture)
1. Frontend Pod: ai-frontend
Simple browser UI serving as the landing page, built with plain HTML.
- Tech: HTML, CSS, JavaScript
- /chat.html: Frontend chat box with prompt and token input
- /image.html: Image generation UI with text input
- /admin.html: Admin dashboard for analytics and data management
  - S3 file browser
  - Active users
  - Redis and MongoDB analytics
  - Bucket invoices
  - S3 search
  - Total orders and revenue
- /otel: OpenTelemetry dashboard for observability, integrated with Jaeger and Gemini AI
- /pay.html: Payment page for buying premium model access
2. adminapi microservice
- Tech: Node.js, Express, MongoDB, Redis, S3
- Handles admin operations, analytics, and data management.
- Provides endpoints for viewing Redis, MongoDB, and S3 analytics.
- Shows active users, total orders, and revenue.
- Integrates with S3 for invoice preview with presigned URLs.
- Provides advanced search capabilities for S3 files.
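The file-browser pagination behind /api/s3-page can be sketched as a plain function over a list of object keys; the page size and response shape here are assumptions for illustration, not the service's actual contract:

```javascript
// Paginate a list of S3 object keys — a stand-in for the ListObjectsV2
// results the real service would page through.
function paginateS3Keys(keys, page = 1, pageSize = 10) {
  const totalPages = Math.max(1, Math.ceil(keys.length / pageSize));
  const current = Math.min(Math.max(1, page), totalPages); // clamp bad input
  const start = (current - 1) * pageSize;
  return {
    page: current,
    totalPages,
    totalFiles: keys.length,
    files: keys.slice(start, start + pageSize),
  };
}

const keys = Array.from({ length: 23 }, (_, i) => `invoices/inv-${i + 1}.pdf`);
console.log(paginateS3Keys(keys, 3, 10).files.length); // 3 files on the last page
```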
3. askapi microservice
- Tech: Node.js, Express, Redis, MongoDB, Ollama
- Handles AI queries and validates tokens against MongoDB and Redis.
- Routes requests to the appropriate AI model.
- Integrates with vectorapi to create embeddings, store them in the vector DB, and perform vector search for enhanced query handling.
- Uses OpenTelemetry for tracing; traces are exported to Jaeger and later visualized in the OTel dashboard.
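The cache-first token lookup (Redis before MongoDB) can be sketched with in-memory Maps standing in for both stores; this is an illustration of the lookup order, not the service's actual code:

```javascript
// Token lookup order in askapi: Redis cache first, MongoDB fallback.
const redisCache = new Map(); // stand-in for Redis
const mongoDb = new Map();    // stand-in for MongoDB (source of truth)

function validateToken(token) {
  if (redisCache.has(token)) return { valid: true, source: "redis" };
  if (mongoDb.has(token)) {
    redisCache.set(token, mongoDb.get(token)); // warm the cache for next time
    return { valid: true, source: "mongo" };
  }
  return { valid: false, source: null };
}

mongoDb.set("premium-abc", { tier: "premium" });
console.log(validateToken("premium-abc").source); // first hit comes from Mongo
console.log(validateToken("premium-abc").source); // second is served from cache
```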
4. deliverapi microservice
- Tech: Node.js, Express, S3, Kafka
- Manages invoice generation and email delivery to users.
- Integrates with S3 for invoice storage
- Uses Kafka for event-driven processing of order events.
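The event-driven handoff can be sketched as a producer building an order event and deliverapi consuming it; the event fields and the invoice key layout here are illustrative assumptions, not the actual topic schema:

```javascript
// Hypothetical order event as published to the Kafka topic by orderapi.
function buildOrderEvent(orderId, email, amount) {
  return {
    type: "order.created",
    orderId,
    email,
    amount,
    createdAt: new Date().toISOString(),
  };
}

// What deliverapi's consumer would do with it: derive the S3 invoice key
// and the recipient (real code would render a PDF, upload, and email it).
function handleOrderEvent(event) {
  if (event.type !== "order.created") return null; // ignore other events
  return { invoiceKey: `invoices/${event.orderId}.pdf`, to: event.email };
}

const evt = buildOrderEvent("ord_42", "user@example.com", 499);
console.log(handleOrderEvent(evt).invoiceKey);
```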
5. tokenapi microservice
- Tech: Node.js, Express, MongoDB, Redis
- Manages token generation and validation.
- Issues free tokens for basic model access and stores them in Redis with a 1-hour TTL.
- Generates and validates premium tokens against MongoDB for the paid model, caching them in Redis.
- Integrates with orderapi for premium token issuance after payment.
6. orderapi microservice
- Tech: Node.js, Express, MongoDB, Redis, Kafka
- Handles order processing, email verification via OTP, and payment verification.
- Integrates with paymentapi, which is backed by Razorpay, for payment processing.
- Integrates with tokenapi to generate premium tokens upon successful payment, storing them in Redis and MongoDB.
- Created orders and tokens are sent to a Kafka topic for further processing (e.g., by deliverapi).
7. paymentapi microservice
- Tech: Node.js, Express, Razorpay
- Manages payment operations using Razorpay (test keys).
- Handles OTP verification and payment completion.
- Integrates with orderapi to create premium tokens after payment.
8. oteldash microservice
- Tech: React, Gemini AI
- Provides observability dashboards for monitoring application traces and performance.
- Integrates with Jaeger for distributed tracing visualization.
- Displays metrics and traces from various microservices.
- Dashboard AI features include:
  - Trace visualization with detailed spans and logs.
  - Integration with Gemini AI for enhanced trace explanations.
  - Optimization recommendations based on trace data.
  - Trace explanation via Gemini AI for better understanding of complex traces.
9. otelapi microservice
- Tech: Go, OpenTelemetry, Jaeger
- Backend for OpenTelemetry metrics and traces.
- Fetches traces from Jaeger for visualization in oteldash.
- Provides APIs for querying and visualizing metrics.
- Handles AI explanation of traces using Gemini AI.
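Before a trace goes to Gemini for explanation, it is useful to condense it into a summary. The sketch below (in JavaScript for consistency with the other examples, though otelapi itself is written in Go) assumes Jaeger's JSON span shape, where `operationName` names the span and `duration` is in microseconds:

```javascript
// Condense a Jaeger trace into the facts an AI explanation would start from.
function summarizeTrace(spans) {
  const slowest = [...spans].sort((a, b) => b.duration - a.duration)[0];
  const totalUs = spans.reduce((sum, s) => sum + s.duration, 0);
  return {
    spanCount: spans.length,
    totalMs: totalUs / 1000,                               // µs -> ms
    slowestOperation: slowest ? slowest.operationName : null,
  };
}

const spans = [
  { operationName: "askapi:POST /api/ask", duration: 3000 },
  { operationName: "vectorapi:search", duration: 1000 },
];
console.log(summarizeTrace(spans).slowestOperation);
```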
10. vectorapi microservice
- Tech: Python, Qdrant, Embeddings
- Manages vector embeddings and similarity search.
- Integrates with Qdrant for efficient vector storage and retrieval.
- Integrates with embeddingapi to create embeddings for user queries and store them in Qdrant.
- Integrates with askapi to check the vector cache before querying AI models.
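The cache check boils down to a similarity search: if a stored query vector is close enough to the new one, the cached answer is reused. A brute-force cosine-similarity sketch (Qdrant does this with an ANN index instead; the 0.9 threshold is an assumption):

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the best cached entry above the threshold, or null on a cache miss
// (in which case askapi falls through to the AI model).
function nearest(query, entries, threshold = 0.9) {
  let best = null;
  for (const e of entries) {
    const score = cosine(query, e.vector);
    if (score >= threshold && (!best || score > best.score)) best = { ...e, score };
  }
  return best;
}
```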
11. embeddingapi microservice
- Tech: Python, OpenAI Embeddings, Qdrant
- Generates embeddings for user queries using OpenAI's embedding models.
- Handles embedding generation for text queries.
- Integrates with vectorapi for storing and querying embeddings.
12. Other Services
ollamapods
- Hosts various AI models using Ollama.
- Provides endpoints for querying AI models.
- Supports custom model hosting and management.
REDIS
- Caching layer for token storage and user sessions (TTL: 1 hour)
- Used for quick lookups and reducing database load
MongoDB
- Primary database for user data, order details, and premium tokens
- Stores persistent data with high availability
Qdrant (VectorDB)
- Specialized database for storing and querying vector embeddings
- Supports efficient similarity search and retrieval
Kafka (Event Streaming)
- Used to handle order and delivery events
- Ensures decoupled communication between services
S3 (AWS)
- Object storage for invoices and other files
- Provides scalable storage with presigned URLs for secure access
Cloudflare AI Workers
- Provides AI capabilities for text/image generation
- Integrates with the application for enhanced AI features
13. Load Testing
- Load testing is performed using k6 to ensure the application can handle high traffic.
- Simulates user interactions and measures performance metrics.
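A minimal k6 scenario against the /api/ask endpoint might look like the following; the VU count, duration, target URL, and request body are illustrative assumptions, and the script runs under `k6 run`, not plain Node:

```javascript
// Hypothetical k6 load-test script for /api/ask (run with: k6 run script.js).
import http from "k6/http";
import { check, sleep } from "k6";

export const options = { vus: 20, duration: "1m" }; // 20 virtual users for 1 minute

export default function () {
  const res = http.post(
    "http://localhost:8080/api/ask", // assumed local endpoint
    JSON.stringify({ prompt: "ping", token: "free-token" }),
    { headers: { "Content-Type": "application/json" } }
  );
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1); // think time between iterations
}
```

k6 reports request rate, error rate, and latency percentiles (e.g., p(95)) for the run.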
AI Models Overview
Top Models
| Model | Size (Quantized) | RAM (Min) | GPU (Optional) | Notes |
|---|---|---|---|---|
| TinyLlama | ~1.1 GB | 4 GB | None / 2 GB+ | Lightweight |
| Mistral-7B | ~4.2 GB | 8–16 GB | 8 GB+ VRAM | Powerful general-purpose |
| CodeLlama | 4.5–10 GB | 16–24 GB | 8–16 GB+ VRAM | Code-optimized |
| LLaMA 2 | 4.5–40 GB | 16–80 GB | 8–64 GB+ VRAM | Versatile but resource-heavy |
| Phi-2 | ~1.7 GB | 6–8 GB | 4 GB+ VRAM | Efficient and compact |
Mini Models
| Model Name | Approx. Size | RAM Required | Description |
|---|---|---|---|
| TinyLlama (1.1B) | ~1.1 GB | 2–3 GB | Extremely lightweight; suitable for simple QA/chat |
| Phi-1.5 / Phi-2 | ~1.5–1.7 GB | 3–4 GB | Compact model from Microsoft optimized for reasoning |
| Gemma-2B (Google) | ~2.1 GB (quantized) | ~4 GB | Lightweight open-source model focused on chat |
Author
Ahmad Raza – Sr. DevOps Engineer | Cloud Infra Specialist
ahmadraza.in
linkedin.com/in/ahmad-raza-devops
For more, visit ahmadraza.in
Detailed commands, manifests, and guides are available on my blog.

