HELLO, I'M
Nikhil Juluri
Master's student in Computer Science at UIC specializing in Machine Learning, NLP, and Inference Optimizations. Passionate about building scalable RAG pipelines, GenAI applications, and LLM workflows. Former Software Developer at Deloitte with 2+ years of experience delivering enterprise solutions.


About Me
I'm Nikhil Juluri, an AI/ML Engineer currently pursuing my Master's in Computer Science at the University of Illinois Chicago (GPA: 3.8), with a strong foundation in Electronics and Communication from CBIT, Hyderabad. I spent over 2 years as a Software Engineer at Deloitte, building scalable AI systems and optimizing financial workflows.
My passion lies in building intelligent systems that can understand and process information at scale. As a Graduate Research Assistant at UIC, I'm focused on developing RAG pipelines and LLM inference workflows using PyTorch and Hugging Face Transformers.
I specialize in:
- Large Language Models (LLMs) & RAG
- Deep Learning & Natural Language Processing (NLP)
- Data Engineering & Scalable ML Pipelines
- MLOps & Infrastructure (AWS, Docker, Kubernetes)
I'm experienced in fine-tuning models (LoRA/QLoRA), optimizing inference with vLLM, and managing end-to-end ML lifecycles with MLflow and Ray Tune. I'm driven by the challenge of translating complex research into production-ready solutions that deliver real-world value.
Featured Projects
Showcasing advanced capabilities in GenAI, Large Language Models, and MLOps infrastructure.
Financial Portfolio Automation using RAG & LLMs
- Developed an AI-powered system to streamline financial account management and portfolio rebalancing for clients, integrating LLMs and RAG pipelines for context-aware recommendations.
- Implemented a Retrieval-Augmented Generation pipeline using LangChain with a FAISS vector database to retrieve financial statements and ETF/mutual fund data.
- Integrated LoRA adapters on DistilBERT for specialized financial text understanding, achieving 95% accuracy in extracting actionable insights.
- Deployed APIs using FastAPI and Docker for inference and connected real-time front-end dashboards built with React.js, reducing processing time by 60%.
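The retrieval step at the heart of the pipeline above can be sketched in a few lines. This is an illustrative stand-in only: a toy bag-of-words embedding and brute-force cosine search replace the sentence-embedding model and FAISS index a real deployment would use, and the documents are invented examples.

```python
import numpy as np

def embed(text, vocab):
    # Toy bag-of-words embedding; a real pipeline would use a
    # sentence-embedding model (e.g. from Hugging Face) instead.
    tokens = text.lower().split()
    v = np.array([tokens.count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

def retrieve(query, docs, vocab, k=2):
    # Brute-force cosine-similarity search, standing in for a FAISS index.
    q = embed(query, vocab)
    scored = sorted(((float(embed(d, vocab) @ q), d) for d in docs),
                    key=lambda s: -s[0])
    return [d for _, d in scored[:k]]

docs = [
    "the etf tracks the s&p 500 index",
    "this mutual fund focuses on municipal bonds",
    "quarterly statement shows portfolio rebalancing activity",
]
vocab = sorted({w for d in docs for w in d.split()})
top = retrieve("which fund holds bonds", docs, vocab, k=1)
# The retrieved chunk is then prepended to the LLM prompt as context.
```

In the full system, the retrieved chunks are what make the LLM's recommendations "context-aware": the model answers from the client's actual statements rather than from its training data alone.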
Large Language Model Fine-Tuning & Inference
- Fine-tuned an open-source quantized LLaMA-8B model on domain-specific financial datasets to generate high-quality financial insights.
- Conducted hyperparameter tuning with learning rate scheduling and batch size optimization, along with RLHF-inspired reward modeling.
- Deployed inference using vLLM on an AWS EC2 GPU cluster, integrated with FastAPI, Docker, and Kubernetes, enabling sub-200ms token latency.
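The LoRA technique used in fine-tuning work like this freezes the pretrained weight matrix and trains only a low-rank update, W_eff = W + (α/r)·B·A. A minimal NumPy sketch (dimensions and scaling are illustrative, not the project's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16            # hidden size, LoRA rank, scaling (illustrative)

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init
                                         # so the adapter starts as a no-op

def forward(x):
    # LoRA forward pass: base path plus scaled low-rank update.
    # During fine-tuning only A and B receive gradients.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, d))
adapter_params, full_params = 2 * d * r, d * d
```

The payoff is the parameter count: the adapter trains 2·d·r values instead of d², which is what makes fine-tuning an 8B-parameter model tractable on modest GPU memory.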
End-to-End MLOps Pipeline for Financial AI
- Built a full MLOps and LLMOps infrastructure to streamline training, deployment, monitoring, and versioning of ML and LLM models.
- Developed a reproducible pipeline using Kubeflow and Airflow for data ingestion, preprocessing, training, and model versioning.
- Integrated MLflow for experiment tracking and automated deployment to Kubernetes clusters with Docker containers.
- Orchestrated high-performance LLM inference with vLLM and Triton, achieving 30% cost reduction.
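The core idea behind the experiment tracking described above is simple: every run gets a unique id and an immutable record of its parameters and metrics. A stdlib-only sketch of that pattern (MLflow's actual API is richer; this is a conceptual stand-in, not its interface):

```python
import json
import tempfile
import time
import uuid
from pathlib import Path

def log_run(params, metrics, store):
    # MLflow-style tracking in miniature: one JSON record per run,
    # keyed by a unique run id, so experiments stay auditable and
    # reproducible.
    run_id = uuid.uuid4().hex[:8]
    record = {"run_id": run_id, "time": time.time(),
              "params": params, "metrics": metrics}
    path = store / f"{run_id}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

store = Path(tempfile.mkdtemp())
run_file = log_run({"lr": 3e-4, "batch_size": 16}, {"val_loss": 0.42}, store)
```

In the real pipeline this record is what ties a deployed model back to the exact data version, code commit, and hyperparameters that produced it, which is what makes automated rollback safe.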
My Experience
My professional journey in software and AI development.
Graduate Research Assistant
University of Illinois at Chicago
- LLM & RAG Development: Building RAG pipelines and LLM inference workflows using PyTorch and Hugging Face Transformers, working primarily with LLaMA and DistilBERT models. Set up FAISS and Weaviate vector databases for retrieval, and applied supervised fine-tuning and RLHF-based alignment techniques on domain-specific datasets to make inference faster and more memory-efficient. Ran extensive hyperparameter searches with Ray Tune over learning rates, batch sizes, and attention parameters to improve model performance.
- Data Processing & EDA: Built scalable AI/ML pipelines covering the full process: data ingestion, ETL, exploratory analysis, and feature engineering, relying heavily on Pandas, NumPy, and scikit-learn. Enforced data quality and kept everything version-controlled so experiments could be reproduced easily, which was essential for making the ML work research-ready.
- Deployment & MLOps: Packaged and deployed models with Docker and Kubernetes on AWS. Set up monitoring and logging for experimental inference workflows to observe behavior in real time, making it much easier to catch issues early and iterate quickly during development.
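The hyperparameter searches mentioned above follow a pattern Ray Tune automates: sample configurations from a search space (learning rates log-uniformly, batch sizes from a discrete set), score each trial, and keep the best. A self-contained sketch with a toy objective standing in for an actual training run:

```python
import random

def trial_score(lr, batch_size):
    # Stand-in for a training run's validation metric; a real setup
    # would train the model and report the metric back to Ray Tune.
    # Toy objective peaking near lr=1e-3, batch_size=32 (illustrative).
    return -((lr - 1e-3) ** 2) * 1e6 - ((batch_size - 32) ** 2) * 1e-3

def random_search(n_trials=50, seed=0):
    rng = random.Random(seed)
    best_score, best_cfg = float("-inf"), None
    for _ in range(n_trials):
        cfg = {
            "lr": 10 ** rng.uniform(-5, -2),         # log-uniform sampling
            "batch_size": rng.choice([8, 16, 32, 64]),
        }
        score = trial_score(**cfg)
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg

best_cfg = random_search()
```

Ray Tune adds scheduling (early-stopping bad trials), distributed execution, and smarter samplers on top of this loop, but the trial/score/select structure is the same.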
Software Engineer | ML & LLM Inference, GenAI Systems
Deloitte
- Built AI-powered financial account systems using RAG pipelines, LLM inference, vector databases, LangChain, and LangGraph, deploying FastAPI microservices with Docker and Kubernetes on AWS (Lambda, EC2, S3) for real-time queries across millions of accounts.
- Optimized payment workflows with Node.js, TypeScript, and Apex, reducing processing time by 60%, helping a client with a $50,000 Flex Account achieve $10,000 turnover growth.
- Fine-tuned transformer models using LoRA/QLoRA with MLflow experiment tracking and Optuna hyperparameter optimization, cutting inference latency by 25%. Integrated vLLM for high-throughput serving with optimized GPU utilization, quantization, and KV caching.
- Built RESTful and GraphQL APIs via AWS API Gateway and FastAPI Lambda that improved data accuracy by 40% and reduced support tickets by 30%, helping families track 529 plans and reach $20,000 annual savings goals.
- Led data migrations using SOQL, Data Loader, and ETL pipelines with PyTorch, NumPy, Pandas, and scikit-learn for ML validation, achieving 98% accuracy while reducing manual work by 30% and migrating $5M in legacy plans.
- Established MLOps with MLflow for model lifecycle management and built CI/CD pipelines for automated deployment and monitoring with FastAPI. Implemented continuous retraining, logging, and rollback systems maintaining 99.9% uptime.
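The KV caching mentioned in the serving work above is what lets autoregressive decoding avoid recomputing attention keys and values for tokens already generated. A single-head NumPy sketch of the idea (tiny dimensions and random weights, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # tiny head dimension (illustrative)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # Single-head scaled dot-product attention over all cached positions.
    w = np.exp(q @ K.T / np.sqrt(d))
    return (w / w.sum()) @ V

def decode_step(x, cache):
    # Compute K/V only for the new token; earlier positions come from
    # the cache. This is how vLLM-style servers avoid re-encoding the
    # whole prefix at every decoding step.
    cache["K"].append(x @ Wk)
    cache["V"].append(x @ Wv)
    return attend(x @ Wq, np.array(cache["K"]), np.array(cache["V"]))

cache = {"K": [], "V": []}
tokens = rng.standard_normal((3, d))     # stand-in for token embeddings
outs = [decode_step(t, cache) for t in tokens]
```

vLLM's contribution on top of this is PagedAttention, which manages the cache in fixed-size blocks so many concurrent requests can share GPU memory efficiently; the per-step math is unchanged.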
Software Engineer Intern | AI/ML & LLM Systems
Deloitte
- Conducted exploratory data analysis on financial datasets using Pandas, NumPy, and Matplotlib to uncover portfolio trends and anomalies. Built Python ML pipelines with PyTorch and scikit-learn for predictive analytics and recommendation tasks.
- Developed proof-of-concept LLM prototypes using DistilBERT and LLaMA for small-scale RAG retrieval pipelines that provided AI-powered insights into account workflows.
- Created React.js and LWC dashboards to visualize ML insights and portfolio recommendations for end users. Integrated RESTful APIs with AWS Lambda and API Gateway to connect ML pipelines with dashboards.
- Implemented ETL pipelines to clean, preprocess, and transform financial datasets, achieving around 95% data accuracy. Packaged experiments in Docker containers for reproducibility and shared cloud deployment across the team.
My Education
University of Illinois Chicago
Master of Science in Computer Science
Chaitanya Bharathi Institute of Technology
Bachelor of Engineering in Electronics and Communication
My Skills
Technical proficiency across various domains.
Programming Languages
Data Science & ML
Deep Learning & GenAI
Cloud & MLOps
Frameworks & Tools
Certifications & Awards
Get In Touch
Let's work together and create something extraordinary.