HELLO, I'M
Nikhil Juluri
Master's student in Computer Science at UIC specializing in Machine Learning, NLP, and inference optimization. Passionate about building scalable RAG pipelines, GenAI applications, and LLM workflows. Former Software Engineer at Deloitte with 3.5+ years of experience delivering enterprise solutions.


About Me
Hi, I'm Nikhil Juluri, an AI and Machine Learning Engineer currently pursuing my Master's in Computer Science at the University of Illinois Chicago, where I maintain a GPA of 3.8. I completed my undergraduate studies in Electronics and Communication Engineering at CBIT in Hyderabad, which gave me a strong foundation in systems thinking and problem solving. Before returning to academia, I spent over three and a half years at Deloitte as a Software Engineer, working on production-grade platforms in the financial domain and building cloud-based systems that supported real business operations.
During my time at Deloitte, I worked across backend engineering, cloud infrastructure, and applied machine learning. I helped develop scalable microservices, APIs, and data-driven applications, and contributed to AI-powered workflows that combined natural language processing with recommendation systems. Alongside this, I gained strong enterprise experience integrating Salesforce platforms and delivering secure, reliable solutions for clients. These experiences shaped my interest in building intelligent systems that move beyond prototypes and operate reliably in real-world environments.
I now focus my work on Generative AI and large language models as a Graduate Research Assistant at UIC. My research centers on retrieval-augmented generation pipelines and efficient LLM inference using PyTorch and Hugging Face Transformers. I spend much of my time experimenting with model fine-tuning, optimizing inference performance, and building reproducible pipelines that take models from experimentation to deployment. I enjoy working at the intersection of research and engineering, where ideas turn into usable systems.
I specialize in multimodal retrieval systems and agent-based workflows, large language model inference optimization, parameter-efficient fine-tuning with LoRA and QLoRA, and building end-to-end MLOps pipelines using AWS, Docker, and Kubernetes. I am driven by the challenge of translating complex research into production-ready solutions, and I enjoy creating scalable AI systems that deliver meaningful impact through thoughtful design, careful evaluation, and continuous learning.
Featured Projects
Showcasing advanced capabilities in GenAI, Large Language Models, and MLOps infrastructure.
Adaptive LLM Evaluation & Self-Optimizing Agents
- Built an adaptive LLM orchestration framework using LangGraph and DSPy to dynamically route queries across multiple foundation models, improving response-quality consistency by 27%.
- Implemented self-refinement pipelines with TruLens for hallucination detection and automated response scoring, reducing factual error rates by 32% without increasing end-to-end latency.
- Established LLMOps pipelines with vLLM and async FastAPI, integrating Prometheus/Grafana for token and latency monitoring and reducing inference costs by 24%.
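The routing idea behind this framework can be illustrated with a minimal sketch (pure Python; the model names and the complexity heuristic are hypothetical stand-ins for what LangGraph/DSPy would configure and learn):

```python
# Toy sketch of adaptive query routing: send long or multi-part
# queries to a stronger (pricier) model, everything else to a fast
# tier. Model names and the heuristic are illustrative only.

def route_query(query: str) -> str:
    """Pick a model tier from a crude complexity heuristic."""
    tokens = query.split()
    multi_part = any(
        phrase in query.lower()
        for phrase in ("compare", "explain why", "step by step")
    )
    if len(tokens) > 30 or multi_part:
        return "strong-model"   # hypothetical name
    return "fast-model"         # hypothetical name

print(route_query("What is LoRA?"))                        # fast tier
print(route_query("Compare LoRA and QLoRA step by step"))  # strong tier
```

In a production router, the heuristic would be replaced by a learned policy or scored prompt classification, but the dispatch structure stays the same.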
Financial Portfolio Automation using RAG & LLMs
- Developed an AI-powered system to streamline financial account management and portfolio rebalancing for clients, integrating LLMs and RAG pipelines for context-aware recommendations.
- Implemented a Retrieval-Augmented Generation pipeline using LangChain with a FAISS vector database to retrieve financial statements and ETF/mutual fund data.
- Integrated LoRA adapters on DistilBERT for specialized financial text understanding, achieving 95% accuracy in extracting actionable insights.
- Deployed APIs using FastAPI and Docker for inference and connected real-time front-end dashboards built with React.js, reducing processing time by 60%.
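The retrieval step at the heart of this pipeline can be sketched in a few lines. Here bag-of-words vectors and cosine similarity stand in for the dense embeddings and FAISS index used in the real system, and the documents are invented examples:

```python
import math
from collections import Counter

# Toy sketch of RAG retrieval: rank documents by cosine similarity
# to the query. Bag-of-words vectors stand in for embeddings; the
# documents below are made-up placeholders.

DOCS = [
    "ETF expense ratios and historical returns",
    "Quarterly financial statement revenue summary",
    "Mutual fund allocation and rebalancing rules",
]

def vec(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    q = vec(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, vec(d)), reverse=True)
    return ranked[:k]

print(retrieve("fund rebalancing"))
```

Swapping the vectorizer for a sentence-embedding model and the sorted list for an approximate-nearest-neighbor index (what FAISS provides) turns this sketch into the production shape.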
Large Language Model Fine-Tuning & Inference
- Fine-tuned a quantized open-source LLaMA 8B model on domain-specific financial datasets to generate high-quality financial insights.
- Conducted hyperparameter tuning with learning-rate scheduling and batch-size optimization, along with RLHF-inspired reward modeling.
- Deployed inference with vLLM on an AWS EC2 GPU cluster, integrated with FastAPI, Docker, and Kubernetes, enabling sub-200 ms token latency.
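The parameter-efficient idea behind this fine-tuning work is compact enough to show directly. In LoRA, the frozen weight matrix W (d x k) is adapted through two small trainable factors B (d x r) and A (r x k), so the forward pass becomes y = Wx + (alpha/r)·B(Ax). The dimensions and values below are illustrative, not from the actual project:

```python
# Toy LoRA sketch (pure Python): only B (d x r) and A (r x k) are
# trained, so the trainable parameter count drops from d*k to
# r*(d + k). Values here are illustrative placeholders.

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

d, k, r, alpha = 3, 4, 2, 4
W = [[0.0] * k for _ in range(d)]   # frozen base weights (zeros for clarity)
A = [[1.0] * k for _ in range(r)]   # trainable low-rank factor, r x k
B = [[0.5] * r for _ in range(d)]   # trainable low-rank factor, d x r
x = [1.0, 1.0, 1.0, 1.0]

base = matvec(W, x)                  # frozen path: W x
delta = matvec(B, matvec(A, x))      # low-rank path: B (A x)
y = [b + (alpha / r) * d_ for b, d_ in zip(base, delta)]
print(y)
```

QLoRA applies the same decomposition on top of a 4-bit-quantized base model, which is what makes fine-tuning an 8B model feasible on modest GPU memory.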
End-to-End MLOps Pipeline for Financial AI
- Built a full MLOps and LLMOps infrastructure to streamline training, deployment, monitoring, and versioning of ML and LLM models.
- Developed a reproducible pipeline using Kubeflow and Airflow for data ingestion, preprocessing, training, and model versioning.
- Integrated MLflow for experiment tracking and automated deployment to Kubernetes clusters with Docker containers.
- Orchestrated high-performance LLM inference with vLLM and Triton, achieving 30% cost reduction.
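The experiment-tracking piece of this pipeline boils down to logging parameters and metrics per run and querying for the best one. A toy stand-in for what MLflow provides (run names and metric values here are invented):

```python
import json
import time

# Toy stand-in for experiment tracking (MLflow's role in the real
# pipeline): record params/metrics per run, then query the best run.

class RunTracker:
    def __init__(self):
        self.runs = []

    def log_run(self, params: dict, metrics: dict) -> dict:
        run = {"ts": time.time(), "params": params, "metrics": metrics}
        self.runs.append(run)
        return run

    def best(self, metric: str) -> dict:
        """Return the run with the highest value for the given metric."""
        return max(self.runs, key=lambda r: r["metrics"][metric])

tracker = RunTracker()
tracker.log_run({"lr": 1e-4, "batch": 8}, {"eval_f1": 0.81})
tracker.log_run({"lr": 5e-5, "batch": 16}, {"eval_f1": 0.86})
print(json.dumps(tracker.best("eval_f1")["params"]))
```

MLflow adds artifact storage, a UI, and a model registry on top of this core log-and-compare loop.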
My Experience
My professional journey in software and AI development.
Graduate Research Assistant
University of Illinois Chicago
- Designed multimodal RAG systems combining transformer-based LLaMA models with Vision Transformers for text-image reasoning, integrating FAISS similarity indexing and Weaviate to retrieve structured and unstructured data, improving contextual response quality by 30% across domain-specific research benchmarks.
- Enhanced model efficiency through LoRA fine-tuning and alignment strategies while optimizing inference via mixed precision, KV caching, dynamic batching, parallelism, FlashAttention, and controlled decoding; leveraged Optuna for hyperparameter tuning, balancing latency, throughput, and relevance.
- Deployed GPU-accelerated LLM pipelines on AWS using Docker and Kubernetes, implementing CUDA-based optimizations, memory management, token-level monitoring, and retrieval latency tracking to support scalable experimentation and real-time performance visibility.
Software Engineer — Machine Learning Engineer
Deloitte
- Built enterprise GenAI assistants using RAG over financial statements and external market APIs, integrating LangChain with vector databases to automate advisor workflows, reducing manual analysis time by 40%. Implemented response evaluation using answer relevance, groundedness, and hallucination checks.
- Enhanced personalization by adapting transformer-based foundation models via LoRA/QLoRA and supervised fine-tuning, applying few-shot, role prompting, and controlled decoding (temperature, top-k/top-p). Improved recommendation relevance by 25% while monitoring token usage and optimizing inference costs.
- Deployed scalable GenAI services on AWS Bedrock and SageMaker with provisioned throughput models, orchestrated via FastAPI and Lambda. Integrated structured RAG pipelines with database and API sources, implementing cost monitoring and latency tracking dashboards to maintain 99.9% uptime.
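The controlled decoding mentioned above (temperature plus top-k filtering) is easy to show concretely. The logits below are made-up values, not model output:

```python
import math

# Toy sketch of controlled decoding: scale next-token logits by a
# temperature, keep only the top-k, and renormalize with softmax.
# The logit values are illustrative placeholders.

def top_k_probs(logits: dict, temperature: float = 1.0, k: int = 2) -> dict:
    """Softmax over the k highest logits after temperature scaling;
    all other tokens get probability 0 (i.e. are never sampled)."""
    kept = sorted(logits, key=logits.get, reverse=True)[:k]
    scaled = {t: logits[t] / temperature for t in kept}
    z = sum(math.exp(v) for v in scaled.values())
    return {t: math.exp(v) / z for t, v in scaled.items()}

logits = {"buy": 2.0, "sell": 1.0, "hold": 0.5, "the": -1.0}
probs = top_k_probs(logits, temperature=0.7, k=2)
print(probs)
```

Lower temperatures sharpen the distribution toward the top token, while top-p (nucleus) sampling replaces the fixed k with a cumulative-probability cutoff; both are knobs on this same filtered softmax.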
Software Engineer Intern — Machine Learning Engineer
Deloitte
- Built credit risk ML pipelines using ETL, feature engineering, and XGBoost/Random Forest with 5-fold CV (AUC 0.86), improving loan approval accuracy by 18%; extended to hybrid investment recommendations with collaborative filtering and content-based models, increasing CTR by 15%.
- Completed the platform with fraud detection using Isolation Forest and Gradient Boosting, optimized for ROC-AUC and recall, reducing false positives by 25%; automated pipelines with scikit-learn and Docker, deploying FastAPI REST services on AWS Lambda for real-time transaction monitoring.
My Education
University of Illinois Chicago
Master of Science in Computer Science
Chaitanya Bharathi Institute of Technology
Bachelor of Engineering in Electronics and Communication
My Skills
Technical proficiency across various domains.
Programming & Data
Machine Learning
GenAI & NLP
Systems & MLOps
Certifications & Awards
Get In Touch
Let's work together and create something extraordinary.