
HELLO, I'M

Nikhil Juluri


Master's student in Computer Science at UIC specializing in Machine Learning, NLP, and Inference Optimizations. Passionate about building scalable RAG pipelines, GenAI applications, and LLM workflows. Former Software Engineer at Deloitte with 3.5+ years of experience delivering enterprise solutions.



About Me


Hi, I'm Nikhil Juluri, an AI and Machine Learning Engineer currently pursuing my Master's in Computer Science at the University of Illinois Chicago, where I maintain a GPA of 3.8. I completed my undergraduate studies in Electronics and Communication Engineering at CBIT in Hyderabad, which gave me a strong foundation in systems thinking and problem solving. Before returning to academia, I spent over three and a half years at Deloitte as a Software Engineer, working on production-grade platforms in the financial domain and building cloud-based systems that supported real business operations.

During my time at Deloitte, I worked across backend engineering, cloud infrastructure, and applied machine learning. I helped develop scalable microservices, APIs, and data-driven applications, and contributed to AI-powered workflows that combined natural language processing with recommendation systems. Alongside this, I gained strong enterprise experience integrating Salesforce platforms and delivering secure, reliable solutions for clients. These experiences shaped my interest in building intelligent systems that move beyond prototypes and operate reliably in real-world environments.

I now focus my work on Generative AI and large language models as a Graduate Research Assistant at UIC. My research centers on retrieval-augmented generation pipelines and efficient LLM inference using PyTorch and Hugging Face Transformers. I spend much of my time experimenting with model fine-tuning, optimizing inference performance, and building reproducible pipelines that take models from experimentation to deployment. I enjoy working at the intersection of research and engineering, where ideas turn into usable systems.

I specialize in multimodal retrieval systems and agent-based workflows, large language model inference optimization, parameter-efficient fine-tuning with LoRA and QLoRA, and building end-to-end MLOps pipelines using AWS, Docker, and Kubernetes. I am driven by the challenge of translating complex research into production-ready solutions, and I enjoy creating scalable AI systems that deliver meaningful impact through thoughtful design, careful evaluation, and continuous learning.

Featured Projects

Showcasing advanced capabilities in GenAI, Large Language Models, and MLOps infrastructure.

Adaptive LLM Evaluation & Self-Optimizing Agents

LangGraph · DSPy · vLLM · TruLens · Guardrails AI · FastAPI · Prometheus · Docker · AWS
  • Built an adaptive LLM orchestration framework using LangGraph and DSPy to dynamically route queries across multiple foundation models, improving response quality consistency by 27%.
  • Implemented self-refinement pipelines with TruLens for hallucination detection and automated response scoring, reducing factual error rates by 32% while preserving end-to-end latency.
  • Established LLMOps pipelines with vLLM and async FastAPI, integrating Prometheus/Grafana for token and latency monitoring, reducing inference costs by 24%.
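The dynamic-routing idea can be sketched in a few lines of plain Python; the model names and keyword sets below are hypothetical placeholders, not the actual LangGraph/DSPy policy used in the project.

```python
# Minimal sketch of rule-based query routing across multiple models.
# Model names and routing keywords are illustrative placeholders.

ROUTES = {
    "code-model": {"code", "python", "bug", "function"},
    "finance-model": {"portfolio", "etf", "stock", "rebalance"},
    "general-model": set(),  # fallback when no keywords match
}

def route_query(query: str) -> str:
    """Pick the model whose keyword set best overlaps the query tokens."""
    tokens = set(query.lower().split())
    best_model, best_score = "general-model", 0
    for model, keywords in ROUTES.items():
        score = len(tokens & keywords)
        if score > best_score:
            best_model, best_score = model, score
    return best_model
```

A production router would score candidates with learned signals rather than keyword overlap, but the control flow is the same: score each route, pick the best, fall back to a general model.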

Financial Portfolio Automation using RAG & LLMs

Python · PyTorch · LangChain · FAISS · LoRA · FastAPI · Docker · AWS · React.js
  • Developed an AI-powered system to streamline financial account management and portfolio rebalancing for clients, integrating LLMs and RAG pipelines for context-aware recommendations.
  • Implemented a Retrieval-Augmented Generation pipeline using LangChain with FAISS vector database to retrieve financial statements and ETF/mutual fund data.
  • Integrated LoRA adapters on DistilBERT for specialized financial text understanding, achieving 95% accuracy in extracting actionable insights.
  • Deployed APIs using FastAPI and Docker for inference and connected real-time front-end dashboards built with React.js, reducing processing time by 60%.
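The core retrieval step reduces to nearest-neighbor search over embeddings. Here is a minimal NumPy sketch with random placeholder vectors standing in for the FAISS index and the real financial-document embeddings.

```python
import numpy as np

# Toy stand-in for the FAISS retrieval step: rank document embeddings
# by cosine similarity to a query embedding. The real pipeline embeds
# financial statements and queries a FAISS index instead.

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k rows of doc_vecs most similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity per document
    return np.argsort(-sims)[:k]      # highest similarity first

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))               # placeholder embeddings
query = docs[42] + 0.01 * rng.normal(size=64)   # query close to doc 42
print(top_k(query, docs))                       # doc 42 ranks first
```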

Large Language Model Fine-Tuning & Inference

PyTorch · Hugging Face · LoRA · vLLM · FastAPI · Docker · Kubernetes · AWS EC2/EKS
  • Fine-tuned a quantized, open-source LLaMA-8B model on domain-specific financial datasets to generate high-quality financial insights.
  • Conducted hyperparameter tuning with learning rate scheduling and batch size optimization, along with RLHF-inspired reward modeling.
  • Deployed inference using vLLM on AWS EC2 GPU cluster, integrated with FastAPI, Docker, and Kubernetes, enabling sub-200ms token latency.
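The LoRA technique behind the fine-tuning step is compact enough to show directly: the frozen weight W is augmented with a scaled low-rank update (alpha/r)·B·A, and only A and B are trained. The shapes below are illustrative placeholders, not the model's real dimensions.

```python
import numpy as np

# LoRA in one function: frozen base weight plus a trainable
# low-rank correction. B is zero-initialized, so at the start of
# training the adapted layer is identical to the pretrained one.

d_out, d_in, r, alpha = 16, 32, 4, 8
rng = np.random.default_rng(1)
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable down-projection
B = np.zeros((d_out, r))                  # trainable up-projection

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Base path plus scaled rank-r path."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
assert np.allclose(lora_forward(x), W @ x)  # identical while B == 0
```

Because only A and B (rank r) receive gradients, the trainable parameter count drops from d_out·d_in to r·(d_out + d_in), which is what makes adapter fine-tuning cheap.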

End-to-End MLOps Pipeline for Financial AI

PyTorch · Kubeflow · Airflow · MLflow · vLLM · Triton · Docker · Kubernetes · AWS
  • Built a full MLOps and LLMOps infrastructure to streamline training, deployment, monitoring, and versioning of ML and LLM models.
  • Developed a reproducible pipeline using Kubeflow and Airflow for data ingestion, preprocessing, training, and model versioning.
  • Integrated MLflow for experiment tracking and automated deployment to Kubernetes clusters with Docker containers.
  • Orchestrated high-performance LLM inference with vLLM and Triton, achieving 30% cost reduction.
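The experiment-tracking piece of such a pipeline boils down to recording each run's parameters and metrics in a queryable store. A stdlib-only sketch (the field names are illustrative, not MLflow's actual storage format):

```python
import json
import os
import tempfile
import time

# Minimal stand-in for the MLflow tracking step: persist one JSON
# record per run, keyed by a timestamp-based run id.

def log_run(run_dir: str, params: dict, metrics: dict) -> str:
    """Write a run record to run_dir and return its file path."""
    record = {
        "run_id": f"run-{int(time.time() * 1000)}",
        "params": params,
        "metrics": metrics,
    }
    path = os.path.join(run_dir, record["run_id"] + ".json")
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return path

run_dir = tempfile.mkdtemp()
path = log_run(run_dir, {"lr": 3e-4, "batch": 32}, {"loss": 0.21})
```

MLflow adds artifact storage, a UI, and model registry on top, but this is the shape of the data it tracks per run.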

My Experience

My professional journey in software and AI development.

Graduate Research Assistant

University of Illinois at Chicago

June 2025 – Present · Chicago, IL
  • Designed multimodal RAG systems combining transformer-based LLaMA models with Vision Transformers for text-image reasoning, integrating FAISS similarity indexing and Weaviate to retrieve structured and unstructured data, improving contextual response quality by 30% across domain-specific research benchmarks.
  • Enhanced model efficiency through LoRA fine-tuning and alignment strategies while optimizing inference via mixed precision, KV caching, dynamic batching, parallelism, FlashAttention, and controlled decoding; leveraged Optuna for hyperparameter tuning, balancing latency, throughput, and relevance.
  • Deployed GPU-accelerated LLM pipelines on AWS using Docker and Kubernetes, implementing CUDA-based optimizations, memory management, token-level monitoring, and retrieval latency tracking to support scalable experimentation and real-time performance visibility.

Software Engineer — Machine Learning Engineer

Deloitte

Sep 2022 – Jul 2024 · Hyderabad, India
  • Built enterprise GenAI assistants using RAG over financial statements and external market APIs, integrating LangChain with vector databases to automate advisor workflows, reducing manual analysis time by 40%. Implemented response evaluation using answer relevance, groundedness, and hallucination checks.
  • Enhanced personalization by adapting transformer-based foundation models via LoRA/QLoRA and supervised fine-tuning, applying few-shot, role prompting, and controlled decoding (temperature, top-k/top-p). Improved recommendation relevance by 25% while monitoring token usage and optimizing inference costs.
  • Deployed scalable GenAI services on AWS Bedrock and SageMaker with provisioned throughput models, orchestrated via FastAPI and Lambda. Integrated structured RAG pipelines with database and API sources, implementing cost monitoring and latency tracking dashboards to maintain 99.9% uptime.

Software Engineer Intern — Machine Learning Engineer

Deloitte

May 2022 – Aug 2022 · Hyderabad, India
  • Built credit risk ML pipelines using ETL, feature engineering, and XGBoost/Random Forest with 5-fold CV (AUC 0.86), improving loan approval accuracy by 18%; extended to hybrid investment recommendations with collaborative filtering and content-based models, increasing CTR by 15%.
  • Extended the platform with fraud detection using Isolation Forest and Gradient Boosting (evaluated on ROC-AUC and recall), reducing false positives by 25%; automated pipelines with scikit-learn and Docker, deploying FastAPI REST services on AWS Lambda for real-time transaction monitoring.
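The 5-fold cross-validation scheme behind the AUC estimate splits the data into five folds and holds one out per round. A pure-Python illustration of the index bookkeeping (the project itself used scikit-learn's KFold with XGBoost):

```python
# Yield (train, validation) index splits for k-fold cross-validation.
# Every sample appears in exactly one validation fold, so the k fold
# scores can be averaged into an unbiased performance estimate.

def kfold_indices(n: int, k: int = 5):
    """Yield k (train_idx, val_idx) pairs covering all n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    idx, start = list(range(n)), 0
    for size in fold_sizes:
        val = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, val
        start += size
```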

My Education

University of Illinois Chicago

Master of Science in Computer Science

Aug 2024 – May 2026 (Expected)
GPA: 3.8
Coursework: Cloud Computing, Algorithms, DBMS, Big Data Mining, Text Mining, ML on Graphs, Deep Learning with NLP

Chaitanya Bharathi Institute of Technology

Bachelor of Engineering in Electronics and Communication

Aug 2018 – June 2022
Focus: Computer Networking, Operating Systems, AI and Machine Learning

My Skills

Technical proficiency across various domains.

Programming & Data

Python: 95%
TypeScript/JavaScript: 90%
Java: 85%
SQL: 90%
NumPy & Pandas: 95%
SciPy & Plotly: 85%

Machine Learning

XGBoost & Random Forest: 90%
Clustering (K-Means, DBSCAN): 85%
Recommendation Systems: 90%
Hyperparameter Tuning: 90%
Model Evaluation: 90%
ETL & Feature Engineering: 85%

GenAI & NLP

PyTorch & Transformers: 90%
Multimodal RAG: 95%
PEFT (LoRA/QLoRA): 90%
spaCy & NLTK: 85%
Vector Search (FAISS, Pinecone): 90%

Systems & MLOps

AWS Bedrock & SageMaker: 90%
Docker & Kubernetes: 85%
MLflow: 85%
FastAPI & Node.js: 90%
CI/CD Pipelines: 80%

Certifications & Awards

AWS Certified Generative AI Developer – Professional
AWS Certified AI Practitioner
Salesforce Certified AI Associate
5x Salesforce Certified (Platform Developer, App Builder, etc.)
Deloitte SPOT Award (Outstanding Contributions)
AI and Machine Learning Internship - National Instruments & Cognibot

Get In Touch

Let's work together and create something extraordinary.

Contact Information

Location

821 South Laflin, Chicago, IL
