
HELLO, I'M

Nikhil Juluri

I am a Master's student in Computer Science at UIC specializing in Machine Learning, NLP, and Inference Optimization. Passionate about building scalable RAG pipelines, GenAI applications, and LLM workflows. Former Software Developer at Deloitte with 2+ years of experience delivering enterprise solutions.


2+

Years Experience

+

Projects Completed

K+

Lines of Code

+

Technologies Mastered

About Me

I'm Nikhil Juluri, an AI/ML Engineer currently pursuing my Master's in Computer Science at the University of Illinois Chicago (GPA: 3.8), with a strong foundation in Electronics and Communication from CBIT, Hyderabad. I spent over 2 years as a Software Engineer at Deloitte, building scalable AI systems and optimizing financial workflows.

My passion lies in building intelligent systems that can understand and process information at scale. As a Graduate Research Assistant at UIC, I'm focused on developing RAG pipelines and LLM inference workflows using PyTorch and Hugging Face Transformers.

I specialize in:

  • Large Language Models (LLMs) & RAG
  • Deep Learning & Natural Language Processing (NLP)
  • Data Engineering & Scalable ML Pipelines
  • MLOps & Infrastructure (AWS, Docker, Kubernetes)

I'm experienced in fine-tuning models (LoRA/QLoRA), optimizing inference with vLLM, managing end-to-end ML lifecycles with MLflow, and tuning hyperparameters with Ray Tune. I'm driven by the challenge of translating complex research into production-ready solutions that deliver real-world value.

Featured Projects

Showcasing advanced capabilities in GenAI, Large Language Models, and MLOps infrastructure.

Financial Portfolio Automation using RAG & LLMs

Python, PyTorch, LangChain, FAISS, LoRA, FastAPI, Docker, AWS, React.js
  • Developed an AI-powered system to streamline financial account management and portfolio rebalancing for clients, integrating LLMs and RAG pipelines for context-aware recommendations.
  • Implemented a Retrieval-Augmented Generation pipeline using LangChain with FAISS vector database to retrieve financial statements and ETF/mutual fund data.
  • Integrated LoRA adapters on DistilBERT for specialized financial text understanding, achieving 95% accuracy in extracting actionable insights.
  • Deployed inference APIs with FastAPI and Docker and connected them to real-time front-end dashboards built with React.js, reducing processing time by 60%.

Large Language Model Fine-Tuning & Inference

PyTorch, Hugging Face, LoRA, vLLM, FastAPI, Docker, Kubernetes, AWS EC2/EKS
  • Fine-tuned a quantized open-source LLaMA-8B model on domain-specific financial datasets to generate high-quality financial insights.
  • Conducted hyperparameter tuning with learning rate scheduling and batch size optimization, along with RLHF-inspired reward modeling.
  • Deployed inference using vLLM on an AWS EC2 GPU cluster, integrated with FastAPI, Docker, and Kubernetes, enabling sub-200 ms token latency.

End-to-End MLOps Pipeline for Financial AI

PyTorch, Kubeflow, Airflow, MLflow, vLLM, Triton, Docker, Kubernetes, AWS
  • Built a full MLOps and LLMOps infrastructure to streamline training, deployment, monitoring, and versioning of ML and LLM models.
  • Developed a reproducible pipeline using Kubeflow and Airflow for data ingestion, preprocessing, training, and model versioning.
  • Integrated MLflow for experiment tracking and automated deployment to Kubernetes clusters with Docker containers.
  • Orchestrated high-performance LLM inference with vLLM and Triton, achieving a 30% cost reduction.

My Experience

My professional journey in software and AI development.

Graduate Research Assistant

University of Illinois at Chicago

June 2025 – Present | Chicago, IL
  • LLM & RAG Development: Building RAG pipelines and LLM inference workflows with PyTorch and Hugging Face Transformers, experimenting mostly with LLaMA and DistilBERT models. Set up vector databases such as FAISS and Weaviate to handle retrieval. Experimented with supervised fine-tuning and RLHF-based alignment techniques on domain-specific datasets, and worked on making inference faster and more memory-efficient. Used Ray Tune extensively for hyperparameter tuning, adjusting learning rates, batch sizes, and attention parameters to improve model performance.
  • Data Processing & EDA: Built scalable AI/ML pipelines covering the full workflow: data ingestion, ETL, exploratory analysis, and feature engineering, relying heavily on Pandas, NumPy, and scikit-learn. Maintained data quality throughout and kept everything under version control so experiments could be reproduced easily, which was essential for making our ML work research-ready.
  • Deployment & MLOps: Packaged and deployed models with Docker and Kubernetes on AWS. Set up basic monitoring and logging for our experimental inference workflows to observe their behavior in real time, which made it easier to catch issues early and iterate quickly during development.

Software Engineer | ML & LLM Inference, GenAI Systems

Deloitte

Sep 2022 – Jul 2024 | Hyderabad, India
  • Built AI-powered financial account systems using RAG pipelines, LLM inference, vector databases, LangChain, and LangGraph, deploying FastAPI microservices with Docker and Kubernetes on AWS (Lambda, EC2, S3) for real-time queries across millions of accounts.
  • Optimized payment workflows with Node.js, TypeScript, and Apex, reducing processing time by 60% and helping a client with a $50,000 Flex Account achieve $10,000 in turnover growth.
  • Fine-tuned transformer models using LoRA/QLoRA with MLflow experiment tracking and Optuna hyperparameter optimization, cutting inference latency by 25%. Integrated vLLM for high-throughput serving with optimized GPU utilization, quantization, and KV caching.
  • Built RESTful and GraphQL APIs with AWS API Gateway and FastAPI on Lambda that improved data accuracy by 40% and reduced support tickets by 30%, helping families track 529 plans and reach $20,000 annual savings goals.
  • Led data migrations using SOQL, Data Loader, and ETL pipelines with PyTorch, NumPy, Pandas, and scikit-learn for ML validation, achieving 98% accuracy while reducing manual work by 30% and migrating $5M in legacy plans.
  • Established MLOps practices with MLflow for model lifecycle management and built CI/CD pipelines for automated deployment and monitoring with FastAPI. Implemented continuous retraining, logging, and rollback systems, maintaining 99.9% uptime.

Software Engineer Intern | AI/ML & LLM Systems

Deloitte

Jun 2022 – Aug 2022 | Hyderabad, India
  • Conducted exploratory data analysis on financial datasets using Pandas, NumPy, and Matplotlib to uncover portfolio trends and anomalies. Built Python ML pipelines with PyTorch and scikit-learn for predictive analytics and recommendation tasks.
  • Developed proof-of-concept LLM prototypes using DistilBERT and LLaMA for small-scale RAG retrieval pipelines that provided AI-powered insights into account workflows.
  • Created React.js and LWC dashboards to visualize ML insights and portfolio recommendations for end users. Integrated RESTful APIs with AWS Lambda and API Gateway to connect ML pipelines with dashboards.
  • Implemented ETL pipelines to clean, preprocess, and transform financial datasets, achieving around 95% data accuracy. Packaged experiments in Docker containers for reproducibility and shared cloud deployment across the team.

My Education

University of Illinois Chicago

Master of Science in Computer Science

Aug 2024 – May 2026 (Expected)
GPA: 3.8
Coursework: Big Data Mining, Data Mining and Text Mining, Cloud Computing, Computer Algorithms, Database Management Systems, Machine Learning

Chaitanya Bharathi Institute of Technology

Bachelor of Engineering in Electronics and Communication

Aug 2018 – June 2022
Focus: Computer Networking, Operating Systems, AI and Machine Learning

My Skills

Technical proficiency across various domains.

Programming Languages

Python: 95%
Java: 85%
C++: 80%
C: 80%
SQL (MSSQL): 85%
JavaScript/TypeScript: 85%

Data Science & ML

NumPy & Pandas: 95%
Scikit-learn/SciPy: 90%
Matplotlib & Seaborn: 85%
XGBoost: 85%
NLTK & SpaCy: 85%
Exploratory Data Analysis: 90%

Deep Learning & GenAI

PyTorch & TensorFlow: 90%
Hugging Face Transformers: 90%
RAG & LLM Inference: 90%
RLHF & Fine-tuning: 85%
Ray Tune: 80%

Cloud & MLOps

AWS: 85%
Docker & Kubernetes: 85%
MLflow & Optuna: 85%
CI/CD Pipelines: 80%

Frameworks & Tools

React/Next.js: 85%
FastAPI & Spring Boot: 85%
FAISS & Weaviate: 85%
Git: 90%
MongoDB & PostgreSQL: 80%

Certifications & Awards

Salesforce Certified AI Associate
Salesforce Sharing and Visibility Architect
Salesforce Platform App Builder
Salesforce Administrator
Salesforce Platform Developer I
AI and Machine Learning Internship - National Instruments & Cognibot
SPOT Award - Deloitte (Outstanding Contributions)

Get In Touch

Let's work together and create something extraordinary.

Contact Information

Location

821 South Laflin, Chicago, IL
