Technical Expertise
Generative AI & LLMs
Agentic AI, Advanced RAG, LangChain, OpenAI, Hugging Face, LlamaIndex, Multimodal Models, Prompt Engineering, Fine-Tuning, Semantic Chunking, Hybrid Retrieval.
Data & ML Engineering
Python, R, SQL, TensorFlow, PyTorch, Keras, Scikit-learn, Pandas, Matplotlib, Data Structures and Algorithms, Text/Image Analysis.
Cloud & MLOps
AWS (EC2, Lambda, S3), Azure Data Warehousing, Azure Synapse, HPC, CI/CD, MLOps, Prefect, Spark/PySpark, Git/GitHub.
Deep Learning
Transformers, CNNs, RNNs, LSTMs, CLIP, Reinforcement Learning.
Education
M.S. in Information Science (ML)
University of Arizona
July 2024 - May 2026 | GPA: 4.0
- Machine Learning & Deep Learning
- Applied Natural Language Processing
- Data Mining & Analysis
- Cloud Data Warehousing (Azure)
Bachelor's in Data Science
Jain University
Aug 2021 - July 2024 | CGPA: 3.7/4
- Statistical Inference & Modeling
- Machine Learning Algorithms
- Time-Series Analysis
- Advanced Data Visualization
About Me
Avi Kumar Talaviya is a data scientist and ML engineer passionate about turning complex data into actionable intelligence. He has hands-on experience in statistical inference, data visualisation, machine learning, deep learning, and large language models, along with data analytics tools such as pandas, NumPy, statsmodels, and scikit-learn. He has applied advanced analytics to domains ranging from healthcare EEG/fMRI research to environmental AQI forecasting and traffic-severity prediction, surfacing critical insights and presenting them to non-technical audiences.
Avi is skilled in Python, R, SQL, cloud deployment on AWS, and building end-to-end AI solutions using frameworks like TensorFlow, PyTorch, and LangChain. He enjoys designing scalable pipelines, optimising models in high-performance computing environments, and mentoring learners in data analytics. He is well suited to, and open to, data science, machine learning, and natural language processing roles in the technology, product, and healthcare domains.
Work Experience
Research Collaborator - University of Arizona (ECE)
Aug 2025 - Present | Tucson, AZ
- Designed and deployed data-preprocessing pipelines for EEG-fMRI signals, resolving inconsistencies in raw recordings and boosting downstream model stability by 15%.
- Implemented YOLO-based object detection to automate region-of-interest identification in neuroimaging data, cutting manual labelling time by 30%.
- Trained and fine-tuned transformer models on the university’s high-performance computing cluster, reducing runtime by 25%.
- Integrated multi-modal EEG and fMRI features using transformer architectures for an 18% gain in classification accuracy.
- Fine-tuned the multimodal CLIP model on medical image-text pairs for classification and report generation, achieving performance comparable to state-of-the-art models.
Data Science Specialist - TOPS Technologies
Sep 2024 - June 2025 | Surat
- Processed 500GB+ of raw, multi-source data using Python (Pandas, NumPy), reducing data preparation time by 40%.
- Implemented robust validation checks, achieving 99.9% accuracy in mission-critical financial reports.
- Optimized SQL workflows through stored procedures, decreasing query execution time by 60%.
- Analyzed customer data to identify churn drivers, contributing to an 8% reduction in churn.
- Developed basic predictive models using Scikit-learn to forecast demand with 85% accuracy, helping to optimise inventory and reduce warehousing costs by $50,000 annually.
- Built custom KPIs, calculated columns, and DAX measures in BI tools, establishing a single source of truth for 15+ metrics and increasing user confidence in reporting by 30%.
Junior Machine Learning Engineer - Omdena
Sep 2022 - July 2024 | Remote
- Developed a decision tree model to predict high-risk patients, achieving a final recall of 83%.
- Applied SMOTE oversampling to reduce dataset imbalance, improving identification of minority-class patients.
- Utilized K-means clustering to segment patients, distinguishing between demographics for targeted analysis.
- Operationalized model predictions by implementing Python-based training and ranking scripts.
Data Science Project Lead - Omdena Mumbai
Mar 2023 - May 2024 | Remote
- Led a team of 25+ members in the development of a predictive model for Air Quality Index (AQI) forecasting.
- Managed analysis and complex preprocessing of time-series data, achieving a 90% efficiency increase in model performance.
- Implemented cloud-based data ingestion pipelines using AWS Lambda and S3.
- Developed a time-series forecasting model to predict AQI in Mumbai with over 80% accuracy.
Career Timeline
Research Associate
Multimodal EEG-fMRI signal processing and transformer fine-tuning at the University of Arizona.
Data Science Specialist
Enterprise data engineering and BI optimization at TOPS Technologies.
Bachelor's in Data Science
Graduated from Jain University with 9.3/10 CGPA.
Omdena Project ML Lead
Managed cloud infrastructure and time-series forecasting for AQI projects.
Featured Projects
Healthcare Lifestyle Analysis
Python, R, Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn, EDA, Predictive Modeling
Clinical Behavior & Chronic Risk
- Correlated patient habits (diet, activity, substance use) with chronic disease risks.
- Conducted detailed EDA using Matplotlib and Seaborn to identify lifestyle health indicators.
- Developed predictive models in Jupyter to forecast health trends based on demographic data.
- Streamlined data cleaning and manipulation using Pandas and NumPy for large-scale clinical data.
Medical Healthcare LLM Fine-Tuning
NLP, LLMs, Fine-tuning, PEFT (LoRA), Hugging Face, Transformers, Python
NLP-Project Repository
- Fine-tuned an LLM on medical healthcare chat data for domain-specific dialogue.
- Applied PEFT (LoRA) to adapt the model with limited compute.
- Implemented data cleaning pipelines for unstructured medical chat datasets.
- Evaluated generated healthcare responses using ROUGE and BLEU metrics.
Collaborative Book Recommender
Recommendation Systems, Matrix Factorization, Dimensionality Reduction, Python, Scikit-Learn
- Built a matrix factorization engine for user-item recommendations.
- Applied dimensionality reduction to handle large-scale rating data.
- Achieved high precision in predicting similar reading preferences.
Cloud Data Warehousing
Cloud DW, SQL, ETL, Data Engineering, Star Schema, Azure, BI Reporting
- Designed a scalable star schema in an RDBMS for Instacart store analytics.
- Developed automated ETL pipelines for multi-source data ingestion to populate the data warehouse.
- Integrated BI dashboards for real-time operational reporting.
Surgical Suture Analysis & Detection
Python, YOLO (Ultralytics), MLflow, Computer Vision, Data Augmentation, MLOps
Medical Imaging Pipeline
- Developed a computer vision pipeline for precise detection of surgical sutures using YOLO models.
- Boosted model performance from 50% to 70% mAP@50 through rigorous optimization.
- Implemented image data augmentation and enhanced annotation techniques to improve model generalization.
- Utilized MLflow for systematic model tracking, experiment monitoring, and performance auditing.
Cyberbullying Detection NLP
NLP, Text Classification, Word2vec, XGBoost, Python, Pandas, Scikit-Learn
- Converted unstructured text data into Word2vec embeddings.
- Trained an XGBoost classifier, optimizing for log loss.
- Resolved class imbalance in toxicity datasets for robust detection.
Traffic Severity Prediction
Machine Learning, PCA, Feature Engineering, Classification, Python, Pandas, Scikit-Learn
- Applied Principal Component Analysis (PCA) for feature dimensionality reduction.
- Treated multi-source traffic data for missing values and outliers.
- Applied categorical encoding to spatial and temporal features.
Research Work
Classifying Sleep and Rest States using EEG-fMRI Fusion Transformer
Multimodal Deep Learning for Healthcare
- Developed a novel multimodal transformer fusion architecture for classifying brain states (sleep/rest) by integrating EEG and fMRI signals.
- Utilized fMRI's submillimeter-to-millimeter spatial resolution for precise localization of neural processes.
- Incorporated EEG's millisecond-level temporal resolution to capture rapid neural dynamics for high-fidelity state tracking.
- Designed a multimodal training strategy (fMRI + EEG) that enables low-cost, portable inference using only EEG, making the model more accessible.
Fine-Tuning CLIP for Medical Image-Text Pair Classification and Report Generation
Multimodal Deep Learning for Healthcare
- Fine-tuned the CLIP (Contrastive Language–Image Pre-training) model on medical image-text pairs to enhance its capability in understanding and generating medical reports.
- Developed a robust data preprocessing pipeline to clean and structure unstructured medical image-text datasets, ensuring high-quality input for model training.
- Achieved performance comparable to state-of-the-art models in medical image classification and report generation tasks, demonstrating the effectiveness of the fine-tuned CLIP model in the healthcare domain.