Sanket Bhujbal
Data Scientist | ML/AI Engineer
Hi, I'm Sanket 👋
I'm a Data Scientist & Machine Learning Engineer specializing in taking advanced AI applications from concept to scalable, production-ready systems.
About
Turning Data Complexity into Measurable Business Value
I turn complex data into measurable business advantage, translating intricate datasets into actionable insights and robust, scalable solutions. My expertise spans Data Science methodologies, modern Machine Learning, and advanced AI applications, and covers full-cycle product development and deployment, from conception to operational impact.
As a proud alumnus of Virginia Tech, I possess a strong foundation in quantitative analysis, computational methods, and data-driven problem-solving. My professional journey has been marked by a consistent drive to leverage advanced analytical techniques to solve real-world business challenges, optimize operations, and foster innovation across diverse sectors.
My core competencies and areas of focus include:
  • Data Science & Analytics: Predictive modeling, statistical analysis, A/B testing, advanced data visualization, and large-scale data processing to uncover hidden patterns and drive informed strategic decisions.
  • Machine Learning Engineering: Designing, developing, and deploying robust ML models, including Deep Learning (CNNs, RNNs, Transformers), Natural Language Processing (NLP), Computer Vision, and traditional supervised/unsupervised algorithms.
  • AI Application Development: Integrating intelligent systems into existing infrastructures, MLOps practices, leveraging generative AI for novel solutions, and building scalable AI-powered applications that enhance efficiency and create new capabilities.
I am adept at collaborating with cross-functional teams, stakeholders, and product managers to define requirements, design innovative solutions, and ensure that AI and ML initiatives are meticulously aligned with overarching strategic business objectives. My approach emphasizes ethical AI practices, model interpretability, and a steadfast commitment to continuous learning in the rapidly evolving landscape of artificial intelligence and data science.
My goal is to empower organizations to harness the full potential of their data, transforming raw information into strategic assets that not only drive efficiency and uncover new opportunities but also fundamentally enhance decision-making processes and foster sustainable growth.
My Value Proposition: The Full-Cycle Engineer
I bridge the gap between complex research and tangible business outcomes. My expertise covers the full spectrum of the modern data and AI lifecycle: from raw data extraction and ML modeling to robust MLOps deployment.
Core Expertise
Education

Let's Build Something Scalable

I am actively seeking a challenging Data Scientist or ML/AI Engineer role where I can apply this full spectrum of skills, from the cloud to the code, to drive innovation and deliver substantial, scalable business results. Ready to discuss your team's next project? Reach out via Contact, LinkedIn, or GitHub.

Featured Project Showcase
The following projects demonstrate the application of my full skillset, focusing on real-world constraints, architecture design, and measurable impact.
PDFConverse: RAG Chatbot
The Challenge
Analysts waste countless hours manually searching through large PDF documents, leading to inefficiency and frustration in information retrieval workflows.
The Solution
Multi-RAG Chatbot delivering verifiable, contextual answers using Langchain, FAISS, and OpenAI technologies.
Impact
80% time reduction in information retrieval
PDFConverse: Architecture & MLOps
  • PDF Processing: documents chunked and embedded for efficient vector storage
  • FAISS Vector Store: optimized retrieval using semantic search
  • RAG Query: context-aware LLM generates accurate responses
  • Output Validation: verifiable answers with source citations
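The flow above reduces to a few lines of glue code. A minimal sketch, assuming the post-0.1 LangChain package split (langchain-openai, langchain-community, langchain-text-splitters); the file name, chunk sizes, and model are illustrative placeholders, not the production configuration:

```python
# Minimal RAG sketch for the pipeline above. Import paths vary by LangChain version.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# 1. PDF Processing: load and chunk the document ("report.pdf" is a placeholder).
pages = PyPDFLoader("report.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=150
).split_documents(pages)

# 2. FAISS Vector Store: embed chunks for semantic search.
store = FAISS.from_documents(chunks, OpenAIEmbeddings())

# 3. RAG Query: retrieve the most relevant chunks and ground the LLM answer in them.
question = "What were the key findings?"
hits = store.similarity_search(question, k=4)
context = "\n\n".join(h.page_content for h in hits)

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # assumed model
answer = llm.invoke(
    "Answer strictly from the context below and cite the source pages.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)

# 4. Output Validation: keep the retrieved chunks alongside the answer as citations.
print(answer.content)
for h in hits:
    print("source page:", h.metadata.get("page"))
```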
Evaluation & Continuous Improvement
Iterative prompt tuning optimizes contextual Q&A without costly LLM fine-tuning. Thumbs up/down feedback curates labeled data for continuous refinement.
  • Manages LLM context window limits efficiently
  • Optimizes API cost dependencies
  • Maintains faithfulness score above 90%
90%+ faithfulness score
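A sketch of how that evaluation loop can be wired, assuming the openai Python SDK (v1+); the judge model, scoring prompt, and feedback.jsonl path are my own placeholders rather than the project's exact choices:

```python
# Sketch of the evaluation loop: an LLM-as-judge faithfulness check plus
# thumbs up/down logging that curates labeled data for prompt refinement.
import json
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()

def faithfulness(question: str, context: str, answer: str) -> float:
    """Ask a judge model how fully the answer is supported by the retrieved context."""
    judge = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                "Score 0.0-1.0 how fully the ANSWER is supported by the CONTEXT. "
                "Reply with the number only.\n\n"
                f"CONTEXT:\n{context}\n\nQUESTION: {question}\n\nANSWER: {answer}"
            ),
        }],
    )
    return float(judge.choices[0].message.content.strip())

def log_feedback(question: str, answer: str, thumbs_up: bool) -> None:
    """Append user feedback as labeled data for later prompt refinement."""
    with open("feedback.jsonl", "a") as f:  # placeholder path
        f.write(json.dumps({
            "ts": datetime.now(timezone.utc).isoformat(),
            "question": question,
            "answer": answer,
            "label": "good" if thumbs_up else "bad",
        }) + "\n")
```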
PaperPalooza: Research Platform
The Problem
Researchers struggle with fragmented tools across discovery, citation management, and writing workflows, leading to inefficiency and frustration.
The Innovation
All-in-one AI platform integrating OpenAI, arXiv, and Crossref APIs for seamless research assistance.
30% time savings from centralized workflow efficiency
Key Features
  • PDF Summarization
  • Context-aware AI Chatbot
  • Citation management
  • Grammar checker
PaperPalooza: Cloud Architecture
  • API Integration: OpenAI, arXiv, and Crossref streaming data
  • PostgreSQL Layer: unified data persistence
  • Docker Deployment: Rancher orchestration
  • Production Ready: 99%+ uptime achieved
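The API integration layer builds on public endpoints. A minimal sketch of the arXiv (Atom) and Crossref (JSON) lookups, assuming requests and feedparser; the helper names and query parameters are illustrative:

```python
# Sketch of the public-API side of the platform: arXiv returns an Atom feed,
# Crossref returns JSON. Helper names below are mine, not the project's.
from urllib.parse import urlencode
import requests
import feedparser  # pip install feedparser

def search_arxiv(query: str, max_results: int = 5):
    """Return (title, link) pairs from the arXiv Atom API."""
    url = "http://export.arxiv.org/api/query?" + urlencode(
        {"search_query": f"all:{query}", "start": 0, "max_results": max_results}
    )
    feed = feedparser.parse(url)
    return [(e.title, e.link) for e in feed.entries]

def search_crossref(query: str, rows: int = 5):
    """Return (title, DOI) pairs from the Crossref REST API."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query": query, "rows": rows},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [((item.get("title") or [""])[0], item.get("DOI")) for item in items]

print(search_arxiv("retrieval augmented generation"))
print(search_crossref("retrieval augmented generation"))
```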
SDLC Maturity
Full-stack system demonstrating enterprise-grade DevOps pipeline expertise with robust deployment practices.
Continuous Improvement
Usage logging informs product roadmap, ensuring development focuses on highest-value features for researchers.
ThreatLens: Cyber Dashboard
Security Challenge
Security analysts are overwhelmed by noisy, unprioritized RSS feeds, leading to delayed threat response and missed critical vulnerabilities.
Intelligent Solution
LLM-powered aggregation with automated summarization and objective risk scoring using OpenAI and Streamlit.
50% triage time reduction through automated risk scoring
ThreatLens: Intelligence Pipeline
  1. RSS Stream Ingestion: real-time threat feed collection
  2. Micro-Batch Processing: efficient data aggregation
  3. OpenAI Scoring: prompt-engineered risk assessment
  4. Dashboard Display: prioritized threat visualization
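A condensed sketch of that ingest-and-score loop, assuming feedparser for RSS and the openai Python SDK (v1+) with JSON-mode output; the feed URL, model, batch size, and output schema are placeholders:

```python
# Sketch of the ingest-and-score loop above.
import json
import feedparser
from openai import OpenAI

client = OpenAI()
FEED_URL = "https://example.com/security/rss.xml"  # placeholder threat feed

# 1-2. Ingest the feed and take a small micro-batch of fresh entries.
entries = feedparser.parse(FEED_URL).entries[:10]

# 3. OpenAI Scoring: request structured JSON so the risk score is machine-readable.
scored = []
for e in entries:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        response_format={"type": "json_object"},
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                "Return JSON with keys summary (string) and risk_score (0-10 integer) "
                f"for this security article:\n\nTitle: {e.title}\n\n{e.get('summary', '')}"
            ),
        }],
    )
    scored.append({"title": e.title, **json.loads(resp.choices[0].message.content)})

# 4. Dashboard Display: highest-risk items first (Streamlit renders the table).
scored.sort(key=lambda row: row["risk_score"], reverse=True)
for row in scored[:5]:
    print(f"[{row['risk_score']}] {row['title']}")
```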
Validation & Accuracy
Offline regression testing validates scoring prompts against labeled threat history. Structured JSON output ensures consistency and prevents model hallucination in critical data points.
  • Data freshness: articles displayed within 30 minutes
  • Risk score accuracy exceeds 85% vs. expert opinion
  • Continuous prompt refinement based on feedback
85% risk score accuracy
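A sketch of what such an offline regression check can look like; the labeled-history file, tolerance, and score_article() helper are assumptions standing in for the actual scoring call:

```python
# Sketch of an offline regression check: re-score a small labeled history and
# require the mean absolute error to stay within a tolerance before a new
# prompt version ships.
import json

TOLERANCE = 1.5  # max acceptable mean absolute error on a 0-10 scale (assumed)

def regression_check(score_article, labeled_path: str = "labeled_history.jsonl") -> bool:
    """Compare model risk scores against analyst labels stored as JSONL."""
    errors = []
    with open(labeled_path) as f:
        for line in f:
            case = json.loads(line)  # e.g. {"text": ..., "expert_score": ...}
            predicted = score_article(case["text"])["risk_score"]
            errors.append(abs(predicted - case["expert_score"]))
    mae = sum(errors) / len(errors)
    print(f"MAE vs. expert labels: {mae:.2f}")
    return mae <= TOLERANCE
```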
Ducky: AI Coding Assistant
Developer Pain Point
Developers face 40% longer turnaround times due to manual debugging, context loss, and fragmented coding workflows.
Smart Solution
ChatGPT assistant designed for multi-turn coding with reliable context retention using GPT API, prompt-tuning, and state management.
  • 40% faster development task completion
  • Context retention via conversation history management
Ducky: Prompt Engineering Excellence
Conversation History
Advanced state management ensures high fidelity in code modifications, debugging, and refactoring tasks.
Chain-of-Thought
Leveraged prompt-tuning and CoT reasoning to maximize debugging and code generation accuracy.
A/B Testing
Validated success through time-to-completion comparison against standard search methods.
95% code correctness rate in generated solutions
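A minimal sketch of the context-retention pattern behind multi-turn coding assistance, assuming the openai Python SDK (v1+); the system prompt, model, and truncation rule are illustrative, not the production ones:

```python
# Sketch of multi-turn context retention via a managed conversation history.
from openai import OpenAI

client = OpenAI()
MAX_TURNS = 20  # assumed cap to keep the conversation inside the context window

history = [{
    "role": "system",
    "content": "You are a coding assistant. Think step by step before answering, "
               "and keep earlier code from this conversation consistent.",
}]

def ask(user_message: str) -> str:
    """Append the user turn, call the model with full history, store the reply."""
    history.append({"role": "user", "content": user_message})
    # Simple state management: drop the oldest non-system turns past the cap.
    del history[1:max(1, len(history) - MAX_TURNS)]
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=history)
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("Write a function that parses ISO dates."))
print(ask("Now add error handling to that same function."))  # relies on retained context
```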
ScoutSmart: Moneyball for European Football
An end-to-end scouting analytics engine built with Python, SQL, and Power BI.
Built a full-stack BI solution to solve a real-world problem: identifying undervalued talent. By processing 10+ years of match data, this dashboard helps scouts look past 'lucky' goals to find players with sustainable, high-value creative output.
The Problem
Traditional scouting relies too heavily on "counting stats" (Goals & Assists). This leads to expensive mistakes: buying players on a lucky hot streak or missing creative geniuses whose strikers fail to score.
The Solution
A dynamic dashboard that normalizes performance data for over 2,000 players across the Top 5 European Leagues. It uses advanced metrics like xG (Expected Goals) and xGBuildup to reveal a player's true contribution to the game, independent of variance.
  • AWS S3 Storage: scalable data staging and ingestion
  • Databricks/Spark: complex xT calculations and feature engineering
  • Snowflake: optimized data warehouse for fast scouting queries
  • Power BI: interactive visualization dashboards
  • Python: custom web scraper and data cleaning
  • Statistics: regression analysis and per-90 normalization
Supply Chain Commander: Automated Inventory AI
An end-to-end decision engine that predicts future demand (7 days out) and automatically triggers restocking orders to prevent revenue loss.
How It Works
  • Dynamic Forecasting: Uses Facebook Prophet to detect complex seasonality (e.g., high demand for winter jackets in Jan, low in July).
  • Closed-Loop Architecture: Unlike static notebooks, this system writes forecasts back to a PostgreSQL database for real-time retrieval.
  • Actionable API: A FastAPI microservice calculates reorder quantities and serves alerts to a Streamlit dashboard.
Tech Stack
🛠️ Python • Docker • FastAPI • PostgreSQL • Streamlit
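A minimal sketch of the forecast-and-reorder loop described above, assuming the prophet and pandas packages; the demand history, safety stock, and persistence step are illustrative placeholders:

```python
# Sketch: forecast 7 days of demand with Prophet, then derive a reorder quantity.
import pandas as pd
from prophet import Prophet

# Daily demand history for one SKU: Prophet expects columns ds (date) and y (value).
history = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=365, freq="D"),
    "y": [20 + (i % 7) for i in range(365)],  # toy weekly seasonality
})

model = Prophet(weekly_seasonality=True, yearly_seasonality=True)
model.fit(history)

# Forecast the next 7 days of demand.
future = model.make_future_dataframe(periods=7)
forecast = model.predict(future).tail(7)[["ds", "yhat"]]

# Closed-loop step (sketch): forecast.to_sql("demand_forecast", engine, ...) would
# persist this back to PostgreSQL for the FastAPI service to read.

# Reorder rule: order enough to cover forecast demand plus a safety buffer.
stock_on_hand = 120  # assumed current inventory
safety_stock = 30    # assumed buffer
expected_demand = forecast["yhat"].clip(lower=0).sum()
reorder_qty = max(0, int(round(expected_demand + safety_stock - stock_on_hand)))

print(f"7-day expected demand: {expected_demand:.0f} units")
print(f"Suggested reorder quantity: {reorder_qty} units")
```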
Contact
©️ 2026 Sanket Bhujbal