Projects
FinSight AI
Cloud & AI Engineering
Engineered an end-to-end document intelligence pipeline that parses SEC 10-K filings (HTML and PDF) with BeautifulSoup and AWS Textract, then extracts Risk Factor disclosures and structures them as standardized JSON. Integrated AWS Bedrock (Nova Lite) to auto-classify risks into 13 categories and generate executive-level summaries, enabling AI-powered year-over-year and cross-company risk comparison.
Multimodal Image Search Engine
Multimodal ML
Built a CLIP-based multimodal retrieval system for text-to-image search by projecting images and text queries into a shared embedding space. Designed the full pipeline from data preprocessing and batch embedding generation to FAISS ANN indexing, and evaluated retrieval quality with Recall@1/5/10 and Median Rank.
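For readers unfamiliar with the retrieval metrics named above, here is a minimal sketch of how Recall@K and Median Rank are commonly computed. It is not the project's evaluation code: the similarity scores, the helper names `rank_of_target` and `recall_at_k`, and the assumption that query i's correct image is image i are all hypothetical.

```python
from statistics import median

def rank_of_target(scores, target_index):
    """1-based rank of the target when items are sorted by descending score."""
    target_score = scores[target_index]
    # Items scoring strictly higher push the target down; ties favor the target.
    return 1 + sum(1 for s in scores if s > target_score)

def recall_at_k(ranks, k):
    """Fraction of queries whose correct item appears in the top k results."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical scores: row i holds query i's similarity to each of 4 images,
# and the correct image for query i is assumed to be image i.
similarity = [
    [0.9, 0.1, 0.3, 0.2],
    [0.4, 0.5, 0.7, 0.1],
    [0.8, 0.6, 0.2, 0.7],
    [0.1, 0.2, 0.3, 0.9],
]

ranks = [rank_of_target(row, i) for i, row in enumerate(similarity)]
median_rank = median(ranks)

print("ranks:", ranks)
print("Recall@1:", recall_at_k(ranks, 1))
print("Recall@2:", recall_at_k(ranks, 2))
print("Median Rank:", median_rank)
```

Higher Recall@K and lower Median Rank both indicate that the correct image tends to appear near the top of the results.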
End-to-End Instacart Reorder Prediction System
ML Engineering
Built an end-to-end machine learning pipeline to predict user reorder behavior on the Instacart platform. The project focuses on data engineering and ML workflow design, covering ETL, feature aggregation, temporal data splitting, model training, and inference. Trained a Random Forest model on user–product interaction data, with emphasis on preventing data leakage and building a reproducible, production-oriented pipeline.
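One common way to do temporal splitting without leakage is to hold out each user's most recent order as the prediction target and build features only from earlier orders. The sketch below illustrates that idea under stated assumptions: the record fields (`user_id`, `order_number`, `products`) and the function `temporal_split` are hypothetical and may differ from the project's actual schema and split logic.

```python
from collections import defaultdict

def temporal_split(orders):
    """Per user, keep all but the latest order as history and hold out the latest."""
    by_user = defaultdict(list)
    for order in orders:
        by_user[order["user_id"]].append(order)

    history, target = [], []
    for user_orders in by_user.values():
        # Sort chronologically so the held-out order is always the most recent one.
        user_orders.sort(key=lambda o: o["order_number"])
        history.extend(user_orders[:-1])
        target.append(user_orders[-1])
    return history, target

# Hypothetical order records, deliberately out of chronological order.
orders = [
    {"user_id": 1, "order_number": 2, "products": ["milk"]},
    {"user_id": 1, "order_number": 1, "products": ["milk", "eggs"]},
    {"user_id": 1, "order_number": 3, "products": ["eggs"]},
    {"user_id": 2, "order_number": 1, "products": ["tea"]},
    {"user_id": 2, "order_number": 2, "products": ["tea", "rice"]},
]

history, target = temporal_split(orders)
print("history orders:", [(o["user_id"], o["order_number"]) for o in history])
print("target orders:", [(o["user_id"], o["order_number"]) for o in target])
```

Because features are aggregated only from `history`, nothing from the held-out order can leak into training inputs.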
Jenkins as a Service (JaaS)
DevOps Platform Engineering
Designed an enterprise Jenkins-as-a-Service platform to replace fragmented CI/CD tooling across teams. Standardized pipeline templates, centralized RBAC and audit logging, and planned rollback-oriented release workflows on VMware to improve delivery reliability, security compliance, and operating efficiency.
Bank Marketing Subscription Predictor
Machine Learning
Built an end-to-end ML classification pipeline to predict customer subscription likelihood for bank term deposits. Addressed 88% class imbalance using SMOTE, tuned the decision threshold to balance the Recall/Precision tradeoff, and applied SHAP values to deliver interpretable, business-actionable insights. Achieved a ROC-AUC of 0.80 with Random Forest.
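Decision-threshold tuning can be sketched as a sweep over candidate thresholds, scoring each by precision and recall. The example below picks the threshold with the highest F1 score, which is an assumption: the project description does not state the selection criterion. The labels, probabilities, and function names are hypothetical.

```python
def precision_recall(y_true, y_prob, threshold):
    """Precision and recall when predicting positive for probabilities >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1_score(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def best_threshold(y_true, y_prob, thresholds):
    """Return (threshold, F1) for the candidate with the highest F1 (assumed criterion)."""
    scored = [(t, f1_score(*precision_recall(y_true, y_prob, t))) for t in thresholds]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical imbalanced labels and model probabilities.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60, 0.55, 0.80]

threshold, f1 = best_threshold(y_true, y_prob, [0.3, 0.4, 0.5, 0.6])
print("best threshold:", threshold, "F1:", round(f1, 3))
```

In practice the criterion would follow the business cost of false negatives versus false positives, for example a minimum recall constraint instead of F1.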
Cloud-Based Real-Time Stock Data Pipeline
Data Engineering
Built a cloud-based real-time stock data streaming pipeline using Apache Kafka on AWS EC2. Implemented Python producers and consumers to simulate live market data ingestion. Persisted streaming data to Amazon S3 and integrated AWS Glue Data Catalog and Amazon Athena to enable scalable, serverless SQL analytics.
Spotify Podcast Popularity Analysis
Data Analysis
Analyzed 228,000+ Spotify podcast episodes to identify factors driving Top 10 rankings. Performed EDA across 22 countries, engineered predictive features from audio/video and genre attributes, and trained a Random Forest classifier evaluated by accuracy and AUC.
Real-Time Flight Delay Prediction System
ML Engineering
Developed an end-to-end machine learning system to predict flight delays using high-cardinality categorical features such as airline carriers and origin–destination pairs. The project focuses on production-oriented ML engineering, including feature processing, model training with CatBoost, and real-time inference through an interactive web interface.
Skills
- Python
- MySQL
- Pandas
- NumPy
- Scikit-learn
- PyTorch
- Transformers
- CLIP
- FAISS
- CatBoost
- ETL Pipelines
- EDA & Data Cleaning
- Machine Learning
- NLP
- Flask
- Streamlit
- AWS (EC2, S3, RDS)
- Git / GitHub
- Linux
- CI/CD
- Java
- Airflow
- SQLite
- MLOps