Hi, I am Mutian He
Machine Learning & Data Science
Master’s student in Information Systems at Santa Clara University, with a bachelor’s degree from the University of Glasgow. Interested in machine learning and data analytics, with hands-on experience using Python and SQL to clean and analyze data and to build end-to-end data workflows. Passionate about working with real-world data and turning it into data-driven insights.
Projects
Jenkins-as-a-Service Platform
Designed a high-level architecture for an internal Jenkins-as-a-Service CI/CD platform. Produced workflows, user stories, and pipeline diagrams; proposed a role-based access control (RBAC) security model; and collaborated in an Agile team to deliver technical documentation.
- Jenkins
- CI/CD
- DevOps
End-to-End Instacart Reorder Prediction System
An end-to-end machine learning pipeline for predicting user reorder behavior on the Instacart platform. The project focuses on data engineering and ML workflow design, including ETL, feature aggregation, time-based (temporal) data splitting, model training, and inference. A Random Forest model was trained on user–product interaction data, with an emphasis on preventing data leakage and building a reproducible, production-oriented pipeline.
- Python
- Pandas / Scikit-learn
- ETL Pipeline
- ML Engineering
Intelligent Job Description Analyzer
Developed an NLP-based system that transforms unstructured job descriptions into structured information such as required skills, education, and seniority level. The project focuses on text processing, modular extractor design, and an end-to-end ML workflow, enabling real-time analysis of both raw job text and job posting URLs.
- Python
- NLP
- Flask API
Real-Time Flight Delay Prediction System – ML Engineering
Developed an end-to-end machine learning system to predict flight delays using high-cardinality categorical features such as airline carrier and origin–destination pair. The project focuses on production-oriented ML engineering, including feature processing, model training with CatBoost, and real-time inference through an interactive Streamlit web interface.
- Python
- CatBoost
- ML Pipeline
- Streamlit
Skills
- Python
- MySQL
- Pandas
- NumPy
- Scikit-learn
- CatBoost
- ETL Pipelines
- EDA & Data Cleaning
- Machine Learning
- NLP
- Flask
- Streamlit
- AWS (EC2, S3, RDS)
- Git / GitHub
- Linux
- CI/CD
- Java