
About Me
This page is a bit about my background, skills, and interests.
Here's where I am presently, and where I'm headed:
- 3 years of OPT (STEM-qualified), no sponsorship needed
- After 3 years: I can bring you more profit than a lot of people who don't need sponsorship
- Want to contribute to my future? Hire me!
- Graduated with GPA 3.96/4.0
- Focus on: Data Science, Data Engineering, ML/AI, Cloud Computing, Database, Algorithms
- Certified: Tableau Data Analyst, AWS, Applied AI
Time Range
Sept. 2023 - May 2025
Projects
Please view Projects Page
Relevant Coursework
- Data Analytics Engineering
- Deterministic Operations Research
- Data Mining
- Data Management and Database Design
- Visualization for Analytics
- Cloud Computing
- Data Structures
- Algorithms
- Statistics
- Unified data systems via ETL pipelines and warehouse modeling.
- Built Tableau dashboards and predictive models to drive business insights.
Key Achievements:
- Streamlined data quality monitoring and reporting, cutting manual workload by 25% and improving reliability
- Audited and optimized database schemas by resolving normalization issues and improving data design
- Designed and deployed ETL pipelines to unify disparate data sources into a centralized warehouse
- Built interactive Tableau dashboards with drill-down views to visualize key performance indicators
- Developed ML models to predict client outcomes, using feature engineering and scikit-learn for performance gains
- Automated routine data validation processes, boosting consistency and operational efficiency
Technical Stack:
- Data Engineering: ETL Pipelines, Apache Airflow, dbt, Data Warehousing, Data Modeling
- Cloud & Infrastructure: AWS (S3, EC2, Lambda, Redshift), Docker, CI/CD, Infrastructure as Code
- Analytics & BI: Tableau, Power BI, Python (Pandas, NumPy), SQL, A/B Testing
- ML & Automation: Scikit-learn, TensorFlow, Feature Engineering, Automated Data Validation
- Database: MySQL, PostgreSQL, Database Normalization, Query Optimization
- Built ETL pipeline for satellite data (TB-scale)
- Trained CV models (87% accuracy) for rural condition detection
- Contributed to ML project recognized in national competition
Key Achievements:
- Built computer vision models to detect rural environmental conditions, achieving 87% classification accuracy
- Designed and executed A/B tests to evaluate platform features and analyze user engagement
- Performed statistical analysis on usage data to uncover drivers of user retention and feature adoption
- Contributed to a nationally recognized ML project for environmental monitoring and policy insights
Technical Stack:
- Big Data: Apache Spark (PySpark), Kafka, Hadoop (MapReduce), Distributed Computing
- Cloud & Infrastructure: AWS (S3, EC2, Redshift), Docker, CI/CD
- Data Engineering: ETL Pipelines, Data Lake Architecture, Data Modeling, Data Quality
- Computer Vision & ML: OpenCV, TensorFlow, PyTorch, MLOps
- Analytics: Python (Pandas, Seaborn, Matplotlib), SQL, A/B Testing, Statistical Analysis
Achievements:
- Top 10% for all semesters
- President of Youth Leader Club
- 3 Entrepreneurship Competition Top Awards
- Published research on fraud detection
- Maintained A's in all mathematics courses
Core Competencies:
- Data Structures & Algorithms
- Database Management Systems
- Machine Learning & AI
- Software Engineering
- Computer Architecture & Networks
- Built fraud detection model (99.07% accuracy, 1M+ txns)
- Engineered features & validated via out-of-time testing
- Published in MLBDBI 2020 (DOI: 10.1109/MLBDBI51377.2020.00025)
Key Achievements:
- Cleaned and transformed 1M+ financial transactions, resolving missing values and outliers
- Engineered features using statistical methods (SelectKBest) and ROC-AUC-driven selection
- Applied out-of-time validation to simulate real-world fraud model deployment
- Optimized several decision-tree-based models (GBDT, LightGBM, XGBoost) with Bayesian hyperparameter tuning, reaching 99.07% accuracy
- Published research methodology and results in IEEE conference proceedings (DOI: 10.1109/MLBDBI51377.2020.00025)
Updated Project Details:
Please view Projects Page
Skills & Expertise
Certifications & Certificates
- Tableau Certified Data Analyst
- Apache Airflow Foundation
- Data Analytics on AWS
- Applied AI Certificate
- AI Literacy Certificate
- Graduate Leadership Certificate
Programming Languages
- Python (Advanced)
- SQL (MySQL, PostgreSQL) & NoSQL
- Java & JSP
- JavaScript
- Shell Scripting
- R
- C/C++
- MATLAB
Data Engineering
- ETL/ELT Pipeline Development
- Data Warehousing & Data Lake
- Data Modeling (Star Schema)
- Apache Spark (PySpark)
- Apache Kafka & Flink
- Hadoop Ecosystem
- Apache Airflow
- dbt
- Snowflake
- Stream & Batch Processing
- Data Governance & Quality
Cloud & DevOps
- AWS (S3, EC2, Lambda, Redshift, Glue, Kinesis)
- Azure DevOps
- Docker & Kubernetes
- CI/CD Pipelines
- Infrastructure as Code (Terraform, Ansible)
- Git & GitHub Actions
- Linux/Unix
Backend & API Development
- FastAPI
- Flask
- Django & Django REST Framework
- GraphQL
- RESTful API Design
- Authentication (JWT, OAuth)
- Microservices Architecture
- SQLAlchemy & ORM
Databases & Big Data
- MySQL & PostgreSQL
- MongoDB & Redis
- Snowflake & BigQuery
- Database Design & Normalization
- Query Optimization & Indexing
- JDBC & SQLAlchemy
Machine Learning & AI
- Scikit-learn & TensorFlow
- PyTorch
- Feature Engineering & PCA
- NLP & Computer Vision
- Predictive Modeling
- MLOps & Model Deployment
- Time Series Analysis
- Clustering & Nearest-Neighbor Methods (K-Means, KNN)
Generative AI
- OpenAI API
- Google Gemini API
- LangChain
- Hugging Face Transformers
- RAG (Retrieval-Augmented Generation)
- Vector Databases (Pinecone, Chroma)
- Prompt Engineering
Data Visualization & BI
- Tableau (Certified)
- Power BI
- Streamlit
- Plotly & Matplotlib
- Looker & Domo
- Excel (Advanced, Power Query)
- Google Sheets
Analytics & Statistics
- Statistical Analysis & Hypothesis Testing
- A/B Testing
- Regression & Classification
- Data Mining
- Sentiment & Text Analysis
- Operations Research
Specialized Domains
- Geospatial & Remote Sensing Data
- Financial Fraud Detection
- Healthcare Analytics
- Supply Chain & Retail Analytics
- Content & Social Media Analytics
Business Tools
- MS Office Suite
- Jupyter Notebook & Anaconda
- UML & System Design
- Technical Documentation
- Project Management
What Else
Beyond the world of data and code:
I enjoy exploring beautiful trails, finding tranquility and perspective on weekend hiking adventures.
When indoors, I like reading books that expand my horizons, from data science literature to thought-provoking fiction.
I'm also an enthusiastic League of Legends player, where strategic thinking and teamwork offer a different kind of problem-solving challenge.
In my kitchen, you'll find me experimenting with baking techniques, applying the same precision and creativity that drives my professional work to create perfect pastries and breads.
My constant companion on life's adventures is:
Wangwang Shen, my beloved dog!
He made the incredible journey with me from China to the USA.
Wangwang's curiosity and joy remind me to appreciate the simple pleasures and find wonder in our surroundings, no matter how busy life gets.
We really enjoy the pet-friendly environment in the US.
To pet him and enjoy his fluffy tail, HIRE ME!
Contact
Feel free to reach out to me at yanna.cshen@gmail.com or connect with me on LinkedIn and GitHub. You can also access the links above and download my resume from the sidebar.
I'm here to help transform your data into valuable insights and profit!