Yanna Shen

About Me

This page covers my background, skills, and interests.


Here's where I am presently, and where I'm headed

Future
Tech Related Roles
Open to Sponsorship | Open to Relocation
  • 3 years of OPT (STEM-qualified), so no sponsorship is needed during that time
  • After 3 years: I can bring you more profit than many people who don't need sponsorship
  • Want to contribute to my future? Hire me!
📈
May 2025
Master of Science in Data Analytics
  • Graduated with GPA 3.96/4.0
  • Focus on: Data Science, Data Engineering, ML/AI, Cloud Computing, Database, Algorithms
  • Certified: Tableau Data Analyst, AWS, Applied AI

Time Range

Sept. 2023 - May 2025

Projects

Please view the Projects page.

Relevant Coursework

  • Data Analytics Engineering
  • Deterministic Operations Research
  • Data Mining
  • Data Management and Database Design
  • Visualization for Analytics
  • Cloud Computing
  • Data Structures
  • Algorithms
  • Statistics
👩🏻‍🎓
June - Sept. 2024
Data Engineer Intern (Full Stack)
Uplift Northwest, Seattle, WA
  • Unified data systems via ETL pipelines and warehouse modeling.
  • Built Tableau dashboards and predictive models to drive business insights.

Key Achievements:

  • Streamlined data quality monitoring and reporting, cutting manual workload by 25% and improving reliability
  • Audited and optimized database schemas by resolving normalization issues and improving data design
  • Designed and deployed ETL pipelines to unify disparate data sources into a centralized warehouse
  • Built interactive Tableau dashboards with drill-down views to visualize key performance indicators
  • Developed ML models to predict client outcomes, using feature engineering and scikit-learn for performance gains
  • Automated routine data validation processes, boosting consistency and operational efficiency

Technical Stack:

  • Data Engineering: ETL Pipelines, Apache Airflow, dbt, Data Warehousing, Data Modeling
  • Cloud & Infrastructure: AWS (S3, EC2, Lambda, Redshift), Docker, CI/CD, Infrastructure as Code
  • Analytics & BI: Tableau, Power BI, Python (Pandas, NumPy), SQL, A/B Testing
  • ML & Automation: Scikit-learn, TensorFlow, Feature Engineering, Automated Data Validation
  • Database: MySQL, PostgreSQL, Database Normalization, Query Optimization
🚀
Sept. 2020 - May 2022
Data Analyst
  • Built ETL pipeline for satellite data (TB-scale)
  • Trained CV models (87% accuracy) for rural condition detection
  • Contributed to ML project recognized in national competition

Key Achievements:

  • Built computer vision models to detect rural environmental conditions, achieving 87% classification accuracy
  • Designed and executed A/B tests to evaluate platform features and analyze user engagement
  • Performed statistical analysis on usage data to uncover drivers of user retention and feature adoption
  • Contributed to a nationally recognized ML project for environmental monitoring and policy insights

Technical Stack:

  • Big Data: Apache Spark (PySpark), Kafka, Hadoop (MapReduce), Distributed Computing
  • Cloud & Infrastructure: AWS (S3, EC2, Redshift), Docker, CI/CD
  • Data Engineering: ETL Pipelines, Data Lake Architecture, Data Modeling, Data Quality
  • Computer Vision & ML: OpenCV, TensorFlow, PyTorch, MLOps
  • Analytics: Python (Pandas, Seaborn, Matplotlib), SQL, A/B Testing, Statistical Analysis
📊
Aug. 2021
Bachelor of Engineering in Computer Science and Technology
Graduated with Honors.

Achievements:

  • Ranked in the top 10% every semester
  • President of Youth Leader Club
  • 3 top awards in entrepreneurship competitions
  • Published research on fraud detection
  • Maintained A's in all mathematics courses

Core Competencies:

  • Data Structures & Algorithms
  • Database Management Systems
  • Machine Learning & AI
  • Software Engineering
  • Computer Architecture & Networks
🎓
Feb. - Aug. 2020
Data Science Researcher
Prof. Stephen Coggeshall's Research Group, Online

Key Achievements:

  • Cleaned and transformed 1M+ financial transactions, resolving missing values and outliers
  • Engineered features and selected them with statistical methods (SelectKBest) and ROC-AUC-driven selection
  • Applied out-of-time validation to simulate real-world fraud model deployment
  • Optimized several tree-based models (GBDT, LightGBM, XGBoost) with Bayesian hyperparameter tuning, reaching 99.07% accuracy
  • Published research methodology and results in IEEE conference proceedings (DOI: 10.1109/MLBDBI51377.2020.00025)

Updated Project Details:

Please view the Projects page.
🤖
START

Skills & Expertise

Certifications & Certificates

  • Tableau Certified Data Analyst
  • Apache Airflow Foundation
  • Data Analytics on AWS
  • Applied AI Certificate
  • AI Literacy Certificate
  • Graduate Leadership Certificate

Programming Languages

  • Python (Advanced)
  • SQL (MySQL, PostgreSQL) & NoSQL
  • Java & JSP
  • JavaScript
  • Shell Scripting
  • R
  • C/C++
  • MATLAB

Data Engineering

  • ETL/ELT Pipeline Development
  • Data Warehousing & Data Lake
  • Data Modeling (Star Schema)
  • Apache Spark (PySpark)
  • Apache Kafka & Flink
  • Hadoop Ecosystem
  • Apache Airflow
  • dbt
  • Snowflake
  • Stream & Batch Processing
  • Data Governance & Quality

Cloud & DevOps

  • AWS (S3, EC2, Lambda, Redshift, Glue, Kinesis)
  • Azure DevOps
  • Docker & Kubernetes
  • CI/CD Pipelines
  • Infrastructure as Code (Terraform, Ansible)
  • Git & GitHub Actions
  • Linux/Unix

Backend & API Development

  • FastAPI
  • Flask
  • Django & Django REST Framework
  • GraphQL
  • RESTful API Design
  • Authentication (JWT, OAuth)
  • Microservices Architecture
  • SQLAlchemy & ORM

Databases & Big Data

  • MySQL & PostgreSQL
  • MongoDB & Redis
  • Snowflake & BigQuery
  • Database Design & Normalization
  • Query Optimization & Indexing
  • JDBC & SQLAlchemy

Machine Learning & AI

  • Scikit-learn & TensorFlow
  • PyTorch
  • Feature Engineering & PCA
  • NLP & Computer Vision
  • Predictive Modeling
  • MLOps & Model Deployment
  • Time Series Analysis
  • Clustering (K-Means) & KNN

Generative AI

  • OpenAI API
  • Google Gemini API
  • LangChain
  • Hugging Face Transformers
  • RAG (Retrieval-Augmented Generation)
  • Vector Databases (Pinecone, Chroma)
  • Prompt Engineering

Data Visualization & BI

  • Tableau (Certified)
  • Power BI
  • Streamlit
  • Plotly & Matplotlib
  • Looker & Domo
  • Excel (Advanced, Power Query)
  • Google Sheets

Analytics & Statistics

  • Statistical Analysis & Hypothesis Testing
  • A/B Testing
  • Regression & Classification
  • Data Mining
  • Sentiment & Text Analysis
  • Operations Research

Specialized Domains

  • Geospatial & Remote Sensing Data
  • Financial Fraud Detection
  • Healthcare Analytics
  • Supply Chain & Retail Analytics
  • Content & Social Media Analytics

Business Tools

  • MS Office Suite
  • Jupyter Notebook & Anaconda
  • UML & System Design
  • Technical Documentation
  • Project Management

What Else

Beyond the world of data and code:

🏔️ I enjoy exploring beautiful trails, finding tranquility and perspective on weekend hiking adventures.

📚 When indoors, I like reading books that expand my horizons, from data science literature to thought-provoking fiction.

🎮 I'm also an enthusiastic League of Legends player, where strategic thinking and teamwork offer a different kind of problem-solving challenge.

🍰 In my kitchen, you'll find me experimenting with baking techniques, applying the same precision and creativity that drives my professional work to create perfect pastries and breads.


My constant companion on life's adventures is:

🐕 Wangwang Shen, my beloved dog!

✈️ He made the incredible journey with me from China to the USA.

🦋 Wangwang's curiosity and joy remind me to appreciate the simple pleasures and find wonder in our surroundings, no matter how busy life gets.

🌈 We really enjoy the pet-friendly environment in the US.

To pet him and enjoy his fluffy tail, HIRE ME!

Wangwang Shen

Contact

Feel free to reach out to me at yanna.cshen@gmail.com or connect with me on LinkedIn and GitHub. You can also access these links and download my resume from the sidebar.

I'm here to help transform your data into valuable insights and profit!


© 2025. All rights reserved.