YongLin He

Auckland, Auckland Central

Summary

Results-oriented Data Analyst with extensive experience in Python, SQL, R, MATLAB, Matplotlib, Power BI, and Tableau, skilled in data visualization, predictive modelling, and database optimization. With a proven track record in Agile environments and a passion for transforming data into actionable insights, I am seeking new opportunities to support data-driven decision-making and create impactful solutions for organizational success.

Overview

3 years of professional experience

Work History

Data Analyst Intern

Huazhong University of Science and Technology
06.2024 - 12.2024
  • Built a plant specimen database using PostgreSQL and PostGIS, following the Darwin Core standard, relying on PostgreSQL's MVCC mechanisms for data consistency and on optimized storage structures to reduce redundancy and improve query performance.
  • Implemented geographic coordinate conversion for plant specimens in Wuhan, transforming textual descriptions into WGS84 coordinates, and validated elevation data using Digital Elevation Models (DEM) to enhance geographic accuracy.
  • Used the PostGIS ST_Intersects function to associate specimen data with Wuhan’s ecological reserves, wetlands, and nature reserves, supporting conservation efforts and scientific research.
  • Optimized complex queries using WITH clauses and indexing, enabling efficient searches based on species, time, and geographic location, supporting historical analysis of plant distributions in Wuhan.
  • Developed habitat suitability models based on elevation, slope, and aspect, integrating environmental factors to assess plant growth suitability in Wuhan, aiding ecological conservation planning.
  • Designed a field expedition management system, facilitating task allocation and route planning for research teams in Wuhan, with Dijkstra’s shortest path algorithm optimizing fieldwork efficiency.
  • Integrated hotel and hospital location services in Wuhan to ensure safety and convenience for field research teams, improving the sustainability of scientific investigations.
  • Incorporated ecological reserve, transportation, and climate data of Wuhan, utilizing QGIS and PostgreSQL for data processing and spatial analysis, ensuring the integrity and usability of plant specimen data.

Machine Learning Intern

University of Melbourne
02.2024 - 06.2024
  • Conducted EDA on imbalanced datasets (18,000+ samples), identifying key distribution patterns.
  • Applied TF-IDF vectorization and hybrid resampling (SMOTE & RUS) to improve model robustness.
  • Built and optimized ensemble models (MultinomialNB, SGD, LGBM, CatBoost) with soft voting, enhancing classification accuracy.
  • Fine-tuned hyperparameters using Optuna, achieving a 4.6% accuracy boost over baseline models.
  • Raised text classification accuracy to 82.7% on Kaggle for AI-generated text detection.

Data Analyst Intern

International Business Machines Corporation
06.2022 - 01.2023
  • Built an end-to-end data pipeline to predict rental prices for residential properties and apartments in Wuhan, China over the next three years.
  • Processed and integrated multi-source datasets (10K+ records from 2019-2022) using APIs, web scraping (BeautifulSoup, Selenium), and geospatial data (OpenStreetMap, PTV public transport, future school locations, shopping centers, and demographic trends).
  • Conducted extensive data preprocessing and feature engineering using Pandas, NumPy, Scikit-learn, and Geopandas, ensuring data quality and enhancing predictive accuracy.
  • Designed and implemented multiple machine learning models (XGBoost, Decision Tree, Random Forest, Linear Regression, MLP) using Scikit-learn, XGBoost, and TensorFlow/PyTorch, optimizing hyperparameters with GridSearchCV.
  • Created interactive data visualizations and dashboards using Matplotlib, Seaborn, Plotly, and Tableau to identify key market trends and insights.
  • Implemented geospatial and urban planning analysis to assess rental price determinants with QGIS.

Education

Master of Science - Information Technology

University of Melbourne
Melbourne
12-2024

Bachelor of Science - Data Science

University of Melbourne
Melbourne
12-2022

Skills

  • Data Processing: SQL, Python, Excel, Snowflake, R
  • Data Visualization: Power BI, Tableau, Python (Matplotlib)
  • Data Modelling: Machine Learning, Statistical Analysis, A/B Testing, Hypothesis Testing
  • Database Management: SQL Server, MySQL, PostgreSQL, Oracle, MongoDB
