
Data Analytics Practice

The following projects involve collecting and processing public data for personal research purposes and visualizing that data. Python and R were used for data collection and processing, while Tableau was used for data visualization. Detailed information about each project is provided within the respective project descriptions. 

01

Climate Data in the United States

A Brief Summary:

The data visualization above uses NOAA's climate data to show extreme precipitation, maximum temperature, minimum temperature, and temperature anomalies for each state and period.

A "Map Layer" filter lets users switch between the data sets. "Year" and "Month" filters and a color-scale legend were also added to make navigating the data easier.

 

Hovering over the map reveals statistics for each state for the selected year and month. For example, the "Temperature Anomaly" map layer for September 2022 shows that Montana's temperature was 3.72 degrees Celsius above the average.

Note that the Paris Agreement aims to hold the global temperature rise well below 2 degrees Celsius above pre-industrial levels and to pursue efforts to limit it to 1.5 degrees Celsius. As of 2023, the average global temperature had already approached 1.5 degrees Celsius above pre-industrial levels (https://www.nasa.gov/news-release/nasa-analysis-confirms-2023-as-warmest-year-on-record/).

 

Clicking on a state on the map will display a trend graph below, showing the monthly trends for each year.

A brief explanation of each Map Layer is as follows. Detailed information about data processing is described below.

  • Maximum Temperature Index: The highest recorded temperatures during the period (average of the top 5%).

  • Minimum Temperature Index: The lowest recorded temperatures during the period (average of the bottom 5%).

  • Temperature Anomaly: The degree to which the period's temperature deviates from the average.

  • Extreme Precipitation Index: The highest recorded precipitation during the period (average of the top 5%).

Details:

 

Data Source:

National Centers for Environmental Information (NCEI) at the National Oceanic and Atmospheric Administration (NOAA)

Data Processing (.nc files):

  • Preprocessing:

    • Loaded the files with the netCDF4 package.

    • Selected data from 1995 to 2023 and structured it. 

    • Used the geopandas and shapely packages to assign each latitude/longitude point to a U.S. state.

    • Created a data frame of 164,298,456 observations (no data available for Alaska and Hawaii).

    • Interpolated missing values using observations from the same date and the same state.

    • Analyzed 164,293,932 observations after dropping missing values. 

  • Index Creation:

    • For each state in a given month, selected the top five percent of precipitation and maximum temperature observations (and the bottom five percent of minimum temperature observations).

    • Calculated the average of the observations above the 95th percentile (below the 5th percentile for the minimum temperature data set).

    • The resulting indices represent:

      • The average of the top five percent of precipitation amounts in a given month for each state.

      • The average of the top five percent of maximum temperatures in a given month for each state.

      • The average of the bottom five percent of minimum temperatures in a given month for each state.
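The index steps above can be sketched in pandas as follows. This is a minimal illustration, not the original project code: the column names (state, year, month, and the value column) are hypothetical, and the cutoff uses pandas' default linear-interpolation quantile, which may differ from the method used in the project.

```python
import numpy as np
import pandas as pd


def extreme_index(df: pd.DataFrame, value_col: str, upper: bool = True,
                  share: float = 0.05) -> pd.DataFrame:
    """Mean of the most extreme `share` of observations per state, year, and month.

    upper=True  averages values at or above the (1 - share) quantile, as for
                precipitation and maximum temperature.
    upper=False averages values at or below the `share` quantile, as for
                minimum temperature.
    """
    def tail_mean(values: pd.Series) -> float:
        if upper:
            cutoff = values.quantile(1 - share)
            return values[values >= cutoff].mean()
        cutoff = values.quantile(share)
        return values[values <= cutoff].mean()

    return (
        df.groupby(["state", "year", "month"])[value_col]
        .agg(tail_mean)
        .rename(f"{value_col}_index")
        .reset_index()
    )


# Toy data (hypothetical): 100 readings for one state and month.
demo = pd.DataFrame({
    "state": ["MT"] * 100,
    "year": [2022] * 100,
    "month": [9] * 100,
    "tmax": np.arange(1, 101, dtype=float),
})
print(extreme_index(demo, "tmax", upper=True))   # top 5% is 96..100, mean 98.0
print(extreme_index(demo, "tmax", upper=False))  # bottom 5% is 1..5, mean 3.0
```

Grouping by year as well as month keeps each index value tied to one calendar month, which matches the Year and Month filters on the map.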

Data Processing (.txt files):

  • Preprocessing:

    • Loaded the files with the BeautifulSoup package.

    • Selected data from 1995 to 2022 and structured it (no data available for 2023). 

    • Created a data frame of 296,013 observations.

    • Data mapping and interpolation for missing values were the same as above.  

    • Analyzed 15,110 observations after dropping missing values. 

  • Index Creation:

    • Anomaly data indicates how much the current temperature deviates from the average temperature. A positive index means the current temperature is higher than the average, while a negative index means lower. 

    • The index is the average temperature anomaly in a given month for each state.
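A small sketch of the anomaly idea, assuming the raw readings are temperatures rather than ready-made anomalies. The baseline period and the column names (state, year, month, temp) are hypothetical; NOAA's anomaly files may already be computed against NOAA's own base period, in which case only the final averaging step applies.

```python
import pandas as pd


def anomaly_index(df: pd.DataFrame, baseline: tuple[int, int]) -> pd.DataFrame:
    """Average temperature anomaly per state, year, and month.

    Each reading's anomaly is its temperature minus the mean temperature for the
    same state and calendar month over the (hypothetical) baseline years.
    """
    start, end = baseline
    in_baseline = df["year"].between(start, end)
    climatology = (
        df[in_baseline]
        .groupby(["state", "month"], as_index=False)["temp"]
        .mean()
        .rename(columns={"temp": "baseline_temp"})
    )
    out = df.merge(climatology, on=["state", "month"], how="left")
    out["anomaly"] = out["temp"] - out["baseline_temp"]  # > 0 means warmer than average
    return out.groupby(["state", "year", "month"], as_index=False)["anomaly"].mean()


# Toy data (hypothetical): two readings per September for one state.
demo = pd.DataFrame({
    "state": ["MT"] * 6,
    "year": [2020, 2020, 2021, 2021, 2022, 2022],
    "month": [9] * 6,
    "temp": [9.0, 11.0, 12.0, 12.0, 13.0, 15.0],
})
print(anomaly_index(demo, baseline=(2020, 2022)))  # baseline mean is 12.0; 2022 anomaly is 2.0
```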

Python code:

  • Available upon request.

02

Homelessness in the United States

Data Source:

The U.S. Department of Housing and Urban Development (HUD): https://www.hudexchange.info/resource/3031/pit-and-hic-data-since-2007/

  • 2007 - 2023 Point-in-Time (PIT) Estimates by State

  • 2007 - 2023 Housing Inventory Count (HIC) by State

Data Processing:

  • Homeless Data:

    • Selected 51 states (including the District of Columbia) per year from 2007 to 2023 (867 rows).

    • Selected 39 columns from PIT and 6 columns from HIC.

    • Merged the data sets on the Year and State columns to obtain an 867 × 43 table (excluding the two index columns).

  • Census Data:

    • Collected census data on total population and population by race for each state (51 states) from 2009 to 2022.

    • Estimated 2023 values by linear extrapolation, yielding a 765 × 7 table.

  • Computation:

    • Calculated derived measures, including the homeless ratio, the unsheltered homeless ratio, basic and extra shelter capacities, and the shares of veterans, young adults, and youth in the total homeless population.

  • Python code:

    • Available upon request.
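The merge, extrapolation, and ratio steps above might look like this in pandas. It is a sketch under stated assumptions: every column name (total_homeless, unsheltered_homeless, homeless_veterans, total_beds, total_population) is hypothetical, the 2023 estimate fits a straight line to each state's full census series (one of several reasonable ways to extend a series linearly), and the ratio definitions are illustrative rather than the project's exact formulas.

```python
import numpy as np
import pandas as pd


def extrapolate_year(census: pd.DataFrame, target_year: int,
                     value_cols: list[str]) -> pd.DataFrame:
    """Append `target_year` rows by fitting a straight line to each state's series."""
    new_rows = []
    for state, group in census.groupby("state"):
        row = {"state": state, "year": target_year}
        for col in value_cols:
            # np.polyfit with deg=1 returns [slope, intercept].
            slope, intercept = np.polyfit(group["year"], group[col], deg=1)
            row[col] = slope * target_year + intercept
        new_rows.append(row)
    return pd.concat([census, pd.DataFrame(new_rows)], ignore_index=True)


def build_state_year(pit: pd.DataFrame, hic: pd.DataFrame,
                     census: pd.DataFrame) -> pd.DataFrame:
    """Merge PIT, HIC, and census tables on year and state, then derive ratios."""
    df = pit.merge(hic, on=["year", "state"], how="inner")
    df = df.merge(census, on=["year", "state"], how="left")
    df["homeless_ratio"] = df["total_homeless"] / df["total_population"]
    df["unsheltered_ratio"] = df["unsheltered_homeless"] / df["total_homeless"]
    df["veteran_share"] = df["homeless_veterans"] / df["total_homeless"]
    return df


# Toy data (hypothetical values).
census = pd.DataFrame({"state": ["CA"] * 3, "year": [2020, 2021, 2022],
                       "total_population": [100.0, 110.0, 120.0]})
census_full = extrapolate_year(census, 2023, ["total_population"])  # adds 130.0 for 2023
pit = pd.DataFrame({"year": [2023], "state": ["CA"], "total_homeless": [13.0],
                    "unsheltered_homeless": [6.5], "homeless_veterans": [1.3]})
hic = pd.DataFrame({"year": [2023], "state": ["CA"], "total_beds": [8.0]})
print(build_state_year(pit, hic, census_full))
```

An inner merge keeps only year/state pairs present in both HUD files, while the left merge with census data keeps every HUD row even if a population value is missing.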

Data Visualization:

  • Added a 'year' filter.

  • Hovering over the map reveals the statistics for each state for the selected year. 

  • Clicking a state reveals the homeless trends over time for the computed measures.

  • Linear trend lines show the direction of each trend over time and whether it is statistically significant.
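A linear trend line is an ordinary least-squares fit of a measure against year. The sketch below computes the slope, R-squared, and the t statistic of the slope with NumPy; it is an independent illustration of the statistics involved, not a reproduction of Tableau's own trend-line summary.

```python
import numpy as np


def linear_trend(years, values) -> dict:
    """Ordinary least-squares line through (year, value) pairs, plus fit statistics."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least three points to judge a trend")
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    sse = float(np.sum(residuals ** 2))           # unexplained variation
    sst = float(np.sum((y - y.mean()) ** 2))      # total variation
    sxx = float(np.sum((x - x.mean()) ** 2))
    # Standard error of the slope, using n - 2 degrees of freedom.
    se_slope = np.sqrt(sse / (n - 2) / sxx)
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r_squared": 1 - sse / sst if sst > 0 else float("nan"),
        "t_stat": float(slope / se_slope) if se_slope > 0 else float("nan"),
    }


print(linear_trend([1, 2, 3, 4], [1, 3, 2, 4]))  # slope 0.8, r_squared 0.64
```

A two-sided p-value follows from comparing t_stat with a t distribution on n - 2 degrees of freedom, for example 2 * scipy.stats.t.sf(abs(t_stat), n - 2).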

03

Storm Events in the United States

Data Source:

National Centers for Environmental Information (NCEI) at the National Oceanic and Atmospheric Administration (NOAA) - https://www.ncdc.noaa.gov/stormevents/ftp.jsp

Data Processing:

  • There were 1,658,549 observations from January 1995 to December 2023.

  • Event types were reclassified into eight categories. 

  • A pivot table was created with 'year-month' and 'state' as indices to show the totals. 

  • Property and crop damage amounts were adjusted for inflation using the Consumer Price Index (CPI).

  • The final dataset consisted of 19,808 rows. 
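The inflation adjustment and state-by-month totals can be sketched as below. Assumptions: damage fields are strings with K/M/B suffixes (such as "25K" or "1.5M"), blanks count as zero, the CPI table and base year are hypothetical inputs, and all column names are illustrative. The totals use a groupby aggregation, which yields the same year-month by state sums as the pivot table described above. The reclassification into eight categories is not shown because the categories are not listed here.

```python
import pandas as pd

# Damage suffix multipliers (assumed format, e.g. "25K", "1.5M").
_SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_damage(value) -> float:
    """Convert a damage string like '1.5M' to dollars; blanks count as 0."""
    if pd.isna(value):
        return 0.0
    text = str(value).strip().upper()
    if not text:
        return 0.0
    if text[-1] in _SUFFIXES:
        number, multiplier = text[:-1], _SUFFIXES[text[-1]]
    else:
        number, multiplier = text, 1.0
    return float(number) * multiplier if number else 0.0


def monthly_state_totals(events: pd.DataFrame, cpi: pd.DataFrame,
                         base_year: int) -> pd.DataFrame:
    """Inflation-adjust damages, then total events and damages by year-month and state."""
    cpi_by_year = cpi.set_index("year")["cpi"]
    # Real dollars = nominal dollars * CPI(base year) / CPI(event year).
    factor = cpi_by_year[base_year] / events["year"].map(cpi_by_year)
    df = events.assign(
        year_month=events["year"].astype(str) + "-" + events["month"].astype(str).str.zfill(2),
        property_real=events["damage_property"].map(parse_damage) * factor,
        crops_real=events["damage_crops"].map(parse_damage) * factor,
    )
    return df.groupby(["year_month", "state"], as_index=False).agg(
        n_events=("event_id", "count"),
        property_real=("property_real", "sum"),
        crops_real=("crops_real", "sum"),
    )


# Toy data (hypothetical values).
events = pd.DataFrame({
    "event_id": [1, 2, 3], "year": [2000, 2000, 2020], "month": [1, 1, 1],
    "state": ["TX", "TX", "TX"],
    "damage_property": ["10K", "", "1.5M"], "damage_crops": [None, "0K", "2K"],
})
cpi = pd.DataFrame({"year": [2000, 2020], "cpi": [100.0, 200.0]})
print(monthly_state_totals(events, cpi, base_year=2020))
```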

Data Visualization:

  • Added 'year-month' as a filter. 

  • Hovering over the map reveals the statistics for each state during the selected time frame. 

  • The color represents the total number of storm events. 

  • The circles visualize the damage amounts. 

©2017 BY BYUNG WOOK (BK) KIM. PROUDLY CREATED WITH WIX.COM
