Chapter 4 Missing values
We would like to explore some meaningful patterns based on the distribution of missing values ie.NAs from our weather dataset. (Note: given a fact that our Bike datasets have no missing values, the weather dataset would be our sole data.table to analyze NAs.)
Data table: dat.dt.RData Varibles of our concern: ‘min_temp’, ‘max_temp’, ‘max_steady_wind’, ‘total_daily_precipitation’, ‘description’
4.1 Heatmap for the overall pattern
Firstly, we would like to present an overall missing-value distribution of continuous days from 2019-01-01 to 2020-12-31 (total 731 days) and the plot shows:
- X : 6 most relevent variables Y: missing patterns (each day from 2019-01-01 to 2020-12-31)
- Vertically, variable ‘max_steady_wind’ has most missing values.
- Horizontally, missing values concentrate at the end of the year 2019 and year 2020
4.2 Visna() for the overall pattern
Visna() is another method to explore the potential patterns of missing-value distribution of continuous days (731 days)
- X : 6 most relevent variables Y: missing patterns (each day from 2019-01-01 to 2020-12-31)
- Vertically, variable ‘max_steady_wind’ has the most missing values as it ranks number-one at the bottom column-sort.
- Horizontally, for missing values patterns, pattern ‘max_steady_wind’ took place most frequently up to 71 times, which means there are 71 rows of data with ‘max_steady_wind’ value missing as NA. The next pattern is ‘min_temp, max_temp, max_steady_wind, total_daily_precipitation, description’ with frequency 5 times, which means 5 rows of data miss values for all variables. The complete pattern-rank goes from top to bottom as the right row-sort shows.
## max_steady_wind min_temp max_temp total_daily_precipitation description
## 1 0 0 0 0 0
## 3 1 0 0 0 0
## 5 1 1 1 1 1
## 4 0 1 1 0 0
## 2 0 0 0 1 0
## attr(,"mar")
## attr(,"mar")$rm
## [,1]
## [1,] 652
## [2,] 71
## [3,] 5
## [4,] 2
## [5,] 1
##
## attr(,"mar")$cm
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.1039672 0.009575923 0.009575923 0.008207934 0.006839945
##
## attr(,"orders")
## attr(,"orders")[[1]]
## [1] 1 3 5 4 2
##
## attr(,"orders")[[2]]
## [1] 3 1 2 4 5