Chapter 3 Data transformation
The Capital Bikeshare data was collected by Zhuoyue Hao via “download.file” package. According to our research targets, we transform the data and redefined the variables as (731 samples, each sample indicate a day’s record) :
date, year, month, weekday (day of the week), season (12-15 to 3-15 is defined as winter, 3-15 to 6-15 is defined as spring, 6-15 to 9-15 is defined as summer, 9-15 to 12-15 is defined as fall), registered (num of use by registered users by day), causul (num of use by causul users by day), total (num of use by registered and causul users by day)
The weather data was collected by Yeqi Zhang via web crawler implementation, where the weather website was first downloaded by httr and GET function, and the table data was then extracted from the website by html_table function from rvest package.
Then the two data sources are combined via “merge” package (by “date” variable).
Note that the original weather data collected from freemeteo.com had 10 variables, the other variables were “maximum wind gust”, “snow depth”, “pressure”, “icon (of weather type)”. We abandon those variables for they are harder to be sensed by human beings (such as “pressure”) or less relevant to the questions we are studying, such as “maximum wind gust” and “snow depth”. The former one just indicated the weather condition of a particular moment, not the overall weather circumstance for the whole day, and the later one was just available in some particular days during the winter, while in most days of the year, there is no snow and of course, no “snow depth”. Besides, a few days of weather data from 2019 to 2020 were missing, for example, there was no data for 2020-12-30; therefore, the authors filled in the data of those missing days and marked their values as NAs.
Please click the following link to check out the data transformation for the bike data: https://github.com/ZhuoyueHao/BikeSharing/blob/main/Data%20Transformation/Bike%20Data.Rmd
Please click the following link to check out the data transformation for the weather data: https://github.com/ZhuoyueHao/BikeSharing/blob/main/Data%20Transformation/Weather%20Data.Rmd