This page provides an overview of how to read in external data and join it with the class bioacoustic dataset.
Please start by logging in to rstudio.pomona.edu.
The table below shows the hypothesis for each group:
Group | Hypotheses | Team |
---|---|---|
1 | ACI and species richness will be greater near Phake Lake due to habitat differences (birds acquiring water) | Sy’Vanna, Mari |
2 | We assume that active roads disrupt natural habitats and therefore there will be higher ACI levels further away from Foothill Blvd. | Willa, Danie |
3 | Monitors located in areas with less foliage will have lower ACI levels | Ella, Nico |
4 | ACI is higher on cooler days than hotter days. | Clara, Anna |
5 | The ACI will be higher in more biodiverse native habitat areas as opposed to the invasive grassland where there is less activity from various species. | Fiona, George |
6 | The SR is lower when Air Quality Index (AQI) is higher. | Jeremy, Sophia |
7 | October and November will have greater ACI, SR, and abundance than March and April | Karl, Elisa |
8 | Areas more recently affected by fires will record a higher species richness. | Evan, Ruth |
Below, we will start by loading the packages that we’ll need. Remember: if you get an error when loading a package in your workspace, it might be because the package isn’t installed. In that case, run this command (you only need to do this once per semester):
install.packages("package") # replace "package" with the name of the package that you need
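If you are not sure which packages are missing, the check below is one way to find out. It is only a sketch: the vector `needed_pkgs` and the variable `missing_pkgs` are names made up for this example, and the `install.packages()` call is left commented out so that nothing is installed until you decide to run it.

```r
# Packages used on this page (the names come from the library() calls on this page)
needed_pkgs <- c("ggplot2", "dplyr", "readr", "mosaic", "lubridate")

# Keep only the packages that cannot be found in your R library
is_installed <- vapply(needed_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed_pkgs[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All packages are already installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment the next line to install everything that is missing
  # install.packages(missing_pkgs)
}
```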
### Load packages
library("ggplot2") # plotting functions
library("dplyr") # data wrangling functions
library("readr") # reading in tables, including ones online
library("mosaic") # shuffle (permute) our data
library("lubridate") # package to handle datetime objects
Next, we will pull in our data and inspect it.
### Load in dataset
soundDF <- readr::read_tsv("https://github.com/EA30POM/site/raw/main/data/bioacousticAY22-24.tsv") # read in spreadsheet from its URL and store in soundDF
### Look at the first few rows
soundDF
## # A tibble: 202,433 × 8
## unit date time ACI SR DayNight Month season
## <chr> <date> <chr> <dbl> <dbl> <chr> <chr> <chr>
## 1 CBio4 2023-03-23 18H 0M 0S 155. 3 Night March Spring
## 2 CBio4 2023-03-23 18H 1M 0S 154. 3 Night March Spring
## 3 CBio4 2023-03-23 18H 2M 0S 151. 7 Night March Spring
## 4 CBio4 2023-03-23 18H 3M 0S 155. 3 Night March Spring
## 5 CBio4 2023-03-23 18H 4M 0S 152. 2 Night March Spring
## 6 CBio4 2023-03-23 18H 5M 0S 152. 4 Night March Spring
## 7 CBio4 2023-03-23 18H 6M 0S 159. 3 Night March Spring
## 8 CBio4 2023-03-23 18H 7M 0S 152. 2 Night March Spring
## 9 CBio4 2023-03-23 18H 8M 0S 152. 3 Night March Spring
## 10 CBio4 2023-03-23 18H 9M 0S 153. 7 Night March Spring
## # ℹ 202,423 more rows
### Load AQI and/or temperature datasets
aqi_df <- readr::read_csv("https://github.com/EA30POM/site/raw/main/data/claremont_aqi_data.csv")
temp_df <- readr::read_csv("https://github.com/EA30POM/site/raw/main/data/claremont_average_temperature.csv")
### Look at the first few rows in a spreadsheet viewer
soundDF %>% head() %>% View()
### Look at the first few rows of the external data
head(aqi_df)
head(temp_df)
Now, we are going to ensure that the date columns are correctly formatted in all three datasets.
### Clean up the data using lubridate for date conversion
# Bioacoustic dataset
soundDF <- soundDF %>%
  mutate(date = lubridate::ymd(date))
# AQI dataset
aqi_df <- aqi_df %>%
  mutate(date = lubridate::ymd(date))
# Temp dataset
temp_df <- temp_df %>%
  mutate(date = lubridate::ymd(date))
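To see what the date conversion does, and how to spot dates that failed to convert, here is a small sketch on made-up data. The table `toy_df` and its values are invented for illustration; they are not part of the class datasets. Dates that `lubridate::ymd()` cannot read become `NA`, so counting the `NA` values is a quick sanity check before joining.

```r
library("dplyr")      # data wrangling functions
library("lubridate")  # package to handle datetime objects

# Made-up table with dates stored as text; the last entry is deliberately not a date
toy_df <- tibble(
  date  = c("2023-03-23", "2023-03-24", "not a date"),
  value = c(1, 2, 3)
)

# quiet = TRUE hides the "failed to parse" warning; unreadable dates become NA
toy_clean <- toy_df %>%
  mutate(date_parsed = lubridate::ymd(date, quiet = TRUE))

# The new column should be a Date, not text
class(toy_clean$date_parsed)

# Count how many dates could not be converted
n_failed <- sum(is.na(toy_clean$date_parsed))
n_failed
```

If the count is larger than zero for your own dataset, look at those rows before joining, because rows with a missing date will not match anything.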
Below, I’m going to show you how to merge two datasets based on their dates. It is up to you to modify this code to join your external dataset with the class bioacoustic dataset, soundDF!
### Merge AQI and temperature data by date
claremont_data <- temp_df %>%
  left_join(aqi_df, by = "date") # Join on the 'date' column
### Check the merged dataset
head(claremont_data)
View(claremont_data) # view in spreadsheet format
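The same pattern can attach daily external data to recordings that were taken many times per day. The sketch below uses made-up stand-ins (`toy_sound`, `toy_external`, and the column name `aqi` are assumptions for illustration, not the real column names in the class files), so check your own column names with `head()` before adapting it.

```r
library("dplyr")      # data wrangling functions
library("lubridate")  # package to handle datetime objects

# Made-up stand-in for soundDF: several recordings per day
toy_sound <- tibble(
  unit = c("CBio4", "CBio4", "CBio4", "CBio4"),
  date = lubridate::ymd(c("2023-03-23", "2023-03-23", "2023-03-24", "2023-03-25")),
  ACI  = c(155, 154, 151, 152)
)

# Made-up stand-in for an external table with one row per day
# (`aqi` is a hypothetical column name)
toy_external <- tibble(
  date = lubridate::ymd(c("2023-03-23", "2023-03-24")),
  aqi  = c(42, 57)
)

# left_join keeps every recording and copies the daily value onto each matching row
toy_joined <- toy_sound %>%
  left_join(toy_external, by = "date")

toy_joined

# Recordings on days with no external data get NA; count them
n_unmatched <- toy_joined %>%
  filter(is.na(aqi)) %>%
  nrow()
n_unmatched
```

Putting the recordings table on the left of `left_join()` keeps every recording, even on days when the external dataset has no value. Those rows show `NA` in the external column, which tells you how much of the bioacoustic data your external dataset actually covers.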