1 Overview

This page provides an overview of how to read in external data and join it with the class bioacoustic dataset.

Please start by logging into rstudio.pomona.edu.

The table below shows each group’s hypothesis:

| Group | Hypothesis | Team |
|---|---|---|
| 1 | ACI and species richness will be greater near Phake Lake due to habitat differences (birds acquiring water). | Sy’Vanna, Mari |
| 2 | We assume that active roads disrupt natural habitats, and therefore ACI levels will be higher farther away from Foothill Blvd. | Willa, Danie |
| 3 | Monitors located in areas with less foliage will have lower ACI levels. | Ella, Nico |
| 4 | ACI is higher on cooler days than on hotter days. | Clara, Anna |
| 5 | ACI will be higher in more biodiverse native habitat areas than in the invasive grassland, where there is less activity from various species. | Fiona, George |
| 6 | SR is lower when the Air Quality Index (AQI) is higher. | Jeremy, Sophia |
| 7 | October and November will have greater ACI, SR, and abundance than March and April. | Karl, Elisa |
| 8 | Areas more recently affected by fires will record a higher species richness. | Evan, Ruth |

2 Preparing data

2.1 1: Load packages

Below, we’ll start by loading the packages we need. Remember: if you get an error when loading a package, it may be because the package isn’t installed. In that case, run this command once this semester:

install.packages("package") # replace "package" with the name of the package that you need
### Load packages
library("ggplot2") # plotting functions
library("dplyr") # data wrangling functions
library("readr") # reading in tables, including ones online
library("mosaic") # shuffle (permute) our data
library("lubridate") # package to handle datetime objects
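If you are not sure which of these packages are already installed, a small check like the one below can list the missing ones. This is a sketch rather than part of the course code: it relies on base R’s requireNamespace(), which returns FALSE instead of throwing an error when a package is absent, and it only prints the install.packages() command instead of running it.

```r
# Packages used on this page
pkgs <- c("ggplot2", "dplyr", "readr", "mosaic", "lubridate")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  # Print the one-time install command for any missing packages
  message("Run once: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All packages are installed.")
}
```

Printing the command instead of running it keeps the check safe to rerun every session, since installation only needs to happen once.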

2.2 2: Read data

Next, we will pull in our data and inspect it.

### Load in dataset
soundDF <- readr::read_tsv("https://github.com/EA30POM/site/raw/main/data/bioacousticAY22-24.tsv") # read in spreadsheet from its URL and store in soundDF

### Look at the first few rows
soundDF
## # A tibble: 202,433 × 8
##    unit  date       time        ACI    SR DayNight Month season
##    <chr> <date>     <chr>     <dbl> <dbl> <chr>    <chr> <chr> 
##  1 CBio4 2023-03-23 18H 0M 0S  155.     3 Night    March Spring
##  2 CBio4 2023-03-23 18H 1M 0S  154.     3 Night    March Spring
##  3 CBio4 2023-03-23 18H 2M 0S  151.     7 Night    March Spring
##  4 CBio4 2023-03-23 18H 3M 0S  155.     3 Night    March Spring
##  5 CBio4 2023-03-23 18H 4M 0S  152.     2 Night    March Spring
##  6 CBio4 2023-03-23 18H 5M 0S  152.     4 Night    March Spring
##  7 CBio4 2023-03-23 18H 6M 0S  159.     3 Night    March Spring
##  8 CBio4 2023-03-23 18H 7M 0S  152.     2 Night    March Spring
##  9 CBio4 2023-03-23 18H 8M 0S  152.     3 Night    March Spring
## 10 CBio4 2023-03-23 18H 9M 0S  153.     7 Night    March Spring
## # ℹ 202,423 more rows
### Load AQI and/or temperature datasets
aqi_df <- readr::read_csv("https://github.com/EA30POM/site/raw/main/data/claremont_aqi_data.csv")
temp_df <- readr::read_csv("https://github.com/EA30POM/site/raw/main/data/claremont_average_temperature.csv")
### Look at the first few rows in a spreadsheet viewer
soundDF %>% head() %>% View()
### Look at the first few rows of the external data
head(aqi_df)
head(temp_df)
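Before joining, it can help to check how many of the bioacoustic dates actually appear in an external dataset, because any date missing from the external data will end up with NA values after a left_join(). The sketch below uses a hypothetical helper, date_coverage(), and made-up dates. With the real data you could call it as date_coverage(soundDF$date, aqi_df$date) once both columns are Date objects.

```r
# Hypothetical helper (not part of any package): the share of distinct
# bioacoustic dates that also appear among an external dataset's dates
date_coverage <- function(sound_dates, external_dates) {
  distinct_dates <- unique(sound_dates)
  mean(distinct_dates %in% external_dates)
}

# Made-up dates standing in for soundDF$date and aqi_df$date
sound_dates <- as.Date(c("2023-03-23", "2023-03-23", "2023-03-24", "2023-11-02"))
aqi_dates   <- as.Date(c("2023-03-22", "2023-03-23", "2023-03-24"))

date_coverage(sound_dates, aqi_dates)  # 2 of 3 distinct dates are covered

# Which bioacoustic dates have no external record?
distinct_sound <- unique(sound_dates)
distinct_sound[!distinct_sound %in% aqi_dates]  # 2023-11-02
```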

2.3 3: Ensure that the date columns are in the correct format

Now we will make sure that the date column in all three datasets is stored as a date, so that the dates can be matched when we join the datasets.

### Clean up the data using lubridate for date conversion
  # Bioacoustic dataset
soundDF <- soundDF %>%
  mutate(date = lubridate::ymd(date))

  # AQI dataset
aqi_df <- aqi_df %>%
  mutate(date = lubridate::ymd(date)) 

  # Temp dataset
temp_df <- temp_df %>%
  mutate(date = lubridate::ymd(date))
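Two things can go wrong at this step. First, ymd() only parses text written in year-month-day order; if an external file happened to store dates as month/day/year (a hypothetical case here), lubridate::mdy() is the matching parser. Second, dplyr’s left_join() needs the key columns to share a type, so a Date column cannot be matched against a character column. The sketch below demonstrates both with small made-up tables.

```r
library(dplyr)
library(lubridate)

# ymd() parses year-month-day text (the format shown in soundDF) into a Date
iso_date <- ymd("2023-03-23")
class(iso_date)  # "Date"

# Hypothetical case: an external file storing dates as month/day/year.
# ymd() cannot parse "03/23/2023" (it returns NA with a warning); mdy() can.
us_date <- mdy("03/23/2023")
iso_date == us_date  # TRUE

# Made-up toy tables showing why matching key types matter for a join
sound_toy <- tibble(date = ymd("2023-03-23"), ACI = 155)
aqi_chr   <- tibble(date = "2023-03-23", AQI = 45)  # date left as character

# Joining a Date key to a character key raises an error in current dplyr
join_failed <- tryCatch({
  left_join(sound_toy, aqi_chr, by = "date")
  FALSE
}, error = function(e) TRUE)

# After converting with ymd(), both keys are Dates and the join works
aqi_fixed <- aqi_chr %>% mutate(date = ymd(date))
joined <- left_join(sound_toy, aqi_fixed, by = "date")
joined$AQI  # 45
```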

2.4 4: Joining the data

Below, I’ll show you how to merge datasets by date. The example joins the temperature and AQI datasets to each other; it is up to you to modify this code to join your external dataset to the class soundDF bioacoustic dataset!

### Merge AQI and temperature data by date
claremont_data <- temp_df %>%
  left_join(aqi_df, by = "date") # Join on the 'date' column

### Check the merged dataset
head(claremont_data)
View(claremont_data) # view in spreadsheet format
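The same left_join() pattern applies when the class data is on the left. One thing to keep in mind: soundDF has one row per minute per unit, while the AQI and temperature tables presumably have one row per date (an assumption based on joining them by date alone), so every minute on a given day receives that day’s value. The sketch below uses made-up toy tables to show a direct join and an alternative that first averages ACI per unit per day.

```r
library(dplyr)

# Toy stand-ins with made-up values: per-minute ACI readings like soundDF,
# and one value per day like an external AQI or temperature table
sound_toy <- tibble(
  unit = c("CBio4", "CBio4", "CBio4", "CBio4"),
  date = as.Date(c("2023-03-23", "2023-03-23", "2023-03-24", "2023-03-24")),
  ACI  = c(155, 153, 151, 149)
)
aqi_toy <- tibble(
  date = as.Date(c("2023-03-23", "2023-03-25")),
  AQI  = c(45, 60)
)

# Option 1: join directly. Each per-minute row receives its day's AQI;
# days with no AQI record (here 2023-03-24) get NA.
minute_level <- sound_toy %>%
  left_join(aqi_toy, by = "date")

# Option 2: average ACI to one row per unit per day, then join,
# so each daily AQI value appears once per unit
daily_level <- sound_toy %>%
  group_by(unit, date) %>%
  summarise(meanACI = mean(ACI), .groups = "drop") %>%
  left_join(aqi_toy, by = "date")

minute_level
daily_level
```

With the real data, the direct version would look like soundDF %>% left_join(aqi_df, by = "date"). Which option fits better depends on whether your hypothesis is about minute-level or daily patterns.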

2.5 5: Code in its entirety

### Load packages
library("ggplot2") # plotting functions
library("dplyr") # data wrangling functions
library("readr") # reading in tables, including ones online
library("mosaic") # shuffle (permute) our data
library("lubridate") # package to handle datetime objects

### Load in dataset
soundDF <- readr::read_tsv("https://github.com/EA30POM/site/raw/main/data/bioacousticAY22-24.tsv") # read in spreadsheet from its URL and store in soundDF

### Look at the first few rows
soundDF

### Load AQI and/or temperature datasets
aqi_df <- readr::read_csv("https://github.com/EA30POM/site/raw/main/data/claremont_aqi_data.csv")
temp_df <- readr::read_csv("https://github.com/EA30POM/site/raw/main/data/claremont_average_temperature.csv")

### Look at the first few rows in a spreadsheet viewer
soundDF %>% head() %>% View()

### Look at the first few rows of the external data
head(aqi_df)
head(temp_df)

### Clean up the data using lubridate for date conversion
  # Bioacoustic dataset
soundDF <- soundDF %>%
  mutate(date = lubridate::ymd(date))

  # AQI dataset
aqi_df <- aqi_df %>%
  mutate(date = lubridate::ymd(date)) 

  # Temp dataset
temp_df <- temp_df %>%
  mutate(date = lubridate::ymd(date))

### Merge AQI and temperature data by date
claremont_data <- temp_df %>%
  left_join(aqi_df, by = "date") # Join on the 'date' column

### Check the merged dataset
head(claremont_data)
View(claremont_data) # view in spreadsheet format