We are going to encounter some methods for interacting with, visualizing, and analyzing earth sciences data in this activity. At each table, one student will pick one of the datasets and run the code associated with that dataset. What do you see? Based on the course materials, how does this constitute evidence for global climate change?
Please start by logging into rstudio.pomona.edu.
Note! If you run the command
library(package)
and see an error in red text stating
Error in library(package) : there is no package called ‘package’
,
don’t worry! That simply means that the package is not installed in your
R
environment. All you need to do is just run
install.packages(package)
once. (Of
course, replace package
with the name of the package in
question, such as swirl
).
First, we will all load these packages - make sure you run this code irrespective of whichever variable you select.
library(dplyr)
library(openxlsx)
library(ggplot2)
sheet_url <- "https://github.com/EA30POM/Fall2021/blob/main/data/ClimateChangeSigns.xlsx?raw=true" # location of the Excel spreadsheet
keelingCO2 <- openxlsx::read.xlsx(sheet_url,sheet="Keeling") # everyone needs to download the Keeling CO2 data because we will be combining that with the other variables too.
Visualization can help you get a feel for your data, and can be a
very useful aid in exploring data. We will use the ggplot2
package to create plots in R
.
ggplot2
creates plots based on different layers. The
three necessary components are:
While there are additional layering components that one can include, we will confine our attention to these three layers. For more information, please consult the free, online ggplot2 Book.
Scatterplots are most useful for visualizing two continuous, numeric
variables. In defining a plot using ggplot2
, we specify the
data source (in the ggplot
function), and the mapping of
the variables (aes
, short for aesthetic, within
ggplot
). Subsequently, we specify the type of
geom
(geometry) we want in the plot.
Below, we will follow a convention where we assign an object
(p
) to store the plot. We specify what layers we want to
add to the object p
by using the +
sign.
At each table, one student will pick one of the datasets and run the code associated with that dataset. What do you see? Based on the course materials, how does this constitute evidence for global climate change?
The data source for this variable is NASA’s Global Climate Change Vital Signs of the Planet Program.
p <- ggplot(keelingCO2, aes(x=year,y=meanCO2ppm)) # initiate the plot
p <- p + geom_point() # add scatterplot points
p <- p + geom_line() # add a trend line
p <- p + labs(x="Year",y="CO2 (ppm)") # change axis labels
p <- p + theme_classic() # change plot style
p
CO2mod <- lm(meanCO2ppm~year, data=keelingCO2) # we are running a "linear regression" statistical model. You can think of this like finding the line that fits our data the best.
summary(CO2mod) # what do you see in the Coefficients: table for year?
The data source is here.
seaIce <- openxlsx::read.xlsx(sheet_url,"SeaIce")
seaIceCO2 <- left_join(seaIce, keelingCO2, by=c("year"="year")) # we are joining up the sea ice extent data with global CO2 level data, matching on each year
p <- ggplot(seaIceCO2, aes(x=meanCO2ppm, y=extent))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(x="CO2 (ppm)",y="Minimum sea ice extent (million km2)")
p <- p + theme_classic()
p # what do we see?
seaIceMod <- lm(extent ~ meanCO2ppm, data=seaIceCO2) # we are running a "linear regression" statistical model. You can think of this like finding the line that fits our data the best.
summary(seaIceMod) # what do you see in the Coefficients: table for mean CO2 (ppm)?
The data source is here.
globTemp <- openxlsx::read.xlsx(sheet_url,"GlobTemp")
globTempCO2 <- left_join(globTemp, keelingCO2, by=c("Year"="year")) # we are joining up global temperature data with global CO2 level data, matching on each year
p <- ggplot(globTempCO2, aes(x=meanCO2ppm, y=TempObs))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(x="CO2 (ppm)",y="Global temperature anomaly (*C)")
p <- p + theme_classic()
p # what do we see?
tempMod <- lm(TempObs ~ meanCO2ppm, data=globTempCO2) # we are running a "linear regression" statistical model. You can think of this like finding the line that fits our data the best.
summary(tempMod) # what do you see in the Coefficients: table for mean CO2 (ppm)?
The original data source is here but the data I used came from the EPA. The ocean’s heat content is expressed in zettajoules or \(10^{22}\) joules. In contrast, total global energy consumption is around 580 million terajoules, or 5.8 zettajoules.
oceanHeat <- openxlsx::read.xlsx(sheet_url,"OceanHeat")
oceanHeatCO2 <- left_join(oceanHeat, keelingCO2, by=c("Year"="year")) # we are joining up global ocean heat content data with global CO2 level data, matching on each year
p <- ggplot(oceanHeatCO2, aes(x=meanCO2ppm, y=NOAASea))
p <- p + geom_point()
p <- p + geom_line()
p <- p + labs(x="CO2 (ppm)",y="Ocean heat content (ZJ)")
p <- p + theme_classic()
p # what do we see?
oceanMod <- lm(NOAASea ~ meanCO2ppm, data=oceanHeatCO2) # we are running a "linear regression" statistical model. You can think of this like finding the line that fits our data the best.
summary(oceanMod) # what do you see in the Coefficients: table for mean CO2 (ppm)?
The introverse
package makes super-useful help pages that can demystify the
tidyverse
commands that we’ll be using in this class.
Time permitting, consider the following:
lm
coefficient outputs, what do those numbers
represent? What do they tell us about the impact on some response
variable for changes in \(CO_2\)?
Year
as the explanatory/predictive/x-axis variable?