Today we are going to keep using a subset of country data from The Quality of Government Institute.
Open it with read_csv(), and if you need more help check out the slides from the first day.
Last week we discussed how to make plots in R:
geom_smooth() can be used to add a trend line to our data.
geom_smooth()
geom_smooth() is very powerful, but it is easy to use without understanding what it is doing, because defining a trend is really hard.
You change the type of trend it draws using the method= argument.
Warning
If you cannot explain what is happening to make that line then you should probably avoid using it.
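To make the effect of method= concrete, here is a minimal sketch on made-up data (the toy data frame and its x and y columns are assumptions for illustration, not part of the country data). It draws the same scatter plot twice, once with a straight least-squares line and once with a flexible loess curve.

```r
library(ggplot2)

# Made-up data for illustration only: a curved relationship plus noise.
set.seed(42)
toy <- data.frame(x = seq(1, 10, length.out = 60))
toy$y <- log(toy$x) + rnorm(60, sd = 0.2)

base_plot <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal()

# method = "lm" fits a straight line by ordinary least squares.
p_lm <- base_plot + geom_smooth(method = "lm", formula = y ~ x)

# method = "loess" fits a local regression that is allowed to bend.
p_loess <- base_plot + geom_smooth(method = "loess", formula = y ~ x)

p_lm
p_loess
```

If the two lines tell different stories about the same points, that is a sign you need to think about which definition of trend you actually mean.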
If we have multiple types of observations, we can plot smooths for each one by setting color= or group= to that variable.
# Bin fertility into three groups and drop countries with no fertility data
df_new <- df %>%
  mutate(
    fert_groups = cut(wdi_fertility, breaks = c(0, 1.9, 2.3, Inf),
                      labels = c("Below", "Replacement", "Above"))
  ) %>%
  group_by(fert_groups) %>%
  drop_na(fert_groups)

p <- ggplot(df_new, aes(x = bl_asymf, y = mad_gdppc)) +
  geom_point(size = 3) +
  labs(y = "GDP per Capita", x = "Average Schooling") +
  theme_minimal() +
  scale_y_log10()

# One smooth per fertility group
p + geom_smooth(aes(group = fert_groups))
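The cut() call above is what turns a continuous variable into groups. As a small sketch of how it assigns values, here it is applied to a made-up vector of fertility rates (the numbers are invented so that some sit exactly on a break point).

```r
# Made-up fertility rates, chosen to include values exactly on the break points.
fertility <- c(1.2, 1.9, 2.1, 2.3, 4.5)

groups <- cut(fertility,
              breaks = c(0, 1.9, 2.3, Inf),
              labels = c("Below", "Replacement", "Above"))

# By default each interval is open on the left and closed on the right,
# so 1.9 falls in "Below" and 2.3 falls in "Replacement".
data.frame(fertility, groups)
```

Because the lowest interval is open on the left, a value of exactly 0 would come back as NA unless you add include.lowest = TRUE.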
Note
You can use aes() in a geom_*() if you want to specify something for just that part of the plot.
df_new <- df %>%
  mutate(
    fert_groups = cut(wdi_fertility, breaks = c(0, 1.9, 2.3, Inf),
                      labels = c("Below", "Replacement", "Above"))
  ) %>%
  group_by(fert_groups) %>%
  drop_na(fert_groups)

p <- ggplot(df_new, aes(x = bl_asymf, y = mad_gdppc, color = fert_groups)) +
  geom_point(size = 3) +
  labs(y = "GDP per Capita", x = "Average Schooling") +
  theme_minimal() +
  scale_y_log10() +
  scale_color_brewer("Fertility Rate", type = "qual", palette = 2) +
  geom_smooth(method = "lm")
p
Sometimes it is easier to understand the groups if you create individual plots for each group. We call these plots facets and make them with facet_wrap().
To identify which groups to make, you pass it ~variable.
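As a minimal sketch of faceting, the example below uses the mtcars data that ships with R as a stand-in for the country data (the transmission column is created here just for the example). Each level of the faceting variable gets its own panel.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the country data here.
cars <- mtcars
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("Automatic", "Manual"))

p_facet <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_wrap(~transmission) +  # one panel per level of transmission
  labs(x = "Weight (1000 lbs)", y = "Miles per Gallon") +
  theme_minimal()

p_facet
```

The smooth is fit separately inside each panel, so you get one regression line per group without setting group= yourself.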
Pick two interval variables, and an additional variable that is categorical and binary. Make a scatter plot of the two interval variables, add a smooth, and facet on the categorical variable.
Don’t forget labels.
df %>%
  # include.lowest = TRUE keeps scores of exactly 0 in the first group
  mutate(type = cut(fh_polity2, breaks = c(0, 3, 7, 10),
                    labels = c("Autocracy", "Anocracy", "Democracy"),
                    include.lowest = TRUE)) %>%
  drop_na(type) %>%
  ggplot(aes(y = wdi_afp, x = mad_gdppc)) +
  geom_smooth(method = "lm", color = 'black') +
  geom_point(color = 'orangered3') +
  facet_wrap(~type) +
  scale_x_log10(labels = scales::label_dollar()) +
  theme_minimal() + theme(strip.text = element_text(size = 20)) +
  labs(y = "Percent of Labor\nForce in Military",
       x = "GDP per Capita\n(Log scale)")
Mutating our variables
Dropping rows with missing values
Setting my x and y variables
Creating a linear regression line, in black.
Scatter plot with nice colors
Facet on type
Change the x-axis to be logged and make the axis labels nicer.
Setting the theme up
Labels!
References/tutorials: https://ggplot2.tidyverse.org/
Book that discusses principles of data viz and ggplot2: https://socviz.co/
Less is usually more
People have assumptions about how plots work, don’t break them unless you make it very clear that you are.
Some assumptions people make:
General principles: Follow assumptions
Make it clear to the viewer what they are supposed to take away from it.
Co-Authors: Anne Whitesell (Miami University); Lee A. Hannah (Wright State University)
Always good to look at trends in data:
# tmp and col_pal are defined elsewhere in the project
p1 <- ggplot(tmp, aes(x = lubridate, y = N, fill = Party,
                      color = Party, shape = Party)) +
  geom_point(alpha = .2) +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_smooth(linewidth = 1, method = "loess",
              method.args = list(span = .5)) +
  theme_minimal() +
  scale_y_continuous(labels = scales::label_comma()) +
  scale_color_manual("Party",
                     values = col_pal) +
  scale_fill_manual(values = col_pal) +
  # geom_vline(xintercept = lubridate::as_date("2020-11-03"), linetype="dashed") +
  labs(y = "Number of Posts", x = "")
With Quarto we can do our analysis, write up the results, and present plots all in a single document.
This has a few benefits:
Note: This entire presentation is written in Quarto and available here.
In RStudio go to File → New File → Quarto Document. You can change the Title and then click Create.
There are two ways to edit Quarto files:
When you want to see your final document click the Render button with the blue arrow.
Click it now (it will ask you to save the file, save it where your country data is located).
When you Render a document Quarto does a few things:
Switch to the Source version of the document (the button near the top).
You should see something like:
---
title: "Country Report"
format: html
editor: visual
---
## Quarto
Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see <https://quarto.org>.
## Running Code
When you click the **Render** button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
```{r}
1 + 1
```
General Parameters for our document.
Normal everyday writing
Headers
R code that we want to run in our document.
What do we want to do?
Whenever we want to execute code in a Quarto document we surround it with:
Our code needs to execute without having anything saved, so we need to include both opening the data and making the plot.
In the Quarto document, add the code to open the file, and remember to load the necessary package as well (if you saved everything in the same folder you don’t need to worry about the working directory).
Call the data object by itself afterwards as a check, and then render the whole thing.
We can modify how the code is executed by adding options at the top of the chunk using #|:
echo: true or echo: false controls whether the code is shown
output: true or output: false controls whether the results are shown
```{r}
#| echo: false
#| output: false
library(tidyverse)
data <- read_csv("country_data.csv")
data
```
Render the document again to see what happens now.
Now we want to make our plot. We can do that in a new code chunk below the old one.
Create a new code chunk, and copy and paste the plot you made before to it:
Again, you can hide the code using #| echo: false, but don’t change the output.
If you want to break up your writeup you can use headers:
You can also access variables from code in your write-up. So let's say you calculate the mean in a code chunk:
In your write-up you can access that value by writing `r mean_gdp` in the text.
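A minimal sketch of the chunk side of this, assuming a made-up data frame in place of the country data (only the column name mad_gdppc and the object name mean_gdp come from the slides; the values are invented):

```r
# Made-up GDP per capita values standing in for a column of the country data.
data <- data.frame(mad_gdppc = c(1200, 5400, NA, 23000, 41000))

# na.rm = TRUE drops missing values; without it the mean would be NA.
mean_gdp <- mean(data$mad_gdppc, na.rm = TRUE)
mean_gdp
```

Once mean_gdp exists, the inline reference in your text is replaced by its value when the document is rendered. Wrapping it as `r round(mean_gdp)` is one way to keep long decimals out of your prose.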
Above the plot, I want you to write a description of the data. You can get that info here. Make it clear what each variable represents.
Part of making it clear is reporting the means of your variables. Calculate the mean of your two interval variables in the R block where you read in the data, and then add that info to your description.
You can make a lot of things in Quarto:
format: docx will produce a docx file
format: pptx will produce a PowerPoint with each section as a slide
format: pdf will produce a PDF document
format: revealjs will produce an HTML presentation like this
Some of these will require installation of additional packages.
You likely have a bunch of different R scripts open in front of you right now.
There is a lot of information online but the best way to learn is to find a project and work through it.
What do you do if you run into problems?
Social Media Data
Social media data can be very interesting: