- Integer/Numeric variables: A number, stored by using `<-` to assign the number to a variable. Ex: `x <- 4`

- Strings/Characters: A phrase or word, always surrounded by quotation marks. Ex: `y <- "A variable"`

- Vectors: A vector is a set of either numerics or characters (never both). Created by surrounding the items of the vector with `c( )` and separating them with commas. Ex: `z <- c(1, 2, 3, 4, 5)`

- Dataframes: Similar to an Excel spreadsheet, with columns of different variables. You can access different rows and columns in a variety of ways explained below. We will create dataframes using `read.csv()` (explained below).

`$`

Used to access a particular column in a dataframe. Ex: `df$party_id`
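
As a rough sketch, the variable types and `$` can be tried together like this. The dataframe is built by hand with `data.frame()` purely for illustration (an assumption; in this guide dataframes come from `read.csv()`), and the `income` column and its values are made up.

```r
# Numeric and character variables
x <- 4
y <- "A variable"

# A vector holds items of a single type
z <- c(1, 2, 3, 4, 5)

# Hypothetical dataframe built by hand for illustration;
# normally it would come from read.csv()
df <- data.frame(
  party_id = c("Dem", "Rep", "Ind"),
  income   = c(52000, 61000, 48000),
  stringsAsFactors = FALSE
)

class(x)      # "numeric"
class(y)      # "character"
df$party_id   # the party_id column, returned as a vector
```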

`[ , ]`

Used to access a particular row and/or column in a dataframe. A number before the comma gives you that row; a number after the comma gives you that column. Ex: `df[1, ]` (returns the first row), `df[, 2]` (returns the second column), `df[1, 2]` (returns the item in the first row, second column).

- Note: `[ ]` can also be used for vectors, but without the comma, as there is only one dimension. In addition, you can put logical statements inside the brackets to subset particular rows or columns.
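
A short sketch of bracket indexing, including the logical subsetting mentioned in the note. The dataframe, its column names, and the cutoff values are hypothetical and only there to make the example runnable.

```r
# Hypothetical data for illustration
df <- data.frame(
  party_id = c("Dem", "Rep", "Ind", "Dem"),
  income   = c(52000, 61000, 48000, 75000),
  stringsAsFactors = FALSE
)

df[1, ]    # first row
df[, 2]    # second column (income)
df[1, 2]   # item in the first row, second column

# Vectors use a single index with no comma
z <- c(10, 20, 30, 40, 50)
z[3]

# A logical statement keeps the rows (or items) where it is TRUE
high_income <- df[df$income > 50000, ]
big_z <- z[z > 25]
```

Note the comma in `df[df$income > 50000, ]`: the logical statement goes in the row position, and leaving the column position empty keeps all columns.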

`setwd()`

Used to set the ‘working directory’ for R. By setting this, R knows where to look for files when you call something like `read.csv()`. Ex: `setwd("/users/kevinreuning/downloads")` or `setwd("C:/users/reunink/downloads")`

`read.csv()`

Used to read in a csv file. Place the file name in quotes to read it in. The function outputs a dataframe that you need to save to a variable. Ex: `df <- read.csv('file_name.csv')`
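
To try `read.csv()` without a real file on hand, one option is to write a tiny csv to a temporary location first. This is only a sketch: the file contents are made up, and `tempfile()` and `writeLines()` are used here just to keep the example self-contained.

```r
# Write a small hypothetical csv to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("party_id,income", "Dem,52000", "Rep,61000", "Ind,48000"),
  csv_path
)

# read.csv() returns a dataframe; save it with <-
df <- read.csv(csv_path)

head(df)   # look at the first rows
str(df)    # check column names and types
```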

`mean()`, `median()`, `sd()`

Used to calculate the mean, median, or standard deviation of a vector. Ex: `median(df$income)`
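
A quick sketch with a made-up income vector shows how the three functions are called. The values are hypothetical and chosen so that one large value pulls the mean well above the median.

```r
# Hypothetical incomes, with one very large value
income <- c(30000, 45000, 52000, 61000, 250000)

mean(income)     # 87600
median(income)   # 52000
sd(income)       # standard deviation
```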

`table()`

Creates a frequency table out of one or two vectors. Ex: `table(df$pid, df$education)` (would create a frequency table of party identification and education).

`prop.table()`

Turns a frequency table into a table of proportions. The first argument is the table itself. By default it calculates proportions out of all cells; if you want something else you need to use `margin=1` or `margin=2`. `margin=1` will create proportions across rows (so each row will sum to 1), while `margin=2` will create proportions across columns (so each column will sum to 1). Ex: `prop.table(table(df$pid, df$education), margin=2)`
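
The difference between the `margin` settings is easiest to see side by side. The sketch below uses a small made-up dataframe; the `pid` and `education` values are hypothetical.

```r
# Hypothetical data for illustration
df <- data.frame(
  pid       = c("Dem", "Dem", "Rep", "Rep", "Ind", "Dem"),
  education = c("College", "HS", "HS", "College", "HS", "College"),
  stringsAsFactors = FALSE
)

tab <- table(df$pid, df$education)
tab                                    # raw counts

all_cells <- prop.table(tab)           # all cells together sum to 1
by_row    <- prop.table(tab, margin = 1)  # each row sums to 1
by_col    <- prop.table(tab, margin = 2)  # each column sums to 1

rowSums(by_row)
colSums(by_col)
```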

`chisq.test()`

Calculates a chi-squared test. The two variables should be the first and second arguments. Ex: `chisq.test(df$pid, df$education)`
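
As an illustration, the sketch below runs the test on made-up data with 100 observations. The variable values and group sizes are hypothetical; saving the result to a variable (here `result`) lets you pull out individual pieces such as the p-value.

```r
# Hypothetical data: 50 Democrats and 50 Republicans
pid <- rep(c("Dem", "Rep"), each = 50)
education <- c(
  rep("College", 30), rep("HS", 20),   # Democrats
  rep("College", 20), rep("HS", 30)    # Republicans
)

result <- chisq.test(pid, education)
result            # full printout of the test
result$p.value    # just the p-value
```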

`t.test()`

Calculates a t-test on two variables. The easiest way to use it is with the formula interface, where you write your interval variable, then the `~` (tilde), and then the variable that divides your groups. You can simplify this by using the `data=` argument if both variables are in a dataframe. In addition, `alternative=` is used to set the alternative hypothesis. It can be ‘two.sided’, ‘greater’, or ‘less’. Ex: `t.test(per_cap_income~dem_control, data=df, alternative='two.sided')`
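
Here is a minimal sketch of the formula interface on made-up data. The column names match the example above, but the values are hypothetical, and `dem_control` is assumed to be coded as 0 and 1.

```r
# Hypothetical data: income (in thousands) for two groups of five
df <- data.frame(
  per_cap_income = c(41, 45, 39, 50, 47, 52, 55, 49, 58, 60),
  dem_control    = rep(c(0, 1), each = 5)
)

result <- t.test(per_cap_income ~ dem_control, data = df,
                 alternative = "two.sided")
result            # full printout of the test
result$estimate   # mean of each group
result$p.value    # just the p-value
```

The grouping variable on the right of `~` must have exactly two groups for this form of the test.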

`cor()`

Calculates the correlation between two vectors. You should provide the two vectors as the first two arguments. In addition, you can select the type of correlation using `method=` (look at the help file for the options: `?cor()`). The one different part of `cor()` is that it handles missing values differently than other functions. To have it ignore all missing values, set `use='complete.obs'`. Ex: `cor(df$per_cap_income, df$violent_crime_rate, use='complete.obs')`

`cor.test()`

Same as `cor()` but provides hypothesis testing as well.
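
The sketch below shows how missing values affect `cor()`, and how `cor.test()` is called on the same vectors. The two vectors and their values are made up for illustration.

```r
# Hypothetical vectors, each with one missing value
income <- c(41, 45, NA, 50, 47, 52)
crime  <- c(30, 28, 35, 22, NA, 20)

cor(income, crime)   # NA, because of the missing values

# Use only the pairs where both values are present
r <- cor(income, crime, use = "complete.obs")
r

# Same correlation, plus a hypothesis test
result <- cor.test(income, crime)
result$estimate
result$p.value
```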

- I keep getting `NA` as a result.
  - You probably have missing values in your vector (they are listed as `NA`). Most functions will take `na.rm=T` as an argument to fix this. By setting `na.rm` to `TRUE` you are telling the R function to ignore missing values.

- How do I find the number of observations?
  - R does not necessarily make this easy to do. One of the most general ways I’ve found is to count the non-missing observations. This is possible using `sum(!is.na(x))`. `is.na()` is a function that returns whether each value is missing or not. `!` inverts `TRUE` to `FALSE` and `FALSE` to `TRUE`, and then `sum()` adds up everything. The reason `sum()` works is that `TRUE` is equivalent to `1` and `FALSE` to `0`.

- R is just showing a `+` and nothing is happening.
  - This happens when R thinks that something more is coming. This is often because there is an open parenthesis or quotation mark. If you hit the ESC key it will cancel that command and you will see `>` again.

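
The first two answers above can be tried on a small vector. This is only a sketch: the vector `x` and its values are made up for illustration.

```r
# Hypothetical vector with two missing values
x <- c(4, 8, NA, 15, NA, 23)

mean(x)                  # NA, because of the missing values
mean(x, na.rm = TRUE)    # 12.5, missing values ignored

is.na(x)                 # TRUE where a value is missing
n_obs <- sum(!is.na(x))  # count of non-missing values
n_obs                    # 4
length(x)                # 6, counts missing values too
```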