Jenna Krall, PhD
Thursday, January 15, 2015
Course details
Instructor and TA details
Office hours with Anran
Prerequisites
What you need
Class format
These two portions of the class provide two different ways of learning R
Grading
Course objectives
If there is time
R can do almost anything
R project: http://www.r-project.org/
Comprehensive R Archive Network (CRAN): http://cran.r-project.org/
Ross Ihaka
Robert Gentleman
The R language
R is interactive
R and RStudio
R is easily customizable
SAS
In SAS,
proc means data = data1;
var variable1;
run;
In R,
mean(variable1)
STATA
MATLAB
SPSS
The console
Editor
R vs. RStudio
R version 3.1.2: codename “Pumpkin Helmet”
R version 3.0.2:
5 + 3
[1] 8
123/2 + (2 * 17.2)
[1] 95.9
Can store results using assignment operators
As an example:
nameofobject <- 5
print(nameofobject)
[1] 5
nameofobject2 = 5
print(nameofobject2)
[1] 5
Generally, <- is preferred
Can calculate result as object:
nameofsum <- 5 + 3
print(nameofsum)
[1] 8
nameofsum
[1] 8
Objects can be functions of other objects:
nameofnew <- nameofobject + nameofsum
nameofnew
[1] 13
Objects can be reassigned
nameofobject
[1] 5
nameofobject <- 1500
nameofobject
[1] 1500
The class function in R can tell us the class of an R object
class(nameofobject)
[1] "numeric"
nameofstring <- "IntrotoR"
nameofstring
[1] "IntrotoR"
class(nameofstring)
[1] "character"
We can also create vectors by combining numbers or text. The c function in R combines objects together.
vector1 <- c(6, 2, 3, 4)
vector1
[1] 6 2 3 4
class(vector1)
[1] "numeric"
length(vector1)
[1] 4
Vectors can also consist of strings
vectorstring <- c("R", "is", "awesome", "for", "Epidemiologists")
vectorstring
[1] "R" "is" "awesome" "for"
[5] "Epidemiologists"
class(vectorstring)
[1] "character"
length(vectorstring)
[1] 5
Mixing data classes
vectormix <- c("Intro", "to", "R", "version", 1)
vectormix
[1] "Intro" "to" "R" "version" "1"
class(vectormix)
[1] "character"
length(vectormix)
[1] 5
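The output above shows that the number 1 was stored as the string "1": when classes are mixed, c converts every element to a single class. A minimal sketch of this coercion follows; version_number is an illustrative name, not one used elsewhere in these slides.

```r
# Mixing text and a number coerces every element to character
vectormix <- c("Intro", "to", "R", "version", 1)
class(vectormix)

# The fifth element is now the string "1"; convert it back explicitly
version_number <- as.numeric(vectormix[5])
class(version_number)

# Mixing logical and numeric values coerces to numeric (TRUE becomes 1)
logical_mix <- c(TRUE, 2, 3)
class(logical_mix)
```

Checking class after combining objects is a quick way to catch numbers that have silently become text.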
Factors in R
grades <- c("A", "B", "A", "A", "C", "F", "D", "B")
grades
[1] "A" "B" "A" "A" "C" "F" "D" "B"
class(grades)
[1] "character"
factorgrades <- factor(grades)
factorgrades
[1] A B A A C F D B
Levels: A B C D F
class(factorgrades)
[1] "factor"
Why does class matter?
Functions will perform differently for different classes
There are many other classes
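As a small illustration of a function behaving differently by class, the sketch below calls summary on the character vector and on the factor built from it earlier. It recreates grades and factorgrades so that it runs on its own.

```r
grades <- c("A", "B", "A", "A", "C", "F", "D", "B")
factorgrades <- factor(grades)

# For a character vector, summary only describes the object itself
summary(grades)

# For a factor, summary counts the observations in each level
summary(factorgrades)

# The levels of a factor are its distinct categories, sorted alphabetically here
levels(factorgrades)
```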
Use brackets to select elements of a vector
vectorstring
[1] "R" "is" "awesome" "for"
[5] "Epidemiologists"
vectorstring[1]
[1] "R"
vectorstring[5]
[1] "Epidemiologists"
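Brackets also accept more than a single position. The sketch below shows a few common variations using vector1 from earlier; the names first_and_third, all_but_first, and above_three are illustrative only.

```r
vector1 <- c(6, 2, 3, 4)

# Select several elements at once with a vector of positions
first_and_third <- vector1[c(1, 3)]
first_and_third

# A negative position drops that element
all_but_first <- vector1[-1]
all_but_first

# A logical condition keeps only the elements where it is TRUE
above_three <- vector1[vector1 > 3]
above_three
```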
vector1
[1] 6 2 3 4
mean(vector1)
[1] 3.75
mean(vector1, trim = 0.25)
[1] 3.5
log(1)
[1] 0
length(vector1)
[1] 4
median(vector1)
[1] 3.5
summary(vector1)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.00 2.75 3.50 3.75 4.50 6.00
ls()
[1] "factorgrades" "grades" "nameofnew" "nameofobject"
[5] "nameofobject2" "nameofstring" "nameofsum" "vector1"
[9] "vectormix" "vectorstring"
Packages include tools to perform many different analyses
Some packages are preloaded, e.g., base (mean, print, length), stats (median, lm)
Can install and load additional packages from CRAN
Installing a package (may need to specify a CRAN mirror)
install.packages("ggplot2")
Loading a package
library(ggplot2)
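To see which packages are already attached, or to check whether a package is available before calling library, something like the following may help. This is a sketch, not part of the original slides; ggplot2_available is an illustrative name.

```r
# List the packages (and other environments) attached in this session
attached <- search()
attached

# base and stats are attached by default in a standard R session
"package:stats" %in% attached

# requireNamespace returns TRUE if the package can be loaded and FALSE otherwise;
# it does not attach the package, so library() is still needed to use it by name
ggplot2_available <- requireNamespace("ggplot2", quietly = TRUE)
ggplot2_available
```

If ggplot2_available is FALSE, install.packages("ggplot2") is the next step.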
Where R
getwd()
[1] "/Users/jennakrall/Dropbox/IntrotoREpi/data"
Change your working directory using setwd
setwd("/Users/jennakrall/Dropbox/")
setwd("~/")
getwd()
[1] "/Users/jennakrall"
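When changing the working directory temporarily, it can be useful to save the current one first so it can be restored. A minimal sketch, using R's session temporary folder as a stand-in for a real data folder (old_wd is an illustrative name):

```r
# Remember where we started
old_wd <- getwd()

# Move to another folder (here the session's temporary directory)
setwd(tempdir())
getwd()

# List the files R can see in the current working directory
list.files()

# Return to the original working directory
setwd(old_wd)
getwd()
```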
R can read in data from many different sources
Types of data files
You must either specify the path to the data or set your working directory to where the data are located
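The first option, specifying the path, can be sketched with file.path, which joins folder and file names with the correct separator. The folder name "data" is a hypothetical example; substitute the location of your own files.

```r
# Hypothetical folder that holds the data, relative to the working directory
data_dir <- "data"

# Build the full path to the file without changing the working directory
flu_path <- file.path(data_dir, "googleflu.csv")
flu_path

# Only try to read the file if it is actually there
if (file.exists(flu_path)) {
  flu <- read.csv(flu_path, stringsAsFactors = FALSE)
}
```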
Example using .RData or .rda files
load("googleflu.RData")
ls()
[1] "flu"
head(flu)
Date United.States Georgia Atlanta HHSRegion4
1 2003-09-28 902 514 519 631
2 2003-10-05 952 532 484 652
3 2003-10-12 1092 557 497 735
4 2003-10-19 1209 608 563 822
5 2003-10-26 1249 745 845 797
6 2003-11-02 1374 767 771 850
class(flu)
[1] "data.frame"
Example using .csv file
flu <- read.csv("googleflu.csv", stringsAsFactors = FALSE)
head(flu)
Date United.States Georgia Atlanta HHSRegion4
1 2003-09-28 902 514 519 631
2 2003-10-05 952 532 484 652
3 2003-10-12 1092 557 497 735
4 2003-10-19 1209 608 563 822
5 2003-10-26 1249 745 845 797
6 2003-11-02 1374 767 771 850
class(flu)
[1] "data.frame"
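After reading a .csv file it is worth checking how each column was stored. The sketch below does not need googleflu.csv: it writes the first three rows and three columns shown above to a temporary file and reads them back. The names flu_small, tmp_csv, and flu_check are illustrative.

```r
# First three rows of three columns from the output above
flu_small <- data.frame(
  Date = c("2003-09-28", "2003-10-05", "2003-10-12"),
  United.States = c(902, 952, 1092),
  Georgia = c(514, 532, 557)
)

# Write to a temporary .csv file, then read it back in
tmp_csv <- tempfile(fileext = ".csv")
write.csv(flu_small, tmp_csv, row.names = FALSE)
flu_check <- read.csv(tmp_csv, stringsAsFactors = FALSE)

# str shows the class of every column; Date is read in as character
str(flu_check)

# Convert the Date column to the Date class so R treats it as a date
flu_check$Date <- as.Date(flu_check$Date)
class(flu_check$Date)

# Columns are selected with $ and can be passed to functions
mean(flu_check$Georgia)
```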
Example using .xls or .xlsx file
library(XLConnect)
wkbook_flu <- loadWorkbook("googleflu.xlsx")
class(wkbook_flu)
[1] "workbook"
attr(,"package")
[1] "XLConnect"
flu <- readWorksheet(wkbook_flu, 1)
head(flu)
Date United.States Georgia Atlanta HHSRegion4
1 2003-09-28 902 514 519 631
2 2003-10-05 952 532 484 652
3 2003-10-12 1092 557 497 735
4 2003-10-19 1209 608 563 822
5 2003-10-26 1249 745 845 797
6 2003-11-02 1374 767 771 850
class(flu)
[1] "data.frame"
Example using SAS
1) Save SAS data as .csv and read into R
proc export data = googleflu
outfile = "googleflu.csv"
dbms = csv replace;
putnames = yes;
run;
Use the read.csv function in R
flu <- read.csv("googleflu.csv", stringsAsFactors = FALSE)
2) Use SAS Xport Transport file
SAS xport files
* Set xport filepath ;
libname lib1 xport "H:\googleflu.xpt";
* Set the xport file ;
data lib1.flu;
* This is your original data;
set flu;
run;
In R,
library(Hmisc)
flu <- sasxport.get("googleflu.xpt")
Processing SAS dataset FLU ..
head(flu)
date us georgia atlanta hhs
1 2003-09-28 902 514 519 631
2 2003-10-05 952 532 484 652
3 2003-10-12 1092 557 497 735
4 2003-10-19 1209 608 563 822
5 2003-10-26 1249 745 845 797
6 2003-11-02 1374 767 771 850
3) Read SAS data directly (only if SAS is installed)
Use the read.ssd function to read a .sas7bdat file directly into R
The read.ssd function in R creates an xport file (using SAS) and then reads it with read.xport.
library(foreign)
lib1 <- "c:/"
flu <- read.ssd(lib1, "flu", sascmd = "filepath/to/where/SAS/is")
May need to tell R where to find SAS using sascmd argument.
Other functions to read in data include
# This is the mean of our numeric vector
mean(vector1)
[1] 3.75
Your closest collaborator is you six months ago but you don’t reply to email. – Erin Jonaitis (via Andrew Gelman)
newvector <- c(5, 3, 4, 4)
newvector <- c(28, 90, 10, 57, 66, 93, 29, 95, 19, 14, 96, 78, 61, 51, 1, 87,
60, 46, 43, 35, 17, 64, 2, 55, 54, 25, 92, 32, 42, 94, 97, 86, 77, 6, 13,
23, 20, 67, 30, 68, 12, 5, 24, 59, 33, 75, 26, 65, 88, 31, 47, 38, 53, 70,
27, 98, 16, 8, 37, 15, 11, 40, 85, 83, 76, 91, 81, 48, 80, 7, 36, 22, 89,
39, 4, 63, 21, 79, 99, 45, 56, 100, 44, 18, 3, 58, 73, 52, 62, 72, 69, 71,
74, 84, 82, 49, 34, 50, 9, 41)
Naming conventions
new_vector <- c(1, 2, 3, 4)
new_vector
[1] 1 2 3 4
Avoid names that R already uses, such as c or T
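To see why T makes a risky object name: T is a built-in shortcut for TRUE, but unlike TRUE it can be overwritten. A short sketch of what can go wrong:

```r
# By default, T is a shortcut for TRUE
isTRUE(T)

# Assigning to T creates a new object that hides the built-in value
T <- 0
isTRUE(T)

# Removing our object makes the built-in T visible again
rm(T)
isTRUE(T)
```

Writing TRUE and FALSE in full avoids the problem, since those cannot be reassigned.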
Style guides:
Look up the help page for a function (?function)
?mean
help.search("mean")
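Two other base R helpers can be useful alongside the help pages. This is a sketch rather than part of the original slides; mean_matches is an illustrative name.

```r
# args shows the arguments a function accepts
args(mean)

# apropos lists the objects on the search path whose names contain a pattern
mean_matches <- apropos("mean")
mean_matches
```

apropos searches object names only, while help.search looks through the documentation of installed packages.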