3 Coding in RStudio

This chapter introduces the basics of coding in RStudio, including variables, calculations, vectors, named vectors, indexing/subsetting vectors, dataframes, colnames/rownames, subsetting dataframes, filtering by conditionals and if-else statements. Use your RStudio Cloud account to follow along with the examples. The best way to learn how to code is with hands-on experience!

Leaving Certificate Syllabus

This chapter is complementary to:

Leaving Certificate Computer Science Section 2: Data

Variables

The first concept we will cover is variables. Variables are placeholders that store pieces of information. This information can take many forms – a number, a vector, a matrix, a function – the list goes on.

To assign a variable in R, you can use the <- notation or = notation.

In the code block below, we create a character variable by taking the string "Hello World" and assigning it to the variable greeting.

greeting <- "Hello World"

To print the contents of the variable greeting, you can use the print() function.

print(greeting)

Notice how the variable has been stored in our environment (the upper right box in RStudio) as a value.

As previously mentioned, variables are placeholders and as such, can be overwritten and modified.

Change the contents of the greeting variable to hold the string "Hello user" and print the contents. Notice how the contents have changed.

greeting <- "Hello user"

print(greeting)

Note that the value shown in the environment has been updated to reflect this change.
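The text mentions that both the <- and = notations assign variables, but only <- is demonstrated. The short sketch below compares the two; the variable names greeting_arrow and greeting_equals are hypothetical and used only for illustration.

```r
# Both notations create a variable; `<-` is the style used throughout this chapter
greeting_arrow <- "Hello World"   # hypothetical name, for illustration only
greeting_equals = "Hello World"   # hypothetical name, for illustration only

# identical() checks that the two variables hold exactly the same value
same_value <- identical(greeting_arrow, greeting_equals)
print(same_value)  # TRUE
```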

Calculations

R is statistical computing software and, at its core, an oversized calculator.

R performs addition, subtraction, multiplication, division, exponentiation and modulo with + - * / ^ %%.

1+4

10-1

12*3

1/6

2^6

3%%9

These return 5, 9, 36, 0.1666667, 64 and 3 respectively.

It is common practice to store the results of any calculation in a variable so that you can use the result later:

x <- 2+4

print(x)
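Two details of the operators above are easy to trip over: the order of the operands in a modulo, and the order in which R evaluates a mixed expression. The following is a minimal sketch; the variable names are made up for this example.

```r
# Modulo returns the remainder after division, so operand order matters
small_mod_large <- 3 %% 9   # 9 goes into 3 zero times, remainder 3
large_mod_small <- 9 %% 3   # 3 goes into 9 exactly three times, remainder 0

# Multiplication is evaluated before addition unless brackets say otherwise
without_brackets <- 2 + 4 * 3    # 14
with_brackets <- (2 + 4) * 3     # 18

print(c(small_mod_large, large_mod_small, without_brackets, with_brackets))
```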

Data Types

Data types help R interpret our code inputs.

Anything surrounded by double quotes is interpreted as a character string.
Integers and floats are interpreted as numerics, on which we can perform mathematical operations.
TRUE / FALSE values are known as Booleans (R calls them logicals).

The box below shows examples of each data type being assigned to a variable. Note, everything after the # is interpreted as a comment on your code block.

Data Types

my_age <- 28 # Numeric variable
my_name <- "Nicholas" # Character variable
is_datascientist <- TRUE # logical variable
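If you are unsure how R has interpreted a value, the built-in class() function reports its data type. The sketch below reuses the three variables from the box above; the *_class names are introduced only for this example.

```r
my_age <- 28
my_name <- "Nicholas"
is_datascientist <- TRUE

# class() reports how R has interpreted each value
age_class <- class(my_age)               # "numeric"
name_class <- class(my_name)             # "character"
logical_class <- class(is_datascientist) # "logical"

print(c(age_class, name_class, logical_class))
```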

Vectors

Vectors are a collection of the same data type. Be careful not to mix data types in a vector!

To initialise a vector, we use the c() function, which combines (concatenates) its arguments into a single vector.

Below we will create two vectors:

racing_number <- c(33,44,11,4,3)

driver_names <- c("Verstappen", "Hamilton", "Perez", "Norris", "Ricciardo")
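To see why mixing data types in a vector is risky, consider the sketch below. The vector mixed is a hypothetical example: R does not raise an error, it silently converts every element to a single type (here, character).

```r
# Hypothetical vector mixing a number, a string and a logical
mixed <- c(33, "Hamilton", TRUE)

print(mixed)                 # "33" "Hamilton" "TRUE"
mixed_class <- class(mixed)  # "character": the number and the logical became text
print(mixed_class)
```

Because 33 is now the string "33", it can no longer be used in calculations.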

Named Vectors

We can use the driver_names vector to assign names to the racing_number vector using the names() function. Then copy racing_number into a new variable called drivers to avoid confusion!

names(racing_number) <- driver_names

drivers <- racing_number

print(drivers)

Returns the five racing numbers, each labelled with its driver's name.

Manipulating Vectors

Let’s update our previous vectors to include two new drivers and their racing numbers:

# Add two new numbers to the pre-existing vector

racing_number <- c(racing_number, 16, 24)

# Add two new names to the existing names vector

driver_names <- c(driver_names, "Leclerc", "Zhou")

# Update the names of the racing_number vector

names(racing_number) <- driver_names

# Assign the racing_number vector to drivers

drivers <- racing_number

# Inspect the output

print(drivers)

Returns all seven racing numbers, now including 16 and 24, each labelled with its driver's name.

Indexing

Before demonstrating how to delete items from a vector, we need to cover vector indexing. Indexing allows us to access specific items in a vector.

In the example below, we will access the first, last and 2nd to 4th drivers in our vector:

drivers[1]

drivers[7]

drivers[2:4]

Returns 33 (Verstappen), 24 (Zhou) and then 44, 11 and 4 (Hamilton, Perez and Norris).

Note:

Instead of drivers[7] we could use drivers[length(drivers)] to access the last element in the vector. This saves you from having to count the items manually and is programmatically robust to future changes to the vector.
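A minimal sketch of this idea is shown below, using a shortened copy of the drivers vector so that the block runs on its own. Indexing by name, shown on the second line, is not covered in the text above but follows from the vector being named.

```r
# Shortened example vector, defined here so the block is self-contained
drivers <- c(Verstappen = 33, Hamilton = 44, Perez = 11, Norris = 4)

last_driver <- drivers[length(drivers)]  # last element, whatever the vector's length
by_name <- drivers["Hamilton"]           # a named vector can also be indexed by name

print(last_driver)
print(by_name)
```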

To delete an item from the vector, we place a minus in front of the corresponding index we want to drop.

Drop Lewis Hamilton from our drivers vector. Don’t forget to assign the operation to the drivers variable if you want to save the changes.

drivers[-2]

Returns the vector without Hamilton. The drivers variable itself is unchanged until you reassign it.
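The difference between viewing a deletion and saving it can be sketched as follows, again using a shortened copy of the vector so the block runs on its own.

```r
# Shortened example vector, defined here so the block is self-contained
drivers <- c(Verstappen = 33, Hamilton = 44, Perez = 11)

drivers[-2]                        # shows the vector without Hamilton
length_before <- length(drivers)   # still 3: nothing has been saved yet

drivers <- drivers[-2]             # reassign to save the change
length_after <- length(drivers)    # now 2
print(drivers)
```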

Lists

Lists can be used to store multiple vectors in a single data structure. We can name the vectors in the list, adding another element to this data structure.

We will create a named list attributing four driver pairings to their respective teams:

F1_teams <- list(Scuderia_Ferrari=c("Charles Leclerc", "Carlos Sainz"),
                 Scuderia_Alpha_Tauri_Honda=c("Pierre Gasly", "Yuki Tsunoda"),
                 Alfa_Romeo_Racing_ORLEN=c("Valtteri Bottas", "Guanyu Zhou"),
                 HASS_F1_Team=c("Mick Schumacher", "Kevin Magnussen"))

Constructing a list is simple: pass multiple named vectors (e.g. Scuderia_Ferrari=c("Charles Leclerc", "Carlos Sainz")), separated by commas, to the list() function.

The benefit of lists like these is that you can easily access items in the list using human-readable names instead of numerical indexes (which still work!).

Below are a few examples of how to access the Ferrari drivers, which are the first vector in our list:

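F1_teams$Scuderia_Ferrari

F1_teams[1]

F1_teams["Scuderia_Ferrari"]

These all access the Ferrari drivers. Strictly speaking, the $ form returns the character vector itself, whereas the single-bracket forms return a one-element list containing it. If you want to find out who the number 2 driver at Haas is (stored under the name HASS_F1_Team in our list), apply the same logic used in vector indexing: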

F1_teams$HASS_F1_Team[2]
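The access styles differ slightly in what they hand back, which matters once you start indexing into the result. The sketch below uses a one-team list for brevity; the double-bracket form [[ ]] is not introduced in the text above and is included here as an extra.

```r
# One-team list, defined here so the block is self-contained
F1_teams <- list(Scuderia_Ferrari = c("Charles Leclerc", "Carlos Sainz"))

as_vector <- F1_teams$Scuderia_Ferrari  # the character vector itself
also_vector <- F1_teams[[1]]            # double brackets also return the vector
as_list <- F1_teams[1]                  # single brackets return a list of length 1

print(class(as_vector))  # "character"
print(class(as_list))    # "list"

second_driver <- as_vector[2]  # vector indexing works on the extracted vector
print(second_driver)           # "Carlos Sainz"
```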

Dataframes

Dataframes are usually a more convenient structure than lists for storing multiple vectors of the same length. Typically, each row in a dataframe corresponds to an observation (person, event, sample), whilst columns correspond to the variable being recorded (e.g. height, age, eye colour).

Go to RStudio Cloud and open your session. Load in the Iris dataset:

iris <- datasets::iris

You can see a newly created Data object in your environment called iris with 150 obs of 5 variables. That is to say, we have 150 rows and 5 columns.
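You can confirm these numbers from the console rather than the environment pane. The sketch below uses three base R functions, dim(), head() and str(), which are not mentioned in the text above.

```r
iris <- datasets::iris

iris_dim <- dim(iris)  # number of rows, then number of columns
print(iris_dim)        # 150 5

head(iris, 3)          # the first three rows
str(iris)              # the data type of each column
```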

Colnames & Rownames

A simple rule applies to colnames and rownames: they must be unique. This is because R uses colnames and rownames to index each column and row respectively, so duplicate entries are not allowed.

Inspect the column names and row names of a dataframe:

colnames(iris)

rownames(iris)

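Returns the five column names ("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width" and "Species"), followed by the row names "1" to "150".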

Note that the rownames in this dataset are not important; they are just automatically incremented (unique) integers.

Dataframe Indexes

There are situations where we will need to isolate columns or rows for an analysis. The same numerical indexing logic from vectors applies, but there are two entries to the square brackets – one for rows, and one for columns.

Like lists, we can provide human-readable names to access a specific column: iris$Sepal.Width.
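As a minimal sketch of the [row, column] pattern, the examples below pull a single cell, a few rows and a whole column from iris. The variable names on the left are made up for this example.

```r
iris <- datasets::iris

first_cell <- iris[1, 2]           # row 1, column 2 (Sepal.Width): 3.5
first_rows <- iris[1:3, ]          # rows 1 to 3, all columns (row entry only)
width_by_position <- iris[, 2]     # all rows, column 2 (column entry only)
width_by_name <- iris$Sepal.Width  # the same column, accessed by name

print(first_cell)
print(first_rows)
```

Leaving one side of the comma empty means "take everything" along that dimension.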

Subsetting Dataframes

Now that we know how to isolate specific cells of a dataframe, the next step is to apply these changes by ‘slicing the dataframe’. Slicing and subsetting are interchangeable terms.

In our Iris dataset, make a new dataframe that contains only numerical measurements for Petals (i.e. the 3rd and 4th columns):

petal_data <- iris[,3:4]

Now make a dataframe that contains only the numerical observations (i.e. drop the Species column):

numerical_data <- iris[,-5]

Personally, I prefer the use of the subset() function. The above operations are performed using subset below:

petal_data <- subset(iris, select = c(Petal.Length, Petal.Width))

numerical_data <- subset(iris, select = -c(Species))

Note:

It is rare that you would select/drop observations from a dataset in this manner (do not cherry-pick your data). This is why the examples are performed on columns.

Filtering Dataframes

Filtering dataframes is an extension of dataframe subsetting, performed using logical operators:

< : less than
<= : less than or equal to
> : greater than
>= : greater than or equal to
== : exactly equal to
!= : not equal to
!x : Not x
x | y : x OR y
x & y : x AND y

Using the Iris dataset as an example, subset the original dataframe to isolate data that belongs to the species setosa (note the lowercase spelling used in the dataset):

setosa_data <- subset(iris, Species == "setosa")
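The logical operators listed above can be combined in a single filter. The following is a minimal sketch; the threshold of 5 and the variable names are arbitrary choices for illustration. Note that the species names in iris are stored in lowercase.

```r
iris <- datasets::iris

# AND: setosa flowers whose sepal length is greater than 5
long_setosa <- subset(iris, Species == "setosa" & Sepal.Length > 5)

# NOT EQUAL: every flower that is not setosa
not_setosa <- subset(iris, Species != "setosa")

print(head(long_setosa))
print(nrow(not_setosa))  # 100
```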

Updating Dataframes

To create a new variable in our dataframe, we can use the $ operator.

In the example below, we will add a column called sepal_less_petal_len to the original dataframe.

This column will contain Sepal.Length - Petal.Length

iris$sepal_less_petal_len <- iris$Sepal.Length - iris$Petal.Length

Exercise!

Make your own Data frame

We can also make our very own data frames and lists, by combining vectors.

Open RStudio Cloud and create 5 vectors: monday, tuesday, wednesday, thursday and friday. Make the content of each vector a shopping list containing 5 items for a restaurant on that day of the week.

Monday might look like this:

monday <- c("pasta", "bacon", "mushrooms", "milk", "cheese")

Do this for all 5 days.
Combine all 5 vectors into a data frame using the data.frame() function:
```
dataframe <- data.frame(monday, tuesday, wednesday, thursday, friday)
```
Use some of the subsetting techniques described above, such as indexing and filtering, to explore how data frames are, in essence, combinations of vectors that can be subset.

If Else

We can use ifelse() to create new variables based on conditional statements.

In the example below we will use a vector, however, this applies to dataframe columns too:

Note:

The ifelse() function behaves like a vectorised ternary operator, which reads as follows: "A ? B : C". If A is true choose B, else choose C.

vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

test <- ifelse(vector > 5, "greater than 5", "less than 5")
print(test)

Returns "less than 5" for the first five elements and "greater than 5" for the last five.

But what about the 5th element? 5 is not less than 5.

To add a second layer of conditionals we will re-use the ifelse() function:

test <- ifelse(vector == 5, "five", test)
print(test)

Returns the same labels as before, except that the 5th element is now "five".
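The two steps can also be written as one expression by nesting a second ifelse() inside the first. This is a sketch of an alternative, not a replacement for the approach above; the name test_nested is made up for this example.

```r
vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# If the value is greater than 5, label it; otherwise run a second check for exactly 5
test_nested <- ifelse(vector > 5, "greater than 5",
                      ifelse(vector == 5, "five", "less than 5"))
print(test_nested)
```

Nesting keeps everything in one place, but more than two or three levels quickly becomes hard to read.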
