Chapter 3 Factors and Lists

Factors and lists are vectors with special properties. Both are vectors because they are 1-dimensional sequences of elements. Factors are primarily used for categorical data, and are technically a special form of integer vector. However, we will simply refer to a factor vector as a factor object. Lists are vectors in which the type of each element can differ. In this chapter we introduce some of the unique properties of factors and lists.

3.1 Factors

In real-world problems, you often encounter data that can be classified in categories. For example, suppose a survey was conducted of a group of seven individuals, who were asked to identify their hair color.

hair <- c("Blonde", "Black", "Black", "Red", "Blonde", "Brown", "Black")

Here, the hair color is an example of categorical data. We will typically want to store a variable like hair color as a factor, as opposed to a character vector. The different values that the factor can take are called levels. In R, you can create a factor with the factor() or as.factor() function.

f <- factor(hair)
f
## [1] Blonde Black  Black  Red    Blonde Brown  Black 
## Levels: Black Blonde Brown Red

3.1.1 Levels

Levels are one of the special properties of a factor object. Notice that when you print the factor, R displays the distinct levels below the factor. R keeps track of all the possible values in a vector, and each value is called a level of the associated factor. The levels() function shows all the levels from a factor.

levels(f)
## [1] "Black"  "Blonde" "Brown"  "Red"

If your vector contains only a subset of all the possible levels, then R will have an incomplete picture of the possible levels. Consider the following example of a vector consisting of directions. Notice that “South” is missing.

directions <- c("North", "West", "North", "East", "North", "West", "East")
f <- factor(directions)
f
## [1] North West  North East  North West  East 
## Levels: East North West

Notice that the levels of your new factor do not contain the value “South”. R thinks that North, West, and East are the only possible levels. However, in practice, it makes sense to have all the possible directions as levels of your factor. To add all the possible levels explicitly, you specify the levels argument of the function factor().

# Make sure all possible categories are listed using the levels argument
f <- factor(directions,
            levels = c("North", "East", "South", "West"))
f
## [1] North West  North East  North West  East 
## Levels: North East South West

R lets you assign abbreviated names for the levels. You can do this by specifying the labels argument of factor().

f <- factor(directions,
            levels = c("North", "East", "South", "West"),
            labels = c("N", "E", "S", "W"))
f
## [1] N W N E N W E
## Levels: N E S W

3.1.2 Ordered Factor

Sometimes data has some kind of natural order between elements. For example, sports analysts use a three-point scale to determine how well a sports team is competing:

loss < tie < win.

In market research, it’s very common to use a five-point scale to measure perceptions:

strongly disagree < disagree < neutral < agree < strongly agree.

Data that can be placed in order or on a scale is known as ordinal data. We can store ordinal data as an ordered factor. To create an ordered factor, use the factor() function with the argument ordered = TRUE.

record <- c("win", "tie", "loss", "tie", "loss", "win", "win")
f <- factor(record, 
            ordered = TRUE)
f
## [1] win  tie  loss tie  loss win  win 
## Levels: loss < tie < win

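When no levels are given, R sorts them alphabetically, which in the example above happens to match the intended order. You can set the order yourself with the levels argument: levels are ranked from lowest to highest in the order they are listed.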

record <- c("win", "tie", "loss", "tie", "loss", "win", "win")
f <- factor(record,
            ordered = TRUE,
            levels = c("win", "tie", "loss"))
f
## [1] win  tie  loss tie  loss win  win 
## Levels: win < tie < loss
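One practical consequence of the ordering is that comparison operators become meaningful for ordered factors. The following is a minimal sketch using the same record data with the levels listed from loss to win; it relies only on base R.

```r
record <- c("win", "tie", "loss", "tie", "loss", "win", "win")
f <- factor(record,
            ordered = TRUE,
            levels = c("loss", "tie", "win"))

# Comparisons follow the level order: which results are at least a tie?
f >= "tie"
## [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE

# min() and max() are also defined for ordered factors
max(f)
## [1] win
## Levels: loss < tie < win
```

The same comparison on an unordered factor is not meaningful; R returns NA with a warning.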

If you have no observations in one of the levels, you can drop it using the droplevels() function.

record <- c("win", "loss", "loss", "win", "loss", "win")
f <- factor(record,
            levels = c("loss", "tie", "win"))

droplevels(f)
## [1] win  loss loss win  loss win 
## Levels: loss win
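To see why dropping a level matters, it can help to compare frequency counts before and after. The sketch below reuses the data from this example; the name f_used is only an illustrative choice.

```r
record <- c("win", "loss", "loss", "win", "loss", "win")
f <- factor(record,
            levels = c("loss", "tie", "win"))

# The unused level "tie" is still counted, with a frequency of zero
table(f)
## f
## loss  tie  win
##    3    0    3

# After dropping, only the observed levels are tabulated
f_used <- droplevels(f)
table(f_used)
## f_used
## loss  win
##    3    3
```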

3.2 Factors - Basic Features

3.2.1 Length

Factor objects have a lot of the same features as atomic vectors. In general, most of the features and functions we had for atomic vectors work with factors. For example, we still use the length() function to see how many elements are in a factor.

length(f)
## [1] 6

3.2.2 Coercion

Coercion also works similarly. We can use as.factor() to create a factor object from a pre-existing vector. As we have seen in the previous examples, the factor() function also works.

record <- c("win", "loss", "loss", "win", "loss", "win")
f <- as.factor(record)
f
## [1] win  loss loss win  loss win 
## Levels: loss win

To convert a factor object to a non-factor object we still use the as.*() functions. In general, it is easiest to convert between character vectors and factors. When we convert a factor to an integer or double vector, each value is replaced by the position of its level. That is, the first level listed is converted to a 1, the second level listed is converted to a 2, and so on.

# Convert to a character vector 
as.character(f)
## [1] "win"  "loss" "loss" "win"  "loss" "win"
# Convert to an integer vector 
as.integer(f)
## [1] 2 1 1 2 1 2
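This behavior can be surprising when the levels themselves look like numbers. The sketch below uses a hypothetical factor, f_num, to show that a direct conversion returns level positions, and that converting to character first recovers the original values.

```r
# Hypothetical example: numbers that were stored as a factor
f_num <- factor(c("10", "20", "10", "5"))

# Levels are sorted as text, so "5" comes last
levels(f_num)
## [1] "10" "20" "5"

# Direct conversion returns the level positions, not the values
as.integer(f_num)
## [1] 1 2 1 3

# Convert to character first to recover the original numbers
as.numeric(as.character(f_num))
## [1] 10 20 10  5
```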

3.2.3 Testing/Class

We can also test whether we have a factor or an ordered factor using the is.*() functions, as we did before.

record <- c("win", "loss", "loss", "win", "loss", "win")
f <- factor(record, ordered = TRUE)


# Test if the character vector is a factor
is.factor(record)
## [1] FALSE
# Test if we have a factor (includes both ordered and not ordered factors)
is.factor(f)
## [1] TRUE
# Test if we have an ordered factor (only includes ordered factors)
is.ordered(f)
## [1] TRUE

With all types of objects we can use the class() function. As mentioned in the previous section, this function returns the name of the type of object that you have, unlike typeof(), which returns the storage mode. The class names a kind of object with particular properties, for instance a factor. A factor object is stored like an integer vector, but it has “levels” which can be utilized in special ways.

# Returns storage mode (not recommended)
typeof(f)
## [1] "integer"
# Returns class, which is the name of a collection of objects with similar properties 
# (Recommended)
class(f)
## [1] "ordered" "factor"
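One way to see the integer storage directly is to strip the class with unclass(). This is a small illustrative sketch in base R, not something the chapter depends on.

```r
record <- c("win", "loss", "loss", "win", "loss", "win")
f <- factor(record, ordered = TRUE)

# Without its class, a factor is an integer vector plus a levels attribute
unclass(f)
## [1] 2 1 1 2 1 2
## attr(,"levels")
## [1] "loss" "win"

# inherits() tests for a class wherever it appears in class(f)
inherits(f, "factor")
## [1] TRUE
```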

3.2.4 Names

Like standard vectors, we can name the elements in a factor using the same three techniques discussed in 2.2.5.

# Using Technique 1 for creating a named vector. 
named1 <- c(sally = "win", tom = "win", ed = "lost", jane = "tie")
named1 <- factor(named1)
named1
## sally   tom    ed  jane 
##   win   win  lost   tie 
## Levels: lost tie win

3.2.5 Accessing Elements

We can also access elements of a factor object using the same standard techniques described for accessing elements in a vector (2.2.7).

# Obtain the first element
named1[1]
## sally 
##   win 
## Levels: lost tie win
# Obtain the fourth and second elements
named1[c(4, 2)]
## jane  tom 
##  tie  win 
## Levels: lost tie win
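Indexing by name and by logical condition works as well. The sketch below rebuilds named1 so that it runs on its own; the drop = TRUE argument is an optional extra that removes levels no longer present in the subset.

```r
named1 <- factor(c(sally = "win", tom = "win", ed = "lost", jane = "tie"))

# Index by name
named1["ed"]
##   ed
## lost
## Levels: lost tie win

# Index with a logical condition: keep only the wins
named1[named1 == "win"]
## sally   tom
##   win   win
## Levels: lost tie win

# drop = TRUE discards the levels that do not occur in the result
named1[named1 == "win", drop = TRUE]
## sally   tom
##   win   win
## Levels: win
```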

3.2.6 Frequency Tables

The summary() function will give you a quick overview of the contents of a factor.

hair <- c("Blonde", "Black", "Black", "Red", "Blonde", "Brown", "Black")
hair <- factor(hair)
summary(hair)
##  Black Blonde  Brown    Red 
##      3      2      1      1

The function table() tabulates observations.

table(hair)
## hair
##  Black Blonde  Brown    Red 
##      3      2      1      1

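The table() function works the same way on atomic vectors. The summary() function also accepts atomic vectors, but what it reports depends on the type: for a character vector, for example, it reports the length, class, and mode rather than counts. Both functions are especially useful for factor objects.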

The table() function can also tabulate two-way frequency tables.

hair <- c("Blonde", "Black", "Black", "Red", "Blonde", "Brown", "Black")
own_pets <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)

hair <- factor(hair)
own_pets <- factor(own_pets)


# Two way table
table(hair, own_pets)
##         own_pets
## hair     FALSE TRUE
##   Black      2    1
##   Blonde     1    1
##   Brown      1    0
##   Red        0    1
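Once a two-way table is stored in a variable, individual counts and totals can be pulled out of it. The following sketch assumes the table is saved under the illustrative name tab and uses only base R functions.

```r
hair <- factor(c("Blonde", "Black", "Black", "Red", "Blonde", "Brown", "Black"))
own_pets <- factor(c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE))

tab <- table(hair, own_pets)

# Look up a single cell by level names (row, then column)
tab["Black", "FALSE"]
## [1] 2

# Totals for each hair color (rows) and each pet status (columns)
rowSums(tab)
##  Black Blonde  Brown    Red
##      3      2      1      1
colSums(tab)
## FALSE  TRUE
##     4     3
```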

3.3 Lists

A list is an ordered collection of objects. Unlike other types of vectors, the elements in a list can belong to different classes. Lists are useful for packaging together a set of related objects. We can create a list with the list() function.

# A list of mixed datatypes
lst <- list(1L, c("abc", "ABC"), 1.23, TRUE)
lst
## [[1]]
## [1] 1
## 
## [[2]]
## [1] "abc" "ABC"
## 
## [[3]]
## [1] 1.23
## 
## [[4]]
## [1] TRUE

The output above is structured differently from that of a standard vector. The position of each element of a list is denoted by [[]] instead of [].

The best way to understand the contents of a list is to use the structure function str(). It provides a compact display of the internal structure of a list.

lst <- list(1, c("abc", "ABC"), 1.23, TRUE)
str(lst)
## List of 4
##  $ : num 1
##  $ : chr [1:2] "abc" "ABC"
##  $ : num 1.23
##  $ : logi TRUE

Above we see that we have a list with 4 elements. The first element is of class “numeric” and contains a single number, 1. The second element is of class “character” and contains two elements, which is indicated by [1:2]. The third and fourth elements are of class “numeric” and “logical”, respectively, and each contains a single element.

To see the class of each individual element of a list we can use the following command.

sapply(lst, class)
## [1] "numeric"   "character" "numeric"   "logical"

3.3.1 Nested Lists

A list can contain sublists, which in turn can contain sublists themselves, and so on. Such a list is known as a nested list or recursive vector.

lst <- list(1, 3, c("abc", "ABC"), list("a","b","c"), TRUE)
str(lst)
## List of 5
##  $ : num 1
##  $ : num 3
##  $ : chr [1:2] "abc" "ABC"
##  $ :List of 3
##   ..$ : chr "a"
##   ..$ : chr "b"
##   ..$ : chr "c"
##  $ : logi TRUE

3.4 List - Basic Features

3.4.1 Length

Despite looking different and being stored differently than atomic vectors and factors, lists have many of the same properties and features. For example, we still can use the length() function to determine how many elements are in a list.

lst <- list(1, 3, c("abc", "ABC"), list("a","b","c"), TRUE)
length(lst)
## [1] 5

In the example above there are 5 elements in the list. The first two elements are double vectors of length 1, the third element is a character vector of length two, the fourth element is a list of length three, and the fifth element is a logical vector of length 1. We can add elements to a list using c(), as we did before with atomic vectors.

lst_char <-  list("a", "b", "c")
lst_num <- list(100, 99, 0)
lst <- c(lst_num, lst_char)
str(lst)
## List of 6
##  $ : num 100
##  $ : num 99
##  $ : num 0
##  $ : chr "a"
##  $ : chr "b"
##  $ : chr "c"
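A detail worth checking when combining is what happens if one of the arguments to c() is an atomic vector rather than a list. The sketch below, with illustrative names lst_one and lst_many, contrasts the two cases.

```r
lst <- list(100, "a")

# Wrapping a vector in list() appends it as ONE new element
lst_one <- c(lst, list(c(1, 2, 3)))
length(lst_one)
## [1] 3

# Without list(), each value of the vector becomes its own element
lst_many <- c(lst, c(1, 2, 3))
length(lst_many)
## [1] 5
```

Wrapping in list() is therefore the safer choice when a whole vector should stay together as a single element.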

3.4.2 Coercion

We can convert atomic vectors and factors into lists by simply using the as.list() function. The as.list() function lets each element in a vector correspond to each element in the list. So the first element of the vector becomes the first element of the list, the second element of the vector becomes the second element of the list, and so on. We could also use the list() function, but it converts differently: list() places the whole vector into a single element, producing a list of length 1.

num_vec <- c(1, 2, 3)
num_lst <- as.list(num_vec)
str(num_lst)
## List of 3
##  $ : num 1
##  $ : num 2
##  $ : num 3
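For comparison, here is a short sketch of what list() does with the same vector. It only contrasts the two functions and uses the num_vec values from above.

```r
num_vec <- c(1, 2, 3)

# list() keeps the whole vector together as one element
str(list(num_vec))
## List of 1
##  $ : num [1:3] 1 2 3

# as.list() splits the vector into one element per value
length(as.list(num_vec))
## [1] 3
length(list(num_vec))
## [1] 1
```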

We can convert lists to one of the other types of vectors using the as.*() functions with the desired vector type, as we did earlier. However, if we do not have a desired type in mind we can also use the unlist() function. The unlist() function takes all atomic objects in a list and combines them into one atomic vector. In this case R chooses the type for you: the most general type needed to hold every element.

unlist(num_lst)
## [1] 1 2 3
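The choice of type is easier to see with a list whose elements differ. The sketch below uses a hypothetical list named mixed; the results follow the usual coercion rules for atomic vectors.

```r
# Hypothetical list with a number, a string, and a logical value
mixed <- list(1, "a", TRUE)

# A character element forces everything to character
unlist(mixed)
## [1] "1"    "a"    "TRUE"

# Logical, integer, and double values combine into a double vector
unlist(list(1.5, TRUE, 2L))
## [1] 1.5 1.0 2.0
```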

3.4.3 Testing/Class

We can determine whether an object is a list by using the is.list() or class() functions.

is.list(num_lst)
## [1] TRUE
class(num_lst)
## [1] "list"

3.4.4 Names

Lists can also be named, and often are. It is very common to create named lists because lists can hold a mix of different types of objects. We can create a named list using all the same techniques that we used for creating named vectors (2.2.5). Notice that we do not have to create a name for every element in a list. Below we see the first two elements are named, and the last is not.

lst <- list(first = "Abraham", last = "Lincoln", 1860)
lst
## $first
## [1] "Abraham"
## 
## $last
## [1] "Lincoln"
## 
## [[3]]
## [1] 1860
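The names of a list can be inspected and filled in afterwards with names(). In the sketch below the name "year" is an arbitrary choice made for illustration.

```r
lst <- list(first = "Abraham", last = "Lincoln", 1860)

# Unnamed elements show up as empty strings
names(lst)
## [1] "first" "last"  ""

# Assign the missing name afterwards
names(lst)[3] <- "year"
names(lst)
## [1] "first" "last"  "year"
```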

3.4.5 Accessing Elements

Accessing elements in a list is a little different from accessing elements in a vector. As you may have already noticed, when a list is printed to the console its elements are denoted by their index number inside double brackets, e.g. [[2]]. To access an individual element in a list we use double brackets. This isolates that element, and its class is no longer list but the class of the original element.

lst[[1]]
## [1] "Abraham"
class(lst[[1]])
## [1] "character"

We can still use single brackets to access elements in a list, but this method of indexing simply subsets the list. That is, it still returns us a list, just a smaller one based on the indices called.

lst[1]
## $first
## [1] "Abraham"
class(lst[1])
## [1] "list"
lst <- list(1, 3, c("abc", "ABC"), list("a","b","c"), TRUE)
lst[c(3, 1)]
## [[1]]
## [1] "abc" "ABC"
## 
## [[2]]
## [1] 1
class(lst[c(3,1)])
## [1] "list"
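Named and nested lists offer a few more ways to reach an element. The sketch below is an illustrative addition: the name person is hypothetical, and the second list repeats the nested example used earlier in the chapter.

```r
person <- list(first = "Abraham", last = "Lincoln", 1860)

# Two equivalent ways to extract a named element
person$first
## [1] "Abraham"
person[["last"]]
## [1] "Lincoln"

# Chain double brackets to reach inside a nested list
lst <- list(1, 3, c("abc", "ABC"), list("a", "b", "c"), TRUE)
lst[[4]][[2]]
## [1] "b"

# Double brackets isolate a vector, single brackets then index into it
lst[[3]][2]
## [1] "ABC"
```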

Summary

  • Factors are special types of vectors which are primarily used for categorical data.

  • Factors can be ordered or unordered.

  • We create factor objects using the factor() function.

  • Lists are vectors which can have a mix of classes/types of objects.

  • We create lists using the list() function.

  • We can use the same basic functions with factors and lists as we do with atomic vectors: length(), as.*(), is.*(), class()

  • We have the same basic properties with factors and lists as we do with atomic vectors:

    • length() determines how long an object is

    • c() combines two objects of the same class together

    • same naming techniques

    • same indexing strategies

  • We can also access individual elements of a list using [[]], which isolates an element and takes it out of the list structure.

Exercises

  1. Build a factor object with 3 levels, and 5 elements.

  2. Convert the directions vector above to a factor with the levels North, west, East, South. What happened to the values for West?

  3. Can we convert a logical vector into a factor vector? What about the other way around?

  4. Create a list with four elements:

    • a character vector of length 2 that contains your first and last name

    • a numeric vector of length 1 that contains your age

    • a factor object from somewhere in this chapter

    • an object of your choice

  5. Adjust/redefine your list in (4) so that it is a named list.