Chapter 5 Indexing
When we wish to extract elements of an object such as a vector, list, data frame, or matrix, we use a process called indexing, which is also sometimes called subsetting. In R, the index of an element is its numeric position within the object. For example, consider the vector vec <- c(100, 20, 3). The index of the first element is 1, the index of the second element is 2, and so on. We have already seen a few examples of indexing for vectors, factors, lists, and 2D objects. In this section we describe indexing techniques more formally. Indexing can be hard to master in R because of the many options and the different types of objects. We will describe the basic indexing techniques for atomic vectors and factors, lists, matrices, and data frames. These techniques are all related, so it is useful to discuss them together. At the end of this section we give examples of a few features and applications of indexing.
5.1 Atomic Vectors and Factors
We have already seen a little bit of indexing with vectors (atomic vectors, factors, and lists). Now we will discuss indexing in more detail for 1D objects. We will focus specifically on atomic vectors, although the techniques here can be used with other types and classes of vectors. For indexing with vectors we have only one indexing operator, []. We also have four general strategies that we will focus on. Suppose we wish to perform indexing on a vector vec.
# Generate a random vector with the following code
set.seed(10)
vec <- sample(1:100, 10) # numeric vector with 10 values
vec
## [1] 9 74 76 55 72 54 39 83 88 15
- Four basic strategies:
  - positive integer: With the positive integer strategy we use a vector index that contains only positive integers, each one the index of an element we want. This vector can be of any positive finite length: length 1, length 10, or even length 10000. We use it by calling vec[index], which returns the elements of vec at those indices, in the order given by index.
  - negative integer: The negative integer strategy works similarly. This time index contains only negative integers, usually between 1 and length(vec) of them. These correspond to the elements of vec you would like to exclude.
  - logical elements: With the logical strategy we use a vector index that contains only logical (TRUE/FALSE) values. In this strategy index should be the same length as vec. If it is shorter, R will recycle it to complete the command. The TRUE values in index mark the elements of vec you wish to keep, and the FALSE values mark the elements you wish to exclude.
  - names: If vec is a named vector we can also use the names to perform indexing. In this case index should be a character vector where each element is the name of an element in vec that we wish to keep. We cannot use a negative sign with this strategy to exclude elements.
We cannot mix and match these strategies within a single command. We can only use one strategy at a time.
Example: Positive Integers
# Obtaining a single element
vec[1]
## [1] 9
# Obtaining several elements: Get 1st, 2nd, 3rd element
vec[c(1, 2, 3)]
## [1] 9 74 76
# Get multiples of the same element
index <- c(3, 2, 1, 1, 1, 2, 3)
vec[index]
## [1] 76 74 9 9 9 74 76
Example: Negative Integers
# Remove first element
vec[-1]
## [1] 74 76 55 72 54 39 83 88 15
# Remove several elements: 1st, 2nd, 3rd
vec[-c(1, 2, 3)]
## [1] 55 72 54 39 83 88 15
# Equivalent to above
index <- c(-1, -2, -3)
vec[index]
## [1] 55 72 54 39 83 88 15
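The logical and names strategies described above have no worked example in this section, so here is a minimal sketch of both. The values of vec are typed out by hand so the block runs on its own, and the named vector named_vec and its names are made up for illustration.

```r
# Same values as vec in the text, typed out so this block is self-contained
vec <- c(9, 74, 76, 55, 72, 54, 39, 83, 88, 15)

# Logical strategy: keep the elements where the index is TRUE
keep <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
vec_lgl <- vec[keep]
vec_lgl
## [1] 9 76 15

# Recycling: a shorter logical index is repeated until it matches length(vec)
vec_odd <- vec[c(TRUE, FALSE)]   # keeps positions 1, 3, 5, 7, 9
vec_odd
## [1] 9 76 72 39 88

# Names strategy (hypothetical named vector, not from the text)
named_vec <- c(a = 100, b = 20, c = 3)
vec_nm <- named_vec[c("c", "a")]  # elements come back in the order requested
```

Recycling is convenient for patterns such as "every other element", but a logical index of the same length as the vector is usually easier to read.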
5.2 Lists
Although lists are 1D objects, they have three different indexing operators: [], [[]], and $. The first operator works the same way as we saw above for atomic vectors. We can use all four strategies from the prior section, and a new list is returned with its elements in the order given by the index. The new operators, [[]] and $, are very similar to each other. Both can isolate only one element of the list, and they return this element in its own class. That is, if the second element in the list is a data frame, then a data frame is returned by the [[]] and $ operators.
5.2.1 Double Brackets
With the double brackets operator [[index]] we can supply the index number of the element we want returned or, if we have a named list, the name of the element we want. Remember, you can only isolate one element of the list using this operator, so index must be of length 1.
# Create a named list
# Recall: name = value
lst1 <- list(first = c("Hello", "Goodbye"), second = c(1, 2, 3), third = c(TRUE, FALSE, TRUE))
# Create a nested list with names
lst2 <- list(e1 = lst1, e2 = "Stat 107 Rules")
# See structure of the list
str(lst2)
## List of 2
## $ e1:List of 3
## ..$ first : chr [1:2] "Hello" "Goodbye"
## ..$ second: num [1:3] 1 2 3
## ..$ third : logi [1:3] TRUE FALSE TRUE
## $ e2: chr "Stat 107 Rules"
# Isolate second element by name (maintains class of the element)
lst2[["e2"]]
## [1] "Stat 107 Rules"
class(lst2[["e2"]])
## [1] "character"
# Isolate second element by integer (maintains class of the element)
lst2[[2]]
## [1] "Stat 107 Rules"
class(lst2[[2]])
## [1] "character"
# Isolate nested elements
lst2[[1]][[2]]
## [1] 1 2 3
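To make the difference between single and double brackets on a list concrete, here is a small sketch. It rebuilds lst1 from the text so that it runs on its own; the names sub_list, element, and two are chosen only for this illustration.

```r
# lst1 as defined in the text
lst1 <- list(first = c("Hello", "Goodbye"), second = c(1, 2, 3), third = c(TRUE, FALSE, TRUE))

sub_list <- lst1["second"]     # single brackets: a list of length 1
element  <- lst1[["second"]]   # double brackets: the numeric vector itself

class(sub_list)
## [1] "list"
class(element)
## [1] "numeric"

# Single brackets can return several elements at once, still as a list
two <- lst1[c("first", "third")]
length(two)
## [1] 2
```

A rough rule of thumb: use [] when you want a smaller list, and [[]] when you want the contents of one element.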
5.2.2 Dollar Sign
After the dollar sign operator $ we put the name of the desired element. You can only isolate one element of the list using this operator, and you can only access elements using their names.
# Isolate second element by name (maintains class of the element)
lst2$e2
## [1] "Stat 107 Rules"
# Isolate nested elements
lst2$e1$second
## [1] 1 2 3
You can also mix and match indexing methods for lists.
lst1$second[2]
## [1] 2
5.3 Matrices
For matrices we will only consider three indexing techniques, which are by far the most popular. There is only one operator we need to consider for matrices, and it is the same one we use for vectors: []. Inside this operator you can put two vectors or a single vector.
5.3.1 Two Vectors
Using two vectors is by far the most common, and the recommended, way to index a matrix. It is easy to read and is standard practice. For this technique you use [row, column], where row is a vector of index values of the rows you wish to isolate, and column is a vector of index values of the columns you wish to isolate. The vectors row and column support positive integers, negative integers, logical vectors, and character vectors with row and column names. That is, we can index the rows and columns of a matrix in the same way we did before with standard vectors, but now we have two dimensions to consider. Like before, each of row and column must be all positive values, all negative values, all logical, or contain only the respective names. However, the two vectors may use different strategies. For example, row can be a vector of positive integers while column is a vector of logical values. In general, indexing a matrix returns another matrix or a vector.
my_m <- matrix(1:9, nrow = 3, ncol = 3)
colnames(my_m) <- c("C1", "C2", "C3")
my_m
## C1 C2 C3
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
# Obtain a full row
my_m[1, ]
## C1 C2 C3
## 1 4 7
# Obtain a full column
my_m[, 2]
## [1] 4 5 6
# All rows but the first, and get the last two columns
my_m[-1, c("C2", "C3")]
## C2 C3
## [1,] 5 8
## [2,] 6 9
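The following sketch illustrates two points from the paragraph above: the row and column vectors may use different strategies, and the result may be a matrix or a vector. The drop = FALSE argument is not mentioned in the text; it is a standard argument of [] for matrices, shown here as one way to keep a single row as a matrix.

```r
# Same matrix as in the text
my_m <- matrix(1:9, nrow = 3, ncol = 3)
colnames(my_m) <- c("C1", "C2", "C3")

# Rows by positive integers, columns by a logical vector
sub_m <- my_m[c(1, 3), c(TRUE, FALSE, TRUE)]  # rows 1 and 3, columns C1 and C3

# Selecting a single row collapses the result to a vector ...
row_vec <- my_m[1, ]
is.null(dim(row_vec))
## [1] TRUE

# ... unless drop = FALSE is supplied, which keeps a 1 x 3 matrix
row_mat <- my_m[1, , drop = FALSE]
dim(row_mat)
## [1] 1 3
```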
5.3.2 Single Vector
Matrices can be thought of as a specially shaped atomic vector where the first elements of the vector are the first column (from top to bottom), the next elements are the second column (top to bottom), and so on. In fact, R supports indexing matrices using this idea. If we subset a matrix using [index], where index is a single vector, then the values of index correspond to the values of the matrix in this column-by-column order.
It is not particularly common to index in this way, and it is not recommended because it is not particularly clear.
my_m[1]
## [1] 1
my_m[c(1, 9)]
## [1] 1 9
my_m[-c(1, 9)]
## [1] 2 3 4 5 6 7 8
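The column-by-column ordering can be written as a small formula: the element in row i and column j sits at position (j - 1) * nrow + i. The sketch below checks this for one element. The variables i, j, and index are introduced only for this example, and arrayInd() is a base R function that converts a single position back to a row and column.

```r
my_m <- matrix(1:9, nrow = 3, ncol = 3)

# Position of the element in row i, column j when counting down the columns
i <- 2
j <- 3
index <- (j - 1) * nrow(my_m) + i
index
## [1] 8

# Both ways of indexing reach the same element
my_m[index] == my_m[i, j]
## [1] TRUE

# Convert the single position back to (row, column)
pos <- arrayInd(index, dim(my_m))  # a 1 x 2 matrix holding row 2, column 3
```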
5.4 Data frames
Data frames can be indexed with two vectors in the same way that matrices are indexed above. They also have a few more techniques. At its core, we can think of a data frame as a special type of list in which each element of the list is a vector of the same length. Data frames have three indexing operators: [], [[]], and $. With two vectors, [row, column] works for data frames as it does for matrices. With a single vector, [index] treats the data frame as a list and selects columns rather than individual values. We will focus on the other two operators. Recall from indexing lists that [[]] and $ can only access one element of a list. When using [[]] and $ on data frames, these operators can only access one column.
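Before turning to [[]] and $, here is a brief sketch of the [] operator on a data frame, using the built-in iris data set. Note that with a single vector, [] on a data frame selects whole columns, as it would for a list. The object names sub_df, one_col_df, and one_col are chosen only for this illustration.

```r
# Two vectors: rows 1 to 3, two columns selected by name
sub_df <- iris[1:3, c("Sepal.Length", "Species")]
dim(sub_df)
## [1] 3 2

# One vector: selects columns and returns a data frame
one_col_df <- iris["Species"]
class(one_col_df)
## [1] "data.frame"

# Double brackets return the column itself
one_col <- iris[["Species"]]
class(one_col)
## [1] "factor"
```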
Example: Double brackets
# Use Built In Data Set: Iris
head(iris) # Preview Data Set
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
sapply(iris, class) # Class of Each Column
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## "numeric" "numeric" "numeric" "numeric" "factor"
summary(iris) # Summary Statistics of Each Column
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
# Isolate column with positive integer
# Returns a vector, not a data frame with one column
iris[[1]]
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
## [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
## [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
## [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
## [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
## [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9
# Isolate column with name (Same as above)
# Returns a vector, not a data frame with one column
iris[["Sepal.Length"]]
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
## [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
## [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
## [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
## [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
## [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9
Example: Dollar Sign
# Isolate column with name (Same as above)
# Returns a vector, not a data frame with one column
iris$Sepal.Length
## [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
## [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
## [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
## [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
## [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
## [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
## [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
## [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
## [145] 6.7 6.7 6.3 6.5 6.2 5.9
5.5 Features and Applications
In this section we will go over some features and applications of indexing. These are special functions and tasks that build on the indexing techniques we have discussed so far.
5.5.1 Indexing and Reassignment
With all of the indexing techniques we discussed before, we can combine indexing with reassignment. We can reassign values inside of a vector via their index number. This can be done with all the objects and techniques we have learned. For example, recall the vector vec we created above. We can reassign the first three elements of vec to be 62.
# `vec` from above
# Generate a random vector with the following code
set.seed(10)
vec <- sample(1:100, 10) # numeric vector with 10 values
vec
## [1] 9 74 76 55 72 54 39 83 88 15
# reassign first three values of vec using index
vec[1:3] <- 62
vec
## [1] 62 62 62 55 72 54 39 83 88 15
The only values that are changed are the ones we isolated via indexing. Let's see another example with logical values. In this example we use the logical indexing technique to isolate only values that meet a certain condition. The vector index_to_change contains logical values, where TRUE indicates that the corresponding value in vec is greater than 50, and FALSE indicates otherwise. When we assign to vec_chr[index_to_change], all elements that correspond to TRUE are set to "Big". No other elements of vec_chr are updated.
# Make a character vector
vec_chr <- as.character(vec)
# Reassign elements to "Big" if they are a big number
# Do not change other elements
index_to_change <- vec > 50
vec_chr[index_to_change] <- "Big"
vec_chr
## [1] "Big" "Big" "Big" "Big" "Big" "Big" "39" "Big" "Big" "15"
Here is another example where we reassign a column name of the matrix my_m to be "my_c2".
# Recall matrix
my_m
## C1 C2 C3
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
# Reassign just one column name
colnames(my_m)[2] <- "my_c2"
Now let's reassign the value in the second row, second column to be NA.
my_m[2, "my_c2"] <- NA
my_m
## C1 my_c2 C3
## [1,] 1 4 7
## [2,] 2 NA 8
## [3,] 3 6 9
5.5.2 Ordering/Integer Indexing
As we saw above, we can also use indexing with positive integers and names to rearrange values in an object. If we want to rearrange from smallest to largest value (or vice versa), or alphabetically (or reverse alphabetically), we can do this directly with the order() function. This function returns the indices of the elements in sorted order, that is, the positive integer index that rearranges the object from smallest to largest.
# Example data frame
group <- c("G1", "G2", "G1", "G1", "G2")
age <- c(35, 30, 31, 28, 40)
height <- c(65, 70, 60, 72, 68)
pets <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
mydata <- data.frame(group, age, height, pets)
mydata
## group age height pets
## 1 G1 35 65 TRUE
## 2 G2 30 70 TRUE
## 3 G1 31 60 FALSE
## 4 G1 28 72 FALSE
## 5 G2 40 68 TRUE
# Indices in smallest to largest order
order(mydata$age)
## [1] 4 2 3 1 5
# Rearrange data frame to be from youngest to oldest
mydata[order(mydata$age), ]
## group age height pets
## 4 G1 28 72 FALSE
## 2 G2 30 70 TRUE
## 3 G1 31 60 FALSE
## 1 G1 35 65 TRUE
## 5 G2 40 68 TRUE
mydata[order(mydata$group, mydata$age), ]
## group age height pets
## 4 G1 28 72 FALSE
## 3 G1 31 60 FALSE
## 1 G1 35 65 TRUE
## 2 G2 30 70 TRUE
## 5 G2 40 68 TRUE
We can sort by more than one variable. Including more than one variable allows a "nested sort," where the second variable, third variable, etc., is used when there are ties in the sorting based on the previous variables. Let's first sort by group alone, and then by group followed by age, and see what we get.
# Sort just by "group"
mydata[order(mydata$group), ]
## group age height pets
## 1 G1 35 65 TRUE
## 3 G1 31 60 FALSE
## 4 G1 28 72 FALSE
## 2 G2 30 70 TRUE
## 5 G2 40 68 TRUE
# Rearrange data frame FIRST by "group", SECOND by "age"
mydata[order(mydata$group, mydata$age), ]
## group age height pets
## 4 G1 28 72 FALSE
## 3 G1 31 60 FALSE
## 1 G1 35 65 TRUE
## 2 G2 30 70 TRUE
## 5 G2 40 68 TRUE
To reorder a vector from smallest to largest we can also consider the sort() function.
sort(mydata$age)
## [1] 28 30 31 35 40
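The text mentions ordering from largest to smallest but does not show it. A minimal sketch, assuming the same age values as in mydata: both order() and sort() accept a decreasing argument. The names age and ord_desc are used here only for illustration.

```r
# Same ages as the age column of mydata in the text
age <- c(35, 30, 31, 28, 40)

# Indices that arrange age from largest to smallest
ord_desc <- order(age, decreasing = TRUE)
ord_desc
## [1] 5 1 3 2 4

# Using those indices gives the same result as sort() with decreasing = TRUE
age[ord_desc]
## [1] 40 35 31 30 28
sort(age, decreasing = TRUE)
## [1] 40 35 31 30 28
```

The difference between the two functions is the same as before: order() returns positions that can rearrange a whole data frame, while sort() returns only the sorted values.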
5.5.3 Adding Elements/Rows/Columns
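To add an element/row/column to an object we can also use indexing and the assignment operator. To do so, we put the new index number or index name with our indexing operator, and assign a value. The new index number is normally one more than the current length or dimension. For a data frame, the new column index must be exactly one more than the current number of columns; for a vector, assigning further past the end fills the gap with NA.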
# Adding an element to vec
vec[length(vec) + 1] <- 1000
vec
## [1] 62 62 62 55 72 54 39 83 88 15 1000
# Adding a column to data frame Iris
iris$new_column <- "Hello"
iris[1:10, ] # Output first ten rows to preview
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_column
## 1 5.1 3.5 1.4 0.2 setosa Hello
## 2 4.9 3.0 1.4 0.2 setosa Hello
## 3 4.7 3.2 1.3 0.2 setosa Hello
## 4 4.6 3.1 1.5 0.2 setosa Hello
## 5 5.0 3.6 1.4 0.2 setosa Hello
## 6 5.4 3.9 1.7 0.4 setosa Hello
## 7 4.6 3.4 1.4 0.3 setosa Hello
## 8 5.0 3.4 1.5 0.2 setosa Hello
## 9 4.4 2.9 1.4 0.2 setosa Hello
## 10 4.9 3.1 1.5 0.1 setosa Hello
# Adding another new column to Iris
iris[, (ncol(iris) + 1)] <- "Goodby"
iris[1:10, ] # Output first ten rows to preview
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_column V7
## 1 5.1 3.5 1.4 0.2 setosa Hello Goodby
## 2 4.9 3.0 1.4 0.2 setosa Hello Goodby
## 3 4.7 3.2 1.3 0.2 setosa Hello Goodby
## 4 4.6 3.1 1.5 0.2 setosa Hello Goodby
## 5 5.0 3.6 1.4 0.2 setosa Hello Goodby
## 6 5.4 3.9 1.7 0.4 setosa Hello Goodby
## 7 4.6 3.4 1.4 0.3 setosa Hello Goodby
## 8 5.0 3.4 1.5 0.2 setosa Hello Goodby
## 9 4.4 2.9 1.4 0.2 setosa Hello Goodby
## 10 4.9 3.1 1.5 0.1 setosa Hello Goodby
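The examples above add an element and columns; adding a row follows the same pattern. The sketch below uses a small made-up data frame (a two-column cut-down of mydata, not the original object) and supplies the new row as a list so that each column keeps its own type. It also shows what happens when a vector is assigned past its end.

```r
# Assigning beyond length + 1 in a vector fills the gap with NA
x <- c(1, 2, 3)
x[6] <- 10
x
## [1]  1  2  3 NA NA 10

# A small illustrative data frame (hypothetical, smaller than mydata)
mydata <- data.frame(group = c("G1", "G2"), age = c(35, 30))

# Add a row at index nrow + 1; a list lets each column keep its type
mydata[nrow(mydata) + 1, ] <- list("G1", 28)
nrow(mydata)
## [1] 3
```

For adding many rows at once, rbind() is a common alternative to indexing.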
5.5.4 Delete Elements/Rows/Columns
If we want to completely delete an element of a vector, we can use a negative index together with the assignment operator to redefine the object.
# Recall the vector
vec
## [1] 62 62 62 55 72 54 39 83 88 15 1000
vec_copy <- vec
# Strategy 2 - Redefine Object: Delete the third element of vec
vec_copy <- vec_copy[-3]
vec_copy
## [1] 62 62 55 72 54 39 83 88 15 1000
This method works the same way with 2D objects and lists. In addition, we can also assign NULL. Recall that NULL is used to completely delete an object, in contrast to NA, which removes the value but keeps its place.
# Strategy 1 - NULL: Delete a column
iris$new_column <- NULL
iris[1:10, ] # Preview first 10 rows
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species V7
## 1 5.1 3.5 1.4 0.2 setosa Goodby
## 2 4.9 3.0 1.4 0.2 setosa Goodby
## 3 4.7 3.2 1.3 0.2 setosa Goodby
## 4 4.6 3.1 1.5 0.2 setosa Goodby
## 5 5.0 3.6 1.4 0.2 setosa Goodby
## 6 5.4 3.9 1.7 0.4 setosa Goodby
## 7 4.6 3.4 1.4 0.3 setosa Goodby
## 8 5.0 3.4 1.5 0.2 setosa Goodby
## 9 4.4 2.9 1.4 0.2 setosa Goodby
## 10 4.9 3.1 1.5 0.1 setosa Goodby
# Strategy 2 - Redefine Object: Delete a column
iris <- iris[, -ncol(iris)]
iris[1:10, ] # Preview first 10 rows
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
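The same two strategies also cover rows and list elements, which the examples above do not show. A minimal sketch, using a small made-up data frame and list (the names mydata_small and lst are chosen only for this illustration):

```r
# Small illustrative data frame (hypothetical values)
mydata <- data.frame(group = c("G1", "G2", "G1"), age = c(35, 30, 31))

# Redefine the object: delete the second row with a negative index
mydata_small <- mydata[-2, ]
mydata_small$age
## [1] 35 31

# NULL: delete an element of a list by name
lst <- list(a = 1, b = "x", c = TRUE)
lst$b <- NULL
names(lst)
## [1] "a" "c"
```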
5.5.5 Select Based on Condition
So far we have not used logical vectors for indexing very much. Logical indexing is actually very helpful and common! One of the big reasons we use logical vectors for indexing is to select elements that meet a certain condition. For example, maybe we only want to display elements of a vector that are larger than 50.
# displays elements of vec that are larger than 50
vec[vec > 50]
## [1] 62 62 62 55 72 54 83 88 1000
We can also reassign elements of a vector that meet a certain condition. This uses ideas from 5.5.1.
# Reassign values in vec2 to be NA if they are greater than 50.
vec2 <- vec
vec2[vec2 > 50] <- NA
vec2
## [1] NA NA NA NA NA NA 39 NA NA 15 NA
We can of course also use this strategy on all other objects that support the [] operator, which is every object we have seen so far!
# display rows of iris that have species == "setosa"
iris[iris$Species == "setosa", ]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5.0 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## 11 5.4 3.7 1.5 0.2 setosa
## 12 4.8 3.4 1.6 0.2 setosa
## 13 4.8 3.0 1.4 0.1 setosa
## 14 4.3 3.0 1.1 0.1 setosa
## 15 5.8 4.0 1.2 0.2 setosa
## 16 5.7 4.4 1.5 0.4 setosa
## 17 5.4 3.9 1.3 0.4 setosa
## 18 5.1 3.5 1.4 0.3 setosa
## 19 5.7 3.8 1.7 0.3 setosa
## 20 5.1 3.8 1.5 0.3 setosa
## 21 5.4 3.4 1.7 0.2 setosa
## 22 5.1 3.7 1.5 0.4 setosa
## 23 4.6 3.6 1.0 0.2 setosa
## 24 5.1 3.3 1.7 0.5 setosa
## 25 4.8 3.4 1.9 0.2 setosa
## 26 5.0 3.0 1.6 0.2 setosa
## 27 5.0 3.4 1.6 0.4 setosa
## 28 5.2 3.5 1.5 0.2 setosa
## 29 5.2 3.4 1.4 0.2 setosa
## 30 4.7 3.2 1.6 0.2 setosa
## 31 4.8 3.1 1.6 0.2 setosa
## 32 5.4 3.4 1.5 0.4 setosa
## 33 5.2 4.1 1.5 0.1 setosa
## 34 5.5 4.2 1.4 0.2 setosa
## 35 4.9 3.1 1.5 0.2 setosa
## 36 5.0 3.2 1.2 0.2 setosa
## 37 5.5 3.5 1.3 0.2 setosa
## 38 4.9 3.6 1.4 0.1 setosa
## 39 4.4 3.0 1.3 0.2 setosa
## 40 5.1 3.4 1.5 0.2 setosa
## 41 5.0 3.5 1.3 0.3 setosa
## 42 4.5 2.3 1.3 0.3 setosa
## 43 4.4 3.2 1.3 0.2 setosa
## 44 5.0 3.5 1.6 0.6 setosa
## 45 5.1 3.8 1.9 0.4 setosa
## 46 4.8 3.0 1.4 0.3 setosa
## 47 5.1 3.8 1.6 0.2 setosa
## 48 4.6 3.2 1.4 0.2 setosa
## 49 5.3 3.7 1.5 0.2 setosa
## 50 5.0 3.3 1.4 0.2 setosa
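Conditions can also be combined before indexing. The sketch below is an extension of the example above rather than something taken from the text: it uses the element-wise operators & (and) and | (or) on the built-in iris data, and the thresholds are chosen only for illustration.

```r
# Rows that are setosa AND have a sepal length above 5.5
big_setosa <- iris[iris$Species == "setosa" & iris$Sepal.Length > 5.5, ]
rownames(big_setosa)
## [1] "15" "16" "19"

# Rows with a very short OR a very long sepal
extremes <- iris[iris$Sepal.Length < 4.4 | iris$Sepal.Length > 7.7, ]
rownames(extremes)
## [1] "14"  "132"
```

Each condition produces a logical vector with one value per row, so the combined condition is still a valid logical index for the rows.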
5.5.6 Convert Indexing Techniques
With all these methods it can sometimes be difficult to remember which is which. However, we will often find ourselves naturally gravitating to one technique over another. There are different operators and functions in R that help us convert between the techniques. For example, the which() function helps us switch from logical indexing to positive integer indexing.
# Switch from logical strategy, to positive integer strategy
index <- which(vec > 50)
index
## [1] 1 2 3 4 5 6 8 9 11
vec[index]
## [1] 62 62 62 55 72 54 83 88 1000
The %in% operator checks whether each element of the object values is in the set keep, i.e. values %in% keep.
# Returns logical vector of column names to keep
# Matching is case-sensitive: "species" does not match the column "Species"
keep <- c("species", "Sepal.Length")
colnames(iris) %in% keep
## [1] TRUE FALSE FALSE FALSE FALSE
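The logical vector returned by %in% can be used directly as a column index. In the sketch below the name is spelled "Species" with a capital S so that it matches the column; cols and iris_small are names chosen for this illustration, and the block assumes the original five-column iris data set.

```r
# Names must match exactly, including capitalization
keep <- c("Species", "Sepal.Length")
cols <- colnames(iris) %in% keep
cols
## [1]  TRUE FALSE FALSE FALSE  TRUE

# Use the logical vector to keep only those columns
iris_small <- iris[, cols]
colnames(iris_small)
## [1] "Sepal.Length" "Species"
```

Note that the columns come back in their original order in iris, not in the order they are listed in keep.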
Summary
- Indexing operators: [], [[]], $
  - []: Used with 1D and 2D objects
    - Positive Integers
    - Negative Integers
    - Name
    - Logical
  - [[]]: Used with lists or data frames. Can only isolate one element or column.
    - Positive Integers
    - Name
  - $: Used with lists or data frames. Can only isolate one element or column.
    - Name
- Indexing can be combined with reassignment.
- Some important functions and operators to remember: order(), sort(), which(), %in%.