How to create and use strings in R
Strings are a fundamental data structure in R. They are used to represent sequences of characters, including single letters. Unlike many other programming languages, R does not have a data type called ‘string’. Instead, this data type is referred to as ‘character’.
What are R strings?
Strings are a standard feature of programming languages and a data structure that all seasoned programmers are familiar with. If you are just getting started with learning how to code, it’s important to understand what a string is.
Strings are essentially nothing more than a sequence of characters. They are commonly used to store and process non-numeric data in programs. As in many other programming languages, strings in R are enclosed in single or double quotation marks.
How to create a string in R
You can create a string in R with just one line of code. Both single quotation marks and double quotation marks can be used to create strings, so the choice is up to you:
string1 <- "Hello world!" # String with double quotation marks
string2 <- 'Hello world!' # String with single quotation marks
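To confirm that both quoting styles produce the same value, you can compare the two strings and inspect their type. This minimal sketch uses the base R functions identical(), class() and typeof(). The variable names are the ones from the example above.

```r
# Both quoting styles create the same character value
string1 <- "Hello world!"
string2 <- 'Hello world!'

same <- identical(string1, string2)  # TRUE
type1 <- class(string1)              # "character"
type2 <- typeof(string2)             # "character"

print(same)
print(type1)
print(type2)
```

There is no separate ‘string’ class: R reports the type as "character" for both variables.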
How to use R string functions and operations
R provides a set of built-in functions that make working with strings efficient. We’ve compiled a list of the most important R string functions here:
- substr(): Extracts a portion of a string
- paste(): Concatenates (joins) strings
- tolower() / toupper(): Converts all of the letters in a string to lowercase letters or uppercase letters
- strsplit(): Splits a string at a specified delimiter
- trimws(): Removes whitespace at the beginning and end of a string
- gsub(): Replaces all matches of a pattern in a string with other characters
- nchar(): Calculates the length of a string
If you have already worked with other programming languages, you’ve probably encountered functions like the ones above. Python, for example, offers equivalent operations for manipulating strings.
substr()
You can use the substr() function to extract substrings from your R strings. To do this, pass your string to the function as the first parameter. For the second and third parameters (start and stop), specify the positions of the first and last characters of the substring you want to extract. Both positions are included in the result. Remember that, unlike many other programming languages, R strings are indexed starting from 1 and not from 0.
string <- "Hello World"
print(substr(string, start = 7, stop = 11))
The example above outputs "World".
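Both positions are inclusive and counting starts at 1, so substr(x, 1, 5) returns the first five characters. The short sketch below also shows two further behaviors. First, substr() works element-wise on a character vector. Second, a stop position beyond the end of the string is simply cut off. The fruit names are arbitrary example values.

```r
string <- "Hello World"

first_word <- substr(string, start = 1, stop = 5)   # "Hello"
first_char <- substr(string, start = 1, stop = 1)   # "H"

# substr() is vectorized: it is applied to each element of the vector
fruits <- c("apple", "banana", "cherry")
prefixes <- substr(fruits, start = 1, stop = 3)     # "app" "ban" "che"

# A stop position past the end returns the rest of the string
tail_part <- substr(string, start = 7, stop = 100)  # "World"

print(first_word)
print(prefixes)
print(tail_part)
```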
paste()
paste() is used in R to join two or more strings together. This is known as concatenation. Keep in mind that the + symbol cannot be used to concatenate strings. In R, the + operator is defined for numbers but not for character strings, so "Hello" + "World" results in an error.
string <- "Hello"
string2 <- "World"
print(paste(string, string2))
When paste() is called, string and string2 are concatenated with a single space between them (the default separator), resulting in the output "Hello World".
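paste() also accepts a sep argument that controls the separator. Base R provides paste0() as a shorthand for paste(..., sep = ""). The collapse argument joins the elements of a single vector into one string. Here is a minimal sketch, with arbitrary example words:

```r
string <- "Hello"
string2 <- "World"

with_space <- paste(string, string2)              # "Hello World" (default sep = " ")
with_dash  <- paste(string, string2, sep = "-")   # "Hello-World"
no_space   <- paste0(string, string2)             # "HelloWorld"

# collapse joins the elements of one vector into a single string
words  <- c("R", "is", "fun")
joined <- paste(words, collapse = " ")            # "R is fun"

print(with_space)
print(with_dash)
print(no_space)
print(joined)
```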
tolower() / toupper()
With tolower() and toupper(), you can change all of the letters in your string to lowercase or uppercase, respectively. Both R string functions take the string that you want to change as their parameter. They return a new string in which all letters are written in lowercase or uppercase.
string <- "Hello World"
print(tolower(string))
print(toupper(string))
The code above will display "hello world" and "HELLO WORLD" on your screen. These two R string functions are especially useful for normalizing text, for example when you want to compare strings regardless of their capitalization.
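A common use of these functions is case-insensitive comparison, where both values are converted to the same case before they are compared. This sketch assumes a hypothetical user input stored in a variable named user_input.

```r
user_input <- "YES"   # hypothetical user input

# A direct comparison is case-sensitive
direct <- user_input == "yes"                  # FALSE

# Normalizing the case first makes the comparison case-insensitive
normalized <- tolower(user_input) == "yes"     # TRUE

print(direct)
print(normalized)
```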
strsplit()
The strsplit() function in R may seem somewhat familiar to experienced programmers. Python, for example, has a similar method named split(). The parameters of strsplit() are the string that you want to separate into substrings and a delimiter, which determines where the string should be split. The function returns a list containing one character vector of substrings for each input string, even if you only pass in a single string.
string <- "Hello World"
print(strsplit(string, " "))
The code produces the following output:
[[1]]
[1] "Hello" "World"
The result is a list whose only element is a character vector containing the two strings "Hello" and "World". In this example, the blank space between the two words was used as the delimiter.
trimws()
With the trimws() function, you can remove unwanted whitespace from the beginning and end of your R string. This can be especially helpful when processing input from users who may have unintentionally entered blank spaces when filling out a form.
string <- " Hello World "
print(trimws(string))
The code above will display "Hello World" without any blank spaces at the beginning or end of the string.
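By default, trimws() removes whitespace on both sides. Its which argument ("both", "left" or "right") restricts trimming to one side. In this minimal sketch, the padded string is only an example value.

```r
string <- "  Hello World  "

both  <- trimws(string)                  # "Hello World"
left  <- trimws(string, which = "left")  # "Hello World  "
right <- trimws(string, which = "right") # "  Hello World"

print(both)
print(left)
print(right)
```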
gsub()
Another string operation in R is the gsub() function. Its first parameter is the pattern that you want to replace. By default, this pattern is interpreted as a regular expression. The second parameter is the replacement text, and the third parameter is the string that the replacement should be applied to. gsub() replaces every match of the pattern, not just the first one.
string <- "Hello World"
print(gsub("World", "User", string))
Instead of saying hello to the entire world, the code outputs a text that only addresses a single user: "Hello User".
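Base R also provides sub(), which takes the same parameters but replaces only the first match. Because the pattern is a regular expression, you can match whole groups of characters at once. The following sketch compares sub() and gsub(). It also uses fixed = TRUE for a literal match. The version-style string is an arbitrary example.

```r
string <- "Hello World"

# sub() replaces only the first match, gsub() replaces all matches
first_only  <- sub("o", "0", string)    # "Hell0 World"
all_matches <- gsub("o", "0", string)   # "Hell0 W0rld"

# The pattern is a regular expression: [aeiou] matches any lowercase vowel
no_vowels <- gsub("[aeiou]", "", string)  # "Hll Wrld"

# fixed = TRUE matches the pattern literally instead of as a regex
no_dots <- gsub(".", "", "v1.2.3", fixed = TRUE)  # "v123"

print(first_only)
print(all_matches)
print(no_vowels)
print(no_dots)
```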
nchar()
One of the most important built-in functions for strings is nchar(), which returns the number of characters in an R string.
string <- "Hello World"
print(nchar(string))
The code above outputs 11, since the blank space also counts as a character. The R function length() may be a source of confusion at first. However, length() determines the number of elements in an object, not the number of characters in a string. To determine the length of an R string, make sure to use nchar().
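The difference between the two functions is easiest to see side by side. This minimal sketch applies both to a single string and then to a character vector. The example words are arbitrary.

```r
string <- "Hello World"

chars <- nchar(string)     # 11: the number of characters
elems <- length(string)    # 1: a single string is one vector element

words <- c("Hello", "World", "!")
word_chars <- nchar(words)   # 5 5 1 (one count per element)
word_count <- length(words)  # 3

print(chars)
print(elems)
print(word_chars)
print(word_count)
```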
What are control characters and escape sequences?
You can use control characters to control the text layout of your R strings. Control characters are written as predefined escape sequences and can be used to format text output. For example, they let you insert line breaks or tabs.
Special characters such as quotation marks, which would normally be interpreted as the beginning or end of a string in R syntax, can also be displayed in strings using an escape sequence. Escape sequences begin with a backslash in R. Here are the most important ones:
- \n: Newline/line break
- \t: Tab
- \\: Backslash
- \": Double quotation marks
- \': Single quotation marks
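Note that print() displays a string with its escape sequences, while cat() and writeLines() render them as formatted text. The sketch below shows line breaks, tabs and escaped quotation marks and backslashes. It also shows that each escape sequence counts as a single character. The example texts and the Windows-style path are arbitrary.

```r
# \n starts a new line and \t inserts a tab when the string is rendered by cat()
cat("Name:\tR\nType:\tcharacter\n")

# Escaped quotation marks and backslashes inside a double-quoted string
quote <- "She said \"Hello\""
path  <- "C:\\Users"

writeLines(quote)   # She said "Hello"
writeLines(path)    # C:\Users
print(quote)        # print() shows the escaped form

# Each escape sequence counts as a single character
tab_len   <- nchar("a\tb")  # 3
slash_len <- nchar("\\")    # 1
```

Alternatively, you can avoid escaping double quotation marks by wrapping the string in single quotation marks, as in 'She said "Hello"'.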