Strings are a fundamental data structure in R. They are used to represent sequences of characters, including individual letters. In contrast to many other programming languages, R does not have a data type called ‘string’. Instead, this R data type is referred to as ‘character’.

What are R strings?

Strings are a standard in programming languages and a data structure that all seasoned programmers are familiar with. If you are just getting started with learning how to code, it’s important for you to understand what a string is.

Strings are essentially nothing more than a sequence of characters. Strings are commonly used to store and process non-numeric data in programs. Similar to other programming languages, strings are also enclosed in single or double quotation marks when writing code in R.

How to create a string in R

You can create a string in R with just one line of code. Both single quotation marks and double quotation marks can be used to create strings, so the choice is up to you:

# String with double quotation marks
string1 <- "Hello world!"
# String with single quotation marks
string2 <- 'Hello world!'
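To check that both quoting styles really produce the same value, and that R reports the type as ‘character’, you can run a short sketch like the following. It only uses base R functions, and the variable names are the ones from the example above.

```r
string1 <- "Hello world!"
string2 <- 'Hello world!'

# Both quoting styles produce exactly the same value
print(identical(string1, string2))  # [1] TRUE

# R reports the data type as "character", not "string"
print(class(string1))               # [1] "character"
print(is.character(string1))        # [1] TRUE
```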

How to use R string functions and operations

R provides programmers with a set of basic functions to make working with strings efficient. These can be used to perform various operations both on strings and together with strings. We’ve compiled a list of the most important R string operations here:

  • substr(): Extracts a portion of a string
  • paste(): Concatenates (joins) strings
  • tolower() / toupper(): Converts all of the letters in a string to lowercase letters or uppercase letters
  • strsplit(): Splits a string at a specified delimiter
  • trimws(): Removes whitespace at the beginning and end of a string
  • gsub(): Replaces all matches of a pattern in a string with other characters
  • nchar(): Calculates the length of a string

If you have already worked with other programming languages, you’ve probably already encountered functions like the ones above. Python strings, for example, can be manipulated with equivalent operations.

substr()

You can use the substr() function to extract substrings from your R strings. To do this, pass your string to the function as the first parameter. For the second and third parameters, specify the start and end indices of the substring you want to extract. Both indices are inclusive. Remember that, unlike many other programming languages, R strings are indexed starting from 1 and not from 0.

string <- "Hello World"
print(substr(string, start=7, stop=11))

The example above outputs World.
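The following sketch illustrates the 1-based, inclusive indexing described above. It also shows that substr() can be used on the left-hand side of an assignment to overwrite part of a string. The variable name first_word is chosen for illustration only.

```r
string <- "Hello World"

# Index 1 refers to the first character, so 1 to 5 covers "Hello"
first_word <- substr(string, start = 1, stop = 5)
print(first_word)  # [1] "Hello"

# substr() on the left-hand side overwrites the characters at those positions
substr(string, start = 1, stop = 5) <- "HELLO"
print(string)      # [1] "HELLO World"
```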

paste()

The function paste() is used in R to join two or more strings together. This is known as concatenation. Keep in mind that the + symbol cannot be used to concatenate strings. The R operator + is only defined for numerical data types.

string <- "Hello"
string2 <- "World"
print(paste(string, string2))

When paste() is called, string and string2 are concatenated, resulting in the output: Hello World. By default, paste() inserts a single space between the strings.
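If you need a different separator, or none at all, base R offers a few variations. The sketch below shows the sep and collapse arguments of paste() as well as the related function paste0(). The variable names are chosen for illustration only.

```r
first <- "Hello"
second <- "World"

# Default separator is a single space
with_space <- paste(first, second)
print(with_space)   # [1] "Hello World"

# A custom separator can be set with sep
with_dash <- paste(first, second, sep = "-")
print(with_dash)    # [1] "Hello-World"

# paste0() joins the strings without any separator
no_sep <- paste0(first, second)
print(no_sep)       # [1] "HelloWorld"

# collapse joins the elements of a single vector into one string
joined <- paste(c("a", "b", "c"), collapse = ", ")
print(joined)       # [1] "a, b, c"
```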

tolower() / toupper()

With tolower() and toupper(), you can change all of the letters in your string to either lowercase or uppercase. For both R string functions, pass the string that you want to change as the parameter. The function will then return a new string where all letters are written either in lowercase or uppercase.

string <- "Hello World"
print(tolower(string))
print(toupper(string))

The code above will display hello world and HELLO WORLD on your screen. These two R string functions are especially useful when data needs to be processed regardless of case, for example when comparing user input without distinguishing between uppercase and lowercase letters.
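A typical use is comparing two strings while ignoring differences in case. The following is a minimal sketch; the variable names input and expected are hypothetical.

```r
input <- "YES"
expected <- "yes"

# A direct comparison is case-sensitive
print(input == expected)                    # [1] FALSE

# Converting both sides to lowercase first ignores the case
print(tolower(input) == tolower(expected))  # [1] TRUE
```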

strsplit()

The strsplit() function in R may seem somewhat familiar to experienced programmers. For example, Python also has a function named split(). For the R string function strsplit(), your parameters will be the string that you want to separate into substrings and a delimiter, which will determine where the string should be split. When the function is called, it returns a list that holds one character vector of substrings per input string, even if you only pass a single string.

string <- "Hello World"
print(strsplit(string, " "))

The code produces the following output:

[[1]]
[1] "Hello" "World"

The result is a list with a single element: a character vector that holds the two strings "Hello" and "World". In this example, the blank space between the two words was used as the delimiter.
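Because the result is a list, you usually extract the character vector with [[1]] before working with the individual substrings. The sketch below also shows the fixed argument, which makes strsplit() treat the delimiter literally instead of as a regular expression. The variable names parts, words and letters_split are chosen for illustration only.

```r
string <- "Hello World"
parts <- strsplit(string, " ")

# [[1]] extracts the character vector for the first (and only) input string
words <- parts[[1]]
print(words)          # [1] "Hello" "World"
print(length(words))  # [1] 2

# The delimiter is a regular expression by default, where "." matches any
# character. fixed = TRUE makes strsplit() treat it as a literal dot.
letters_split <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
print(letters_split)  # [1] "a" "b" "c"
```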

trimws()

Using the trimws() function, you can remove unwanted whitespace from the beginning and end of your R string. This can be especially helpful when processing input from users who may have unintentionally entered blank spaces when filling out a form.

string <- "   Hello World   "
print(trimws(string))

The code above will display Hello World without any blank spaces at the beginning or end of the string.
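If you only want to trim one side, trimws() accepts a which argument with the values "both" (the default), "left" and "right". A short sketch, with variable names chosen for illustration:

```r
string <- "   Hello World   "

# Remove whitespace only at the beginning
left_trimmed <- trimws(string, which = "left")
print(left_trimmed)   # [1] "Hello World   "

# Remove whitespace only at the end
right_trimmed <- trimws(string, which = "right")
print(right_trimmed)  # [1] "   Hello World"
```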

gsub()

Another string operation in R is the gsub() function. In this function, the first parameter is the pattern that you want to replace, which R interprets as a regular expression by default. For the second parameter, use the string that should replace every match of the pattern. The third parameter specifies which string the replacement should be applied to.

string <- "Hello World"
print(gsub("World", "User", string))

Instead of saying hello to the entire world, the code outputs a text that only addresses a single user: Hello User.
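The following sketch shows how gsub() differs from the related base R function sub(), which only replaces the first match. It also shows a simple regular expression pattern and the fixed argument for literal matching. The example strings and variable names are chosen for illustration only.

```r
string <- "Hello World, wonderful World"

# gsub() replaces every match
all_replaced <- gsub("World", "User", string)
print(all_replaced)    # [1] "Hello User, wonderful User"

# sub() replaces only the first match
first_replaced <- sub("World", "User", string)
print(first_replaced)  # [1] "Hello User, wonderful World"

# The pattern is a regular expression: [aeiou] matches any lowercase vowel
no_vowels <- gsub("[aeiou]", "_", "Hello World")
print(no_vowels)       # [1] "H_ll_ W_rld"

# fixed = TRUE treats the pattern as literal text, so "." is just a dot
dashes <- gsub(".", "-", "a.b.c", fixed = TRUE)
print(dashes)          # [1] "a-b-c"
```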

nchar()

One of the most important built-in functions for strings is nchar(), which tells you what the length of an R string is.

string <- "Hello World"
print(nchar(string))

The example above outputs 11, because the blank space counts as a character. The R command length() may be a source of confusion at first. The length() function in R, however, is used to determine the number of elements in an object and not the number of characters in an R string. To determine R string length, make sure to use nchar().
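The difference between the two functions becomes clear when you apply both to a single string and to a vector of strings. The vector words in this sketch is made up for illustration.

```r
string <- "Hello World"

# nchar() counts characters, length() counts elements
print(nchar(string))   # [1] 11
print(length(string))  # [1] 1

# For a vector with several strings, length() returns the number of strings,
# while nchar() returns the character count of each one
words <- c("Hello", "World", "!")
print(length(words))   # [1] 3
print(nchar(words))    # [1] 5 5 1
```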


What are control characters and escape sequences?

You can use control characters to control the text layout of your R strings. Control characters are predefined escape sequences that can be used to format text outputs. For example, with control characters, you can implement line breaks or tabs.

Special characters such as quotation marks, which would normally be interpreted as the beginning or end of a string in R syntax, can also be displayed in strings using an escape sequence. Escape sequences begin with a backslash in R. Here are the most important ones:

  • \n: Newline/line break
  • \t: Tab
  • \\: Backslash
  • \": Double quotation marks
  • \': Single quotation marks
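
The escape sequences listed above can be tried out with a short sketch. It uses writeLines() instead of print(), because print() shows the escaped form of a string, while writeLines() writes the characters the string actually contains. The example strings and variable names are chosen for illustration only.

```r
# \n starts a new line, \t inserts a tab
layout <- "Line one\nLine two\twith a tab"
writeLines(layout)

# \" places a double quotation mark inside a double-quoted string
quoted <- "She said \"Hello\""
print(quoted)       # [1] "She said \"Hello\""
writeLines(quoted)  # She said "Hello"

# \\ stands for one literal backslash
path <- "C:\\Users\\example"
writeLines(path)    # C:\Users\example

# Each escape sequence counts as a single character
print(nchar("\n"))  # [1] 1
print(nchar("\\"))  # [1] 1
```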