R’s gsub() and sub() functions help with text manipulation and are easy to use and combine with other functions. They can be seamlessly integrated into data analyses and statistical calculations.

What do gsub() and sub() do in R?

R’s gsub() and sub() functions replace patterns in strings. sub(), short for ‘substitute’, finds the first instance of a pattern in a string and replaces it with another expression, so it makes at most one replacement per string. gsub() stands for ‘global substitute’; it finds all instances of a pattern in a string and replaces each of them with another expression.
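As a quick illustration of the difference, the following minimal sketch applies both functions to the same string. The variable name path and its value are made up for this example.

```r
# Hypothetical example string containing three hyphens
path <- "2024-01-15-report"

# sub() replaces only the first match
first_only <- sub("-", "/", path)

# gsub() replaces every match
all_matches <- gsub("-", "/", path)

print(first_only)   # [1] "2024/01-15-report"
print(all_matches)  # [1] "2024/01/15/report"
```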

Both functions have broad applications in data cleaning and transformation. Their main purpose is to delete unwanted patterns and adapt text. They are especially important for text manipulation in statistical analyses and machine learning applications in R. For example, the functions can be used to extract certain patterns or transform data into the form necessary for an analysis.

What is the syntax of R’s gsub() and sub()?

The syntax of R’s gsub() and sub() functions is nearly identical. Both functions take the following parameters:

  • pattern: The pattern you’re looking for, in the form of a string or regular expression
  • replacement: The expression the pattern should be replaced with
  • x: The character vector (or an object that can be coerced to one, such as a data frame column) to find and replace in

The structure of R’s gsub()

gsub(pattern, replacement, x)
R

The structure of R’s sub()

sub(pattern, replacement, x)
R
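Because x can hold several strings, both functions work element by element. The sketch below assumes a made-up vector of file names (files). It also uses the optional fixed = TRUE argument, which is not covered above and tells R to treat the pattern as a literal string rather than a regular expression.

```r
# Hypothetical vector of file names
files <- c("report.csv", "data.csv.bak", "notes.txt")

# Regular expression: replace a trailing ".csv" only (the dot is escaped)
renamed <- sub("\\.csv$", ".tsv", files)

# fixed = TRUE: "." is matched literally, not as "any character"
underscored <- gsub(".", "_", files, fixed = TRUE)

print(renamed)      # "report.tsv", "data.csv.bak", "notes.txt"
print(underscored)  # "report_csv", "data_csv_bak", "notes_txt"
```

Without fixed = TRUE, the unescaped pattern "." would match every character, so each string would be turned entirely into underscores.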

Examples for gsub() in R

The distinguishing feature of R’s gsub() is that it finds and replaces all instances of a pattern.

Deleting spaces

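You can use gsub() to collapse runs of extra spaces in strings. Wrapping the call in trimws() also removes the spaces at the beginning and end.

sentence <- "  Data science  is  powerful.  "
# Collapse each run of whitespace into one space, then trim both ends
clean_sentence <- trimws(gsub("\\s+", " ", sentence))
cat(clean_sentence)
R

This produces the output:

Data science is powerful.
R

The regular expression \\s+ matches one or more consecutive whitespace characters (spaces, tabs or line breaks). In the above example, gsub() replaces each run with a single space. Without trimws(), the result would still begin and end with one space.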
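In data cleaning, the same call is usually applied to a whole column at once. The following is a minimal sketch, assuming a hypothetical data frame customers with a name column (both names are invented for illustration). trimws() strips whatever leading and trailing whitespace is left after the runs have been collapsed.

```r
# Hypothetical data frame with inconsistently spaced names
customers <- data.frame(
  name = c("  Ada   Lovelace ", "Alan  Turing", " Grace Hopper  "),
  stringsAsFactors = FALSE
)

# gsub() is vectorised, so the whole column is cleaned in one call
customers$name <- trimws(gsub("\\s+", " ", customers$name))

print(customers$name)
# "Ada Lovelace", "Alan Turing", "Grace Hopper"
```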

Replacing phone numbers

R’s gsub() is also useful for anonymising or deleting private data such as phone numbers.

text <- "Contact us at 123-456-7890 for more information."
modified_text <- gsub("\\d{3}-\\d{3}-\\d{4}", "redacted phone number", text)
cat(modified_text)
R

Output:

Contact us at redacted phone number for more information.
R

In the above example, we match phone numbers in the format 123-456-7890 with the regular expression \\d{3}-\\d{3}-\\d{4} and replace them with the string "redacted phone number".
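If part of the match should survive, capture groups can be referenced in the replacement. The following sketch goes slightly beyond the example above: it keeps the last four digits by wrapping them in parentheses and inserting them with \\1. The sample text with two numbers is made up.

```r
# Hypothetical text containing two phone numbers
text <- "Call 123-456-7890 or 987-654-3210 for more information."

# (\\d{4}) captures the last four digits; \\1 inserts them into the replacement
masked <- gsub("\\d{3}-\\d{3}-(\\d{4})", "***-***-\\1", text)

cat(masked)
# Call ***-***-7890 or ***-***-3210 for more information.
```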

Examples for sub() in R

If you just want to replace the first instance of a pattern, use R’s sub() function.

Replacing the first instance of a word

Let’s say we have a string with a repeated word and want to replace the first instance of that word.

text <- "Data science is powerful. Data analysis is fun."
result_sub <- sub("Data", "Information", text)
cat(result_sub)
R

The output looks as follows:

Information science is powerful. Data analysis is fun.
R

R’s sub() searches the text for the string "Data" and replaces the first instance it finds with "Information".
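When x contains several strings, "first instance" applies to each element separately. The sketch below uses an invented vector texts to show this. It also passes the optional ignore.case = TRUE argument, which is not discussed above, to match regardless of capitalisation.

```r
# Hypothetical vector with two strings
texts <- c("Data science. Data rules.", "Big data, small Data")

# The first case-sensitive match is replaced in each element separately
per_element <- sub("Data", "Information", texts)

# ignore.case = TRUE also matches the lower-case "data"
any_case <- sub("data", "Information", texts, ignore.case = TRUE)

print(per_element)
# "Information science. Data rules.", "Big data, small Information"
print(any_case)
# "Information science. Data rules.", "Big Information, small Data"
```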

Replacing numbers

We can also replace numbers with sub().

numeric_text <- "The cost is £1000. Please pay by 01/02/2024."
result <- sub("\\d+", "2000", numeric_text)
cat(result)
R

Output:

The cost is £2000. Please pay by 01/02/2024.
R

The regular expression \\d+ matches one or more digits. sub() replaces only the first group of digits in the text, which is why the date stays unchanged.
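To make the effect of the pattern more tangible, the following sketch compares three calls on a made-up string (order_text). The more specific date pattern is an assumption about the date format and would need adjusting for other formats.

```r
# Hypothetical text with an amount and a date
order_text <- "Order 1000 ships on 01/02/2024."

# sub(): only the first group of digits is replaced
first_number <- sub("\\d+", "2000", order_text)

# gsub(): every group of digits is replaced
every_number <- gsub("\\d+", "#", order_text)

# A more specific pattern targets the date instead of the first number
date_only <- sub("\\d{2}/\\d{2}/\\d{4}", "[date]", order_text)

print(first_number)  # [1] "Order 2000 ships on 01/02/2024."
print(every_number)  # [1] "Order # ships on #/#/#."
print(date_only)     # [1] "Order 1000 ships on [date]."
```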

