With the predict() function in R, you can use a fitted model to make predictions for new, unseen data. This makes the function an important tool for statistical modelling and machine learning.

What is predict() in R used for?

The R function predict() is a versatile tool used in predictive modelling. It generates predictions for new or existing data points based on a previously fitted statistical model, such as a linear regression, a logistic regression, a decision tree or another modelling technique.
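To illustrate that the same call carries over to other model types, here is a minimal sketch with a logistic regression. It uses R's built-in mtcars data set, and the names logit_model and new_cars are chosen only for this illustration; none of this is part of the example that follows later.

```r
# Logistic regression on the built-in mtcars data (an illustrative choice):
# transmission type (am, 0 or 1) modelled by vehicle weight (wt)
logit_model <- glm(am ~ wt, data = mtcars, family = binomial)

# Hypothetical new observations; the column name must match the predictor
new_cars <- data.frame(wt = c(2.0, 3.0, 4.0))

# type = "response" returns probabilities instead of values on the log-odds scale
predicted_prob <- predict(logit_model, newdata = new_cars, type = "response")
print(predicted_prob)
```

For a glm model, the type argument controls the scale of the result, which is one example of an option that depends on the kind of model passed to predict().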

What is the syntax for predict() in R?

R’s predict() function takes as arguments a fitted model and the data points that the prediction should apply to. Because predict() is a generic function, the available options and parameters depend on the type of model used. For a linear model, the result is a vector of predictions (or a matrix, if intervals are requested) that can be useful for various analytical purposes, including evaluating the performance of a model, supporting decisions or visualizing the resulting data.

predict(object, newdata, interval)
R
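  • object: The fitted model used to generate the predictions
  • newdata: A data frame containing the predictor values to predict for; if omitted, the fitted values for the original data are returned
  • interval: Optional argument for the type of interval to compute for linear models ("none" by default, "confidence" for the interval around the mean response, "prediction" for the interval around individual predictions)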
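The effect of the interval argument can be sketched as follows. This example assumes a linear model fitted to R's built-in cars data set rather than the custom data used below, and the names cars_model and new_speeds are illustrative only.

```r
# Fit a linear model on the built-in cars data set (columns: speed, dist)
cars_model <- lm(dist ~ speed, data = cars)

# Hypothetical speeds to predict for; the column name matches the predictor
new_speeds <- data.frame(speed = c(10, 20))

# Interval around the mean response
conf_int <- predict(cars_model, newdata = new_speeds, interval = "confidence")

# Interval around individual new observations
pred_int <- predict(cars_model, newdata = new_speeds, interval = "prediction")

print(conf_int)
print(pred_int)
```

In both cases the result is a matrix with the columns fit, lwr and upr. The point estimates in fit are identical, while the prediction interval is wider because it also accounts for the scatter of individual observations.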

Example of how to use predict() in R

The following example will illustrate how the predict() function in R works. We’ll use a user-defined data set with speed and distance values.

Creating and displaying data

# Creating a data frame with custom speed and distance values
custom_data <- data.frame(speed = c(15, 20, 25, 30, 35),
    distance = c(30, 40, 50, 60, 70))
# Displaying the custom data frame
print("Custom Data Frame:")
print(custom_data)
R

First, we’ll create a user-defined data set for evaluating the relationship between speed and distance. We’ll use the function data.frame() to create a data frame and then define the values for the variables speed and distance as c(15, 20, 25, 30, 35) and c(30, 40, 50, 60, 70) respectively.

After we’ve created the data set, we’ll display it using the print() function. That way we can check the structure and the assigned values of our new data frame.

Output:

[1] "Custom Data Frame:"
  speed distance
1    15       30
2    20       40
3    25       50
4    30       60
5    35       70
R
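Beyond print(), the structure of the data frame can also be inspected with str(). The short sketch below recreates the data frame so that it runs on its own, and additionally checks a property of this particular sample data: every distance is exactly twice the speed.

```r
# Recreate the custom data frame from above
custom_data <- data.frame(speed = c(15, 20, 25, 30, 35),
                          distance = c(30, 40, 50, 60, 70))

# Compact overview of column names, types and first values
str(custom_data)

# Check whether all points lie on the line distance = 2 * speed
is_exactly_linear <- all(custom_data$distance == 2 * custom_data$speed)
print(is_exactly_linear)
```

Knowing that the points lie on one straight line helps when interpreting the model fitted in the next step.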

Creating a linear model

# Creating a linear model for the custom data frame
custom_model <- lm(distance ~ speed, data = custom_data)
# Printing the model results
print("Model Results:")
print(summary(custom_model))
R

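Output:

[1] "Model Results:"

Call:
lm(formula = distance ~ speed, data = custom_data)

Coefficients (abridged, rounded):
(Intercept)    0
speed          2
R

In the output, we see a linear model (custom_model) that was fitted to the data set and models the relationship between speed and distance. Because every distance value in the sample data is exactly twice the corresponding speed, the fit is perfect: the intercept is 0 and the slope is 2, up to floating-point rounding, and the residuals are effectively zero. The full summary() output also lists standard errors, t values and p values, but these are not meaningful for a perfect fit, and R may warn that the summary is unreliable in this case.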
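To work with individual numbers from the fitted model instead of reading them off the printed summary, the accessor functions coef() and residuals() can be used. The following is a minimal sketch that rebuilds the data and the model so it can run on its own; the variable names model_coefficients and model_residuals are illustrative.

```r
# Recreate the data and the linear model from above
custom_data <- data.frame(speed = c(15, 20, 25, 30, 35),
                          distance = c(30, 40, 50, 60, 70))
custom_model <- lm(distance ~ speed, data = custom_data)

# Named vector with the intercept and the slope for speed
model_coefficients <- coef(custom_model)
print(round(model_coefficients, 6))

# Differences between the observed and the fitted distance values
model_residuals <- residuals(custom_model)
print(round(model_residuals, 6))
```

Rounding before printing hides tiny floating-point deviations that would otherwise clutter the output.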

Defining new speed values and making predictions

# Creating a data frame with new speed values
new_speed_values <- data.frame(speed = c(40, 45, 50, 55, 60))
# Predicting future distance values using the linear model
predicted_distance <- predict(custom_model, newdata = new_speed_values)
R

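We’ve now created another data frame (new_speed_values) with new values for speed. Its column must have the same name as the predictor in the model formula (speed), because predict() looks up the predictor values by name. Then we used predict() to calculate the corresponding distance values using the linear model we created above.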
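For a simple linear model, predict() evaluates intercept + slope * speed for each new value. The sketch below reproduces that calculation by hand and compares it with the result of predict(); manual_distance is a name introduced only for this illustration.

```r
# Recreate the data, the model and the new speed values from above
custom_data <- data.frame(speed = c(15, 20, 25, 30, 35),
                          distance = c(30, 40, 50, 60, 70))
custom_model <- lm(distance ~ speed, data = custom_data)
new_speed_values <- data.frame(speed = c(40, 45, 50, 55, 60))

# Predictions from predict()
predicted_distance <- predict(custom_model, newdata = new_speed_values)

# The same calculation by hand: intercept + slope * speed
model_coefficients <- coef(custom_model)
manual_distance <- model_coefficients[["(Intercept)"]] +
  model_coefficients[["speed"]] * new_speed_values$speed

# Compare both results, ignoring the names that predict() attaches
print(all.equal(unname(predicted_distance), manual_distance))
```

Doing the calculation by hand is only practical for very simple models, which is why predict() is normally the more convenient choice.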

Displaying the predictions

# Displaying the predicted values
print("Predicted Distance Values:")
print(predicted_distance)
R

The output shows the distance values predicted based on speed:

[1] "Predicted Distance Values:"
  1   2   3   4   5
 80  90 100 110 120
R
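Since predict() returns the values in the same order as the rows of newdata, it is often convenient to place the inputs and the predictions side by side. The following sketch shows one possible way to do this; prediction_table is a hypothetical name used only here.

```r
# Recreate the data, the model and the predictions from above
custom_data <- data.frame(speed = c(15, 20, 25, 30, 35),
                          distance = c(30, 40, 50, 60, 70))
custom_model <- lm(distance ~ speed, data = custom_data)
new_speed_values <- data.frame(speed = c(40, 45, 50, 55, 60))
predicted_distance <- predict(custom_model, newdata = new_speed_values)

# Combine each new speed value with its predicted distance
prediction_table <- data.frame(speed = new_speed_values$speed,
                               predicted_distance = unname(predicted_distance))
print(prediction_table)
```

A table like this makes it easier to read off which prediction belongs to which input value.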
Tip

If you want to learn about processing strings for text manipulation and data cleaning in R, take a look at our tutorials on R gsub and sub and R substring.
