How do you calculate R-squared in R?
R-squared (R2) is a statistical measure of how well a linear regression fits the data. In R, you can obtain it with a single function call.
Why is R-squared in R important?
R-squared measures how well a linear regression line approximates the data. For a least-squares model with an intercept, it takes values between 0 and 1 and is a key indicator of regression model quality.
Interpreting R-squared tells you how close the data is to the fitted regression line. The higher the R-squared value, the more of the variation in the data the model explains. A low R-squared value indicates a poor fit.
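For reference, the usual textbook definition behind this interpretation can be written as follows. The symbols are not used elsewhere in this article and are introduced here only for illustration: y_i are the observed values, \hat{y}_i the values predicted by the regression line, and \bar{y} the mean of the observed values.

```latex
R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
```

The numerator is the variation the model leaves unexplained and the denominator is the total variation, so a value close to 1 means that little is left unexplained.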
R-squared in R with linear regression
R-squared is most often used in the context of linear regression. Since R is widely used in statistics, it’s not surprising that it comes with built-in functions for fitting a regression model:
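# Example data: x is the independent variable, y the dependent variable
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
# Fit a simple linear regression of y on x
model <- lm(y ~ x)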
In the code example above, two R vectors named x and y were created. These vectors contain the data on which the linear regression is performed. The dependent variable in this case is y. The regression model is then calculated using the R function lm() and stored in the variable model.
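To check what lm() actually estimated, you can inspect the fitted coefficients. The following is a minimal sketch that repeats the example data so that it runs on its own. coef() is a standard R function, and the variable name fit_coefs is chosen here only for illustration.

```r
# Same example data as above, repeated so the block runs on its own
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# coef() returns the estimated intercept and slope as a named vector
fit_coefs <- coef(model)
print(fit_coefs)
```

For this data, the fitted line is y = 2.2 + 0.6 * x.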
How to calculate R-squared in R
You can obtain the R2 value in R with a built-in function. You don’t need in-depth mathematical knowledge to do this; you just need to know which function to call, and it’s simple to use even if you’re just starting out with coding.
The function to calculate this is called summary(). As the name suggests, it provides a summary of the regression analysis, including the R-squared value. The code example below, which builds on the linear regression that has already been calculated, shows the summary() function in action:
# Extract the R-squared value from the model summary
summary(model)$r.squared
You can use this code to extract the R-squared value from the linear regression model stored in model. The R-squared value indicates how much of the variation in the dependent variable y the model explains using the independent variable x.
In the code example above, the summary() function is applied to the regression model that has already been calculated. The R operator $ is then used to extract the R-squared value from the object returned by the function call. In our example, the value is 0.6.
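If you want to see where this number comes from, you can reproduce it by hand. The sketch below assumes the usual definition of R-squared as one minus the ratio of the residual sum of squares to the total sum of squares. The names ss_res, ss_tot, r2_manual, and r2_summary are illustrative and not part of any R API.

```r
# Same example data and model as above, repeated so the block runs on its own
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# Residual sum of squares: variation the model leaves unexplained
ss_res <- sum(residuals(model)^2)
# Total sum of squares: variation of y around its mean
ss_tot <- sum((y - mean(y))^2)

r2_manual <- 1 - ss_res / ss_tot
r2_summary <- summary(model)$r.squared

# Both approaches should agree up to floating-point rounding
print(all.equal(r2_manual, r2_summary))
# With a single predictor, R-squared also equals the squared correlation
print(all.equal(r2_summary, cor(x, y)^2))
```

Both comparisons use all.equal() rather than ==, because the two values are floating-point results that can differ in the last digits.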
How to interpret R-squared
Once the R-squared value has been determined, you have to interpret the result. Here, it’s a good idea to look at the ranges the value can fall into. As mentioned earlier, R2 values lie between 0 and 1.
- 0 (no fit): an R-squared value of 0 means that the model explains none of the variation in the data. In this case, the model finds no linear relationship between the variables.
- 1 (perfect fit): an R-squared value of 1 indicates that all observations lie perfectly on the regression line. This is extremely rare and may indicate overfitting.
- 0.7 to 0.9 (good fit): an R-squared value in this interval indicates that the model describes the data sufficiently well.
- 0.5 to 0.7 (acceptable fit): an R-squared value in the range of 0.5 to 0.7 is acceptable but indicates that there’s still room for improvement.
- Less than 0.5 (poor fit): an R-squared value below 0.5 indicates that the calculated model doesn’t describe the data with sufficient accuracy. In this case, the model should be revised to obtain meaningful results.
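The ranges in this list can be turned into a small lookup function. The following is only a sketch: interpret_r_squared is a hypothetical helper, not a built-in R function. It also makes two assumptions that the list leaves open: a value of exactly 0.7 is counted as a good fit, and values between 0.9 and 1 are reported as not covered.

```r
# Hypothetical helper; the thresholds are copied from the list above
interpret_r_squared <- function(r2) {
  stopifnot(is.numeric(r2), length(r2) == 1, r2 >= 0, r2 <= 1)
  if (r2 == 1) {
    "perfect fit"
  } else if (r2 > 0.9) {
    # Assumption: the list does not name this range
    "above 0.9 (not covered by the list)"
  } else if (r2 >= 0.7) {
    "good fit"
  } else if (r2 >= 0.5) {
    "acceptable fit"
  } else if (r2 > 0) {
    "poor fit"
  } else {
    "no fit"
  }
}

# The example model from this article has an R-squared value of 0.6
print(interpret_r_squared(0.6))
```

The thresholds are rules of thumb, so treat the returned label as a starting point for the assessment rather than a verdict.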
A high R-squared value alone isn’t enough to judge the quality of your model. That’s why you should also consider factors like model validation, analysis of residuals, and adaptation to specific requirements when determining the goodness of fit of a regression model. The summary() function shown earlier provides additional key figures that you can use for the assessment.
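As a minimal sketch of what these additional key figures look like, the block below pulls a few components out of the summary object and prints the residuals. The component names adj.r.squared, sigma, and coefficients are part of the object returned by summary() for lm models. The variable name model_summary is chosen here only for illustration.

```r
# Same example data and model as above, repeated so the block runs on its own
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)
model_summary <- summary(model)

# Adjusted R-squared takes the number of predictors into account
print(model_summary$adj.r.squared)
# Residual standard error
print(model_summary$sigma)
# Coefficient table: estimates, standard errors, t values, and p-values
print(model_summary$coefficients)
# Residuals, for spotting patterns that a single R-squared value can hide
print(residuals(model))
```

With only five data points, the adjusted R-squared value is noticeably lower than the plain R-squared value of 0.6, which is one reason not to rely on a single figure.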