| Title: | Calculate and Compare Multiple Definitions of Coefficient of Determination |
|---|---|
| Description: | Calculate nine types of coefficients of determination (R-squared) based on the classification by Kvalseth (1985) <doi:10.1080/00031305.1985.10479448>. This package is designed for educational purposes to demonstrate how R-squared values can fluctuate depending on the choice of formula, particularly in power regression models or linear models without an intercept. By providing a comprehensive list of definitions, it helps users understand the mathematical sensitivity of goodness-of-fit indices. |
| Authors: | Mao Kobayashi [aut, cre] |
| Maintainer: | Mao Kobayashi |
| License: | MIT + file LICENSE |
| Version: | 0.2.0.9000 |
| Built: | 2026-05-10 05:11:50 UTC |
| Source: | https://github.com/indenkun/kvr2 |
Calculates goodness-of-fit metrics based on Kvalseth (1985), including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Squared Error (MSE). This function provides a unified output for comparing different model specifications.

    comp_fit(model, type = c("auto", "linear", "power"))
    RMSE(model, type = c("auto", "linear", "power"))
    MAE(model, type = c("auto", "linear", "power"))
    MSE(model, type = c("auto", "linear", "power"))


| model | A linear model or power regression model of class `lm`. |
|---|---|
| type | Character string. Selects the model type: `"auto"` (default), `"linear"`, or `"power"`. |

The metrics are calculated according to the formulas in Kvalseth (1985):
RMSE: Root Mean Squared Residual or Error
MAE: Mean Absolute Residual or Error
MSE: Mean Squared Residual or Error (adjusted for degrees of freedom)
where $n$ is the sample size and $k$ is the number of model parameters
(including the intercept).
Note on MSE: In many modern contexts, "MSE" refers to the mean
squared error without degree-of-freedom adjustment (denominator $n$).
However, this function follows Kvalseth's definition, which uses
$n - k$ as the denominator.
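The three metrics can be reproduced by hand for an ordinary linear model. The sketch below is an illustration under stated assumptions, not the package's implementation: it assumes RMSE and MAE divide by $n$ while MSE divides by $n - k$ as described above, and it only covers models fitted on the original scale (no log transformation). The helper name `fit_metrics_sketch()` is hypothetical.

```r
# Hypothetical helper: hand-computed fit metrics for an untransformed lm().
# Assumption: RMSE and MAE divide by n; MSE divides by n - k (Kvalseth's MSE).
fit_metrics_sketch <- function(model) {
  e <- residuals(model)          # observed minus fitted values
  n <- length(e)                 # observations actually used by lm()
  k <- length(coef(model))       # estimated parameters, intercept included
  c(
    RMSE = sqrt(sum(e^2) / n),
    MAE  = sum(abs(e)) / n,
    MSE  = sum(e^2) / (n - k)
  )
}

df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
m <- lm(y ~ x, data = df1)
fit_metrics_sketch(m)
```

With this convention the MSE equals `sigma(m)^2`, the squared residual standard error from base R, because both divide the residual sum of squares by `df.residual(m)`.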
For comp_fit(): An object of class comp_kvr2, which is a list
containing the calculated RMSE, MAE, and MSE values.
For individual functions (RMSE(), MAE(), MSE()):
A named numeric value of the specific metric.
The power regression model must be based on a logarithmic transformation.
When type = "auto", the choice between linear and power regression
is determined by analyzing the model formula. It identifies a power
regression if the dependent variable is a function call to log()
(e.g., lm(log(y) ~ x)).
Note that simple variable names containing the string "log" (e.g.,
lm(log_value ~ x)) are correctly treated as linear regression.
To override this automatic detection, manually specify type = "linear"
or type = "power".
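A minimal sketch of this detection rule in base R, assuming it only inspects the left-hand side of the model formula. `is_power_formula()` is a hypothetical name used for illustration, not a function exported by the package.

```r
# Hypothetical helper mirroring the rule described above: a model is treated
# as a power regression when the left-hand side of its formula is a call to log().
is_power_formula <- function(model) {
  lhs <- formula(model)[[2]]   # left-hand side expression of y ~ x
  is.call(lhs) && identical(lhs[[1]], as.name("log"))
}

dat <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
dat$log_value <- log(dat$y)

is_power_formula(lm(log(y) ~ log(x), data = dat))  # call to log(): power
is_power_formula(lm(log_value ~ x, data = dat))     # plain variable name: linear
```

Checking for a call (rather than searching the formula text for "log") is what keeps names such as `log_value` from being misclassified.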
Tarald O. Kvalseth (1985) Cautionary Note about R², The American Statistician, 39(4), 279-285, doi:10.1080/00031305.1985.10479448

    # example data set 1. Kvålseth (1985).
    df1 <- data.frame(x = c(1:6), y = c(15,37,52,59,83,92))
    model_intercept <- lm(y ~ x, df1)
    model_without <- lm(y ~ x - 1, df1)
    model_power <- lm(log(y) ~ log(x), df1)
    comp_fit(model_intercept)
    comp_fit(model_without)
    comp_fit(model_power)

A specialized tool for educational and diagnostic purposes. This function automatically generates a comparison between a model with an intercept and its forced no-intercept counterpart (or vice versa), revealing how mathematical definitions of R-squared diverge under different constraints.

    comp_model(model, type = c("auto", "linear", "power"), adjusted = FALSE)

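
| model | A linear model or power regression model of class `lm`. |
|---|---|
| type | Character string. Selects the model type: `"auto"` (default), `"linear"`, or `"power"`. |
| adjusted | Logical. If `TRUE`, adjusted R-squared values are compared. Default is `FALSE`. |
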
This function reconstructs the alternative model using QR decomposition
rather than update() to ensure robustness against environment/scoping
issues.
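The idea of refitting from the design matrix can be sketched in base R. This is an assumption-laden illustration, not the package's code: `refit_without_intercept()` is hypothetical, and it handles only the intercept-removal direction.

```r
# Hypothetical sketch: fit the no-intercept counterpart of an lm() directly
# from its design matrix with a QR decomposition, instead of calling update().
refit_without_intercept <- function(model) {
  X <- model.matrix(model)                  # design matrix used by lm()
  y <- model.response(model.frame(model))   # response as lm() saw it
  X0 <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  qr0 <- qr(X0)
  list(
    coefficients = qr.coef(qr0, y),         # least-squares coefficients
    fitted = qr.fitted(qr0, y)              # projection of y onto X0
  )
}

df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
m_int <- lm(y ~ x, data = df1)
refit_without_intercept(m_int)$coefficients
```

Because `model.matrix()` and `model.frame()` read the model frame stored in the fitted object, this route does not need to re-evaluate the original `data` argument in the caller's environment, which is the kind of scoping problem `update()` can run into.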
It is particularly useful for observing how some definitions can exceed
1.0, and others can become negative, when an intercept is removed,
illustrating the "pitfalls" discussed in Kvalseth (1985).
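For intuition, the sketch below computes $R^2_1 = 1 - \sum (y - \hat{y})^2 / \sum (y - \bar{y})^2$ and $R^2_2 = \sum (\hat{y} - \bar{y})^2 / \sum (y - \bar{y})^2$ by hand for two no-intercept fits. It does not call comp_model(); the helper `r2_pair()` and the second data set are made up for illustration.

```r
# Sketch: two of Kvalseth's definitions computed by hand.
r2_pair <- function(model) {
  y <- model.response(model.frame(model))
  yhat <- fitted(model)
  tss <- sum((y - mean(y))^2)                  # total sum of squares
  c(R2_1 = 1 - sum((y - yhat)^2) / tss,
    R2_2 = sum((yhat - mean(y))^2) / tss)
}

df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
r2_pair(lm(y ~ x - 1, data = df1))    # R2_2 exceeds 1 here

flat <- data.frame(x = 1:5, y = c(10, 11, 9, 10, 11))
r2_pair(lm(y ~ x - 1, data = flat))   # R2_1 is negative here
```

With an intercept both values coincide with `summary(lm)$r.squared`; without one, the residual sum of squares and the explained sum of squares no longer add up to the total, so the two definitions drift apart.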
A data frame of class comp_model containing nine R-squared
definitions and three fit metrics (RMSE, MAE, MSE) for both intercept
and no-intercept versions.
The original model objects are stored as attributes with_int and without_int
for use by the plot method.
Tarald O. Kvalseth (1985) Cautionary Note about R², The American Statistician, 39(4), 279-285, doi:10.1080/00031305.1985.10479448

    df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
    model <- lm(y ~ x, data = df1)
    # Compare R-squared sensitivity
    comp_model(model)
    # Compare adjusted R-squared
    comp_model(model, adjusted = TRUE)

Extracts the metadata and model specifications used to calculate the coefficients of determination, such as the regression type, sample size, and degrees of freedom.

    model_info(x)


| x | An object of class `r2_kvr2`, as returned by r2(). |
|---|---|

This function provides transparency into the calculation process of the various R-squared definitions. It is particularly useful for verifying whether a model was treated as a "power" regression (log-transformed) and how the degrees of freedom were determined for adjusted R-squared values.
A list containing the following components:
type: A string indicating the regression type ("linear" or "power").
has_intercept: A logical value indicating if the model includes an intercept.
n: The number of observations used in the model (excluding missing values).
k: The number of estimated parameters (including the intercept if present).
df_res: Residual degrees of freedom ($n - k$).
The sample size n refers to the actual number of observations
used by lm(), which may be fewer than the rows in the original
data frame if NA values were present.
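The note above can be checked with base R accessors. This sketch does not use model_info(); it only shows how $n$, $k$, and the residual degrees of freedom relate for an lm() fit with one missing value, assuming the default `na.action = na.omit`.

```r
# Sketch: the quantities described above, taken from base R accessors.
# One y value is NA, so lm() drops that row under the default na.omit.
df_na <- data.frame(x = 1:6, y = c(15, 37, NA, 59, 83, 92))
m <- lm(y ~ x, data = df_na)

info_sketch <- list(
  n      = nobs(m),          # rows actually used: 5, not 6
  k      = length(coef(m)),  # intercept + slope
  df_res = df.residual(m)    # n - k
)
str(info_sketch)
```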

    df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
    model <- lm(y ~ x, data = df1)
    res <- r2(model)
    # Check the metadata
    info <- model_info(res)
    info$n
    info$type

A diagnostic plot to visualize why R-squared might be low or negative. It plots observed against predicted values and compares the 1:1 identity line (perfect prediction) with a horizontal line at the mean of the observed data.

    plot_diagnostic(x, ...)


| x | A fitted `lm` object. |
|---|---|
| ... | Currently ignored. |

A ggplot object representing the visual analysis.

    df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
    model <- lm(y ~ x - 1, data = df1) # No-intercept model
    plot_diagnostic(model)

Visualizes the different R-squared definitions or provides a diagnostic observed-vs-predicted plot to understand the model fit.

    plot_kvr2(
      x,
      type = c("auto", "linear", "power"),
      plot_type = c("both", "r2", "diag"),
      ...
    )

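
| x | An object of class `lm`. |
|---|---|
| type | Character string. Selects the model type: `"auto"` (default), `"linear"`, or `"power"`. |
| plot_type | A string specifying the plot layout: `"both"` (default), `"r2"`, or `"diag"`. |
| ... | Currently ignored. |
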
When plot_type = "r2", the function creates a bar plot comparing all nine
definitions. Bars are colored based on their validity:
Skyblue: Standard values between 0 and 1.
Orange: Values exceeding 1.0 or falling below 0.0 (warnings).
When plot_type = "diag", the function displays a scatter plot of observed
vs. predicted values. Two reference lines are added:
Darkgreen Solid Line: The 1:1 "perfect fit" line (RSS reference).
Red Dashed Line: The overall mean of the observed data (TSS reference).
If the data points are closer to the red dashed line than to the green solid line,
$R^2_1$ will be negative.
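A minimal base-graphics sketch of the same idea, assuming a linear (untransformed) lm() fit. `diag_sketch()` is hypothetical and is not how plot_kvr2() draws its ggplot2 output; it also returns the residual and total sums of squares, so the link between the two reference lines and the sign of $R^2_1 = 1 - \mathrm{RSS}/\mathrm{TSS}$ is explicit.

```r
# Sketch of an observed-vs-predicted diagnostic in base graphics (not ggplot2).
diag_sketch <- function(model) {
  y <- model.response(model.frame(model))
  yhat <- fitted(model)
  plot(yhat, y, xlab = "Predicted", ylab = "Observed")
  abline(a = 0, b = 1, col = "darkgreen")           # 1:1 perfect-fit line (RSS)
  abline(h = mean(y), col = "red", lty = "dashed")  # observed mean (TSS)
  # Return the two sums of squares the reference lines stand for.
  invisible(c(RSS = sum((y - yhat)^2), TSS = sum((y - mean(y))^2)))
}

flat <- data.frame(x = 1:5, y = c(10, 11, 9, 10, 11))
ss <- diag_sketch(lm(y ~ x - 1, data = flat))
1 - ss[["RSS"]] / ss[["TSS"]]   # R2_1, negative because RSS > TSS
```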
Combined View (plot_type = "both"):
Automatically configures the plotting device to show both plots simultaneously
for a comprehensive model evaluation.
The return value depends on the plot_type argument:
For "r2" and "diag": Returns a ggplot object
that can be further customized.
For "both": Generates a combined plot using the grid
system and returns the input object x invisibly.

    df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
    model <- lm(y ~ x - 1, data = df1) # No-intercept model
    plot_kvr2(model)
    # Compare all definitions
    plot_kvr2(model, plot_type = "r2")
    # Diagnostic plot to see why some R2 might be problematic
    plot_kvr2(model, plot_type = "diag")

Generates a comprehensive 2x2 diagnostic dashboard comparing models with and without an intercept. This visualization helps identify how the absence of an intercept affects different R-squared definitions and error metrics.

    ## S3 method for class 'comp_model'
    plot(x, ...)


| x | An object of class `comp_model`. |
|---|---|
| ... | Further graphical parameters (currently ignored). |

The plot is organized into four panels:
Top-Left: Grouped bar chart of the nine R-squared definitions.
Bottom-Left: Comparison of absolute fit metrics (RMSE, MAE, MSE).
Top-Right: Observed vs. Predicted plot for the intercept model.
Bottom-Right: Observed vs. Predicted plot for the no-intercept model.
This layout allows for a direct "cause-and-effect" analysis: for instance, observing a data point far from the identity line in the bottom-right panel explains why certain R-squared definitions might drop sharply or become negative in the left panels.
This function is primarily called for its side effect of creating a
grid-based plot. It returns the input object x invisibly.
Since this plot uses the grid system to combine multiple ggplot
objects, it cannot be further modified with the + operator. If you
need to customize individual panels, use the internal plotting functions
or extract the models from the comp_model object attributes.
comp_model(), plot_diagnostic()

    df <- data.frame(x = 1:5, y = c(2, 3, 5, 4, 6))
    m1 <- lm(y ~ x, data = df)
    res <- comp_model(m1)
    plot(res)

Visualizes the nine definitions of R-squared to compare their values and identify potential issues (e.g., values exceeding 1 or falling below 0).

    ## S3 method for class 'r2_kvr2'
    plot(x, ...)


| x | An object of class `r2_kvr2`. |
|---|---|
| ... | Currently ignored. |

A ggplot object representing the visual analysis.

    df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
    model <- lm(y ~ x - 1, data = df1) # No-intercept model
    res <- r2(model)
    plot(res)

A specialized print method for comp_model objects. It formats the
comparison table for better readability and provides diagnostic warnings
if any R-squared values fall outside the standard 0 to 1 range.

    ## S3 method for class 'comp_model'
    print(x, ..., digits = 4)


| x | An object of class `comp_model`. |
|---|---|
| ... | Further arguments passed to or from other methods. |
| digits | Number of decimal places to be used for formatting numerical values. Default is `4`. |

The output is formatted using the insight package's export_table()
functionality, ensuring a clean and structured display in the console.
In addition to the table, this method performs an automated check on the
R-squared values (columns 2 to 10). If any value exceeds 1.0 or falls
below 0.0, a warning message is displayed. This is a critical educational
feature, as it flags instances where specific definitions become
mathematically inappropriate due to the lack of an intercept or model
misspecification.
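The range check can be sketched in a few lines of base R. This is an illustration only: `check_r2_range()` is hypothetical, the input values are invented, and the package's actual check operates on columns 2 to 10 of the comp_model table rather than on a named vector.

```r
# Hypothetical sketch of a 0-1 range check like the one described above.
check_r2_range <- function(values) {
  out <- names(values)[which(values < 0 | values > 1)]  # NA values are skipped
  if (length(out) > 0) {
    warning("R-squared outside [0, 1]: ", paste(out, collapse = ", "),
            call. = FALSE)
  }
  invisible(out)
}

vals <- c(R2_1 = 0.98, R2_2 = 1.08, R2_7 = -0.25)  # invented example values
check_r2_range(vals)   # warns about R2_2 and R2_7
```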
Returns the input object x invisibly.
Simple print methods for objects of class "r2_kvr2" (returned by r2()) and "comp_kvr2" (returned by comp_fit()).

    ## S3 method for class 'r2_kvr2'
    print(x, ..., digits = 4, model_info = TRUE)

    ## S3 method for class 'comp_kvr2'
    print(x, ..., digits = 4, model_info = TRUE)

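
| x | An object of class "r2_kvr2" or "comp_kvr2". |
|---|---|
| ... | Further arguments passed to or from other methods. |
| digits | The number of decimal places to be used for rounding the results. Default is 4. |
| model_info | Logical. If `TRUE` (the default), model information is printed along with the results. |
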
These methods format the calculated statistics into a human-readable summary, displaying each index or metric with its corresponding value.
The input object is returned invisibly (via invisible(x)).
This function is called for its side effect of printing the results of r2() or comp_fit() calculations to the console.
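The behavior described above (format each value to `digits` places, then return the input invisibly) follows the usual S3 print-method pattern. A minimal sketch with a made-up class `r2_demo`, not the package's own methods:

```r
# Hypothetical S3 class "r2_demo" with a print method in the same spirit:
# format each value, print it, and return the object invisibly.
print.r2_demo <- function(x, ..., digits = 4) {
  for (nm in names(x)) {
    cat(nm, ": ", format(round(x[[nm]], digits), nsmall = digits), "\n", sep = "")
  }
  invisible(x)
}

res <- structure(list(a = 1/3, b = 2/3), class = "r2_demo")
res   # auto-printing dispatches to print.r2_demo()
```

Returning `invisible(x)` lets the method be used inside pipelines or assignments without printing the object a second time.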
Calculates nine types of coefficients of determination ($R^2$) based on
the classification by Kvalseth (1985). This function is designed to demonstrate
how $R^2$ values can vary depending on their mathematical definition,
particularly in models without an intercept or in power regression models.

    r2(model, type = c("auto", "linear", "power"), adjusted = FALSE)
    r2_1(model, type = c("auto", "linear", "power"))
    r2_2(model, type = c("auto", "linear", "power"))
    r2_3(model, type = c("auto", "linear", "power"))
    r2_4(model, type = c("auto", "linear", "power"))
    r2_5(model, type = c("auto", "linear", "power"))
    r2_6(model, type = c("auto", "linear", "power"))
    r2_7(model, type = c("auto", "linear", "power"))
    r2_8(model, type = c("auto", "linear", "power"))
    r2_9(model, type = c("auto", "linear", "power"))

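
| model | A linear model or power regression model of class `lm`. |
|---|---|
| type | Character string. Selects the model type: `"auto"` (default), `"linear"`, or `"power"`. |
| adjusted | Logical. If `TRUE`, degree-of-freedom-adjusted values are returned (see r2_adjusted()). Default is `FALSE`. |
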
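The nine coefficient equations, from $R^2_1$ to $R^2_9$, are based on Kvalseth (1985) and are as follows (with residuals $e = y - \hat{y}$):
$R^2_1 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$: Proportion of total variance explained.
$R^2_2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$: Based on the variation of predicted values.
$R^2_3 = \frac{\sum (\hat{y} - \bar{\hat{y}})^2}{\sum (y - \bar{y})^2}$: Ratio of variation using the mean of predicted values.
$R^2_4 = 1 - \frac{\sum (e - \bar{e})^2}{\sum (y - \bar{y})^2}$: Incorporates the mean residual.
$R^2_5$: The square of the multiple correlation coefficient between the dependent variable and the independent variables (a comprehensive indicator in linearized models).
$R^2_6 = \frac{\left(\sum (y - \bar{y})(\hat{y} - \bar{\hat{y}})\right)^2}{\sum (y - \bar{y})^2 \sum (\hat{y} - \bar{\hat{y}})^2}$: Square of Pearson's correlation coefficient between observed $y$ and predicted $\hat{y}$.
$R^2_7 = 1 - \frac{\sum (y - \hat{y})^2}{\sum y^2}$: Recommended for models without an intercept.
$R^2_8 = \frac{\sum \hat{y}^2}{\sum y^2}$: Alternative form for models without an intercept.
$R^2_9$: Robust version that uses sample medians of absolute deviations, rather than sums of squares, to resist outliers.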
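To see how the definitions diverge, the sketch below computes seven of them in base R directly from Kvalseth's (1985) formulas for $R^2_1$ to $R^2_4$ and $R^2_6$ to $R^2_8$. It is not the package's code: `kvalseth_sketch()` is hypothetical, its `power` flag replaces the package's automatic detection, back-transforming log-log fits with `exp()` is an assumption, and $R^2_5$ and $R^2_9$ are left out because their exact conventions are not spelled out here.

```r
# Sketch: seven of Kvalseth's (1985) definitions computed by hand.
# Assumptions: for a log-log model, observed and predicted values are
# back-transformed with exp(); R2_5 and R2_9 are omitted.
kvalseth_sketch <- function(model, power = FALSE) {
  y <- model.response(model.frame(model))
  yhat <- fitted(model)
  if (power) {                 # back to the original scale
    y <- exp(y)
    yhat <- exp(yhat)
  }
  e <- y - yhat                # residuals on the original scale
  tss <- sum((y - mean(y))^2)  # total sum of squares about the mean
  c(
    R2_1 = 1 - sum(e^2) / tss,
    R2_2 = sum((yhat - mean(y))^2) / tss,
    R2_3 = sum((yhat - mean(yhat))^2) / tss,
    R2_4 = 1 - sum((e - mean(e))^2) / tss,
    R2_6 = cor(y, yhat)^2,
    R2_7 = 1 - sum(e^2) / sum(y^2),
    R2_8 = sum(yhat^2) / sum(y^2)
  )
}

df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
round(kvalseth_sketch(lm(y ~ x, data = df1)), 4)
round(kvalseth_sketch(lm(y ~ x - 1, data = df1)), 4)
round(kvalseth_sketch(lm(log(y) ~ log(x), data = df1), power = TRUE), 4)
```

For the intercept model, every value except $R^2_7$ and $R^2_8$ coincides with `summary(lm)$r.squared`; removing the intercept makes them diverge (here $R^2_2$ exceeds 1).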
For the degree-of-freedom adjustment applied when adjusted = TRUE, see r2_adjusted().
For r2(): An object of class r2_kvr2, which is a list containing
calculated values for all formulas.
For individual functions (r2_1() to r2_9()): A named numeric
value of the specific definition.
The power regression model must be based on a logarithmic transformation.
When type = "auto", the choice between linear and power regression
is determined by analyzing the model formula. It identifies a power
regression if the dependent variable is a function call to log()
(e.g., lm(log(y) ~ x)).
Note that simple variable names containing the string "log" (e.g.,
lm(log_value ~ x)) are correctly treated as linear regression.
To override this automatic detection, manually specify type = "linear"
or type = "power".
Tarald O. Kvalseth (1985) Cautionary Note about R², The American Statistician, 39(4), 279-285, doi:10.1080/00031305.1985.10479448
Box, George E. P., Hunter, William G., Hunter, J. Stuart. (1978) Statistics for experimenters: an introduction to design, data analysis, and model building. New York, United States, J. Wiley, p. 462-473, ISBN:9780471093152.

    # Example data set 1. Kvalseth (1985).
    df1 <- data.frame(x = c(1:6), y = c(15,37,52,59,83,92))
    # Linear regression model with intercept
    model_intercept1 <- lm(y ~ x, df1)
    # Linear regression model without intercept
    model_without1 <- lm(y ~ x - 1, df1)
    # Power regression model
    model_power1 <- lm(log(y) ~ log(x), df1)
    r2(model_intercept1)
    r2(model_without1)
    r2(model_power1)

    # Example data set 2. Kvalseth (1985).
    df2 <- data.frame(x = 6:13, y = c(3882, 1266, 733, 450, 410, 305, 185, 112))
    power_model2 <- lm(log((y/7343)) ~ log(x), data = df2)
    r2(power_model2)

    # Example of a Multiple Regression Analysis Model.
    # The data for two independent variables given by Box et al. (1978, p. 462)
    # as used in Kvalseth (1985).
    df3 <- data.frame(x1 = c(0.34, 0.34, 0.58, 1.26, 1.26, 1.82),
                      x2 = c(0.73, 0.73, 0.69, 0.97, 0.97, 0.46),
                      y = c(5.75, 4.79, 5.44, 9.09, 8.59, 5.09))
    # Multiple regression analysis model with intercept
    model_intercept3 <- lm(y ~ x1 + x2, df3)
    # Multiple regression analysis model without intercept
    model_without3 <- lm(y ~ x1 + x2 - 1, df3)
    # Multiple power regression analysis model
    model_power3 <- lm(log(y) ~ log(x1) + log(x2), df3)
    r2(model_intercept3)
    r2(model_without3)
    r2(model_power3)

Calculates the adjusted coefficient of determination from a regression model and a given coefficient of determination. See Details.

    r2_adjusted(model, r2)


| model | A linear model or power regression model of class `lm`. |
|---|---|
| r2 | Numeric. A coefficient of determination. |

The adjusted value is calculated as $R^2_{adj} = 1 - a(1 - R^2)$, where the adjustment
factor $a$ depends on the sample size $n$ and the number of parameters $k$ in the
regression model.
This function applies the degree-of-freedom adjustment to all coefficients using the formula above. However, Kvalseth (1985) recommends applying it only to two of the definitions, based on the principle of consistency among coefficients.
Furthermore, there is no basis for applying the same type of adjustment to the squared correlation coefficient or to the two definitions that depend on specific model forms.
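A hedged sketch of the adjustment for a model with an intercept, assuming the conventional factor $a = (n - 1)/(n - k)$. `adjust_r2_sketch()` is hypothetical; applied to the ordinary $R^2$ of an intercept model, it reproduces base R's `summary(lm)$adj.r.squared`.

```r
# Sketch of the adjustment 1 - a * (1 - R^2) for a model with an intercept.
# Assumption: a = (n - 1) / (n - k), the conventional factor for intercept models.
adjust_r2_sketch <- function(model, r2) {
  n <- nobs(model)           # observations used in the fit
  k <- length(coef(model))   # parameters, intercept included
  a <- (n - 1) / (n - k)
  1 - a * (1 - r2)
}

df1 <- data.frame(x = 1:6, y = c(15, 37, 52, 59, 83, 92))
m <- lm(y ~ x, data = df1)
adjust_r2_sketch(m, summary(m)$r.squared)
summary(m)$adj.r.squared   # base R's value, for comparison
```

For models without an intercept, base R's `summary.lm()` uses $n/(n - k)$ instead; whether r2_adjusted() follows the same convention is not stated here.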
For details on each coefficient of determination, refer to r2().
A numeric vector or a list of class r2_kvr2 containing the adjusted values.
Each element represents the adjusted version of the corresponding definition, accounting for the degrees of freedom.
Tarald O. Kvalseth (1985) Cautionary Note about R², The American Statistician, 39(4), 279-285, doi:10.1080/00031305.1985.10479448