Identifies aliased (linearly dependent) variables in a data frame. The function adds a constant column to the data frame, fits a linear model, and then uses alias() to detect which variables are aliased.

Usage

DetectAlias(DT, Verbose = FALSE)

Arguments

DT

A data frame or tibble containing the variables to be checked for aliasing.

Verbose

A logical value. If TRUE, any aliased variables found are printed to the console. Defaults to FALSE.

Value

Returns a character vector of aliased variable names if any are found; otherwise, returns NULL invisibly. If Verbose is TRUE, the function will also print a message to the console.
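
The approach described above (constant column, linear model, alias) and the return contract can be sketched in base R. This is a minimal illustration, not the package's source: the name detect_alias_sketch, the helper column .const, the use of the constant column as the response, and printing via message() are all assumptions made for the example.

```r
# Hypothetical stand-in for DetectAlias(); for illustration only.
detect_alias_sketch <- function(dt, verbose = FALSE) {
  dt <- as.data.frame(dt)

  # Assumption: the constant column serves as the response, so that
  # lm() has something to fit. Its values do not affect which
  # predictors are linearly dependent on the others.
  dt$.const <- 1
  fit <- stats::lm(.const ~ ., data = dt)

  # alias()$Complete has one row per aliased term and is NULL when
  # the design matrix has full rank.
  aliased <- rownames(stats::alias(fit)$Complete)

  if (length(aliased) == 0) {
    return(invisible(NULL))
  }
  if (verbose) {
    message("Aliased variables: ", paste(aliased, collapse = ", "))
  }
  aliased
}

# Usage: x2 is an exact multiple of x1, so it should be flagged.
x1 <- rnorm(100)
x2 <- 2 * x1
x3 <- rnorm(100)
detect_alias_sketch(data.frame(x1, x2, x3))
```

Because lm() pivots linearly dependent columns to the end, the later of two dependent columns (here x2) is the one reported as aliased; reordering the columns would change which one is flagged.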

Examples

library("car", warn.conflicts = FALSE, quietly = TRUE, verbose = FALSE)
x1 <- rnorm(100)
x2 <- 2 * x1
x3 <- rnorm(100)
y <- rnorm(100)

model <- lm(y ~ x1 + x2 + x3)
summary(model)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.02755 -0.70035 -0.03393  0.72138  2.50283 
#> 
#> Coefficients: (1 not defined because of singularities)
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) -0.06616    0.10346  -0.639   0.5240  
#> x1           0.10764    0.09810   1.097   0.2753  
#> x2                NA         NA      NA       NA  
#> x3           0.24804    0.10164   2.440   0.0165 *
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.025 on 97 degrees of freedom
#> Multiple R-squared:  0.06749,	Adjusted R-squared:  0.04826 
#> F-statistic:  3.51 on 2 and 97 DF,  p-value: 0.03375
#> 

# car::vif() fails because the model contains aliased coefficients
try(car::vif(model))
#> Error in vif.default(model) : there are aliased coefficients in the model

# DetectAlias() identifies the aliased variables
DetectAlias(DT = cbind.data.frame(x1, x2, x3))
#> [1] "x2"

DetectAlias(DT = cbind.data.frame(x1, x2, x3), Verbose = TRUE)
#> Aliased variables: x2
#> [1] "x2"

# exclude x2 and refit the model
model <- lm(y ~ x1 + x3)

summary(model)
#> 
#> Call:
#> lm(formula = y ~ x1 + x3)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.02755 -0.70035 -0.03393  0.72138  2.50283 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) -0.06616    0.10346  -0.639   0.5240  
#> x1           0.10764    0.09810   1.097   0.2753  
#> x3           0.24804    0.10164   2.440   0.0165 *
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.025 on 97 degrees of freedom
#> Multiple R-squared:  0.06749,	Adjusted R-squared:  0.04826 
#> F-statistic:  3.51 on 2 and 97 DF,  p-value: 0.03375
#> 

try(car::vif(model))
#>       x1       x3 
#> 1.000718 1.000718