
This function identifies aliased (linearly dependent) variables in a data frame. It adds a constant column to the data frame, fits a linear model, and then uses the alias() function to detect the aliased variables.
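The procedure described above can be sketched in base R as follows. This is a minimal illustration, not the package's source code. It assumes that the added constant column serves as the response of lm(), so only the predictors' design matrix determines which terms are aliased. The helper name detect_alias_sketch() and the column name .Const are hypothetical. Because alias() reports coefficient names, a factor predictor would be reported per dummy-coded level rather than by its variable name.

```r
# Hypothetical sketch of the approach described above; not the package source.
# Assumption: the added constant column is used as the lm() response.
detect_alias_sketch <- function(DT, Verbose = FALSE) {
  DT <- as.data.frame(DT)
  # Hypothetical name for the added constant column
  DT$.Const <- 1
  # Regress the constant on all other columns; linear dependence is a
  # property of the predictors, so the response value does not matter
  fit <- stats::lm(.Const ~ ., data = DT)
  # alias()$Complete has one row per coefficient that is an exact linear
  # combination of the others; it is NULL when nothing is aliased
  complete <- stats::alias(fit)$Complete
  aliased <- rownames(complete)
  if (length(aliased) == 0) {
    return(invisible(NULL))
  }
  if (Verbose) {
    message("Aliased variables: ", paste(aliased, collapse = ", "))
  }
  aliased
}

set.seed(1)
x1 <- rnorm(100)
x2 <- 2 * x1
x3 <- rnorm(100)
detect_alias_sketch(data.frame(x1, x2, x3), Verbose = TRUE)
```

Here x2 is reported because it is an exact multiple of x1, which lm() encounters first when pivoting out the linearly dependent column.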

Usage

DetectAlias(DT, Verbose = FALSE)

Arguments

DT

A data frame or tibble containing the variables to be checked for aliasing.

Verbose

Logical. If TRUE, the names of any aliased variables are printed to the console. Defaults to FALSE.

Value

Returns a character vector of aliased variable names if any are found; otherwise, returns NULL invisibly. If Verbose is TRUE, the function will also print a message to the console.

Author

Ahmed El-Gabbas

Examples

library("car", warn.conflicts = FALSE, quietly = TRUE, verbose = FALSE)
x1 <- rnorm(100)
x2 <- 2 * x1
x3 <- rnorm(100)
y <- rnorm(100)

model <- lm(y ~ x1 + x2 + x3)
summary(model)
#> 
#> Call:
#> lm(formula = y ~ x1 + x2 + x3)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.02755 -0.70035 -0.03393  0.72138  2.50283 
#> 
#> Coefficients: (1 not defined because of singularities)
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) -0.06616    0.10346  -0.639   0.5240  
#> x1           0.10764    0.09810   1.097   0.2753  
#> x2                NA         NA      NA       NA  
#> x3           0.24804    0.10164   2.440   0.0165 *
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.025 on 97 degrees of freedom
#> Multiple R-squared:  0.06749,	Adjusted R-squared:  0.04826 
#> F-statistic:  3.51 on 2 and 97 DF,  p-value: 0.03375
#> 

# car::vif() fails because the model contains aliased coefficients
try(car::vif(model))
#> Error in vif.default(model) : there are aliased coefficients in the model

# The function identifies the aliased variables
DetectAlias(DT = cbind.data.frame(x1, x2, x3))
#> [1] "x2"

DetectAlias(DT = cbind.data.frame(x1, x2, x3), Verbose = TRUE)
#> Aliased variables: x2
#> [1] "x2"

# Exclude x2 and refit the model
model <- lm(y ~ x1 + x3)

summary(model)
#> 
#> Call:
#> lm(formula = y ~ x1 + x3)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -2.02755 -0.70035 -0.03393  0.72138  2.50283 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) -0.06616    0.10346  -0.639   0.5240  
#> x1           0.10764    0.09810   1.097   0.2753  
#> x3           0.24804    0.10164   2.440   0.0165 *
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 1.025 on 97 degrees of freedom
#> Multiple R-squared:  0.06749,	Adjusted R-squared:  0.04826 
#> F-statistic:  3.51 on 2 and 97 DF,  p-value: 0.03375
#> 

try(car::vif(model))
#>       x1       x3 
#> 1.000718 1.000718