Optimization Techniques for SCAD Variable Selection in Medical Research

High-dimensional data analysis requires variable selection to identify the truly relevant variables. More often than not, this is done implicitly via regularization, as in penalized regression. Among the many penalties proposed, SCAD has shown good properties and has been widely adopted in medical research and many other areas. This paper reviews the optimization techniques used to solve SCAD penalized regression.


Introduction
High-dimensional data analysis has become a common and important topic in biomedical, genomic, and clinical studies. For example, identifying the genetic factors behind complex diseases such as lung cancer involves a variety of genetic variants. High-dimensional data bring the well-known curse of dimensionality into modeling, so variable selection is a fundamental task in high-dimensional statistical modeling. The "old school" approach is to run a subset selection procedure before building the model of interest. Such a procedure commonly uses AIC or BIC as the evaluation metric and often iterates in a stepwise fashion. Because it is carried out independently of the subsequent modeling task, however, its effectiveness can be limited. A more natural approach is to integrate variable selection into the modeling itself through penalized regression, which performs variable selection and coefficient estimation simultaneously.
Theoretically, the "best" penalty for penalized regression is the number of non-zero coefficients, which pushes as many coefficients to zero as possible. Yet it is well known that optimization under this L0 penalty (also known as the entropy penalty) [1] is computationally infeasible. The L1 (LASSO) penalty of Tibshirani [2] is therefore the "next best" candidate, and it is widely adopted in the statistics and machine learning communities for obtaining sparse solutions. However, Fan and Li [3] point out that the L1 penalty yields biased estimates. They propose the Smoothly Clipped Absolute Deviation (SCAD) penalty, which produces estimates that are nearly unbiased for large coefficients while retaining the good properties of L1. Subsequently, the SCAD penalty has seen a wide range of applications, including medical and clinical research [1,4-7]. Nevertheless, estimation in SCAD penalized regression is no trivial task, because the target function (a) is a high-dimensional non-concave function, (b) is singular at the origin, and (c) does not have continuous second-order derivatives.
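For reference, the SCAD penalty is usually written as follows. The symbols are notation introduced here rather than taken from the text above: \lambda > 0 is the tuning parameter and a > 2 is a shape parameter, for which a = 3.7 is the commonly suggested value.

```latex
p_\lambda(\theta) =
\begin{cases}
\lambda|\theta|, & |\theta| \le \lambda, \\[4pt]
\dfrac{2a\lambda|\theta| - \theta^2 - \lambda^2}{2(a-1)}, & \lambda < |\theta| \le a\lambda, \\[8pt]
\dfrac{(a+1)\lambda^2}{2}, & |\theta| > a\lambda,
\end{cases}
\qquad
p'_\lambda(\theta) = \lambda\left\{ I(\theta \le \lambda)
  + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \right\}
\quad (\theta > 0).
```

The three pieces account for the difficulties listed above: the kink at \theta = 0 makes the penalty singular at the origin, the middle quadratic piece makes it non-convex, and the derivative is continuous but has kinks at \lambda and a\lambda, so the second derivative is discontinuous.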

Optimization Techniques for SCAD
We now review the optimization techniques that have been proposed over the years for solving SCAD penalized regression.

Local Quadratic Approximation (LQA) Algorithm
Because the SCAD penalty is not differentiable at the origin, Fan and Li [3], in the original paper, propose approximating it locally by a quadratic function, which is amenable to optimization. The drawback of this approximation is that once a coefficient is pushed to zero, it remains zero in all subsequent iterations. Another problem, noted by Zou and Li [8], is that the LQA estimator does not have a sparse representation.
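The LQA idea can be sketched for a least-squares loss. This is a minimal illustration, not the implementation of [3]: it assumes the objective (1/2n)||y - X beta||^2 + sum_j p_lambda(|beta_j|), and the names lqa_scad and scad_derivative, the default a = 3.7, and the deletion threshold eps are all choices made here. Each iteration replaces the penalty by a quadratic around the current estimate, which turns the update into a ridge-type linear solve.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |t|."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def lqa_scad(X, y, lam, a=3.7, n_iter=100, eps=1e-6, tol=1e-8):
    """LQA sketch: repeated ridge-type solves on the active coefficients."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # unpenalized starting value
    active = np.ones(p, dtype=bool)
    for _ in range(n_iter):
        # A coefficient that falls below eps is deleted and never re-enters.
        active &= np.abs(beta) > eps
        beta[~active] = 0.0
        if not active.any():
            break
        Xa = X[:, active]
        # Quadratic approximation weights p'(|b_j|) / |b_j|.
        d = scad_derivative(beta[active], lam, a) / np.abs(beta[active])
        new = np.linalg.solve(Xa.T @ Xa / n + np.diag(d), Xa.T @ y / n)
        converged = np.max(np.abs(new - beta[active])) < tol
        beta[active] = new
        if converged:
            break
    beta[np.abs(beta) < eps] = 0.0
    return beta
```

The hard deletion at eps makes both drawbacks visible: a deleted coefficient can never return, and without the threshold the ridge-type update would only shrink small coefficients toward zero rather than set them exactly to zero.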

Local Linear Approximation (LLA) Algorithm
Zou and Li [8] develop the local linear approximation (LLA) algorithm for maximizing non-concave penalized likelihood models. LLA addresses the issues of LQA by approximating the SCAD penalty with a symmetric linear function instead of a quadratic one. The LLA algorithm also has the ascent property, so it enjoys the convergence guarantee of EM-type algorithms. However, it is rather sensitive to the initial estimator and can easily get stuck in local optima, as noted by Mazumder, Friedman and Hastie [9].
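A minimal sketch of LLA for a least-squares loss follows, under assumptions made here: the objective is (1/2n)||y - X beta||^2 + sum_j p_lambda(|beta_j|), the starting value is the ordinary least-squares fit, and each linearized problem (a weighted lasso) is solved by plain coordinate descent. The function names and defaults are illustrative and are not taken from [8].

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty evaluated at |t|."""
    t = np.abs(t)
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1.0))

def weighted_lasso_cd(X, y, weights, beta0, n_sweeps=200, tol=1e-8):
    """Coordinate descent for (1/2n)||y - Xb||^2 + sum_j weights[j] * |b_j|."""
    n, p = X.shape
    beta = beta0.copy()
    col_ss = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_ss[j] * old
            # Soft-thresholding with a coefficient-specific threshold.
            new = np.sign(rho) * max(abs(rho) - weights[j], 0.0) / col_ss[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta

def lla_scad(X, y, lam, a=3.7, n_steps=3):
    """LLA sketch: each step linearizes SCAD and solves a weighted lasso."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # initial estimator
    for _ in range(n_steps):
        w = scad_derivative(beta, lam, a)         # weights p'(|beta_j|)
        beta = weighted_lasso_cd(X, y, w, beta)
    return beta
```

Because each subproblem is a lasso, exact zeros arise naturally from the soft-thresholding step, and a coefficient that is zero at one step can re-enter at the next. The dependence on the initial estimator is also explicit: the weights are computed from whatever starting value is supplied.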

Difference Convex Algorithm (DCA)
The idea of the difference convex algorithm (DCA) of An and Tao [10] is to decompose the target function into the difference of two convex functions. Applying this technique to the non-convex SCAD problem, Wu and Liu [11] write the target function as the difference of two convex functions and then further approximate the second convex function by a linear function, as in LLA. Unlike LLA, however, the DCA approach does not enforce symmetry in its approximation.
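One standard way to write the SCAD penalty p_\lambda as a difference of two convex functions is shown below. The notation (\lambda > 0, a > 2, and the name q_\lambda) is introduced here for illustration, and the exact form used in [11] may differ.

```latex
p_\lambda(\theta) = \lambda|\theta| - q_\lambda(|\theta|),
\qquad
q_\lambda(t) =
\begin{cases}
0, & 0 \le t \le \lambda, \\[4pt]
\dfrac{(t-\lambda)^2}{2(a-1)}, & \lambda < t \le a\lambda, \\[8pt]
\lambda t - \dfrac{(a+1)\lambda^2}{2}, & t > a\lambda.
\end{cases}
```

Both \lambda|\theta| and q_\lambda(|\theta|) are convex, since q_\lambda has the nondecreasing derivative 0, (t-\lambda)/(a-1), \lambda on the three pieces. A DCA step replaces q_\lambda by its tangent line at the current iterate \theta^{(k)}, so the convex surrogate to minimize is the loss plus \sum_j \{ \lambda|\theta_j| - q'_\lambda(|\theta_j^{(k)}|)\,\mathrm{sign}(\theta_j^{(k)})\,\theta_j \}.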

Second Order Cone Programming (SOCP)
Second-order cone programming (SOCP) minimizes a linear function over the intersection of an affine set and a product of second-order (quadratic) cones, as described by Lobo et al. [12]. Noh et al. [13] use this nonlinear convex approach to estimate SCAD penalized regression. In addition, the authors show that the estimated relevant coefficients converge to the true functions at the univariate optimal rate.
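The description above corresponds to the usual standard form of a second-order cone program. The symbols f, A_i, b_i, c_i, d_i, F and g are generic problem data introduced here for illustration, not quantities from [13].

```latex
\begin{aligned}
\min_{x} \quad & f^{\top} x \\
\text{subject to} \quad & \lVert A_i x + b_i \rVert_2 \le c_i^{\top} x + d_i, \qquad i = 1, \dots, N, \\
& F x = g .
\end{aligned}
```

Each inequality requires the pair (A_i x + b_i, c_i^{\top} x + d_i) to lie in a second-order cone, and the equality constraint F x = g is the affine set.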

Alternating Direction Method of Multipliers (ADMM)
Yin and Zeng [14] apply ADMM, a primal-dual optimization technique, to a family of optimization tasks that includes SCAD. The method has a theoretical convergence guarantee, and Bertsimas, Copenhaver and Mazumder [15] show that it performs best among several heuristic approaches.
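A minimal ADMM sketch for SCAD penalized least squares follows. It is not claimed to be the algorithm of [14]: it assumes the splitting beta = z for the objective (1/2n)||y - X beta||^2 + sum_j p_lambda(|z_j|), a fixed step parameter rho with rho * (a - 1) > 1 so that the z-update has a closed form, and the scaled dual variable u. All names and defaults are illustrative.

```python
import numpy as np

def scad_prox(v, lam, rho=1.0, a=3.7):
    """Elementwise argmin_z 0.5*rho*(z - v)**2 + p_lam(|z|); needs rho*(a-1) > 1."""
    absv = np.abs(v)
    # Small inputs: soft-thresholding, as for the lasso.
    soft = np.sign(v) * np.maximum(absv - lam / rho, 0.0)
    # Middle range: solution on the quadratic piece of SCAD.
    mid = np.sign(v) * (rho * (a - 1.0) * absv - a * lam) / (rho * (a - 1.0) - 1.0)
    # Large inputs are left unchanged, so large coefficients are not shrunk.
    return np.where(absv <= lam * (1.0 + 1.0 / rho), soft,
                    np.where(absv <= a * lam, mid, v))

def admm_scad(X, y, lam, rho=1.0, a=3.7, n_iter=500, tol=1e-8):
    """ADMM sketch with scaled dual variable u; returns the sparse iterate z."""
    n, p = X.shape
    A = X.T @ X / n + rho * np.eye(p)
    Xty = X.T @ y / n
    z = np.zeros(p)
    u = np.zeros(p)
    for _ in range(n_iter):
        beta = np.linalg.solve(A, Xty + rho * (z - u))   # smooth least-squares step
        z_new = scad_prox(beta + u, lam, rho, a)         # SCAD thresholding step
        u = u + beta - z_new                             # dual update
        primal = np.max(np.abs(beta - z_new))
        dual = rho * np.max(np.abs(z_new - z))
        z = z_new
        if max(primal, dual) < tol:
            break
    return z
```

With rho = 1 the z-update reduces to the familiar SCAD thresholding rule. Returning z rather than beta gives exact zeros, because z is the output of the thresholding step.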

Conclusion
In this paper we briefly introduced the variable selection problem in high-dimensional modeling and noted that it is more naturally handled via penalized regression. Among the many penalties available, SCAD stands out for its desirable properties. We reviewed the various optimization techniques for SCAD that have been proposed over the years.