StatsWAP2009Aug07
== Nonlinear Regression Models ==

* [http://www.biostat.jhsph.edu/~bcaffo/ Brian Caffo's website]
* Series Home: [[StatsWAP|Statistics Without the Agonizing Pain (WAP Series)]]

== Resources ==

* Slides: [http://iacl.ece.jhu.edu/~bennett/files/statisticsWAP.pptx PPTX] {{file|StatisticsWAP.pdf}}
* R code available [http://iacl.ece.jhu.edu/~bennett/files/examples.R here]

Latest revision as of 16:03, 13 August 2009

== Notes ==

* Not covered: kernel smoothing, local weighting, moving averages, binning, loess (local estimation), etc.
* Non-parametric regression
** can factor in <math>y=f(x)+\mbox{other stuff}</math>
** confounding effects
** interactions
** can generalize to discrete and/or multivariate responses (logistic regression, etc.)
* Example bases
** linear
** polynomial (Taylor series expansion)
*** why not? it works... sort of
*** not good for smoothing: not "localized", not "parsimonious" ==> takes many terms to fit anything that is not exactly polynomial
** See the slide on general functions for tips on selecting basis sets.
*** wavelet bases - smooth trends and spikes
**** can be the "same" as the wavelet transform, slowly
*** trigonometric (Fourier) bases - the "frequency concept"
**** can be the "same" as the Fourier transform, slowly
*** spline bases - general smoothing
**** We'll talk about these today. Good for general smoothing. General purpose, but do not preserve spikes.
* Pick the basis for the eventual goal.
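The basis-expansion idea in the notes above — nonlinear in x but linear in the coefficients — can be sketched as an ordinary least-squares fit. This is a pure-Python illustration (all function names here are mine, not from the talk); in an actual analysis you would hand a basis to R's lm().

```python
# Sketch: nonlinear regression as linear least squares in a basis expansion.
# We fit y ~ sum_k beta_k * phi_k(x) with a polynomial basis phi_k(x) = x**k,
# solving the normal equations directly. Illustrative only; orthogonalized
# bases (e.g. B-splines, below) are far better conditioned in practice.

def design_matrix(xs, degree):
    """Polynomial basis expansion: row [1, x, x^2, ..., x^degree]."""
    return [[x ** k for k in range(degree + 1)] for x in xs]

def solve(A, b):
    """Solve the square system A beta = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def least_squares(X, y):
    """beta = (X'X)^{-1} X'y via the normal equations."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(XtX, Xty)

# Noise-free data from y = 2 + x^2: a degree-2 basis recovers it exactly.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [2.0 + x * x for x in xs]
beta = least_squares(design_matrix(xs, 2), ys)
print([round(b, 6) + 0.0 for b in beta])  # ~ [2.0, 0.0, 1.0]
```

The same machinery fits any basis — swap `design_matrix` for a wavelet, Fourier, or spline expansion and the solve step is unchanged, which is the point of the "pick the basis for the eventual goal" remark.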


== Spline Bases ==

* Highly controversial topic on spline fitting: fun reading [1]
* Gauss and the "invention" of least squares: [2]
* Big question: where to put the knots
** known inflection points
** quantiles
* Only a few knot points can be fit with LS
* If the number of parameters approaches the number of data points, there are fitting problems.
* B-spline bases are equivalent, but the basis functions are much closer to orthogonal, so the solution is much more efficient and stable than with truncated polynomial terms. It is a slight change of basis, but the fitted functions are the same.
* In R: bs (library "splines")
* Natural splines: like B-splines, but with a linearity restriction near the edges
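A minimal sketch of the truncated-power spline basis and knot idea above, again in pure Python for illustration; R's splines::bs() builds the better-conditioned B-spline version of the same function space. Helper names are my own.

```python
# Degree-1 spline via the truncated power basis: columns [1, x] plus one
# hinge (x - k)_+ per knot. Each knot lets the fitted slope change there.

def truncated_power_basis(xs, knots):
    """Row [1, x, (x - k1)_+, (x - k2)_+, ...] for each x."""
    return [[1.0, x] + [max(x - k, 0.0) for k in knots] for x in xs]

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for A beta = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def least_squares(X, y):
    """beta = (X'X)^{-1} X'y via the normal equations."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(XtX, Xty)

# Piecewise-linear truth: slope 1 before the knot at x = 1, slope 3 after,
# i.e. y = x + 2 * (x - 1)_+. The spline basis recovers it exactly.
knots = [1.0]
xs = [0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [x + 2.0 * max(x - 1.0, 0.0) for x in xs]
beta = least_squares(truncated_power_basis(xs, knots), ys)
print([round(b, 6) + 0.0 for b in beta])  # ~ [0.0, 1.0, 2.0]
```

The stability complaint in the notes is visible here: with many knots, truncated-power columns become nearly collinear, which is why bs() re-expresses the same space in near-orthogonal B-splines.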


== Bias / Variance Tradeoff ==

* Cross-validation: leave out a data point, assess the prediction, optimize the tuning parameters to minimize prediction error
** computational concerns
** "generalized cross-validation": a way of approximating cross-validation
* Mixed-model P-splines:
** fit a generous number of knots
** place a Gaussian penalty on the wiggly terms
** jointly estimate the coefficients and the penalty
** ties in with mixed-effects models
** library SemiPar: spm(y ~ f(x))
** there are good ways of estimating variance components in mixed models: REML (less biased estimates than ML)
** package mgcv: gam
** NOTE: an L1 penalty would correspond to a double-exponential distribution (penalty) on the "wiggle" terms
** Smoothing vs. estimation: an L1 penalty tends not to "smooth" as well; it selects certain knots over others, so it is better for model selection than for regression/smooth fitting.
** An L2 penalty is also known as ridge regression.
* Easily extends to additive models
* Can use standard model output; can test whether coefficients are "0"
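To make the penalized-spline recipe above concrete, here is a toy pure-Python sketch: a generous set of degree-1 truncated-power knots, an L2 (ridge / Gaussian) penalty on the knot coefficients only, and leave-one-out cross-validation over a small grid of penalty weights. The data, knots, and grid are invented for illustration; this is a stand-in for what SemiPar's spm() or mgcv's gam() do with far better numerics and REML-based smoothness selection.

```python
# P-spline sketch: generous knots + ridge penalty on the "wiggly" terms,
# with the penalty weight chosen by leave-one-out cross-validation.

def basis(xs, knots):
    """Degree-1 truncated power basis: [1, x, (x - k)_+ per knot]."""
    return [[1.0, x] + [max(x - k, 0.0) for k in knots] for x in xs]

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for A beta = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ridge_fit(X, y, lam, n_wiggly):
    """Penalized LS: lam is added to the diagonal of X'X, but only for
    the last n_wiggly (knot) coefficients -- the wiggly terms."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    for j in range(p - n_wiggly, p):
        XtX[j][j] += lam
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(XtX, Xty)

def loocv_error(xs, ys, knots, lam):
    """Mean squared leave-one-out prediction error for one penalty weight."""
    err = 0.0
    for i in range(len(xs)):
        tr_x, tr_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        beta = ridge_fit(basis(tr_x, knots), tr_y, lam, len(knots))
        row = basis([xs[i]], knots)[0]
        pred = sum(b * v for b, v in zip(beta, row))
        err += (pred - ys[i]) ** 2
    return err / len(xs)

# Linear truth plus fixed "noise"; five knots would happily chase the
# noise without the penalty.
xs = [i / 10.0 for i in range(11)]
noise = [0.05, -0.04, 0.03, -0.05, 0.04, -0.03, 0.05, -0.04, 0.03, -0.05, 0.04]
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, noise)]
knots = [0.15, 0.35, 0.55, 0.75, 0.9]
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
errors = {lam: loocv_error(xs, ys, knots, lam) for lam in grid}
best = min(errors, key=errors.get)
print(best, round(errors[best], 4))
```

Note that only the knot coefficients are penalized, so the intercept and linear trend are never shrunk; that is exactly the "penalty on wiggly terms" structure that lets P-splines be rewritten as a mixed-effects model.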


== Favorite References ==

* Ruppert, Wand, and Carroll, ''Semiparametric Regression''
* Hastie and Tibshirani, ''Generalized Additive Models''
* Venables and Ripley, ''Modern Applied Statistics with R''