class: center, middle, inverse, title-slide

.title[
# Chapter 7: Moving Beyond Linearity
]
.author[
### Chun Su
]
.institute[
### R-ladies Philly
]
.date[
### 2022-09-13
]

---

## The extensions of linear models

- Polynomial regression
- Step functions
- Splines
  - polynomial splines
  - natural splines
  - smoothing splines
- Local regression
- Generalized Additive Models (GAMs)

---

## Polynomial regression

- raises each of the original predictors to a power

![](figures/polynomial_function.png)

- degrees greater than 3 or 4 are rarely used: for large values of X, a high-degree polynomial curve becomes overly flexible and can take on some very strange shapes


```r
lm(wage ~ poly(age, 4, raw = TRUE), data = Wage)
```

---

## Step functions

- K cut points break the range of X into K + 1 bins, and a different **constant** is fit in each bin using **indicator functions**. Each indicator function equals 0 or 1, and for a given observation exactly one of the K + 1 indicators equals 1.

![](figures/step_function.png)


```r
lm(wage ~ cut(age, 4), data = Wage)
```

--

### Basis functions

![](figures/basic_function.png)

---

## Regression splines

- Piecewise polynomials:
  - **knots** divide the range of X into regions
  - a polynomial of a chosen **degree** is fit within each region

--

- polynomial spline (`splines::bs`)
  - a piecewise polynomial with constraints at the knots
  - three constraints: continuity, continuity of the first derivative, and continuity of the second derivative
  - **degrees of freedom** = number_of_knots + degree_of_polynomial + number_of_intercepts
  - in practice it is common to place knots at uniform quantiles of the data

--

- natural spline (`splines::ns`)
  - a polynomial spline with additional boundary constraints: the function is required to be linear beyond the boundary knots
  - **degrees of freedom** = number_of_knots + 1 + number_of_intercepts
  - the polynomial degree is usually 3 (cubic)
  - `lm(wage ~ ns(age, df = 4), data = Wage)` uses least squares to estimate the spline coefficients

???
https://stats.stackexchange.com/questions/517375/splines-relationship-of-knots-degree-and-degrees-of-freedom

---

## smoothing spline (`smooth.spline`)

- a function g that makes RSS small but that is also smooth: "Loss + Penalty"

![](figures/smooth_spline.png)

- the first derivative g'(t) measures the slope of a function at t
- the second derivative g''(t) is a measure of its roughness
- the function g(x) that minimizes (7.11) is a shrunken natural cubic spline with knots at x1, ..., xn
- the **effective degrees of freedom** df_lambda decrease from n to 2 as lambda increases from 0 to infinity


```r
fit <- smooth.spline(Wage$age, Wage$wage, df = 16)
```

---

## local regression (`stats::loess`)

- fits at a target point x0 using only the nearby training observations
- the hyperparameter **span s** is the proportion of points used to compute the local regression at x0
  - the smaller the value of s, the more local and wiggly the fit
  - s = k/n, where k is the number of nearest neighbors used
- fit with a **weighted** least squares regression


```r
fit <- loess(wage ~ age, span = 0.2, data = Wage)
```
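Once fitted, the loess object can be evaluated at new target points with `predict()` (a minimal sketch; the three ages below are arbitrary illustration values, not from the lab):


```r
# predict wage at a few target ages, with standard errors
pred <- predict(fit, newdata = data.frame(age = c(25, 40, 60)), se = TRUE)
pred$fit     # point predictions at the three ages
pred$se.fit  # their standard errors
```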
---

## Generalized additive models (GAMs)

- provide a general framework for extending a standard linear model by allowing non-linear functions of each of the variables, while maintaining additivity
- a big linear regression model using an appropriate choice of basis functions


```r
library(gam)  # provides gam(), s(), and lo()

gam1 <- lm(wage ~ ns(year, 4) + ns(age, 5) + education, data = Wage)
gam.s <- gam(wage ~ s(age, df = 16) + education, data = Wage)
gam.lo <- gam(wage ~ lo(age, span = 0.7) + education, data = Wage)
```

---

## data budget


```r
library(tidymodels)
library(ISLR)

Wage <- as_tibble(Wage)

# initial split
set.seed(123)
wage_split <- initial_split(Wage)
wage_train <- training(wage_split)
wage_test <- testing(wage_split)

# cv folds
set.seed(234)
wage_folds <- vfold_cv(wage_train)
```

---

## polynomial regression tune

The degree of the polynomial is set with `recipes::step_poly()` at the feature engineering step.


```r
# recipe
rec_poly <- recipe(wage ~ age, data = wage_train) %>%
  step_poly(age, degree = tune(), options = list(raw = TRUE))

# model
lm_spec <- linear_reg() %>%
  set_mode("regression") %>%
  set_engine("lm")

# workflow
poly_wf <- workflow() %>%
  add_model(lm_spec) %>%
  add_recipe(rec_poly)

# specify hyperparameter grid
degree_grid <- tibble(degree = 1:5)

# tune
poly_rs <- poly_wf %>%
  tune_grid(
    resamples = wage_folds,
    grid = degree_grid
  )
```

---

## select best degree of polynomial

![](chapter7_beyond_linearity2_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

## last fit using best degree of polynomial


```r
best_hyper <- select_by_one_std_err(poly_rs, metric = "rmse", degree)
best_hyper
```

```
## # A tibble: 1 × 9
##   degree .metric .estimator  mean     n std_err .config              .best .bound
##    <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>                <dbl>  <dbl>
## 1      2 rmse    standard    39.8    10   0.631 Preprocessor2_Model1  39.8   40.4
```

```r
poly_wf_final <- poly_wf %>%
  finalize_workflow(best_hyper) %>%
  last_fit(wage_split)

poly_wf_final %>%
  extract_workflow() %>%
  saveRDS("wage_poly_model.rds")
```

---

## final model metrics and parameters

- metrics on test data


```r
poly_wf_final %>%
  collect_metrics()
```

```
## # A tibble: 2 × 4
##   .metric .estimator .estimate .config             
##   <chr>   <chr>          <dbl> <chr>               
## 1 rmse    standard     40.4    Preprocessor1_Model1
## 2 rsq     standard      0.0495 Preprocessor1_Model1
```

- check coefficients


```r
poly_wf_final %>%
  extract_fit_parsnip() %>%
  tidy()
```

```
## # A tibble: 3 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept) -18.2      9.23        -1.98 4.82e- 2
## 2 age_poly_1    5.70     0.439       13.0  2.74e-37
## 3 age_poly_2   -0.0574   0.00500    -11.5  9.72e-30
```

---

## prediction using finalized polynomial

![](chapter7_beyond_linearity2_files/figure-html/unnamed-chunk-13-1.png)<!-- -->


```r
trained_poly_wf <- readRDS("wage_poly_model.rds")
trained_poly_wf %>%
  augment(new_data = data.frame(age = 50))
```

```
## # A tibble: 1 × 2
##     age .pred
##   <dbl> <dbl>
## 1    50  123.
```

---

## step function

`step_discretize()` converts a numeric variable into a factor variable with n bins, where n is specified with `num_breaks`.

If you already know where you want the step function to break, you can use `step_cut()` instead and supply the `breaks` manually, as sketched after the metrics below.


```r
rec_discretize <- recipe(wage ~ age, data = Wage) %>%
  step_discretize(age, num_breaks = 4)

discretize_wf <- workflow() %>%
  add_model(lm_spec) %>%
  add_recipe(rec_discretize)

discretize_rs <- fit_resamples(
  discretize_wf,
  resamples = wage_folds
)

discretize_rs %>%
  collect_metrics()
```

```
## # A tibble: 2 × 6
##   .metric .estimator   mean     n std_err .config             
##   <chr>   <chr>       <dbl> <int>   <dbl> <chr>               
## 1 rmse    standard  40.5       10 0.681   Preprocessor1_Model1
## 2 rsq     standard   0.0694    10 0.00956 Preprocessor1_Model1
```
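A minimal sketch of the `step_cut()` variant (the cut points below are arbitrary illustration values, not tuned):


```r
# same model and workflow, but with manually chosen break points
rec_cut <- recipe(wage ~ age, data = Wage) %>%
  step_cut(age, breaks = c(30, 45, 60))

cut_wf <- workflow() %>%
  add_model(lm_spec) %>%
  add_recipe(rec_cut)
```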
???
`fit_resamples()` does not tune anything; its only purpose is computing the `.metrics` to estimate performance.

---

## step function final fit


```r
discretize_wf_final <- discretize_wf %>%
  last_fit(wage_split)

discretize_wf_final %>%
  extract_fit_parsnip() %>%
  tidy()
```

```
## # A tibble: 4 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     93.5      1.69     55.2  0       
## 2 agebin2         24.2      2.39     10.1  1.34e-23
## 3 agebin3         28.2      2.36     12.0  4.46e-32
## 4 agebin4         22.4      2.47      9.08 2.23e-19
```

---

## step function prediction

![](chapter7_beyond_linearity2_files/figure-html/unnamed-chunk-17-1.png)<!-- -->

---

## polynomial splines / cubic splines

In the book's example, the author directly specifies the internal breakpoints (knots) that define the spline; with multiple knots, these are usually placed at quantiles of the data. Here we instead tune the number of knots through the degrees of freedom `deg_free`. For more details, see `?step_bs` or `?bs`.


```r
rec_cubic <- recipe(wage ~ age, data = Wage) %>%
  step_bs(age, deg_free = tune(), degree = 3)

cubic_wf <- workflow() %>%
  add_model(lm_spec) %>%
  add_recipe(rec_cubic)

grid_df <- grid_regular(
  deg_free(range = c(4, 12)),
  levels = 5
)

cubic_rs <- tune_grid(
  cubic_wf,
  resamples = wage_folds,
  grid = grid_df
)
```

---

## select best hyperparameter (`deg_free`)


```r
best_hyper <- select_by_one_std_err(cubic_rs, metric = "rmse", deg_free)
best_hyper
```

```
## # A tibble: 1 × 9
##   deg_free .metric .estimator  mean     n std_err .config              .best .bound
##      <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>                <dbl>  <dbl>
## 1        4 rmse    standard    39.8    10   0.625 Preprocessor1_Model1  39.8   40.4
```

```r
cubic_wf_final <- cubic_wf %>%
  finalize_workflow(best_hyper) %>%
  last_fit(wage_split)
```

---

## final model metrics and parameters

- metrics on test data


```r
cubic_wf_final %>%
  collect_metrics()
```

```
## # A tibble: 2 × 4
##   .metric .estimator .estimate .config             
##   <chr>   <chr>          <dbl> <chr>               
## 1 rmse    standard     40.3    Preprocessor1_Model1
## 2 rsq     standard      0.0527 Preprocessor1_Model1
```

- check coefficients


```r
cubic_wf_final %>%
  extract_fit_parsnip() %>%
  tidy()
```

```
## # A tibble: 5 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     54.5      6.01      9.06 2.78e-19
## 2 age_bs_1        46.6      9.81      4.75 2.12e- 6
## 3 age_bs_2        84.8      7.28     11.7  1.64e-30
## 4 age_bs_3        53.8     11.9       4.53 6.19e- 6
## 5 age_bs_4        37.3     13.9       2.68 7.49e- 3
```

---

## Compare predictions among multiple models

.pull-left[
![](chapter7_beyond_linearity2_files/figure-html/unnamed-chunk-22-1.png)<!-- -->
]

.pull-right[
![](chapter7_beyond_linearity2_files/figure-html/unnamed-chunk-23-1.png)<!-- -->
]
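---

## Compare predictions: a sketch

Prediction curves like the ones above can be generated along these lines (a minimal sketch, assuming the finalized workflows from the previous slides are still in memory; the model labels are my own):


```r
# predict over a grid of ages with each trained workflow
age_grid <- tibble(age = seq(min(Wage$age), max(Wage$age), by = 1))

preds <- bind_rows(
  augment(extract_workflow(poly_wf_final), age_grid) %>%
    mutate(model = "polynomial"),
  augment(extract_workflow(discretize_wf_final), age_grid) %>%
    mutate(model = "step function"),
  augment(extract_workflow(cubic_wf_final), age_grid) %>%
    mutate(model = "cubic spline")
)

# overlay the fitted curves on the raw data
ggplot(Wage, aes(age, wage)) +
  geom_point(alpha = 0.1) +
  geom_line(data = preds, aes(y = .pred, color = model))
```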