
(Regularized) Logistic Regression

Cengiz Zopluoglu

College of Education, University of Oregon

Nov 7 & 14, 2022
Eugene, OR

1 / 43

The goals for the next two weeks:

  • Overview of the Logistic Regression

    • Linear Probability Model

    • Model Description

    • Model Estimation

    • Model Performance Evaluation

  • Regularized Logistic Regression

    • Ridge penalty

    • Lasso penalty

    • Elastic Net

  • Review of Kaggle notebooks for building classification models

2 / 43

Demo Dataset for Two Group Classification

A random sample of 20 observations from the Recidivism dataset

recidivism_sub <- read.csv(here('data/recidivism_sub.csv'),
header=TRUE)
recidivism_sub[,c('ID',
'Dependents',
'Recidivism_Arrest_Year2')]
ID Dependents Recidivism_Arrest_Year2
1 21953 0 1
2 8255 1 1
3 9110 2 0
4 20795 1 0
5 5569 1 1
6 14124 0 1
7 24979 0 1
8 4827 1 1
9 26586 3 0
10 17777 0 0
11 22269 1 0
12 25016 0 0
13 24138 0 1
14 12261 3 0
15 15417 3 0
16 14695 0 1
17 4371 3 0
18 13529 3 0
19 25046 3 0
20 5340 3 0
table(recidivism_sub$Recidivism_Arrest_Year2)
0 1
12 8
  • The outcome variable is a binary outcome (1: Recidivated, 0: Not Recidivated)

  • In practice, the binary outcomes can be coded in various ways:

    • True vs. False

    • Yes vs. No

    • Success vs. Failure

  • In this class, we assume that the outcome variable is coded as 0s and 1s, and the category we want to predict is 1.

  • The predictor is the number of dependents a parolee has at the time of release

3 / 43

Linear Probability Model

  • A linear probability model fits a typical regression model to a binary outcome.

  • When the outcome is binary, the predictions from a linear regression model can be considered as the probability of the outcome being equal to 1,

$$\hat{Y} = P(Y=1) = \beta_0 + \beta_1 X + \epsilon$$

mod <- lm(Recidivism_Arrest_Year2 ~ 1 + Dependents,
data = recidivism_sub)
summary(mod)
Call:
lm(formula = Recidivism_Arrest_Year2 ~ 1 + Dependents, data = recidivism_sub)
Residuals:
Min 1Q Median 3Q Max
-0.7500 -0.0625 0.0000 0.2500 0.5000
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.7500 0.1295 5.79 0.000017 ***
Dependents -0.2500 0.0682 -3.66 0.0018 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.391 on 18 degrees of freedom
Multiple R-squared: 0.427, Adjusted R-squared: 0.395
F-statistic: 13.4 on 1 and 18 DF, p-value: 0.00178
4 / 43
  • Intercept (0.75): When the number of dependents is equal to 0, the probability of being recidivated in Year 2 is 0.75.

  • Slope (-0.25): For every additional dependent (one unit increase in X) the individual has, the probability of being recidivated in Year 2 is reduced by .25.


5 / 43

A major issue when using a linear regression model to predict a binary outcome is that the model predictions can go outside of the boundary [0,1] and yield unreasonable predictions.

X <- data.frame(Dependents = 0:10)
cbind(0:10,round(predict(mod,newdata = X),3))
[,1] [,2]
1 0 0.75
2 1 0.50
3 2 0.25
4 3 0.00
5 4 -0.25
6 5 -0.50
7 6 -0.75
8 7 -1.00
9 8 -1.25
10 9 -1.50
11 10 -1.75

A linear regression model may not be the best tool to predict a binary outcome.

6 / 43










Overview of the Logistic Regression

7 / 43

Model Description

  • To overcome the limitations of the linear probability model, we bundle our prediction model in a sigmoid function.

$$f(a) = \frac{e^{a}}{1 + e^{a}}.$$

$$f(a) = \frac{1}{1 + e^{-a}}.$$

  • The output of this function is always between 0 and 1 regardless of the value of a.

  • The sigmoid function is an appropriate choice for the logistic regression (but not the only one) because it assures that the output is always bounded between 0 and 1.
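
The boundedness claim can be checked numerically. The following is a minimal sketch; the function name `sigmoid` and the input values are chosen here for illustration and are not part of the original slides.

```r
# Sigmoid (logistic) function: maps any real number into the interval (0, 1)
sigmoid <- function(a) 1 / (1 + exp(-a))

a <- c(-10, -1, 0, 1, 10)
p <- sigmoid(a)
round(p, 4)

# The two algebraic forms of the function give the same values
all.equal(p, exp(a) / (1 + exp(a)))
```

Base R's `plogis()` computes the same function.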

8 / 43

If we revisit the previous example, we can specify a logistic regression model to predict the probability of being recidivated in Year 2 as the following:

$$P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}.$$

The model output can be directly interpreted as the probability of the binary outcome being equal to 1.

Then, we assume that the actual outcome follows a binomial distribution with the predicted probability.

$$P(Y=1) = p$$

$$Y \sim Binomial(p)$$

Suppose the coefficient estimates of this model are

  • $\beta_0 = 1.33$

  • $\beta_1 = -1.62$

The probability of being recidivated for a parolee with 8 dependents:

$$P(Y=1) = \frac{1}{1 + e^{-(1.33 - 1.62 \times 8)}} = 0.0000088951098.$$

9 / 43
b0 = 1.33
b1 = -1.62
x = 0:10
y = 1/(1+exp(-(b0+b1*x)))
data.frame(number.of.dependents=x,
probability=y)
number.of.dependents probability
1 0 0.7908406348
2 1 0.4280038671
3 2 0.1289808521
4 3 0.0284705877
5 4 0.0057659656
6 5 0.0011463790
7 6 0.0002270757
8 7 0.0000449462
9 8 0.0000088951
10 9 0.0000017603
11 10 0.0000003484

10 / 43


$$P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}.$$

  • In its original form, it is difficult to interpret the logistic regression parameters because a one unit increase in the predictor is no longer linearly related to the probability of the outcome being equal to 1.

  • The most common presentation of logistic regression is obtained after a bit of algebraic manipulation to rewrite the model equation.


$$\ln\left[\frac{P(Y=1)}{1 - P(Y=1)}\right] = \beta_0 + \beta_1 X.$$


  • The term on the left side of the equation is known as the logit (natural logarithm of odds).
11 / 43


It is essential that you get familiar with the three concepts (probability, odds, logit) and how these three are related to each other for interpreting the logistic regression parameters.
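
A minimal sketch of how the three quantities map onto each other. The helper names (`prob_to_odds`, `odds_to_logit`, `logit_to_prob`) and the example probabilities are illustrative assumptions, not part of the slides.

```r
# Probability -> odds -> logit, and back to probability
prob_to_odds  <- function(p) p / (1 - p)
odds_to_logit <- function(odds) log(odds)
logit_to_prob <- function(logit) 1 / (1 + exp(-logit))

p     <- c(0.1, 0.25, 0.5, 0.75, 0.9)
odds  <- prob_to_odds(p)
logit <- odds_to_logit(odds)

data.frame(probability  = p,
           odds         = odds,
           logit        = logit,
           back_to_prob = logit_to_prob(logit))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0; probabilities below 0.5 give negative logits, and probabilities above 0.5 give positive logits.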

12 / 43


$$\ln\left[\frac{P(Y=1)}{1 - P(Y=1)}\right] = 1.33 - 1.62X.$$

  • When the number of dependents is equal to zero, the predicted logit is equal to 1.33 (intercept), and for every additional dependent, the logit decreases by 1.62 (slope).

  • It is also common to transform the logit to odds when interpreting the parameters.

    • When the number of dependents is equal to zero, the odds of being recidivated is $e^{1.33} = 3.78$.

    • For every additional dependent, the odds of being recidivated is multiplied by $e^{-1.62}$.

    • Odds ratio --> $e^{-1.62} = 0.198$
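
These numbers can be reproduced from the rounded coefficients used on the slides (1.33 and -1.62). A small sketch; the variable names are illustrative.

```r
b0 <- 1.33
b1 <- -1.62

odds0 <- exp(b0)        # predicted odds with 0 dependents
odds1 <- exp(b0 + b1)   # predicted odds with 1 dependent

# The ratio of the two odds equals exp(b1), the odds ratio
round(c(odds0 = odds0, odds1 = odds1, ratio = odds1 / odds0), 3)
```

Because the odds ratio is below 1, each additional dependent lowers the predicted odds of recidivism.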

13 / 43
  • The right side of the equation can be expanded by adding more predictors, adding polynomial terms of the predictors, or adding interactions among predictors.

  • A model with only the main effects of P predictors can be written as

$$\ln\left[\frac{P(Y=1)}{1 - P(Y=1)}\right] = \beta_0 + \sum_{p=1}^{P} \beta_p X_p$$

  • $\beta_0$

    • the predicted logit when the values for all the predictor variables in the model are equal to zero.

    • $e^{\beta_0}$, the predicted odds of the outcome being equal to 1 when the values for all the predictor variables in the model are equal to zero.

  • $\beta_p$

    • the change in the predicted logit for a one-unit increase in $X_p$ when the values for all other predictors in the model are held constant.

    • For every one-unit increase in $X_p$, the odds of the outcome being equal to 1 is multiplied by $e^{\beta_p}$ when the values for all other predictors in the model are held constant.

14 / 43

Model Estimation

The concept of likelihood

  • It is essential to understand the likelihood concept for estimating the coefficients of a logistic regression model.

  • Consider a simple example of flipping coins. Suppose you flip the same coin 10 times and observe the following data.

Y=(H,H,H,T,H,H,H,T,H,T)

  • We don't know whether this is a fair coin in which the probability of observing a head or tail is equal to 0.5.

  • Is this a fair coin? If not, what is the probability of observing a head for this coin?

15 / 43
  • Suppose we define p as the probability of observing a head when we flip this coin.

  • By definition, the probability of observing a tail is $1-p$.

$$P(Y=H) = p$$

$$P(Y=T) = 1-p$$

  • The likelihood of our observations of heads and tails as a function of p.

$$L(Y|p) = p \times p \times p \times (1-p) \times p \times p \times p \times (1-p) \times p \times (1-p)$$

$$L(Y|p) = p^7 \times (1-p)^3$$

  • If this is a fair coin, then p is equal to 0.5, and the likelihood of observing seven heads and three tails would be

$$L(Y|p=0.5) = 0.5^7 \times (1-0.5)^3 = 0.0009765625$$

  • If we assume that p is equal to 0.65, the likelihood of observed data would be

$$L(Y|p=0.65) = 0.65^7 \times (1-0.65)^3 = 0.00210183$$

  • Based on the observed data, which one is more likely: p=0.5 or p=0.65?
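
The two likelihood values can be reproduced with a few lines of R. This is a sketch; the 1/0 coding of heads and tails and the helper name `lik` are assumptions made here for illustration.

```r
# Observed coin flips coded as 1 = head, 0 = tail (seven heads, three tails)
flips <- c(1, 1, 1, 0, 1, 1, 1, 0, 1, 0)

# Likelihood of the observed sequence for a given probability of heads
lik <- function(p, y) prod(p^y * (1 - p)^(1 - y))

lik(0.5, flips)
lik(0.65, flips)
```

The observed data are more likely under p = 0.65 than under p = 0.5.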
16 / 43

Maximum likelihood estimation (MLE)

  • What would be the best estimate of p given our observed data (seven heads and three tails)?

  • Suppose we try every possible value of p between 0 and 1 and calculate the likelihood of observed data, L(Y).

  • Then, plot p vs. L(Y)

  • Which p value makes the observed data most likely (largest likelihood)?

  • This p value is called the maximum likelihood estimate of p.

  • We can show that the p value that makes the likelihood largest is 0.7.
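
The search described above can be sketched as a simple grid evaluation. The grid spacing of 0.01 is an arbitrary choice made for this illustration.

```r
# Candidate values of p (0 and 1 are excluded to keep the grid inside the open interval)
p_grid <- seq(0.01, 0.99, by = 0.01)

# Likelihood of seven heads and three tails at each candidate value
lik <- p_grid^7 * (1 - p_grid)^3

# The candidate with the largest likelihood is the (grid-based) ML estimate
p_hat <- p_grid[which.max(lik)]
p_hat
```

Calling `plot(p_grid, lik, type = 'l')` draws the curve of p against L(Y).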

17 / 43

The concept of the log-likelihood

  • The computation of likelihood requires the multiplication of so many p values.

  • When you multiply values between 0 and 1, the result gets smaller and smaller.

  • It creates problems when you multiply so many of these small p values due to the maximum precision any computer can handle.

.Machine$double.xmin
[1] 2.225e-308
  • When you have hundreds of thousands of observations, it is probably not a good idea to work directly with likelihood.

  • Instead, we prefer working with the log of likelihood (log-likelihood).

18 / 43
  • The log-likelihood has two main advantages:

    • We are less concerned about the precision of small numbers our computer can handle.

    • Log-likelihood has better mathematical properties for optimization problems (the log of the product of two numbers equals the sum of the log of the two numbers).

    • The point that maximizes likelihood is the same number that maximizes the log-likelihood, so our end results (MLE estimate) do not care if we use log-likelihood instead of likelihood.

$$\ln(L(Y|p)) = \ln(p^7 \times (1-p)^3)$$

$$\ln(L(Y|p)) = \ln(p^7) + \ln((1-p)^3)$$

$$\ln(L(Y|p)) = 7 \times \ln(p) + 3 \times \ln(1-p)$$
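
Both points can be illustrated in a short sketch. The first part uses the seven-heads, three-tails data; the second part uses 5,000 simulated flips with p = 0.7, which is an arbitrary setup invented here only to show the underflow.

```r
# Part 1: likelihood and log-likelihood peak at the same value of p
p_grid   <- seq(0.01, 0.99, by = 0.01)
lik      <- p_grid^7 * (1 - p_grid)^3
loglik   <- 7 * log(p_grid) + 3 * log(1 - p_grid)
same_max <- which.max(lik) == which.max(loglik)
same_max

# Part 2: with many observations the raw likelihood underflows to zero,
# while the log-likelihood remains a finite number
set.seed(1)
y       <- rbinom(5000, size = 1, prob = 0.7)
raw_lik <- prod(0.7^y * 0.3^(1 - y))
log_lik <- sum(y * log(0.7) + (1 - y) * log(0.3))
c(raw_lik = raw_lik, log_lik = log_lik)
```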

19 / 43


20 / 43

MLE for Logistic Regression coefficients

  • Let's apply these concepts to estimate the logistic regression coefficients for the demo dataset.

$$\ln\left[\frac{P_i(Y=1)}{1 - P_i(Y=1)}\right] = \beta_0 + \beta_1 X_i.$$

  • Note that X and P have a subscript i to indicate that each individual may have a different X value, and therefore each individual will have a different probability.

  • You can consider each individual as a separate coin flip with an unknown probability.

  • Our observed outcome is a set of 0s (not recidivated) and 1s (recidivated).

recidivism_sub$Recidivism_Arrest_Year2
[1] 1 1 0 0 1 1 1 1 0 0 0 0 1 0 0 1 0 0 0 0
  • How likely is it to observe this set of values? What { β0,β1 } values make this data most likely?
21 / 43
  • Given a specific set of coefficients, { β0,β1 }, we can calculate the logit for every observation using the model equation and then transform this logit to a probability, Pi(Y=1).

  • Then, we can calculate the log of the probability for each observation and sum them across observations to obtain the log-likelihood of observing this data (12 zeros and eight ones).

  • Suppose that we have two guesstimates for { β0,β1 }, which are 0.5 and -0.8, respectively. These coefficients imply the following predicted model.

22 / 43
b0 = 0.5
b1 = -0.8
x = recidivism_sub$Dependents
y = recidivism_sub$Recidivism_Arrest_Year2
pred_logit <- b0 + b1*x
pred_prob1 <- exp(pred_logit)/(1+exp(pred_logit))
pred_prob0 <- 1 - pred_prob1
data.frame(Dependents = x,
Recidivated = y,
Prob1 = pred_prob1,
Prob0 = pred_prob0)
Dependents Recidivated Prob1 Prob0
1 0 1 0.6225 0.3775
2 1 1 0.4256 0.5744
3 2 0 0.2497 0.7503
4 1 0 0.4256 0.5744
5 1 1 0.4256 0.5744
6 0 1 0.6225 0.3775
7 0 1 0.6225 0.3775
8 1 1 0.4256 0.5744
9 3 0 0.1301 0.8699
10 0 0 0.6225 0.3775
11 1 0 0.4256 0.5744
12 0 0 0.6225 0.3775
13 0 1 0.6225 0.3775
14 3 0 0.1301 0.8699
15 3 0 0.1301 0.8699
16 0 1 0.6225 0.3775
17 3 0 0.1301 0.8699
18 3 0 0.1301 0.8699
19 3 0 0.1301 0.8699
20 3 0 0.1301 0.8699
logL <- y*log(pred_prob1) + (1-y)*log(pred_prob0)
sum(logL)
[1] -9.253
23 / 43
  • We can summarize this by saying that if our model coefficients were β0 = 0.5 and β1 = -0.8, then the log of the likelihood of observing the outcome in our data would be -9.25.

$$Y = (1,1,0,0,1,1,1,1,0,0,0,0,1,0,0,1,0,0,0,0)$$

$$\log L(Y|\beta_0 = 0.5, \beta_1 = -0.8) = -9.25$$

  • Is there another pair of values we can assign to β0 and β1 that would provide a higher likelihood of data?

  • Is there a pair of values that makes the log-likelihood largest?

24 / 43


  • What is the maximum point of this surface?

  • Our simple search indicates that the maximum point of this surface is -8.30, and the set of β0 and β1 coefficients that make the observed data most likely is 1.33 and -1.62.

$$\ln\left[\frac{P_i(Y=1)}{1 - P_i(Y=1)}\right] = 1.33 - 1.62 \times X_i.$$
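
A crude search of this kind can be sketched as follows. The data vectors are typed in from the 20-row demo table so the block runs without the CSV file; the grid range (-3 to 3) and step (0.01) are assumptions for this illustration and may differ from the search behind the slide.

```r
# Demo data: number of dependents (x) and recidivism outcome (y)
x <- c(0, 1, 2, 1, 1, 0, 0, 1, 3, 0, 1, 0, 0, 3, 3, 0, 3, 3, 3, 3)
y <- c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

# Log-likelihood of the data for a given pair of coefficients
loglik <- function(b0, b1) {
  p <- 1 / (1 + exp(-(b0 + b1 * x)))
  sum(y * log(p) + (1 - y) * log(1 - p))
}

# Evaluate the log-likelihood on every combination of candidate values
grid      <- expand.grid(b0 = seq(-3, 3, by = 0.01),
                         b1 = seq(-3, 3, by = 0.01))
grid$logL <- mapply(loglik, grid$b0, grid$b1)

# Row with the largest log-likelihood
best <- grid[which.max(grid$logL), ]
best
```

The best grid point should land close to the values reported on the slide, with a log-likelihood near -8.3.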


25 / 43

Logistic Loss function

  • Below is a compact way of writing likelihood and log-likelihood in mathematical notation. For simplification purposes, we write Pi to represent Pi(Y=1).

$$L(Y|\beta) = \prod_{i=1}^{N} P_i^{Y_i} \times (1 - P_i)^{1 - Y_i}$$

$$\log L(Y|\beta) = \sum_{i=1}^{N} Y_i \times \ln(P_i) + (1 - Y_i) \times \ln(1 - P_i)$$

  • The final equation above, logL(Y|β), is known as the logistic loss function.

  • By finding the set of coefficients in a model, β=(β0,β1,...,βP), that maximizes this quantity, we obtain the maximum likelihood estimates of the coefficients for the logistic regression model.

  • There is no closed-form solution for estimating the logistic regression parameters.

  • The naive crude search we applied above would be inefficient when you have a complex model with many predictors.

  • The only way to estimate the logistic regression coefficients is to use numerical approximations and computational algorithms to maximize the logistic loss function.
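
As one illustration of a numerical approach, the sketch below hands the log-likelihood to base R's general-purpose optimizer `optim()`. This is a stand-in chosen for simplicity, not the algorithm that `glm()` uses (its output reports Fisher scoring). Because `optim()` minimizes by default, the function returns the negative log-likelihood. The data vectors are typed in from the demo table, and the starting values of zero are an assumption.

```r
# Demo data: number of dependents (x) and recidivism outcome (y)
x <- c(0, 1, 2, 1, 1, 0, 0, 1, 3, 0, 1, 0, 0, 3, 3, 0, 3, 3, 3, 3)
y <- c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

# Negative log-likelihood; y*eta - log(1 + exp(eta)) is an equivalent way
# of writing y*log(p) + (1 - y)*log(1 - p) when p = 1 / (1 + exp(-eta))
neg_loglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(y * eta - log(1 + exp(eta)))
}

fit <- optim(par = c(0, 0), fn = neg_loglik, method = "BFGS")

round(fit$par, 3)   # estimated intercept and slope
-fit$value          # maximized log-likelihood
```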

26 / 43




NOTE

Why do we not use least squares estimation and minimize the sum of squared residuals when estimating the coefficients of the logistic regression model? We can certainly use the sum of squared residuals as our loss function and minimize it to estimate the coefficients, just as we did for linear regression. The complication is that the sum of squared residuals yields a non-convex surface when the outcome is binary, whereas the logistic loss function yields a convex surface. Non-convex optimization problems are more challenging than convex ones, and they are more vulnerable to finding sub-optimal solutions (local minima/maxima). Therefore, the logistic loss function is preferred when estimating the coefficients of a logistic regression model.
27 / 43

The glm function

mod <- glm(Recidivism_Arrest_Year2 ~ 1 + Dependents,
data = recidivism_sub,
family = 'binomial')
summary(mod)
Call:
glm(formula = Recidivism_Arrest_Year2 ~ 1 + Dependents, family = "binomial",
data = recidivism_sub)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.767 -0.312 -0.241 0.686 1.303
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.326 0.820 1.62 0.106
Dependents -1.616 0.727 -2.22 0.026 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 26.920 on 19 degrees of freedom
Residual deviance: 16.612 on 18 degrees of freedom
AIC: 20.61
Number of Fisher Scoring iterations: 5

In the Coefficients table, the numbers under the Estimate column are the estimated coefficients for the logistic regression model. The quantity labeled as the Residual deviance in the output is minus twice the maximized log-likelihood (here, -2 × -8.306 = 16.612),

$$Deviance = -2 \times \log L(Y|\beta).$$
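
The relationship can be checked with `logLik()` and `deviance()`. In this sketch the data frame `recid` is a hypothetical stand-in rebuilt from the demo table so that the block runs without the CSV file. The equality holds here because the outcome is coded 0/1 for each individual.

```r
# Rebuild the two demo columns (values typed in from the 20-row table)
recid <- data.frame(
  Dependents              = c(0, 1, 2, 1, 1, 0, 0, 1, 3, 0,
                              1, 0, 0, 3, 3, 0, 3, 3, 3, 3),
  Recidivism_Arrest_Year2 = c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0,
                              0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
)

mod <- glm(Recidivism_Arrest_Year2 ~ 1 + Dependents,
           data   = recid,
           family = 'binomial')

ll <- as.numeric(logLik(mod))   # maximized log-likelihood
ll
-2 * ll                         # minus twice the log-likelihood
deviance(mod)                   # residual deviance reported by summary()
```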

28 / 43

The glmnet function

require(glmnet)
mod <- glmnet(x = cbind(0,recidivism_sub$Dependents),
y = factor(recidivism_sub$Recidivism_Arrest_Year2),
family = 'binomial',
alpha = 0,
lambda = 0,
intercept = TRUE)
coef(mod)
3 x 1 sparse Matrix of class "dgCMatrix"
s0
(Intercept) 1.325
V1 .
V2 -1.616


The x argument is the input matrix for predictors, and the y argument is a vector of binary response outcome. The glmnet requires the y argument to be a factor with two levels.

Note that I defined the x argument above as cbind(0,recidivism_sub$Dependents) because glmnet requires the x to be a matrix with at least two columns. So, I added a column of zeros to trick the function and force it to run. That column of zeros has zero impact on the estimation.

29 / 43

Model Performance Evaluation

When the outcome is a binary variable, classification models, such as logistic regression, yield a probability estimate for a class membership (or a continuous-valued prediction between 0 and 1).

$$\ln\left[\frac{P_i(Y=1)}{1 - P_i(Y=1)}\right] = 1.33 - 1.62 \times X_i.$$

mod <- glm(Recidivism_Arrest_Year2 ~ 1 + Dependents,
data = recidivism_sub,
family = 'binomial')
recidivism_sub$pred_prob <- predict(mod,type='response')
recidivism_sub[,c('ID','Dependents','Recidivism_Arrest_Year2','pred_prob')]
ID Dependents Recidivism_Arrest_Year2 pred_prob
1 21953 0 1 0.79010
2 8255 1 1 0.42786
3 9110 2 0 0.12935
4 20795 1 0 0.42786
5 5569 1 1 0.42786
6 14124 0 1 0.79010
7 24979 0 1 0.79010
8 4827 1 1 0.42786
9 26586 3 0 0.02867
10 17777 0 0 0.79010
11 22269 1 0 0.42786
12 25016 0 0 0.79010
13 24138 0 1 0.79010
14 12261 3 0 0.02867
15 15417 3 0 0.02867
16 14695 0 1 0.79010
17 4371 3 0 0.02867
18 13529 3 0 0.02867
19 25046 3 0 0.02867
20 5340 3 0 0.02867
30 / 43

Separation of two classes

In an ideal situation where a model does a perfect job of predicting a binary outcome, we expect

  • all those observations in Group 0 (Not Recidivated) to have a predicted probability of 0,

  • and all those observations in Group 1 (Recidivated) to have a predicted probability of 1.

So, predicted values close to 0 for observations in Group 0 and those close to 1 for Group 1 are indicators of good model performance.

One way to look at the quality of separation between two classes of a binary outcome is to examine the distribution of predictions within each class.

31 / 43

From the demo analysis:

32 / 43

Class Predictions

  • In most situations, for practical reasons, we transform the continuous probability predicted by a model into a binary prediction.

  • Predicted class membership leads to actionable items in practice.

  • This is implemented by determining an arbitrary cut-off value. Once a cut-off value is determined, then we can generate class predictions.

  • Consider that we use a cut-off value of 0.5.

ID Dependents Recidivism_Arrest_Year2 pred_prob pred_class
1 21953 0 1 0.79010 1
2 8255 1 1 0.42786 0
3 9110 2 0 0.12935 0
4 20795 1 0 0.42786 0
5 5569 1 1 0.42786 0
6 14124 0 1 0.79010 1
7 24979 0 1 0.79010 1
8 4827 1 1 0.42786 0
9 26586 3 0 0.02867 0
10 17777 0 0 0.79010 1
11 22269 1 0 0.42786 0
12 25016 0 0 0.79010 1
13 24138 0 1 0.79010 1
14 12261 3 0 0.02867 0
15 15417 3 0 0.02867 0
16 14695 0 1 0.79010 1
17 4371 3 0 0.02867 0
18 13529 3 0 0.02867 0
19 25046 3 0 0.02867 0
20 5340 3 0 0.02867 0
  • If an observation has a predicted class probability less than 0.5, we predict that this person is in Group 0 (Not Recidivated).

  • If an observation has a predicted class probability higher than 0.5, we predict that this person is in Group 1.
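
The rule can be applied with `ifelse()`. A minimal sketch, with the data typed in from the demo table; the slides do not say how a probability of exactly 0.5 is handled, so assigning it to Group 0 is an assumption of this code.

```r
# Demo data: number of dependents (x) and recidivism outcome (y)
x <- c(0, 1, 2, 1, 1, 0, 0, 1, 3, 0, 1, 0, 0, 3, 3, 0, 3, 3, 3, 3)
y <- c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

mod       <- glm(y ~ 1 + x, family = 'binomial')
pred_prob <- predict(mod, type = 'response')

# Apply the cut-off: 1 if the predicted probability exceeds 0.5, else 0
cut_off    <- 0.5
pred_class <- ifelse(pred_prob > cut_off, 1, 0)

data.frame(Dependents  = x,
           Recidivated = y,
           pred_prob   = round(pred_prob, 5),
           pred_class  = pred_class)
```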

33 / 43

Confusion Matrix

We can summarize the relationship between the binary outcome and binary prediction in a 2 x 2 table. This table is commonly referred to as a confusion matrix.

          Observed
Predicted  0  1
        0 10  3
        1  2  5

Based on the elements of this table, we can define four key concepts:

  • True Positives(TP): True positives are the observations where both the outcome and prediction are equal to 1.

  • True Negative(TN): True negatives are the observations where both the outcome and prediction are equal to 0.

  • False Positives(FP): False positives are the observations where the outcome is 0 but the prediction is 1.

  • False Negatives(FN): False negatives are the observations where the outcome is 1 but the prediction is 0.
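
The four counts can be read off a cross-tabulation. In this sketch the `observed` and `predicted` vectors are typed in from the table of class predictions (cut-off of 0.5); the variable names are illustrative.

```r
observed  <- c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
predicted <- c(1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0)

# Rows are predictions, columns are observed outcomes
tab <- table(Predicted = predicted, Observed = observed)
tab

TP <- tab["1", "1"]   # predicted 1, observed 1
TN <- tab["0", "0"]   # predicted 0, observed 0
FP <- tab["1", "0"]   # predicted 1, observed 0
FN <- tab["0", "1"]   # predicted 0, observed 1

c(TP = TP, TN = TN, FP = FP, FN = FN)
```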

34 / 43
  • Accuracy: Overall accuracy represents the proportion of correct predictions.

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$

  • True Positive Rate (Sensitivity): The true positive rate (a.k.a. sensitivity) is the proportion of correct predictions among the observations for which the outcome is 1 (event is observed).

$$TPR = \frac{TP}{TP + FN}$$

  • True Negative Rate (Specificity): The true negative rate (a.k.a. specificity) is the proportion of correct predictions among the observations for which the outcome is 0 (event is not observed).

$$TNR = \frac{TN}{TN + FP}$$

  • Positive predictive value (Precision): The positive predictive value (a.k.a. precision) is the proportion of correct decisions when the model predicts that the outcome is 1.

$$PPV = \frac{TP}{TP + FP}$$

  • F1 score: The F1 score is a metric that combines PPV and TPR (it is their harmonic mean).

$$F1 = 2 \times \frac{PPV \times TPR}{PPV + TPR}$$
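
Plugging in the counts from the confusion matrix for the 0.5 cut-off (TP = 5, TN = 10, FP = 2, FN = 3) gives the following values. A short sketch; the variable names are illustrative.

```r
TP <- 5; TN <- 10; FP <- 2; FN <- 3

ACC <- (TP + TN) / (TP + TN + FP + FN)   # accuracy
TPR <- TP / (TP + FN)                    # sensitivity
TNR <- TN / (TN + FP)                    # specificity
PPV <- TP / (TP + FP)                    # precision
F1  <- 2 * PPV * TPR / (PPV + TPR)       # harmonic mean of PPV and TPR

round(c(ACC = ACC, TPR = TPR, TNR = TNR, PPV = PPV, F1 = F1), 4)
```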

35 / 43

Area Under the Receiver Operating Characteristic Curve (AUC or AUROC)

  • The confusion matrix and related metrics all depend on the arbitrary cut-off value one picks when transforming continuous predicted probabilities to binary predicted classes.

  • We can change the cut-off value to optimize certain metrics, and there is always a trade-off between these metrics for different cut-off values.

cut acc tpr tnr ppv fpr f1
1 0.0 0.40 1.000 0.0000 0.4000 1.0000 0.5714
2 0.1 0.75 1.000 0.5833 0.6154 0.4167 0.7619
3 0.2 0.80 1.000 0.6667 0.6667 0.3333 0.8000
4 0.3 0.80 1.000 0.6667 0.6667 0.3333 0.8000
5 0.4 0.80 1.000 0.6667 0.6667 0.3333 0.8000
6 0.5 0.75 0.625 0.8333 0.7143 0.1667 0.6667
7 0.6 0.75 0.625 0.8333 0.7143 0.1667 0.6667
8 0.7 0.75 0.625 0.8333 0.7143 0.1667 0.6667
9 0.8 0.60 0.000 1.0000 NaN 0.0000 NaN
36 / 43

A receiver operating characteristic (ROC) curve is a plot that represents this dynamic relationship between TPR and FPR (1-TNR) for varying levels of a cut-off value.

The area under the ROC curve (AUC or AUROC) is typically used to evaluate the predictive power of classification models.

  • The diagonal line in this plot represents a hypothetical model with no predictive power and AUC for the diagonal line is 0.5 (it is half of the whole square).

  • The closer AUC is to 0.5, the closer predictive power is to random guessing.

  • The more the ROC curve resembles the diagonal line, the less predictive power the model has.

  • The closer AUC is to 1, the more predictive power the model has.

  • The magnitude of AUC is closely related to how well the predicted probabilities separate the two classes.
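
The link between AUC and class separation can be made concrete with an equivalent pairwise formulation that is not shown on the slides: AUC equals the proportion of (Group 1, Group 0) pairs in which the Group 1 observation has the higher predicted probability, with ties counted as one half. The sketch below applies it to the demo data, using the predicted probabilities printed earlier (rounded to five decimals) as a lookup by number of dependents.

```r
# Demo data: number of dependents (x) and recidivism outcome (y)
x <- c(0, 1, 2, 1, 1, 0, 0, 1, 3, 0, 1, 0, 0, 3, 3, 0, 3, 3, 3, 3)
y <- c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

# Predicted probabilities for 0, 1, 2, and 3 dependents (from the slide output)
pred_prob <- c(0.79010, 0.42786, 0.12935, 0.02867)[x + 1]

pos <- pred_prob[y == 1]   # predictions for Group 1 (Recidivated)
neg <- pred_prob[y == 0]   # predictions for Group 0 (Not Recidivated)

# Score each (Group 1, Group 0) pair: 1 if correctly ordered, 0.5 if tied, else 0
pair_score <- outer(pos, neg, function(a, b) (a > b) + 0.5 * (a == b))

auc <- mean(pair_score)
round(auc, 4)
```

With only four distinct predicted probabilities, many pairs are tied, which limits how well this single-predictor model can separate the two groups.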

37 / 43

Building a Logistic Regression Model via caret

Please review the following notebook that builds a classification model using the logistic regression for the full recidivism dataset.

Building a Logistic Regression Model

38 / 43










Regularized Logistic Regression

39 / 43
  • Regularization works in logistic regression much as it does in linear regression, as discussed earlier.

  • We add penalty terms to the loss function to avoid large coefficients. Including a penalty term reduces model variance in exchange for adding bias.

  • By optimizing the degree of penalty via tuning, we can typically get models with better performance than a logistic regression with no regularization.

Logistic Loss with Ridge Penalty

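$$\log L(Y|\beta) = \left( \sum_{i=1}^{N} Y_i \times \ln(P_i) + (1 - Y_i) \times \ln(1 - P_i) \right) - \frac{\lambda}{2} \sum_{p=1}^{P} \beta_p^2$$

Logistic Loss with Lasso Penalty

$$\log L(Y|\beta) = \left( \sum_{i=1}^{N} Y_i \times \ln(P_i) + (1 - Y_i) \times \ln(1 - P_i) \right) - \lambda \sum_{p=1}^{P} |\beta_p|$$

Logistic Loss with Elastic Net

$$\log L(Y|\beta) = \left( \sum_{i=1}^{N} Y_i \times \ln(P_i) + (1 - Y_i) \times \ln(1 - P_i) \right) - \left( (1 - \alpha) \frac{\lambda}{2} \sum_{p=1}^{P} \beta_p^2 + \alpha \lambda \sum_{p=1}^{P} |\beta_p| \right)$$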
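
The penalized quantities can be written as a single R function. This is a sketch under stated assumptions: one predictor (the demo data), an unpenalized intercept, a penalty subtracted because the quantity is maximized, and an arbitrary penalty strength of lambda = 1. The function name `pen_loglik` is hypothetical, and software such as glmnet may scale the terms differently.

```r
# Demo data: number of dependents (x) and recidivism outcome (y)
x <- c(0, 1, 2, 1, 1, 0, 0, 1, 3, 0, 1, 0, 0, 3, 3, 0, 3, 3, 3, 3)
y <- c(1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)

# Elastic-net penalized log-likelihood for beta = c(intercept, slope);
# alpha = 0 gives the ridge penalty and alpha = 1 gives the lasso penalty
pen_loglik <- function(beta, lambda, alpha) {
  p       <- 1 / (1 + exp(-(beta[1] + beta[2] * x)))
  logL    <- sum(y * log(p) + (1 - y) * log(1 - p))
  penalty <- (1 - alpha) * (lambda / 2) * beta[2]^2 +
             alpha * lambda * abs(beta[2])
  logL - penalty
}

beta        <- c(1.33, -1.62)
unpenalized <- pen_loglik(beta, lambda = 0, alpha = 0)
ridge       <- pen_loglik(beta, lambda = 1, alpha = 0)
lasso       <- pen_loglik(beta, lambda = 1, alpha = 1)
round(c(unpenalized = unpenalized, ridge = ridge, lasso = lasso), 3)

# Maximizing the ridge-penalized version pulls the slope toward zero
fit <- optim(c(0, 0),
             function(b) -pen_loglik(b, lambda = 1, alpha = 0),
             method = "BFGS")
round(fit$par, 3)
```

The ridge-penalized slope is smaller in magnitude than the unpenalized estimate of -1.616, which is the shrinkage effect.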

40 / 43

Shrinkage in Logistic Regression Coefficients with Ridge Penalty

41 / 43

Shrinkage in Logistic Regression Coefficients with Lasso Penalty

42 / 43


Building a Regularized Logistic Regression Model via caret

Please review the following notebooks that build classification models using the regularized logistic regression for the full recidivism dataset.

43 / 43
