
Ensemble Modeling Fundamentals

Ensemble modeling aims at forecasting a given response variable with higher accuracy than an individual prediction model. To that end, the forecasts of a collection of prediction models, which show some synergy and complement each other, are pooled to produce a composite forecast.


General Functioning of Ensemble Models

Multi‐step modeling approach

 Develop a set of (base) models

 Aggregate their predictions

Several ensemble algorithms have been proposed. These algorithms differ mainly in how they develop base models and in how they pool base model predictions.

Why Forecast Combination Increases Accuracy

Different methods have different views on the same data

 For example, linear versus nonlinear models

 Forecast combination gathers information from multiple sources

 Like asking many experts for their opinion

 Formal explanation

 Bias‐variance‐trade‐off

 Strength‐diversity‐trade‐off

 Ensemble margin

 Much empirical evidence

 Forecasting benchmarks in various domains

 Credit scoring, insolvency prediction, direct marketing, fraud detection, project effort estimation, software defect prediction, and many others

Ensembles and the Bias‐Variance‐Trade‐Off

Bias and variance reduce predictive accuracy

 Complex classifiers

 Low bias

 High variance

 Simple classifiers

 High bias

 Low variance

Ensembles and the Strength‐Diversity Trade‐Off

The success of an ensemble depends on two factors: the strength of and the diversity among the base models.

 Base model strength

 How well they forecast future cases

 Predictive accuracy

 Base model diversity

 Extent to which base model predictions disagree

 Can think of diversity as low forecast (or error) correlation

 Research on strength‐diversity trade‐off focuses on ensemble classifiers

There is no point in combining identical models. If all base models in an ensemble make the same predictions, combination cannot increase accuracy.
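The point about identical models can be checked with a small calculation. The sketch below assumes binary classification, an odd number of base models, and (as an idealization not stated in the slides) base models whose errors are fully independent; the function name `majority_vote_accuracy` is illustrative.

```python
from math import comb


def majority_vote_accuracy(p: float, n_models: int) -> float:
    """Accuracy of a majority vote over n_models independent binary
    classifiers, each correct with probability p (n_models assumed odd)."""
    needed = n_models // 2 + 1  # votes required for a correct majority
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


# Eleven copies of the same model always vote alike, so the vote is
# exactly as accurate as a single copy.
identical = 0.7
# Eleven models with the same strength but independent errors.
independent = majority_vote_accuracy(0.7, 11)
print(f"identical: {identical:.3f}  independent: {independent:.3f}")
```

Real base models are never fully independent, so the gain in practice lies somewhere between these two extremes.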

Imagine a perfect model

 Always predicts with 100% accuracy

 Maximal strength

 Put this model into an ensemble

 Average of perfect prediction with other predictions

 No way to increase accuracy


Since all classifiers predict the same target, they cannot be very strong and highly diverse at the same time. There is a conflict between strength and diversity.

Diversity in Classifier Ensembles

Different classifier outputs require different measures

 Estimates of posterior class probabilities (numeric)

 Estimates of class membership (discrete)

 Motivates research associated with

 Developing diversity measures

 Studying the behavior/features of different measures

 Exploring the extent to which diversity explains ensemble success

 Examining whether diversity maximization is a good idea

 So far, there is no consensus on any of these questions

Key take‐away:

Understanding diversity is useful to understand different types of ensemble classifiers.

Categories of diversity measures

 Pairwise (e.g., Q‐statistic, correlation, disagreement, double‐fault measure)

 Non‐pairwise (e.g., entropy, Kohavi‐Wolpert variance, generalized diversity)
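Three of the pairwise measures named above can be computed from the per-case correctness of two classifiers. The sketch below uses the common textbook definitions (Q-statistic, disagreement, double-fault); the function name and the toy correctness vectors are assumptions for illustration only.

```python
def pairwise_diversity(correct_a, correct_b):
    """correct_a / correct_b: lists of booleans, True where the respective
    classifier predicted the case correctly."""
    pairs = list(zip(correct_a, correct_b))
    n11 = sum(a and b for a, b in pairs)              # both right
    n00 = sum((not a) and (not b) for a, b in pairs)  # both wrong
    n10 = sum(a and (not b) for a, b in pairs)        # only A right
    n01 = sum((not a) and b for a, b in pairs)        # only B right
    n = len(pairs)
    denom = n11 * n00 + n01 * n10
    q = (n11 * n00 - n01 * n10) / denom if denom else float("nan")
    return {
        "Q": q,                          # -1..1, lower means more diverse
        "disagreement": (n01 + n10) / n,  # share of cases where exactly one is right
        "double_fault": n00 / n,          # share of cases where both are wrong
    }


a = [True, True, True, False, False, True, True, False]
b = [True, False, True, True, False, True, False, False]
print(pairwise_diversity(a, b))
```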


The distribution of base classifier strength and diversity is often illustrated by means of a scatter plot.

Ensemble Margin Exemplified

Several studies have shown that the generalization performance of an ensemble is related to the distribution of its margin on the training sample.

 The larger the margin the better

 An upper bound on the generalization error can be stated in terms of the ensemble margin
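For a voting ensemble, the margin of a single case is commonly defined as the share of votes for the true class minus the largest share of votes for any other class. The slides do not spell out a definition, so the sketch below assumes this common one; `voting_margin` is an illustrative name.

```python
from collections import Counter


def voting_margin(votes, true_label):
    """Margin of one case: (votes for the true class minus the maximum votes
    for any other class) / ensemble size. Ranges from -1 to 1; a positive
    value means the plurality vote is correct."""
    counts = Counter(votes)
    true_votes = counts.get(true_label, 0)
    best_other = max(
        (c for label, c in counts.items() if label != true_label), default=0
    )
    return (true_votes - best_other) / len(votes)


print(voting_margin(["a", "a", "b", "a", "c"], "a"))  # correct, moderate margin
print(voting_margin(["a", "b", "b"], "a"))            # wrong, negative margin
```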

Homogeneous Ensemble Classifiers

Produce base models using the same algorithm

 Inject diversity through manipulating the training data

 Drawing training cases at random (e.g., Bagging)

 Drawing variables at random (e.g., Random Subspace)

Heterogeneous Ensemble Classifiers

Produce base models using different algorithms

 Inject diversity algorithmically

 Different classification algorithms

 Different meta‐parameter settings per classification algorithm

 Also called multiple classifier systems (MCS)

20‐Oct‐14 Business Analytics & Predictive Modeling, Chapter 8, Stefan Lessmann 22


Ensemble Classifiers Without Pruning

Two‐step approach

 Develop base models (homogeneous or heterogeneous)

 Put all base models into the ensemble

 Standard practice

Ensemble Classifiers With Pruning

 Three‐step approach

 Develop candidate base models (typically heterogeneous)

 Optimize ensemble composition using some search strategy

 Put selected base models into the ensemble; discard the rest

 Active field of research

 Which search strategy?

 Which objective?
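The three-step approach can be sketched with one possible search strategy and objective. The example below assumes greedy forward selection and the Brier score on a validation set; both are choices made for illustration, not the method of the slides, and the model names and predictions are made up.

```python
def brier(probs, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def average(prediction_lists):
    """Case-wise simple average over several models' predictions."""
    return [sum(col) / len(col) for col in zip(*prediction_lists)]


def greedy_selection(candidates, labels):
    """candidates: dict mapping model name -> validation-set probabilities.
    Repeatedly add the model that most improves the averaged prediction;
    stop as soon as no remaining model improves it."""
    selected, best_score = [], float("inf")
    remaining = set(candidates)
    while remaining:
        scored = {
            name: brier(
                average([candidates[m] for m in selected + [name]]), labels
            )
            for name in sorted(remaining)
        }
        name = min(scored, key=scored.get)
        if scored[name] >= best_score:
            break
        selected.append(name)
        best_score = scored[name]
        remaining.remove(name)
    return selected, best_score


labels = [1, 0, 1, 0]
candidates = {
    "tree": [0.8, 0.3, 0.6, 0.2],
    "logit": [0.6, 0.2, 0.9, 0.4],
    "noise": [0.5, 0.5, 0.5, 0.5],
}
selected, score = greedy_selection(candidates, labels)
print(selected, round(score, 4))
```

The uninformative candidate is discarded because adding it worsens the averaged prediction on the validation data.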

Homogeneous Ensemble Algorithms


 Bagging

 Random Forest

 Boosting


Given a classification algorithm, bagging derives base models from bootstrap samples of the training set.

 Bootstrap sampling

 Given data set of size n

 Draw random sample of size n with replacement

A Note on Bootstrapping

A bootstrap sample includes some cases multiple times

 About 37% of the original cases do not appear at all, since (1 - 1/n)^n ≈ 1/e ≈ 0.368 for large n

 Called out‐of‐bag (OOB) examples

 Facilitate assessing a model on hold‐out data

 Facilitate assessing variable importance
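Bootstrap sampling and the out-of-bag share can be illustrated with the standard library alone. This is a minimal sketch; the helper names, the data set size, and the random seed are arbitrary assumptions.

```python
import random


def bootstrap_indices(n, rng):
    """Draw n case indices with replacement from a data set of size n."""
    return [rng.randrange(n) for _ in range(n)]


def oob_indices(n, sample):
    """Indices of cases that never made it into the bootstrap sample."""
    drawn = set(sample)
    return [i for i in range(n) if i not in drawn]


n = 10_000
rng = random.Random(42)
sample = bootstrap_indices(n, rng)
oob = oob_indices(n, sample)

# Probability that a given case is never drawn in n draws.
expected_oob_share = (1 - 1 / n) ** n
print(f"observed OOB share: {len(oob) / n:.3f}")
print(f"expected OOB share: {expected_oob_share:.3f}")
```

Because the out-of-bag cases were not used to fit the corresponding base model, they can serve as hold-out data for that model.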

Base Model Combination

Every bootstrap sample provides one base model

 Predictions are combined using majority voting

 Typical formulation used in textbooks

 Can be misleading

 Classifiers produce different types of predictions

 Binary class predictions

 Numeric confidences (magnitude of values indicates likelihood of class)

 Class probabilities (e.g., p(y=1|x))
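Why the majority-voting formulation can be misleading becomes visible when base models output probabilities: voting on thresholded classes and averaging the probabilities may disagree. The sketch below uses made-up probabilities and a 0.5 threshold as assumptions.

```python
def majority_vote(class_predictions):
    """Binary majority vote over 0/1 class predictions."""
    return int(sum(class_predictions) > len(class_predictions) / 2)


def average_probability(probabilities, threshold=0.5):
    """Average p(y=1|x) over base models, then threshold the average."""
    mean = sum(probabilities) / len(probabilities)
    return mean, int(mean > threshold)


# Three base models: two are barely in favour of class 1, one is
# strongly against it.
probs = [0.55, 0.60, 0.05]
votes = [int(p > 0.5) for p in probs]

print("majority vote:", majority_vote(votes))
print("probability average:", average_probability(probs))
```

Voting discards how confident each base model is, whereas averaging lets one very confident model outweigh two hesitant ones.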

Bagging and the Bias‐Variance‐Trade‐Off


 Bias

 Depends on the base classifier

 Not altered due to bagging

 Variance

 Reduced due to combining base models from different bootstrap samples

 As in any ensemble model

Bagging increases predictive accuracy through reducing variance.
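The variance argument can be made concrete with a standard result: for T identically distributed predictions with variance sigma^2 and common pairwise correlation rho, the variance of their average is rho * sigma^2 + (1 - rho) * sigma^2 / T. The slides do not state this formula; the sketch below assumes this simplified setting.

```python
def variance_of_average(sigma2: float, rho: float, n_models: int) -> float:
    """Variance of the mean of n_models predictions that each have variance
    sigma2 and share the pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_models


for rho in (0.0, 0.5, 1.0):
    row = [round(variance_of_average(1.0, rho, t), 3) for t in (1, 10, 100)]
    print(f"rho={rho}: variance for T=1, 10, 100 -> {row}")
```

With rho = 1 (identical base models) averaging does not help at all, and for any rho > 0 the variance cannot fall below rho * sigma^2 however large the ensemble, which is why injecting diversity matters.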

Tuning a Bagging Classifier

Meta‐parameters of bagging

 Classification algorithm

 Preferably tree‐based or neural network

 But can use any

 Could also tune meta‐parameters of the (base) classifier

 How many bootstrap samples (i.e., ensemble size)

 Size of bootstrap sample

 Often overlooked but useful when working with large data sets

Practical advice:

The larger the ensemble the better. Given data and resources, develop the largest bagging ensemble possible.

Bagging Assessed

Random Forest

 Combines bagging with random subspace

 Improves on bagging's approach to injecting diversity

 “Forest” = collection of “trees”

 Works only with tree‐based classifiers

Grow each tree from a bootstrap sample of the training data (exactly as in bagging)

 In addition,

 When choosing an attribute to split the data

 Do not search among all attributes

 Instead, draw random sample of attributes (Random Subspace)

 Find best split from attributes within sample

 Increased diversity

 Random subspace limits access to attributes

 Forces tree‐growing algorithm to explore different ways to separate the data
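The split-selection step described above could be sketched as follows. The attribute names and split-quality scores are hypothetical; in a real tree learner the score would be an impurity reduction computed from the data at the current node.

```python
import random


def choose_split_attribute(attributes, mtry, score, rng):
    """Draw mtry attributes at random (without replacement) and return the
    best-scoring one among them, together with the drawn candidates."""
    candidates = rng.sample(attributes, mtry)
    return max(candidates, key=score), candidates


# Hypothetical split-quality values for one node of one tree.
split_quality = {
    "age": 0.30, "income": 0.25, "tenure": 0.10,
    "balance": 0.08, "region": 0.02, "channel": 0.01,
}
attributes = list(split_quality)
rng = random.Random(0)

for node in range(3):
    best, candidates = choose_split_attribute(
        attributes, 2, split_quality.get, rng
    )
    print(f"node {node}: candidates={candidates} -> split on {best}")
```

Since the globally best attribute is often absent from the random sample, different trees end up splitting on different attributes.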

Random Forest and the Bias‐Variance‐Trade‐Off

Tree‐based classifiers tend to overfit the data

 Always possible to perfectly separate your data

 Just find as many rules as there are cases in your data

 Pruning is a way to avoid overfitting

 Breiman recommends not pruning the decision trees

 Fully‐grown decision trees (without pruning)

 Have very low bias (they can fit the training data perfectly)

 Have high variance

 Averaging many such trees grown on bootstrap samples reduces variance

Random forest increases predictive accuracy by using low‐bias classifiers while countering their high variance through bootstrap aggregation.

Tuning a Random Forest Classifier

Meta‐parameters of random forest

 Size of the random sample of attributes (often called mtry)

 Forest size (number of bootstrap samples)

 Size of bootstrap sample

 Often overlooked but very useful when working with large data sets

Random Forest Assessed


Boosting

 General idea

 It is often easy to find a simple classifier

If AGE < 25 Then RISK = High

 Finding many such simple classifiers is easier than finding one very powerful classifier

 Combining many simple classifiers also gives a powerful classifier


Simple classifiers are called weak learners in the boosting literature.

Base Model Development

Basic idea

 Develop one (simple) classifier

 Check which cases it gets right and which cases it gets wrong

 Develop second classifier that corrects the errors of classifier 1

 Check errors of the ensemble (classifier 1 + classifier 2)

 Develop third classifier that corrects the errors of the ensemble

 Repeat

It is common for the weights of base classifiers to vary a lot. Classifiers with a large absolute weight have a big impact on the ensemble prediction. Basically, the weight of a classifier depends on its accuracy. However, the weight also takes into account how difficult it was to classify a specific case correctly. This way, classifiers that do not contribute new information receive low weight.
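The iterative scheme and the classifier weights can be illustrated with an AdaBoost-style sketch. The slides describe boosting generically, so the specific weight formula (alpha = 0.5 * ln((1 - err) / err)), the one-feature threshold rules used as weak learners, and the toy data are all assumptions made for illustration.

```python
import math


def train_stump(xs, ys, weights):
    """Find the threshold rule with the lowest weighted error.
    A rule predicts `sign` if x < threshold, otherwise -sign."""
    best = None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            preds = [sign if x < threshold else -sign for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    return best


def adaboost(xs, ys, n_rounds):
    """Labels are -1/+1. Returns a list of (alpha, threshold, sign)."""
    n = len(xs)
    weights = [1 / n] * n          # start with equal case weights
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, sign = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # classifier weight
        ensemble.append((alpha, threshold, sign))
        # Increase the weight of misclassified cases, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * (sign if x < threshold else -sign))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted sum of the weak learners' predictions."""
    score = sum(
        alpha * (sign if x < threshold else -sign)
        for alpha, threshold, sign in ensemble
    )
    return 1 if score >= 0 else -1


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]  # not separable by one threshold

for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    errors = sum(predict(model, x) != y for x, y in zip(xs, ys))
    print(f"{rounds} round(s): {errors} training errors")
```

No single threshold rule fits these data, yet the weighted combination of three of them does, because each new rule concentrates on the cases the current ensemble gets wrong.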


Boosting and the Bias‐Variance‐Trade‐Off

Prevailing view is that boosting reduces bias and variance.


Given that the ensemble is large enough, training error and thus bias reduce to zero.

Tuning a Boosting Classifier

Meta‐parameters of boosting

 Classification algorithm for base model construction

 Often shallow decision trees (trees with a single split are called decision stumps)

 Embody idea to use weak classifier

 Use of other base classifiers is possible (depending on software package)

 How many iterations (i.e., ensemble size)

 Maybe more meta‐parameters

 Depending on boosting implementation and software package

Practical advice:

There is little need to try different base classifiers. Trees work fine in most applications. Apply model selection to determine the number of iterations. Suggestion: try sizes of 10, 25, 50, 100, 250, 500. You can use larger settings as well, but note that training time will increase. Consider the gbm package when using R.

Boosting Assessed

Multiple Classifier Systems

A multiple classifier system (MCS) is an ensemble in which the base models are produced using different classification algorithms.


 Active field of research

 Which factors determine the success of an MCS?

 How to weight base classifiers?

 Use all base models or optimize MCS composition?

 The simplest MCS possible

Build a collection of base models using your preferred classification algorithms and compute the simple average over their predictions.

 Discrete class predictions

 Continuous confidences

 Class probabilities

 Averaging requires predictions on a common scale

 Recommendations

 Avoid classifiers that produce discrete class predictions only

 Calibrate base classifier predictions
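The common-scale requirement can be illustrated as follows. Min-max scaling is used here only as a crude stand-in for bringing raw confidences onto the [0, 1] range; it is not a proper calibration method (Platt scaling or isotonic regression would be typical choices), and the scores below are made up.

```python
def min_max_scale(scores):
    """Map raw confidences linearly onto [0, 1] (crude, not calibration)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def simple_average(prediction_lists):
    """Case-wise simple average over the base models' predictions."""
    return [sum(col) / len(col) for col in zip(*prediction_lists)]


margin_scores = [-2.0, 0.0, 2.0]   # raw confidences of a margin-based model
probabilities = [0.1, 0.6, 0.9]    # class probabilities of another model

# Averaging the raw values mixes two scales and lets the margin
# scores dominate the result.
print("naive average: ", simple_average([margin_scores, probabilities]))
# Rescaling first puts both models on the same footing.
print("scaled average:", simple_average(
    [min_max_scale(margin_scores), probabilities]
))
```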

Final Note on Stacking

Predictions of base classifiers are highly correlated

 All base classifiers predict the same response variable

 Use robust classifier for 2nd level

 Classic statistical classifiers suffer from multicollinearity

 Avoid such classifiers (logistic regression, discriminant analysis, etc.)

 Modeling process gets very complex when 2nd level classifier requires model selection

 Which data to use for parameter tuning?

 Avoid classifiers with many meta‐parameters and/or classifiers that are sensitive to meta‐parameter choice

 In view of the above

 Option 1: Regularized logistic regression (try  = 100)

 Option 2: Random forest using rule of thumb for mtry

Multiple Classifier Systems Assessed