Table 1

Hyperparameter search space

Panel A: ENet
Package: Scikit-learn (SGDRegressor)
Feature transformation: Standard & robust scaling; selection by variance threshold
Model parameters:
  L1-L2-penalty: {x ∈ ℝ: 10^-5 ≤ x ≤ 10^-1}
  L1-ratio: {x ∈ ℝ: 0 ≤ x ≤ 1}
Optimization: Stochastic gradient descent
  Tolerance: 10^-4
  Max. epochs: 1,000
  Learning rate: 10^-4 / t^0.1
Random search: Number of combinations: 1,000

Panel B: RF
Package: Scikit-learn (RandomForestRegressor)
Feature transformation: Standard & robust scaling; selection by variance threshold
Model parameters:
  Number of trees: 300
  Max. depth: {x ∈ ℕ: 2 ≤ x ≤ 30}
  Max. features: {x ∈ ℕ: 2 ≤ x ≤ 150}
Random search: Number of combinations: 500
Panel C: GBRT
Package: Scikit-learn (GradientBoostingRegressor)
Feature transformation: Standard & robust scaling; selection by variance threshold
Model parameters:
  Number of trees: {x ∈ ℕ: 2 ≤ x ≤ 100}
  Max. depth: {x ∈ ℕ: 1 ≤ x ≤ 3}
  Max. features: {20, 50, All}
  Learning rate: {x ∈ ℝ: 5×10^-3 ≤ x ≤ 1.2×10^-1}
Random search: Number of combinations: 300

Panel D: ANN
Package: Tensorflow/Keras (Sequential)
Feature transformation: Standard & robust scaling; selection by variance threshold
Model parameters:
  Activation: TanH (Glorot initialization), ReLU (He initialization)
  Hidden layers: {1, 2, 3, 4, 5}
  First hidden layer nodes: {32, 64, 128}
  Network architecture: Pyramid
  Max. weight norm: 4
  Dropout rate: {x ∈ ℝ: 0 ≤ x ≤ 0.5}
  L1-penalty: {x ∈ ℝ: 10^-7 ≤ x ≤ 10^-2}
Optimization: Adaptive moment estimation (Adam)
  Batch size: {100, 200, 500, 1,000}
  Learning rate: {x ∈ ℝ: 10^-4 ≤ x ≤ 10^-2}
  Early stopping patience: 6
  Max. epochs: 50
  Batch normalization before activation
  Number of networks in ensemble: 10
Random search: Number of combinations: 1,000
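The table specifies Panel D's architecture only as "Pyramid", fixing the first hidden layer ({32, 64, 128}) and the depth ({1..5}). One common reading of a pyramid scheme, assumed purely for illustration here, is that each subsequent hidden layer halves the width of the previous one:

```python
def pyramid_widths(first_layer: int, n_hidden: int) -> list[int]:
    """Hidden-layer widths under an assumed halving-pyramid rule.

    Only the first-layer size and the depth come from the table;
    geometric halving is an illustrative assumption, not the
    paper's stated rule.
    """
    widths = [first_layer]
    for _ in range(n_hidden - 1):
        widths.append(max(widths[-1] // 2, 1))  # floor width at one node
    return widths

print(pyramid_widths(128, 5))  # -> [128, 64, 32, 16, 8]
print(pyramid_widths(32, 3))   # -> [32, 16, 8]
```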

Notes: This table shows the hyperparameter search space and the Python packages used for both long and short training. Parameter configurations not listed here correspond to the respective default settings.
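As a concrete illustration, Panel A's search space can be sketched with scikit-learn's `RandomizedSearchCV`. This is a minimal sketch, not the paper's code: the toy data, the variance-threshold cutoff (default), the choice of standard over robust scaling, and the reduced number of sampled combinations are all assumptions made here for brevity.

```python
import numpy as np
from scipy.stats import loguniform, uniform
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import RandomizedSearchCV

# Toy data standing in for the paper's features and targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] + 0.1 * rng.normal(size=200)

pipe = Pipeline([
    ("select", VarianceThreshold()),  # cutoff value: assumption (default)
    ("scale", StandardScaler()),      # table also allows robust scaling
    ("enet", SGDRegressor(
        penalty="elasticnet",          # combined L1-L2 penalty
        tol=1e-4,                      # tolerance 10^-4
        max_iter=1000,                 # max. epochs 1,000
        learning_rate="invscaling",    # eta = eta0 / t^power_t
        eta0=1e-4, power_t=0.1,        # i.e. 10^-4 / t^0.1
        random_state=0,
    )),
])

search = RandomizedSearchCV(
    pipe,
    param_distributions={
        "enet__alpha": loguniform(1e-5, 1e-1),  # L1-L2-penalty range
        "enet__l1_ratio": uniform(0, 1),        # L1-ratio in [0, 1]
    },
    n_iter=20,  # table: 1,000 combinations; reduced here for speed
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

The RF and GBRT searches (Panels B and C) follow the same pattern with `RandomForestRegressor` or `GradientBoostingRegressor` swapped in and their respective parameter ranges from the table.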
