Table 1

Hyperparameter search space

Panel A: ENet
Package: Scikit-learn (SGDRegressor)
Feature transformation: Standard & robust scaling; selection by variance threshold
Model parameters:
  L1-L2-penalty: {x ∈ ℝ: 10^-5 ≤ x ≤ 10^-1}
  L1-ratio: {x ∈ ℝ: 0 ≤ x ≤ 1}
Optimization: Stochastic gradient descent
  Tolerance: 10^-4
  Max. epochs: 1,000
  Learning rate: 10^-4 / t^0.1
Random search: Number of combinations: 1,000

Panel B: RF
Package: Scikit-learn (RandomForestRegressor)
Feature transformation: Standard & robust scaling; selection by variance threshold
Model parameters:
  Number of trees: 300
  Max. depth: {x ∈ ℕ: 2 ≤ x ≤ 30}
  Max. features: {x ∈ ℕ: 2 ≤ x ≤ 150}
Random search: Number of combinations: 500
Panel C: GBRT
Package: Scikit-learn (GradientBoostingRegressor)
Feature transformation: Standard & robust scaling; selection by variance threshold
Model parameters:
  Number of trees: {x ∈ ℕ: 2 ≤ x ≤ 100}
  Max. depth: {x ∈ ℕ: 1 ≤ x ≤ 3}
  Max. features: {20, 50, All}
  Learning rate: {x ∈ ℝ: 5×10^-3 ≤ x ≤ 1.2×10^-1}
Random search: Number of combinations: 300

Panel D: ANN
Package: Tensorflow/Keras (Sequential)
Feature transformation: Standard & robust scaling; selection by variance threshold
Model parameters:
  Activation: TanH (Glorot initialization), ReLU (He initialization)
  Hidden layers: {1, 2, 3, 4, 5}
  First hidden layer nodes: {32, 64, 128}
  Network architecture: Pyramid
  Max. weight norm: 4
  Dropout rate: {x ∈ ℝ: 0 ≤ x ≤ 0.5}
  L1-penalty: {x ∈ ℝ: 10^-7 ≤ x ≤ 10^-2}
Optimization: Adaptive moment estimation (Adam)
  Batch size: {100, 200, 500, 1,000}
  Learning rate: {x ∈ ℝ: 10^-4 ≤ x ≤ 10^-2}
  Early stopping patience: 6
  Max. epochs: 50
  Batch normalization before activation
  Number of networks in ensemble: 10
Random search: Number of combinations: 1,000
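The table specifies Panel D's architecture only as "Pyramid", fixing the first hidden layer ({32, 64, 128}) and the depth ({1..5}). One common reading of a pyramid scheme, assumed purely for illustration here, is that each subsequent hidden layer halves the width of the previous one:

```python
def pyramid_widths(first_layer: int, n_hidden: int) -> list[int]:
    """Hidden-layer widths under an assumed halving-pyramid rule.

    Only the first-layer size and the depth come from the table;
    geometric halving is an illustrative assumption, not the
    paper's stated rule.
    """
    widths = [first_layer]
    for _ in range(n_hidden - 1):
        widths.append(max(widths[-1] // 2, 1))  # floor width at one node
    return widths

print(pyramid_widths(128, 5))  # -> [128, 64, 32, 16, 8]
print(pyramid_widths(32, 3))   # -> [32, 16, 8]
```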

Notes: This table shows the hyperparameter search space and the Python packages used for both long and short training. Parameter configurations not listed here correspond to the respective default settings.
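As a concrete illustration, Panel A's search space can be sketched with scikit-learn's `RandomizedSearchCV`. This is a minimal sketch, not the paper's code: the toy data, the variance-threshold cutoff (default), the choice of standard over robust scaling, and the reduced number of sampled combinations are all assumptions made here for brevity.

```python
import numpy as np
from scipy.stats import loguniform, uniform
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import RandomizedSearchCV

# Toy data standing in for the paper's features and targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] + 0.1 * rng.normal(size=200)

pipe = Pipeline([
    ("select", VarianceThreshold()),  # cutoff value: assumption (default)
    ("scale", StandardScaler()),      # table also allows robust scaling
    ("enet", SGDRegressor(
        penalty="elasticnet",          # combined L1-L2 penalty
        tol=1e-4,                      # tolerance 10^-4
        max_iter=1000,                 # max. epochs 1,000
        learning_rate="invscaling",    # eta = eta0 / t^power_t
        eta0=1e-4, power_t=0.1,        # i.e. 10^-4 / t^0.1
        random_state=0,
    )),
])

search = RandomizedSearchCV(
    pipe,
    param_distributions={
        "enet__alpha": loguniform(1e-5, 1e-1),  # L1-L2-penalty range
        "enet__l1_ratio": uniform(0, 1),        # L1-ratio in [0, 1]
    },
    n_iter=20,  # table: 1,000 combinations; reduced here for speed
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

The RF and GBRT searches (Panels B and C) follow the same pattern with `RandomForestRegressor` or `GradientBoostingRegressor` swapped in and their respective parameter ranges from the table.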
