brulee_tab_icl() makes the open-source foundational
model TabICL available. On first use, there is a substantial download
(~400 MB) of the model weights, which are then cached locally.
brulee_saint() and brulee_auto_int()
now support gradient clipping via the grad_value_clip and
grad_norm_clip arguments (both default to 5),
matching brulee_mlp() and brulee_resnet().
This prevents the loss from overflowing to NaN during
training with aggressive learning rates.
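As a rough illustration of what the two kinds of clipping do, here is a minimal base R sketch. The helpers `clip_value()` and `clip_norm()` are hypothetical and are not part of brulee or torch; the actual clipping is done by torch during training and may differ in detail.

```r
# Illustrative only: plain-R versions of the two clipping rules.
# `clip_value()` and `clip_norm()` are hypothetical helpers, not brulee functions.
clip_value <- function(grad, limit = 5) {
  # Clamp each gradient element to the interval [-limit, limit]
  pmin(pmax(grad, -limit), limit)
}

clip_norm <- function(grad, limit = 5) {
  # Rescale the whole gradient when its Euclidean norm exceeds `limit`
  grad_norm <- sqrt(sum(grad^2))
  if (grad_norm > limit) grad * (limit / grad_norm) else grad
}

grad <- c(30, -40, 0.5)
clipped_by_value <- clip_value(grad)  # elements outside [-5, 5] are clamped
clipped_by_norm <- clip_norm(grad)    # direction kept, norm reduced to 5
```

Value clipping changes the gradient's direction when only some elements are clamped, while norm clipping preserves the direction and only shortens the step.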
There is now a type argument to
predict.brulee_chronos(): "all" returns
.pred and .pred_quantile (unchanged default),
"numeric" returns only .pred,
"quantile" returns only .pred_quantile. The id
column is still prepended for multi-series models regardless of
type.
Fixed a bug where the internal convergence flag of torch’s L-BFGS optimizer could be NA, which threw an unhelpful error.
The brulee_saint() argument
use_target_token was renamed to
target_token.
predict() for brulee_chronos() models
was reworked. The historical context is always the data supplied to
brulee_chronos() (the model is pretrained and does no
training), so the former new_data context-override was
removed. The argument previously called future_df is now
new_data: it describes the future window to forecast for
and may have at most prediction_length rows per series
(previously exactly prediction_length). When fewer rows are
supplied, the forecast is truncated to those rows.
predict() also gained a type argument
("all", "numeric", or "quantile")
to select which prediction columns are returned.
All estimated models now include epoch zero (the randomly
initialized parameters, before any training) as the first element of
loss and estimates, matching the
neural-network models. These vectors are now length
epochs + 1, epoch = 0 is a valid argument to
predict() and coef(), and the entry for
best_epoch is at position best_epoch + 1.
Predictions and coefficients for a given (positive) epoch are unchanged.
Note: objects serialized by earlier versions of these three functions
predict off by one epoch under the new indexing, so refit any stored
models.
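The indexing change can be pictured with a small base R sketch. The loss values below are made up for illustration and do not come from a real fit.

```r
# Hypothetical loss history for a model fit with epochs = 4.
# Position 1 holds epoch zero (the initial, untrained parameters).
epochs <- 4
loss <- c(2.31, 1.40, 0.95, 0.99, 1.02)  # made-up values, length epochs + 1
stopifnot(length(loss) == epochs + 1)

# Epoch numbers start at zero, so subtract one from the vector position
best_epoch <- which.min(loss) - 1

# Looking up an epoch's entry therefore needs the + 1 offset
best_loss <- loss[best_epoch + 1]
```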
print() methods now report the loss from the best
epoch. Previously the displayed loss was taken one epoch too early (it
ignored the prepended epoch-zero entry in loss).
New models for tabular data:
Regularization Learning Networks (brulee_rln()) use
a conventional MLP architecture but each weight learns its own adaptive
regularization coefficient.
ResNet (brulee_resnet()) can fit a multilayer neural
network with skip (i.e. residual) connections and batch
normalization.
AutoInt (brulee_auto_int()) uses residual
connections and columnwise attention mechanisms to create embeddings
that encourage in-context learning of features.
Saint (brulee_saint()) uses column and/or row
attention mechanisms.
Chronos2 (brulee_chronos()) is a foundational model
for forecasting.
All modeling functions now support GPU acceleration via the
device parameter. Users can specify
device = "cpu", device = "cuda", or
device = "mps" (Apple Silicon). When
device = NULL (default), the package automatically selects
CUDA if available, otherwise defaults to CPU. Note: MPS is not
auto-selected because it doesn’t support float64 dtype required by
brulee. See ?training_efficiency for some related
notes.
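The default selection rule described above can be summarized in a few lines of base R. `pick_device()` is a hypothetical helper written for this note, not a brulee function, and `cuda_available` stands in for a real check such as `torch::cuda_is_available()`.

```r
# Hypothetical helper mirroring the documented default; not part of brulee.
# `cuda_available` stands in for a real check such as torch::cuda_is_available().
pick_device <- function(device = NULL, cuda_available = FALSE) {
  if (!is.null(device)) {
    # An explicit request ("cpu", "cuda", or "mps") is honored as given
    return(match.arg(device, c("cpu", "cuda", "mps")))
  }
  # NULL means auto-select: CUDA when present, otherwise CPU (never MPS)
  if (cuda_available) "cuda" else "cpu"
}

auto_with_gpu <- pick_device(NULL, cuda_available = TRUE)
auto_without_gpu <- pick_device(NULL, cuda_available = FALSE)
explicit_mps <- pick_device("mps")
```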
Float tensors were changed from 64-bit floats to 32-bit. This is to enable GPU usage on MPS devices.
Parameters are initialized on CPU devices and then converted to the chosen device. In some cases, the RNG initialization code is independent of the seed.
For classification, the softmax was moved out of every model’s
forward pass so the loss can use torch::nnf_cross_entropy()
(which applies the log-sum-exp trick internally) instead of
nll_loss(log(softmax(x))). This avoids log(0)
underflow that produced NaN losses and “numerical overflow”
early stopping on overspecified brulee_saint() /
brulee_auto_int() fits. Affects brulee_mlp(),
brulee_logistic_reg(),
brulee_multinomial_reg(), brulee_resnet(),
brulee_auto_int(), and brulee_saint(). New
fits carry output_type = "logits" so the predict path
applies softmax; serialized fits from earlier versions of brulee
continue to predict correctly.
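The numerical issue can be reproduced in base R without torch. This is only a sketch of the general log-sum-exp idea, not the code brulee or torch runs.

```r
# Illustrative only: why log(softmax(x)) can fail and log-sum-exp does not.
logits <- c(0, -800)  # an extreme but legal pair of class scores

# Naive route: exp(-800) underflows to 0, so its log is -Inf
probs <- exp(logits) / sum(exp(logits))
naive_log_prob <- log(probs)

# Stable route: subtract the maximum before exponentiating
log_sum_exp <- max(logits) + log(sum(exp(logits - max(logits))))
stable_log_prob <- logits - log_sum_exp
```

If the observed class is the second one, the naive loss is infinite while the stable version stays finite.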
Transitioned from the magrittr pipe to the base R pipe.
To help avoid numeric overflow in the loss functions:
Tensors are stored as a 64-bit float instead of 32-bit.
Starting values are now drawn from a Gaussian distribution (instead of a uniform distribution) with a smaller standard deviation.
The results always contain the initial results to use as a fallback if there is overflow during the first epoch.
brulee_mlp() has two additional parameters,
grad_value_clip and grad_norm_clip, that
clip the gradients to help prevent overflow.
The warning was changed to “Early stopping occurred at epoch {X} due to numerical overflow of the loss function.”
Several new SGD optimizers were added: "ADAMw",
"Adadelta", "Adagrad", and
"RMSprop".
Nonzero values of the mixture parameter cannot be used with several optimizers since they only support L2 penalties.
Added a convenience function,
brulee_mlp_two_layer(), to more easily fit two-layer
networks with parsnip.
Various changes and improvements to error and warning messages.
Fixed a bug that occurred when linear activation was used for neural networks (#68).
Fixed a bug where coef() would error if used on
a brulee_logistic_reg() model that was trained with a recipe.
(#66)
Fixed a bug where SGD was always being used as the optimizer (#61).
Additional activation functions were added (#74).
Several learning rate schedulers were added to the modeling functions (#12).
An optimizer argument was added to brulee_mlp(), with the new
default being LBFGS instead of stochastic gradient descent.
Modeling functions gained a mixture argument for the
proportion of L1 penalty that is used. (#50)
Penalization was not occurring when quasi-Newton optimization was chosen. (#50)
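To show how a mixture value splits a penalty between its L1 and L2 parts, here is a base R sketch of one common elastic-net parameterization. The function `elastic_net_penalty()` is hypothetical, and the exact scaling brulee uses is an assumption here.

```r
# Illustrative only: one common elastic-net parameterization.
# The exact scaling used inside brulee is an assumption.
elastic_net_penalty <- function(weights, penalty, mixture) {
  l1 <- sum(abs(weights))  # lasso-style term
  l2 <- sum(weights^2)     # ridge / weight-decay term
  penalty * (mixture * l1 + (1 - mixture) * l2)
}

w <- c(0.5, -2, 1)
pure_l2 <- elastic_net_penalty(w, penalty = 0.1, mixture = 0)  # L2 term only
pure_l1 <- elastic_net_penalty(w, penalty = 0.1, mixture = 1)  # L1 term only
```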
First CRAN release.