train: Fit Predictive Models over Different Tuning Parameters in caret: Classification and Regression Training (2024)

train

R Documentation

Fit Predictive Models over Different Tuning Parameters

Description

This function sets up a grid of tuning parameters for a number of classification and regression routines, fits each model and calculates a resampling-based performance measure.

Usage

train(x, ...)

## Default S3 method:
train(
  x,
  y,
  method = "rf",
  preProcess = NULL,
  ...,
  weights = NULL,
  metric = ifelse(is.factor(y), "Accuracy", "RMSE"),
  maximize = ifelse(metric %in% c("RMSE", "logLoss", "MAE", "logLoss"), FALSE, TRUE),
  trControl = trainControl(),
  tuneGrid = NULL,
  tuneLength = ifelse(trControl$method == "none", 1, 3)
)

## S3 method for class 'formula'
train(form, data, ..., weights, subset, na.action = na.fail, contrasts = NULL)

## S3 method for class 'recipe'
train(
  x,
  data,
  method = "rf",
  ...,
  metric = ifelse(is.factor(y_dat), "Accuracy", "RMSE"),
  maximize = ifelse(metric %in% c("RMSE", "logLoss", "MAE"), FALSE, TRUE),
  trControl = trainControl(),
  tuneGrid = NULL,
  tuneLength = ifelse(trControl$method == "none", 1, 3)
)

Arguments

`x`	For the default method, `x` is an object where samples are in rows and features are in columns. This could be a simple matrix, data frame or other type (e.g. sparse matrix) but must have column names (see Details below). Preprocessing using the `preProcess` argument only supports matrices or data frames. When using the recipe method, `x` should be an unprepared `recipe` object that describes the model terms (i.e. outcome, predictors, etc.) as well as any pre-processing that should be done to the data. This is an alternative approach to specifying the model. Note that, when using the recipe method, any arguments passed to `preProcess` will be ignored. See the links and example below for more details using recipes.
`...`	Arguments passed to the classification or regression routine (such as `randomForest`). Errors will occur if values for tuning parameters are passed here.
`y`	A numeric or factor vector containing the outcome for each sample.
`method`	A string specifying which classification or regression model to use. Possible values are found using `names(getModelInfo())`. See http://topepo.github.io/caret/train-models-by-tag.html. A list of functions can also be passed for a custom model function. See http://topepo.github.io/caret/using-your-own-model-in-train.html for details.
`preProcess`	A string vector that defines a pre-processing of the predictor data. Current possibilities are "BoxCox", "YeoJohnson", "expoTrans", "center", "scale", "range", "knnImpute", "bagImpute", "medianImpute", "pca", "ica" and "spatialSign". The default is no pre-processing. See `preProcess` and `trainControl` for the procedures and how to adjust them. Pre-processing code is only designed to work when `x` is a simple matrix or data frame.
`weights`	A numeric vector of case weights. This argument will only affect models that allow case weights.
`metric`	A string that specifies what summary metric will be used to select the optimal model. By default, possible values are "RMSE" and "Rsquared" for regression and "Accuracy" and "Kappa" for classification. If custom performance metrics are used (via the `summaryFunction` argument in `trainControl`), the value of `metric` should match one of the arguments. If it does not, a warning is issued and the first metric given by the `summaryFunction` is used. (NOTE: If given, this argument must be named.)
`maximize`	A logical: should the metric be maximized or minimized?
`trControl`	A list of values that define how this function acts. See `trainControl` and http://topepo.github.io/caret/using-your-own-model-in-train.html. (NOTE: If given, this argument must be named.)
`tuneGrid`	A data frame with possible tuning values. The columns are named the same as the tuning parameters. Use `getModelInfo` to get a list of tuning parameters for each model or see http://topepo.github.io/caret/available-models.html. (NOTE: If given, this argument must be named.)
`tuneLength`	An integer denoting the amount of granularity in the tuning parameter grid. By default, this argument is the number of levels for each tuning parameter that should be generated by `train`. If `trainControl` has the option `search = "random"`, this is the maximum number of tuning parameter combinations that will be generated by the random search. (NOTE: If given, this argument must be named.)
`form`	A formula of the form `y ~ x1 + x2 + ...`
`data`	Data frame from which variables specified in `formula` or `recipe` are preferentially to be taken.
`subset`	An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
`na.action`	A function to specify the action to be taken if NAs are found. The default action is for the procedure to fail. An alternative is `na.omit`, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)
`contrasts`	A list of contrasts to be used for some or all the factors appearing as variables in the model formula.
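
The interplay between `tuneGrid`, `tuneLength` and `search = "random"` may be easier to see in a small sketch. The data set (`iris`), the model (`"knn"`), the 5-fold setting and the candidate values of `k` below are arbitrary choices made for illustration, not recommendations.

```r
library(caret)

# List the tuning parameters train() expects for a method.
# regex = FALSE requests an exact match on the model code.
getModelInfo("knn", regex = FALSE)[[1]]$parameters

ctrl <- trainControl(method = "cv", number = 5)

# Option 1: let train() build the grid, with 4 levels of the parameter.
set.seed(1)
fit_len <- train(x = iris[, 1:4], y = iris$Species,
                 method = "knn", tuneLength = 4, trControl = ctrl)

# Option 2: supply the grid explicitly. Column names must match the
# tuning parameter names listed above (here, a single column k).
set.seed(1)
knn_grid <- data.frame(k = c(3, 5, 7, 9))
fit_grid <- train(x = iris[, 1:4], y = iris$Species,
                  method = "knn", tuneGrid = knn_grid, trControl = ctrl)

# Option 3: random search, where tuneLength caps the number of candidates.
set.seed(1)
ctrl_rand <- trainControl(method = "cv", number = 5, search = "random")
fit_rand <- train(x = iris[, 1:4], y = iris$Species,
                  method = "knn", tuneLength = 4, trControl = ctrl_rand)

fit_grid$bestTune
```

Note that `tuneGrid`, `tuneLength` and `trControl` are passed by name, as the argument descriptions require.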

Details

train can be used to tune models by picking the complexity parameters that are associated with the optimal resampling statistics. For a particular model, a grid of parameters (if any) is created and the model is trained on slightly different data for each candidate combination of tuning parameters. Across each data set, the performance of held-out samples is calculated and the mean and standard deviation are summarized for each combination. The combination with the optimal resampling statistic is chosen as the final model and the entire training set is used to fit a final model.
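
As a rough illustration of this procedure, the sketch below keeps the held-out results for every candidate (via `returnResamp = "all"`) and averages them by hand, which should reproduce the summary that `train` stores. The data set, model and grid are assumptions chosen only to keep the example small.

```r
library(caret)

set.seed(42)
ctrl <- trainControl(method = "cv", number = 5, returnResamp = "all")
fit <- train(x = iris[, 1:4], y = iris$Species, method = "knn",
             tuneGrid = data.frame(k = c(3, 5, 7)), trControl = ctrl)

# One row per (resample, candidate k): performance on the held-out samples.
head(fit$resample)

# Averaging over resamples by hand, per candidate value of k.
manual <- aggregate(Accuracy ~ k, data = fit$resample, FUN = mean)

# Compare with the mean and standard deviation summarized by train().
compare <- merge(manual, fit$results[, c("k", "Accuracy", "AccuracySD")],
                 by = "k", suffixes = c("_manual", "_train"))
compare

# The candidate with the optimal statistic; the final model is then
# refit with this value on the entire training set.
fit$bestTune
```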

The predictors in x can be almost any object as long as the underlying model fit function can deal with the object class. The function was designed to work with simple matrices and data frame inputs, so some functionality (e.g. pre-processing) may not work with other object types. When using string kernels, the vector of character strings should be converted to a matrix with a single column.

Value

A list is returned of class train containing:

`method`	The chosen model.
`modelType`	An identifier of the model type.
`results`	A data frame containing the training error rate and values of the tuning parameters.
`bestTune`	A data frame with the final parameters.
`call`	The (matched) function call with dots expanded
`dots`	A list containing any ... values passed to the original call
`metric`	A string that specifies what summary metric will be used to select the optimal model.
`control`	The list of control parameters.
`preProcess`	Either `NULL` or an object of class `preProcess`
`finalModel`	A fit object using the best parameters
`trainingData`	A data frame
`resample`	A data frame with columns for each performance metric. Each row corresponds to each resample. If leave-one-out cross-validation or out-of-bag estimation methods are requested, this will be `NULL`. The `returnResamp` argument of `trainControl` controls how much of the resampled results are saved.
`perfNames`	A character vector of performance metrics that are produced by the summary function
`maximize`	A logical recycled from the function arguments.
`yLimits`	The range of the training set outcomes.
`times`	A list of execution times: `everything` is for the entire call to `train`, `final` for the final model fit and, optionally, `prediction` for the time to predict new samples (see `trainControl`)
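
A short sketch of how these components might be inspected on a fitted object follows. The model and resampling settings are illustrative assumptions; only the component names come from the list above.

```r
library(caret)

set.seed(7)
fit <- train(x = iris[, 1:4], y = iris$Species, method = "knn",
             tuneLength = 3,
             trControl = trainControl(method = "cv", number = 5))

class(fit)            # the returned list has class "train"
fit$method            # the model code passed as `method`
fit$modelType         # identifier of the model type
fit$perfNames         # metrics produced by the summary function
fit$results           # resampling summary per tuning parameter value
fit$bestTune          # the selected tuning parameter value(s)
fit$times$everything  # timing for the entire call to train()

# predict() applied to the train object predicts from the final model.
preds <- predict(fit, newdata = iris[, 1:4])
head(preds)
```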

Author(s)

Max Kuhn (the guts of train.formula were based on Ripley's nnet.formula)

References

http://topepo.github.io/caret/

Kuhn (2008), “Building Predictive Models in R Using the caret” (doi:10.18637/jss.v028.i05)

https://topepo.github.io/recipes/

Examples

## Not run: 
library(caret)

#######################################
## Classification Example

data(iris)
TrainData <- iris[,1:4]
TrainClasses <- iris[,5]

knnFit1 <- train(TrainData, TrainClasses,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = 10,
                 trControl = trainControl(method = "cv"))

knnFit2 <- train(TrainData, TrainClasses,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneLength = 10,
                 trControl = trainControl(method = "boot"))

library(MASS)
nnetFit <- train(TrainData, TrainClasses,
                 method = "nnet",
                 preProcess = "range",
                 tuneLength = 2,
                 trace = FALSE,
                 maxit = 100)

#######################################
## Regression Example

library(mlbench)
data(BostonHousing)

lmFit <- train(medv ~ . + rm:lstat,
               data = BostonHousing,
               method = "lm")

library(rpart)
rpartFit <- train(medv ~ .,
                  data = BostonHousing,
                  method = "rpart",
                  tuneLength = 9)

#######################################
## Example with a custom metric

madSummary <- function (data, lev = NULL, model = NULL) {
  out <- mad(data$obs - data$pred, na.rm = TRUE)
  names(out) <- "MAD"
  out
}

robustControl <- trainControl(summaryFunction = madSummary)
marsGrid <- expand.grid(degree = 1, nprune = (1:10) * 2)

earthFit <- train(medv ~ .,
                  data = BostonHousing,
                  method = "earth",
                  tuneGrid = marsGrid,
                  metric = "MAD",
                  maximize = FALSE,
                  trControl = robustControl)

#######################################
## Example with a recipe

data(cox2)

cox2 <- cox2Descr
cox2$potency <- cox2IC50

library(recipes)

cox2_recipe <- recipe(potency ~ ., data = cox2) %>%
  ## Log the outcome
  step_log(potency, base = 10) %>%
  ## Remove sparse and unbalanced predictors
  step_nzv(all_predictors()) %>%
  ## Surface area predictors are highly correlated so
  ## conduct PCA just on these.
  step_pca(contains("VSA"), prefix = "surf_area_",
           threshold = .95) %>%
  ## Remove other highly correlated predictors
  step_corr(all_predictors(), -starts_with("surf_area_"),
            threshold = .90) %>%
  ## Center and scale all of the non-PCA predictors
  step_center(all_predictors(), -starts_with("surf_area_")) %>%
  step_scale(all_predictors(), -starts_with("surf_area_"))

set.seed(888)
cox2_lm <- train(cox2_recipe,
                 data = cox2,
                 method = "lm",
                 trControl = trainControl(method = "cv"))

#######################################
## Parallel Processing Example via multicore package

## library(doMC)
## registerDoMC(2)

## NOTE: don't run models from RWeka when using
### multicore. The session will crash.

## The code for train() does not change:
set.seed(1)
usingMC <- train(medv ~ .,
                 data = BostonHousing,
                 method = "glmboost")

## or use:
## library(doMPI) or
## library(doParallel) or
## library(doSMP) and so on

## End(Not run)
