trainControl function - RDocumentation (2024)

Description

Control the computational nuances of the train function

Usage

trainControl(
  method = "boot",
  number = ifelse(grepl("cv", method), 10, 25),
  repeats = ifelse(grepl("[d_]cv$", method), 1, NA),
  p = 0.75,
  search = "grid",
  initialWindow = NULL,
  horizon = 1,
  fixedWindow = TRUE,
  skip = 0,
  verboseIter = FALSE,
  returnData = TRUE,
  returnResamp = "final",
  savePredictions = FALSE,
  classProbs = FALSE,
  summaryFunction = defaultSummary,
  selectionFunction = "best",
  preProcOptions = list(thresh = 0.95, ICAcomp = 3, k = 5, freqCut = 95/5,
                        uniqueCut = 10, cutoff = 0.9),
  sampling = NULL,
  index = NULL,
  indexOut = NULL,
  indexFinal = NULL,
  timingSamps = 0,
  predictionBounds = rep(FALSE, 2),
  seeds = NA,
  adaptive = list(min = 5, alpha = 0.05, method = "gls", complete = TRUE),
  trim = FALSE,
  allowParallel = TRUE
)

Arguments

method

The resampling method: "boot", "boot632", "optimism_boot", "boot_all", "cv", "repeatedcv", "LOOCV", "LGOCV" (for repeated training/test splits), "none" (only fits one model to the entire training set), "oob" (only for random forest, bagged trees, bagged earth, bagged flexible discriminant analysis, or conditional tree forest models), "timeslice", "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV"

number

Either the number of folds or number of resampling iterations

repeats

For repeated k-fold cross-validation only: the number of complete sets of folds to compute

p

For leave-group-out cross-validation: the training percentage, given as a proportion (the default is 0.75)

search

Either "grid" or "random", describing how the tuning parameter grid is determined. See details below.

initialWindow, horizon, fixedWindow, skip

possible arguments to createTimeSlices when method is "timeslice".
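To make the window arguments concrete, here is a minimal sketch that assumes the caret package is installed. The series y is a hypothetical placeholder with 10 ordered observations; createTimeSlices is called directly only to show which rows each slice would contain.

```r
library(caret)

# Hypothetical ordered series of 10 observations
y <- 1:10

# Rolling-origin slices: 5 training rows, 2 held-out rows, fixed-width window
slices <- createTimeSlices(y, initialWindow = 5, horizon = 2,
                           fixedWindow = TRUE, skip = 0)
str(slices$train)  # rows used for fitting in each slice
str(slices$test)   # rows held out in each slice

# The same settings passed through trainControl
ctrl <- trainControl(method = "timeslice", initialWindow = 5, horizon = 2,
                     fixedWindow = TRUE, skip = 0)
```

With these settings the first slice trains on rows 1 to 5 and holds out rows 6 and 7. Setting fixedWindow = FALSE would instead keep every training window anchored at row 1.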

verboseIter

A logical for printing a training log.

returnResamp

A character string indicating how much of the resampled summary metrics should be saved. Values can be "final", "all" or "none"

savePredictions

an indicator of how much of the hold-out predictions for each resample should be saved. Values can be either "all", "final", or "none". A logical value can also be used, which is converted to "all" (for TRUE) or "none" (for FALSE). "final" saves the predictions for the optimal tuning parameters.

classProbs

a logical; should class probabilities be computed for classification models (along with predicted values) in each resample?

summaryFunction

a function to compute performance metrics across resamples. The arguments to the function should be the same as those in defaultSummary. Note that if method = "oob" is used, this option is ignored and a warning is issued.
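As an illustration, the sketch below defines a custom summary function with the same arguments as defaultSummary (data, lev, model). The name maeSummary and the toy data frame are hypothetical. The sketch assumes, as with defaultSummary, that data has columns named obs and pred.

```r
# Hypothetical custom metric: mean absolute error for a regression model.
# The arguments mirror those of defaultSummary.
maeSummary <- function(data, lev = NULL, model = NULL) {
  c(MAE = mean(abs(data$obs - data$pred)))
}

# Quick check on a toy data frame with observed and predicted columns
toy <- data.frame(obs = c(1, 2, 3), pred = c(1.5, 2, 2.5))
maeSummary(toy)
```

Such a function would be passed as trainControl(summaryFunction = maeSummary). The call to train would then typically name the metric, for example metric = "MAE" with maximize = FALSE.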

selectionFunction

the function used to select the optimal tuning parameter. This can be a name of the function or the function itself. See best for details and other options.

preProcOptions

A list of options to pass to preProcess. The type of pre-processing (e.g. center, scaling etc) is passed in via the preProc option in train.

sampling

a single character value describing the type of additional sampling that is conducted after resampling (usually to resolve class imbalances). Values are "none", "down", "up", "smote", or "rose". The latter two values require the themis and ROSE packages, respectively. This argument can also be a list to facilitate custom sampling and these details can be found on the caret package website for sampling (link below).

index

a list with elements for each resampling iteration. Each list element is a vector of integers corresponding to the rows used for training at that iteration.

indexOut

a list (the same length as index) that dictates which data are held-out for each resample (as integers). If NULL, then the unique set of samples not contained in index is used.
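A minimal base R sketch of building these two lists by hand follows. The row count n, the four-fold layout, and the element names are hypothetical choices for illustration, not anything caret requires.

```r
set.seed(42)
n <- 20                                    # hypothetical number of rows
folds <- sample(rep(1:4, length.out = n))  # random fold label for each row

# Training rows for each resample: every row outside the fold
index <- lapply(1:4, function(k) which(folds != k))
names(index) <- paste0("Fold", 1:4)

# Held-out rows for each resample: the rows not used for training
indexOut <- lapply(index, function(train_rows) setdiff(seq_len(n), train_rows))
```

The lists would then be supplied as trainControl(index = index, indexOut = indexOut). Here indexOut only reproduces what a NULL value would give, so it matters mainly when the held-out rows should differ from that complement.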

indexFinal

an optional vector of integers indicating which samples are used to fit the final model after resampling. If NULL, then the entire data set is used.

timingSamps

the number of training set samples that will be used to measure the time for predicting samples (zero indicates that the prediction time should not be estimated).

predictionBounds

a logical or numeric vector of length 2 (regression only). If logical, the predictions can be constrained to be within the limit of the training set outcomes. For example, a value of c(TRUE, FALSE) would only constrain the lower end of predictions. If numeric, specific bounds can be used. For example, if c(10, NA), values below 10 would be predicted as 10 (with no constraint in the upper side).
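The numeric case can be pictured with a small base R sketch. The helper clamp_predictions is hypothetical and only mimics the described behaviour; it is not caret's internal code.

```r
# Hypothetical helper showing the clamping described above for numeric bounds.
# An NA bound means that side is left unconstrained.
clamp_predictions <- function(pred, bounds) {
  if (!is.na(bounds[1])) pred <- pmax(pred, bounds[1])  # lower bound
  if (!is.na(bounds[2])) pred <- pmin(pred, bounds[2])  # upper bound
  pred
}

clamp_predictions(c(4, 12, 30), bounds = c(10, NA))
# → 10 12 30
```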

seeds

an optional set of integers that will be used to set the seed at each resampling iteration. This is useful when the models are run in parallel. A value of NA will stop the seed from being set within the worker processes while a value of NULL will set the seeds using a random set of integers. Alternatively, a list can be used. The list should have B+1 elements where B is the number of resamples, unless method is "boot632" in which case B is the number of resamples plus 1. The first B elements of the list should be vectors of integers of length M where M is the number of models being evaluated. The last element of the list only needs to be a single integer (for the final model). See the Examples section below and the Details section.

adaptive

a list used when method is "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV". See Details below.

trim

a logical. If TRUE the final model in object$finalModel may have some components of the object removed to reduce the size of the saved object. The predict method will still work, but some other features of the model may not work. Trimming will occur only for models where this feature has been implemented.

allowParallel

if a parallel backend is loaded and available, should the function use it?

Value

An echo of the parameters specified

Details

When setting the seeds manually, the number of models being evaluated is required. This may not be obvious as train does some optimizations for certain models. For example, when tuning over a PLS model, the only model that is fit is the one with the largest number of components. So if the model is being tuned over ncomp in 1:10, the only model fit is ncomp = 10. However, if the vector of integers used in the seeds argument is longer than actually needed, no error is thrown.
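The list layout can be sketched in base R as follows. The helper make_seeds and its max_seed argument are hypothetical. B and M follow the meaning given for the seeds argument, and the adjustment for "boot632" is left to the caller.

```r
# Hypothetical helper: build a seeds list with B + 1 elements.
# The first B elements hold M integers each (one per model being evaluated);
# the last element holds a single integer for the final model.
make_seeds <- function(B, M, max_seed = 10000L) {
  seeds <- vector(mode = "list", length = B + 1)
  for (i in seq_len(B)) seeds[[i]] <- sample.int(max_seed, M)
  seeds[[B + 1]] <- sample.int(max_seed, 1)
  seeds
}

set.seed(123)
# 5 repeats of 10-fold CV gives B = 50 resamples; 12 candidate models
seeds <- make_seeds(B = 10 * 5, M = 12)
```

Because overly long vectors do not cause an error, choosing M slightly larger than the expected number of models is a cautious default when the exact count is unclear.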

Using method = "none" and specifying more than one model in train's tuneGrid or tuneLength arguments will result in an error.
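A minimal sketch of a valid call with method = "none" follows, assuming the caret package is installed. The choice of a KNN model on iris and the value k = 5 are arbitrary; the key point is that tuneGrid has exactly one row.

```r
library(caret)

# No resampling: a single model is fit to the entire training set
ctrl <- trainControl(method = "none")

# tuneGrid must describe exactly one candidate model
fit <- train(Species ~ ., data = iris, method = "knn",
             tuneGrid = data.frame(k = 5), trControl = ctrl)
```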

When adaptive resampling is used (method is either "adaptive_cv", "adaptive_boot" or "adaptive_LGOCV"), the full set of resamples is not run for each model. As resampling continues, a futility analysis is conducted and models with a low probability of being optimal are removed. These features are experimental. See Kuhn (2014) for more details. The options for this procedure are:

  • min: the minimum number of resamples used before models are removed

  • alpha: the confidence level of the one-sided intervals used to measure futility

  • method: either generalized least squares (method = "gls") or a Bradley-Terry model (method = "BT")

  • complete: if a single parameter value is found before the end of resampling, should the full set of resamples be computed for that parameter?
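The options above can be set explicitly through the adaptive list. The sketch below assumes the caret package is installed and simply restates the default values from the Usage section; the fold and repeat counts are arbitrary.

```r
library(caret)

# Adaptive repeated cross-validation with the futility options spelled out
ctrl <- trainControl(
  method = "adaptive_cv",
  number = 10,
  repeats = 5,
  adaptive = list(min = 5,         # resamples run before any model is removed
                  alpha = 0.05,    # confidence level for the futility intervals
                  method = "gls",  # or "BT" for a Bradley-Terry model
                  complete = TRUE) # finish all resamples for a lone survivor
)
```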

The option search = "grid" uses the default grid search routine. When search = "random", a random search procedure is used (Bergstra and Bengio, 2012). See http://topepo.github.io/caret/random-hyperparameter-search.html for details and an example.
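A short sketch of random search follows, assuming the caret package is installed. The KNN model, the iris data, and the value of tuneLength are arbitrary choices for illustration.

```r
library(caret)

# 5-fold CV with randomly generated tuning parameter candidates
ctrl <- trainControl(method = "cv", number = 5, search = "random")

set.seed(1)
fit <- train(Species ~ ., data = iris, method = "knn",
             tuneLength = 8, trControl = ctrl)
fit$results[, c("k", "Accuracy")]
```

Under random search, tuneLength in train is the maximum number of tuning parameter combinations that are generated, so fewer than 8 distinct values of k may be evaluated.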

The supported bootstrap methods are:

  • "boot": the usual bootstrap.

  • "boot632": the 0.632 bootstrap estimator (Efron, 1983).

  • "optimism_boot": the optimism bootstrap estimator (Efron and Tibshirani, 1994).

  • "boot_all": all of the above (for efficiency, but "boot" will be used for calculations).

The "boot632" method should not be confused with the 0.632+ estimator proposed later by the same author.

Note that if index or indexOut are specified, the label shown by train may not be accurate since these arguments supersede the method argument.

References

Efron (1983). "Estimating the error rate of a prediction rule: improvement on cross-validation". Journal of the American Statistical Association, 78(382):316-331

Efron, B., & Tibshirani, R. J. (1994). "An introduction to the bootstrap", pages 249-252. CRC press.

Bergstra and Bengio (2012), "Random Search for Hyper-Parameter Optimization", Journal of Machine Learning Research, 13(Feb):281-305

Kuhn (2014), "Futility Analysis in the Cross-Validation of Machine Learning Models", https://arxiv.org/abs/1405.6974

Package website for subsampling: https://topepo.github.io/caret/subsampling-for-class-imbalances.html

Examples

library(caret)

## Do 5 repeats of 10-Fold CV for the iris data. We will fit
## a KNN model that evaluates 12 values of k and set the seed
## at each iteration.

set.seed(123)
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, 22)

## For the last model:
seeds[[51]] <- sample.int(1000, 1)

ctrl <- trainControl(method = "repeatedcv",
                     repeats = 5,
                     seeds = seeds)

set.seed(1)
mod <- train(Species ~ ., data = iris,
             method = "knn",
             tuneLength = 12,
             trControl = ctrl)

## The same model with adaptive resampling and a training log
ctrl2 <- trainControl(method = "adaptive_cv",
                      repeats = 5,
                      verboseIter = TRUE,
                      seeds = seeds)

set.seed(1)
mod2 <- train(Species ~ ., data = iris,
              method = "knn",
              tuneLength = 12,
              trControl = ctrl2)
