com.amazonaws.services.sagemaker.sparksdk.algorithms

XGBoostSageMakerEstimator

class XGBoostSageMakerEstimator extends SageMakerEstimator with XGBoostParams

A SageMakerEstimator that runs an XGBoost training job in SageMaker and returns a SageMakerModel that can be used to transform a DataFrame using the hosted XGBoost model. XGBoost is an open-source distributed gradient boosting library that Amazon SageMaker has adapted to run on Amazon SageMaker.

XGBoost trains and infers on LibSVM-formatted data. XGBoostSageMakerEstimator uses Spark's LibSVMFileFormat to write the training DataFrame to S3, and serializes Rows to LibSVM for inference, selecting the column named "features" by default, expected to contain a Vector of Doubles.
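As a rough illustration of the LibSVM text format mentioned above, the following Scala sketch formats one labeled feature vector as a LibSVM line (a label followed by 1-based index:value pairs, with zero-valued features omitted). The names LibSvmSketch and toLibSvmLine are hypothetical and are not part of the SDK; the estimator itself relies on Spark's LibSVMFileFormat and on LibSVMRequestRowSerializer.

```scala
object LibSvmSketch {
  /** Formats one example as a LibSVM line: "<label> <index>:<value> ...".
    * Indices are 1-based; zero-valued features are skipped (sparse encoding). */
  def toLibSvmLine(label: Double, features: Seq[Double]): String = {
    val pairs = features.zipWithIndex.collect {
      case (value, i) if value != 0.0 => s"${i + 1}:$value"
    }
    (label.toString +: pairs).mkString(" ")
  }
}
```

For example, LibSvmSketch.toLibSvmLine(1.0, Seq(0.5, 0.0, 2.0)) skips the second feature because its value is zero.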

Inferences made against an Endpoint hosting an XGBoost model contain a "prediction" field appended to the input DataFrame as a column of Doubles, containing the prediction corresponding to the given Vector of features.
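The default responseRowDeserializer is XGBoostCSVRowDeserializer, which suggests that the endpoint returns predictions as delimited text. The sketch below shows one way such a response body could be turned into Doubles, one per input row. It assumes a comma- or newline-separated body; the names PredictionParsingSketch and parsePredictions are hypothetical, and the SDK's actual deserializer may behave differently.

```scala
object PredictionParsingSketch {
  /** Parses a comma- or newline-separated response body into prediction values.
    * Assumption: every non-empty token is a valid Double. */
  def parsePredictions(body: String): Seq[Double] =
    body.split("[,\\n]").map(_.trim).filter(_.nonEmpty).map(_.toDouble).toSeq
}
```

Each parsed value would then populate the "prediction" column for the corresponding input row.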

See also

https://github.com/dmlc/xgboost for more on XGBoost.

Linear Supertypes
XGBoostParams, SageMakerEstimator, Estimator[SageMakerModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new XGBoostSageMakerEstimator(sagemakerRole: IAMRoleResource = IAMRoleFromConfig(), trainingInstanceType: String, trainingInstanceCount: Int, endpointInstanceType: String, endpointInitialInstanceCount: Int, requestRowSerializer: RequestRowSerializer = new LibSVMRequestRowSerializer(), responseRowDeserializer: ResponseRowDeserializer = new XGBoostCSVRowDeserializer(), trainingInputS3DataPath: S3Resource = S3AutoCreatePath(), trainingOutputS3DataPath: S3Resource = S3AutoCreatePath(), trainingInstanceVolumeSizeInGB: Int = 1024, trainingProjectedColumns: Option[List[String]] = None, trainingChannelName: String = "train", trainingContentType: Option[String] = Some("libsvm"), trainingS3DataDistribution: String = ..., trainingSparkDataFormat: String = "libsvm", trainingSparkDataFormatOptions: Map[String, String] = Map(), trainingInputMode: String = TrainingInputMode.File.toString, trainingCompressionCodec: Option[String] = None, trainingMaxRuntimeInSeconds: Int = 24 * 60 * 60, trainingKmsKeyId: Option[String] = None, modelEnvironmentVariables: Map[String, String] = Map(), endpointCreationPolicy: EndpointCreationPolicy = ..., sagemakerClient: AmazonSageMaker = ..., region: Option[String] = None, s3Client: AmazonS3 = ..., stsClient: AWSSecurityTokenService = ..., modelPrependInputRowsToTransformationRows: Boolean = true, deleteStagingDataAfterTraining: Boolean = true, namePolicyFactory: NamePolicyFactory = new RandomNamePolicyFactory(), uid: String = Identifiable.randomUID("sagemaker"))

    sagemakerRole

    The SageMaker TrainingJob and Hosting IAM Role. Used by SageMaker to access S3 and ECR resources. SageMaker hosted Endpoint instances launched by this Estimator run with this role.

    trainingInstanceType

    The SageMaker TrainingJob Instance Type to use.

    trainingInstanceCount

    The number of instances of trainingInstanceType to run the SageMaker Training Job with.

    endpointInstanceType

    The SageMaker Endpoint Config instance type.

    endpointInitialInstanceCount

    The SageMaker Endpoint Config minimum number of instances that can be used to host modelImage.

    requestRowSerializer

    Serializes Spark DataFrame Rows for transformation by Models built from this Estimator.

    responseRowDeserializer

    Deserializes an Endpoint response into a series of Rows.

    trainingInputS3DataPath

    An S3 location to upload SageMaker Training Job input data to.

    trainingOutputS3DataPath

    An S3 location where SageMaker stores Training Job output data.

    trainingInstanceVolumeSizeInGB

    The EBS volume size, in gigabytes, of each training instance.

    trainingProjectedColumns

    The columns to project from the Dataset being fit before training. If None is passed, no projection occurs and all columns are serialized.

    trainingChannelName

    The name of the SageMaker Channel that receives the serialized Dataset being fit.

    trainingContentType

    The MIME type of the training data.

    trainingS3DataDistribution

    The SageMaker Training Job S3 data distribution scheme.

    trainingSparkDataFormat

    The Spark Data Format name used to serialize the Dataset being fit for input to SageMaker.

    trainingSparkDataFormatOptions

    The Spark Data Format Options used during serialization of the Dataset being fit.

    trainingInputMode

    The SageMaker Training Job Channel input mode.

    trainingCompressionCodec

    The type of compression to use when serializing the Dataset being fit for input to SageMaker.

    trainingMaxRuntimeInSeconds

    The SageMaker Training Job termination condition MaxRuntimeInSeconds.

    trainingKmsKeyId

    A KMS key ID for the Output Data Source.

    modelEnvironmentVariables

    The environment variables that SageMaker will set on the model container during execution.

    endpointCreationPolicy

    Defines how a SageMaker Endpoint referenced by a SageMakerModel is created.

    sagemakerClient

    Amazon SageMaker client. Used to send CreateTrainingJob, CreateModel, and CreateEndpoint requests.

    region

    The region in which to run the algorithm. If not specified, gets the region from the DefaultAwsRegionProviderChain.

    s3Client

    Amazon S3 client. Used to create a bucket for staging SageMaker Training Job input and/or output if either is set to S3AutoCreatePath.

    stsClient

    AWS STS client. Used to resolve the account number when creating staging input/output buckets.

    modelPrependInputRowsToTransformationRows

    Whether the transformation result on Models built by this Estimator should also include the input Rows. If true, each output Row is formed by a concatenation of the input Row with the corresponding Row produced by SageMaker Endpoint invocation, produced by responseRowDeserializer. If false, each output Row is just taken from responseRowDeserializer.

    deleteStagingDataAfterTraining

    Whether to remove the training data from S3 after training completes or fails.

    namePolicyFactory

    The NamePolicyFactory to use when naming SageMaker entities created during fit.

    uid

    The unique identifier of this Estimator. Used to represent this stage in Spark ML pipelines.
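The effect of modelPrependInputRowsToTransformationRows described above can be pictured with a small sketch that models Rows as plain sequences. The names TransformRowsSketch and outputRows are hypothetical; this illustrates the documented behavior only and is not the SDK's implementation.

```scala
object TransformRowsSketch {
  /** Builds output rows from the input rows and the rows produced by the
    * response deserializer.
    * - prependInputRows = true:  each output row is input row ++ response row
    * - prependInputRows = false: each output row is just the response row */
  def outputRows(
      inputRows: Seq[Seq[Any]],
      responseRows: Seq[Seq[Any]],
      prependInputRows: Boolean): Seq[Seq[Any]] =
    if (prependInputRows) {
      inputRows.zip(responseRows).map { case (in, out) => in ++ out }
    } else {
      responseRows
    }
}
```

With the default value of true, the transformed DataFrame keeps the input columns and gains the columns produced by responseRowDeserializer.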

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. val alpha: DoubleParam

    L1 regularization term on weights. Increasing this value makes the model more conservative. Default = 0

    Definition Classes
    XGBoostParams
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. val baseScore: DoubleParam

    The initial prediction score of all instances, global bias. Default = 0.5

    Definition Classes
    XGBoostParams
  8. val booster: Param[String]

    Which booster to use. Can be gbtree, gblinear or dart. The gbtree and dart values use a tree-based model, while gblinear uses a linear function. Default = gbtree

    Definition Classes
    XGBoostParams
  9. final def clear(param: Param[_]): XGBoostSageMakerEstimator.this.type

    Definition Classes
    Params
  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. val colSampleByLevel: DoubleParam

    Subsample ratio of columns for each split, in each level. Must be in (0, 1]. Default = 1

    Definition Classes
    XGBoostParams
  12. val colSampleByTree: DoubleParam

    Subsample ratio of columns when constructing each tree. Must be in (0, 1]. Default = 1

    Definition Classes
    XGBoostParams
  13. def copy(extra: ParamMap): SageMakerEstimator

    Definition Classes
    SageMakerEstimator → Estimator → PipelineStage → Params
  14. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  15. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  16. val deleteStagingDataAfterTraining: Boolean

    Whether to remove the training data from S3 after training completes or fails.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  17. val endpointCreationPolicy: EndpointCreationPolicy

    Defines how a SageMaker Endpoint referenced by a SageMakerModel is created.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  18. val endpointInitialInstanceCount: Int

    The SageMaker Endpoint Config minimum number of instances that can be used to host modelImage.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  19. val endpointInstanceType: String

    The SageMaker Endpoint Config instance type.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  20. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  21. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  22. val eta: DoubleParam

    Step size shrinkage used in updates to prevent overfitting. After each boosting step, the weights of new features can be obtained directly; eta shrinks those feature weights to make the boosting process more conservative. Must be in [0, 1]. Default = 0.3

    Definition Classes
    XGBoostParams
  23. val evalMetric: Param[String]

    Evaluation metrics for validation data. If not set, a default metric is assigned according to the objective (rmse for regression, error for classification, and map for ranking).

    Definition Classes
    XGBoostParams
  24. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  25. def explainParams(): String

    Definition Classes
    Params
  26. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  27. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  28. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  29. def fit(dataSet: Dataset[_]): SageMakerModel

    Fits a SageMakerModel on dataSet by running a SageMaker training job.

    Definition Classes
    SageMakerEstimator → Estimator
  30. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[SageMakerModel]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  31. def fit(dataset: Dataset[_], paramMap: ParamMap): SageMakerModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  32. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): SageMakerModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  33. val gamma: DoubleParam

    Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm will be. Must be >= 0. Default = 0

    Definition Classes
    XGBoostParams
  34. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  35. def getAlpha: Double

    Definition Classes
    XGBoostParams
  36. def getBaseScore: Double

    Definition Classes
    XGBoostParams
  37. def getBooster: String

    Definition Classes
    XGBoostParams
  38. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  39. def getColSampleByLevel: Double

    Definition Classes
    XGBoostParams
  40. def getColSampleByTree: Double

    Definition Classes
    XGBoostParams
  41. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  42. def getEta: Double

    Definition Classes
    XGBoostParams
  43. def getEvalMetric: String

    Definition Classes
    XGBoostParams
  44. def getGamma: Double

    Definition Classes
    XGBoostParams
  45. def getGrowPolicy: String

    Definition Classes
    XGBoostParams
  46. def getLambda: Double

    Definition Classes
    XGBoostParams
  47. def getLambdaBias: Double

    Definition Classes
    XGBoostParams
  48. def getMaxBin: Int

    Definition Classes
    XGBoostParams
  49. def getMaxDeltaStep: Double

    Definition Classes
    XGBoostParams
  50. def getMaxDepth: Double

    Definition Classes
    XGBoostParams
  51. def getMaxLeaves: Int

    Definition Classes
    XGBoostParams
  52. def getMinChildWeight: Double

    Definition Classes
    XGBoostParams
  53. def getNThread: Int

    Definition Classes
    XGBoostParams
  54. def getNormalizeType: String

    Definition Classes
    XGBoostParams
  55. def getNumClasses: Int

    Definition Classes
    XGBoostParams
  56. def getNumRound: Int

    Definition Classes
    XGBoostParams
  57. def getObjective: String

    Definition Classes
    XGBoostParams
  58. def getOneDrop: Int

    Definition Classes
    XGBoostParams
  59. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  60. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  61. def getProcessType: String

    Definition Classes
    XGBoostParams
  62. def getRateDrop: Double

    Definition Classes
    XGBoostParams
  63. def getRefreshLeaf: Int

    Definition Classes
    XGBoostParams
  64. def getSampleType: String

    Definition Classes
    XGBoostParams
  65. def getScalePosWeight: Double

    Definition Classes
    XGBoostParams
  66. def getSeed: Double

    Definition Classes
    XGBoostParams
  67. def getSilent: Int

    Definition Classes
    XGBoostParams
  68. def getSketchEps: Double

    Definition Classes
    XGBoostParams
  69. def getSkipDrop: Double

    Definition Classes
    XGBoostParams
  70. def getSubsample: Double

    Definition Classes
    XGBoostParams
  71. def getTreeMethod: String

    Definition Classes
    XGBoostParams
  72. def getTweedieVariancePower: Double

    Definition Classes
    XGBoostParams
  73. def getUpdater: String

    Definition Classes
    XGBoostParams
  74. val growPolicy: Param[String]

    Controls the way that new nodes are added to the tree. Can be "depthwise" or "lossguide". Currently supported only if tree_method is set to hist. Default = "depthwise"

    Definition Classes
    XGBoostParams
  75. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  76. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  77. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  78. val hyperParameters: Map[String, String]

    A map from hyperParameter names to their respective values for training.

    Definition Classes
    SageMakerEstimator
  79. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  80. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  81. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  82. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  83. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  84. val lambda: DoubleParam

    L2 regularization term on weights. Increasing this value makes the model more conservative. Default = 1

    Definition Classes
    XGBoostParams
  85. val lambdaBias: DoubleParam

    L2 regularization term on bias. Must be in [0, 1]. Default = 0.0

    Definition Classes
    XGBoostParams
  86. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  87. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  88. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  89. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  90. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  91. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  92. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  93. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  94. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  95. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  96. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  97. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  98. val maxBin: IntParam

    Maximum number of discrete bins to bucket continuous features. Used only if tree_method=hist. Default = 256

    Definition Classes
    XGBoostParams
  99. val maxDeltaStep: DoubleParam

    Maximum delta step allowed for each tree's weight estimation. A positive value helps make the update more conservative. The preferred use is in logistic regression, where setting it to a value from 1 to 10 can help control the update. Must be >= 0. Default = 0

    Definition Classes
    XGBoostParams
  100. val maxDepth: DoubleParam

    Maximum depth of a tree. Increasing this value makes the model more complex and more likely to overfit. 0 indicates no limit; a limit is required when grow_policy=depthwise. Must be >= 0. Default = 6

    Definition Classes
    XGBoostParams
  101. val maxLeaves: IntParam

    Maximum number of nodes to be added. Relevant only if grow_policy = lossguide. Must be >= 0. Default = 0

    Definition Classes
    XGBoostParams
  102. val minChildWeight: DoubleParam

    Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node whose sum of instance weight is less than min_child_weight, the building process gives up further partitioning. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger the value, the more conservative the algorithm will be. Must be >= 0. Default = 1

    Definition Classes
    XGBoostParams
  103. val modelEnvironmentVariables: Map[String, String]

    The environment variables that SageMaker will set on the model container during execution.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  104. val modelImage: String

    A SageMaker Model hosting Docker image URI.

    Definition Classes
    SageMakerEstimator
  105. val modelPrependInputRowsToTransformationRows: Boolean

    Whether the transformation result on Models built by this Estimator should also include the input Rows. If true, each output Row is formed by a concatenation of the input Row with the corresponding Row produced by SageMaker Endpoint invocation, produced by responseRowDeserializer. If false, each output Row is just taken from responseRowDeserializer.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  106. val nThread: IntParam

    Number of parallel threads used to run XGBoost. Must be >= 1. Defaults to the maximum number of threads available.

    Definition Classes
    XGBoostParams
  107. val namePolicyFactory: NamePolicyFactory

    The NamePolicyFactory to use when naming SageMaker entities created during fit.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  108. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  109. val normalizeType: Param[String]

    Type of normalization algorithm. Can be "tree" or "forest". Default = "tree"

    Definition Classes
    XGBoostParams
  110. final def notify(): Unit

    Definition Classes
    AnyRef
  111. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  112. val numClasses: IntParam

    Used for softmax multiclass classification. No default.

    Definition Classes
    XGBoostParams
  113. val numRound: IntParam

    Number of rounds for gradient boosting. Must be >= 1. Required.

    Definition Classes
    XGBoostParams
  114. val objective: Param[String]

    Specifies the learning task and the corresponding learning objective. Default = "reg:linear"

    Definition Classes
    XGBoostParams
  115. val oneDrop: IntParam

    Whether to drop at least one tree during the dropout. Default = 0

    Definition Classes
    XGBoostParams
  116. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  117. val processType: Param[String]

    The type of boosting process to run. Can be "default" or "update". Default = "default"

    Definition Classes
    XGBoostParams
  118. val rateDrop: DoubleParam

    Dropout rate (a fraction of previous trees to drop during the dropout). Must be in [0, 1]. Default = 0.0

    Definition Classes
    XGBoostParams
  119. val refreshLeaf: IntParam

    A parameter of the 'refresh' updater plugin. When set to 1 (true), tree leaves and tree node stats are updated. When set to 0 (false), only tree node stats are updated. Default = 1

    Definition Classes
    XGBoostParams
  120. val region: Option[String]

    The region in which to run the algorithm. If not specified, gets the region from the DefaultAwsRegionProviderChain.

  121. val requestRowSerializer: RequestRowSerializer

    Serializes Spark DataFrame Rows for transformation by Models built from this Estimator.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  122. val responseRowDeserializer: ResponseRowDeserializer

    Deserializes an Endpoint response into a series of Rows.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  123. val s3Client: AmazonS3

    Amazon S3 client. Used to create a bucket for staging SageMaker Training Job input and/or output if either is set to S3AutoCreatePath.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  124. val sagemakerClient: AmazonSageMaker

    Amazon SageMaker client. Used to send CreateTrainingJob, CreateModel, and CreateEndpoint requests.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  125. val sagemakerRole: IAMRoleResource

    The SageMaker TrainingJob and Hosting IAM Role. Used by SageMaker to access S3 and ECR resources. SageMaker hosted Endpoint instances launched by this Estimator run with this role.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  126. val sampleType: Param[String]

    Type of sampling algorithm. Can be "uniform" or "weighted". Default = "uniform"

    Definition Classes
    XGBoostParams
  127. val scalePosWeight: DoubleParam

    Controls the balance of positive and negative weights. Useful for unbalanced classes. A typical value to consider: sum(negative cases) / sum(positive cases). Default = 1

    Definition Classes
    XGBoostParams
  128. val seed: DoubleParam

    Random number seed. Default = 0

    Definition Classes
    XGBoostParams
  129. final def set(paramPair: ParamPair[_]): XGBoostSageMakerEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  130. final def set(param: String, value: Any): XGBoostSageMakerEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  131. final def set[T](param: Param[T], value: T): XGBoostSageMakerEstimator.this.type

    Definition Classes
    Params
  132. def setAlpha(value: Double): XGBoostSageMakerEstimator.this.type

  133. def setBaseScore(value: Double): XGBoostSageMakerEstimator.this.type

  134. def setBooster(value: String): XGBoostSageMakerEstimator.this.type

  135. def setColSampleByLevel(value: Double): XGBoostSageMakerEstimator.this.type

  136. def setColSampleByTree(value: Double): XGBoostSageMakerEstimator.this.type

  137. final def setDefault(paramPairs: ParamPair[_]*): XGBoostSageMakerEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  138. final def setDefault[T](param: Param[T], value: T): XGBoostSageMakerEstimator.this.type

    Attributes
    protected
    Definition Classes
    Params
  139. def setEta(value: Double): XGBoostSageMakerEstimator.this.type

  140. def setEvalMetric(value: String): XGBoostSageMakerEstimator.this.type

  141. def setGamma(value: Double): XGBoostSageMakerEstimator.this.type

  142. def setGrowPolicy(value: String): XGBoostSageMakerEstimator.this.type

  143. def setLambda(value: Double): XGBoostSageMakerEstimator.this.type

  144. def setLambdaBias(value: Double): XGBoostSageMakerEstimator.this.type

  145. def setMaxBin(value: Int): XGBoostSageMakerEstimator.this.type

  146. def setMaxDeltaStep(value: Double): XGBoostSageMakerEstimator.this.type

  147. def setMaxDepth(value: Double): XGBoostSageMakerEstimator.this.type

  148. def setMaxLeaves(value: Int): XGBoostSageMakerEstimator.this.type

  149. def setMinChildWeight(value: Double): XGBoostSageMakerEstimator.this.type

  150. def setNThread(value: Int): XGBoostSageMakerEstimator.this.type

  151. def setNormalizeType(value: String): XGBoostSageMakerEstimator.this.type

  152. def setNumClasses(value: Int): XGBoostSageMakerEstimator.this.type

  153. def setNumRound(value: Int): XGBoostSageMakerEstimator.this.type

  154. def setObjective(value: String): XGBoostSageMakerEstimator.this.type

  155. def setOneDrop(value: Int): XGBoostSageMakerEstimator.this.type

  156. def setProcessType(value: String): XGBoostSageMakerEstimator.this.type

  157. def setRateDrop(value: Double): XGBoostSageMakerEstimator.this.type

  158. def setRefreshLeaf(value: Int): XGBoostSageMakerEstimator.this.type

  159. def setSampleType(value: String): XGBoostSageMakerEstimator.this.type

  160. def setScalePosWeight(value: Double): XGBoostSageMakerEstimator.this.type

  161. def setSeed(value: Double): XGBoostSageMakerEstimator.this.type

  162. def setSilent(value: Int): XGBoostSageMakerEstimator.this.type

  163. def setSketchEps(value: Double): XGBoostSageMakerEstimator.this.type

  164. def setSkipDrop(value: Double): XGBoostSageMakerEstimator.this.type

  165. def setSubsample(value: Double): XGBoostSageMakerEstimator.this.type

    Permalink
  166. def setTreeMethod(value: String): XGBoostSageMakerEstimator.this.type

    Permalink
  167. def setTweedieVariancePower(value: Double): XGBoostSageMakerEstimator.this.type

    Permalink
  168. def setUpdater(value: String): XGBoostSageMakerEstimator.this.type

    Permalink
  169. val silent: IntParam

    Whether to run in silent mode. Can be 0 or 1: 0 prints running messages, 1 enables silent mode. Default = 0

    Definition Classes
    XGBoostParams
  170. val sketchEps: DoubleParam

    Used only for the approximate greedy algorithm. Translates into O(1 / sketch_eps) bins. Compared with selecting the number of bins directly, this comes with a theoretical guarantee on sketch accuracy. Must be in (0, 1). Default = 0.03

    Definition Classes
    XGBoostParams
  171. val skipDrop: DoubleParam

    Probability of skipping the dropout procedure during a boosting iteration. Must be in [0, 1]. Default = 0

    Definition Classes
    XGBoostParams
  172. val stsClient: AWSSecurityTokenService

    AmazonSTS. Used to resolve the account number when creating staging input / output buckets.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  173. val subsample: DoubleParam

    Subsample ratio of the training instances. Setting it to 0.5 means that XGBoost randomly samples half of the data instances to grow trees, which helps prevent overfitting. Must be in (0, 1]. Default = 1

    Definition Classes
    XGBoostParams
  174. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  175. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  176. val trainingChannelName: String

    The name of the SageMaker channel that receives the serialized Dataset passed to fit.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  177. val trainingCompressionCodec: Option[String]

    The type of compression to use when serializing the Dataset being fit for input to SageMaker.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  178. val trainingContentType: Option[String]

    The MIME type of the training data.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  179. val trainingImage: String

    The Docker image URI of the Training Image in the SageMaker Training Job Algorithm Specification.

    Definition Classes
    SageMakerEstimator
  180. val trainingInputMode: String

    The SageMaker Training Job Channel input mode.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  181. val trainingInputS3DataPath: S3Resource

    An S3 location to upload SageMaker Training Job input data to.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  182. val trainingInstanceCount: Int

    The number of instances of trainingInstanceType used to run a SageMaker Training Job.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  183. val trainingInstanceType: String

    The SageMaker Training Job instance type to use.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  184. val trainingInstanceVolumeSizeInGB: Int

    The EBS volume size, in gigabytes, of each training instance.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  185. val trainingKmsKeyId: Option[String]

    A KMS key ID for the Output Data Source.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  186. val trainingMaxRuntimeInSeconds: Int

    The SageMaker Training Job termination condition: the maximum runtime, in seconds.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  187. val trainingOutputS3DataPath: S3Resource

    An S3 location where SageMaker stores Training Job output data.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  188. val trainingProjectedColumns: Option[List[String]]

    The columns to project from the Dataset being fit before training. If None is passed, no projection occurs and all columns are serialized.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  189. val trainingS3DataDistribution: String

    The SageMaker Training Job S3 data distribution scheme.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  190. val trainingSparkDataFormat: String

    The Spark Data Format name used to serialize the Dataset being fit for input to SageMaker.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  191. val trainingSparkDataFormatOptions: Map[String, String]

    The Spark Data Format Options used during serialization of the Dataset being fit.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator
  192. def transformSchema(schema: StructType): StructType

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator → PipelineStage
  193. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  194. val treeMethod: Param[String]

    The tree construction algorithm used in XGBoost. Can be auto, exact, approx, or hist. Default = "auto"

    Definition Classes
    XGBoostParams
  195. val tweedieVariancePower: DoubleParam

    Parameter that controls the variance of the Tweedie distribution. Must be in (1, 2). Default = 1.5

    Definition Classes
    XGBoostParams
  196. val uid: String

    The unique identifier of this Estimator. Used to represent this stage in Spark ML pipelines.

    Definition Classes
    XGBoostSageMakerEstimator → SageMakerEstimator → Identifiable
  197. val updater: Param[String]

    A comma-separated string that defines the sequence of tree updaters to run. This provides a modular way to construct and modify the trees. Default = "grow_colmaker,prune"

    Definition Classes
    XGBoostParams
  198. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  199. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  200. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
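
The value ranges documented above for silent, sketchEps, skipDrop, subsample, treeMethod and tweedieVariancePower can be sketched as a standalone check in plain Scala. This is only an illustration under stated assumptions: XGBoostParamChecks, validate and approximateBins are hypothetical names that are not part of the SageMaker Spark SDK, and the bin estimate drops the constant factor hidden in O(1 / sketch_eps).

```scala
// Hypothetical helper, NOT part of the SageMaker Spark SDK: checks a few
// hyperparameter values against the ranges documented for XGBoostParams.
object XGBoostParamChecks {

  // Tree construction algorithms listed in the treeMethod description.
  val treeMethods: Set[String] = Set("auto", "exact", "approx", "hist")

  // Returns one message per violated constraint; an empty list means every
  // check passed. Default arguments mirror the documented defaults.
  def validate(
      silent: Int = 0,
      sketchEps: Double = 0.03,
      skipDrop: Double = 0.0,
      subsample: Double = 1.0,
      treeMethod: String = "auto",
      tweedieVariancePower: Double = 1.5): List[String] = {
    val checks: List[(Boolean, String)] = List(
      (silent == 0 || silent == 1) ->
        s"silent must be 0 or 1, got $silent",
      (sketchEps > 0.0 && sketchEps < 1.0) ->
        s"sketchEps must be in (0, 1), got $sketchEps",
      (skipDrop >= 0.0 && skipDrop <= 1.0) ->
        s"skipDrop must be in [0, 1], got $skipDrop",
      (subsample > 0.0 && subsample <= 1.0) ->
        s"subsample must be in (0, 1], got $subsample",
      treeMethods.contains(treeMethod) ->
        s"treeMethod must be one of ${treeMethods.mkString(", ")}, got $treeMethod",
      (tweedieVariancePower > 1.0 && tweedieVariancePower < 2.0) ->
        s"tweedieVariancePower must be in (1, 2), got $tweedieVariancePower"
    )
    // Keep only the messages whose condition failed.
    checks.collect { case (ok, message) if !ok => message }
  }

  // Rough bin count implied by sketchEps, taking 1 / sketch_eps literally
  // (assumption: the constant factor in the O(...) bound is ignored).
  def approximateBins(sketchEps: Double): Long = math.round(1.0 / sketchEps)
}
```

Calling XGBoostParamChecks.validate() with no arguments checks the documented defaults. Values that pass could then be handed to the corresponding setters listed above, such as setSubsample or setTreeMethod.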
