com.amazonaws.services.sagemaker.sparksdk.algorithms
The SageMaker TrainingJob and Hosting IAM Role. Used by SageMaker to access S3 and ECR resources. SageMaker hosted Endpoint instances launched by this Estimator run with this role.
The SageMaker TrainingJob Instance Type to use
The number of instances of instanceType to run a SageMaker Training Job with
The SageMaker Endpoint Config instance type
The SageMaker Endpoint Config minimum number of instances that can be used to host modelImage
Serializes Spark DataFrame Rows for transformation by Models built from this Estimator.
Deserializes an Endpoint response into a series of Rows.
An S3 location to upload SageMaker Training Job input data to.
An S3 location for SageMaker to store Training Job output data.
The EBS volume size in gigabytes of each instance.
The columns to project from the Dataset being fit before training. If an Optional.empty is passed then no specific projection will occur and all columns will be serialized.
The SageMaker Channel name for the serialized Dataset input when fitting
The MIME type of the training data.
The SageMaker Training Job S3 data distribution scheme.
The Spark Data Format name used to serialize the Dataset being fit for input to SageMaker.
The Spark Data Format Options used during serialization of the Dataset being fit.
The SageMaker Training Job Channel input mode.
The type of compression to use when serializing the Dataset being fit for input to SageMaker.
A SageMaker Training Job Termination Condition MaxRuntimeInHours.
A KMS key ID for the Output Data Source
The environment variables that SageMaker will set on the model container during execution.
Defines how a SageMaker Endpoint referenced by a SageMakerModel is created.
Amazon SageMaker client. Used to send CreateTrainingJob, CreateModel, and CreateEndpoint requests.
The region in which to run the algorithm. If not specified, gets the region from the DefaultAwsRegionProviderChain.
AmazonS3. Used to create a bucket for staging SageMaker Training Job input and/or output if either are set to S3AutoCreatePath.
AmazonSTS. Used to resolve the account number when creating staging input / output buckets.
Whether the transformation result on Models built by this Estimator should also include the input Rows. If true, each output Row is formed by a concatenation of the input Row with the corresponding Row produced by SageMaker Endpoint invocation, produced by responseRowDeserializer. If false, each output Row is just taken from responseRowDeserializer.
Whether to remove the training data from S3 after training completes or fails.
The NamePolicyFactory to use when naming SageMaker entities created during fit
The unique identifier of this Estimator. Used to represent this stage in Spark ML pipelines.
Parameter specific to adam optimizer.
Parameter specific to adam optimizer. Exponential decay rate for first moment estimates. Ignored when optimizer is not adam. Must be in range [0, 1). Default: 0.9.
Parameter specific to adam optimizer.
Parameter specific to adam optimizer. Exponential decay rate for second moment estimates. Ignored when optimizer is not adam. Must be in range [0, 1). Default: 0.999.
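The two decay rates above control Adam's running moment estimates. A minimal sketch of the standard Adam update (plain Python with illustrative names; not the SDK's or the algorithm container's actual implementation):

```python
# Sketch of Adam's moment updates, showing the roles of beta_1 and beta_2.
# beta_1 decays the first moment (mean of gradients); beta_2 decays the
# second moment (uncentered variance). Defaults mirror the doc above.
def adam_step(grad, m, v, t, lr=0.01, beta_1=0.9, beta_2=0.999, eps=1e-8):
    m = beta_1 * m + (1 - beta_1) * grad        # first moment estimate
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta_1 ** t)               # bias correction for step t
    v_hat = v / (1 - beta_2 ** t)
    update = lr * m_hat / (v_hat ** 0.5 + eps)
    return update, m, v
```

On the first step the bias correction makes the update magnitude approximately equal to the learning rate, which is why both decay rates must stay in [0, 1).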
Learning rate bias multiplier.
Learning rate bias multiplier. The actual learning rate for the bias is learning rate times bias_lr_mult. Must be > 0. Default: 10.
Weight decay parameter multiplier.
Weight decay parameter multiplier. The actual L2 regularization weight for the bias is wd times bias_wd_mult. Must be >= 0. Default: 0.
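The two multipliers above scale the bias term's hyperparameters relative to the weights'. A sketch of the documented relationship (illustrative names, not SDK code):

```python
# How bias_lr_mult and bias_wd_mult derive the bias term's effective
# learning rate and L2 weight from the global learning_rate and wd.
def bias_hyperparams(learning_rate, wd, bias_lr_mult=10.0, bias_wd_mult=0.0):
    bias_learning_rate = learning_rate * bias_lr_mult   # bias learns faster by default
    bias_weight_decay = wd * bias_wd_mult               # bias unregularized by default
    return bias_learning_rate, bias_weight_decay
```

With the defaults, the bias learns ten times faster than the weights and receives no L2 penalty.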
Whether to remove the training data from S3 after training completes or fails.
Whether to remove the training data from S3 after training completes or fails.
Defines how a SageMaker Endpoint referenced by a SageMakerModel is created.
Defines how a SageMaker Endpoint referenced by a SageMakerModel is created.
The SageMaker Endpoint Config minimum number of instances that can be used to host modelImage
The SageMaker Endpoint Config minimum number of instances that can be used to host modelImage
The SageMaker Endpoint Config instance type
The SageMaker Endpoint Config instance type
Max number of passes over the data.
Max number of passes over the data. Must be > 0. Default: 10.
The dimension of the input vectors.
The dimension of the input vectors. Must be > 0. Required.
Fits a SageMakerModel on dataSet by running a SageMaker training job.
Fits a SageMakerModel on dataSet by running a SageMaker training job.
A map from hyperParameter names to their respective values for training.
A map from hyperParameter names to their respective values for training.
Initial weight for bias.
Initial weight for bias. Default: 0.
Initialization function for the model weights.
Initialization function for the model weights. Supported options: "uniform" and "normal". uniform: weights are initialized uniformly between (-scale, +scale). normal: weights are initialized from a normal distribution with mean 0 and standard deviation sigma. Default: "uniform".
Scale for init method uniform.
Scale for init method uniform. Must be > 0. Default: 0.07.
Standard deviation for init method normal.
Standard deviation for init method normal. Must be > 0. Default: 0.01.
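The two initialization methods and their scale parameters can be sketched as follows (plain Python with illustrative names; not the algorithm's actual code):

```python
import random

# Sketch of the two supported weight initializations: "uniform" draws
# from (-scale, +scale); "normal" draws from N(0, sigma). Defaults
# mirror the hyperparameter docs above.
def init_weights(dim, method="uniform", scale=0.07, sigma=0.01, seed=0):
    rng = random.Random(seed)
    if method == "uniform":
        return [rng.uniform(-scale, scale) for _ in range(dim)]
    if method == "normal":
        return [rng.gauss(0.0, sigma) for _ in range(dim)]
    raise ValueError("method must be 'uniform' or 'normal'")
```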
The L1 regularization parameter.
The L1 regularization parameter. Use 0 for no L1 regularization. Must be >= 0. Default: 0.
The learning rate.
The learning rate. Must be > 0 or "auto". Default: "auto".
The loss function to apply.
The loss function to apply. Supported options: "logistic", "squared_loss" and "auto". Default: "auto".
Parameter specific to lr_scheduler.
Parameter specific to lr_scheduler. Ignored otherwise. Every lr_scheduler_step the learning rate will decrease by this quantity. Must be in (0, 1). Default: 0.99.
Parameter specific to lr_scheduler.
Parameter specific to lr_scheduler. Ignored otherwise. The learning rate will never decrease to a value lower than lr_scheduler_minimum_lr. Must be > 0. Default: 1e-5.
Parameter specific to lr_scheduler.
Parameter specific to lr_scheduler. Ignored otherwise. The number of steps between decreases of the learning rate. Must be > 0. Default: 100.
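The three lr_scheduler parameters above combine into a step-decay schedule: every lr_scheduler_step steps the learning rate is multiplied by lr_scheduler_factor, floored at lr_scheduler_minimum_lr. A sketch of that schedule (illustrative, not the SDK's implementation):

```python
# Step-decay learning rate schedule as described by the lr_scheduler
# hyperparameters. Defaults mirror the docs above.
def scheduled_lr(initial_lr, step,
                 lr_scheduler_factor=0.99,
                 lr_scheduler_step=100,
                 lr_scheduler_minimum_lr=1e-5):
    decays = step // lr_scheduler_step           # completed decay intervals
    lr = initial_lr * (lr_scheduler_factor ** decays)
    return max(lr, lr_scheduler_minimum_lr)      # never below the floor
```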
The number of examples in a mini-batch.
The number of examples in a mini-batch. Must be > 0. Required.
The environment variables that SageMaker will set on the model container during execution.
The environment variables that SageMaker will set on the model container during execution.
A SageMaker Model hosting Docker image URI.
A SageMaker Model hosting Docker image URI.
Whether the transformation result on Models built by this Estimator should also include the input Rows.
Whether the transformation result on Models built by this Estimator should also include the input Rows. If true, each output Row is formed by a concatenation of the input Row with the corresponding Row produced by SageMaker Endpoint invocation, produced by responseRowDeserializer. If false, each output Row is just taken from responseRowDeserializer.
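The concatenation behavior described above can be sketched in a few lines (plain Python standing in for Spark Rows; not SDK code):

```python
# Sketch of how each output row is formed: when include_input is true,
# the input row is concatenated with the row deserialized from the
# Endpoint response; otherwise only the response row is emitted.
def output_row(input_row, response_row, include_input=True):
    if include_input:
        return tuple(input_row) + tuple(response_row)
    return tuple(response_row)
```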
Momentum parameter of sgd optimizer.
Momentum parameter of sgd optimizer. Must be in range [0, 1). Default: 0.
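The momentum term accumulates a decaying sum of past gradients; with the default of 0 the update reduces to plain SGD. A sketch (illustrative, not the algorithm's actual code):

```python
# Sketch of an SGD-with-momentum update. The velocity carries a
# geometrically decaying history of gradients, weighted by momentum.
def sgd_momentum_step(weight, grad, velocity, lr=0.01, momentum=0.0):
    velocity = momentum * velocity + grad   # momentum=0 gives plain SGD
    weight = weight - lr * velocity
    return weight, velocity
```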
The NamePolicyFactory to use when naming SageMaker entities created during fit
The NamePolicyFactory to use when naming SageMaker entities created during fit
Whether to normalize the features before training to have std_dev of 1.
Whether to normalize the features before training to have std_dev of 1. Default: True.
Whether regression label is normalized.
Whether regression label is normalized. Ignored in classification. Default: "auto".
Number of samples to use from validation dataset for doing model calibration (finding the best threshold).
Number of samples to use from validation dataset for doing model calibration (finding the best threshold). Must be > 0. Default: 10000000.
Number of models to train in parallel.
Number of models to train in parallel. Must be > 0 or "auto". If default "auto" is selected, the number of parallel models to train will be decided by the algorithm itself. Default: "auto".
Number of data points to use for calculating the normalizing / unbiasing terms.
Number of data points to use for calculating the normalizing / unbiasing terms. Must be > 0. Default: 10000.
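The normalizing and unbiasing terms are estimated from a prefix of the data: the per-feature mean (used for unbiasing) and standard deviation (used for normalizing to std_dev 1). A single-feature sketch (illustrative, not the algorithm's actual code):

```python
# Sketch of estimating scaler terms from the first num_point_for_scaler
# data points: mean for unbiasing, population std dev for normalizing.
def scaler_terms(points, num_point_for_scaler=10000):
    sample = points[:num_point_for_scaler]
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    return mean, var ** 0.5
```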
Which optimizer is to be used.
Which optimizer is to be used. Supported options: "sgd" and "adam". Default: "adam".
The region in which to run the algorithm.
The region in which to run the algorithm. If not specified, gets the region from the DefaultAwsRegionProviderChain.
Serializes Spark DataFrame Rows for transformation by Models built from this Estimator.
Serializes Spark DataFrame Rows for transformation by Models built from this Estimator.
Deserializes an Endpoint response into a series of Rows.
Deserializes an Endpoint response into a series of Rows.
AmazonS3.
AmazonS3. Used to create a bucket for staging SageMaker Training Job input and/or output if either are set to S3AutoCreatePath.
Amazon SageMaker client.
Amazon SageMaker client. Used to send CreateTrainingJob, CreateModel, and CreateEndpoint requests.
The SageMaker TrainingJob and Hosting IAM Role.
The SageMaker TrainingJob and Hosting IAM Role. Used by SageMaker to access S3 and ECR resources. SageMaker hosted Endpoint instances launched by this Estimator run with this role.
AmazonSTS.
AmazonSTS. Used to resolve the account number when creating staging input / output buckets.
The SageMaker Channel name for the serialized Dataset input when fitting
The SageMaker Channel name for the serialized Dataset input when fitting
The type of compression to use when serializing the Dataset being fit for input to SageMaker.
The type of compression to use when serializing the Dataset being fit for input to SageMaker.
The MIME type of the training data.
The MIME type of the training data.
A SageMaker Training Job Algorithm Specification Training Image Docker image URI.
A SageMaker Training Job Algorithm Specification Training Image Docker image URI.
The SageMaker Training Job Channel input mode.
The SageMaker Training Job Channel input mode.
An S3 location to upload SageMaker Training Job input data to.
An S3 location to upload SageMaker Training Job input data to.
The number of instances of instanceType to run a SageMaker Training Job with
The number of instances of instanceType to run a SageMaker Training Job with
The SageMaker TrainingJob Instance Type to use
The SageMaker TrainingJob Instance Type to use
The EBS volume size in gigabytes of each instance.
The EBS volume size in gigabytes of each instance.
A KMS key ID for the Output Data Source
A KMS key ID for the Output Data Source
A SageMaker Training Job Termination Condition MaxRuntimeInHours.
A SageMaker Training Job Termination Condition MaxRuntimeInHours.
An S3 location for SageMaker to store Training Job output data.
An S3 location for SageMaker to store Training Job output data.
The columns to project from the Dataset being fit before training.
The columns to project from the Dataset being fit before training. If an Optional.empty is passed then no specific projection will occur and all columns will be serialized.
The SageMaker Training Job S3 data distribution scheme.
The SageMaker Training Job S3 data distribution scheme.
The Spark Data Format name used to serialize the Dataset being fit for input to SageMaker.
The Spark Data Format name used to serialize the Dataset being fit for input to SageMaker.
The Spark Data Format Options used during serialization of the Dataset being fit.
The Spark Data Format Options used during serialization of the Dataset being fit.
The unique identifier of this Estimator.
The unique identifier of this Estimator. Used to represent this stage in Spark ML pipelines.
Whether to unbias the features before training so that mean is 0.
Whether to unbias the features before training so that mean is 0. By default data is unbiased if use_bias is set to true. Default: "auto".
Whether to unbias the labels before training so that mean is 0.
Whether to unbias the labels before training so that mean is 0. Only done for regression if use_bias is true. Otherwise ignored. Default: "auto".
Whether model should include bias.
Whether model should include bias. Default: "True".
Whether to use a scheduler for the learning rate.
Whether to use a scheduler for the learning rate. Default: True.
The L2 regularization, i.e.
The L2 regularization, i.e. the weight decay parameter. Use 0 for no L2 regularization. Must be >= 0. Default: 0.
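The l1 and wd parameters together define an elastic-net style penalty added to the data loss; setting either to 0 disables that term. A sketch of one common formulation (the exact scaling is an implementation detail of the algorithm, so this is illustrative only):

```python
# Sketch of an elastic-net penalty: l1 scales the sum of absolute
# weights, wd (weight decay) scales the squared-norm term.
def penalty(weights, l1=0.0, wd=0.0):
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = 0.5 * wd * sum(w * w for w in weights)
    return l1_term + l2_term
```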
A SageMakerEstimator that runs a Linear Learner training job in "regressor" mode in SageMaker and returns a SageMakerModel that can be used to transform a DataFrame using the hosted Linear Learner model. The Linear Learner Regressor is useful for predicting a real-valued label from training examples.
Amazon SageMaker Linear Learner trains on RecordIO-encoded Amazon Record protobuf data. SageMaker Spark writes a DataFrame to S3 by selecting a column of Vectors named "features" and, if present, a column of Doubles named "label". These names are configurable by passing a map with entries in trainingSparkDataFormatOptions with key "labelColumnName" or "featuresColumnName", with values corresponding to the desired label and features columns.
For inference against a hosted Endpoint, the SageMakerModel returned by fit() uses ProtobufRequestRowSerializer to serialize Rows into RecordIO-encoded Amazon Record protobuf messages, by default selecting the column named "features", which is expected to contain a Vector of Doubles.
Inferences made against an Endpoint hosting a Linear Learner Regressor model contain a "score" field appended to the input DataFrame as a Double.