com.amazonaws.services.sagemaker.sparksdk.algorithms
The SageMaker TrainingJob and Hosting IAM Role. Used by SageMaker to access S3 and ECR resources. SageMaker hosted Endpoint instances launched by this Estimator run with this role.
The SageMaker TrainingJob Instance Type to use.
The number of instances of instanceType to run a SageMaker Training Job with.
The SageMaker Endpoint Config instance type.
The minimum number of instances in the SageMaker Endpoint Config that can be used to host modelImage.
Serializes Spark DataFrame Rows for transformation by Models built from this Estimator.
Deserializes an Endpoint response into a series of Rows.
An S3 location to upload SageMaker Training Job input data to.
An S3 location where SageMaker stores Training Job output data.
The EBS volume size in gigabytes of each instance.
The columns to project from the Dataset being fit before training. If Optional.empty is passed, no projection occurs and all columns are serialized.
The SageMaker Channel name for the serialized Dataset used as fit input.
The MIME type of the training data.
The SageMaker Training Job S3 data distribution scheme.
The Spark Data Format name used to serialize the Dataset being fit for input to SageMaker.
The Spark Data Format Options used during serialization of the Dataset being fit.
The SageMaker Training Job Channel input mode.
The type of compression to use when serializing the Dataset being fit for input to SageMaker.
A SageMaker Training Job Termination Condition MaxRuntimeInHours.
A KMS key ID for the Output Data Source.
The environment variables that SageMaker will set on the model container during execution.
Defines how a SageMaker Endpoint referenced by a SageMakerModel is created.
Amazon SageMaker client. Used to send CreateTrainingJob, CreateModel, and CreateEndpoint requests.
The region in which to run the algorithm. If not specified, gets the region from the DefaultAwsRegionProviderChain.
AmazonS3. Used to create a bucket for staging SageMaker Training Job input and/or output if either is set to S3AutoCreatePath.
AmazonSTS. Used to resolve the account number when creating staging input/output buckets.
Whether the transformation result on Models built by this Estimator should also include the input Rows. If true, each output Row is formed by a concatenation of the input Row with the corresponding Row produced by SageMaker Endpoint invocation, produced by responseRowDeserializer. If false, each output Row is just taken from responseRowDeserializer.
Whether to remove the training data from S3 after training completes or fails.
The NamePolicyFactory to use when naming SageMaker entities created during fit.
The unique identifier of this Estimator. Used to represent this stage in Spark ML pipelines.
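The constructor parameters above might be wired up as follows. This is a minimal sketch: the instance types, counts, and role ARN are placeholder values, and the named parameters are assumed to match this listing.

  import com.amazonaws.services.sagemaker.sparksdk.IAMRole
  import com.amazonaws.services.sagemaker.sparksdk.algorithms.PCASageMakerEstimator

  // Placeholder role ARN; substitute a real SageMaker execution role.
  val estimator = new PCASageMakerEstimator(
    sagemakerRole = IAMRole("arn:aws:iam::123456789012:role/my-sagemaker-role"),
    trainingInstanceType = "ml.m4.xlarge",  // Training Job instance type
    trainingInstanceCount = 1,              // number of training instances
    endpointInstanceType = "ml.c4.xlarge",  // Endpoint Config instance type
    endpointInitialInstanceCount = 1)       // Endpoint Config initial instance count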
The PCA algorithm mode. Supported options: "regular", "stable", and "randomized". Default: "regular".
Whether to remove the training data from S3 after training completes or fails.
Defines how a SageMaker Endpoint referenced by a SageMakerModel is created.
The minimum number of instances in the SageMaker Endpoint Config that can be used to host modelImage.
The SageMaker Endpoint Config instance type.
Number of extra components to compute. Must be -1 or > 0. Valid for "randomized" mode; ignored by other modes. Initializes a random matrix for covariance computation that is independent of the desired num_components. As it grows larger, the solution is more accurate but runtime and memory consumption increase linearly. Default: -1.
The dimension of the input vectors. Must be > 0. Required.
Fits a SageMakerModel on dataSet by running a SageMaker training job.
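For example, continuing the construction sketch above, where trainingData stands for any DataFrame with a Vector column named "features":

  // Runs a SageMaker training job, creates a model and endpoint,
  // and returns a SageMakerModel wired to that endpoint.
  val model = estimator.fit(trainingData)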
A map from hyperParameter names to their respective values for training.
The number of examples in a mini-batch. Must be > 0. Required.
The environment variables that SageMaker will set on the model container during execution.
A SageMaker Model hosting Docker image URI.
Whether the transformation result on Models built by this Estimator should also include the input Rows. If true, each output Row is formed by a concatenation of the input Row with the corresponding Row produced by SageMaker Endpoint invocation, produced by responseRowDeserializer. If false, each output Row is just taken from responseRowDeserializer.
The NamePolicyFactory to use when naming SageMaker entities created during fit.
Number of principal components. Required.
The region in which to run the algorithm. If not specified, gets the region from the DefaultAwsRegionProviderChain.
Serializes Spark DataFrame Rows for transformation by Models built from this Estimator.
Deserializes an Endpoint response into a series of Rows.
AmazonS3. Used to create a bucket for staging SageMaker Training Job input and/or output if either is set to S3AutoCreatePath.
Amazon SageMaker client. Used to send CreateTrainingJob, CreateModel, and CreateEndpoint requests.
The SageMaker TrainingJob and Hosting IAM Role. Used by SageMaker to access S3 and ECR resources. SageMaker hosted Endpoint instances launched by this Estimator run with this role.
AmazonSTS. Used to resolve the account number when creating staging input/output buckets.
Whether to subtract the mean during training and inference. Default: True.
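Taken together, the PCA hyperparameters documented above might be set on the estimator like this. This is a sketch: the setter names are assumed to mirror the hyperparameter descriptions, and the values are illustrative only.

  estimator
    .setFeatureDim(784)             // dimension of the input vectors; must be > 0 (required)
    .setNumComponents(10)           // number of principal components (required)
    .setMiniBatchSize(200)          // examples per mini-batch; must be > 0 (required)
    .setAlgorithmMode("randomized") // "regular", "stable", or "randomized"
    .setExtraComponents(15)         // -1 or > 0; only used in "randomized" mode
    .setSubtractMean(true)          // subtract the mean during training and inference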
The SageMaker Channel name for the serialized Dataset used as fit input.
The type of compression to use when serializing the Dataset being fit for input to SageMaker.
The MIME type of the training data.
A SageMaker Training Job Algorithm Specification Training Image Docker image URI.
The SageMaker Training Job Channel input mode.
An S3 location to upload SageMaker Training Job input data to.
The number of instances of instanceType to run a SageMaker Training Job with.
The SageMaker TrainingJob Instance Type to use.
The EBS volume size in gigabytes of each instance.
A KMS key ID for the Output Data Source.
A SageMaker Training Job Termination Condition MaxRuntimeInHours.
An S3 location where SageMaker stores Training Job output data.
The columns to project from the Dataset being fit before training. If Optional.empty is passed, no projection occurs and all columns are serialized.
The SageMaker Training Job S3 data distribution scheme.
The Spark Data Format name used to serialize the Dataset being fit for input to SageMaker.
The Spark Data Format Options used during serialization of the Dataset being fit.
The unique identifier of this Estimator. Used to represent this stage in Spark ML pipelines.
A SageMakerEstimator that runs a PCA training job in SageMaker and returns a SageMakerModel that can be used to transform a DataFrame using the hosted PCA model. PCA, or Principal Component Analysis, is useful for reducing the dimensionality of data before training with another algorithm.
Amazon SageMaker PCA trains on RecordIO-encoded Amazon Record protobuf data. SageMaker Spark writes a DataFrame to S3 by selecting a column of Vectors named "features" and, if present, a column of Doubles named "label". These names are configurable by passing a map of trainingSparkDataFormatOptions entries keyed by "labelColumnName" or "featuresColumnName", with values naming the desired label and features columns.
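For example, if the DataFrame uses different column names, the mapping might be overridden at construction time. This is a sketch: "myLabel" and "myFeatures" are hypothetical column names, and the remaining named parameters carry placeholder values as in the earlier sketch.

  import com.amazonaws.services.sagemaker.sparksdk.IAMRole
  import com.amazonaws.services.sagemaker.sparksdk.algorithms.PCASageMakerEstimator

  val customEstimator = new PCASageMakerEstimator(
    sagemakerRole = IAMRole("arn:aws:iam::123456789012:role/my-sagemaker-role"),
    trainingInstanceType = "ml.m4.xlarge",
    trainingInstanceCount = 1,
    endpointInstanceType = "ml.c4.xlarge",
    endpointInitialInstanceCount = 1,
    trainingSparkDataFormatOptions = Map(
      "labelColumnName" -> "myLabel",         // Doubles column to treat as the label
      "featuresColumnName" -> "myFeatures"))  // Vector column to treat as features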
PCASageMakerEstimator uses ProtobufRequestRowSerializer to serialize Rows into RecordIO-encoded Amazon Record protobuf messages for inference, by default selecting the column named "features", which is expected to contain a Vector of Doubles.
Inferences made against an Endpoint hosting a PCA model contain a "projection" field appended to the input DataFrame as a Dense Vector of Doubles.
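Putting the pieces together, transformation appends the "projection" column to the input DataFrame. A sketch continuing the estimator above, where df is a hypothetical DataFrame with a Vector column named "features":

  val model = estimator.fit(df)        // runs the PCA training job and hosts the model
  val projected = model.transform(df)  // invokes the endpoint; appends "projection"
  projected.select("features", "projection").show(5)
  // "projection" holds each row's principal-component projection as a Vector of Doubles.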