com.amazonaws.services.sagemaker.sparksdk
A SageMaker Training Job Algorithm Specification Training Image Docker image URI.
A SageMaker Model hosting Docker image URI.
The SageMaker TrainingJob and Hosting IAM Role. Used by SageMaker to access S3 and ECR resources. SageMaker hosted Endpoint instances launched by this Estimator run with this role.
The SageMaker TrainingJob instance type to use.
The number of instances of instanceType to run a SageMaker Training Job with.
The SageMaker Endpoint Config instance type.
The SageMaker Endpoint Config minimum number of instances that can be used to host modelImage
Serializes Spark DataFrame Rows for transformation by Models built from this Estimator.
Deserializes an Endpoint response into a series of Rows.
An S3 location to upload SageMaker Training Job input data to.
An S3 location for SageMaker to store Training Job output data to.
The EBS volume size in gigabytes of each instance
The columns to project from the Dataset being fit before training. If an Optional.empty is passed then no specific projection will occur and all columns will be serialized.
The SageMaker Channel name to which the serialized Dataset is input during fit.
The MIME type of the training data.
The SageMaker Training Job S3 data distribution scheme.
The Spark Data Format name used to serialize the Dataset being fit for input to SageMaker.
The Spark Data Format Options used during serialization of the Dataset being fit.
The SageMaker Training Job Channel input mode.
The type of compression to use when serializing the Dataset being fit for input to SageMaker.
A SageMaker Training Job Termination Condition MaxRuntimeInSeconds.
A KMS key ID for the Output Data Source
The environment variables that SageMaker will set on the model container during execution.
Defines how a SageMaker Endpoint referenced by a SageMakerModel is created.
Amazon SageMaker client. Used to send CreateTrainingJob, CreateModel, and CreateEndpoint requests.
AmazonS3. Used to create a bucket for staging SageMaker Training Job input and/or output if either are set to S3AutoCreatePath.
AmazonSTS. Used to resolve the account number when creating staging input / output buckets.
Whether the transformation result on Models built by this Estimator should also include the input Rows. If true, each output Row is formed by a concatenation of the input Row with the corresponding Row produced by SageMaker Endpoint invocation, produced by responseRowDeserializer. If false, each output Row is just taken from responseRowDeserializer.
Whether to remove the training data from S3 after training completes or fails.
The NamePolicyFactory to use when naming SageMaker entities created during fit
The unique identifier of this Estimator. Used to represent this stage in Spark ML pipelines.
A map from hyperParameter names to their respective values for training.
Fits a SageMakerModel on dataSet by running a SageMaker training job.
Adapts a SageMaker learning Algorithm to a Spark Estimator. Fits a SageMakerModel by running a SageMaker Training Job on a Spark Dataset. Each call to fit submits a new SageMaker Training Job, creates a new SageMaker Model, and creates a new SageMaker Endpoint Config. A new Endpoint is either created by this call to fit, or the returned SageMakerModel is configured to create an Endpoint on the first call to SageMakerModel transform, depending on endpointCreationPolicy.
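A minimal construction sketch, assuming the SDK's constructor parameter names as documented above; the image URIs, role ARN, and serializer choices are placeholders that depend on the algorithm being adapted:

```scala
import com.amazonaws.services.sagemaker.sparksdk.{IAMRole, SageMakerEstimator}
import com.amazonaws.services.sagemaker.sparksdk.transformation.serializers.ProtobufRequestRowSerializer
import com.amazonaws.services.sagemaker.sparksdk.transformation.deserializers.KMeansProtobufResponseRowDeserializer

// Placeholder ECR image URIs and IAM role ARN -- substitute real values.
val estimator = new SageMakerEstimator(
  trainingImage = "<account>.dkr.ecr.<region>.amazonaws.com/<training-image>",
  modelImage = "<account>.dkr.ecr.<region>.amazonaws.com/<model-image>",
  requestRowSerializer = new ProtobufRequestRowSerializer(),
  responseRowDeserializer = new KMeansProtobufResponseRowDeserializer(),
  hyperParameters = Map("k" -> "10", "feature_dim" -> "2"),
  sagemakerRole = IAMRole("arn:aws:iam::<account>:role/<role-name>"),
  trainingInstanceType = "ml.p2.xlarge",
  trainingInstanceCount = 1,
  endpointInstanceType = "ml.c4.xlarge",
  endpointInitialInstanceCount = 1,
  trainingSparkDataFormat = "sagemaker")
```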
On fit, the input dataset is serialized with the specified trainingSparkDataFormat using the specified trainingSparkDataFormatOptions and uploaded to an S3 location specified by trainingInputS3DataPath. The serialized Dataset is compressed with trainingCompressionCodec, if not None.
trainingProjectedColumns can be used to control which columns on the input Dataset are transmitted to SageMaker. If not None, then only those column names will be serialized as input to the SageMaker Training Job.
A Training Job is created with the uploaded Dataset being input to the specified trainingChannelName, with the specified trainingInputMode. The algorithm is specified by trainingImage, a Docker image URI reference. The Training Job is created with trainingInstanceCount instances of type trainingInstanceType. The Training Job will time out after trainingMaxRuntimeInSeconds, if not None.
SageMaker Training Job hyperparameters are built from the org.apache.spark.ml.param.Params set on this Estimator. Param objects set on this Estimator are retrieved during fit and converted to a SageMaker Training Job hyperparameter Map. Param objects are iterated over by invoking params on this Estimator. Param objects with neither a default value nor a set value are ignored. If a Param is not set but has a default value, the default value will be used. Param values are converted to SageMaker hyperparameter String values by invoking toString on the Param value.
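As a sketch of this Param-to-hyperparameter conversion, using one of the SDK's built-in subclasses (the setter names shown are from KMeansSageMakerEstimator and are assumptions for illustration):

```scala
import com.amazonaws.services.sagemaker.sparksdk.IAMRole
import com.amazonaws.services.sagemaker.sparksdk.algorithms.KMeansSageMakerEstimator

// Placeholder role ARN -- substitute a real value.
val kmeans = new KMeansSageMakerEstimator(
  sagemakerRole = IAMRole("arn:aws:iam::<account>:role/<role-name>"),
  trainingInstanceType = "ml.p2.xlarge",
  trainingInstanceCount = 1,
  endpointInstanceType = "ml.c4.xlarge",
  endpointInitialInstanceCount = 1)

// Each setter sets a Spark ML Param on the Estimator; during fit each
// set (or defaulted) Param value is converted with toString, yielding a
// hyperparameter Map along the lines of Map("k" -> "10", "feature_dim" -> "2").
kmeans.setK(10).setFeatureDim(2)
```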
SageMaker uses the IAM Role with ARN sagemakerRole to access the input and output S3 buckets and trainingImage if the image is hosted in ECR. SageMaker Training Job output is stored in a Training Job specific sub-prefix of trainingOutputS3DataPath. This contains the SageMaker Training Job output file as well as the SageMaker Training Job model file.
After the Training Job is created, this Estimator will poll for success. Upon success, a SageMakerModel is created and returned from fit. The SageMakerModel is created with a modelImage Docker image URI, defining the SageMaker model primary container, and with modelEnvironmentVariables environment variables. Each SageMakerModel has a corresponding SageMaker hosting Endpoint. This Endpoint runs on at least endpointInitialInstanceCount instances of type endpointInstanceType. The Endpoint is created either during construction of the SageMakerModel or on the first call to transform, controlled by endpointCreationPolicy. Each Endpoint instance runs with the sagemakerRole IAM Role.
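The timing choice is expressed through the SDK's EndpointCreationPolicy enumeration; the values below reflect the policy described above:

```scala
import com.amazonaws.services.sagemaker.sparksdk.EndpointCreationPolicy

// CREATE_ON_CONSTRUCT: the Endpoint is created when the SageMakerModel is
//   constructed, i.e. before fit returns.
// CREATE_ON_TRANSFORM: Endpoint creation is deferred to the first call to
//   transform on the returned SageMakerModel.
val eager = EndpointCreationPolicy.CREATE_ON_CONSTRUCT
val lazyCreate = EndpointCreationPolicy.CREATE_ON_TRANSFORM
```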
The transform method on SageMakerModel uses requestRowSerializer to serialize Rows from the Dataset undergoing transformation, to requests on the hosted SageMaker Endpoint. The responseRowDeserializer is used to convert the response from the Endpoint to a series of Rows, forming the transformed Dataset. If modelPrependInputRowsToTransformationRows is true, then each transformed Row is also prepended with its corresponding input Row.
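Putting fit and transform together, a hedged end-to-end sketch (estimator, trainingData, and testData are assumed to exist; estimator is any configured SageMakerEstimator):

```scala
import com.amazonaws.services.sagemaker.sparksdk.SageMakerModel
import org.apache.spark.sql.DataFrame

// fit serializes and uploads trainingData, runs the Training Job, and
// returns a SageMakerModel backed by a hosted Endpoint.
val model: SageMakerModel = estimator.fit(trainingData)

// transform serializes each Row with requestRowSerializer, invokes the
// Endpoint, and deserializes responses into Rows with responseRowDeserializer.
// If modelPrependInputRowsToTransformationRows is true, each output Row is
// the input Row followed by the deserialized response Row.
val results: DataFrame = model.transform(testData)
results.show()
```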