Resources that may have been created during operation of the SageMaker Estimator and Model.
Specifies an IAM Role
The IAM role ARN or name
Specifies an IAM Role by a Spark configuration lookup.
The Spark configuration key to read the IAM role ARN or name from
References an IAM Role.
Provides names for SageMaker entities created during fit in com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator
Creates a NamePolicy upon a call to NamePolicyFactory#createNamePolicy
Provides random, unique SageMaker entity names that begin with the specified prefix.
The common name prefix for all SageMaker entities named with this NamePolicy
Creates a RandomNamePolicy upon a call to RandomNamePolicyFactory#createNamePolicy
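As a rough illustration of prefix-based random naming, the sketch below builds entity names from a caller-supplied prefix and a random suffix. The class name PrefixedRandomNames, the suffix length, and the exact name layout are assumptions made for this example and are not taken from the SDK's RandomNamePolicy.

```scala
import java.util.UUID

// Hypothetical stand-in for a name policy; not the SDK's RandomNamePolicy.
final class PrefixedRandomNames(prefix: String) {
  // One random suffix shared by all entity names from this instance (assumed layout).
  private val suffix: String =
    UUID.randomUUID().toString.replace("-", "").take(12)

  val trainingJobName: String = s"${prefix}trainingJob-$suffix"
  val modelName: String = s"${prefix}model-$suffix"
  val endpointConfigName: String = s"${prefix}endpointConfig-$suffix"
  val endpointName: String = s"${prefix}endpoint-$suffix"
}
```

Two instances created with the same prefix would, with overwhelming probability, yield different names, which is the property the policy description relies on.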
Defines an S3 location that will be auto-created at runtime.
Represents a location within an S3 Bucket.
An S3 bucket
An S3 key or key prefix
Represents an S3 location defined by a Spark configuration key.
The value of the configuration key must be either a bucket name or an S3 URI of the form s3://bucket-name/prefix-path
The Spark configuration key to read the S3 location from
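The two accepted value forms can be illustrated with a small parser. This is a minimal sketch, assuming the value has already been read from the Spark configuration; S3LocationParser and Location are hypothetical names, and the SDK's own resolution logic may differ.

```scala
// Hypothetical helper; not part of the SDK.
object S3LocationParser {
  final case class Location(bucket: String, prefix: String)

  // Accepts either "bucket-name" or "s3://bucket-name/prefix-path".
  def parse(value: String): Location = {
    val stripped = value.stripPrefix("s3://")
    // Split once, at the first "/": bucket on the left, key prefix on the right.
    stripped.split("/", 2) match {
      case Array(bucket, prefix) => Location(bucket, prefix)
      case Array(bucket)         => Location(bucket, "")
    }
  }
}
```

A bare bucket name yields an empty prefix, while a full URI keeps everything after the first slash as the key prefix.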
An S3 Resource for SageMaker to use.
Adapts a SageMaker learning Algorithm to a Spark Estimator. Fits a SageMakerModel by running a SageMaker Training Job on a Spark Dataset. Each call to fit submits a new SageMaker Training Job, creates a new SageMaker Model, and creates a new SageMaker Endpoint Config. A new Endpoint is either created during fit, or the returned SageMakerModel is configured to create an Endpoint on the first call to SageMakerModel transform.
On fit, the input dataset is serialized with the specified trainingSparkDataFormat using the specified trainingSparkDataFormatOptions and uploaded to an S3 location specified by trainingInputS3DataPath. The serialized Dataset is compressed with trainingCompressionCodec, if not None.
trainingProjectedColumns can be used to control which columns on the input Dataset are transmitted to SageMaker. If not None, then only those column names will be serialized as input to the SageMaker Training Job.
A Training Job is created with the uploaded Dataset being input to the specified trainingChannelName, with the specified trainingInputMode. The algorithm is specified by trainingImage, a Docker image URI reference. The Training Job is created with trainingInstanceCount instances of type trainingInstanceType. The Training Job will time out after trainingMaxRuntimeInSeconds, if not None.
SageMaker Training Job hyperparameters are built from the org.apache.spark.ml.param.Params set on this Estimator. Param objects set on this Estimator are retrieved during fit and converted to a SageMaker Training Job hyperparameter Map. Param objects are iterated over by invoking params on this Estimator. Param objects with neither a default value nor a set value are ignored. If a Param is not set but has a default value, the default value will be used. Param values are converted to SageMaker hyperparameter String values by invoking toString on the Param value.
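The conversion rules above (set value wins, default is the fallback, params with neither are ignored, values are stringified with toString) can be sketched without Spark. ParamState is a hypothetical simplification of a Spark ML Param together with its optional default and set values; it is not an SDK or Spark type.

```scala
object HyperParameterSketch {
  // Hypothetical simplification of a Param plus its optional default and set values.
  final case class ParamState(name: String, default: Option[Any], set: Option[Any])

  def toHyperParameters(params: Seq[ParamState]): Map[String, String] =
    params.flatMap { p =>
      // A set value takes precedence over the default; params with neither are skipped.
      p.set.orElse(p.default).map(v => p.name -> v.toString)
    }.toMap
}
```

For example, a param with default 1 and set value 5 would contribute the string "5", and a param with no default and no set value would not appear in the resulting Map.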
SageMaker uses the IAM Role with ARN sagemakerRole to access the input and output S3 buckets and trainingImage if the image is hosted in ECR. SageMaker Training Job output is stored in a Training Job specific sub-prefix of trainingOutputS3DataPath. This contains the SageMaker Training Job output file as well as the SageMaker Training Job model file.
After the Training Job is created, this Estimator will poll for success. Upon success a SageMakerModel is created and returned from fit. The SageMakerModel is created with a modelImage Docker image URI, defining the SageMaker model primary container and with modelEnvironmentVariables environment variables. Each SageMakerModel has a corresponding SageMaker hosting Endpoint. This Endpoint runs on at least endpointInitialInstanceCount instances of type endpointInstanceType. The Endpoint is created either during construction of the SageMakerModel or on the first call to transform, controlled by endpointCreationPolicy. Each Endpoint instance runs with the sagemakerRole IAM Role.
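Polling for Training Job completion might look roughly like the loop below. This is a generic sketch, not the Estimator's actual code: the Status values, the describe callback, and the maxPolls and sleepMillis parameters are all assumptions introduced for illustration.

```scala
object PollingSketch {
  // Hypothetical status values; not the SDK's or the AWS client's types.
  sealed trait Status
  case object InProgress extends Status
  case object Completed extends Status
  case object Failed extends Status

  // Calls describe() until it reports a terminal status or maxPolls is exhausted.
  // Returns the last observed status (InProgress if polls ran out).
  @annotation.tailrec
  def awaitCompletion(describe: () => Status, maxPolls: Int, sleepMillis: Long): Status =
    describe() match {
      case InProgress if maxPolls > 1 =>
        Thread.sleep(sleepMillis)
        awaitCompletion(describe, maxPolls - 1, sleepMillis)
      case other => other
    }
}
```

In a real client the describe callback would wrap a status lookup for the Training Job, and a Failed result would typically be surfaced as an error instead of a returned model.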
The transform method on SageMakerModel uses requestRowSerializer to serialize Rows from the Dataset undergoing transformation, to requests on the hosted SageMaker Endpoint. The responseRowDeserializer is used to convert the response from the Endpoint to a series of Rows, forming the transformed Dataset. If modelPrependInputRowsToTransformationRows is true, then each transformed Row is also prepended with its corresponding input Row.
A Model implementation which transforms a DataFrame by making requests to a SageMaker Endpoint. Manages the life cycle of all necessary SageMaker entities, including Model, EndpointConfig, and Endpoint.
This Model transforms one DataFrame to another by repeated, distributed SageMaker Endpoint invocation. Each invocation request body is formed by concatenating input DataFrame Rows serialized to Byte Arrays by the specified serializer. The invocation request content-type property is set from contentType. The invocation request accepts property is set from the deserializer's accepts.
The transformed DataFrame is produced by deserializing each invocation response body into a series of Rows. Row deserialization is delegated to the specified deserializer, which converts an Array of Bytes to an Iterator[Row]. If prependInputRows is false, the transformed DataFrame will contain just these Rows. If prependInputRows is true, then each transformed Row is a concatenation of the input Row with its corresponding SageMaker invocation deserialized Row.
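The effect of prependInputRows can be shown on plain collections. This sketch models a Row as Seq[Any] instead of org.apache.spark.sql.Row and assumes exactly one deserialized response Row per input Row; both are simplifications made for the example.

```scala
object PrependRowsSketch {
  // Stand-in for org.apache.spark.sql.Row (assumption of this sketch).
  type Row = Seq[Any]

  def transform(
      inputs: Seq[Row],
      responses: Seq[Row],
      prependInputRows: Boolean): Seq[Row] =
    if (prependInputRows)
      // Each output Row is the input Row followed by its deserialized response Row.
      inputs.zip(responses).map { case (in, out) => in ++ out }
    else
      responses
}
```

With prependInputRows set to false the input columns are dropped entirely, so downstream code that needs them has to enable prepending.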
Each invocation of transform passes the Dataset.schema of the input DataFrame to requestRowSerializer by invoking setSchema.
The specified serializer also controls the validity of input Row Schemas for this Model. Schema validation is carried out on each call to transformSchema, which invokes validateSchema.
Adapting this SageMaker model to the data format and type of a specific Endpoint is achieved by sub-classing RequestRowSerializer and ResponseRowDeserializer. Examples of a Serializer and Deserializer are LibSVMRequestRowSerializer and LibSVMResponseRowDeserializer respectively.
Deletes any SageMaker entities created during operation of the SageMaker Estimator and Transformer.
Determines whether and when to create the Endpoint and other Hosting resources.
CREATE_ON_CONSTRUCT - create the Endpoint upon creation of the SageMakerModel, at the end of fit()
CREATE_ON_TRANSFORM - create the Endpoint upon invocation of SageMakerModel.transform()
DO_NOT_CREATE - do not create the Endpoint
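The three policies can be read as a decision made at two points in the model's life: construction and transform. The sketch below encodes that decision; the Policy and Phase types and the shouldCreateEndpoint function are hypothetical names for illustration, not the SDK's EndpointCreationPolicy.

```scala
object EndpointPolicySketch {
  // Hypothetical mirror of the three policies listed above.
  sealed trait Policy
  case object CreateOnConstruct extends Policy
  case object CreateOnTransform extends Policy
  case object DoNotCreate extends Policy

  // The two moments at which an Endpoint could be created.
  sealed trait Phase
  case object Construct extends Phase
  case object Transform extends Phase

  // True only when the policy matches the current phase and no Endpoint exists yet.
  def shouldCreateEndpoint(policy: Policy, phase: Phase, endpointExists: Boolean): Boolean =
    !endpointExists && ((policy, phase) match {
      case (CreateOnConstruct, Construct) => true
      case (CreateOnTransform, Transform) => true
      case _                              => false
    })
}
```

Under this reading, DoNotCreate never triggers creation, which suits cases where the model is attached to an Endpoint that already exists.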
Resources that may have been created during operation of the SageMaker Estimator and Model.
The name of the SageMaker Model that was created, or empty if it wasn't created.
The name of the SageMaker EndpointConfig that was created, or empty if it wasn't created.
The name of the SageMaker Endpoint that was created, or empty if it wasn't created.
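A cleanup routine could use these optional names to delete only what was actually created. The sketch below is an assumption-laden illustration: the Created case class, the deletionPlan function, and the Endpoint, EndpointConfig, Model deletion order are choices made for this example and are not confirmed by the text above.

```scala
object CleanupSketch {
  // Hypothetical mirror of the three optional resource names described above.
  final case class Created(
      modelName: Option[String],
      endpointConfigName: Option[String],
      endpointName: Option[String])

  // Lists the delete actions to perform; absent names produce no action.
  // The Endpoint -> EndpointConfig -> Model order is an assumption of this sketch.
  def deletionPlan(r: Created): Seq[String] =
    Seq(
      r.endpointName.map(n => s"delete endpoint $n"),
      r.endpointConfigName.map(n => s"delete endpoint config $n"),
      r.modelName.map(n => s"delete model $n")
    ).flatten
}
```

Returning a plan instead of issuing calls keeps the sketch free of any AWS client dependency; a real implementation would invoke the corresponding SageMaker delete operations.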