sparksdk

Type Members

case class CreatedResources(modelName: Option[String], endpointConfigName: Option[String], endpointName: Option[String]) extends Product with Serializable

Resources that may have been created during operation of the SageMaker Estimator and Model.
modelName
The name of the SageMaker Model that was created, or empty if it wasn't created.
endpointConfigName
The name of the SageMaker EndpointConfig that was created, or empty if it wasn't created.
endpointName
The name of the SageMaker Endpoint that was created, or empty if it wasn't created.
case class IAMRole(role: String) extends IAMRoleResource with Product with Serializable

Specifies an IAM Role
role
The IAM role ARN or name
case class IAMRoleFromConfig(configKey: String = ...) extends IAMRoleResource with Product with Serializable

Specifies an IAM Role by a Spark configuration lookup.
configKey
The Spark configuration key to read the IAM role ARN or name from
abstract class IAMRoleResource extends AnyRef

References an IAM Role.
abstract class NamePolicy extends AnyRef

Provides names for SageMaker entities created during fit in com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator
abstract class NamePolicyFactory extends AnyRef

Creates a NamePolicy upon a call to NamePolicyFactory#createNamePolicy
case class RandomNamePolicy(prefix: String = "") extends NamePolicy with Product with Serializable

Provides random, unique SageMaker entity names that begin with the specified prefix.
prefix
The common name prefix for all SageMaker entities named with this NamePolicy
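As a rough illustration of prefix-based random naming, the sketch below generates one random suffix per policy instance and reuses it for every entity name. `SketchRandomNamePolicy`, its field names, and the name layout are hypothetical stand-ins chosen for this example; they are not taken from the SDK.

```scala
import java.util.UUID

// Hypothetical stand-in for illustration; not the SDK's RandomNamePolicy.
class SketchRandomNamePolicy(prefix: String = "") {
  // One random suffix per policy instance, shared by all names it provides.
  private val suffix: String =
    UUID.randomUUID().toString.replace("-", "").take(12)

  // Assumed layout: common prefix, entity kind, random suffix.
  val trainingJobName: String = s"${prefix}trainingJob-$suffix"
  val modelName: String = s"${prefix}model-$suffix"
  val endpointConfigName: String = s"${prefix}endpointConfig-$suffix"
  val endpointName: String = s"${prefix}endpoint-$suffix"
}
```

Two instances built with the same prefix yield different names, which is why a fresh policy per fit (the role of a NamePolicyFactory) avoids name collisions.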
class RandomNamePolicyFactory extends NamePolicyFactory

Creates a RandomNamePolicy upon a call to RandomNamePolicyFactory#createNamePolicy
case class S3AutoCreatePath() extends S3Resource with Product with Serializable

Defines an S3 location that will be auto-created at runtime.
case class S3DataPath(bucket: String, objectPath: String) extends S3Resource with Product with Serializable

Represents a location within an S3 Bucket.
bucket
An S3 bucket
objectPath
An S3 key or key prefix
case class S3PathFromConfig(configKey: String = ...) extends S3Resource with Product with Serializable

Represents an S3 location defined by a Spark configuration key.
The value of the configuration key must be either a bucket name or an S3 URI of the form:
```
s3://bucket-name/prefix-path
```
configKey
The Spark configuration key to read the S3 location from
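The two accepted value forms can be told apart with a small amount of string handling. The following is a minimal sketch, assuming a bare bucket name maps to an empty key prefix; `parseS3Location` is a hypothetical helper written for this example, not an SDK function.

```scala
// Hypothetical helper: splits a configuration value into (bucket, key prefix).
// Accepts either "bucket-name" or "s3://bucket-name/prefix-path".
def parseS3Location(value: String): (String, String) = {
  val scheme = "s3://"
  // Drop the scheme if present, leaving "bucket" or "bucket/prefix".
  val stripped =
    if (value.startsWith(scheme)) value.drop(scheme.length) else value
  val slash = stripped.indexOf('/')
  // Assumption: a value with no path component has an empty prefix.
  if (slash < 0) (stripped, "")
  else (stripped.take(slash), stripped.drop(slash + 1))
}
```

The resulting pair corresponds to the `bucket` and `objectPath` fields of S3DataPath.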
abstract class S3Resource extends AnyRef

An S3 Resource for SageMaker to use.
class SageMakerEstimator extends Estimator[SageMakerModel]

Adapts a SageMaker learning Algorithm to a Spark Estimator. Fits a SageMakerModel by running a SageMaker Training Job on a Spark Dataset. Each call to fit submits a new SageMaker Training Job, creates a new SageMaker Model, and creates a new SageMaker Endpoint Config. A new Endpoint is either created during fit, or the returned SageMakerModel is configured to create the Endpoint on the first call to SageMakerModel transform.
On fit, the input dataset is serialized with the specified trainingSparkDataFormat using the specified trainingSparkDataFormatOptions and uploaded to an S3 location specified by trainingInputS3DataPath. The serialized Dataset is compressed with trainingCompressionCodec, if not None.
trainingProjectedColumns can be used to control which columns on the input Dataset are transmitted to SageMaker. If not None, then only those column names will be serialized as input to the SageMaker Training Job.
A Training Job is created with the uploaded Dataset as input to the specified trainingChannelName, using the specified trainingInputMode. The algorithm is specified by trainingImage, a Docker image URI reference. The Training Job is created with trainingInstanceCount instances of type trainingInstanceType. The Training Job will time out after trainingMaxRuntimeInSeconds, if not None.
SageMaker Training Job hyperparameters are built from the org.apache.spark.ml.param.Params set on this Estimator. Param objects set on this Estimator are retrieved during fit and converted to a SageMaker Training Job hyperparameter Map. Param objects are iterated over by invoking params on this Estimator. Param objects with neither a default value nor a set value are ignored. If a Param is not set but has a default value, the default value will be used. Param values are converted to SageMaker hyperparameter String values by invoking toString on the Param value.
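The conversion rules above can be sketched without Spark. `SketchParam` and `toHyperParameters` are hypothetical stand-ins for illustration: a real Estimator keeps values and defaults in its ParamMaps rather than on the Param object itself.

```scala
// Hypothetical stand-in for a Param with an optional default and set value.
final case class SketchParam(
    name: String,
    default: Option[Any] = None,
    set: Option[Any] = None)

// Builds the hyperparameter map following the rules described above.
def toHyperParameters(params: Seq[SketchParam]): Map[String, String] =
  params.flatMap { p =>
    // A set value wins over the default; a Param with neither is ignored.
    // Values are stringified with toString.
    p.set.orElse(p.default).map(v => p.name -> v.toString)
  }.toMap
```

For example, a Param with default 10 and no set value contributes "10", one with default 1 and set value 5 contributes "5", and one with neither contributes nothing.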
SageMaker uses the IAM Role with ARN sagemakerRole to access the input and output S3 buckets and trainingImage if the image is hosted in ECR. SageMaker Training Job output is stored in a Training Job specific sub-prefix of trainingOutputS3DataPath. This contains the SageMaker Training Job output file as well as the SageMaker Training Job model file.
After the Training Job is created, this Estimator will poll for success. Upon success a SageMakerModel is created and returned from fit. The SageMakerModel is created with a modelImage Docker image URI, defining the SageMaker model primary container, and with modelEnvironmentVariables environment variables. Each SageMakerModel has a corresponding SageMaker hosting Endpoint. This Endpoint runs on at least endpointInitialInstanceCount instances of type endpointInstanceType. The Endpoint is created either during construction of the SageMakerModel or on the first call to transform, controlled by endpointCreationPolicy. Each Endpoint instance runs with the sagemakerRole IAM Role.
The transform method on SageMakerModel uses requestRowSerializer to serialize Rows from the Dataset undergoing transformation into requests on the hosted SageMaker Endpoint. The responseRowDeserializer is used to convert the response from the Endpoint to a series of Rows, forming the transformed Dataset. If modelPrependInputRowsToTransformationRows is true, then each transformed Row is also prepended with its corresponding input Row.
class SageMakerModel extends Model[SageMakerModel]

A Model implementation which transforms a DataFrame by making requests to a SageMaker Endpoint. Manages life cycle of all necessary SageMaker entities, including Model, EndpointConfig, and Endpoint.
This Model transforms one DataFrame to another by repeated, distributed SageMaker Endpoint invocation. Each invocation request body is formed by concatenating input DataFrame Rows serialized to Byte Arrays by the specified serializer. The invocation request content-type property is set from contentType. The invocation request accepts property is set from the deserializer's accepts.
The transformed DataFrame is produced by deserializing each invocation response body into a series of Rows. Row deserialization is delegated to the specified deserializer, which converts an Array of Bytes to an Iterator[Row]. If prependInputRows is false, the transformed DataFrame will contain just these Rows. If prependInputRows is true, then each transformed Row is a concatenation of the input Row with its corresponding SageMaker invocation deserialized Row.
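The effect of prependInputRows can be illustrated with plain collections. This is a simplified sketch: rows are modeled as `Seq[Any]` rather than Spark Rows, `transformRows` is a hypothetical function, and it assumes exactly one deserialized row per input row.

```scala
// Rows are modeled as Seq[Any]; a hypothetical stand-in for Spark's Row.
def transformRows(
    input: Seq[Seq[Any]],
    responses: Seq[Seq[Any]],
    prependInputRows: Boolean): Seq[Seq[Any]] =
  if (prependInputRows)
    // Concatenate each input row with its corresponding response row.
    input.zip(responses).map { case (in, out) => in ++ out }
  else
    // Only the deserialized response rows appear in the output.
    responses
```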
Each invocation of transform passes the Dataset.schema of the input DataFrame to requestRowSerializer by invoking setSchema.
The specified serializer also controls the validity of input Row Schemas for this Model. Schema validation is carried out on each call to transformSchema, which invokes validateSchema.
Adapting this SageMaker model to the data format and type of a specific Endpoint is achieved by sub-classing RequestRowSerializer and ResponseRowDeserializer. Examples of a Serializer and Deserializer are LibSVMRequestRowSerializer and LibSVMResponseRowDeserializer respectively.
class SageMakerResourceCleanup extends AnyRef

Deletes any SageMaker entities created during operation of the SageMaker Estimator and Transformer.

Value Members

object EndpointCreationPolicy extends Enumeration

Determines whether and when to create the Endpoint and other Hosting resources.
CREATE_ON_CONSTRUCT - create the Endpoint upon creation of the SageMakerModel, at the end of fit()
CREATE_ON_TRANSFORM - create the Endpoint upon invocation of SageMakerModel.transform()
DO_NOT_CREATE - do not create the Endpoint
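The three policies map onto when, if ever, the Endpoint comes into existence. The sketch below mirrors that decision with a local Enumeration; `SketchEndpointCreationPolicy` and `endpointCreatedAt` are hypothetical names for this example and only imitate the shape of the real object.

```scala
// Hypothetical local copy of the three policy values, for illustration only.
object SketchEndpointCreationPolicy extends Enumeration {
  val CREATE_ON_CONSTRUCT, CREATE_ON_TRANSFORM, DO_NOT_CREATE = Value
}

// Returns the phase in which the Endpoint is created, or None if never.
def endpointCreatedAt(
    policy: SketchEndpointCreationPolicy.Value): Option[String] =
  policy match {
    case SketchEndpointCreationPolicy.CREATE_ON_CONSTRUCT => Some("fit")
    case SketchEndpointCreationPolicy.CREATE_ON_TRANSFORM => Some("transform")
    case SketchEndpointCreationPolicy.DO_NOT_CREATE       => None
  }
```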
object S3DataPath extends Serializable
object SageMakerEstimator extends Serializable
object SageMakerModel extends Serializable
package algorithms
package exceptions
package internal
package protobuf
package transformation
