Package

com.amazonaws.services.sagemaker

sparksdk

package sparksdk

Type Members

  1. case class CreatedResources(modelName: Option[String], endpointConfigName: Option[String], endpointName: Option[String]) extends Product with Serializable

    Resources that may have been created during operation of the SageMaker Estimator and Model.

    modelName

    The name of the SageMaker Model that was created, or empty if it wasn't created.

    endpointConfigName

    The name of the SageMaker EndpointConfig that was created, or empty if it wasn't created.

    endpointName

    The name of the SageMaker Endpoint that was created, or empty if it wasn't created.
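The three Option fields can be inspected to see which resources actually exist. The following is a minimal sketch that redeclares the case class locally (copying the signature shown above) so that it runs without the SDK; the `describe` helper and the object name are hypothetical, not part of the package.

```scala
object CreatedResourcesSketch {
  // Local copy of the case class signature shown above, so the sketch runs without the SDK.
  final case class CreatedResources(
      modelName: Option[String],
      endpointConfigName: Option[String],
      endpointName: Option[String])

  // Hypothetical helper: list only the resources that were actually created.
  def describe(r: CreatedResources): Seq[String] =
    Seq(
      "Endpoint" -> r.endpointName,
      "EndpointConfig" -> r.endpointConfigName,
      "Model" -> r.modelName
    ).collect { case (kind, Some(name)) => s"$kind: $name" }
}
```

Fields left as None are simply skipped, which matches the "empty if it wasn't created" wording above.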

  2. case class IAMRole(role: String) extends IAMRoleResource with Product with Serializable

    Specifies an IAM Role

    role

    The IAM role ARN or name

  3. case class IAMRoleFromConfig(configKey: String = ...) extends IAMRoleResource with Product with Serializable

    Specifies an IAM Role by a Spark configuration lookup.

    configKey

    The Spark configuration key to read the IAM role ARN or name from

  4. abstract class IAMRoleResource extends AnyRef

    References an IAM Role.

  5. abstract class NamePolicy extends AnyRef

    Provides names for SageMaker entities created during fit in com.amazonaws.services.sagemaker.sparksdk.SageMakerEstimator

  6. abstract class NamePolicyFactory extends AnyRef

    Creates a NamePolicy upon a call to NamePolicyFactory#createNamePolicy

  7. case class RandomNamePolicy(prefix: String = "") extends NamePolicy with Product with Serializable

    Provides random, unique SageMaker entity names that begin with the specified prefix.

    prefix

    The common name prefix for all SageMaker entities named with this NamePolicy
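As an illustration of prefix-based random naming, here is a minimal sketch. It does not reproduce the SDK's naming scheme, which this page does not document; the use of a UUID suffix, the object name, and the `randomName` helper are all assumptions made for the example.

```scala
import java.util.UUID

object RandomNameSketch {
  // Assumed scheme: the prefix followed by a random UUID.
  // The SDK's real suffix format is not documented on this page.
  def randomName(prefix: String): String =
    prefix + UUID.randomUUID().toString
}
```

Every name produced this way starts with the same prefix, so entities created with one policy can be grouped by name.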

  8. class RandomNamePolicyFactory extends NamePolicyFactory

    Creates a RandomNamePolicy upon a call to RandomNamePolicyFactory#createNamePolicy

  9. case class S3AutoCreatePath() extends S3Resource with Product with Serializable

    Defines an S3 location that will be auto-created at runtime.

  10. case class S3DataPath(bucket: String, objectPath: String) extends S3Resource with Product with Serializable

    Represents a location within an S3 Bucket.

    bucket

    An S3 bucket

    objectPath

    An S3 key or key prefix

  11. case class S3PathFromConfig(configKey: String = ...) extends S3Resource with Product with Serializable

    Represents an S3 location defined by a Spark configuration key.

    The configuration key must define either a bucket name or an S3 URI of the form

    s3://bucket-name/prefix-path

    configKey

    The Spark configuration key to read the S3 location from
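The two accepted value forms (a bare bucket name, or an s3://bucket-name/prefix-path URI) could be told apart along these lines. This is a minimal sketch, not the SDK's parsing code: the `parse` helper is hypothetical, and returning an empty prefix for a bare bucket name is an assumption.

```scala
object S3LocationSketch {
  // Hypothetical helper: split a configured value into (bucket, prefix).
  // Assumption: a bare bucket name yields an empty prefix.
  def parse(value: String): (String, String) = {
    val scheme = "s3://"
    if (value.startsWith(scheme)) {
      val rest = value.substring(scheme.length)
      val slash = rest.indexOf('/')
      if (slash < 0) (rest, "")
      else (rest.substring(0, slash), rest.substring(slash + 1))
    } else {
      (value, "")
    }
  }
}
```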

  12. abstract class S3Resource extends AnyRef

    An S3 Resource for SageMaker to use.

  13. class SageMakerEstimator extends Estimator[SageMakerModel]

    Adapts a SageMaker learning Algorithm to a Spark Estimator. Fits a SageMakerModel by running a SageMaker Training Job on a Spark Dataset. Each call to fit submits a new SageMaker Training Job, creates a new SageMaker Model, and creates a new SageMaker Endpoint Config. A new Endpoint is either created by fit, or the returned SageMakerModel is configured to create an Endpoint on SageMakerModel transform.

    On fit, the input dataset is serialized with the specified trainingSparkDataFormat using the specified trainingSparkDataFormatOptions and uploaded to an S3 location specified by trainingInputS3DataPath. The serialized Dataset is compressed with trainingCompressionCodec, if not None.

    trainingProjectedColumns can be used to control which columns on the input Dataset are transmitted to SageMaker. If not None, then only those column names will be serialized as input to the SageMaker Training Job.

    A Training Job is created with the uploaded Dataset being input to the specified trainingChannelName, with the specified trainingInputMode. The algorithm is specified by trainingImage, a Docker image URI reference. The Training Job is created with trainingInstanceCount instances of type trainingInstanceType. The Training Job will time out after trainingMaxRuntimeInSeconds, if not None.

    SageMaker Training Job hyperparameters are built from the org.apache.spark.ml.param.Params set on this Estimator. Param objects set on this Estimator are retrieved during fit and converted to a SageMaker Training Job hyperparameter Map. Param objects are iterated over by invoking params on this Estimator. Param objects with neither a default value nor a set value are ignored. If a Param is not set but has a default value, the default value will be used. Param values are converted to SageMaker hyperparameter String values by invoking toString on the Param value.
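The conversion rules in the paragraph above can be sketched without Spark. This is a minimal illustration, not the Estimator's code: `SketchParam` is a hypothetical stand-in for a Spark ML Param together with its set and default values, and `toHyperParameters` is a hypothetical name.

```scala
object HyperParameterSketch {
  // Hypothetical stand-in for a Spark ML Param: a name plus optional set and default values.
  final case class SketchParam(
      name: String,
      setValue: Option[Any],
      defaultValue: Option[Any])

  // A set value wins, a default value is the fallback, and a Param with
  // neither is ignored. Values become Strings via toString.
  def toHyperParameters(params: Seq[SketchParam]): Map[String, String] =
    params.flatMap { p =>
      p.setValue.orElse(p.defaultValue).map(v => p.name -> v.toString)
    }.toMap
}
```

For example, a Param "k" set to 10 and a Param "epochs" with only a default of 5 would both appear in the resulting Map as Strings, while a Param with neither value would be absent.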

    SageMaker uses the IAM Role with ARN sagemakerRole to access the input and output S3 buckets and trainingImage if the image is hosted in ECR. SageMaker Training Job output is stored in a Training Job specific sub-prefix of trainingOutputS3DataPath. This contains the SageMaker Training Job output file as well as the SageMaker Training Job model file.

    After the Training Job is created, this Estimator will poll for success. Upon success, a SageMakerModel is created and returned from fit. The SageMakerModel is created with a modelImage Docker image URI, defining the SageMaker model primary container, and with modelEnvironmentVariables environment variables. Each SageMakerModel has a corresponding SageMaker hosting Endpoint. This Endpoint runs on at least endpointInitialInstanceCount instances of type endpointInstanceType. The Endpoint is created either during construction of the SageMakerModel or on the first call to transform, controlled by endpointCreationPolicy. Each Endpoint instance runs with sagemakerRole IAMRole.

    The transform method on SageMakerModel uses requestRowSerializer to serialize Rows from the Dataset undergoing transformation into requests to the hosted SageMaker Endpoint. The responseRowDeserializer is used to convert the response from the Endpoint to a series of Rows, forming the transformed Dataset. If modelPrependInputRowsToTransformationRows is true, then each transformed Row is also prepended with its corresponding input Row.

  14. class SageMakerModel extends Model[SageMakerModel]

    A Model implementation which transforms a DataFrame by making requests to a SageMaker Endpoint. Manages the life cycle of all necessary SageMaker entities, including Model, EndpointConfig, and Endpoint.

    This Model transforms one DataFrame to another by repeated, distributed SageMaker Endpoint invocation. Each invocation request body is formed by concatenating input DataFrame Rows serialized to Byte Arrays by the specified serializer. The invocation request content-type property is set from contentType. The invocation request accepts property is set from the deserializer's accepts.

    The transformed DataFrame is produced by deserializing each invocation response body into a series of Rows. Row deserialization is delegated to the specified deserializer, which converts an Array of Bytes to an Iterator[Row]. If prependInputRows is false, the transformed DataFrame will contain just these Rows. If prependInputRows is true, then each transformed Row is a concatenation of the input Row with its corresponding SageMaker invocation deserialized Row.
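The effect of prependInputRows can be sketched with plain sequences. This is a minimal illustration that models Rows as Seq[Any] and assumes exactly one deserialized Row per input Row; `transformRows` is a hypothetical helper, not an SDK method.

```scala
object PrependRowsSketch {
  // Rows are modeled as plain Seq[Any] here; the SDK itself works with Spark Rows.
  type SketchRow = Seq[Any]

  // Pair each input row with the row deserialized from the endpoint response
  // for it (assumed one-to-one), optionally prepending the input row.
  def transformRows(
      inputs: Seq[SketchRow],
      responses: Seq[SketchRow],
      prependInputRows: Boolean): Seq[SketchRow] =
    inputs.zip(responses).map { case (in, out) =>
      if (prependInputRows) in ++ out else out
    }
}
```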

    Each invocation of transform passes the Dataset.schema of the input DataFrame to requestRowSerializer by invoking setSchema.

    The specified serializer also controls the validity of input Row Schemas for this Model. Schema validation is carried out on each call to transformSchema, which invokes validateSchema.

    Adapting this SageMaker model to the data format and type of a specific Endpoint is achieved by sub-classing RequestRowSerializer and ResponseRowDeserializer. Examples of a Serializer and Deserializer are LibSVMRequestRowSerializer and LibSVMResponseRowDeserializer respectively.
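The serializer and deserializer roles described above (rows to byte arrays that are concatenated into a request body, and a response byte array back to an Iterator of rows) can be sketched as follows. This is a standalone illustration using a made-up newline-delimited CSV format; it does not extend the SDK's RequestRowSerializer or ResponseRowDeserializer, and every name in it is hypothetical.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object RowCodecSketch {
  // Rows are modeled as Seq[Double]; the SDK itself works with Spark Rows.
  type SketchRow = Seq[Double]

  // Serialize one row to bytes: comma-separated values ending in a newline,
  // so that serialized rows can be concatenated into one request body.
  def serialize(row: SketchRow): Array[Byte] =
    (row.mkString(",") + "\n").getBytes(UTF_8)

  // Convert a body of newline-delimited rows back into an iterator of rows.
  def deserialize(body: Array[Byte]): Iterator[SketchRow] =
    new String(body, UTF_8)
      .split("\n")
      .iterator
      .filter(_.nonEmpty)
      .map(_.split(",").map(_.toDouble).toSeq)
}
```

Concatenating `serialize` output for several rows and passing the result to `deserialize` yields the original rows in order, which is the round trip a matching serializer and deserializer pair is meant to support.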

  15. class SageMakerResourceCleanup extends AnyRef

    Deletes any SageMaker entities created during operation of the SageMaker Estimator and Transformer.

Value Members

  1. object EndpointCreationPolicy extends Enumeration

    Determines whether and when to create the Endpoint and other Hosting resources.

    CREATE_ON_CONSTRUCT - create the Endpoint upon creation of the SageMakerModel, at the end of fit()

    CREATE_ON_TRANSFORM - create the Endpoint upon invocation of SageMakerModel.transform()

    DO_NOT_CREATE - do not create the Endpoint
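The three policy values can be read as a decision about the stage at which an Endpoint is created. The sketch below mirrors the value names in a local Enumeration so it runs without the SDK; the `shouldCreate` helper and the "construct" and "transform" stage labels are hypothetical.

```scala
object EndpointPolicySketch {
  // Local mirror of the three policy values named above, not the SDK object itself.
  object Policy extends Enumeration {
    val CREATE_ON_CONSTRUCT, CREATE_ON_TRANSFORM, DO_NOT_CREATE = Value
  }

  // True when an Endpoint should be created at the given lifecycle stage.
  // Stage labels ("construct", "transform") are illustrative only.
  def shouldCreate(policy: Policy.Value, stage: String): Boolean = policy match {
    case Policy.CREATE_ON_CONSTRUCT => stage == "construct"
    case Policy.CREATE_ON_TRANSFORM => stage == "transform"
    case _                          => false
  }
}
```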

  2. object S3DataPath extends Serializable

  3. object SageMakerEstimator extends Serializable

  4. object SageMakerModel extends Serializable

  5. package algorithms

  6. package exceptions

  7. package internal

  8. package protobuf

  9. package transformation