Package com.amazonaws.services.sagemaker.sparksdk

package algorithms


Type Members

  1. class KMeansSageMakerEstimator extends SageMakerEstimator with KMeansParams

    A SageMakerEstimator that runs a K-Means Clustering training job on Amazon SageMaker upon a call to fit() on a DataFrame and returns a SageMakerModel that can be used to transform a DataFrame using the hosted K-Means model. K-Means Clustering is useful for grouping similar examples in your dataset.

    Amazon SageMaker K-Means clustering trains on RecordIO-encoded Amazon Record protobuf data. SageMaker Spark writes a DataFrame to S3 by selecting a column of Vectors named "features" and, if present, a column of Doubles named "label". These names are configurable: pass a map in trainingSparkDataFormatOptions with the keys "labelColumnName" and "featuresColumnName" mapped to the desired label and features column names.

    For inference, the SageMakerModel returned by KMeansSageMakerEstimator.fit() uses ProtobufRequestRowSerializer to serialize Rows into RecordIO-encoded Amazon Record protobuf messages, by default selecting the column named "features", which is expected to contain a Vector of Doubles.

    Inferences made against an Endpoint hosting a K-Means model contain a "closest_cluster" field and a "distance_to_cluster" field, both appended to the input DataFrame as columns of Double.
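    A minimal usage sketch of the flow described above. The constructor parameter names follow the SageMaker Spark SDK; the role ARN, instance types, and the setK/setFeatureDim values are placeholder assumptions, and the training DataFrame is left abstract:

    ```scala
    import com.amazonaws.services.sagemaker.sparksdk.IAMRole
    import com.amazonaws.services.sagemaker.sparksdk.algorithms.KMeansSageMakerEstimator
    import org.apache.spark.sql.DataFrame

    // Hypothetical IAM role and a DataFrame with a Vector column named "features".
    val roleArn = "arn:aws:iam::123456789012:role/SageMakerRole"
    val trainingData: DataFrame = ??? // e.g. loaded with spark.read

    // fit() uploads the DataFrame to S3 as RecordIO-encoded protobuf
    // and starts a K-Means training job on SageMaker.
    val estimator = new KMeansSageMakerEstimator(
      sagemakerRole = IAMRole(roleArn),
      trainingInstanceType = "ml.m4.xlarge",
      trainingInstanceCount = 1,
      endpointInstanceType = "ml.m4.xlarge",
      endpointInitialInstanceCount = 1)
      .setK(10)
      .setFeatureDim(784)

    // fit() returns a SageMakerModel backed by a hosted endpoint; transform()
    // appends "closest_cluster" and "distance_to_cluster" columns of Double.
    val model = estimator.fit(trainingData)
    val clustered = model.transform(trainingData)
    ```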

  2. class LinearLearnerBinaryClassifier extends LinearLearnerSageMakerEstimator with BinaryClassifierParams

    A SageMakerEstimator that runs a Linear Learner training job in "binary classifier" mode in SageMaker and returns a SageMakerModel that can be used to transform a DataFrame using the hosted Linear Learner model. The Linear Learner Binary Classifier is useful for classifying examples into one of two classes.

    Amazon SageMaker Linear Learner trains on RecordIO-encoded Amazon Record protobuf data. SageMaker Spark writes a DataFrame to S3 by selecting a column of Vectors named "features" and, if present, a column of Doubles named "label". These names are configurable: pass a map in trainingSparkDataFormatOptions with the keys "labelColumnName" and "featuresColumnName" mapped to the desired label and features column names.

    Inferences made against an Endpoint hosting a Linear Learner Binary Classifier model contain a "score" field and a "predicted_label" field, both appended to the input DataFrame as columns of Doubles.
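    A sketch showing how trainingSparkDataFormatOptions redirects the protobuf writer to non-default columns. The column names "clicked" and "featureVec", the role ARN, and the instance settings are illustrative assumptions:

    ```scala
    import com.amazonaws.services.sagemaker.sparksdk.IAMRole
    import com.amazonaws.services.sagemaker.sparksdk.algorithms.LinearLearnerBinaryClassifier
    import org.apache.spark.sql.DataFrame

    val roleArn = "arn:aws:iam::123456789012:role/SageMakerRole"
    // A DataFrame whose label and features live in non-default columns:
    // "clicked" (Double) and "featureVec" (Vector).
    val trainingData: DataFrame = ???

    val estimator = new LinearLearnerBinaryClassifier(
      sagemakerRole = IAMRole(roleArn),
      trainingInstanceType = "ml.m4.xlarge",
      trainingInstanceCount = 1,
      endpointInstanceType = "ml.m4.xlarge",
      endpointInitialInstanceCount = 1,
      // Tell the RecordIO protobuf writer which columns to serialize.
      trainingSparkDataFormatOptions = Map(
        "labelColumnName" -> "clicked",
        "featuresColumnName" -> "featureVec"))
      .setFeatureDim(100)

    // transform() on the returned model appends "score" and
    // "predicted_label" columns of Double to the input DataFrame.
    val model = estimator.fit(trainingData)
    ```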

  3. class LinearLearnerRegressor extends LinearLearnerSageMakerEstimator with LinearLearnerParams

    A SageMakerEstimator that runs a Linear Learner training job in "regressor" mode in SageMaker and returns a SageMakerModel that can be used to transform a DataFrame using the hosted Linear Learner model. The Linear Learner Regressor is useful for predicting a real-valued label from training examples.

    Amazon SageMaker Linear Learner trains on RecordIO-encoded Amazon Record protobuf data. SageMaker Spark writes a DataFrame to S3 by selecting a column of Vectors named "features" and, if present, a column of Doubles named "label". These names are configurable: pass a map in trainingSparkDataFormatOptions with the keys "labelColumnName" and "featuresColumnName" mapped to the desired label and features column names.

    For inference against a hosted Endpoint, the SageMakerModel returned by LinearLearnerRegressor.fit() uses ProtobufRequestRowSerializer to serialize Rows into RecordIO-encoded Amazon Record protobuf messages, by default selecting the column named "features", which is expected to contain a Vector of Doubles.

    Inferences made against an Endpoint hosting a Linear Learner Regressor model contain a "score" field appended to the input DataFrame as a column of Doubles.
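    A brief sketch of the regression flow, assuming the default "label" and "features" column names; the role ARN, instance settings, and feature dimension are placeholders:

    ```scala
    import com.amazonaws.services.sagemaker.sparksdk.IAMRole
    import com.amazonaws.services.sagemaker.sparksdk.algorithms.LinearLearnerRegressor
    import org.apache.spark.sql.DataFrame

    val roleArn = "arn:aws:iam::123456789012:role/SageMakerRole"
    // Expects a "label" column of Doubles and a "features" column of Vectors.
    val trainingData: DataFrame = ???

    val regressor = new LinearLearnerRegressor(
      sagemakerRole = IAMRole(roleArn),
      trainingInstanceType = "ml.m4.xlarge",
      trainingInstanceCount = 1,
      endpointInstanceType = "ml.m4.xlarge",
      endpointInitialInstanceCount = 1)
      .setFeatureDim(50)

    val model = regressor.fit(trainingData)
    // Each input Row gains a "score" column with the real-valued prediction.
    val predictions = model.transform(trainingData).select("score")
    ```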

  4. class PCASageMakerEstimator extends SageMakerEstimator with PCAParams

    A SageMakerEstimator that runs a PCA training job in SageMaker and returns a SageMakerModel that can be used to transform a DataFrame using the hosted PCA model. PCA, or Principal Component Analysis, is useful for reducing the dimensionality of data before training with another algorithm.

    Amazon SageMaker PCA trains on RecordIO-encoded Amazon Record protobuf data. SageMaker Spark writes a DataFrame to S3 by selecting a column of Vectors named "features" and, if present, a column of Doubles named "label". These names are configurable: pass a map in trainingSparkDataFormatOptions with the keys "labelColumnName" and "featuresColumnName" mapped to the desired label and features column names.

    For inference, the SageMakerModel returned by PCASageMakerEstimator.fit() uses ProtobufRequestRowSerializer to serialize Rows into RecordIO-encoded Amazon Record protobuf messages, by default selecting the column named "features", which is expected to contain a Vector of Doubles.

    Inferences made against an Endpoint hosting a PCA model contain a "projection" field appended to the input DataFrame as a Dense Vector of Doubles.
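    A sketch of a dimensionality-reduction step, assuming the setNumComponents/setFeatureDim setters of the SDK's PCA parameters; the role ARN, instance settings, and dimensions are placeholders:

    ```scala
    import com.amazonaws.services.sagemaker.sparksdk.IAMRole
    import com.amazonaws.services.sagemaker.sparksdk.algorithms.PCASageMakerEstimator
    import org.apache.spark.sql.DataFrame

    val roleArn = "arn:aws:iam::123456789012:role/SageMakerRole"
    val data: DataFrame = ??? // a "features" column of 50-dimensional Vectors

    // Project 50-dimensional feature Vectors down to 3 principal components.
    val pca = new PCASageMakerEstimator(
      sagemakerRole = IAMRole(roleArn),
      trainingInstanceType = "ml.m4.xlarge",
      trainingInstanceCount = 1,
      endpointInstanceType = "ml.m4.xlarge",
      endpointInitialInstanceCount = 1)
      .setNumComponents(3)
      .setFeatureDim(50)

    val model = pca.fit(data)
    // transform() appends a "projection" column holding the reduced Vectors,
    // which can feed a downstream estimator.
    val projected = model.transform(data).select("projection")
    ```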

  5. class XGBoostSageMakerEstimator extends SageMakerEstimator with XGBoostParams

    A SageMakerEstimator that runs an XGBoost training job in SageMaker and returns a SageMakerModel that can be used to transform a DataFrame using the hosted XGBoost model. XGBoost is an open-source distributed gradient boosting library that Amazon has adapted to run on Amazon SageMaker.

    XGBoost trains and infers on LibSVM-formatted data. XGBoostSageMakerEstimator uses Spark's LibSVMFileFormat to write the training DataFrame to S3, and serializes Rows to LibSVM for inference, by default selecting the column named "features", which is expected to contain a Vector of Doubles.

    Inferences made against an Endpoint hosting an XGBoost model contain a "prediction" field appended to the input DataFrame as a column of Doubles, containing the prediction corresponding to the given Vector of features.

    See also

    https://github.com/dmlc/xgboost for more on XGBoost.
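    A sketch of the XGBoost flow; the hyperparameter setters shown (setObjective, setNumRound) follow the SDK's XGBoost parameters, and the role ARN, objective, round count, and instance settings are illustrative assumptions:

    ```scala
    import com.amazonaws.services.sagemaker.sparksdk.IAMRole
    import com.amazonaws.services.sagemaker.sparksdk.algorithms.XGBoostSageMakerEstimator
    import org.apache.spark.sql.DataFrame

    val roleArn = "arn:aws:iam::123456789012:role/SageMakerRole"
    // "label" (Double) and "features" (Vector) columns; fit() writes the
    // DataFrame to S3 in LibSVM format rather than RecordIO protobuf.
    val trainingData: DataFrame = ???

    val xgboost = new XGBoostSageMakerEstimator(
      sagemakerRole = IAMRole(roleArn),
      trainingInstanceType = "ml.m4.xlarge",
      trainingInstanceCount = 1,
      endpointInstanceType = "ml.m4.xlarge",
      endpointInitialInstanceCount = 1)
      .setObjective("binary:logistic")
      .setNumRound(25)

    val model = xgboost.fit(trainingData)
    // transform() appends a "prediction" column of Doubles.
    val scored = model.transform(trainingData)
    ```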

Value Members

  1. object KMeansSageMakerEstimator extends Serializable
  2. object LinearLearnerSageMakerEstimator extends Serializable
  3. object PCASageMakerEstimator extends Serializable
  4. object XGBoostSageMakerEstimator extends Serializable
