Optional Readonly initial: Initial number of instances for the LLM endpoint.
Use cases: Initial capacity planning; Baseline availability; Deployment sizing
AWS: SageMaker endpoint initial instance count
Validation: Optional; Positive integer; Should be between min and max counts
Optional Readonly instance: SageMaker instance type for LLM hosting.
Use cases: Compute resource sizing; Performance optimization; Cost management
AWS: Amazon SageMaker endpoint instance type
Validation: Optional; Must be valid SageMaker instance type (e.g., ml.g5.2xlarge)
Optional Readonly maximum: Maximum instance count for LLM endpoint auto-scaling.
Use cases: Peak capacity control; Cost limits; Auto-scaling upper bound
AWS: SageMaker endpoint auto-scaling maximum
Validation: Optional; Positive integer; Must be >= initial and min counts
Optional Readonly minimum: Minimum instance count for LLM endpoint auto-scaling.
Use cases: Cost optimization; Minimum availability guarantee; Auto-scaling lower bound
AWS: SageMaker endpoint auto-scaling minimum
Validation: Optional; Positive integer; Must be <= initial and max counts
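The count constraints above (positive integers with minimum <= initial <= maximum) can be sketched as a small validation helper. This is an illustrative example, not the library's actual validation code; the `ScalingCounts` shape and `validateScalingCounts` name are assumptions.

```typescript
// Illustrative shape for the three optional count fields described above.
interface ScalingCounts {
  minimum?: number;
  initial?: number;
  maximum?: number;
}

// Returns true when every provided count is a positive integer and the
// ordering minimum <= initial <= maximum holds for the fields present.
function validateScalingCounts(c: ScalingCounts): boolean {
  const { minimum, initial, maximum } = c;
  const ok = (n?: number) => n === undefined || (Number.isInteger(n) && n > 0);
  if (![minimum, initial, maximum].every(ok)) return false;
  if (minimum !== undefined && initial !== undefined && minimum > initial) return false;
  if (initial !== undefined && maximum !== undefined && initial > maximum) return false;
  if (minimum !== undefined && maximum !== undefined && minimum > maximum) return false;
  return true;
}
```

Because every count is optional, the helper only enforces the ordering between the fields that are actually supplied.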
Readonly model: SageMaker LLM model to deploy for conversational AI.
Use cases: LLM model selection; Chatbot backend model; Text generation endpoint
AWS: Amazon SageMaker hosted LLM model
Validation: Required; Must be valid SupportedSageMakerModels enum value (FalconLite, Llama2_13b_Chat, Mistral7b_Instruct2)
SageMaker-hosted LLM model configuration with auto-scaling for GAIA chatbot backends. Supports Falcon, Mistral, and Llama2 models with configurable instance types and scaling.
Use cases: SageMaker LLM deployment; Auto-scaling chatbot backends; Custom instance sizing; Production LLM hosting
AWS: Amazon SageMaker real-time inference endpoints with auto-scaling
Validation: Required model field; instance counts must satisfy min <= initial <= max when provided
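A configuration satisfying the rules above might look like the following sketch. The enum values come from this documentation, but the interface and full field names (`instanceType`, `initialInstanceCount`, and so on) are illustrative assumptions rather than the construct's exact API.

```typescript
// Enum values as listed in the documentation above.
enum SupportedSageMakerModels {
  FalconLite = "FalconLite",
  Llama2_13b_Chat = "Llama2_13b_Chat",
  Mistral7b_Instruct2 = "Mistral7b_Instruct2",
}

// Hypothetical configuration shape: model is required, everything
// else is optional, mirroring the property list above.
interface SageMakerLlmConfig {
  readonly model: SupportedSageMakerModels;
  readonly instanceType?: string;          // e.g. "ml.g5.2xlarge"
  readonly initialInstanceCount?: number;  // min <= initial <= max
  readonly minimumInstanceCount?: number;
  readonly maximumInstanceCount?: number;
}

// Example: Mistral on a single g5.2xlarge, scaling up to 4 instances.
const config: SageMakerLlmConfig = {
  model: SupportedSageMakerModels.Mistral7b_Instruct2,
  instanceType: "ml.g5.2xlarge",
  minimumInstanceCount: 1,
  initialInstanceCount: 1,
  maximumInstanceCount: 4,
};
```

Leaving the count fields out falls back to whatever defaults the construct applies; when they are set, they must satisfy the min <= initial <= max ordering stated in the validation rule.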