# Video Input Processing

## Introduction

Amazon Chime SDK for JavaScript contains easy-to-use APIs for adding frame-by-frame processing to an outgoing video stream.
Amazon Chime SDK for JavaScript defines a video processing stage as an implementation of the VideoFrameProcessor interface, which takes an array of VideoFrameBuffers, applies builder-defined processing, and outputs an array of VideoFrameBuffers. The outputs of each processor can be linked to the inputs of the next processor, with the last processor in the chain required to implement asCanvasImageSource to return CanvasImageSource so that the resulting frames can be rendered onto a HTMLCanvasElement and transformed into a MediaStream.
To integrate video processing into a meeting session, VideoTransformDevice should be used, which internally uses a VideoFrameProcessorPipeline to complete the aforementioned linking of stages and final canvas rendering.
A typical workflow would be:
1. Create an array of custom VideoFrameProcessors.
2. Create a VideoTransformDevice from a Device and the array of VideoFrameProcessors.
3. Call meetingSession.audioVideo.startVideoInput with the VideoTransformDevice, as sketched below.
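For orientation, a minimal sketch of this workflow is shown below; MyVideoProcessor, logger, cameraDeviceId, and meetingSession are placeholders for your own processor implementation, logger, camera selection, and meeting session.

```typescript
import { DefaultVideoTransformDevice } from 'amazon-chime-sdk-js';

// 'MyVideoProcessor' stands in for any VideoFrameProcessor implementation you provide.
const processors = [new MyVideoProcessor()];

// Wrap the chosen camera (a device ID, MediaTrackConstraints, or MediaStream)
// together with the processors.
const transformDevice = new DefaultVideoTransformDevice(logger, cameraDeviceId, processors);

// Use the transform device exactly like a regular video input device.
await meetingSession.audioVideo.startVideoInput(transformDevice);
```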
## Browser compatibility

The APIs for video processing in Amazon Chime SDK for JavaScript work in Firefox, Chrome, and Chromium-based browsers (including Electron) on desktop, Android, and iOS operating systems. A full compatibility table is below.
| Browser | Minimum supported version |
| --- | --- |
| Firefox | 76 |
| Chromium-based browsers and environments, including Edge and Electron | 78 |
| Android Chrome | 78 |
| Safari on MacOS | 13.0 |
| iOS Safari | 16 |
| iOS Chrome | 16 |
| iOS Firefox (Except on iPad) | 16 |
Note that there is a known issue with VideoFrameProcessor in Safari 15: see GitHub issue 1059. This has been fixed in Safari 16.
## Video Processing APIs

### VideoTransformDevice

VideoTransformDevice allows VideoFrameProcessors to be applied to a Device and provides a new object which can be passed into meetingSession.audioVideo.startVideoInput.
DefaultVideoTransformDevice is the provided implementation of VideoTransformDevice. It takes the aforementioned Device and array of VideoFrameProcessors, then uses VideoFrameProcessorPipeline under the hood and hides its complexity.
#### Construction and Starting Video Processing

The construction of the DefaultVideoTransformDevice will not start the camera or start processing. The method meetingSession.audioVideo.startVideoInput should be called just like for normal devices. The device controller will use the inner Device to acquire the source MediaStream and start the processing pipeline at the same frame rate. "Inner device" in this context refers to the original video stream coming from the selected camera.
The parameters to chooseVideoInputQuality are used as constraints on the source MediaStream. After the video input is chosen, meetingSession.audioVideo.startLocalVideoTile can be called to start streaming video.
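As an illustration, starting processed local video might look like the following sketch; the 1280x720 at 15 fps quality values are arbitrary, and transformDevice and meetingSession are assumed from the earlier workflow.

```typescript
// Constrain the source MediaStream that the inner device will produce.
meetingSession.audioVideo.chooseVideoInputQuality(1280, 720, 15);

// Select the transform device; this acquires the camera and starts the pipeline.
await meetingSession.audioVideo.startVideoInput(transformDevice);

// Begin streaming the processed video to the meeting.
meetingSession.audioVideo.startLocalVideoTile();
```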
#### Switching the Inner Device on VideoTransformDevice

To switch the inner Device on DefaultVideoTransformDevice, call DefaultVideoTransformDevice.chooseNewInnerDevice with a new Device.
DefaultVideoTransformDevice.chooseNewInnerDevice returns a new DefaultVideoTransformDevice but preserves the state of VideoFrameProcessors. Then call meetingSession.audioVideo.startVideoInput with the new transform device.
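A sketch of switching cameras while keeping the same processors; transformDevice is assumed to be an existing DefaultVideoTransformDevice and newCameraDeviceId another camera's device ID.

```typescript
// Create a new transform device that reuses the existing processors.
const newTransformDevice = transformDevice.chooseNewInnerDevice(newCameraDeviceId);

// Switch the meeting's video input to the new device.
await meetingSession.audioVideo.startVideoInput(newTransformDevice);
```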
#### Stopping VideoTransformDevice

To stop video processing for the chosen DefaultVideoTransformDevice, call meetingSession.audioVideo.startVideoInput with a different Device (possibly another DefaultVideoTransformDevice), or call meetingSession.audioVideo.stopVideoInput to stop using the previous DefaultVideoTransformDevice.
After stopping the video processing, the inner Device will be released by the device controller unless the inner Device is a MediaStream provided by the user, in which case it is the user's responsibility to manage its lifecycle.
After the DefaultVideoTransformDevice is no longer used by the device controller, call DefaultVideoTransformDevice.stop to release the VideoFrameProcessors and the underlying pipeline. After stop is called, the DefaultVideoTransformDevice must be discarded, as it is not reusable; calling DefaultVideoTransformDevice.stop is necessary to release the internal resources. Applications will need to stop and replace the DefaultVideoTransformDevice when they want to change video processors or change the video input quality.
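For example, stopping processed video and releasing the pipeline might look like this sketch, assuming transformDevice and meetingSession from the earlier examples:

```typescript
// Stop using the transform device as the meeting's video input.
await meetingSession.audioVideo.stopVideoInput();

// Release the processors and the underlying pipeline; the device cannot be reused afterwards.
await transformDevice.stop();
```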
#### Receiving lifecycle notifications with an observer

To receive notifications of lifecycle events, a DefaultVideoTransformDeviceObserver can be added to the DefaultVideoTransformDevice and handlers added for the following:
| Observer | Description |
| --- | --- |
| processingDidStart | Called when video processing starts. |
| processingDidFailToStart | Called when video processing could not start due to runtime errors. In this case, developers are expected to call startVideoInput again with a valid VideoInputDevice to continue video sending. |
| processingDidStop | Called when video processing is stopped expectedly. |
| processingLatencyTooHigh | Called when the execution of processors slows the frame rate down by at least half. |
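A minimal observer sketch follows; the console logging and the fallback comment are illustrative rather than required behavior.

```typescript
import { DefaultVideoTransformDeviceObserver } from 'amazon-chime-sdk-js';

const observer: DefaultVideoTransformDeviceObserver = {
  processingDidStart: () => {
    console.log('Video processing started');
  },
  processingDidFailToStart: () => {
    console.log('Video processing failed to start');
    // Fall back to an unprocessed device here so video sending can continue,
    // e.g. by calling startVideoInput again with a plain camera device.
  },
  processingDidStop: () => {
    console.log('Video processing stopped');
  },
  processingLatencyTooHigh: (latencyMs: number) => {
    console.log(`Processors are slowing the frame rate down; latency ${latencyMs} ms`);
  },
};

transformDevice.addObserver(observer);
```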
### VideoFrameBuffer

VideoFrameBuffer is an abstract interface that can be implemented to represent images or video sources. It is required to implement asCanvasImageSource to return CanvasImageSource; optionally, developers can implement asCanvasElement or asTransferable to allow processing algorithms to work with HTMLCanvasElements or Workers, respectively.
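For example, the SDK-provided CanvasVideoFrameBuffer wraps an HTMLCanvasElement as a VideoFrameBuffer; a short sketch (the dimensions are illustrative):

```typescript
import { CanvasVideoFrameBuffer } from 'amazon-chime-sdk-js';

// Wrap a canvas so it can flow through the processing pipeline as a VideoFrameBuffer.
const canvas = document.createElement('canvas');
canvas.width = 1280;
canvas.height = 720;
const buffer = new CanvasVideoFrameBuffer(canvas);

// Later stages can retrieve the underlying canvas to read or draw pixels.
const source = buffer.asCanvasElement();
```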
### VideoFrameProcessor

VideoFrameProcessor represents a processing stage. Internally, processors are executed in a completely serial manner; each pass finishes before the next pass begins. The input VideoFrameBuffers are the video sources. Changing properties of the buffers, such as resizing them, will likely modify the underlying video sources and should be done with care.

#### Building a simple processor

The following example shows how to build a basic processor that resizes the video frames. To keep the properties of the original video, the processor copies each incoming frame onto its own staging buffer inside process, painting the incoming video onto an internal canvas before passing it on.
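A sketch of such a resize processor is below; the class name VideoResizeProcessor and the target width are illustrative, and error handling is kept minimal.

```typescript
import {
  CanvasVideoFrameBuffer,
  VideoFrameBuffer,
  VideoFrameProcessor,
} from 'amazon-chime-sdk-js';

// Illustrative processor that scales each incoming frame to a fixed width
// while preserving the source aspect ratio.
class VideoResizeProcessor implements VideoFrameProcessor {
  // Staging canvas that holds the resized copy of each frame.
  private targetCanvas: HTMLCanvasElement = document.createElement('canvas') as HTMLCanvasElement;
  private targetCanvasCtx: CanvasRenderingContext2D =
    this.targetCanvas.getContext('2d') as CanvasRenderingContext2D;
  private canvasVideoFrameBuffer = new CanvasVideoFrameBuffer(this.targetCanvas);

  constructor(private targetWidth: number) {}

  async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]> {
    const canvas = buffers[0].asCanvasElement();
    if (!canvas) {
      return buffers;
    }
    // Size the staging canvas from the source frame, preserving aspect ratio.
    const scale = this.targetWidth / canvas.width;
    this.targetCanvas.width = this.targetWidth;
    this.targetCanvas.height = Math.round(canvas.height * scale);
    // Copy (and scale) the incoming frame onto the staging canvas so the
    // original source buffer is left untouched.
    this.targetCanvasCtx.drawImage(
      canvas,
      0, 0, canvas.width, canvas.height,
      0, 0, this.targetCanvas.width, this.targetCanvas.height
    );
    // Hand the staging buffer to the next stage instead of the source frame.
    buffers[0] = this.canvasVideoFrameBuffer;
    return buffers;
  }

  async destroy(): Promise<void> {
    this.canvasVideoFrameBuffer.destroy();
  }
}
```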
#### Building an overlay processor

An overlay processor can be a customized processor for loading an external image. Note that this example accounts for the usage of Cross-Origin Resource Sharing (CORS):
```typescript
import {
  CanvasVideoFrameBuffer,
  VideoFrameBuffer,
  VideoFrameProcessor,
} from 'amazon-chime-sdk-js';

// Load an image from an external source (absolute URL) and configure CORS so the
// fetch succeeds even when the image is not hosted on the same server.
async function loadImage(url: string, elem: HTMLImageElement): Promise<HTMLImageElement> {
  return new Promise((resolve, reject) => {
    elem.onload = (): void => resolve(elem);
    elem.onerror = reject;
    elem.crossOrigin = 'anonymous';
    elem.src = url;
  });
}

class VideoLoadImageProcessor implements VideoFrameProcessor {
  // Intermediate canvas that subsequent processors and the pipeline will consume.
  private targetCanvas: HTMLCanvasElement = document.createElement('canvas') as HTMLCanvasElement;
  private targetCanvasCtx: CanvasRenderingContext2D = this.targetCanvas.getContext('2d') as CanvasRenderingContext2D;
  private canvasVideoFrameBuffer = new CanvasVideoFrameBuffer(this.targetCanvas);

  // The external image to render on top of the video frame.
  private image: HTMLImageElement = document.createElement('img') as HTMLImageElement;
  private imageLoaded = loadImage('https://someurl.any/page/bg.jpg', this.image);

  async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]> {
    const canvas = buffers[0].asCanvasElement();
    if (!canvas) {
      return buffers;
    }
    this.targetCanvas.width = canvas.width;
    this.targetCanvas.height = canvas.height;
    // Copy the frame to the intermediate canvas...
    this.targetCanvasCtx.drawImage(canvas, 0, 0);
    // ...then render the external image on top of it once it has loaded.
    await this.imageLoaded;
    this.targetCanvasCtx.drawImage(this.image, 0, 0, this.image.width, this.image.height);
    // Replace the video frame with the composited one for subsequent processors.
    buffers[0] = this.canvasVideoFrameBuffer;
    return buffers;
  }

  async destroy(): Promise<void> {
    this.canvasVideoFrameBuffer.destroy();
  }
}
```
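## Additional Video Processing Use-Cases

### Custom processor usage during meeting preview

Local video with post-processing applied can be previewed before transmitting to remote clients, just as for a normal device.

A sketch of previewing processed local video is below, assuming videoElement is an HTMLVideoElement on the page and transformDevice is the DefaultVideoTransformDevice from the earlier examples.

```typescript
// Select the transform device so the processing pipeline runs on the local camera.
await meetingSession.audioVideo.startVideoInput(transformDevice);

// Render the processed stream locally without sending it to the meeting.
meetingSession.audioVideo.startVideoPreviewForVideoInput(videoElement);

// Later, stop the local preview.
meetingSession.audioVideo.stopVideoPreviewForVideoInput(videoElement);
```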
### Custom video processor usage for content share

The API ContentShareControllerFacade.startContentShare does not currently support passing in a VideoTransformDevice or similar, but the DefaultVideoTransformDevice makes it straightforward to apply transforms to a given MediaStream and output a new MediaStream.
Note that for screen share usage we use MediaDevices.getDisplayMedia directly rather than the helper function ContentShareControllerFacade.startContentShareFromScreenCapture.
```typescript
import { DefaultVideoTransformDevice } from 'amazon-chime-sdk-js';

const mediaStream = await navigator.mediaDevices.getDisplayMedia({ audio: true, video: true });

const stages = [new CircularCut()]; // constructs some custom processor
const transformDevice = new DefaultVideoTransformDevice(
  logger,
  undefined, // Not needed when using transform directly
  stages
);

await meetingSession.audioVideo.startContentShare(await transformDevice.transformStream(mediaStream));

// On completion
transformDevice.stop();
```
The MediaStream can also be from a file input or other source.