
Video Input Processing

Amazon Chime SDK for JavaScript contains easy-to-use APIs for adding frame-by-frame processing to an outgoing video stream.

Amazon Chime SDK for JavaScript defines a video processing stage as an implementation of the VideoFrameProcessor interface, which takes an array of VideoFrameBuffers, applies builder-defined processing, and outputs an array of VideoFrameBuffers. The outputs of each processor can be linked to the inputs of the next processor, with the last processor in the chain required to implement asCanvasImageSource to return CanvasImageSource so that the resulting frames can be rendered onto a HTMLCanvasElement and transformed into a MediaStream.
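For orientation, an abbreviated sketch of the interface shape (the process and destroy methods used by the processor examples later in this guide):

interface VideoFrameProcessor {
  // Apply this processing stage to the input buffers and return the output buffers
  process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]>;
  // Release any resources held by the processor
  destroy(): Promise<void>;
}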

To integrate video processing into a meeting session, use a VideoTransformDevice, which internally uses a VideoFrameProcessorPipeline to perform the aforementioned linking of stages and final canvas rendering.

A typical workflow would be:

  1. Create an array of custom VideoFrameProcessors.
  2. Create a VideoTransformDevice from a Device and the array of VideoFrameProcessors.
  3. Call meetingSession.audioVideo.startVideoInput with the VideoTransformDevice.

The video processing APIs in Amazon Chime SDK for JavaScript work in Firefox, Chrome, and Chromium-based browsers (including Electron) on desktop, Android, and iOS operating systems. A full compatibility table is below.

Browser                                                                 Minimum supported version
Firefox                                                                 76
Chromium-based browsers and environments, including Edge and Electron  78
Android Chrome                                                          78
Safari on macOS                                                         13.0
iOS Safari                                                              16
iOS Chrome                                                              16
iOS Firefox (except on iPad)                                            16

Note that there is a known issue with VideoFrameProcessor in Safari 15 (see GitHub issue 1059); it has been fixed in Safari 16.

VideoTransformDevice allows VideoFrameProcessors to be applied to a Device, providing a new object which can be passed into meetingSession.audioVideo.startVideoInput.

DefaultVideoTransformDevice is the provided implementation of VideoTransformDevice. It takes the aforementioned Device and array of VideoFrameProcessors, then uses VideoFrameProcessorPipeline under the hood and hides its complexity.

The construction of the DefaultVideoTransformDevice will not start the camera or start processing. The method meetingSession.audioVideo.startVideoInput should be called just like for normal devices. The device controller will use the inner Device to acquire the source MediaStream and start the processing pipeline at the same frame rate. "Inner device" in this context refers to the original video stream coming from the selected camera.

The parameters to chooseVideoInputQuality are used as constraints on the source MediaStream. After the video input is chosen, meetingSession.audioVideo.startLocalVideoTile can be called to start streaming video.

import {
  DefaultVideoTransformDevice
} from 'amazon-chime-sdk-js';

const stages = [new VideoResizeProcessor(4/3)]; // constructs the processor

const transformDevice = new DefaultVideoTransformDevice(
  logger,
  'foo', // device id string
  stages
);

await meetingSession.audioVideo.startVideoInput(transformDevice);
meetingSession.audioVideo.startLocalVideoTile();
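The source resolution and frame rate can be constrained before the input is started; a minimal sketch (in recent SDK versions chooseVideoInputQuality takes width, height, and frame rate; older versions also accepted a maximum bandwidth parameter):

// Constrain the source camera stream to 720p at 15 fps before calling startVideoInput
meetingSession.audioVideo.chooseVideoInputQuality(1280, 720, 15);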

To switch the inner Device on DefaultVideoTransformDevice, call DefaultVideoTransformDevice.chooseNewInnerDevice with a new Device. DefaultVideoTransformDevice.chooseNewInnerDevice returns a new DefaultVideoTransformDevice but preserves the state of VideoFrameProcessors. Then call meetingSession.audioVideo.startVideoInput with the new transform device.

const newInnerDevice = 'bar';
if (transformDevice.getInnerDevice() !== newInnerDevice) {
  transformDevice = transformDevice.chooseNewInnerDevice(newInnerDevice);
}
await meetingSession.audioVideo.startVideoInput(transformDevice);

To stop video processing for the chosen DefaultVideoTransformDevice, call meetingSession.audioVideo.startVideoInput with a different Device (possibly another DefaultVideoTransformDevice) or call meetingSession.audioVideo.stopVideoInput to stop using the previous DefaultVideoTransformDevice.

After video processing stops, the inner Device will be released by the device controller unless the inner Device is a MediaStream provided by the application, in which case the application is responsible for managing its lifecycle.

After the DefaultVideoTransformDevice is no longer used by the device controller, call DefaultVideoTransformDevice.stop to release the VideoFrameProcessors and the underlying pipeline. After stop is called, discard the DefaultVideoTransformDevice; it is not reusable. Calling DefaultVideoTransformDevice.stop is necessary to release the internal resources.

await meetingSession.audioVideo.stopVideoInput();
transformDevice.stop();

Applications will need to stop and replace DefaultVideoTransformDevice when they want to change video processors or change the video input quality.

To receive notifications of lifecycle events, a DefaultVideoTransformDeviceObserver can be added to the DefaultVideoTransformDevice and handlers added for the following:

Observer                  Description
processingDidStart        Called when video processing starts.
processingDidFailToStart  Called when video processing could not start due to runtime errors. In this case, developers are expected to call startVideoInput again with a valid VideoInputDevice to continue sending video.
processingDidStop         Called when video processing stops as expected.
processingLatencyTooHigh  Called when the execution of processors slows the frame rate down by at least half.
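A minimal sketch of wiring up such an observer (this assumes the addObserver method on DefaultVideoTransformDevice; all handlers are optional):

const observer = {
  processingDidStart: () => console.log('Video processing started'),
  processingDidFailToStart: () => console.log('Video processing failed to start'),
  processingDidStop: () => console.log('Video processing stopped'),
  processingLatencyTooHigh: (latencyMs: number) => console.log(`Processing is slowing the frame rate: ${latencyMs} ms`),
};
transformDevice.addObserver(observer);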

VideoFrameBuffer is an abstract interface that can be implemented to represent images or video sources. Implementations are required to provide asCanvasImageSource to return a CanvasImageSource; optionally, developers can implement asCanvasElement or asTransferable so that processing algorithms can work with HTMLCanvasElements or Workers, respectively.

VideoFrameProcessor represents a processing stage. Internally, processors are executed serially: each pass finishes before the next pass begins. The input VideoFrameBuffers are the video sources. Changing properties of the buffers, such as resizing, will likely modify the video sources themselves and should be done with care.

The following example shows how to build a basic processor to resize the video frames. We first define an implementation of VideoFrameProcessor:

class VideoResizeProcessor implements VideoFrameProcessor {
  constructor(private displayAspectRatio: number) {}

  async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]>;
  async destroy(): Promise<void>;
}

To keep the properties of the original video, the processor has to copy the frame onto its own staging buffer in process:

import { CanvasVideoFrameBuffer, VideoFrameBuffer, VideoFrameProcessor } from 'amazon-chime-sdk-js';

class VideoResizeProcessor implements VideoFrameProcessor {
  private targetCanvas: HTMLCanvasElement = document.createElement('canvas') as HTMLCanvasElement;
  private targetCanvasCtx: CanvasRenderingContext2D = this.targetCanvas.getContext('2d') as CanvasRenderingContext2D;
  private canvasVideoFrameBuffer = new CanvasVideoFrameBuffer(this.targetCanvas);

  private renderWidth: number = 0;
  private renderHeight: number = 0;
  private sourceWidth: number = 0;
  private sourceHeight: number = 0;

  async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]>;
}

During processing, the incoming video frame is painted onto the internal canvas, as in the following abbreviated process implementation:

async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]> {
  const canvas = buffers[0].asCanvasElement();
  const frameWidth = canvas.width;
  const frameHeight = canvas.height;

  // error handling to skip resizing
  if (frameWidth === 0 || frameHeight === 0) {
    return buffers;
  }

  // re-calculate the cropped width and height
  // ...

  // copy the frame to the intermediate canvas
  this.targetCanvasCtx.drawImage(canvas, this.dx, 0, this.renderWidth, this.renderHeight,
    0, 0, this.renderWidth, this.renderHeight);

  // replace the video frame with the resized one for the subsequent processor
  buffers[0] = this.canvasVideoFrameBuffer;
  return buffers;
}
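One possible way to fill in the cropped-dimension calculation (an illustrative sketch, not SDK code; it assumes an additional private dx field on the class and crops horizontally to the target aspect ratio):

private updateCropDimensions(frameWidth: number, frameHeight: number): void {
  this.sourceWidth = frameWidth;
  this.sourceHeight = frameHeight;
  // keep the full height and crop the width down to the target aspect ratio
  this.renderHeight = frameHeight;
  this.renderWidth = Math.min(frameWidth, Math.floor(frameHeight * this.displayAspectRatio));
  // center the crop horizontally
  this.dx = Math.floor((frameWidth - this.renderWidth) / 2);
  // size the staging canvas to the output dimensions
  this.targetCanvas.width = this.renderWidth;
  this.targetCanvas.height = this.renderHeight;
}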

An overlay processor can be implemented as a customized processor that loads an external image. Note that this example accounts for Cross-Origin Resource Sharing (CORS):

class VideoLoadImageProcessor implements VideoFrameProcessor {
  // Create an HTMLCanvasElement to use as the staging buffer
  private targetCanvas: HTMLCanvasElement = document.createElement('canvas') as HTMLCanvasElement;
  private targetCanvasCtx: CanvasRenderingContext2D = this.targetCanvas.getContext('2d') as CanvasRenderingContext2D;
  private canvasVideoFrameBuffer = new CanvasVideoFrameBuffer(this.targetCanvas);

  // Create an HTMLImageElement for the external image
  private image = document.createElement('img') as HTMLImageElement;

  constructor() {
    // Load the image from its source, then render it on the canvas
    this.loadImage('https://someurl.any/page/bg.jpg', this.image).then(image => {
      this.targetCanvasCtx.drawImage(image, 0, 0, image.width, image.height);
    });
  }

  // Load an image from an external source (absolute URL), configuring CORS to make sure
  // the image is successfully loaded and can be drawn onto the canvas
  private loadImage(url: string, elem: HTMLImageElement): Promise<HTMLImageElement> {
    return new Promise((resolve, reject) => {
      elem.onload = (): void => resolve(elem);
      elem.onerror = reject;
      // configure CORS access for the fetch of the new image if it is not hosted on the same server
      elem.crossOrigin = 'anonymous';
      elem.src = url;
    });
  }

  async process(buffers: VideoFrameBuffer[]): Promise<VideoFrameBuffer[]> {
    // copy the frame to the intermediate canvas
    const canvas = buffers[0].asCanvasElement();
    this.targetCanvasCtx.drawImage(canvas, 0, 0);

    // replace the video frame with the composited one for the subsequent processor
    buffers[0] = this.canvasVideoFrameBuffer;
    return buffers;
  }

  async destroy(): Promise<void> {
    // nothing beyond the staging canvas to clean up in this example
  }
}
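Multiple processors can be chained simply by passing them in order; for example, the two example processors above could be combined into one pipeline (a sketch reusing the classes defined in this guide):

const stages = [
  new VideoResizeProcessor(4/3),  // resize the frame first
  new VideoLoadImageProcessor(),  // then composite the frame with the loaded image
];
const transformDevice = new DefaultVideoTransformDevice(logger, 'foo', stages);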

Local video post-processing can be previewed before transmitting to remote clients, just like for a normal device.

import {
  DefaultVideoTransformDevice
} from 'amazon-chime-sdk-js';

const stages = [new VideoResizeProcessor(4/3)]; // constructs the processor
const videoElement = document.getElementById('video-preview') as HTMLVideoElement;
const transformDevice = new DefaultVideoTransformDevice(
  logger,
  'foobar', // device id string
  stages
);

await meetingSession.audioVideo.startVideoInput(transformDevice);
meetingSession.audioVideo.startVideoPreviewForVideoInput(videoElement);
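When the preview is no longer needed, it can be stopped and the transform device released, for example:

meetingSession.audioVideo.stopVideoPreviewForVideoInput(videoElement);
await meetingSession.audioVideo.stopVideoInput();
transformDevice.stop();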

The API ContentShareControllerFacade.startContentShare does not currently support passing in a VideoTransformDevice or similar, but DefaultVideoTransformDevice makes it straightforward to apply transforms to a given MediaStream and output a new MediaStream.

Note that for screen share we use MediaDevices.getDisplayMedia directly rather than the helper function ContentShareControllerFacade.startContentShareFromScreenCapture.

import {
  DefaultVideoTransformDevice
} from 'amazon-chime-sdk-js';

const mediaStream = await navigator.mediaDevices.getDisplayMedia({
  audio: true,
  video: true
});

const stages = [new CircularCut()]; // constructs some custom processor
const transformDevice = new DefaultVideoTransformDevice(
  logger,
  undefined, // not needed when using the transform directly
  stages
);

await meetingSession.audioVideo.startContentShare(await transformDevice.transformStream(mediaStream));

// On completion
transformDevice.stop();

The MediaStream can also be from a file input or other source.
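For example, a video file playing in an HTMLVideoElement could be captured and pushed through the same transform (a sketch; the element id is hypothetical, and HTMLMediaElement.captureStream is not available in every browser or TypeScript DOM typing, hence the cast):

const videoFile = document.getElementById('content-share-video') as HTMLVideoElement;
await videoFile.play();
const fileStream: MediaStream = (videoFile as any).captureStream();
await meetingSession.audioVideo.startContentShare(await transformDevice.transformStream(fileStream));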
