Technical Specifications



The two apps are packaged Python scripts. They use a modified version of ibaiGorordo's pyKinectAzure Python module, a wrapper for Microsoft's Azure Kinect Sensor SDK and Body Tracking SDK.

Other dependencies:

  • numpy - a library for working with arrays. Because the data streams are video-based, each frame is represented as an array of pixel values.

  • opencv & open3d - for previewing

  • pye57 - for extracting depth/colour data streams as point cloud data

The apps deliberately split the workflow into separate stages, as there is no perceived need for real-time data processing: it is assumed that device data streams can be recorded first and processed later.


The recording app loops between two states:

  • Preview - Body tracking is available as a preview

  • Recording - Records the data stream only, with previews disabled to minimise overhead.
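
The two-state loop can be pictured as a simple toggle. The following is a minimal sketch only, not the app's actual code: the names AppState, next_state and preview_enabled are hypothetical, and it assumes the switch is triggered by a single user action such as a key press.

```python
from enum import Enum


class AppState(Enum):
    """Hypothetical names for the two states described above."""
    PREVIEW = "preview"      # body tracking is shown as a live preview
    RECORDING = "recording"  # data stream is recorded, previews are off


def next_state(state: AppState) -> AppState:
    """Toggle between the two states (assumed trigger: one user action)."""
    if state is AppState.PREVIEW:
        return AppState.RECORDING
    return AppState.PREVIEW


def preview_enabled(state: AppState) -> bool:
    """Previews run only outside of recording, to keep overhead low."""
    return state is AppState.PREVIEW
```

Keeping the preview decision in one place means the recording path never touches the preview code, which matches the aim of minimising overhead while recording.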


The extraction app takes in a video file and processes it frame by frame.

Temporal smoothing can be configured, and body tracking does not start until a user-specified frame.

Body Tracking Output

Using the Kinect module, the extraction app extracts body tracking data per frame and writes it to a CSV file, labelled per frame and per joint. Each joint has a position and a rotation, and is indexed consistently with Microsoft's Body Joints Documentation.
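
To illustrate the "one row per frame and per joint" layout, here is a minimal sketch using Python's standard csv module. The column names, the write_joint_rows helper and the sample values are assumptions made for illustration; the app's real header and joint labels may differ.

```python
import csv
import io

# Hypothetical column layout; the app's actual header may differ.
COLUMNS = ["frame", "joint", "pos_x", "pos_y", "pos_z", "rot_x", "rot_y", "rot_z"]


def write_joint_rows(stream, frames):
    """Write one CSV row per (frame, joint).

    `frames` maps a frame number to a list of (position, rotation) tuples,
    ordered by joint index so that the index is consistent across frames.
    """
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(COLUMNS)
    for frame_number in sorted(frames):
        for joint_index, (position, rotation) in enumerate(frames[frame_number]):
            writer.writerow([frame_number, joint_index, *position, *rotation])


# Made-up values for two joints in a single frame.
example = {
    12: [
        ((0.0, 0.1, 1.5), (0.0, 0.0, 90.0)),
        ((0.0, -0.1, 1.5), (0.0, 0.0, 90.0)),
    ]
}
buffer = io.StringIO()
write_joint_rows(buffer, example)
csv_text = buffer.getvalue()
```

A flat table like this is straightforward to filter by frame or by joint in downstream tools.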

By default, the Kinect module returns body tracking joint orientations as wxyz quaternions; these are converted to Euler rotations in the order xyz. This anticipates Rhino/Grasshopper and Unreal Engine workflows, where pre-converting the rotations seemed sensible.
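
As a rough illustration of the quaternion-to-Euler step, the sketch below uses only the standard library. It is not the app's implementation: the function name is hypothetical, and it assumes "xyz" means rotations applied about the fixed x, then y, then z axes (the common roll, pitch, yaw convention), with angles returned in degrees. Other tools may interpret "xyz" differently, so the result should be checked against the target software.

```python
import math


def quaternion_wxyz_to_euler_xyz(w, x, y, z):
    """Convert a wxyz quaternion to Euler angles (x, y, z) in degrees.

    Assumed convention: rotate about the fixed x axis, then y, then z.
    """
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    # Normalise so the formulas below hold for slightly non-unit input.
    w, x, y, z = w / norm, x / norm, y / norm, z / norm

    rot_x = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to [-1, 1] to guard asin against floating-point overshoot.
    sin_y = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    rot_y = math.asin(sin_y)
    rot_z = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return tuple(math.degrees(angle) for angle in (rot_x, rot_y, rot_z))
```

Note that Euler angles become ambiguous near a y rotation of plus or minus 90 degrees (gimbal lock), which is one reason the original quaternions can be worth keeping alongside the converted values.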

The Kinect module was also altered to return this information with a [save] function in the body and joint modules.

Point Cloud Output

Using a mix of the Kinect module's capture outputs and point cloud outputs, the app extracts point cloud data per frame and stores the frame number in the intensity scalar field commonly found in e57 files. This way, the point cloud remains a single file for ease of storage and organisation. As the data is processed from video frames (i.e. images), non-existent pixels can occur; they appear as points at [0,0,0] with no colour [0,0,0], and are culled.
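
The culling and frame-tagging steps can be sketched with numpy as follows. This is an illustration under stated assumptions rather than the app's code: the frame_to_points helper is hypothetical, a point is treated as non-existent only when both its position and its colour are all zero, and the dictionary keys follow the E57 field naming (cartesianX, intensity, colorRed and so on) on the assumption that this is what the writer expects.

```python
import numpy as np


def frame_to_points(points, colours, frame_number):
    """Cull empty pixels and tag the remaining points with their frame number.

    points:  (N, 3) array of xyz positions
    colours: (N, 3) array of rgb values in the range 0-255
    """
    points = np.asarray(points, dtype=np.float64)
    colours = np.asarray(colours, dtype=np.uint8)

    # Assumed rule: a pixel is empty when position AND colour are all zero.
    empty = np.all(points == 0, axis=1) & np.all(colours == 0, axis=1)
    kept_points = points[~empty]
    kept_colours = colours[~empty]

    return {
        "cartesianX": kept_points[:, 0],
        "cartesianY": kept_points[:, 1],
        "cartesianZ": kept_points[:, 2],
        # The frame number is stored in the intensity scalar field.
        "intensity": np.full(len(kept_points), frame_number, dtype=np.float64),
        "colorRed": kept_colours[:, 0],
        "colorGreen": kept_colours[:, 1],
        "colorBlue": kept_colours[:, 2],
    }


# Made-up frame: the first point is empty and gets culled.
cloud = frame_to_points(
    [[0, 0, 0], [1, 2, 3], [0, 0, 0]],
    [[0, 0, 0], [10, 20, 30], [5, 5, 5]],
    frame_number=7,
)
```

Per-frame results of this shape can be joined field by field with np.concatenate to keep the whole recording in one point cloud, and the intensity value then lets a viewer filter or colour the cloud by frame.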
