Archives For May 2012

After a long journey, my team at Microsoft shipped the Face Tracking SDK as part of Kinect for Windows 1.5! I worked on the 3D face tracking technology (starting from the time when it was part of Avatar Kinect), so I’d like to describe its capabilities and limitations in this post. First of all, here is the demo:

 


You can use the Face Tracking SDK in your program if you install the Kinect for Windows Developer Toolkit 1.5. After you install it, go to the provided samples and build/run the “Face Tracking Visualization” C++ sample or the “Face Tracking Basics-WPF” C# sample. Of course, you need to have a Kinect camera attached to your PC 😉 The face tracking engine tracks at a speed of 4-8 ms per frame, depending on how powerful your PC is. It does its computations on the CPU only (it does not use the GPU, since the GPU may be needed to render graphics).

If you look at the two code samples mentioned above, you can see that it is relatively easy to add face tracking capabilities to your application. You need to link with the provided lib, place two DLLs in the global path or in the working directory of your executable (so they can be found), and add something like this to your code (this is C++; you can also do it in C#, see the code samples):

// Include main Kinect SDK .h file
#include "NuiAPI.h"

// Include the face tracking SDK .h file
#include "FaceTrackLib.h"

// Create an instance of a face tracker
IFTFaceTracker* pFT = FTCreateFaceTracker();
if(!pFT)
{
    // Handle errors
}

// Initialize cameras configuration structures.
// IMPORTANT NOTE: resolutions and focal lengths must be accurate, since they affect tracking precision!
// It is better to use enums defined in NuiAPI.h

// Video camera config with width, height, focal length in pixels
// NUI_CAMERA_COLOR_NOMINAL_FOCAL_LENGTH_IN_PIXELS focal length is computed for 640x480 resolution
// If you use different resolutions, multiply this focal length by the scaling factor
FT_CAMERA_CONFIG videoCameraConfig = {640, 480, NUI_CAMERA_COLOR_NOMINAL_FOCAL_LENGTH_IN_PIXELS};

// Depth camera config with width, height, focal length in pixels
// NUI_CAMERA_DEPTH_NOMINAL_FOCAL_LENGTH_IN_PIXELS focal length is computed for 320x240 resolution
// If you use different resolutions, multiply this focal length by the scaling factor
FT_CAMERA_CONFIG depthCameraConfig = {320, 240, NUI_CAMERA_DEPTH_NOMINAL_FOCAL_LENGTH_IN_PIXELS};

// Initialize the face tracker
HRESULT hr = pFT->Initialize(&videoCameraConfig, &depthCameraConfig, NULL, NULL);
if( FAILED(hr) )
{
    // Handle errors
}

// Create a face tracking result interface
IFTResult* pFTResult = NULL;
hr = pFT->CreateFTResult(&pFTResult);
if(FAILED(hr))
{
    // Handle errors
}

// Prepare image interfaces that hold RGB and depth data
IFTImage* pColorFrame = FTCreateImage();
IFTImage* pDepthFrame = FTCreateImage();
if(!pColorFrame || !pDepthFrame)
{
    // Handle errors
}

// Attach created interfaces to the RGB and depth buffers that are filled with
// corresponding RGB and depth frame data from Kinect cameras
pColorFrame->Attach(640, 480, colorCameraFrameBuffer, FTIMAGEFORMAT_UINT8_R8G8B8, 640*3);
pDepthFrame->Attach(320, 240, depthCameraFrameBuffer, FTIMAGEFORMAT_UINT16_D13P3, 320*2);
// You can also use Allocate() method in which case IFTImage interfaces own their memory.
// In this case use CopyTo() method to copy buffers
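// For example, a minimal sketch (not from the original sample) of the alternative where the
// image owns its memory and each new Kinect frame is copied into it:
// hr = pColorFrame->Allocate(640, 480, FTIMAGEFORMAT_UINT8_R8G8B8);
// if(SUCCEEDED(hr))
// {
//     memcpy(pColorFrame->GetBuffer(), colorCameraFrameBuffer, pColorFrame->GetBufferSize());
// }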

FT_SENSOR_DATA sensorData;
sensorData.pVideoFrame = pColorFrame;
sensorData.pDepthFrame = pDepthFrame;
sensorData.ZoomFactor = 1.0f;       // Not used, must be 1.0
sensorData.ViewOffset.x = 0;        // Not used, must be (0,0)
sensorData.ViewOffset.y = 0;

bool isFaceTracked = false;

// Track a face
while ( true )
{
    // Call Kinect API to fill colorCameraFrameBuffer and depthCameraFrameBuffer with RGB and depth data
    ProcessKinectIO();

    // Check if we are already tracking a face
    if(!isFaceTracked)
    {
        // Initiate face tracking.
        // This call is more expensive and searches the input frame for a face.
        hr = pFT->StartTracking(&sensorData, NULL, NULL, pFTResult);
        if(SUCCEEDED(hr) && SUCCEEDED(pFTResult->GetStatus()))
        {
            isFaceTracked = true;
        }
        else
        {
            // No faces found
            isFaceTracked = false;
        }
    }
    else
    {
        // Continue tracking. It uses a previously known face position.
        // This call is less expensive than StartTracking()
        hr = pFT->ContinueTracking(&sensorData, NULL, pFTResult);
        if(FAILED(hr) || FAILED(pFTResult->GetStatus()))
        {
            // Lost the face
            isFaceTracked = false;
        }
    }

    // Do something with pFTResult like visualize the mask, drive your 3D avatar,
    // recognize facial expressions
}

// Clean up
pFTResult->Release();
pColorFrame->Release();
pDepthFrame->Release();
pFT->Release();

The code calls the face tracker by using either the StartTracking() or the ContinueTracking() function. StartTracking() is the more expensive function, since it searches for a face in the passed RGB frame. ContinueTracking() uses the previous face location to resume tracking. StartTracking() is more stable when you have big breaks between frames, since it is stateless.

There are two modes in which the face tracker operates – with skeleton-based information and without. In the first mode, you pass an array with two head points to the StartTracking/ContinueTracking methods. These head points are the two ends of the head bone contained in the NUI_SKELETON_DATA structure returned by the Kinect API; the head bone is indexed by the NUI_SKELETON_POSITION_HEAD member of the NUI_SKELETON_POSITION_INDEX enumeration. The first head point is the neck position and the second head point is the head position. These points allow the face tracker to find a face faster and more easily, so this mode is cheaper in terms of computing resources (and sometimes more reliable at big head rotations). The second mode only requires a color frame + depth frame to be passed, with an optional region-of-interest parameter that tells the face tracker where on the RGB frame to search for a user’s face. If the region of interest is not passed (passed as NULL), the face tracker will try to find a face in the full RGB frame, which is the slowest mode of operation of StartTracking(). ContinueTracking() will use a previously found face and so is much faster.
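Here is a minimal sketch of the skeleton-hint mode (not from the original samples). It assumes skeletonFrame has already been filled by the Kinect skeleton API and trackedSkeletonIndex is a player index you selected; since the Kinect v1 skeleton has no explicit neck joint, the shoulder-center joint is used here as the neck end of the head bone:

// Sketch: build the two head-point hints from Kinect skeleton data.
// "skeletonFrame" and "trackedSkeletonIndex" are assumed to be filled/chosen elsewhere.
const NUI_SKELETON_DATA& skeleton = skeletonFrame.SkeletonData[trackedSkeletonIndex];
const Vector4& neck = skeleton.SkeletonPositions[NUI_SKELETON_POSITION_SHOULDER_CENTER];
const Vector4& head = skeleton.SkeletonPositions[NUI_SKELETON_POSITION_HEAD];

FT_VECTOR3D headPoints[2];
headPoints[0].x = neck.x; headPoints[0].y = neck.y; headPoints[0].z = neck.z; // neck end of the head bone
headPoints[1].x = head.x; headPoints[1].y = head.y; headPoints[1].z = head.z; // head end of the head bone

// Pass the hints to the face tracker (third parameter of StartTracking,
// second parameter of ContinueTracking)
hr = pFT->StartTracking(&sensorData, NULL, headPoints, pFTResult);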

Camera configuration structure – it is very important to pass correct parameters in it, such as the frame width, height, and the corresponding camera focal length in pixels. We don’t read these automatically from the Kinect camera in order to give more advanced users more flexibility. If you don’t initialize them to the correct values (which can be read from the Kinect APIs), the tracking accuracy will suffer or the tracking will fail entirely.
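For instance, here is a small sketch (the helper function is mine, not part of the SDK) of how you might fill FT_CAMERA_CONFIG for a color resolution other than 640x480, scaling the nominal focal length that is defined for 640x480:

// Hypothetical helper: build a color camera config for an arbitrary resolution.
// The nominal focal length constant is defined for 640x480, so scale it by width/640.
FT_CAMERA_CONFIG MakeColorCameraConfig(UINT width, UINT height)
{
    FT_CAMERA_CONFIG config;
    config.Width = width;
    config.Height = height;
    config.FocalLength = NUI_CAMERA_COLOR_NOMINAL_FOCAL_LENGTH_IN_PIXELS * (width / 640.0f);
    return config;
}

// For example, for 1280x960 color frames:
FT_CAMERA_CONFIG videoCameraConfig = MakeColorCameraConfig(1280, 960);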

Frame of reference for 3D results – the face tracking SDK uses both depth and color data, so we had to pick which camera space (video or depth) to compute 3D tracking results in. Due to some technical advantages, we decided to do it in the color camera space, so the resulting frame of reference for 3D face tracking results is the video camera space. It is a right-handed system with the Z axis pointing towards the tracked person and the Y axis pointing up. The measurement units are meters. So it is very similar to Kinect’s skeleton coordinate frame, with the exception of the origin and its optical axis orientation (the skeleton frame of reference is in the depth camera space). The online documentation has a sample that describes how to convert from the color camera space to the depth camera space.
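To make this concrete, here is a short sketch (not from the original post) of reading the tracked 3D head pose from the result interface inside the tracking loop above; the translation is in meters in the video camera space described here:

// Sketch: read the 3D head pose from pFTResult (valid when pFTResult->GetStatus() == S_OK)
FLOAT scale;
FLOAT rotationXYZ[3];    // head rotation as Euler angles around the X, Y, Z axes
FLOAT translationXYZ[3]; // head position in meters, in the video (color) camera space
hr = pFTResult->Get3DPose(&scale, rotationXYZ, translationXYZ);
if(SUCCEEDED(hr))
{
    // For example, use rotationXYZ/translationXYZ to drive a 3D avatar's head
}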

Also, here are several things that will affect tracking accuracy:

1) Light – a face should be well lit without too many harsh shadows on it. A bright backlight or sidelight may make tracking worse.

2) Distance to the Kinect camera – the closer you are to the camera, the better it tracks. Tracking quality is best when you are closer than 1.5 meters (4.9 feet) to the camera. At close range, Kinect’s depth data is more precise, so the face tracking engine can compute face 3D points more accurately.

3) Occlusions – if you have thick glasses or a Lincoln-like beard, you may have issues with face tracking. This is still an open area for improvement 🙂  Face color is NOT an issue, as can be seen in this video.

Here are some technical details for more technologically/math-minded people: we used the Active Appearance Model as the foundation for our 2D feature tracker. We then extended our computation engine to use Kinect’s depth data, so it can track faces/heads in 3D. This made it much more robust compared to 2D feature-point trackers, since the Active Appearance Model alone is not robust enough to handle all real-world scenarios. Of course, we also used lots of secret sauce to make things work well together 🙂  You can read about some of these algorithms here, here and here.

Have fun with the face tracking SDK!

We published this paper that describes the face tracking algorithm in detail.

Credits – Many people worked on this project or helped with their expertise:

Christian Hutema (led the project), Lin Liang, Nikolai Smolyanskiy, Sean Anderson, Evgeny Salnikov, Jayman Dalal, Jian Sun, Xin Tong, Zhengyou Zhang, Cha Zhang, Simon Baker, Qin Cai.

Finally, I decided to do some inverse kinematics and control my assembled Bioloid humanoid “Type A” from a PC:

Bioloid Humanoid

Bioloid Humanoid Type A

Unfortunately, it turned out to be not as easy as I initially thought…

I read some online material from Robotis (their site lacks good documentation, by the way) and expected that I could use the USB2Dynamixel connector to connect my Bioloid to the PC and then use the Dynamixel SDK from Robotis to control the Dynamixels programmatically from a C++ program. After some trial and error, and after looking for answers online, I found to my huge surprise that you cannot use the Dynamixel SDK when the kit is connected to the PC via the CM-510 or CM-5 controller.

This setup (which I think should be fairly common): PC <-> USB2Dynamixel <-> CM-510 <-> Dynamixels is NOT supported by the Dynamixel SDK 😦 You can only use this SDK if you hook the Dynamixels directly to the USB2Dynamixel and connect them to the battery/power. I know that many “advanced” people do it this way, but I don’t like it when the most obvious setup is not supported out of the box.

Fortunately, after some searching I found out how to do it yourself (no soldering required, only programming). It is an undocumented solution, and it is fairly easy to do once you know how, but it is not easy to find a good explanation of it with working code. So I decided to document it here step by step and provide a working C++ code sample for Windows (see below).

BIG thanks to Aprendiendo, whose site contains the solution; however, it is not complete for C++ on Windows. So I re-wrote some of that code to make a complete C++ code sample that you can download and run on Windows. This could save you some time if you want to do the same 🙂

So, to control your Bioloid from PC you need to:

1) Use a USB2Dynamixel or another USB-to-serial connector to connect the CM-510 (or CM-5) controller via a serial cable to the PC’s USB port. If you have a USB2Dynamixel, you have to switch it to RS232 mode (position #3).

2) Usually, the USB2Dynamixel appears as the COM3 port on your PC. On Windows, you have to open this port’s properties (in Device Manager), go to the “Advanced” settings, and set “Latency Timer” to 1 (msec) as described here.

3) Power up your CM-510 (or CM-5). The “manage” mode indicator should start blinking. Press the “Start” button to enter the “manage” mode; the indicator next to the “manage” title should light up. If you skip this step, the Bioloid’s controller will not be in the right state for control from the PC.

4) Now, the most important part – the Bioloid controller (CM-510 or CM-5) needs to be switched to the “Toss mode”. In this mode, the CM-510 firmware works as a bridge: it forwards all commands it receives from the PC to the Dynamixels connected to it, and forwards all Dynamixel responses back to the PC. This mode is not documented by Robotis for some reason. To switch to it, you need to send “t\r” (the ‘t’ + 0x0D sequence) from your C/C++ program via the serial port to the CM-510. See the SendTossModeCommand() function that does this in the C++ code below.

5) After that, you can create Dynamixel protocol commands and send them to the CM-510 via the serial port. Make sure that the program initializes the COM port communication speed to 57600 baud, since this is the maximum speed that the CM-5 and CM-510 support. See the C++ code below for details.

NOTE 1: I found that on some PCs this process was not enough. After powering up your Bioloid, you need to run RoboPlus Manager from Robotis and connect the app to your Bioloid (press the “connect” button) before controlling it from your PC program. RoboPlus Manager does tons of obfuscated, undocumented communication, and for some reason the “Toss mode” can only be initialized after that!

NOTE 2: “Toss mode” cannot be turned OFF (at least I don’t know how at the moment); to turn it off you need to turn your Bioloid’s power OFF.

NOTE 3: I am still not sure whether I can read sensors connected to the CM-510 without custom firmware. See the Aprendiendo website for more information about custom firmware that does this.

This BioloidAPI.ZIP file contains a C++ project for Windows (a VS 2010 project) that demonstrates how to switch to the “Toss mode” and how to send Dynamixel commands to move the servos. I tried to make it easy to understand. Here is most of the C++ code:


#include <windows.h>
#include <stdio.h>
#include <conio.h>
#include <tchar.h>

class SerialPort
{
public:
    SerialPort();
    virtual ~SerialPort();

    HRESULT Open(const wchar_t* szPortName, DWORD baudRate);
    void Close();
    void Clear();

    HRESULT SendData(BYTE* pBuffer, unsigned long* pSize);
    HRESULT ReceiveData(BYTE* pBuffer, unsigned long* pSize);

private:
    HANDLE serialPortHandle;
};

SerialPort::SerialPort() :
serialPortHandle(INVALID_HANDLE_VALUE)
{
}

SerialPort::~SerialPort()
{
    Close();
}

HRESULT SerialPort::Open(const wchar_t* szPortName, DWORD baudRate)
{
    HRESULT hrResult = S_OK;
    DCB dcb;

    memset( &dcb, 0, sizeof(dcb) );

    dcb.DCBlength = sizeof(dcb);
    dcb.BaudRate = baudRate;
    dcb.Parity = NOPARITY;
    dcb.fParity = 0;
    dcb.StopBits = ONESTOPBIT;
    dcb.ByteSize = 8;

    serialPortHandle = CreateFile(szPortName, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_EXISTING, NULL, NULL);
    if ( serialPortHandle!=INVALID_HANDLE_VALUE )
    {
        if( !SetCommState(serialPortHandle, &dcb) )
        {
            hrResult = E_INVALIDARG;
            Close();
        }
    }
    else
    {
        hrResult = HRESULT_FROM_WIN32(ERROR_OPEN_FAILED);
    }

    return hrResult;
}

void SerialPort::Close()
{
    if (serialPortHandle!=INVALID_HANDLE_VALUE && serialPortHandle!=NULL)
    {
        PurgeComm(serialPortHandle, PURGE_RXCLEAR | PURGE_TXCLEAR);
        CloseHandle(serialPortHandle);
    }
    serialPortHandle = INVALID_HANDLE_VALUE;
}

void SerialPort::Clear()
{
    if (serialPortHandle!=INVALID_HANDLE_VALUE && serialPortHandle!=NULL)
    {
        PurgeComm(serialPortHandle, PURGE_RXCLEAR | PURGE_TXCLEAR);
    }
}

HRESULT SerialPort::SendData(BYTE* pBuffer, unsigned long* pSize)
{
    HRESULT hrResult = HRESULT_FROM_WIN32(ERROR_WRITE_FAULT);

    if (serialPortHandle!=INVALID_HANDLE_VALUE && serialPortHandle!=NULL)
    {
        if( WriteFile(serialPortHandle, pBuffer, *pSize, pSize, NULL) &&
            FlushFileBuffers(serialPortHandle)
            )
        {
            hrResult = S_OK;
        }
    }

    return hrResult;
}

HRESULT SerialPort::ReceiveData(BYTE* pBuffer, unsigned long* pSize)
{
    HRESULT hrResult = HRESULT_FROM_WIN32(ERROR_READ_FAULT);

    if (serialPortHandle!=INVALID_HANDLE_VALUE && serialPortHandle!=NULL)
    {
        if( ReadFile(serialPortHandle, pBuffer, *pSize, pSize, NULL) )
        {
            hrResult = S_OK;
        }
    }

    return hrResult;
}

bool CreateAX12SetPositionCommand(BYTE id, short goal, BYTE* pBuffer, DWORD* pSize)
{
    const unsigned int packetSize = 9;

    if(*pSize < packetSize)
    {
        return false;
    }

    // PACKET STRUCTURE: 0XFF 0XFF ID LENGTH INSTRUCTION PARAMETER_1 ... PARAMETER_N CHECKSUM
    *pSize = packetSize;

    pBuffer[0] = 0xFF;
    pBuffer[1] = 0xFF;
    pBuffer[2] = id;
    pBuffer[3] = 2 /* number of parameters */ + 3;  // packet body length
    pBuffer[4] = 3;                                 // instruction id = write data

    // Parameters
    pBuffer[5] = 30;                                // start address of position goal setting
    pBuffer[6] = BYTE(goal);                        // goal low byte (to address 30)
    pBuffer[7] = BYTE(goal>>8);                     // goal high byte (to address 31)

    // Checksum
    DWORD packetSum = 0;
    for(size_t i=2; i<packetSize-1; ++i)
    {
        packetSum += pBuffer[i];
    }
    pBuffer[packetSize-1] = BYTE(~packetSum);   // checksum = lowest byte of the inverted sum

    return true;
}

bool SetDynamixelPosition(SerialPort* pSerialPort, int id, int position)
{
    BYTE buffer[1024];
    DWORD size = sizeof(buffer);

    // Build the AX-12 "write goal position" packet and send it to the controller
    if(!CreateAX12SetPositionCommand(BYTE(id), short(position), buffer, &size))
    {
        return false;
    }

    HRESULT hr = pSerialPort->SendData(buffer, &size);
    if(FAILED(hr))
    {
        printf("Failed to send set dynamixel position command\n");
        return false;
    }
    Sleep(10);

    memset(buffer, 0, sizeof(buffer));
    size = sizeof(buffer);
    pSerialPort->ReceiveData(buffer, &size);

    if (size>4 && buffer[4] == 0)
    {
        printf("id=%d set to position=%d\n", id, position);
    }
    else
    {
        printf("Error while setting id=%d position=%d, error:%d\n", id, position, buffer[4]);
        return false;
    }

    return true;
}

bool SendTossModeCommand(SerialPort* pSerialPort)
{
    BYTE buffer[1024];
    buffer[0]='t';
    buffer[1]='\r';
    DWORD size = 2;

    HRESULT hr = pSerialPort->SendData(buffer, &size);
    if(FAILED(hr))
    {
        printf("Failed to send TOSS mode command\n");
        return false;
    }
    Sleep(100);

    size = sizeof(buffer);
    pSerialPort->ReceiveData(buffer, &size);

    return true;
}

int _tmain(int argc, _TCHAR* argv[])
{
    DWORD baudRate = 57600;
    SerialPort comPort;

    HRESULT hr = comPort.Open(L"COM3", baudRate);
    if(FAILED(hr))
    {
        printf("Cannot open COM3 port\n");
        return 0;
    }

    SendTossModeCommand(&comPort);

    while(1)
    {
        printf( "Enter dynamixel ID and goal position:\n" );

        int id = 0;
        int position = 0;
        scanf("%d %d", &id, &position);

        SetDynamixelPosition(&comPort, id, position);

        printf("Press ESC to terminate, otherwise press any other key to continue\n");
        if(getch() == 0x1b)
        {
            break;
        }
    }

    comPort.Close();

    return 0;
}

Avatar Kinect

May 19, 2012

Some time ago (summer 2011), my group at Microsoft shipped an interesting computer vision app/mini-game for Xbox 360 called “Avatar Kinect” (I worked on the face tracking technology). You can pose in front of the Kinect camera and the application tracks the movements of your head and facial features (lips, eyebrows) and renders you as an Xbox avatar. It is a pretty cool app if you want to record little videos of yourself as an animated cartoonish avatar and then post them to YouTube, or if you want to talk to your friends as an avatar (it allows multiparty avatar chat). For example, you can make:

The face tracking technology demo can be seen here:

We used a combination of Active Appearance Models on “steroids” plus a few other things, like a neural network, a face detector and various classifiers, to make it stable and robust. You can read more about Active Appearance Models here, here and here. Of course, the usage of the Kinect camera (with its depth sensor) improved precision and robustness a lot.

And next week, you’ll see our face tracking technology in a different, better and easier-to-use form 🙂