Majarkeun: Math in Special Effects Motion Tracking

Tuesday, May 23, 2017

While reading an article on architecture this week, I stumbled upon a statement that in the future 80% of jobs will require math and physics skills. I mentioned this to my kids to revive their occasionally fading enthusiasm for math homework, and it seemed to energize them a bit. But the trick that really lights up their minds is an illustration of how math is relevant to the occupations of their dreams. In our home this may mean a story about the collapse of a roof in the new Paris airport terminal for my son, who (currently) wants to be an architect, or a discussion of statistical models of NFL players, or a girls' movie that mentions the math of ice skating (Ice Princess).

But while math is omnipresent in most professions, some of the best playgrounds for math skills have always been special effects, national defense and medicine. Math majors have always felt like kids in a candy shop in these fields, saving hours of manual labor and allowing projects to be done cheaper, faster, safer and more realistically. As a special effects veteran, I am happy to share with you a math trick that I have used in the movies "Air Force One," "Multiplicity," "Desperate Measures" and "Starship Troopers." It is called Motion Tracking.

Take a look at this breathtaking commercial and don't forget to invite your kids. Note the boy who is holding a giant gorilla's hand at the beginning of the movie, the man who is bouncing the soccer balls and the race car drivers.

And now, check out the clip of how this was made. No time? Play at least till you see the boy walking around with a box.

Why is he holding this box, and why is it marked with tape?

The box is obviously being replaced with the giant gorilla's hand (computer generated). The tape on the box makes it possible to reconstruct the exact motion of the box relative to the camera and then make the gorilla move its hand with exactly the same motion.

Why are there four tape marks on the box? Wouldn't one be enough?
No!
Imagine the box right in front of you, moving in your direction, with one piece of tape attached right in the middle of it. In the image, that piece of tape may not move at all as the box approaches you. With two pieces of tape, you will notice that the distance between them increases as the box moves closer.
The math behind this requires at least three tape marks to solve a system of equations and find the exact motion of the box with respect to the camera. Using more than three marks lets you do it more precisely, minimizing possible errors.
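
To see this effect in numbers, here is a minimal sketch that assumes a simple pinhole camera looking straight at the box. The function names, the 50 mm focal length and the 20 cm spacing between the marks are my own illustrative choices, not values from the commercial.

```python
def project(point, f):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


def image_separation(mark_a, mark_b, f):
    """Distance between the projections of two marks in the image."""
    ax, ay = project(mark_a, f)
    bx, by = project(mark_b, f)
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5


F = 0.05  # assumed focal length in metres (a 50 mm lens)

# Move the box straight toward the camera: 4 m, then 2 m, then 1 m away.
for depth in (4.0, 2.0, 1.0):
    centre = project((0.0, 0.0, depth), F)
    gap = image_separation((-0.1, 0.0, depth), (0.1, 0.0, depth), F)
    print(f"depth {depth} m: centre mark at {centre}, gap between two marks {gap:.4f}")
```

The centre mark stays at the same image position at every depth, so it says nothing about the approach, while the gap between the two side marks grows as the box comes closer.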

In order to render the gorilla holding the boy's hand, we need to know how far from the camera the gorilla should stand and how it should be rotated. This is defined by two parameters: R, a 3x3 rotation matrix, and T, a 3x1 translation vector. The gorilla and its computer generated hand are represented as a collection of 3D points. Each such point P has (X, Y, Z) values in the computer graphics library. To place the gorilla next to the boy, animators tell their rendering programs to use rotation R, translation T and some camera parameters. How do they know what R and T are? From the box!

Special effects artists track the tape on the box to recover its R and T. Imagine this box in your hands. Every point P on this box (including the tape marks) has 3 coordinates (X, Y, Z). Once the box is next to the boy, the same point has coordinates P1 relative to the camera:

P1 = R x P + T
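
As a small illustration of this formula, the sketch below turns a point a quarter turn about the vertical axis and then pushes it 3 metres away from the camera. The helper names and all the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def rotation_about_y(angle_rad):
    """3x3 rotation matrix for a turn about the vertical (Y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def rigid_transform(R, P, T):
    """Return P1 = R x P + T for a 3x3 matrix R and 3-vectors P and T."""
    return [sum(R[i][j] * P[j] for j in range(3)) + T[i] for i in range(3)]


R = rotation_about_y(math.radians(90))  # assumed rotation: a quarter turn
T = [0.0, 0.0, 3.0]                     # assumed translation: 3 m from the camera
corner = [0.1, 0.0, 0.0]                # a point 10 cm to the side of the box centre

moved = rigid_transform(R, corner, T)
print(moved)
```

The same R and T applied to every point of the gorilla's hand would move the whole hand as one rigid object, which is exactly what the renderer needs.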

We don't know P1, but we know its projection onto the image in each frame of the commercial. Let's call this projected point p.

x/f = X1/Z1 and y/f = Y1/Z1, where p = (x, y), P1 = (X1, Y1, Z1) and f is the camera's known focal length

From this:

p = (f/Z1) (X1, Y1), where (X1, Y1, Z1) = R x P + T

We know f, p and P, and want to find R and T. P is the coordinate of a point on the box where the tape is attached, measured while the box is in your hands. p is the coordinate of the same point in the image from the commercial. As p has two coordinates in the image (x, y), you get two equations for every point you use.
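
Here is a rough sketch of that relationship run in the forward direction: given R, T and f, it computes where each tape mark would appear in the image. The function name `observe`, the mark positions and the camera values are assumptions made up for this example.

```python
def observe(P, R, T, f):
    """Image point (x, y) of box point P after moving it by R and T."""
    X1, Y1, Z1 = (sum(R[i][j] * P[j] for j in range(3)) + T[i] for i in range(3))
    return (f * X1 / Z1, f * Y1 / Z1)


IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

F = 0.05             # assumed focal length in metres
T = [0.0, 0.0, 2.0]  # assumed: box held 2 m in front of the camera, not rotated

# Four hypothetical tape marks at the corners of a 20 cm box face.
marks = [(-0.1, -0.1, 0.0), (0.1, -0.1, 0.0), (0.1, 0.1, 0.0), (-0.1, 0.1, 0.0)]

observations = [observe(P, IDENTITY, T, F) for P in marks]
equation_count = 2 * len(observations)  # one equation for x, one for y, per mark
print(observations, equation_count)
```

Motion tracking runs this backwards: the image points come from the footage, and R and T are the unknowns.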

If you approximate R with a small-angle rotation matrix (valid when the box rotates only slightly), you get a system of linear equations with 6 unknowns: 3 rotational components and 3 translational components. To solve a linear system with 6 unknowns you need at least 6 equations.

Remember that we get 2 equations per point? That means three points at minimum, but more is better, to allow for imprecision in point tracking: some points may be blurred in the image, or the box may be tilted so that some points are hidden.
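
To make this concrete, here is a minimal sketch of one way to set up and solve such a system. It assumes the small-angle form R = identity plus the cross-product matrix of (wx, wy, wz), stacks two equations per mark, and solves the 6 unknowns by least squares through the normal equations. All names and numbers are hypothetical, the image points are synthetic (generated with the same small-angle model, not taken from footage), and a production tracker may well work differently.

```python
def small_angle_project(P, w, T, f):
    """Project P after the small-angle motion P1 = P + w x P + T."""
    X, Y, Z = P
    wx, wy, wz = w
    X1 = X - wz * Y + wy * Z + T[0]
    Y1 = Y + wz * X - wx * Z + T[1]
    Z1 = Z - wy * X + wx * Y + T[2]
    return (f * X1 / Z1, f * Y1 / Z1)


def equations_for_mark(P, p, f):
    """Two linear equations in (wx, wy, wz, Tx, Ty, Tz) from x*Z1 = f*X1, y*Z1 = f*Y1."""
    X, Y, Z = P
    x, y = p
    row_x = [x * Y, -(x * X + f * Z), f * Y, -f, 0.0, x]
    row_y = [y * Y + f * Z, -y * X, -f * X, 0.0, -f, y]
    return [(row_x, f * X - x * Z), (row_y, f * Y - y * Z)]


def solve_linear(M, v):
    """Solve M u = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * u[c] for c in range(r + 1, n))
        u[r] = (aug[r][n] - tail) / aug[r][r]
    return u


def estimate_motion(marks, image_points, f):
    """Least-squares estimate of (wx, wy, wz, Tx, Ty, Tz) from tracked marks."""
    rows, rhs = [], []
    for P, p in zip(marks, image_points):
        for row, b in equations_for_mark(P, p, f):
            rows.append(row)
            rhs.append(b)
    n = 6
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    return solve_linear(AtA, Atb)


F = 1.0                        # assumed focal length in normalised image units
TRUE_W = (0.02, -0.01, 0.03)   # assumed small rotation, in radians
TRUE_T = (0.05, -0.02, 1.0)    # assumed translation, in metres
# Four hypothetical tape marks, one of them on a different face of the box.
MARKS = [(-0.1, -0.1, 0.0), (0.1, -0.1, 0.0), (0.1, 0.1, 0.0), (-0.05, 0.1, 0.1)]

image_points = [small_angle_project(P, TRUE_W, TRUE_T, F) for P in MARKS]
estimate = estimate_motion(MARKS, image_points, F)
print(estimate)
```

Four marks give 8 equations for 6 unknowns, so the extra two equations are what absorb tracking noise in real footage. With a large rotation the small-angle approximation stops holding, and the estimate would have to be refined, for example by repeating the step frame to frame.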

If you get it right, the gorilla's hand will be tightly attached to the boy's hand. If your math is wrong, the gorilla and its hand will be unrealistically plastered next to the boy's, as in some cheap old commercials. By the way, in the second video you can see similar tape marks on the wall that the soccer balls are bouncing off, and on the race car drivers' helmets. In both cases, the position of these objects with respect to the video camera is being tracked in order to add more computer generated objects to the scene, map textures or render reflections.

Interestingly, exactly the same motion tracking strategies are used to reconstruct battle scenes from video for analysis and training purposes in the field of national defense, or to track a patient's motion during surgery in medicine. Now you have something to tell anyone who says that linear equations and matrices are boring.

There are many tricks that allow special effects artists to seamlessly combine real and computer generated characters in one sequence. Matching character motion is one of them. Others include realistic texture and fur rendering and light matching, as described in the following stories: "The Silly, Wacky, Revealing and Useful Shadows" and "The trouble with facial hair."
