Given a custom IR beacon with four LEDs at known positions (in a non-planar arrangement, e.g., a tetrahedron), it is possible to derive the camera's -- and hence the Wiimote's -- position and orientation relative to the beacon based on the (x, y) positions of the beacon's LEDs on the camera's image plane (see Figure 1). The IR tracking beacon is described in more detail on its own page.

Figure 1: Projection of a custom IR beacon with four LEDs in a non-planar arrangement onto the Wiimote camera's image plane. The (x, y) positions of the four LEDs depend on the (known) absolute positions of the beacon LEDs in 3D space, the (known) position of the camera's focus point relative to its image plane, and the (unknown) position and orientation of the camera in 3D space. This figure places the camera's focus point behind its image plane for clarity; in reality, the focus point is the center of the camera's lens in front of the image plane. This does not affect the projection equations.

**Pixel size** - The width and height of each pixel on the camera's sensor in physical coordinate units, e.g., millimeters.
**Focal length** - The orthogonal distance of the camera's focus point (center of its lens) from the image plane in physical coordinate units.
**Center of projection** - The position of the orthogonal projection of the camera's focus point onto its image plane in physical coordinate units. This 2D coordinate can be combined with the camera's focal length to express the focus point position as a 3D point.
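Together, these three parameters define a standard pinhole projection. The following sketch shows how a 3D point in camera coordinates maps to pixel coordinates; all names are illustrative (this is not the project's actual code), and the camera is assumed to look down the negative z axis.

```python
def project(point, pixel_size, focal_length, center):
    """Pinhole projection of a 3D point (camera coordinates) onto the
    image plane, returning pixel coordinates. pixel_size and focal_length
    share the same physical unit; center is given in pixels."""
    x, y, z = point
    sx, sy = pixel_size      # physical width/height of one pixel
    cx, cy = center          # center of projection, in pixels
    # Similar triangles through the focus point, then conversion to pixels.
    u = cx + (focal_length * x / -z) / sx
    v = cy + (focal_length * y / -z) / sy
    return (u, v)
```

A point on the optical axis projects exactly onto the center of projection, which is a quick sanity check for any implementation.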

The intrinsic camera parameters have to be measured carefully, because they have a large influence on the projection equations used for 6-DOF tracking. This is normally achieved by recording the measured (x, y) positions of an IR beacon from multiple known positions and orientations, and finding the set of parameters that best describe the measurements. This process usually has to be performed only once, since the intrinsic camera parameters do not change when the camera is moved.
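The fitting step can be sketched as a reprojection-error minimization. This toy version (not the project's actual calibration code) fixes the pixel size to 1 and the projection center to a known value, and searches a list of candidate focal lengths for the one that best explains the observations; a real calibration would optimize all intrinsic parameters jointly.

```python
def reprojection_error(f, center, points, observations):
    """Sum of squared distances between projected LED positions (using
    focal length f, in pixels) and the observed pixel positions."""
    cx, cy = center
    err = 0.0
    for (x, y, z), (u, v) in zip(points, observations):
        pu = cx + f * x / -z   # pinhole projection, pixel size fixed to 1
        pv = cy + f * y / -z
        err += (pu - u) ** 2 + (pv - v) ** 2
    return err

def fit_focal_length(center, points, observations, candidates):
    """Pick the candidate focal length with the smallest reprojection error."""
    return min(candidates,
               key=lambda f: reprojection_error(f, center, points, observations))
```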

Now, in practice, it is extremely difficult to measure the focal length, center of projection, and pixel size in any physical units without direct access to the camera's internals. However, when looking at the projection equations, it turns out that these values only appear as ratios of one another. That means one can arbitrarily set, say, the pixel size to (1.0, 1.0) (assuming square pixels), and express the focal length and projection center in units of pixels instead of in physical units. One set of values that seems to work quite well is pixel size = (1.0, 1.0), focal length = 1280, center of projection = (512, 384). These values might be different for different Wii controllers.
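The claim that only ratios enter the projection can be checked directly: a hypothetical physical parameter set (6 µm pixels and a 7.68 mm focal length; illustrative numbers, not measured Wiimote values) yields the same pixel coordinates as pixel size (1.0, 1.0) with focal length 1280, because 7.68 mm / 0.006 mm = 1280.

```python
def project(point, pixel_size, focal_length, center):
    # Pinhole projection; focal_length and pixel_size share the same
    # physical unit, so only their ratio affects the result.
    x, y, z = point
    sx, sy = pixel_size
    cx, cy = center
    return (cx + focal_length / sx * x / -z,
            cy + focal_length / sy * y / -z)

p = (0.05, -0.02, -1.0)
physical = project(p, (0.006, 0.006), 7.68, (512, 384))  # millimeters
pixels   = project(p, (1.0, 1.0), 1280.0, (512, 384))    # pixel units
```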

**Angle triplets** - A 3D orientation can be described as three consecutive rotations around three given axes, usually following the aircraft convention of azimuth (yaw), elevation (pitch), and roll. While angle triplets -- often referred to as "Euler angles" -- match the three degrees of freedom of 3D orientations, they are inefficient for point transformations as they require multiple evaluations of trigonometric functions, are difficult to manipulate, and have ambiguity problems ("gimbal lock").
**Unit quaternions** - Quaternions are a four-dimensional analogue of complex numbers, represented as 4-tuples (x, y, z, w). By coincidence(?), unit quaternions, i.e., quaternions where x*x + y*y + z*z + w*w = 1, are equivalent to 3D rotations, and 3D orientations are equivalent to 3D rotations applied to a known initial orientation (the identity orientation). In other words, every unit quaternion represents a unique 3D orientation, and each 3D orientation is represented by exactly two unit quaternions, which are negatives of each other. The relationship between unit quaternions and 3D rotations is much more "linear" than for angle triplets, and there is no gimbal lock. The main advantage of quaternions is that their arithmetic closely matches 3D rotations, i.e., the quaternion corresponding to a concatenation of two 3D rotations is the product of the two rotations' quaternions. Quaternions are almost as efficient for point transformations as 3x3 matrices, and there is only one additional constraint between a unit quaternion's four parameters, namely the unit length formula shown previously.
**Orthogonal 3x3 matrices** - 3D rotations are a subclass of 3D linear transformations, which in turn are equivalent to 3x3 matrices. Hence, each 3D rotation can be expressed as a 3x3 matrix. Furthermore, 3D rotations are exactly equivalent to the subclass of orthogonal 3x3 matrices, i.e., matrices where all column vectors have unit length and are pairwise orthogonal to each other. As with quaternions, concatenation of 3D rotations is equivalent to matrix multiplication. Furthermore, point transformations are most efficiently expressed as a product between a matrix and a vector. The main drawback of using 3x3 matrices to represent 3D orientations is that a 3x3 matrix has nine parameters, and orthogonal 3x3 matrices have six additional constraints between those parameters, namely the unit length and orthogonality constraints mentioned above.
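The correspondence between the two main representations can be made concrete with the standard conversion from a unit quaternion (x, y, z, w) to the equivalent orthogonal 3x3 matrix (the naming here is illustrative). Note that a quaternion and its negative produce the same matrix, reflecting the two-to-one relationship mentioned above.

```python
def quat_to_matrix(q):
    """Convert a unit quaternion (x, y, z, w) to the equivalent
    orthogonal 3x3 rotation matrix (row-major nested lists)."""
    x, y, z, w = q
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ]
```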

**Orientation transformation**: **a'** = (ax', ay', az'), where ax' = rz*oy - ry*oz + rw*ox + rx*ow, ay' = rx*oz - rz*ox + rw*oy + ry*ow, and az' = ry*ox - rx*oy + rw*oz + rz*ow, with intermediate terms rx = oy*az - oz*ay + ow*ax, ry = oz*ax - ox*az + ow*ay, rz = ox*ay - oy*ax + ow*az, and rw = ox*ax + oy*ay + oz*az.
**Position transformation**: **a''** = (ax'', ay'', az''), where ax'' = ax' + px, ay'' = ay' + py, and az'' = az' + pz.
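These two formulas transcribe directly into code. In this sketch (function and variable names are illustrative), o is the unit orientation quaternion (ox, oy, oz, ow), p is the position vector, and a is the point to transform.

```python
def transform(a, o, p):
    """Apply orientation quaternion o and position p to point a,
    returning a'' = rotate(o, a) + p."""
    ax, ay, az = a
    ox, oy, oz, ow = o
    px, py, pz = p
    # Intermediate terms of the rotation formula.
    rx = oy*az - oz*ay + ow*ax
    ry = oz*ax - ox*az + ow*ay
    rz = ox*ay - oy*ax + ow*az
    rw = ox*ax + oy*ay + oz*az
    # Orientation transformation a'.
    axp = rz*oy - ry*oz + rw*ox + rx*ow
    ayp = rx*oz - rz*ox + rw*oy + ry*ow
    azp = ry*ox - rx*oy + rw*oz + rz*ow
    # Position transformation a'' = a' + p.
    return (axp + px, ayp + py, azp + pz)
```

As a check, the identity quaternion (0, 0, 0, 1) leaves the point unchanged, and a quaternion for a 90-degree rotation about the z axis maps (1, 0, 0) to (0, 1, 0) before the translation is added.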

The currently implemented approach uses a tracking method to maintain point matches while all four LEDs are detected by the camera: the new position and orientation are predicted from the current estimates of linear and angular velocity, and the predicted target point projections are matched with camera observations on a nearest-neighbor basis. If not all four LEDs were visible in the previous frame, or no good match could be found, the Wiimote's linear accelerometer measurements are used to create an initial match. An improved tracking method could use the accelerometer measurements to compute better estimates of linear and angular velocity, and thereby match LEDs and observations more reliably across frames.
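The nearest-neighbor matching step described above can be sketched as follows (all names are illustrative, and this greedy pairing is a simplification of whatever the actual implementation does): each predicted LED projection is assigned to the closest not-yet-used camera observation.

```python
def match_points(predicted, observed):
    """Greedily match each predicted 2D projection to the nearest unused
    observation; returns a list of (prediction index, observation index)."""
    matches = []
    used = set()
    for i, (px, py) in enumerate(predicted):
        best, best_d = None, float('inf')
        for j, (qx, qy) in enumerate(observed):
            if j in used:
                continue
            d = (px - qx) ** 2 + (py - qy) ** 2  # squared pixel distance
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches.append((i, best))
            used.add(best)
    return matches
```

With four well-separated LEDs and small inter-frame motion, this recovers the correct point correspondences even when the camera reports the observations in a different order.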