Dual Quaternions for Mere Mortals


This article is written for people seeking intuition on dual quaternions (and perhaps even complex numbers and quaternions) beyond what is offered by traditional texts. The standard way a dual quaternion is defined is by introducing a dual unit $\epsilon$ which satisfies $\epsilon^2 = 0$ and slapping a quaternion and a dual-scaled quaternion together. A whole ton of algebra follows and, mechanically at least, everything checks out, as if by magic. At some point, the writer will point out how dual quaternions are isomorphic to Clifford algebras (or some such mumbo-jumbo pertaining to Lie algebras). If you’ve taken a course in abstract algebra and are intimately comfortable with the notion of homogeneous coordinate systems already, maybe such a treatise was more than adequate. Chances are, though, the concept of dual quaternions (and likely quaternions as well) feels somewhat hollow, and you are left with a lingering suspicion that there is a better way of grasping matters. Geometric algebra is one alternative (and I recommend you take a look), but personally, I find having an appreciation of both useful, and I don’t actually find quaternions or dual quaternions to be unduly difficult or an inferior formulation in any way. I can’t lay claim to having a perfect understanding of the subject, but I am documenting my understanding here in the hopes that it may point you, fellow mathematician (graphics programmer, AI researcher, masochist, or what have you), in the right direction.

The goal

The ultimate goal is to have some sense for dealing with both quaternions:

$$q = w + x\,i + y\,j + z\,k$$

and dual quaternions:

$$\hat{q} = q_r + \epsilon\, q_d \qquad (\epsilon^2 = 0)$$

along with the ways they express rotation (by an angle $\theta$ about a unit axis $n = n_x i + n_y j + n_z k$):

$$p' = q\, p\, q^{-1}, \qquad q = \cos\tfrac{\theta}{2} + \sin\tfrac{\theta}{2}\, n$$

and both rotation and translation (a rotation $q$ followed by a translation $t = t_x i + t_y j + t_z k$, with the conjugate $\hat{q}^{\star}$ defined later on):

$$\hat{p}' = \hat{q}\, \hat{p}\, \hat{q}^{\star}, \qquad \hat{q} = q + \tfrac{\epsilon}{2}\, t\, q$$

There’s a lot of “odd” things here that don’t seem very natural (the half angle, the “extra” dimension, the necessity of the conjugate operator, the mysterious $\epsilon$ with $\epsilon^2 = 0$, etc.), but hopefully after reading this, you’ll feel like these oddities are actually well-motivated and the only way it could be. Quick reader beware: this post may feel a little long-winded, mainly because I’m making an effort to demonstrate what doesn’t work. For readers looking for more precise and direct exposition, I’ve included a number of links at the bottom. Perfect rigor is not a goal here because I do not want to assume that you’ve had any prior background in abstract algebra (although I hope you’ll be interested in it afterwards). I’m also going to try to steer away from formulae that may be more abstruse for most readers (Rodrigues’ rotation formula, Plücker coordinates, etc.). If those are familiar and comfortable topics, great! However, we’re going to try and loosely prove the results we want just the same without them.

Back to complex numbers

Let’s review some things we know (and bear with me if it seems too slow). Complex numbers provide a means for us to elegantly describe rotation and scaling on a plane. Multiplication by the imaginary number $i$ is understood to perform a $90^\circ$ rotation. Multiplication by $i \cdot i$ represents two such rotations, so $i^2 = -1$ (a half turn is just negation). Note first, there is an asymmetry in this algebra. Compared to a typical cartesian plane, the “action” described by a multiplication by $i$ compared to a real number is different. No amount of multiplication of real numbers amongst each other can ever produce an imaginary number, but imaginary numbers manage to propel themselves from real units to imaginary numbers without much effort. Such asymmetry is actually fairly common in algebras, the study of which often amounts to defining various types of entities and the myriad ways they interact (taking care that they interact “nicely” with properties we know and love like associativity and the distributive law and such). Also, remember that not only can we multiply by $i$, we can add it too. We interpret this as a sort of “y” coordinate, and this should surprise you (if it hasn’t before). How can adding/subtracting units of $i$ be compatible with the rotation action of multiplication by $i$?

Herein lies the unreasonable efficacy of complex numbers. Algebraically, if we take a complex number like $a + bi$ and multiply by $i$, we get $-b + ai$, which is our original number rotated $90^\circ$. This is “obvious” perhaps, but without fully appreciating how this is so, future attempts to generalize the concept to quaternions and dual quaternions will indubitably fail. For some clarity, we presume that addition is commutative (colloquially, it makes no difference if we go up and over vs. over and up). By the distributive law, multiplying by $i$ applies the rotation to both components of the number (in this case, $a$ and $bi$), leaving us with two rotated components that, summed up, give the rotated form of the original number. Incidentally, thinking in terms of the action as defined on the components is sometimes more helpful (the vector representation of a complex number is illustrative, but perhaps not the most fundamental). Really, we just let $i$ function as a “y-coordinate” as an algebraic convenience. We could have left the vector in cartesian form and defined a rotation action sending $(1, 0) \mapsto (0, 1)$ and $(0, 1) \mapsto (-1, 0)$ to get the same results, but this is redundant since replacing $(1, 0)$ with $1$ and $(0, 1)$ with $i$ produces the same effect.

This should feel pretty gentle so far. Another question to test your understanding is to consider how rotations of angles that aren’t right-angle multiples are defined. Why does the number system we defined so far permit, say, a $45^\circ$ rotation or a one-radian rotation? After all, multiplication by $\frac{i}{2}$ doesn’t give us half of a quarter turn (it shouldn’t, because we’re scaling down by a real number). Intuitively, exponentiation of $i$ should result in fractional turns (or turns greater than a right angle) since this is a way to multiply by $i$ a varying number of times. But what’s, say, $i^{1/2}$, and how can we get such a quantity if we’ve only imposed that $i^2 = -1$? Easy. $\sqrt{i}\,\sqrt{i} = i$, so multiplying by the square root of $i$ twice must produce a $90^\circ$ rotation, and a single application must produce a $45^\circ$ rotation. Better yet, we can even represent $\sqrt{i}$ as a normal complex number (pretend we only care about the positive root): $\sqrt{i} = \frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2}i$, which we recognize as a vector stabbing in the $45^\circ$ direction with length unity. Of course, we can whip out De Moivre’s Theorem here, but I’m trying to keep things tidy (avoiding transcendental functions). Try to think less about cosines and sines, and more about how from simple definitions and requirements (commutativity, the distributive law, $i^2 = -1$), we can suss out everything we need.
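
Here’s a quick numeric sanity check of the above (a short Python sketch of my own; the article itself doesn’t depend on any code). It uses Python’s built-in complex type to show that multiplying by $i$ gives a quarter turn and multiplying by $\sqrt{i}$ gives half of one:

```python
# Quick check with Python's built-in complex type: multiplication by i is a
# quarter turn, and multiplication by sqrt(i) is an eighth of a turn (45 degrees).
import cmath

z = 3 + 1j                   # the point (3, 1) in the plane

print(z * 1j)                # (-1+3j): (3, 1) rotated a quarter turn

root_i = cmath.sqrt(1j)      # principal root: sqrt(2)/2 + sqrt(2)/2 j, unit length
print(root_i)
print(z * root_i)            # (3, 1) rotated 45 degrees
print(z * root_i ** 2)       # two 45-degree turns: the same as z * 1j
```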

Upgrading to quaternions

Great, we have a notion of rotation now in two dimensions, so let’s go to three. This is where tons of people get tripped up, because quaternions behave somewhat differently to complex numbers (at least superficially). For one thing, a general quaternion has three imaginary parts and one real part (as opposed to the “expected” two imaginary parts). Furthermore, the rotation formula involves a strange conjugation and rotates by a half-angle. What gives?

The first thing to recognize is that “going from two dimensions to three dimensions” is a somewhat misleading characterization of our transition. Instead, consider that we’ve gone from one axis of rotation to three axes of rotation! This alone should make the “extra” imaginary component less surprising. The other “weird” thing happening is that we want to describe rotations in 3D space… but we now have a 4-dimensional algebraic entity. Let’s see what happens if we try the “natural” thing of representing 3D space with entities of the form $a + bi + cj$, where $a$, $b$, and $c$ are real and multiplication by $i$ or $j$ represents rotations. Suppose we let units of $j$ go “into” and “out of” the graph of the typical complex plane, so that multiplication by $i$ rotates about the $j$-axis (as before). We immediately run into a problem of deciding what axis multiplication by $j$ should rotate a point about. The real axis? The $i$-axis? Something else? Unfortunately, there are an infinite number of choices because we need three axes of rotation in 3D-space, not two. What we really need is a third “imaginary” unit that behaves similarly to the other two, all three of which together encode information about rotation.

Ergo, let’s add that fourth dimension. First, mentally visualize the real number line. Next, imagine that each imaginary unit $i$, $j$, and $k$ is orthogonal to it (we are in 4D now; think of these imaginary axes as extensions to the real number line if you will). For every real number, each one defines something akin to the complex plane from before ($i^2 = j^2 = k^2 = -1$). To be a well-defined system, multiplication by $i$ needs to “work” not just when multiplying a real number, but also units of $i$, $j$, and $k$ as well. What should $ij$ be, for example? Well, multiplication of a real number by $i$ rotates it about an axis (or set of axes, remember we are in 4D) orthogonal to both the reals and the $i$-axis. It would be extremely odd for multiplication by $i$ to take $j$ into the reals or into $i$ units, because subsequent multiplication would never “wrap back around” to the $j$ units. We need multiplication by $i$ to be a pure rotation, so that won’t do, not to mention that the operation would no longer be invertible. Similarly, $ij$ shouldn’t be $j$ scaled by some real-valued amount, since the reals already do that, and it seems unreasonable for $i$ to act like a real number in the $j$ dimension but differently elsewhere. Our only choice then, is that the action of multiplying $i$ onto $j$ produces $k$ units (we’ll choose positive by convention) so that $ij = k$. Multiplying both sides of $ij = k$ on the right by $j$ gives $kj = -i$, and similar manipulations give the rest: $jk = i$, $ki = j$, $ji = -k$, $ik = -j$. So there we have it, all the definitions associated with the canonical quaternion basis units. This rhymes with the situation with the complex numbers, but is different in the sense that the action defined by multiplying each imaginary unit rotates not just the reals but the other imaginary units as well. This is a very important distinction that will come into play when appreciating the rotation formula later. Applying these definitions along with the usual rules of additive commutativity, associativity, and the distributive law gets us a well defined way of adding and multiplying quaternions together.
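
As a side note, those basis rules plus the distributive law are everything you need to multiply two arbitrary quaternions. Below is a minimal sketch in Python (my own illustration; the tuple layout and the helper name `qmul` are assumptions, not anything defined in this article):

```python
# A minimal quaternion product, derived only from i*i = j*j = k*k = -1,
# ij = k, jk = i, ki = j, and the distributive law.
# Quaternions are stored as (w, x, y, z) = w + x*i + y*j + z*k.
def qmul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,   # real part
            aw*bx + ax*bw + ay*bz - az*by,   # i part
            aw*by - ax*bz + ay*bw + az*bx,   # j part
            aw*bz + ax*by - ay*bx + az*bw)   # k part

i = (0.0, 1.0, 0.0, 0.0)
j = (0.0, 0.0, 1.0, 0.0)

print(qmul(i, j))   # (0, 0, 0, 1):  ij = k
print(qmul(j, i))   # (0, 0, 0, -1): ji = -k, order matters
```

The last two lines make the anti-commutativity explicit: swapping the order of the factors flips the sign.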

“Well, that’s all fine and dandy,” you might say, “but I’m interested in rotations in 3D, not 4D!” Very true, and herein lies the trickier bit to grasp. We need a way to express the effect of 4D quaternion multiplication in three-space. What would be great is if we could just read off the imaginary components directly. After all, they naturally correspond to three orthogonal axes with natural properties for component-wise vector summation. Should two quaternions with different real parts represent the same 3D point/vector though? Would such a system work at all? Let’s try to use our quaternion algebra to do a really simple task first: rotating the point $(0, 1, 0)$ by $90^\circ$ about the $x$-axis (we should end up at $(0, 0, 1)$). First, let’s represent our point as the quaternion $j$. We know we need this to end up at $k$, and as it turns out, multiplying by $i$ does exactly that ($ij = k$), which more or less mirrors our experience thus far with complex numbers (we don’t multiply by $k$ to rotate into $k$ because we are rotating from a different imaginary unit, not the reals). So far so good! Spoiler alert, though: this approach doesn’t work generally, as we’ll see soon enough. Now let’s try something harder. Let’s rotate the point $(1, 1, 0)$ by $90^\circ$ about the $x$-axis (we’re expecting $(1, 0, 1)$). Representing it as the quaternion $i + j$ and multiplying by $i$, we get $i(i + j) = -1 + k$. Ah, now we see the problem.

We wanted multiplication by $i$ to rotate us about the $x$-axis, but in actuality, this operation does multiple things, rotating not just units of $j$, but also units of $i$ (and $k$ too, although we didn’t see it in the example). As a brief recap, we’re looking for an operation that can express a rotation about the $x$-axis for all points. Let’s try to brute force something that works for all points, starting from what we know: application of $i$ makes the following “movements”:

$$1 \mapsto i, \qquad i \mapsto -1, \qquad j \mapsto k, \qquad k \mapsto -j$$

Our spider senses here should be tingling. A single application of $i$ is going to generate the wrong types of units, but two applications will always get us back to the type of units where we started. From our previous example, we want to keep the action of $i$ that brought $j$ to $k$, but don’t want the extra effect of bringing $i$ to the reals. We have one trick up our sleeve though, which is the judicious use of left vs right multiplication. When we move from the reals to an imaginary unit and back, it makes no difference whether we use left or right multiplication. In contrast, rotating among the imaginary units will flip signs depending on whether we pre- or post-multiply. Ergo, we have a hope of finding a pair of quantities such that upon pre- and post-multiplication to the vector we wish to rotate, we can cancel out rotation we don’t care about, and preserve rotation we do care about.

This is where the conjugation operation comes into play. Let’s choose our rotator $q$ to be of the form $a + bi$ (again, focusing on just rotation about the $x$-axis) and take $q^{-1} = a - bi$ to be the inverse (analogous to complex conjugates). We can enforce that $a^2 + b^2 = 1$ to ensure we aren’t changing the length of the vector we rotate, and intuitively, this makes sense because our operation should only have a single degree of freedom (two variables, one constraint). Applying the conjugation to $j$ gives:

$$q\, j\, q^{-1} = (a + bi)\, j\, (a - bi) = (a^2 - b^2)\, j + 2ab\, k$$
In order to get the result we want ($j \mapsto k$), we require $a^2 - b^2 = 0$ and $2ab = 1$, which works if $a = b = \frac{1}{\sqrt{2}}$. Notice how the inclusion of the reals and the sign flip allow us to get a quantity (here, the amount pointing along $j$) we can cancel as much or as little as we like by choosing $a$ and $b$, along with a secondary amount we can control (in this case, pointing along $k$). We were able to do this by leveraging the anti-commutative aspects of the imaginary units compared to the reals. Let’s try doing the same computation to our point offset by a unit of $i$ (goal after rotation is $i + k$).

$$(a + bi)(i + j)(a - bi) = (a^2 + b^2)\, i + (a^2 - b^2)\, j + 2ab\, k$$
This is where something mindblowing happens. The same choice of $a$ and $b$ as before ($a = b = \frac{1}{\sqrt{2}}$) produces the desired rotation of $i + k$! If you follow the algebra closely, what happened was that the $i$ component of the vector we were rotating moved into the reals and back again twice. That is, for our choice of $q$, we’ve arranged it so that application of the conjugation to any vector with a component along $i$ will preserve that component, provided that $a^2 + b^2 = 1$. Now, this resembles the pythagorean identity $\cos^2\phi + \sin^2\phi = 1$. The other term that shows up, $2ab$, looks a lot like the sine half-angle formula $\sin\theta = 2\sin\tfrac{\theta}{2}\cos\tfrac{\theta}{2}$. This leaves our last term, $a^2 - b^2$, which looks a lot like the cosine half-angle formula $\cos\theta = \cos^2\tfrac{\theta}{2} - \sin^2\tfrac{\theta}{2}$. In reality, this shouldn’t be all too surprising, since we’ve taken what amounted to a linear combination of rotations (some arbitrary rotation in 4-space) and applied it twice with a sign adjustment on the second application.

Recapping, we were seeking a simple operation that, applied to any imaginary vector representing a direction in 3-space, would perform a rotation about the $x$-axis. We needed to ensure that any $i$-component in the rotating vector was preserved, and to do this, we leveraged the reals by applying two multiplications (moving to the reals and back). We still needed a rotation effect to linger, so we took advantage of combining pre- and post-multiplication along with a sign flip so that rotation of the imaginary components doesn’t cancel, but the rotation into and out of the reals does. What we ended up with was a conjugation for which all the half angle formulas popped out, which intuitively, is great because we’re doing two applications after all. Without going through the full derivation, it should make sense in a handwavey fashion that this reasoning applies equally well when rotating about a different axis, and due to the nice linear properties of our algebra, the general formula for rotation about an arbitrary axis should be correct.
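
To tie the derivation together, here’s a small numeric check (again a Python sketch of my own; `qmul`, `conj`, and `rotate` are invented names). It rotates the point $(1, 1, 0)$ by $90^\circ$ about the $x$-axis using the half-angle construction and recovers the expected $(1, 0, 1)$:

```python
import math

def qmul(a, b):
    # Hamilton product of quaternions stored as (w, x, y, z) tuples.
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def conj(q):
    # For a unit quaternion, the conjugate is also the inverse.
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, point):
    # Embed the point as a pure imaginary quaternion and sandwich it: q p q^-1.
    p = (0.0, *point)
    return qmul(qmul(q, p), conj(q))[1:]

theta = math.pi / 2                     # a 90-degree rotation...
q = (math.cos(theta / 2),               # ...so the quaternion carries the half angle
     math.sin(theta / 2), 0.0, 0.0)     # axis of rotation: x (the i unit)

print(rotate(q, (1.0, 1.0, 0.0)))       # ~(1, 0, 1): the i component is preserved
```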

Why did we go through all that trouble

Let’s talk about some properties of quaternion rotation. First, it’s extremely easy to describe the rotation of a vector about another one (in contrast, try writing the rotation matrix by hand). Applying the rotation just relies on the standard rules of algebra. Also, we can compose multiple rotations with algebra just as well. Suppose we want to rotate by $q_1$ and then by $q_2$.

$$q_2\,(q_1\, p\, q_1^{-1})\, q_2^{-1} = (q_2 q_1)\, p\, (q_2 q_1)^{-1}$$

We have the identity $(q_2 q_1)^{-1} = q_1^{-1} q_2^{-1}$, so we can work out the product $q_2 q_1$ in advance and take its inverse later. Thus, quaternion rotation is just as composable as rotation using matrices (in fact, we can derive a matrix representation of the conjugation operation without too much difficulty).
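
A quick sketch of that composability, under the same assumptions as the earlier snippets (Python, tuples, invented helper names): rotating by $q_1$ and then $q_2$ matches a single conjugation by the product $q_2 q_1$.

```python
import math

def qmul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def conj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, p):
    return qmul(qmul(q, p), conj(q))

def axis_angle(axis, theta):
    # Unit quaternion for a rotation of theta about a unit axis (x, y, z).
    s = math.sin(theta / 2)
    return (math.cos(theta / 2), axis[0] * s, axis[1] * s, axis[2] * s)

q1 = axis_angle((1.0, 0.0, 0.0), math.pi / 2)   # 90 degrees about x
q2 = axis_angle((0.0, 0.0, 1.0), math.pi / 3)   # 60 degrees about z
p = (0.0, 1.0, 2.0, 3.0)                        # the point (1, 2, 3)

print(rotate(q2, rotate(q1, p)))                # rotate twice, one at a time
print(rotate(qmul(q2, q1), p))                  # compose first, conjugate once: same result
```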

The real kicker is that because the application of quaternion rotation is done through purely algebraic manipulation, we can take derivatives of the rotated result with respect to changes in the rotation $q$. This does not work with matrices element by element, because for a general rotation matrix, the elements are actually coupled to each other nonlinearly. This is why we can efficiently and accurately interpolate between quaternions (either linearly or spherically, depending on the velocity desired).
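
As a tiny illustration of the interpolation point (my own sketch; whether a plain normalized lerp is adequate or you want a true slerp depends on the constant-angular-velocity requirement alluded to above), here is a normalized linear interpolation between two unit quaternions:

```python
import math

def nlerp(q0, q1, t):
    # Blend component-wise, then renormalize back onto the unit sphere.
    # Cheap and fine for small angular differences; slerp trades cost for
    # constant angular velocity.
    blended = tuple((1.0 - t) * a + t * b for a, b in zip(q0, q1))
    norm = math.sqrt(sum(c * c for c in blended))
    return tuple(c / norm for c in blended)

identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_x = (math.cos(math.pi / 4), math.sin(math.pi / 4), 0.0, 0.0)

print(nlerp(identity, quarter_turn_x, 0.5))   # roughly a 45-degree rotation about x
```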

Upgrading to dual quaternions

Hopefully at this point, we’re ready to make the leap to dual quaternions. Like before, we want to develop a well-behaved algebra, but in this case, we want to extend quaternions to permit both rotations and translations. It’s not wrong to wonder why this is difficult; after all, most people are introduced to translations simply as component-wise addition of vectors. We could, if we wanted, proceed in this way with a bit of bookkeeping, remembering what quantities should be interpreted as translations, and what quantities should be rotations. Difficulty will ensue, however, if we wish to compress multiple transformations (both rotations and translations) together, since the operations don’t commute in general. If we chose to represent everything with matrices in projective space, this would be possible, but we would lose the nice compact representation we just developed with quaternions, as well as the ability to cleanly interpolate or differentiate. So, we need to extend our currently developed quaternion algebra to encode translations.

Let’s pretend first that we haven’t already seen the formulation of dual numbers and all that. Left to our own devices, we know that we want our “upgraded quaternion” to apply transformations via the same conjugation operator as before. That way, the transformations can compose via simple multiplication as before, and we can reuse our rotation formulation as well. To separate the effect of a translation from that of a rotation, we’ll introduce a new unit called $\epsilon$ so that our upgraded quaternion has the form $\hat{q} = q_r + \epsilon\, q_d$, where $q_r$ and $q_d$ are both quaternions. Note that we aren’t so much “requiring” that this is the correct form. The introduction of a new unit imposes this form, which now encompasses all possible linear combinations of units that make up our dual quaternion. Let’s consider how the conjugation operator behaves acting on the identity (we’ll define the conjugate $\hat{q}^{\star} = \bar{q}_r - \epsilon\, \bar{q}_d$ based on the quaternion conjugates, which double as inverses for our unit quaternions, as well as a sign flip on the dual element).

$$\hat{q}\,(1)\,\hat{q}^{\star} = (q_r + \epsilon\, q_d)(\bar{q}_r - \epsilon\, \bar{q}_d) = q_r\bar{q}_r + \epsilon\,(q_d\bar{q}_r - q_r\bar{q}_d) - \epsilon^2\, q_d\bar{q}_d = 1 + \epsilon\,(q_d\bar{q}_r - q_r\bar{q}_d) - \epsilon^2\, q_d\bar{q}_d$$

If instead of conjugating the identity, we conjugated a versor (purely imaginary quaternion) $v$ scaled by $\epsilon$, we’d get the following:

$$\hat{q}\,(\epsilon v)\,\hat{q}^{\star} = (q_r + \epsilon\, q_d)(\epsilon v)(\bar{q}_r - \epsilon\, \bar{q}_d) = \epsilon\, q_r v \bar{q}_r + \epsilon^2\,(q_d v \bar{q}_r - q_r v \bar{q}_d) - \epsilon^3\, q_d v \bar{q}_d$$

At this point, let’s make the following observation. If we let $\epsilon^2 = 0$, then the conjugation by a dual quaternion of a versor scaled by $\epsilon$ is just $\epsilon\, q_r v \bar{q}_r = \epsilon\, q_r v q_r^{-1}$. This is precisely the rotation operator of the standard quaternion from the previous section. Conversely, the conjugation operator on the identity reduces down to $1 + \epsilon\,(q_d\bar{q}_r - q_r\bar{q}_d)$. To proceed then, we need to consider the quantity $q_d\bar{q}_r - q_r\bar{q}_d$. If this could somehow represent translation, then we’d have both our bases covered. Expanding, with $q_r = r_0 + \mathbf{r}$ and $q_d = d_0 + \mathbf{d}$ (real parts $r_0$, $d_0$ and purely imaginary parts $\mathbf{r}$, $\mathbf{d}$):

$$q_d\bar{q}_r - q_r\bar{q}_d = (d_0 + \mathbf{d})(r_0 - \mathbf{r}) - (r_0 + \mathbf{r})(d_0 - \mathbf{d}) = 2\,(r_0\,\mathbf{d} - d_0\,\mathbf{r} + \mathbf{r}\times\mathbf{d})$$

At this point, to “arrange” that we get a translation, let’s arbitrarily choose $q_r = 1$, $d_0 = 0$, and $\mathbf{d}$ to point in some direction and have unit length, so that the expression above reduces to $2\,\mathbf{d}$. This corresponds to a motion in the $\mathbf{d}$ direction with a displacement of 2! This indicates that the conjugation by a dual quaternion of the form $1 + \frac{\epsilon}{2}\,\mathbf{t}$ performs a pure translation along $\mathbf{t}$. Meanwhile, because $q_d$ drops out of the rotation expression (every term containing it is killed by $\epsilon^2 = 0$), we can continue to use $q_r$ to represent rotation.

To finalize things, we simply compose the two actions. Given a quaternion $q$ representing a rotation and a pure imaginary quaternion $t$ encoding the desired displacement (with $q$ unit in length as before), we combine the effects in the only reasonable way (multiplication). We have associativity after all, because we have a cleanly defined algebra, so let’s use it!

$$\hat{q} = \left(1 + \tfrac{\epsilon}{2}\, t\right) q = q + \tfrac{\epsilon}{2}\, t\, q$$

Like our quaternion, this quantity has a well defined derivative (not developed here) and can thus be interpolated and differentiated for rigid transformations, unlike a matrix formulation (again with weird non-linear cross terms). Remember that when we are trying to transform a point $p = p_x i + p_y j + p_z k$, it now needs to be in the form $1 + \epsilon\, p$ (refer to the development of the conjugation of the identity and the $\epsilon$-scaled versor from the beginning of this section if you forget why).
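
Putting the whole section together, here is a compact dual quaternion sketch in Python (my own illustration; every helper name is invented, and the conjugate follows the sign-flip convention developed above). It rotates a point $90^\circ$ about the $z$-axis and then translates it one unit along $x$:

```python
import math

def qmul(a, b):
    # Hamilton product; quaternions are (w, x, y, z) tuples.
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def qadd(a, b):
    return tuple(x + y for x, y in zip(a, b))

def qscale(q, s):
    return tuple(s * c for c in q)

# A dual quaternion is a pair (real part, dual part). Because eps^2 = 0,
# the dual part of a product never feeds back into the real part.
def dqmul(a, b):
    ar, ad = a
    br, bd = b
    return (qmul(ar, br), qadd(qmul(ar, bd), qmul(ad, br)))

def dqconj(a):
    # Quaternion-conjugate each part and flip the sign of the dual part:
    # the conjugate used for transforming points in the text above.
    ar, ad = a
    return (qconj(ar), qscale(qconj(ad), -1.0))

def rigid_transform(q, translation):
    # (1 + (eps/2) t) * q  =  q + (eps/2) t q : rotate first, then translate.
    t = (0.0, *translation)
    return (q, qscale(qmul(t, q), 0.5))

def transform_point(dq, point):
    # Points are embedded as 1 + eps * (pure imaginary quaternion).
    p = ((1.0, 0.0, 0.0, 0.0), (0.0, *point))
    _, dual = dqmul(dqmul(dq, p), dqconj(dq))
    return dual[1:]

theta = math.pi / 2
q = (math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))   # 90 degrees about z
dq = rigid_transform(q, (1.0, 0.0, 0.0))                   # ...then +1 along x

# (1, 0, 0) rotates to (0, 1, 0), then shifts by +1 in x: prints ~(1, 1, 0)
print(transform_point(dq, (1.0, 0.0, 0.0)))
```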

Review

We started with a new set of units $i$, $j$, and $k$ to encode rotations about 3 axes. Things needed to be different from our familiar complex numbers because we jumped from a single axis of rotation to three. This required an additional dimension to keep things clean (in the same way that we needed two dimensions to handle a single axis of rotation). The action of multiplication by each individual unit performed two effects: a rotation into and out of the reals for that unit, and a rotation of the other imaginary units in a manner that anti-commutes (commutes with a sign flip). Thus, encoding a rotation is done quadratically, with a sign flip to cancel the effect we don’t want and persist the effect we do want. This doubled the effect of the rotation, hence the presence of half-angles in our final formulae.

Moving to dual quaternions, we wanted to introduce a way of encoding translation by a vector that would not disrupt the mechanics of the conjugation operator. This way, we could in effect re-use the machinery of the quaternion while maintaining the nice algebraic property of associativity (which is what lets us compose successive transformations with multiplication). To separate the encoding of the translation, we introduced a new unit $\epsilon$. Because our conjugation operation produces quadratic terms, we simply imposed that the dual unit has the property $\epsilon^2 = 0$ (in fancy terms, $\epsilon$ is nilpotent). We then performed the conjugation on the identity and on a pure imaginary quaternion (scaled by $\epsilon$) to see the effect. By choosing the non-dual and dual parts carefully, we could easily produce a pure rotation and a pure translation. Composing the two operations multiplicatively (again exploiting associativity), we were able to arrive at the final expression representing a combined rotation and translation.

Both quaternions and dual quaternions can be interpolated continuously and differentiated, which comes into play when implementing animation systems or simulations that rely on rigid motion. We didn’t develop any formalism in this regard, but hopefully, texts that define spherical interpolation and quaternion/dual-quaternion derivatives will now be more accessible. As a final takeaway, it’s worth stepping back and appreciating the efficacy of abstract algebra as a tool for encoding actions. Typically, the development of a new algebra stems from identifying a desired behavior, attempting to arrange for it to “be so,” and observing the fallout that results. To continue your studies, I’ve assembled a number of helpful papers and resources I’ve used below. Feel free to message or tweet your feedback on the article using the various social media links below. Thanks for reading, and if you managed to get through the whole thing, give yourself a proverbial pat on the back.

References