Relativistic quantum mechanics

Learning goals for this week:
- General Lorentz invariance
- Why the Schrödinger equation is not Lorentz invariant
- Relativistic extensions: Klein-Gordon equation and Dirac equation
- Dirac equation: spinor solutions (interpretation in the next lecture)

The basic postulates of special relativity are: (i) the laws of nature are identical (invariant) in all inertial frames of reference, and (ii) the speed of light is the same in all such frames. A consequence of these postulates is that there is no strict distinction between time and space, and that observers moving at different speeds measure different time intervals and lengths.

The Schrödinger equation is clearly incompatible with special relativity: it is of first order in the time derivative, but of second order in the spatial derivatives, so it does not treat time and space on an equal footing. Our task in this part of the course is to find relativistic generalizations of the Schrödinger equation.

To accomplish this, let us first remind ourselves of the mathematical formalism of special relativity. For a more detailed description of special relativity, see David Tong's lecture notes.

Elements of special relativity

In Minkowski space, one considers space and time coordinates together, as one four-vector \(x^\mu\): \[ x \equiv\begin{pmatrix} c t \\ x \\ y \\ z \end{pmatrix} \equiv\begin{pmatrix} x^{0} \\ x^{1} \\ x^{2} \\ x^{3} \end{pmatrix} \equiv \begin{pmatrix} x^{0} \\ \mathbf{x}\end{pmatrix}, \] where \(c\) is the speed of light and \(\mu\in\{0,1,2,3\}\) is the Lorentz index. Strictly speaking, \(x^\mu\) is the \(\mu\)th component of the vector \(x\). Often, however, \(x^\mu\) refers to the whole vector; the context should make clear which is meant.

We use Greek letters to denote the Lorentz indices (\(\mu,\nu,...\in\{0,1,2,3\}\)) and Roman letters to denote the spatial indices (\(i,j,...\in\{1,2,3\}\)).

The location of the indices is important: \(x^\mu\), with an upper index, is a contravariant vector. Correspondingly, \(y_\mu\), with a lower index, is a covariant vector. In relativistic algebra, the two types of vectors differ in how they transform under a change of basis: the components of covariant vectors change in the same way as the basis vectors, whereas the components of contravariant vectors change in the inverse way. Mathematically, contravariant vectors are elements of a vector space \(V\) (i.e. vectors), and covariant vectors are elements of the dual space \(V^*\) (i.e. dual vectors).

Metric tensor

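We can lower the indices with the covariant metric tensor \[ g_{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{pmatrix}, \] so that with every contravariant four-vector \(A^\mu\), we associate a covariant four-vector \[ A_\mu = g_{\mu\nu} A^\nu \equiv \sum_{\nu=0}^{3} g_{\mu\nu} A^\nu. \] Above, we introduced Einstein's summation convention, in which repeated Lorentz indices are summed over. This notation is called Einstein notation.

With the above metric tensor, the components of the contravariant and covariant vectors are related by \(A_0 = A^0\) and \(A_i = -A^i\).

Similarly, we can lower an index of any higher-order tensor, e.g. \[ A_\mu^{\phantom{\mu}\nu} = g_{\mu\kappa} A^{\kappa\nu}. \]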

Sometimes we have the opposite situation: we have a covariant vector \(A_\mu\) and we would like to find the corresponding contravariant vector. This can be done with the contravariant metric tensor \(g^{\mu\nu}\): \[ A^\mu = g^{\mu\nu} A_\nu. \] We want this definition to be in harmony with the definition \(A_\nu = g_{\nu\kappa} A^\kappa\). Thus \[ A^\mu= g^{\mu\nu} g_{\nu\kappa} A^\kappa = g^\mu_{\phantom{\mu}\kappa} A^\kappa. \] Since this is true for every vector, \(g^\mu_{\phantom{\mu}\kappa}\) must be an identity in the sense that \[ g^\mu_{\phantom{\mu}\kappa} = \delta^\mu_\kappa,\quad\text{where}\quad \delta^\mu_\kappa = \begin{cases} 1,\; \mu=\kappa,\\ 0,\; \mu\neq\kappa. \end{cases} \] We conclude that the covariant and contravariant metric tensors are inverses of each other (in the matrix sense), \[ g^{\mu\nu} = (g^{-1})_{\mu\nu} = g_{\mu\nu}, \] where the second step follows because in this case \(g_{\mu\nu}\) is its own inverse. This step does not generalize from special relativity to a curved space-time.

We also note that due to the symmetry of the metric tensor (\(g_{\mu\nu} = g_{\nu\mu}\)), it does not matter which index we raise: \[ g_\mu^{\phantom{\mu}\kappa} = g_{\mu\nu}g^{\nu\kappa} = g_{\nu\mu}g^{\kappa \nu} = g^\kappa_{\phantom{\kappa}\mu} = \delta^\kappa_\mu. \] For a general tensor, however, \(A_\mu^{\phantom{\mu}\nu} \neq A^\nu_{\phantom{\nu}\mu}\).

Inner product

As the name metric tensor suggests, we use \(g\) to define an inner product, which gives a frame-independent notion of distance between two points in Minkowski space: \[ A\cdot B \equiv g_{\mu\nu} A^\mu B^\nu = A_\mu B^\mu = A^\mu B_\mu. \] Separating the temporal and spatial parts, the Minkowski inner product is \[ A\cdot B = A^0 B^0 - \mathbf{A} \cdot \mathbf{B}, \] where the dot product between the spatial vectors is the usual Euclidean inner product. Notice the minus sign!
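As a small numerical illustration, here is a sketch assuming NumPy (the helper names `lower` and `minkowski_dot` are illustrative, and the component values are arbitrary). It evaluates the inner product in the three equivalent ways written above:

```python
import numpy as np

# Minkowski metric with signature (+, -, -, -)
g = np.diag([1.0, -1.0, -1.0, -1.0])

def lower(A):
    """Covariant components A_mu = g_{mu nu} A^nu."""
    return g @ A

def minkowski_dot(A, B):
    """Inner product A.B = g_{mu nu} A^mu B^nu."""
    return A @ g @ B

A = np.array([2.0, 1.0, 0.5, -1.0])   # contravariant components A^mu
B = np.array([1.0, 3.0, -2.0, 1.0])   # contravariant components B^mu

print(minkowski_dot(A, B))            # g_{mu nu} A^mu B^nu
print(lower(A) @ B)                   # A_mu B^mu
print(A[0] * B[0] - A[1:] @ B[1:])    # A^0 B^0 - A.B (note the minus sign)
```

All three lines print the same number; lowering the index simply flips the sign of the spatial components.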

Lorentz transformations

A linear coordinate transformation is described by a real 4×4 matrix \(\Lambda\) such that \[ x^\mu \rightarrow x^{\prime\mu} = {\Lambda^\mu}_\nu x^\nu. \] Here \(x^\mu\) are the coordinates in the original coordinate system and \(x'^\mu\) are the coordinates after the transformation. The matrix elements can also be expressed as derivatives of the transformed coordinates with respect to the original ones: \[ \Lambda^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu}. \]

Lorentz transformations are defined as coordinate transformations which leave the inner product between four-vectors invariant, i.e. \[ x^\mu g_{\mu\nu} y^\nu = x'^\alpha g_{\alpha\beta} y'^\beta = x^\mu {\Lambda^\alpha}_\mu g_{\alpha\beta} {\Lambda^\beta}_\nu y^\nu, \] for all \(x\) and \(y\). These transformations form a group, known as the Lorentz group. Without reference to the vectors \(x\) and \(y\), the condition which defines the Lorentz group is \[ g_{\mu\nu} = {\Lambda^\alpha}_\mu g_{\alpha\beta} {\Lambda^\beta}_\nu. \]

By contracting both sides with \(g^{\kappa\mu}\), we obtain \[ \delta_{\nu}^\kappa = {\Lambda_\beta}^\kappa {\Lambda^\beta}_\nu, \] so \({\Lambda_\nu}^\mu = {(\Lambda^{-1})^\mu}_\nu\) is the matrix inverse of \({\Lambda^\nu}_\mu\), by which we mean that we can write the above equation in terms of matrix multiplication as \[ g=\Lambda^\intercal g \Lambda. \] The matrices are not defined systematically with regard to upper/lower indices; the components of \(\Lambda\) are \([\Lambda]_{\mu\nu}=\Lambda^\mu{}_\nu\), whereas the components of \(g\) are \([g]_{\mu\nu}=g_{\mu\nu}=g^{\mu\nu}\). From this equation we see that the matrix \(\Lambda\) has the inverse \[ \Lambda^{-1} = g^{-1}\Lambda^\intercal g=(g\Lambda g)^\intercal, \] which can be written in terms of the components as \[ (\Lambda^{-1})^\mu{}_\nu \equiv [\Lambda^{-1}]_{\mu\nu} = [(g \Lambda g)^\intercal]_{\mu\nu}= [g \Lambda g]_{\nu\mu} = [g]_{\nu\alpha} [\Lambda]_{\alpha\beta} [g]_{\beta\mu} = g_{\nu\alpha} \Lambda^\alpha{}_\beta g^{\beta\mu} = \Lambda_\nu{}^\mu. \]
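The matrix relations above lend themselves to a quick numerical check. The following is a minimal sketch assuming NumPy; `boost_x` and `rotation_z` are illustrative helpers (not from the text) that build a boost along x with rapidity \(\xi\) and a rotation about z, and their product serves as a generic proper Lorentz transformation:

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])  # metric, signature (+, -, -, -)

def boost_x(xi):
    """Boost along x with rapidity xi, as the matrix [Lambda]_{mu nu} = Lambda^mu_nu."""
    L = np.eye(4)
    L[0, 0] = L[1, 1] = np.cosh(xi)
    L[0, 1] = L[1, 0] = -np.sinh(xi)
    return L

def rotation_z(theta):
    """Rotation by theta about the z-axis, embedded in a 4x4 matrix."""
    L = np.eye(4)
    L[1, 1] = L[2, 2] = np.cos(theta)
    L[1, 2] = -np.sin(theta)
    L[2, 1] = np.sin(theta)
    return L

Lam = rotation_z(0.3) @ boost_x(0.7)

# Defining property of the Lorentz group: g = Lambda^T g Lambda
print(np.allclose(Lam.T @ g @ Lam, g))
# Inverse without a general matrix inversion: Lambda^{-1} = g Lambda^T g
Lam_inv = g @ Lam.T @ g
print(np.allclose(Lam_inv @ Lam, np.eye(4)))
```

Both checks print `True`; the second one illustrates why the inverse of a Lorentz matrix is cheap to obtain: only transposition and sign flips are needed.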

Connected components of the Lorentz group

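In the following, unless stated otherwise, by Lorentz transformations we mean the proper (more precisely, proper orthochronous) transformations, which invert neither time (time reversal transformation) nor space (parity transformation). They satisfy \(\det\Lambda = 1\) and \(\Lambda^0{}_0 \geq 1\).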

Proper rotations

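Pure rotations form a subgroup of the Lorentz group. They are represented by the matrices \[ \Lambda^\mu{}_\nu = \begin{pmatrix} 1 & \mathbf 0^\intercal \\ \mathbf 0 & R \end{pmatrix}, \] where \(R\) is a 3×3 rotation matrix. For example, a rotation by an angle \(\theta\) around the z-axis is given by the Lorentz transformation \[ \Lambda^\mu{}_\nu = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & \cos\theta & -\sin\theta & 0\\ 0 & \sin\theta & \cos\theta & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}. \]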

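Let us check that the four-vector inner product is invariant under rotations. The action on a coordinate four-vector is \[ x'^\mu = \Lambda^\mu{}_\nu x^\nu = \begin{pmatrix} x^0 \\ R\mathbf x \end{pmatrix}. \] The inner product between four-vectors is \[ x'\cdot y' = x^0 y^0 - (R\mathbf x)\cdot(R\mathbf y) = x^0 y^0 - \mathbf x^\intercal R^\intercal R\,\mathbf y = x^0 y^0 - \mathbf x\cdot\mathbf y = x\cdot y, \] where we used the fact that \(R^\intercal R = 1\) for rotations. Also \(\det\Lambda = \det R = 1\) for proper rotations and \(\Lambda^0{}_0 = 1\), so this is indeed a proper Lorentz transformation.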

Lorentz boosts

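Apart from rotations, the other basic type of Lorentz transformation is a boost, a transformation to a frame moving at velocity \(\mathbf v\) relative to the original frame. Let \(S\) and \(S'\) be two inertial frames such that at \(t = t' = 0\) their origins and coordinate axes coincide. Let us then assume that \(S'\) moves with a velocity \(\mathbf v = v\hat{\mathbf x}\) relative to \(S\). From a course on special relativity we know that the coordinates in the two frames are related by \[ ct' = \gamma\left(ct - \beta x\right),\quad x' = \gamma\left(x - \beta ct\right),\quad y' = y,\quad z' = z, \] where \(\beta = v/c\) and \(\gamma = 1/\sqrt{1-\beta^2}\). The transformation between the two frames is represented by the matrix \[ \Lambda^\mu{}_\nu = \begin{pmatrix} \gamma & -\gamma\beta & 0 & 0\\ -\gamma\beta & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}, \] which can, like the rotation above, be verified to satisfy the conditions of a proper Lorentz transformation. Sometimes it is useful to parametrize the boost by the so-called rapidity \(\xi\), defined by \(\tanh\xi = \beta\), so that \(\gamma = \cosh\xi\) and \(\gamma\beta = \sinh\xi\). The above transformation then looks like \[ \Lambda^\mu{}_\nu = \begin{pmatrix} \cosh\xi & -\sinh\xi & 0 & 0\\ -\sinh\xi & \cosh\xi & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}. \]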
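The two parametrizations of a boost along x can be compared numerically. This is a sketch assuming NumPy and \(c = 1\) (so \(\beta\) is the velocity); the function names are illustrative. It also shows one reason rapidity is convenient: for boosts along the same axis, rapidities simply add, whereas velocities do not:

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])

def boost_from_velocity(beta):
    """Boost along x to a frame moving at velocity beta (c = 1, |beta| < 1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L

def boost_from_rapidity(xi):
    """The same boost parametrized by rapidity, tanh(xi) = beta."""
    L = np.eye(4)
    L[0, 0] = L[1, 1] = np.cosh(xi)
    L[0, 1] = L[1, 0] = -np.sinh(xi)
    return L

beta = 0.6
print(np.allclose(boost_from_velocity(beta), boost_from_rapidity(np.arctanh(beta))))

# Collinear boosts compose by adding rapidities
xi1, xi2 = 0.4, 0.9
print(np.allclose(boost_from_rapidity(xi1) @ boost_from_rapidity(xi2),
                  boost_from_rapidity(xi1 + xi2)))
```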

All other proper Lorentz transformations can be generated as combinations of boosts and rotations. The larger symmetry group, which also includes space-time translations, is known as the Poincaré group.

Transformations of contravariant and covariant vectors

Above, we defined that the contravariant vectors transform as \[ x'^\mu = {\Lambda^\mu}_\nu x^\nu. \]

By using the metric tensor to manipulate the location of indices, we can prove that covariant vectors transform inversely under Lorentz transformations: \[ \begin{aligned} x'_\mu &\equiv g_{\mu\nu} x'^\nu = g_{\mu\nu} {\Lambda^\nu}_\kappa x^\kappa = g_{\mu\nu} {\Lambda^\nu}_\kappa g^{\kappa\lambda} x_\lambda \\ &= \Lambda_\mu{}^\lambda x_\lambda = (\Lambda^{-1})^\lambda{}_\mu x_\lambda. \end{aligned} \] The example below shows geometrically why the covariant vector transforms in an inverse way. It also shows mathematically what we mean by a transformation.
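In matrix form, \(x'_\mu = (\Lambda^{-1})^\lambda{}_\mu x_\lambda\) means that the column of covariant components is multiplied by the transpose of \(\Lambda^{-1}\). A minimal NumPy sketch (boost rapidity and vector components are arbitrary choices) confirms that this agrees with transforming \(x^\mu\) first and lowering the index afterwards, and that the contraction \(x_\mu y^\mu\) is frame independent:

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])

xi = 0.8
Lam = np.eye(4)                           # boost along x, Lambda^mu_nu
Lam[0, 0] = Lam[1, 1] = np.cosh(xi)
Lam[0, 1] = Lam[1, 0] = -np.sinh(xi)
Lam_inv = np.linalg.inv(Lam)

x_up = np.array([1.0, 2.0, -0.5, 3.0])   # contravariant x^mu
x_low = g @ x_up                          # covariant x_mu

# Route 1: transform x^mu, then lower the index
x_low_prime_1 = g @ (Lam @ x_up)
# Route 2: transform x_mu directly with (Lambda^{-1})^lambda_mu, i.e. (Lambda^{-1})^T
x_low_prime_2 = Lam_inv.T @ x_low
print(np.allclose(x_low_prime_1, x_low_prime_2))

# The contraction x_mu y^mu takes the same value in both frames
y_up = np.array([0.3, -1.0, 2.0, 0.5])
print(np.isclose(x_low @ y_up, x_low_prime_2 @ (Lam @ y_up)))
```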

Example of contravariance and covariance in 2D

Transformations of fields

Partial derivatives

Let us define a partial derivative with respect to the components of a contravariant vector \(x^\mu\): \[ \partial_\mu = \frac{\partial}{\partial x^\mu} = \left( \frac 1 c \frac {\partial}{\partial t},\nabla\right), \] which acts on \(x^\mu\) as \[ \partial_\nu x^\mu = \frac{\partial x^\mu}{\partial x^\nu} = \delta_\nu^\mu. \] How does \(\partial_\nu\) transform under Lorentz transformations? From the requirement \(\partial'_\nu x'^\mu = \delta^\mu_\nu\) for the transformed derivative, it can be verified that it transforms as a covariant vector, just as the notation suggests: \[ \partial'_\mu = \frac{\partial}{\partial x'^\mu} = (\Lambda^{-1})^\nu{}_\mu \partial_\nu. \]

Similarly, we define a partial derivative with respect to a covariant vector: \[ \partial^\mu = \frac{\partial}{\partial x_\mu} = \left( \frac 1 c \frac {\partial}{\partial t},-\nabla\right)^\intercal, \] where the extra minus sign is required to cancel the minus signs in the definition of \(x_\mu\) in order to have \[ \partial^\nu x_\mu = \frac{\partial x_\mu}{\partial x_\nu} = \delta^\nu_\mu. \] This derivative transforms as a contravariant vector: \[ \partial'^\mu = \frac{\partial}{\partial x'_\mu} = \Lambda^\mu{}_\nu \partial^\nu. \]

Note: Be careful with the signs! Here, no minus signs appear: \[ \begin{aligned} \partial_\mu A^\mu &= \partial_0 A^0 + \partial_1 A^1 + \partial_2 A^2 + \partial_3 A^3\\ &= \frac 1 c \frac{\partial A^0}{\partial t} + \nabla\cdot \mathbf A = \partial^\mu A_\mu. \end{aligned} \] Here, in contrast, we do get minus signs: \[ \begin{aligned} \Box = \partial_\mu \partial^\mu &= \partial_0 \partial^0 + \partial_1 \partial^1 + \partial_2 \partial^2 + \partial_3 \partial^3\\ &= \partial_0 \partial_0 - \partial_1 \partial_1 - \partial_2 \partial_2 - \partial_3 \partial_3\\ &= \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2. \end{aligned} \]

Four-momentum

Let us consider the relativistic generalization of momentum for a particle moving at velocity \(\mathbf v\) in the frame \(S\). From Noether's theorem we learn that energy is the conserved quantity associated with time translations, whereas momentum is the conserved quantity associated with spatial translations. Thus we can assume that the relativistic generalization of momentum, the four-momentum, has the form \[ p^\mu = \begin{pmatrix} E/c \\ \mathbf p \end{pmatrix}, \] but we do not immediately know the expressions for \(E\) and \(\mathbf p\).

However, we do know that the particle is at rest in the frame \(S'\) moving at velocity \(\mathbf v\). In that frame the particle has no 3-momentum and thus we know that the 4-momentum is of the form \[ p'^\mu = \begin{pmatrix} mc \\ \mathbf 0 \end{pmatrix}. \] The fact that we put a mass into the component \(p'^0\) follows from dimensional analysis. Here we also require that \(m\neq 0\), since a massless particle does not have a rest frame. To find the four-momentum in the frame \(S\), we perform a Lorentz boost from \(S'\) to \(S\). (Do this as an exercise!) The result is \[ p^\mu = \begin{pmatrix} \gamma m c \\ \gamma m \mathbf v \end{pmatrix},\quad\text{i.e.}\quad E = \gamma m c^2,\quad \mathbf p = \gamma m \mathbf v. \]

The square of the 4-momentum is a Lorentz invariant quantity, \[ p\cdot p = \frac{E^2}{c^2} - |\mathbf p|^2 = m^2 c^2, \] where the value on the right is read off in the rest frame. In the rest frame, the energy is positive, so we find the relativistic dispersion of a free particle with mass \(m\): \[ E = \sqrt{m^2c^4 + |\mathbf p|^2 c^2}. \] A proper generalization of the Schrödinger equation should reproduce this dispersion.
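The exercise above can be sketched numerically. Assuming NumPy and \(c = 1\), and with an illustrative helper `boost_x(beta)` for the transformation from \(S\) to \(S'\), going back from the rest frame \(S'\) to \(S\) uses the inverse boost `boost_x(-beta)`:

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])

def boost_x(beta):
    """Lambda from S to a frame S' moving with velocity beta along x (c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L

m, beta = 2.0, 0.6
p_rest = np.array([m, 0.0, 0.0, 0.0])      # p'^mu in the rest frame S'

# S' moves with +beta relative to S, so S' -> S is the inverse boost boost_x(-beta)
p = boost_x(-beta) @ p_rest
gamma = 1.0 / np.sqrt(1.0 - beta**2)

print(p)                                                  # (gamma m, gamma m v, 0, 0)
print(np.isclose(p[0], gamma * m), np.isclose(p[1], gamma * m * beta))
print(np.isclose(p @ g @ p, m**2))                        # p.p = m^2 in every frame
print(np.isclose(p[0], np.sqrt(m**2 + p[1:] @ p[1:])))    # E = sqrt(m^2 + |p|^2)
```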

Klein-Gordon equation

Note: From now on, we work in natural units \(\hbar=c=1\).

Recall how for a nonrelativistic system the Schrödinger equation \[ i \frac{\partial \psi(\mathbf{x},t)}{\partial t} = \hat H \psi(\mathbf{x},t), \] could be obtained from a classical Hamiltonian \(H(\mathbf{p},\mathbf{x})=\frac{|\mathbf p|^2}{2m} +V(\mathbf{x})\) by replacing \(\mathbf{p}\) and \(\mathbf{x}\) by operators which obey the canonical commutation relations. The form of the Schrödinger equation in position representation suggests that we effectively did the replacements \(E\rightarrow i\partial_t\) and \(\mathbf{p}\rightarrow -i\nabla\).

If we try to do the same replacement for the relativistic dispersion \(E=\sqrt{m^2 + |\mathbf p|^2}\), we fail. Even if we manage to interpret the square root operator \(\sqrt{m^2 - \nabla^2}\) meaningfully (e.g. by Fourier transform or series expansion), we end up with a complicated nonlocal operator.

If, instead, we start from the square of the above expression, \(E^2-|\mathbf p|^2=m^2\), we avoid such problems. We make the replacements \(E^2\rightarrow -\partial_t^2\) and \(|\mathbf p|^2\rightarrow -\nabla^2\) and apply the resulting operator on a scalar field \(\psi(\mathbf{x},t)\) to obtain the Klein-Gordon equation \[ \left( \frac{\partial^2}{\partial t^2}- \nabla^2 + m^2\right)\psi(\mathbf{x},t) = 0, \] which can be written in Einstein notation as \[ \left( \partial_\mu\partial^\mu + m^2\right)\psi(\mathbf{x},t) = 0. \] The Klein-Gordon equation is manifestly Lorentz invariant, since \(\partial_\mu\partial^\mu\) is a Lorentz scalar and \(\psi\) is a scalar field. So far everything is good. As we are attempting to generalize the Schrödinger equation, we hope to be able to interpret \(\psi\) as the wavefunction of a particle.

Like the Schrödinger equation, the Klein-Gordon equation has plane-wave solutions: \[ \psi_{\mathbf p}(\mathbf{x},t) = \mathcal N e^{-i p\cdot x} = \mathcal N e^{-i (E t-\mathbf{p}\cdot\mathbf{x}) }. \] Let us fix the momentum \(\mathbf p\) to some value and solve for the energy of this solution. Substituting the above into the Klein-Gordon equation, we find \[ \left( \frac{\partial^2}{\partial t^2}- \nabla^2 + m^2\right)\psi_{\mathbf p}(\mathbf{x},t) = \left( -E^2 + |\mathbf p|^2 + m^2\right)\psi_{\mathbf{p}}(\mathbf{x},t) = 0, \] which has two solutions: \(E = \pm\sqrt{m^2+|\mathbf p|^2}\). The positive energy solution is what we expected, but what are we to make of the negative energy solution?
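Both signs of the energy can be checked numerically. The sketch below assumes NumPy, natural units, and one spatial dimension for brevity (`klein_gordon_residual` is an illustrative helper); it applies the Klein-Gordon operator to the plane wave with central finite differences:

```python
import numpy as np

m, p = 1.0, 0.7                    # mass and momentum (arbitrary test values)
E_pos = np.sqrt(m**2 + p**2)

def psi(t, x, E):
    """Plane wave exp(-i(E t - p x)) in one spatial dimension, N = 1."""
    return np.exp(-1j * (E * t - p * x))

def klein_gordon_residual(E, t=0.3, x=-0.2, h=1e-3):
    """(d_t^2 - d_x^2 + m^2) psi at (t, x), second derivatives by central differences."""
    d2t = (psi(t + h, x, E) - 2 * psi(t, x, E) + psi(t - h, x, E)) / h**2
    d2x = (psi(t, x + h, E) - 2 * psi(t, x, E) + psi(t, x - h, E)) / h**2
    return d2t - d2x + m**2 * psi(t, x, E)

# Both E = +E_pos and E = -E_pos solve the equation; another energy does not
for E in (E_pos, -E_pos, 0.5 * E_pos):
    print(f"E = {E:+.3f}: |residual| = {abs(klein_gordon_residual(E)):.1e}")
```

The residual is at the level of the finite-difference error for \(E=\pm E_{\mathbf p}\) and of order one for the third energy.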

It turns out that the negative energy solutions make the theory unstable. The energy of a particle is not bounded from below, which means that there is no ground state. If we were to add perturbations (e.g. potentials), they would couple the negative and positive energy states, and the particle could transition to lower and lower energy states.

The negative energies also force us to abandon the hope of interpreting \(\psi\) as a wavefunction. As with the non-relativistic treatment, one obtains the continuity equation for the probability 4-current from the Klein-Gordon equation and its complex conjugate: \[ \partial_\mu j^\mu =0,\quad\text{with}\quad j^\mu=(\rho,\mathbf j)=\frac{i}{2m}\left( \psi^*\partial^\mu\psi -(\partial^\mu\psi^*)\psi \right). \] The "probability density" \(\rho\) for a plane-wave solution is \[ \rho = |\mathcal N|^2 \frac{E}{m}, \] which is negative if \(E<0\). The conclusion is that we cannot interpret \(\rho\) as a probability density and thus \(\psi\) is not a wavefunction.
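The sign of \(\rho\) can be confirmed directly from the definition of \(j^0\). In this NumPy sketch (natural units, one spatial dimension, \(\mathcal N = 1\); the helper `rho` is illustrative), the time derivative is taken by a central difference and the result is compared with \(|\mathcal N|^2 E/m\):

```python
import numpy as np

m, p, N = 1.0, 0.7, 1.0
E_p = np.sqrt(m**2 + p**2)

def rho(E, t=0.4, x=0.1, h=1e-5):
    """j^0 = (i/2m)(psi* d_t psi - (d_t psi*) psi) for psi = N exp(-i(E t - p x))."""
    psi = lambda tt: N * np.exp(-1j * (E * tt - p * x))
    dpsi = (psi(t + h) - psi(t - h)) / (2 * h)          # central difference in t
    val = 1j / (2 * m) * (np.conj(psi(t)) * dpsi - np.conj(dpsi) * psi(t))
    return val.real                                      # the combination is real

for E in (E_p, -E_p):
    print(f"E = {E:+.4f}: rho = {rho(E):+.4f}, |N|^2 E/m = {abs(N)**2 * E / m:+.4f}")
```

For the negative energy solution the "density" comes out negative, as stated above.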

It turns out that the classical Klein-Gordon equation can be used to derive a relativistic quantum field theory for spin-0 particles (e.g. pion or Higgs boson), but the quantization requires either the use of path integrals or the full machinery of canonical quantization (not that this would be very hard; we already did it for the electromagnetic field in the QED part of the course). Luckily, there is another equation which we can quantize with the above simple procedure, the Dirac equation, which turns out to describe electrons and other spin-½ particles.

Dirac equation

Dirac ansatz

In 1928, Paul Dirac developed a new field equation for spin-½ particles. His original article is very readable. Dirac's motivation was to derive an equation which would not have the problems of the Klein-Gordon equation. The negative energy states of the Klein-Gordon equation seem to stem from the fact that it is of second order in the time derivative, so his idea was to search for an equation of first order in both \(\frac{\partial}{\partial t}\) and \(\nabla\), starting from the general ansatz \[ i\frac{\partial\psi(\mathbf{x},t)}{\partial t} = H_D \psi(\mathbf{x},t) = (-i \bm\alpha\cdot \nabla+\beta m )\psi(\mathbf{x},t), \] where the four initially unknown quantities \(\alpha^i\) and \(\beta\) commute with \(\partial_i\) (i.e. they do not depend on \(\mathbf x\)) but not with each other. Since they do not commute with each other, they cannot be ordinary numbers; we therefore assume that \(\bm\alpha\) and \(\beta\) are N×N matrices, where the dimension N is as yet unknown. For the matrix products to make sense, \(\psi=\psi(\mathbf{x},t)\) must then be a (complex) N×1 column vector.

Dirac then required that the field \(\psi\) should also satisfy the Klein-Gordon equation, to guarantee that the relativistic dispersion relation \(E^2 = |\mathbf p|^2 + m^2\) holds for the solutions of the equation:

\[ -\frac{\partial^2 \psi}{\partial t^2} = H_D^2 \psi = (-i \bm\alpha\cdot \mathbf \nabla+\beta m )(-i\bm\alpha\cdot \mathbf \nabla+\beta m)\psi. \] Moving towards the covariant formulation, we write the above as \[ \left[ \partial_t^2 - \sum_{i,j=1}^3\alpha^i\alpha^j \partial_i\partial_j - \sum_{i=1}^3i(\alpha^i\beta +\beta\alpha^i) m \partial_i + \beta^2 m^2 \right]\psi = 0. \] The Klein-Gordon equation, on the other hand, can be written as \[ \left( \partial_t^2 - \sum_{i=1}^3\partial_i^2 + m^2 \right)\psi = 0. \] Comparing the two equations above term by term (using the symmetry of \(\partial_i\partial_j\) under \(i\leftrightarrow j\)), we find that Dirac's ansatz satisfies the Klein-Gordon equation if and only if \[ \begin{aligned} \alpha^i\alpha^j + \alpha^j\alpha^i &= 2\delta^{ij},\\ \alpha^i\beta+\beta\alpha^i &= 0,\\ \beta^2 &= 1. \end{aligned} \] Furthermore, we require that the Hamiltonian \(H_D\) is Hermitean: \(H_D = H_D^\dagger\). In the exercises it will be shown that \(\alpha^i\) and \(\beta\) are then hermitean, traceless, even-dimensional, mutually anticommuting matrices. The lowest dimension in which they can be represented is 4, i.e. by 4×4 matrices.

The above requirements do not fix \(\alpha^i\) and \(\beta\) uniquely, so we have some freedom in choosing their representation. There are a few common choices. We take the Dirac-Pauli representation (or basis), with which the non-relativistic limit is particularly simple, \[ \alpha^i = \begin{pmatrix} 0 & \sigma^i \\ \sigma^i & 0 \end{pmatrix},\quad \beta = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}, \] where \(\sigma^i\)'s are the Pauli matrices and \(I\) is the 2×2 identity matrix. The other commonly used choices for \(\alpha^i\) and \(\beta\) matrices go by the names Weyl representation and Majorana representation.
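The conditions derived above can be checked for the Dirac-Pauli matrices in a few lines. This is a minimal sketch assuming NumPy; the variable names are illustrative:

```python
import numpy as np

# Pauli matrices
pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z2 = np.eye(2), np.zeros((2, 2))
I4 = np.eye(4)

# Dirac-Pauli representation of alpha^i and beta
alpha = [np.block([[Z2, s], [s, Z2]]) for s in pauli]
beta = np.block([[I2, Z2], [Z2, -I2]])

def anticomm(A, B):
    """Anticommutator {A, B} = AB + BA."""
    return A @ B + B @ A

alpha_ok = all(np.allclose(anticomm(alpha[i], alpha[j]), 2 * (i == j) * I4)
               for i in range(3) for j in range(3))
mixed_ok = all(np.allclose(anticomm(a, beta), 0) for a in alpha)
beta_ok = np.allclose(beta @ beta, I4)
# Hermitian and traceless, as required
herm_ok = all(np.allclose(M, M.conj().T) and np.isclose(np.trace(M), 0)
              for M in alpha + [beta])
print(alpha_ok, mixed_ok, beta_ok, herm_ok)
```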

Probability current

Because the Hamiltonian \(H_D\) is hermitean, the probability density \(\rho(\mathbf{x},t) = \psi(\mathbf{x},t)^\dagger\psi(\mathbf{x},t)\) is conserved: \[ \begin{aligned} \partial_t\rho = \partial_t(\psi^\dagger\psi) &= (\partial_t \psi)^\dagger\psi + \psi^\dagger (\partial_t \psi)\\ &= -(iH_D \psi)^\dagger\psi - \psi^\dagger (i H_D \psi)\\ &= -\left[(\nabla \psi^\dagger\cdot\bm\alpha)\psi + \psi^\dagger(\bm\alpha\cdot\nabla \psi)\right]\\ &= -\nabla\cdot (\psi^\dagger \bm\alpha\,\psi) \equiv -\nabla\cdot\mathbf j, \end{aligned} \] i.e. \(\rho\) and \(\mathbf j = \psi^\dagger\bm\alpha\,\psi\) obey the continuity equation \(\partial_t\rho + \nabla\cdot\mathbf j = 0\). In going to the third line, the mass terms \(\pm i m\,\psi^\dagger\beta\psi\) cancel because \(\beta\) is hermitean. Also, by definition, \(\rho(\mathbf{x},t)\geq0\). In this sense we have improved upon the Klein-Gordon equation.

Covariant equation of motion

To formulate the Dirac equation in a covariant form, we multiply both sides of the equation by \(\beta\) and define a new set of matrices \[ \gamma^0 = \beta = \begin{pmatrix} I & 0 \\ 0 & -I\end{pmatrix},\quad \gamma^i = \beta \alpha^i = \begin{pmatrix} 0 & \sigma^i \\ -\sigma^i & 0\end{pmatrix}, \] which have the anticommutation relations \[ \{\gamma^\mu, \gamma^\nu\} = 2g^{\mu\nu} 1_4, \] defining a Clifford algebra. Because of this algebraic structure, the \(\gamma\)-matrices have a lot of useful properties. However, we do not discuss these in detail, but we just give one identity that is used in derivations below: \[ (\gamma^\mu)^\dagger=\gamma^0\gamma^\mu\gamma^0. \] This identity is independent of the choice of the representation.
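The Clifford algebra and the hermiticity identity can be confirmed numerically for the Dirac-Pauli \(\gamma\)-matrices. A minimal NumPy sketch (list names are illustrative):

```python
import numpy as np

pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z2 = np.eye(2), np.zeros((2, 2))
g = np.diag([1.0, -1.0, -1.0, -1.0])

# gamma^0 = beta, gamma^i = beta alpha^i in the Dirac-Pauli representation
gamma0 = np.block([[I2, Z2], [Z2, -I2]])
gammas = [gamma0] + [np.block([[Z2, s], [-s, Z2]]) for s in pauli]

# {gamma^mu, gamma^nu} = 2 g^{mu nu} 1_4
clifford = all(
    np.allclose(gammas[mu] @ gammas[nu] + gammas[nu] @ gammas[mu],
                2 * g[mu, nu] * np.eye(4))
    for mu in range(4) for nu in range(4))
# (gamma^mu)^dagger = gamma^0 gamma^mu gamma^0
hermitian_identity = all(np.allclose(G.conj().T, gamma0 @ G @ gamma0) for G in gammas)
print(clifford, hermitian_identity)
```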

In terms of \(\gamma\)-matrices, the Dirac equation becomes \[ \left( i \gamma^\mu \partial_\mu- m \right) \psi(x) = 0. \]

In particle physics, the Feynman slash notation \(\gamma^\mu a_\mu = a\!\!\!/\) is often used, so that we obtain an even more condensed form \[ \left( i {\partial}\mkern-10mu/- m \right) \psi = 0. \]
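The slash notation makes the link to the Klein-Gordon equation quick to see: the anticommutation relations give \(a\!\!\!/\,a\!\!\!/ = a\cdot a\), so applying \((i\partial\!\!\!/ + m)\) to the Dirac equation yields \(-(\partial_\mu\partial^\mu + m^2)\psi = 0\). A NumPy sketch of the identity for an arbitrary four-vector (the helper `slash` is hypothetical):

```python
import numpy as np

pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z2 = np.eye(2), np.zeros((2, 2))
g = np.diag([1.0, -1.0, -1.0, -1.0])
gammas = [np.block([[I2, Z2], [Z2, -I2]])] + [np.block([[Z2, s], [-s, Z2]]) for s in pauli]

def slash(a_up):
    """Feynman slash gamma^mu a_mu, given contravariant components a^mu."""
    a_low = g @ a_up
    return sum(G * a for G, a in zip(gammas, a_low))

p = np.array([1.3, 0.2, -0.5, 0.4])            # arbitrary four-vector p^mu
print(np.allclose(slash(p) @ slash(p), (p @ g @ p) * np.eye(4)))
```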

Note that even though the \(\gamma^\mu\)-matrices carry the Lorentz index \(\mu\) of the partial derivative, they do not transform as contravariant vectors. We have defined them as constant matrices, so they do not change at all under Lorentz transformations. Next, we show how the wavefunction \(\psi(x)\) should transform so that the Dirac equation is Lorentz invariant.

Note: The \(H_D\) defined above is like any other one-particle Hamiltonian: We can replace the non-relativistic Hamiltonian with it and do calculations as before. The only caveat is that its eigenenergy spectrum is not bounded from below, as discussed below. This problem can be addressed only in a many-body formulation.

Properties of \(\gamma\)-matrices

Different representations of the \(\gamma\)-matrices

Choosing the above \(4\times 4\) matrix representation for the \(\gamma\)-matrices means that the eigenvectors \(\psi(x)\) satisfying the Dirac equation are spinors of the form \[ \psi(x) = \begin{pmatrix} \psi_0(x) \\ \psi_1(x) \\ \psi_2(x) \\ \psi_3(x) \end{pmatrix}. \] Below we discuss the form of the eigenfunctions of the Dirac equation. But let us first check how such spinors transform under Lorentz transformations. These 4-component spinors can be built from two 2-component spinors (see the section about Weyl fermions below), and are sometimes called bi-spinors.

Lorentz transformation of spinors

We now derive the transformation law for \(\psi(x)\), by assuming that the form of the Dirac equation is invariant under Lorentz transformations. We can assume that \(\psi\) transforms linearly under Lorentz transformations: \[ \psi(x) \rightarrow \psi'(x') = S(\Lambda)\psi(x), \] where \(S(\Lambda)\) is a 4×4 matrix representation of the Lorentz transformation \(\Lambda\). We know how the partial derivative transforms: \[ \partial_\mu \rightarrow \partial'_\mu = (\Lambda^{-1})^\nu{}_\mu \partial_\nu \] The Dirac equation \[ (i\gamma^\mu\partial_\mu-m)\psi(x)=0 \] transforms into \[ \begin{aligned} 0 = (i\gamma^\mu\partial'_\mu-m)\psi'(x') &= (i\gamma^\mu(\Lambda^{-1})^\nu{}_\mu \partial_\nu-m)S(\Lambda)\psi(x)\\ &= S(\Lambda)\left[iS(\Lambda^{-1})(\Lambda^{-1})^\mu{}_\nu\gamma^\nu S(\Lambda) \partial_\mu -m\right]\psi(x), \end{aligned} \] which has, apart from the factor \(S(\Lambda)\) on the left, the same form as the original equation if \(S(\Lambda)\) obeys the equation \(S(\Lambda^{-1})(\Lambda^{-1})^\mu{}_\nu \gamma^\nu S(\Lambda) = \gamma^\mu\), which is equivalent to \[ S(\Lambda^{-1})\gamma^\mu S(\Lambda) = \Lambda^\mu{}_\nu \gamma^\nu. \] This equation gives the connection between a Lorentz transformation \(\Lambda\) and its spinor representation \(S(\Lambda)\). If \(\Lambda\) is known, it is possible to compute \(S(\Lambda)\).

You might recall from non-relativistic QM that the Pauli matrices obey a similar equation: \[ U^\dagger\sigma^i U = \sum_{j=1}^3 R_{ij} \sigma^j, \] where \(R\) is a rotation matrix for rotation of angle \(\theta\) around the axis \(\mathbf n\) and \(U=\exp(-i\theta\mathbf n\cdot\bm\sigma/2)\) is the corresponding unitary transformation for spinors.

Let us define the commutator of two \(\gamma\)-matrices: \[ \sigma^{\mu\nu} = [\gamma^\mu,\gamma^\nu]. \] For example, a Lorentz boost of rapidity \(\xi\) in the x direction, \[ (\Lambda_1)^\mu{}_\nu = \begin{pmatrix} \cosh\xi & -\sinh\xi & 0 & 0\\-\sinh\xi & \cosh\xi & 0 & 0\\ 0 & 0 & 1 &0\\ 0& 0& 0& 1 \end{pmatrix}, \] is represented by a spinor transformation (left here as an exercise) \[ S(\Lambda_1) = \exp(-\xi \sigma^{01}/4) = \cosh\frac \xi 2-\gamma^0\gamma^1 \sinh\frac \xi 2, \] and a rotation by an angle \(\theta\) around the z-axis (or equivalently, in the xy-plane, which is what the indices of \(\sigma^{\mu\nu}\) refer to) is represented by \[ S(\Lambda_2) = \exp(\theta\sigma^{12}/4)= \begin{pmatrix} \exp(-i\theta\sigma_3/2) & 0\\ 0 & \exp(-i\theta\sigma_3/2) \end{pmatrix}, \] which contains two copies of the same non-relativistic rotation. The above examples suggest that a spinor rotation is unitary, \(S(\Lambda^{-1}_2) = S(\Lambda_2)^\dagger\), but a spinor boost is not: \(S(\Lambda^{-1}_1)\neq S(\Lambda_1)^\dagger\).
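The two spinor transformations can be checked against the defining relation \(S(\Lambda^{-1})\gamma^\mu S(\Lambda) = \Lambda^\mu{}_\nu\gamma^\nu\). The sketch below assumes NumPy, the Dirac-Pauli \(\gamma\)-matrices, and the normalization \(\sigma^{\mu\nu} = [\gamma^\mu,\gamma^\nu]\) used here; to avoid depending on SciPy, the exponentials are evaluated with a hypothetical helper `expm_series` (a truncated Taylor series, adequate for these small matrices). It also confirms that the rotation is unitary while the boost is not:

```python
import numpy as np

pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z2 = np.eye(2), np.zeros((2, 2))
gamma0 = np.block([[I2, Z2], [Z2, -I2]])
gammas = [gamma0] + [np.block([[Z2, s], [-s, Z2]]) for s in pauli]

def expm_series(A, terms=40):
    """Matrix exponential by its Taylor series (fine for small, well-scaled matrices)."""
    result = np.eye(len(A), dtype=complex)
    term = np.eye(len(A), dtype=complex)
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    return result

def sigma(mu, nu):
    """sigma^{mu nu} = [gamma^mu, gamma^nu], normalized as in the text."""
    return gammas[mu] @ gammas[nu] - gammas[nu] @ gammas[mu]

def boost_x(xi):
    L = np.eye(4)
    L[0, 0] = L[1, 1] = np.cosh(xi)
    L[0, 1] = L[1, 0] = -np.sinh(xi)
    return L

def rotation_z(theta):
    L = np.eye(4)
    L[1, 1] = L[2, 2] = np.cos(theta)
    L[1, 2], L[2, 1] = -np.sin(theta), np.sin(theta)
    return L

def represents(S, Lam):
    """Check S^{-1} gamma^mu S = Lambda^mu_nu gamma^nu for all mu."""
    S_inv = np.linalg.inv(S)
    return all(np.allclose(S_inv @ gammas[mu] @ S,
                           sum(Lam[mu, nu] * gammas[nu] for nu in range(4)))
               for mu in range(4))

xi, theta = 0.9, 0.6
S1 = expm_series(-xi * sigma(0, 1) / 4)        # boost
S2 = expm_series(theta * sigma(1, 2) / 4)      # rotation

print(represents(S1, boost_x(xi)), represents(S2, rotation_z(theta)))
print(np.allclose(S1, np.cosh(xi / 2) * np.eye(4) - gamma0 @ gammas[1] * np.sinh(xi / 2)))
print(np.allclose(S2.conj().T @ S2, np.eye(4)), np.allclose(S1.conj().T @ S1, np.eye(4)))
```

The last line prints `True False`: the rotation is unitary, the boost is not.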

To find the inverse transformation \(S(\Lambda)^{-1}=S(\Lambda^{-1})\) we take the hermitean conjugate of the equation relating \(\Lambda\) and \(S(\Lambda)\): \[ S(\Lambda)^\dagger(\gamma^\mu)^\dagger S(\Lambda^{-1})^\dagger = \Lambda^\mu{}_\nu (\gamma^\nu)^\dagger. \] Then we use the identity \((\gamma^\mu)^\dagger=\gamma^0\gamma^\mu\gamma^0\) and multiply from both sides with \(\gamma^0\), using \((\gamma^0)^2=1\): \[ \gamma^0S(\Lambda)^\dagger\gamma^0\gamma^\mu \gamma^0 S(\Lambda^{-1})^\dagger\gamma^0 = \Lambda^\mu{}_\nu \gamma^\nu. \] Comparison with the original equation suggests that \[ S(\Lambda^{-1}) = \gamma^0 S(\Lambda)^\dagger\gamma^0. \] That this is indeed the inverse of \(S(\Lambda)\) can be verified from the general solution \(S(\Lambda)=\exp(\frac 1 2 \omega_{\mu\nu}\sigma^{\mu\nu})\), where \(\omega_{\mu\nu}=-\omega_{\nu\mu}\) is a real antisymmetric matrix which parametrizes the Lorentz transformations.

Dirac adjoint and relativistic scalars

The 4×1 column vectors \(\psi\) act as the kets of the Dirac equation. Given a ket \(\psi\), what is the corresponding bra, i.e. the 1×4 row vector, which would allow us to calculate expectation values by sandwiching operators between bras and kets? A natural first guess would be the hermitean conjugate \(\psi^\dagger\). However, this does not work since, e.g., the bilinear \(\psi^\dagger\psi\) is not a Lorentz invariant scalar: \[ \psi^\dagger\psi \rightarrow \psi'^\dagger\psi'=\psi^\dagger S(\Lambda)^\dagger S(\Lambda) \psi \neq \psi^\dagger\psi, \] because, as we saw above, \(S(\Lambda)\) is not unitary if \(\Lambda\) is a Lorentz boost.

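The solution is to define a Dirac adjoint \[ \bar\psi = \psi^\dagger \gamma^0, \] for which the bilinear \(\bar\psi \psi\) is Lorentz invariant: the relation \(S(\Lambda^{-1}) = \gamma^0 S(\Lambda)^\dagger\gamma^0\) implies \(S(\Lambda)^\dagger\gamma^0 = \gamma^0 S(\Lambda)^{-1}\), so that \[ \bar\psi'\psi' = \psi^\dagger S(\Lambda)^\dagger\gamma^0 S(\Lambda)\psi = \psi^\dagger\gamma^0 S(\Lambda)^{-1}S(\Lambda)\psi = \bar\psi\psi. \]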
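A numerical sketch of this statement, assuming NumPy and using the boost \(S(\Lambda_1) = \cosh\frac\xi2 - \gamma^0\gamma^1\sinh\frac\xi2\) from above on an arbitrary test spinor (the names `dirac_bar` and `psi` are illustrative):

```python
import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
I2, Z2 = np.eye(2), np.zeros((2, 2))
gamma0 = np.block([[I2, Z2], [Z2, -I2]])
gamma1 = np.block([[Z2, s1], [-s1, Z2]])

xi = 1.2
S = np.cosh(xi / 2) * np.eye(4) - gamma0 @ gamma1 * np.sinh(xi / 2)   # spinor boost

psi = np.array([1.0, 0.5j, -0.3, 0.2 + 0.1j])   # arbitrary test spinor
psi_p = S @ psi

def dirac_bar(v):
    """Dirac adjoint bar(v) = v^dagger gamma^0 (as a row vector)."""
    return v.conj() @ gamma0

print(np.isclose(dirac_bar(psi_p) @ psi_p, dirac_bar(psi) @ psi))   # invariant
print(np.isclose(psi_p.conj() @ psi_p, psi.conj() @ psi))           # not invariant
print(np.allclose(np.linalg.inv(S), gamma0 @ S.conj().T @ gamma0))  # S^{-1} = gamma0 S^dagger gamma0
```

The output is `True`, `False`, `True`: only the Dirac bilinear survives the boost unchanged.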

The necessity of defining the adjoint \(\bar\psi\neq\psi^\dagger\) can be traced back to the Clifford algebra and the \((+,-,-,-)\) signature of the metric tensor: since \((\gamma^i)^2 = g^{ii} = -1\), the eigenvalues of the spatial \(\gamma\)-matrices are imaginary (\(\pm i\)). Because of this, not all of the matrices \(S(\Lambda)\) can be unitary.

Sandwiched between the spinors, the \(\gamma\)-matrices effectively transform according to their Lorentz indices. For example, the quantity \(\bar\psi \gamma^\mu \psi\) transforms as a Lorentz vector, \(\bar\psi \gamma^\mu\gamma^\nu \psi\) transforms as a rank-2 contravariant tensor and so on.

Dirac Lagrangian

Plane wave solutions

Let us find the plane wave solutions for the Dirac equation with an ansatz \[ \psi(x) = u(\mathbf p) e^{-i p\cdot x} = u(\mathbf p) e^{-i(E t-\mathbf p\cdot \mathbf x)}, \] where \(u(\mathbf p)\) is a 4-component spinor. Let us see if we still have the negative energy solutions. By the above substitution the Dirac equation becomes \[ E u(\mathbf p) = (\bm\alpha\cdot\mathbf p+\beta m) u(\mathbf p). \] The above can be written in block matrix form as \[ \begin{pmatrix} m-E & \mathbf{p}\cdot\bm{\sigma}\\ \mathbf{p}\cdot\bm{\sigma} & -m-E \end{pmatrix} \begin{pmatrix} u_A(\mathbf p)\\ u_B(\mathbf p) \end{pmatrix} =0, \] where \(u_{A}\) and \(u_{B}\) are 2-component spinors. The eigenvalues satisfy the condition \(\det(H_D-E)=0\).

The determinant condition is explicitly \[ \begin{aligned} 0 &= \begin{vmatrix} m-E & 0 & p_z & p_x-ip_y\\ 0 & m-E & p_x+ip_y & -p_z\\ p_z & p_x-ip_y & -m-E & 0\\ p_x+ip_y & -p_z & 0 & -m-E \end{vmatrix}\\ &= (E^2 - p_x^2-p_y^2-p_z^2- m^2)^2 = (E^2 - |\mathbf p|^2- m^2)^2, \end{aligned} \] which has the solutions \(E = \pm\sqrt{|\mathbf p|^2 + m^2}\). Thus we did not cure the theory of the negative energy solutions. For convenience, we define a positive energy dispersion \(E_{\mathbf p} = \sqrt{|\mathbf p|^2 + m^2}\).
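The same result follows from diagonalizing \(H_D = \bm\alpha\cdot\mathbf p + \beta m\) numerically. A minimal NumPy sketch (the momentum and mass values are arbitrary):

```python
import numpy as np

pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z2 = np.eye(2), np.zeros((2, 2))
alpha = [np.block([[Z2, s], [s, Z2]]) for s in pauli]
beta = np.block([[I2, Z2], [Z2, -I2]])

m = 1.0
p = np.array([0.3, -1.1, 0.8])
H = sum(pi * a for pi, a in zip(p, alpha)) + m * beta   # H_D in momentum space
E_p = np.sqrt(p @ p + m**2)

evals = np.linalg.eigvalsh(H)     # H is Hermitian; eigenvalues in ascending order
print(evals, E_p)
print(np.allclose(evals, [-E_p, -E_p, E_p, E_p]))        # two-fold degenerate +/- E_p
```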

Fig: Spectrum of the Dirac equation with a mass (solid line). The dashed line is the spectrum when \(m=0\).

It might be interesting to compare this to the spectrum in the superconducting state (4th lecture), relating \(m\) and \(\Delta\), or to the avoided crossing in the 5th lecture.

17 Apr 20 (edited 21 Apr 20)

To gain understanding of the form of the solutions, set \(\mathbf p=0\). We find the solutions \[ u_1(0) = N(0)\begin{pmatrix} 1\\0\\0\\0 \end{pmatrix},\quad u_2(0) = N(0)\begin{pmatrix} 0\\1\\0\\0 \end{pmatrix},\quad u_3(0) = N(0)\begin{pmatrix} 0\\0\\1\\0 \end{pmatrix},\quad u_4(0) = N(0)\begin{pmatrix} 0\\0\\0\\1 \end{pmatrix}, \] where \(N(0)\) is a normalization constant. The particle is at rest, so its energy should be composed purely of the rest mass. Indeed, we find for the first two solutions that \(E_1=E_2=m\), which is positive, and for the last two \(E_3=E_4=-m\), which is negative.

For \(E>0\), the block-wise equation can be solved as \[ u(\mathbf p) = \begin{pmatrix} u_A(\mathbf p)\\u_B(\mathbf p) \end{pmatrix} = \begin{pmatrix} u_A(\mathbf p)\\ \frac{\mathbf p\cdot\bm\sigma}{E_{\mathbf p}+m} u_A(\mathbf p)\end{pmatrix}, \] where \(u_A\) is a 2-component spinor which can be chosen freely, reflecting the spin degree of freedom. We choose as a basis the spin-up and spin-down states in the \(z\)-direction: \[ u^{(1)}_A(\mathbf p)=N(\mathbf p)\chi_+ = N(\mathbf p)\begin{pmatrix}1\\0\end{pmatrix},\quad u^{(2)}_A(\mathbf p)=N(\mathbf p)\chi_- = N(\mathbf p)\begin{pmatrix}0\\1\end{pmatrix}. \] The above choice of basis is just a parametrization of the upper component. In general, the lower component has a different direction of spin and the bi-spinor is not an eigenstate of spin.

For \(E<0\), the block-wise equation can be solved as \[ u(\mathbf p) = \begin{pmatrix} u_A(\mathbf p)\\u_B(\mathbf p) \end{pmatrix} = \begin{pmatrix} -\frac{\mathbf p\cdot\bm\sigma}{E_{\mathbf p}+m} u_B(\mathbf p)\\ u_B(\mathbf p)\end{pmatrix}, \] where \(u_B\) is again an arbitrary 2-component spinor. As above, we use for it the basis \(\chi_\pm\) (multiplied by a normalization constant).

For a given momentum \(\mathbf p\), we have four linearly independent solutions \[ u^1(\mathbf p) = N(\mathbf p)\begin{pmatrix} 1\\0\\ \frac{p_z}{E_{\mathbf p}+m} \\ \frac{p_x+ip_y}{E_{\mathbf p}+m}\end{pmatrix},\quad u^2(\mathbf p) =N(\mathbf p)\begin{pmatrix} 0 \\ 1 \\ \frac{p_x-ip_y}{E_{\mathbf p}+m} \\ -\frac{p_z}{E_{\mathbf p}+m}\end{pmatrix}, \] \[ u^3(\mathbf p) = N(\mathbf p)\begin{pmatrix} -\frac{p_z}{E_{\mathbf p}+m} \\ -\frac{p_x+ip_y}{E_{\mathbf p}+m} \\ 1 \\ 0\end{pmatrix},\quad u^4(\mathbf p) = N(\mathbf p)\begin{pmatrix} -\frac{p_x-ip_y}{E_{\mathbf p}+m} \\ \frac{p_z}{E_{\mathbf p}+m}\\ 0 \\ 1\end{pmatrix}, \] which have the energies \(E_1 = E_2 = E_{\mathbf p}>0\) and \(E_3 = E_4 = -E_{\mathbf p}<0\). The normalization \(N(\mathbf p)\) can be chosen in different ways, depending on the context. Different normalizations are defined in the box "Spinor normalization and identities" below, along with identities related to the \(u^s\)'s.
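A quick way to confirm the four spinors is to apply \(H_D = \bm\alpha\cdot\mathbf p + \beta m\) to each of them. This NumPy sketch uses \(N(\mathbf p) = 1\) (unnormalized) and an arbitrary test momentum:

```python
import numpy as np

pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z2 = np.eye(2), np.zeros((2, 2))
alpha = [np.block([[Z2, s], [s, Z2]]) for s in pauli]
beta = np.block([[I2, Z2], [Z2, -I2]])

m = 1.0
px, py, pz = 0.3, -1.1, 0.8
E_p = np.sqrt(px**2 + py**2 + pz**2 + m**2)
H = px * alpha[0] + py * alpha[1] + pz * alpha[2] + m * beta
d = E_p + m

u = {   # the spinors u^1 ... u^4 from the text with N(p) = 1
    1: np.array([1, 0, pz / d, (px + 1j * py) / d]),
    2: np.array([0, 1, (px - 1j * py) / d, -pz / d]),
    3: np.array([-pz / d, -(px + 1j * py) / d, 1, 0]),
    4: np.array([-(px - 1j * py) / d, pz / d, 0, 1]),
}
energies = {1: E_p, 2: E_p, 3: -E_p, 4: -E_p}

print(all(np.allclose(H @ u[k], energies[k] * u[k]) for k in u))
```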

The general solution of the Dirac equation can be written as \[ \psi(\mathbf x,t) = \int\frac{{\rm d}^3\mathbf p}{(2\pi)^3} \left[ \sum_{s=1,2} a_s(\mathbf p)u^s(\mathbf p) e^{-i(E_{\mathbf p}t-\mathbf p\cdot\mathbf x)} + \sum_{s=3,4} a_s(\mathbf p)u^s(\mathbf p) e^{-i(-E_{\mathbf p}t-\mathbf p\cdot\mathbf x)}\right], \] where \(a_s(\mathbf p)\)'s are complex number coefficients. The solution can be written in a more symmetric way by defining the antiparticle spinors \[ \begin{aligned} v^1(\mathbf p) = u^4(-\mathbf p) = N(\mathbf p)\begin{pmatrix} \frac{\bm\sigma\cdot\mathbf p}{E_{\mathbf p}+m}\chi_- \\ \chi_- \end{pmatrix},\\ v^2(\mathbf p) = u^3(-\mathbf p) = N(\mathbf p)\begin{pmatrix} \frac{\bm\sigma\cdot\mathbf p}{E_{\mathbf p}+m}\chi_+ \\ \chi_+ \end{pmatrix}, \end{aligned} \] and making the change of variables \(\mathbf p\rightarrow -\mathbf p\) on the second sum: \[ \psi(\mathbf x,t) = \int\frac{{\rm d}^3\mathbf p}{(2\pi)^3} \sum_{s=1,2} \left[ a_s(\mathbf p)u^s(\mathbf p) e^{-i p\cdot x} + b_s^*(\mathbf p)v^s(\mathbf p) e^{i p\cdot x}\right], \] where \(p = (E_{\mathbf p},\mathbf p)\) is the four-momentum of the positive energy solution, and the coefficients are \(b^*_{1/2}(\mathbf p) = a_{4/3}(-\mathbf p)\).
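Since \(i\gamma^\mu\partial_\mu e^{ip\cdot x} = -\gamma^\mu p_\mu e^{ip\cdot x}\), the antiparticle spinors satisfy \((\gamma^\mu p_\mu + m)v^s(\mathbf p) = 0\), while the particle spinors satisfy \((\gamma^\mu p_\mu - m)u^s(\mathbf p) = 0\). A NumPy sketch checking this for the spinors above (with \(N(\mathbf p)=1\) and an arbitrary test momentum):

```python
import numpy as np

pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2, Z2 = np.eye(2), np.zeros((2, 2))
g = np.diag([1.0, -1.0, -1.0, -1.0])
gammas = [np.block([[I2, Z2], [Z2, -I2]])] + [np.block([[Z2, s], [-s, Z2]]) for s in pauli]

m = 1.0
p3 = np.array([0.3, -1.1, 0.8])               # spatial momentum
E_p = np.sqrt(p3 @ p3 + m**2)
p_low = g @ np.concatenate(([E_p], p3))       # covariant p_mu = (E_p, -p)
pslash = sum(G * pm for G, pm in zip(gammas, p_low))

sp = sum(pi * s for pi, s in zip(p3, pauli))  # sigma . p
chi_plus = np.array([1.0, 0.0])
chi_minus = np.array([0.0, 1.0])

u1 = np.concatenate((chi_plus, sp @ chi_plus / (E_p + m)))
v1 = np.concatenate((sp @ chi_minus / (E_p + m), chi_minus))
v2 = np.concatenate((sp @ chi_plus / (E_p + m), chi_plus))

I4 = np.eye(4)
print(np.allclose((pslash - m * I4) @ u1, 0))                   # particle spinor
print(np.allclose((pslash + m * I4) @ v1, 0),
      np.allclose((pslash + m * I4) @ v2, 0))                   # antiparticle spinors
```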

Stueckelberg-Feynman interpretation

Spinor normalization and identities
