Relativistic quantum mechanics
Learning goals for this week:
- General Lorentz invariance
- Why Schrödinger equation is not Lorentz invariant
- Relativistic extensions: Klein-Gordon equation and Dirac equation
- Dirac equation: spinor solutions (interpretation in the next lecture)
The basic postulates of special relativity are: (i) the laws of nature are identical (invariant) in all inertial frames of reference, and (ii) the speed of light is the same in all such frames. A consequence of these postulates is that there is no strict distinction between time and space, but that space and time dimensions appear different to observers moving at different speeds.
The Schrödinger equation is clearly incompatible with special relativity: it is of first order in the time derivative, but of second order in the spatial derivatives. Our task in this part of the course is to find relativistic generalizations of the Schrödinger equation.
To accomplish this, let us first remind ourselves of the mathematical formalism of special relativity. For a more detailed description of special relativity, see David Tong's lecture notes.
Elements of special relativity
In Minkowski space, one considers space and time coordinates together, as one four-vector \(x^\mu\): \[ x \equiv\begin{pmatrix} c t \\ x \\ y \\ z \end{pmatrix} \equiv\begin{pmatrix} x^{0} \\ x^{1} \\ x^{2} \\ x^{3} \end{pmatrix} \equiv \begin{pmatrix} x^{0} \\ \mathbf{x}\end{pmatrix} %\equiv\mqty(x^{0}, \mathbf{x}^{\intercal})^{\intercal}, \] where \(c\) is the speed of light and \(\mu\in\{0,1,2,3\}\) is the Lorentz index. Strictly speaking, \(x^\mu\) is the \(\mu\)th component of the vector \(x\). Often, however, \(x^\mu\) refers to the whole vector. From the context, it should be clear what one means.
We use Greek letters to denote Lorentz indices (\(\mu,\nu,...\in\{0,1,2,3\}\)). Roman letters are used to denote spatial indices (\(i,j,...\in\{1,2,3\}\)).
The location of the indices is important: \(x^\mu\), with an upper index, is a contravariant vector. Correspondingly, \(y_\mu\), with a lower index, is a covariant vector. In relativistic algebra, the difference between the two types of vectors is in how they transform under a change of basis; covariant vectors change along with the basis, whereas contravariant vectors change in the inverse way. Mathematically, contravariant vectors are elements of a vector space \(V\) (i.e. vectors), and covariant vectors are elements of the dual space \(V^*\) (i.e. dual vectors).
Metric tensor
We can lower the indices with the covariant metric tensor \[ g_{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{pmatrix}, \] so that with every contravariant four-vector \(A^\mu\), we associate a covariant four-vector \[ A_\mu = g_{\mu\nu} A^\nu. \] Above, we introduced Einstein's summation convention, in which repeated Lorentz indices are summed over. The notation is called Einstein notation.
With the above metric tensor, the components of the contravariant/covariant vectors are related by \(A^0 = A_0\) and \(A^i = -A_i\).
Similarly, we can lower an index in any higher-order tensor \(T\): \[ T^{\mu}{}_{\nu} = g_{\nu\kappa} T^{\mu\kappa}. \]
Sometimes we have the opposite situation: we have a covariant vector \(A_\mu\) and we would like to find the corresponding contravariant vector. This can be done with the contravariant metric tensor \(g^{\mu\nu}\): \[ A^\mu = g^{\mu\nu} A_\nu. \] We want this definition to be in harmony with the definition \(A_\nu = g_{\nu\kappa} A^\kappa\). Thus \[ A^\mu= g^{\mu\nu} g_{\nu\kappa} A^\kappa = g^\mu_{\phantom{\mu}\kappa} A^\kappa. \] Since this is true for every vector, \(g^\mu_{\phantom{\mu}\kappa}\) must be an identity in the sense that \[ g^\mu_{\phantom{\mu}\kappa} = \delta^\mu_\kappa,\quad\text{where}\quad \delta^\mu_\kappa = \begin{cases} 1,\; \mu=\kappa,\\ 0,\; \mu\neq\kappa. \end{cases} \] We conclude that the covariant and contravariant metric tensors are the inverses of each other (in the matrix sense), \[ g^{\mu\nu} = g^{-1}_{\mu\nu} = g_{\mu\nu}, \] where the second step follows because in this case \(g_{\mu\nu}\) is its own inverse. This step does not generalize from special relativity to a curved space-time.
We also note that due to the symmetry of the metric tensor (\(g_{\mu\nu} = g_{\nu\mu}\)), it is not important which index we raise: \[ g_\mu^{\phantom{\mu}\kappa} = g_{\mu\nu}g^{\nu\kappa} = g_{\nu\mu}g^{\kappa \nu} = g^\kappa_{\phantom{\kappa}\mu} = \delta^\kappa_\mu. \] For a general tensor, \(A_{\mu}^{\phantom{\nu}\nu} \neq A_{\phantom{\nu}\mu}^\nu\).
Inner product
As the name metric tensor suggests, we use \(g\) to define an inner product, which gives a notion of frame independent distance between two points in the Minkowski space: \[ A\cdot B \equiv g_{\mu\nu} A^\mu B^\nu = A_\mu B^\mu = A^\mu B_\mu. \] Separating the temporal and spatial parts, the Minkowski inner product is \[ A\cdot B = A^0 B^0 - \mathbf{A} \cdot \mathbf{B}, \] where the dot product between the spatial vectors is the usual Euclidean inner product. Notice the minus sign!
Lorentz transformations
A linear coordinate transformation is a real-valued 4x4 matrix \(\Lambda\) such that \[ x^\mu \rightarrow x^{\prime\mu} = {\Lambda^\mu}_\nu x^\nu. \] Here \(x^\mu\) are the coordinates in the original coordinate system and \(x'^\mu\) are the coordinates after the transformation. The transformation can also be expressed as a derivative between the coordinates in transformed and original coordinate systems: \[ \Lambda^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu}. \]
Lorentz transformations are defined as coordinate transformations which leave the inner product between four-vectors invariant, i.e. \[ x^\mu g_{\mu\nu} y^\nu = x'^\alpha g_{\alpha\beta} y'^\beta = x^\mu {\Lambda^\alpha}_\mu g_{\alpha\beta} {\Lambda^\beta}_\nu y^\nu, \] for all \(x\) and \(y\). These transformations form a group, known as the Lorentz group. Without reference to the vectors \(x\) and \(y\), the condition which defines the Lorentz group is \[ g_{\mu\nu} = {\Lambda^\alpha}_\mu g_{\alpha\beta} {\Lambda^\beta}_\nu. \]
By contracting both sides with \(g^{\kappa\mu}\), we obtain \[ \delta_{\nu}^\kappa = {\Lambda_\beta}^\kappa {\Lambda^\beta}_\nu, \] so \({\Lambda_\nu}^\mu = {(\Lambda^{-1})^\mu}_\nu\) is the matrix inverse of \({\Lambda^\nu}_\mu\), by which we mean that we can write the above equation in terms of matrix multiplication as \[ g=\Lambda^\intercal g \Lambda. \] The matrices are not defined systematically with regard to upper/lower indices; the components of \(\Lambda\) are \([\Lambda]_{\mu\nu}=\Lambda^\mu{}_\nu\), whereas the components of \(g\) are \([g]_{\mu\nu}=g_{\mu\nu}=g^{\mu\nu}\). From this equation we see that the matrix \(\Lambda\) has the inverse \[ \Lambda^{-1} = g^{-1}\Lambda^\intercal g=(g\Lambda g)^\intercal, \] which can be written in terms of the components as \[ (\Lambda^{-1})^\mu{}_\nu \equiv [\Lambda^{-1}]_{\mu\nu} = [(g \Lambda g)^\intercal]_{\mu\nu}= [g \Lambda g]_{\nu\mu} = [g]_{\nu\alpha} [\Lambda]_{\alpha\beta} [g]_{\beta\mu} = g_{\nu\alpha} \Lambda^\alpha{}_\beta g^{\beta\mu} = \Lambda_\nu{}^\mu. \]
We now show that the full Lorentz group divides into four distinct parts. The determinant taken from both sides of the above equation is \[ \det g = (\det \Lambda)^2 \det g. \] So we find that \(\det \Lambda =\pm 1\). As these are the only two allowed values, the determinant cannot be smoothly deformed from 1 to -1. These sets of transformations are thus in some sense disjoint from each other.
Consider then the \(00\)-component of the above equation: \[ 1 = g_{00} = {\Lambda^\alpha}_0 g_{\alpha\beta} {\Lambda^\beta}_0 = ({\Lambda^0}_0)^2 - \sum_{i=1}^3 ({\Lambda^i}_0)^2. \] As \(\Lambda\) is a real matrix, the sum of squares is non-negative and we find that \(({\Lambda^0}_0)^2 \geq 1\). This can be satisfied either with \({\Lambda^0}_0\geq 1\) or with \({\Lambda^0}_0\leq -1\). The latter transformations invert the direction of time.
The group of all Lorentz transformations divides into four connected components:
- \(\det \Lambda = 1\) and \({\Lambda^0}_0 \geq 1\). These are the proper Lorentz transformations. The simplest example is the identity transformation \(x^\mu \rightarrow x^\mu\).
- \(\det \Lambda = -1\) and \({\Lambda^0}_0 \leq -1\). Example: \((x^0,\mathbf x)\rightarrow (-x^0,\mathbf x)\), i.e. inversion of time.
- \(\det \Lambda = -1\) and \({\Lambda^0}_0 \geq 1\). Example: \((x^0,\mathbf x)\rightarrow (x^0,-\mathbf x)\), i.e. inversion of space.
- \(\det \Lambda = 1\) and \({\Lambda^0}_0 \leq -1\). Example: \((x^0,\mathbf x)\rightarrow (-x^0,-\mathbf x)\), i.e. inversion of both time and space.
In the following, unless stated otherwise, by Lorentz transformations we mean the proper transformations, which do not invert time (time reversal transformation) or space (parity transformations).
Proper rotations
Pure rotations form a subgroup of the Lorentz group. They are represented by the matrices \[ \Lambda_R = \begin{pmatrix} 1 & \mathbf 0^\intercal \\ \mathbf 0 & R \end{pmatrix}, \] where \(R\) is a \(3\times 3\) rotation matrix. For example, a rotation by angle \(\theta\) around the z-axis is given by the Lorentz transformation \[ \Lambda_R = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & \cos\theta & -\sin\theta & 0\\ 0 & \sin\theta & \cos\theta & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}. \]
Let us check that the four-vector inner product is invariant under rotations. The action on a coordinate four-vector is \[ x^\mu \rightarrow x'^\mu = (\Lambda_R)^\mu{}_\nu x^\nu = \begin{pmatrix} x^0 \\ R\,\mathbf x \end{pmatrix}. \] The inner product between four-vectors is \[ x'\cdot y' = x'^0 y'^0 - \mathbf x'\cdot\mathbf y' = x^0 y^0 - (R\mathbf x)\cdot(R\mathbf y) = x^0 y^0 - \mathbf x\cdot\mathbf y = x\cdot y, \] where we used the fact that for proper rotations \(R^\intercal R = I\). Also \(\det\Lambda_R = \det R = 1\) and \((\Lambda_R)^0{}_0 = 1\), so this is indeed a proper Lorentz transformation.
Lorentz boosts
Apart from rotations, the other basic type of Lorentz transformation is a boost, a transformation to a frame moving at constant velocity relative to the original frame. Let \(K\) and \(K'\) be two inertial frames such that at \(t=t'=0\) their origins and coordinate axes coincide. Let us then assume that \(K'\) moves with a velocity \(v\) along the \(x\)-axis relative to \(K\). In a course on special relativity we have learned that the coordinates in the two frames are related by \[ ct' = \gamma\left(ct - \frac v c x\right),\quad x' = \gamma\left(x - \frac v c\, ct\right),\quad y'=y,\quad z'=z, \] where \(\gamma = 1/\sqrt{1-v^2/c^2}\). The transformation between the two frames is represented by the matrix \[ \Lambda^\mu{}_\nu = \begin{pmatrix} \gamma & -\gamma v/c & 0 & 0\\ -\gamma v/c & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}, \] which can, like the rotation above, be verified to satisfy the conditions of a proper Lorentz transformation. Sometimes it is useful to parametrize the boost by the so-called rapidity \(\xi\), defined by \(\tanh\xi = v/c\). The above transformation then takes the form \[ \Lambda^\mu{}_\nu = \begin{pmatrix} \cosh\xi & -\sinh\xi & 0 & 0\\ -\sinh\xi & \cosh\xi & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}. \]
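As a quick numerical sanity check (a minimal sketch using numpy; the function and variable names are our own choices), one can verify that such a boost matrix satisfies the defining condition \(g=\Lambda^\intercal g\Lambda\), has unit determinant, and does not invert time:

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])        # Minkowski metric, signature (+,-,-,-)

def boost_x(xi):
    """Lorentz boost of rapidity xi along the x-axis."""
    L = np.eye(4)
    L[0, 0] = L[1, 1] = np.cosh(xi)
    L[0, 1] = L[1, 0] = -np.sinh(xi)
    return L

L = boost_x(0.7)                            # arbitrary rapidity
print(np.allclose(L.T @ g @ L, g))          # True: inner product preserved
print(np.isclose(np.linalg.det(L), 1.0))    # True: det = +1
print(L[0, 0] >= 1)                         # True: no time inversion
```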
All other proper Lorentz transformations can be generated as combinations of boosts and rotations. The larger symmetry group which also includes space-time translations is known as the Poincaré group.
Transformations of contravariant and covariant vectors
Above, we defined that the contravariant vectors transform as \[ x'^\mu = {\Lambda^\mu}_\nu x^\nu. \]
By using the metric tensor to manipulate the location of indices, we can prove that covariant vectors transform inversely under Lorentz transformations: \[ \begin{aligned} x'_\mu &\equiv g_{\mu\nu} x'^\nu = g_{\mu\nu} {\Lambda^\nu}_\kappa x^\kappa = g_{\mu\nu} {\Lambda^\nu}_\kappa g^{\kappa\lambda} x_\lambda \\ &= \Lambda_\mu{}^\lambda x_\lambda = (\Lambda^{-1})^\lambda{}_\mu x_\lambda. \end{aligned} \] The example below shows geometrically why the covariant vector transforms in an inverse way. It also shows mathematically what we mean by a transformation.
Here we try to make the difference between contravariant and covariant vectors more concrete by using 2D rotations as an example. A transformation between two coordinate systems in physics translates to a basis transformation in linear algebra. The transformation is passive, which means that physically nothing changes, but that we adopt a different way of describing the situation. The alternative to a passive transformation would be an active transformation, which would correspond e.g. to physically rotating some object in a fixed coordinate system. Active transformations are useful in the description of rotating bodies. Mathematically there is no difference between the two kinds of transformations; the difference is only in the interpretation of the mathematics.
Also, the difference between contravariant and covariant vectors is not particular to Minkowski space, but exists in any vector space. For an illustration, it is then enough to consider vectors on a plane where the only proper transformations between the orthonormal bases are rotations. Let us assume that we have two bases, \(B=(\hat e_1,\hat e_2)\) and \(B'=(\hat e'_1,\hat e'_2)\), where \(B'\) is obtained by rotating the basis vectors of \(B\) by angle \(\theta\) (see figure below): \[ \begin{aligned} \hat e_1' &= \hat e_1 \cos\theta - \hat e_2\sin\theta = (\hat e_1, \hat e_2)\begin{pmatrix} \cos\theta \\ -\sin\theta\end{pmatrix},\\ \hat e_2' &= \hat e_1 \sin\theta + \hat e_2\cos\theta = (\hat e_1, \hat e_2)\begin{pmatrix} \sin\theta \\ \cos\theta\end{pmatrix}, \end{aligned} \] where we chose to represent the basis vectors as row vectors. The above two equations can be combined into a single matrix equation \[ (\hat e_1', \hat e_2') = (\hat e_1, \hat e_2)\begin{pmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \phantom{+}\cos\theta\end{pmatrix} = (\hat e_1, \hat e_2) R^{-1}, \] which we would write in the covariant notation as \(\hat e'_\mu = (R^{-1})^\nu{}_\mu\hat e_\nu\). Here \(R\) is a rotation matrix for a rotation of angle \(\theta\) to positive direction. The transformation for the basis is a covariant transformation.
Let us now define a vector which has components \((x^1,x^2)^\intercal\) in the basis \(B\) (see the figure below): \[ \vec x = x^1 \hat{e}_1 + x^2 \hat{e}_2 = (\hat e_1, \hat e_2)\begin{pmatrix} x^1 \\ x^2\end{pmatrix} = \hat e_\mu x^\mu \] The vector \(\vec x\) itself is independent of the choice of basis, but its coordinates \(\mathbf x = (x^1, x^2)^\intercal\) do depend on it. In the rotated basis \(B'\), \(\vec x\) is expressed as \[ \vec x = x'^1 \hat{e}'_1 + x'^2 \hat{e}'_2 = (\hat e'_1, \hat e'_2)\begin{pmatrix} x'^1 \\ x'^2\end{pmatrix} = \hat e'_\mu x'^\mu. \] Equating the expressions for \(\vec x\) in bases \(B\) and \(B'\), and using the basis transformation law \((\hat e_1', \hat e_2')= (\hat e_1, \hat e_2) R^{-1}\), we find that the coordinates transform as: \[ \begin{pmatrix} x'^1 \\ x'^2\end{pmatrix} = R \begin{pmatrix} x^1 \\ x^2\end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix} x^1 \\ x^2\end{pmatrix}. \] We call this a contravariant transformation, since it transforms against (contra-varies) the basis transformation.
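A minimal numerical illustration of this passive transformation (a sketch using numpy; the angle and the sample coordinates are arbitrary choices): the basis transforms with \(R^{-1}\), the coordinates with \(R\), and the coordinate-independent vector \(\vec x\) is unchanged.

```python
import numpy as np

theta = 0.4                                       # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # contravariant (coordinate) transformation

E = np.eye(2)                  # columns are the original basis vectors e_1, e_2
E_new = E @ np.linalg.inv(R)   # covariant (basis) transformation

x = np.array([1.0, 2.0])       # coordinates in the original basis
x_new = R @ x                  # coordinates in the rotated basis

# the coordinate-independent vector is the same in both descriptions
print(np.allclose(E @ x, E_new @ x_new))          # True
```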
When we say that something transforms as a co/contravariant vector, we are abusing the terminology a bit. To be precise, we should say that it transforms like the coordinates of a co/contravariant vector.
Fields add another layer of complexity to transformations. Instead of a single scalar, vector or a tensor, we have a function over spacetime, so we have to transform both the field components and its argument (coordinates).
For concreteness, let us consider a vector field \(\vec x \mapsto \vec A(\vec x)\), which associates with each point \[\vec x=\hat e_\mu x^\mu=\hat e'_\mu x'^\mu,\] some vector \[\vec A(\vec x) = \hat e_\mu A^\mu(\vec x) = \hat e'_\mu A'^\mu(\vec x).\] Here \(\vec x\) and \(\vec A\) are coordinate-independent vectors and \(x^\mu\) and \(A^\mu\) are their coordinates in the basis \(B= (\hat e_\mu)\). \(x'^\mu\) and \(A'^\mu\) are the coordinates of the same vectors in some other basis \(B'=(\hat e'_\mu)\). For illustration, see the collapsible box above.
In the coordinate basis \(B\) we write the above field as \[A^\mu(x)\equiv A^\mu(\hat e_\nu x^\nu) = A^\mu(\vec x),\] where \(x\) is the coordinate vector \(x=x^\mu\). We omit the Lorentz index on the coordinate argument, since it should not be contracted with other indices, e.g. with the index \(\mu\) labelling the vector components of the field. In the coordinate basis \(B'\) we define similarly \[A'^\mu(x') \equiv A'^\mu(\hat e'_\nu x'^\nu) = A'^\mu(\vec x).\]
Now \(x' = \Lambda x\), so the transformation rule for both the field components and its argument becomes \[ \begin{aligned} A'^\mu(x') &= A'^\mu(\vec x) = \Lambda^\mu{}_\nu A^\nu(\vec x)\\ &= \Lambda^\mu{}_\nu A^\nu(x) = \Lambda^\mu{}_\nu A^\nu(\Lambda^{-1} x'). \end{aligned} \] On the first line we transform the vector field components (\(A^\mu\rightarrow A'^\mu\)) and on the second line the coordinates \(x\) (in \(B\)) and \(x'\) (in \(B'\)) are related to each other.
We can write the above transformation law in a concise form \[ A^\mu(x) \rightarrow A'^\mu(x') = \Lambda^\mu{}_\nu A^\nu(x). \] The above was for a contravariant vector field. For a scalar field we only transform the coordinate: \[ \phi(x)\rightarrow \phi'(x') = \phi(x), \] so that \(\phi'(x) = \phi(\Lambda^{-1} x)\). In addition to scalars, vectors and tensors, a quantum theory admits also another kind of geometric entity: a spinor, which has its own transformation law. Spinors are discussed below.
Partial derivatives
Let us define a partial derivative with respect to the components of a contravariant vector \(x^\mu\): \[ \partial_\mu = \frac{\partial}{\partial x^\mu} %= \left( \frac 1 c \frac {\partial}{\partial t},\frac {\partial}{\partial x},\frac {\partial}{\partial y},\frac {\partial}{\partial z}\right) = \left( \frac 1 c \frac {\partial}{\partial t},\nabla\right), \] which acts on \(x^\mu\) as \[ \partial_\nu x^\mu = \frac{\partial x^\mu}{\partial x^\nu} = \delta_\nu^\mu. \] How does \(\partial_\nu\) transform under Lorentz transformations? From the requirement \(\partial'_\nu x'^\mu = \delta^\mu_\nu\) for the transformed derivative, it can be verified that it transforms as a covariant vector, just as the notation suggests: \[ \partial'_\mu = \frac{\partial}{\partial x'^\mu} = (\Lambda^{-1})^\nu{}_\mu \partial_\nu. \]
Similarly, we define a partial derivative with respect to a covariant vector: \[ \partial^\mu = \frac{\partial}{\partial x_\mu} = \left( \frac 1 c \frac {\partial}{\partial t},-\nabla\right)^\intercal, \] where the extra minus sign is required to cancel the minus signs in the definition of \(x_\mu\) in order to have \[ \partial^\nu x_\mu = \frac{\partial x_\mu}{\partial x_\nu} = \delta^\nu_\mu. \] This derivative transforms as a contravariant vector: \[ \partial'^\mu = \frac{\partial}{\partial x'_\mu} = \Lambda^\mu{}_\nu \partial^\nu. \]
Note: Be careful with the signs! Here, no minus signs appear: \[ \begin{aligned} \partial_\mu A^\mu &= \partial_0 A^0 + \partial_1 A^1 + \partial_2 A^2 + \partial_3 A^3\\ &= \frac 1 c \frac{\partial A^0}{\partial t} + \nabla\cdot \mathbf A = \partial^\mu A_\mu \end{aligned} \] Unlike here, where we do get a minus sign: \[ \begin{aligned} \Box = \partial_\mu \partial^\mu &= \partial_0 \partial^0 + \partial_1 \partial^1 + \partial_2 \partial^2 + \partial_3 \partial^3\\ &= \partial_0 \partial_0 - \partial_1 \partial_1 - \partial_2 \partial_2 - \partial_3 \partial_3\\ &= \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2. \end{aligned} \]
Four-momentum
Let us consider the relativistic generalization of momentum for a particle moving at velocity \(\mathbf v\) in the frame \(K\). From Noether's theorem we learn that energy is the conserved quantity associated with time translations, whereas momentum is the conserved quantity associated with spatial translations. Thus we can assume that the relativistic generalization of momentum, the four-momentum, has the form \[ p^\mu = \begin{pmatrix} E/c \\ \mathbf p \end{pmatrix}, \] but we do not know immediately the expressions for \(E\) and \(\mathbf p\).
However, we do know that the particle is at rest in the frame \(K'\) moving at velocity \(\mathbf v\) relative to \(K\). In that frame the particle has no 3-momentum, and thus we know that the 4-momentum is of the form \[ p'^\mu = \begin{pmatrix} mc \\ \mathbf 0 \end{pmatrix}. \] The fact that we put the mass \(m\) into the \(0\)-component follows from dimensional analysis. Here we also require that \(m>0\), since a massless particle does not have a rest frame. To find the four-momentum in the frame \(K\), we make a Lorentz boost from \(K'\) to \(K\). (Do this as an exercise!) The result is \[ p^\mu = \begin{pmatrix} \gamma m c \\ \gamma m \mathbf v \end{pmatrix} \equiv \begin{pmatrix} E/c \\ \mathbf p \end{pmatrix},\quad\text{i.e.}\quad E = \gamma m c^2,\quad \mathbf p = \gamma m \mathbf v. \]
The square of the 4-momentum is a Lorentz invariant quantity, \[ p\cdot p = p_\mu p^\mu = \frac{E^2}{c^2} - |\mathbf p|^2 = m^2 c^2. \] In the rest frame the energy is positive, so we find the relativistic dispersion of a free particle with mass \(m\): \[ E = \sqrt{m^2 c^4 + |\mathbf p|^2 c^2}. \] A proper generalization of the Schrödinger equation should reproduce this dispersion.
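As a small numerical illustration (a sketch in numpy; we set \(c=1\) and pick arbitrary values for \(m\) and \(v\)), boosting the rest-frame four-momentum reproduces \(E=\gamma m\), \(\mathbf p=\gamma m\mathbf v\) and the invariant \(p\cdot p = m^2\):

```python
import numpy as np

m, v = 1.0, 0.6                       # arbitrary sample values, units with c = 1
gamma = 1.0/np.sqrt(1.0 - v**2)

# boost taking the rest frame K' back to K (the particle moves at +v in K)
L = np.array([[gamma, gamma*v, 0, 0],
              [gamma*v, gamma, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])

p_rest = np.array([m, 0, 0, 0])       # four-momentum in the rest frame
p = L @ p_rest                        # = (gamma*m, gamma*m*v, 0, 0) = (E, p_x, 0, 0)

g = np.diag([1.0, -1.0, -1.0, -1.0])
print(p @ g @ p)                      # ~ m**2: E^2 - |p|^2 = m^2
```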
Klein-Gordon equation
Note: From now on, we work in natural units \(\hbar=c=1\).
Recall how for a nonrelativistic system the Schrödinger equation \[ i \frac{\partial \psi(\mathbf{x},t)}{\partial t} = \hat H \psi(\mathbf{x},t), \] could be obtained from a classical Hamiltonian \(H(\mathbf{p},\mathbf{x})=\frac{|\mathbf p|^2}{2m} +V(\mathbf{x})\) by replacing \(\mathbf{p}\) and \(\mathbf{x}\) by operators which obey the canonical commutation relations. The form of the Schrödinger equation in position representation suggests that we effectively did the replacements \(E\rightarrow i\partial_t\) and \(\mathbf{p}\rightarrow -i\nabla\).
If we try to do the same replacement for the relativistic dispersion \(E=\sqrt{m^2 + |\mathbf p|^2}\), we fail. Even if we manage to interpret the square root operator \(\sqrt{m^2 - \nabla^2}\) meaningfully (e.g. by Fourier transform or series expansion), we end up with a complicated nonlocal operator.
If, instead, we start from the square of the above expression, \(E^2-|\mathbf p|^2=m^2\), we avoid such problems. We make the replacements \(E^2\rightarrow -\partial_t^2\) and \(|\mathbf p|^2\rightarrow -\nabla^2\) and apply the resulting operator on a scalar field \(\psi(\mathbf{x},t)\) to obtain the Klein-Gordon equation \[ \left( \frac{\partial^2}{\partial t^2}- \nabla^2 + m^2\right)\psi(\mathbf{x},t) = 0, \] which can be written in Einstein notation as \[ \left( \partial_\mu\partial^\mu + m^2\right)\psi(\mathbf{x},t) = 0. \] The Klein-Gordon equation is manifestly Lorentz invariant. So far everything is good. As we are attempting to generalize the Schrödinger equation, we hope to be able to interpret \(\psi\) as the wavefunction of a particle.
Like the Schrödinger equation, the Klein-Gordon equation has plane-wave solutions: \[ \psi_{\mathbf p}(\mathbf{x},t) = \mathcal N e^{-i p\cdot x} = \mathcal N e^{-i (E t-\mathbf{p}\cdot\mathbf{x}) }. \] Let us fix the momentum \(\mathbf p\) to some value and solve for the energy of this solution. Substituting the above into the Klein-Gordon equation, we find \[ \left( \frac{\partial^2}{\partial t^2}- \nabla^2 + m^2\right)\psi_{\mathbf p}(\mathbf{x},t) = \left( -E^2 + |\mathbf p|^2 + m^2\right)\psi_{\mathbf{p}}(\mathbf{x},t) = 0, \] which has two solutions: \(E = \pm\sqrt{m^2+|\mathbf p|^2}\). The positive energy solution is what we expected, but what are we to make of the negative energy solution?
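As a quick symbolic cross-check of this substitution (a minimal sketch using sympy; the symbol names are our own), applying the Klein-Gordon operator to the plane wave indeed produces the factor \(-E^2+|\mathbf p|^2+m^2\):

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z', real=True)
E, px, py, pz, m = sp.symbols('E p_x p_y p_z m', real=True)

psi = sp.exp(-sp.I*(E*t - px*x - py*y - pz*z))   # plane wave
kg = (sp.diff(psi, t, 2)
      - sp.diff(psi, x, 2) - sp.diff(psi, y, 2) - sp.diff(psi, z, 2)
      + m**2*psi)                                # Klein-Gordon operator acting on psi

print(sp.simplify(kg/psi))   # -E**2 + m**2 + p_x**2 + p_y**2 + p_z**2
```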
It turns out the negative energy solutions make the theory unstable. The energy of a particle is not bounded from below, which means that there is no ground state. If we were to add a perturbation (e.g. a potential), it would couple the negative and positive energy states, and the particle could transition to lower and lower energy states.
The negative energies also force us to abandon the hope of interpreting \(\psi\) as a wavefunction. As with the non-relativistic treatment, one obtains the continuity equation for the probability 4-current from the Klein-Gordon equation and its complex conjugate: \[ \partial_\mu j^\mu =0,\quad\text{with}\quad j^\mu=(\rho,\mathbf j)=\frac{i}{2m}\left( \psi^*\partial^\mu\psi -(\partial^\mu\psi^*)\psi \right). \] The "probability density" \(\rho\) for a plane-wave solution is \[ \rho = |\mathcal N|^2 \frac{E}{m}, \] which is negative if \(E<0\). The conclusion is that we cannot interpret \(\rho\) as a probability density and thus \(\psi\) is not a wavefunction.
It turns out that the classical Klein-Gordon equation can be used to derive a relativistic quantum field theory for spin-0 particles (e.g. pion or Higgs boson), but the quantization requires either the use of path integrals or the full machinery of canonical quantization (not that this would be very hard; we already did it for the electromagnetic field in the QED part of the course). Luckily, there is another equation which we can quantize with the above simple procedure, the Dirac equation, which turns out to describe electrons and other spin-½ particles.
Dirac equation
Dirac ansatz
In 1928, Paul Dirac developed a new field equation for spin-½ particles. For the (very readable) original article, see here. Dirac's motivation was to derive an equation which would not have the problems of the Klein-Gordon equation. The negative energy states in Klein-Gordon equation seem to stem from the fact that it is of the second order in time derivative, thus his idea was to search for a first order equation both in \(\frac{\partial}{\partial t}\) and \(\vec\nabla\), by first assuming a general ansatz \[ i\frac{\partial\psi(\mathbf{x},t)}{\partial t} = H_D \psi(\mathbf{x},t) %= (-i c\bm\alpha\cdot \nabla+\beta m c^2)\psi(\mathbf{x},t), = (-i \bm\alpha\cdot \nabla+\beta m )\psi(\mathbf{x},t), \] where the four initially unknown quantities \(\alpha^i\) and \(\beta\) commute with \(\partial_i\) (i.e. they do not depend on \(\mathbf x\)) but not with each other. As groups can typically be represented by matrices, we can assume that \(\vec\alpha\) and \(\beta\) are N×N matrices, where the dimension N is as of yet unknown. In order to have a compatible structure with the matrices, \(\psi=\psi(\mathbf{x},t)\) must then be a (complex) N×1 column vector.
Dirac then required that the field \(\psi\) should also satisfy the Klein-Gordon equation, to guarantee that the relativistic dispersion relation \(E^2 = |\mathbf p|^2 + m^2\) holds for the solutions of the equation:
\[ -\frac{\partial^2 \psi}{\partial t^2} = H_D^2 \psi = (-i \bm\alpha\cdot \mathbf \nabla+\beta m )(-i\bm\alpha\cdot \mathbf \nabla+\beta m)\psi. \] Moving towards the covariant formulation, we write the above as \[ \left[ \partial_t^2 - \sum_{i,j=1}^3\alpha^i\alpha^j \partial_i\partial_j - i\sum_{i=1}^3(\alpha^i\beta +\beta\alpha^i) m \partial_i + \beta^2 m^2 \right]\psi = 0. \] The Klein-Gordon equation, on the other hand, can be written as \[ \left( \partial_t^2 - \sum_{i=1}^3\partial_i^2 + m^2 \right)\psi = 0. \] Comparing the two equations above term by term, we find that Dirac's ansatz satisfies the Klein-Gordon equation if and only if \[ \alpha^i\alpha^j + \alpha^j\alpha^i = 2\delta^{ij},\qquad \alpha^i\beta+\beta\alpha^i=0,\qquad \beta^2 = 1. \] Furthermore, we require that the Hamiltonian \(H_D\) is hermitean: \(H_D = H_D^\dagger\). In the exercises it will be shown that \(\alpha^i\) and \(\beta\) are hermitean, traceless, even-dimensional, mutually anticommuting matrices. The lowest dimensional matrices that can represent them are 4×4 matrices.
The above requirements do not fix \(\alpha^i\) and \(\beta\) uniquely, so we have some freedom in choosing their representation. There are a few common choices. We take the Dirac-Pauli representation (or basis), with which the non-relativistic limit is particularly simple, \[ \alpha^i = \begin{pmatrix} 0 & \sigma^i \\ \sigma^i & 0 \end{pmatrix},\quad \beta = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}, \] where \(\sigma^i\)'s are the Pauli matrices and \(I\) is the 2×2 identity matrix. The other commonly used choices for \(\alpha^i\) and \(\beta\) matrices go by the names Weyl representation and Majorana representation.
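A minimal numerical check of these relations in the Dirac-Pauli representation given above (a sketch using numpy; `anticomm` is our own helper name):

```python
import numpy as np

Z2, I2 = np.zeros((2, 2)), np.eye(2)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

alpha = [np.block([[Z2, s], [s, Z2]]) for s in sigma]   # off-diagonal Pauli blocks
beta = np.block([[I2, Z2], [Z2, -I2]])                  # diag(I, -I)

def anticomm(a, b):
    return a @ b + b @ a

# {alpha^i, alpha^j} = 2 delta^{ij},  {alpha^i, beta} = 0,  beta^2 = 1
print(all(np.allclose(anticomm(alpha[i], alpha[j]), 2*(i == j)*np.eye(4))
          for i in range(3) for j in range(3)))
print(all(np.allclose(anticomm(a, beta), np.zeros((4, 4))) for a in alpha))
print(np.allclose(beta @ beta, np.eye(4)))
```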
Probability current
Because the Hamiltonian \(H_D\) is hermitean, the probability density \(\rho(\mathbf{x},t) = \psi(\mathbf{x},t)^\dagger\psi(\mathbf{x},t)\) satisfies a continuity equation: \[ \begin{aligned} \partial_t\rho = \partial_t(\psi^\dagger\psi) &= (\partial_t \psi)^\dagger\psi + \psi^\dagger (\partial_t \psi)\\ &= (-iH_D \psi)^\dagger\psi + \psi^\dagger (-i H_D \psi)\\ &= -\left[(\nabla \psi^\dagger\cdot\bm\alpha)\psi + \psi^\dagger(\bm\alpha\cdot\nabla \psi)\right]\\ &= -\nabla\cdot (\psi^\dagger \bm\alpha\,\psi) \equiv -\nabla\cdot\mathbf j, \end{aligned} \] where the mass terms cancel because \(\beta\) is hermitean. Thus \(\partial_t\rho + \nabla\cdot\mathbf j = 0\), and the total probability is conserved. Also, by definition, \(\rho(\mathbf{x},t)\geq0\). In this sense we have improved on the Klein-Gordon equation.
Covariant equation of motion
To formulate the Dirac equation in a covariant form, we multiply both sides of the equation by \(\beta\) and define a new set of matrices \[ \gamma^0 = \beta = \begin{pmatrix} I & 0 \\ 0 & -I\end{pmatrix},\quad \gamma^i = \beta \alpha^i = \begin{pmatrix} 0 & \sigma^i \\ -\sigma^i & 0\end{pmatrix}, \] which have the anticommutation relations \[ \{\gamma^\mu, \gamma^\nu\} = 2g^{\mu\nu} 1_4, \] defining a Clifford algebra. Because of this algebraic structure, the \(\gamma\)-matrices have a lot of useful properties. However, we do not discuss these in detail, but we just give one identity that is used in derivations below: \[ (\gamma^\mu)^\dagger=\gamma^0\gamma^\mu\gamma^0. \] This identity is independent of the choice of the representation.
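These algebraic properties are easy to verify numerically as well (a minimal sketch in numpy, in the Dirac-Pauli representation; variable names are our own):

```python
import numpy as np

Z2, I2 = np.zeros((2, 2)), np.eye(2)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

g = np.diag([1.0, -1.0, -1.0, -1.0])
gamma = [np.block([[I2, Z2], [Z2, -I2]])] \
      + [np.block([[Z2, s], [-s, Z2]]) for s in sigma]

# Clifford algebra: {gamma^mu, gamma^nu} = 2 g^{mu nu} 1_4
print(all(np.allclose(gamma[m] @ gamma[n] + gamma[n] @ gamma[m], 2*g[m, n]*np.eye(4))
          for m in range(4) for n in range(4)))

# hermiticity identity: (gamma^mu)^dagger = gamma^0 gamma^mu gamma^0
print(all(np.allclose(G.conj().T, gamma[0] @ G @ gamma[0]) for G in gamma))
```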
In terms of \(\gamma\)-matrices, the Dirac equation becomes \[ \left( i \gamma^\mu \partial_\mu- m \right) \psi(x) = 0. \]
In particle physics, the Feynman slash notation \(\gamma^\mu a_\mu = a\!\!\!/\) is often used, so that we obtain an even more condensed form \[ \left( i {\partial}\mkern-10mu/- m \right) \psi = 0. \]
Note that even if the \(\gamma^\mu\)-matrices carry the Lorentz index \(\mu\) of the partial derivative, they do not transform as contravariant vectors. We have defined them as constant matrices so they do not change at all in Lorentz transformations. Next, we show how the wavefunction \(\psi(x)\) should transform so that the Dirac equation would be Lorentz invariant.
Note: The \(H_D\) defined above is like any other one-particle Hamiltonian: We can replace the non-relativistic Hamiltonian with it and do calculations as before. The only caveat is that its eigenenergy spectrum is not bounded from below, as discussed below. This problem can be addressed only in a many-body formulation.
Any product of \(\gamma\)-matrices can be expressed as a linear combination of the following sixteen matrices: \[ \begin{aligned} I &&&\text{1 matrix}\\ \gamma^\mu &&& \text{4 matrices}\\ \sigma^{\mu\nu} &&& \text{6 matrices}\\ \gamma^5 \gamma^\mu &&& \text{4 matrices}\\ \gamma^5 &&& \text{1 matrix} \end{aligned} \] where \(\sigma^{\mu\nu} \equiv [\gamma^\mu,\gamma^\nu]\) and \(\gamma^5\equiv i\gamma^0\gamma^1\gamma^2\gamma^3\). Within a given representation, any complex 4×4 matrix \(M\) can be written as a linear combination of the above matrices: \(M=\sum_a m_a \Gamma^a\), where \(\Gamma^a\) are (representations of) the 16 matrices above. The usefulness of this matrix basis is that the bilinears \(\bar\psi \Gamma^a \psi\) (\(\bar\psi\) is defined below) have simple properties under Lorentz transformations: \[ \begin{aligned} \bar\psi'\psi' &= \bar\psi\psi &&\text{scalar}\\ \bar\psi'\gamma^\mu\psi' &= \Lambda^\mu{}_\nu\bar\psi\gamma^\nu\psi && \text{contravariant vector}\\ \bar\psi'\sigma^{\mu\nu}\psi' &= \Lambda^\mu{}_\alpha\Lambda^\nu{}_\beta\bar\psi\sigma^{\alpha\beta}\psi && \text{antisymmetric tensor}\\ \bar\psi'\gamma^5 \gamma^\mu\psi' &= \det(\Lambda) \Lambda^\mu{}_\nu \bar\psi\gamma^5 \gamma^\nu\psi && \text{contravariant pseudovector}\\ \bar\psi'\gamma^5\psi' &= \det(\Lambda)\bar\psi\gamma^5\psi && \text{pseudoscalar} \end{aligned} \] The quantities with the prefix "pseudo" change sign under a parity transformation.
There are three commonly used representations for \(\gamma\) matrices. In each representation, some particular property is diagonal or otherwise simple.
- In Dirac(-Pauli) basis the upper and lower spinor components correspond to particle and antiparticle solutions, respectively, in the non-relativistic limit. \[ \begin{aligned} \beta=\gamma^0&=\begin{pmatrix} I & 0 \\ 0 & -I\end{pmatrix},\quad &\alpha^i=\begin{pmatrix} 0 & \sigma^i \\ \sigma^i & 0\end{pmatrix},\\ \gamma^i&=\begin{pmatrix} 0 & \sigma^i \\ -\sigma^i & 0\end{pmatrix},\quad &\gamma^5=\begin{pmatrix} 0 & I \\ I & 0\end{pmatrix}. \end{aligned} \]
- In the Weyl (chiral) basis, the chiral operator \(\gamma^5\) is diagonal: \[ \begin{aligned} \beta=\gamma^0&= \begin{pmatrix} 0& I\\ I&0 \end{pmatrix}, &\alpha^i=\begin{pmatrix} -\sigma^i & 0 \\ 0 & \sigma^i\end{pmatrix},\\ \gamma^i&= \begin{pmatrix} 0 & \sigma^i\\ -\sigma^i & 0 \end{pmatrix},\quad &\gamma^5 = \begin{pmatrix} -I & 0\\ 0 & I \end{pmatrix}. \end{aligned} \]
- Majorana basis is chosen so that all the \(\gamma\) matrices are purely imaginary: \[ \begin{aligned} \gamma^0 &= \begin{pmatrix} 0 & \sigma^2 \\ \sigma^2 & 0 \end{pmatrix},& \gamma^1 &= \begin{pmatrix} i\sigma^3 & 0 \\ 0 & i\sigma^3 \end{pmatrix},& \gamma^2 &= \begin{pmatrix} 0 & -\sigma^2 \\ \sigma^2 & 0 \end{pmatrix},\\ \gamma^3 &= \begin{pmatrix} -i\sigma^1 & 0 \\ 0 & -i\sigma^1 \end{pmatrix},& \gamma^5 &= \begin{pmatrix} \sigma^2 & 0 \\ 0 & -\sigma^2 \end{pmatrix},& C &= \begin{pmatrix} 0 & -i \sigma^2 \\ -i\sigma^2 & 0 \end{pmatrix}, \end{aligned} \] where \(C\) is the charge conjugation operator, which is typically of interest for Majorana fermions. In this basis, it acts in a simple way: \(C^\dagger \gamma^\mu C = -(\gamma^\mu)^\intercal\).
Choosing the above \(4\times 4\) matrix representation for the \(\gamma\)-matrices means that the solutions \(\psi(x)\) of the Dirac equation are four-component spinors of the form \[ \psi(x) = \begin{pmatrix} \psi_0(x) \\ \psi_1(x) \\ \psi_2(x) \\ \psi_3(x) \end{pmatrix}. \] Below we discuss the form of the eigenfunctions of the Dirac equation. But let us first check how such spinors transform under Lorentz transformations. These 4-component spinors can be built from two 2-component spinors (see the section about Weyl fermions below), and are sometimes called bi-spinors.
Lorentz transformation of spinors
We now derive the transformation law for \(\psi(x)\), by assuming that the form of the Dirac equation is invariant under Lorentz transformations. We can assume that \(\psi\) transforms linearly under Lorentz transformations: \[ \psi(x) \rightarrow \psi'(x') = S(\Lambda)\psi(x), \] where \(S(\Lambda)\) is a 4×4 matrix representation of the Lorentz transformation \(\Lambda\). We know how the partial derivative transforms: \[ \partial_\mu \rightarrow \partial'_\mu = (\Lambda^{-1})^\nu{}_\mu \partial_\nu \] The Dirac equation \[ (i\gamma^\mu\partial_\mu-m)\psi(x)=0 \] transforms into \[ \begin{aligned} 0 = (i\gamma^\mu\partial'_\mu-m)\psi'(x') &= (i\gamma^\mu(\Lambda^{-1})^\nu{}_\mu \partial_\nu-m)S(\Lambda)\psi(x)\\ &= S(\Lambda)\left[iS(\Lambda^{-1})(\Lambda^{-1})^\mu{}_\nu\gamma^\nu S(\Lambda) \partial_\mu -m\right]\psi(x), \end{aligned} \] which has, apart from the factor \(S(\Lambda)\) on the left, the same form as the original equation if \(S(\Lambda)\) obeys the equation \(S(\Lambda^{-1})(\Lambda^{-1})^\mu{}_\nu \gamma^\nu S(\Lambda) = \gamma^\mu\), which is equivalent to \[ S(\Lambda^{-1})\gamma^\mu S(\Lambda) = \Lambda^\mu{}_\nu \gamma^\nu. \] This equation gives the connection between a Lorentz transformation \(\Lambda\) and its spinor representation \(S(\Lambda)\). If \(\Lambda\) is known, it is possible to compute \(S(\Lambda)\).
You might recall from non-relativistic QM that the Pauli matrices obey a similar equation: \[ U^\dagger\sigma^i U = \sum_{j=1}^3 R_{ij} \sigma^j, \] where \(R\) is a rotation matrix for rotation of angle \(\theta\) around the axis \(\mathbf n\) and \(U=\exp(-i\theta\mathbf n\cdot\bm\sigma/2)\) is the corresponding unitary transformation for spinors.
Let us define the commutator of two \(\gamma\)-matrices: \[ \sigma^{\mu\nu} = [\gamma^\mu,\gamma^\nu]. \] For example, a Lorentz boost of rapidity \(\xi\) to x direction, \[ (\Lambda_1)^\mu{}_\nu = \begin{pmatrix} \cosh\xi & -\sinh\xi & 0 & 0\\-\sinh\xi & \cosh\xi & 0 & 0\\ 0 & 0 & 1 &0\\ 0& 0& 0& 1 \end{pmatrix}, \] is represented by a spinor transformation (left here as an exercise) \[ S(\Lambda_1) = \exp(-\xi \sigma^{01}/4) = \cosh\frac \xi 2-\gamma^0\gamma^1 \sinh\frac \xi 2, \] and a rotation of angle \(\theta\) around z-axis (or equivalently: on xy-plane, which is what the indices of \(\sigma^{\mu\nu}\) refer to) is represented by \[ S(\Lambda_2) = \exp(\theta\sigma^{12}/4)= \begin{pmatrix} \exp(-i\theta\sigma_3/2) & 0\\ 0 & \exp(-i\theta\sigma_3/2) \end{pmatrix}, \] which contains two copies of the same non-relativistic rotation. The above examples suggest that a spinor rotation is unitary, \(S(\Lambda^{-1}_2) = S(\Lambda_2)^\dagger\), but a spinor boost is not: \(S(\Lambda^{-1}_1)\neq S(\Lambda_1)^\dagger\).
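These relations can also be checked numerically (a minimal sketch in numpy, in the Dirac-Pauli representation; the rapidity value is arbitrary): the spinor boost \(S(\Lambda_1)\) indeed satisfies \(S(\Lambda^{-1})\gamma^\mu S(\Lambda) = \Lambda^\mu{}_\nu\gamma^\nu\).

```python
import numpy as np

Z2, I2 = np.zeros((2, 2)), np.eye(2)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
gamma = [np.block([[I2, Z2], [Z2, -I2]])] \
      + [np.block([[Z2, s], [-s, Z2]]) for s in sigma]

xi = 0.8                                  # arbitrary rapidity
ch, sh = np.cosh(xi), np.sinh(xi)
Lam = np.array([[ch, -sh, 0, 0],
                [-sh, ch, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1]])            # boost of rapidity xi along x

# spinor representation: S = cosh(xi/2) - gamma^0 gamma^1 sinh(xi/2)
S = np.cosh(xi/2)*np.eye(4) - gamma[0] @ gamma[1]*np.sinh(xi/2)
S_inv = np.linalg.inv(S)                  # equals S(Lambda^{-1})

# check S(Lambda^{-1}) gamma^mu S(Lambda) = Lambda^mu_nu gamma^nu for all mu
print(all(np.allclose(S_inv @ gamma[mu] @ S,
                      sum(Lam[mu, nu]*gamma[nu] for nu in range(4)))
          for mu in range(4)))            # True
```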
To find the inverse transformation \(S(\Lambda)^{-1}=S(\Lambda^{-1})\) we take the hermitean conjugate of the equation relating \(\Lambda\) and \(S(\Lambda)\): \[ S(\Lambda)^\dagger(\gamma^\mu)^\dagger S(\Lambda^{-1})^\dagger = \Lambda^\mu{}_\nu (\gamma^\nu)^\dagger. \] Then we use the identity \((\gamma^\mu)^\dagger=\gamma^0\gamma^\mu\gamma^0\) and multiply both sides with \(\gamma^0\) from the left and from the right: \[ \gamma^0 S(\Lambda)^\dagger\gamma^0\,\gamma^\mu\, \gamma^0 S(\Lambda^{-1})^\dagger\gamma^0 = \Lambda^\mu{}_\nu \gamma^\nu. \] Comparison with the original equation suggests that \[ S(\Lambda^{-1}) = \gamma^0 S(\Lambda)^\dagger\gamma^0. \] That this is indeed the inverse of \(S(\Lambda)\) can be verified from the general solution \(S(\Lambda)=\exp(\frac 1 2 \omega_{\mu\nu}\sigma^{\mu\nu})\), where \(\omega_{\mu\nu}=-\omega_{\nu\mu}\) is a real antisymmetric matrix which parametrizes the Lorentz transformations.
Dirac adjoint and relativistic scalars
The 4×1 column vectors \(\psi\) act as the kets of the Dirac equation. Given a ket \(\psi\), what is the corresponding bra, i.e. the 1×4 row vector, which would allow us to calculate expectation values by sandwiching operators between bras and kets? A natural first guess would be the hermitean conjugate \(\psi^\dagger\). However, this does not work since, e.g., the bilinear \(\psi^\dagger\psi\) is not a Lorentz invariant scalar: \[ \psi^\dagger\psi \rightarrow \psi'^\dagger\psi'=\psi^\dagger S(\Lambda)^\dagger S(\Lambda) \psi \neq \psi^\dagger\psi, \] because, as we saw above, \(S(\Lambda)\) is not unitary if \(\Lambda\) is a Lorentz boost.
The solution is to define the Dirac adjoint \[ \bar\psi = \psi^\dagger \gamma^0, \] for which the bilinear \(\bar\psi \psi\) is Lorentz invariant: \[ \bar\psi'\psi' = \psi^\dagger S(\Lambda)^\dagger\gamma^0 S(\Lambda)\psi = \psi^\dagger\gamma^0\,\gamma^0 S(\Lambda)^\dagger\gamma^0\, S(\Lambda)\psi = \bar\psi\, S(\Lambda^{-1}) S(\Lambda)\,\psi = \bar\psi\psi. \]
The necessity of defining the adjoint \(\bar\psi\neq\psi^\dagger\) can be traced back to the Clifford algebra and the \((+,-,-,-)\) signature of the metric tensor, which forces the eigenvalues of some of the \(\gamma\)-matrices to be imaginary. Because of this, the matrices \(S(\Lambda)\) cannot all be unitary.
Sandwiched between the spinors, the \(\gamma\)-matrices effectively transform according to their Lorentz indices. For example, the quantity \(\bar\psi \gamma^\mu \psi\) transforms as a Lorentz vector, \(\bar\psi \gamma^\mu\gamma^\nu \psi\) transforms as a rank-2 contravariant tensor and so on.
One of the most important scalars which we can construct from the Dirac bilinears is the Lagrangian density \[ \mathcal L = \bar\psi(i\gamma^\mu\partial_\mu-m)\psi. \] The Euler-Lagrange equations of motion obtained by variation of this Lagrangian should correspond to the Dirac equation. As \(\psi\) is complex, the number of degrees of freedom is twice the number of components of \(\psi\); thus we can treat \(\bar\psi\) as independent of \(\psi\) and vary with respect to \(\psi\) and \(\bar\psi\) independently. Variation with respect to \(\psi\), \[ \partial_\mu\left(\frac{\delta \mathcal L}{\delta\partial_\mu\psi}\right) - \frac{\delta \mathcal L}{\delta\psi}=0, \] gives \[ i\partial_\mu\bar\psi\gamma^\mu+m\bar\psi=0, \] which is the adjoint equation. The Dirac equation can be recovered from it by conjugation and multiplication by \(\gamma^0\). Variation with respect to \(\bar\psi\), \[ \partial_\mu\left(\frac{\delta \mathcal L}{\delta\partial_\mu\bar\psi}\right) - \frac{\delta \mathcal L}{\delta\bar\psi}=0, \] gives the Dirac equation more directly.
From the Lagrangian density one can construct the action \[ S=\int{\rm d}^4x\; \mathcal L, \] and start building the path-integral formulation (not on this course, though). The great benefit of the path-integral formulation is that we do not have to choose a distinguished time axis as in the Hamiltonian formulation, and explicit relativistic covariance can be preserved.
Plane wave solutions
Let us find the plane wave solutions of the Dirac equation with the ansatz \[ \psi(x) = u(\mathbf p) e^{-i p\cdot x} = u(\mathbf p) e^{-i(E t-\mathbf p\cdot \mathbf x)}, \] where \(u(\mathbf p)\) is a 4-component spinor. Let us see if we still have the negative energy solutions. By the above substitution the Dirac equation becomes \[ E u(\mathbf p) = (\bm\alpha\cdot\mathbf p+\beta m) u(\mathbf p). \] The above can be written in block matrix form as \[ \begin{pmatrix} m-E & \mathbf{p}\cdot\bm{\sigma}\\ \mathbf{p}\cdot\bm{\sigma} & -m-E \end{pmatrix} \begin{pmatrix} u_A(\mathbf p)\\ u_B(\mathbf p) \end{pmatrix} =0, \] where \(u_{A}\) and \(u_{B}\) are 2-component spinors. The eigenvalues satisfy the condition \(\det(H_D-E)=0\).
The determinant condition is explicitly \[ \begin{aligned} 0 &= \begin{vmatrix} m-E & 0 & p_z & p_x-ip_y\\ 0 & m-E & p_x+ip_y & -p_z\\ p_z & p_x-ip_y & -m-E & 0\\ p_x+ip_y & -p_z & 0 & -m-E \end{vmatrix}\\ &= (E^2 - p_x^2-p_y^2-p_z^2- m^2)^2 = (E^2 - |\mathbf p|^2- m^2)^2, \end{aligned} \] which has the solutions \(E = \pm\sqrt{|\mathbf p|^2 + m^2}\). Thus we did not cure the theory of the negative energy solutions. For convenience, we define a positive energy dispersion \(E_{\mathbf p} = \sqrt{|\mathbf p|^2 + m^2}\).
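The same conclusion can be reached numerically (a minimal sketch in numpy; the mass and momentum values are arbitrary): diagonalizing the momentum-space Dirac Hamiltonian gives the doubly degenerate eigenvalues \(\pm E_{\mathbf p}\).

```python
import numpy as np

Z2, I2 = np.zeros((2, 2)), np.eye(2)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
alpha = [np.block([[Z2, s], [s, Z2]]) for s in sigma]
beta = np.block([[I2, Z2], [Z2, -I2]])

m = 1.0
p = np.array([0.3, -0.4, 0.5])            # arbitrary sample momentum
H = sum(p[i]*alpha[i] for i in range(3)) + m*beta

E = np.linalg.eigvalsh(H)                 # H is hermitean; eigenvalues in ascending order
Ep = np.sqrt(m**2 + p @ p)
print(np.allclose(E, [-Ep, -Ep, Ep, Ep])) # True: doubly degenerate +-E_p
```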
Fig: Spectrum of the Dirac equation with mass \(m\) (solid line). The dashed line is the spectrum when \(m=0\).
It might be interesting to compare this to the spectrum in the superconducting state (4th lecture), relating \(m\) and \(\Delta\). Or to the avoided crossing in the 5th lecture.
To gain understanding of the form of the solutions, set \(\mathbf p=0\). We find the solutions \[ u_1(0) = N(0)\begin{pmatrix} 1\\0\\0\\0 \end{pmatrix},\quad u_2(0) = N(0)\begin{pmatrix} 0\\1\\0\\0 \end{pmatrix},\quad u_3(0) = N(0)\begin{pmatrix} 0\\0\\1\\0 \end{pmatrix},\quad u_4(0) = N(0)\begin{pmatrix} 0\\0\\0\\1 \end{pmatrix}, \] where \(N(0)\) is a normalization constant. The particle is at rest, so its energy should be composed purely of the rest mass. Indeed, we find for the first two solutions that \(E_1=E_2=m\), which is positive, and for the last two \(E_3=E_4=-m\), which is negative.
For \(E>0\), the block-wise equation can be solved as \[ u(\mathbf p) = \begin{pmatrix} u_A(\mathbf p)\\u_B(\mathbf p) \end{pmatrix} = \begin{pmatrix} u_A(\mathbf p)\\ \frac{\mathbf p\cdot\bm\sigma}{E_{\mathbf p}+m} u_A(\mathbf p)\end{pmatrix}, \] where \(u_A\) is a 2-component spinor which can be chosen freely, reflecting the spin degree of freedom. We choose as a basis the spin-up and spin-down states in the \(z\)-direction: \[ u^{(1)}_A(\mathbf p)=N(\mathbf p)\chi_+ = N(\mathbf p)\begin{pmatrix}1\\0\end{pmatrix},\quad u^{(2)}_A(\mathbf p)=N(\mathbf p)\chi_- = N(\mathbf p)\begin{pmatrix}0\\1\end{pmatrix}. \] The above choice of basis is just a parametrization of the upper component. In general, the lower component has a different direction of spin and the bi-spinor is not an eigenstate of spin.
For \(E<0\), the block-wise equation can be solved as \[ u(\mathbf p) = \begin{pmatrix} u_A(\mathbf p)\\u_B(\mathbf p) \end{pmatrix} = \begin{pmatrix} -\frac{\mathbf p\cdot\bm\sigma}{E_{\mathbf p}+m} u_B(\mathbf p)\\ u_B(\mathbf p)\end{pmatrix}, \] where \(u_B\) is again arbitrary 2-component spinor. As above, we use for it the basis \(\chi_\pm\) (multiplied by a normalization constant).
For given momentum \(\mathbf p\), we have four linearly independent solutions \[ u^1(\mathbf p) = N(\mathbf p)\begin{pmatrix} 1\\0\\ \frac{p_z}{E_{\mathbf p}+m} \\ \frac{p_x+ip_y}{E_{\mathbf p}+m}\end{pmatrix},\quad u^2(\mathbf p) =N(\mathbf p)\begin{pmatrix} 0 \\ 1 \\ \frac{p_x-ip_y}{E_{\mathbf p}+m} \\ -\frac{p_z}{E_{\mathbf p}+m}\end{pmatrix}, \] \[ u^3(\mathbf p) = N(\mathbf p)\begin{pmatrix} -\frac{p_z}{E_{\mathbf p}+m} \\ -\frac{p_x+ip_y}{E_{\mathbf p}+m} \\ 1 \\ 0\end{pmatrix},\quad u^4(\mathbf p) = N(\mathbf p)\begin{pmatrix} -\frac{p_x-ip_y}{E_{\mathbf p}+m} \\ \frac{p_z}{E_{\mathbf p}+m}\\ 0 \\ 1\end{pmatrix}, \] which have the energies \(E_1 = E_2 = E_{\mathbf p}>0\) and \(E_3 = E_4 = -E_{\mathbf p}<0\). The normalization \(N(\mathbf p)\) can be chosen in different ways, depending on the context. Different normalizations are defined in the collapsible box below, along with identities related to \(u_s\)'s.
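As a final consistency check (a sketch in numpy, with arbitrary sample values), the explicit spinors \(u^1\) and \(u^3\) are indeed eigenvectors of \(\bm\alpha\cdot\mathbf p+\beta m\) with eigenvalues \(+E_{\mathbf p}\) and \(-E_{\mathbf p}\):

```python
import numpy as np

Z2, I2 = np.zeros((2, 2)), np.eye(2)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
alpha = [np.block([[Z2, s], [s, Z2]]) for s in sigma]
beta = np.block([[I2, Z2], [Z2, -I2]])

m = 1.0
px, py, pz = 0.3, -0.4, 0.5               # arbitrary sample momentum
Ep = np.sqrt(m**2 + px**2 + py**2 + pz**2)
H = px*alpha[0] + py*alpha[1] + pz*alpha[2] + m*beta

u1 = np.array([1, 0, pz/(Ep + m), (px + 1j*py)/(Ep + m)])      # positive-energy spinor
u3 = np.array([-pz/(Ep + m), -(px + 1j*py)/(Ep + m), 1, 0])    # negative-energy spinor

print(np.allclose(H @ u1, Ep*u1))    # True
print(np.allclose(H @ u3, -Ep*u3))   # True
```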
The general solution of the Dirac equation can be written as \[ \psi(\mathbf x,t) = \int\frac{{\rm d}^3\mathbf p}{(2\pi)^3} \left[ \sum_{s=1,2} a_s(\mathbf p)u^s(\mathbf p) e^{-i(E_{\mathbf p}t-\mathbf p\cdot\mathbf x)} + \sum_{s=3,4} a_s(\mathbf p)u^s(\mathbf p) e^{-i(-E_{\mathbf p}t-\mathbf p\cdot\mathbf x)}\right], \] where \(a_s(\mathbf p)\)'s are complex number coefficients. The solution can be written in a more symmetric way by defining the antiparticle spinors \[ \begin{aligned} v^1(\mathbf p) = u^4(-\mathbf p) = N(\mathbf p)\begin{pmatrix} \frac{\bm\sigma\cdot\mathbf p}{E_{\mathbf p}+m}\chi_- \\ \chi_- \end{pmatrix},\\ v^2(\mathbf p) = u^3(-\mathbf p) = N(\mathbf p)\begin{pmatrix} \frac{\bm\sigma\cdot\mathbf p}{E_{\mathbf p}+m}\chi_+ \\ \chi_+ \end{pmatrix}, \end{aligned} \] and making the change of variables \(\mathbf p\rightarrow -\mathbf p\) on the second sum: \[ \psi(\mathbf x,t) = \int\frac{{\rm d}^3\mathbf p}{(2\pi)^3} \sum_{s=1,2} \left[ a_s(\mathbf p)u^s(\mathbf p) e^{-i p\cdot x} + b_s^*(\mathbf p)v^s(\mathbf p) e^{i p\cdot x}\right], \] where \(p = (E_{\mathbf p},\mathbf p)\) is the four-momentum of the positive energy solution, and the coefficients are \(b^*_{1/2}(\mathbf p) = a_{4/3}(-\mathbf p)\).
Since \(e^{-i(-E_{\mathbf p})(-t)}=e^{-i E_{\mathbf p}t}\), we can interpret negative energy solutions in two mathematically equivalent ways;
- as negative-energy particles going backward in time, or
- as positive-energy antiparticles going forward in time.
Both viewpoints can be used. This is the Stueckelberg-Feynman interpretation. It is used in Feynman diagrams, in which particles are depicted as arrows along the time direction, and antiparticles are depicted as arrows opposite to the time direction.
Fig: Example of a Feynman diagram, which depicts electron-positron annihilation and a creation of two photons. The positive time direction is from left to right. With the Stueckelberg-Feynman interpretation, the directed particle-antiparticle lines are always continuous. In this diagram, two photons are needed to satisfy both energy and momentum conservation laws.
There are a few common conventions for the normalization of the spinors. They can be written in a unified way as \[ N(\mathbf p) = \sqrt{\frac{E_{\mathbf p}+m}{\lambda}}, \] where \(\lambda\) is a normalization parameter. Let us put a particle in a box of volume \(V\), and compute the total charge of a plane wave \(\psi(\mathbf x,t) = \frac{1}{\sqrt{V}}u^s(\mathbf p)e^{-ip\cdot x}\): \[ \begin{aligned} Q &= q\int_V{\rm d}^3 \mathbf x\; \psi^\dagger(\mathbf x,t)\psi(\mathbf x,t) = q\, u^s(\mathbf p)^\dagger u^s(\mathbf p)\\ &= q N(\mathbf p)^2 \frac {2E_{\mathbf p}}{E_{\mathbf p}+m}= q\frac{2E_{\mathbf p}}{\lambda} \equiv q\gamma_N, \end{aligned} \] where \(q\) is the charge of one particle. The field is not yet quantized, so its charge can take any value.
- \(\lambda=2m\) is the covariant normalization. In this case \(\gamma_N = E_{\mathbf p}/m\). This normalization compensates for the shrinkage of \(V\) in Lorentz boosts. With this normalization, \(\gamma_N=1\) in the rest frame.
- \(\lambda=1\) is a very often used normalization in high-energy physics. In this case \(\gamma_N =2E_{\mathbf p}\), and in the rest frame \(\gamma_N = 2m\).
- \(\lambda=2E_{\mathbf p}\) normalizes the wave functions to unity. This is the usual normalization we use when doing quantum mechanics non-covariantly within the Hamiltonian formalism.
Below, the identities which involve adjoint spinors are more naturally given in covariant or high-energy normalization, whereas the identities involving hermitean conjugate spinors are simpler in the normalization \(\lambda=2E_{\mathbf p}\).