Agenda for week 6: Relativistic quantum mechanics

Learning goals for this week:

  • General Lorentz invariance
  • Why the Schrödinger equation is not Lorentz invariant
  • Relativistic extensions: Klein-Gordon equation and Dirac equation
  • Dirac equation: spinor solutions (interpretation in the next lecture)

Reading assignment:

Notes for week 6: Relativistic quantum mechanics

More details:

  • Tuominen, Secs 8.1, 8.2 (see also Secs 8.6, 8.7)
  • Bransden & Joachain, Secs 15.1-15.4
  • Sakurai & Napolitano, Secs 8.1 and 8.2

Preliminary exercises

Do these during or after the reading assignment. They will be discussed in class on April 20th.

  1. Assume a Lorentz invariant form of a dynamical equation for the wave function $\psi$, of the form $i\partial_t\psi = (-i\boldsymbol{\alpha}\cdot\nabla + \beta m)\psi$, where $\alpha_i$ and $\beta$ do not necessarily commute with each other. Requiring that $\psi$ also satisfies the Klein-Gordon equation, find the constraints (algebra) for $\alpha_i$ and $\beta$.
  2. Taking the Dirac representation of $\gamma^\mu$, check for at least 2 different pairs $(\mu, \nu)$ that they satisfy the Clifford algebra relation $\{\gamma^\mu, \gamma^\nu\} = 2g^{\mu\nu}$.
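  3. Check that for any constant 2-component vector $\chi$, the following is a solution of the positive energy Dirac equation for a plane wave in the Dirac representation: $$u(\mathbf{p}) = \begin{pmatrix} \chi \\ \dfrac{\boldsymbol{\sigma}\cdot\mathbf{p}}{E + m}\,\chi \end{pmatrix}$$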

Homework exercises, week 6

Will be discussed in the tutorial session on Thursday April 22nd. Return a scanned pdf with your solution by Friday April 23rd, at 9 pm using the form below. Then check and grade your solution with the help of the model solutions and resubmit your graded solutions by Tuesday April 27th at 9 am.

Exercise questions and return


Notes for week 6: Relativistic quantum mechanics

The basic postulates of special relativity are: (i) the laws of nature are identical (invariant) in all inertial frames of reference, and (ii) the speed of light is the same in all such frames. A consequence of these postulates is that there is no strict distinction between time and space: the space and time dimensions appear different to observers moving at different speeds.

The Schrödinger equation is clearly incompatible with special relativity: it is of first order in the time derivative but of second order in the spatial derivatives, so it does not treat time and space on an equal footing. Our task in this part of the course is to find relativistic generalization(s) of the Schrödinger equation.

To accomplish this, let us first remind ourselves of the mathematical formalism of special relativity. For a more detailed description of special relativity, see David Tong's lecture notes.

Elements of special relativity

In Minkowski space, one considers space and time coordinates together, as one four-vector $x^\mu$:
$$x^\mu = (x^0, x^1, x^2, x^3) = (ct, x, y, z) = (ct, \mathbf{x}),$$
where $c$ is the speed of light and $\mu$ is the Lorentz index. Strictly speaking, $x^\mu$ is the $\mu$th component of the vector $x$. Often, however, $x^\mu$ refers to the whole vector. From the context, it should be clear what one means.

We use Greek letters to denote the Lorentz indices ($\mu, \nu, \ldots = 0, 1, 2, 3$). Roman letters are used to denote the spatial indices ($i, j, \ldots = 1, 2, 3$).

The location of the indices is important: $x^\mu$, with an upper index, is a contravariant vector. Correspondingly, $x_\mu$, with a lower index, is a covariant vector. In relativistic algebra, the difference between the two types of vectors is in how they transform under a change of basis; covariant vectors change along with the basis, whereas contravariant vectors change in the inverse way. Mathematically, contravariant vectors are elements of a vector space (i.e. vectors), and covariant vectors are elements of the dual space (i.e. dual vectors).

Metric tensor

We can lower the indices with the covariant metric tensor $g_{\mu\nu} = \mathrm{diag}(1, -1, -1, -1)$, so that with every contravariant four-vector $x^\mu$, we associate a covariant four-vector
$$x_\mu = g_{\mu\nu} x^\nu \equiv \sum_{\nu=0}^{3} g_{\mu\nu} x^\nu.$$
Above, we introduced Einstein's summation convention, in which repeated Lorentz indices are summed over. The notation is called Einstein notation.

With the above metric tensor, the components of the contravariant/covariant vectors are related by: $x_0 = x^0$ and $x_i = -x^i$.

Similarly, we can lower an index in any higher order tensor $T^{\mu\nu}$:
$$T_\mu{}^\nu = g_{\mu\rho} T^{\rho\nu}.$$

Sometimes we have the opposite situation: we have a covariant vector and we would like to find the corresponding contravariant vector. This can be done with the contravariant metric tensor $g^{\mu\nu}$:
$$x^\mu = g^{\mu\nu} x_\nu.$$
We want this definition to be in harmony with the definition $x_\mu = g_{\mu\nu} x^\nu$. Thus
$$x^\mu = g^{\mu\nu} x_\nu = g^{\mu\nu} g_{\nu\rho} x^\rho.$$
Since this is true for every vector, $g^{\mu\nu} g_{\nu\rho}$ must be an identity in the sense that
$$g^{\mu\nu} g_{\nu\rho} = \delta^\mu_\rho.$$
We conclude that the covariant and contravariant metric tensor are the inverses of each other (in the matrix sense),
$$(g^{\mu\nu}) = (g_{\mu\nu})^{-1} = (g_{\mu\nu}),$$
where the second step follows because in this case $g$ is its own inverse. This last step is only valid in the usual (Cartesian) coordinate system, not for curvilinear (such as spherical or cylindrical) coordinates or for a curved space-time.

We also note that due to the symmetry of the metric tensor ($g_{\mu\nu} = g_{\nu\mu}$), it is not important which index we raise:
$$g^\mu{}_\nu = g_\nu{}^\mu = \delta^\mu_\nu.$$
For a general tensor, $T^\mu{}_\nu \neq T_\nu{}^\mu$.

Inner product

As the name metric tensor suggests, we use $g_{\mu\nu}$ to define an inner product, which gives a notion of frame independent distance between two points in the Minkowski space:
$$x \cdot y = g_{\mu\nu} x^\mu y^\nu = x^\mu y_\mu = x_\mu y^\mu.$$
Separating the temporal and spatial parts, the Minkowski inner product is
$$x \cdot y = x^0 y^0 - \mathbf{x} \cdot \mathbf{y},$$
where the dot product between the spatial vectors is the usual Euclidean inner product. Notice the minus sign!

Convention note: here we are using the "mostly minus" or "west coast" metric, which is commonly used in particle physics. In general relativity and cosmology one uses the "mostly plus" or "east coast" metric $\mathrm{diag}(-1, 1, 1, 1)$, where the scalar product is $x \cdot y = -x^0 y^0 + \mathbf{x} \cdot \mathbf{y}$.
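The index lowering and the inner product described above can be tried out numerically. The following is a minimal sketch, assuming the mostly-minus metric $\mathrm{diag}(1, -1, -1, -1)$; the function names `lower` and `minkowski_dot` and the sample four-vectors are illustrative choices, not part of the course material.

```python
import numpy as np

# Mostly-minus ("west coast") metric g_{mu nu} = diag(1, -1, -1, -1).
g = np.diag([1.0, -1.0, -1.0, -1.0])

def lower(x_up):
    """Lower an index: x_mu = g_{mu nu} x^nu."""
    return g @ x_up

def minkowski_dot(x_up, y_up):
    """Inner product x.y = g_{mu nu} x^mu y^nu = x^0 y^0 - (spatial dot product)."""
    return float(x_up @ g @ y_up)

# Hypothetical sample four-vectors (ct, x, y, z), chosen only for illustration.
x = np.array([2.0, 1.0, 0.0, 0.0])
y = np.array([3.0, 1.0, 2.0, 0.0])

x_low = lower(x)             # time component unchanged, spatial components flip sign
xy = minkowski_dot(x, y)     # 2*3 - (1*1 + 0*2 + 0*0)
xx = minkowski_dot(x, x)     # 2*2 - 1*1: positive, so x is timelike in this convention
print(x_low, xy, xx)
```

Switching to the mostly-plus convention would only require replacing `g` by its negative, which flips the sign of every inner product.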

Lorentz transformations

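A linear coordinate transformation is a real-valued 4x4 matrix $\Lambda$ such that
$$x'^\mu = \Lambda^\mu{}_\nu x^\nu.$$
Here $x^\nu$ are the coordinates in the original coordinate system and $x'^\mu$ are the coordinates after the transformation. The transformation can also be expressed as a derivative between the coordinates in transformed and original coordinate systems:
$$\Lambda^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu}.$$

Lorentz transformations are defined as coordinate transformations which leave the inner product between four-vectors invariant, i.e. $x' \cdot y' = x \cdot y$ for all $x$ and $y$. These transformations form a group, known as the Lorentz group. Without reference to the vectors $x$ and $y$, the condition which defines the Lorentz group is
$$g_{\mu\nu} \Lambda^\mu{}_\rho \Lambda^\nu{}_\sigma = g_{\rho\sigma}.$$

By contracting both sides with $g^{\tau\rho}$, we obtain
$$g^{\tau\rho} \Lambda^\mu{}_\rho\, g_{\mu\nu} \Lambda^\nu{}_\sigma = \Lambda_\nu{}^\tau \Lambda^\nu{}_\sigma = \delta^\tau_\sigma,$$
so $\Lambda_\nu{}^\tau$ is the matrix inverse of $\Lambda^\nu{}_\sigma$, by which we mean that we can write the above equation in terms of matrix multiplication as
$$g^{-1} \Lambda^T g \Lambda = \mathbb{1}.$$
The matrices are not defined systematically with regard to upper/lower indices; the components of $\Lambda$ are $\Lambda^\mu{}_\nu$, whereas the components of $g$ are $g_{\mu\nu}$. From this equation we see that the matrix $\Lambda$ has the inverse
$$\Lambda^{-1} = g^{-1} \Lambda^T g,$$
which can be written in terms of the components as
$$(\Lambda^{-1})^\mu{}_\nu = g^{\mu\rho} \Lambda^\sigma{}_\rho\, g_{\sigma\nu} = \Lambda_\nu{}^\mu.$$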

Connected components of the Lorentz group

In the following, unless stated otherwise, by Lorentz transformations we mean the proper transformations, which do not invert time (time reversal transformation) or space (parity transformations).

Proper rotations

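Pure rotations form a subgroup of the Lorentz group. They are represented by the matrices
$$\Lambda = \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix},$$
where $R$ is a $3 \times 3$ rotation matrix. For example, a rotation of $\theta$ around the z-axis is given by a Lorentz transformation
$$\Lambda = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Let us check that the four-vector inner product is invariant under rotations. The action on a coordinate four-vector is
$$x'^\mu = \Lambda^\mu{}_\nu x^\nu = (x^0, R\mathbf{x}).$$
The inner product between four-vectors is:
$$x' \cdot y' = x^0 y^0 - (R\mathbf{x}) \cdot (R\mathbf{y}) = x^0 y^0 - \mathbf{x}^T R^T R\, \mathbf{y} = x^0 y^0 - \mathbf{x} \cdot \mathbf{y} = x \cdot y,$$
where we used the fact that for proper rotations $R^T R = \mathbb{1}$. Also $\det \Lambda = 1$ and $\Lambda^0{}_0 = 1 > 0$, so this is indeed a proper Lorentz transformation.

Lorentz boosts

Apart from rotations, the other basic type of Lorentz transformation is a boost, a transformation to a frame moving at velocity $\mathbf{v}$ relative to the original frame. Let $S$ and $S'$ be two inertial frames such that at $t = t' = 0$ their origins and coordinate axes coincide. Let us then assume that $S'$ moves with a velocity $v$ in the $x$ direction relative to $S$. In a course on special relativity we have learned that the coordinates in the two frames are related by
$$ct' = \gamma (ct - \beta x), \quad x' = \gamma (x - \beta ct), \quad y' = y, \quad z' = z,$$
where $\beta = v/c$ and $\gamma = 1/\sqrt{1 - \beta^2}$. The transformation between the two frames is represented by the matrix
$$\Lambda = \begin{pmatrix} \gamma & -\gamma\beta & 0 & 0 \\ -\gamma\beta & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
which can, like the rotation above, be verified to satisfy the conditions of a proper Lorentz transformation. Sometimes it is useful to parametrize the boost by the so-called rapidity $\eta$, defined by $\tanh\eta = \beta$, so that $\cosh\eta = \gamma$ and $\sinh\eta = \gamma\beta$. The above transformation then looks like
$$\Lambda = \begin{pmatrix} \cosh\eta & -\sinh\eta & 0 & 0 \\ -\sinh\eta & \cosh\eta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$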
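The defining condition of the Lorentz group can be checked numerically for a boost, a rotation, and their product. The sketch below assumes a boost along $x$ parametrized by rapidity and a rotation around the z-axis, with the mostly-minus metric; the helper names `boost_x`, `rotation_z` and `is_proper_lorentz` and the parameter values are illustrative assumptions.

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])  # mostly-minus metric

def boost_x(eta):
    """Boost along x with rapidity eta (tanh(eta) = v/c)."""
    L = np.eye(4)
    L[0, 0] = L[1, 1] = np.cosh(eta)
    L[0, 1] = L[1, 0] = -np.sinh(eta)
    return L

def rotation_z(theta):
    """Rotation by angle theta around the z-axis, embedded in a 4x4 matrix."""
    L = np.eye(4)
    L[1, 1] = L[2, 2] = np.cos(theta)
    L[1, 2] = -np.sin(theta)
    L[2, 1] = np.sin(theta)
    return L

def is_proper_lorentz(L):
    """Check Lambda^T g Lambda = g, det(Lambda) = +1 and Lambda^0_0 > 0."""
    return bool(
        np.allclose(L.T @ g @ L, g)
        and np.isclose(np.linalg.det(L), 1.0)
        and L[0, 0] > 0
    )

# Hypothetical parameter values, chosen only for illustration.
B = boost_x(0.7)
R = rotation_z(0.3)
combined = R @ B

print(is_proper_lorentz(B), is_proper_lorentz(R), is_proper_lorentz(combined))
# The inverse can be built from the metric: Lambda^{-1} = g^{-1} Lambda^T g (here g^{-1} = g).
print(np.allclose(np.linalg.inv(combined), g @ combined.T @ g))
```

Because the condition $\Lambda^T g \Lambda = g$ is preserved under matrix multiplication, any product of such boosts and rotations passes the same check.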

The rest of the Lorentz transformations can be generated as a combination of boosts and rotations. The larger symmetry group which also includes space-time translations is known as the Poincare group.

Transformations of contravariant and covariant vectors

Above, we defined that the contravariant vectors transform as
$$x'^\mu = \Lambda^\mu{}_\nu x^\nu.$$

By using the metric tensor to manipulate the location of indices, we can prove that covariant vectors transform inversely under Lorentz transformations:
$$x'_\mu = \Lambda_\mu{}^\nu x_\nu = x_\nu (\Lambda^{-1})^\nu{}_\mu.$$
The example below shows geometrically why the covariant vector transforms in an inverse way. It also shows mathematically what we mean by a transformation.

Example of contravariance and covariance in 2D

Transformations of fields

Partial derivatives

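Let us define a partial derivative with respect to the components of a contravariant vector $x^\mu$:
$$\partial_\mu \equiv \frac{\partial}{\partial x^\mu} = \left( \frac{1}{c} \frac{\partial}{\partial t}, \nabla \right),$$
which acts on $x^\nu$ as
$$\partial_\mu x^\nu = \delta_\mu^\nu.$$
How does $\partial_\mu$ transform under Lorentz transformations? From the requirement $\partial'_\mu x'^\nu = \delta_\mu^\nu$ for the transformed derivative, it can be verified that it transforms as a covariant vector, just as the notation suggests:
$$\partial'_\mu = \Lambda_\mu{}^\nu \partial_\nu.$$

Similarly, we define a partial derivative with respect to a covariant vector:
$$\partial^\mu \equiv \frac{\partial}{\partial x_\mu} = \left( \frac{1}{c} \frac{\partial}{\partial t}, -\nabla \right),$$
where the extra minus sign is required to cancel the minus signs in the definition of $x_\mu$ in order to have
$$\partial^\mu x_\nu = \delta^\mu_\nu.$$
This derivative transforms as a contravariant vector:
$$\partial'^\mu = \Lambda^\mu{}_\nu \partial^\nu.$$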

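Note: Be careful with the signs! Here, no minus signs appear: $$x^\mu = (ct, \mathbf{x}), \quad \partial_\mu = \left( \frac{1}{c} \partial_t, \nabla \right).$$ Unlike here, where we do get a minus sign: $$x_\mu = (ct, -\mathbf{x}), \quad \partial^\mu = \left( \frac{1}{c} \partial_t, -\nabla \right).$$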

Four-momentum

Let us consider the relativistic generalization of momentum for a particle moving at velocity $\mathbf{v}$ in the frame $S$. From Noether's theorem we learn that energy is the conserved quantity associated with time translations, whereas momentum is the conserved quantity associated with spatial translations. Thus we can assume that the relativistic generalization of momentum, the four-momentum, has the form
$$p^\mu = (E/c, \mathbf{p}),$$
but we do not know immediately the expressions for $E$ and $\mathbf{p}$.

However, we do know that the particle is at rest in the frame $S'$ moving at velocity $\mathbf{v}$. In that frame the particle has no 3-momentum and thus we know that the 4-momentum is of the form
$$p'^\mu = (mc, \mathbf{0}).$$
The fact that we put a mass $m$ into the $p'^0$ component follows from dimensional analysis. Here we also require that $m > 0$, since a massless particle does not have a rest frame. To find the four-momentum in the frame $S$, we make a Lorentz boost from $S'$ to $S$. (Do this as an exercise!) The result is
$$p^\mu = (\gamma m c, \gamma m \mathbf{v}), \quad \text{i.e.} \quad E = \gamma m c^2, \quad \mathbf{p} = \gamma m \mathbf{v}.$$

The square of the 4-momentum is a Lorentz invariant quantity
$$p^\mu p_\mu = \frac{E^2}{c^2} - \mathbf{p}^2 = m^2 c^2.$$
In the rest frame, the energy is positive, so we find the relativistic dispersion of a free particle with mass $m$:
$$E = \sqrt{\mathbf{p}^2 c^2 + m^2 c^4}.$$
A proper generalization of the Schrödinger equation should reproduce this dispersion.
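The boost from the rest frame and the invariance of the squared four-momentum can be illustrated numerically. This is a minimal sketch in units with $c = 1$, assuming motion along the $x$ direction; the function name `boost_x_velocity` and the values of `m` and `beta` are illustrative assumptions.

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])  # mostly-minus metric

def boost_x_velocity(beta):
    """Boost to a frame moving with velocity beta = v/c along x (units with c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L

# Hypothetical mass and velocity, chosen only for illustration.
m, beta = 1.0, 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)

# Four-momentum in the rest frame S' of the particle.
p_rest = np.array([m, 0.0, 0.0, 0.0])

# S' moves with +beta relative to S, so going from S' back to S is a boost with -beta.
p_lab = boost_x_velocity(-beta) @ p_rest

# The square p^mu p_mu is the same in both frames and equals m^2.
p_squared = float(p_lab @ g @ p_lab)
print(p_lab, p_squared)
```

The energy component of `p_lab` also satisfies $E = \sqrt{\mathbf{p}^2 + m^2}$, which is the dispersion relation quoted above with $c = 1$.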

Klein-Gordon equation

Note: From now on, we work in natural units $\hbar = c = 1$.

Recall how for a nonrelativistic system the Schrödinger equation could be obtained from a classical Hamiltonian by replacing $\mathbf{x}$ and $\mathbf{p}$ by operators which obey the canonical commutation relations. The form of the Schrödinger equation in position representation suggests that we effectively did the replacements $E \to i\partial_t$ and $\mathbf{p} \to -i\nabla$.

If we try to do the same replacement for the relativistic dispersion $E = \sqrt{\mathbf{p}^2 + m^2}$, we fail. Even if we manage to interpret the square root operator meaningfully (e.g. by Fourier transform or series expansion), we end up with a complicated nonlocal operator.

If, instead, we start from the square of the above expression, $E^2 = \mathbf{p}^2 + m^2$, we avoid such problems. We make the replacements $E \to i\partial_t$ and $\mathbf{p} \to -i\nabla$ and apply the resulting operator on a scalar field $\phi(x)$ to obtain the Klein-Gordon equation
$$\left( \partial_t^2 - \nabla^2 + m^2 \right) \phi = 0,$$
which can be written in Einstein notation as
$$\left( \partial_\mu \partial^\mu + m^2 \right) \phi = 0.$$
The Klein-Gordon equation is clearly Lorentz invariant. So far everything is good. As we are attempting to generalize the Schrödinger equation, we hope to be able to interpret $\phi$ as a wavefunction of a particle.

Like the Schrödinger equation, the Klein-Gordon equation has plane-wave solutions:
$$\phi(x) = N e^{-i p \cdot x} = N e^{-i (E t - \mathbf{p} \cdot \mathbf{x})}.$$
Let us fix the momentum to some value $\mathbf{p}$ and solve for the energy of this solution. Substituting the above into the Klein-Gordon equation, we find
$$E^2 = \mathbf{p}^2 + m^2,$$
which has two solutions: $E = \pm\sqrt{\mathbf{p}^2 + m^2}$. The positive energy solution is what we expected, but what are we to make of the negative energy solution?

It turns out that the negative energy solutions make the theory unstable. The energy of a particle is not bounded from below, which means that there is no ground state. If we were to add a perturbation (e.g. a potential), it would couple the negative and positive energy states, and the particle could transition to lower and lower energy states.

The negative energies also force us to abandon the hope of interpreting $\phi$ as a wavefunction. As with the non-relativistic treatment, one obtains the continuity equation for the probability 4-current from the Klein-Gordon equation and its complex conjugate:
$$\partial_\mu j^\mu = 0, \quad j^\mu = \frac{i}{2m} \left( \phi^* \partial^\mu \phi - \phi\, \partial^\mu \phi^* \right).$$
The "probability density" for a plane-wave solution is
$$\rho = j^0 = \frac{E}{m} |N|^2,$$
which is negative if $E < 0$. The conclusion is that we cannot interpret $j^0$ as a probability density and thus $\phi$ is not a wavefunction.

It turns out that the classical Klein-Gordon equation can be used to derive a relativistic quantum field theory for spin-0 particles (e.g. pion or Higgs boson), but the quantization requires either the use of path integrals or the full machinery of canonical quantization (not that this would be very hard; we already did it for the electromagnetic field in the QED part of the course). Luckily, there is another equation which we can quantize with the above simple procedure, the Dirac equation, which turns out to describe electrons and other spin-½ particles.

Dirac equation

Dirac ansatz

In 1928, Paul Dirac developed a new field equation for spin-½ particles. For the (very readable) original article, see here. Dirac's motivation was to derive an equation which would not have the problems of the Klein-Gordon equation. The negative energy states in the Klein-Gordon equation seem to stem from the fact that it is of second order in the time derivative, thus his idea was to search for an equation of first order both in $t$ and $\mathbf{x}$, by first assuming a general ansatz
$$i\partial_t \psi = H\psi = (\boldsymbol{\alpha} \cdot \mathbf{p} + \beta m)\psi = (-i \boldsymbol{\alpha} \cdot \nabla + \beta m)\psi,$$
where the four initially unknown quantities $\alpha_i$ ($i = 1, 2, 3$) and $\beta$ commute with $\mathbf{p}$ (i.e. they do not depend on $\mathbf{x}$) but not with each other. As non-commuting algebraic objects can typically be represented by matrices, we can assume that $\alpha_i$ and $\beta$ are N×N matrices, where the dimension N is as of yet unknown. In order to have a compatible structure with the matrices, $\psi$ must then be a (complex) N×1 column vector.

Dirac then required that the field $\psi$ should also satisfy the Klein-Gordon equation, to guarantee that the relativistic dispersion relation holds for the solutions of the equation:
$$-\partial_t^2 \psi = H^2 \psi = (-i \boldsymbol{\alpha} \cdot \nabla + \beta m)^2 \psi.$$

Moving towards the covariant formulation, we write the above as
$$-\partial_t^2 \psi = \left[ -\frac{1}{2} \{\alpha_i, \alpha_j\} \partial_i \partial_j - i m \{\alpha_i, \beta\} \partial_i + \beta^2 m^2 \right] \psi.$$
The Klein-Gordon equation, on the other hand, can be written as
$$-\partial_t^2 \psi = \left( -\nabla^2 + m^2 \right) \psi.$$
Comparing the two equations above term by term, we find that Dirac's ansatz satisfies the Klein-Gordon equation if and only if
$$\{\alpha_i, \alpha_j\} = 2\delta_{ij}, \quad \{\alpha_i, \beta\} = 0, \quad \beta^2 = \mathbb{1}.$$
Furthermore, we require that the Hamiltonian is Hermitian: $H^\dagger = H$. In the exercises it will be shown that $\alpha_i$ and $\beta$ are Hermitian, traceless, even-dimensional, mutually anticommuting matrices. The lowest dimensional matrices that can represent them are 4×4 matrices.

The above requirements do not fix $\alpha_i$ and $\beta$ uniquely, so we have some freedom in choosing their representation. There are a few common choices. We take the Dirac-Pauli representation (or basis), with which the non-relativistic limit is particularly simple,
$$\alpha_i = \begin{pmatrix} 0 & \sigma_i \\ \sigma_i & 0 \end{pmatrix}, \quad \beta = \begin{pmatrix} \mathbb{1} & 0 \\ 0 & -\mathbb{1} \end{pmatrix},$$
where $\sigma_i$'s are the Pauli matrices and $\mathbb{1}$ is the 2×2 identity matrix. The other commonly used choices for $\alpha_i$ and $\beta$ matrices go by the names Weyl representation and Majorana representation.
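The algebraic constraints on the Dirac matrices are easy to test numerically for a concrete representation. The sketch below assumes the standard block form of the Dirac-Pauli representation, with $\alpha_i$ carrying the Pauli matrices in the off-diagonal blocks and $\beta = \mathrm{diag}(\mathbb{1}, -\mathbb{1})$; the variable and function names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
I4 = np.eye(4)

# Pauli matrices sigma_x, sigma_y, sigma_z.
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# Dirac-Pauli representation (assumed standard block form).
alpha = [np.block([[Z2, s], [s, Z2]]) for s in sigma]
beta = np.block([[I2, Z2], [Z2, -I2]])

def anticommutator(a, b):
    return a @ b + b @ a

# {alpha_i, alpha_j} = 2 delta_ij
alpha_ok = all(
    np.allclose(anticommutator(alpha[i], alpha[j]), 2 * (i == j) * I4)
    for i in range(3) for j in range(3)
)
# beta^2 = 1 and {alpha_i, beta} = 0
beta_ok = bool(np.allclose(beta @ beta, I4)) and all(
    np.allclose(anticommutator(a, beta), 0) for a in alpha
)
# All four matrices are Hermitian and traceless.
hermitian_traceless = all(
    np.allclose(M, M.conj().T) and np.isclose(np.trace(M), 0)
    for M in alpha + [beta]
)
print(alpha_ok, beta_ok, hermitian_traceless)
```

A numerical check of this kind only confirms the relations for one representation; the general properties (tracelessness, even dimension) follow from the algebra itself.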

Probability current

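Because the Hamiltonian is Hermitian, the probability is conserved: $$\frac{\partial}{\partial t} \left( \psi^\dagger \psi \right) + \nabla \cdot \left( \psi^\dagger \boldsymbol{\alpha}\, \psi \right) = 0.$$ Also, by definition, the density satisfies $\rho = \psi^\dagger \psi \geq 0$. In this sense we have improved on the Klein-Gordon equation.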

Now you should be able to go back and do Question 1 in the preliminary exercises

Covariant equation of motion

To formulate the Dirac equation in a covariant form, we multiply both sides of the equation by $\beta$ and define a new set of matrices
$$\gamma^0 = \beta, \quad \gamma^i = \beta \alpha_i,$$
which have the anticommutation relations
$$\{\gamma^\mu, \gamma^\nu\} = \gamma^\mu \gamma^\nu + \gamma^\nu \gamma^\mu = 2 g^{\mu\nu}$$
defining a Clifford algebra. Because of this algebraic structure, the $\gamma$-matrices have a lot of useful properties. However, we do not discuss these in detail, but we just give one identity that is used in the derivations below:
$$(\gamma^\mu)^\dagger = \gamma^0 \gamma^\mu \gamma^0.$$
This identity is independent of the choice of the representation.
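The Clifford algebra and the Hermitian conjugation identity can be verified numerically. The following sketch assumes the Dirac representation with $\gamma^0 = \beta$ and $\gamma^i = \beta\alpha_i$ written out in block form, and the mostly-minus metric; the sample four-vector `p_up` is a hypothetical choice for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
I4 = np.eye(4)
g = np.diag([1.0, -1.0, -1.0, -1.0])  # mostly-minus metric

sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# Dirac representation: gamma^0 = beta, gamma^i = beta alpha_i (assumed block form).
gamma = [np.block([[I2, Z2], [Z2, -I2]])]
gamma += [np.block([[Z2, s], [-s, Z2]]) for s in sigma]

# Clifford algebra {gamma^mu, gamma^nu} = 2 g^{mu nu}.
clifford_ok = all(
    np.allclose(gamma[mu] @ gamma[nu] + gamma[nu] @ gamma[mu], 2 * g[mu, nu] * I4)
    for mu in range(4) for nu in range(4)
)

# Hermitian conjugation identity (gamma^mu)^dagger = gamma^0 gamma^mu gamma^0.
hermiticity_ok = all(
    np.allclose(gamma[mu].conj().T, gamma[0] @ gamma[mu] @ gamma[0])
    for mu in range(4)
)

# Consequence of the algebra: (gamma^mu p_mu)^2 = p.p times the identity.
p_up = np.array([2.0, 1.0, 0.0, 0.0])   # hypothetical four-vector p^mu
p_low = g @ p_up                         # p_mu
p_contracted = sum(gamma[mu] * p_low[mu] for mu in range(4))
square_ok = bool(np.allclose(p_contracted @ p_contracted, (p_up @ g @ p_up) * I4))
print(clifford_ok, hermiticity_ok, square_ok)
```

The last check shows why a first-order operator built from $\gamma$-matrices can square to the Klein-Gordon operator: the symmetric part of $\gamma^\mu\gamma^\nu$ is just the metric.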

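In terms of $\gamma$-matrices, the Dirac equation becomes
$$\left( i \gamma^\mu \partial_\mu - m \right) \psi = 0.$$

In particle physics, the Feynman slash notation $\not{a} \equiv \gamma^\mu a_\mu$ is often used, so that we obtain an even more condensed form
$$\left( i \not{\partial} - m \right) \psi = 0.$$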

Note that even if the $\gamma$-matrices carry the Lorentz index of the partial derivative, they do not transform as contravariant vectors. We have defined them as constant matrices, so they do not change at all under Lorentz transformations. Next, we show how the wavefunction $\psi$ should transform so that the Dirac equation is Lorentz invariant.

Note: The $H$ defined above is like any other one-particle Hamiltonian: we can replace the non-relativistic Hamiltonian with it and do calculations as before. The only caveat is that its energy spectrum is not bounded from below, as discussed below. This problem can be addressed only in a many-body formulation.

Properties of $\gamma$-matrices


Different representations of the $\gamma$-matrices

Choosing the above matrix representation for the $\gamma$-matrices means that the eigenvectors satisfying the Dirac equation are spinors of the form
$$\psi = \begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix}.$$
Below we discuss the form of the eigenfunctions of the Dirac equation. But let us first check how such spinors transform under Lorentz transformations. These 4-component spinors can be built from two 2-component spinors (see the section about Weyl fermions below), and are sometimes called bi-spinors.

Lorentz transformation of spinors

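We now derive the transformation law for $\psi$, by assuming that the form of the Dirac equation is invariant under Lorentz transformations. We can assume that $\psi$ transforms linearly under Lorentz transformations:
$$\psi'(x') = S(\Lambda)\, \psi(x),$$
where $S(\Lambda)$ is a 4×4 matrix representation of the Lorentz transformation $\Lambda$. We know how the partial derivative transforms:
$$\partial_\mu = \Lambda^\nu{}_\mu\, \partial'_\nu.$$
The Dirac equation transforms into
$$\left( i \gamma^\mu \Lambda^\nu{}_\mu \partial'_\nu - m \right) S^{-1} \psi'(x') = S^{-1} \left( i S \gamma^\mu S^{-1} \Lambda^\nu{}_\mu \partial'_\nu - m \right) \psi'(x') = 0,$$
which has, apart from the factor $S^{-1}$ on the left, the same form as the original equation if $S$ obeys the equation $S \gamma^\mu S^{-1} \Lambda^\nu{}_\mu = \gamma^\nu$, which is equivalent to
$$S^{-1} \gamma^\nu S = \Lambda^\nu{}_\mu \gamma^\mu.$$
This equation gives the connection between a Lorentz transformation $\Lambda$ and its spinor representation $S$. If $\Lambda$ is known, it is possible to compute $S$.

You might recall from non-relativistic QM that the Pauli matrices obey a similar equation:
$$U^\dagger \sigma_i U = R_{ij} \sigma_j,$$
where $R$ is a rotation matrix for a rotation of angle $\theta$ around the axis $\hat{\mathbf{n}}$ and $U = e^{-i \theta\, \hat{\mathbf{n}} \cdot \boldsymbol{\sigma} / 2}$ is the corresponding unitary transformation for spinors.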

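Let us define the commutator of two $\gamma$-matrices:
$$\sigma^{\mu\nu} = \frac{i}{2} [\gamma^\mu, \gamma^\nu].$$
For example, a Lorentz boost of rapidity $\eta$ in the $x$ direction is represented by a spinor transformation (left here as an exercise)
$$S = \exp\left( \frac{i}{2} \eta\, \sigma^{01} \right) = \cosh\frac{\eta}{2}\, \mathbb{1} - \sinh\frac{\eta}{2} \begin{pmatrix} 0 & \sigma_x \\ \sigma_x & 0 \end{pmatrix}$$
and a rotation of angle $\theta$ around the z-axis (or equivalently: in the xy-plane, which is what the indices of $\sigma^{12}$ refer to) is represented by
$$S = \exp\left( -\frac{i}{2} \theta\, \sigma^{12} \right) = \begin{pmatrix} e^{-i \theta \sigma_z / 2} & 0 \\ 0 & e^{-i \theta \sigma_z / 2} \end{pmatrix},$$
which contains two copies of the same non-relativistic rotation. The above examples suggest that a spinor rotation is unitary, $S^\dagger = S^{-1}$, but a spinor boost is not: $S^\dagger = S \neq S^{-1}$.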
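The relation between a Lorentz transformation and its spinor representation can be checked numerically for the boost and the rotation. The sketch below makes several labeled assumptions: the Dirac representation of the $\gamma$-matrices, the condition in the form $S^{-1}\gamma^\mu S = \Lambda^\mu{}_\nu\gamma^\nu$, a boost along $x$ with $\Lambda^0{}_1 = -\sinh\eta$, and a rotation around $z$ with $\Lambda^1{}_2 = -\sin\theta$. Other sources may use different sign conventions, and all function names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
I4 = np.eye(4)
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
# Dirac representation of the gamma matrices (assumed).
gamma = [np.block([[I2, Z2], [Z2, -I2]])]
gamma += [np.block([[Z2, s], [-s, Z2]]) for s in sigma]

def boost_x(eta):
    L = np.eye(4)
    L[0, 0] = L[1, 1] = np.cosh(eta)
    L[0, 1] = L[1, 0] = -np.sinh(eta)
    return L

def rotation_z(theta):
    L = np.eye(4)
    L[1, 1] = L[2, 2] = np.cos(theta)
    L[1, 2], L[2, 1] = -np.sin(theta), np.sin(theta)
    return L

def spinor_boost_x(eta):
    # S = exp(-(eta/2) gamma^0 gamma^1); (gamma^0 gamma^1)^2 = 1 gives cosh/sinh.
    return np.cosh(eta / 2) * I4 - np.sinh(eta / 2) * (gamma[0] @ gamma[1])

def spinor_rotation_z(theta):
    # S = exp(-(i theta/2) Sigma_3) with Sigma_3 = i gamma^1 gamma^2 = diag(sigma_z, sigma_z).
    Sigma3 = 1j * gamma[1] @ gamma[2]
    return np.cos(theta / 2) * I4 - 1j * np.sin(theta / 2) * Sigma3

def is_covariant(S, L):
    """Check S^{-1} gamma^mu S = Lambda^mu_nu gamma^nu for mu = 0..3."""
    S_inv = np.linalg.inv(S)
    return all(
        np.allclose(S_inv @ gamma[mu] @ S, sum(L[mu, nu] * gamma[nu] for nu in range(4)))
        for mu in range(4)
    )

# Hypothetical parameter values, chosen only for illustration.
S_boost, S_rot = spinor_boost_x(0.7), spinor_rotation_z(0.3)
boost_ok = is_covariant(S_boost, boost_x(0.7))
rotation_ok = is_covariant(S_rot, rotation_z(0.3))
rotation_unitary = bool(np.allclose(S_rot.conj().T @ S_rot, I4))
boost_unitary = bool(np.allclose(S_boost.conj().T @ S_boost, I4))
boost_hermitian = bool(np.allclose(S_boost, S_boost.conj().T))
print(boost_ok, rotation_ok, rotation_unitary, boost_unitary, boost_hermitian)
```

With these conventions the rotation matrix is unitary while the boost matrix is Hermitian but not unitary, which is the behaviour described in the text.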

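To find the inverse transformation we take the Hermitian conjugate of the equation relating $\Lambda$ and $S$:
$$S^\dagger (\gamma^\mu)^\dagger (S^{-1})^\dagger = \Lambda^\mu{}_\nu (\gamma^\nu)^\dagger.$$
Then we use the identity $(\gamma^\mu)^\dagger = \gamma^0 \gamma^\mu \gamma^0$ and multiply from both sides with $\gamma^0$:
$$\left( \gamma^0 S^\dagger \gamma^0 \right) \gamma^\mu \left( \gamma^0 (S^{-1})^\dagger \gamma^0 \right) = \Lambda^\mu{}_\nu \gamma^\nu.$$
Comparison with the original equation $S^{-1} \gamma^\mu S = \Lambda^\mu{}_\nu \gamma^\nu$ suggests that
$$S^{-1} = \gamma^0 S^\dagger \gamma^0.$$
That this is indeed the inverse of $S$ can be verified from the general solution $S = \exp\left( -\frac{i}{4} \omega_{\mu\nu} \sigma^{\mu\nu} \right)$, where $\omega_{\mu\nu}$ is a real antisymmetric matrix which parametrizes the Lorentz transformations.

Lorentz transformation of spinors: $$\psi'(x') = S\, \psi(x), \quad S = \exp\left( -\frac{i}{4} \omega_{\mu\nu} \sigma^{\mu\nu} \right),$$ where $\omega_{\mu\nu}$ is a real antisymmetric matrix which parametrizes the Lorentz transformations and $\sigma^{\mu\nu} = \frac{i}{2} [\gamma^\mu, \gamma^\nu]$. Note that different sources use different conventions for $\sigma^{\mu\nu}$ and for the factors $i$ and $\frac{1}{2}$.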

Dirac adjoint and relativistic scalars

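The 4×1 column vectors $\psi$ act as the kets of the Dirac equation. Given a ket $\psi$, what is the corresponding bra, i.e. the 1×4 row vector, which would allow us to calculate expectation values by sandwiching operators between bras and kets? A natural first guess would be the Hermitian conjugate $\psi^\dagger$. However, this does not work since, e.g., the bilinear $\psi^\dagger \psi$ is not a Lorentz invariant scalar:
$$\psi'^\dagger \psi' = \psi^\dagger S^\dagger S\, \psi \neq \psi^\dagger \psi,$$
because, as we saw above, $S$ is not unitary if $\Lambda$ is a Lorentz boost.

The solution is to define a Dirac adjoint
$$\bar{\psi} = \psi^\dagger \gamma^0,$$
for which the bilinear $\bar{\psi} \psi$ is Lorentz invariant:
$$\bar{\psi}' \psi' = \psi^\dagger S^\dagger \gamma^0 S\, \psi = \psi^\dagger \gamma^0 S^{-1} S\, \psi = \bar{\psi} \psi,$$
where we used the relation $S^{-1} = \gamma^0 S^\dagger \gamma^0$ between the inverse and the Hermitian conjugate of $S$.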
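The difference between the two bilinears can be seen in a small numerical experiment. The sketch below assumes the Dirac representation and a spinor boost along $x$ of the form $S = \cosh(\eta/2) - \sinh(\eta/2)\,\gamma^0\gamma^1$; the sample spinor `psi` and the rapidity are hypothetical values chosen only for illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)

# Dirac representation (assumed): gamma^0 = diag(1, -1), gamma^1 = [[0, sx], [-sx, 0]].
gamma0 = np.block([[I2, Z2], [Z2, -I2]])
gamma1 = np.block([[Z2, sigma_x], [-sigma_x, Z2]])

def spinor_boost_x(eta):
    """Spinor representation of a boost along x with rapidity eta (assumed convention)."""
    return np.cosh(eta / 2) * np.eye(4) - np.sinh(eta / 2) * (gamma0 @ gamma1)

def density(psi):
    """psi^dagger psi: the time component of the current, not a scalar."""
    return float((psi.conj() @ psi).real)

def dirac_scalar(psi):
    """psi-bar psi = psi^dagger gamma^0 psi: a Lorentz scalar."""
    return float((psi.conj() @ gamma0 @ psi).real)

# Hypothetical sample spinor and rapidity.
psi = np.array([1.0, 0.0, 0.0, 0.5], dtype=complex)
psi_boosted = spinor_boost_x(0.7) @ psi

print(density(psi), density(psi_boosted))            # these differ
print(dirac_scalar(psi), dirac_scalar(psi_boosted))  # these agree
```

That $\psi^\dagger\psi$ changes under the boost is consistent with it being the time component of a four-vector rather than a scalar.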

The necessity of defining the adjoint can be traced back to the Clifford algebra and the signature of the metric tensor, which forces the eigenvalues of some of the $\gamma$-matrices to be imaginary. Because of this, the $\gamma$-matrices cannot all be Hermitian, and the transformation matrices $S$ cannot all be unitary.

Sandwiched between the spinors, the $\gamma$-matrices effectively transform according to their Lorentz indices. For example, the quantity $\bar{\psi} \gamma^\mu \psi$ transforms as a Lorentz vector, $\bar{\psi} \gamma^\mu \gamma^\nu \psi$ transforms as a rank-2 contravariant tensor and so on.

Dirac Lagrangian

Now you should be able to go back and do Question 2 in the preliminary exercises

Plane wave solutions

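Let us find the plane wave solutions for the Dirac equation with an ansatz
$$\psi(x) = u\, e^{-i p \cdot x} = u\, e^{-i (E t - \mathbf{p} \cdot \mathbf{x})},$$
where $u$ is a 4-component spinor. Let us see if we still have the negative energy solutions. By the above substitution the Dirac equation becomes
$$\left( \gamma^\mu p_\mu - m \right) u = 0 \quad \Leftrightarrow \quad \left( \boldsymbol{\alpha} \cdot \mathbf{p} + \beta m \right) u = E u.$$
The above can be written in block matrix form as
$$\begin{pmatrix} m - E & \boldsymbol{\sigma} \cdot \mathbf{p} \\ \boldsymbol{\sigma} \cdot \mathbf{p} & -m - E \end{pmatrix} \begin{pmatrix} u_A \\ u_B \end{pmatrix} = 0,$$
where $u_A$ and $u_B$ are 2-component spinors. The eigenvalues $E$ satisfy the condition $\det\left( \boldsymbol{\alpha} \cdot \mathbf{p} + \beta m - E \right) = 0$.

The determinant condition is explicitly
$$\left( E^2 - \mathbf{p}^2 - m^2 \right)^2 = 0,$$
which has the solutions $E = \pm\sqrt{\mathbf{p}^2 + m^2}$. Thus we did not cure the theory of the negative energy solutions. For convenience, we define a positive energy dispersion $E_{\mathbf{p}} = +\sqrt{\mathbf{p}^2 + m^2}$.

Fig: Spectrum of the Dirac equation with a mass $m$ (solid line). The dashed line is the spectrum when $m = 0$.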

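To gain understanding of the form of the solutions, set $\mathbf{p} = 0$. We find the solutions
$$u^{(1)} = N \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad u^{(2)} = N \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad u^{(3)} = N \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad u^{(4)} = N \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix},$$
where $N$ is a normalization constant. The particle is at rest, so its energy should be composed purely of the rest mass. Indeed, we find for the first two solutions that $E = m$, which is positive, and for the last two $E = -m$, which is negative.

For $E = +E_{\mathbf{p}}$, the block-wise equation can be solved as
$$u_B = \frac{\boldsymbol{\sigma} \cdot \mathbf{p}}{E_{\mathbf{p}} + m}\, u_A, \quad u_A = \chi,$$
where $\chi$ is a 2-component spinor which can be chosen freely, reflecting the spin degree of freedom. We choose as a basis the spin-up and spin-down states in the $z$-direction:
$$\chi^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \chi^{(2)} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad u^{(1,2)}(\mathbf{p}) = N \begin{pmatrix} \chi^{(1,2)} \\ \dfrac{\boldsymbol{\sigma} \cdot \mathbf{p}}{E_{\mathbf{p}} + m}\, \chi^{(1,2)} \end{pmatrix}.$$
The above choice of basis is just a parametrization of the upper component. In general, the lower component has a different direction of spin and the bi-spinor is not an eigenstate of spin.

For $E = -E_{\mathbf{p}}$, the block-wise equation can be solved as
$$u_A = -\frac{\boldsymbol{\sigma} \cdot \mathbf{p}}{E_{\mathbf{p}} + m}\, u_B, \quad u_B = \chi,$$
where $\chi$ is again an arbitrary 2-component spinor. As above, we use for it the basis $\chi^{(1,2)}$ (multiplied by a normalization constant):
$$u^{(3,4)}(\mathbf{p}) = N \begin{pmatrix} -\dfrac{\boldsymbol{\sigma} \cdot \mathbf{p}}{E_{\mathbf{p}} + m}\, \chi^{(1,2)} \\ \chi^{(1,2)} \end{pmatrix}.$$

For a given momentum $\mathbf{p}$, we have four linearly independent solutions $u^{(1)}, \ldots, u^{(4)}$, which have the energies $+E_{\mathbf{p}}$ (the first two) and $-E_{\mathbf{p}}$ (the last two). The normalization can be chosen in different ways, depending on the context. Different normalizations are defined in the collapsible box below, along with identities related to $u$'s.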
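The plane-wave spinors can be checked numerically against the Dirac Hamiltonian. The sketch below assumes the Dirac-Pauli representation, natural units, and unnormalized spinors of the form $(\chi, \frac{\boldsymbol{\sigma}\cdot\mathbf{p}}{E_{\mathbf{p}}+m}\chi)$ and $(-\frac{\boldsymbol{\sigma}\cdot\mathbf{p}}{E_{\mathbf{p}}+m}\chi, \chi)$; the function names and the sample momentum, mass and $\chi$ are illustrative assumptions.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def sigma_dot(p):
    """The 2x2 matrix sigma . p."""
    return sum(pi * s for pi, s in zip(p, sigma))

def hamiltonian(p, m):
    """H = alpha . p + beta m in the Dirac-Pauli representation (assumed block form)."""
    sp = sigma_dot(p)
    return np.block([[m * I2, sp], [sp, -m * I2]])

def energy(p, m):
    return float(np.sqrt(np.dot(p, p) + m**2))

def positive_energy_spinor(p, m, chi):
    """Unnormalized spinor (chi, sigma.p chi / (E + m)) with eigenvalue +E_p."""
    return np.concatenate([chi, sigma_dot(p) @ chi / (energy(p, m) + m)])

def negative_energy_spinor(p, m, chi):
    """Unnormalized spinor (-sigma.p chi / (E + m), chi) with eigenvalue -E_p."""
    return np.concatenate([-sigma_dot(p) @ chi / (energy(p, m) + m), chi])

# Hypothetical momentum, mass and 2-component spinor, chosen only for illustration.
p = np.array([0.3, -0.4, 1.2])
m = 1.0
chi = np.array([1.0, 0.0], dtype=complex)

H = hamiltonian(p, m)
E = energy(p, m)
u_pos = positive_energy_spinor(p, m, chi)
u_neg = negative_energy_spinor(p, m, chi)

print(np.allclose(H @ u_pos, E * u_pos), np.allclose(H @ u_neg, -E * u_neg))
print(np.linalg.eigvalsh(H))  # two eigenvalues at -E_p and two at +E_p
```

Each energy level is doubly degenerate, which corresponds to the free choice of the 2-component spinor $\chi$.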

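The general solution of the Dirac equation can be written as
$$\psi(x) = \sum_{\mathbf{p}} \left[ \sum_{s=1,2} c_s(\mathbf{p})\, u^{(s)}(\mathbf{p})\, e^{-i (E_{\mathbf{p}} t - \mathbf{p} \cdot \mathbf{x})} + \sum_{s=3,4} c_s(\mathbf{p})\, u^{(s)}(\mathbf{p})\, e^{-i (-E_{\mathbf{p}} t - \mathbf{p} \cdot \mathbf{x})} \right],$$
where $c_s$'s are complex number coefficients. The solution can be written in a more symmetric way by defining the antiparticle spinors $v^{(s)}(\mathbf{p}) = u^{(s+2)}(-\mathbf{p})$ and making the change of variables $\mathbf{p} \to -\mathbf{p}$ on the second sum:
$$\psi(x) = \sum_{\mathbf{p}} \sum_{s=1,2} \left[ c_s(\mathbf{p})\, u^{(s)}(\mathbf{p})\, e^{-i p \cdot x} + d_s(\mathbf{p})\, v^{(s)}(\mathbf{p})\, e^{i p \cdot x} \right],$$
where $p^\mu = (E_{\mathbf{p}}, \mathbf{p})$ is the four-momentum of the positive energy solution, and the coefficients are $d_s(\mathbf{p}) = c_{s+2}(-\mathbf{p})$.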

Stueckelberg-Feynman interpretation

Spinor normalization and identities
