WikiJournal Preprints/Cut the coordinates! (or Vector Analysis Done Fast)

WikiJournal Preprints
Open access • Publication charge free • Public peer review

WikiJournal User Group is a publishing group of open-access, free-to-publish, Wikipedia-integrated academic journals.

Abstract

The gradient, the curl, the divergence, and the Laplacian are initially defined, without coordinates, as closed-surface integrals per unit volume—the definition of the Laplacian being indifferent to whether the operand is a scalar field or a vector field. Four integral theorems—including the divergence theorem—follow almost immediately, provided that the initial definitions are unambiguous. Their unambiguity, together with their usefulness, is established as follows, at a level suitable for beginners (although this abstract is for prospective instructors):
  • The gradient is related to an acceleration through an equation of motion;
  • The divergence is related to two time-derivatives of density (the partial derivative and the material derivative) through two forms of an equation of continuity;
  • The component of the curl in a general direction is expressed as a divergence (now known to be unambiguous);
  • The same is done for the general component of the gradient, yielding not only a second proof of unambiguity of the gradient, but also the relation between the gradient and the directional derivative; this together with the original definition of the Laplacian shows that the Laplacian of a scalar field is the divergence of the gradient and therefore unambiguous. The unambiguity of the Laplacian of a vector field then follows from a component argument (as for the curl) or from a linearity argument.

The derivation of the relation between the gradient and the directional derivative yields a coordinate-free definition of the dot-del operator for a scalar right-hand operand. But, as the directional derivative is also defined for a non-scalar operand, the same relation offers a method of generalizing the dot-del operator, so that the definition of the Laplacian of a general field can be rewritten with that operator. The advection operator—derived without coordinates, for both scalar and vector properties—is likewise rewritten.

Meanwhile comparison between the definitions of the various operators leads to coordinate-free definitions of the del-cross, del-dot, and del-squared operators. These together with the dot-del operator allow the four integral theorems to be condensed into a single generalized volume-integral theorem.

If the volume of integration is reduced to a thin curved slab of uniform thickness, with an edge-face perpendicular to the broad faces, the four integral theorems are reduced to their two-dimensional forms, each of which relates an integral over a surface segment to an integral around its enclosing curve, provided that the original closed-surface integral has no contribution from the broad faces of the slab. This proviso can be satisfied by construction in two of the four cases, yielding two general theorems, one of which is the Kelvin–Stokes theorem. By applying these two theorems to a segment of a closed surface, and expanding the segment to cover the entire surface, it is shown that the gradient is irrotational and the curl is solenoidal.

The next part of the exposition is more conventional, but still coordinate-free. The gradient theorem is derived from the relation between the gradient and the directional derivative. An irrotational field is shown to have a scalar potential. The 1/r  scalar field is shown to be the field whose negative gradient is the inverse-square vector field, whose divergence is a delta function, which is therefore also the negative Laplacian of the 1/r  scalar field. These results enable the construction of a field with a given divergence or a given Laplacian. The wave equation is derived from small-amplitude sound waves in a non-viscous fluid, and shown to be satisfied by a spherical-wave field with a 1/r  amplitude, whose D'Alembertian is a delta function, enabling the construction of a wave function with a given D'Alembertian. But further progress, including the construction of a field with a given curl, seems to require the introduction of coordinates.

With the aid of identities already found, expressions are easily obtained for the gradient, curl, divergence, Laplacian, and advection operators in Cartesian coordinates—with indicial notation and implicit summation, for brevity. While the resulting expressions for the curl and divergence may look unfamiliar, they match the initial definitions given by J. Willard Gibbs. The Cartesian expressions are found convenient for deriving further identities: a comprehensive collection is derived, leading to the construction of a field with a given curl in a star-shaped region and, as a by-product, a demonstration that the curl of the velocity field of a rigid body is twice the angular velocity. The curl-of-the-curl identity leads to a second definition of the Laplacian of a vector, the Helmholtz decomposition, and the prediction of electromagnetic waves.

The time-honored method of deriving vector-analytic identities—treating the divergence and curl as "formal products" with the del operator, varying one field at a time, and adding the results—is found to be less than rigorous, sometimes less than clear, and hard to justify in view of the ease with which the same thing can be done with Cartesian coordinates, indicial notation, and implicit summation.

The introduction of general coordinates proceeds through (non-normalized) natural and dual basis vectors, reciprocity, the Kronecker delta, covariance of the natural basis, contravariance of the dual basis, contravariant and covariant components, local bases, contravariance of coordinates, covariance of derivatives w.r.t. coordinates, the Jacobian, and handedness. Reciprocity leads to the dot-product of two vector fields and, via the permutation symbol, to the cross-products of the basis vectors, the definition of one basis in terms of the other, the cross-product of two vector fields, and reciprocity of the covariant and contravariant Jacobians. Thus the stage is set for expressing operators in general coordinates.

The multivariate chain rule leads to expressions for the directional derivative (in terms of the contravariant basis), hence the gradient (del) and advection operators. The identity for the curl of the product of a scalar and a vector leads to the curl of a vector field.

[To be continued.]


Introduction

Sheldon Axler, in his essay "Down with determinants!" (1995) and his ensuing book Linear Algebra Done Right (4th Ed., 2023–), does not entirely eliminate determinants, but introduces them as late as possible and then exploits them for what he calls their "main reasonable use in undergraduate mathematics", namely the change-of-variables formula for multiple integrals.[1] Here I treat coordinates in vector analysis somewhat as Axler treats determinants in linear algebra: I introduce coordinates as late as possible, and then exploit them in unconventionally rigorous derivations of vector-analytic identities from (e.g.) vector-algebraic identities. But I contrast with Axler in at least two ways. First, as my subtitle suggests, I have no intention of expanding my paper into a book. Brevity is of the essence. Second, while one may well avoid determinants in numerical  linear algebra,[2] one can hardly avoid coordinates in numerical vector analysis! So I cannot extend the coordinate-minimizing path into computation. But I can extend it up to the threshold by expressing the operators of vector analysis in a suitably general coordinate system, leaving others to specialize it and compute with it. On the way, I can satisfy readers who need the concepts of vector analysis for theoretical purposes, and who would rather read a paper than a book.

The cost of coordinates

Mathematicians define a "vector" as a member of a vector space, which is a set whose members satisfy certain basic rules of algebra (called the vector-space axioms) with respect to another set (called a field), which has its own basic rules of algebra (the field axioms), and whose members are called "scalars". Physicists are more fussy. They typically want a "vector" to be not only a member of a vector space, but also a first-order tensor : a "tensor", meaning that it exists independently of any coordinate system with which it might be specified; and "first-order" (or "first-degree", or "first-rank"), meaning that it is specified by a one-dimensional array of numbers. Similarly, a 2nd-order tensor is specified by a 2-dimensional array (a matrix), and a 3rd-order by a 3-dimensional array, and so on; and a "scalar", being specified by a single number (a zero-dimensional array), is a zero-order tensor. In "vector analysis", we are greatly interested in applications to physical situations, and accordingly take the physicists' view on what constitutes a vector or a scalar.

So, for our purposes, defining a quantity by three components in (say) a Cartesian coordinate system is not enough to make it a vector, and defining a quantity as a real function of a list of coordinates is not enough to make it a scalar, because we still need to show that the quantity has an independent existence. One way to do this is to show that its coordinate representation behaves appropriately when the coordinate system is changed. Independent existence of a quantity means that its coordinate representation changes so as to compensate for the change in the coordinate system.[3] But independent existence of an operator means that its representation in a coordinate system, with the operand(s) and the result in that system, has the same form in one coordinate system as in another (except for features internal  to each system).[4]

Here we circumvent these complications by the most obvious route: by initially defining things without coordinates. If, having defined something without coordinates, we then need to represent it with coordinates, we can choose the coordinate system for convenience.

The limitations of limits

In the branch of pure mathematics known as analysis, there is a thing called a limit, whereby for every positive ϵ  there exists a positive δ such that if some increment is less than δ, some error is less than ϵ. In the branch of applied mathematics known as continuum mechanics, there is a thing called reality, whereby if the increment is less than some positive δ, the assumption of a continuum becomes ridiculous, so that the error cannot be made less than an arbitrary ϵ. Yet vector "analysis" (together with higher-order tensors) is typically studied with the intention of applying it to some form of "continuum" mechanics, such as the modeling of elasticity, plasticity, fluid flow, or (widening the net) electrodynamics of ordinary matter; in short, it is studied with the intention of conveniently forgetting that, on a sufficiently small scale, matter is lumpy.[a] One might therefore submit that to express the principles of vector analysis in the language of limits is to strain at a gnat and swallow a camel. Here I avoid that camel by referring to elements of length or area or volume, each of which is small enough to allow some quantity or quantities to be considered uniform within it, but, for the same reason, large enough to allow such local averaging of the said quantity or quantities as is necessary to tune out the lumpiness.

We shall see bigger camels, where well-known authors define or misdefine a vector operator and then want to treat it like an ordinary vector (a quantity). These I also avoid.

Prerequisites

I assume that the reader is familiar with the algebra and geometry of vectors in 3D space, including the dot-product, the cross-product, and the scalar triple product, their geometric meanings, their expressions in Cartesian coordinates, and the identity

a × (b × c)  =  a⸱c b − a⸱b c ,

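which we call the "expansion" of the vector triple product.[5] I further assume that the reader can generalize the concept of a derivative, so as to differentiate a vector with respect to a scalar, e.g.

$$ \frac{d\mathbf{r}}{dt} = \lim_{h\to 0} \frac{\mathbf{r}(t+h) - \mathbf{r}(t)}{h} , $$

or so as to differentiate a function of several independent variables "partially" w.r.t. one of them while the others are held constant, e.g.

$$ \frac{\partial}{\partial y} f(x,y,z) = \lim_{h\to 0} \frac{f(x,\,y+h,\,z) - f(x,y,z)}{h} . $$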

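But in view of the above remarks on limits, I also expect the reader to be tolerant of an argument like this: In a short time dt, let the vectors r and p change by dr and dp respectively. Then

$$ \begin{aligned}
d(\mathbf{r}\times\mathbf{p}) &= (\mathbf{r}+d\mathbf{r})\times(\mathbf{p}+d\mathbf{p}) - \mathbf{r}\times\mathbf{p} \\
&= \mathbf{r}\times d\mathbf{p} + d\mathbf{r}\times\mathbf{p} + d\mathbf{r}\times d\mathbf{p} ; \\
\therefore\quad \frac{d}{dt}(\mathbf{r}\times\mathbf{p}) &= \mathbf{r}\times\frac{d\mathbf{p}}{dt} + \frac{d\mathbf{r}}{dt}\times\mathbf{p} ,
\end{aligned} $$

where the second-order term dr × dp is neglected and, as always, the orders of the cross-products matter.[b] Differentiation of a dot-product behaves similarly, except that the orders don't matter; and if  p = mv, where m is a scalar and v is a vector, then

$$ \frac{d\mathbf{p}}{dt} = m\,\frac{d\mathbf{v}}{dt} + \frac{dm}{dt}\,\mathbf{v} . $$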

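Or an argument like this:  If  f = f(x, y), then

$$ \frac{\partial}{\partial x}\!\left(\frac{\partial f}{\partial y}\right) = \frac{f(x+dx,\,y+dy) - f(x+dx,\,y) - f(x,\,y+dy) + f(x,y)}{dx\,dy} = \frac{\partial}{\partial y}\!\left(\frac{\partial f}{\partial x}\right) ; $$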

that is, we can switch the order of differentiation in a "mixed" partial derivative. If ∂_x is an abbreviation for ∂/∂x, etc., this rule can be written in operational terms as

∂_x ∂_y = ∂_y ∂_x .

More generally, if ∂_i is an abbreviation for ∂/∂x_i  where  i ∊ {1, 2,…},  the rule becomes

∂_i ∂_j = ∂_j ∂_i .

These generalizations of differentiation, however, do not go beyond differentiation w.r.t. real variables, some of which are scalars, and some of which are coordinates. Vector analysis involves quantities that may be loosely described as derivatives w.r.t. a vector—usually the position vector.
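
As a quick numerical sanity check of two of the prerequisites above (the expansion of the vector triple product, and the product rule for the derivative of a cross-product), here is a minimal Python sketch. The sample vectors, the trajectories `r(t)` and `p(t)`, and all helper names are arbitrary choices made for illustration and do not come from the text. The time derivative is approximated by a central difference with a small but finite `dt`, in the spirit of the remarks on limits.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    return tuple(c * x for x in a)

# Expansion of the vector triple product, with arbitrary sample vectors.
a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)
lhs = cross(a, cross(b, c))                            # a x (b x c)
rhs = add(scale(dot(a, c), b), scale(-dot(a, b), c))   # (a.c) b - (a.b) c

# Product rule for a cross-product, with hypothetical trajectories.
def r(t):
    return (math.cos(t), math.sin(t), t)

def p(t):
    return (t * t, 1.0, math.exp(t))

def ddt(f, t, dt=1e-5):
    """Central-difference derivative of a vector-valued function of t."""
    return scale(1.0 / (2 * dt), add(f(t + dt), scale(-1.0, f(t - dt))))

t0 = 0.7
direct = ddt(lambda t: cross(r(t), p(t)), t0)          # d/dt (r x p)
by_rule = add(cross(r(t0), ddt(p, t0)),                # r x dp/dt
              cross(ddt(r, t0), p(t0)))                # + dr/dt x p

print("triple product:", lhs, rhs)
print("d/dt (r x p)  :", direct)
print("product rule  :", by_rule)
```

The two triple-product results agree exactly because the sample vectors are integers; the two derivative results agree only to within the finite-difference error.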

Closed-surface integrals per unit volume

The term field, mentioned above in the context of algebraic axioms, has an alternative meaning: if r is the position vector, a scalar field is a scalar-valued function of r, and a vector field is a vector-valued function of r; both may also depend on time. These are the functions of which we want "derivatives" w.r.t. the vector r.

In this section I introduce four such derivatives—the gradient, the curl, the divergence, and the Laplacian —in a way that will seem unremarkable to those readers who aren't already familiar with them, but idiosyncratic to those who are. The gradient is commonly introduced in connection with a curve and its endpoints, the curl in connection with a surface segment and its enclosing curve, the divergence in connection with a volume and its enclosing surface, and the Laplacian as a composite of two of the above, initially applicable only to a scalar field. Here I introduce all four in connection with a volume and its enclosing surface; and I introduce the Laplacian as a concept in its own right, equally applicable to a scalar or vector field, and only later relate it to the others. My initial definitions of the gradient, the curl, and the Laplacian, although not novel, are usually thought to be more advanced than the common ones—in spite of being conceptually simpler, and in spite of being obvious variations on the same theme.

Instant integral theorems (with a caveat)

Let V be a volume (3D region) enclosed by a surface S (a mathematical surface, not generally a physical barrier). Let n̂ be the unit normal vector at a general point on S, pointing out of V. Let n be the distance from S in the direction of n̂ (positive outside V, negative inside), and let ∂_n be an abbreviation for ∂/∂n, where the derivative—commonly called the normal derivative—is tacitly assumed to exist.

In V, and on S, let p be a scalar field (e.g., pressure in a fluid, or temperature), and let q be a vector field (e.g., flow velocity, or heat-flow density), and let ψ be a generic field which may be a scalar or a vector. Let a general element of the surface S have area dS, and let it be small enough to allow n̂, p, q, and ∂_n ψ to be considered uniform over the element. Then, for every element, the following four products are well defined:

$$ \hat{\mathbf{n}}\,p\,dS , \qquad \hat{\mathbf{n}}\times\mathbf{q}\,dS , \qquad \hat{\mathbf{n}}\cdot\mathbf{q}\,dS , \qquad \partial_n\psi\,dS . $$

(1)

If p is pressure in a non-viscous fluid, the first of these products is the force exerted by the fluid in V  through the area dS. The second product does not have such an obvious physical interpretation; but if q is circulating clockwise about an axis directed through V, the cross-product will be exactly tangential to S and will tend to have a component in the direction of that axis. The third product is the flux of q through the surface element; if q is flow velocity, the third product is the volumetric flow rate (volume per unit time) out of V  through dS ; or if q is heat-flow density, the third product is the heat transfer rate (energy per unit time) out of V  through dS. The fourth product, by analogy with the third, might be called the flux of the normal derivative of ψ through the surface element, but is equally well defined whether ψ is a scalar or a vector—or, for that matter, a matrix, or a tensor of any order, or anything else that we can differentiate w.r.t. n.

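If we add up each of the four products over all the elements of the surface S, we obtain, respectively, the four surface integrals

$$ \iint_S \hat{\mathbf{n}}\,p\,dS , \qquad \iint_S \hat{\mathbf{n}}\times\mathbf{q}\,dS , \qquad \iint_S \hat{\mathbf{n}}\cdot\mathbf{q}\,dS , \qquad \iint_S \partial_n\psi\,dS , $$

(2)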

in which the double integral sign indicates that the range of integration is two-dimensional. The first surface integral takes a scalar field and yields a vector; the second takes a vector field and yields a vector; the third takes a vector field and yields a scalar; and the fourth takes (e.g.) a scalar field yielding a scalar, or a vector field yielding a vector. If p is pressure in a non-viscous fluid, the first integral is the force exerted by the fluid in V  on the fluid outside V. The second integral may be called the skew surface integral of q over S ,[6] or, for the reason hinted above, the circulation of q over S.  The third integral, commonly called the flux integral (or simply the surface integral) of q over S, is the total flux of q out of V. And the fourth integral is the surface integral of the outward normal derivative of ψ.

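Let the volume V  be divided into elements. Let a general volume element have the volume dV and be enclosed by the surface δS —not to be confused with the area dS of a surface element, which may be an element of S or of δS. Now consider what happens if, instead of evaluating each of the above surface integrals over S, we evaluate it over each δS and add up the results for all the volume elements. In the interior of V, each surface element of area dS is on the boundary between two volume elements, for which the unit normals n̂ at dS, and the respective values of ∂_n ψ, are equal and opposite. Hence when we add up the integrals over the surfaces δS, the contributions from the elements dS cancel in pairs, except on the original surface S, so that we are left with the original integral over S. So, for the four surface integrals in (2), we have respectively

$$ \begin{aligned}
\iint_S \hat{\mathbf{n}}\,p\,dS &= \sum_{dV} \iint_{\delta S} \hat{\mathbf{n}}\,p\,dS \\
\iint_S \hat{\mathbf{n}}\times\mathbf{q}\,dS &= \sum_{dV} \iint_{\delta S} \hat{\mathbf{n}}\times\mathbf{q}\,dS \\
\iint_S \hat{\mathbf{n}}\cdot\mathbf{q}\,dS &= \sum_{dV} \iint_{\delta S} \hat{\mathbf{n}}\cdot\mathbf{q}\,dS \\
\iint_S \partial_n\psi\,dS &= \sum_{dV} \iint_{\delta S} \partial_n\psi\,dS .
\end{aligned} $$

(3)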

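Now comes a big "if":  if  we define the gradient of p (pronounced "grad p") inside dV  as

$$ \nabla p = \frac{1}{dV} \iint_{\delta S} \hat{\mathbf{n}}\,p\,dS $$

(4g)

and the curl of q inside dV  as

$$ \operatorname{curl}\mathbf{q} = \frac{1}{dV} \iint_{\delta S} \hat{\mathbf{n}}\times\mathbf{q}\,dS $$

(4c)

and the divergence of q inside dV  as

$$ \operatorname{div}\mathbf{q} = \frac{1}{dV} \iint_{\delta S} \hat{\mathbf{n}}\cdot\mathbf{q}\,dS $$

(4d)

and the Laplacian of ψ inside dV  as [c]

$$ \triangle\psi = \frac{1}{dV} \iint_{\delta S} \partial_n\psi\,dS $$

(4L)

(where the letters after the equation number stand for gradient, curl, divergence, and Laplacian, respectively), then equations (3) can be rewritten

$$ \begin{aligned}
\iint_S \hat{\mathbf{n}}\,p\,dS &= \sum_{dV} \nabla p\;dV \\
\iint_S \hat{\mathbf{n}}\times\mathbf{q}\,dS &= \sum_{dV} \operatorname{curl}\mathbf{q}\;dV \\
\iint_S \hat{\mathbf{n}}\cdot\mathbf{q}\,dS &= \sum_{dV} \operatorname{div}\mathbf{q}\;dV \\
\iint_S \partial_n\psi\,dS &= \sum_{dV} \triangle\psi\;dV .
\end{aligned} $$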

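But because each term in each sum has a factor dV, we call the sum an integral; and because the range of integration is three-dimensional, we use a triple integral sign. Thus we obtain the following four theorems relating integrals over an enclosing surface S  to integrals over the enclosed volume V :

$$ \iint_S \hat{\mathbf{n}}\,p\,dS = \iiint_V \nabla p\;dV $$

(5g)

$$ \iint_S \hat{\mathbf{n}}\times\mathbf{q}\,dS = \iiint_V \operatorname{curl}\mathbf{q}\;dV $$

(5c)

$$ \iint_S \hat{\mathbf{n}}\cdot\mathbf{q}\,dS = \iiint_V \operatorname{div}\mathbf{q}\;dV $$

(5d)

$$ \iint_S \partial_n\psi\,dS = \iiint_V \triangle\psi\;dV $$

(5L)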

Of the above four results, only the third (5d) seems to have a standard name; it is called the divergence theorem (or Gauss's theorem or, more properly, Ostrogradsky's theorem[7]), and is indeed the best known of the four—although the other three, having been derived in parallel with it, may be said to be equally fundamental.

As each of the operators ∇, curl, and div calls for an integration w.r.t. area and then a division by volume, the dimension (or unit of measurement) of the result is the dimension of the operand divided by the dimension of length, as if the operation were some sort of differentiation w.r.t. position. Moreover, in each of equations (5g) to (5d), there is a triple integral on the right but only a double integral on the left, so that each of the operators ∇, curl, and div appears to compensate for a single integration. For these reasons, and for convenience, we shall describe them as differential operators. By comparison, the △ operator in (4L) or (5L) calls for a further differentiation w.r.t. n ; we shall therefore describe △ as a 2nd-order differential operator. (An additional reason for these descriptions will emerge later.) As promised, the four definitions (4g) to (4L) are "obvious variations on the same theme" (although the fourth is somewhat less obvious than the others).
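
The four definitions (4g) to (4L) can be tried out numerically. The following is a minimal Python sketch, not a definitive implementation: it takes δS to be a small axis-aligned cube, samples each field once at the centre of each face, and approximates the normal derivative in (4L) by a central difference. The use of Cartesian components, the choice of a cube, the test fields, and all function names are assumptions made for illustration (numerical work cannot avoid coordinates, as noted in the Introduction).

```python
def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    return tuple(c * x for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def cube_faces(center, h):
    """Yield (face centre, outward unit normal, face area) for a cube of side h."""
    for axis in range(3):
        for sign in (-1.0, 1.0):
            n = tuple(sign if k == axis else 0.0 for k in range(3))
            yield add(center, scale(h / 2, n)), n, h * h

def grad(p, center, h):
    """(4g): sum of n p dS over the faces, divided by the volume."""
    total = (0.0, 0.0, 0.0)
    for pt, n, dS in cube_faces(center, h):
        total = add(total, scale(p(pt) * dS, n))
    return scale(1.0 / h**3, total)

def curl(q, center, h):
    """(4c): sum of n x q dS over the faces, divided by the volume."""
    total = (0.0, 0.0, 0.0)
    for pt, n, dS in cube_faces(center, h):
        total = add(total, scale(dS, cross(n, q(pt))))
    return scale(1.0 / h**3, total)

def div(q, center, h):
    """(4d): sum of n . q dS over the faces, divided by the volume."""
    return sum(dot(n, q(pt)) * dS for pt, n, dS in cube_faces(center, h)) / h**3

def laplacian(psi, center, h, eps=1e-3):
    """(4L) for scalar psi; the normal derivative is a central difference."""
    total = 0.0
    for pt, n, dS in cube_faces(center, h):
        ahead = psi(add(pt, scale(eps, n)))
        behind = psi(add(pt, scale(-eps, n)))
        total += (ahead - behind) / (2 * eps) * dS
    return total / h**3

# Hypothetical test fields, chosen so that the exact answers are easy to state.
def p(r):                       # gradient is (2xy, x^2, 1)
    return r[0] ** 2 * r[1] + r[2]

def q(r):                       # curl is (0, -z, 2), divergence is x
    return (-r[1], r[0], r[0] * r[2])

def psi(r):                     # Laplacian is 6
    return r[0] ** 2 + r[1] ** 2 + r[2] ** 2

r0, h = (1.0, 2.0, 3.0), 1e-2
print("grad p   :", grad(p, r0, h))        # analytic value at r0: (4, 1, 1)
print("curl q   :", curl(q, r0, h))        # analytic value at r0: (0, -3, 2)
print("div q    :", div(q, r0, h))         # analytic value at r0: 1
print("Laplacian:", laplacian(psi, r0, h)) # analytic value: 6
```

For these low-degree polynomial fields the one-sample-per-face rule happens to reproduce the analytic values up to rounding; for general fields the agreement would be only approximate and would improve as h is reduced.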

But remember the "if": Theorems (5g) to (5L) depend on definitions (4g) to (4L) and are therefore only as definite as those definitions! Equations (3), without assuming anything about the shapes and sizes of the closed surfaces δS (except, tacitly, that n̂ is piecewise well-defined), indicate that the surface integrals are additive with respect to volume. But this additivity, by itself, does not guarantee that the surface integrals are shared among neighboring volume elements in proportion to their volumes, as envisaged by "definitions" (4g) to (4L). Each of these "definitions" is unambiguous if, and only if, the ratio of the surface integral to dV  is insensitive to the shape and size of δS for a sufficiently small δS. Notice that the issue here is not whether the ratios specified in equations (4g) to (4L) are true vectors or scalars, independent of the coordinates; all of the operations needed in those equations have coordinate-free definitions. Rather, the issue is whether the resulting ratios are unambiguous notwithstanding the ambiguity of δS, provided only that δS is sufficiently small. That is the advertised "caveat", which must now be addressed.
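
The caveat can be illustrated, though of course not proved, by a numerical experiment. The sketch below computes the ratio in (4d) for boxes of three different shapes and two different sizes around the same point. The vector field, the box shapes, and the helper name `flux_per_volume` are hypothetical choices for illustration; only axis-aligned boxes are tried, which is a severe restriction on "shape", and each face is integrated by a midpoint rule on a grid of patches.

```python
def flux_per_volume(q, center, sides, n_sub=8):
    """Outward flux of q through an axis-aligned box, divided by the box volume.

    Each face is split into n_sub x n_sub patches (midpoint rule)."""
    total = 0.0
    for axis in range(3):
        u, v = [k for k in range(3) if k != axis]
        du, dv = sides[u] / n_sub, sides[v] / n_sub
        for sign in (-1.0, 1.0):
            for i in range(n_sub):
                for j in range(n_sub):
                    pt = list(center)
                    pt[axis] += sign * sides[axis] / 2
                    pt[u] += (i + 0.5) * du - sides[u] / 2
                    pt[v] += (j + 0.5) * dv - sides[v] / 2
                    # n . q on this face is +/- the component of q along `axis`
                    total += sign * q(pt)[axis] * du * dv
    return total / (sides[0] * sides[1] * sides[2])

def q(r):
    """Hypothetical field; its divergence is 3x^2 + 2yz + x (16 at the point below)."""
    return (r[0] ** 3, r[1] ** 2 * r[2], r[0] * r[2])

r0 = (1.0, 2.0, 3.0)
shapes = {"cube": (1.0, 1.0, 1.0),
          "slab": (4.0, 1.0, 0.25),
          "needle": (0.25, 0.25, 4.0)}

ratios = {}
for size in (0.1, 0.01):
    for name, shape in shapes.items():
        sides = tuple(size * s for s in shape)
        ratios[(name, size)] = flux_per_volume(q, r0, sides)
        print(f"{name:7s} size={size:<5} flux/volume = {ratios[(name, size)]:.6f}")
```

In this experiment the ratios for the three shapes differ noticeably at the larger size and draw together, towards the same value, at the smaller size, which is the behavior that "definition" (4d) requires.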

In accordance with our "applied" mathematical purpose, our proofs of the unambiguity of the differential operators will rest on a few thought experiments, each of which applies an operator to a physical field, say f, and obtains another physical field whose unambiguity is beyond dispute. The conclusion of the thought experiment is then applicable to any operand field whose mathematical properties are consistent with its interpretation as the physical field f ; the loss of generality, if any, is only what is incurred by that interpretation.

Unambiguity of the gradient

Suppose that a fluid with density ρ (a scalar field) flows with velocity v (a vector field) under the influence of the internal pressure p (a scalar field). Then the integral in (4g) is the force exerted by the pressure of the fluid inside δS on the fluid outside, so that minus the integral is the force exerted on the fluid inside δS by the pressure of the fluid outside. Dividing by dV, we find that −∇p, as defined by (4g), is the force per unit volume, due to the pressure outside the volume.[8] If this is the only force per unit volume acting on the volume (e.g., because the fluid is non-viscous and in a weightless environment, and the volume element is not in contact with the container), then it is equal to the acceleration times the mass per unit volume; that is,

$$ \rho\,\frac{d\mathbf{v}}{dt} = -\nabla p . $$

(6g)

Now provided that the left side of this equation is locally continuous, it can be considered uniform inside the small δS, so that the left side is unambiguous, whence ∇p is also unambiguous. If there are additional forces on the fluid element, e.g. due to gravity and/or viscosity, then −∇p is not the sole contribution to density-times-acceleration, but is still the contribution due to pressure, which is still unambiguous.

By showing the unambiguity of definition (4g), we have confirmed theorem (5g). In the process we have seen that the volume-based definition of the gradient is useful for the modeling of fluids, and intuitive in that it formalizes the common notion that a pressure "gradient" gives rise to a force.

Unambiguity of the divergence

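In the aforesaid fluid, in a short time dt, the volume that flows out of fixed closed surface δS  through a fixed surface element of area dS  is v dt ⸱ n̂ dS.  Multiplying by density and integrating over δS, we find that the mass flowing out of δS  in time dt is  $\iint_{\delta S} \rho\,\mathbf{v}\,dt\cdot\hat{\mathbf{n}}\,dS$.  Dividing this by dV, and then by dt, we get the rate of reduction of density inside δS ; that is,

$$ \frac{1}{dV} \iint_{\delta S} \hat{\mathbf{n}}\cdot\rho\mathbf{v}\;dS = -\frac{\partial\rho}{\partial t} , $$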

where the derivative w.r.t. time is evaluated at a fixed location (because δS is fixed), and is therefore written as a partial derivative (because other variables on which ρ might depend—namely the coordinates—are held constant). Provided that the right-hand side is locally continuous, it can be considered uniform inside δS and is therefore unambiguous, so that the left side is likewise unambiguous. But the left side is simply div ρv  as defined by (4d),[d] which is therefore also unambiguous,[9] confirming theorem (5d). In short, the divergence operator is that which maps ρv to the rate of reduction of density at a fixed point:

$$ \operatorname{div}\rho\mathbf{v} = -\frac{\partial\rho}{\partial t} . $$

(7d)

This result, which expresses conservation of mass, is a form of the so-called equation of continuity.

The partial derivative ∂ρ/∂t in (7d) must be distinguished from the material derivative dρ/dt, which is evaluated at a point that moves with the fluid.[e] [Similarly, dv/dt in (6g) is the material acceleration, because it is the acceleration of the mobile mass—not of a fixed point! ]  To re-derive the equation of continuity in terms of the material derivative, the volume v dt ⸱ n̂ dS , which flows out through dS in time dt (as above), is integrated over δS to obtain the increase in volume of the mass initially contained in dV. Dividing this by the mass, ρ dV, gives the increase in specific volume (1⧸ρ) of that mass, and then dividing by dt gives the rate of change of specific volume; that is,

$$ \frac{1}{\rho\,dV} \iint_{\delta S} \hat{\mathbf{n}}\cdot\mathbf{v}\;dS = \frac{d(1/\rho)}{dt} . $$

Multiplying by ρ² and comparing the left side with (4d), we obtain

$$ \rho\,\operatorname{div}\mathbf{v} = -\frac{d\rho}{dt} . $$

(7d')

Whereas (7d) shows that div ρv is unambiguous, (7d') shows that div v is unambiguous (provided that the right-hand sides are locally continuous). In accordance with the everyday meaning of "divergence", (7d') also shows that div v is positive if the fluid is expanding (ρ decreasing), negative if it is contracting (ρ increasing), and zero if it is incompressible. In the last case, the equation of continuity reduces to

$$ \operatorname{div}\mathbf{v} = 0 \qquad [\text{for an incompressible fluid}] . $$

(7i)

For incompressible flow, any tubular surface tangential to the flow velocity, and consequently with no flow in or out of the "tube", has the same volumetric flow rate across all cross-sections of the "tube", as if the surface were the wall of a pipe full of liquid (except that the surface is not necessarily stationary). Accordingly, a vector field with zero divergence is described as solenoidal (from the Greek word for "pipe"). More generally, a solenoidal vector field has the property that for any tubular surface tangential to the field, the flux integrals across any two cross-sections of the "tube" are the same—because otherwise there would be a net flux integral out of the closed surface comprising the two cross-sections and any segment of tube between them, in which case, by the divergence theorem (5d), the divergence would have to be non-zero somewhere inside, contrary to (7i).
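
The two forms of the equation of continuity, (7d) and (7d'), and the incompressible case (7i), can be checked on a simple example. The sketch below assumes a uniformly expanding fluid with velocity v = K r and spatially uniform density ρ = ρ₀ exp(−3Kt); this flow, the constants `K` and `RHO0`, and the helper names are hypothetical choices for illustration. Because the density in this example does not vary with position, its material and partial time derivatives coincide, so a single finite-difference time derivative serves for both equations. The divergence is computed from definition (4d) on a small cube, with one sample per face.

```python
import math

K = 0.5       # assumed expansion rate
RHO0 = 2.0    # assumed initial density

def rho(t):
    """Spatially uniform density of the expanding fluid."""
    return RHO0 * math.exp(-3 * K * t)

def v(r):
    """Velocity field of uniform expansion about the origin."""
    return tuple(K * x for x in r)

def div(q, center, h):
    """(4d) on a cube of side h, sampling q at the centre of each face."""
    total = 0.0
    for axis in range(3):
        for sign in (-1.0, 1.0):
            pt = list(center)
            pt[axis] += sign * h / 2
            total += sign * q(pt)[axis] * h * h   # n . q dS on this face
    return total / h**3

t, r0, h = 0.4, (0.3, -0.2, 0.1), 1e-2
rho_now = rho(t)

div_rho_v = div(lambda r: tuple(rho_now * c for c in v(r)), r0, h)
div_v = div(v, r0, h)
drho_dt = (rho(t + 1e-5) - rho(t - 1e-5)) / 2e-5   # central difference in time

print("div(rho v) + d(rho)/dt =", div_rho_v + drho_dt)        # (7d): close to zero
print("rho div v  + d(rho)/dt =", rho_now * div_v + drho_dt)  # (7d'): close to zero

# Rigid rotation about the z axis changes no volumes, so its divergence vanishes (7i).
def rotation(r):
    return (-r[1], r[0], 0.0)

print("div of rigid rotation  =", div(rotation, r0, h))
```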

Unambiguity of the curl (and gradient)

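The unambiguity of the curl (4c) follows from the unambiguity of the divergence. Taking dot-products of (4c) with an arbitrary constant vector b, we get

$$ \operatorname{curl}\mathbf{q}\cdot\mathbf{b} = \frac{1}{dV} \iint_{\delta S} (\hat{\mathbf{n}}\times\mathbf{q})\cdot\mathbf{b}\;dS = \frac{1}{dV} \iint_{\delta S} \hat{\mathbf{n}}\cdot(\mathbf{q}\times\mathbf{b})\;dS ; $$

that is, by (4d),

$$ \operatorname{curl}\mathbf{q}\cdot\mathbf{b} = \operatorname{div}\,(\mathbf{q}\times\mathbf{b}) \qquad [\text{for uniform } \mathbf{b}] . $$

(8c)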

(The parentheses on the right, although helpful because of the spacing, are not strictly necessary, because the alternative binding would be (div q), which is a scalar, whose cross-product with the vector b is not defined. And the left-hand expression does not need parentheses, because it can only mean the dot-product of a curl with the vector b; it cannot mean the curl of a dot-product, because the curl of a scalar field is not defined.) This result (8c) is an identity if the vector b is independent of location, so that it can be taken inside or outside the surface integral; thus b may be a uniform vector field, and may be time-dependent. If we make b a unit vector, the left side of the identity is the (scalar) component of curl q in the direction of b, and the right side is unambiguous. Thus the curl is unambiguous because its component in any direction is unambiguous. This confirms theorem (5c).

Similarly, the unambiguity of the divergence implies the unambiguity of the gradient. Starting with (4g), taking dot-products with an arbitrary uniform vector b, and proceeding as above, we obtain

$$ \nabla p\cdot\mathbf{b} = \operatorname{div}\,p\,\mathbf{b} \qquad [\text{for uniform } \mathbf{b}] . $$

(8g)

(The left-hand side does not need parentheses, because it can only mean the dot-product of a gradient with the vector b; it cannot mean the gradient of the dot-product of a scalar field with a vector field, because that dot-product would not be defined.) If we make b a unit vector, this result (8g) says that the (scalar) component of ∇p in the direction of b is given by the right-hand side, which again is unambiguous. So here we have a second explanation of the unambiguity of the gradient: like the curl, it is unambiguous because its component in any direction is unambiguous.

We might well ask what happens if we take cross-products with b on the left, instead of dot-products. If we start with (4g), the process is straightforward: in the end we can switch the order of the cross-product on the left, and change the sign on the right, obtaining

$$ \nabla p\times\mathbf{b} = \operatorname{curl}\,p\,\mathbf{b} \qquad [\text{for uniform } \mathbf{b}] . $$

(8p)

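(Again no parentheses are needed.) If we start with (4c) instead, and take b inside the integral, we get a vector triple product to expand, which leads to

$$ \mathbf{b}\times\operatorname{curl}\mathbf{q} = \frac{1}{dV} \iint_{\delta S} \hat{\mathbf{n}}\;\mathbf{b}\cdot\mathbf{q}\;dS \;-\; \frac{1}{dV} \iint_{\delta S} \mathbf{b}\cdot\hat{\mathbf{n}}\;\mathbf{q}\;dS , $$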

in which the first term on the right is simply  ∇ b⸱q  (the gradient of the dot-product). The second term is more problematic. If we had a scalar p instead of the vector q, we could take b outside the second integral, so that the second term would be (minus) b ⸱ ∇p. This suggests that the actual second term should be (minus) b ⸱ ∇q.  Shall we therefore adopt the second term (without the sign) as the definition of b⸱∇ q for a vector q (treating b⸱∇ as an operator), and write

$$ \mathbf{b}\times\operatorname{curl}\mathbf{q} = \nabla\,\mathbf{b}\cdot\mathbf{q} \;-\; \mathbf{b}\cdot\nabla\,\mathbf{q} \qquad [\text{for uniform } \mathbf{b}] \;? $$

(8q)

The proposal would be open to the objection that  b⸱∇ q  had been defined only for uniform b , whereas  b ⸱ ∇p (for scalar p) is defined whether b is uniform or not.  So, for the moment, let us put (8q) aside and run with (8c), (8g), and (8p).
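
Identities (8c), (8g), and (8p), and the tentative relation (8q), can be spot-checked numerically. In the sketch below every operator is computed from its surface-integral definition on a small cube with one sample per face, and the provisional b⸱∇ q of (8q) is computed as the second integral in the preceding equation. The fields `p` and `q`, the uniform vector `b`, the evaluation point, and all function names are hypothetical choices for illustration. Because each identity already holds face by face (it is an algebraic identity in n̂, b, and the field value), the two sides agree up to rounding even though the cube is not infinitesimal.

```python
import math

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    return tuple(c * x for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def cube_faces(center, h):
    """Yield (face centre, outward unit normal, face area) for a cube of side h."""
    for axis in range(3):
        for sign in (-1.0, 1.0):
            n = tuple(sign if k == axis else 0.0 for k in range(3))
            yield add(center, scale(h / 2, n)), n, h * h

def vector_sum(term, center, h):
    """Sum the vector term(pt, n) dS over the cube faces; divide by the volume."""
    total = (0.0, 0.0, 0.0)
    for pt, n, dS in cube_faces(center, h):
        total = add(total, scale(dS, term(pt, n)))
    return scale(1.0 / h**3, total)

def grad(p, c, h):                 # (4g)
    return vector_sum(lambda pt, n: scale(p(pt), n), c, h)

def curl(q, c, h):                 # (4c)
    return vector_sum(lambda pt, n: cross(n, q(pt)), c, h)

def div(q, c, h):                  # (4d)
    return sum(dot(n, q(pt)) * dS for pt, n, dS in cube_faces(c, h)) / h**3

def b_dot_del(b, q, c, h):         # provisional definition suggested by (8q)
    return vector_sum(lambda pt, n: scale(dot(b, n), q(pt)), c, h)

# Hypothetical fields and uniform vector b.
def p(r):
    return r[0] * r[1] * r[2] + math.cos(r[2])

def q(r):
    return (math.sin(r[1]), r[0] * r[2], math.exp(r[0]))

b = (0.3, -1.2, 2.0)
r0, h = (0.5, 1.0, -0.4), 1e-2

c8 = (dot(curl(q, r0, h), b), div(lambda r: cross(q(r), b), r0, h))      # (8c)
g8 = (dot(grad(p, r0, h), b), div(lambda r: scale(p(r), b), r0, h))      # (8g)
p8 = (cross(grad(p, r0, h), b), curl(lambda r: scale(p(r), b), r0, h))   # (8p)
q8 = (cross(b, curl(q, r0, h)),
      add(grad(lambda r: dot(b, q(r)), r0, h),
          scale(-1.0, b_dot_del(b, q, r0, h))))                          # (8q)

for name, (lhs, rhs) in (("8c", c8), ("8g", g8), ("8p", p8), ("8q", q8)):
    print(name, lhs, rhs)
```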

Another meaning of the gradient

Let ŝ be a unit vector in a given direction, and let s be a parameter measuring distance (arc length) along a path in that direction. By equation (8g) and definition (4d), we have

$$ \hat{\mathbf{s}}\cdot\nabla p = \operatorname{div}\,p\,\hat{\mathbf{s}} = \frac{1}{dV} \iint_{\delta S} \hat{\mathbf{n}}\cdot\hat{\mathbf{s}}\;p\;dS , $$

where, by the unambiguity of the divergence, the shape of the closed surface δS enclosing dV  can be chosen for convenience. So let δS be a right cylinder with cross-sectional area α  and perpendicular height ds , with the path passing perpendicularly through the end-faces at parameter-values s and s+ds , where the outward unit normal n̂ consequently takes the values −ŝ and ŝ, respectively. And let the cross-sectional dimensions be small compared with ds  so that the values of p at the end-faces, say p and p+dp, can be taken to be the same as where the end-faces cut the path. Then  dV = α ds , and the surface integral over δS includes only the contributions from the end-faces (because n̂ is perpendicular to ŝ elsewhere); those contributions are respectively  −ŝ⸱ŝ p α  and  ŝ⸱ŝ (p+dp) α,  i.e.  −p α  and  (p+dp) α.  With these substitutions the above equation becomes

$$ \hat{\mathbf{s}}\cdot\nabla p = \frac{1}{\alpha\,ds}\,\big[ -p\,\alpha + (p+dp)\,\alpha \big] = \frac{dp}{ds} ; $$

that is,

$$ \hat{\mathbf{s}}\cdot\nabla p = \partial_s\,p , $$

(9g)

where the right-hand side, commonly called the directional derivative of p in the direction,[10] is the derivative of p w.r.t. distance in that direction. Although (9g) has been obtained by taking that direction as fixed, the equality is evidently maintained if s measures arc length along any path tangential  to at the point of interest.

Equation (9g) is an alternative definition of the gradient: it says that the gradient of p is the vector whose scalar component in any direction is the directional derivative of p in that direction. For real p, this component has its maximum, namely |∇p|, in the direction of ∇p; thus the gradient of p is the vector whose direction is that in which the derivative of p w.r.t. distance is a maximum, and whose magnitude is that maximum. This is the usual conceptual definition of the gradient.[11] Sometimes it is convenient to work directly from this definition. For example, in Cartesian coordinates (x, y, z), if a scalar field is given by x, its gradient is obviously the unit vector in the direction of the x axis, usually called i; that is, ∇x = i. Similarly, if r = r r̂ is the position vector, then ∇r = r̂.
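As an informal numerical check of (9g), the following sketch compares the component of ∇r = r̂ in a chosen direction with a finite-difference estimate of the derivative of r w.r.t. distance in that direction. The sample point, the direction, and the step size are arbitrary assumptions made for illustration, and the function names are hypothetical.

```python
import math

def p(r):
    """Scalar field p(r) = |r|, the distance from the origin."""
    return math.sqrt(sum(c * c for c in r))

def grad_p(r):
    """Analytic gradient of |r|: the unit vector r-hat (valid for r != 0)."""
    mag = p(r)
    return tuple(c / mag for c in r)

def directional_derivative(f, r, s_hat, h=1e-6):
    """Central-difference estimate of df/ds along the unit vector s_hat."""
    fwd = tuple(c + h * s for c, s in zip(r, s_hat))
    bwd = tuple(c - h * s for c, s in zip(r, s_hat))
    return (f(fwd) - f(bwd)) / (2 * h)

r0 = (1.0, 2.0, 2.0)               # assumed sample point, |r0| = 3
s_hat = (2 / 3, -1 / 3, 2 / 3)     # assumed direction (a unit vector)

component = sum(s * g for s, g in zip(s_hat, grad_p(r0)))   # s_hat . grad p
derivative = directional_derivative(p, r0, s_hat)           # dp/ds

print(component, derivative)
```

The two printed numbers should agree to within the finite-difference error, whichever unit vector is chosen.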

If ŝ is tangential to a level surface of p (a surface of constant p), then ∂ₛp in that direction is zero, in which case (9g) says that ∇p (if not zero) is orthogonal to ŝ. So ∇p is orthogonal to the surfaces of constant p (as we would expect, having just shown that the direction of ∇p is that in which p varies most steeply). This result leads to a method of finding a vector normal to a curved surface at a given point: if the equation of the surface is f(r) = C, where r is the position vector and C is a constant (possibly zero), a suitable vector is ∇f evaluated at the given point.

If p is uniform (that is, if it has no spatial variation), then its derivative w.r.t. distance in every direction is zero; that is, the component of ∇p in every direction is zero, so that ∇p must be the zero vector. In short, the gradient of a uniform scalar field is zero. Conversely, if p is not uniform, there must be some location and some direction in which its derivative w.r.t. distance, if defined at all, is non-zero, so that its gradient, if defined at all, is also non-zero. Thus a scalar field with zero gradient in some region is uniform in that region.

Unambiguity of the Laplacian

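Armed with our new definition of the gradient (9g), we can revisit our definition of the Laplacian (4L). If ψ is a scalar field, then, by (9g), ∂ₙψ can be replaced by n̂⸱∇ψ in (4L), which then becomes

$$ \triangle\psi \,=\, \frac{1}{dV}\oiint_{\delta S} \hat{\mathbf{n}}\cdot\nabla\psi\;dS \tag{9L} $$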

that is, by definition (4d),

$$ \triangle\psi \,=\, \operatorname{div}\nabla\psi \quad [\text{for scalar } \psi]. \tag{9L'} $$

So the Laplacian of a scalar field is the divergence of the gradient. This is the usual introductory definition of the Laplacian—and on its face is applicable only in the case of a scalar field. The unambiguity of the Laplacian, in this case, follows from the unambiguity of the divergence and the gradient.
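To make the "closed-surface integral per unit volume" concrete, here is a minimal sketch that estimates the Laplacian of the scalar field ψ = x² + y² + z² by adding up the outward normal derivative n̂⸱∇ψ over the six faces of a small cube and dividing by the cube's volume. The field, the cube, its centre, and its size are assumptions chosen for illustration; the unambiguity established above is what allows the cube to be chosen for convenience.

```python
def grad_psi(x, y, z):
    """Analytic gradient of the assumed field psi = x^2 + y^2 + z^2."""
    return (2 * x, 2 * y, 2 * z)

def laplacian_by_flux(grad, centre, h):
    """Outward normal derivative summed over the six faces of a cube of
    side h (each face sampled at its midpoint), per unit volume."""
    total = 0.0
    for axis in range(3):
        for sign in (-1.0, 1.0):
            point = list(centre)
            point[axis] += sign * h / 2                     # midpoint of this face
            normal_derivative = sign * grad(*point)[axis]   # n-hat . grad psi
            total += normal_derivative * h * h              # times the face area
    return total / h ** 3

lap = laplacian_by_flux(grad_psi, (0.3, -0.2, 0.5), 0.01)
print(lap)   # close to 6, the divergence of the gradient (2x, 2y, 2z)
```

For this quadratic field the normal derivative is uniform over each face, so the midpoint sampling introduces no error beyond rounding.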

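If, on the contrary, ψ in definition (4L) is a vector field, then we can again take dot-products with a uniform vector b, obtaining

$$ \mathbf{b}\cdot\triangle\boldsymbol{\psi} \,=\, \frac{1}{dV}\oiint_{\delta S} \mathbf{b}\cdot\partial_n\boldsymbol{\psi}\;dS \,=\, \frac{1}{dV}\oiint_{\delta S} \partial_n(\mathbf{b}\cdot\boldsymbol{\psi})\;dS \,=\, \triangle(\mathbf{b}\cdot\boldsymbol{\psi}) $$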

If we make b a unit vector, this says that the scalar component of the Laplacian of a vector field, in any direction, is the Laplacian of the scalar component of that vector field in that direction. As we have just established that the latter is unambiguous, so is the former.

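But the unambiguity of the Laplacian can be generalized further. If

$$ \psi \,=\, \sum_{i=1}^{k} \alpha_i\,\psi_i $$

where each ψᵢ is a scalar field, and each αᵢ is a constant, and the counter i ranges from (say) 1 to k, then it is clear from (4L) that

$$ \triangle \sum_{i=1}^{k} \alpha_i\,\psi_i \,=\, \sum_{i=1}^{k} \alpha_i\,\triangle\psi_i \tag{10} $$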

In words, this says that the Laplacian of a linear combination of fields is the same linear combination of the Laplacians of the same fields; or, more concisely, that the Laplacian is linear. I say "it is clear" because the Laplacian as defined by (4L) is itself a linear combination, so that (10) merely asserts that we can regroup the terms of a nested linear combination; the gradient, curl, and divergence as defined by (4g) to (4d) are likewise linear. It follows from (10) that the Laplacian of a linear combination of fields is unambiguous if the Laplacians of the separate fields are unambiguous. Now we have supposed that the fields ψᵢ are scalar and that the coefficients αᵢ are constants. But the same logic applies if the "constants" are uniform basis vectors (e.g., i, j, k), so that the "linear combination" can represent any vector field, whence the Laplacian of any vector field is unambiguous. And the same logic applies if the "constants" are chosen as a "basis" for a space of tensors of any order, so that the Laplacian of any tensor field of that order is unambiguous, and so on. In short, the Laplacian of any field that we can express with a uniform basis is unambiguous.

The dot-del, del-cross, and del-dot operators

The gradient operator is also called del.[f] If it simply denotes the gradient, we tend to pronounce it "grad" in order to emphasize the result. But it can also appear in combination with other operators to give other results, and in those contexts we tend to pronounce it "del".

One such combination is "dot del", as in "b⸱∇", which we proposed for (8q), but did not quite manage to define satisfactorily for a vector operand. With our new definition of the gradient (9g), we can now make a second attempt. A general vector field q can be written |q| q̂, so that

$$ \mathbf{q}\cdot\nabla\psi \,=\, |\mathbf{q}|\;\hat{\mathbf{q}}\cdot\nabla\psi $$

If ψ is a scalar field, we can apply (9g) to the right-hand side, obtaining

$$ \mathbf{q}\cdot\nabla\psi \,=\, |\mathbf{q}|\;\partial_{s_q}\psi $$

where sq is distance in the direction of q. For scalar ψ, this result is an identity between previously defined quantities. For non-scalar ψ, we have not yet defined the left-hand side, but the right-hand side is still well-defined and self-explanatory (provided that we can differentiate ψ w.r.t. sq). So we are free to adopt

$$ \mathbf{q}\cdot\nabla\psi \,\triangleq\, |\mathbf{q}|\;\partial_{s_q}\psi \tag{11} $$

where sq is distance in the direction of q, as the general definition of the operator q⸱∇, and to interpret it as defining both a unary operator q⸱∇ which operates on a generic field, and a binary operator ⸱∇ which takes a (possibly uniform) vector field on the left and a generic field on the right.

For any vector field q, it follows from (11) that if ψ is a uniform field, then q⸱∇ψ = 0.

For the special case in which q is a unit vector ŝ, with s measuring distance in the direction of ŝ, definition (11) reduces to

$$ \hat{\mathbf{s}}\cdot\nabla\psi \,=\, \partial_s\,\psi \tag{12} $$

which agrees with (9g) but now holds for a generic field ψ [whereas (9g) was for a scalar field, and was derived as a theorem based on earlier definitions]. So ŝ⸱∇, with a unit vector ŝ, is the directional-derivative operator on a generic field; and by (11), q⸱∇ is a scaled directional-derivative operator on a generic field.

In particular, if ŝ = n̂ we have

$$ \hat{\mathbf{n}}\cdot\nabla\psi \,=\, \partial_n\,\psi $$

which we may substitute into the original definition of the Laplacian (4L) to obtain

$$ \triangle\psi \,=\, \frac{1}{dV}\oiint_{\delta S} \hat{\mathbf{n}}\cdot\nabla\psi\;dS \tag{13L} $$

which is just (9L) again, except that it now holds for a generic field.

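If our general definition of the gradient (4g) is also taken as the general definition of the ∇ operator,[12] then, comparing (4g) with (4c), (4d), and (13L), we see that

$$ \operatorname{curl} = (\nabla)\times\,, \qquad \operatorname{div} = (\nabla)\cdot\,, \qquad \triangle = (\nabla)\cdot\nabla $$

where the parentheses may seem to be required on account of the closing dS in (4g).[13] But if we write the factor dS before the integrand, the del operator in (4g) becomes

$$ \nabla \,=\, \frac{1}{dV}\oiint_{\delta S} dS\;\hat{\mathbf{n}} $$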

if we insist that it is to be read as an operator looking for an operand, and not as a self-contained expression. Then, if we similarly bring forward the dS in (4c), (4d), and (13L), the respective operators become[14]

$$ \operatorname{curl} = \nabla\times\,, \qquad \operatorname{div} = \nabla\cdot\,, \qquad \triangle = \nabla\cdot\nabla \tag{14} $$

(pronounced "del cross", "del dot", and "del dot del"), of which the last is usually abbreviated as ∇² ("del squared").[15]

There is a misconception that the operational equivalences in (14) apply only in Cartesian coordinates.[16] But, because these equivalences have been derived from coordinate-free definitions of the operators, they must remain valid in any coordinate system provided that they are expressed correctly—without (e.g.) inadvertently taking dependent variables inside or outside differentiations.[17] That does not mean that they are always convenient, or easily verified, or conducive to the avoidance of error. But they sometimes make useful mnemonics; e.g., they let us rewrite identities (8c), (8g), and (8p) as

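$$ \mathbf{b}\cdot\nabla\times\mathbf{q} = \nabla\cdot(\mathbf{q}\times\mathbf{b})\,, \qquad \mathbf{b}\cdot\nabla p = \nabla\cdot(p\,\mathbf{b})\,, \qquad \nabla\times(p\,\mathbf{b}) = \nabla p\times\mathbf{b} \qquad \text{for uniform } \mathbf{b}. \tag{15} $$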

These would be basic algebraic vector identities if ∇ were an ordinary vector, and one could try to derive them from the "algebraic" behavior of ∇; but they're not, because it isn't, so we didn't! Moreover, these simple "algebraic" rules are for a uniform b, and do not of themselves tell us what to do if b is spatially variable; for example, (8g) is not applicable to (7d).

The advection operator

Variation or transportation of a property of a medium due to motion with the medium is called advection (which, according to its Latin roots, means "carrying to"). Suppose that a medium (possibly a fluid) moves with a velocity field v in some inertial reference frame. Let ψ be a field (possibly a scalar field or a vector field) expressing some property of the medium (e.g., density, or acceleration, or stress,[g]… or even v itself). We have seen that the time-derivative of ψ may be specified in two different ways: as the partial derivative ∂ψ/∂t, evaluated at a fixed point (in the chosen reference frame), or as the material derivative dψ/dt, evaluated at a point moving at velocity v (i.e., with the medium). The difference dψ/dt − ∂ψ/∂t is due to motion with the medium. To find another expression for this difference, let s be a parameter measuring distance along the path traveled by a particle of the medium. Then, for a short time interval dt, the surface-plot of the small change in ψ (or each component thereof) as a function of the small changes in t and s (plotted on perpendicular axes) can be taken as a plane through the origin, so that

$$ d\psi \,=\, \frac{\partial\psi}{\partial t}\,dt \,+\, \frac{\partial\psi}{\partial s}\,ds $$

that is, the change in ψ is the sum of the changes due to the change in t and the change in s. Dividing by dt gives

$$ \frac{d\psi}{dt} \,=\, \frac{\partial\psi}{\partial t} \,+\, \frac{\partial\psi}{\partial s}\,\frac{ds}{dt} $$

i.e.,

$$ \frac{d\psi}{dt} \,=\, \frac{\partial\psi}{\partial t} \,+\, |\mathbf{v}|\;\partial_s\,\psi $$

(and the first term on the right could have been written ∂ₜψ). So the second term on the right is the contribution to the material derivative due to motion with the medium; it is called the advective term, and is non-zero wherever a particle of the medium moves along a path on which ψ varies with location, even if ψ at each location is constant over time. So the operator |v| ∂ₛ, where s measures distance along the path, is the advection operator: it maps a property of a medium to the advective term in the time-derivative of that property. If ψ is v itself, the above result becomes

$$ \frac{d\mathbf{v}}{dt} \,=\, \frac{\partial\mathbf{v}}{\partial t} \,+\, |\mathbf{v}|\;\partial_s\,\mathbf{v} $$

where the left-hand side (the material acceleration) is as given by Newton's second law, and the first term on the right (which we might call the "partial" acceleration) is the time-derivative of velocity in the chosen reference frame, and the second term on the right (the advective term) is the correction that must be added to the "partial" acceleration in order to obtain the material acceleration. This term is non-zero wherever velocity is non-zero and varies along a path, even if the velocity at each point on the path is constant over time (as when water speeds up while flowing at a constant volumetric rate into a nozzle). Paradoxically, while the material acceleration and the "partial" acceleration are apparently linear (first-degree) in v, their difference (the advective term) is not. Thus the distinction between ∂ψ/∂t and dψ/dt has the far-reaching implication that fluid dynamics is non-linear.

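Applying (11) to the last two equations, we obtain respectively

$$ \frac{d\psi}{dt} \,=\, \frac{\partial\psi}{\partial t} \,+\, \mathbf{v}\cdot\nabla\psi \tag{16} $$

and

$$ \frac{d\mathbf{v}}{dt} \,=\, \frac{\partial\mathbf{v}}{\partial t} \,+\, \mathbf{v}\cdot\nabla\mathbf{v} \tag{16v} $$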

where, in each case, the second term on the right is the advective term. So the advection operator can also be written v⸱∇ .
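The relation between the material derivative, the partial derivative, and the advective term can be checked numerically. The sketch below assumes a two-dimensional scalar property ψ(x, y, t) = sin x cos t + y t and a particle moving at a uniform velocity; both are hypothetical choices made for illustration. It estimates the material derivative by differencing ψ along the particle's path and compares it with the partial derivative plus v⸱∇ψ.

```python
import math

def psi(x, y, t):
    """Assumed scalar property of the medium."""
    return math.sin(x) * math.cos(t) + y * t

def dpsi_dt_partial(x, y, t):
    """Partial time-derivative of psi at a fixed point."""
    return -math.sin(x) * math.sin(t) + y

def grad_psi(x, y, t):
    """Spatial gradient of psi at a fixed time."""
    return (math.cos(x) * math.cos(t), t)

v = (1.5, -0.5)                      # assumed uniform velocity of the medium

def position(t):
    """Path of one particle of the medium."""
    return (0.2 + v[0] * t, 1.0 + v[1] * t)

t0, h = 0.7, 1e-5
x0, y0 = position(t0)

# Material derivative: rate of change of psi as seen by the moving particle.
material = (psi(*position(t0 + h), t0 + h) - psi(*position(t0 - h), t0 - h)) / (2 * h)

# Partial derivative plus the advective term v . grad psi.
gx, gy = grad_psi(x0, y0, t0)
predicted = dpsi_dt_partial(x0, y0, t0) + v[0] * gx + v[1] * gy

print(material, predicted)
```

The two values should agree to within the finite-difference error; setting v to zero removes the advective term and leaves only the partial derivative.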

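When the generic ψ in (16) is replaced by the density ρ, we get a relation between ∂ρ/∂t and dρ/dt, both of which we have seen before, in equations (7d) and (7d') above. Substituting from those equations then gives

$$ \operatorname{div}(\rho\,\mathbf{v}) \,=\, \rho\,\operatorname{div}\mathbf{v} \,+\, \mathbf{v}\cdot\nabla\rho \tag{17} $$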

where ∇ρ can be taken as a gradient since ρ is scalar. This result is in fact an identity (a product rule for the divergence), as we shall eventually confirm by another method.
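Pending that confirmation, the product rule for the divergence, which relates div(ρv) to ρ div v and v⸱∇ρ, can at least be spot-checked with finite differences. The density and velocity fields below are arbitrary smooth functions assumed for illustration, and the helper names are hypothetical.

```python
import math

def rho(x, y, z):
    """Assumed scalar (density) field."""
    return 1.0 + x * x + math.sin(y) * z

def vel(x, y, z):
    """Assumed velocity field with non-zero divergence."""
    return (x * y, math.cos(x) + y * z, x * z * z)

def partial(f, point, axis, h=1e-5):
    """Central-difference partial derivative of scalar f along one axis."""
    fwd, bwd = list(point), list(point)
    fwd[axis] += h
    bwd[axis] -= h
    return (f(*fwd) - f(*bwd)) / (2 * h)

def div(field, point):
    """Divergence as the sum of d(field_i)/d(x_i)."""
    return sum(partial(lambda *r, i=i: field(*r)[i], point, i) for i in range(3))

def grad(f, point):
    """Gradient as the tuple of partial derivatives."""
    return tuple(partial(f, point, i) for i in range(3))

def rho_v(x, y, z):
    """The product field rho * v."""
    d = rho(x, y, z)
    return tuple(d * c for c in vel(x, y, z))

pt = (0.4, -0.7, 1.1)
lhs = div(rho_v, pt)
rhs = rho(*pt) * div(vel, pt) + sum(a * b for a, b in zip(vel(*pt), grad(rho, pt)))

print(lhs, rhs)
```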

Generalized volume-integral theorem

We can rewrite the fourth integral theorem (5L) in the "dot del" notation as

$$ \oiint_S \hat{\mathbf{n}}\cdot\nabla\psi\;dS \,=\, \iiint_V \triangle\psi\;dV \tag{18L} $$

Then, using notations (14), we can condense all four integral theorems (5g), (5c), (5d), and (18L) into the single equation

$$ \oiint_S \hat{\mathbf{n}} * \psi\;dS \,=\, \iiint_V \nabla * \psi\;dV \tag{19} $$

where the wildcard ∗ (conveniently pronounced "star") is a generic binary operator which may be replaced by a null (direct juxtaposition of the operands) for theorem (5g), or a cross for (5c), or a dot for (5d), or ⸱∇ for (18L). This single equation is a generalized volume-integral theorem, relating an integral over a volume to an integral over its enclosing surface.[18]

Theorem (19) is based on the following definitions, which have been found unambiguous:

  • the gradient of a scalar field p is the closed-surface integral of  n̂ p per unit volume, where n̂ is the outward unit normal;
  • the divergence of a vector field is the outward flux integral per unit volume;
  • the curl of a vector field is the skew surface integral per unit volume, also called the surface circulation per unit volume; and
  • the Laplacian is the closed-surface integral of the outward normal derivative, per unit volume.

The gradient maps a scalar field to a vector field; the divergence maps a vector field to a scalar field; the curl maps a vector field to a vector field; and the Laplacian maps a scalar field to a scalar field, or a vector field to a vector field, etc.

The gradient of p, as defined above, has been shown to be also

  • the vector whose (scalar) component in any direction is the directional derivative of p in that direction (i.e. the derivative of p w.r.t. distance in that direction), and
  • the vector whose direction is that in which the directional derivative of p is a maximum, and whose magnitude is that maximum.

Consistent with these alternative definitions of the gradient, we have defined the ⸱∇ operator so that ŝ⸱∇ (for a unit vector ŝ) is the operator yielding the directional derivative in the direction of ŝ, and we have used that notation to bring theorem (5L) under theorem (19).

So far, we have said comparatively little about the curl. That imbalance will now be rectified.

Closed-circuit integrals per unit area

Instant integral theorems (on a condition)

Theorems (5g) to (5L) are three-dimensional: each of them relates an integral over a volume V  to an integral over its enclosing surface S. We now seek analogous two-dimensional theorems, each of which relates an integral over a surface segment to an integral around its enclosing curve. For maximum generality, the surface segment should be allowed to be curved into a third dimension.[h] Theorems of this kind can be obtained as special cases of theorems (5g) to (5L) by suitably choosing V and S ; this is another advantage of our "volume first" approach.

Let Σ be a surface segment enclosed by a curve C (a circuit or closed contour), and let l be a parameter measuring arc length around C, so that a general element of C has length dl; and let a general element of the surface Σ have area dΣ. Let ν̂ be the unit normal vector at a general point on Σ, and let t̂ be the unit tangent vector to C at a general point on C in the direction of increasing l. In the original case of a surface enclosing a volume, we had to decide whether the unit normal pointed into or out of the volume (we chose the latter). In the present case of a circuit enclosing a surface segment, we have to decide whether l is measured clockwise or counterclockwise as seen when looking in the direction of the unit normal, and we choose clockwise. So l is measured clockwise about ν̂, and C is traversed clockwise about ν̂.

From Σ we can construct obvious candidates for V and S. From every point on Σ, erect a perpendicular with a uniform small height h in the direction of ν̂. Then simply let V be the volume occupied by all the perpendiculars, and let S be its enclosing surface. Thus V is a (generally curved) thin slab of uniform thickness h, whose enclosing surface S consists of two close parallel (generally curved) broad faces connected by a perpendicular edge-face of uniform height h; and we can treat ν̂ as a vector field by extrapolating it perpendicularly from Σ. If we can arrange for h to cancel out, the volume V will serve as a 3D representation of the surface segment Σ while the edge-face will serve as a 2D representation of the curve C, so that our four theorems will relate an integral around C to an integral over Σ, provided that there is no contribution from the broad faces to the integral over S. For brevity, let us call this proviso the 2D condition.

If the 2D condition is satisfied, an integral over the new S reduces to an integral over the edge-face, on which

$$ dS \,=\, h\,dl $$

so that the cancellation of h will leave an integral over C w.r.t. length. Meanwhile, in an integral over the new V, regardless of the 2D condition, we have

$$ dV \,=\, h\,d\Sigma $$

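so that the cancellation of h will leave an integral over Σ w.r.t. area. So, substituting for dS and dV in (5g) to (5L), and canceling h as planned, we obtain respectively

$$ \oint_C \hat{\mathbf{n}}\,p\;dl \,=\, \iint_\Sigma \nabla p\;d\Sigma \tag{20g} $$

$$ \oint_C \hat{\mathbf{n}}\times\mathbf{q}\;dl \,=\, \iint_\Sigma \operatorname{curl}\mathbf{q}\;d\Sigma \tag{20c} $$

$$ \oint_C \hat{\mathbf{n}}\cdot\mathbf{q}\;dl \,=\, \iint_\Sigma \operatorname{div}\mathbf{q}\;d\Sigma \tag{20d} $$

$$ \oint_C \partial_n\,\psi\;dl \,=\, \iint_\Sigma \triangle\psi\;d\Sigma \tag{20L} $$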

all subject to the 2D condition. In each equation, the circle on the left integral sign acknowledges that the integral is around a closed loop. The unit vector n̂, which was normal to the edge-face, is now normal to both t̂ and ν̂; that is, n̂ is tangential to the surface segment Σ and projects perpendicularly outward from its bounding curve.

On the left side of (20g), the 2D condition is satisfied if (but not only if) n̂p takes equal-and-opposite values at any two opposing points on opposing broad faces of S, i.e. if p takes the same value at such points, i.e. if p has a zero directional derivative normal to Σ, i.e. if ∇p has no component normal to Σ. Thus a sufficient "2D condition" for (20g) is the obvious one.

Skipping forward to (20L), we see that the 2D condition is satisfied if ∂ₙψ takes equal-and-opposite values at any two opposing points on opposing broad faces of S, i.e. if ∂ψ/∂ν (where ν measures distance in the direction of ν̂) takes the same value at such points, i.e. if ∂²ψ/∂ν² = 0.

For (20c) and (20d), the 2D condition can be satisfied by construction, with more useful results, as explained under the next two headings. To facilitate this process, we first make a minor adjustment to Σ and C. Noting that any curved surface segment can be approximated to any desired accuracy by a polyhedral surface enclosed by a polygon, we shall indeed consider Σ to be a polyhedral surface made up of small planar elements, dΣ being the area of a general element, and we shall indeed consider C to be a polygon with short sides, dl being the length of a general side.[i] The benefit of this trick, as we shall see, is to make the unit normal ν̂ uniform over each surface element, without forcing us to treat q (or any other field) as uniform over the same element. But, as the elements of C can independently be made as short as we like (dividing straight sides into shorter elements if necessary!), we can still consider ν̂, q, and t̂ to be uniform over each element of C.

Special case for the gradient

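In (20c), the 2D condition is satisfied by q = ν̂ p (where p is a scalar field), because then the integrand on the left is zero on the broad faces of S, where n̂ is parallel to ν̂. Equation (20c) then becomes

$$ \oint_C \hat{\mathbf{n}}\times\hat{\boldsymbol{\nu}}\;p\;dl \,=\, \iint_\Sigma \operatorname{curl}(\hat{\boldsymbol{\nu}}\,p)\;d\Sigma \tag{21n} $$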

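Now on the left, n̂ × ν̂ = −t̂, and on the right, over each surface element, the unit normal ν̂ is uniform so that, by (8p), curl(ν̂ p) = ∇p × ν̂ = −ν̂ × ∇p. With these substitutions, the minus signs cancel and we get

$$ \oint_C \hat{\mathbf{t}}\;p\;dl \,=\, \iint_\Sigma \hat{\boldsymbol{\nu}}\times\nabla p\;d\Sigma \tag{21g} $$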

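or, if we write dr = t̂ dl and dΣ = ν̂ dΣ (where the bold dΣ denotes the vector element of area),

$$ \oint_C p\;d\mathbf{r} \,=\, \iint_\Sigma d\boldsymbol{\Sigma}\times\nabla p \tag{21r} $$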

This result, although well attested in the literature,[19] does not seem to have a name—unlike the next result.

Special case for the curl

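In (20d), the 2D condition is satisfied if q is replaced by ν̂ × q, because then (again) the integrand on the left is zero on the broad faces of S, where n̂ is parallel to ν̂. Equation (20d) then becomes

$$ \oint_C \hat{\mathbf{n}}\cdot\hat{\boldsymbol{\nu}}\times\mathbf{q}\;dl \,=\, \iint_\Sigma \operatorname{div}(\hat{\boldsymbol{\nu}}\times\mathbf{q})\;d\Sigma \tag{22n} $$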

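Now on the left, the integrand can be written n̂ × ν̂ ⸱ q = −t̂ ⸱ q, and on the right, div(ν̂ × q) = −div(q × ν̂) = −ν̂ ⸱ curl q by identity (8c), since ν̂ is uniform over each surface element. With these substitutions, the minus signs cancel and we get

$$ \oint_C \mathbf{q}\cdot\hat{\mathbf{t}}\;dl \,=\, \iint_\Sigma \operatorname{curl}\mathbf{q}\cdot\hat{\boldsymbol{\nu}}\;d\Sigma \tag{22c} $$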

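or, if we again write dr = t̂ dl and dΣ = ν̂ dΣ,

$$ \oint_C \mathbf{q}\cdot d\mathbf{r} \,=\, \iint_\Sigma \operatorname{curl}\mathbf{q}\cdot d\boldsymbol{\Sigma} \tag{22r} $$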

This result—the best-known theorem relating an integral over a surface segment to an integral around its enclosing curve, and the best-known theorem involving the curl—is called Stokes' theorem or, more properly, the Kelvin–Stokes theorem,[20] or simply the curl theorem.[21]

The integral on the left of (22c) or (22r) is called the circulation of the vector field q around the closed curve C. So, in words, the Kelvin–Stokes theorem says that the circulation of a vector field around a closed curve is equal to the flux of the curl of that vector field through any surface spanning that closed curve.
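A small numerical illustration of this statement: the sketch below assumes the planar field q = x j, whose curl is the uniform unit vector k, and compares its circulation around the unit circle in the xy plane with the flux of its curl through the enclosed disk. The field, the circle, and the number of polygon sides are assumptions for illustration; the circle is traversed counterclockwise as seen from the positive z side, which is clockwise when looking along k.

```python
import math

def q(x, y):
    """Assumed field q = x j in the xy plane; its curl is the unit vector k."""
    return (0.0, x)

def circulation(field, radius, n=2000):
    """Midpoint-rule estimate of the integral of field . t-hat dl around a circle."""
    total = 0.0
    dtheta = 2 * math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta
        x, y = radius * math.cos(theta), radius * math.sin(theta)
        tx, ty = -math.sin(theta), math.cos(theta)   # unit tangent t-hat
        qx, qy = field(x, y)
        total += (qx * tx + qy * ty) * radius * dtheta
    return total

radius = 1.0
circ = circulation(q, radius)
flux = 1.0 * math.pi * radius ** 2   # (curl q . k) times the area of the disk

print(circ, flux)   # both close to pi
```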

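Now let a general element of Σ (with area dΣ) be enclosed by the curve δC, traversed in the same direction as the outer curve C. Then, applying (22c) to the single element, we have

$$ \oint_{\delta C} \mathbf{q}\cdot\hat{\mathbf{t}}\;dl \,=\, \operatorname{curl}\mathbf{q}\cdot\hat{\boldsymbol{\nu}}\;d\Sigma $$

that is,

$$ \operatorname{curl}\mathbf{q}\cdot\hat{\boldsymbol{\nu}} \,=\, \frac{1}{d\Sigma}\oint_{\delta C} \mathbf{q}\cdot\hat{\mathbf{t}}\;dl \tag{23c} $$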

where the right-hand side is simply the circulation per unit area.

Equation (23c) is an alternative definition of the curl: it says that the curl of q is the vector whose scalar component in any direction is the circulation of q per unit area of a surface whose normal points in that direction. For real q, this component has its maximum, namely |curl q| , in the direction of curl q; thus the curl of q is the vector whose direction is that which a surface must face if the circulation of q per unit area of that surface is to be a maximum, and whose magnitude is that maximum. This is the usual conceptual definition of the curl.[22]

[Notice, however, that our original volume-based definition (4c) is more succinct: the curl is the closed-surface circulation per unit volume, i.e. the skew surface integral per unit volume.]

It should now be clear where the curl gets its name (coined by Maxwell), and why it is also called the rotation (indeed the curl operator is sometimes written "rot", especially in Continental languages, in which "rot" does not have the same unfortunate everyday meaning as in English). It should be similarly unsurprising that a vector field with zero curl is described as irrotational (which one must carefully pronounce differently from "irritational"!), and that the curl of the velocity of a medium is called the vorticity.

However, a field does not need to be vortex-like in order to have a non-zero curl; for example, by identity (8p), in Cartesian coordinates, the velocity field x j has a curl equal to ∇x × j = i × j = k, although it describes a shearing motion rather than a rotating motion. This is understandable because if you hold a pencil between the palms of your hands and slide one palm over the other (a shearing motion), the pencil rotates. Conversely, we can have a vortex-like field whose curl is zero everywhere except on or near the axis of the vortex. For example, the Maxwell–Ampère law in magnetostatics says that curl H = J, where H is the magnetizing field and J is the current density.[j] So if the current is confined to a wire, curl H is zero outside the wire, although, as is well known, the field lines circle the wire. The resolution of the paradox is that H gets stronger as we approach the wire, making a shearing pattern, whose effect on the curl counteracts that of the rotation.

The curl-grad and div-curl operators

We have seen from (9L) that the Laplacian of a scalar field is the divergence of the gradient. Four more such second-order combinations make sense, namely the curl of the gradient (of a scalar field), and the divergence of the curl, the gradient of the divergence, and the curl of the curl (of a vector field). The first two —"curl grad" and "div curl"— can now be disposed of.

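Let the surface segment Σ enclosed by the curve C be a segment of the closed surface S surrounding the volume V, and let Σ expand across S until it engulfs S, so that C shrinks to a point on the far side of S. Then, in the nameless theorem (21g) and the Kelvin–Stokes theorem (22c), the integral on the left becomes zero while Σ and ν̂ on the right become S and n̂, so that the theorems respectively reduce to

$$ \oiint_S \hat{\mathbf{n}}\times\nabla p\;dS \,=\, \mathbf{0} $$

and

$$ \oiint_S \hat{\mathbf{n}}\cdot\operatorname{curl}\mathbf{q}\;dS \,=\, 0 $$

Applying theorem (5c) to the first of these two equations, and the divergence theorem (5d) to the second, we obtain respectively

$$ \iiint_V \operatorname{curl}\nabla p\;dV \,=\, \mathbf{0} $$

and

$$ \iiint_V \operatorname{div}\operatorname{curl}\mathbf{q}\;dV \,=\, 0 $$

As the integrals vanish for any volume V in which the integrands are defined, the integrands must be zero wherever they are defined; that is,

$$ \operatorname{curl}\nabla p \,=\, \mathbf{0} \tag{24c} $$

and

$$ \operatorname{div}\operatorname{curl}\mathbf{q} \,=\, 0 \tag{24d} $$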

In words, the curl of the gradient is zero, and the divergence of the curl is zero; or, more concisely, any gradient is irrotational, and any curl is solenoidal.
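Both identities can be spot-checked with nested central differences, as in the following sketch. The fields p and q, the sample point, and the step size H are assumptions chosen for illustration. Because discrete central differences along different axes commute, the computed values differ from zero only by rounding error; this is a sanity check rather than a proof.

```python
import math

H = 1e-3   # assumed finite-difference step

def partial(f, point, axis):
    """Central-difference partial derivative of scalar f along one axis."""
    fwd, bwd = list(point), list(point)
    fwd[axis] += H
    bwd[axis] -= H
    return (f(*fwd) - f(*bwd)) / (2 * H)

def grad(f):
    """Return a function giving the gradient of the scalar field f."""
    return lambda x, y, z: tuple(partial(f, (x, y, z), i) for i in range(3))

def curl(field):
    """Return a function giving the curl of the vector field."""
    def comp(i):
        return lambda x, y, z: field(x, y, z)[i]
    def result(x, y, z):
        pt = (x, y, z)
        return (partial(comp(2), pt, 1) - partial(comp(1), pt, 2),
                partial(comp(0), pt, 2) - partial(comp(2), pt, 0),
                partial(comp(1), pt, 0) - partial(comp(0), pt, 1))
    return result

def div(field):
    """Return a function giving the divergence of the vector field."""
    def result(x, y, z):
        return sum(partial(lambda *r, i=i: field(*r)[i], (x, y, z), i)
                   for i in range(3))
    return result

def p(x, y, z):
    """Assumed scalar field."""
    return math.sin(x * y) + z * z * x

def q(x, y, z):
    """Assumed vector field."""
    return (y * z * z, math.exp(x) * z, math.cos(x * y))

pt = (0.3, 0.8, -0.5)
curl_grad = curl(grad(p))(*pt)   # expected to be (nearly) the zero vector
div_curl = div(curl(q))(*pt)     # expected to be (nearly) zero

print(curl_grad, div_curl)
```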

We might well ask whether the converses are true. Is every irrotational vector field the gradient of something? And is every solenoidal vector field the curl of something? The answers are affirmative, but the proofs require more preparation.

Meanwhile we may note, as a mnemonic aid, that when the left-hand sides of the last two equations are rewritten in the del-cross and del-dot notations, they become ∇ × ∇p and ∇ ⸱ ∇ × q, respectively. The former looks like (but isn't) a cross-product of two parallel vectors, and the latter looks like (but isn't) a scalar triple product with a repeated factor, so that each expression looks like it ought to be zero (and it is). But such appearances can lead one astray, because ∇ is an operator, not a self-contained vector quantity; for example, ∇p × ∇φ is not identically zero, because two gradients are not necessarily parallel.[23]

We should also note, to tie a loose end, that identity (24d) was to be expected from our verbal statement of the Kelvin–Stokes theorem (22c). That statement implies that the flux of the curl through any two surfaces spanning the same closed curve is the same. So if we make a closed surface from two spanning surfaces, the flux into one spanning surface is equal to the flux out of the other, i.e. the net flux out of the closed surface is zero, i.e. the integral of the divergence over the enclosed volume is zero; and since any simple volume in which the divergence is defined can be enclosed this way, the divergence itself (of the curl) must be zero wherever it is defined.

Change per unit length

Continuing (and concluding) the trend of reducing the number of dimensions, we now seek one-dimensional theorems, each of which relates an integral over a path to values at the endpoints of the path. For maximum generality, the path should be allowed to be curved into a second and a third dimension.

We could do this by further specializing theorems (5g) to (5L). We could take a curve Γ with a unit tangent vector ŝ. At every point on Γ we could mount a circular disk with a uniform small area α, centered on Γ and orthogonal to it. We could let V be the volume occupied by all the disks and let S be its enclosing surface; thus V would be a thin right circular cylinder, except that its axis could be curved. If we could arrange for α to cancel out, our four theorems would indeed be reduced to the desired form, provided that there were no contribution from the curved face of the "cylinder" to the integral over S (the "1D proviso"). But, as it turns out, this exercise yields only one case in which the "1D proviso" can be satisfied by a construction involving ŝ and a general field, and we have already almost discovered that case by a simpler and more conventional argument, which we shall now continue.

Fundamental theorem

Equation (9g) is applicable where p(r) is a scalar field, s is a parameter measuring arc length along a curve Γ, and ŝ is the unit tangent vector to Γ in the direction of increasing s. Let s take the values s₁ and s₂ at the endpoints of Γ, where the position vector r takes the values r₁ and r₂ respectively. Then, integrating (9g) w.r.t. s from s₁ to s₂ and applying the fundamental theorem of calculus, we get

$$ \int_{s_1}^{s_2} \hat{\mathbf{s}}\cdot\nabla p\;ds \,=\, p(\mathbf{r}_2) - p(\mathbf{r}_1) \tag{25g} $$

This is our third integral theorem involving the gradient, and the best-known of the three: it is commonly called simply the gradient theorem,[24] or the fundamental theorem of the gradient, or the fundamental theorem of line integrals; it generalizes the fundamental theorem of calculus to a curved path.[25] If we write dr for ŝ ds (the change in the position vector), we get the theorem in the alternative form

$$ \int_{\mathbf{r}_1}^{\mathbf{r}_2} \nabla p\cdot d\mathbf{r} \,=\, p(\mathbf{r}_2) - p(\mathbf{r}_1) \tag{25r} $$

As the right-hand side of (25g) or (25r) obviously depends on the endpoints but not on the path in between, so does the integral on the left. This integral is commonly called the work integral of ∇p over the path, because if ∇p is a force, the integral is the work done by the force over the path. So, in words, the gradient theorem says that the change in value of a scalar field from one point to another is the work integral of the gradient of that field over any path from the one to the other.
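The path-independence asserted by the gradient theorem can be illustrated numerically. The sketch below assumes the scalar field p = xy + z² and two hypothetical paths from (0, 0, 0) to (1, 2, 3), one straight and one curved; it estimates the work integral of ∇p along each path by the midpoint rule on short chords and compares it with the change in p.

```python
import math

def p(x, y, z):
    """Assumed scalar field."""
    return x * y + z * z

def grad_p(x, y, z):
    """Analytic gradient of p."""
    return (y, x, 2 * z)

def work_integral(field, path, n=2000):
    """Midpoint-rule estimate of the integral of field . dr along path(u), 0 <= u <= 1."""
    total = 0.0
    for k in range(n):
        a = path(k / n)
        b = path((k + 1) / n)
        mid = tuple((ai + bi) / 2 for ai, bi in zip(a, b))
        dr = tuple(bi - ai for ai, bi in zip(a, b))
        total += sum(f * d for f, d in zip(field(*mid), dr))
    return total

def straight(u):
    """Straight path from (0, 0, 0) to (1, 2, 3)."""
    return (u, 2 * u, 3 * u)

def curved(u):
    """A curved path with the same endpoints."""
    return (u * u, 2 * math.sin(math.pi * u / 2), 3 * u ** 3)

change = p(1, 2, 3) - p(0, 0, 0)            # = 11
w_straight = work_integral(grad_p, straight)
w_curved = work_integral(grad_p, curved)

print(change, w_straight, w_curved)
```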

Applying (25r) to a single element of the curve, we get

$$ \nabla p\cdot d\mathbf{r} \,=\, dp \tag{26g} $$

which is reminiscent of (dy/dx) dx = dy in elementary calculus.[26] Alternatively, we could have obtained (26g) by multiplying both sides of (9g) by ds, and then obtained (25r) by adding (26g) over all the elemental displacements dr on any path from r₁ to r₂.

If we close the path by setting r₂ = r₁, the gradient theorem reduces to

$$ \oint \nabla p\cdot d\mathbf{r} \,=\, 0 \tag{27g} $$

where the integral is around any closed loop. Applying the Kelvin–Stokes theorem then gives

$$ \iint_\Sigma \operatorname{curl}\nabla p\cdot d\boldsymbol{\Sigma} \,=\, 0 \tag{28g} $$

where Σ is any surface spanning the loop. As this applies to any loop spanned by any surface on which the integrand is defined,  curl ∇p  must be zero wherever it is defined. This is a second proof (more conventional than the first) of theorem (24c).

Scalar potential: field with given gradient

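Lemma:  If curl q = 0 in a simply connected region V, then ∫ q⸱dr over any path in V depends only on the endpoints of the path.

Proof:  Suppose, on the contrary, that there are two paths Γ and Λ in V, with a common starting point and a common finishing point, such that

$$ \int_\Gamma \mathbf{q}\cdot d\mathbf{r} \,\neq\, \int_\Lambda \mathbf{q}\cdot d\mathbf{r} $$

Let −Λ denote Λ traversed backwards. Then for every dr on Λ there is an equal and opposite dr on −Λ, so that we have

$$ \int_\Gamma \mathbf{q}\cdot d\mathbf{r} \,+\, \int_{-\Lambda} \mathbf{q}\cdot d\mathbf{r} \,\neq\, 0 $$

i.e.

$$ \oint_{\Gamma-\Lambda} \mathbf{q}\cdot d\mathbf{r} \,\neq\, 0 $$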

where the left-hand side is now a work integral of q around a closed loop in V.  By the simple connectedness of V,  this loop is spanned by some surface Σ in V.  So we can apply the Kelvin–Stokes theorem and conclude that the flux integral of  curl q  through Σ  is non-zero, in which case  curl q  must be non-zero somewhere on Σ , hence somewhere in V — contradicting the hypothesis of the lemma. ◼

Corollary:  If  curl q = 0  in a simply connected region V,  there exists a scalar field p such that  q = ∇p  in V.

Proof:  We shall show that a suitable candidate is

$$ p(\mathbf{r}) \,=\, \int_{\mathbf{r}_0}^{\mathbf{r}} \mathbf{q}(\boldsymbol{\rho})\cdot d\boldsymbol{\rho} $$

where r₀ is the position vector of any fixed point in V, and ρ is the position vector of a general point on the path of integration, which may be any path in V. First note that p(r) is unambiguous because, by the preceding lemma, it is independent of the path for given r₀ and r, provided that the path is in V. Now to find ∇p(r), let σ be the arc length along the path from r₀ to ρ, so that σ ranges from 0 to (say) s as ρ ranges from r₀ to r; and let ŝ be the unit vector tangential to the path at ρ, in the direction of increasing σ. Then dρ = ŝ dσ, so that the above equation becomes

$$ p(\mathbf{r}) \,=\, \int_0^s \mathbf{q}\cdot\hat{\mathbf{s}}\;d\sigma $$

Differentiating w.r.t. s gives

$$ \partial_s\,p \,=\, \mathbf{q}\cdot\hat{\mathbf{s}} $$

where ŝ is evaluated at σ = s and is therefore in the direction in which the path reaches r. By the generality of the path, this can be any direction. So the last equation says that q is the vector whose (scalar) component in any direction is the derivative of p w.r.t. arc length in that direction; that is, q = ∇p, as required. ◼

This is the promised converse of theorem (24c). But, given an irrotational vector field q, we usually prefer to find a scalar field whose negative gradient is q; that is, we usually prefer a scalar field φ such that q = −∇φ. Such a field φ is called a scalar potential for q. From the above expression for p(r), a suitable candidate is

$$ \varphi(\mathbf{r}) \,=\, -\int_{\mathbf{r}_0}^{\mathbf{r}} \mathbf{q}(\boldsymbol{\rho})\cdot d\boldsymbol{\rho} \tag{29} $$

A scalar field has zero gradient if and only if it is uniform, so that adding a uniform field, but only a uniform field, to a given scalar field leaves its gradient unchanged. Thus the scalar potential is determined up to an arbitrary additive uniform field. This would be the case with or without the minus sign in front of the gradient. The reason for preferring the minus sign appears next.
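The construction of a scalar potential by a line integral can be tried numerically. The sketch below assumes the inverse-square radial field q = r̂/r² (irrotational away from the origin), a reference point r₀ = (1, 0, 0), and straight paths of integration that avoid the origin; the name phi and all helper names are hypothetical. It builds the potential as minus the work integral of q from r₀, then checks that its negative gradient, estimated by central differences, reproduces q.

```python
import math

def q(x, y, z):
    """Assumed irrotational field: inverse-square radial field r-hat / r^2."""
    r = math.sqrt(x * x + y * y + z * z)
    return (x / r ** 3, y / r ** 3, z / r ** 3)

R0 = (1.0, 0.0, 0.0)   # assumed reference point, where the potential is zero

def phi(x, y, z, n=2000):
    """Minus the work integral of q along the straight line from R0 to (x, y, z)."""
    step = tuple((b - a) / n for a, b in zip(R0, (x, y, z)))
    total = 0.0
    for k in range(n):
        mid = tuple(a + (k + 0.5) * s for a, s in zip(R0, step))
        total += sum(f * s for f, s in zip(q(*mid), step))
    return -total

def neg_grad(f, point, h=1e-4):
    """Central-difference estimate of minus the gradient of f."""
    out = []
    for axis in range(3):
        fwd, bwd = list(point), list(point)
        fwd[axis] += h
        bwd[axis] -= h
        out.append(-(f(*fwd) - f(*bwd)) / (2 * h))
    return tuple(out)

pt = (1.0, 2.0, 2.0)             # |pt| = 3
potential_at_pt = phi(*pt)       # analytically 1/3 - 1/1 = -2/3
recovered = neg_grad(phi, pt)    # should approximate q at pt
expected = q(*pt)                # (1, 2, 2) / 27

print(potential_at_pt, recovered, expected)
```

Choosing a different reference point changes the potential by an additive constant only, so the recovered field is unaffected.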

Conservative fields

An irrotational vector field—or, equivalently, a field that is (plus or minus) the gradient of something—is described as conservative, because if the field is a force, it does zero work around a closed loop, and consequently conserves energy around the loop (at least if the field does not change during traversal of the loop).

If the only force acting on a particle is  F = −∇U,  then, by the gradient theorem, the work done on the particle over a path is the increase in −U,  i.e. the decrease in U ; and this work is the increase in the particle's kinetic energy T.  Hence, if we identify U with the potential energy, the total energy  U + T  is conserved. This interpretation of the scalar potential is possible only if the force is minus the gradient of the potential.

The minus sign is also used if the conservative vector field is an electric field (force per unit charge) or a gravitational acceleration (force per unit mass); the scalar potential is potential energy per unit charge, or potential energy per unit mass, respectively.

Some special fields

The 1/r scalar potential

For the potential energy field

$$U = \frac{1}{r} \tag{30}$$

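where r is the distance from the origin (and r ≠ 0), let us find the corresponding force F = −∇U. The direction of ∇U is that of the steepest increase of U, which, by the spherical symmetry, can only be parallel or anti-parallel to r̂ (the unit vector pointing away from the origin). So

$$\nabla U = \hat{\mathbf{r}}\,\partial_r U = \hat{\mathbf{r}}\,\frac{d}{dr}\!\left(\frac{1}{r}\right) = -\frac{\hat{\mathbf{r}}}{r^2}$$

whence

$$\mathbf{F} = -\nabla U = \frac{\hat{\mathbf{r}}}{r^2} \tag{31}$$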

So the negative gradient of the 1/r scalar potential (30) is the unit inverse-square radial vector field. Multiplying the numerator and denominator by r gives the alternative form

$$\mathbf{F} = \frac{\mathbf{r}}{r^3}$$

which is convenient if the center of the force is shifted from the origin to position r′: in that case we simply replace the vector r by r − r′, and the distance r by |r − r′|, so that the force becomes

$$\mathbf{F} = \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}$$

and the corresponding scalar potential becomes

$$U = \frac{1}{|\mathbf{r}-\mathbf{r}'|}$$

Inverse-square radial vector field

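We derived the vector field (31) as the negative gradient of the scalar potential (30). Conversely, given the inverse-square radial vector field (31), we could derive its scalar potential from (29). At a general point on the path, let the position vector be ρ = ρ ρ̂, so that, by (31), F = ρ̂/ρ². Then (29) becomes

$$U(\mathbf{r}) = -\int_{\mathbf{r}_0}^{\mathbf{r}} \frac{\hat{\boldsymbol{\rho}}}{\rho^2}\cdot d\boldsymbol{\rho} = -\int_{r_0}^{r} \frac{d\rho}{\rho^2} = \frac{1}{r} - \frac{1}{r_0}$$

so that, if we choose r₀ → ∞, we recover (30).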
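
As a numerical cross-check of the last result, the following minimal Python sketch evaluates the line integral in (29) for the field (31) along two different paths between the same end points, and the results can be compared with 1/r − 1/r₀. The function names, the two paths, and the step count are illustrative assumptions, not part of the derivation.

```python
import math

def inverse_square_field(p):
    """Unit inverse-square radial field F = r_hat / r**2 at the point p = (x, y, z)."""
    x, y, z = p
    r = math.sqrt(x * x + y * y + z * z)
    return (x / r**3, y / r**3, z / r**3)

def potential_by_line_integral(path, n_steps=20000):
    """Return minus the integral of F . d(rho) along path(u), 0 <= u <= 1 (midpoint rule)."""
    total = 0.0
    prev = path(0.0)
    for k in range(1, n_steps + 1):
        cur = path(k / n_steps)
        mid = tuple(0.5 * (a + b) for a, b in zip(prev, cur))
        step = tuple(b - a for a, b in zip(prev, cur))
        field = inverse_square_field(mid)
        total += sum(fi * si for fi, si in zip(field, step))
        prev = cur
    return -total

def helical_path(u):
    """Hypothetical curved path from (1, 0, 0) to (0, 2, 3), avoiding the origin."""
    radius = 1.0 + u
    angle = 0.5 * math.pi * u
    return (radius * math.cos(angle), radius * math.sin(angle), 3.0 * u)

def straight_path(u):
    """Straight path between the same two end points."""
    return (1.0 - u, 2.0 * u, 3.0 * u)
```

Both paths start at distance r₀ = 1 and end at distance r = √13, so both integrals should agree with 1/√13 − 1 to within the quadrature error, illustrating the path-independence on which (29) relies.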

Because F, given by (31), has a scalar potential, curl F must be zero. This is independently obvious in that the spherical symmetry of F seems to rule out any resemblance of rotation or shear, even at the origin, where F becomes infinite. On the last point, let us check whether curl F has a meaningful integral over a volume containing the origin. If the volume V is enclosed by the surface S whose outward unit normal is n̂, then, by theorem (5c),

$$\iiint_V \operatorname{curl}\mathbf{F}\;dV = \iint_S \hat{\mathbf{n}}\times\mathbf{F}\;dS .$$

If V contains the origin, then, because curl F is zero everywhere except at the origin, the volume V can be replaced by any element of V containing the origin, whatever the shape of that element may be. If we choose that element to be a spherical ball centered on the origin, then n̂ is parallel to r̂, so that the cross-product in the integrand on the right is zero. Thus the volume integral on the left is not only meaningful, but is zero, even if the volume contains the point where the integrand is undefined. In this sense, the field F is so irrotational that its curl may be taken as zero even where the field itself is undefined!

The situation concerning the divergence of F is more complicated. Again, let the volume V be enclosed by the surface S whose outward unit normal is n̂. By the divergence theorem (5d),

$$\iiint_V \operatorname{div}\mathbf{F}\;dV = \iint_S \hat{\mathbf{n}}\cdot\mathbf{F}\;dS = \iint_S \frac{\hat{\mathbf{r}}\cdot\hat{\mathbf{n}}}{r^2}\;dS = \iint_S d\Omega$$

where dΩ is the solid angle subtended at the origin by the surface element of area dS, and is positive if the outward unit normal n̂ has a positive component away from the origin (r̂⸱n̂ > 0), and negative if n̂ has a positive component toward the origin (r̂⸱n̂ < 0). If the volume enclosed by S does not include the origin, then for every positive contribution dΩ there is a compensating negative contribution, so that the integral of div F over the volume is zero. As this applies to every such volume, div F must be zero everywhere except at the origin. If, on the contrary, the volume does include the origin, then the contributions dΩ add up to the total solid angle subtended by the enclosing surface, which is 4π. In summary,

$$\operatorname{div}\frac{\hat{\mathbf{r}}}{r^2} = 4\pi\,\delta(\mathbf{r}) \tag{32d}$$

where δ(r), the 3D unit delta function, is zero everywhere except at the origin, but has an integral of 1 over any volume that includes the origin. For example, a unit point-mass at the origin has the density δ(r), and a point-mass m at position r′ has the density m δ(r − r′). As the argument of div in (32d) is −∇(1/r), we also have

$$\triangle\,\frac{1}{r} = -4\pi\,\delta(\mathbf{r}) \tag{32L}$$
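
The integral statement behind (32d) can be checked numerically. The sketch below, which is only an illustration under assumed names and grid sizes, sums the outward flux of r̂/r² over the six faces of an axis-aligned cube by the midpoint rule; the total should be close to 4π if the cube contains the origin and close to zero otherwise.

```python
import math

def flux_through_cube(center, half_width, n=100):
    """Outward flux of r_hat / r**2 through an axis-aligned cube (midpoint rule).

    `center` is the cube's center, `half_width` is half its edge length, and
    each face is divided into n * n square cells (all hypothetical parameters).
    """
    d = 2.0 * half_width / n          # edge length of one cell
    total = 0.0
    for axis in range(3):             # the faces are normal to each axis in turn
        for sign in (-1.0, 1.0):      # outward normal is sign * e_axis
            for a in range(n):
                for b in range(n):
                    offset = [0.0, 0.0, 0.0]
                    offset[axis] = sign * half_width
                    offset[(axis + 1) % 3] = -half_width + (a + 0.5) * d
                    offset[(axis + 2) % 3] = -half_width + (b + 0.5) * d
                    p = [center[k] + offset[k] for k in range(3)]
                    r = math.sqrt(sum(x * x for x in p))
                    # n_hat . F = sign * (component of r along the axis) / r**3
                    total += sign * p[axis] / r**3 * d * d
    return total
```

A cube is used instead of a sphere so that the result does not follow trivially from symmetry: the solid-angle argument says that the shape and position of the enclosing surface should not matter, only whether the origin is inside.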

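If we shift the centers from the origin to r′, the last two results become

$$\operatorname{div}\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} = 4\pi\,\delta(\mathbf{r}-\mathbf{r}') \tag{33d}$$

and

$$\triangle\,\frac{1}{|\mathbf{r}-\mathbf{r}'|} = -4\pi\,\delta(\mathbf{r}-\mathbf{r}') \tag{33L}$$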

Field with given divergence (and zero curl)

It follows from Coulomb's law that the electric field due to a point-charge Q at the origin, in a vacuum, is

$$\mathbf{E} = \frac{Q}{4\pi\epsilon_0}\,\frac{\hat{\mathbf{r}}}{r^2}$$

where ϵ₀ is a physical constant (called the vacuum permittivity or simply the electric constant). In a vacuum, the electric displacement field, denoted by D, is ϵ₀E. So it is convenient to multiply the above equation by ϵ₀, obtaining

$$\mathbf{D} = \frac{Q}{4\pi}\,\frac{\hat{\mathbf{r}}}{r^2}$$

This is an inverse-square radial vector field and therefore has zero curl.

Now suppose that, instead of a charge Q at the origin, we have a static charge density ρ(r′) in a general elemental volume dV′ at position r′ (the standard symbol for charge density being unfortunately the same as for mass density). Then the contribution from that element to the field D at position r is

$$d\mathbf{D} = \frac{\rho(\mathbf{r}')\,dV'}{4\pi}\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}$$

provided that, for each r, the dimensions of each volume element are small compared with |r − r′|. This contribution likewise has zero curl. The total field due to static charges is then the sum of the contributions:

$$\mathbf{D}(\mathbf{r}) = \iiint \frac{\rho(\mathbf{r}')}{4\pi}\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\;dV' \tag{34}$$

where the integral is over all space. And D(r) has zero curl because all the contributions have zero curl.

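Independently of the physical significance of D(r), we can take its divergence "term by term" (or "under the integral sign"), obtaining

$$\begin{aligned}
\operatorname{div}\mathbf{D} &= \iiint \frac{\rho(\mathbf{r}')}{4\pi}\,\operatorname{div}\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\;dV' \\
&= \iiint \rho(\mathbf{r}')\,\delta(\mathbf{r}-\mathbf{r}')\;dV' \\
&= \rho(\mathbf{r}) \iiint \delta(\mathbf{r}-\mathbf{r}')\;dV' \\
&= \rho(\mathbf{r}) \iiint \delta(\mathbf{r}'-\mathbf{r})\;dV'
\end{aligned}$$

where the divergence is taken w.r.t. r and evaluated by (33d), and the last step is permitted because the volume integral of the delta function of r′ is not changed by a "point reflection" (inversion) across r. As the volume of integration (all space) includes the shifted origin of the delta function, the integral is simply 1, so that

$$\operatorname{div}\mathbf{D} = \rho \tag{35}$$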

where both sides are evaluated at r.

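Mathematically, this result is an identity which applies if D is given by (34); substituting for D, we can write the identity in full as

$$\operatorname{div}\iiint \frac{\rho(\mathbf{r}')}{4\pi}\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\;dV' = \rho(\mathbf{r}) \tag{36}$$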

Subject to the convergence of the integral, this shows that we can construct an irrotational vector field whose divergence is a given scalar field ρ(r). And of course, by theorem (24d), any curl can be added to that vector field without changing its divergence.

In electrostatics, (34) is a generalization of Coulomb's law; and (35), which follows from (34), is Gauss's law expressed in differential form. If we integrate (35) over a volume enclosed by a surface S (with outward unit normal n̂) and apply the divergence theorem on the left, we get the integral form of Gauss's law:

$$\iint_S \mathbf{D}\cdot\hat{\mathbf{n}}\;dS = Q_e \tag{37}$$

where Qₑ is the total charge enclosed by S.

Field with given Laplacian

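In (36), we can recognize the r-dependent factor (r − r′)/|r − r′|³ as −∇(1/|r − r′|) and take the gradient operator outside the integral, obtaining

$$\operatorname{div}\left(-\nabla \iiint \frac{\rho(\mathbf{r}')}{4\pi\,|\mathbf{r}-\mathbf{r}'|}\;dV'\right) = \rho(\mathbf{r})$$

i.e.

$$\triangle \iiint \frac{-\rho(\mathbf{r}')}{4\pi\,|\mathbf{r}-\mathbf{r}'|}\;dV' = \rho(\mathbf{r}) \tag{38}$$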

Subject to the convergence of the integral, this shows that we can construct a field whose Laplacian is a given field. More precisely, it shows that we can construct a scalar field whose Laplacian is a given scalar  field ρ(r). But, due to the linearity of the Laplacian, the same applies to any given linear combination of scalar fields, including any combination whose coefficients are uniform vectors, uniform matrices, or uniform tensors of any order; that is, the same applies to any field that we can express with a uniform basis.

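Mathematically, (38) is simply an identity. To find its significance in electrostatics, we can multiply it by −1/ϵ₀, obtaining

$$\triangle \iiint \frac{\rho(\mathbf{r}')}{4\pi\epsilon_0\,|\mathbf{r}-\mathbf{r}'|}\;dV' = -\frac{\rho(\mathbf{r})}{\epsilon_0} \tag{39}$$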

which is also an identity. But the negative gradient of the expression after the integral sign is

$$\frac{\rho(\mathbf{r}')\,dV'}{4\pi\epsilon_0}\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}$$

which is the contribution to the electric field at position r due to a charge ρ(r′) dV′ at position r′ in a vacuum. So the expression after the integral sign is the corresponding contribution to the electrostatic potential, and the whole integral is the whole electrostatic potential. Denoting this by φ, we can rewrite (39) as

$$\triangle\varphi = -\frac{\rho}{\epsilon_0} \tag{40}$$

This is Poisson's equation in electrostatics, treating the medium as a vacuum (so that ρ must be taken as the total charge density, including any contributions caused by the effect of the field on the medium). In a region in which ρ = 0, Poisson's equation (40) reduces to

$$\triangle\varphi = 0 \tag{41}$$

which is Laplace's equation in electrostatics.

The wave equation

It is an empirical fact that a compressible fluid, such as air, carries waves of a mechanical nature: sound waves. In establishing the unambiguity of the gradient and the divergence, we have already derived equations dealing with the inertia and continuity (mass-conservation) of non-viscous fluids. So, by introducing a relation describing the compressibility, and eliminating variables, we should be able to get one equation (the "wave equation") in one scalar or vector field (the "wave function"), with recognizably "wavelike" solutions. And we should expect this equation to be analogous to equations describing other kinds of waves.

If we suppose, for simplicity, that the only force acting on an element of fluid is the pressure force, the applicable equation of motion is (6g). But, for reasons which will soon be apparent, let us call the pressure P, so that (6g) becomes

$$\rho\,\frac{d\mathbf{v}}{dt} = -\nabla P .$$

Then at equilibrium we have

$$\mathbf{0} = -\nabla P_0$$

where P₀ is the equilibrium pressure. Subtracting this equation from the previous one and defining

$$p = P - P_0$$

we get

$$\rho\,\frac{d\mathbf{v}}{dt} = -\nabla p$$

which looks like (6g), except that p is now the sound pressure (also called "acoustic pressure", or sometimes "excess pressure"), i.e. the pressure rise above equilibrium.

For the equation of continuity we can use (7d'), which we repeat for convenience:

$$\rho\,\operatorname{div}\mathbf{v} = -\frac{d\rho}{dt} .$$

Eliminating v between the last two equations is fraught because v is evaluated at a moving point in the former and at a fixed point in the latter; and introducing any relation between p and ρ is similarly fraught because p is evaluated at a fixed point and ρ at a moving point. The obvious remedy is to apply the advection rule (16) to the last two equations, obtaining respectively

$$\rho\left(\frac{\partial\mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\,\mathbf{v}\right) = -\nabla p \,, \qquad \rho\,\operatorname{div}\mathbf{v} = -\frac{\partial\rho}{\partial t} - \mathbf{v}\cdot\nabla\rho .$$

That gets all the variables evaluated at fixed points, at the cost of making the equations more complicated and more obviously non-linear. But the equations can be simplified and linearized by small-amplitude approximations. In the parentheses in the first equation, the first term is proportional to the amplitude of the vibrations while the second term is a product of two factors proportional to the amplitude, so that, for sufficiently small amplitudes, the second term is negligible. Similarly, in the second equation, for sufficiently small amplitudes and a homogeneous medium, we can neglect the second term on the right. Then, on the left side of each equation, we are left with a factor proportional to the amplitude, multiplied by ρ. But ρ is not proportional to the amplitude; only its deviation from the equilibrium density is so proportional. Hence, for small amplitudes, ρ can be replaced by the equilibrium density, which we shall call ρ₀, and which is independent of time and (in a homogeneous medium) independent of position. With these approximations, our equations of motion and continuity become

$$\rho_0\,\dot{\mathbf{v}} = -\nabla p \,, \qquad \rho_0\,\operatorname{div}\mathbf{v} = -\dot{\rho}$$

where, for brevity, we use an overdot to denote partial differentiation w.r.t. time (i.e., at a fixed point, not a point moving with the fluid).

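Now we can eliminate v. Taking divergences in the first equation, and differentiating the second partially w.r.t. time (which can be done inside the div operator, which represents a linear combination), we get

$$\rho_0\,\operatorname{div}\dot{\mathbf{v}} = -\triangle p \,, \qquad \rho_0\,\operatorname{div}\dot{\mathbf{v}} = -\ddot{\rho}$$

so that we can equate the right-hand sides, obtaining

$$\triangle p = \ddot{\rho} \tag{42}$$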

Maintaining the small-amplitude assumption, we can now consider compressibility. For small compressions in a homogeneous medium, we may suppose that the pressure change dp is some constant times the density change dρ. It is readily verified that such a constant must have the dimension of velocity squared. So we can say dp = c² dρ, where c is a constant with the units of velocity.[k] Dividing by dt gives ṗ = c² ρ̇, whence

$$\ddot{p} = c^2\,\ddot{\rho} \tag{43}$$

Substituting from (42) then gives the desired wave equation:

$$\ddot{p} = c^2\,\triangle p \tag{44}$$

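This is the 3D classical wave equation with the sound pressure p as the wave function. For a generic wave function ψ, in a homogeneous isotropic medium, we would expect the equation to be

$$\triangle\psi - \frac{1}{c^2}\,\frac{\partial^2\psi}{\partial t^2} = 0 \tag{45}$$

which may be written more compactly as

$$\Box\,\psi = 0 \tag{46}$$

where ☐, pronounced "wave" or "box",[l] is called the D'Alembertian operator and is defined by

$$\Box = \triangle - \frac{1}{c^2}\,\frac{\partial^2}{\partial t^2} \tag{47}$$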

in this paper, although other conventions exist.[m]

In a static situation, the second term on the right of (47) is zero. So one advantage of definition (47), over any alternative definition that changes the sign or the scale factor, is that in the static case, the D'Alembertian is reduced to the Laplacian, making it especially obvious that in the static case, the wave equation is reduced to Laplace's equation [compare (46) and (41)]. Also notice that the D'Alembertian, being a linear combination of two linear operators, is itself linear.

Spherical waves

Having established that there are wavelike time-dependent fields described by equation (45), in which the constant c has the units of velocity, we can now make an informed guess at an elementary solution of the equation. Consider the candidate

$$\psi = \frac{1}{r}\,f\!\left(t - \frac{r}{c}\right) \tag{48}$$

where r = r r̂ is the position vector (so that r is distance from the origin), f is an arbitrary function (arbitrary except that we will need to be able to differentiate it twice), t is time, and c is a constant (and obviously ψ is not defined at the origin even if f is).

If, at the origin, the function f has a certain argument at time t = t₀, then at any distance r from the origin, it has the same argument at time t = t₀ + r/c, which is r/c later than at the origin. Hence, if f has a certain feature (e.g., a zero-crossing) at the origin, the time taken for that feature to reach any distance r is r/c, implying that the feature travels outward from the origin at speed c. Another way to perceive this is to set the argument of f equal to a constant (corresponding to some feature of the function) and differentiate w.r.t. t, obtaining dr/dt = c (the speed at which the feature recedes from the origin). Thus equation (48) describes waves radiating outward from the origin with speed c. [n]

Equation (48) further implies that there are surfaces over which the wave function ψ  is uniform—namely surfaces of constant r,  i.e. spheres centered on the origin. These are the wavefronts. So (48) describes spherical waves.

Because the surface area of a sphere is proportional to the square of its radius, we should expect the radiated intensity (power per unit area) to satisfy an inverse-square law (if the medium is lossless, i.e. neither absorbing nor scattering the radiated power). That does not mean that the wave function itself should satisfy an inverse-square law. In a traveling wave in 3D space, there will be an "effort" variable (e.g., sound pressure) and a "flow" variable (e.g., fluid velocity), and the instantaneous intensity will be proportional to the product of the two. If the two are proportional to each other, the instantaneous intensity will be proportional to the square of one or the other. Hence if the instantaneous intensity falls off like 1/r², the effort and flow variables (and the wave function, if it is proportional to one or the other) will fall off like 1/r. That suggests the attenuation factor 1/r in (48).

But there are big ifs in that argument. For all we know so far, the relation between effort and flow could involve a lag, so that the instantaneous product of the two could swing negative although it averages to something positive. And for all we know so far, the lag could vary with r, allowing at least one of the two (effort or flow) to depart from the 1/r law, even if their average product still falls off like 1/r². The 1/r factor in (48) is therefore only an "informed guess". Notwithstanding these complications, we have also guessed that the form of the function f (the "waveform") does not change as r increases; we have not considered whether this behavior might depend on the medium, or the functional form, or the geometry.

So let us carefully verify that (48) satisfies (45) or, equivalently, (46).

As a first step, and as a useful inquiry in its own right, we find △ψ from definition (4L), given that ψ is a function of (r, t) only. For the surface δS let us start with

  • a cone (not a double cone) with its apex at the origin, subtending a small solid angle ω at the origin,
  • a sphere centered on the origin, with radius r, and
  • a sphere centered on the origin, with radius r + dr ;

and let the volume element be the region inside the cone and between the spheres, so that its enclosing surface δS has three faces: a segment of the cone, a segment of the inner sphere with area r²ω, and a segment of the outer sphere with area (r + dr)²ω. By the symmetry of ψ, the outward normal derivative ∂ₙψ is equal to zero on the conical face, +∂ᵣψ(r + dr, t) on the outer spherical face, and −∂ᵣψ(r, t) on the inner spherical face. The volume of the element is dV = r²ω dr. So, assembling the pieces of definition (4L), we get

$$\triangle\psi = \frac{(r+dr)^2\,\omega\;\partial_r\psi(r+dr,\,t) \,-\, r^2\omega\;\partial_r\psi(r,\,t)}{r^2\omega\,dr} = \frac{1}{r^2}\;\frac{\big[r^2\,\partial_r\psi\big]_{r+dr} - \big[r^2\,\partial_r\psi\big]_{r}}{dr}$$

i.e.

$$\triangle\psi = \frac{1}{r^2}\,\partial_r\!\left(r^2\,\partial_r\psi\right) \tag{49}$$

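Now we can verify our "informed guess". Differentiating (48) twice w.r.t. t by the chain rule gives

$$\frac{\partial^2\psi}{\partial t^2} = \frac{1}{r}\,f''\!\left(t - \frac{r}{c}\right) \tag{50}$$

where each prime (′) denotes differentiation of the function w.r.t. its own argument. Differentiating (48) once w.r.t. r by the product rule and chain rule, we get

$$\partial_r\psi = -\frac{1}{r^2}\,f\!\left(t - \frac{r}{c}\right) - \frac{1}{rc}\,f'\!\left(t - \frac{r}{c}\right) \tag{51}$$

Proceeding as specified in (49), we multiply this by r², differentiate again w.r.t. r (obtaining three terms, of which two cancel), and divide by r², and get

$$\triangle\psi = \frac{1}{rc^2}\,f''\!\left(t - \frac{r}{c}\right) \tag{52}$$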

Then if we substitute (52) and (50) into (47), we obviously get ☐ψ = 0, satisfying (46). So we have guessed correctly.
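
The same verification can be repeated numerically without using the chain rule at all. The following Python sketch applies finite differences to the radial form (49) of the Laplacian and to the second time-derivative, for a spherical wave of the form (48); the waveform, the wave speed, and the step sizes are arbitrary assumptions chosen only for illustration.

```python
import math

C = 2.0  # hypothetical wave speed (arbitrary units)

def f(u):
    """Hypothetical smooth waveform; any twice-differentiable function would do."""
    return math.sin(3.0 * u)

def psi(r, t):
    """Spherical wave of the form (48): f(t - r/c) / r."""
    return f(t - r / C) / r

def radial_laplacian(r, t, h=1e-3):
    """Finite-difference version of (49): (1/r^2) d/dr (r^2 d(psi)/dr)."""
    outer = (r + 0.5 * h) ** 2 * (psi(r + h, t) - psi(r, t)) / h
    inner = (r - 0.5 * h) ** 2 * (psi(r, t) - psi(r - h, t)) / h
    return (outer - inner) / (h * r * r)

def second_time_derivative(r, t, k=1e-3):
    """Central second difference of psi w.r.t. time at a fixed point."""
    return (psi(r, t + k) - 2.0 * psi(r, t) + psi(r, t - k)) / (k * k)

def dalembertian(r, t):
    """Definition (47) applied to psi: Laplacian minus (1/c^2) times the second time-derivative."""
    return radial_laplacian(r, t) - second_time_derivative(r, t) / C ** 2
```

Away from the origin, `dalembertian(r, t)` should be zero to within the discretization error, while `radial_laplacian(r, t)` alone should agree with the right-hand side of (52).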

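Having shown that the D'Alembertian of ψ, as given by (48), is zero everywhere except at the origin (where it is not defined), let us now find its integral over a volume V (enclosed by a surface S) that includes the origin. From (47),

$$\iiint_V \Box\,\psi\;dV = \iiint_V \triangle\psi\;dV - \frac{1}{c^2}\iiint_V \frac{\partial^2\psi}{\partial t^2}\;dV = \iint_S \partial_n\psi\;dS - \frac{1}{c^2}\iiint_V \frac{\partial^2\psi}{\partial t^2}\;dV$$

where the second equality follows from theorem (5L). Now because the integrand on the left is zero except at the origin, any V containing the origin will give the same integral. So for convenience, let V be a spherical ball of radius R centered on the origin. Then, by the spherical symmetry of ψ, integration over S reduces to multiplication by 4πR², and ∂ₙ is equivalent to ∂ᵣ, and dV can be taken as 4πr² dr. With these substitutions we have

$$\iiint_V \Box\,\psi\;dV = 4\pi R^2\;\partial_r\psi\,\Big|_{r=R} - \frac{4\pi}{c^2}\int_0^R \frac{\partial^2\psi}{\partial t^2}\;r^2\,dr$$

or, substituting from (51) and (50),

$$\iiint_V \Box\,\psi\;dV = -4\pi f\!\left(t - \frac{R}{c}\right) - \frac{4\pi R}{c}\,f'\!\left(t - \frac{R}{c}\right) - \frac{4\pi}{c^2}\int_0^R r\,f''\!\left(t - \frac{r}{c}\right)dr .$$

Again noting that any V containing the origin will give the same integral, we can let R approach zero, with the result that the integral approaches −4πf(t). This is the integral of ☐ψ over any volume containing the origin, for ψ given by (48). Meanwhile ☐ψ is zero everywhere except at the origin. In summary,

$$\Box\left[\frac{1}{r}\,f\!\left(t - \frac{r}{c}\right)\right] = -4\pi f(t)\;\delta(\mathbf{r}) \tag{53}$$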
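
The claim that every ball centered on the origin gives the same integral can be tested numerically. In the sketch below, the surface term and the volume term of the integral of ☐ψ over a ball of radius R are evaluated from (51) and (50) for an assumed waveform f(u) = sin 3u with its derivatives written out by hand; the names, the wave speed, and the number of quadrature steps are illustrative assumptions.

```python
import math

C = 2.0  # hypothetical wave speed (arbitrary units)

def f(u):
    """Hypothetical waveform."""
    return math.sin(3.0 * u)

def f1(u):
    """First derivative of f w.r.t. its argument."""
    return 3.0 * math.cos(3.0 * u)

def f2(u):
    """Second derivative of f w.r.t. its argument."""
    return -9.0 * math.sin(3.0 * u)

def dalembertian_integral(R, t, n=2000):
    """Integral of the D'Alembertian of f(t - r/c)/r over a ball of radius R."""
    # Surface term: 4 pi R^2 times the radial derivative (51) evaluated at r = R.
    surface = 4.0 * math.pi * R * R * (-f(t - R / C) / R ** 2 - f1(t - R / C) / (R * C))
    # Volume term: midpoint rule for the integral of (50) times 4 pi r^2 dr,
    # i.e. of 4 pi r f''(t - r/c) dr, from 0 to R.
    dr = R / n
    volume = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        volume += 4.0 * math.pi * r * f2(t - r / C) * dr
    return surface - volume / C ** 2
```

For any radius, large or small, the returned value should agree with −4πf(t), consistent with the coefficient of the delta function in (53).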

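Shifting the center of the spherical waves from the origin to position r′, we get

$$\Box\left[\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,f\!\left(t - \frac{|\mathbf{r}-\mathbf{r}'|}{c}\right)\right] = -4\pi f(t)\;\delta(\mathbf{r}-\mathbf{r}') \tag{54}$$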

We shall refer to the field given by (48) as the wave function due to a monopole source with strength f (t) at the origin. The D'Alembertian of this wave function is given by (53).[27] Hence the field whose D'Alembertian is given by (54) is the wave function due to a monopole source with strength f (t) at position r′. In each case, the D'Alembertian is zero everywhere except at the source.

Field with given D'Alembertian

Now suppose that, instead of a wave source with strength f(t) at the general position r′, we have at that position a wave-source density g(r′, t) in an elemental volume dV′, whose contribution to the wave function ψ at position r is

$$d\psi = \frac{1}{|\mathbf{r}-\mathbf{r}'|}\;g\!\left(\mathbf{r}',\; t - \frac{|\mathbf{r}-\mathbf{r}'|}{c}\right)dV'$$

where for each r, the dimensions of each volume element are small compared with |r − r′|. Then the total wave function is the sum of the contributions:

$$\psi(\mathbf{r}, t) = \iiint \frac{1}{|\mathbf{r}-\mathbf{r}'|}\;g\!\left(\mathbf{r}',\; t - \frac{|\mathbf{r}-\mathbf{r}'|}{c}\right)dV' \tag{55}$$

where the integral is over all space.

Independently of the physical significance of ψ(r, t), we can take its D'Alembertian "under the integral sign" by rule (54), obtaining

$$\Box\,\psi(\mathbf{r}, t) = \iiint -4\pi\,g(\mathbf{r}', t)\;\delta(\mathbf{r}-\mathbf{r}')\;dV'$$

that is,

$$\Box\,\psi(\mathbf{r}, t) = -4\pi\,g(\mathbf{r}, t) \tag{56}$$

Mathematically, equation (56) is an identity which applies if ψ(r, t) is given by (55). Substituting from (55) and solving for g(r, t), we can write the identity in full as

$$\Box \iiint \frac{-1}{4\pi\,|\mathbf{r}-\mathbf{r}'|}\;g\!\left(\mathbf{r}',\; t - \frac{|\mathbf{r}-\mathbf{r}'|}{c}\right)dV' = g(\mathbf{r}, t) \tag{57}$$

Subject to the convergence of the integral, this shows that we can construct a wave function with a given D'Alembertian.

Physically, equation (56) gives the D'Alembertian of the wave function for a source density g(r, t). It is the inhomogeneous wave equation, which applies in the presence of an arbitrary source density, in contrast to the homogeneous wave equation (46), which applies in a region where the source density is zero. In this context the word homogeneous or inhomogeneous describes the equation, not the medium (which has been assumed homogeneous and isotropic).

In a static situation, in which the D'Alembertian is reduced to the Laplacian, the inhomogeneous wave equation (56) is reduced to the form of Poisson's equation (40). As written, equation (40) is Poisson's equation in electrostatics; it applies to the charge density ρ(r), for which the scalar potential [in (39)] is

$$\varphi(\mathbf{r}) = \iiint \frac{\rho(\mathbf{r}')}{4\pi\epsilon_0\,|\mathbf{r}-\mathbf{r}'|}\;dV' .$$

In electrodynamics, which takes time-dependence into account, the scalar potential due to the charge density ρ(r, t) is

$$\varphi(\mathbf{r}, t) = \iiint \frac{\rho\!\left(\mathbf{r}',\; t - |\mathbf{r}-\mathbf{r}'|/c\right)}{4\pi\epsilon_0\,|\mathbf{r}-\mathbf{r}'|}\;dV'$$

where the wave speed c is the speed of light; this is the same as in the static case except for the delay |r − r′|/c, indicating that the influence of the charge density at r′ travels outward from that point at the speed of light. In the dynamic case, by rule (57), the D'Alembertian of the scalar potential is

$$\Box\,\varphi = -\frac{\rho}{\epsilon_0} .$$

This result is the inhomogeneous wave equation in the scalar potential: the equation which, in the electrostatic case, reduces to Poisson's equation (40).

In electrodynamics, however, the electric field E is not simply −∇φ but −∇φ − ∂A/∂t, where A is the magnetic vector potential, whose defining property is that its curl is the magnetic flux density:

$$\operatorname{curl}\mathbf{A} = \mathbf{B} .$$

By identity (24d), this property implies

$$\operatorname{div}\mathbf{B} = 0$$

which is Gauss's law for magnetism. We have noted in passing, but not yet proven, that (24d) has a converse, whereby the solenoidality of B implies the existence of the vector potential A. Precedents suggest we might be able to prove this by finding a vector field whose curl is a delta function (perhaps through new identities relating it to a field whose divergence is a delta function) and using it to construct a vector field with a given curl. In fact we shall prove our "converse" differently, but we shall still need some new identities for the purpose. And to obtain those identities (among a comprehensive set of identities), we must take the detour that we have made a virtue of not taking until now…

Cartesian coordinates

Indicial notation; implicit summation

Considering that a scalar field is a function of three coordinates, while a vector field has three components each of which is a function of three coordinates, we can readily imagine that coordinate-based derivations of vector-analytic identities are likely to be excruciatingly repetitive, unless perhaps we choose a notation that concisely specifies the repetition. So, instead of writing the Cartesian coordinates as x, y, z, we shall usually write them as xᵢ where i = 1, 2, 3, respectively; and instead of writing the unit vectors in the directions of the respective axes as i, j, k, we shall usually write them as eᵢ. And for partial differentiation w.r.t. xᵢ, instead of writing ∂/∂xᵢ or even ∂ with the subscript xᵢ, we shall write ∂ᵢ.

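Now comes a stroke of genius for which we are indebted to Einstein (although he used it in a more sophisticated context)! Instead of writing the position vector as

$$\mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}$$

or even as

$$\mathbf{r} = \sum_{i=1}^{3} x_i\,\mathbf{e}_i$$

we shall write it simply as

$$\mathbf{r} = x_i\,\mathbf{e}_i$$

where it is understood that we sum over the repeated index. More generally, we shall write the vector field q as

$$\mathbf{q} = q_i\,\mathbf{e}_i$$

with implicit summation, and the vector field v as

$$\mathbf{v} = v_i\,\mathbf{e}_i$$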

with implicit summation, and so on. (By that nomenclature, the position vector in Cartesian coordinates should be, and often is, called x; but we called it r because we wanted to call its magnitude r, for radius.)

Implicit summation not only avoids writing the Σ symbol and specifying the index of summation, but also allows a summation over two repeated indices, say i and j, to be considered as summed first over i and then over j or vice versa, removing the need for an explicit regrouping of terms. Of course, if we hide messy details behind a notation, we need to make sure that it handles those details correctly. In particular, when we perform an operation on an implicit sum, we implicitly perform it term-by-term, and must therefore make sure that the operation is valid when interpreted that way.
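
For readers who think more readily in code, the following Python sketch spells out what the implicit summation stands for: a repeated index becomes a loop, and a double summation may be nested in either order. The function names and the sample data are hypothetical, chosen only for illustration.

```python
# Cartesian unit vectors e_1, e_2, e_3, written out as component triples.
E = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

def expand(components):
    """Return the vector q = q_i e_i, summing over the repeated index i."""
    return tuple(sum(components[i] * E[i][k] for i in range(3)) for k in range(3))

def sum_over_i_first(a):
    """Double sum of a[i][j], with the inner (first) summation over i."""
    return sum(sum(a[i][j] for i in range(3)) for j in range(3))

def sum_over_j_first(a):
    """Double sum of a[i][j], with the inner (first) summation over j."""
    return sum(sum(a[i][j] for j in range(3)) for i in range(3))
```

For a finite array of terms, the two orders of summation necessarily give the same total, which is why the implicit notation does not need to say which index is summed first.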

Formulation of operators

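Gradient: Putting s = xᵢ in (9g), we find that the scalar component of ∇p in the direction of each eᵢ is ∂ᵢp. To obtain the vector component in that direction, we multiply by eᵢ. Assembling the components, we have (with implicit summation)

$$\nabla p = \mathbf{e}_i\,\partial_i p \tag{58g}$$

or, in operational terms,

$$\nabla = \mathbf{e}_i\,\partial_i \tag{58o}$$

or, in traditional longhand notation,

$$\nabla = \mathbf{i}\,\frac{\partial}{\partial x} + \mathbf{j}\,\frac{\partial}{\partial y} + \mathbf{k}\,\frac{\partial}{\partial z} \tag{58t}$$

It is also worth noting, from (58g), that the squared magnitude of ∇p is

$$|\nabla p|^2 = \partial_i p\;\partial_i p \tag{58s}$$

where we write ∂ᵢp ∂ᵢp rather than (∂ᵢp)² to ensure that implicit summation applies!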

As reported by Chen-To Tai (1994), there are unfortunately some textbooks in which the del operator is defined as

$$\nabla = \frac{\partial}{\partial x}\,\mathbf{i} + \frac{\partial}{\partial y}\,\mathbf{j} + \frac{\partial}{\partial z}\,\mathbf{k} \qquad [\text{sic!}]$$

which, on its face, is not an operator at all, but a self-contained expression whose value is the zero vector (because it is a sum of derivatives of constant vectors). Among the offenders is Erwin Kreyszig, who, in the 6th edition of his bestselling Advanced Engineering Mathematics (1988, p. 486), misdefines the del operator thus and then rewrites the gradient of f as ∇f, apparently imagining that the differentiation operators look through the constant vectors rather than at them. Six pages later, he defines the divergence in Cartesian coordinates (which we shall do shortly) and then immediately informs us that "Another common notation for the divergence of v is ∇⸱v," where ∇ is defined as before, but the resulting ∇⸱v is apparently not identically zero![28] These errors persist in the 10th edition (2011, pp. 396, 402–3). Tai finds similar howlers in mathematics texts by Wilfred Kaplan, Ladis D. Kovach, and Merle C. Potter, and in electromagnetics texts by William H. Hayt and Martin A. Plonus.[29] Knudsen & Katz, in Fluid Dynamics and Heat Transfer (1958), avoid the misdefinition of ∇, but implicitly define the divergence of V as V⸱∇ (which, as we have seen, is actually an operator), and then somehow reduce it to the correct expression for div V. [30] But I digress.

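Curl and divergence: Expressing the operand of the curl in components, and noting that the unit vectors are uniform, we can apply (8p):

$$\operatorname{curl}\mathbf{q} = \operatorname{curl}\,(q_j\,\mathbf{e}_j) = \nabla q_j \times \mathbf{e}_j = \mathbf{e}_i\,\partial_i q_j \times \mathbf{e}_j .$$

If we sum over j first, this is

$$\operatorname{curl}\mathbf{q} = \mathbf{e}_i \times \partial_i\,\mathbf{q} \tag{59c}$$

or, in traditional longhand,

$$\operatorname{curl}\mathbf{q} = \mathbf{i}\times\frac{\partial\mathbf{q}}{\partial x} + \mathbf{j}\times\frac{\partial\mathbf{q}}{\partial y} + \mathbf{k}\times\frac{\partial\mathbf{q}}{\partial z} .$$

For the divergence we proceed as for the curl except that, instead of (8p), we use (8g):

$$\operatorname{div}\mathbf{q} = \operatorname{div}\,(q_j\,\mathbf{e}_j) = \nabla q_j \cdot \mathbf{e}_j = \mathbf{e}_i\,\partial_i q_j \cdot \mathbf{e}_j$$

that is,

$$\operatorname{div}\mathbf{q} = \mathbf{e}_i \cdot \partial_i\,\mathbf{q} \tag{59d}$$

or, in traditional longhand,

$$\operatorname{div}\mathbf{q} = \mathbf{i}\cdot\frac{\partial\mathbf{q}}{\partial x} + \mathbf{j}\cdot\frac{\partial\mathbf{q}}{\partial y} + \mathbf{k}\cdot\frac{\partial\mathbf{q}}{\partial z} .$$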

It follows from (59c) and (59d), if it was not already obvious, that a uniform vector field has zero curl and zero divergence.
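
The Cartesian expressions for the curl as a sum of cross-products of unit vectors with partial derivatives of the whole field, and for the divergence as the corresponding sum of dot-products, translate directly into code. The Python sketch below estimates the partial derivatives by central differences; the helper names, the sample field, and the step size are assumptions made for illustration only.

```python
# Unit vectors e_1, e_2, e_3 of the Cartesian axes.
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def cross(a, b):
    """Cross product of two 3-component vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-component vectors."""
    return sum(x * y for x, y in zip(a, b))

def partial(field, point, i, h=1e-5):
    """Central-difference estimate of the partial derivative of a vector field w.r.t. x_i."""
    forward = list(point)
    backward = list(point)
    forward[i] += h
    backward[i] -= h
    return tuple((a - b) / (2.0 * h) for a, b in zip(field(forward), field(backward)))

def curl(field, point):
    """Sum over i of e_i cross the i-th partial derivative of the field."""
    total = (0.0, 0.0, 0.0)
    for i in range(3):
        term = cross(E[i], partial(field, point, i))
        total = tuple(s + c for s, c in zip(total, term))
    return total

def div(field, point):
    """Sum over i of e_i dot the i-th partial derivative of the field."""
    return sum(dot(E[i], partial(field, point, i)) for i in range(3))

def sample_field(p):
    """Hypothetical test field q = (xy, yz, zx)."""
    x, y, z = p
    return (x * y, y * z, z * x)

def position_vector(p):
    """The field whose value at each point is that point's position vector."""
    return tuple(p)
```

For the sample field, the familiar component formulas give curl q = (−y, −z, −x) and div q = x + y + z, so the finite-difference results can be compared with these; the position vector should give zero curl and a divergence of 3, and a uniform field should give zero for both.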

Although the above expressions for the divergence and curl will surprise many modern readers, they match the initial definitions of the divergence and curl given by the founder of vector analysis as we know it, J. Willard Gibbs (1881, § 54). Gibbs even uses the ∇× and ∇⸱ notations on the left sides of the defining equations, and only after the equations (albeit immediately after) does he announce that "∇⸱ω is called the divergence of ω and ∇×ω its curl." (He uses Greek letters for vectors.) Our notation and Cartesian expression for the gradient (58g) also match Gibbs (1881, § 52). Hence, using the Gibbs notations, we can merge definitions (58g), (59c), and (59d) into the general Cartesian formula

$$\nabla \ast \psi = \mathbf{e}_i \ast \partial_i\,\psi \tag{60}$$

(with implicit summation), where the ∗ operator may be a null (for the gradient), a cross (for the curl), or a dot (for the divergence).

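Gibbs does not offer any justification for the ∇× and ∇⸱ notations, but nor is it difficult to find such a justification based on his definitions. As eᵢ is a uniform vector, we can rewrite (59c) rigorously as

$$\operatorname{curl}\mathbf{q} = \partial_i\,(\mathbf{e}_i \times \mathbf{q}) \tag{61}$$

and thence operationally as

$$\operatorname{curl} = \mathbf{e}_i\,\partial_i\;\times \tag{61o}$$

or, recalling (58o),

$$\operatorname{curl} = \nabla\,\times$$

which can be evaluated in the usual manner as

$$\nabla\times\mathbf{q} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\[2pt] \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\[6pt] q_x & q_y & q_z \end{vmatrix}$$

where q_x is the x component of q, etc. This indeed is how one evaluates the curl of a given field in Cartesian coordinates, although we shall find (59c) more convenient for deriving identities. Similarly, we can rewrite (59d) rigorously as

$$\operatorname{div}\mathbf{q} = \partial_i\,(\mathbf{e}_i \cdot \mathbf{q}) \tag{62}$$

and thence operationally as

$$\operatorname{div} = \mathbf{e}_i\,\partial_i\;\cdot \tag{62o}$$

or, recalling (58o),

$$\operatorname{div} = \nabla\,\cdot$$

For evaluating the divergence of a given field, however, we simplify (62) to

$$\operatorname{div}\mathbf{q} = \partial_i\,q_i$$

or, in traditional longhand,

$$\operatorname{div}\mathbf{q} = \frac{\partial q_x}{\partial x} + \frac{\partial q_y}{\partial y} + \frac{\partial q_z}{\partial z}$$

although we shall find (59d) more convenient for deriving identities. But the longhand form makes it especially obvious that if r is the position vector,

$$\operatorname{div}\mathbf{r} = 3 \tag{62r}$$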

Notice that we can get from (62o) back to (59d) by permuting the ∂ᵢ with the dot, and from (61o) back to (59c) by permuting the ∂ᵢ with the cross, as if the differentiation operator could, as it were, look through the dot or the cross; or, as Gibbs's student Edwin B. Wilson puts it, "pass by" the dot and the cross, yielding Gibbs's original definitions.[31] Hence Wilson considers it helpful to regard Gibbs's ∇⸱ and ∇× notations as "the (formal) scalar product and the (formal) vector product" or "the symbolic scalar and vector products" of ∇… and the operand, and to regard ∇ as a "symbolic vector".[32]

Tai (1994, 1995) rejects Wilson's argument together with the entire tradition of treating ∇× and ∇⸱ as compound operators. Of formal products, Tai says that the concept "has had a tremendously detrimental effect upon the learning of vector analysis"; he calls such a product a "meaningless assembly".[33] Of the "pass by" step, he complains that "standard books on mathematical analysis do not have such a theorem."[34]

I submit, however, that the intermediate steps (61) and (62), after which we take the constant multiplier outside the operator (eqs. 61o & 62o), support Wilson's "pass by" argument; and if that is not enough support, the reader may write out the sums on the right-hand sides of (59c) and (59d) and verify that they agree with ∇×q and ∇⸱q respectively. I further submit that the great generality of our derivation of equations (14), above, compels us to treat the ∇× and ∇⸱ notations as more than mere notations. That being said, I shall find some points of agreement with Tai, and some reasons to criticize Wilson.

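Laplacian: If ψ is a scalar field, then

$$\triangle\psi = \operatorname{div}\nabla\psi = \mathbf{e}_i \cdot \partial_i\,(\mathbf{e}_j\,\partial_j\psi) = \mathbf{e}_i\cdot\mathbf{e}_j\;\partial_i\partial_j\psi$$

that is,

$$\triangle\psi = \partial_i\,\partial_i\,\psi \tag{63L}$$

where we write ∂ᵢ∂ᵢ rather than ∂ᵢ² in order to maintain implicit summation. In traditional longhand, (63L) becomes

$$\triangle\psi = \frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2}$$

or, in operational terms,

$$\triangle = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$$

or, by comparison with (58t),

$$\triangle = \nabla\cdot\nabla$$

as expected.

By the linearity of the Laplacian, the same applies if ψ is any field expressible in terms of a uniform basis. For example, if ψ is a vector field given by ψⱼeⱼ (with implicit summation), then

$$\begin{aligned}
\triangle\boldsymbol{\psi} &= \triangle\,(\psi_j\,\mathbf{e}_j) \\
&= \mathbf{e}_j\,\triangle\psi_j \\
&= \mathbf{e}_j\,\partial_i\,\partial_i\,\psi_j \\
&= \partial_i\,\partial_i\,(\psi_j\,\mathbf{e}_j) = \partial_i\,\partial_i\,\boldsymbol{\psi}
\end{aligned}$$

where the third line follows from (63L) as applied to a scalar field. Thus (63L) is quite general.

After listing theorems (5g) to (5L) above, we gave reasons for describing ∇, curl, and div as differential operators, and △ as a 2nd-order differential operator, the implication being that the others are only 1st-order. We now have the promised "additional reason" for these descriptions: when expressed in Cartesian coordinates, the △ operator involves second derivatives, while the others involve (only) first derivatives. In the meantime we have acquired the q⸱∇ operator, which is also 1st-order, as we shall now confirm.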

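Advection, directional derivative, etc.:  If ψ is a scalar field, then

$$\mathbf{q}\cdot\nabla\psi = q_i\,\mathbf{e}_i\cdot\mathbf{e}_j\,\partial_j\psi\,.$$

In this double summation, the only non-zero terms are those for which  j = i ,  in which case  ei⸱ ej = 1.  So we have

$$\mathbf{q}\cdot\nabla\psi = q_i\,\partial_i\psi \tag{64}$$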

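or, in operational terms,

$$\mathbf{q}\cdot\nabla = q_i\,\partial_i \tag{64o}$$

or, in traditional longhand,

$$\mathbf{q}\cdot\nabla = q_x\,\frac{\partial}{\partial x} + q_y\,\frac{\partial}{\partial y} + q_z\,\frac{\partial}{\partial z}$$

which indeed is the "formal" or "symbolic" dot-product of q and ∇. By the linearity of the directional derivative in (11), the same result applies if ψ is a vector field or any field expressible in terms of a uniform basis. In particular, if r is the position vector, we have

$$\mathbf{q}\cdot\nabla\mathbf{r} = q_i\,\partial_i\,(x_j\,\mathbf{e}_j) = q_i\,\mathbf{e}_i\,;$$

i.e.,

$$\mathbf{q}\cdot\nabla\mathbf{r} = \mathbf{q} \tag{64r}$$

—which is also deducible from (11).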

For convenience in the following discussion, we shall refer to the scaled-directional-derivative operator q⸱∇ as an "advection" operator although, physically, it represents advection only if q is the material velocity.

Identities without pain

In deriving the Cartesian expressions for the gradient, curl, divergence, Laplacian, and advection operators, we used the preceding identities (9g), (8p), (8g), (9L'), and (11) respectively, the last being a definition generalizing (9g). Thus we could have derived the Cartesian expressions quite early in the exposition, although we did not find that option convenient. The other vector-analytic identities that we have previously mentioned are:

  • (8c), which showed the unambiguity of the curl;
  • (8q), which has a question mark after it;
  • (17), a product rule for the divergence, which is yet to be proven as a general identity;
  • (24c) and (24d), concerning "curl grad" and "div curl"; and
  • the identities showing that we can construct a field with a given divergence (36), Laplacian (38), or D'Alembertian (57).

The above list exposes the following shortcomings:

  • we have not yet investigated "grad div" and "curl curl";
  • we have only one product rule —the unverified identity (17)—in which both factors are spatially variable fields; this needs to be verified and identities (8c) and (8p) need to be generalized;
  • our collection of product rules does not yet include the curl of a cross-product, or the gradient of a dot-product or of a product of scalars, or the advection of a product; and
  • we do not yet have any chain rules involving ∇, curl, or div.

With the aid of the Cartesian forms of the various operators, we may now fill these gaps.


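The "grad div" and "curl curl" operators turn out to be related:

$$\operatorname{curl}\operatorname{curl}\mathbf{q} = \mathbf{e}_i\times\partial_i\,(\mathbf{e}_j\times\partial_j\mathbf{q}) = \mathbf{e}_i\times(\mathbf{e}_j\times\partial_i\partial_j\mathbf{q})\,,$$

whence expanding the vector triple product gives

$$\operatorname{curl}\operatorname{curl}\mathbf{q} = \mathbf{e}_j\,(\mathbf{e}_i\cdot\partial_i\partial_j\mathbf{q}) - (\mathbf{e}_i\cdot\mathbf{e}_j)\;\partial_i\partial_j\mathbf{q}\,.$$

In the first term on the right, we can switch the order of partial differentiation; and in the second term—which, like the first, is a double summation—the only non-zero contributions are those for which  j = i  and  ei⸱ ej = 1.  So we have

$$\operatorname{curl}\operatorname{curl}\mathbf{q} = \mathbf{e}_j\,\partial_j\,(\mathbf{e}_i\cdot\partial_i\mathbf{q}) - \partial_i\partial_i\,\mathbf{q}\,;$$

that is,

$$\operatorname{curl}\operatorname{curl}\mathbf{q} = \nabla\operatorname{div}\mathbf{q} - \triangle\mathbf{q} \tag{65}$$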

This result may be memorized as "curl curl is grad div minus del squared" and written as

$$\nabla\times(\nabla\times\mathbf{q}) \equiv \nabla\,\nabla\cdot\mathbf{q} - \nabla^2\mathbf{q} \tag{66}$$

which looks like the expansion of a vector triple product; and the key step in the above derivation, based on the Gibbs definitions of the operators, really is  the expansion of a vector triple product.
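Identity (65) can also be spot-checked numerically. The following Python sketch is an illustration only, not part of the derivation: it assumes central finite differences with an arbitrary step `H`, a hypothetical smooth test field `q`, and an arbitrary sample point `r0`. Each operator is built from its Cartesian form (curl as e_i × ∂_i, div as e_i ⸱ ∂_i, Laplacian as ∂_i ∂_i).

```python
import math

H = 1e-3  # finite-difference step (an illustrative choice, not from the text)

def partial(f, r, i, h=H):
    """Central-difference estimate of the i-th partial derivative of f at r.
    f maps a 3-tuple to a float or to a 3-tuple."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    fp, fm = f(tuple(rp)), f(tuple(rm))
    if isinstance(fp, tuple):
        return tuple((a - b) / (2 * h) for a, b in zip(fp, fm))
    return (fp - fm) / (2 * h)

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))  # Cartesian basis e_i

def curl(q):
    """Return the field r -> e_i x d_i q (summed over i)."""
    def field(r):
        terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
        return tuple(sum(t[k] for t in terms) for k in range(3))
    return field

def div(q):
    """Return the field r -> e_i . d_i q, i.e. the sum of d_i q_i."""
    return lambda r: sum(partial(q, r, i)[i] for i in range(3))

def grad(p):
    """Return the field r -> e_i d_i p for a scalar field p."""
    return lambda r: tuple(partial(p, r, i) for i in range(3))

def laplacian(q):
    """Return the field r -> d_i d_i q for a vector field q."""
    def field(r):
        total = [0.0, 0.0, 0.0]
        for i in range(3):
            d_i = lambda s, i=i: partial(q, s, i)  # the field d_i q
            total = [t + c for t, c in zip(total, partial(d_i, r, i))]
        return tuple(total)
    return field

def q(r):
    """Hypothetical smooth test field."""
    x, y, z = r
    return (math.sin(y * z), x * x * z, math.cos(x) * y)

r0 = (0.3, -0.7, 1.1)  # arbitrary sample point
lhs = curl(curl(q))(r0)
grad_div, lap = grad(div(q))(r0), laplacian(q)(r0)
rhs = tuple(g - l for g, l in zip(grad_div, lap))
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_error)  # small, limited only by the finite-difference error
```

Agreement at one point for one field is a sanity check, not a proof; the proof is the indicial derivation above.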


We now turn to product rules in which neither factor is assumed uniform.

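The curl of a cross-product is

$$\begin{aligned}
\operatorname{curl}(\mathbf{a}\times\mathbf{b}) &= \mathbf{e}_i\times\partial_i\,(\mathbf{a}\times\mathbf{b}) \\
&= \mathbf{e}_i\times(\partial_i\mathbf{a}\times\mathbf{b} + \mathbf{a}\times\partial_i\mathbf{b}) \\
&= \mathbf{e}_i\times(\partial_i\mathbf{a}\times\mathbf{b}) + \mathbf{e}_i\times(\mathbf{a}\times\partial_i\mathbf{b}) \\
&= (\mathbf{e}_i\cdot\mathbf{b})\,\partial_i\mathbf{a} - (\mathbf{e}_i\cdot\partial_i\mathbf{a})\,\mathbf{b} + (\mathbf{e}_i\cdot\partial_i\mathbf{b})\,\mathbf{a} - (\mathbf{e}_i\cdot\mathbf{a})\,\partial_i\mathbf{b} \\
&= b_i\,\partial_i\mathbf{a} - \mathbf{b}\operatorname{div}\mathbf{a} + \mathbf{a}\operatorname{div}\mathbf{b} - a_i\,\partial_i\mathbf{b} \\
&= \mathbf{b}\cdot\nabla\mathbf{a} - \mathbf{b}\operatorname{div}\mathbf{a} + \mathbf{a}\operatorname{div}\mathbf{b} - \mathbf{a}\cdot\nabla\mathbf{b}\,;
\end{aligned}$$

i.e.,

$$\operatorname{curl}(\mathbf{a}\times\mathbf{b}) = \mathbf{a}\operatorname{div}\mathbf{b} - \mathbf{b}\operatorname{div}\mathbf{a} + \mathbf{b}\cdot\nabla\mathbf{a} - \mathbf{a}\cdot\nabla\mathbf{b} \tag{67c}$$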

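The divergence of a cross-product, as we might expect, is simpler:

$$\begin{aligned}
\operatorname{div}(\mathbf{a}\times\mathbf{b}) &= \mathbf{e}_i\cdot\partial_i\,(\mathbf{a}\times\mathbf{b}) \\
&= \mathbf{e}_i\cdot(\partial_i\mathbf{a}\times\mathbf{b}) + \mathbf{e}_i\cdot(\mathbf{a}\times\partial_i\mathbf{b}) \\
&= \mathbf{b}\cdot(\mathbf{e}_i\times\partial_i\mathbf{a}) - \mathbf{a}\cdot(\mathbf{e}_i\times\partial_i\mathbf{b})\,;
\end{aligned}$$

i.e.,

$$\operatorname{div}(\mathbf{a}\times\mathbf{b}) = \mathbf{b}\cdot\operatorname{curl}\mathbf{a} - \mathbf{a}\cdot\operatorname{curl}\mathbf{b} \tag{67d}$$

In particular, in electromagnetics,  div(E × H) ≡ H ⸱ curl E − E ⸱ curl H ;  this is the identity on which Poynting's theorem is based. But if  b in (67d) is uniform, then (67d) reduces to (8c).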
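As a numerical cross-check of the two cross-product rules, the sketch below compares both sides of div(a × b) = b ⸱ curl a − a ⸱ curl b and of curl(a × b) = a div b − b div a + b⸱∇a − a⸱∇b at one point. It is illustrative only: the step `H`, the test fields `a` and `b`, and the sample point `r0` are assumptions chosen for the example, and the derivatives are central finite differences rather than exact values.

```python
import math

H = 1e-5  # finite-difference step (illustrative choice)

def partial(f, r, i, h=H):
    """Central-difference estimate of the i-th partial derivative of the vector field f at r."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((p - m) / (2 * h) for p, m in zip(f(tuple(rp)), f(tuple(rm))))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))  # Cartesian basis e_i

def curl_at(q, r):
    """e_i x d_i q at r."""
    terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def div_at(q, r):
    """e_i . d_i q at r."""
    return sum(partial(q, r, i)[i] for i in range(3))

def advect_at(v0, q, r):
    """(v0 . del) q = v0_i d_i q at r, where v0 is the advecting vector at r."""
    terms = [tuple(v0[i] * c for c in partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def a(r):  # hypothetical test field
    x, y, z = r
    return (y * z, x * x, math.sin(z))

def b(r):  # hypothetical test field
    x, y, z = r
    return (x * y, math.cos(x), y * z * z)

def a_cross_b(r):
    return cross(a(r), b(r))

r0 = (0.3, -0.7, 1.1)  # arbitrary sample point
a0, b0 = a(r0), b(r0)

# Divergence of a cross-product, as in (67d).
lhs_d = div_at(a_cross_b, r0)
rhs_d = dot(b0, curl_at(a, r0)) - dot(a0, curl_at(b, r0))
err_67d = abs(lhs_d - rhs_d)

# Curl of a cross-product, as in (67c).
lhs_c = curl_at(a_cross_b, r0)
div_a, div_b = div_at(a, r0), div_at(b, r0)
b_del_a, a_del_b = advect_at(b0, a, r0), advect_at(a0, b, r0)
rhs_c = tuple(a0[k] * div_b - b0[k] * div_a + b_del_a[k] - a_del_b[k] for k in range(3))
err_67c = max(abs(x - y) for x, y in zip(lhs_c, rhs_c))

print(err_67d, err_67c)  # both small, limited by the finite-difference error
```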

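The gradient of a dot-product, by comparison, is surprisingly messy:

$$\nabla(\mathbf{a}\cdot\mathbf{b}) = \mathbf{e}_i\,\partial_i\,(\mathbf{a}\cdot\mathbf{b}) = \mathbf{e}_i\,(\mathbf{a}\cdot\partial_i\mathbf{b}) + \mathbf{e}_i\,(\mathbf{b}\cdot\partial_i\mathbf{a})\,.$$

Now the first term on the right can be recognized as  a × (ei × ∂i b) + a⸱ ei ∂i b;  that is,  a × (ei × ∂i b) + ai ∂i b;  that is,  a × curl b + a⸱∇b.  Similarly, the second term is  b × curl a + b⸱∇a.  Thus we have

$$\nabla(\mathbf{a}\cdot\mathbf{b}) = \mathbf{a}\times\operatorname{curl}\mathbf{b} + \mathbf{b}\times\operatorname{curl}\mathbf{a} + \mathbf{a}\cdot\nabla\mathbf{b} + \mathbf{b}\cdot\nabla\mathbf{a} \tag{68}$$

For uniform b , the first and third terms on the right vanish, and we can solve for the second term on the right, obtaining

$$\mathbf{b}\times\operatorname{curl}\mathbf{a} = \nabla(\mathbf{a}\cdot\mathbf{b}) - \mathbf{b}\cdot\nabla\mathbf{a} \quad [\text{for uniform } \mathbf{b}]\,,$$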

so that we can now drop the question mark after (8q). If we write the curl operator as  ∇ × ,  the last equation [or (8q)]  looks like the expansion of a vector triple product; but the identity is valid only for uniform b.

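The gradient of a product of scalars, say p and ψ, unlike that of a dot-product, is as simple as the product rule for ordinary differentiation:

$$\nabla(p\,\psi) = \mathbf{e}_i\,\partial_i\,(p\,\psi) = \mathbf{e}_i\,(\partial_i p)\,\psi + p\;\mathbf{e}_i\,\partial_i\psi\,;$$

that is,

$$\nabla(p\,\psi) = \psi\,\nabla p + p\,\nabla\psi \tag{69}$$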

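The advection of a product is equally simple, regardless of the type of product, except that the order of a cross-product matters. Let ψ and χ be scalar or vector fields, and let ψ ∗χ denote any meaningful product of the two. Then, by (64),

$$\mathbf{q}\cdot\nabla(\psi\ast\chi) = q_i\,\partial_i\,(\psi\ast\chi) = (q_i\,\partial_i\psi)\ast\chi + \psi\ast(q_i\,\partial_i\chi)\,;$$

that is,

$$\mathbf{q}\cdot\nabla(\psi\ast\chi) = (\mathbf{q}\cdot\nabla\psi)\ast\chi + \psi\ast(\mathbf{q}\cdot\nabla\chi) \tag{70}$$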

The q⸱∇ operator is a scalar operator in the sense that it maps the operand field to a field of the same order—a scalar field to a scalar field, a vector field to a vector field, a matrix field to a matrix field, etc.—as if it were multiplication by a scalar or differentiation w.r.t. a scalar; and indeed a differentiation w.r.t. path length appears in the coordinate-free definition (11) of the operator. Moreover, we did not need coordinates to obtain rule (70); as the reader may verify, the same rule can be obtained directly from the definition (11) in a similar manner. From these points of view, the simplicity of the rule is unsurprising.

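The curl of the product of a scalar and a vector is

$$\operatorname{curl}(p\,\mathbf{b}) = \mathbf{e}_i\times\partial_i\,(p\,\mathbf{b}) = \mathbf{e}_i\times(\partial_i p)\,\mathbf{b} + \mathbf{e}_i\times p\,\partial_i\mathbf{b} = (\mathbf{e}_i\,\partial_i p)\times\mathbf{b} + p\;\mathbf{e}_i\times\partial_i\mathbf{b}\,;$$

that is,

$$\operatorname{curl}(p\,\mathbf{b}) = \nabla p\times\mathbf{b} + p\operatorname{curl}\mathbf{b} \tag{71c}$$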

For uniform b , this reduces to (8p), which was used to derive the Cartesian form of the curl (59c).

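For the divergence of the product of a scalar and a vector, we proceed likewise except that we use a dot instead of a cross. The result is

$$\operatorname{div}(p\,\mathbf{b}) = \nabla p\cdot\mathbf{b} + p\operatorname{div}\mathbf{b} \tag{71d}$$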

which has the same form as (17), delivering the promised confirmation that (17) is an identity. For uniform b ,  (71d) reduces to (8g), which was used to derive the Cartesian form of the divergence (59d).

That exhausts the first-order product rules. For future reference, however, we shall also derive one second-order rule.

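The Laplacian of the product of a scalar and a vector, by (63L), is

$$\triangle(p\,\mathbf{b}) = \partial_i\partial_i\,(p\,\mathbf{b}) = (\partial_i\partial_i\,p)\,\mathbf{b} + 2\,\partial_i p\;\partial_i\mathbf{b} + p\;\partial_i\partial_i\,\mathbf{b}\,.$$

In the middle term, by (58g), ∂i p  is the i th component of  ∇p  so that, by (64o),  ∂i p ∂i  is the q⸱∇ operator for  q = ∇p.  So we have

$$\triangle(p\,\mathbf{b}) = \mathbf{b}\,\triangle p + 2\,\nabla p\cdot\nabla\mathbf{b} + p\,\triangle\mathbf{b} \tag{72}$$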


Finally we turn to chain rules — especially the simple cases of the gradient, curl, divergence, advection, and Laplacian of a function of a scalar field u. As usual, let p denote a scalar field, q a vector field, and ψ a generic field.

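Gradient ⧸ curl ⧸ divergence of a function of a scalar:  By the general Cartesian formula (60) and the chain rule for ∂i ,

$$\nabla\ast\psi(u) = \mathbf{e}_i\ast\partial_i\,\psi(u) = \mathbf{e}_i\ast(\partial_i u)\,\frac{d\psi}{du} = (\mathbf{e}_i\,\partial_i u)\ast\frac{d\psi}{du}\,;$$

i.e., by (58g),

$$\nabla\ast\psi(u) = \nabla u\ast\frac{d\psi}{du} \tag{73}$$

In particular, if ∗ is a null,

$$\nabla p(u) = \nabla u\;\frac{dp}{du} \tag{73g}$$

and if ∗ is a cross,

$$\operatorname{curl}\mathbf{q}(u) = \nabla u\times\frac{d\mathbf{q}}{du} \tag{73c}$$

and if ∗ is a dot,

$$\operatorname{div}\mathbf{q}(u) = \nabla u\cdot\frac{d\mathbf{q}}{du} \tag{73d}$$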

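Advection of a function of a scalar:

$$\mathbf{q}\cdot\nabla\psi(u) = q_i\,\partial_i\,\psi(u) = q_i\,\partial_i u\;\frac{d\psi}{du}\,;$$

i.e.,

$$\mathbf{q}\cdot\nabla\psi(u) = (\mathbf{q}\cdot\nabla u)\,\frac{d\psi}{du} \tag{73q}$$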

This fits into the pattern set by (73) in that the gradient operator in (73g) is replaced by an advection operator.

Of the last four results, only (73c) is dependent on the order of the  product; the others could equally well be written

 

 

 

 

 

(73z)

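The Laplacian of a function of a scalar departs from the above pattern.

$$\begin{aligned}
\triangle\psi(u) &= \partial_i\partial_i\,\psi(u) \\
&= \partial_i\Big(\partial_i u\;\frac{d\psi}{du}\Big) \\
&= (\partial_i\partial_i\,u)\,\frac{d\psi}{du} + \partial_i u\;\partial_i u\;\frac{d^2\psi}{du^2}\,,
\end{aligned}$$

where the last line follows from the product rule for ∂i  and, in the second term, the chain rule for ∂i.  In that second term, the implicit sum  ∂i u ∂i u  can be recognized as  |∇u|²  by (58s). So we have

$$\triangle\psi(u) = \triangle u\;\frac{d\psi}{du} + |\nabla u|^2\,\frac{d^2\psi}{du^2} \tag{74}$$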
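The extra second-derivative term is easy to overlook, so a quick numerical comparison may help. The Python sketch below checks the rule △ψ(u) = △u dψ/du + |∇u|² d²ψ/du² at one point. It is illustrative only: the scalar field `u`, the outer function `psi` (with hand-coded derivatives `dpsi` and `d2psi`), the step `H`, and the point `r0` are all assumptions made for the example.

```python
import math

H = 1e-3  # finite-difference step (illustrative choice)

def u(r):
    """Hypothetical scalar field u(x, y, z)."""
    x, y, z = r
    return x * x * y + z

def psi(s):    # hypothetical outer function psi(u)
    return math.sin(s)

def dpsi(s):   # its first derivative d(psi)/du
    return math.cos(s)

def d2psi(s):  # its second derivative
    return -math.sin(s)

def shifted(r, i, delta):
    """Copy of r with the i-th coordinate increased by delta."""
    s = list(r)
    s[i] += delta
    return tuple(s)

def grad(f, r, h=H):
    """Central-difference gradient of the scalar field f at r."""
    return tuple((f(shifted(r, i, h)) - f(shifted(r, i, -h))) / (2 * h) for i in range(3))

def laplacian(f, r, h=H):
    """Second-difference estimate of d_i d_i f at r."""
    return sum((f(shifted(r, i, h)) - 2 * f(r) + f(shifted(r, i, -h))) / (h * h)
               for i in range(3))

r0 = (0.3, -0.7, 1.1)  # arbitrary sample point

def composite(r):
    return psi(u(r))

lhs = laplacian(composite, r0)
grad_u = grad(u, r0)
rhs = laplacian(u, r0) * dpsi(u(r0)) + sum(g * g for g in grad_u) * d2psi(u(r0))
error = abs(lhs - rhs)
print(error)  # small, limited by the finite-difference error
```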

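Multivariate chain rule:  The foregoing chain rules involve one intermediate function of one scalar variable. It will be useful to have an elementary chain rule that can handle more than one of each. Let p(r) be a smooth scalar field, and let r in turn be a smooth function of several variables, one of which, say t , is allowed to vary while the others are held constant, so that r changes by dr when t changes by dt. Then dividing (26g) by dt  gives

$$\frac{\partial p}{\partial t} = \nabla p\cdot\frac{\partial\mathbf{r}}{\partial t}$$

or, in indicial Cartesian coordinates with implicit summation,

$$\frac{\partial p}{\partial t} = \frac{\partial p}{\partial x_i}\,\frac{\partial x_i}{\partial t}$$

or, in traditional longhand,

$$\frac{\partial p}{\partial t} = \frac{\partial p}{\partial x}\,\frac{\partial x}{\partial t} + \frac{\partial p}{\partial y}\,\frac{\partial y}{\partial t} + \frac{\partial p}{\partial z}\,\frac{\partial z}{\partial t}\,.$$

This is the desired multivariate chain rule for a scalar function of three intermediate real variables. The assumption that these variables are Cartesian coordinates is not a loss of generality, because any three real quantities can be suitably scaled and represented by perpendicular axes, so that any scalar function of them becomes a function of position, to which (26g) applies; and then the scaling can be reversed without changing the products in the last equation. Moreover, by the linearity of ∂t, the scalar field p may be replaced by any field expressible in terms of a uniform basis. For example, for a vector field q,

$$\begin{aligned}
\frac{\partial\mathbf{q}}{\partial t} &= \frac{\partial}{\partial t}\,(q_j\,\mathbf{e}_j) \\
&= \mathbf{e}_j\,\frac{\partial q_j}{\partial t} \\
&= \mathbf{e}_j\,\frac{\partial q_j}{\partial x_i}\,\frac{\partial x_i}{\partial t} \\
&= \frac{\partial(q_j\,\mathbf{e}_j)}{\partial x_i}\,\frac{\partial x_i}{\partial t} \\
&= \frac{\partial\mathbf{q}}{\partial x_i}\,\frac{\partial x_i}{\partial t}
\end{aligned}$$

where the third line is obtained by applying the multivariate chain rule for a scalar field. Thus, for a generic field ψ ,

$$\frac{\partial\psi}{\partial t} = \frac{\partial\psi}{\partial x_i}\,\frac{\partial x_i}{\partial t} \quad [\text{for generic } \psi \text{ and } x_i] \tag{75}$$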

Gradient ⧸ curl ⧸ divergence of a function of a scaled position vector:  We end this subsection by deriving a lemma for use in the next subsection. If k is a uniform scalar and r is the position vector,

$$\nabla\ast\psi(k\mathbf{r}) = \mathbf{e}_i\ast\frac{\partial\psi(k\mathbf{r})}{\partial x_i} = k\;\mathbf{e}_i\ast\frac{\partial\psi(k\mathbf{r})}{\partial(k x_i)}\,,$$

where the third expression is obtained from the second by multiplying each denominator (change in xi) by k  and compensating. But now we have

$$\nabla\ast\psi(k\mathbf{r}) = k\;\nabla\ast\psi\,\Big|_{k\mathbf{r}} \tag{76}$$

where the vertical bar and subscript indicate that the gradient, curl, or divergence is evaluated at kr. We shall be interested in the curl (for which ∗ is a cross).

Field with given curl

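Consider the vector field

$$\mathbf{v} = \mathbf{q}\times\mathbf{r} \tag{77}$$

where q is a solenoidal  vector field and r is the position vector. By identity (67c),

$$\operatorname{curl}\mathbf{v} = \mathbf{q}\operatorname{div}\mathbf{r} - \mathbf{r}\operatorname{div}\mathbf{q} + \mathbf{r}\cdot\nabla\mathbf{q} - \mathbf{q}\cdot\nabla\mathbf{r}\,,$$

where, by hypothesis, div q  is zero. Applying identities (62r) and (64r) then yields

$$\operatorname{curl}\mathbf{v} = 2\mathbf{q} + \mathbf{r}\cdot\nabla\mathbf{q}\,.$$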

In the special case in which q is the angular velocity ω of a rigid body about an axis through the origin,  v is the velocity field (ω × r) and ω is uniform, so that the last result reduces to  curl v = 2ω; that is, the vorticity is twice the angular velocity. As the vorticity in this case is uniform and therefore independent of position relative to the axis, it does not change if the axis is shifted, provided that the angular velocity has the same magnitude and direction. And because a uniform velocity field has zero curl, the vorticity is also unchanged if a translational motion is superposed on the rotation. This is the most direct connection that we have seen between curl and rotation. But again I digress.

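Returning to the more general case in which q is not necessarily uniform, but merely solenoidal,[35] we have

$$\operatorname{curl}\mathbf{v} = 2\mathbf{q} + \mathbf{r}\cdot\nabla\mathbf{q}\,,$$

to which we can apply our lemma (76) with a uniform real factor t , obtaining

$$\operatorname{curl}\mathbf{v}(t\mathbf{r}) = t\,\Big[\,2\,\mathbf{q}(t\mathbf{r}) + t\mathbf{r}\cdot\nabla\mathbf{q}\,\Big|_{t\mathbf{r}}\Big]\,.$$

On the left we can recall (77); and on the right we can apply (11), noting that |r| is r , which measures distance in the direction of r. Thus we obtain

$$\operatorname{curl}\big[\mathbf{q}(t\mathbf{r})\times t\mathbf{r}\big] = 2t\,\mathbf{q}(t\mathbf{r}) + t\,r\,\partial_r\,\mathbf{q}(t\mathbf{r})\,.$$

Now if the direction of r is held constant,  q(tr) is a function of tr ; and in general  r ∂r f (tr) = t ∂t f (tr).  So we have

$$\operatorname{curl}\big[t\,\mathbf{q}(t\mathbf{r})\times\mathbf{r}\big] = 2t\,\mathbf{q}(t\mathbf{r}) + t^2\,\partial_t\,\mathbf{q}(t\mathbf{r}) = \partial_t\big[t^2\,\mathbf{q}(t\mathbf{r})\big]\,.$$

Integrating w.r.t. t  from 0 to 1 gives

$$\operatorname{curl}\int_0^1 t\,\mathbf{q}(t\mathbf{r})\times\mathbf{r}\;dt = \Big[\,t^2\,\mathbf{q}(t\mathbf{r})\Big]_{t=0}^{t=1}\,;$$

that is,

$$\mathbf{q}(\mathbf{r}) = \operatorname{curl}\int_0^1 t\,\mathbf{q}(t\mathbf{r})\times\mathbf{r}\;dt \quad [\text{for solenoidal } \mathbf{q}] \tag{78}$$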

Thus for any solenoidal vector field q  we can construct a vector potential—that is, a field whose curl is q; such a field is given by the integral on the right. This is the long-promised proof of the "converse" of identity (24d). Of course the vector potential is not unique, because any conservative field—but only a conservative field—can be added to it without changing its curl. Hence the existence of one vector potential implies the existence of infinitely many. The above integral gives us one.

The proof of (78) assumes that q is solenoidal not only at position r , but also at tr  where  0 ≤ t ≤ 1, i.e. at every point on the line-segment from the origin to r.  A star-shaped region is one that contains a point O  such that for every point P in the region, the line-segment OP is entirely contained in the region. We may choose any such O  as the origin in the proof of (78). So the proof tells us that if a vector field is solenoidal within a star-shaped region, it has a vector potential in that region. As a special case, a vector field that is solenoidal everywhere has a vector potential everywhere.
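The construction can be tried numerically. The Python sketch below evaluates the integral of t q(tr) × r over 0 ≤ t ≤ 1 by composite Simpson quadrature, takes its curl by central differences, and compares the result with q. It is an illustration under stated assumptions: the solenoidal test field `q` (for which the sum of ∂i qi is zero by inspection), the number of Simpson panels, the step `H`, and the sample point `r0` are choices made for the example, not part of the text.

```python
H = 1e-4  # finite-difference step (illustrative choice)

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def q(r):
    """Hypothetical solenoidal field: each component is independent of its own coordinate."""
    x, y, z = r
    return (y * y, z * z, x * x)

def potential(r, n=20):
    """Composite Simpson estimate of the integral of t q(t r) x r for t from 0 to 1 (n even)."""
    def integrand(t):
        tr = tuple(t * c for c in r)
        return tuple(t * c for c in cross(q(tr), r))
    step = 1.0 / n
    total = [0.0, 0.0, 0.0]
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total = [s + weight * c for s, c in zip(total, integrand(k * step))]
    return tuple(s * step / 3 for s in total)

def partial(f, r, i, h=H):
    """Central-difference estimate of the i-th partial derivative of the vector field f at r."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((p - m) / (2 * h) for p, m in zip(f(tuple(rp)), f(tuple(rm))))

E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))  # Cartesian basis e_i

def curl_at(f, r):
    """e_i x d_i f at r."""
    terms = [cross(E[i], partial(f, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

r0 = (0.3, -0.7, 1.1)  # arbitrary sample point; the segment from the origin to r0 is used
curl_of_potential = curl_at(potential, r0)
max_error = max(abs(c - v) for c, v in zip(curl_of_potential, q(r0)))
print(max_error)  # small: the curl of the constructed potential reproduces q at r0
```

The choice of origin matters only through the star-shaped condition: every point tr on the segment must lie where q is solenoidal.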

Notes on the curl of the curl

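Identity (65), namely

$$\operatorname{curl}\operatorname{curl}\mathbf{q} = \nabla\operatorname{div}\mathbf{q} - \triangle\mathbf{q}$$

("curl curl is grad div minus del squared"), has at least three implications worth noting here.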

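First, it can be rearranged as

$$\triangle\mathbf{q} = \nabla\operatorname{div}\mathbf{q} - \operatorname{curl}\operatorname{curl}\mathbf{q} \tag{79}$$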

("del squared is grad div minus curl curl"). This would serve as a coordinate-free definition of the Laplacian of a vector, if we did not already have one.[36] But we do: we started with a coordinate-free definition (4L) for a generic field, established its unambiguity via (9L), and found its Cartesian form (63L), which we used in the derivation of (79). Wherever we start, we may properly assert by way of contrast that the Laplacian of a vector  is given by (79), whereas the Laplacian of a scalar  is given by the divergence of the gradient. But we should not conclude, as Moon & Spencer do, that representing the scalar and vector Laplacians by the same symbol is "poor practice… since the two are basically quite different",[37] because in fact the two have a common definition which is succinct, unambiguous, and coordinate-free: the Laplacian (of anything) is the closed-surface integral of the outward normal derivative, per unit volume.[o]

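Second, by reason of identity (38) and the remarks thereunder, a given vector field v can be written

$$\mathbf{v}(\mathbf{r}) = \triangle\iiint\frac{-\mathbf{v}(\mathbf{r}')}{4\pi\,|\mathbf{r}-\mathbf{r}'|}\;dV'\,,$$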

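where the integral is over all space. So, subject to the convergence of the integral, there exists a vector field q such that

$$\mathbf{v} = \triangle\mathbf{q}\,;$$

that is, by (79), there exists q such that

$$\mathbf{v} = \nabla\operatorname{div}\mathbf{q} - \operatorname{curl}\operatorname{curl}\mathbf{q}\,;$$

that is, there exist a scalar field, say φ, and a vector field, say Ψ, such that

$$\mathbf{v} = -\nabla\varphi + \operatorname{curl}\boldsymbol{\Psi}$$

(namely  φ = − div q  and  Ψ = − curl q). In short, subject to the convergence of the said integral,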

  • a given vector field can be resolved into [minus] a gradient plus a curl.

Such a resolution is called a Helmholtz decomposition, and the proposition that it exists is the Helmholtz decomposition theorem. Of course the gradient is irrotational and the curl is solenoidal so that, subject to the same convergence,

  • a given vector field can be resolved into an irrotational field plus a solenoidal field.

This is a second statement of the theorem, and follows from the first. And the first follows from the second because an irrotational field has a scalar potential by (29) and a solenoidal field has a vector potential by (78).

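Third, if q is solenoidal, the term  ∇ div q  in (65) or (79) vanishes. Hence for a solenoidal field, the curl of the curl is minus the Laplacian. For example, in the dynamic case, in a vacuum, the Maxwell–Ampère law says that  curl H = ε0 ∂E/∂t.  Multiplying this by the physical constant μ0 (called the vacuum permeability or simply the magnetic constant) gives  curl B = μ0 ε0 ∂E/∂t,  whence

$$\operatorname{curl}\operatorname{curl}\mathbf{B} = \mu_0\,\epsilon_0\,\frac{\partial}{\partial t}\operatorname{curl}\mathbf{E}\,.$$

But, by Gauss's law for magnetism, B is solenoidal so that, by (65), the left-hand side of the above is  −△B.  And by Faraday's law,  curl E = −∂B/∂t  so that  ∂/∂t curl E = −∂²B/∂t².  Making these substitutions, we get  −△B = −μ0 ε0 ∂²B/∂t²,  i.e.

$$\triangle\mathbf{B} = \mu_0\,\epsilon_0\,\frac{\partial^2\mathbf{B}}{\partial t^2}\,.$$

By comparison with (45), this is the wave equation with

$$c = \frac{1}{\sqrt{\mu_0\,\epsilon_0}}\,.$$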

Thus the Maxwell–Ampère law, Gauss's law for magnetism, and Faraday's law, with the aid of (65), predict the existence of electromagnetic waves together with their speed.
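To put a number on the predicted speed, the short Python sketch below evaluates 1/√(μ0 ε0). The two constants are not given in the text; the values used here are the CODATA 2018 recommended values, entered by hand as an assumption of this example.

```python
import math

# Assumed inputs (CODATA 2018 recommended values, not taken from the text):
MU_0 = 1.25663706212e-6        # vacuum permeability, in henries per metre
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, in farads per metre

# Wave speed implied by the wave equation for B derived above.
c = 1.0 / math.sqrt(MU_0 * EPSILON_0)
print(c)  # close to 2.998e8 metres per second, the speed of light in a vacuum
```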

For these reasons, especially the last, one could hardly overstate the importance of identity (65).

Digression: Proofs from formal products

We have seen that Wilson (1901, pp. 150, 152) interprets the divergence and curl as "formal" or "symbolic" scalar and vector products with the ∇ operator.  C.-T. Tai, in his 1995 report (pp. 26–9), alleges that this interpretation began with Wilson and not with Gibbs. Here I shall submit, on the contrary, that while the terminology may not be attributable to Gibbs, the concept certainly is.

Later in the same report, Tai confuses the picture by citing the first volume of Heaviside's Electromagnetic Theory (1893), where Heaviside, although his notations for the scalar and vector products differ from those of Gibbs, nevertheless considers the ∇ operator as a factor in such products. Tai continues:

At the time of his writing he [Heaviside] was already aware of Gibbs' pamphlets on vector analysis but Wilson's book was not yet published. It seems, therefore, that Heaviside and Wilson independently introduced the misleading concept for the scalar and vector products between ∇ and a vector function. Both were, perhaps, induced by Gibbs' notations for the divergence and the curl. Heaviside did not even include the word 'formal' in his description of the products.[38]

Whereas it was quite in character for Heaviside to treat an operator that way, the word "independently" would have surprised Wilson and is contradicted by Tai himself, who observes that Wilson's preface acknowledges Heaviside.[39] In Wilson's own words:

By far the greater part of the material used in the following pages has been taken from the course of lectures on Vector Analysis delivered annually at the University [Yale] by Professor Gibbs. Some use, however, has been made of the chapters on Vector Analysis in Mr. Oliver Heaviside's Electromagnetic Theory (Electrician Series, 1893) and in Professor Föppl's lectures on Die Maxwell'sche Theorie der Electricität (Teubner, 1894). ....

Notwithstanding the efforts which have been made during more than half a century to introduce Quaternions into physics the fact remains that they have not found wide favor.[p] On the other hand there has been a growing tendency especially in the last decade toward the adoption of some form of Vector Analysis. The works of Heaviside and Föppl referred to before may be cited in evidence. As yet however no system of Vector Analysis which makes any claim to completeness has been published. In fact Heaviside says: "I am in hopes that the chapter which I now finish may serve as a stopgap till regular vectorial treatises come to be written suitable for physicists, based upon the vectorial treatment of vectors" (Electromagnetic Theory, Vol. I., p. 305). Elsewhere in the same chapter Heaviside has set forth the claims of vector analysis as against Quaternions, and others have expressed similar views.[40]

Most damaging to Tai's thesis, however, is Gibbs's original pamphlet, a copy of which Heaviside received from Gibbs himself in June 1888.[41] Sections 62 to 65 of the pamphlet appear under the heading

∇, ∇⸱, and ∇ × applied to Functions of Functions of Position.

In § 62, Gibbs says that a constant scalar factor after such an operator may be placed before it (that is, taken outside the operator). In § 63 he states our rule (73g) for the gradient of a function of a scalar field. His next section (in which I have bolded the vector field ω) is worth quoting in full:

64.  If u or ω is a function of several scalar or vector variables, which are themselves functions of the position of a single point, the value of ∇u or ∇⸱ ω or ∇ × ω will be equal to the sum of the values obtained by making successively all but each one of these variables constant.

This proposition is a generalized product rule in the sense that the "function of several scalar or vector variables" may be, but is not restricted to, any sort of product of those variables. Gibbs continues:

65.  By the use of this principle, we easily derive the following identical equations:

Six "equations" follow. The first says that the gradient operation is distributive over addition, and the second says the same of the divergence and curl (on one line). The last four are our identities (69), (71d), (71c), and (67d), in that order (albeit with different symbols). Gibbs then remarks (with my italics):

The student will observe an analogy between these equations and the formulæ of multiplication. (In the last four equations the analogy appears most distinctly when we regard all the factors but one as constant.) Some of the more curious features of this analogy are due to the fact that the ∇ contains implicitly the vectors i , j , and k , which are to be multiplied into the following quantities.

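Indeed, if the first  factor is constant, identities (69), (71d), (71c), and (67d) become (for scalars p and ψ and vectors a and b)

$$\begin{aligned}
\nabla(p\,\psi) &= p\,\nabla\psi \\
\nabla\cdot(p\,\mathbf{b}) &= p\,\nabla\cdot\mathbf{b} \\
\nabla\times(p\,\mathbf{b}) &= p\,\nabla\times\mathbf{b} \\
\nabla\cdot(\mathbf{a}\times\mathbf{b}) &= -\,\mathbf{a}\cdot\nabla\times\mathbf{b}
\end{aligned}$$

whereas if the second  factor is constant, they become respectively

$$\begin{aligned}
\nabla(p\,\psi) &= \nabla p\;\psi \\
\nabla\cdot(p\,\mathbf{b}) &= \nabla p\cdot\mathbf{b} \\
\nabla\times(p\,\mathbf{b}) &= \nabla p\times\mathbf{b} \\
\nabla\cdot(\mathbf{a}\times\mathbf{b}) &= \nabla\times\mathbf{a}\cdot\mathbf{b}
\end{aligned}$$

All eight equations look like rearrangements of products involving a vector ∇.  [Concerning the last three equations, we have made that observation before; see (15) above.]  But only seven of the eight are explained by taking the constant outside the operator (as in § 62); the exception is the fourth, in which the minus sign is not explained by that step alone, but is explained by the change in the cyclic order of the formal triple product. And if we add the two right-hand sides corresponding to each of the four left-hand sides, we get the identities in which both factors are variable—as claimed in § 64.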

If § 65 leaves any doubt that Gibbs approved of formal products with the symbolic vector ∇ (albeit without using those terms), this is dispelled by § 166, where he writes:

166.  To the equations in No. 65 may be added many others…

followed by a list of seven identities terminated by "etc." Six of the seven are beyond the scope of the present paper,[q] while the third of the seven is our (67c). After the list comes the smoking gun (§ 166, continued):

The principle in all these cases is that if we have one of the operators  ∇, ∇⸱, ∇ ×  prefixed to a product of any kind, and we make any transformation of the expression which would be allowable if the ∇ were a vector, (viz: by changes in the order of the factors, in the signs of multiplication, in the parentheses written or implied, etc.,) by which changes the ∇ is brought into connection with one particular factor, the expression thus transformed will represent the part of the value of the original expression which results from the variation of that factor.

The italics are mine, but I have refrained from italicizing those instances of the word "factor" which are not applicable to ∇. In particular, at the stage when "the ∇ is brought into connection with one particular factor," the "part of the value… which results from the variation of that factor" evidently means the term of the sum in § 64 —which, as we have noted, amounts to a generalized product rule. But, according to the stated "principle", we reach that stage by treating ∇ as a factor. I rest my case.


Wilson (1901, p. 157) gives a comprehensive list of sum and product rules for the gradient, divergence, and curl, and properly states (p. 158) that the rules may be proven "most naturally" from Gibbs's definitions of the operators—our equations (58g), (59d), and (59c). Understandably, Wilson uses a Σ sign rather than implicit summation. Less understandably, and less fortunately, he does not sum over a numerical index; e.g., he defines the curl operator as

 [sic]

and explains that "The summation extends over x, y, z."  With these definitions he proves our identities (71c) and (68) essentially as we have done, but inevitably with greater difficulty, which may explain why he then says "The other formulæ are demonstrated in a similar manner" before reverting to Gibbs's strategy of varying one factor at a time. He announces (p. 159) that the variable held constant will be written as a subscript after the product, and he combines this notation with his Σ notation in a rigorous proof that varying one factor at a time is valid for our (68), i.e. the gradient of a dot-product. Noting that this result is analogous to

 

he then jumps to the conclusion that varying one factor at a time is valid for all  of his product rules—notwithstanding that a small change in a vector is not related to its divergence or curl as a small change in a scalar is related to its gradient.

That per saltum  conclusion is his cue to go formal and symbolic. To obtain the curl of a cross-product [as in our (67c)], he "formally" expands a vector triple product to obtain the curl when the first factor is constant, states the curl when the second factor is held constant, and adds the two partial curls (Wilson, 1901, p. 161). Next he gives various arrangements of our (8q), except that he presents the first vector not as strictly uniform, but as merely held constant for the gradient operation. He states in passing that a proof may be effected by "expanding in terms of  i , j, k"; but instead of such a proof, he offers a "method of remembering the result" by expanding the "product"  u × (∇ × v)  "formally as if  ∇, u , v  were all real vectors" (pp. 161–2). Concerning the curl of the gradient, and the divergence of the curl (pp. 167, 168), he recommends expanding in terms of  i , j, k ,  but does not elaborate. Concerning the curl of the curl, however, he shows what would happen if it were "expanded formally according to the law of the triple vector product" (p. 169).

In defense of the "formal product" method, we should note that the operators ∂x , ∂y , and ∂z are linear, so that they are distributive over addition and may be permuted with multiplication by a constant, as if the operators themselves were multipliers (like components of vectors). They may be similarly permuted with other like operators—explaining why the formal-product method correctly deals with the curl of the gradient, the divergence of the curl, and the curl of the curl. But such an operator cannot be permuted with multiplication by a variable, because then the product rule of differentiation applies, yielding an extra term. The formal-product system responds to this difficulty by generalizing the product rule as in §§ 64 & 166 of Gibbs (1881–84). As Borisenko & Tarapov put it (1968, p. 169),

the operator ∇ acts on each factor separately with the other held fixed. Thus ∇ should be written after any factor regarded as a constant in a given term and before any factor regarded as variable.

In this they differ inconsequentially from Gibbs, who requires that the operator be "brought into connection" with the factor considered variable.

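To illustrate, let us find the gradient of a dot-product, essentially in the manner of Borisenko & Tarapov (1968, p. 180), quoted by Tai (1995, p. 46; the next five equation numbers are Tai's). In this case the generalized product rule gives

$$\nabla(\mathbf{A}\cdot\mathbf{B}) = \nabla(\mathbf{A}_c\cdot\mathbf{B}) + \nabla(\mathbf{A}\cdot\mathbf{B}_c) \tag{7.26}$$

where the subscript c marks the factor held constant during the differentiation.[r] By the algebraic identity

$$\mathbf{a}\times(\mathbf{b}\times\mathbf{c}) = (\mathbf{a}\cdot\mathbf{c})\,\mathbf{b} - (\mathbf{a}\cdot\mathbf{b})\,\mathbf{c} \tag{7.27}$$

i.e.

$$(\mathbf{a}\cdot\mathbf{c})\,\mathbf{b} = (\mathbf{a}\cdot\mathbf{b})\,\mathbf{c} + \mathbf{a}\times(\mathbf{b}\times\mathbf{c})\,,$$

we can say

$$\nabla(\mathbf{A}_c\cdot\mathbf{B}) = (\mathbf{A}_c\cdot\nabla)\,\mathbf{B} + \mathbf{A}_c\times(\nabla\times\mathbf{B}) \tag{7.28}$$

Similarly,[42]

$$\nabla(\mathbf{B}_c\cdot\mathbf{A}) = (\mathbf{B}_c\cdot\nabla)\,\mathbf{A} + \mathbf{B}_c\times(\nabla\times\mathbf{A}) \tag{7.29}$$

Substituting (7.28) and (7.29) into (7.26), in which the order of the dot-products is immaterial, and dropping the c subscripts (because they are now outside the differentiations), we get the correct result

$$\nabla(\mathbf{A}\cdot\mathbf{B}) = (\mathbf{A}\cdot\nabla)\,\mathbf{B} + (\mathbf{B}\cdot\nabla)\,\mathbf{A} + \mathbf{A}\times(\nabla\times\mathbf{B}) + \mathbf{B}\times(\nabla\times\mathbf{A}) \tag{7.30}$$

corresponding to our (68).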

Tai (1995, p. 47) is unimpressed, asking why we cannot apply (7.27) directly to the left side of (7.26). The answer to that is obvious: on the left side, the operator is applied to a product of two variables, and the variations of both must be taken into account. But there is a harder question which Tai does not ask: in (7.28), why can't we have ∇⸱Ac instead of Ac⸱∇ ?  Because that would make the term vanish? Yes, it would; but, as there is only one variable factor on the left side, why do we need two terms on the right? Because the rule says ∇ should be written after the constant but before the variable? Yes, but that rule serves the purpose of varying each variable, whereas there is only one variable to vary on the left of (7.28). The same issue arises in (7.29). We cannot settle the question even by appealing to symmetry. Obviously the right side of (7.30), like the left, must be unchanged if we switch A and B; and indeed it is. But if the first term on the right of (7.28) and of (7.29) were to vanish, the necessary symmetry of (7.30) would be maintained.

For another example of the same issue, consider the following two-liner offered by Panofsky & Phillips (1962, pp. 470–71) and rightly pilloried by Tai (1995, pp. 47–8):

 

If the first line were right, the authors would hardly bother to continue; but evidently it isn't, because it doesn't begin by "varying one factor at a time". The second line does not follow from the first and includes divergences of constants, which ought to vanish but somehow apparently do not. Let's try again, this time sticking to the rules:

$$\begin{aligned}
\nabla\times(\mathbf{A}\times\mathbf{B}) &= \nabla\times(\mathbf{A}_c\times\mathbf{B}) + \nabla\times(\mathbf{A}\times\mathbf{B}_c) \\
&= \mathbf{A}_c\,(\nabla\cdot\mathbf{B}) - (\mathbf{A}_c\cdot\nabla)\,\mathbf{B} + (\mathbf{B}_c\cdot\nabla)\,\mathbf{A} - \mathbf{B}_c\,(\nabla\cdot\mathbf{A}) \\
&= \mathbf{A}\,(\nabla\cdot\mathbf{B}) - \mathbf{B}\,(\nabla\cdot\mathbf{A}) + (\mathbf{B}\cdot\nabla)\,\mathbf{A} - (\mathbf{A}\cdot\nabla)\,\mathbf{B}
\end{aligned}$$

in agreement with our (67c). Here the first line comes from the generalized product rule, and the third is obtained from the second by rearranging terms and dropping the (now redundant) subscripts. The interesting line is the second, which is obtained from the first by expanding the formal vector triple products. But again, why must we have Ac⸱∇ and Bc⸱∇, instead of ∇⸱Ac and ∇⸱Bc , which would make the middle two terms vanish? Again symmetry does not give an answer. The right-hand side, like the left, must change sign if we switch A and B; but the disappearance of the Ac⸱∇ and Bc⸱∇ terms would maintain the required (anti)symmetry. Funnily enough, the result would then agree with the incorrect first line given by Panofsky & Phillips (above). But then how would we know that it is incorrect?

The foregoing examples show that "formal product" arguments can be tenuous, even on their own terms. Before these examples, we might have been troubled by the omission of a general proof of the "generalized" product rule. After them, we might wonder whether the rule is even well defined.
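One way to see that the two advection terms cannot be dropped is to test a concrete case. The Python sketch below compares curl(A × B) with the full four-term expression and with the truncated expression A div B − B div A that remains when the A⸱∇ and B⸱∇ terms are omitted. It is illustrative only: the fields `A` and `B`, the step `H`, and the sample point `r0` are assumptions of the example, chosen so that both divergences vanish while the curl of the cross-product does not.

```python
H = 1e-5  # finite-difference step (illustrative choice)

def partial(f, r, i, h=H):
    """Central-difference estimate of the i-th partial derivative of the vector field f at r."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return tuple((p - m) / (2 * h) for p, m in zip(f(tuple(rp)), f(tuple(rm))))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))  # Cartesian basis e_i

def curl_at(q, r):
    """e_i x d_i q at r."""
    terms = [cross(E[i], partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def div_at(q, r):
    """e_i . d_i q at r."""
    return sum(partial(q, r, i)[i] for i in range(3))

def advect_at(v0, q, r):
    """(v0 . del) q = v0_i d_i q at r, where v0 is the advecting vector at r."""
    terms = [tuple(v0[i] * c for c in partial(q, r, i)) for i in range(3)]
    return tuple(sum(t[k] for t in terms) for k in range(3))

def A(r):  # hypothetical field with zero divergence
    x, y, z = r
    return (y, 0.0, 0.0)

def B(r):  # hypothetical field with zero divergence
    x, y, z = r
    return (0.0, x, 0.0)

r0 = (0.3, -0.7, 1.1)  # arbitrary sample point
A0, B0 = A(r0), B(r0)

exact = curl_at(lambda r: cross(A(r), B(r)), r0)
div_A, div_B = div_at(A, r0), div_at(B, r0)

# Truncated expression: only the two divergence terms are kept.
truncated = tuple(A0[k] * div_B - B0[k] * div_A for k in range(3))

# Full expression: the two advection terms are restored.
B_del_A, A_del_B = advect_at(B0, A, r0), advect_at(A0, B, r0)
full = tuple(truncated[k] + B_del_A[k] - A_del_B[k] for k in range(3))

err_full = max(abs(e - f) for e, f in zip(exact, full))
err_truncated = max(abs(e - t) for e, t in zip(exact, truncated))
print(err_full, err_truncated)  # the first is negligible; the second is not
```

A single counterexample of this kind settles which expression is wrong, which is exactly the information that the formal-product argument alone does not supply.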

I submit, however, that none of this matters. I submit that the popularity of using "formal products" with the del operator, in derivations of vector-analytic identities, is a reaction to the failure of early writers to use indicial notation in the Cartesian definitions of differential operators.[s] The ensuing proliferation of terms in coordinate-based derivations led authors to seek shortcuts through "formal products" when more rigorous but no-less convenient shortcuts could have been taken through indicial notation, especially in combination with implicit summation. Our derivation of the gradient of a dot-product (68) is shorter than that of Borisenko & Tarapov, and even uses the right-hand sides of their identities (7.28) and (7.29), but obtains them rigorously with no ambiguity and no c  subscripts. Our derivation of the curl of a cross-product (67c) takes six lines with a single column of "=" signs. Our subsequent formal-product derivation (not to be confused with the attempt of Panofsky & Phillips) seems to take only three lines; but it is only through our earlier indicial derivation that we have any confidence in our result (not to be confused with the result of Panofsky & Phillips). Our other indicial derivations of identities are shorter than the two just mentioned. Having amassed so comprehensive a collection of identities so rigorously with so little effort, I submit that the use of formal products, Wilson subscripts, c subscripts, and Feynman subscripts for this purpose is a historical aberration, to be deciphered in other people's writings but avoided in one's own.

That being said, it is one thing to conclude, as Tai duly does, that the del-cross and del-dot notations should not be interpreted as products in derivations and proofs, and another thing to allege, as Tai also does (1995, p. 22), that ∇⸱ and ∇ × are "not compound operators" but only "assemblies", or in other words that "∇ is not a constituent of the divergence operator nor of the curl operator." Against the latter proposition, our equations (14), (61o), and (62o) have been derived, not merely defined, and our derivation of (14) is as general as we could wish. Moreover, whereas (61o) and (62o) are for Cartesian coordinates, we shall see that they have counterparts in more general coordinates.

General coordinates

From our initial definitions of the differential operators, we derived certain identities, from which we derived expressions for the operators in Cartesian coordinates, from which we derived more identities, from which we now hope to derive expressions for the operators in other coordinate systems. Cartesian coordinates are traditionally called x, y, z,  which we renamed xi  where  i = 1, 2, 3 ,  respectively. The best-known 3D non-Cartesian coordinate systems are the cylindrical coordinates (ρ, φ, z) and the spherical coordinates (r, θ, φ); we have already seen r  in the guise of the magnitude of the position vector r.  But now we want our coordinate system to be as general as possible—with the Cartesian, cylindrical, and spherical systems and many others, and even classes of systems, as special cases.

Natural and dual basis vectors

We shall call our general coordinates $u^i$ where i = 1, 2, 3; yes, for reasons which will emerge, we shall write the coordinate index as a superscript. But we shall write $\partial_i$ for $\partial/\partial u^i$, relying on context to distinguish it from the special case $\partial/\partial x_i$. By describing the $u^i$ as coordinates we mean that for some domain of interest, the position vector is a smooth function

$$ \mathbf{r} = \mathbf{r}(u^1, u^2, u^3) $$

which possesses partial derivatives w.r.t. its arguments. We also mean that for every position vector in the resulting range, there is only one ordered triplet $(u^i) = (u^1, u^2, u^3)$, so that we can think of each coordinate as

$$ u^i = u^i(\mathbf{r}) $$

that is, we can think of each $u^i$ as a scalar field, which possesses a gradient.[t]

These two properties of coordinates respectively suggest two simple ways of choosing basis vectors related to the coordinates: we shall define the natural basis vectors as

$$ \mathbf{h}_i = \partial_i \mathbf{r} \tag{80a} $$

and the dual basis vectors as

$$ \mathbf{h}^i = \nabla u^i \tag{80b} $$

(We could normalize the natural basis vectors by dividing them by their magnitudes to obtain unit vectors; but, for the moment, we won't bother.) Just as we may think of each $u^i$ as a scalar field and inquire after its directional derivative or its gradient or its Laplacian, so we may think of each $\mathbf{h}_i$ or $\mathbf{h}^i$ as a vector field and inquire after its directional derivative or its curl or its divergence or its Laplacian, because these properties, in combination with our stock of identities, might help us to express operators in general coordinates.

In Cartesian coordinates, $\mathbf{h}_i$ and $\mathbf{h}^i$ are both equal to the unit vector $\mathbf{e}_i$; thus, in Cartesian coordinates, the natural basis vectors are their own duals. In general coordinates, $\mathbf{h}_i$ and $\mathbf{h}^i$ may differ in both direction and magnitude and are not generally unit vectors. Nevertheless, even in general coordinates, there is a simple relation between the natural and dual basis vectors. Consider the dot-product

$$ \mathbf{h}_i \cdot \mathbf{h}^j = \partial_i\mathbf{r} \cdot \nabla u^j $$

If i ≠ j, then $\partial_i\mathbf{r}$, being in a direction in which $u^i$ varies while each other $u^j$ does not, is tangential to a surface of constant $u^j$ and therefore normal to $\nabla u^j$, so that the dot-product is zero. But by (26g),

$$ du^i = d\mathbf{r} \cdot \nabla u^i $$

and if we vary $\mathbf{r}$ by varying $u^i$ while holding each other $u^j$ constant, we can divide by $du^i$ and obtain

$$ \partial_i\mathbf{r} \cdot \nabla u^i = 1 $$ [with no summation].

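Putting the two cases together, we have

$$ \mathbf{h}_i \cdot \mathbf{h}^j = \delta_i^j \tag{81} $$

where the right-hand function, known as the Kronecker delta function, is defined by

$$ \delta_i^j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \tag{82} $$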

Obviously the function is symmetric: the indices i and j can be interchanged. If two lists of vectors are related so that the dot-product of the i-th vector in one list and the j-th in the other is $\delta_i^j$, the two lists are described as reciprocal. Thus the triplets $(\mathbf{h}_i)$ and $(\mathbf{h}^i)$ are reciprocal bases: the dual basis is the reciprocal of the natural basis and vice versa. Hence, taking the natural basis as a reference, the dual basis is sometimes called "the" reciprocal basis.

In Cartesian coordinates, (81) becomes

$$ \mathbf{e}_i \cdot \mathbf{e}_j = \delta_{ij} $$

So we have a relation for general coordinates (81) which is just as simple as its special case for Cartesian coordinates, provided that we use the natural basis for one factor and the dual basis for the other. This will be a recurring pattern.
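As a numerical sanity check on the reciprocity relation (81), here is a minimal Python sketch; it is not part of the derivation above. It assumes spherical coordinates (r, θ, φ) as the example system, estimates the natural basis (80a) and the dual basis (80b) independently by central finite differences, and then forms all nine dot-products. The function names (`position`, `coords`, `natural_basis`, `dual_basis`), the step size, and the sample point are illustrative choices only.

```python
import math

def position(u):
    """Position vector r(u^1, u^2, u^3) for spherical coordinates (r, theta, phi)."""
    r, th, ph = u
    return [r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th)]

def coords(x):
    """Inverse map: each coordinate u^i as a scalar field of position."""
    r = math.sqrt(sum(c * c for c in x))
    return [r, math.acos(x[2] / r), math.atan2(x[1], x[0])]

def natural_basis(u, step=1e-6):
    """h_i = (partial r)/(partial u^i), by central differences, as in (80a)."""
    basis = []
    for i in range(3):
        up, um = list(u), list(u)
        up[i] += step
        um[i] -= step
        basis.append([(a - b) / (2 * step)
                      for a, b in zip(position(up), position(um))])
    return basis

def dual_basis(u, step=1e-6):
    """h^i = grad u^i, by central differences in x, y, z, as in (80b)."""
    x = position(u)
    grads = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += step
        xm[k] -= step
        cp, cm = coords(xp), coords(xm)
        for i in range(3):
            grads[i][k] = (cp[i] - cm[i]) / (2 * step)
    return grads

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

u0 = [2.0, 0.7, 1.1]                     # arbitrary sample point (r, theta, phi)
h_lo = natural_basis(u0)                 # h_i
h_up = dual_basis(u0)                    # h^i
gram = [[dot(h_lo[i], h_up[j]) for j in range(3)] for i in range(3)]
for row in gram:
    print([round(g, 6) for g in row])    # rows of the Kronecker delta, to rounding
```

Because the two bases are computed from different functions (the position as a function of the coordinates, and the coordinates as functions of position), the agreement with the Kronecker delta is a genuine check rather than a tautology.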

We have deduced the reciprocity relation (81) from prior definitions of the natural basis $(\mathbf{h}_i)$ and the dual basis $(\mathbf{h}^i)$. This result has a partial converse, in that a reciprocity relation between bases is enough to define either basis in terms of the other, as we shall see later. But first we proceed to components of vector fields.

Contravariant and covariant components

A coordinate grid is a set of intersecting curves such that on each curve, one coordinate varies while the others are constant. If we could somehow inscribe such a grid in an elastic medium, and then stretch and rotate the medium, the natural basis vectors $\mathbf{h}_i$ given by (80a) would stretch and rotate with the medium. Accordingly, the natural basis is also called the covariant basis. But according to (81), the dot-product of a natural basis vector and a dual basis vector is invariant (independent of the coordinate system), so that the variation of one factor compensates for the variation of the other. So, as the natural basis is "covariant" with the coordinate system, we say that the dual basis is contravariant with the coordinate system. Notice that the covariant factor has a subscript index (easily remembered because "co" rhymes with "low") whereas the contravariant factor has a superscript index, and that one kind of variation must combine with the other in order to produce an invariant result; these will be recurring patterns.

A vector field $\mathbf{q}$ may be expressed in components w.r.t. the natural (covariant) basis as

$$ \mathbf{q} = q^i\,\mathbf{h}_i \tag{83a} $$

with summation, or in components w.r.t. the dual (contravariant) basis as

$$ \mathbf{q} = q_i\,\mathbf{h}^i \tag{83b} $$

with summation. If q is to be invariant (a true vector, existing independently of the coordinate system), the components must be contravariant in the former case and covariant in the latter, and accordingly are written with superscripts and subscripts respectively. In Cartesian coordinates, the two bases are the same, so that the components w.r.t. the two bases are also the same; that's why, in the above section on Cartesian coordinates, we got away with writing component indices as subscripts. We shall see later that we can do this with a somewhat wider class of coordinates than Cartesian. In general coordinates, however, the basis vectors have subscripts and the components have superscripts or vice versa, so that the index of implicit summation appears once as a superscript and once as a subscript.

If a particular $u^i$ has a particular name, such as θ or φ, then, if we're not using indexed summation, we may find it convenient to write that name in place of the index i in the superscript or subscript.

At the present level of generality, the basis vectors $\mathbf{h}_i$, unlike their Cartesian counterparts $\mathbf{e}_i$, are not assumed to be uniform (i.e., homogeneous). One consequence of this general non-uniformity (inhomogeneity) is that, although we can say $\mathbf{r} = x_i\mathbf{e}_i$ in Cartesian coordinates and $\mathbf{q} = q^i\mathbf{h}_i$ in general coordinates, we cannot say

$$ \mathbf{r} = u^i\,\mathbf{h}_i $$ [sic!]

in general coordinates. Indeed the last equation is not necessarily even meaningful, because the terms in the implicit sum may not have the same units. For example, we have seen that in spherical coordinates the position vector $\mathbf{r}$ is simply $r\hat{\mathbf{r}} = r\mathbf{h}_r$; it is not $r\mathbf{h}_r + \theta\mathbf{h}_\theta + \varphi\mathbf{h}_\varphi$, because θ and φ are encoded in the direction of $\mathbf{h}_r$ (and you can't add a distance and two angles, or a linear displacement and two angular displacements). Similarly, in cylindrical coordinates the position vector $\mathbf{r}$ is $\rho\mathbf{h}_\rho + z\mathbf{h}_z$; it is not $\rho\mathbf{h}_\rho + \varphi\mathbf{h}_\varphi + z\mathbf{h}_z$, because φ is encoded in the direction of $\mathbf{h}_\rho$ (and you can't add two distances and an angle, or two linear displacements and an angular displacement). In both examples, encoding one coordinate in the direction of another coordinate's unit vector is circular in that the said direction depends on the position vector, which is the very thing that we want to represent.

A non-uniform basis is not a global basis. It cannot give a uniform representation of a uniform vector field, because the standard of representation changes; it is like having a compass whose orientation varies from place to place and/or a measuring stick whose length varies from place to place. But it can serve as a local basis, as in (83a) and (83b), each of which expresses a vector field at a given location in terms of a basis at that location, notwithstanding that the basis may be different at other locations. And although a local basis (as we have just seen) cannot generally represent the position vector in a non-circular manner, it can represent a change in the position vector. By the generality of the multivariate chain rule (75),

$$ \frac{d\mathbf{r}}{dt} = \partial_i\mathbf{r}\,\frac{du^i}{dt} $$

Multiplying by dt we get

$$ d\mathbf{r} = \partial_i\mathbf{r}\;du^i \tag{84} $$

or, substituting from (80a),

$$ d\mathbf{r} = \mathbf{h}_i\,du^i \tag{85} $$

Thus the small changes in the coordinates $u^i$ are the components of the true vector $d\mathbf{r}$ w.r.t. the covariant basis. That means the changes in the coordinates must be contravariant. Here at last is the explanation why we write general coordinates with superscript indices. And again the point is moot for Cartesian coordinates, for which the covariant basis is also contravariant.
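The following minimal Python sketch (an illustration only, again assuming spherical coordinates) contrasts the two observations just made: a small displacement is well approximated by the sum of the natural basis vectors weighted by the coordinate increments, as in (85), whereas the sum of the basis vectors weighted by the coordinates themselves does not reproduce the position vector. The helper names, the sample point, and the increments are hypothetical choices.

```python
import math

def position(u):
    """Position vector for spherical coordinates (r, theta, phi)."""
    r, th, ph = u
    return [r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th)]

def natural_basis(u):
    """Analytic h_i = (partial r)/(partial u^i) for spherical coordinates."""
    r, th, ph = u
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [[st * cp, st * sp, ct],                 # h_r
            [r * ct * cp, r * ct * sp, -r * st],    # h_theta
            [-r * st * sp, r * st * cp, 0.0]]       # h_phi

def combine(coeffs, vectors):
    """Return sum_i coeffs[i] * vectors[i]."""
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(3)]

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

u = [2.0, 0.7, 1.1]                 # sample point (r, theta, phi)
du = [1e-5, -2e-5, 3e-5]            # small coordinate increments
h = natural_basis(u)

# Exact displacement versus the linear form h_i du^i of (85).
u_new = [a + b for a, b in zip(u, du)]
exact_dr = [a - b for a, b in zip(position(u_new), position(u))]
err_dr = max_abs_diff(exact_dr, combine(du, h))

# The "[sic!]" expression u^i h_i versus the true position vector.
err_naive = max_abs_diff(combine(u, h), position(u))

# The correct spherical representation r h_r.
err_radial = max_abs_diff([u[0] * c for c in h[0]], position(u))

print(err_dr, err_naive, err_radial)
```

The first and third errors are at the level of second-order terms and rounding respectively, while the second is of the same order as the position vector itself.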

Since $du^i$ is contravariant, $\partial_i\mathbf{r}$ in (84) must be covariant in order to yield the true vector $d\mathbf{r}$. This vindicates our decision to write $\partial_i$ with a subscript. Recall, however, that $\partial_i$ means $\partial/\partial u^i$. Thus the derivative w.r.t. the contravariant quantity is covariant, wherefore it is said that a superscript in the denominator of a derivative counts as a subscript in the derivative as a whole.

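In (85), the general term $\mathbf{h}_i\,du^i$ (not the sum) is the displacement of $\mathbf{r}$ due to the small change $du^i$ in the coordinate $u^i$. The three such displacements of $\mathbf{r}$ make concurrent edges of a parallelepiped whose signed volume is

$$ dV = \mathbf{h}_1 du^1 \cdot \mathbf{h}_2 du^2 \times \mathbf{h}_3 du^3 $$

that is,

$$ dV = J\,du^1\,du^2\,du^3 \tag{86} $$

where

$$ J = \mathbf{h}_1 \cdot \mathbf{h}_2 \times \mathbf{h}_3 $$

or, to use a standard abbreviation for the scalar triple product,

$$ J = [\mathbf{h}_1, \mathbf{h}_2, \mathbf{h}_3] \tag{87} $$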

J is called the Jacobian of the natural (covariant) basis. We describe the basis and the associated coordinate system as right-handed if this Jacobian is positive, and left-handed if this Jacobian is negative. Thus the handedness depends on the standard order in which we write the vectors; e.g., the standard Cartesian basis is right-handed because we write it as (i, j, k) but would be left-handed if we wrote it as (i, k, j).

If the covariant basis is indeed a basis, its member vectors must not be coplanar; that is, J must not be zero. Hence, if the covariant basis is to be a local basis in some region of interest, J must not vanish anywhere in that region, and therefore must have the same sign throughout the region; that is, the handedness of the coordinate system must be the same throughout the region.
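To illustrate the Jacobian and the handedness criterion, here is a minimal Python sketch, assuming spherical coordinates in the order (r, θ, φ), for which the triple product of the natural basis vectors should come out as r² sin θ. The function names and sample values are illustrative, and the final loop is only a spot check of the sign over a few sample points with 0 < θ < π, not a proof.

```python
import math

def natural_basis_spherical(r, th, ph):
    """Analytic natural basis (h_r, h_theta, h_phi) for spherical coordinates."""
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    h_r = [st * cp, st * sp, ct]
    h_th = [r * ct * cp, r * ct * sp, -r * st]
    h_ph = [-r * st * sp, r * st * cp, 0.0]
    return h_r, h_th, h_ph

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def triple(a, b, c):
    """Scalar triple product a . (b x c)."""
    return dot(a, cross(b, c))

r, th, ph = 2.0, 0.7, 1.1
h_r, h_th, h_ph = natural_basis_spherical(r, th, ph)

J = triple(h_r, h_th, h_ph)          # standard order (r, theta, phi)
J_swapped = triple(h_r, h_ph, h_th)  # two vectors interchanged
print(J, r * r * math.sin(th), J_swapped)

# Spot check: the sign of J at sample points with 0 < theta < pi.
signs = {triple(*natural_basis_spherical(1.5, 0.1 * k, 0.3 * k)) > 0
         for k in range(1, 31)}
print(signs)
```

Interchanging two of the basis vectors reverses the sign of the triple product, which is the sense in which the handedness depends on the order in which the vectors are written.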

Properties of reciprocal bases

Now, retaining the designations "covariant" and "contravariant" for convenience, let us see what we can deduce from the reciprocity relation (81) alone.

Most obviously, the reciprocity relation leads to a simple component-based expression for the dot-product of two vector fields, say q and v, provided that we use the contravariant components and covariant basis (83a) for one vector, and the covariant components and contravariant basis (83b) for the other:

$$ \mathbf{q}\cdot\mathbf{v} = q^i\mathbf{h}_i \cdot v_j\mathbf{h}^j = q^i v_j\,\delta_i^j $$

whence selecting the non-zero terms gives

$$ \mathbf{q}\cdot\mathbf{v} = q^i v_i \tag{88a} $$

And the two vectors, being general, can swap roles in (83a) and (83b):

$$ \mathbf{q}\cdot\mathbf{v} = q_i v^i \tag{88b} $$
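A small worked example may make the mixing of component types concrete. The Python sketch below assumes a hypothetical uniform oblique basis (chosen only so that the arithmetic is exact) together with its reciprocal basis, obtains contravariant components as dot-products with the dual basis vectors and covariant components as dot-products with the natural basis vectors (both of which follow from (81), (83a), and (83b)), and compares the mixed sums with the ordinary Cartesian dot-product.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical oblique basis h_i and its reciprocal h^i (Cartesian representations).
h_lo = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
h_up = [[1.0, -1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]

# Reciprocity check: h_i . h^j should be the Kronecker delta.
gram = [[dot(a, b) for b in h_up] for a in h_lo]

q = [1.0, 2.0, 3.0]
v = [4.0, -1.0, 2.0]

q_up = [dot(q, h) for h in h_up]   # contravariant components q^i = q . h^i
q_lo = [dot(q, h) for h in h_lo]   # covariant components     q_i = q . h_i
v_up = [dot(v, h) for h in h_up]
v_lo = [dot(v, h) for h in h_lo]

# Reassemble q from its contravariant components, as in (83a).
q_rebuilt = [sum(q_up[i] * h_lo[i][k] for i in range(3)) for k in range(3)]

mixed_a = sum(a * b for a, b in zip(q_up, v_lo))   # q^i v_i
mixed_b = sum(a * b for a, b in zip(q_lo, v_up))   # q_i v^i
unmixed = sum(a * b for a, b in zip(q_up, v_up))   # q^i v^i, not invariant
direct = dot(q, v)                                 # Cartesian dot-product

print(mixed_a, mixed_b, direct, unmixed)
```

For this basis the two mixed sums agree with the Cartesian result, whereas the sum of products of like components does not, which is why one kind of component must be paired with the other.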

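The cross-product needs a bit more preparation. First we define the permutation symbol $\epsilon_{ijk}$ or $\epsilon^{ijk}$ (also called the Levi-Civita symbol) as having the value +1 if (i, j, k) is a permutation of (1, 2, 3) in the same cyclic order, −1 if (i, j, k) is a permutation of (1, 2, 3) in the reverse cyclic order, and 0 if (i, j, k) is not a permutation, i.e. if there is at least one repeated index. To put it more formally,

$$ \epsilon_{ijk} = \epsilon^{ijk} = \begin{cases} +1 & \text{if } (i,j,k) \text{ is } (1,2,3),\ (2,3,1), \text{ or } (3,1,2) \\ -1 & \text{if } (i,j,k) \text{ is } (3,2,1),\ (2,1,3), \text{ or } (1,3,2) \\ 0 & \text{otherwise} \end{cases} \tag{89} $$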

Now by (81), $\mathbf{h}^1$ is perpendicular to both $\mathbf{h}_2$ and $\mathbf{h}_3$. So we can say

$$ \mathbf{h}_2 \times \mathbf{h}_3 = \alpha_1\,\mathbf{h}^1 $$

where $\alpha_1$ is a scalar to be determined. Taking dot-products with $\mathbf{h}_1$ and applying (81) and (87), we find that $\alpha_1 = J$, so that

$$ \mathbf{h}_2 \times \mathbf{h}_3 = J\,\mathbf{h}^1 $$

By the generality of the vectors we can rotate the three indices, but the sign of the left-hand side changes if we swap the two indices on the left. All six cases are covered by

$$ \mathbf{h}_i \times \mathbf{h}_j = J\,\epsilon_{ijk}\,\mathbf{h}^k \tag{90a} $$

Here we want only one term; but we need not specify "no sum", because for given i and j the permutation symbol leaves only one non-zero term in the sum over k. In words, this result says that the cross-product of two covariant basis vectors, with their indices in the standard cyclic order, is the Jacobian times the contravariant basis vector with the omitted index. Similarly, or rather reciprocally,

$$ \mathbf{h}^i \times \mathbf{h}^j = J'\,\epsilon^{ijk}\,\mathbf{h}_k \tag{90b} $$

where J′ is the Jacobian of the contravariant basis.

Equations (90a) and (90b), which we have obtained from the reciprocity relation (81), can be solved for $\mathbf{h}^k$ and $\mathbf{h}_k$ respectively. Thus a reciprocity relation between bases is enough to define each basis in terms of the other, as claimed above.
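The claim that either basis can be recovered from the other is easy to try numerically. In the following minimal Python sketch, the hypothetical function `reciprocal_basis` solves (90a) for the reciprocal vectors, i.e. it divides the cross-product of the other two basis vectors (in cyclic order) by the scalar triple product; applying it twice should return the original basis. The basis used is an arbitrary illustrative one.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def reciprocal_basis(h):
    """Given three basis vectors, return the reciprocal triplet.

    Uses h^1 = (h_2 x h_3) / J and its cyclic rotations, where
    J = h_1 . (h_2 x h_3) is the Jacobian of the given basis.
    """
    J = dot(h[0], cross(h[1], h[2]))
    if J == 0:
        raise ValueError("vectors are coplanar, so they do not form a basis")
    return [[c / J for c in cross(h[(k + 1) % 3], h[(k + 2) % 3])]
            for k in range(3)]

# Hypothetical oblique basis (Cartesian representations).
h_lo = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
h_up = reciprocal_basis(h_lo)

# Reciprocity relation: h_i . h^j should be the Kronecker delta.
gram = [[dot(h_lo[i], h_up[j]) for j in range(3)] for i in range(3)]

# Taking the reciprocal again should recover the original basis.
h_back = reciprocal_basis(h_up)

print(h_up)
print(gram)
print(h_back)
```

The same function serves in both directions because, as noted above, the relation between the two bases is symmetric.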

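Now we're ready to consider the cross-product of two vector fields. In terms of the covariant basis,

$$ \mathbf{q}\times\mathbf{v} = q^i\mathbf{h}_i \times v^j\mathbf{h}_j = q^i v^j\;\mathbf{h}_i\times\mathbf{h}_j $$

i.e.,

$$ \mathbf{q}\times\mathbf{v} = J\,\epsilon_{ijk}\,q^i v^j\,\mathbf{h}^k \tag{91a} $$

On the right, the two components and the basis vector are contravariant, but invariance is achieved by multiplying by the covariant Jacobian (which has three covariant factors). Similarly,

$$ \mathbf{q}\times\mathbf{v} = J'\,\epsilon^{ijk}\,q_i v_j\,\mathbf{h}_k \tag{91b} $$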

On the right of (91a) or (91b), the implicit triple summation has 27 terms, of which only six (corresponding to the six possible permutations of the three possible indices) can be non-zero. Thus the factor following the Jacobian can be recognized as the familiar determinant whose columns (or rows), in cyclic order, are the components of q, the components of v, and the three basis vectors. In Cartesian coordinates, in which the Jacobians are equal to 1 and we don't need the co⧸contra distinction, both equations reduce to

$$ \mathbf{q}\times\mathbf{v} = \epsilon_{ijk}\,q_i v_j\,\mathbf{e}_k $$

which is a familiar result written in a possibly unfamiliar way.
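The triple summation can be spelled out directly. The Python sketch below is an illustration under stated assumptions: it takes (91a) to read "Jacobian times permutation symbol times the two contravariant components times the contravariant basis vector", uses a hypothetical oblique basis with its reciprocal written out by hand, and compares the result with the ordinary Cartesian cross-product. The function `eps` is one possible closed form for the permutation symbol, not necessarily the one intended in (89).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def eps(i, j, k):
    """Permutation symbol for indices taken from (0, 1, 2) or (1, 2, 3)."""
    return (i - j) * (j - k) * (k - i) // 2

# Hypothetical oblique basis h_i and its reciprocal h^i (h_i . h^j = delta).
h_lo = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
h_up = [[1.0, -1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.5]]
J = dot(h_lo[0], cross(h_lo[1], h_lo[2]))   # Jacobian of the covariant basis

q = [1.0, 2.0, 3.0]
v = [4.0, -1.0, 2.0]
q_up = [dot(q, h) for h in h_up]            # contravariant components of q
v_up = [dot(v, h) for h in h_up]            # contravariant components of v

# Accumulate J * eps_ijk * q^i * v^j * h^k over all 27 index triples;
# only the six permutations contribute.
via_sum = [0.0, 0.0, 0.0]
nonzero_terms = 0
for i in range(3):
    for j in range(3):
        for k in range(3):
            e = eps(i, j, k)
            if e != 0:
                nonzero_terms += 1
                coeff = J * e * q_up[i] * v_up[j]
                for c in range(3):
                    via_sum[c] += coeff * h_up[k][c]

direct = cross(q, v)
print(via_sum, direct, nonzero_terms)
```

The count of non-zero terms confirms the remark above that only six of the 27 terms survive.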

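The Jacobian of the contravariant basis is

$$ J' = \mathbf{h}^1 \cdot \mathbf{h}^2 \times \mathbf{h}^3 $$

or, if we solve (90a) for each $\mathbf{h}^k$ and substitute,

$$ J' = \frac{(\mathbf{h}_2\times\mathbf{h}_3) \cdot (\mathbf{h}_3\times\mathbf{h}_1) \times (\mathbf{h}_1\times\mathbf{h}_2)}{J^3} $$

In the numerator, the cross-product of cross-products can be read as a vector triple product in which the first factor is a cross-product. Expanding that triple product and noting that one term is a scalar triple product with a repeated factor, we get

$$ J' = \frac{(\mathbf{h}_2\times\mathbf{h}_3) \cdot J\,\mathbf{h}_1}{J^3} = \frac{J^2}{J^3} $$

so that we may write

$$ J' = 1/J \tag{92} $$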

in (90b) and (91b). In words, the Jacobian of the reciprocal basis is the reciprocal of the Jacobian of the original basis. Therefore the two Jacobians have the same sign. Therefore a basis is right-handed if and only if its reciprocal is right-handed. Thus the natural and dual bases of a coordinate system have the same handedness, and the handedness of either may be identified with the handedness of the coordinate system.
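As a check on the reciprocal relation between the two Jacobians, and on the intermediate step in which the cross-product of cross-products collapses to the Jacobian times the first basis vector, here is a minimal Python sketch. The basis is an arbitrary illustrative one; the second basis is the same with two vectors interchanged, to show a left-handed case.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def triple(a, b, c):
    """Scalar triple product a . (b x c)."""
    return dot(a, cross(b, c))

def reciprocal_basis(h):
    """Reciprocal triplet: cyclic cross-products divided by the Jacobian."""
    J = triple(h[0], h[1], h[2])
    return [[c / J for c in cross(h[(k + 1) % 3], h[(k + 2) % 3])]
            for k in range(3)]

# Hypothetical right-handed basis.
h = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
J = triple(*h)
J_dual = triple(*reciprocal_basis(h))

# Intermediate step: (h_3 x h_1) x (h_1 x h_2) should equal J h_1.
inner = cross(cross(h[2], h[0]), cross(h[0], h[1]))
expected_inner = [J * c for c in h[0]]

# Left-handed variant: interchange the last two vectors.
h_left = [h[0], h[2], h[1]]
J_left = triple(*h_left)
J_left_dual = triple(*reciprocal_basis(h_left))

print(J, J_dual, J_left, J_left_dual)
```

In both cases the product of the two Jacobians is 1, so they necessarily share a sign.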

Gradient, del, advection

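Let p be a scalar field, and let s be arc length in the direction of the unit vector $\hat{\mathbf{s}}$. By the multivariate chain rule (75),

$$ \partial_s p = \partial_i p\;\partial_s u^i = \partial_i p\;\hat{\mathbf{s}}\cdot\nabla u^i = \hat{\mathbf{s}}\cdot\mathbf{h}^i\,\partial_i p $$

So $\mathbf{h}^i\partial_i p$ is the vector whose (invariant) scalar component in the direction of any $\hat{\mathbf{s}}$ is the directional derivative of p in that direction; that is,

$$ \nabla p = \mathbf{h}^i\,\partial_i p \tag{93g} $$

or, in operational terms,

$$ \nabla = \mathbf{h}^i\,\partial_i \tag{93o} $$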

Apart from the need to match a superscript and a subscript, these two results look as simple as their Cartesian special cases (58g) and (58o).
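The recipe "dual basis vector times partial derivative w.r.t. the corresponding coordinate, summed" can be tested against a gradient computed in Cartesian coordinates. The Python sketch below assumes spherical coordinates, uses the standard closed forms for the gradients of r, θ, and φ as the dual basis, and estimates the partial derivatives by central differences. The test field p = xy + z² and all function names are illustrative choices.

```python
import math

def position(u):
    """Position vector for spherical coordinates (r, theta, phi)."""
    r, th, ph = u
    return [r * math.sin(th) * math.cos(ph),
            r * math.sin(th) * math.sin(ph),
            r * math.cos(th)]

def dual_basis(u):
    """Closed forms of grad r, grad theta, grad phi (Cartesian representations)."""
    r, th, ph = u
    st, ct, sp, cp = math.sin(th), math.cos(th), math.sin(ph), math.cos(ph)
    return [[st * cp, st * sp, ct],
            [ct * cp / r, ct * sp / r, -st / r],
            [-sp / (r * st), cp / (r * st), 0.0]]

def p_cart(x):
    """Sample scalar field p = x y + z^2."""
    return x[0] * x[1] + x[2] ** 2

def grad_p_cart(x):
    """Gradient of the sample field, found by hand in Cartesian coordinates."""
    return [x[1], x[0], 2.0 * x[2]]

def partial(f, u, i, step=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. u^i."""
    up, um = list(u), list(u)
    up[i] += step
    um[i] -= step
    return (f(up) - f(um)) / (2 * step)

u0 = [2.0, 0.7, 1.1]
h_up = dual_basis(u0)
dp = [partial(lambda w: p_cart(position(w)), u0, i) for i in range(3)]

# Sum over i of (partial_i p) h^i, component by component.
grad_general = [sum(dp[i] * h_up[i][k] for i in range(3)) for k in range(3)]
grad_direct = grad_p_cart(position(u0))
err = max(abs(a - b) for a, b in zip(grad_general, grad_direct))
print(grad_general, grad_direct, err)
```

The residual is at the level of the finite-difference error, as one would hope if the general-coordinate expression and the Cartesian one describe the same vector.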

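If ψ is a generic field and q is a general vector in the direction of the same $\hat{\mathbf{s}}$ then by definition (11),

$$ \mathbf{q}\cdot\nabla\psi = |\mathbf{q}|\,\partial_s\psi = |\mathbf{q}|\,\partial_s u^i\,\partial_i\psi = \mathbf{q}\cdot\mathbf{h}^i\,\partial_i\psi $$

that is,

$$ \mathbf{q}\cdot\nabla\psi = q^i\,\partial_i\psi \tag{94} $$

or, in operational terms,

$$ \mathbf{q}\cdot\nabla = q^i\,\partial_i \tag{94o} $$

These results likewise look as simple as their Cartesian special cases (64) and (64o). And by (88a), the q⸱∇ operator again turns out to be the formal dot-product of q and ∇.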

Curl

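To express the curl of a vector field q, we choose the contravariant basis (83b) and apply identity (71c):

$$ \operatorname{curl}\mathbf{q} = \operatorname{curl}\,(q_j\mathbf{h}^j) = q_j \operatorname{curl}\mathbf{h}^j + \nabla q_j \times \mathbf{h}^j $$

On the right, the first term vanishes because $\mathbf{h}^j$ is $\nabla u^j$ (and the curl of a gradient is zero). Substituting from (93o) in the second term, we obtain

$$ \operatorname{curl}\mathbf{q} = \mathbf{h}^i\,\partial_i q_j \times \mathbf{h}^j = \partial_i q_j\;\mathbf{h}^i\times\mathbf{h}^j $$

or, using (90b),

$$ \operatorname{curl}\mathbf{q} = J'\,\epsilon^{ijk}\,\partial_i q_j\,\mathbf{h}_k \tag{95} $$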
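The derivation above can be tried on a concrete case. The Python sketch below is an illustration under stated assumptions: it reads the result as "reciprocal of the covariant Jacobian, times the permutation symbol, times the partial derivative of a covariant component, times a natural basis vector", with the covariant components obtained as dot-products with the natural basis vectors. It uses cylindrical coordinates (ρ, φ, z), a sample field q = (−yz, xz, xy) whose Cartesian curl is (0, −2y, 2z), and central differences for the partial derivatives. All names are hypothetical.

```python
import math

def position(u):
    """Position vector for cylindrical coordinates (rho, phi, z)."""
    rho, ph, z = u
    return [rho * math.cos(ph), rho * math.sin(ph), z]

def natural_basis(u):
    """Analytic h_i = (partial r)/(partial u^i) for cylindrical coordinates."""
    rho, ph, z = u
    return [[math.cos(ph), math.sin(ph), 0.0],
            [-rho * math.sin(ph), rho * math.cos(ph), 0.0],
            [0.0, 0.0, 1.0]]

def q_cart(x):
    """Sample vector field q = (-y z, x z, x y)."""
    return [-x[1] * x[2], x[0] * x[2], x[0] * x[1]]

def curl_q_cart(x):
    """Curl of the sample field, found by hand in Cartesian coordinates."""
    return [0.0, -2.0 * x[1], 2.0 * x[2]]

def dot(a, b):
    return sum(p * s for p, s in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def eps(i, j, k):
    """Permutation symbol for indices taken from (0, 1, 2)."""
    return (i - j) * (j - k) * (k - i) // 2

def q_lower(u, j):
    """Covariant component q_j = q . h_j at coordinates u."""
    return dot(q_cart(position(u)), natural_basis(u)[j])

def partial(f, u, i, step=1e-5):
    """Central-difference estimate of the partial derivative of f w.r.t. u^i."""
    up, um = list(u), list(u)
    up[i] += step
    um[i] -= step
    return (f(up) - f(um)) / (2 * step)

def curl_general(u):
    """Sum of (1/J) eps^{ijk} (partial_i q_j) h_k over all index triples."""
    h = natural_basis(u)
    J = dot(h[0], cross(h[1], h[2]))      # Jacobian of the natural basis
    out = [0.0, 0.0, 0.0]
    for i in range(3):
        for j in range(3):
            for k in range(3):
                e = eps(i, j, k)
                if e != 0:
                    d = partial(lambda w: q_lower(w, j), u, i)
                    for c in range(3):
                        out[c] += e * d * h[k][c] / J
    return out

u0 = [1.5, 0.8, -0.4]
curl_a = curl_general(u0)
curl_b = curl_q_cart(position(u0))
err = max(abs(a - b) for a, b in zip(curl_a, curl_b))
print(curl_a, curl_b, err)
```

Agreement to within the finite-difference error at one sample point is, of course, only a spot check; it does not replace the derivation.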

[To be continued.]

Additional information

Acknowledgment

Professor Chen-To Tai, FIEEE, died in 2004.  He first came to my attention in 2018 through his paper "On the presentation of Maxwell's theory" (Proc. IEEE, 60(8): 936–45, 1972). In nearly every place where I mention him here, even if I do not accept his conclusion, I am entirely indebted to his works for drawing my attention to the issue raised. In particular, it was he who alerted me to Gibbs's original definitions of the divergence and curl and their suitability for expression in indicial notation (Tai, 1995, pp. 17, 21).

Competing interests

None.

Ethics statement

This article does not concern research on human or animal subjects.

TO DO:

  • Keywords
  • Figure(s) & caption(s)
  • Etc.!

Notes

  1. Even if we claim that "particles" of matter are wave functions and therefore continuous, this still implies that matter is lumpy in a manner not normally contemplated by continuum mechanics.
  2. If r is the position of a particle and p is its momentum, the last term vanishes. If the force is toward the origin, the previous term also vanishes, and we are left with conservation of angular momentum about the origin.
  3. Here we use the broad triangle symbol (△) rather than the narrower Greek Delta (Δ); the latter would more likely be misinterpreted as "change in…"
  4. There is no need for parentheses around ρv , because div ρv cannot mean (div ρ)v , because the divergence of a scalar field is not defined.
  5. The material derivative d/dt is also called the substantive derivative, and is sometimes written D/Dt if the result is meant to be understood as a field rather than simply a function of time (Kemmer, 1977, pp. 184–5).
  6. Or nabla, because it allegedly looks like the ancient Phoenician harp that the Greeks called by that name.
  7. Stress is a second-order tensor, and the origin of the term "tensor"; but, for present purposes, it's just another possible example of a field called ψ.
  8. In mathematical jargon, it should be a two-dimensional manifold embedded in 3D Euclidean space.
  9. If any part of our argument requires Σ or C to be smooth, this is not an impediment, because having approximated Σ or C to any desired accuracy by a polyhedron or polygon, we can then approximate the polyhedron or polygon to any desired higher accuracy by a smooth surface or curve!
  10. In the general case, there is an extra term ∂D/∂t on the right; but this term is zero in the magnetostatic case.
  11. When a gas is compressed, work is done on it, causing its temperature to rise, so that the ratio of dp to dρ is higher than if the compression were isothermal. In sound waves, there is typically not enough time for a significant part of the heat of compression to be conducted away; that is, the compression is near enough to adiabatic. The words "not enough time" may suggest that the adiabatic approximation is a high-frequency approximation. But in fact, in free air, it is a low-frequency approximation, because as the frequency is reduced, the equalization of temperature is hindered more by the longer wavelength than it is helped by the longer period. Only in a confined space, which limits the required distance of conduction, does the adiabatic assumption require the frequency to be above some lower limit. In a musical wind instrument, that lower limit tends to be far below the audible range. Meanwhile the upper limit, due to easier heat conduction within a shorter wavelength, tends to be very far above the audible range. Thus, under typical conditions, for the purpose of calculating c, the adiabatic assumption is reasonable. (See Fletcher, 1974.)
  12. Or sometimes "quabla", by analogy with "nabla".
  13. In particular, some authorities change the sign, defining □ as (1/c²) ∂²/∂t² − △, and some write the operator (however defined) as □².
  14. The symbol c comes from a general-purpose Latin word for speed, but has become the usual symbol for wave speed.
  15. Tai (1995, pp. 43–4) also disagrees with Moon & Spencer, but for a different reason: he regards the Laplacian as the divergence of the gradient even if the operand is a vector field. For better or worse, we do not consider the gradient of a vector in the present paper, although the reader can probably work out how to modify (26g) if dr is written as a column vector and dp is replaced by a column vector (compare the later footnote on dyadics).
  16. A quaternion is a mathematical object invented by William Rowan Hamilton in 1843, consisting of two parts which Hamilton later called the scalar part and the vector part. For most purposes the two parts were found to be more useful separately than together. By putting them together, however, Hamilton constructed a set which satisfied all the algebraic field axioms except commutativity of multiplication. This was considered a triumph.
  17. They involve dyadics, i.e. 2nd-order tensors written in a vector-friendly notation. The fourth of the seven is
    ∇(τ⸱ ω) = ∇τ ⸱ ω + ∇ω ⸱ τ ,
    which is our (68) expressed in terms of the dyadics ∇τ and ∇ω; the right-hand side is not to be confused with
    (ω ⸱∇)τ + (τ⸱∇)ω ,
    which would contradict our (68).
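  18. In Wilson's notation, (7.26) would be written
    $\nabla(\mathbf{A}\cdot\mathbf{B}) = \nabla(\mathbf{A}\cdot\mathbf{B})_{\mathbf{A}} + \nabla(\mathbf{A}\cdot\mathbf{B})_{\mathbf{B}}$.
    Another alternative is the Feynman subscript notation, in which the subscript is attached to the operator and indicates which factor is allowed to vary, so that (7.26) would be written
    $\nabla(\mathbf{A}\cdot\mathbf{B}) = \nabla_{\mathbf{B}}(\mathbf{A}\cdot\mathbf{B}) + \nabla_{\mathbf{A}}(\mathbf{A}\cdot\mathbf{B})$.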
  19. Indicial notation is standard in higher-order tensor analysis, which however tends not to use unit vectors of coordinate systems, and therefore tends not to encourage the indexing of unit vectors in elementary vector analysis—whereas in the present paper, I have unapologetically indexed the unit vectors.
  20. Hence we want each $u^i(\mathbf{r})$ to be, as far as possible, a smooth function. This may require some tweaking of definitions. E.g., in cylindrical coordinates, the angular coordinate φ must be confined to some 360° range in order to make it unique, and we don't want it jumping from the end of the range to the beginning within the region of interest.

References

  1. Axler, 1995, §9. The relegation of determinants was anticipated by C.G. Broyden (1975). But Broyden's approach is less radical: he does not deal with abstract vector spaces or abstract linear transformations, and his eventual definition of the determinant, unlike Axler's, is traditional—not a product of the preceding narrative.
  2. Axler, 1995, §1. But it is Broyden (1975), not Axler, who discusses numerical methods at length.
  3. E.g., Feynman (1963, vol. 1, § 11-5), having defined velocity from displacement in Cartesian coordinates, shows that velocity is a vector by showing that its coordinate representation contra-rotates (like that of displacement) if the coordinate system rotates.
  4. E.g., Feynman (1963, vol. 1, § 11-7), having defined the magnitude and dot-product operators in Cartesian coordinates, shows that they are scalar operators by showing that their representations in rotated coordinates are the same as in the original coordinates (except for names of coordinates and components). And Tai (1995, pp. 40–42), having determined the form of the "gradient" operator in a general curvilinear orthogonal coordinate system (which we shall eventually meet in the present paper), shows that it is a vector operator by showing that it has the same form in any other curvilinear orthogonal coordinate system.
  5. There are many proofs and interpretations of this identity. My own effort, for what it's worth, is "Trigonometric proof of vector triple product expansion", Mathematics Stack Exchange, t.co/NM2v4DJJGo, 2024. The classic is Gibbs, 1881, §§ 26–7.
  6. Gibbs, 1881, § 56.
  7. Katz, 1979, pp. 146–9.
  8. In Feynman, 1963,  −∇p as the "pressure force per unit volume" eventually appears in the 3rd-last lecture of Volume 2 (§40-1).
  9. A demonstration like the foregoing is outlined by Gibbs (1881, § 55).
  10. Wilson, 1901, pp. 147–8; Borisenko & Tarapov, 1968, pp. 147–8 (again); Hsu, 1984, p. 92; Kreyszig, 1988, pp. 485–6; Wrede & Spiegel, 2010, p. 198.
  11. Gibbs (1881, § 50) introduces the gradient with this definition, except that he calls ∇u simply the derivative of u, and u the primitive of ∇u. Use of the term gradient as an alternative to derivative is reported by Wilson (1901, p. 138).
  12. Cf. Borisenko & Tarapov, 1968, p. 157, eq. (4.43), quoted in Tai, 1995, p. 33, eq. (4.19).
  13. The first two cases may be compared with Javid & Brown, 1963, cited in Tai, 1994, p. 15.
  14. The first two cases may be compared with Neff, 1991, cited in Tai, 1994, p. 16.
  15. But Gibbs (1881) and Wilson (1901) were content to leave it as ∇⸱∇. And they did not call it the Laplacian; they used that term with a different meaning, which has apparently fallen out of fashion.
  16. Durney & Johnson, in Introduction to Modern Electromagnetics (1969, p. 45, cited in Tai, 1994, p. 12), make the absurd statement that "a ∇ operator cannot be defined in the other coordinate systems…" In the context, they apparently meant to say that div A isn't ∇⸱A in other coordinate systems. Robert S. Elliott, in Electromagnetics (1966, p. 606, cited in Tai, 1994, p. 13), says that "only in Cartesian coordinates… do the gradient and divergence operators turn out to be identical." Apparently he meant to say that only in Cartesian coordinates do the two operators differ by a dot. But what these authors apparently meant to say is still wrong, as shown with counterexamples by Kemmer (next reference).
  17. The perception that they apply only in Cartesian coordinates arises partly from failure to allow for the variability of the basis vectors in curvilinear coordinate systems; cf. Kemmer, 1977, pp. 163–5, 172–3 (Exs. 2, 3, 5), 230–33 (sol'ns). From the del operator and the derivatives of the basis vectors w.r.t. the coordinates, Kemmer finds the curl and divergence in cylindrical coordinates, notes that we can do the same "with a little greater effort" in spherical coordinates (p. 230), and finds the Laplacian of a scalar in both coordinate systems (p. 231). He further reports that the method works for the Laplacian of a vector in cylindrical and spherical coordinates and is relatively convenient for the former (p. 232), for which "differentiation of the unit vectors is very simple" (p. 165).
  18. Kemmer (1977, p. 98, eq. 4) gives an equivalent result for our first three integral theorems (5g to 5d) only, and calls it the generalized divergence theorem because the divergence theorem is its most familiar special case.
  19. E.g., Gibbs, 1884, § 165, eq. (1); Wilson, 1901, p. 255, Ex. 1; Kemmer, 1977, p. 99, eq. (6); Hsu, 1984, p. 146, eq. (7.31).
  20. Cf. Katz, 1979, pp. 149–50.
  21. Although Hsu (1984, p. 141) applies that name to our theorem (5c).
  22. E.g., Gibbs, 1881, § 61; Hsu, 1984, pp. 117–18.
  23. Cf. Feynman, 1963, vol. 2, §2-8.
  24. Although Hsu (1984, p. 141) applies that name to our theorem (5g).
  25. Cf. Gibbs, 1881, §§ 50, 59; presumably this is one reason why Gibbs called the gradient simply the derivative.
  26. Cf. Gibbs, 1881, §§ 50, 51; presumably this is another reason why Gibbs called the gradient the derivative.
  27. Our definition of strength follows the old convention used by Baker & Copson (1939, p. 42), Born & Wolf (2002, p. 421), and Larmor (1904, p. 5). The newer convention followed by Miller (1991, p. 1371) would use the denominator 4πr instead of our r in (48); this would have the advantage of eliminating the factor 4π from the D'Alembertian of the wave function, and the disadvantage of introducing that factor into (the denominator of) the wave function itself.
  28. The latter passage, as it appears in the 5th edition (p. 397), is the one cited by Tai (1994, p. 6).
  29. Quoted by Tai (1994), in alphabetical order within each category. For Kovach he could have added p. 308.  Potter he misnames as Porter.
  30. Quoted by Tai (1994, p. 23).
  31. Wilson, 1901, p. 150.
  32. Wilson, 1901, pp. 150, 152. Wilson does not announce this idea in his preface (p. xii), although Tai (1995, p. 26) gets the contrary impression by omitting a comma from the relevant quote.
  33. Tai, 1995, pp. 26, 38.
  34. Tai, 1995, p. 28.
  35. The following explanation takes some hints from Christopher Ford's note on "Vector Potentials" at maths.tcd.ie/~houghton/231/Notes/ChrisFord/vp.pdf, circa 2004.
  36. Cf. Gibbs, 1881, § 71, and Moon & Spencer, 1965, p. 235; quoted in Tai, 1995, pp. 18, 43.
  37. Moon & Spencer, 1965, p. 236.
  38. Tai, 1995, p. 35.
  39. Tai, 1995, pp. 25, 29.
  40. Wilson, 1901, pp. ix, xi–xii.
  41. Gibbs, 1881–84, privately printed version—of which the scan linked in our bibliography is of the very copy that Gibbs sent to Heaviside, with annotations in Heaviside's hand. On the annotations see Rocci, 2020.
  42. In the next equation as printed in Borisenko & Tarapov (1968, p. 180), the first cross should be "="; Tai (1995, p. 46) corrects it.

Bibliography

  • S.J. Axler, 1995, "Down with Determinants!"  American Mathematical Monthly, vol. 102, no. 2 (Feb. 1995), pp. 139–54; jstor.org/stable/2975348.  (Author's preprint, with different pagination: researchgate.net/publication/265273063_Down_with_Determinants.)
  • S.J. Axler, 2023–, Linear Algebra Done Right, 4th Ed., Springer; linear.axler.net (open access).
  • B.B. Baker and E.T. Copson, 1939, The Mathematical Theory of  Huygens' Principle, Oxford; 3rd Ed. (same pagination, with addenda), New York: Chelsea, 1987, archive.org/details/mathematicaltheo0000bake.
  • A.I. Borisenko and I.E. Tarapov (tr. & ed. R.A. Silverman), 1968, Vector and Tensor Analysis with Applications, Prentice-Hall; reprinted New York: Dover, 1979, archive.org/details/vectortensoranal0000bori.
  • M. Born and E. Wolf, 2002, Principles of Optics, 7th Ed., Cambridge, 1999 (reprinted with corrections, 2002).
  • C.G. Broyden, 1975, Basic Matrices, London: Macmillan.
  • R.P. Feynman, R.B. Leighton, & M. Sands, 1963 etc., The Feynman Lectures on Physics, California Institute of Technology; feynmanlectures.caltech.edu.
  • N.H. Fletcher, 1974, "Adiabatic assumption for wave propagation", American Journal of Physics, vol. 42, no. 6 (June 1974), pp. 487–9; doi.org/10.1119/1.1987757.
  • J.W. Gibbs, 1881–84, "Elements of Vector Analysis", privately printed New Haven: Tuttle, Morehouse & Taylor, 1881 (§§ 1–101), 1884 (§§ 102–189, etc.), archive.org/details/elementsvectora00gibb; published in The Scientific Papers of J. Willard Gibbs (ed. H.A. Bumstead & R.G. Van Name), New York: Longmans, Green, & Co., 1906, vol. 2, archive.org/details/scientificpapers02gibbuoft, pp. 17–90.
  • H.P. Hsu, 1984, Applied Vector Analysis, Harcourt Brace Jovanovich; archive.org/details/appliedvectorana00hsuh.
  • V.J. Katz, 1979, "The history of Stokes' theorem", Mathematics Magazine, vol. 52, no. 3 (May 1979), pp. 146–56; jstor.org/stable/2690275.
  • N. Kemmer, 1977, Vector Analysis: A physicist's guide to the mathematics of fields in three dimensions, Cambridge; archive.org/details/isbn_0521211581.
  • E. Kreyszig, 1962 etc., Advanced Engineering Mathematics, New York: Wiley;  5th Ed., 1983;  6th Ed., 1988;  9th Ed., 2006;  10th Ed., 2011.
  • J. Larmor, 1904, "On the mathematical expression of the principle of  Huygens" (read 8 Jan. 1903), Proceedings of the London Mathematical Society, Ser. 2, vol. 1 (1904), pp. 1–13.
  • D.A.B. Miller, 1991, "Huygens's wave propagation principle corrected", Optics Letters, vol. 16, no. 18 (15 Sep. 1991), pp. 1370–72; stanford.edu/~dabm/146.pdf.
  • P.H. Moon and D.E. Spencer, 1965, Vectors, Princeton, NJ: Van Nostrand.
  • W.K.H. Panofsky and M. Phillips, 1962, Classical Electricity and Magnetism, 2nd Ed., Addison-Wesley; reprinted Mineola, NY: Dover, 2005.
  • A. Rocci, 2020, "Back to the roots of vector and tensor calculus: Heaviside versus Gibbs" (online 10 Nov. 2020), Archive for History of Exact Sciences, vol. 75, no. 4 (July 2021), pp. 369–413. (Author's preprint, with different pagination: arxiv.org/abs/2010.09679.)
  • C.-T. Tai, 1994, "A survey of the improper use of ∇ in vector analysis" (Technical Report RL 909), Dept. of Electrical Engineering & Computer Science, University of Michigan; hdl.handle.net/2027.42/7869.
  • C.-T. Tai, 1995, "A historical study of vector analysis" (Technical Report RL 915), Dept. of Electrical Engineering & Computer Science, University of Michigan; hdl.handle.net/2027.42/7868.
  • E.B. Wilson, 1901, Vector Analysis: A text-book for the use of students of mathematics and physics ("Founded upon the lectures of J. Willard Gibbs…"), New York: Charles Scribner's Sons; 12th printing, Yale University Press, 1958, archive.org/details/vectoranalysiste0000gibb.
  • R.C. Wrede and M.R. Spiegel, 2010, Advanced Calculus, 3rd Ed., New York: McGraw-Hill (Schaum's Outlines); archive.org/details/schaumsoutlinesa0000wred.

Further reading

M.J. Crowe, "A History of Vector Analysis" (address at the University of Louisville, Autumn term, 2002), researchgate.net/publication/244957729_A_History_of_Vector_Analysis (including much discussion of quaternions).

P. Lynch, "Matthew O'Brien: An inventor of vector analysis", Bulletin of the Irish Mathematical Society, No. 74 (Winter 2014), pp. 81–8; doi.org/10.33232/BIMS.0074.81.88.