WikiJournal Preprints/Cut the coordinates! (or Vector Analysis Done Fast)


Abstract

The gradient, the curl, the divergence, and the Laplacian are initially defined, without coordinates, as closed-surface integrals per unit volume—the definition of the Laplacian being indifferent to whether the operand is a scalar field or a vector field. Four integral theorems—including the divergence theorem—follow almost immediately, provided that the initial definitions are unambiguous. Their unambiguity, together with some examples of their usefulness, is established as follows, at a level suitable for beginners (although this abstract is for prospective instructors):
  • The gradient is related to an acceleration through an equation of motion;
  • The divergence is related to two time-derivatives of density (the partial derivative and the material derivative) through two forms of an equation of continuity;
  • The component of the curl in a general direction is expressed as a divergence (now known to be unambiguous);
  • The same is done for the general component of the gradient, yielding not only a second proof of unambiguity of the gradient, but also the relation between the gradient and the directional derivative; this together with the original definition of the Laplacian shows that the Laplacian of a scalar field is the divergence of the gradient and therefore unambiguous. The unambiguity of the Laplacian of a vector field then follows from a component argument (as for the curl) or from a linearity argument.

The derivation of the relation between the gradient and the directional derivative yields a coordinate-free definition of the dot-del operator for a scalar right-hand operand. But, as the directional derivative is also defined for a non-scalar operand, the same relation offers a method of generalizing the dot-del operator, so that the definition of the Laplacian of a general field can be rewritten with that operator. The advection operator—derived without coordinates, for both scalar and vector properties—is likewise rewritten.

Meanwhile comparison between the definitions of the various operators leads to coordinate-free definitions of the del-cross, del-dot, and del-squared operators. These together with the dot-del operator allow the four integral theorems to be condensed into a single generalized volume-integral theorem.

If the volume of integration is reduced to a thin curved slab of uniform thickness, with an edge-face perpendicular to the broad faces, the four integral theorems are reduced to their two-dimensional forms, each of which relates an integral over a surface segment to an integral around its enclosing curve, provided that the original closed-surface integral has no contribution from the broad faces of the slab. This proviso can be satisfied by construction in two of the four cases, yielding two general theorems, one of which is the Kelvin–Stokes theorem. By applying these two theorems to a segment of a closed surface, and expanding the segment to cover the entire surface, it is shown that the gradient is irrotational and the curl is solenoidal.

The next part of the exposition is more conventional, but still coordinate-free. The gradient theorem is derived from the relation between the gradient and the directional derivative. An irrotational field is shown to have a scalar potential. The 1/r  scalar field is shown to be the field whose negative gradient is the inverse-square vector field, whose divergence is a delta function, which is therefore also the negative Laplacian of the 1/r  scalar field. These results enable the construction of a field with a given divergence or a given Laplacian. The wave equation is derived from small-amplitude sound waves in a non-viscous fluid, and shown to be satisfied by a spherical-wave field with a 1/r  amplitude, whose D'Alembertian is a delta function, enabling the construction of a wave function with a given D'Alembertian. But further progress, including the construction of a field with a given curl, seems to require the introduction of coordinates.

With the aid of identities already found, expressions are easily obtained for the gradient, curl, divergence, Laplacian, and advection operators in Cartesian coordinates—with indicial notation and implicit summation, for brevity. While the resulting expressions for the curl and divergence may look unfamiliar, they match the initial definitions given by J. Willard Gibbs. The Cartesian expressions are found convenient for deriving further identities: a comprehensive collection is derived, leading to the construction of a field with a given curl in a star-shaped region and, as a by-product, a demonstration that the curl of the velocity field of a rigid body is twice the angular velocity. The curl-of-the-curl identity leads to a second definition of the Laplacian of a vector, the Helmholtz decomposition, and the prediction of electromagnetic waves.

The time-honored method of deriving vector-analytic identities—treating the divergence and curl as "formal products" with the del operator, varying one field at a time, and adding the results—is found to be less than rigorous, sometimes less than clear, and hard to justify in view of the ease with which the same thing can be done with Cartesian coordinates, indicial notation, and implicit summation.

The introduction of general coordinates proceeds through (non-normalized) natural and dual basis vectors, reciprocity, the Kronecker delta, covariance of the natural basis, contravariance of the dual basis, contravariant and covariant components, local bases, contravariance of coordinates, covariance of derivatives w.r.t. coordinates, the Jacobian, and handedness. Reciprocity leads to the dot-product of two vector fields and, via the permutation symbol, to the cross-products of the basis vectors, the definition of one basis in terms of the other, the cross-product of two vector fields, and reciprocity of the covariant and contravariant Jacobians. Thus the stage is set for expressing operators in general coordinates.

The multivariate chain rule leads to expressions for the directional derivative (in terms of the contravariant basis), hence the gradient (del) and advection operators. The identity for the curl of the product of a scalar and a vector leads to an expression for the curl in terms of covariant components. Expressions for the curl and divergence operators are obtained from the original volume-based definitions, and are found to agree with del-cross and del-dot respectively, with del expressed in the same general coordinates. The volume-based definition of the divergence leads, by a simpler path, to an expression in terms of contravariant components, which in turn yields an expression for the Laplacian.

Affine coordinates are briefly described before proceeding to orthogonal coordinates. In the latter, the Jacobian is simplified and we can choose an orthonormal basis, which is its own reciprocal, so that vectors can be specified in components w.r.t. a single basis. By expressing the old basis vectors and components in terms of the new ones, we can re-express dot-products, cross-products, and differential operators in terms of orthogonal coordinates with an orthonormal basis.

In an appendix, Huygens' principle is mathematized by deriving Green's identities and thence Kirchhoff's integral theorem (without  assuming sinusoidal time-dependence), and then interpreting Kirchhoff's integrand as a distribution of secondary sources. That distribution can be described in two ways. The standard description, in terms of monopoles and normal dipoles, is applicable to the general case. A new description, in terms of "generalized spatiotemporal dipoles" (GSTDs), is useful for the case of a single monopole primary source—but, unlike the original spatiotemporal-dipole formulation of D.A.B. Miller (1991), does not  require the surface of integration to be a wavefront. The GSTD description clarifies the manner in which the secondary sources suppress backward secondary waves: the directivity of the GSTD sources is such as to suppress specular reflections off the surface of integration.


Introduction


Sheldon Axler, in his essay "Down with determinants!" (1995) and his ensuing book Linear Algebra Done Right (4th Ed., 2023–), does not entirely eliminate determinants, but introduces them as late as possible and then exploits them for what he calls their "main reasonable use in undergraduate mathematics", namely the change-of-variables formula for multiple integrals.[1] Here I treat coordinates in vector analysis somewhat as Axler treats determinants in linear algebra: I introduce coordinates as late as possible, and then exploit them in unconventionally rigorous derivations of vector-analytic identities from (e.g.) vector-algebraic identities. But I contrast with Axler in at least two ways. First, as my subtitle suggests, I have no intention of expanding my paper into a book. Brevity is of the essence. Second, while one may well avoid determinants in numerical  linear algebra,[2] one can hardly avoid coordinates in numerical vector analysis! So I cannot extend the coordinate-minimizing path into computation. But I can extend it up to the threshold by expressing the operators of vector analysis in general coordinates and orthogonal coordinates, leaving it for others to specialize the coordinates and compute with them. On the way, I can satisfy readers who need the concepts of vector analysis for theoretical purposes, and who would rather read a paper than a book. Readers who stay to the end of the paper will get a more general treatment of coordinates than is offered by a typical book-length introduction to vector analysis. In the meantime, coordinates won't needlessly get in the way.

The cost of coordinates


Mathematicians define a "vector" as a member of a vector space, which is a set whose members satisfy certain basic rules of algebra (called the vector-space axioms) with respect to another set called a field (e.g., the real numbers), which has its own basic rules of algebra (the field axioms), and whose members are called "scalars". Physicists are more fussy. They typically want a "vector" to be not only a member of a vector space, but also a first-order tensor : a "tensor", meaning that it exists independently of any coordinate system with which it might be specified; and "first-order" (or "first-degree", or "first-rank"), meaning that it is specified by a one-dimensional array of numbers. Similarly, a 2nd-order tensor is specified by a 2-dimensional array (a matrix), and a 3rd-order by a 3-dimensional array, and so on. Hence they want a "scalar", which is specified by a single number (a zero-dimensional array), to be a zero-order tensor. In "vector analysis", we are greatly interested in applications to physical situations, and accordingly take the physicists' view on what constitutes a vector or a scalar.

So, for our purposes, defining a quantity by three components in (say) a Cartesian coordinate system is not enough to make it a vector, and defining a quantity as a real function of a list of coordinates is not enough to make it a scalar, because we still need to show that the quantity has an independent existence. One way to do this is to show that its coordinate representation behaves appropriately when the coordinate system is changed. Independent existence of a quantity means that its coordinate representation changes so as to compensate for the change in the coordinate system.[3] But independent existence of an operator means that its expression in one coordinate system (with the operand[s] and the result in that system) gives the same result as the corresponding expression in another coordinate system.[4]

Here we circumvent these complications by the most obvious route: by initially defining things without coordinates. If, having defined something without coordinates, we then need to represent it with coordinates, we can choose the coordinate system for convenience rather than generality.

The limitations of limits


In the branch of pure mathematics known as analysis, there is a thing called a limit, whereby for every positive ϵ  there exists a positive δ such that if some increment is less than δ, some error is less than ϵ. In the branch of applied mathematics known as continuum mechanics, there is a thing called reality, whereby if the increment is less than some positive δ, the assumption of a continuum becomes ridiculous, so that the error cannot be made less than an arbitrary ϵ. Yet vector "analysis" (together with higher-order tensors) is typically studied with the intention of applying it to some form of "continuum" mechanics, such as the modeling of elasticity, plasticity, fluid flow, or (widening the net) electrodynamics of ordinary matter; in short, it is studied with the intention of conveniently forgetting that, on a sufficiently small scale, matter is lumpy.[a] One might therefore submit that to express the principles of vector analysis in the language of limits is to strain at a gnat and swallow a camel. Here I avoid that camel by referring to elements of length or area or volume, each of which is small enough to allow some quantity or quantities to be considered uniform within it, but, for the same reason, large enough to allow such local averaging of the said quantity or quantities as is necessary to tune out the lumpiness.

We shall see bigger camels, where well-known authors define or misdefine a vector operator and then derive identities by treating it like an ordinary vector quantity. These I also avoid.

Prerequisites


I assume that the reader is familiar with the algebra and geometry of vectors in 3D space, including the dot-product, the cross-product, and the scalar triple product, their geometric meanings, their expressions in Cartesian coordinates, and the identity

a × (b × c)  =  a⸱c b − a⸱b c ,

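which we call the "expansion" of the vector triple product.[5] I further assume that the reader can generalize the concept of a derivative, so as to differentiate a vector with respect to a scalar, e.g.

\[ \frac{d\mathbf r}{dt} = \lim_{\Delta t\to 0}\frac{\mathbf r(t+\Delta t)-\mathbf r(t)}{\Delta t} , \]

or so as to differentiate a function of several independent variables "partially" w.r.t. one of them while the others are held constant, e.g.

\[ \frac{\partial f}{\partial x} = \lim_{\Delta x\to 0}\frac{f(x+\Delta x,\,y,\,z)-f(x,\,y,\,z)}{\Delta x} . \]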

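But in view of the above remarks on limits, I also expect the reader to be tolerant of an argument like this: In a short time dt, let the vectors r and p change by dr and dp respectively. Then

\[ \frac{d}{dt}(\mathbf r\times\mathbf p) = \frac{(\mathbf r+d\mathbf r)\times(\mathbf p+d\mathbf p)-\mathbf r\times\mathbf p}{dt} = \mathbf r\times\frac{d\mathbf p}{dt}+\frac{d\mathbf r}{dt}\times\mathbf p , \]

where, as always, the orders of the cross-products matter.[b] Differentiation of a dot-product behaves similarly, except that the orders don't matter; and if p = mv, where m is a scalar and v is a vector, then

\[ \frac{d\mathbf p}{dt} = m\,\frac{d\mathbf v}{dt}+\frac{dm}{dt}\,\mathbf v . \]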

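Or an argument like this: If f is a function of x and y, then

\[ \frac{\partial}{\partial x}\!\left(\frac{\partial f}{\partial y}\right) = \frac{\partial}{\partial y}\!\left(\frac{\partial f}{\partial x}\right) ; \]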

that is, we can switch the order of differentiation in a "mixed" partial derivative. If ∂_x is an abbreviation for ∂/∂x, etc., this rule can be written in operational terms as

∂_x ∂_y = ∂_y ∂_x .

More generally, if ∂_i is an abbreviation for ∂/∂x_i where i ∊ {1, 2,…}, the rule becomes

∂_i ∂_j = ∂_j ∂_i .

These generalizations of differentiation, however, do not go beyond differentiation w.r.t. real variables, some of which are scalars, and some of which are coordinates. Vector analysis involves quantities that may be loosely described as derivatives w.r.t. a vector—usually the position vector.
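The two prerequisite rules quoted above (the expansion of the vector triple product, and the product rule for the derivative of a cross-product) can be spot-checked numerically. The following Python sketch is only an illustration: the sample vectors, the trajectories `r(t)` and `p(t)`, the step size `h`, and the helper names `dot`, `cross`, and `ddt` are assumptions made for the example and do not come from the text.

```python
import math

def dot(a, b):
    """Dot-product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross-product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Expansion of the vector triple product: a x (b x c) = (a.c) b - (a.b) c
a, b, c = (1.0, -2.0, 3.0), (0.5, 4.0, -1.0), (2.0, 1.0, 1.5)
lhs = cross(a, cross(b, c))
rhs = tuple(dot(a, c) * bi - dot(a, b) * ci for bi, ci in zip(b, c))
triple_error = max(abs(x - y) for x, y in zip(lhs, rhs))

# Product rule: d/dt (r x p) = r x dp/dt + dr/dt x p (order of factors kept)
def r(t):
    return (math.cos(t), math.sin(t), t)       # hypothetical trajectory

def p(t):
    return (t * t, 1.0, math.exp(-t))          # hypothetical second vector

def ddt(f, t, h=1e-5):
    """Central-difference estimate of the derivative of a vector function."""
    return tuple((u - v) / (2 * h) for u, v in zip(f(t + h), f(t - h)))

t0 = 0.7
direct = ddt(lambda t: cross(r(t), p(t)), t0)
by_rule = tuple(x + y for x, y in zip(cross(r(t0), ddt(p, t0)),
                                      cross(ddt(r, t0), p(t0))))
rule_error = max(abs(x - y) for x, y in zip(direct, by_rule))

print(triple_error, rule_error)
```

Both errors should be at the level of rounding or finite-difference error; swapping the factors in either cross-product of `by_rule` would spoil the agreement, which is the point of the remark about order.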

Closed-surface integrals per unit volume


The term field, mentioned above in the context of algebraic axioms, has an alternative meaning: if r is the position vector, a scalar field is a scalar-valued function of r, and a vector field is a vector-valued function of r; both may also depend on time. These are the functions of which we want "derivatives" w.r.t. the vector r.

In this section I introduce four such derivatives—the gradient, the curl, the divergence, and the Laplacian —in a way that will seem unremarkable to those readers who aren't already familiar with them, but idiosyncratic to those who are. The gradient is commonly introduced in connection with a curve and its endpoints, the curl in connection with a surface segment and its enclosing curve, the divergence in connection with a volume and its enclosing surface, and the Laplacian as a composite of two of the above, initially applicable only to a scalar field. Here I introduce all four in connection with a volume and its enclosing surface; and I introduce the Laplacian as a concept in its own right, equally applicable to a scalar or vector  field, and only later relate it to the others. My initial definitions of the gradient, the curl, and the Laplacian, although not novel, are usually thought to be more advanced than the common ones—in spite of being conceptually simpler, and in spite of being obvious variations on the same theme.

Instant integral theorems (with a caveat)


Let V be a volume (3D region) enclosed by a surface S (a mathematical surface, not generally a physical barrier). Let n̂ be the unit normal vector at a general point on S, pointing out of V. Let n be the distance from S in the direction of n̂ (positive outside V, negative inside), and let ∂_n be an abbreviation for ∂/∂n, where the derivative (commonly called the normal derivative) is tacitly assumed to exist.

In V, and on S, let p be a scalar field (e.g., pressure in a fluid, or temperature), and let q be a vector field (e.g., flow velocity, or heat-flow density), and let ψ be a generic field which may be a scalar or a vector. Let a general element of the surface S have area dS, and let it be small enough to allow n̂, p, q, and ∂_n ψ to be considered uniform over the element. Then, for every element, the following four products are well defined:

\[ \hat{\mathbf n}\,p\,dS , \qquad \hat{\mathbf n}\times\mathbf q\,dS , \qquad \hat{\mathbf n}\cdot\mathbf q\,dS , \qquad \partial_n\psi\,dS . \tag{1} \]

If p is pressure in a non-viscous fluid, the first of these products is the force exerted by the fluid in V  through the area dS. The second product does not have such an obvious physical interpretation; but if q is circulating clockwise about an axis directed through V, the cross-product will be exactly tangential to S and will tend to have a component in the direction of that axis. The third product is the flux of q through the surface element; if q is flow velocity, the third product is the volumetric flow rate (volume per unit time) out of V  through dS ; or if q is heat-flow density, the third product is the heat transfer rate (energy per unit time) out of V  through dS. The fourth product, by analogy with the third, might be called the flux of the normal derivative of ψ through the surface element, but is equally well defined whether ψ is a scalar or a vector—or, for that matter, a matrix, or a tensor of any order, or anything else that we can differentiate w.r.t. n.

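If we add up each of the four products over all the elements of the surface S, we obtain, respectively, the four surface integrals

\[ \iint_S \hat{\mathbf n}\,p\,dS , \qquad \iint_S \hat{\mathbf n}\times\mathbf q\,dS , \qquad \iint_S \hat{\mathbf n}\cdot\mathbf q\,dS , \qquad \iint_S \partial_n\psi\,dS , \tag{2} \]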

in which the double integral sign indicates that the range of integration is two-dimensional. The first surface integral takes a scalar field and yields a vector; the second takes a vector field and yields a vector; the third takes a vector field and yields a scalar; and the fourth takes (e.g.) a scalar field yielding a scalar, or a vector field yielding a vector. If p is pressure in a non-viscous fluid, the first integral is the force exerted by the fluid in V  on the fluid outside V. The second integral may be called the skew surface integral of q over S ,[6] or, for the reason hinted above, the circulation of q over S.  The third integral, commonly called the flux integral (or simply the surface integral) of q over S, is the total flux of q out of V. And the fourth integral is the surface integral of the outward normal derivative of ψ.

Let the volume V be divided into elements. Let a general volume element have the volume dV and be enclosed by the surface δS (not to be confused with the area dS of a surface element, which may be an element of S or of δS). Then consider what happens if, instead of evaluating each of the above surface integrals over S, we evaluate it over each δS and add up the results for all the volume elements. In the interior of V, each surface element of area dS is on the boundary between two volume elements, for which the unit normals n̂ at dS, and the respective values of ∂_n ψ, are equal and opposite. Hence when we add up the integrals over the surfaces δS, the contributions from the elements dS cancel in pairs, except on the original surface S, so that we are left with the original integral over S. So, for the four surface integrals in (2), we have respectively

\[ \begin{aligned}
\iint_S \hat{\mathbf n}\,p\,dS &= \sum_{dV}\iint_{\delta S}\hat{\mathbf n}\,p\,dS , \\
\iint_S \hat{\mathbf n}\times\mathbf q\,dS &= \sum_{dV}\iint_{\delta S}\hat{\mathbf n}\times\mathbf q\,dS , \\
\iint_S \hat{\mathbf n}\cdot\mathbf q\,dS &= \sum_{dV}\iint_{\delta S}\hat{\mathbf n}\cdot\mathbf q\,dS , \\
\iint_S \partial_n\psi\,dS &= \sum_{dV}\iint_{\delta S}\partial_n\psi\,dS .
\end{aligned} \tag{3} \]
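The pairwise cancellation behind equations (3) can be watched in a small numerical experiment. The sketch below is an illustration under stated assumptions: V is taken to be the unit cube, divided into eight equal sub-cubes; the scalar field `p` is a hypothetical sample; and each face is approximated by midpoint patches chosen so that the patches of neighboring sub-cubes coincide. The helper names `patches` and `integral_n_p` are invented for the example.

```python
def patches(center, half, n):
    """Yield (n_hat, point, dS) for n*n midpoint patches on each face of a box."""
    for axis in range(3):
        u, w = [k for k in range(3) if k != axis]
        du, dw = 2 * half[u] / n, 2 * half[w] / n
        for sign in (-1.0, 1.0):
            n_hat = [0.0, 0.0, 0.0]
            n_hat[axis] = sign              # outward unit normal of this face
            for i in range(n):
                for j in range(n):
                    pt = list(center)
                    pt[axis] += sign * half[axis]
                    pt[u] += -half[u] + (i + 0.5) * du
                    pt[w] += -half[w] + (j + 0.5) * dw
                    yield n_hat, pt, du * dw

def p(r):
    """Hypothetical sample scalar field."""
    x, y, z = r
    return x * x + y * z

def integral_n_p(center, half, n):
    """Approximate the sum of n_hat * p * dS over the surface of a box."""
    total = [0.0, 0.0, 0.0]
    for n_hat, pt, dS in patches(center, half, n):
        for k in range(3):
            total[k] += n_hat[k] * p(pt) * dS
    return total

# Integral over the outer surface S of the unit cube (8 x 8 patches per face)
whole = integral_n_p((0.5, 0.5, 0.5), (0.5, 0.5, 0.5), 8)

# Sum of the integrals over the surfaces of the eight sub-cubes (4 x 4 patches per face)
summed = [0.0, 0.0, 0.0]
for cx in (0.25, 0.75):
    for cy in (0.25, 0.75):
        for cz in (0.25, 0.75):
            part = integral_n_p((cx, cy, cz), (0.25, 0.25, 0.25), 4)
            summed = [s + q for s, q in zip(summed, part)]

difference = max(abs(a - b) for a, b in zip(whole, summed))
print(whole, summed, difference)
```

The two results should agree to rounding error, because every interior patch is counted twice with opposite normals. The same experiment could be repeated for the other three integrands.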

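Now comes a big "if":  if  we define the gradient of p (pronounced "grad p") inside dV  as

\[ \nabla p = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\,p\,dS , \tag{4g} \]

and the curl of q inside dV  as

\[ \operatorname{curl}\mathbf q = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\times\mathbf q\,dS , \tag{4c} \]

and the divergence of q inside dV  as

\[ \operatorname{div}\mathbf q = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\cdot\mathbf q\,dS , \tag{4d} \]

and the Laplacian of ψ inside dV  as [c]

\[ \triangle\psi = \frac{1}{dV}\iint_{\delta S}\partial_n\psi\,dS \tag{4L} \]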

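(where the letters after the equation number stand for gradient, curl, divergence, and Laplacian, respectively), then equations (3) can be rewritten

\[ \begin{aligned}
\iint_S \hat{\mathbf n}\,p\,dS &= \sum_{dV}\nabla p\;dV , &
\iint_S \hat{\mathbf n}\times\mathbf q\,dS &= \sum_{dV}\operatorname{curl}\mathbf q\;dV , \\
\iint_S \hat{\mathbf n}\cdot\mathbf q\,dS &= \sum_{dV}\operatorname{div}\mathbf q\;dV , &
\iint_S \partial_n\psi\,dS &= \sum_{dV}\triangle\psi\;dV .
\end{aligned} \]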

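But because each term in each sum has a factor dV, we call the sum an integral; and because the range of integration is three-dimensional, we use a triple integral sign. Thus we obtain the following four theorems relating integrals over an enclosing surface S to integrals over the enclosed volume V:

\[ \iint_S \hat{\mathbf n}\,p\,dS = \iiint_V \nabla p\;dV \tag{5g} \]

\[ \iint_S \hat{\mathbf n}\times\mathbf q\,dS = \iiint_V \operatorname{curl}\mathbf q\;dV \tag{5c} \]

\[ \iint_S \hat{\mathbf n}\cdot\mathbf q\,dS = \iiint_V \operatorname{div}\mathbf q\;dV \tag{5d} \]

\[ \iint_S \partial_n\psi\,dS = \iiint_V \triangle\psi\;dV \tag{5L} \]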

Of the above four results, only the third (5d) seems to have a standard name; it is called the divergence theorem (or Gauss's theorem or, more properly, Ostrogradsky's theorem[7]), and is indeed the best known of the four—although the other three, having been derived in parallel with it, may be said to be equally fundamental.

As each of the operators ∇, curl, and div calls for an integration w.r.t. area and then a division by volume, the dimension (or unit of measurement) of the result is the dimension of the operand divided by the dimension of length, as if the operation were some sort of differentiation w.r.t. position. Moreover, in each of equations (5g) to (5d), there is a triple integral on the right but only a double integral on the left, so that each of the operators ∇, curl, and div appears to compensate for a single integration. For these reasons, and for convenience, we shall describe them as differential operators. By comparison, the operator △ in (4L) or (5L) calls for a further differentiation w.r.t. n; we shall therefore describe △ as a 2nd-order differential operator. (An additional reason for these descriptions will emerge later.) As promised, the four definitions (4g) to (4L) are "obvious variations on the same theme" (although the fourth is somewhat less obvious than the others).

But remember the "if": Theorems (5g) to (5L) depend on definitions (4g) to (4L) and are therefore only as definite as those definitions! Equations (3), without assuming anything about the shapes and relative sizes of the closed surfaces δS (except, tacitly, that n̂ is piecewise well-defined), indicate that the surface integrals are additive with respect to volume. But this additivity, by itself, does not guarantee that the surface integrals are shared among neighboring volume elements in proportion to their volumes, as envisaged by "definitions" (4g) to (4L). Each of these "definitions" is unambiguous if, and only if, the ratio of the surface integral to dV  is insensitive to the shape and size of δS  for a sufficiently small δS. Notice that the issue here is not whether the ratios specified in equations (4g) to (4L) are true vectors or scalars, independent of the coordinates; all of the operations needed in those equations have coordinate-free definitions. Rather, the issue is whether the resulting ratios are unambiguous notwithstanding the ambiguity of δS, provided only that δS is sufficiently small. That is the advertised "caveat", which must now be addressed.
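The caveat can be made concrete by evaluating the ratio in (4d), i.e. the flux out of δS divided by dV, for differently shaped small boxes around the same point. The sketch below is an experiment, not a proof. Its assumptions are: the vector field `q` is a hypothetical sample; δS is restricted to axis-aligned boxes; and the reference value `exact` uses the familiar Cartesian expression for the divergence, which the text has not derived at this point and which serves here only as an external check.

```python
def patches(center, half, n):
    """Yield (n_hat, point, dS) for n*n midpoint patches on each face of a box."""
    for axis in range(3):
        u, w = [k for k in range(3) if k != axis]
        du, dw = 2 * half[u] / n, 2 * half[w] / n
        for sign in (-1.0, 1.0):
            n_hat = [0.0, 0.0, 0.0]
            n_hat[axis] = sign              # outward unit normal of this face
            for i in range(n):
                for j in range(n):
                    pt = list(center)
                    pt[axis] += sign * half[axis]
                    pt[u] += -half[u] + (i + 0.5) * du
                    pt[w] += -half[w] + (j + 0.5) * dw
                    yield n_hat, pt, du * dw

def q(r):
    """Hypothetical sample vector field."""
    x, y, z = r
    return (x * x, x * y, z ** 3)

def div_by_definition(center, half, n=4):
    """Flux of q out of the box, divided by the volume of the box."""
    flux = 0.0
    for n_hat, pt, dS in patches(center, half, n):
        flux += sum(nk * qk for nk, qk in zip(n_hat, q(pt))) * dS
    volume = 8 * half[0] * half[1] * half[2]
    return flux / volume

r0 = (0.3, -0.2, 0.5)
exact = 3 * r0[0] + 3 * r0[2] ** 2          # Cartesian divergence (assumed known)

cube = div_by_definition(r0, (1e-3, 1e-3, 1e-3))
slab = div_by_definition(r0, (2e-3, 5e-4, 1e-3))
needle = div_by_definition(r0, (2e-4, 2e-4, 2e-3))
print(exact, cube, slab, needle)
```

For this sample field the three ratios differ from one another, and from the reference value, only by terms of the order of the square of the box size, so the dependence on shape fades as δS shrinks. Whether that holds in general is precisely what the following sections set out to establish.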

In accordance with our "applied" mathematical purpose, our proofs of the unambiguity of the differential operators will rest on a few thought experiments, each of which applies an operator to a physical field, say f, and obtains another physical field whose unambiguity is beyond dispute. The conclusion of the thought experiment is then applicable to any operand field whose mathematical properties are consistent with its interpretation as the physical field f ; the loss of generality, if any, is only what is incurred by that interpretation.

Unambiguity of the gradient


Suppose that a fluid with density ρ (a scalar field) flows with velocity v (a vector field) under the influence of the internal pressure p (a scalar field). Then the integral in (4g) is the force exerted by the pressure of the fluid inside δS on the fluid outside, so that minus the integral is the force exerted on the fluid inside δS by the pressure of the fluid outside. Dividing by dV, we find that −∇p, as defined by (4g), is the force per unit volume, due to the pressure outside the volume.[8] If this is the only force per unit volume acting on the volume (e.g., because the fluid is non-viscous and in a weightless environment, and the volume element is not in contact with the container), then it is equal to the acceleration times the mass per unit volume; that is,

\[ \rho\,\frac{d\mathbf v}{dt} = -\nabla p . \tag{6g} \]

Now provided that the left side of this equation is locally continuous, it can be considered uniform inside the small δS, so that the left side is unambiguous, whence ∇p is also unambiguous. If there are additional forces on the fluid element, e.g. due to gravity and/or viscosity, then −∇p is not the sole contribution to density-times-acceleration, but is still the contribution due to pressure, which is still unambiguous.

By showing the unambiguity of definition (4g), we have confirmed theorem (5g). In the process we have seen that the volume-based definition of the gradient is useful for the modeling of fluids, and intuitive in that it formalizes the common notion that a pressure "gradient" gives rise to a force.
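As an illustration of the pressure force per unit volume, the following sketch evaluates the ratio in (4g) for a fluid at rest under gravity, where the pressure falls linearly with height. Everything specific here is an assumption made for the example: the water-like constants `RHO`, `G`, and `P0`, the box chosen as δS, and the helper names. In this hydrostatic case −∇p should come out as an upward force of ρg per unit volume, balancing the weight.

```python
def patches(center, half, n):
    """Yield (n_hat, point, dS) for n*n midpoint patches on each face of a box."""
    for axis in range(3):
        u, w = [k for k in range(3) if k != axis]
        du, dw = 2 * half[u] / n, 2 * half[w] / n
        for sign in (-1.0, 1.0):
            n_hat = [0.0, 0.0, 0.0]
            n_hat[axis] = sign              # outward unit normal of this face
            for i in range(n):
                for j in range(n):
                    pt = list(center)
                    pt[axis] += sign * half[axis]
                    pt[u] += -half[u] + (i + 0.5) * du
                    pt[w] += -half[w] + (j + 0.5) * dw
                    yield n_hat, pt, du * dw

RHO, G, P0 = 1000.0, 9.81, 101325.0         # hypothetical values in SI units

def pressure(r):
    """Hydrostatic pressure, decreasing linearly with height z."""
    return P0 - RHO * G * r[2]

def grad_by_definition(field, center, half, n=2):
    """Sum of n_hat * field * dS over the box surface, divided by the box volume."""
    total = [0.0, 0.0, 0.0]
    for n_hat, pt, dS in patches(center, half, n):
        for k in range(3):
            total[k] += n_hat[k] * field(pt) * dS
    volume = 8 * half[0] * half[1] * half[2]
    return [t / volume for t in total]

grad_p = grad_by_definition(pressure, (0.0, 0.0, -2.0), (0.1, 0.3, 0.05))
force_density = [-g for g in grad_p]        # force per unit volume due to pressure
print(force_density)
```

Because the sample pressure is linear in position, the result does not depend on the size or proportions of the box, which is the simplest instance of the unambiguity argued above.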

Unambiguity of the divergence


In the aforesaid fluid, in a short time dt, the volume that flows out of a fixed closed surface δS through a fixed surface element of area dS is v dt ⸱ n̂ dS (i.e., the displacement normal to the surface element, times the area). Multiplying this by density and integrating over δS, we find that the mass flowing out of δS in time dt is \( \iint_{\delta S}\rho\,\mathbf v\,dt\cdot\hat{\mathbf n}\,dS \). Dividing this by dV, and then by dt, we get the rate of reduction of density inside δS; that is,

\[ \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\cdot\rho\mathbf v\,dS = -\frac{\partial\rho}{\partial t} , \]

where the derivative w.r.t. time is evaluated at a fixed location (because δS is fixed), and is therefore written as a partial derivative (because the other variables on which ρ might depend, namely the coordinates, are held constant). Provided that the right-hand side is locally continuous, it can be considered uniform inside δS and is therefore unambiguous, so that the left side is likewise unambiguous. But the left side is simply div ρv as defined by (4d),[d] which is therefore also unambiguous,[9] confirming theorem (5d). In short, the divergence operator is that which maps ρv to the rate of reduction of density at a fixed point:

\[ \operatorname{div}\rho\mathbf v = -\frac{\partial\rho}{\partial t} . \tag{7d} \]

This result, which expresses conservation of mass, is a form of the so-called equation of continuity.
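The equation of continuity can be tried out on a simple kinematic example: a density pattern carried along the x axis at a uniform speed. In the sketch below, the density profile `rho`, the speed `C`, the sample point, and the box used as δS are all hypothetical choices; the left side is computed as mass flux out of the box per unit volume, and the right side as a finite-difference rate of reduction of density at the fixed center of the box.

```python
import math

def patches(center, half, n):
    """Yield (n_hat, point, dS) for n*n midpoint patches on each face of a box."""
    for axis in range(3):
        u, w = [k for k in range(3) if k != axis]
        du, dw = 2 * half[u] / n, 2 * half[w] / n
        for sign in (-1.0, 1.0):
            n_hat = [0.0, 0.0, 0.0]
            n_hat[axis] = sign              # outward unit normal of this face
            for i in range(n):
                for j in range(n):
                    pt = list(center)
                    pt[axis] += sign * half[axis]
                    pt[u] += -half[u] + (i + 0.5) * du
                    pt[w] += -half[w] + (j + 0.5) * dw
                    yield n_hat, pt, du * dw

C = 2.0                                     # hypothetical uniform flow speed along x

def rho(r, t):
    """Density pattern carried along x at speed C (hypothetical)."""
    return 1.0 + 0.5 * math.sin(r[0] - C * t)

def mass_flux(r, t):
    """rho * v for the uniform velocity v = (C, 0, 0)."""
    return (rho(r, t) * C, 0.0, 0.0)

def div_rho_v(center, t, half=(1e-3, 1e-3, 1e-3), n=2):
    """Mass flowing out of the box per unit time, divided by the box volume."""
    flux = 0.0
    for n_hat, pt, dS in patches(center, half, n):
        flux += sum(nk * fk for nk, fk in zip(n_hat, mass_flux(pt, t))) * dS
    return flux / (8 * half[0] * half[1] * half[2])

r0, t0, dt = (0.4, 0.1, -0.3), 0.25, 1e-5
outflow_per_volume = div_rho_v(r0, t0)
rate_of_reduction = -(rho(r0, t0 + dt) - rho(r0, t0 - dt)) / (2 * dt)
print(outflow_per_volume, rate_of_reduction)
```

The two numbers should agree to within the discretization error of the small box and the small time step.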

The partial derivative ∂ρ/∂t in (7d) must be distinguished from the material derivative dρ/dt, which is evaluated at a point that moves with the fluid.[e] [Similarly, dv/dt in (6g) is the material acceleration, because it is the acceleration of the mobile mass, not of a fixed point!] To re-derive the equation of continuity in terms of the material derivative, the volume v dt ⸱ n̂ dS, which flows out through dS in time dt (as above), is integrated over δS to obtain the increase in volume of the mass initially contained in dV. Dividing this by the mass, ρ dV, gives the increase in specific volume (1/ρ) of that mass, and then dividing by dt gives the rate of change of specific volume; that is,

\[ \frac{1}{\rho\,dV}\iint_{\delta S}\hat{\mathbf n}\cdot\mathbf v\,dS = \frac{d}{dt}\!\left(\frac{1}{\rho}\right) = -\frac{1}{\rho^2}\,\frac{d\rho}{dt} . \]

Multiplying by ρ² and comparing the left side with (4d), we obtain

\[ \rho\,\operatorname{div}\mathbf v = -\frac{d\rho}{dt} . \tag{7d'} \]

Whereas (7d) shows that div ρv is unambiguous, (7d') shows that div v is unambiguous (provided that the right-hand sides are locally continuous). In accordance with the everyday meaning of "divergence", (7d') also shows that div v is positive if the fluid is expanding (ρ decreasing), negative if it is contracting (ρ increasing), and zero if it is incompressible. In the last case, the equation of continuity reduces to

\[ \operatorname{div}\mathbf v = 0 \qquad [\text{for an incompressible fluid}]. \tag{7i} \]
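By way of contrast with the incompressible case, here is a short worked example of (7d') for an expanding fluid. The flow is hypothetical: every particle moves directly away from the origin at a speed proportional to its distance, with a constant rate k, and the density is assumed to be uniform in space (so that its material and partial time-derivatives coincide). The closed surface is taken to be a sphere of radius a centered on the origin; a uniform velocity added to v would contribute no net flux, so the same value of div v is obtained for a sphere centered anywhere.

```latex
\begin{align*}
  % assumed flow: uniform expansion at constant rate k, density uniform in space
  \mathbf v &= k\,\mathbf r , \qquad \rho = \rho(t) ; \\
  % flux out of a sphere of radius a, divided by its volume, as in (4d)
  \operatorname{div}\mathbf v
    &= \frac{1}{\tfrac43\pi a^3}\iint_{\delta S}\hat{\mathbf n}\cdot k\,\mathbf r\;dS
     = \frac{k a\cdot 4\pi a^2}{\tfrac43\pi a^3} = 3k ; \\
  % substitute into (7d')
  3k\,\rho &= -\frac{d\rho}{dt}
    \quad\Longrightarrow\quad \rho(t) = \rho_0\,e^{-3kt} ; \\
  % check: a sphere moving with the fluid has radius a_0 e^{kt}, so its mass is constant
  \rho(t)\cdot\tfrac43\pi\bigl(a_0 e^{kt}\bigr)^3 &= \rho_0\cdot\tfrac43\pi a_0^3 .
\end{align*}
```

The divergence is positive and the density falls, in keeping with the everyday meaning of "divergence"; setting k = 0 recovers (7i).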

For incompressible flow, any tubular surface tangential to the flow velocity, and consequently with no flow in or out of the "tube", has the same volumetric flow rate across all cross-sections of the "tube", as if the surface were the wall of a pipe full of liquid (except that the surface is not necessarily stationary). Accordingly, a vector field with zero divergence is described as solenoidal (from the Greek word for "pipe"). More generally, a solenoidal vector field has the property that for any tubular surface tangential to the field, the flux integrals across any two cross-sections of the "tube" are the same—because otherwise there would be a net flux integral out of the closed surface comprising the two cross-sections and any segment of tube between them, in which case, by the divergence theorem (5d), the divergence would have to be non-zero somewhere inside, contrary to (7i).

Unambiguity of the curl (and gradient)


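The unambiguity of the curl (4c) follows from the unambiguity of the divergence. Taking dot-products of (4c) with an arbitrary constant vector b, we get

\[ \operatorname{curl}\mathbf q\cdot\mathbf b = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\times\mathbf q\cdot\mathbf b\;dS = \frac{1}{dV}\iint_{\delta S}\hat{\mathbf n}\cdot\mathbf q\times\mathbf b\;dS ; \]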

that is, by (4d),

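\[ \operatorname{curl}\mathbf q\cdot\mathbf b = \operatorname{div}\,(\mathbf q\times\mathbf b) \qquad [\text{for uniform } \mathbf b]. \tag{8c} \]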

(The parentheses around  q × b  on the right, although helpful because of the spacing, are not strictly necessary, because the alternative binding would be (div q), which is a scalar, whose cross-product with the vector b is not defined. And the left-hand expression does not need parentheses, because it can only mean the dot-product of a curl with the vector b; it cannot mean the curl of a dot-product, because the curl of a scalar field is not defined.) This result (8c) is an identity if the vector b is independent of location, so that it can be taken inside or outside the surface integral; thus b may be a uniform vector field, and may be time-dependent. If we make b a unit vector, the left side of the identity is the (scalar) component of curl q in the direction of b, and the right side is unambiguous. Thus the curl is unambiguous because its component in any direction is unambiguous. This confirms theorem (5c).
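Identity (8c) can be exercised on the velocity field of a rigid body rotating about the origin, for which the curl is known to be twice the angular velocity. In the sketch below, the angular velocity `OMEGA`, the uniform vector `B`, the box used as δS, and the helper names are hypothetical; the curl is computed from the skew surface integral per unit volume, and the divergence of q × b from the flux integral per unit volume.

```python
def dot(a, b):
    """Dot-product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross-product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def patches(center, half, n):
    """Yield (n_hat, point, dS) for n*n midpoint patches on each face of a box."""
    for axis in range(3):
        u, w = [k for k in range(3) if k != axis]
        du, dw = 2 * half[u] / n, 2 * half[w] / n
        for sign in (-1.0, 1.0):
            n_hat = [0.0, 0.0, 0.0]
            n_hat[axis] = sign              # outward unit normal of this face
            for i in range(n):
                for j in range(n):
                    pt = list(center)
                    pt[axis] += sign * half[axis]
                    pt[u] += -half[u] + (i + 0.5) * du
                    pt[w] += -half[w] + (j + 0.5) * dw
                    yield n_hat, pt, du * dw

def surface_ratio(integrand, center, half, n=2):
    """Sum of integrand(n_hat, point) * dS over the box surface, divided by the volume.
    The integrand returns a sequence of components (length 1 for a scalar)."""
    total = None
    for n_hat, pt, dS in patches(center, half, n):
        term = [c * dS for c in integrand(n_hat, pt)]
        total = term if total is None else [s + c for s, c in zip(total, term)]
    volume = 8 * half[0] * half[1] * half[2]
    return [s / volume for s in total]

OMEGA = (0.3, -0.4, 1.2)                    # hypothetical angular velocity
B = (1.0, 2.0, -0.5)                        # hypothetical uniform vector b

def q(r):
    """Velocity field of a rigid body rotating about the origin."""
    return cross(OMEGA, r)

r0, half = (0.2, -0.1, 0.3), (0.05, 0.02, 0.04)
curl_q = surface_ratio(lambda n_hat, pt: cross(n_hat, q(pt)), r0, half)
div_q_cross_b = surface_ratio(
    lambda n_hat, pt: (dot(n_hat, cross(q(pt), B)),), r0, half)[0]
curl_q_dot_b = dot(curl_q, B)
print(curl_q, curl_q_dot_b, div_q_cross_b)
```

Because the field is linear in position, the midpoint patches introduce no discretization error, and the computed curl should equal 2·`OMEGA` to rounding for any box.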

Similarly, the unambiguity of the divergence implies the unambiguity of the gradient. Starting with (4g), taking dot-products with an arbitrary uniform vector b, and proceeding as above, we obtain

\[ \nabla p\cdot\mathbf b = \operatorname{div}\,p\,\mathbf b \qquad [\text{for uniform } \mathbf b]. \tag{8g} \]

(The left-hand side does not need parentheses, because it can only mean the dot-product of a gradient with the vector b; it cannot mean the gradient of the dot-product of a scalar field with a vector field, because that dot-product would not be defined.) If we make b a unit vector, this result (8g) says that the (scalar) component of ∇p in the direction of b is given by the right-hand side, which again is unambiguous. So here we have a second explanation of the unambiguity of the gradient: like the curl, it is unambiguous because its component in any direction is unambiguous.

We might well ask what happens if we take cross-products with b on the left, instead of dot-products. If we start with (4g), the process is straightforward: in the end we can switch the order of the cross-product on the left, and change the sign on the right, obtaining

$$\nabla p \times \mathbf{b} = \operatorname{curl}(p\,\mathbf{b}) \qquad [\text{for uniform } \mathbf{b}]. \tag{8p}$$

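(Again no parentheses are needed.) If we start with (4c) instead, and take b inside the integral, we get a vector triple product to expand, which leads to

$$\mathbf{b} \times \operatorname{curl}\mathbf{q} = \frac{1}{dV}\oiint_{\delta S} \hat{\mathbf{n}}\,(\mathbf{b}\cdot\mathbf{q})\,dS \;-\; \frac{1}{dV}\oiint_{\delta S} (\mathbf{b}\cdot\hat{\mathbf{n}})\,\mathbf{q}\,dS ,$$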

in which the first term on the right is simply ∇ b⸱q (the gradient of the dot-product). The second term is more problematic. If we had a scalar p instead of the vector q, we could take b outside the second integral, so that the second term would be (minus) b⸱∇p. This suggests that the actual second term should be (minus) b⸱∇q. Shall we therefore adopt the second term (without the sign) as the definition of b⸱∇q for a vector q (treating b⸱∇ as an operator), and write

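$$\mathbf{b}\times\operatorname{curl}\mathbf{q} = \nabla(\mathbf{b}\cdot\mathbf{q}) - \mathbf{b}\cdot\nabla\mathbf{q} \qquad [\text{for uniform } \mathbf{b}]\;? \tag{8q}$$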

The proposal would be open to the objection that  b⸱∇ q  had been defined only for uniform b , whereas  b ⸱ ∇p (for scalar p) is defined whether b is uniform or not.  So, for the moment, let us put (8q) aside and run with (8c), (8g), and (8p).
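
As an independent sanity check (not part of the coordinate-free argument), identities (8c), (8g), and (8p) can be tested numerically. The following Python sketch uses Cartesian central differences; the sample fields p and q, the uniform vector b, the evaluation point r0, and all helper names are assumptions chosen purely for illustration.

```python
import math

def d(f, r, i, h=1e-5):
    """Central-difference derivative of the scalar function f along axis i at r."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return (f(rp) - f(rm)) / (2 * h)

def grad(p, r):
    return [d(p, r, i) for i in range(3)]

def div(q, r):
    return sum(d(lambda s, i=i: q(s)[i], r, i) for i in range(3))

def curl(q, r):
    c = lambda i: (lambda s: q(s)[i])   # i-th Cartesian component of q
    return [d(c(2), r, 1) - d(c(1), r, 2),
            d(c(0), r, 2) - d(c(2), r, 0),
            d(c(1), r, 0) - d(c(0), r, 1)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

# Hypothetical sample data (any smooth fields would do).
p = lambda r: r[0] ** 2 * r[1] + math.sin(r[2])              # scalar field
q = lambda r: [r[1] * r[2], r[0] ** 2, math.cos(r[0]) * r[1]]  # vector field
b = [0.3, -1.2, 2.0]                                          # uniform vector
r0 = [0.7, -0.4, 1.1]                                         # evaluation point

pb = lambda r: [p(r) * bi for bi in b]   # the field p b
qxb = lambda r: cross(q(r), b)           # the field q x b

err_8c = abs(dot(curl(q, r0), b) - div(qxb, r0))
err_8g = abs(dot(grad(p, r0), b) - div(pb, r0))
err_8p = max(abs(u - v) for u, v in zip(cross(grad(p, r0), b), curl(pb, r0)))
print(err_8c, err_8g, err_8p)   # discrepancies at the level of finite-difference error
```

The check says nothing about a non-uniform b, for which the identities are not claimed.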

Another meaning of the gradient


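Let ŝ be a unit vector in a given direction, and let s be a parameter measuring distance (arc length) along a path in that direction. By equation (8g) and definition (4d), we have

$$\hat{\mathbf{s}}\cdot\nabla p = \operatorname{div}(p\,\hat{\mathbf{s}}) = \frac{1}{dV}\oiint_{\delta S} \hat{\mathbf{n}}\cdot\hat{\mathbf{s}}\,p\,dS ,$$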

where, by the unambiguity of the divergence, the shape of the closed surface δS enclosing dV can be chosen for convenience. So let δS be a right cylinder with cross-sectional area α and perpendicular height ds, with the path passing perpendicularly through the end-faces at parameter-values s and s+ds, where the outward unit normal n̂ consequently takes the values −ŝ and ŝ, respectively. And let the cross-sectional dimensions be small compared with ds so that the values of p at the end-faces, say p and p+dp, can be taken to be the same as where the end-faces cut the path. Then dV = α ds, and the surface integral over δS includes only the contributions from the end-faces (because n̂ is perpendicular to ŝ elsewhere); those contributions are respectively −ŝ⸱ŝ p α and ŝ⸱ŝ (p+dp) α, i.e. −p α and (p+dp) α. With these substitutions the above equation becomes

$$\hat{\mathbf{s}}\cdot\nabla p = \frac{1}{\alpha\,ds}\big[-p\,\alpha + (p+dp)\,\alpha\big] = \frac{dp}{ds} ,$$

that is,

$$\hat{\mathbf{s}}\cdot\nabla p = \partial_s p , \tag{9g}$$

where the right-hand side, commonly called the directional derivative of p in the ŝ direction,[10] is the derivative of p w.r.t. distance in that direction. Although (9g) has been obtained by taking that direction as fixed, the equality is evidently maintained if s measures arc length along any path tangential to ŝ at the point of interest.

Equation (9g) is an alternative definition of the gradient: it says that the gradient of p is the vector whose scalar component in any direction is the directional derivative of p in that direction. For real p, this component has its maximum, namely |∇p|, in the direction of ∇p; thus the gradient of p is the vector whose direction is that in which the derivative of p w.r.t. distance is a maximum, and whose magnitude is that maximum. This is the usual conceptual definition of the gradient.[11] Sometimes it is convenient to work directly from this definition. For example, in Cartesian coordinates (x, y, z), if a scalar field is given by x, its gradient is obviously the unit vector in the direction of the x axis, usually called i; that is, ∇x = i. Similarly, if r = r r̂ is the position vector, then ∇r = r̂.
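
To make (9g) concrete, the following Python sketch compares ŝ⸱∇p with a finite-difference derivative of p along ŝ for a few directions, including the direction of ∇p itself, where the derivative should equal |∇p|. The field p, its hand-derived gradient, the point r0, and the function names are illustrative assumptions, not taken from the text.

```python
import math

def p(r):
    x, y, z = r
    return x * x * y + 3.0 * z          # arbitrary sample scalar field

def grad_p(r):
    x, y, z = r
    return [2 * x * y, x * x, 3.0]      # gradient of the sample field, by hand

def directional_derivative(f, r, s_hat, h=1e-6):
    """Derivative of f w.r.t. distance along the unit vector s_hat (central difference)."""
    fwd = [ri + h * si for ri, si in zip(r, s_hat)]
    bwd = [ri - h * si for ri, si in zip(r, s_hat)]
    return (f(fwd) - f(bwd)) / (2 * h)

def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

r0 = [1.0, 2.0, -0.5]
g = grad_p(r0)
g_mag = math.sqrt(sum(c * c for c in g))

for v in ([1, 0, 0], [1, 1, 1], [-2, 0.5, 3], g):
    s_hat = unit(v)
    lhs = sum(si * gi for si, gi in zip(s_hat, g))   # s_hat . grad p
    rhs = directional_derivative(p, r0, s_hat)       # dp/ds along s_hat
    print(s_hat, lhs, rhs)
```

In the last iteration ŝ is parallel to ∇p, so both printed values should agree with g_mag.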

If ŝ is tangential to a level surface of p (a surface of constant p), then ∂s p in that direction is zero, in which case (9g) says that ∇p (if not zero) is orthogonal to ŝ. So ∇p is orthogonal to the surfaces of constant p (as we would expect, having just shown that the direction of ∇p is that in which p varies most steeply). This result leads to a method of finding a vector normal to a curved surface at a given point: if the equation of the surface is f(r) = C, where r is the position vector and C is a constant (possibly zero), a suitable vector is ∇f evaluated at the given point.

If p is uniform (that is, if it has no spatial variation), then its derivative w.r.t. distance in every direction is zero; that is, the component of ∇p in every direction is zero, so that ∇p must be the zero vector. In short, the gradient of a uniform scalar field is zero. Conversely, if p is not uniform, there must be some location and some direction in which its derivative w.r.t. distance, if defined at all, is non-zero, so that its gradient, if defined at all, is also non-zero. Thus a scalar field with zero gradient in some region is uniform in that region.

Unambiguity of the Laplacian


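Armed with our new definition of the gradient (9g), we can revisit our definition of the Laplacian (4L). If ψ is a scalar field, then, by (9g), ∂n ψ can be replaced by n̂⸱∇ψ in (4L), which then becomes

$$\triangle\psi = \frac{1}{dV}\oiint_{\delta S}\hat{\mathbf{n}}\cdot\nabla\psi\,dS , \tag{9L}$$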

that is, by definition (4d),

$$\triangle\psi = \operatorname{div}\nabla\psi \qquad [\text{for scalar } \psi]. \tag{9L'}$$

So the Laplacian of a scalar field is the divergence of the gradient. This is the usual introductory definition of the Laplacian—and on its face is applicable only in the case of a scalar field. The unambiguity of the Laplacian, in this case, follows from the unambiguity of the divergence and the gradient.
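
The surface-integral definition can be tried out numerically. The sketch below takes δS to be a small cube (an arbitrary choice, which the unambiguity argument permits), sums the outward normal derivative over the six faces with a one-point rule per face, divides by the volume, and compares the result with the Laplacian worked out by hand. The field psi, the cube size, and the function names are assumptions made for illustration.

```python
import math

def psi(r):
    x, y, z = r
    return x ** 3 + x * y ** 2 + math.sin(z)    # arbitrary sample scalar field

def laplacian_exact(r):
    x, y, z = r
    return 6 * x + 2 * x - math.sin(z)          # second derivatives, by hand

def normal_derivative(f, r, axis, sign, h=1e-5):
    """Derivative of f at r w.r.t. distance along the outward normal sign * e_axis."""
    fwd, bwd = list(r), list(r)
    fwd[axis] += h
    bwd[axis] -= h
    return sign * (f(fwd) - f(bwd)) / (2 * h)

def laplacian_from_cube(f, r, a=1e-2):
    """Closed-surface integral of the outward normal derivative over a cube of
    side a centered on r, per unit volume (one sample point per face)."""
    total = 0.0
    for axis in range(3):
        for sign in (+1, -1):
            center = list(r)
            center[axis] += sign * a / 2        # center of this face
            total += normal_derivative(f, center, axis, sign) * a * a
    return total / a ** 3

r0 = [0.5, -1.0, 0.3]
print(laplacian_from_cube(psi, r0), laplacian_exact(r0))
```

The two printed values should agree to within the discretization error, which shrinks as the cube side a is reduced (until rounding error takes over).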

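If, on the contrary, ψ in definition (4L) is a vector field, then we can again take dot-products with a uniform vector b, obtaining

$$(\triangle\psi)\cdot\mathbf{b} = \frac{1}{dV}\oiint_{\delta S}\partial_n(\psi\cdot\mathbf{b})\,dS = \triangle(\psi\cdot\mathbf{b}) .$$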

If we make b a unit vector, this says that the scalar component of the Laplacian of a vector field, in any direction, is the Laplacian of the scalar component of that vector field in that direction. As we have just established that the latter is unambiguous, so is the former.

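But the unambiguity of the Laplacian can be generalized further. If

$$\psi = \sum_{i=1}^{k} \alpha_i\,\psi_i ,$$

where each ψi is a scalar field, and each αi is a constant, and the counter i ranges from (say) 1 to k, then it is clear from (4L) that

$$\triangle\psi = \sum_{i=1}^{k} \alpha_i\,\triangle\psi_i . \tag{10}$$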

In words, this says that the Laplacian of a linear combination of fields is the same linear combination of the Laplacians of the same fields; or, more concisely, that the Laplacian is linear. I say "it is clear" because the Laplacian as defined by (4L) is itself a linear combination, so that (10) merely asserts that we can regroup the terms of a nested linear combination; the gradient, curl, and divergence as defined by (4g) to (4d) are likewise linear. It follows from (10) that the Laplacian of a linear combination of fields is unambiguous if the Laplacians of the separate fields are unambiguous. Now we have supposed that the fields ψi are scalar and that the coefficients αi are constants. But the same logic applies if the "constants" are uniform basis vectors (e.g., i, j, k), so that the "linear combination" can represent any vector field, whence the Laplacian of any vector field is unambiguous. And the same logic applies if the "constants" are chosen as a "basis" for a space of tensors of any order, so that the Laplacian of any tensor field of that order is unambiguous, and so on. In short, the Laplacian of any field that we can express with a uniform basis is unambiguous.

The dot-del, del-cross, and del-dot operators


The gradient operator ∇ is also called del.[f] If it simply denotes the gradient, we tend to pronounce it "grad" in order to emphasize the result. But it can also appear in combination with other operators to give other results, and in those contexts we tend to pronounce it "del".

One such combination is "dot del", as in "b⸱∇", which we proposed for (8q), but did not quite manage to define satisfactorily for a vector operand. With our new definition of the gradient (9g), we can now make a second attempt. A general vector field q can be written |q| q̂, so that

$$\mathbf{q}\cdot\nabla\psi = |\mathbf{q}|\,\hat{\mathbf{q}}\cdot\nabla\psi .$$

If ψ is a scalar field, we can apply (9g) to the right-hand side, obtaining

$$\mathbf{q}\cdot\nabla\psi = |\mathbf{q}|\,\partial_{s_q}\psi ,$$

where s_q is distance in the direction of q. For scalar ψ, this result is an identity between previously defined quantities. For non-scalar ψ, we have not yet defined the left-hand side, but the right-hand side is still well-defined and self-explanatory (provided that we can differentiate ψ w.r.t. s_q). So we are free to adopt

$$\mathbf{q}\cdot\nabla\psi = |\mathbf{q}|\,\partial_{s_q}\psi , \tag{11}$$

where s_q is distance in the direction of q, as the general definition of the operator q⸱∇, and to interpret it as defining both a unary operator q⸱∇ which operates on a generic field, and a binary operator ⸱∇ which takes a (possibly uniform) vector field on the left and a generic field on the right.

For any vector field q, it follows from (11) that if ψ is a uniform field, then q⸱∇ψ = 0.

For the special case in which q is a unit vector ŝ, with s measuring distance in the direction of ŝ, definition (11) reduces to

$$\hat{\mathbf{s}}\cdot\nabla\psi = \partial_s\psi , \tag{12}$$

which agrees with (9g) but now holds for a generic field ψ [whereas (9g) was for a scalar field, and was derived as a theorem based on earlier definitions]. So ŝ⸱∇, with a unit vector ŝ, is the directional-derivative operator on a generic field; and by (11), q⸱∇ is a scaled directional-derivative operator on a generic field.

In particular, if ŝ = n̂ we have

$$\hat{\mathbf{n}}\cdot\nabla\psi = \partial_n\psi ,$$

which we may substitute into the original definition of the Laplacian (4L) to obtain

$$\triangle\psi = \frac{1}{dV}\oiint_{\delta S}\hat{\mathbf{n}}\cdot\nabla\psi\,dS , \tag{13L}$$

which is just (9L) again, except that it now holds for a generic field.

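If our general definition of the gradient (4g) is also taken as the general definition of the ∇ operator,[12] then, comparing (4g) with (4c), (4d), and (13L), we see that

$$\operatorname{curl}\mathbf{q} = \nabla(\times\,\mathbf{q}) \,; \quad \operatorname{div}\mathbf{q} = \nabla(\cdot\,\mathbf{q}) \,; \quad \triangle\psi = \nabla(\cdot\nabla\psi) ,$$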

where the parentheses may seem to be required on account of the closing dS in (4g).[13] But if we write the factor dS before the integrand, the del operator in (4g) becomes

$$\nabla = \frac{1}{dV}\oiint_{\delta S} dS\;\hat{\mathbf{n}}$$

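if we insist that it is to be read as an operator looking for an operand, and not as a self-contained expression. Then, if we similarly bring forward the dS in (4c), (4d), and (13L), the respective operators become[14]

$$\operatorname{curl} = \nabla\times \,; \quad \operatorname{div} = \nabla\cdot \,; \quad \triangle = \nabla\cdot\nabla \tag{14}$$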

(pronounced "del cross", "del dot", and "del dot del"), of which the last is usually abbreviated as ∇² ("del squared").[15] These notations are ubiquitous.

Another way to obtain the ∇× and ∇⸱ operators (but not ∇²), again inspired by (4g), is to define

$$T(\nabla) = \frac{1}{dV}\oiint_{\delta S} T(\hat{\mathbf{n}})\,dS , \tag{14s}$$

where T is any well-defined function that takes a vector argument. Setting T(∇) to ∇p, ∇ × q, and ∇⸱q in (14s), we obtain respectively ∇p, curl q, and div q as given by (4g) to (4d). But this approach has undesirable side-effects; for example, ∇p becomes synonymous with p∇. Accordingly, Chen-To Tai,[16] on the left of (14s), replaces ∇ with his original symbol, which he calls the "symbolic operator" or the "S-operator" or, later, the "symbolic vector" or the "dummy vector". Tai in his later works (e.g., 1994, 1995) does not tolerate cross- or dot-products involving the del operator, but does tolerate such products involving his symbolic vector (1995, pp. 50–52).

There is a misconception that the operational equivalences in (14) apply only in Cartesian coordinates.[17] Tai does not accept them even in that case. But, because these equivalences have been derived from coordinate-free definitions of the operators, they must remain valid in any coordinate system provided that they are expressed correctly—without (e.g.) inadvertently taking dependent variables inside or outside differentiations.[18] That does not mean that they are always convenient, or easily verified, or conducive to the avoidance of error. But they sometimes make useful mnemonics; e.g., they let us rewrite identities (8c), (8g), and (8p) as

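$$\nabla\times\mathbf{q}\cdot\mathbf{b} = \nabla\cdot(\mathbf{q}\times\mathbf{b}) \,; \quad \nabla p\cdot\mathbf{b} = \nabla\cdot(p\,\mathbf{b}) \,; \quad \nabla p\times\mathbf{b} = \nabla\times(p\,\mathbf{b}) \qquad \text{for uniform } \mathbf{b}. \tag{15}$$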

These would be basic algebraic vector identities if ∇ were an ordinary vector, and one could try to derive them from the "algebraic" behavior of ∇; but they're not, because it isn't, so we didn't! Moreover, these simple "algebraic" rules are for a uniform b, and do not of themselves tell us what to do if b is spatially variable; for example, (8g) is not applicable to (7d).

The advection operator


Variation or transportation of a property of a medium due to motion with the medium is called advection (which, according to its Latin roots, means "carrying to"). Suppose that a medium (possibly a fluid) moves with a velocity field v in some inertial reference frame. Let ψ be a field (possibly a scalar field or a vector field) expressing some property of the medium (e.g., density, or acceleration, or stress,[g]… or even v itself). We have seen that the time-derivative of ψ may be specified in two different ways: as the partial derivative ∂ψ/∂t, evaluated at a fixed point (in the chosen reference frame), or as the material derivative dψ/dt, evaluated at a point moving at velocity v (i.e., with the medium). The difference dψ/dt − ∂ψ/∂t is due to motion with the medium. To find another expression for this difference, let s be a parameter measuring distance along the path traveled by a particle of the medium. Then, for points along the path, the surface-plot of the small change in ψ (or any component thereof) as a function of small changes in t and s (plotted on perpendicular axes) can be taken as a plane through the origin, so that

$$d\psi = \frac{\partial\psi}{\partial t}\,dt + \frac{\partial\psi}{\partial s}\,ds \,;$$

that is, the change in ψ is the sum of the changes due to the change in t and the change in s. Dividing by dt gives

$$\frac{d\psi}{dt} = \frac{\partial\psi}{\partial t} + \frac{\partial\psi}{\partial s}\,\frac{ds}{dt} \,;$$

i.e.,

$$\frac{d\psi}{dt} = \frac{\partial\psi}{\partial t} + |\mathbf{v}|\,\partial_s\psi$$

(and the first term on the right could have been written ∂t ψ). So the second term on the right is the contribution to the material derivative due to motion with the medium; it is called the advective term, and is non-zero wherever a particle of the medium moves along a path on which ψ varies with location, even if ψ at each location is constant over time. So the operator |v| ∂s, where s measures distance along the path, is the advection operator: it maps a property of a medium to the advective term in the time-derivative of that property. If ψ is v itself, the above result becomes

$$\frac{d\mathbf{v}}{dt} = \frac{\partial\mathbf{v}}{\partial t} + |\mathbf{v}|\,\partial_s\mathbf{v} ,$$

where the left-hand side (the material acceleration) is as given by Newton's second law, and the first term on the right (which we might call the "partial" acceleration) is the time-derivative of velocity in the chosen reference frame, and the second term on the right (the advective term) is the correction that must be added to the "partial" acceleration in order to obtain the material acceleration. This term is non-zero wherever velocity is non-zero and varies along a path, even if the velocity at each point on the path is constant over time (as when water speeds up while flowing at a constant volumetric rate into a nozzle). Paradoxically, while the material acceleration and the "partial" acceleration are apparently linear (first-degree) in v, their difference (the advective term) is not. Thus the distinction between ∂ψ/∂t and dψ/dt has the far-reaching implication that fluid dynamics is non-linear.

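Applying (11) to the last two equations, we obtain respectively

$$\frac{d\psi}{dt} = \frac{\partial\psi}{\partial t} + \mathbf{v}\cdot\nabla\psi \tag{16}$$

and

$$\frac{d\mathbf{v}}{dt} = \frac{\partial\mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\mathbf{v} , \tag{16v}$$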

where, in each case, the second term on the right is the advective term. So the advection operator can also be written v⸱∇ .
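
The nozzle remark can be illustrated with a one-dimensional toy model. In the Python sketch below, the steady velocity field v(x) = 1 + x is a hypothetical choice (so the "partial" acceleration is zero everywhere); a particle is followed along its exact trajectory, and its material acceleration is compared with the advective term |v| ∂s v evaluated at the particle's position. All names and numbers are assumptions made for illustration.

```python
import math

def v(x):
    """Steady velocity field along the x axis (no explicit time dependence)."""
    return 1.0 + x

def particle_position(x0, t):
    """Exact solution of dx/dt = v(x) with x(0) = x0, for the field above."""
    return (1.0 + x0) * math.exp(t) - 1.0

def material_acceleration(x0, t, dt=1e-6):
    """dv/dt following the particle (central difference in time)."""
    v_fwd = v(particle_position(x0, t + dt))
    v_bwd = v(particle_position(x0, t - dt))
    return (v_fwd - v_bwd) / (2 * dt)

def advective_term(x, h=1e-6):
    """|v| dv/ds at x; here v > 0 and the path is the x axis, so s = x."""
    dv_ds = (v(x + h) - v(x - h)) / (2 * h)
    return abs(v(x)) * dv_ds

x0, t = 0.2, 0.5
x = particle_position(x0, t)
print(material_acceleration(x0, t), advective_term(x))
```

Because the field is steady, the whole of the particle's acceleration comes from the advective term, so the two printed values should agree.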

When the generic ψ in (16) is replaced by the density ρ, we get a relation between ∂ρ/∂t and dρ/dt, both of which we have seen before, in equations (7d) and (7d') above. Substituting from those equations then gives

$$\operatorname{div}(\rho\,\mathbf{v}) = \rho\operatorname{div}\mathbf{v} + \mathbf{v}\cdot\nabla\rho , \tag{17}$$

where ∇ρ can be taken as a gradient since ρ is scalar. This result is in fact an identity (a product rule for the divergence), as we shall eventually confirm by another method.

Generalized volume-integral theorem


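We can rewrite the fourth integral theorem (5L) in the "dot del" notation as

$$\oiint_S \hat{\mathbf{n}}\cdot\nabla\psi\,dS = \iiint_V \triangle\psi\,dV . \tag{18L}$$

Then, using notations (14), we can condense all four integral theorems (5g), (5c), (5d), and (18L) into the single equation

$$\oiint_S \hat{\mathbf{n}}\ast\psi\,dS = \iiint_V \nabla\ast\psi\,dV , \tag{19}$$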

where the wildcard ∗ (conveniently pronounced "star") is a generic binary operator which may be replaced by a null (direct juxtaposition of the operands) for theorem (5g), or a cross for (5c), or a dot for (5d), or ⸱∇ for (18L). This single equation is a generalized volume-integral theorem, relating an integral over a volume to an integral over its enclosing surface.[19]

Theorem (19) is based on the following definitions, which have been found unambiguous:

  • the gradient of a scalar field p is the closed-surface integral of  n̂ p per unit volume, where n̂ is the outward unit normal;
  • the curl of a vector field is the skew surface integral per unit volume, also called the surface circulation per unit volume;
  • the divergence of a vector field is the outward flux integral per unit volume; and
  • the Laplacian is the closed-surface integral of the outward normal derivative, per unit volume.

The gradient maps a scalar field to a vector field; the curl maps a vector field to a vector field; the divergence maps a vector field to a scalar field; and the Laplacian maps a scalar field to a scalar field, or a vector field to a vector field, etc.

The gradient of p, as defined above, has been shown to be also

  • the vector whose (scalar) component in any direction is the directional derivative of p in that direction (i.e. the derivative of p w.r.t. distance in that direction), and
  • the vector whose direction is that in which the directional derivative of p is a maximum, and whose magnitude is that maximum.

Consistent with these alternative definitions of the gradient, we have defined the ⸱∇ operator so that ŝ⸱∇ (for a unit vector ŝ) is the operator yielding the directional derivative in the direction of ŝ, and we have used that notation to bring theorem (5L) under theorem (19).

So far, we have said comparatively little about the curl. That imbalance will now be rectified.

Closed-circuit integrals per unit area


Instant integral theorems (on a condition)


Theorems (5g) to (5L) are three-dimensional: each of them relates an integral over a volume V  to an integral over its enclosing surface S. We now seek analogous two-dimensional theorems, each of which relates an integral over a surface segment to an integral around its enclosing curve. For maximum generality, the surface segment should be allowed to be curved into a third dimension.[h] Theorems of this kind can be obtained as special cases of theorems (5g) to (5L) by suitably choosing V and S ; this is another advantage of our "volume first" approach.

Let Σ be a surface segment enclosed by a curve C (a circuit or closed contour), and let l be a parameter measuring arc length around C, so that a general element of C has length dl; and let a general element of the surface Σ have area dΣ. Let ν̂ be the unit normal vector at a general point on Σ, and let t̂ be the unit tangent vector to C at a general point on C in the direction of increasing l. In the original case of a surface enclosing a volume, we had to decide whether the unit normal pointed into or out of the volume (we chose the latter). In the present case of a circuit enclosing a surface segment, we have to decide whether l is measured clockwise or counterclockwise as seen when looking in the direction of the unit normal, and we choose clockwise. So l is measured clockwise about ν̂ and C is traversed clockwise about ν̂.

From Σ we can construct obvious candidates for V and S. From every point on Σ, erect a perpendicular with a uniform small height h in the direction of ν̂. Then simply let V be the volume occupied by all the perpendiculars, and let S be its enclosing surface. Thus V is a (generally curved) thin slab of uniform thickness h, whose enclosing surface S consists of two close parallel (generally curved) broad faces connected by a perpendicular edge-face of uniform height h; and we can treat ν̂ as a vector field by extrapolating it perpendicularly from Σ. If we can arrange for h to cancel out, the volume V will serve as a 3D representation of the surface segment Σ while the edge-face will serve as a 2D representation of the curve C, so that our four theorems will relate an integral around C to an integral over Σ, provided that there is no contribution from the broad faces to the integral over S. For brevity, let us call this proviso the 2D condition.

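If the 2D condition is satisfied, an integral over the new S reduces to an integral over the edge-face, on which

$$dS = h\,dl ,$$

so that the cancellation of h will leave an integral over C w.r.t. length. Meanwhile, in an integral over the new V, regardless of the 2D condition, we have

$$dV = h\,d\Sigma ,$$

so that the cancellation of h will leave an integral over Σ w.r.t. area. So, substituting for dS and dV in (5g) to (5L), and canceling h as planned, we obtain respectively

$$\oint_C \hat{\mathbf{n}}\,p\,dl = \iint_\Sigma \nabla p\,d\Sigma \tag{20g}$$

$$\oint_C \hat{\mathbf{n}}\times\mathbf{q}\,dl = \iint_\Sigma \operatorname{curl}\mathbf{q}\,d\Sigma \tag{20c}$$

$$\oint_C \hat{\mathbf{n}}\cdot\mathbf{q}\,dl = \iint_\Sigma \operatorname{div}\mathbf{q}\,d\Sigma \tag{20d}$$

$$\oint_C \partial_n\psi\,dl = \iint_\Sigma \triangle\psi\,d\Sigma \tag{20L}$$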

all subject to the 2D condition. In each equation, the circle on the left integral sign acknowledges that the integral is around a closed loop. The unit vector n̂, which was normal to the edge-face, is now normal to both t̂ and ν̂; that is, n̂ is tangential to the surface segment Σ and projects perpendicularly outward from its bounding curve.

On the left side of (20g), the 2D condition is satisfied if (but not only if) n̂p takes equal-and-opposite values at any two opposing points on opposing broad faces of S, i.e. if p takes the same value at such points, i.e. if p has a zero directional derivative normal to Σ, i.e. if ∇p has no component normal to Σ. Thus a sufficient "2D condition" for (20g) is the obvious one.

Skipping forward to (20L), we see that the 2D condition is satisfied if ∂n ψ takes equal-and-opposite values at any two opposing points on opposing broad faces of S, i.e. if ∂ν ψ (where ν measures distance in the direction of ν̂) takes the same value at such points, i.e. if ∂²ψ/∂ν² = 0.

For (20c) and (20d), the 2D constraint can be satisfied by construction, with more useful results, as explained under the next two headings. To facilitate this process, we first make a minor adjustment to Σ and C. Noting that any curved surface segment can be approximated to any desired accuracy by a polyhedral surface enclosed by a polygon, we shall indeed consider Σ to be a polyhedral surface made up of small planar elements, dΣ being the area of a general element, and we shall indeed consider C to be a polygon with short sides, dl being the length of a general side.[i] The benefit of this trick, as we shall see, is to make the unit normal ν̂ uniform over each surface element, without forcing us to treat q (or any other field) as uniform over the same element. But, as the elements of C can independently be made as short as we like (dividing straight sides into shorter elements if necessary!), we can still consider ν̂, q, and t̂ to be uniform over each element of C.

Special case for the gradient


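In (20c), the 2D condition is satisfied by q = ν̂ p (where p is a scalar field), because then the integrand on the left is zero on the broad faces of S, where n̂ is parallel to ν̂. Equation (20c) then becomes

$$\oint_C \hat{\mathbf{n}}\times\hat{\boldsymbol{\nu}}\,p\,dl = \iint_\Sigma \operatorname{curl}(\hat{\boldsymbol{\nu}}\,p)\,d\Sigma . \tag{21n}$$

Now on the left, n̂ × ν̂ = −t̂; and on the right, over each surface element, the unit normal ν̂ is uniform so that, by (8p), curl(ν̂ p) = ∇p × ν̂ = −ν̂ × ∇p. With these substitutions, the minus signs cancel and we get

$$\oint_C \hat{\mathbf{t}}\,p\,dl = \iint_\Sigma \hat{\boldsymbol{\nu}}\times\nabla p\,d\Sigma \tag{21g}$$

or, if we write dr = t̂ dl and (for the vector element of area) dΣ = ν̂ dΣ,

$$\oint_C p\,d\mathbf{r} = \iint_\Sigma d\boldsymbol{\Sigma}\times\nabla p . \tag{21r}$$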

This result, although well attested in the literature,[20] does not seem to have a name—unlike the next result.

Special case for the curl


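In (20d), the 2D condition is satisfied if q is replaced by ν̂ × q, because then (again) the integrand on the left is zero on the broad faces of S, where n̂ is parallel to ν̂. Equation (20d) then becomes

$$\oint_C \hat{\mathbf{n}}\cdot\hat{\boldsymbol{\nu}}\times\mathbf{q}\,dl = \iint_\Sigma \operatorname{div}(\hat{\boldsymbol{\nu}}\times\mathbf{q})\,d\Sigma . \tag{22n}$$

Now on the left, the integrand can be written n̂ × ν̂ ⸱ q = −t̂ ⸱ q; and on the right, div(ν̂ × q) = −div(q × ν̂) = −curl q ⸱ ν̂ by identity (8c), since ν̂ is uniform over each surface element. With these substitutions, the minus signs cancel and we get

$$\oint_C \mathbf{q}\cdot\hat{\mathbf{t}}\,dl = \iint_\Sigma \operatorname{curl}\mathbf{q}\cdot\hat{\boldsymbol{\nu}}\,d\Sigma \tag{22c}$$

or, if we again write dr = t̂ dl and (for the vector element of area) dΣ = ν̂ dΣ,

$$\oint_C \mathbf{q}\cdot d\mathbf{r} = \iint_\Sigma \operatorname{curl}\mathbf{q}\cdot d\boldsymbol{\Sigma} . \tag{22r}$$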

This result—the best-known theorem relating an integral over a surface segment to an integral around its enclosing curve, and the best-known theorem involving the curl—is called Stokes' theorem or, more properly, the Kelvin–Stokes theorem,[21] or simply the curl theorem.[22]

The integral on the left of (22c) or (22r) is called the circulation of the vector field q around the closed curve C. So, in words, the Kelvin–Stokes theorem says that the circulation of a vector field around a closed curve is equal to the flux of the curl of that vector field through any surface spanning that closed curve.
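
As a numerical illustration of this statement, the following Python sketch compares the circulation of a sample planar field around the unit circle with the flux of its curl through the unit disk. The field q, its hand-derived curl, the choice of the unit disk in the plane z = 0 with the unit normal along +z, and the function names are all assumptions made for illustration. With that normal, the traversal prescribed in the text (clockwise when looking along the normal) appears counterclockwise when the xy plane is viewed from the +z side.

```python
import math

def q(x, y):
    """Sample vector field in the plane z = 0 (its z-component is zero)."""
    return (-y ** 3, x ** 3)

def curl_z(x, y):
    """z-component of curl q, by hand: d(x^3)/dx - d(-y^3)/dy."""
    return 3 * x ** 2 + 3 * y ** 2

def circulation(n=2000):
    """Integral of q . t_hat dl around the unit circle (midpoint rule)."""
    total = 0.0
    dtheta = 2 * math.pi / n
    for k in range(n):
        th = (k + 0.5) * dtheta
        x, y = math.cos(th), math.sin(th)
        qx, qy = q(x, y)
        # unit tangent is (-sin th, cos th); element of length is dtheta
        total += (qx * -math.sin(th) + qy * math.cos(th)) * dtheta
    return total

def curl_flux(nr=400, nth=200):
    """Integral of (curl q) . z_hat over the unit disk, in polar cells (midpoint rule)."""
    total = 0.0
    dr, dth = 1.0 / nr, 2 * math.pi / nth
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(nth):
            th = (j + 0.5) * dth
            total += curl_z(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total

print(circulation(), curl_flux(), 1.5 * math.pi)
```

For this particular field both integrals can also be done by hand, giving 3π/2, which is printed as the third value for comparison.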

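Now let a general element of Σ (with area dΣ) be enclosed by the curve δC, traversed in the same direction as the outer curve C. Then, applying (22c) to the single element, we have

$$\oint_{\delta C}\mathbf{q}\cdot\hat{\mathbf{t}}\,dl = \operatorname{curl}\mathbf{q}\cdot\hat{\boldsymbol{\nu}}\,d\Sigma ,$$

that is,

$$\hat{\boldsymbol{\nu}}\cdot\operatorname{curl}\mathbf{q} = \frac{1}{d\Sigma}\oint_{\delta C}\mathbf{q}\cdot\hat{\mathbf{t}}\,dl , \tag{23c}$$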

where the right-hand side is simply the circulation per unit area.

Equation (23c) is an alternative definition of the curl: it says that the curl of q is the vector whose scalar component in any direction is the circulation of q per unit area of a surface whose normal points in that direction. For real q, this component has its maximum, namely |curl q| , in the direction of curl q; thus the curl of q is the vector whose direction is that which a surface must face if the circulation of q per unit area of that surface is to be a maximum, and whose magnitude is that maximum. This is the usual conceptual definition of the curl.[23]

[Notice, however, that our original volume-based definition (4c) is more succinct: the curl is the closed-surface circulation per unit volume, i.e. the skew surface integral per unit volume.]

It should now be clear where the curl gets its name (coined by Maxwell), and why it is also called the rotation (indeed the curl operator is sometimes written "rot", especially in Continental languages, in which "rot" does not have the same unfortunate everyday meaning as in English). It should be similarly unsurprising that a vector field with zero curl is described as irrotational (which one must carefully pronounce differently from "irri tational"!), and that the curl of the velocity of a medium is called the vorticity.

 
Animation of a non-vortex-like velocity field whose curl (like its circulation around the red loop) is non-zero due to shear.
 
Animation of a vortex-like velocity field whose curl is zero because the shear compensates for the rotation.

However, a field does not need to be vortex-like in order to have a non-zero curl; for example, by identity (8p), in Cartesian coordinates, the velocity field x j has a curl equal to ∇x × j = i × j = k, although it describes a shearing motion rather than a rotating motion. This is understandable because if you hold a pencil between the palms of your hands and slide one palm over the other (a shearing motion), the pencil rotates. Conversely, we can have a vortex-like field whose curl is zero everywhere except on or near the axis of the vortex. For example, the Maxwell–Ampère law in magnetostatics says that curl H = J, where H is the magnetizing field and J is the current density.[j] So if the current is confined to a wire, curl H is zero outside the wire, although, as is well known, the field lines circle the wire. The resolution of the paradox is that H gets stronger as we approach the wire, making a shearing pattern, whose effect on the curl counteracts that of the rotation.
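
Both examples can be checked through the circulation-per-unit-area picture of (23c). The Python sketch below computes the circulation around axis-aligned rectangles in the xy plane for the shear field x j and for a field circling the z axis with magnitude 1/r (the form of H around a thin straight wire, up to a constant factor, which is an assumption of this sketch). The rectangles and function names are likewise illustrative choices.

```python
import math

def circulation_rect(q, x0, y0, x1, y1, n=2000):
    """Circulation of the planar field q(x, y) -> (qx, qy) counterclockwise
    around the rectangle [x0, x1] x [y0, y1] (midpoint rule on each side)."""
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        x = x0 + u * (x1 - x0)
        y = y0 + u * (y1 - y0)
        total += q(x, y0)[0] * (x1 - x0) / n    # bottom side, +x direction
        total += q(x1, y)[1] * (y1 - y0) / n    # right side, +y direction
        total -= q(x, y1)[0] * (x1 - x0) / n    # top side, -x direction
        total -= q(x0, y)[1] * (y1 - y0) / n    # left side, -y direction
    return total

def shear(x, y):
    """The field x j: velocity in the y direction, proportional to x."""
    return (0.0, x)

def line_vortex(x, y):
    """Field circling the z axis with magnitude 1/r."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)

area = (3.0 - 1.0) * (2.0 - 0.0)
print(circulation_rect(shear, 1.0, 0.0, 3.0, 2.0) / area)    # circulation per unit area
print(circulation_rect(line_vortex, 1.0, -1.0, 3.0, 1.0))    # loop not enclosing the axis
print(circulation_rect(line_vortex, -1.0, -1.0, 1.0, 1.0))   # loop enclosing the axis
```

The shear field gives a circulation per unit area of 1 (the k component of its curl), while the vortex-like field gives essentially zero circulation around a loop that avoids the axis and 2π around a loop that encloses it.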

The curl-grad and div-curl operators


We have seen from (9L) that the Laplacian of a scalar field is the divergence of the gradient. Four more such second-order combinations make sense, namely the curl of the gradient (of a scalar field), and the divergence of the curl, the gradient of the divergence, and the curl of the curl (of a vector field). The first two —"curl grad" and "div curl"— can now be disposed of.

Let the surface segment Σ enclosed by the curve C be a segment of the closed surface S surrounding the volume V, and let Σ expand across S until it engulfs V, so that C shrinks to a point on the far side of S. Then, in the nameless theorem (21g) and the Kelvin–Stokes theorem (22c), the integral on the left becomes zero while Σ and ν̂ on the right become S and n̂, so that the theorems respectively reduce to

$$\mathbf{0} = \oiint_S \hat{\mathbf{n}}\times\nabla p\,dS$$

and

$$0 = \oiint_S \operatorname{curl}\mathbf{q}\cdot\hat{\mathbf{n}}\,dS .$$

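Applying theorem (5c) to the first of these two equations, and the divergence theorem (5d) to the second, we obtain respectively

$$\iiint_V \operatorname{curl}\nabla p\,dV = \mathbf{0}$$

and

$$\iiint_V \operatorname{div}\operatorname{curl}\mathbf{q}\,dV = 0 .$$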

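As the integrals vanish for any volume V in which the integrands are defined, the integrands must be zero wherever they are defined; that is,

$$\operatorname{curl}\nabla p = \mathbf{0} \tag{24c}$$

and

$$\operatorname{div}\operatorname{curl}\mathbf{q} = 0 . \tag{24d}$$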

In words, the curl of the gradient is zero, and the divergence of the curl is zero; or, more concisely, any gradient is irrotational, and any curl is solenoidal.
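
Identities (24c) and (24d) lend themselves to a quick numerical spot-check. The Python sketch below nests Cartesian central differences to approximate curl ∇p and div curl q at one point; the sample fields, the evaluation point, the step size, and the helper names are assumptions made for illustration. Since the check uses coordinates, it is only a plausibility test of the coordinate-free result, not a substitute for the argument above.

```python
import math

def d(f, r, i, h=1e-3):
    """Central-difference derivative of the scalar function f along axis i at r."""
    rp, rm = list(r), list(r)
    rp[i] += h
    rm[i] -= h
    return (f(rp) - f(rm)) / (2 * h)

def grad(p):
    """Numerical gradient of p, returned as a vector-valued function."""
    return lambda r: [d(p, r, i) for i in range(3)]

def curl(q):
    """Numerical curl of q, returned as a vector-valued function."""
    c = lambda i: (lambda r: q(r)[i])   # i-th Cartesian component of q
    return lambda r: [d(c(2), r, 1) - d(c(1), r, 2),
                      d(c(0), r, 2) - d(c(2), r, 0),
                      d(c(1), r, 0) - d(c(0), r, 1)]

def div(q):
    """Numerical divergence of q, returned as a scalar-valued function."""
    return lambda r: sum(d(lambda s, i=i: q(s)[i], r, i) for i in range(3))

# Hypothetical sample fields and evaluation point.
p = lambda r: math.exp(r[0]) * r[1] ** 2 + math.sin(r[1] * r[2])
q = lambda r: [r[1] * r[2] ** 2, math.cos(r[0] * r[2]), r[0] * r[1] * r[2]]
r0 = [0.3, -0.8, 1.5]

curl_grad = curl(grad(p))(r0)
div_curl = div(curl(q))(r0)
print(curl_grad, div_curl)   # all entries should be zero up to rounding error
```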

We might well ask whether the converses are true. Is every irrotational vector field the gradient of something? And is every solenoidal vector field the curl of something? The answers are affirmative, but the proofs require more preparation.

Meanwhile we may note, as a mnemonic aid, that when the left-hand sides of the last two equations are rewritten in the del-cross and del-dot notations, they become ∇ × ∇p and ∇ ⸱ ∇ × q, respectively. The former looks like (but isn't) a cross-product of two parallel vectors, and the latter looks like (but isn't) a scalar triple product with a repeated factor, so that each expression looks like it ought to be zero (and it is). But such appearances can lead one astray, because ∇ is an operator, not a self-contained vector quantity; for example, ∇p × ∇φ is not identically zero, because two gradients are not necessarily parallel.[24]

We should also note, to tie a loose end, that identity (24d) was to be expected from our verbal statement of the Kelvin–Stokes theorem (22c). That statement implies that the flux of the curl through any two surfaces spanning the same closed curve is the same. So if we make a closed surface from two spanning surfaces, the flux into one spanning surface is equal to the flux out of the other, i.e. the net flux out of the closed surface is zero, i.e. the integral of the divergence over the enclosed volume is zero; and since any simple volume in which the divergence is defined can be enclosed this way, the divergence itself (of the curl) must be zero wherever it is defined.

Change per unit length


Continuing (and concluding) the trend of reducing the number of dimensions, we now seek one-dimensional theorems, each of which relates an integral over a path to values at the endpoints of the path. For maximum generality, the path should be allowed to be curved into a second and a third dimension.

We could do this by further specializing theorems (5g) to (5L). We could take a curve Γ with a unit tangent vector ŝ. At every point on Γ we could mount a circular disk with a uniform small area α, centered on Γ and orthogonal to it. We could let V be the volume occupied by all the disks and let S be its enclosing surface; thus V would be a thin right circular cylinder, except that its axis could be curved. If we could arrange for α to cancel out, our four theorems would indeed be reduced to the desired form, provided that there were no contribution from the curved face of the "cylinder" to the integral over S (the "1D proviso"). But, as it turns out, this exercise yields only one case in which the "1D proviso" can be satisfied by a construction involving ŝ and a general field, and we have already almost discovered that case by a simpler and more conventional argument, which we shall now continue.

Fundamental theorem


Equation (9g) is applicable where p(r) is a scalar field, s is a parameter measuring arc length along a curve Γ, and ŝ is the unit tangent vector to Γ in the direction of increasing s. Let s take the values s1 and s2 at the endpoints of Γ, where the position vector r takes the values r1 and r2 respectively. Then, integrating (9g) w.r.t. s from s1 to s2 and applying the fundamental theorem of calculus, we get

$$\int_{s_1}^{s_2}\nabla p\cdot\hat{\mathbf{s}}\,ds = p(\mathbf{r}_2) - p(\mathbf{r}_1) . \tag{25g}$$

This is our third integral theorem involving the gradient, and the best-known of the three: it is commonly called simply the gradient theorem,[25] or the fundamental theorem of the gradient, or the fundamental theorem of line integrals; it generalizes the fundamental theorem of calculus to a curved path.[26] If we write dr for ŝ ds (the change in the position vector), we get the theorem in the alternative form

$$\int_{\mathbf{r}_1}^{\mathbf{r}_2}\nabla p\cdot d\mathbf{r} = p(\mathbf{r}_2) - p(\mathbf{r}_1) . \tag{25r}$$

As the right-hand side of (25g) or (25r) obviously depends on the endpoints but not on the path in between, so does the integral on the left. This integral is commonly called the work integral of ∇p over the path, because if ∇p is a force, the integral is the work done by the force over the path. So, in words, the gradient theorem says that the change in value of a scalar field from one point to another is the work integral of the gradient of that field over any path from the one to the other.
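
The path-independence can be seen numerically. The following Python sketch approximates the work integral of ∇p along two different paths with the same endpoints and compares both results with p(r2) − p(r1). The scalar field, its hand-derived gradient, the endpoints, the two parametrized paths, and the function names are assumptions chosen for illustration.

```python
import math

def p(r):
    x, y, z = r
    return x * x * y + math.sin(z)              # arbitrary sample scalar field

def grad_p(r):
    x, y, z = r
    return (2 * x * y, x * x, math.cos(z))      # its gradient, by hand

def work_integral(field, path, n=20000):
    """Approximate the integral of field . dr along path(u), 0 <= u <= 1,
    sampling the field at the parameter midpoint of each short segment."""
    total = 0.0
    for k in range(n):
        a = path(k / n)
        b = path((k + 1) / n)
        mid = path((k + 0.5) / n)
        dr = [bi - ai for ai, bi in zip(a, b)]
        total += sum(f * c for f, c in zip(field(mid), dr))
    return total

r1, r2 = (0.0, 1.0, 0.0), (2.0, 3.0, 1.0)

def straight(u):
    """Straight line from r1 to r2."""
    return tuple(a + u * (b - a) for a, b in zip(r1, r2))

def wiggly(u):
    """A different path with the same endpoints (the added terms vanish at u = 0 and 1)."""
    x, y, z = straight(u)
    return (x + math.sin(math.pi * u),
            y - 2 * math.sin(2 * math.pi * u),
            z + u * (1 - u))

exact = p(r2) - p(r1)
print(work_integral(grad_p, straight), work_integral(grad_p, wiggly), exact)
```

All three printed values should agree to within the discretization error of the sum.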

Applying (25r) to a single element of the curve, we get

$$\nabla p\cdot d\mathbf{r} \,=\, dp \tag{26g}$$

which is reminiscent of (dy/dx) dx = dy in elementary calculus.[27] Alternatively, we could have obtained (26g) by multiplying both sides of (9g) by ds, and then obtained (25r) by adding (26g) over all the elemental displacements dr on any path from r1 to r2.

If we close the path by setting r2 = r1, the gradient theorem reduces to

$$\oint \nabla p\cdot d\mathbf{r} \,=\, 0 \tag{27g}$$

where the integral is around any closed loop. Applying the Kelvin–Stokes theorem then gives

$$\iint_\Sigma \left(\operatorname{curl}\nabla p\right)\cdot\hat{\mathbf{n}} \; d\Sigma \,=\, 0 \tag{28g}$$

where Σ is any surface spanning the loop and n̂ is the unit normal to Σ. As this applies to any loop spanned by any surface on which the integrand is defined, curl ∇p must be zero wherever it is defined. This is a second proof (more conventional than the first) of theorem (24c).

Scalar potential: field with given gradient


Lemma: If curl q = 0 in a simply connected region V, then ∫ q ⸱ dr over any path in V depends only on the endpoints of the path.

Proof: Suppose, on the contrary, that there are two paths Γ and Λ in V, with a common starting point and a common finishing point, such that

$$\int_\Gamma \mathbf{q}\cdot d\mathbf{r} \;\neq\; \int_\Lambda \mathbf{q}\cdot d\mathbf{r}$$

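Let −Λ denote Λ traversed backwards. Then for every dr on Λ there is an equal and opposite dr on −Λ, so that we have

$$\int_\Gamma \mathbf{q}\cdot d\mathbf{r} \;\neq\; -\int_{-\Lambda} \mathbf{q}\cdot d\mathbf{r}$$

i.e.

$$\int_\Gamma \mathbf{q}\cdot d\mathbf{r} \,+ \int_{-\Lambda} \mathbf{q}\cdot d\mathbf{r} \;\neq\; 0$$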

where the left-hand side is now a work integral of q around a closed loop in V.  By the simple connectedness of V,  this loop is spanned by some surface Σ in V.  So we can apply the Kelvin–Stokes theorem and conclude that the flux integral of  curl q  through Σ  is non-zero, in which case  curl q  must be non-zero somewhere on Σ , hence somewhere in V — contradicting the hypothesis of the lemma. ◼
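The role of simple connectedness in the lemma can be illustrated numerically. The sketch below is an added example, not from the original text: it uses the well-known field q = (−y, x, 0)/(x² + y²), whose curl is zero wherever the field is defined, but which is undefined on the z axis, so that its domain is not simply connected. The function names are illustrative.

```python
import math

def vortex(x, y, z):
    """q = (-y, x, 0)/(x^2 + y^2): zero curl wherever it is defined,
    but undefined on the z axis."""
    s = x * x + y * y
    return (-y / s, x / s, 0.0)

def loop_integral(field, center, radius, n=20000):
    """Work integral of `field` once anticlockwise around a circle in the
    plane z = 0, estimated by the midpoint rule."""
    total = 0.0
    dtheta = 2.0 * math.pi / n
    for k in range(n):
        theta = (k + 0.5) * dtheta
        x = center[0] + radius * math.cos(theta)
        y = center[1] + radius * math.sin(theta)
        # Element of displacement dr along the circle
        dx = -radius * math.sin(theta) * dtheta
        dy = radius * math.cos(theta) * dtheta
        qx, qy, _ = field(x, y, 0.0)
        total += qx * dx + qy * dy
    return total

around_axis = loop_integral(vortex, (0.0, 0.0), 1.0)     # loop encircles the z axis
away_from_axis = loop_integral(vortex, (3.0, 0.0), 1.0)  # loop does not
print(around_axis, away_from_axis)
```

The loop that encircles the excluded axis cannot be spanned by a surface lying in the field's domain, and its work integral comes out as 2π rather than zero; the loop that avoids the axis gives zero, as the lemma requires.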

Corollary:  If  curl q = 0  in a simply connected region V,  there exists a scalar field p such that  q = ∇p  in V.

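Proof: We shall show that a suitable candidate is

$$p(\mathbf{r}) \,=\, \int_{\mathbf{r}_0}^{\mathbf{r}} \mathbf{q}(\boldsymbol{\rho})\cdot d\boldsymbol{\rho}$$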

where r0 is the position vector of any fixed point in V, and ρ is the position vector of a general point on the path of integration, which may be any path in V. First note that p(r) is unambiguous because, by the preceding lemma, it is independent of the path for given r0 and r, provided that the path is in V. Now to find ∇p(r), let σ be the arc length along the path from r0 to ρ, so that σ ranges from 0 to (say) s as ρ ranges from r0 to r; and let ŝ be the unit vector tangential to the path at ρ, in the direction of increasing σ. Then dρ = ŝ dσ, so that the above equation becomes

$$p(\mathbf{r}) \,=\, \int_0^s \mathbf{q}\cdot\hat{\mathbf{s}} \; d\sigma$$

Differentiating w.r.t. s gives

$$\partial_s\, p \,=\, \mathbf{q}\cdot\hat{\mathbf{s}}$$

where ŝ is evaluated at σ = s and is therefore in the direction in which the path reaches r. By the generality of the path, this can be any direction. So the last equation says that q is the vector whose (scalar) component in any direction is the derivative of p w.r.t. arc length in that direction; that is, q = ∇p, as required. ◼

This is the promised converse of theorem (24c). But, given an irrotational vector field q, we usually prefer to find a scalar field whose negative gradient is q; that is, we usually prefer a scalar field φ such that q = −∇φ. Such a field φ is called a scalar potential for q. From the above expression for p(r), a suitable candidate is

$$\varphi(\mathbf{r}) \,=\, -\int_{\mathbf{r}_0}^{\mathbf{r}} \mathbf{q}(\boldsymbol{\rho})\cdot d\boldsymbol{\rho} \tag{29}$$

A scalar field has zero gradient if and only if it is uniform, so that adding a uniform field, but only a uniform field, to a given scalar field leaves its gradient unchanged. Thus the scalar potential is determined up to an arbitrary additive uniform field. This would be the case with or without the minus sign in front of the gradient. The reason for preferring the minus sign appears next.

Conservative fields


An irrotational vector field—or, equivalently, a field that is (plus or minus) the gradient of something—is described as conservative, because if the field is a force, it does zero work around a closed loop, and consequently conserves energy around the loop (at least if the field does not change during traversal of the loop).

If the only force acting on a particle is  F = −∇U,  then, by the gradient theorem, the work done on the particle over a path is the increase in −U,  i.e. the decrease in U ; and this work is the increase in the particle's kinetic energy T.  Hence, if we identify U with the potential energy, the total energy  U + T  is conserved. This interpretation of the scalar potential is possible only if the force is minus the gradient of the potential.
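The conservation of U + T under a force F = −∇U can be observed in a simple simulation. The following is a rough sketch under stated assumptions, none of which come from the original text: unit mass, the potential U = 1/r, arbitrary initial conditions, and velocity-Verlet time stepping (a standard integrator, chosen here only for its good energy behavior).

```python
import math

def force(pos):
    """F = -grad U for the assumed potential U = 1/r (a radial repulsion)."""
    r = math.sqrt(sum(c * c for c in pos))
    return [c / r ** 3 for c in pos]

def total_energy(pos, vel, mass=1.0):
    """U + T for the assumed potential U = 1/r."""
    r = math.sqrt(sum(c * c for c in pos))
    kinetic = 0.5 * mass * sum(c * c for c in vel)  # T
    potential = 1.0 / r                             # U
    return kinetic + potential

def simulate(pos, vel, dt=1e-3, steps=5000, mass=1.0):
    """Velocity-Verlet integration of m dv/dt = F(r)."""
    pos, vel = list(pos), list(vel)
    acc = [f / mass for f in force(pos)]
    for _ in range(steps):
        pos = [x + v * dt + 0.5 * a * dt * dt for x, v, a in zip(pos, vel, acc)]
        new_acc = [f / mass for f in force(pos)]
        vel = [v + 0.5 * (a + b) * dt for v, a, b in zip(vel, acc, new_acc)]
        acc = new_acc
    return pos, vel

start_pos, start_vel = (1.0, 0.0, 0.0), (0.0, 0.5, 0.0)
e_start = total_energy(start_pos, start_vel)
end_pos, end_vel = simulate(start_pos, start_vel)
e_end = total_energy(end_pos, end_vel)
print(e_start, e_end)
```

In this sketch the particle is pushed outward, so U decreases while T increases, and the two printed totals should agree to within the small error of the time stepping.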

The minus sign is also used if the conservative vector field is an electric field (force per unit charge) or a gravitational acceleration (force per unit mass); the scalar potential is potential energy per unit charge, or potential energy per unit mass, respectively.

Some special fields


The 1/r scalar potential


For the potential energy field

$$U \,=\, \frac{1}{r} \tag{30}$$

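where r is the distance from the origin (and r ≠ 0), let us find the corresponding force F = −∇U. The direction of ∇U is that of the steepest increase of U, which, by the spherical symmetry, can only be parallel or antiparallel to r̂ (the unit vector pointing away from the origin). So

$$\nabla U \,=\, \hat{\mathbf{r}}\;\partial_r U \,=\, \hat{\mathbf{r}}\;\partial_r\!\left(\frac{1}{r}\right) \,=\, -\frac{\hat{\mathbf{r}}}{r^2}$$

whence

$$\mathbf{F} \,=\, -\nabla U \,=\, \frac{\hat{\mathbf{r}}}{r^2} \tag{31}$$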

So the negative gradient of the 1/r scalar potential (30) is the unit inverse-square radial vector field. Multiplying the numerator and denominator by r gives the alternative form

$$\mathbf{F} \,=\, \frac{\mathbf{r}}{r^3}$$

which is convenient if the center of the force is shifted from the origin to position r′: in that case we simply replace r by r − r′, and r by |r − r′|, so that the force becomes

$$\mathbf{F} \,=\, \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}$$

and the corresponding scalar potential becomes

$$U \,=\, \frac{1}{|\mathbf{r}-\mathbf{r}'|}$$

Inverse-square radial vector field


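We derived the vector field (31) as the negative gradient of the scalar potential (30). Conversely, given the inverse-square radial vector field (31), we could derive its scalar potential from (29). At a general point on the path, let the position vector be ρ = ρ ρ̂, so that, by (31), F = ρ̂/ρ². Then (29) becomes

$$U(\mathbf{r}) \,=\, -\int_{\mathbf{r}_0}^{\mathbf{r}} \frac{\hat{\boldsymbol{\rho}}}{\rho^2}\cdot d\boldsymbol{\rho} \,=\, -\int_{r_0}^{r} \frac{d\rho}{\rho^2} \,=\, \left[\frac{1}{\rho}\right]_{r_0}^{r} \,=\, \frac{1}{r} - \frac{1}{r_0}$$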

so that, if we choose  r0 → ∞ , we recover (30).
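The recovery of the 1/r potential from the inverse-square field can be checked numerically. This is an illustrative sketch only: it takes (29) to be minus the work integral of the field from r0 to r, evaluates it along a straight (non-radial) path by the midpoint rule, and uses an arbitrarily chosen field point and starting distances.

```python
import math

def inverse_square(x, y, z):
    """The unit inverse-square radial field r/|r|^3, as in (31)."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (x / r3, y / r3, z / r3)

def scalar_potential(field, r0, r, n=100000):
    """Minus the work integral of `field` from r0 to r, taken along a
    straight path and estimated by the midpoint rule."""
    step = [(b - a) / n for a, b in zip(r0, r)]
    total = 0.0
    for k in range(n):
        mid = [a + (k + 0.5) * s for a, s in zip(r0, step)]
        total += sum(f * s for f, s in zip(field(*mid), step))
    return -total

r = (1.0, 2.0, 2.0)  # distance 3 from the origin
for far in (10.0, 100.0, 1000.0):
    phi = scalar_potential(inverse_square, (far, 0.0, 0.0), r)
    print(far, phi, 1.0 / 3.0 - 1.0 / far)
```

As the starting point recedes, the computed potential should approach 1/3, the value of 1/r at the chosen point.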

Because F, given by (31), has a scalar potential, curl F must be zero. This is independently obvious in that the spherical symmetry of F seems to rule out any resemblance of rotation or shear, even at the origin, where F becomes infinite. On the last point, let us check whether curl F has a meaningful integral over a volume containing the origin. If the volume V is enclosed by the surface S whose outward unit normal is n̂, then, by theorem (5c),

$$\iiint_V \operatorname{curl}\mathbf{F} \; dV \,=\, \oint_S \hat{\mathbf{n}}\times\mathbf{F} \; dS$$

If V contains the origin, then, because curl F is zero everywhere except at the origin, the volume V can be replaced by any element of V containing the origin, whatever the shape of that element may be. If we choose that element to be a spherical ball centered on the origin, then n̂ is parallel to r̂, so that the cross-product in the integrand on the right is zero. Thus the volume integral on the left is not only meaningful, but is zero, even if the volume contains the point where the integrand is infinite. In this sense, the field F is so irrotational that its curl may be taken as zero even where the field itself is undefined!

The situation concerning the divergence of F is more complicated. Again, let the volume V be enclosed by the surface S whose outward unit normal is n̂. By the divergence theorem (5d),

$$\iiint_V \operatorname{div}\mathbf{F} \; dV \,=\, \oint_S \hat{\mathbf{n}}\cdot\mathbf{F} \; dS \,=\, \oint_S \frac{\hat{\mathbf{r}}\cdot\hat{\mathbf{n}}}{r^2} \; dS \,=\, \oint_S d\Omega$$

where dΩ is the solid angle subtended at the origin by the surface element of area dS, and is positive if the outward unit normal n̂ has a positive component away from the origin (r̂ ⸱ n̂ > 0), and negative if n̂ has a positive component toward the origin (r̂ ⸱ n̂ < 0). If the volume enclosed by S does not include the origin, then for every positive contribution dΩ there is a compensating negative contribution, so that the integral of div F over the volume is zero. As this applies to every such volume, div F must be zero everywhere except at the origin. If, on the contrary, the volume does include the origin, then the contributions dΩ add up to the total solid angle subtended by the enclosing surface, which is 4π. In summary,

$$\operatorname{div}\frac{\hat{\mathbf{r}}}{r^2} \,=\, 4\pi\,\delta(\mathbf{r}) \tag{32d}$$

where δ(r), the 3D unit delta function, is zero everywhere except at the origin, but has an integral of 1 over any volume that includes the origin. For example, a unit point-mass at the origin has the density δ(r), and a point-mass m at position r′ has the density m δ(r − r′). As the argument of div in (32d) is −∇(1/r), we also have

$$\triangle\,\frac{1}{r} \,=\, -4\pi\,\delta(\mathbf{r}) \tag{32L}$$
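The solid-angle argument can be checked by computing the outward flux of the inverse-square field r/|r|³ through a closed surface numerically. The sketch below is illustrative: the choice of a cube as the surface, the cube positions, and the midpoint-rule grid are assumptions made for the example.

```python
import math

def inverse_square(x, y, z):
    """The unit inverse-square radial field r/|r|^3."""
    r3 = (x * x + y * y + z * z) ** 1.5
    return (x / r3, y / r3, z / r3)

def flux_through_cube(field, center, half, n=200):
    """Midpoint-rule estimate of the outward flux of `field` through an
    axis-aligned cube with the given center and half-side."""
    step = 2.0 * half / n
    total = 0.0
    for axis in range(3):
        for sign in (-1.0, 1.0):      # the two faces normal to this axis
            for i in range(n):
                for j in range(n):
                    offset = [0.0, 0.0, 0.0]
                    offset[axis] = sign * half
                    offset[(axis + 1) % 3] = -half + (i + 0.5) * step
                    offset[(axis + 2) % 3] = -half + (j + 0.5) * step
                    point = [c + o for c, o in zip(center, offset)]
                    # The outward normal is +/- the unit vector along `axis`.
                    total += sign * field(*point)[axis] * step * step
    return total

enclosing = flux_through_cube(inverse_square, (0.2, -0.1, 0.3), 1.0)
excluding = flux_through_cube(inverse_square, (3.0, 0.0, 0.0), 1.0)
print(enclosing, 4.0 * math.pi, excluding)
```

The first cube contains the origin (off-center), and its flux should be close to 4π; the second does not, and its flux should be close to zero.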

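If we shift the centers from the origin to r′, the last two results become

$$\operatorname{div}\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} \,=\, 4\pi\,\delta(\mathbf{r}-\mathbf{r}') \tag{33d}$$

and

$$\triangle\,\frac{1}{|\mathbf{r}-\mathbf{r}'|} \,=\, -4\pi\,\delta(\mathbf{r}-\mathbf{r}') \tag{33L}$$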
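Away from the source point, the statement that the Laplacian of 1/|r − r′| is zero can be tested with a finite-difference approximation. This is a minimal sketch: the source position, the test point, and the 7-point difference stencil are assumptions for illustration, and the stencil is a standard Cartesian approximation rather than the coordinate-free definition used in this paper.

```python
import math

SOURCE = (0.5, -0.25, 1.0)  # assumed position r' of the source

def potential(x, y, z):
    """1/|r - r'| for the assumed source position."""
    d = math.sqrt((x - SOURCE[0]) ** 2 + (y - SOURCE[1]) ** 2 + (z - SOURCE[2]) ** 2)
    return 1.0 / d

def laplacian_fd(f, x, y, z, h=5e-3):
    """Standard 7-point finite-difference estimate of the Laplacian of f."""
    return (f(x + h, y, z) + f(x - h, y, z)
            + f(x, y + h, z) + f(x, y - h, z)
            + f(x, y, z + h) + f(x, y, z - h)
            - 6.0 * f(x, y, z)) / (h * h)

value = potential(2.0, 1.0, 0.0)
lap = laplacian_fd(potential, 2.0, 1.0, 0.0)
print(value, lap)
```

The printed Laplacian should be very small compared with the potential itself, consistent with the right-hand side of (33L) vanishing away from r′.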

Field with given divergence (and zero curl)


It follows from Coulomb's law that the electric field due to a point-charge Q at the origin, in a vacuum, is

$$\mathbf{E} \,=\, \frac{Q}{4\pi\epsilon_0}\,\frac{\hat{\mathbf{r}}}{r^2}$$

where ϵ0 is a physical constant (called the vacuum permittivity or simply the electric constant). In a vacuum, the electric displacement field, denoted by D, is ϵ0E. So it is convenient to multiply the above equation by ϵ0, obtaining

$$\mathbf{D} \,=\, \frac{Q}{4\pi}\,\frac{\hat{\mathbf{r}}}{r^2}$$

This is an inverse-square radial vector field and therefore has zero curl.

Now suppose that, instead of a charge Q at the origin, we have a static charge density ρ(r′) in a general elemental volume dV′ at position r′ (the standard symbol for charge density being unfortunately the same as for mass density). Then the contribution from that element to the field D at position r is

$$d\mathbf{D} \,=\, \frac{\rho(\mathbf{r}')\,dV'}{4\pi}\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}$$

provided that, for each r, the dimensions of each volume element are small compared with |r − r′|. This contribution likewise has zero curl. The total field due to static charges is then the sum of the contributions:

 

 

 

 

 

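$$\mathbf{D}(\mathbf{r}) \,=\, \iiint \frac{\rho(\mathbf{r}')}{4\pi}\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} \; dV' \tag{34}$$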

where the integral is over all space. And D(r) has zero curl because all the contributions have zero curl.

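Independently of the physical significance of D(r), we can take its divergence "term by term" (or "under the integral sign"), obtaining

$$\operatorname{div}\mathbf{D} \,=\, \iiint \frac{\rho(\mathbf{r}')}{4\pi}\,\operatorname{div}\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} \; dV' \,=\, \iiint \rho(\mathbf{r}')\,\delta(\mathbf{r}-\mathbf{r}') \; dV' \,=\, \rho(\mathbf{r}) \iiint \delta(\mathbf{r}'-\mathbf{r}) \; dV'$$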

where the last step is permitted because the volume integral of the delta function of r′ is not changed by a "point reflection" (inversion) across r.  As the volume of integration (all space) includes the shifted origin of the delta function, the integral is simply 1 , so that

 

 

 

 

 

$$\operatorname{div}\mathbf{D} \,=\, \rho \tag{35}$$

where both sides are evaluated at r.

Mathematically, this result is an identity which applies if  D is given by (34); substituting for D , we can write the identity in full as

 

 

 

 

 

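$$\operatorname{div} \iiint \frac{\rho(\mathbf{r}')}{4\pi}\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} \; dV' \,=\, \rho(\mathbf{r}) \tag{36}$$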

where the integral is over all space, or at least all of the space in which ρ may be non-zero. Subject to the convergence of the integral, this shows that we can construct an irrotational vector field whose divergence is a given scalar field ρ(r). And of course, by theorem (24d), any curl can be added to that vector field without changing its divergence.

In electrostatics, (34) is a generalization of Coulomb's law; and (35), which follows from (34), is Gauss's law expressed in differential form. If we integrate (35) over a volume enclosed by a surface S (with outward unit normal n̂) and apply the divergence theorem on the left, we get the integral form of Gauss's law:

 

 

 

 

 

$$\oint_S \mathbf{D}\cdot\hat{\mathbf{n}} \; dS \,=\, Q_e \tag{37}$$

where Qe is the total charge enclosed  by S.

Field with given Laplacian


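In (36), we can recognize the r-dependent factor (r − r′)/|r − r′|³ as −∇(1/|r − r′|) and take the gradient operator outside the integral, obtaining

$$\operatorname{div}\nabla \iiint \frac{-\rho(\mathbf{r}')}{4\pi\,|\mathbf{r}-\mathbf{r}'|} \; dV' \,=\, \rho(\mathbf{r})$$

i.e.

$$\triangle \iiint \frac{-\rho(\mathbf{r}')}{4\pi\,|\mathbf{r}-\mathbf{r}'|} \; dV' \,=\, \rho(\mathbf{r}) \tag{38}$$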

where again the integral is over all space, or at least all of the space in which ρ may be non-zero. Subject to the convergence of the integral, this shows that we can construct a field whose Laplacian is a given field. More precisely, it shows that we can construct a scalar field whose Laplacian is a given scalar  field ρ(r). But, due to the linearity of the Laplacian, the same applies to any given linear combination of scalar fields, including any combination whose coefficients are uniform vectors, uniform matrices, or uniform tensors of any order; that is, the same applies to any field that we can express with a uniform basis.

Mathematically, (38) is simply an identity. To find its significance in electrostatics, we can multiply it by  −1⧸ϵ0 , obtaining

 

 

 

 

 

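$$\triangle \iiint \frac{\rho(\mathbf{r}')}{4\pi\epsilon_0\,|\mathbf{r}-\mathbf{r}'|} \; dV' \,=\, -\frac{\rho(\mathbf{r})}{\epsilon_0} \tag{39}$$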

which is also an identity. But the negative gradient of the expression after the integral sign is

$$\frac{\rho(\mathbf{r}')\,dV'}{4\pi\epsilon_0}\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}$$

which is the contribution to the electric field at position r due to a charge ρ(r′) dV′ at position r′ in a vacuum. So the expression after the integral sign is the corresponding contribution to the electrostatic potential, and the whole integral is the whole electrostatic potential. Denoting this by φ, we can rewrite (39) as

$$\triangle\varphi \,=\, -\frac{\rho}{\epsilon_0} \tag{40}$$

This is Poisson's equation in electrostatics, treating the medium as a vacuum (so that ρ must be taken as the total charge density, including any contributions caused by the effect of the field on the medium). In a region in which  ρ = 0 ,  Poisson's equation (40) reduces to

 

 

 

 

 

$$\triangle\varphi \,=\, 0 \tag{41}$$

which is Laplace's equation in electrostatics.

The wave equation


It is an empirical fact that a compressible fluid, such as air, carries waves of a mechanical nature: sound waves. In establishing the unambiguity of the gradient and the divergence, we have already derived equations dealing with the inertia and continuity (mass-conservation) of non-viscous fluids. So, by introducing a relation describing the compressibility, and eliminating variables, we should be able to get one equation (the "wave equation") in one scalar or vector field (the "wave function"), with recognizably "wavelike" solutions. And we should expect this equation to be analogous to equations describing other kinds of waves.

If we suppose, for simplicity, that the only force acting on an element of fluid is the pressure force, the applicable equation of motion is (6g). But, for reasons which will soon be apparent, let us call the pressure P, so that (6g) becomes

$$\rho\,\frac{d\mathbf{v}}{dt} \,=\, -\nabla P$$

Then at equilibrium we have

$$\mathbf{0} \,=\, -\nabla P_0$$

where P0 is the equilibrium pressure. Subtracting this equation from the previous one and defining

$$p \,=\, P - P_0$$

we get

$$\rho\,\frac{d\mathbf{v}}{dt} \,=\, -\nabla p$$

which looks like (6g), except that p is now the sound pressure (also called "acoustic pressure", or sometimes "excess pressure"), i.e. the pressure rise above equilibrium.

For the equation of continuity we can use (7d'), which we repeat for convenience:

$$\rho\,\operatorname{div}\mathbf{v} \,=\, -\frac{d\rho}{dt}$$

Eliminating v between the last two equations is fraught because v is evaluated at a moving point in the former and at a fixed point in the latter; and introducing any relation between p and ρ is similarly fraught because p is evaluated at a fixed point and ρ at a moving point. The obvious remedy is to apply the advection rule (16) to the last two equations, obtaining respectively

$$\rho\left(\frac{\partial\mathbf{v}}{\partial t} + \mathbf{v}\cdot\nabla\,\mathbf{v}\right) \,=\, -\nabla p \;; \qquad \rho\,\operatorname{div}\mathbf{v} \,=\, -\frac{\partial\rho}{\partial t} - \mathbf{v}\cdot\nabla\rho$$

That gets all the variables evaluated at fixed points, at the cost of making the equations more complicated and more obviously non-linear. But the equations can be simplified and linearized by small-amplitude approximations. In the parentheses in the first equation, the first term is proportional to the amplitude of the vibrations while the second term is a product of two factors proportional to the amplitude, so that, for sufficiently small amplitudes, the second term is negligible. Similarly, in the second equation, for sufficiently small amplitudes and a homogeneous medium, we can neglect the second term on the right. Then, on the left side of each equation, we are left with a factor proportional to the amplitude, multiplied by ρ. But ρ is not proportional to the amplitude; only its deviation from the equilibrium density is so proportional. Hence, for small amplitudes, ρ can be replaced by the equilibrium density, which we shall call ρ0, which is independent of time and (in a homogeneous medium) independent of position. With these approximations, our equations of motion and continuity become

$$\rho_0\,\dot{\mathbf{v}} \,=\, -\nabla p \;; \qquad \rho_0\,\operatorname{div}\mathbf{v} \,=\, -\dot\rho$$

where, for brevity, we use an overdot to denote partial differentiation w.r.t. time (i.e., at a fixed point, not a point moving with the fluid).

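Now we can eliminate v. Taking divergences in the first equation, and differentiating the second partially w.r.t. time (which can be done inside the div operator, which represents a linear combination), we get

$$\rho_0\,\operatorname{div}\dot{\mathbf{v}} \,=\, -\triangle p \;; \qquad \rho_0\,\operatorname{div}\dot{\mathbf{v}} \,=\, -\ddot\rho$$

so that we can equate the right-hand sides, obtaining

$$\triangle p \,=\, \ddot\rho \tag{42}$$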

Maintaining the small-amplitude assumption, we can now consider compressibility. For small compressions in a homogeneous medium, we may suppose that the pressure change dp is some constant times the density change. It is readily verified that such a constant must have the dimension of velocity squared. So we can say dp = c² dρ, where c is a constant with the units of velocity.[k] Dividing by dt gives ṗ = c² ρ̇, whence

$$\ddot p \,=\, c^2\,\ddot\rho \tag{43}$$

Substituting from (42) then gives the desired wave equation:

 

 

 

 

 

$$\ddot p \,=\, c^2\,\triangle p \tag{44}$$

This is the 3D classical wave equation with the sound pressure p as the wave function. For a generic wave function ψ , in a homogeneous isotropic medium, we would expect the equation to be

 

 

 

 

 

$$\triangle\psi - \frac{1}{c^2}\,\frac{\partial^2\psi}{\partial t^2} \,=\, 0 \tag{45}$$

which may be written more compactly as

 

 

 

 

 

$$\Box\,\psi \,=\, 0 \tag{46}$$

where ☐, pronounced "wave" or "box",[l] is called the D'Alembertian operator and is defined by

 

 

 

 

 

$$\Box \,=\, \triangle - \frac{1}{c^2}\,\frac{\partial^2}{\partial t^2} \tag{47}$$

in this paper, although other conventions exist.[m]

In a static situation, the second term on the right of (47) is zero. So one advantage of definition (47), over any alternative definition that changes the sign or the scale factor, is that in the static case, the D'Alembertian is reduced to the Laplacian, making it especially obvious that in the static case, the wave equation is reduced to Laplace's equation [compare (46) and (41)]. Also notice that the D'Alembertian, being a linear combination of two linear operators, is itself linear.

Spherical waves


Having established that there are wavelike time-dependent fields described by equation (45), in which the constant c has the units of velocity, we can now make an informed guess at an elementary solution of the equation. Consider the candidate

 

 

 

 

 

$$\psi(\mathbf{r},t) \,=\, \frac{1}{r}\,f\!\left(t - \frac{r}{c}\right) \tag{48}$$

where r = r r̂ is the position vector (so that r is distance from the origin), f is an arbitrary function (arbitrary except that it will need to be twice differentiable), t is time, and c is a constant (and obviously ψ is not defined at the origin even if f is).

If, at the origin, the function f has a certain argument at time t = τ, then at any distance r from the origin, it has the same argument at time t = τ + r/c, which is r/c later than at the origin. Hence, if f has a certain feature (e.g., a zero-crossing) at the origin, the time taken for that feature to reach any distance r is r/c, implying that the feature travels outward from the origin at speed c. Another way to perceive this is to set the argument of f equal to a constant (corresponding to some feature of the function) and differentiate w.r.t. t, obtaining ṙ = c (the speed at which the feature recedes from the origin). Thus equation (48) describes waves radiating outward from the origin with speed c.[n]

Equation (48) further implies that there are surfaces over which the wave function ψ  is uniform—namely surfaces of constant r,  i.e. spheres centered on the origin. These are the wavefronts. So (48) describes spherical waves.

Because the surface area of a sphere is proportional to the square of its radius, we should expect the radiated intensity (power per unit area) to satisfy an inverse-square law (if the medium is lossless—neither absorbing nor scattering the radiated power). That does not mean that the wave function itself should satisfy an inverse-square law. In a traveling wave in 3D space, there will be an "effort" variable (e.g., sound pressure) and a "flow" variable (e.g., fluid velocity), and the instantaneous intensity will be proportional to the product of the two. If the two are proportional to each other, the instantaneous intensity will be proportional to the square of one or the other. Hence if the instantaneous intensity falls off like 1/r ², the effort and flow variables—and the wave function, if it is proportional to one or the other—will fall off like 1/r. That suggests the attenuation factor 1/r  in (48).

But there are big ifs in that argument. For all we know so far, the relation between effort and flow could involve a lag, so that the instantaneous product of the two could swing negative although it averages to something positive. And for all we know so far, the lag could vary with r, allowing at least one of the two (effort or flow) to depart from the 1/r law, even if their average product still falls off like 1/r². The 1/r factor in (48) is therefore only an "informed guess". Notwithstanding these complications, we have also guessed that the form of the function f (the waveform) does not change as r increases; we have not considered whether this behavior might depend on the medium, or the waveform, or the geometry of the wavefronts.

So let us carefully check whether (48) satisfies (45) or, equivalently, (46).

As a first step, and as a useful inquiry in its own right, we find △ψ from definition (4L), given that ψ is a function of (r, t) only. For the surface δS let us start with

  • a cone (not a double cone) with its apex at the origin, subtending a small solid angle ω at the origin,
  • a sphere centered on the origin, with radius r, and
  • a sphere centered on the origin, with radius r + dr ;

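and let the volume element be the region inside the cone and between the spheres, so that its enclosing surface δS has three faces: a segment of the cone, a segment of the inner sphere with area r² ω, and a segment of the outer sphere with area (r + dr)² ω. By the symmetry of ψ, the outward normal derivative ∂n ψ is equal to zero on the conical face, +∂r ψ(r + dr, t) on the outer spherical face, and −∂r ψ(r, t) on the inner spherical face. The volume of the element is dV = r² ω dr. So, assembling the pieces of definition (4L), we get

$$\triangle\psi \,=\, \frac{1}{r^2\,\omega\,dr}\Big[(r+dr)^2\,\omega\;\partial_r\psi(r+dr,\,t) \,-\, r^2\,\omega\;\partial_r\psi(r,\,t)\Big] \,=\, \frac{1}{r^2}\;\frac{\big[r^2\,\partial_r\psi\big]_{r+dr} - \big[r^2\,\partial_r\psi\big]_{r}}{dr}$$

i.e.

$$\triangle\psi \,=\, \frac{1}{r^2}\,\partial_r\!\left(r^2\,\partial_r\psi\right) \tag{49}$$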

Now we can verify our "informed guess". Differentiating (48) twice w.r.t. t  by the chain rule gives

 

 

 

 

 

$$\frac{\partial^2\psi}{\partial t^2} \,=\, \frac{1}{r}\,f''\!\left(t - \frac{r}{c}\right) \tag{50}$$

where each prime (′) denotes differentiation of the function w.r.t. its own argument. Differentiating (48) once w.r.t. r  by the product rule and chain rule, we get

 

 

 

 

 

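$$\partial_r\psi \,=\, -\frac{1}{r^2}\,f\!\left(t - \frac{r}{c}\right) - \frac{1}{rc}\,f'\!\left(t - \frac{r}{c}\right) \tag{51}$$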

Proceeding as specified in (49), we multiply this by r ², differentiate again w.r.t. r (giving three terms, of which two cancel), and divide by r ², obtaining

 

 

 

 

 

$$\triangle\psi \,=\, \frac{1}{rc^2}\,f''\!\left(t - \frac{r}{c}\right) \tag{52}$$

Then if we substitute (52) and (50) into (47), we obviously get ☐ψ = 0, satisfying (46). So we have guessed correctly.
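The same verification can be repeated numerically. The sketch below is an added illustration with assumed values (wave speed, waveform, test point): it takes the spherical wave to be ψ = f(t − r/c)/r, evaluates the Laplacian in the radial form (1/r²) ∂r(r² ∂r ψ) by nested central differences, and subtracts the second time-derivative divided by c².

```python
import math

C = 2.0  # assumed wave speed, in arbitrary units

def f(u):
    """Hypothetical twice-differentiable waveform."""
    return math.sin(3.0 * u)

def psi(r, t):
    """Spherical wave f(t - r/c)/r."""
    return f(t - r / C) / r

def d_dr(g, r, h):
    """Central-difference derivative of g at r."""
    return (g(r + h) - g(r - h)) / (2.0 * h)

def laplacian_radial(r, t, h=1e-3):
    """(1/r^2) d/dr (r^2 d psi/dr), estimated by nested central differences."""
    def r2_times_slope(s):
        return s * s * d_dr(lambda q: psi(q, t), s, h)
    return d_dr(r2_times_slope, r, h) / (r * r)

def second_time_derivative(r, t, h=1e-3):
    """Second partial derivative of psi w.r.t. t at fixed r."""
    return (psi(r, t + h) - 2.0 * psi(r, t) + psi(r, t - h)) / (h * h)

def dalembertian(r, t):
    """Laplacian minus (1/c^2) times the second time-derivative."""
    return laplacian_radial(r, t) - second_time_derivative(r, t) / (C * C)

lap = laplacian_radial(1.5, 1.5)
residual = dalembertian(1.5, 1.5)
print(lap, residual)
```

The Laplacian alone is of order unity at the chosen point, whereas the residual should be smaller by several orders of magnitude, limited only by the finite-difference error.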

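Having shown that the D'Alembertian of ψ, as given by (48), is zero everywhere except at the origin (where it is not defined), let us now find its integral over a volume V (enclosed by a surface S) that includes the origin. From (47),

$$\iiint_V \Box\,\psi \; dV \,=\, \iiint_V \triangle\psi \; dV - \frac{1}{c^2}\iiint_V \frac{\partial^2\psi}{\partial t^2} \; dV \,=\, \oint_S \partial_n\psi \; dS - \frac{1}{c^2}\iiint_V \frac{\partial^2\psi}{\partial t^2} \; dV$$

where the second equality follows from theorem (5L). Now because the integrand on the left is zero except at the origin, any V containing the origin will give the same integral. So for convenience, let V be a spherical ball of radius R centered on the origin. Then, by the spherical symmetry of ψ, integration over S reduces to multiplication by 4πR², and ∂n is equivalent to ∂r, and dV can be taken as 4πr² dr. With these substitutions we have

$$\iiint_V \Box\,\psi \; dV \,=\, 4\pi R^2\;\partial_r\psi\,\Big|_{r=R} - \frac{1}{c^2}\int_0^R \frac{\partial^2\psi}{\partial t^2}\; 4\pi r^2 \, dr$$

or, substituting from (51) and (50),

$$\iiint_V \Box\,\psi \; dV \,=\, -4\pi f\!\left(t - \frac{R}{c}\right) - \frac{4\pi R}{c}\,f'\!\left(t - \frac{R}{c}\right) - \frac{4\pi}{c^2}\int_0^R r\,f''\!\left(t - \frac{r}{c}\right) dr$$

Again noting that any V containing the origin will give the same volume integral, we can let R approach zero, with the result that the right-hand side approaches −4πf(t). This is the integral of ☐ψ over any volume containing the origin, for ψ given by (48). Meanwhile ☐ψ is zero everywhere except at the origin. In summary,

$$\Box\left[\frac{1}{r}\,f\!\left(t - \frac{r}{c}\right)\right] \,=\, -4\pi f(t)\;\delta(\mathbf{r}) \tag{53}$$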
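The claim that the volume integral does not depend on the size of the ball can be tested numerically. The following sketch is illustrative only: it assumes the waveform f(u) = sin 3u and the wave speed c = 2, and it evaluates the surface term 4πR² ∂r ψ at r = R minus 1/c² times the integral of ∂²ψ/∂t² over the ball, for ψ = f(t − r/c)/r.

```python
import math

C = 2.0  # assumed wave speed, in arbitrary units

def f(u):   # hypothetical waveform
    return math.sin(3.0 * u)

def f1(u):  # its first derivative
    return 3.0 * math.cos(3.0 * u)

def f2(u):  # its second derivative
    return -9.0 * math.sin(3.0 * u)

def dalembertian_integral(R, t, n=20000):
    """Surface term minus (1/c^2) times the volume term, for a ball of
    radius R centered on the origin and psi = f(t - r/c)/r."""
    # d psi/dr at r = R, multiplied by the surface area 4 pi R^2
    slope = -f(t - R / C) / (R * R) - f1(t - R / C) / (R * C)
    surface = 4.0 * math.pi * R * R * slope
    # Midpoint rule for the integral of (d^2 psi/dt^2) * 4 pi r^2 dr
    dr = R / n
    volume = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        volume += (f2(t - r / C) / r) * 4.0 * math.pi * r * r * dr
    return surface - volume / (C * C)

t = 0.4
for R in (1.0, 0.1, 0.01):
    print(R, dalembertian_integral(R, t), -4.0 * math.pi * f(t))
```

For each radius the computed value should agree with −4πf(t) to within the integration error, not merely in the limit of small R.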

Shifting the center of the spherical waves from the origin to position r′, we get

 

 

 

 

 

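$$\Box\left[\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,f\!\left(t - \frac{|\mathbf{r}-\mathbf{r}'|}{c}\right)\right] \,=\, -4\pi f(t)\;\delta(\mathbf{r}-\mathbf{r}') \tag{54}$$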

We shall refer to the field given by (48) as the wave function due to a monopole source with strength f (t) at the origin. The D'Alembertian of this wave function is given by (53).[28] Hence the field whose D'Alembertian is given by (54) is the wave function due to a monopole source with strength f (t) at position r′. In each case, the D'Alembertian is zero everywhere except at the source.

Field with given D'Alembertian


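Now suppose that, instead of a wave source with strength f(t) at the general position r′, we have at that position a wave-source density f(r′, t) in an elemental volume dV′, whose contribution to the wave function ψ at position r is

$$\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,f\!\left(\mathbf{r}',\; t - \frac{|\mathbf{r}-\mathbf{r}'|}{c}\right) dV'$$

where for each r, the dimensions of each volume element are small compared with |r − r′|. Then the total wave function is the sum of the contributions:

$$\psi(\mathbf{r},t) \,=\, \iiint \frac{1}{|\mathbf{r}-\mathbf{r}'|}\,f\!\left(\mathbf{r}',\; t - \frac{|\mathbf{r}-\mathbf{r}'|}{c}\right) dV' \tag{55}$$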

where the integral is over all space.

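Independently of the physical significance of ψ(r, t), we can take its D'Alembertian "under the integral sign" by rule (54), obtaining

$$\Box\,\psi(\mathbf{r},t) \,=\, \iiint -4\pi f(\mathbf{r}',t)\;\delta(\mathbf{r}-\mathbf{r}') \; dV'$$

that is,

$$\Box\,\psi(\mathbf{r},t) \,=\, -4\pi f(\mathbf{r},t) \tag{56}$$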

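Mathematically, equation (56) is an identity which applies if ψ(r, t) is given by (55). Substituting from (55) and solving for f(r, t), we can write the identity in full as

$$\Box \iiint \frac{-1}{4\pi\,|\mathbf{r}-\mathbf{r}'|}\,f\!\left(\mathbf{r}',\; t - \frac{|\mathbf{r}-\mathbf{r}'|}{c}\right) dV' \,=\, f(\mathbf{r},t) \tag{57}$$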

where the integral is over all space, or at least all of the space in which the source density may be non-zero. Subject to the convergence of the integral, this shows that we can construct a wave function with a given D'Alembertian.

Physically, equation (56) gives the D'Alembertian of the wave function for a given source density. It is the inhomogeneous wave equation, which applies in the presence of an arbitrary source density, in contrast to the homogeneous wave equation (46), which applies in a region where the source density is zero. In this context the word homogeneous or inhomogeneous describes the equation, not the medium (which has been assumed homogeneous and isotropic).

In a static situation, in which the D'Alembertian is reduced to the Laplacian, the inhomogeneous wave equation (56) is reduced to the form of Poisson's equation (40). As written, equation (40) is Poisson's equation in electrostatics; it applies to the charge density ρ(r), for which the scalar potential [in (39)] is

$$\varphi(\mathbf{r}) \,=\, \iiint \frac{\rho(\mathbf{r}')}{4\pi\epsilon_0\,|\mathbf{r}-\mathbf{r}'|} \; dV'$$

In electrodynamics, which takes time-dependence into account, the scalar potential due to the charge density ρ(r, t) is

$$\varphi(\mathbf{r},t) \,=\, \iiint \frac{\rho\!\left(\mathbf{r}',\; t - |\mathbf{r}-\mathbf{r}'|/c\right)}{4\pi\epsilon_0\,|\mathbf{r}-\mathbf{r}'|} \; dV'$$

where the wave speed c is the speed of light; this is the same as in the static case except for the delay |r − r′|/c, indicating that the influence of the charge density at r′ travels outward from that point at the speed of light. In the dynamic case, by rule (57), the D'Alembertian of the scalar potential is

$$\Box\,\varphi \,=\, -\frac{\rho}{\epsilon_0}$$

This result is the inhomogeneous wave equation in the scalar potential—the equation which, in the electrostatic case, reduces to Poisson's equation (40).

In electrodynamics, however, the electric field E is not simply −∇φ but −∇φ − ∂A/∂t, where A is the magnetic vector potential, whose defining property is that its curl is the magnetic flux density:

$$\operatorname{curl}\mathbf{A} \,=\, \mathbf{B}$$

By identity (24d), this property implies

$$\operatorname{div}\mathbf{B} \,=\, 0$$

which is Gauss's law for magnetism. We have noted in passing, but not yet proven, that (24d) has a converse, whereby the solenoidality of B implies the existence of the vector potential A. Precedents suggest we might be able to prove this by finding a vector field whose curl is a delta function (perhaps through new identities relating it to a field whose divergence is a delta function) and using it to construct a vector field with a given curl. In fact we shall prove our "converse" differently, but we shall still need some new identities for the purpose. And to obtain those identities (among others), we must take the detour that we have made a virtue of not taking until now…

Cartesian coordinates


Indicial notation; implicit summation


Considering that a scalar field is a function of three coordinates, while a vector field has three components each of which is a function of three coordinates, we can readily imagine that coordinate-based derivations of vector-analytic identities are likely to be excruciatingly repetitive, unless perhaps we choose a notation that concisely specifies the repetition. So, instead of writing the Cartesian coordinates as x, y, z, we shall usually write them as xi where i = 1, 2, 3, respectively; and instead of writing the unit vectors in the directions of the respective axes as i, j, k, we shall usually write them as ei. And for partial differentiation w.r.t. xi, instead of writing ∂/∂xi or even ∂xi, we shall write ∂i.

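Now comes a stroke of genius for which we are indebted to Einstein, although he used it in a more sophisticated context! Instead of writing the position vector as

$$\mathbf{r} \,=\, x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + x_3\mathbf{e}_3$$

or even as

$$\mathbf{r} \,=\, \sum_{i=1}^{3} x_i\,\mathbf{e}_i$$

we shall write it simply as

$$\mathbf{r} \,=\, x_i\,\mathbf{e}_i$$

where it is understood that we sum over the repeated index. More generally, we shall write the vector field q as

$$\mathbf{q} \,=\, q_i\,\mathbf{e}_i$$

with implicit summation, and the vector field v as

$$\mathbf{v} \,=\, v_i\,\mathbf{e}_i$$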

with implicit summation, and so on. (By that nomenclature, the position vector in Cartesian coordinates should be, and often is, called x; but we called it r because we wanted to call its magnitude r, for radius.)

Implicit summation not only avoids writing the Σ symbol and specifying the index of summation, but also allows a summation over two repeated indices, say i and j , to be considered as summed first over i and then over j or vice versa, removing the need for an explicit regrouping of terms. Of course, if we hide messy details behind a notation, we need to make sure that it handles those details correctly. In particular, when we perform an operation on an implicit sum, we implicitly perform it term-by-term, and must therefore make sure that the operation is valid when interpreted that way.
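For readers who think in code, implicit summation over a repeated index corresponds to an explicit loop or `sum` over that index. The following sketch is illustrative (all names and numerical values are assumptions): it forms xi ei, and evaluates a double sum ai Mij bj in both orders to show that the order of summation does not matter.

```python
# Cartesian unit vectors e_1, e_2, e_3, written out as component triples
E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def sum_over_index(x, e):
    """x_i e_i: the repeated index i is summed from 1 to 3
    (0 to 2 in Python's numbering)."""
    return tuple(sum(x[i] * e[i][k] for i in range(3)) for k in range(3))

x = (2.0, -1.0, 4.0)
r = sum_over_index(x, E)

# A sum over two repeated indices: a_i M_ij b_j
a = (1.0, 2.0, 3.0)
b = (4.0, 5.0, 6.0)
M = ((1.0, 0.0, 2.0),
     (0.0, 3.0, 0.0),
     (4.0, 0.0, 5.0))

# Summed over i first, then over j
i_first = sum(sum(a[i] * M[i][j] for i in range(3)) * b[j] for j in range(3))
# Summed over j first, then over i
j_first = sum(a[i] * sum(M[i][j] * b[j] for j in range(3)) for i in range(3))

print(r, i_first, j_first)
```

The two orders of summation give the same number, which is the regrouping that the implicit notation performs silently.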

Formulation of operators


Gradient:  Putting  s = xi  in (9g), we find that the scalar component of  ∇p in the direction of each ei  is  ∂i p.  To obtain the vector component in that direction, we multiply by ei.  Assembling the components, we have (with implicit summation)

 

 

 

 

 

$$\nabla p \,=\, \mathbf{e}_i\,\partial_i\,p \tag{58g}$$

or, in operational terms,

 

 

 

 

 

$$\nabla \,=\, \mathbf{e}_i\,\partial_i \tag{58o}$$

or, in traditional longhand notation,

 

 

 

 

 

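$$\nabla \,=\, \mathbf{i}\,\frac{\partial}{\partial x} + \mathbf{j}\,\frac{\partial}{\partial y} + \mathbf{k}\,\frac{\partial}{\partial z} \tag{58t}$$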

It is also worth noting, from (58g), that the squared magnitude of ∇p is

$$|\nabla p|^2 \,=\, \partial_i\,p\;\partial_i\,p \tag{58s}$$

where we write ∂i p ∂i p rather than (∂i p)² to ensure that implicit summation applies!
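The Cartesian formulation of the gradient translates directly into a numerical recipe: estimate each partial derivative ∂i p by a central difference, and sum the products ∂i p ∂i p over i for the squared magnitude. The sketch below is illustrative; the field `p`, the test point, and the step size are assumptions.

```python
import math

def partial(f, x, i, h=1e-5):
    """Central-difference estimate of the i-th partial derivative of f
    at the point x (a 3-tuple of Cartesian coordinates)."""
    up, down = list(x), list(x)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

def gradient(f, x):
    """Cartesian components of grad f, i.e. the three partial derivatives."""
    return [partial(f, x, i) for i in range(3)]

def squared_magnitude(f, x):
    """Sum over i of (d_i f)(d_i f)."""
    return sum(g * g for g in gradient(f, x))

def p(x1, x2, x3):
    """Hypothetical scalar field for illustration."""
    return x1 * x1 * x2 + math.sin(x3)

point = (1.0, 2.0, 0.5)
grad = gradient(p, point)
exact = [4.0, 1.0, math.cos(0.5)]  # (2 x1 x2, x1^2, cos x3) at the point
print(grad, exact, squared_magnitude(p, point))
```

The explicit `sum` in `squared_magnitude` plays the part of the implicit summation over the repeated index.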

As reported by Tai (1994), there are unfortunately some textbooks in which the del operator is defined as

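$$\nabla \,=\, \frac{\partial}{\partial x}\,\mathbf{i} + \frac{\partial}{\partial y}\,\mathbf{j} + \frac{\partial}{\partial z}\,\mathbf{k} \qquad \text{[sic!]}$$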

which, on its face, is not an operator at all, but a self-contained expression whose value is the zero vector (because it is a sum of derivatives of constant vectors). Among the offenders is Erwin Kreyszig, who, in the 6th edition of his bestselling Advanced Engineering Mathematics (1988, p. 486), misdefines the del operator thus and then rewrites the gradient of f as ∇f, apparently imagining that the differentiation operators look through the constant vectors rather than at them. Six pages later, he defines the divergence in Cartesian coordinates (which we shall do shortly) and then immediately informs us that "Another common notation for the divergence of v is ∇ ⸱ v," where ∇ is defined as before, but the resulting ∇ ⸱ v is apparently not identically zero![29] These errors persist in the 10th edition (2011, pp. 396, 402–3). Tai finds similar howlers in mathematics texts by Wilfred Kaplan, Ladis D. Kovach, and Merle C. Potter, and in electromagnetics texts by William H. Hayt and Martin A. Plonus.[30] Knudsen & Katz, in Fluid Dynamics and Heat Transfer (1958), avoid the misdefinition of ∇, but implicitly define the divergence of V as V ⸱ ∇ (which, as we have seen, is actually an operator), and then somehow reduce it to the correct expression for div V.[31] But I digress.

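Curl and divergence: Expressing the operand of the curl in components, and noting that the unit vectors are uniform, we can apply (8p):

$$\operatorname{curl}\mathbf{q} \,=\, \operatorname{curl}\,(q_j\,\mathbf{e}_j) \,=\, \nabla q_j \times \mathbf{e}_j \,=\, \mathbf{e}_i\,\partial_i\,q_j \times \mathbf{e}_j \,=\, \mathbf{e}_i \times \partial_i\,(q_j\,\mathbf{e}_j)$$

If we sum over j first, this is

$$\operatorname{curl}\mathbf{q} \,=\, \mathbf{e}_i \times \partial_i\,\mathbf{q} \tag{59c}$$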

or, in operational terms,

 

 

 

 

 

$$\operatorname{curl} \,=\, \mathbf{e}_i \times \partial_i \tag{59o}$$

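or, in traditional longhand,

$$\operatorname{curl} \,=\, \mathbf{i}\times\frac{\partial}{\partial x} + \mathbf{j}\times\frac{\partial}{\partial y} + \mathbf{k}\times\frac{\partial}{\partial z}$$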
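The Cartesian curl can be sketched in code as the sum over i of ei × ∂i q, which is the form assumed here for the result of the preceding derivation. In this illustrative example (all names and values are assumptions), the partial derivatives are estimated by central differences, and the test field is a rigid rotation q = ω × r, whose curl is known to be 2ω.

```python
def cross(a, b):
    """Vector (cross) product of two 3-component vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def partial_vec(q, x, i, h=1e-4):
    """Central-difference estimate of the i-th partial derivative of the
    vector field q at the point x."""
    up, down = list(x), list(x)
    up[i] += h
    down[i] -= h
    return [(a - b) / (2.0 * h) for a, b in zip(q(*up), q(*down))]

def curl(q, x):
    """Sum over i of e_i x (d_i q), with Cartesian unit vectors e_i."""
    basis = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
    total = [0.0, 0.0, 0.0]
    for i in range(3):
        term = cross(basis[i], partial_vec(q, x, i))
        total = [t + c for t, c in zip(total, term)]
    return total

OMEGA = (0.3, -1.2, 2.0)  # assumed angular velocity

def rigid_rotation(x, y, z):
    """Velocity field of a rigid rotation about the origin: omega x r."""
    return cross(OMEGA, (x, y, z))

result = curl(rigid_rotation, (0.7, -0.4, 1.1))
print(result, [2.0 * w for w in OMEGA])
```

Because the test field is linear in the coordinates, the central differences are exact apart from rounding, so the computed curl should match 2ω closely.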

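For the divergence we proceed as for the curl except that, instead of (8p), we use (8g):

$$\operatorname{div}\mathbf{q} \,=\, \operatorname{div}\,(q_j\,\mathbf{e}_j) \,=\, \nabla q_j \cdot \mathbf{e}_j \,=\, \mathbf{e}_i\,\partial_i\,q_j \cdot \mathbf{e}_j \,=\, \delta_{ij}\;\partial_i\,q_j$$

that is,

$$\operatorname{div}\mathbf{q} \,=\, \partial_i\,q_i \tag{60d}$$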

or, in operational terms,

 

 

 

 

 

$$\operatorname{div} \,=\, \mathbf{e}_i \cdot \partial_i \tag{60o}$$

or, in traditional longhand,