The main purpose of this Study Guide is to give some motivation behind various topics in this course and to explain their connections to other subjects. Hopefully it will help you to understand the most basic things in this course. Sometimes, I add appropriate remarks slightly beyond this course to broaden your mind. The presentation here is rather informal in order to make this guide more readable: for example, I use the word ``recipe'' for ``formula''.
Some people think mathematics is often boring and it shows a bit of excitement only when it is applied. On the other hand some people think mathematics is exciting and it becomes boring once it is applied. No matter what they think, we instructors are trying very hard to make this course an exciting experience for you.
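In general, when we write down a decimal expansion of a number $a$, say $a = a_0.a_1a_2a_3\ldots$, we mean that $a$ is equal to $a_0 + a_1\cdot\frac{1}{10} + a_2\cdot\frac{1}{10^2} + a_3\cdot\frac{1}{10^3} + \cdots$. If $\frac{1}{10}$ were replaced by $x$ in the last expression, you would get a Taylor series.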
How do we ``encode'' vectors in a d-dimensional space? We introduce a ``rectangular coordinate system'' so that every vector is ``encoded'' by a d-tuple of numbers called the coordinates of this vector. (You are familiar with this from a linear algebra course, but let me briefly remind you here.) A ``rectangular coordinate system'' is a set of ``basic vectors'', say $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_d$, which are orthogonal to each other and are of unit length. By means of these ``basic vectors'' we can write any vector $\mathbf{v}$ as an ``expansion'' (in a unique way) of the form $\mathbf{v} = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_d\mathbf{e}_d$. The vector $\mathbf{v}$ is ``encoded'' by the ``coefficients'' $v_1, \ldots, v_d$ of this expansion, called the coordinates of $\mathbf{v}$. The set of coordinates is not only a device to keep track of vectors; it also gives us important information about vectors for computational purposes. From the ``orthonormality'' of the basic vectors it is easy to deduce that the nth ``coefficient'' of the ``expansion'' for $\mathbf{v}$, namely the coefficient $v_n$ in front of $\mathbf{e}_n$, is given by $v_n = \mathbf{v}\cdot\mathbf{e}_n$.
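The story about using Fourier coefficients to ``encode'' functions of period 2L is similar to the one about vectors I have just reminded you of. The ``basic functions'' we pick are
\[ \frac{1}{2},\quad \cos\frac{\pi x}{L},\quad \sin\frac{\pi x}{L},\quad \cos\frac{2\pi x}{L},\quad \sin\frac{2\pi x}{L},\quad \ldots \]
These functions almost form an orthonormal basis in the space of (real-valued) functions of period 2L if we define the ``dot product'' of two functions f and g in this space to be
\[ \langle f, g\rangle = \frac{1}{L}\int_{-L}^{L} f(x)\,g(x)\,dx. \]
I say ``almost'' because of one little bad thing about these ``basic functions'': the first function is not of unit length: $\langle \frac{1}{2}, \frac{1}{2}\rangle = \frac{1}{2}$. Not too bad - this is the only bad thing. The choice of $\frac{1}{2}$ seems to be rather odd, but it has a great advantage, as we will see. With these ``basic functions'', we can write down the Fourier expansion of f as
\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left(a_n\cos\frac{n\pi x}{L} + b_n\sin\frac{n\pi x}{L}\right). \]
In the previous story about vectors we have mentioned that the coefficient of the ``expansion'' of $\mathbf{v}$ in front of $\mathbf{e}_n$ is the dot product of $\mathbf{v}$ with $\mathbf{e}_n$. Similarly, the coefficient $a_n$ here in front of the basic function $\cos\frac{n\pi x}{L}$ is the dot product of f and this basic function, namely
\[ a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\frac{n\pi x}{L}\,dx. \]
Now I tell you the advantage of picking $\frac{1}{2}$ as our first basic function: the above recipe for $a_n$ also works for n=0. A similar recipe for finding the coefficients $b_n$ is given in your Fourier Series Chapter.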
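To see the ``encoding'' at work, here is a minimal numerical sketch. It assumes the usual convention $a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\frac{n\pi x}{L}\,dx$ and $b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin\frac{n\pi x}{L}\,dx$; the function name `fourier_coefficients`, the midpoint rule and the test function f(x)=x are my own choices for illustration.

```python
import math

def fourier_coefficients(f, L, n, samples=20000):
    """Approximate a_n and b_n of a 2L-periodic function f with the midpoint rule.

    Assumed convention: a_n = (1/L) * integral of f(x) cos(n pi x / L) over [-L, L],
    and likewise b_n with sin in place of cos.
    """
    h = 2.0 * L / samples
    a_n = 0.0
    b_n = 0.0
    for k in range(samples):
        x = -L + (k + 0.5) * h          # midpoint of the k-th subinterval
        a_n += f(x) * math.cos(n * math.pi * x / L) * h
        b_n += f(x) * math.sin(n * math.pi * x / L) * h
    return a_n / L, b_n / L

# Illustration: f(x) = x on (-L, L), extended periodically.
L = 1.0
a0, _ = fourier_coefficients(lambda x: x, L, 0)   # same recipe, n = 0
a1, b1 = fourier_coefficients(lambda x: x, L, 1)
```

For this odd function the cosine coefficients should vanish, and `b1` should be close to $2L/\pi$, the value obtained by integrating by parts.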
Finding the Fourier expansion of a function is an ``encoding'' problem. Summing a Fourier series to recover the function is a ``decoding'' problem. Studying the nature of Fourier coefficients of a function from known properties of this function is called ``harmonic analysis'' of the function. Finding out properties of a function from the behaviour of its Fourier coefficients is called ``harmonic synthesis'' of this function. Let us say no more; otherwise we'll get into either music or an extremely difficult area of mathematics.
10.2. Review: polar coordinates and their relation to rectangular coordinates: $x = r\cos\theta$, $y = r\sin\theta$. Another elementary but useful identity: $x^2 + y^2 = r^2$.
Polar coordinates, cylindrical coordinates and spherical coordinates are useful for problems involving rotational symmetries.
10.3. To find areas in polar coordinates, we use the following recipe:
\[ A = \int_{\alpha}^{\beta} \frac{1}{2}\,r^2\,d\theta. \]
The ingredient $\frac{1}{2}r^2\,d\theta$ here represents the area dA of an ``infinitesimally thin slice of pizza'' depicted in the following figure: [Figure: a thin circular sector of radius $r$ and opening angle $d\theta$.]
10.4. The best way to view a parametric curve is to regard it as the trajectory of a moving particle so that the point (x(t), y(t)) stands for the location of the particle at the time instant t.
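10.5. Formally, $ds^2 = dx^2 + dy^2$ (see Fig. 10.5.1 on P. 596); here, $dx^2$ stands for ``the square of dx'', not ``d of $x^2$''! (``d of $x^2$'' should be $d(x^2)$, which is equal to 2xdx.) Divide both sides by $dt^2$ to get $\left(\frac{ds}{dt}\right)^2 = \left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2$. Then take square roots: $\frac{ds}{dt} = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}$. So $ds = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt$, which is the main ingredient in the following formula for arc length
\[ s = \int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\,dt. \]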
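The arc length recipe is easy to try numerically. The following is only a sketch: the helper `arc_length` and the two test curves (a circle of radius 2 and one arch of a cycloid) are my own illustrative choices, and the integral of the speed $\sqrt{(dx/dt)^2 + (dy/dt)^2}$ is approximated by the midpoint rule.

```python
import math

def arc_length(dx_dt, dy_dt, a, b, samples=100000):
    """Midpoint-rule approximation of the integral of sqrt(x'(t)^2 + y'(t)^2) over [a, b]."""
    h = (b - a) / samples
    total = 0.0
    for k in range(samples):
        t = a + (k + 0.5) * h
        total += math.hypot(dx_dt(t), dy_dt(t)) * h
    return total

# Circle of radius 2: x = 2 cos t, y = 2 sin t, one full turn.
circle = arc_length(lambda t: -2.0 * math.sin(t), lambda t: 2.0 * math.cos(t),
                    0.0, 2.0 * math.pi)

# One arch of the cycloid x = t - sin t, y = 1 - cos t.
cycloid = arc_length(lambda t: 1.0 - math.cos(t), lambda t: math.sin(t),
                     0.0, 2.0 * math.pi)
```

The circle should come out close to $4\pi$ and the cycloid arch close to 8, the values one gets by evaluating the integrals by hand.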
The informal way of deriving a formula like this is very common in physical sciences.
12.1, 12.2. Review: the planar and spatial vectors. The textbook distinguishes $\langle a,b,c\rangle$ from (a,b,c); the former expression represents a vector and the latter represents a point. Mathematically, such a distinction is unnecessary: both of them represent an ordered triple of real numbers, that is, an element in $\mathbb{R}^3$. Regarding such an element as a vector or a point is our own interpretation, in other words, our subjective opinion. So in this course let us not be fussy about this distinction. Forget about $\langle a,b,c\rangle$! Just write (a, b, c).
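Everything in these two sections can be easily generalized to n-dimensional space (here n is any positive integer). The dot product and the length for elements $\mathbf{a} = (a_1, \ldots, a_n)$ and $\mathbf{b} = (b_1, \ldots, b_n)$ in $\mathbb{R}^n$ are given respectively by
\[ \mathbf{a}\cdot\mathbf{b} = a_1b_1 + a_2b_2 + \cdots + a_nb_n, \qquad |\mathbf{a}| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}. \]
The angle $\theta$ between $\mathbf{a}$ and $\mathbf{b}$ can be found from $\cos\theta = \dfrac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|}$. Note that we have . It is fair to say that almost everything we study in this course can be easily generalized to the higher dimensional settings. To keep matters simple, we only look at two or three dimensional situations.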
12.3. Review: the cross product. To compute $\mathbf{a}\times\mathbf{b}$, use either (1) or (5). The magnitude of $\mathbf{a}\times\mathbf{b}$ is given by either (6) or (7). The direction of $\mathbf{a}\times\mathbf{b}$ is determined by the statements in Fig. 12.3.1 and Fig. 12.3.2. The triple product $\mathbf{a}\cdot(\mathbf{b}\times\mathbf{c})$ is given by the determinant in (17). The absolute value of this determinant gives the volume of the parallelepiped spanned by these three vectors. Its sign tells us the so-called orientation of the triple $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{c}$, in that order.
I should point out that one of the prettiest formulae in vector algebra is the identity in Exercise 33. It is highly nontrivial and surprisingly simple. I still don't know how to understand it properly to make it transparent. But we are not going to touch this formula in this course.
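12.4. Given a point $(x_0, y_0, z_0)$ and a nonzero vector $\mathbf{v} = (a, b, c)$, we have the following parametric equations of the line L through the point $(x_0, y_0, z_0)$ in the direction of $\mathbf{v}$: $x = x_0 + at$, $y = y_0 + bt$, $z = z_0 + ct$, or, in the vector form, $\mathbf{r}(t) = \mathbf{r}_0 + t\mathbf{v}$ with $\mathbf{r}_0 = (x_0, y_0, z_0)$. Don't worry about the symmetric equations (6) on P. 735: they are nice to look at, but they are not useful.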
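The (parametric) equation of the line going through two (distinct) points $\mathbf{r}_0$ and $\mathbf{r}_1$ is given by $\mathbf{r}(t) = (1-t)\mathbf{r}_0 + t\mathbf{r}_1$. Indeed, this vector equation can be rewritten as $\mathbf{r}(t) = \mathbf{r}_0 + t(\mathbf{r}_1 - \mathbf{r}_0)$ and hence it runs through $\mathbf{r}_0$ and is in the direction of $\mathbf{r}_1 - \mathbf{r}_0$. Clearly $\mathbf{r}(0) = \mathbf{r}_0$ and $\mathbf{r}(1) = \mathbf{r}_1$. So, when t increases from 0 to 1, $\mathbf{r}(t)$ moves from $\mathbf{r}_0$ to $\mathbf{r}_1$. This tells us that when t is between 0 and 1, $\mathbf{r}(t)$ is on the line segment joining $\mathbf{r}_0$ and $\mathbf{r}_1$ and vice versa. [Line segments are needed for defining convex sets, which is one of the most important concepts in optimization theory.] The general equation of a plane is ax+by+cz+d=0, where a, b, c are not simultaneously zero. This equation tells us one important thing: the vector (a,b,c) is perpendicular to the plane. From this it is not hard to see that a plane passing through a given point $(x_0, y_0, z_0)$ and perpendicular to a given vector (a,b,c) is described by the equation $a(x-x_0) + b(y-y_0) + c(z-z_0) = 0$.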
In general a surface can be described by an equation of the form f(x,y,z)=C; (it is useful to view this as a level surface of the scalar field f at the level C.) A plane is just the special case when we take f(x,y,z)=ax+by+cz+d and C=0. Often it is more convenient to use parametric equations for surfaces (for computing surface areas, for example). Parametric equations for planes are given by (16) on P. 739.
12.5. Conceptually nothing is new in this section. It is just an extension of Section 10.2 by adding one more dimension. In general, a parametric curve in n-space is given by a vector-valued function $\mathbf{r}(t)$. It can be used to represent the motion of a particle. The first derivative of $\mathbf{r}(t)$ gives the velocity: $\mathbf{v}(t) = \mathbf{r}'(t)$. The magnitude of the velocity is called the speed: $v(t) = |\mathbf{v}(t)|$. (The distinction between $\mathbf{v}$ and v is crucial! Look out!) The second derivative of $\mathbf{r}(t)$ gives the acceleration: $\mathbf{a}(t) = \mathbf{r}''(t)$. When the particle moves with unit speed (that is, $v(t) = 1$), the magnitude of the acceleration, namely $|\mathbf{a}(t)|$, gives the curvature of this parametric curve. In some engineering design problems the concept of curvature is crucial. But in this course we do not ask you anything about curvature in the exam.
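Recall that a straight line in the parametric form is $\mathbf{r}(t) = \mathbf{r}_0 + t\mathbf{v}$. When n=3, putting $\mathbf{r} = (x, y, z)$, $\mathbf{r}_0 = (x_0, y_0, z_0)$ and $\mathbf{v} = (a, b, c)$, we can rewrite this vector equation as three scalar equations: $x = x_0 + at$, $y = y_0 + bt$, $z = z_0 + ct$.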
12.8. Cylindrical coordinates are easy; just add z=z to polar coordinates.
Spherical coordinates are harder. Keep Fig. 12.8.5 (on P. 785) in mind. From this figure you can immediately tell the change between rectangular coordinates and spherical coordinates:
\[ x = \rho\sin\phi\cos\theta, \qquad y = \rho\sin\phi\sin\theta, \qquad z = \rho\cos\phi. \]
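Here is a small sketch of the change of coordinates in both directions. It assumes the common convention in which $\rho$ is the distance from the origin, $\phi$ is the angle measured down from the positive z-axis and $\theta$ is the polar angle in the xy-plane; check this against the figure in your textbook before relying on it. The function names are mine.

```python
import math

def spherical_to_rect(rho, phi, theta):
    """Assumed convention: phi measured from the positive z-axis, theta in the xy-plane."""
    x = rho * math.sin(phi) * math.cos(theta)
    y = rho * math.sin(phi) * math.sin(theta)
    z = rho * math.cos(phi)
    return x, y, z

def rect_to_spherical(x, y, z):
    """Inverse change of coordinates (phi is set to 0 at the origin by choice)."""
    rho = math.sqrt(x * x + y * y + z * z)
    phi = math.acos(z / rho) if rho > 0 else 0.0
    theta = math.atan2(y, x)
    return rho, phi, theta

# Round trip on a sample point.
rho, phi, theta = rect_to_spherical(1.0, 2.0, 3.0)
x, y, z = spherical_to_rect(rho, phi, theta)
```

Going to spherical coordinates and back should return the point (1, 2, 3) up to rounding.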
The spherical coordinate system is very handy, because many problems in physics and engineering have settings with rotational symmetries.
13.3. Don't worry about the precise definition of limits given on P. 807; it is very hard to understand and still harder to apply! Students in pure mathematics have to struggle through this definition for the future profession they want to enter. They have to learn it, but not from a calculus book like this, in which the rigorous treatment of limits is done half-heartedly. If you could understand this definition, I would be very pleased. Otherwise, just keep the following crude idea in mind: ``$\lim_{(x,y)\to(a,b)} f(x,y) = L$'' means that, when the point (x,y) approaches (a,b), f(x,y) gets nearer and nearer to L (allowing f(x,y) to fluctuate as it gets close to L).
Study Examples 6, 7 and 8 in that section carefully. Think about the following question: what is the difference between the expressions
Give an example to illustrate your point.
13.4. Partial derivatives are not much harder than usual derivatives. To find the partial derivative $\partial f/\partial x$ of a function f(x,y,z), all you have to do is to differentiate f with respect to the variable x, pretending that y and z are constants.
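Aside: In thermodynamics, one often uses expressions like $\left(\frac{\partial u}{\partial v}\right)_w$, which means, considering u as a function of v and w, the partial derivative of u with respect to v, keeping w fixed. Quite often one has the illusion that $\left(\frac{\partial u}{\partial v}\right)_w$ and $\left(\frac{\partial u}{\partial v}\right)_z$ should be the same because both partial derivatives are computed by regarding u as a function of v alone. Alas! This is a bad mistake! Watch out for this pitfall. A simple example can illustrate what is going wrong:
Example: Let u=v+w. Then $\left(\frac{\partial u}{\partial v}\right)_w = 1$. Introduce a new variable z by putting z=w-v. Then w=z+v and hence u=2v+z, which gives $\left(\frac{\partial u}{\partial v}\right)_z = 2$. So
\[ \left(\frac{\partial u}{\partial v}\right)_w \neq \left(\frac{\partial u}{\partial v}\right)_z. \]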
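The pitfall can be checked numerically. The sketch below takes u=v+w and, as an assumption made for the illustration, a second variable z=w-v (so that w=z+v and u=2v+z); it then differentiates u with respect to v once with w held fixed and once with z held fixed. The helper `central_difference` is my own name for a standard finite-difference quotient.

```python
def central_difference(g, t, h=1e-6):
    """Symmetric difference quotient approximating g'(t)."""
    return (g(t + h) - g(t - h)) / (2.0 * h)

def u_from_v_w(v, w):
    return v + w            # u regarded as a function of (v, w)

def u_from_v_z(v, z):
    return 2.0 * v + z      # the same u as a function of (v, z), using w = z + v

v0, w0 = 1.5, 0.7
z0 = w0 - v0                # the same physical state in the other variables

du_dv_fixed_w = central_difference(lambda v: u_from_v_w(v, w0), v0)
du_dv_fixed_z = central_difference(lambda v: u_from_v_z(v, z0), v0)
```

The two numbers come out near 1 and 2 respectively, although both are ``the derivative of u with respect to v'' at the same state. What differs is the variable that is kept fixed.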
Here is a simple way to see why the plane tangent to the surface z=f(x,y) at the point (a,b, f(a,b)) should be the one given by (11) on P. 818:
First, write down the equation for a (nonvertical) plane $z = g(x,y)$, where $g(x,y) = A(x-a) + B(y-b) + C$. Requirement 1: this plane should go through the point (a,b, f(a,b)). Consequently C must be f(a,b). Requirement 2: f and g have the same (first order) partial derivatives at (a,b). Consequently $A = f_x(a,b)$ and $B = f_y(a,b)$.
13.5. Necessary condition for Max and Min: This is analogous to the one-dimensional case, except that the computation may be more complicated. Let me give you an entertaining example to show how this ``first derivative test'' is applied.
Example: Given three ``sites'' $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$ in the xy-plane, find a location (a,b) such that the sum of distances from (a,b) to these sites is minimized. (You may think of three small towns located at these sites. You are asked to find the location of a shopping mall which minimizes the cost of building the roads connecting it to these towns.) Solution: Take any point (x,y). The distance to the ith site is
\[ d_i = \sqrt{(x-x_i)^2 + (y-y_i)^2}. \]
We have to minimize $f(x,y) = d_1 + d_2 + d_3$. The necessary conditions for a local extremum at (a,b) are (P. 825, (3)): $f_x(a,b) = 0$ and $f_y(a,b) = 0$. But $f = \sum d_i$; (for simplicity, we use the summation symbol $\sum$ for $\sum_{i=1}^{3}$). An easy computation shows
\[ \frac{\partial d_i}{\partial x} = \frac{x-x_i}{d_i}, \qquad \frac{\partial d_i}{\partial y} = \frac{y-y_i}{d_i}. \]
Now we make a crucial observation: $f_x = 0$ and $f_y = 0$ can be put in a vector form: $(f_x, f_y) = \mathbf{0}$; (here $(f_x, f_y)$ is considered as a vector with two components, namely $f_x$ and $f_y$ - you should recognize that this vector can also be written as $\nabla f$, the gradient of f). Since $f_x = \sum \partial d_i/\partial x$ and $f_y = \sum \partial d_i/\partial y$, the vector $\nabla f$ can be written as the sum of three vectors: $\nabla f = \mathbf{u}_1 + \mathbf{u}_2 + \mathbf{u}_3$, where
\[ \mathbf{u}_i = \left(\frac{x-x_i}{d_i},\ \frac{y-y_i}{d_i}\right) \]
is a unit vector for each i=1,2,3. We have three unit vectors whose sum is zero. So the angle between each pair must be $120^\circ$. So the required location for the shopping mall is the one such that the angles between the roads leading to the towns are $120^\circ$. [Such a location may not exist. In this case the required location is one of the given sites - the shopping mall is in one of the towns. There is a nice geometric construction to get this location (in case it is not one of these sites). Call the sites $A$, $B$ and $C$. On each side of the triangle $ABC$ erect an equilateral triangle on the outside. Denote these equilateral triangles by $BCA'$, $CAB'$ and $ABC'$. The lines $AA'$, $BB'$ and $CC'$ will intersect at a common point, which is the location we are looking for.]
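The $120^\circ$ conclusion can be tested on a concrete triangle. The sketch below does not follow the geometric construction; it uses a different, purely numerical technique, the Weiszfeld iteration, which repeatedly replaces (x,y) by the average of the sites weighted by $1/d_i$ (this is the gradient condition $\sum (x-x_i)/d_i = 0$ solved for x, and similarly for y). The three sites and the function names are hypothetical choices for illustration, and the sites are picked so that every angle of the triangle is below $120^\circ$.

```python
import math

def fermat_point(sites, iterations=2000):
    """Weiszfeld iteration: weighted average of the sites with weights 1/d_i."""
    x = sum(p[0] for p in sites) / len(sites)   # start at the centroid
    y = sum(p[1] for p in sites) / len(sites)
    for _ in range(iterations):
        weights = [1.0 / math.hypot(x - px, y - py) for (px, py) in sites]
        total = sum(weights)
        x_new = sum(w * p[0] for w, p in zip(weights, sites)) / total
        y_new = sum(w * p[1] for w, p in zip(weights, sites)) / total
        x, y = x_new, y_new
    return x, y

def pairwise_angles(point, sites):
    """Angles (in degrees) between the directions from `point` to each pair of sites."""
    x, y = point
    units = []
    for (px, py) in sites:
        d = math.hypot(px - x, py - y)
        units.append(((px - x) / d, (py - y) / d))
    angles = []
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            dot = units[i][0] * units[j][0] + units[i][1] * units[j][1]
            angles.append(math.degrees(math.acos(max(-1.0, min(1.0, dot)))))
    return angles

sites = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]   # hypothetical towns
mall = fermat_point(sites)
angles = pairwise_angles(mall, sites)
```

For these sites all three entries of `angles` should be very close to 120, in agreement with the argument about three unit vectors summing to zero.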
Suppose that we link (x,y) to the three sites by springs with the same elastic constant k. Then, for finding the equilibrium location, instead of minimizing $d_1 + d_2 + d_3$, we have to minimize the total potential energy $\frac{k}{2}(d_1^2 + d_2^2 + d_3^2)$. Find out where the equilibrium location should be and verify your answer by applying Hooke's Law.
13.6. You shouldn't get the wrong impression from the textbook that differentials are introduced only for estimating the value of a function at a point which is close to another point at which the value of the function is easy to find. Differentials of functions are just the beginning of a huge subject called differential forms. It is impossible to describe this vast and deep subject in a few words. Let me just clarify the way differential forms (of first degree) are used in thermodynamics, which often seems to be very obscure and mysterious in physics or chemistry textbooks, making a famous mathematician named J. Marsden complain that thermodynamics is usually badly taught.
First, you have to be told the difference between two words: differentials and differential forms. Differentials are often called exact differentials to emphasize this difference. By a differential we mean something that can be written as df for some function f, which is called the differential of f. A differential form (of first degree) is something which can be written as $f_1\,dx_1 + f_2\,dx_2 + \cdots + f_n\,dx_n$ for some functions $f_1, f_2, \ldots, f_n$. Thus a differential is a differential form, but not vice versa. For example, xdx+ydy is a differential because it can be rewritten as $d\left(\frac{1}{2}(x^2+y^2)\right)$. Another example: xdy+ydx is a differential because xdy+ydx=d(xy). But the differential form xdy is not a differential. Otherwise we would have $x\,dy = df = f_x\,dx + f_y\,dy$ for some f and hence $f_x = 0$ and $f_y = x$. But $f_x = 0$ means that f is independent of x and hence so is $f_y$, contradicting $f_y = x$!
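There is another way to tell the two examples apart: the integral of a differential df around a closed loop must be zero, because f returns to its starting value. The sketch below (my own illustration, with the unit circle as the loop and the midpoint rule as the integration method) integrates xdy+ydx and xdy once around the circle.

```python
import math

def loop_integral(P, Q, samples=100000):
    """Integrate P dx + Q dy once counterclockwise around the unit circle."""
    h = 2.0 * math.pi / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        dx_dt, dy_dt = -math.sin(t), math.cos(t)
        total += (P(x, y) * dx_dt + Q(x, y) * dy_dt) * h
    return total

exact_form = loop_integral(lambda x, y: y, lambda x, y: x)        # y dx + x dy = d(xy)
non_exact_form = loop_integral(lambda x, y: 0.0, lambda x, y: x)  # x dy
```

The first integral is zero up to rounding, as it must be for d(xy). The second comes out close to $\pi$ (the area of the disk), which is another way to see that xdy cannot be written as df.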
In thermodynamics we have quantities called state functions such as P (pressure), V (volume), T (temperature), U (internal energy), S (entropy), etc. We can pick any two of them as independent variables and consider the rest as functions of them. There are many ways to pick these two variables. This is the basic reason why there is an enormous number of formulas in this subject, many of which seem to be only skin-deep but turn out to be surprisingly useful for solving practical problems. Since these quantities are functions, you are allowed to take their differentials. So things like dV, dP, dU etc. make sense. There are other quantities which are not state functions, such as Q (heat) and W (work): the amount of heat released or the amount of work done is not a function of, say, T and V. You are not allowed to write things like dQ, dW. You have to put them into something with a different look, such as $\delta Q$ or $\delta W$, to represent an ``infinitesimal amount of heat'' or an ``infinitesimal amount of work''. The first law of thermodynamics says $dU = \delta Q - \delta W$. When work is done by pushing a piston, we have $\delta W = P\,dV$.
Why can't we write dW=PdV? Well, since P and V can be chosen to be two independent variables, PdV, just like xdy, cannot be a differential! The second law of thermodynamics tells us that the expression
\[ \frac{\delta Q}{T} \]
is actually a differential and you can put it as dS. In this way a new function S called entropy can be introduced. Watch out: the existence of S is not purely a mathematical fact! You cannot prove this within the framework of mathematics. You do need physics to get it.
In this course we do not plan to get into differential forms, which is a huge subject much beyond the level of this course. There will be at most one question (or none, ha ha) about differentials in the exam.
13.7. Here you are taught in great detail how to apply the chain rule properly and carefully. But in many physics and engineering books this is usually done in a very sloppy way and confusion often arises. One reason is that in such books the variables of functions are rarely mentioned. You have to figure them out by yourself before you can make sense out of these functions. Here I show you an identity often seen in thermodynamics, which, at first sight, looks very puzzling.
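Example: Verify the following identity:
\[ \left(\frac{\partial u}{\partial v}\right)_w \left(\frac{\partial v}{\partial w}\right)_u \left(\frac{\partial w}{\partial u}\right)_v = -1. \]
Solution: Regard u as a function of v and w, say u=f(v,w). Then $\left(\frac{\partial u}{\partial v}\right)_w$ is just the partial derivative $\frac{\partial f}{\partial v}$. Similarly, writing v=g(u,w) we have $\left(\frac{\partial v}{\partial w}\right)_u = \frac{\partial g}{\partial w}$. Substituting v=g(u,w) into u=f(v,w), we have
\[ h(u,w) \equiv f(g(u,w), w) = u. \]
Since u,w are considered as independent variables in h(u, w), it follows from h(u,w)=u that $\frac{\partial h}{\partial w} = 0$. On the other hand, applying the chain rule to h(u,w)=f(g(u,w),w), we have
\[ 0 = \frac{\partial h}{\partial w} = \frac{\partial f}{\partial v}\,\frac{\partial g}{\partial w} + \frac{\partial f}{\partial w}. \qquad (*) \]
Now $\frac{\partial f}{\partial w}$ is just $\left(\frac{\partial u}{\partial w}\right)_v$. Furthermore, when v is fixed, u as a function of w is inverse to w as a function of u. Therefore $\left(\frac{\partial u}{\partial w}\right)_v = 1\Big/\left(\frac{\partial w}{\partial u}\right)_v$. Thus (*) can be rewritten as
\[ \left(\frac{\partial u}{\partial v}\right)_w \left(\frac{\partial v}{\partial w}\right)_u + \frac{1}{\left(\frac{\partial w}{\partial u}\right)_v} = 0. \]
Rearranging terms, we get the required identity.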
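If the minus sign in identities of this kind still looks suspicious, a numerical check may help. The sketch below assumes a hypothetical relation uv=w among the three variables (think of PV=T for an ideal gas in suitable units) and multiplies the three partial derivatives, each computed by a finite difference with the indicated variable held fixed. All names are mine.

```python
def central_difference(g, t, h=1e-6):
    """Symmetric difference quotient approximating g'(t)."""
    return (g(t + h) - g(t - h)) / (2.0 * h)

# Hypothetical relation among the three variables: u * v = w.
def u_of(v, w):
    return w / v

def v_of(u, w):
    return w / u

def w_of(u, v):
    return u * v

v0, w0 = 2.0, 3.0
u0 = u_of(v0, w0)

du_dv = central_difference(lambda v: u_of(v, w0), v0)   # (du/dv) with w fixed
dv_dw = central_difference(lambda w: v_of(u0, w), w0)   # (dv/dw) with u fixed
dw_du = central_difference(lambda u: w_of(u, v0), u0)   # (dw/du) with v fixed

product = du_dv * dv_dw * dw_du
```

The product comes out close to -1, not +1 as a careless ``cancellation of fractions'' would suggest.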
There are other identities of this sort in thermodynamics. They can be verified in the same manner. But your physics professor may advise you to memorize them. Well, I cannot say very much about this advice (to be honest, I don't like it) because I am not an expert in this matter.
A student asked me: engineers are usually very sloppy about the chain rule; so why do we have to be so meticulous when we are studying this topic? Well, my answer is: yes, they appear to be sloppy, but they don't make mistakes - they can tell what is right from their knowledge or their expert sense of accuracy. But now you are in the learning stage and being meticulous is the only way to avoid mistakes.
13.8. A gradient vector $\nabla F$ is always perpendicular to the ``level curve'' F(x,y)=C at each point of this curve (or ``level surface'' F(x,y,z)=C at each point of this surface). It points in the direction in which F has its maximal rate of change $|\nabla F|$. For a unit vector $\mathbf{u}$, the directional derivative $D_{\mathbf{u}}F = \nabla F\cdot\mathbf{u}$ is the rate of change of F in the direction of $\mathbf{u}$. Since the gradient of F at P is perpendicular to the surface F(x,y,z)=0 for a point $P = (x_0, y_0, z_0)$ on this surface, the tangent plane to the surface at this point is given by
\[ F_x(P)\,(x-x_0) + F_y(P)\,(y-y_0) + F_z(P)\,(z-z_0) = 0. \]
Study Example 5 on P. 859. This tells you a favourite kind of questions for a professor to put in exam. Notice that equation (11) on page 818 is a special case of the present one because the surface z=f(x,y) can be treated as the level surface F(x,y,z)=0 with F(x,y,z)=f(x,y)-z.
Finally, I should mention that for finding maxima or minima in practical problems we need special numerical methods, such as the method of steepest descent, or Newton's method, or the conjugate direction method. Most of them involve the notion of gradients. [Recently, a probabilistic method called the genetic algorithm, invented by Holland, has become very popular. I saw a job offer for a professorship in computer science specializing in this area.]
13.9. The Lagrange multiplier for one constraint is a new variable $\lambda$ introduced so that you can put down $\nabla f = \lambda\nabla g$ (P. 864, (2)). Some people put the identity as $\nabla f + \lambda\nabla g = \mathbf{0}$. This shouldn't cause any alarm because their $\lambda$ is just the negative of your $\lambda$. The equations set up by applying Lagrange multipliers are often quite difficult to solve. The important thing you learn is how to set up these equations properly. This method doesn't tell you whether the extremum points you have obtained give you maxima, minima, or neither. Even if you look through many good math books you cannot find something like ``the second derivative test'' for max or min problems with constraints. Surprisingly, there are many economics books dealing with this problem - obviously max and min are basic problems when one studies profits, costs, utility functions, budget constraints etc., in which Lagrange multipliers become prices. [This second derivative test roughly says that a function f of n variables subjected to r ``independent'' constraints ($g_1 = 0, \ldots, g_r = 0$) attains a (local) minimum at a point $P$ in the ``submanifold'' C defined by these constraints if the gradient $\nabla\mathcal{L}$ vanishes at $P$ and if the Hessian matrix of $\mathcal{L}$ is positive definite on the tangent space to the submanifold C at $P$, where $\mathcal{L}$ is the Lagrangian function defined by $\mathcal{L} = f - \lambda_1 g_1 - \cdots - \lambda_r g_r$. A watered down version of the Hessian will be described in the next section.]
13.10. The second derivative test for two variables is applied in the following way. Suppose that $f_x$ and $f_y$ vanish at (a,b); (otherwise (a,b) is not a critical point and we don't go further.) First look at the determinant
\[ \Delta = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix} = f_{xx}f_{yy} - f_{xy}^2. \]
When it is negative at (a,b), no need to go further: just conclude that (a,b) is a saddle point. When this determinant is positive, take the second step: look at the sign of $f_{xx}$ (or $f_{yy}$) at (a,b): + gives Min and - gives Max. Notice that the second step is consistent with the second derivative test for functions of one variable. In this course we only study the second derivative test for functions of two variables. For functions of more variables the statement about this test is more complicated. It involves something called Hessian matrices and something called positive definiteness of such matrices. In the two variable case, $\Delta$ above is just the determinant of the Hessian matrix of f.
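The two-step procedure is mechanical enough to write down as a small function. This is only a sketch: the name `classify_critical_point` is mine, the inputs are the three second partial derivatives already evaluated at a critical point, and the case of a zero determinant (where the test says nothing) is reported as inconclusive.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Second derivative test at a critical point, given the second partials there."""
    det = fxx * fyy - fxy * fxy      # determinant of the 2x2 Hessian
    if det < 0:
        return "saddle"
    if det > 0:
        return "min" if fxx > 0 else "max"
    return "inconclusive"

# Second partials at (0, 0) for a few simple functions:
bowl = classify_critical_point(2.0, 0.0, 2.0)      # f = x^2 + y^2
dome = classify_critical_point(-2.0, 0.0, -2.0)    # f = -x^2 - y^2
pass_ = classify_critical_point(2.0, 0.0, -2.0)    # f = x^2 - y^2
flat = classify_critical_point(0.0, 0.0, 0.0)      # f = x^4 + y^4
```

The last example is a reminder that the test can fail to decide: $x^4+y^4$ clearly has a minimum at the origin, but all its second partials vanish there.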
14.1, 14.2, 14.3, 14.6. To compute multiple integrals such as
\[ \iint_R f(x,y)\,dA \]
usually we convert them into iterated integrals. For example, when the region R is vertically simple, we use
\[ \iint_R f(x,y)\,dA = \int_a^b\left(\int_{y_1(x)}^{y_2(x)} f(x,y)\,dy\right)dx \]
to compute the double integral. Here, $y_1(x)$ and $y_2(x)$ are the lower and the upper boundary curves of the region R over which we integrate f: [Figure: a vertically simple region R lying between the curves $y = y_1(x)$ and $y = y_2(x)$ for $a \le x \le b$.] In applying this formula, first you integrate with respect to y for each fixed x. With x fixed, y is ``running'' from $y_1(x)$ to $y_2(x)$; see the above figure. This explains why we set the lower and upper limits $y_1(x)$ and $y_2(x)$ in the integral $\int_{y_1(x)}^{y_2(x)} f(x,y)\,dy$. Notice that the last integral is a function of x; (this x comes from three places: f(x,y), $y_1(x)$ and $y_2(x)$.) Integrating this function of x over the interval [a,b] gives us the value of the double integral $\iint_R f(x,y)\,dA$. The whole procedure is rather routine. A few exercises will give you a good idea. The crucial step is to find the upper and lower boundary curves $y_2(x)$, $y_1(x)$ of the region R and to realize that at this stage you are letting the variable y (not x!) run for fixed x. Of course you need a correct (but not necessarily accurate) picture of R to find the boundary curves. When the region R is rectangular, described by something like $a \le x \le b$, $c \le y \le d$, things become very simple: $y_1(x)$ and $y_2(x)$ are just the constants c and d respectively.
14.4, 14.7, 14.9. If you try to find the double integral
by means of the method described in Section 14.3, you will fall into the lamentable situation of being forced to compute the following horrendous iterated integral:
The trouble here is caused by ignoring the circular symmetry of the region R. For such a region, one should consider using polar coordinates and apply formula (5) on P. 908 instead:
In the present case we have and . Putting , , and into the original integral and figuring out the boundary of the region in terms of polar coordinates, we get
which is easy to manage. Picking the ``right'' coordinate system for multiple integrals is often crucial for solving the problem. See how the ``Gaussian integral'' on P. 910 is evaluated by a very slick use of polar coordinates.
For changing one coordinate system to another for finding integrals, we may use the change of variable formula. For functions of one variable, this formula reads
\[ \int_a^b f(x)\,dx = \int_c^d f(x(u))\,x'(u)\,du. \]
(The interval [c,d] is the one which is mapped onto [a,b] by the function x(u).) You probably do not remember or are not even aware of this formula. I don't blame you, because we rarely need this in calculating integrals for functions of one variable. But for multiple integrals the situation is entirely different. The analogous formula for change of variables becomes extremely important. In the several variables case, the derivative $x'(u)$ in this formula is replaced by something called the Jacobian
\[ \frac{\partial(x,y)}{\partial(u,v)} = \begin{vmatrix} x_u & x_v \\ y_u & y_v \end{vmatrix} = x_u y_v - x_v y_u. \]
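When the transform sending (u,v) to (x,y) is linear, say $x = au + bv$, $y = cu + dv$, we have $x_u = a$, $x_v = b$, $y_u = c$, $y_v = d$ and hence the Jacobian in this case is just the determinant of this linear transform:
\[ D = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc. \]
According to the change of variables formula, this linear transform sends any region in the uv-plane to a region in the xy-plane with area enlarged by a factor of |D|. For example, we know that the area of the disk R surrounded by the circle $u^2 + v^2 = 1$ in the uv-plane is $\pi$. The linear transform $x = au$, $y = bv$ (a and b are positive numbers) sends this disk to the region in the xy-plane bounded by the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$. The determinant of this linear transform is ab. Therefore the area enclosed by the ellipse is $\pi ab$. It takes only a minute of reflection to obtain this answer, which takes a lot of work to get if you use the usual method of first year calculus. Formally, we may write
\[ dx\,dy = \left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv. \]
For example, from $x = r\cos\theta$ and $y = r\sin\theta$ we have $x_r = \cos\theta$, $x_\theta = -r\sin\theta$, $y_r = \sin\theta$ and $y_\theta = r\cos\theta$. So
\[ \frac{\partial(x,y)}{\partial(r,\theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r\cos^2\theta + r\sin^2\theta = r \]
and hence $dx\,dy = r\,dr\,d\theta$. Of course you recognize this identity as the ``heart'' of formula (3) on P. 907.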
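The statement that a linear transform enlarges areas by the absolute value of its determinant can be checked with a polygon. In the sketch below (my own illustration) the unit disk is approximated by a regular polygon with many sides, the polygon is pushed through a linear map $x = au + bv$, $y = cu + dv$, and the areas before and after are computed with the shoelace formula. The function names and the particular coefficients are hypothetical.

```python
import math

def shoelace_area(points):
    """Area of a simple polygon given by its vertices in order (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def linear_map(points, a, b, c, d):
    """Apply x = a*u + b*v, y = c*u + d*v to every vertex (u, v)."""
    return [(a * u + b * v, c * u + d * v) for (u, v) in points]

n = 3600
disk = [(math.cos(2.0 * math.pi * k / n), math.sin(2.0 * math.pi * k / n))
        for k in range(n)]

disk_area = shoelace_area(disk)                                    # close to pi
ellipse_area = shoelace_area(linear_map(disk, 3.0, 0.0, 0.0, 2.0))  # determinant 6
sheared_area = shoelace_area(linear_map(disk, 2.0, 1.0, 0.5, 3.0))  # determinant 5.5
```

The ratios `ellipse_area / disk_area` and `sheared_area / disk_area` reproduce the determinants 6 and 5.5, and `ellipse_area` itself is close to $\pi ab$ with a=3 and b=2.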
Read P. 950 (5a) and P. 952 (9). Also read Examples 3 and 5 in §14.9.
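14.8. A parametric surface needs two parameters for its description because a surface is an object with two ``degrees of freedom'' and each parameter is only responsible for one ``degree of freedom''. The area for a parametric surface $\mathbf{r}(u,v)$, with (u,v) ranging over a region R, is given by (8) on P. 942:
\[ A = \iint_R |\mathbf{r}_u\times\mathbf{r}_v|\,du\,dv. \]
[This formula can be generalized to surface integrals on P. 944 (see formula (4)). However, we will barely mention this surface integral in this course.] It is interesting to note that, when the z-component of $\mathbf{r}(u,v)$ is identically zero, the formula for the surface integral becomes the change of variable formula (5a) on P. 950. Indeed, in this case, $\mathbf{r}(u,v) = (x(u,v), y(u,v), 0)$. Hence $\mathbf{r}_u\times\mathbf{r}_v = (0, 0, x_uy_v - x_vy_u)$ and $|\mathbf{r}_u\times\mathbf{r}_v| = \left|\frac{\partial(x,y)}{\partial(u,v)}\right|$. So
\[ \iint_S f\,dS = \iint_R f(x(u,v), y(u,v))\left|\frac{\partial(x,y)}{\partial(u,v)}\right| du\,dv. \]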
Let me say again: in this course you are not responsible for the surface integral mentioned here.
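15.1. The operator
\[ \nabla = \mathbf{i}\,\frac{\partial}{\partial x} + \mathbf{j}\,\frac{\partial}{\partial y} + \mathbf{k}\,\frac{\partial}{\partial z} \]
is one of the most commonly seen animals in physics and engineering, just like those rodents all over our campus. It begets grad, div and curl:
\[ \operatorname{grad} f = \nabla f, \qquad \operatorname{div}\mathbf{F} = \nabla\cdot\mathbf{F}, \qquad \operatorname{curl}\mathbf{F} = \nabla\times\mathbf{F}. \]
Note that grad transforms a scalar field to a vector field, div transforms a vector field to a scalar field and curl transforms a vector field to a vector field. Both grad and div work for other dimensions, but curl only works in three dimensions. For a scalar field $f$ in $\mathbb{R}^n$, we have
\[ \nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right). \]
For a vector field $\mathbf{F} = (F_1, F_2, \ldots, F_n)$ in $\mathbb{R}^n$, where each component of $\mathbf{F}$ is a function of $(x_1, x_2, \ldots, x_n)$, we have
\[ \operatorname{div}\mathbf{F} = \frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2} + \cdots + \frac{\partial F_n}{\partial x_n}. \]
For a vector field $\mathbf{F} = (P, Q, R)$, where components P, Q and R are functions of (x,y,z), $\operatorname{curl}\mathbf{F}$ is given by the recipe (12) on P. 963. The basic connections between grad, div and curl are:
\[ \operatorname{curl}(\operatorname{grad} f) = \mathbf{0}, \qquad \operatorname{div}(\operatorname{curl}\mathbf{F}) = 0. \]
The local versions of the converses of these are also true [examples similar to Example 4 on P. 988 show that the global versions are invalid]:
\[ \operatorname{curl}\mathbf{F} = \mathbf{0} \ \Longrightarrow\ \mathbf{F} = \operatorname{grad} f \text{ locally}; \qquad \operatorname{div}\mathbf{F} = 0 \ \Longrightarrow\ \mathbf{F} = \operatorname{curl}\mathbf{G} \text{ locally}. \]
Finally, we mention that $\operatorname{div}(\operatorname{grad} f)$ is $\nabla^2 f$, the Laplacian of f:
\[ \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}. \]
A function f is called a harmonic function if its Laplacian is zero: $\nabla^2 f = 0$. [You are not responsible for Laplacians in this course.]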
15.2, 15.3. Line integrals: there are two types. The first type is less important. You should concentrate on the second type given by:
\[ \int_C \mathbf{F}\cdot d\mathbf{r}. \]
(The textbook puts $\mathbf{F}\cdot d\mathbf{r}$ as $\mathbf{F}\cdot\mathbf{T}\,ds$. We do not recommend that! It gives one the wrong idea that the line integral has something to do with the metric structure of the space because of the presence of ds. This is not so!) When $\mathbf{F} = (P, Q, R)$, we have
\[ \int_C \mathbf{F}\cdot d\mathbf{r} = \int_C P\,dx + Q\,dy + R\,dz. \]
The following fact (Theorem 2 on P. 979) tells you why line integrals (of the second type) are important: a vector field $\mathbf{F}$ is conservative (i.e. it has a potential f so that $\mathbf{F} = \nabla f$) if and only if every line integral $\int_C \mathbf{F}\cdot d\mathbf{r}$ is independent of the path C (it only depends on the end points of the path C). This fact is true for vector fields in a space of any dimension. Practically, the condition on line integrals about the independence of paths is impossible to check. Luckily, when the domain of $\mathbf{F} = (F_1, \ldots, F_n)$ is a box, it is enough to check the identities
\[ \frac{\partial F_i}{\partial x_j} = \frac{\partial F_j}{\partial x_i} \qquad (1 \le i < j \le n). \]
(Theorem 2 on P. 981 is the special case n=2 of this fact.)
Given a potential field, how do we find its potential? There are two ways, illustrated by Example 3 on P. 980 and Example 4 on P. 981. Study both of them carefully. This belongs to the kind of questions that a professor loves to put in the exam.
How would a physicist solve a nonlinear second order differential equation of the form $\ddot{x} = F(x)$? Well, he or she would say: Regard F as a force acting on a point of mass m=1. Introduce a potential function V by putting $V(x) = -\int_a^x F(s)\,ds$ (here a is any convenient point). Then we can verify that the total energy $E = \frac{1}{2}\dot{x}^2 + V(x)$ remains constant throughout the motion. Indeed,
\[ \frac{dE}{dt} = \dot{x}\,\ddot{x} + V'(x)\,\dot{x} = \dot{x}\,\bigl(\ddot{x} - F(x)\bigr) = 0. \]
(To finish the proof, just quote: a function is a constant if its derivative vanishes everywhere.) Thus we have arrived at a first order equation $\frac{1}{2}\dot{x}^2 + V(x) = E$, which can be solved by the usual method of separation of variables. [You will not be asked any question about differential equations in the exam.]
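The constancy of the total energy can also be watched numerically. The sketch below is my own illustration, not the separation of variables method: it assumes the hypothetical force $F(x) = -x^3$ with potential $V(x) = x^4/4$ (taking a=0), steps the equation $\ddot{x} = F(x)$ forward with the velocity Verlet scheme, and records $\frac{1}{2}\dot{x}^2 + V(x)$ along the way.

```python
def simulate_energy(force, potential, x0, v0, dt=1e-3, steps=10000):
    """Integrate x'' = force(x) with velocity Verlet and record the total energy."""
    x, v = x0, v0
    a = force(x)
    energies = [0.5 * v * v + potential(x)]
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt     # position update
        a_new = force(x)
        v += 0.5 * (a + a_new) * dt         # velocity update with averaged acceleration
        a = a_new
        energies.append(0.5 * v * v + potential(x))
    return energies

# Hypothetical example: F(x) = -x^3, so V(x) = x^4 / 4 when a = 0.
energies = simulate_energy(lambda x: -x ** 3, lambda x: x ** 4 / 4.0, 1.0, 0.0)
drift = max(abs(e - energies[0]) for e in energies)
```

The recorded energy stays at its initial value 0.25 up to a small numerical error (`drift`), which shrinks further if the time step is reduced.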
A central field is a vector field of the form $\mathbf{F}(\mathbf{r}) = g(r)\,\mathbf{r}$; (recall that $r = |\mathbf{r}|$). Many important vector fields in physics are central fields. A well-known fact is: central fields have potentials. Why? How to find these potentials? [Don't worry if you cannot find the answer. I am not going to ask you questions like this in the exam.]
The rest of the textbook is about three major theorems in several variable calculus: Green's theorem, the divergence theorem and Stokes' theorem. All three theorems can be considered as higher dimensional generalizations of the Fundamental Theorem of Calculus you learned in the first year. Recall this Fundamental Theorem, which is about an identity valid under mild assumptions:
\[ \int_a^b f'(x)\,dx = f(b) - f(a). \]
These theorems combined can be considered as the summit of an advanced calculus course. You are not responsible for them, but we urge you to browse through them - at least you should know their names.
It is possible to give a unified treatment of these theorems, and of theorems like them, to come up with a really neat formula $\int_{\partial M}\omega = \int_M d\omega$. Unfortunately, the interpretation of this formula needs work a hundred times longer than this chapter.
15.5, 15.6. There are two types of surface integrals. The first type is the one with respect to surface area (P. 944, (4)), which is less important:
\[ \iint_S f\,dS. \]
The second type is more important and trickier (P. 997 (17) and P. 998 (18)):
\[ \iint_S \mathbf{F}\cdot\mathbf{n}\,dS. \]
We need this surface integral to describe the divergence theorem:
\[ \iint_S \mathbf{F}\cdot\mathbf{n}\,dS = \iiint_T \operatorname{div}\mathbf{F}\,dV. \]
The left hand side of the identity is the flux through the boundary surface S. Imagine that T is a building on fire. Somewhere inside the building water is dried up by heat and we have $\operatorname{div}\mathbf{F} < 0$ there, or water is leaking from pipes and we have $\operatorname{div}\mathbf{F} > 0$ instead. Of course you cannot get in to find out exactly what happens. However you can watch the total amount of water coming out of the building from outside, which is the surface integral $\iint_S \mathbf{F}\cdot\mathbf{n}\,dS$. Notice that when $\operatorname{div}\mathbf{F} = 0$ everywhere, there is no source and no sink. In this case the theorem tells us that the flux through any closed surface is zero, just as we expect.
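15.7. The curl of a vector field $\mathbf{F}$ at a point tells you how drastically and in which way $\mathbf{F}$ surrounding this point is turning. When $\operatorname{curl}\mathbf{F} = \mathbf{0}$, we say that $\mathbf{F}$ is irrotational. In this case, if you put a grain of sand and let it flow along this vector field, it cannot go in a circular motion. In fluid mechanics one considers something called vortex sheets and vortex tubes. A vortex tube for a vector field $\mathbf{F}$ is a ``tubular surface'' tangent to the vector field $\operatorname{curl}\mathbf{F}$; see the following figure. [Figure: a vortex tube with two loops drawn around it.]
Let us take two loops $C_1$ and $C_2$ around this tube. Then
\[ \oint_{C_1}\mathbf{F}\cdot d\mathbf{r} = \oint_{C_2}\mathbf{F}\cdot d\mathbf{r}. \]
(This identity tells us that if $C_1$ is smaller than $C_2$, then the spinning of $\mathbf{F}$ along $C_1$ is more drastic.) Why? Well, let S be the portion of the tube between $C_1$ and $C_2$. Since at every point of S $\operatorname{curl}\mathbf{F}$ is tangent to S, we have $\operatorname{curl}\mathbf{F}\cdot\mathbf{n} = 0$ at each point of S. Now just apply Stokes' theorem to S. Can you see how a vortex tube is related to a tornado?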