Before we tackle the idea of “solving non-linear equations”, we shall start by learning various methods of finding the so we can understand when to use one method over another, starting with Newton’s Method. From a pedagogical standpoint, Newton’s Method isn’t exactly the best starting point (according to my arbitrary intuition); however, I also don’t think it will be too difficult to reintegrate the relationships and understandings of these different ideas/methods (seeing as they all solve the same problem (for certain functions), in different, but similar ways). So let us begin.
Newton’s Method
Newton’s method (also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson) is a We will learn several different root-finding algorithms to help us approximate the root of a function, but first, what is the root of a function?
Definition (Root of a function)
Newton’s Method can help us approximate the value of
Definition (Newton's Method)
Let be a real-valued differentiable function and let be the initial estimate (or initial guess) for a root , such that . Then the sequence of approximations is defined recursively by
for all , if
Example (Approximating roots)
Find the root of for up to decimal places.
Solution
First we will rewrite the function as Evaluating the derivative we have We will choose , and using eqref1 from Definition 1.2 we will approximate
By Definition 1.2 again, we will approximate .
By Definition 1.2 again, we will approximate .
Intuition
The question asks us to approximate the root to decimal places. Notice that the first four decimals of and are the same. For this case, we round that up. Thus the root is as desired.
Exercise
Find the root of for up to decimal places.
Answer
The convergence of Newton’s Method
Explanation
Consider It’s clear its roots are and . So Definition 1.2 isn’t necessary, but say we did it anyway. Picking arbitrary values from (for simplicity sake), we will check how many iterations are necessary for an accuracy of three decimal places and a calculation tolerance of
| Start Value () | Total Iterations () |
|---|---|
| -0.99 | 4 |
| -0.75 | 7 |
| -0.50 | 15 |
| -0.25 | 34 |
| -0.01 | 122 |
| 0.00 | Undefined |
| 0.01 | 122 |
| 0.25 | 34 |
| 0.50 | 15 |
| 0.75 | 7 |
| 0.99 | 4 |
Seeing this, it’s clear we got lucky with our choice of in Solution 1.4 and our given in Exercise 1.6. Both required only a few iterations of eqref(1) from Definition 1.2 to meet an acceptable accuracy, but for choosing poorly within the small inequality , is the difference between iterating eqref(1) times or times. In fact, 122 isn’t the limit. can be arbitrarily large. Similarly, can converge in a single computation, provided you pick an that converges while being close enough to the real root and meeting the accuracy you’re looking for (i.e., and must meet the same requirements that Intuition 1.5 mentioned for and ). What about ? Why is it undefined for this function? For we know and looking at eqref(1) it’s clear that dividing by zero is undefined; looking at it geometrically, this makes sense because the derivative is zero, therefore there exists a horizontal tangent line at that is at and this line grows without bound in both directions, i.e. in (which is well defined since the line itself is defined for for all ( here is not referring to the root of course, it represents the horizontal coordinate in we are overloading variables for simplicity sake). However it is not computable in standard models of arithmetic and so it also fails for Newton’s Method).
Now, this leaves a lot of questions, and explaining why is a bit tedious (and not really instructive without motivation). So, instead, you will learn the answers to those questions by answering them yourself; i.e., you will fill in the gaps of understanding that come with this unfinished exposition by doing mathematics. Below are some problems that will hopefully help you answer and fill in all the gaps I mentioned.
Some problems
Problem
From Explanation 1.8, it seems that the positive values converge at the same rate as their additive inverse. Is this always the case? In other words, is for each where denotes the number of steps it takes for some function to converge starting at (the initial estimate). Supposing exceptions exist, what type of function(s) are necessary for to hold? (and why?)
Hint
Look at the function then look at the function . They both have symmetry, but one type of symmetry that one of those functions has is a property that allows to hold.
Problem (identify stationary points)
From Explanation 1.8 again, we saw that choosing is undefined for for eqref(1). It’s built into Definition 1.2 that is necessary to use Newton’s Method, and so being able to identify which points of a function fail to converge could come in handy (in more ways than one). Learn how to find the stationary point of any arbitrary function (assuming the function has stationary points).
Hint
Refer to Definition 1.13 and visually look at some functions. This one is relatively easy (all the problems in this section are). The key is to deepen your understanding of these simple ideas you’re (probably already) familiar with, to a high enough level, so that you can see (and thus learn) how to use these simple ideas in (not so simple) novel ways.
Definition (Stationary point)
Let be differentiable. A point is called a stationary point if
Here denotes the gradient of (i.e.,
Problem
Geometrically derive Definition 1.2, and use that geometric intuition to understand why the speed of convergence is higher or lower for different initial guesses across differerent functions.
Hint
First geometrically sketch some examples using simple functions (like , or etc.). After deriving Definition 1.2 play with some initial points of for arbitrary functions, and look at the tangent line and gradient very closely.
Problem
Consider the function Given an initial guess use eqref(1) from Definition 1.2 to compute a few iterations of the sequence. Observe the sequence and use your geometric intuition to make sense of what’s going on. Then generalise this understanding to arbitrary functions that have the same properties.
Hint
Using stationary point(s) is one way to understand this. Refer to your understanding of Problem 1.11 and Definition 1.13. Think about how the gradient, concavity, and the tangent line work together.
Problem
Use your newly equipped geometric understanding of Definition 1.2 to answer any other question(s) you have.
Hint
If you can’t think of anything else after having done the previous problems, then this is a good opportunity to make your own definitions for ideas you think may be useful. For example, some generalisation about an idea that you think may come up again and again, but wasn’t explicitly written, or some idea that is true but you’re unsure of whether it can be extended to -dimensions. In other words, start valuing some ideas more than others to build some structure on what’s really relevant (the big picture ideas), build some tools you think will be useful going forward, and try to fill as many gaps as you can.