Solving non-linear equations via different numerical methods.

Before we tackle “solving non-linear equations”, we will learn several methods for finding the $\textit{root of a function,}$ so that we can understand when to use one method over another. We start with Newton’s Method. From a pedagogical standpoint, Newton’s Method isn’t necessarily the best starting point (in my own estimation); however, I don’t think it will be too difficult to tie these methods together later, since they all solve the same problem (for certain functions) in different but similar ways. So let us begin.

Newton’s Method

Newton’s method (also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson) is a $\textit{root-finding algorithm.}$ We will learn several different root-finding algorithms to help us approximate the root of a function, but first, what is the root of a function?

Definition (Root of a function)

$\text{A root of a function}$ $f(x)$ $\text{is a value of}$ $x$ $\text{such that}$ $f(x)=0.$

Newton’s Method can help us approximate the value of $x.$

Definition (Newton's Method)

Let $f$ be a real-valued differentiable function and let $x_0$ be the initial estimate (or initial guess) for a root $x$ , such that $f(x)=0$ . Then the sequence of approximations $(x_n)_{n=0}^{\infty}$ is defined recursively by

\begin{equation} x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \end{equation}

for all $n\geq 0,$ provided $f'(x_n)\neq 0.$
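The recursion in Definition 1.2 can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the function name `newton`, the stopping rule (stop once two successive estimates differ by less than `tol`), and the test function $x^2-2$ are all assumptions made here and do not come from the text.

```python
import math


def newton(f, df, x0, tol=1e-7, max_iter=100):
    """Iterate x_{n+1} = x_n - f(x_n) / f'(x_n), starting from x0.

    Returns the list [x_0, x_1, ..., x_N]. The stopping rule
    |x_{n+1} - x_n| < tol is an assumption of this sketch.
    """
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        slope = df(x)
        if slope == 0:
            # Definition 1.2 requires f'(x_n) != 0.
            raise ZeroDivisionError(f"f'(x_n) = 0 at x_n = {x}")
        x_next = x - f(x) / slope
        xs.append(x_next)
        if abs(x_next - x) < tol:
            break
    return xs


# Hypothetical usage: the positive root of x^2 - 2 is sqrt(2).
iterates = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(iterates[-1], abs(iterates[-1] - math.sqrt(2)))
```

Passing the derivative `df` explicitly keeps the sketch close to equation (1); nothing here estimates $f'$ numerically.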

Example (Approximating roots)

Find the root of $-\frac12\ln x=2x$ for $0<x<\frac12$ up to $3$ decimal places.

Solution

First we rewrite the equation as $f(x)=0,$ where $f(x)=2x+\frac12 \ln x.$ Differentiating, we have $f'(x)=\frac{1}{2x}+2.$ We will choose $x_0=\frac14,$ and using equation (1) from Definition 1.2 we will approximate $x_1.$

\begin{align*} x_1=x_0-\frac{f(x_0)}{f'(x_0)} \\ x_1=\frac14-\frac{f(\frac14)}{f'(\frac14)} \\ x_1=\frac14-\frac{2(\frac14)+\frac12 \ln(\frac14)}{\frac{1}{2(\frac14)}+2}\\ x_1\approx 0.29828679514. \\ \end{align*}

By Definition 1.2 again, we will approximate $x_2$ .

\begin{align*} x_2=x_1-\frac{f(x_1)}{f'(x_1)} \\ x_2\approx 0.29828679514-\frac{f(0.29828679514)}{f'(0.29828679514)}\\ x_2\approx 0.29828679514-\frac{2(0.29828679514)+\frac{1}{2}\ln(0.29828679514)}{\frac{1}{2(0.29828679514)}+2}\\ x_2 \approx 0.300538100661. \\ \end{align*}

By Definition 1.2 again, we will approximate $x_3$ .

\begin{align*} x_3 = x_2-\frac{f(x_2)}{f'(x_2)} \\ x_3\approx 0.300538100661-\frac{f(0.300538100661)}{f'(0.300538100661)} \\ x_3\approx 0.300538100661-\frac{2(0.300538100661)+\frac{1}{2}\ln(0.300538100661)}{\frac{1}{2(0.300538100661)}+2}\\ x_3\approx 0.300541968288. \end{align*}

Intuition

The question asks us to approximate the root to $3$ decimal places. Notice that $x_2$ and $x_3$ agree in their first four decimal places ( $0.3005$ ), which suggests that the third decimal place has settled. Rounding $0.3005\ldots$ up to three decimal places gives the root $x\approx 0.301,$ as desired.
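The three hand computations above can be checked with a short script. This is a sketch under the same choices as the Solution ( $f(x)=2x+\frac12\ln x,$ $x_0=\frac14,$ three steps); the names `f`, `df`, and `iterates` are introduced here for illustration only.

```python
import math


def f(x):
    # f(x) = 2x + (1/2) ln x, from the rewritten equation
    return 2 * x + 0.5 * math.log(x)


def df(x):
    # f'(x) = 1/(2x) + 2
    return 1 / (2 * x) + 2


x = 0.25  # x_0 = 1/4, as chosen in the Solution
iterates = [x]
for n in range(3):
    x = x - f(x) / df(x)  # equation (1)
    iterates.append(x)
    print(f"x_{n + 1} = {x:.12f}")

print("rounded to 3 decimal places:", round(iterates[-1], 3))
```

The printed values should agree with $x_1,$ $x_2,$ and $x_3$ above, up to rounding in the last digits.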

Exercise

Find the root of $27e^{-2.7x}=\frac{1}{27}x^2,$ starting from $x_0=1.9,$ up to $3$ decimal places.

Answer

\begin{aligned} \text{Exercise for the reader.} \tag*{$\blacksquare$} \end{aligned}

The convergence of Newton’s Method

Explanation

Consider $x^6-1.$ It’s clear its real roots are $x=-1$ and $x=1,$ so Definition 1.2 isn’t necessary, but say we used it anyway. Picking arbitrary start values from $-1 \leq x_0\leq 1$ (for simplicity’s sake), we will check how many iterations are necessary for an accuracy of three decimal places and a calculation tolerance of $1.0\times10^{-7}.$

Start Value ( $x_0$ )	Total Iterations ( $N$ )
-0.99	4
-0.75	7
-0.50	15
-0.25	34
-0.01	122
0.00	Undefined
0.01	122
0.25	34
0.50	15
0.75	7
0.99	4

Seeing this, it’s clear we got lucky with our choice of $x_0$ in Solution 1.4 and our given $x_0$ in Exercise 1.6. Both required only a few iterations of equation (1) from Definition 1.2 to meet an acceptable accuracy, but for $x^6-1,$ a poor choice of $x_0$ within the small interval $-1 \leq x\leq 1$ is the difference between iterating equation (1) $4$ times or $122$ times. In fact, $122$ isn’t the limit: $N$ can be arbitrarily large. Conversely, the iteration can finish in a single step, provided you pick an $x_0$ that converges and is already close enough to the real root to meet the accuracy you’re looking for (i.e., $x_0$ and $x_1$ must meet the same requirement that Intuition 1.5 mentioned for $x_2$ and $x_3$ ).

What about $x_0=0$ ? Why is it undefined for this function? For $x^6-1$ we know $f'(0) = 0,$ and looking at equation (1) it’s clear that the step would require dividing by zero, which is undefined. Geometrically this makes sense: since the derivative is zero, the tangent line at $x_0$ is the horizontal line $y=-1.$ The line itself is well defined (it is the set of points $(x,-1)$ for all $x \in \Reals$ ; here $x$ is not the root, it is the horizontal coordinate in $\Reals ^2,$ and we are overloading variables for simplicity’s sake), and it extends without bound in both directions. However, it never meets the $x$ -axis, so it gives no next estimate $x_1,$ and Newton’s Method fails.
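An experiment of this kind can be sketched as follows. The text does not say exactly how the iteration counts in the table were obtained, so the stopping rule here (stop once $|x_{n+1}-x_n|$ falls below $1.0\times10^{-7}$ ) is an assumption, and the counts this script prints may differ from the table by a step or two. The names `newton_steps`, `starts`, and `counts` are hypothetical.

```python
def newton_steps(f, df, x0, tol=1e-7, max_iter=1000):
    """Count Newton steps until |x_{n+1} - x_n| < tol.

    Returns None when f'(x_n) = 0 (the step is undefined) or when
    max_iter steps are not enough. The stopping rule is assumed.
    """
    x = x0
    for n in range(1, max_iter + 1):
        slope = df(x)
        if slope == 0:
            return None
        x_next = x - f(x) / slope  # equation (1)
        if abs(x_next - x) < tol:
            return n
        x = x_next
    return None


def f(x):
    return x**6 - 1


def df(x):
    return 6 * x**5


starts = [-0.99, -0.75, -0.50, -0.25, -0.01, 0.00,
          0.01, 0.25, 0.50, 0.75, 0.99]
counts = {x0: newton_steps(f, df, x0) for x0 in starts}
for x0, n in counts.items():
    print(f"{x0:6.2f}  {'Undefined' if n is None else n}")
```

Start values near $0$ are thrown far from the root by the first step, because $f'(x_0)$ is tiny there, and then need many steps to come back.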

Now, this leaves a lot of questions, and explaining why is a bit tedious (and not really instructive without motivation). So, instead, you will learn the answers to those questions by answering them yourself; i.e., you will fill in the gaps of understanding that come with this unfinished exposition by doing mathematics. Below are some problems that will hopefully help you answer and fill in all the gaps I mentioned.

Some problems

Problem

From Explanation 1.8, it seems that positive $x_0$ values converge in the same number of steps as their additive inverses. Is this always the case? In other words, is $N (x_0)= N(-x_0)$ for each $x_0 \in \Reals,$ where $N(x_0)$ denotes the number of steps Newton’s Method takes to converge for a given function starting at $x_0$ (the initial estimate)? Supposing exceptions exist, what type of function(s) are necessary for $N (x_0)= N(-x_0)$ to hold (and why)?

Hint

Look at the function $x^2,$ then look at the function $x^3.$ They are both symmetric, but the kind of symmetry one of them has is a property that allows $N (x_0)= N(-x_0)$ to hold.

Problem (identify stationary points)

From Explanation 1.8 again, we saw that equation (1) is undefined for $x^6-1$ when $x_0=0.$ It’s built into Definition 1.2 that $f'(x_n)\neq 0$ is necessary to use Newton’s Method, so being able to identify the points at which the iteration breaks down could come in handy (in more ways than one). Learn how to find the stationary points of an arbitrary function (assuming the function has stationary points).

Hint

Refer to Definition 1.13 and visually look at some functions. This one is relatively easy (all the problems in this section are). The key is to deepen your understanding of these simple ideas you’re (probably already) familiar with, to a high enough level, so that you can see (and thus learn) how to use these simple ideas in (not so simple) novel ways.

Definition (Stationary point)

Let $f: \Reals ^n \to \Reals$ be differentiable. A point $c=(c_1,c_2,\dots,c_n)\in \Reals ^ n$ is called a stationary point if

\begin{equation} \nabla f(c)=0. \end{equation}

Here $\nabla f$ denotes the gradient of $f,$ i.e., $\nabla f = (\frac {\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\dots,\frac{\partial f}{\partial x_n}).$
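As a small illustration of the definition (the function below is chosen here for illustration and does not appear elsewhere in the text), take $f(x,y)=x^2+y^2.$ Then

\begin{align*} \nabla f(x,y) = (2x,\,2y), \\ \nabla f(x,y) = 0 \iff (x,y)=(0,0), \end{align*}

so $(0,0)$ is the only stationary point of this $f.$ For $n=1$ the condition reduces to $f'(c)=0.$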

Problem

Geometrically derive Definition 1.2, and use that geometric intuition to understand why the speed of convergence is higher or lower for different initial guesses $x_0$ across different functions.

Hint

First sketch some examples using simple functions (like $y=4x$ or $y=e^x$ ). After deriving Definition 1.2, play with some initial points $x_0$ for arbitrary functions, and look at the tangent line and gradient very closely.

Problem

Consider the function $x^3-2x+2.$ Given an initial guess $x_0=0,$ use eqref(1) from Definition 1.2 to compute a few iterations of the sequence. Observe the sequence and use your geometric intuition to make sense of what’s going on. Then generalise this understanding to arbitrary functions that have the same properties.

Hint

Using stationary point(s) is one way to understand this. Refer to your understanding of Problem 1.11 and Definition 1.13. Think about how the gradient, concavity, and the tangent line work together.

Problem

Use your newly equipped geometric understanding of Definition 1.2 to answer any other question(s) you have.

Hint

If you can’t think of anything else after having done the previous problems, then this is a good opportunity to make your own definitions for ideas you think may be useful. For example, some generalisation about an idea that you think may come up again and again, but wasn’t explicitly written, or some idea that is true but you’re unsure of whether it can be extended to $n$ -dimensions. In other words, start valuing some ideas more than others to build some structure on what’s really relevant (the big picture ideas), build some tools you think will be useful going forward, and try to fill as many gaps as you can.