Numerical Analysis

Error Analysis

Understanding numerical errors, floating-point representation, and convergence analysis.

1. Floating-Point Arithmetic

IEEE 754 Double-Precision Format

A 64-bit floating-point number is represented as:

(-1)^s \times 2^{e-1023} \times (1 + f)
Sign (s) 1 bit: 0 for positive, 1 for negative
Exponent (e) 11 bits: biased by 1023
Mantissa (f) 52 bits: fractional part

Machine Epsilon

\varepsilon \approx 2.22 \times 10^{-16}
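
Machine epsilon is the gap between 1 and the next larger representable double, and it can be measured directly. A minimal sketch in Python, whose float is an IEEE 754 double:

```python
eps = 1.0
while 1.0 + eps / 2 > 1.0:  # halve until adding it no longer changes 1.0
    eps /= 2
print(eps)  # 2.220446049250313e-16, i.e. 2**-52
```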

Interactive IEEE 754 Converter

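The converter decodes a double into the three fields above. A minimal sketch of the same decoding in Python, using the standard struct module to expose the raw bits:

```python
import struct

def decode_double(x: float):
    # View the 64 bits of x as an integer (big-endian byte order).
    bits = int.from_bytes(struct.pack('>d', x), 'big')
    sign = bits >> 63                   # 1 bit
    exponent = (bits >> 52) & 0x7FF     # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)   # 52 bits
    return sign, exponent - 1023, fraction / 2**52

s, e, f = decode_double(6.238)
print(f"sign={s}, exponent={e}, fraction={f:.6f}")
# 6.238 = (-1)^0 * 2^2 * (1 + 0.5595)
```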

Normalization

A floating-point number in base 10 is normalized when the leading digit d₁ ≠ 0, i.e., the mantissa satisfies 0.1 ≤ |m| < 1:

x = (-1)^s \times 0.d_1 d_2 d_3 \ldots \times 10^e, \quad d_1 \neq 0

Static Examples

6.238 = (−1)⁰ × 0.6238 × 10¹ normalized ✓
−0.0014 = −0.0014 × 10⁰ → normalized: −0.14 × 10⁻²
0.00345 → normalized: 0.345 × 10⁻²

Interactive Normalizer

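A sketch of the normalization step in Python; normalize is a hypothetical helper mirroring the examples above:

```python
import math

def normalize(x: float):
    # Return (sign, mantissa, exponent) with 0.1 <= mantissa < 1.
    if x == 0:
        return 0, 0.0, 0
    sign = 0 if x > 0 else 1
    e = math.floor(math.log10(abs(x))) + 1
    return sign, abs(x) / 10**e, e

print(normalize(6.238))    # ≈ (0, 0.6238, 1)
print(normalize(-0.0014))  # ≈ (1, 0.14, -2)
```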

Overflow & Underflow

The exponent e is bounded below and above. For IEEE 754 double precision, e ∈ [−1022, 1023]:

e_{\min} = -1022 \leq e \leq 1023 = e_{\max}

Overflow: e > e_max → result is ±Infinity
Underflow: e < e_min → result rounds to 0 (or to a subnormal)

Static Examples

10³⁰⁸ ≈ 1e308: near the maximum representable value (normal)
10³⁰⁹ → Infinity (overflow)
10⁻³⁰⁸ ≈ 1e-308: below the smallest normal value (≈ 2.2e-308), stored as a subnormal
10⁻³²⁴ → rounds to 0, since it lies below half the smallest subnormal, 5e-324 (underflow)

Interactive Classifier

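The classifier's logic reduces to comparing |x| against the IEEE 754 thresholds; a sketch in Python, whose float is an IEEE 754 double:

```python
import math
import sys

def classify(x: float) -> str:
    ax = abs(x)
    if math.isinf(ax):
        return 'overflow (Infinity)'
    if ax == 0.0:
        return 'underflow (zero)'
    if ax < sys.float_info.min:  # smallest normal, 2.2250738585072014e-308
        return 'subnormal'
    return 'normal'

for x in [1e308, 1e309, 1e-308, 5e-324, 1e-324]:
    print(x, classify(x))
```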

2. Error Types

Theory

Absolute Error

E_{abs} = |p - p^*|

Relative Error

E_{rel} = \frac{|p - p^*|}{|p|}

Significant Digits

The approximation p* has n significant digits with respect to p if:

E_{rel} < 5 \times 10^{-n}

Error Calculator

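A sketch of what the calculator computes, using the 5 × 10⁻ⁿ significant-digit criterion above (Problem 1 below serves as the test case):

```python
def error_metrics(p: float, p_star: float):
    e_abs = abs(p - p_star)
    e_rel = e_abs / abs(p)
    # Count the largest n with e_rel < 5 * 10^(-n); assumes p* != p.
    n = 0
    while e_rel > 0 and e_rel < 5 * 10.0 ** -(n + 1):
        n += 1
    return e_abs, e_rel, n

print(error_metrics(2.71828, 2.718))  # ≈ (2.8e-4, 1.03e-4, 4)
```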

Worked Examples

Practice Problems

Compute the absolute error, relative error, and significant digits for each pair.

Problem 1: p = 2.71828, p* = 2.718

Problem 2: p = 1.41421, p* = 1.414

Problem 3: p = 9.8696, p* = 9.87

3. Convergence Rates

Theory

A sequence {αₙ} converges to zero with rate r if:

|\alpha_{n+1}| \leq C|\alpha_n|^r
r = 1 Linear convergence (e.g., O(h))
r = 2 Quadratic convergence (e.g., O(h²))
1 < r < 2 Superlinear convergence

Convergence Visualization

Model error sequences, plotted over 15 iterations:

Linear (r = 1): αₙ = 0.5ⁿ
Quadratic (r = 2): αₙ = 0.5^(2ⁿ)
Superlinear (r ≈ 1.618): αₙ = 0.5^(φⁿ)

Sequence Rate Analyzer

Enter a sequence of error values (comma-separated) to classify its convergence rate.
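
Taking logarithms of |αₙ₊₁| ≈ C·|αₙ|^r gives r ≈ log(αₙ₊₂/αₙ₊₁) / log(αₙ₊₁/αₙ) for each consecutive triple of errors. A minimal sketch of the analyzer:

```python
import math

def estimate_rate(errors):
    # One rate estimate per consecutive triple (a, b, c) of error values.
    return [math.log(c / b) / math.log(b / a)
            for a, b, c in zip(errors, errors[1:], errors[2:])]

linear = [0.5**n for n in range(1, 8)]
quadratic = [0.5**(2**n) for n in range(1, 6)]
print(estimate_rate(linear))     # ≈ [1.0, 1.0, 1.0, 1.0, 1.0]
print(estimate_rate(quadratic))  # ≈ [2.0, 2.0, 2.0]
```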

4. Condition Number

Theory

Condition number of evaluating f at x:

\kappa(x) = \left|\frac{x \cdot f'(x)}{f(x)}\right|

Well-conditioned: κ ≈ 1

Small input changes produce small output changes.

Ill-conditioned: κ ≫ 1

Small input changes produce large output changes.

Relative error amplification:

\text{rel\_err}(f(x)) \approx \kappa(x) \cdot \text{rel\_err}(x)

The visualization shows f(x) near the evaluation point. Brackets illustrate how an input perturbation ε maps to an output perturbation κ · ε.

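A sketch of a numerical condition-number estimator; the central-difference derivative and the step size h = 1e-6 are assumptions of this sketch:

```python
import math

def condition_number(f, x, h=1e-6):
    # kappa(x) = |x * f'(x) / f(x)|, with f' from a central difference.
    fprime = (f(x + h) - f(x - h)) / (2 * h)
    return abs(x * fprime / f(x))

print(condition_number(math.exp, 1.0))       # ≈ 1.0: well-conditioned
print(condition_number(math.log, 1.000001))  # ≈ 1e6: ill-conditioned near x = 1
```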

5. Series Convergence

Theory

An infinite series is a sum of infinitely many terms:

S = \sum_{n=1}^{\infty} a_n

The N-th partial sum accumulates the first N terms:

S_N = \sum_{n=1}^{N} a_n = a_1 + a_2 + \cdots + a_N

Convergence means the partial sums approach a finite limit:

S_N \xrightarrow{N\to\infty} S \in \mathbb{R}

Ratio Test

L = \lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right|

L < 1 → converges; L > 1 → diverges; L = 1 → inconclusive

Comparison Test

0 \le a_n \le b_n

If Σbₙ converges, so does Σaₙ

Integral Test

\sum a_n \sim \int_1^\infty f(x)\,dx

The sum and the integral converge or diverge together when f is positive and decreasing with aₙ = f(n)

Preset Series

a_n = r^n

Converges when |r| < 1; the running example uses r = 0.5.

Custom Series

Enter a formula for a_n using variable n. Supports: + - * / ^ sin cos sqrt abs log exp pi e n!

With aₙ = 0.5ⁿ and N = 50 terms, the partial sums appear to converge to ≈ 1.000000. The ratio test estimate |a_{n+1}/a_n| ≈ 0.5 (< 1) supports convergence; a sketch of the computation follows.
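
A sketch of the partial-sum accumulation and the tail ratio-test estimate:

```python
def analyze_series(a, N):
    # S_N = a(1) + ... + a(N), plus |a(N)/a(N-1)| as a ratio-test estimate.
    s = 0.0
    for n in range(1, N + 1):
        s += a(n)
    return s, abs(a(N) / a(N - 1))

s, ratio = analyze_series(lambda n: 0.5**n, 50)
print(s)      # ≈ 1.0
print(ratio)  # 0.5 (< 1, supports convergence)
```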

Visualization

The plot shows the partial sums Sₙ, the term sizes |aₙ|, and the estimated limit.

Partial Sums Table (first 20 rows)

n     aₙ          Sₙ
1     0.500000    0.500000
2     0.250000    0.750000
3     0.125000    0.875000
4     0.062500    0.937500
5     0.031250    0.968750
6     0.015625    0.984375
7     0.007813    0.992188
8     0.003906    0.996094
9     0.001953    0.998047
10    0.000977    0.999023
11    0.000488    0.999512
12    0.000244    0.999756
13    0.000122    0.999878
14    6.104e-5    0.999939
15    3.052e-5    0.999969
16    1.526e-5    0.999985
17    7.629e-6    0.999992
18    3.815e-6    0.999996
19    1.907e-6    0.999998
20    9.537e-7    0.999999

6. Floating-Point Arithmetic Proof

Theory

Floating-point representation model

\text{fl}(x) = x(1 + \delta), \quad |\delta| \leq u

Unit roundoff for IEEE 754 double precision

u = 2^{-53} \approx 1.11 \times 10^{-16}

Arithmetic operations (each rounded once)

\text{fl}(x \oplus y) = (x + y)(1 + \varepsilon_1), \quad |\varepsilon_1| \leq u
\text{fl}(x \otimes y) = (x \cdot y)(1 + \varepsilon_2), \quad |\varepsilon_2| \leq u

Error accumulation after n operations

\left|\frac{\hat{f} - f}{f}\right| \lesssim n \cdot u \quad \text{(first-order bound)}

Chopping (truncation)

Relative error bound: \beta^{1-k}

Rounding (nearest)

Relative error bound: \tfrac{1}{2}\beta^{1-k}

Step-by-Step Proof Walkthrough


Step 1: Normalized Floating-Point Form

\pm 0.d_1 d_2 \cdots d_k \times \beta^e, \quad d_1 \neq 0

Any nonzero real number is written in normalized form with base β (typically 2 for binary), k significant digits d₁d₂…dₖ where d₁ ≠ 0, and exponent e. For IEEE 754 double precision: β = 2, k = 53 (1 implicit + 52 stored).


Interactive: Catastrophic Cancellation

Computing (1 + ε) − 1 should yield ε. Watch the relative error grow as ε approaches machine epsilon.

ε value      Exact result   Computed result   Relative error
1.0000e-13   1.0000e-13     9.9920e-14        7.99e-4
1.0000e-14   1.0000e-14     9.9920e-15        7.99e-4
1.0000e-15   1.0000e-15     1.1102e-15        1.10e-1
1.0000e-16   1.0000e-16     0                 1.00e+0
1.0000e-17   1.0000e-17     0                 1.00e+0

For ε = 10⁻¹⁶ ≈ u, JavaScript (IEEE 754) computes (1 + ε) = 1 exactly due to rounding, so (1 + ε) − 1 = 0. The relative error is 100%.

For ε = 10⁻¹⁷ (below machine epsilon), the situation is the same — ε is rounded to 0 when added to 1.
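
The table is easy to reproduce; a sketch in Python, where float is the same IEEE 754 double as JavaScript's Number:

```python
for eps in [1e-13, 1e-14, 1e-15, 1e-16, 1e-17]:
    computed = (1.0 + eps) - 1.0           # the leading 1s cancel
    rel_err = abs(computed - eps) / eps
    print(f"eps={eps:.4e}  computed={computed:.4e}  rel_err={rel_err:.2e}")
```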

Error Propagation Calculator

Given two values and their relative errors, compute the propagated relative error bound.

Example: x = 1.5 and y = 2.3, each carrying relative error 10⁻¹⁰.

Result: 1.5 + 2.3 = 3.800000000

Propagated rel. error bound: (1.5 · 10⁻¹⁰ + 2.3 · 10⁻¹⁰) / 3.8 + u ≈ 1.0000e-10

Addition error bound

\rho_{x+y} \leq \frac{|x|\,\rho_x + |y|\,\rho_y}{|x+y|} + u
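
A sketch of the bound as a function:

```python
U = 2.0 ** -53  # unit roundoff for IEEE 754 double precision

def add_rel_error_bound(x, rho_x, y, rho_y):
    # rho_{x+y} <= (|x| * rho_x + |y| * rho_y) / |x + y| + u
    return (abs(x) * rho_x + abs(y) * rho_y) / abs(x + y) + U

print(add_rel_error_bound(1.5, 1e-10, 2.3, 1e-10))  # ≈ 1.0e-10
```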

Loss of Significance

Theory

When two nearly equal numbers are subtracted, the leading significant digits cancel and the relative error of the result can be much larger than the relative errors of the individual operands.

If E_r(x_A) and E_r(y_A) are small, the relative error of z_A = x_A - y_A satisfies:

E_r(z_A) \approx \frac{|x|\,E_r(x_A) + |y|\,E_r(y_A)}{|x - y|}

When x ≈ y, the denominator |x − y| is tiny while the numerator remains of order |x|, causing catastrophic amplification.

Example 1.11 — Subtraction of Nearly Equal Numbers

Given values and their approximations (rounded):

x = 7.6545428, \quad x_A = 7.6545421 \;(6 \text{ sig. digits})
y = 7.6544201, \quad y_A = 7.6544200 \;(7 \text{ sig. digits})

Step 1: Compute the subtraction

z = x - y = 7.6545428 - 7.6544201 = 0.0001227
z_A = x_A - y_A = 7.6545421 - 7.6544200 = 0.0001221

Step 2: Absolute error of the result

|z - z_A| = |0.0001227 - 0.0001221| = 0.6 \times 10^{-6}

Although x_A had 6 significant digits, z_A has only 3 significant digits — three digits were lost to cancellation.

Step 3: Error amplification factor

E_r(z_A) \approx 53736 \times E_r(x_A)

The relative error of the result is roughly 53,736 times larger than the relative error of the original approximation — a dramatic loss of significance.

Example 1.13 — Reformulation to Avoid Cancellation

Consider the function:

f(x) = x\bigl(\sqrt{x+1} - \sqrt{x}\bigr)

For large x, the terms √(x+1) and √x are nearly equal, causing catastrophic cancellation. Rationalising the numerator gives the equivalent but numerically stable form:

f(x) = \frac{x}{\sqrt{x+1} + \sqrt{x}}
x         Naive (cancellation)   Stable (rationalised)
1         0.414214               0.414214
100       4.987562               4.987562
1,000     15.807437              15.807437
10,000    49.998750              49.998750
100,000   158.113488             158.113488

At x = 100,000 the two columns still agree to the digits shown, because double precision keeps enough guard digits here, but the naive form has already lost several significant digits internally, and the loss worsens as x grows (the sketch below pushes x to 10¹⁵). The stable form x/(√(x+1) + √x) is algebraically identical but avoids subtracting nearly equal square roots, preserving all significant digits.
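
A sketch comparing the two forms, pushing x beyond the table to expose the cancellation:

```python
import math

def naive(x):
    return x * (math.sqrt(x + 1) - math.sqrt(x))  # subtracts nearly equal values

def stable(x):
    return x / (math.sqrt(x + 1) + math.sqrt(x))  # no cancellation

for x in [1.0, 1e5, 1e10, 1e15]:
    print(f"x={x:.0e}  naive={naive(x):.12g}  stable={stable(x):.12g}")
# At x = 1e15 the naive form has lost most of its significant digits.
```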

Propagated Error & Stability

Theory

By the Mean Value Theorem, if x_A approximates x, then the error in f(x) propagates as:

f(x) - f(x_A) \approx f'(x)\,(x - x_A)

Converting to relative errors:

E_r(f(x)) \approx \left|\frac{x\,f'(x)}{f(x)}\right| E_r(x) = \kappa(x)\,E_r(x)

Well-conditioned: κ ≈ 1

Input error is not amplified.

Ill-conditioned: κ ≫ 1

Input errors are amplified dramatically.

Example 1.16 — Stability of f(x) = √(x+1) − √x

Consider x = 12345. The true value is:

f(12345) = \sqrt{12346} - \sqrt{12345} \approx 0.004500

Step 1: 3-digit rounding of each square root

\sqrt{12346} \approx 111.112 \xrightarrow{\text{3-digit}} 111
\sqrt{12345} \approx 111.108 \xrightarrow{\text{3-digit}} 111
f_A = 111 - 111 = 0

Step 2: Relative error of the naive computation

E_r(f_A) = \frac{|f(12345) - f_A|}{|f(12345)|} = \frac{|0.004500 - 0|}{0.004500} = 100\%

The computed result is completely wrong — a 100% relative error from 3-digit rounding.

Step 3: Condition number shows the problem itself is well-conditioned

\kappa(12345) = \left|\frac{x\,f'(x)}{f(x)}\right| = \frac{\sqrt{x}}{2\sqrt{x+1}} \approx 0.5

A condition number of about 0.5 means the problem barely amplifies input error at all (consistent with the table below). The 100% error therefore comes not from ill-conditioning but from the algorithm: subtracting two nearly equal rounded square roots cancels every significant digit. The naive formula is numerically unstable even though the problem is well-conditioned.

Step 4: Stable reformulation

f(x) = \sqrt{x+1} - \sqrt{x} = \frac{1}{\sqrt{x+1} + \sqrt{x}}

This algebraically equivalent form avoids subtracting nearly equal numbers and is numerically stable.
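
The 3-digit experiment is easy to replay; round_sig is a hypothetical helper that simulates k-significant-digit rounding:

```python
import math

def round_sig(x: float, k: int = 3) -> float:
    # Round x to k significant digits, simulating k-digit arithmetic.
    if x == 0:
        return 0.0
    return round(x, k - 1 - math.floor(math.log10(abs(x))))

x = 12345
r1, r2 = round_sig(math.sqrt(x + 1)), round_sig(math.sqrt(x))
print(r1 - r2)        # 0.0 -> the naive form loses every digit
print(1 / (r1 + r2))  # ≈ 0.004505, close to the true value 0.004500
```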

Condition Number vs. x for f(x) = √(x+1) − √x

x         f(x)          κ(x)
1         0.41421356    0.4
10        0.15434713    0.5
100       0.04987562    0.5
1,000     0.01580744    0.5
12,345    0.00450003    0.5
100,000   0.00158113    0.5

The condition number settles near 0.5 as x → ∞, so the problem itself stays well-conditioned for all x. The catastrophic error of the naive formula f(x) = √(x+1) − √x is an algorithmic instability caused by cancellation, which the rationalised form eliminates.