Run-time analysis
Run-time analysis is a theoretical classification that evaluates and anticipates the increase in the running time (or run-time) of an algorithm as its input size (usually denoted n) increases. Run-time efficiency is a topic of great interest in computer science: a program can take seconds, hours, or even years to finish executing, depending on which algorithm it implements.
Shortcomings of empirical metrics
Since algorithms are platform-independent (i.e. a given algorithm can be implemented in any programming language on any computer running any operating system), there are significant drawbacks to using an empirical approach to gauge the comparative performance of a given set of algorithms.
Take as an example a program that looks up a specific entry in a sorted list of size n. Suppose this program were implemented on Computer A, a state-of-the-art machine, using a linear search algorithm, and on Computer B, a much slower machine, using a binary search algorithm. Benchmark testing on the two computers running their respective programs might look something like the following:
| n (list size) | Computer A run-time (in nanoseconds) | Computer B run-time (in nanoseconds) |
|---|---|---|
| 15 | 7 | 100,000 |
| 65 | 32 | 150,000 |
| 250 | 125 | 200,000 |
| 1,000 | 500 | 250,000 |
Based on these metrics, it would be easy to conclude that Computer A is running an algorithm far superior in efficiency to that of Computer B. However, if the size of the input list is increased to a sufficient number, that conclusion is dramatically demonstrated to be in error:
| n (list size) | Computer A run-time (in nanoseconds) | Computer B run-time (in nanoseconds) |
|---|---|---|
| 15 | 7 | 100,000 |
| 65 | 32 | 150,000 |
| 250 | 125 | 200,000 |
| 1,000 | 500 | 250,000 |
| ... | ... | ... |
| 1,000,000 | 500,000 | 500,000 |
| 4,000,000 | 2,000,000 | 550,000 |
| 16,000,000 | 8,000,000 | 600,000 |
| ... | ... | ... |
| 63,072 × 10¹² | 31,536 × 10¹² (1 year) | 1,375,000 (1.375 milliseconds) |
Computer A, running the linear search program, exhibits a linear growth rate: the program's run-time is directly proportional to its input size. Doubling the input size doubles the run-time, quadrupling the input size quadruples the run-time, and so forth. Computer B, running the binary search program, exhibits a logarithmic growth rate: doubling the input size only increases the run-time by a constant amount (in this example, 25,000 ns). Even though Computer A is ostensibly a faster machine, Computer B will inevitably surpass Computer A in run-time because it is running an algorithm with a much slower growth rate.
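The difference comes from the algorithms rather than the hardware: linear search examines every element in the worst case, while binary search repeatedly halves the remaining range. The following Python sketch (illustrative; the comparison counters are added here purely for demonstration) makes the two growth rates directly observable:

```python
def linear_search(items, target):
    """Scan elements in order; worst-case comparisons grow linearly in len(items)."""
    comparisons = 0
    for index, value in enumerate(items):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons


def binary_search(items, target):
    """Halve a sorted range each step; comparisons grow logarithmically."""
    comparisons = 0
    low, high = 0, len(items) - 1
    while low <= high:
        comparisons += 1
        mid = (low + high) // 2
        if items[mid] == target:
            return mid, comparisons
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, comparisons


for n in (1_000, 1_000_000):
    data = list(range(n))
    target = n - 1  # worst case for linear search: the last element
    print(n, linear_search(data, target)[1], binary_search(data, target)[1])
```

For n = 1,000,000, linear search performs a million comparisons in the worst case, while binary search performs about twenty.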
Orders of growth
Informally, an algorithm can be said to exhibit a growth rate on the order of a mathematical function if, beyond a certain input size n, the function f(n) times a positive constant provides an upper bound or limit for the run-time of that algorithm. In other words, for a given input size n greater than some n₀ and a constant c, the run-time of that algorithm will never be larger than c × f(n). This concept is frequently expressed using Big O notation (see Big O notation for a more formal discussion). For example, since the run-time of insertion sort grows quadratically as its input size increases, insertion sort can be said to be of order O(n²).
Big O notation is a convenient way to express the worst-case scenario for a given algorithm, although it can also be used to express the average case; for example, the worst-case scenario for quicksort is O(n²), but the average-case run-time is O(n lg n).[1]
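Written out symbolically, the informal condition above is the standard defining property of Big O:

```latex
% f is bounded above by g (up to a constant factor) for all sufficiently large n:
f(n) = O(g(n)) \iff \exists\, c > 0,\ \exists\, n_0 > 0 :\quad
  f(n) \le c \cdot g(n) \quad \text{for all } n \ge n_0
```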
Evaluating run-time complexity
The run-time complexity for the worst-case scenario of a given algorithm can sometimes be evaluated by examining the structure of the algorithm and making some simplifying assumptions. Consider the following pseudocode:
    1    get a positive integer n from input
    2    if n > 10
    3        print "This might take a while..."
    4    for i = 1 to n
    5        for j = 1 to i
    6            print i * j
    7    print "Done!"
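The same procedure can be transcribed directly into Python for experimentation (a sketch; the input is passed as a function argument rather than read interactively, and the function name is illustrative):

```python
def example_algorithm(n):
    # Step 1: n, a positive integer, arrives as the function argument.
    if n > 10:                                 # step 2
        print("This might take a while...")    # step 3
    for i in range(1, n + 1):                  # step 4: outer loop
        for j in range(1, i + 1):              # step 5: inner loop
            print(i * j)                       # step 6: inner loop body
    print("Done!")                             # step 7

example_algorithm(5)
```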
A given computer will take a discrete amount of time to execute each of the instructions involved in carrying out this algorithm. The specific amount of time to carry out a given instruction will vary depending on which instruction is being executed and which computer is executing it, but on a conventional computer, this amount will be deterministic.[2] Say that the actions carried out in step 1 consume time T1, step 2 uses time T2, and so forth.
In the algorithm above, steps 1, 2 and 7 will only be run once. For a worst-case evaluation, it should be assumed that step 3 will be run as well. Thus the total amount of time to run steps 1-3 and step 7 is:

    T_1 + T_2 + T_3 + T_7
The loops in steps 4, 5 and 6 are trickier to evaluate. The outer loop test in step 4 will execute (n + 1) times, which will consume T4(n + 1) time. The inner loop, on the other hand, is governed by the value of i, which iterates from 1 to n. On the first pass through the outer loop, j iterates from 1 to 1: the inner loop makes one pass, so running the inner loop body (step 6) consumes T6 time, and the inner loop test (step 5) consumes 2T5 time. During the next pass through the outer loop, j iterates from 1 to 2: the inner loop makes two passes, so running the inner loop body (step 6) consumes 2T6 time, and the inner loop test (step 5) consumes 3T5 time.
Altogether, the total time required to run the inner loop body can be expressed as an arithmetic progression:

    T_6 \left[ 1 + 2 + 3 + \cdots + (n-1) + n \right] = T_6 \left[ \frac{1}{2} (n^2 + n) \right]
The total time required to run the inner loop test can be evaluated similarly:

    T_5 \left[ 1 + 2 + 3 + \cdots + (n-1) + n + (n+1) \right] - T_5 = T_5 \left[ \frac{1}{2} (n^2 + 3n + 2) \right] - T_5
which can be factored as

    T_5 \left[ \frac{1}{2} (n^2 + 3n) \right]
Therefore, the total running time for this algorithm is:

    f(n) = T_1 + T_2 + T_3 + T_7 + (n + 1)T_4 + \left[ \frac{1}{2} (n^2 + n) \right] T_6 + \left[ \frac{1}{2} (n^2 + 3n + 2) \right] T_5 - T_5
which reduces to

    f(n) = \left[ \frac{1}{2} (n^2 + n) \right] T_6 + \left[ \frac{1}{2} (n^2 + 3n) \right] T_5 + (n + 1)T_4 + T_1 + T_2 + T_3 + T_7
As a rule of thumb, one can assume that the highest-order term in any given function dominates its rate of growth and thus defines its run-time order. In this example, n² is the highest-order term, so one can conclude that f(n) = O(n²). Formally, this can be proven as follows:
Prove that

    \left[ \frac{1}{2} (n^2 + n) \right] T_6 + \left[ \frac{1}{2} (n^2 + 3n) \right] T_5 + (n + 1)T_4 + T_1 + T_2 + T_3 + T_7 \le cn^2,\ n \ge n_0

    \left[ \frac{1}{2} (n^2 + n) \right] T_6 + \left[ \frac{1}{2} (n^2 + 3n) \right] T_5 + (n + 1)T_4 + T_1 + T_2 + T_3 + T_7 \le (n^2 + n)T_6 + (n^2 + 3n)T_5 + (n + 1)T_4 + T_1 + T_2 + T_3 + T_7

(for n ≥ 0)

Let k be a constant greater than or equal to [T1..T7]

    T_6(n^2 + n) + T_5(n^2 + 3n) + (n + 1)T_4 + T_1 + T_2 + T_3 + T_7 \le k(n^2 + n) + k(n^2 + 3n) + kn + 5k = 2kn^2 + 5kn + 5k \le 2kn^2 + 5kn^2 + 5kn^2

(for n ≥ 1) = 12kn²

Therefore

    \left[ \frac{1}{2} (n^2 + n) \right] T_6 + \left[ \frac{1}{2} (n^2 + 3n) \right] T_5 + (n + 1)T_4 + T_1 + T_2 + T_3 + T_7 \le cn^2,\ n \ge n_0

for c = 12k, n₀ = 1
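The bound can also be spot-checked numerically (a sketch; the step times below are arbitrary sample values chosen only for illustration):

```python
# Sample step times T_1..T_7 (arbitrary values for illustration; index 0 unused).
T = [None, 3.0, 1.0, 2.0, 1.5, 1.0, 2.5, 1.0]
k = max(T[1:])  # k is greater than or equal to every T_i

def f(n):
    # Total running time f(n) as derived above.
    return (0.5 * (n**2 + n)) * T[6] + (0.5 * (n**2 + 3*n)) * T[5] \
        + (n + 1) * T[4] + T[1] + T[2] + T[3] + T[7]

# f(n) <= 12*k*n^2 should hold for all n >= 1 (here c = 12k, n_0 = 1).
assert all(f(n) <= 12 * k * n**2 for n in range(1, 10_000))
```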
A more elegant approach to analyzing this algorithm would be to declare that [T1..T7] are all equal to one unit of time, in a system of units chosen so that one unit is greater than or equal to the actual times for these steps. This would mean that the algorithm's running time breaks down as follows:[4]
    4 + \sum_{i=1}^{n} i \le 4 + \sum_{i=1}^{n} n = 4 + n^2 \le 5n^2

(for n ≥ 1) = O(n²)
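The arithmetic-progression count underlying both versions of the analysis can be verified empirically; the following sketch (illustrative) counts executions of the inner loop body directly:

```python
def inner_body_executions(n):
    # Count how many times step 6 (the inner loop body) runs.
    count = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            count += 1
    return count

for n in (1, 10, 100):
    c = inner_body_executions(n)
    assert c == n * (n + 1) // 2  # the arithmetic progression 1 + 2 + ... + n
    assert 4 + c <= 5 * n**2      # the simplified unit-time bound (n >= 1)
```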
Growth rate analysis of other resources
The methodology of run-time analysis can also be used to predict other growth rates, such as the consumption of memory space. As an example, consider the following pseudocode, which manages and reallocates memory usage by a program based on the size of the file that program manages:
    while file is still open
        let n = file size
        for every 100,000 kilobytes of increase in file size
            double the amount of memory reserved
In this instance, as the file size n increases, memory will be consumed at an exponential growth rate, which is order O(2^n).[5]
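A small simulation shows this growth directly (a sketch; file sizes are in kilobytes, and the initial reservation of one unit is an arbitrary assumption):

```python
def memory_reserved(file_size_kb, initial_kb=1):
    # The reservation doubles once per full 100,000 KB of file size,
    # so it grows on the order of 2^n in the file size n.
    doublings = file_size_kb // 100_000
    return initial_kb * 2 ** doublings

for size_kb in (100_000, 500_000, 1_000_000, 2_000_000):
    print(size_kb, memory_reserved(size_kb))
```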
Footnotes
- ^ The term lg is often used as shorthand for log₂
- ^ However, this is not the case with a quantum computer
- ^ It can be proven by induction that 1 + 2 + 3 + ⋯ + (n − 1) + n = n(n + 1)/2
- ^ This approach, unlike the above approach, neglects the constant time consumed by the loop tests which terminate their respective loops, but it is trivial to prove that such omission does not affect the final result
- ^ Note that this is an extremely rapid and most likely unmanageable growth rate for consumption of memory resources
References
- Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). Introduction to Algorithms (2nd ed.). Cambridge, MA: MIT Press. ISBN 0-262-03293-7.