Amortized Analysis

Amortized analysis is a worst-case analysis of a a sequence of operations — to obtain a tighter bound on the overall or average cost per operation in the sequence than is obtained by separately analyzing each operation in the sequence. For an example, when we considered the union and find operations for the disjoint set data abstraction, we were able to bound the running time of individual operations by O(log n). However, for a sequence of n operations, it is possible to obtain tighter than an O(n log n) bound (although that analysis is more appropriate to 4820 than to this course). Here we will consider a simplified version of the hash table problem above, and show that a sequence of n insert operations has overall time O(n).

Three main techniques used for amortized analysis are given below :

  • The aggregate method, where the total running time for a sequence of operations is analyzed.
  • The accounting (or banker’s) method, where we impose an extra charge on inexpensive operations and use it to pay for expensive operations later on.
  • The potential (or physicist’s) method, in which we derive a potential function characterizing the amount of extra work we can do in each step. This potential either increases or decreases with each successive operation, but cannot be negative.

Consider an extensible array that can store an arbitrary number of integers, like an ArrayList or Vector in Java. These are implemented in terms of ordinary (non-extensible) arrays. Each add operation inserts a new element after all the elements previously inserted. If there are no empty cells left, a new array of double the size is allocated, and all the data from the old array is copied to the corresponding entries in the new array. For instance, consider the following sequence of insertions, starting with an array of length 1:

Insert 11  |11|
Insert 12  |11|12|
Insert 13  |11|12|13|  |
Insert 14  |11|12|13|14|
Insert 15  |11|12|13|14|15|  |  |  |

The table is doubled in the second, third, and fifth steps. As each insertion takes O(n) time in the worst case, a simple analysis would yield a bound of O(n2) time for n insertions. But it is not this bad. Let’s analyze a sequence of n operations using the three methods.

Aggregate Method :

Let ci be the cost of the i-th insertion:

ci = i if i−1 is a power of 2
     1 otherwise

Let’s consider the size of the table si and the cost ci for the first few insertions in a sequence:

i   1  2  3  4  5  6  7  8  9 10
si  1  2  4  4  8  8  8  8 16 16
ci  1  2  3  1  5  1  1  1  9  1

Alteratively we can see that ci=1+di where di is the cost of doubling the table size. That is

di = i−1 if i−1 is a power of 2
     0 otherwise

Then summing over the entire sequence, all the 1’s sum to O(n), and all the di also sum to O(n). That is,

Σ1≤i≤n ci   ≤   n + Σ0≤j≤m 2j−1 

where m = log(n − 1). Both terms on the right hand side of the inequality are O(n), so the total running time of n insertions is O(n).

More in sem3
C++ disjoint set

C++ disjoint set About disjoint set : C++ Disjoint sets are very useful data structures that are used in many famous Algorithms, and...