TopCoder/ Sorting
Things to consider
- efficiency
- memory space
- stability
Bubble sort
|
|
it’s O(n^2), 常数的memory space, stable sort
Insertion Sort
|
|
插入排序对近似已经有序的序列效果很好。还有就是对于动态增长的序列,使用插入排序可以动态维护。
Merge Sort
|
|
n(n^logn) . mege sort比较适合对链表的排序,因为quick sort里面有比较多的random access操作,所以quick sort对list并不是很合适。
Heap Sort
可以使用STL里面的priority_queue, 默认是最大堆,如果想使用最小堆,可以这样priority_queue<int, vector<int>, greater<int> > pq
然后pq就是最小堆了。记得跟greater less相反就行。
注意跟sort中greater less用法的不同。
|
|
Quick Sort
|
|
Quick sort time complexity is O(n^logn), worst case is O(n^2)
Counting Sort
Time complexity is O(n+k), n is the number of elements in input array and k is the range of input.
pseudocode: the data are in the range 0 to k ($0 <= data_i <= k$)
- Take a count array to store the count of each unique object
- Modify the count array such that each element at each index stores the sum of previous counts. It indicates the position of each object in the output sequence.
- Ouput each object from the input sequence followed by decreasing its count by 1.
|
|
Radix Sort
As we know, counting sort’s time complexity is O(n+k), what if the elements are in range from 0 to $n^2$? We cannot use counting sort because counting sort will take $O(n^2)$ which is worse than comparision based sorting algorithms. Can we sort such an array in linear time?
Radix sort is to do digit by digit sort starting from least significant digit to most significant digit. Radix sort uses counting sort as a subroutine to sort.
This algorithm is only used to sort numbers
Time Complexity $O((n+b)d)$ or $O((n+b)log_b(m))$
Where n is the size of arrary, b is the representation base, for example, for decimal system, b is 10. d is the digits number of input integers. For example, m is the max number of array, then d should be $log_b(m)$
|
|
Bucket Sort
In some special cases, counting sort or radix sort could achieve linear time complexity. However, for counting sort or radix sort, they both have the same limitation that they use key as index. So if u want to sort a range of float number? How to do it?
Bucket sort is mainly useful when input is uniformly distributed over a range. For example, consider the following problem. Sort a large set of floating point numbers which are in range from 0.0 to 1.0 and are uniformly distributed across the range. How do we sort the numbers efficiently?
Pseudocode:
- Create some buckets and put elements into them
- apply some sorting algo to sort the elements in each bucket(insertion sort)
- Take out and join all buckets together
Time Complexity average case is O(n+k) k is the number of buckets.
Worst is O(n^2), depends on the distribution of input.
|
|
External Sort
When the data is too big or the computer’s memory is too small to load the data completely, at that time, we can take use of disk.
Reference Link: http://faculty.simpson.edu/lydia.sinapova/www/cmsc250/LN250_Weiss/L17-ExternalSortEX2.htm
External Sorting: Example of multiway external sorting
Ta1: 17, 3, 29, 56, 24, 18, 4, 9, 10, 6, 45, 36, 11, 43
Assume that we have three tapes (k = 3) and the memory can hold three records.
Main memory sort
The first three records are read into memory, sorted and written on Tb1,
the second three records are read into memory, sorted and stored on Tb2,
finally the third three records are read into memory, sorted and stored on Tb3.
Now we have one run on each of the three tapes:
Tb1: 3, 17, 29
Tb2: 18, 24, 56
Tb3: 4, 9, 10
The next portion of three records is sorted into main memory
and stored as the second run on Tb1:
Tb1: 3, 17, 29, 6, 36, 45
The next portion, which is also the last one, is sorted and stored onto Tb2:
Tb2: 18, 24, 56, 11, 43
Nothing is stored on Tb3.Thus, after the main memory sort, our tapes look like this:
Tb1: 3, 17, 29, | 6, 36, 45
Tb2: 18, 24, 56, | 11, 43
Tb3: 4, 9, 10
Merging
–Merging runs of length M to obtain runs of length k*M
In our example we merge runs of length 3
and the resulting runs would be of length 9.
We build a heap tree in main memory out of the first records in each tape.
These records are: 3, 18, and 4.
We take the smallest of them - 3, using the deleteMin operation, and store it on tape Ta1.
The record ‘3’ belonged to Tb1, so we read the next record from Tb1 - 17, and insert it into the heap. Now the heap contains 18, 4, and 17.
The next deleteMin operation will output 4, and it will be stored on Ta1.The record ‘4’ comes from Tb3, so we read the next record ‘9’ from Tb3
and insert it into the heap.
Now the heap contains 18, 17 and 9.Proceeding in this way, the first three runs will be stored in sorted order on Ta1.
Ta1: 3, 4, 9, 10, 17, 18, 24, 29, 56
Now it is time to build a heap of the second three runs.
(In fact they are only two, and the run on Tb2 is not complete.)The resulting sorted run on Ta2 will be:
Ta2: 6, 11, 36, 43, 45
This finishes the first pass.
–Building runs of length k*k*M
We have now only two tapes: Ta1 and Ta2.
We build a heap of the first elements of the two tapes - 3 and 6, and output the smallest element ‘3’ to tape Tb1.
Then we read the next record from the tape where the record ‘3’ belonged - Ta1, and insert it into the heap.
Now the heap contains 6 and 4, and using the deleteMin operation. the smallest record - 4 is output to tape Tb1.
Proceeding in this way, the entire file will be sorted on tape Tb1.
Tb1: 3, 4, 6, 9, 10, 11, 17, 18, 24, 29, 36, 43, 45, 56
The number of passes for the multiway merging is logk(N/M).
In the example this is [log3(14/3)] + 1 = 2.