Skip to content

Sorting algorithm for natural numbers based on array indexing

License

Notifications You must be signed in to change notification settings

holczmannb/IndexSorter

Repository files navigation

IndexSort

Description:

Reference implementation for IndexSort algorithm and compare the runtime with quick sort and the linear time counting sort and bin sort.

QuickSort: https://en.wikipedia.org/wiki/Quicksort
CountingSort: https://opendatastructures.org/ods-java/11_2_Counting_Sort_Radix_So.html
BinSort: https://opendsa-server.cs.vt.edu/ODSA/Books/CS3/html/BinSort.html

Index sort is an sorting algorithm what works on set of distinct natural numbers and can deliver better runtime performance than QuickSort, CountingSort and BinSort in many cases. This is not a generic sorting algorithm but works very effectively with positive unique integer values what is used in many business scenario. It is based on the bin (bucket) sorting algorithm but utilize the unique numbers in the sortable array.

How to run reference implementation:

Download the /bin/IndexSorter.jar
Call: java -jar IndexSorter.jar param1 param2 param3 param4

Parameters:
param1: number of samples to be generated (like 5)
param2: size of the array what will be sorted (like 100)
param3: range of the natural numeric numbers (like 1000)
param4: print statistic only and skip the array printouts. (like true)

Algorithm with pseudocode:

array a; // sortable array
array b; // helper array
n = length(a) // length of the sortable array
max = 0 // store the maximum value in the sortable array
min = infinite // store the minimum value in the sortable array
Loop i = 1 .. n { if a[i] > max then max = a[i]; if a[i] < min then min = a[i]; } // look for maximum and minimum
b = new array[max] // helper array creation
Loop i = 1 .. n { b[a[i]] = true } // set the true by value as index
pos = 0
Loop i = min .. max {if (b[i]) then a[pos++] = i} // rewrite the sortable array with position pointer

Algorithm execution by a sample

  1. Sample to sort [8, 5, 3, 7]
  2. Create helper boolean array where length of the array is the maximum number in the sample (8):
    [false, false, false, false, false, false, false, false]
  3. Use the number from input array as index and set true by this index in the helper array.
    [false, false, true, false, true, false, true, true]
  4. Loop over the helper array and write back to the index value to the original array left to right if the helper array contains true in the index position.
    [3, 5, 7, 8]

Formal analysis:

Sort an array of n distinct elements, quicksort takes O( n log n ) time in best case and O ( n2 ) in worst case.
Index sorting needs three loops over the the array.

  1. Selecting the maximum and minimum element: n step.
  2. Setting the helper array value: n step.
  3. Writing back the index values to original: (max - min) step, where max and min are the maximum and minimum values in the array.

The time is always linear and O ( 2n + max ) in every case.
n = 1000 then QuickSort needs 3000 steps in best case. IndexSort needs 2000 + (maximum - minimum value) steps in every case.
CountingSort has O (n + max) what is comparable with IndexSort but IndexSort on random datasets shows better performance.

Statistics:
Index sort shows the best execution time in case of big array sizes. There is no big exection time difference in the small arrays. Index sort delivers always better results than BinSort in any sample size

Number of SampleSize of arrayNumber intervall 1..nQuickSort WinsIndexSort WinsCountingSort Wins
100 10 100 8116 3
100 10 100 92 6 2
100 10 100 89 8 3
100 100 100 38 55 7
100 100 100 50 45 5
100 100 100 46 50 4
100 100 1000 51 43 6
100 100 1000 46 44 10
100 100 1000 48 46 6
100 1000 1000 3 65 32
100 1000 1000 3 64 33
100 1000 1000 2 76 22
100 1000 10000 7 81 12
100 1000 10000 8 77 15
100 1000 10000 6 81 13
100 10000 10000 0 96 4
100 10000 10000 0 92 8
100 10000 10000 0 99 1
100 10000 100000 0 100 0
100 10000 100000 0 98 2
100 10000 100000 0 100 0
10 100000 10000000 10 0


Disadvantages and improvement oportunities

  1. It works only with distinct values in the array.
  2. Runtime depends on the distance between maximum and minimum value in the array.
  3. It can work with negative numbers with transforming the values to positive by adding the minimum value to each element.
  4. It can work with real numbers with transforming the values to integer whole number with multiplication by 10s.
  5. Better representation of the helper array to avoid big size array (memory usage and loop size). Current represenation is good for natural business numbers becasue the values in the helper array are 1 bits in memory only.

Releases

No releases published

Packages

No packages published

Languages