Counting inversions of a sequence (array) in pure-functional immutable Scala, using a Merge Sort

Algorithm goal

The number of inversions in a sequence is the number of pairs of elements that are out of order, i.e. \(|\{(i, j) : i < j, A_i > A_j\}|\): the count of distinct index pairs \((i, j)\) such that \(i < j\) and the value of the \(i\)th element is greater than that of the \(j\)th one.

Why is this algorithm useful?

One situation where this is very useful is measuring how different two people's preferences are when each of them ranks the same items. This sort of algorithm, or versions of it, can be useful in eCommerce or movie recommendations. While it is not as advanced as machine learning, it has the big advantage of running in \(O(n\log{n})\) time. That makes it usable as a tactical solution in a fast-paced environment, where a machine learning solution may only become worth its implementation cost once the product's commercial viability has been established.

The brute-force solution, which compares every pair of elements, has \(O(n^2)\) complexity and eats up computation time very quickly as the input grows.
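For comparison, here is a minimal sketch of the brute-force approach. It assumes the input is an `IndexedSeq[Int]` so that indexing is cheap. For every index `i`, it counts the later indices `j` that hold a smaller value. `BruteForceInversions.count` is a hypothetical name and is not part of the original solution.

```scala
// Brute-force inversion count: checks every pair (i, j) with i < j, O(n^2) time.
// Returns Long because n * (n - 1) / 2 pairs can exceed Int.MaxValue for large n.
object BruteForceInversions {
  def count(xs: IndexedSeq[Int]): Long =
    xs.indices.map { i =>
      // Number of later elements that are smaller than xs(i)
      ((i + 1) until xs.length).count(j => xs(i) > xs(j)).toLong
    }.sum
}
```

For example, `BruteForceInversions.count(Vector(2, 1))` returns `1`. This version is simple enough to serve as a reference when testing a faster implementation.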

Examples

\([2,1]\) has 1 inversion, because swapping \(1\) with \(2\) leads to array \([1,2]\) which is sorted.

(Diagram: 2 1 drawn above its sorted order 1 2, with an arrow from each element to its sorted position.)

Likewise, \([3, 1, 8, 5, 6, 4, 7]\) has 7 inversions: \((3, 1)\), \((8, 5)\), \((8, 6)\), \((8, 4)\), \((8, 7)\), \((5, 4)\), \((6, 4)\).

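(Diagram: 3 1 8 5 6 4 7 drawn above its sorted order 1 3 4 5 6 7 8, with an arrow from each element to its sorted position.)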
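To double-check such a list, a short sketch can enumerate the pairs straight from the definition. It returns each inversion as the pair of values \((A_i, A_j)\). The helper name `InversionPairs.pairs` is hypothetical.

```scala
// Lists every inversion as (A_i, A_j), following the definition i < j and A_i > A_j.
object InversionPairs {
  def pairs(xs: IndexedSeq[Int]): List[(Int, Int)] =
    (for {
      i <- xs.indices
      j <- (i + 1) until xs.length
      if xs(i) > xs(j)
    } yield (xs(i), xs(j))).toList
}
```

For example, `InversionPairs.pairs(Vector(3, 1, 8, 5, 6, 4, 7))` returns `List((3,1), (8,5), (8,6), (8,4), (8,7), (5,4), (6,4))`, which is seven pairs.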

\([1,2,4,3]\) has 1 inversion \((4, 3)\):

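(Diagram: 1 2 4 3 drawn above its sorted order 1 2 3 4, with an arrow from each element to its sorted position.)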

What is most curious about the diagrams above is that the number of crossings between the arrows corresponds exactly to the number of inversions! What do you think this means?

It helps to first understand MergeSort and compare the two algorithms. The merging function is different, however: this time we also have to count how many exchanges there would be as we sort. This is what leads us to a solution that is much more efficient than a typical brute-force solution (which, in fact, we include in the algorithm solution).

While the algorithm here is a MergeSort, it is implemented using the bottom-up approach of the MergeSortStackSafe algorithm, which avoids deep recursion and is therefore stack-safe.

Explanation

The problem is closely related to merge sort. In merge sort, we approach the problem with the divide-and-conquer method: we sort one half of the array, then the other half, and then merge the two.

If we divide the input sequence into 2 parts, we notice that the total number of inversions equals the number of inversions on the left-hand side (LHS), plus the inversions on the right-hand side (RHS), plus the number of inversions across the two sides. Example (this is © from www.scala-algorithms.com):

\([3,2,1,4]\) has one inversion on the LHS, \((3,2)\), no inversion on the RHS, and 2 inversions across the halves, \((3,1)\) and \((2,1)\). That makes 3 inversions in total.

(Diagram: the left-hand side 3 2 is sorted into 2 3, counting 1 inversion; the right-hand side 1 4 is already sorted, counting 0 inversions.)
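The decomposition can be checked with a small recursive sketch. It splits the sequence in half, counts each half recursively, and adds the cross inversions. To keep the idea visible, the cross count here is a plain \(O(n^2)\) scan over both halves rather than a merge-based count, so this is not the efficient algorithm. `DivideAndCount` and its methods are hypothetical names.

```scala
// total = inversions(LHS) + inversions(RHS) + cross inversions between the halves.
object DivideAndCount {
  // Pairs (l, r) with l in the left half, r in the right half, and l > r.
  // Deliberately naive: a merge-based version computes this in linear time.
  def crossInversions(lhs: Vector[Int], rhs: Vector[Int]): Long =
    lhs.map(l => rhs.count(r => l > r).toLong).sum

  def inversions(xs: Vector[Int]): Long =
    if (xs.length <= 1) 0L
    else {
      val (lhs, rhs) = xs.splitAt(xs.length / 2)
      inversions(lhs) + inversions(rhs) + crossInversions(lhs, rhs)
    }
}
```

For \([3,2,1,4]\), `DivideAndCount.inversions(Vector(3, 2, 1, 4))` adds 1 (LHS) + 0 (RHS) + 2 (cross) and returns 3.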

As we sort the data on each side, we record how many times elements were out of order. Even in a sub-problem of size 2, swapping the left and the right element counts as one inversion.
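The step that makes the approach \(O(n\log{n})\) is counting the cross inversions during the merge itself. The following sketch assumes both halves are already sorted. Whenever an element from the right half is emitted before the current left element, it is smaller than every remaining left element, so it forms an inversion with each of them. `MergeCount.mergeCount` is a hypothetical name and not the original solution's code.

```scala
import scala.annotation.tailrec

object MergeCount {
  // Merges two sorted vectors and counts pairs (l, r), l from `left` and
  // r from `right`, with l > r.
  def mergeCount(left: Vector[Int], right: Vector[Int]): (Vector[Int], Long) = {
    @tailrec
    def go(i: Int, j: Int, acc: Vector[Int], inversions: Long): (Vector[Int], Long) =
      if (i == left.length) (acc ++ right.drop(j), inversions)
      else if (j == right.length) (acc ++ left.drop(i), inversions)
      // `<=` keeps equal elements in order, so duplicates are not inversions
      else if (left(i) <= right(j)) go(i + 1, j, acc :+ left(i), inversions)
      // right(j) is smaller than left(i), ..., left(left.length - 1)
      else go(i, j + 1, acc :+ right(j), inversions + (left.length - i))

    go(0, 0, Vector.empty, 0L)
  }
}
```

For example, `MergeCount.mergeCount(Vector(2, 3), Vector(1, 4))` returns `(Vector(1, 2, 3, 4), 2L)`. This matches the 2 cross inversions of the \([3,2,1,4]\) example.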


Scala Concepts & Hints

Pattern Matching

Pattern matching in Scala lets you quickly identify what you are looking for in data, and also extract it.

assert("Hello World".collect {
  case character if Character.isUpperCase(character) => character.toLower
} == "hw")


Def Inside Def

A great aspect of Scala is being able to declare functions inside functions, making it possible to reduce repetition.

def exampleDef(input: String): String = {
  def surroundInputWith(char: Char): String = s"$char$input$char"
  surroundInputWith('-')
}

Stack Safety

A function is stack-safe when it cannot crash by exceeding the limit on the depth of nested (for example, recursive) calls.

This function works for n = 5, but for a large enough n, such as 1,000,000, it crashes with java.lang.StackOverflowError (the exact threshold depends on the JVM's thread stack size); however, there is a way to fix it :-)

In Scala Algorithms, we try to write the algorithms in a stack-safe way, where possible, so that when you use the algorithms, they will not crash on large inputs. However, stack-safe implementations are often more complex, and in some cases overly complex for the task at hand.

// Each call waits for the result of the next one, so the stack grows with (until - from)
def sum(from: Int, until: Int): Int =
  if (from == until) until else from + sum(from + 1, until)

def thisWillSucceed: Int = sum(1, 5)

// Deep enough to overflow a default-sized JVM thread stack
def thisWillFail: Int = sum(1, 1000000)
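One way to fix it is to carry a running total in an accumulator, so that the recursive call is the last thing the function does. This sketch assumes `from <= until`, as in the examples above. The `StackSafeSum` object and its `go` helper are hypothetical names. The result type is widened to `Long` so that large ranges do not overflow `Int`.

```scala
import scala.annotation.tailrec

object StackSafeSum {
  // Computes from + (from + 1) + ... + until, assuming from <= until.
  def sum(from: Int, until: Int): Long = {
    // The recursive call is in tail position; @tailrec makes the compiler
    // verify this and compile it into a loop, so the stack does not grow.
    @tailrec
    def go(current: Int, acc: Long): Long =
      if (current == until) acc + until
      else go(current + 1, acc + current)

    go(from, 0L)
  }
}
```

With this version, `StackSafeSum.sum(1, 1000000)` returns `500000500000` instead of overflowing the stack.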


Drop, Take, dropRight, takeRight

Scala's `drop` and `take` methods typically remove or select `n` items from a collection.

assert(List(1, 2, 3).drop(2) == List(3))

assert(List(1, 2, 3).take(2) == List(1, 2))

assert(List(1, 2, 3).dropRight(2) == List(1))

assert(List(1, 2, 3).takeRight(2) == List(2, 3))

assert((1 to 5).take(2) == (1 to 2))


Tail Recursion

In Scala, tail recursion enables you to rewrite a mutable construct, such as a while-loop, into an immutable algorithm.

def fibonacci(n: Int): Int = {
  @scala.annotation.tailrec
  def go(i: Int, previous: Int, beforePrevious: Int): Int =
    if (i >= n) previous else go(i + 1, previous + beforePrevious, previous)

  go(i = 1, previous = 1, beforePrevious = 0)
}

assert(fibonacci(8) == 21)


Lazy List

The 'LazyList' type (previously known as 'Stream' in Scala) is used to describe a potentially infinite list that evaluates only when necessary ('lazily').
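Here is a short illustration, assuming Scala 2.13 or later, where `LazyList` is available. Mapping and filtering an infinite `LazyList` is safe because only the elements needed to satisfy `take` are ever evaluated. `LazyListExample` is a hypothetical name.

```scala
object LazyListExample {
  // An infinite list of natural numbers: 1, 2, 3, ...
  val naturals: LazyList[Int] = LazyList.from(1)

  // Squares are computed lazily; evaluation stops once three even squares are found.
  val firstEvenSquares: List[Int] =
    naturals.map(n => n * n).filter(_ % 2 == 0).take(3).toList
}
```

Here `LazyListExample.firstEvenSquares` is `List(4, 16, 36)`.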


Algorithm in Scala

79 lines of Scala (version 2.13).
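What follows is not that solution but a sketch of our own. It follows the bottom-up approach described earlier: it starts from runs of length 1, which are trivially sorted, and repeatedly merges adjacent runs, adding up the cross inversions found by each merge. Only `countInversions` mirrors the name used in the test cases below. `BottomUpInversions`, `mergeCount` and `passes` are hypothetical names.

```scala
import scala.annotation.tailrec

object BottomUpInversions {
  // Merges two sorted vectors and counts pairs (l, r) with l from `left`,
  // r from `right`, and l > r.
  private def mergeCount(left: Vector[Int], right: Vector[Int]): (Vector[Int], Long) = {
    @tailrec
    def go(i: Int, j: Int, acc: Vector[Int], inversions: Long): (Vector[Int], Long) =
      if (i == left.length) (acc ++ right.drop(j), inversions)
      else if (j == right.length) (acc ++ left.drop(i), inversions)
      else if (left(i) <= right(j)) go(i + 1, j, acc :+ left(i), inversions)
      // right(j) precedes all remaining left elements: each is an inversion
      else go(i, j + 1, acc :+ right(j), inversions + (left.length - i))

    go(0, 0, Vector.empty, 0L)
  }

  def countInversions(xs: Int*): Long = {
    // Each pass merges adjacent sorted runs, halving the number of runs.
    @tailrec
    def passes(runs: Vector[Vector[Int]], total: Long): Long =
      if (runs.length <= 1) total
      else {
        val merged = runs.grouped(2).map {
          case Vector(a, b) => mergeCount(a, b)
          case single       => (single.head, 0L) // odd run out is carried over
        }.toVector
        passes(merged.map(_._1), total + merged.map(_._2).sum)
      }

    // Runs of length 1 are already sorted and contain no inversions.
    passes(xs.toVector.map(Vector(_)), 0L)
  }
}
```

Both `go` and `passes` are tail-recursive and checked with `@tailrec`, so the stack depth does not grow with the input size. There are \(O(\log n)\) passes, each doing linear work, which gives the \(O(n\log{n})\) total.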


Test cases in Scala

assert(countInversions(1) == 0)
assert(countInversions(1, 2) == 0)
assert(countInversions(2, 1) == 1)
assert(countInversions(3, 2, 1) == 3)
assert(countInversions(4, 3, 2, 1) == 6)
assert(countInversions(3, 2, 1, 4) == 3)
assert(countInversions(4, 1, 2, 3, 9) == 3)
assert(countInversions(4, 1, 3, 2, 9, 5) == 5)
assert(countInversions(4, 1, 3, 2, 9, 1) == 8)
assert(countInversions(3, 1, 8, 5, 6, 4, 7) == 7)