Scala algorithm: Remove duplicates from a sorted list (Sliding)

Published

Algorithm goal

In Streaming data, data may come in duplicated; it could be due to various factors such as duplicated data from sources and idempotency for redundancy; for consumption though we may need to deduplicate the data for at-most-once processing. Some deduplicators retain state of long-gone elements (in which case .distinct will suffice, but have a memory cost), but in this case we are looking at only consecutive duplicate elements.

This is an alternative approach to RemoveDuplicatesFromSortedListStateMachine

Test cases in Scala

assert(removeDuplicates(List.empty[Int]) == List.empty[Int])
assert(removeDuplicates(List(1)) == List(1))
assert(removeDuplicates(List(1, 1)) == List(1))
assert(removeDuplicates(List(1, 2)) == List(1, 2))
assert(removeDuplicates(List(1, 1, 2)) == List(1, 2))
assert(removeDuplicates(List(1, 1, 1, 2)) == List(1, 2))
assert(removeDuplicates(List(1, 1, 2, 3, 3)) == List(1, 2, 3))
assert(removeDuplicatesWithMarker(List(1, 1, 2, 3, 3)) == List(1, 2, 3))

Algorithm in Scala

24 lines of Scala (version 2.13), showing how concise Scala can be!

Get the full algorithm Scala algorithms logo, maze part, which looks quirky!

or

'Unlimited Scala Algorithms' gives you access to all the Scala Algorithms!

Upon purchase, you will be able to Register an account to access all the algorithms on multiple devices.

Stripe logo

Explanation

The approach here is a little different to the State Machine approach:

The idea is that by running a sliding window, we can check pairs of elements for their equality, and if they are not equal, emit the newly found one. (this is © from www.scala-algorithms.com)

Full explanation is available for subscribers Scala algorithms logo, maze part, which looks quirky

Scala concepts & Hints

  1. Collect

    'collect' allows you to use Pattern Matching, to filter and map items.

    assert("Hello World".collect {
      case character if Character.isUpperCase(character) => character.toLower
    } == "hw")
    
  2. Option Type

    The 'Option' type is used to describe a computation that either has a result or does not. In Scala, you can 'chain' Option processing, combine with lists and other data structures. For example, you can also turn a pattern-match into a function that return an Option, and vice-versa!

    assert(Option(1).flatMap(x => Option(x + 2)) == Option(3))
    
    assert(Option(1).flatMap(x => None) == None)
    
  3. Pattern Matching

    Pattern matching in Scala lets you quickly identify what you are looking for in a data, and also extract it.

    assert("Hello World".collect {
      case character if Character.isUpperCase(character) => character.toLower
    } == "hw")
    
  4. Sliding / Sliding Window

    Get fixed-length sliding sub-sequences (sliding windows) from another sequence

View the rest of Scala algorithms