Scala algorithm: Remove duplicates from a sorted list (state machine)

Published

Algorithm goal

In streaming data, elements may arrive duplicated. This can be due to various factors, such as sources that emit the same data more than once, or data that is resent for redundancy on the assumption that processing is idempotent. For consumption, though, we may need to deduplicate the data for at-most-once processing. Some deduplicators retain the state of long-gone elements (in which case `.distinct` suffices, but at a memory cost); here we only look at consecutive duplicate elements. In a sorted input every duplicate sits next to its equal, so removing consecutive duplicates removes all of them.
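The difference only shows up on unsorted input, so here is a minimal sketch contrasting `.distinct`, which remembers every element it has seen, with removing only consecutive duplicates. The helper `dedupConsecutive` is hypothetical and uses a plain `foldLeft`; it is not the state-machine solution this article is about.

```scala
object DistinctVsConsecutive {

  // Hypothetical helper: drops an element only when it equals the element
  // immediately before it, so it never remembers more than one element.
  def dedupConsecutive[A](xs: List[A]): List[A] =
    xs.foldLeft(List.empty[A]) {
      case (acc @ (last :: _), x) if last == x => acc      // repeat of the previous element: skip
      case (acc, x)                            => x :: acc // new element: keep (list is built reversed)
    }.reverse

  def main(args: Array[String]): Unit = {
    val input = List(1, 1, 2, 2, 1)
    // .distinct remembers every element seen so far, so the final 1 is dropped
    println(input.distinct)          // List(1, 2)
    // consecutive deduplication keeps the final 1, because it follows a 2
    println(dedupConsecutive(input)) // List(1, 2, 1)
  }
}
```

On a sorted list such as `List(1, 1, 2, 2, 3)` both approaches give the same result; only the memory they need differs.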

Here the goal is to implement a Deduplicator that works with any collection or streamed input, using a state machine.

This is an alternative approach to RemoveDuplicatesFromSortedListSliding.
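For comparison, the sliding-window idea can be sketched in a couple of lines: keep the first element, then keep every element that differs from the one just before it. This is an illustrative sketch with a hypothetical name (`dedupSliding`), not the code of the Sliding article.

```scala
object SlidingDedupSketch {

  // Hypothetical sketch: windows of two adjacent elements; an element is kept
  // when it differs from its predecessor. The first element is always kept.
  def dedupSliding[A](xs: List[A]): List[A] =
    xs.take(1) ++ xs.sliding(2).collect { case List(prev, next) if prev != next => next }

  def main(args: Array[String]): Unit =
    println(dedupSliding(List(1, 1, 2, 3, 3, 3, 4))) // List(1, 2, 3, 4)
}
```

A one-element list yields a single window of size 1, which the pattern `List(prev, next)` does not match, so the result is just that element.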

Test cases in Scala

This algorithm comes with test cases.

Algorithm in Scala

30 lines of Scala (version 2.13), showing how concise Scala can be!


Explanation

stateDiagram
    [*] --> Start
    Start --> FirstOfElement
    FirstOfElement --> SeenElement
    SeenElement --> FirstOfElement
        

We begin with the most fundamental streaming abstraction: an immutable state which, on receiving an element, produces another immutable state. It includes an Emit method and an Include method.

At the start of the stream, we have nothing to emit, so we do not emit anything.
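The states in the diagram can be sketched as a `sealed trait` in Scala 2.13. This is not the article's 30-line solution: the names `State`, `emit`, `include` and `dedup` are assumptions of this sketch, and it includes transitions the diagram does not draw (`SeenElement` staying in `SeenElement` while a run of equal elements continues, and `FirstOfElement` moving straight to a new `FirstOfElement` when the next element differs).

```scala
object DedupStateMachineSketch {

  // The diagram's states as a sealed trait. emit/include are this sketch's
  // assumed encoding of the article's Emit and Include, not its actual code.
  sealed trait State[A] {
    // What to output once this state has been reached, if anything
    def emit: Option[A]

    // Transition: consume the next element and return the next (immutable) state
    def include(a: A): State[A] = this match {
      case Start()                     => FirstOfElement(a)
      case FirstOfElement(v) if v == a => SeenElement(a)
      case SeenElement(v) if v == a    => this
      case _                           => FirstOfElement(a) // a new run of elements begins
    }
  }

  // Nothing has been read yet, so nothing is emitted
  final case class Start[A]() extends State[A] { def emit: Option[A] = None }

  // First element of a run of equal elements: emit it
  final case class FirstOfElement[A](value: A) extends State[A] { def emit: Option[A] = Some(value) }

  // A repeat of the current run's element: emit nothing
  final case class SeenElement[A](value: A) extends State[A] { def emit: Option[A] = None }

  // Accepts any collection or stream. Iterator.scanLeft lazily yields every
  // intermediate state; flatMap(_.emit) keeps only what each state emits.
  def dedup[A](input: IterableOnce[A]): Iterator[A] =
    input.iterator.scanLeft(Start[A](): State[A])(_ include _).flatMap(_.emit)

  def main(args: Array[String]): Unit = {
    println(dedup(List(1, 1, 2, 3, 3, 4)).toList) // List(1, 2, 3, 4)
    // An infinite input works too, because only the consumed prefix is evaluated
    println(dedup(Iterator.from(0).map(_ / 3)).take(4).toList) // List(0, 1, 2, 3)
  }
}
```

Because every state is immutable and `dedup` only pulls elements as they are needed, the same function handles a `List`, a `LazyList` or an unbounded `Iterator`.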

Full explanation is available for subscribers.

Scala concepts & Hints

  1. Lazy List

    The 'LazyList' type (previously known as 'Stream' in Scala) is used to describe a potentially infinite list that evaluates only when necessary ('lazily').
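As a small illustration, assuming a hypothetical infinite input in which every number appears twice, consecutive duplicates can be removed lazily: only the elements needed for the requested prefix are ever computed. The object and value names are illustrative.

```scala
object LazyListSketch {
  // Hypothetical infinite input 0, 0, 1, 1, 2, 2, ...; no element is computed until needed
  val input: LazyList[Int] = LazyList.from(0).map(_ / 2)

  // Keep the head, then every element that differs from its predecessor.
  // take, zip, collect and ++ are all lazy on LazyList, so the infinite input is never fully forced.
  val deduped: LazyList[Int] =
    input.take(1) ++ input.zip(input.tail).collect { case (prev, next) if prev != next => next }

  def main(args: Array[String]): Unit =
    println(deduped.take(5).toList) // List(0, 1, 2, 3, 4)
}
```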

  2. Option Type

    The 'Option' type is used to describe a computation that either has a result or does not. In Scala, you can 'chain' Option processing and combine it with lists and other data structures. For example, you can turn a pattern match into a function that returns an Option, and vice versa!

    assert(Option(1).flatMap(x => Option(x + 2)) == Option(3))
    
    assert(Option(1).flatMap(x => None) == None)
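The pattern-match-to-Option conversion can be sketched with the standard library's `PartialFunction#lift` and `Function.unlift`; `name`, `nameOption` and `nameAgain` are illustrative names.

```scala
object OptionPatternMatchSketch {
  // A pattern match written as a PartialFunction, defined only for 1 and 2
  val name: PartialFunction[Int, String] = {
    case 1 => "one"
    case 2 => "two"
  }

  // lift: pattern match -> total function that returns an Option
  val nameOption: Int => Option[String] = name.lift

  // Function.unlift: Option-returning function -> PartialFunction (the "vice versa")
  val nameAgain: PartialFunction[Int, String] = Function.unlift(nameOption)

  def main(args: Array[String]): Unit = {
    println(nameOption(1))                     // Some(one)
    println(nameOption(3))                     // None
    println(nameAgain.isDefinedAt(3))          // false
    // Options combine with lists: flatMap drops the Nones
    println(List(1, 2, 3).flatMap(nameOption)) // List(one, two)
  }
}
```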
    
  3. scanLeft and scanRight

    Scala's `scan` functions let you fold like `foldLeft` and `foldRight` while also collecting the intermediate results.

    assert(List(1, 2, 3, 4, 5).scanLeft(0)(_ + _) == List(0, 1, 3, 6, 10, 15))
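Applied to this article's problem, `scanLeft` can expose every intermediate step of a deduplication. In this sketch the state is a hypothetical pair of (previous element, element to emit at this step), a deliberately simpler encoding than a sealed-trait state machine.

```scala
object ScanLeftSketch {
  val input: List[Int] = List(1, 1, 2, 2, 2, 3)

  // Hypothetical state: (previous element, element to emit at this step).
  // scanLeft keeps every intermediate state, where foldLeft would return only the last.
  val steps: List[(Option[Int], Option[Int])] =
    input.scanLeft((Option.empty[Int], Option.empty[Int])) {
      case ((prev, _), x) => (Some(x), if (prev.contains(x)) None else Some(x))
    }

  // Collect what each step emitted
  val emitted: List[Int] = steps.flatMap(_._2)

  def main(args: Array[String]): Unit = {
    println(steps.length) // 7: the initial state plus one per input element
    println(emitted)      // List(1, 2, 3)
  }
}
```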
    
  4. Stack Safety

    A function is stack-safe when it cannot crash by exceeding the limit on the number of nested (recursive) calls, i.e. by overflowing the call stack.

    The function below works for small ranges such as `sum(1, 5)`, but for a large enough range it crashes with `java.lang.StackOverflowError`, because every number in the range adds another nested call; the exact limit depends on the JVM's thread stack size. However, there is a way to fix it :-)

    // Sums from + (from + 1) + ... + until (inclusive); assumes from <= until
    def sum(from: Int, until: Int): Int =
      if (from == until) until else from + sum(from + 1, until) // not a tail call: one stack frame per number
    
    def thisWillSucceed: Int = sum(1, 5) // 15
    
    def thisWillFail: Int = sum(1, 100000) // StackOverflowError with default stack settings

    In Scala Algorithms, we try to write the algorithms in a stack-safe way where possible, so that they do not crash on large inputs. However, stack-safe implementations are often more complex, and in some cases overly complex for the task at hand.
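One common way to make the recursive `sum` stack-safe is tail recursion: carry the running total in an accumulator so that the recursive call is the last operation, and annotate it with `@tailrec` so the compiler verifies it can turn the recursion into a loop. This sketch returns `Long` instead of `Int`, an assumption made here so that large ranges do not overflow.

```scala
import scala.annotation.tailrec

object StackSafeSum {
  // Sums from..until inclusive, like the recursive sum (and, like it, assumes
  // from <= until). Returns Long: an assumption of this sketch to avoid Int overflow.
  def sum(from: Int, until: Int): Long = {
    // The recursive call is in tail position, so @tailrec makes the compiler
    // check it and compile it into a loop: the stack does not grow.
    @tailrec
    def go(current: Int, acc: Long): Long =
      if (current == until) acc + until
      else go(current + 1, acc + current)

    go(from, 0L)
  }

  def main(args: Array[String]): Unit = {
    println(sum(1, 5))      // 15
    println(sum(1, 100000)) // 5000050000, without a StackOverflowError
  }
}
```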
    
  5. State machine

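    A state machine describes all the possible states (and transitions) of a 'machine'. In Scala, the states are typically modelled as a hierarchy of case classes and objects extending a `sealed trait`, so the compiler can warn when a pattern match does not handle every state.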

