In Streaming data, data may come in duplicated; it could be due to various factors such as duplicated data from sources and idempotency for redundancy; for consumption though we may need to deduplicate the data for at-most-once processing. Some deduplicators retain state of long-gone elements (in which case .distinct will suffice, but have a memory cost), but in this case we are looking at only consecutive duplicate elements.
This is an alternative approach to RemoveDuplicatesFromSortedListStateMachine
Test cases in Scala
Algorithm in Scala
23 lines of Scala (compatible versions 2.13 & 3.0), showing how concise Scala can be!
The approach here is a little different to the State Machine approach:
The idea is that by running a sliding window, we can check pairs of elements for their equality, and if they are not equal, emit the newly found one. (this is © from www.scala-algorithms.com)
Scala concepts & Hints
'collect' allows you to use Pattern Matching, to filter and map items.
The 'Option' type is used to describe a computation that either has a result or does not. In Scala, you can 'chain' Option processing, combine with lists and other data structures. For example, you can also turn a pattern-match into a function that return an Option, and vice-versa!
Pattern matching in Scala lets you quickly identify what you are looking for in a data, and also extract it.
Sliding / Sliding Window
Get fixed-length sliding sub-sequences (sliding windows) from another sequence