Run-length encoding is one of the most basic compression methods, which is especially useful where there long runs of a particular character or a group of characters. It could be particularly useful for example in column-based databases which have potentially many repeating values.
Run-length turns a sequence of characters into effectively a sequence of count-character pairs. For example,
See RunLengthEncoding for the Encoder.
Test cases in Scala
assert(decodeString("1W") == "W") assert(decodeString("3W") == "WWW") assert(decodeString("3W1A1W") == "WWWAW") assert(decodeString("3W1A2B2W") == "WWWABBWW") assert( decodeString("x3W1A2B2W") == "WWWABBWW", "Invalid character in front is ignored" ) assert( decodeString("3W1A2B2") == "WWWABB", "Invalid character at the back is ignored" ) assert( decodeString("13Y") == "YYYYYYYYYYYYY", ">10-strong RLE is emmitted correctly" )
Algorithm in Scala
21 lines of Scala (version 2.13), showing how concise Scala can be!
While this can be implemented in Tail Recursion, we will implement in a streaming style because:
- It is more complex to read and reason about
- It it is not possible to make it work in a streaming fashion (because who wants to run out of memory?).
In a functional/immutable style approach, we must think in terms of transformations , and in this particular algorithm, because we need to parse the input, we need to consider that either an input is a digit or a non-digit, then fuse any digits into numbers of occurrences (because we do wish to be able to handle inputs like
13Y); and finally, combine digit-char pairs and emit them as a
String. (this is © from www.scala-algorithms.com)
Scala concepts & Hints
'collect' allows you to use Pattern Matching, to filter and map items.
Pattern matching in Scala lets you quickly identify what you are looking for in a data, and also extract it.
scanLeft and scanRight
Scala's `scan` functions enable you to do folds like foldLeft and foldRight, while collecting the intermediate results
Sliding / Sliding Window
Get fixed-length sliding sub-sequences (sliding windows) from another sequence
Stack safety is present where a function cannot crash due to overflowing the limit of number of recursive calls.
This function will work for n = 5, but will not work for n = 2000 (crash with java.lang.StackOverflowError) - however there is a way to fix it :-)
In Scala Algorithms, we try to write the algorithms in a stack-safe way, where possible, so that when you use the algorithms, they will not crash on large inputs. However, stack-safe implementations are often more complex, and in some cases, overly complex, for the task at hand.
.viewsyntax creates a structure that mirrors another structure, until "forced" by an eager operation like .toList, .foreach, .forall, .count.