Ruminations of a Programmer: Functional Data Structures in a Persistent Setting

In the last post we saw how Okasaki's BatchedQueue does not fit in the model of persistence. Okasaki showed that the data structure gives you an amortized O(1) for tail but strictly in an ephemeral way where we have only one logical future for a sequence of operations. Let's consider a hypothetical situation with the scenario that we left with in the last post. The main problem why the data structure failed in the context of persistence was that by the time it reached the realization stage of logical future #2, we were left with no credits to pay for it. But what happens if we can ensure that the credits for logical future #2 has already been paid for before. That is, logical future #2 could reuse the computation that logical future #1 had already done before.

Enter Memoization and call-by-need. Implementing the functional queue in a lazy language that supports call-by-need (like Haskell) will enable us to reuse the computation that one logical future has already performed.

It's called a BankersQueue that gives us amortized O(1) in a persistent setting. Here's the snippet that implements the data model..

module BankersQueue (BankersQueue) where

  import Prelude hiding (head, tail)

  import Queue



  data BankersQueue a = BQ Int [a] Int [a]

The basic stuff is the same as the BatchedQueue. In the implementation of the BatchedQueue in a strict language, we would use lists with strict evaluation semantics. BankersQueue need lazy lists and call-by-need semantics. In addition to the two lists, the BankersQueue also maintains the lengths of the lists as part of the data structure. This information will be used when we try to determine when to reverse the rear and append it to the front.

Lazy functional amortized data structures are not easy to analyze for time complexity. The main difficulty is the fact that operations don't get executed when they seem to. Instead we have suspensions or thunks that get created, only to be forced when actually needed. Okasaki's framework based on debits suggests that suspensions can be forced only if the debts assigned to them (proportional to the shared cost of the operation) has been paid off. I am not going into the details of explaining the banker's method framework. The basic idea is that you can enjoy your pie only if you have completely paid for it. There has to be a linear and sequential ordering between the following three stages in the lifecycle of a suspension.

Create a Suspension (1) => Payoff a Suspension (2) => Force a Suspension (3)

In the implementation for BatchedQueue we were doing the reverse of the rear only when the front becomes empty. In the banker's method based on debits, doing so will not give us sufficient time to pay for the entire reverse operation. Remember the crucial thing is to ensure that the following sequence is honored:

calling reverse creates a suspension and we assign a debt to it proportional to the shared cost of the operation. In our case it's the length of the rear list.
operations must discharge debits enough to pay off the debt assigned to the suspension for reverse.
actually execute reverse since it's debt has been paid off.

In order to ensure that we pay off the debt assigned to reverse, we have to discharge an equivalent number of debits before the suspension for reverse is forced. And since the debt on the suspension is |rear|, this is possible only if we force the reverse after |rear| operations. Every tail discharges one debit off the front and we need to force the reverse after |front| operations. Which means that, after |front| operations we must be ready to execute reverse assuming that we have paid off all debts associated with it. This is possible if we create the suspension when |rear| just exceeds |front|. The following changes to the check function ensures this ..

check lenf f lenr r =

  if lenr <= lenf then BQ lenf f lenr r

  else BQ (lenf + lenr) (f ++ reverse r) 0 []

Have a look at the following diagram that shows how reverse gets it done in O(1) amortized in a persistent setting.

Note how laziness with call-by-need is the secret sauce behind the numbers. BankersQueue can also be implemented in languages that offer optional laziness. Scala can implement it with Stream, which is a lazy sequence and lazy val, that offers call by need semantics. However an interesting benchmark can be done with the corresponding implementation in Haskell. It remains to be seen how the overhead of creating and managing suspensions can have an impact on the runtime complexity.

In the next post on this series I will discuss yet another data structure that offers O(1) amortized functional queue and deque implementations. It's actually a tree structured model and can be malleable enough to offer a host of other data structure implementations as well.

Ruminations of a Programmer

Monday, June 14, 2010

Functional Data Structures in a Persistent Setting - Part 2: Okasaki's BankersQueue

No comments: