EMC Developer Network
 


View All Articles

Producer-Consumers Revisited

Posted by Aashish Patil (patil_aashish AT emc.com) on July 29, 2007

Abstract

This article discusses some implementation scenarios of the classic producer-consumer problem in computer science. Although, the article uses Repoint and Documentum Query Interfaces as examples the article is applicable to generic thick client UIs and data stores.

Introduction

Recently, during EMC World I received some complaints that Repoint's Query View was slow for long running queries as compared to other Documentum Query Language (DQL) editors. As a result, I've been updating Repoint to improve the query performance.

Background

DFC stands for Documentum Foundation Classes, which is the primary API to access the Documentum Content Server functionality. SWT is the Standard Widget Toolkit, the UI toolkit on which the Eclipse is built. SWT can be also used to build independent applications outside of the Eclipse framework.

Current Implementation

The Query View executes queries using Documentum Foundation Classes(DFC) IDfQuery interface and dumps the results in a Standard Widget Toolkit(SWT) Table. After executing the query, the view collects all results in memory (i.e. exhausts the collection) and then dumps them all in the UI. This was primarily done to accommodate the TableViewer#setInput(...) method, which provides all the contents of a table at once. For large collection sets this locks up the UI for a long time because exhausting a collection can take time. Although Repoint shows a progress dialog, the dialog still runs in the UI thread and thus locks the UI till the collection is emptied.

Instead of the above approach, it would be better to stream the results to the UI. This way a user can start using the results as soon as they are available and the UI is not locked up.

UI and Threads

Repoint is built on top of Eclipse, which uses SWT. The UI runs in its own single thread. My understanding is that most UI architectures do the same. Basically, in any GUI application there is an event loop that goes through a queue of events and acts upon those events one at a time. Thus, any work done in the event handler for one of these events must happen quickly in order to keep the UI responsive. Any event handler always runs in the UI thread. However, long running operations like query execution should run in their own thread separate from the UI thread. Thus, the event handler's job is to spawn off this thread and immediately return control back to the UI.

Note:Single UI thread is the most common model for SWT although it is possible to run multiple UI threads.

UI Update in SWT

The thread executing the query cannot update the UI because it is a separate thread from the UI thread. In SWT, the way to update the UI is to embed the UI update logic in a java.lang.Runnable, which is then handed over to the Display.asyncExec(...) or Display.syncExec(...) method. These methods then interrupt the UI thread and execute the logic within the java.lang.Runnable. The first method, as its name implies, interrupts the UI asynchronously. I used asyncExec since the calling thread is not waiting for any result from the UI thread after the update is over. The logic within the Runnable which is handed off to the asyncExec(...) method should not be long running otherwise the UI thread will get locked.

Streaming Results

The correct way to stream results is to run and execute the query in a thread separate from the UI thread and then update the UI only when there is something to display. This allows a user to start interacting with the UI even while the query is executing. This situation is a manifestation of the classic producer-consumer problem in which a producer writes to a buffer and a consumer reads from this buffer whenever data is available. The background thread writes to a buffer and periodically another thread interrupts UI thread and updates the UI.

In Repoint's case, the eventual consumer (UI Thread) does not monitor the buffer. Instead the producer itself wakes interrupts the consumer when the buffer size reaches a certain level. The producer spawns off a new worker thread that interrupts and updates the consumer (UI) when the buffer is full. The worker thread in this case refers to the java.lang.Runnable handed off to the Display.asyncExec() method. It is not possible to have just one worker because a worker needs to update the UI quickly and return without blocking the UI for a long time.

Using a Buffer

Should the background thread update the UI for every element that it reads from the collection? Interrupting UI is expensive because there is a context switch to the UI thread. Doing the context switch for updating just one element seems expensive. It would be better to buffer a certain number of results and then update the UI. Thus, the code should take full advantage use of the context switch. In addition, constant interruptions could also disturb user experience though the interrupt is probably too short to do this.

The size of the buffer would depend on how fast the background thread is reading results from the data store. If the results are being read slowly, then a small buffer size would make sense. This is because if a large buffer is used when the latency is high (the background thread would wait longer for the buffer to fill up before interrupting/updating the UI), the UI won't be getting updated for a long time causing a bad user experience.

Queue Implementation

One implementation is to use the classic queue structure that works on the First In First Out (FIFO) principle. The producer writes to the queue and consumer threads read from this queue. In this case, the producer spawns off consumer threads to read from the queue. Even though multiple threads get spawned, each is always reading from the queue head and thus results are ordered. Since all threads access the same memory area, there is a synchronization penalty. This is partially due to Java 1.4 limitations which make it impossible to concurrently do put and get operations from a collection. Java 5 offers alternatives to this. However, synchronization is still needed when reading the queue head by multiple consumers.



Fig - Queue Implementation

Repoint's Query View uses this approach. A user expects to see the results ordered exactly the way the content server returns them. Thus ordering is important here. The collection used in this case was a LinkedList because there is no Queue structure in 1.4. A linked list is used to simulate a Queue - removeFirst() pops the first item in the queue and add(...) adds the object to the end of the queue. A linked list was chosen over ArrayList because the code does not need indexed access to the list and needs access to just the head or tail of the queue.

Multiple Buffers Implementation

Another way of implementing this scenario is to use multiple one-time buffers. Thus, use a buffer of fixed size (say an array). Every time the buffer size is full the producer creates a new buffer and hands off the old one to a consumer thread. This way there is no problem of synchronization as a buffer is always used at any time by only one thread. The drawback to this is that since there are multiple consumer threads writing from multiple buffers to a single UI resource, there is no guarantee of order of results dumped in the UI. For example buffer 2 thread might get scheduled before buffer 1 thus causing later results to be displayed first. Another important bookkeeping that should be done in this method is disposal of buffers. Once a consumer has finished reading from a buffer, the memory needs to be released back to the pool. The Java GC should be able to handle this well. The advantage of this approach is that no synchronization overhead is needed. Thus, the background thread never gets blocked when updating the buffer with new data. It always has a ready buffer to write to.



Fig - Multiple Buffers

Repoint's folder contents view uses this approach. It is not necessary to maintain the order of the query results here. The viewer sorts elements based on columns in the view.

Concurrency

Threads are popular means to attain concurrency in most programming languages. Threading was a moot point on client machines with single CPUs. In reality, because of the existence of a single CPU, only one thread could execute on the CPU at a time. However, now with most of laptops and desktops coming with multi-core CPUs, threading is a very real and necessary technique to improve an application's performance, which cannot rely solely on increasing CPU speeds. Multi-cores affect developers because they will need to develop their applications with threading in mind.

Discuss

Discuss this article in our EDN forums

Acknowledgements

I want to thank Gary Frankel, Architect from EMC Engineering. A discussion with him helped clear out some fundamental concepts.

View All Articles