Behavior-Driven vs. Data-Driven
A Non-Issue?

Desmond D'Souza

Introduction

In this series of articles we will present some unique ways to motivate and teach modeling of objects, ranging from simple programming level classes to complex components, based upon the key principles used in the Catalysis method.

In this first article in the series, we will examine the long-standing controversy between methodologies that are data-driven and those that are behavior-driven. While not trying to reconcile specific methodologies, we will show how, using an interpretation of "type-models" as defined by the Catalysis method, the controversy is largely irrelevant. Our new interpretation will make it easier for beginners to understand the justification for certain kinds of models, and will equip them with concrete steps to create initial models based primarily upon behaviors. The main points of this article (and its immediate follow-ups) may be summarized as:

The Problem

There has been much discussion about data-driven vs. behavior-driven methodologies. Certain methods have been characterized as behavior-driven (CRC, Booch, OBA, etc.), while others have been characterized as data-driven (OMT, Shlaer & Mellor). Some object purists reject data-driven approaches on the basis that objects are primarily about responsibilities and behaviors, and about hiding implementations. On the other hand, data-modeling people find it hard to understand how considerations of data can be so thoroughly avoided (or deferred). So, let us step back for a bit of perspective on objects as pure behaviors and objects as black boxes, and introduce the notion of a type model.

Objects are characterized by their visible behaviors, and object encapsulation tells us that we should hide details of how an object is implemented, including its data representations. Rather, we should concentrate on describing its responsibilities and services, and try to do so without describing how those services might be implemented.

Still, if we try to describe a component purely by a list of the services it offers, with their names and parameters, we leave too much unsaid. For example, I might claim that all windows provide a "move" operation. One implementation might require that the Window be in the foreground before this move operation is performed; otherwise the operation has no effect. Another version might automatically bring the Window to the foreground and then move it. Yet another might temporarily bring it to the foreground, move it, and then return it to its original layer. These are not mere implementation variations, or things we can pass off under the banner of polymorphism. There might well be client software components that will work with one version of window but not the others.

Because of this problem, most methods, in both the data-driven and behavior-driven camps, admit some variation of the notion of 'attribute'. OBA describes 'attribute' as something the object knows about, deferring the decision of whether this is stored or computed. CRC distinguishes between things an object is responsible for 'knowing' and things it 'does'. OMT interprets an attribute as data stored in an instance of a class.

On reflection, we realize that some notion of 'knows' is required, if only to understand what the object 'does'. To even understand and specify the behaviors of an object, we need at least a conceptual model of what happens to that object as a result of those behaviors being exercised. This conceptual model describes the 'knowledge' that object must have in some form, and constitutes an abstract model of its state. Note that the interpretation of attributes as stored data, as done by some leading methodologies, while appropriate for models which reflect C++ code, is not appropriate for any kind of abstract analysis.

In the case of the window, if we admit the notions of the window layer (or at least some abstract boolean attribute foreground) and window location, we can now make the following precise statement, which rules out the other versions:

When a window is moved by X, if it was in the foreground, then its resulting location is changed by X (otherwise nothing happens to that window).
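The statement above can be turned into an executable check. The following is a minimal sketch, not part of the Catalysis notation: the class names StrictWindow and RaisingWindow, the helper satisfies_move_spec, and the use of a single integer for the location are all assumptions made for illustration. It shows how the two abstract attributes (location and foreground) are enough to tell two of the "move" variants apart.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Abstract model: only the notions of 'location' and 'foreground'."""
    location: int = 0          # one-dimensional location, an assumption for brevity
    foreground: bool = False

    def move(self, x: int) -> None:
        raise NotImplementedError


class StrictWindow(Window):
    def move(self, x: int) -> None:
        # Moves only if already in the foreground; otherwise no effect.
        if self.foreground:
            self.location += x


class RaisingWindow(Window):
    def move(self, x: int) -> None:
        # Brings the window to the foreground, then moves it.
        self.foreground = True
        self.location += x


def satisfies_move_spec(window: Window, x: int) -> bool:
    """Check one call of move against the stated post-condition."""
    old_location, old_foreground = window.location, window.foreground
    window.move(x)
    if old_foreground:
        return window.location == old_location + x
    # "Otherwise nothing happens to that window."
    return (window.location, window.foreground) == (old_location, old_foreground)
```

Under these assumptions, a StrictWindow passes the check whether or not it starts in the foreground, while a RaisingWindow that starts in the background fails it.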

The Issues of the "Black Box"

Most interesting objects are "stateful" i.e. they have some state information which either persists or changes across method invocations. It is very difficult to precisely characterize the behavior of such objects as pure 'black boxes'. This holds for any stateful component, including those which are not implemented as simple instances of OOPL classes.

Here is a simple progression we use in our tutorials to make this point very clearly. I will present a sequence of three different "black-box" components, to see which of them you would be comfortable with using as-is.

Given the following 'black-box' component definition, would you use this component?

Could you use such a component? No, because you don't have a clue what it means.

How about this one below?

The usual answer is "Sure". However, underlying this casual response we discover a variety of implicit assumptions. Are the elements returned by 'pop' related to the elements provided by 'push'? Are they identical elements? If I modify an element that I pop, am I modifying the element that was originally pushed? Are they similar or identical copies? What assumptions underlie correct operation of push and pop?
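To make the question about identity versus copies concrete, here is a small sketch. The two classes SharingStack and CopyingStack are hypothetical; both offer exactly the same push and pop signatures, yet a client that modifies a popped element sees different effects.

```python
import copy


class SharingStack:
    """Pop returns the very object that was pushed."""

    def __init__(self):
        self._items = []

    def push(self, e):
        self._items.append(e)

    def pop(self):
        return self._items.pop()


class CopyingStack(SharingStack):
    """Pop returns an equal but distinct copy of what was pushed."""

    def push(self, e):
        self._items.append(copy.deepcopy(e))


original = {"name": "a"}

sharing = SharingStack()
sharing.push(original)
popped_shared = sharing.pop()      # the same object as 'original'

copying = CopyingStack()
copying.push(original)
popped_copy = copying.pop()        # equal to 'original', but a different object
popped_copy["name"] = "changed"    # does not affect 'original'
```

Nothing in the operation names or parameters distinguishes the two; only a statement about what push and pop mean can do so.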

Just to force the point, how about using this much more complex component? It has quite a different flavor to it, doesn't it? Yet, most of us would not hesitate to answer "No, Yes, and No", respectively, to these 3 components.

Even though a Stack appears like such a trivial, academic-looking part, let us try to take apart the assumptions behind using it. In the process, we will learn some very valuable things about the need for modeling abstract 'attributes', parameterized attributes, as well as familiar visual models depicting types, attributes, and associations. In case you are tempted to skip reading any article which talks about things as boring as a stack, please take a quick look at the later section, titled Common Reactions.

Assuming that the only public operations we wish to support are push and pop, let's see what we would really like to say about push and pop. We will do this by examining a scenario for usage of the stack, and sketching out a conceptual model the client might have of the progression the stack goes through.

Notice that we will start with considerations of behavior, but will immediately and iteratively use that behavior to motivate models that appear more structural in nature.

We would like to describe this stack in a manner precise enough to eliminate the incorrect implementation above, while still remaining at an abstract level and not imposing a specific implementation.

Attributes as Abstract Model "Queries"

We can now try to describe this more precisely:

Stack::push (e: Element)
post: "stack grows and element e has become the top element in the stack."

As soon as we say this, we have admitted the notions of top of stack and size into our conceptual notion of the stack. We go ahead and add them to a section titled "model". In order to prevent a misreading of these as implemented or stored attributes, we recommend reading the model section as written in the figure, i.e. "the notions of...".

We will use this 'model' section of the component as the only vocabulary we are permitted to use to document its services precisely, e.g. using pre and post-conditions. We can formalize push with:

Stack::push (e: Element)
post: "e is the element on top and size has increased"
e = top & size = old(size) + 1

Note that we deliberately wrote this as e=top, instead of top=e. These two mean exactly the same when specifying an interface, but the latter is too often read as "assign to a variable which is an implemented piece of storage named top". We use the notation old(size), to refer to the size before the operation.

Note also that, in order to be able to refer to size+1, we require that implementor and client have a shared notion of "+". One convenient way of doing this is to assume that size is of type integer. This choice does not imply an implementation choice, but a convenience chosen to ease the descriptions of push and pop.
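The formal post-condition of push can be exercised directly. This is only a sketch under stated assumptions: ListStack is one hypothetical implementation, size and top are exposed as read-only queries purely so that the check can be written, and "e = top" is interpreted as equality rather than object identity. The helper checked_push captures old(size) before the call, which is all that the old(...) notation requires.

```python
class ListStack:
    """One possible implementation; the list is a private choice."""

    def __init__(self):
        self._items = []

    # Model queries: vocabulary for the specification, not stored attributes.
    @property
    def size(self) -> int:
        return len(self._items)

    @property
    def top(self):
        return self._items[-1] if self._items else None

    def push(self, e) -> None:
        self._items.append(e)


def checked_push(stack, e) -> None:
    """Run push and assert the post-condition: e = top & size = old(size) + 1."""
    old_size = stack.size          # old(size): the value before the operation
    stack.push(e)
    assert e == stack.top
    assert stack.size == old_size + 1
```

Note that checked_push relies only on the queries size and top, never on the list inside ListStack, so the same check applies to any other implementation that can answer those two queries.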

After the first operation in the scenario, we can be confident that top = a; after the second, top = b. At the third operation, pop, the returned value should be b. We can generalize this by saying:

Stack::pop (out e: Element)
post: "returned element was from top & stack has shrunk"
e = top & size = old(size) - 1

What should happen at the next pop? Our snapshots suggest that a should be popped next. However, our models limit us to discussing the size and top of the stack, and do not permit us to describe the previous a in any sense.

In fact, our existing models and specification of push and pop permit an implementation in which every push replaces and loses the current top, as depicted in this figure!
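Such an implementation is easy to write down. The sketch below uses a hypothetical LossyStack that keeps only the most recent element and a counter; the two helper functions check the post-conditions of push and pop exactly as formulated so far (e = top, with size adjusted by one). All three checked operations of the scenario succeed, and yet the second pop returns b again rather than a.

```python
class LossyStack:
    """Keeps only the latest element plus a counter (deliberately wrong)."""

    def __init__(self):
        self.size = 0
        self.top = None

    def push(self, e) -> None:
        self.top = e          # replaces and loses the previous top
        self.size += 1

    def pop(self):
        self.size -= 1
        return self.top       # top is never restored to an earlier element


def push_post_holds(stack, e) -> bool:
    old_size = stack.size
    stack.push(e)
    return e == stack.top and stack.size == old_size + 1


def pop_post_holds(stack) -> bool:
    old_size = stack.size
    e = stack.pop()
    return e == stack.top and stack.size == old_size - 1


stack = LossyStack()
results = [
    push_post_holds(stack, "a"),
    push_post_holds(stack, "b"),
    pop_post_holds(stack),
]
second_pop = stack.pop()      # the client expects "a" here
```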

Clearly, what is missing is the statement that, on a pop, the previous top element is returned, and the second-to-top element has become the top element. Of course, after the next pop, the third-to-top element becomes the top. In other words, we have some notion of the sequence in which the elements were pushed.

Parameterized Model "Queries"

We will model this in a very specific way to make a point about treating attributes as queries, thereby naturally supporting the notion of queries with parameters. Rather than simply postulating that there is some notion of a List, we will simply say: there is a notion of which element was pushed (and not yet popped) in which sequence. In order to do this, we also need some notion of the number of elements in the stack.

We can now try to specify push:

Stack::push (e: Element)
post: e = top & size = old(size) + 1 & e = elementAt(size)

We can now try to specify pop:

Stack::pop (out e: Element)
post: top = e & size = old(size) - 1

However, we expect the notion of top to have changed after the pop. Hence, we wish to refer to the top element before the pop. We do this with:

Stack::pop (out e: Element)
post: old(top) = e & size = old(size) - 1

The old(top) simply means the value of top before this operation.

We can either say that top = elementAt(size) is an invariant, or explicitly state it separately in the post-conditions of every operation. Since it should be true at all times, we stated it as an invariant in Figure 9. If we do this, we do not need to repeat it in the post-condition of push, since top=e implies elementAt(size)=e.
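The parameterized query and the invariant can be sketched in the same executable style. Several things here are assumptions rather than part of the text: the name SequencedStack, the 1-based indexing of element_at (chosen so that top = elementAt(size) reads naturally), treating the invariant as vacuous for an empty stack, and an extra "frame" check that push leaves earlier positions unchanged, which the discussion implies but does not write out.

```python
class SequencedStack:
    """List-backed sketch exposing the model queries size, top, element_at."""

    def __init__(self):
        self._items = []

    @property
    def size(self) -> int:
        return len(self._items)

    @property
    def top(self):
        return self._items[-1] if self._items else None

    def element_at(self, i: int):
        # Parameterized query: i-th element pushed and not yet popped (1-based).
        return self._items[i - 1]

    def push(self, e) -> None:
        self._items.append(e)

    def pop(self):
        return self._items.pop()


def invariant_holds(stack) -> bool:
    # Invariant: top = elementAt(size); vacuous for an empty stack (assumption).
    return stack.size == 0 or stack.top == stack.element_at(stack.size)


def checked_push(stack, e) -> None:
    old_size = stack.size
    old_elements = [stack.element_at(i) for i in range(1, old_size + 1)]
    stack.push(e)
    # post: e = top & size = old(size) + 1
    assert e == stack.top and stack.size == old_size + 1
    assert invariant_holds(stack)
    # Assumed frame condition: earlier positions are untouched.
    assert old_elements == [stack.element_at(i) for i in range(1, old_size + 1)]


def checked_pop(stack):
    old_top, old_size = stack.top, stack.size
    e = stack.pop()
    # post: old(top) = e & size = old(size) - 1
    assert old_top == e and stack.size == old_size - 1
    assert invariant_holds(stack)
    return e
```

With this richer vocabulary, the scenario push a, push b, pop, pop can be checked end to end, and the second pop is required to yield a.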

Type Models

A type is a characterization of the visible behavior of some set of objects. Any object which conforms to this characterization is called a "member", or, loosely, "instance" of that type. A type makes absolutely no statement about implementation, strictly avoiding data representation, stored attributes, and method implementations.

A class can implement a type; more accurately, one or more classes can implement one or more types. Classes introduce implementation choices, including data members (instance variables), method bodies, and implementation inheritance. This crucial distinction is, surprisingly, not recognized by the more popular methodologies today.

As we have seen, a precise description of behaviors requires some conceptual model of that component. This model formally defines what any implementation of this component must "know" (in some form or the other), and corresponds to a set of typed queries which are agreed to by client and implementor, but which are not necessarily directly "callable" by the client.

The Catalysis method uses the term type model to describe models such as those shown in Figure 9, since such a model exists to help characterize the type of the containing component. In particular, the models do not describe classes. Also, as we will show in a subsequent column, these models are usually depicted visually.

Catalysis makes a clear distinction between type and class. Recent languages like Java also make this distinction very clearly. Other languages, such as C++ and Eiffel, tend to equate classes and types, and rely on programming conventions, such as abstract classes, to distinguish them. This can easily lead to some unnecessary implementation decisions and coupling making their way into client code. The Smalltalk language treats classes as quite orthogonal to types (although many Smalltalk texts do not make this in the least bit clear!).
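As a rough illustration of the type/class distinction in code, the sketch below uses Python's typing.Protocol to play the role of a type, with two unrelated classes implementing it. The names Stack, ArrayStack, LinkedStack and drain_two are hypothetical. One caveat: a Protocol captures only operation signatures, so it is considerably weaker than a type in the sense used here, which also carries the model and the pre- and post-conditions.

```python
from typing import Any, Protocol


class Stack(Protocol):
    """A type: visible operations only, no representation."""

    def push(self, e: Any) -> None: ...

    def pop(self) -> Any: ...


class ArrayStack:
    """One class implementing the type, using a list."""

    def __init__(self) -> None:
        self._items: list = []

    def push(self, e: Any) -> None:
        self._items.append(e)

    def pop(self) -> Any:
        return self._items.pop()


class LinkedStack:
    """Another class implementing the same type, using linked pairs."""

    def __init__(self) -> None:
        self._head = None          # chain of (element, rest) pairs

    def push(self, e: Any) -> None:
        self._head = (e, self._head)

    def pop(self) -> Any:
        e, self._head = self._head
        return e


def drain_two(stack: Stack) -> list:
    # Client code written against the type, not against either class.
    stack.push("a")
    stack.push("b")
    return [stack.pop(), stack.pop()]
```

The client function drain_two depends only on the type; either class can be substituted without the client changing.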

A Hypothetical Reconstruction

If you will bear with me, I would like to walk you through a hypothetical dialog between a developer and a client, in which a model gradually emerges. Let us pretend that we are discussing this Stack component, and wish to agree on precisely what it must do. You are playing the client (C), and I the implementor/developer/analyst (I).

As a very first step, we might recognize the notion that when an item is pushed, the stack grows.

I: "Tell me a bit about how you understand pushing onto the stack."

C: "Well, for one thing, when I push an element, it should go into the stack."

I: "What would that mean, for a stack which already contained some elements?"

C: "Well, the stack should grow to accommodate the new element."

I: "Would you say you have a fairly clear notion of the size of the stack? Could I call it 'count' instead of size?"

C: "Sure; it represents the number of elements in the stack."

I: "Is the count of the stack something which you need to access externally?"

C: "No, I will simply push and pop, and I'm not really interested in its count. Of course, I do expect the stack to grow and shrink."

I: "OK, so we will not need count as part of the interface, but we do need to share it in our mutual vocabularies, so we will add it to our shared conceptual model."

At this point, we update our mutual model of what a stack is.

I: "Would that be enough?"

C: "I think so."

I: "Well, would the following implementation work?" (Figure 5)

C: "Definitely not. I expect the last element pushed in to be popped."

I: "Aha! I don't think we can say that simply in terms of count. It looks like we need some notion of the last element pushed."

And so the dialog proceeds, uncovering the fact that a push grows the stack and changes the top element, and a pop returns the top element and shrinks the stack. But what is the top after I do a pop? Clearly, this only permits us to describe the very first pop, and not any subsequent ones, unless I have some notion of the second-last un-popped element, and the third-last, and... i.e. the sequence of pushed elements.

Of course, when I ask students up front whether we needed a notion of sequence of elements, they often feel that this is unnecessarily discussing implementation. There is an unfortunate tendency to equate unambiguity and precision with "implementation detail". This is particularly noticeable in some "analysis" efforts, when teams are comfortable asking high-level questions, and providing high-level answers with a worrisome degree of hand-waving, but as soon as a specific issue arises which needs precision and possibly detail, they shy away from it, claiming it is an 'implementation issue'!

Common Reactions

This development of a model for a stack draws many common reactions. One of them is: "This is too painful. I cannot find and write down such lists of attributes and parameterized attributes. Even if I could, I could never communicate it with my customers." To these I say: "Patience!". We will see soon that these models do not have to be depicted as flat, unstructured lists of queries. In fact, most interesting models will be depicted using very familiar visual models including associations, attributes, states, etc. We can even use GUI sketches or prototypes to validate such models. Moreover, the pieces of formal expressions are best used to complement a prose description, made precise by the model.

Another very common reaction is: "Stacks are so boring. When we do our modeling, using OMT/Booch/Fusion/etc., we simply assume that we have primitives like stacks, lists, and the like. We are really more interested in complex systems, and not in building stacks or lists." Unfortunately, if we cannot clearly state what something as simple as a Stack does, how well are we likely to communicate more complex components? If you just look back at the initial series of figures in this article, you will see what I mean. If a Stack seems too academic, consider the simple Window described in the first section. Or, look at one of the Catalysis case-studies. In practice, formalizing at least selected operations is one of the surest ways of validating the model and testing it for adequacy.

Reflect for a moment upon the implications of avoiding these issues initially. Every one of these issues will necessarily have to be resolved sooner or later. The more of such issues we uncover early, the more likely we are to produce a high-level model that is, in fact, robust and capable of adequately serving the analysis and design. The more we defer such issues, the more likely we are to discover them after the high-level models have become entrenched in other parts of the architecture. Due to the re-work involved, we might well abandon the high-level models and "fix it in the code". For any system with a significant product lifetime and a dominant maintenance cost, this essentially means that the high-level models will be useless except on some poorly defined "first-pass".

Another common reaction: "but I might not want clients to be able to get at the size of a stack, or to access elements at any position". This is absolutely correct. There is no implication that a client has access to the 'model' elements of a component. However, both client and developer have to work off a common conceptual model to even define what the component must do.

Yet another common reaction is: "But, I can implement a Stack without using a list!". To this I say: "Sure you can. However, the model says that you cannot implement a stack without some notion of which was the last (second-last, ...) element pushed but not yet popped." For example, you could implement it by simply throwing all elements into some un-ordered collection as they are pushed, together with a time-stamp. To pop, you could simply remove and return the element with the most recent time-stamp. In doing this, you actually have a very precise notion of the last (second-last, ...) element pushed: it is the element in the collection with the most (second-most, ...) recent time-stamp.
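The time-stamp implementation just described can be sketched as follows. The class name TimestampStack is hypothetical, and a logical counter stands in for a clock (an assumption, made so that no two pushes can share a time-stamp). Although there is no list anywhere in the representation, the model queries size and element_at can still be answered from it.

```python
import itertools


class TimestampStack:
    """Elements live in a collection keyed by time-stamp; storage order is not used."""

    def __init__(self):
        self._clock = itertools.count(1)   # logical time-stamps (assumption)
        self._bag = {}                     # time-stamp -> element

    def push(self, e) -> None:
        self._bag[next(self._clock)] = e

    def pop(self):
        latest = max(self._bag)            # most recent time-stamp
        return self._bag.pop(latest)

    # The model queries are still recoverable from this representation.
    @property
    def size(self) -> int:
        return len(self._bag)

    def element_at(self, i: int):
        # i-th oldest element still present (1-based).
        return self._bag[sorted(self._bag)[i - 1]]
```

The representation differs completely from a list, but the retrieval from implementation to model is straightforward: order by time-stamp.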

Summary

We have illustrated some of the elements of justifying and interpreting type-models, as used in the Catalysis method. The behavior required from some component induces a type model of that component. Thus, the strong distinction previously made between behavior-driven vs. data-driven methodologies is not a critical factor for Catalysis. Because of this fundamental fact, Catalysis can provide very concrete steps for building type models based upon behaviors, and for checking them for adequacy and consistency.

Catalysis is based on 3 key principles: collaborations and mutual models; multiple views and composition; and refinement of collaborations and models. These principles build in a very direct way upon the foundation of type models, and we will discuss them in detail in subsequent columns.

Catalysis provides a simple yet powerful mechanism for composing views and precisely describing the essential dependencies between them. This allows us to achieve early reuse by composing existing collaboration patterns, to divide and conquer complex problems more easily, and to exploit framework-style techniques from models to implementation.

Acknowledgments

This article contains material from courses and tutorials by ICON Computing, reproduced by permission. It also contains excerpts from the textbook on Catalysis, which is currently in preparation.

References

"Practical Rigor and Refinement", D. D'Souza and A. Wills, in "Fusion in the Real World", Prentice Hall, 1995, D. Coleman et al (Eds.)

"Catalysis: Practical Rigor and Refinement", at URL:http://www.iconcomp.com