Behavior-Driven vs. Data-Driven
A Non-Issue?
Desmond D'Souza
Introduction
In this series of articles we will present some unique ways to motivate
and teach modeling of objects, ranging from simple programming level classes
to complex components, based upon the key principles used in the Catalysis
method.
In this first article in the series, we will examine the long-standing controversy
between methodologies that are data-driven and those that are behavior-driven.
While not trying to reconcile specific methodologies, we will show how,
using an interpretation of "type-models" as
defined by the Catalysis method, the controversy is largely irrelevant.
Our new interpretation will make it easier for beginners to understand the
justification for certain kinds of models, and will equip them with concrete
steps to create initial models based primarily upon behaviors. The main
points of this article (and its immediate follow-ups) may be summarized
as:
- Pure behavioral "black boxes" still need their behaviors to
be specified
- Given a conceptual model, behaviors can be described precisely
- Attributes are not enough; parameterized attributes are needed
- Parameterized attributes are depicted visually as "type models"
- The model provides a precise vocabulary for defining contracts
- Any collaboration requires mutual models
- Models from multiple "views" may be composed
- Collaborations may be refined in mutually interesting ways
The Problem
There has been much discussion about data-driven vs. behavior-driven methodologies.
Certain methods have been characterized as behavior-driven (CRC, Booch,
OBA, etc.), while others have been characterized as data-driven (OMT, Shlaer & Mellor).
Some object purists reject data-driven approaches on the basis that objects
are primarily about responsibilities and behaviors, and about hiding implementations.
On the other hand, data-modeling people find it hard to understand how considerations
of data can be so thoroughly avoided (or deferred). So, let us step back
for a bit of perspective on objects as pure behaviors, objects as black
boxes, and introduce the notion of a type model.
Objects are characterized by their visible behaviors, and object encapsulation
tells us that we should hide details of how an object is implemented, including
its data representations. Rather, we should concentrate on describing its
responsibilities and services, and try to do so without describing how those
services might be implemented.
Still, if we describe a component purely by a list of the services it
offers, with their names and parameters, we leave too much unsaid.
For example, I might claim that all windows provide a "move" operation.
One implementation might require that the Window be in the foreground before
this move operation is performed, otherwise the operation has no effect.
Another version might automatically bring the Window to the foreground and
then move it. Yet another might temporarily bring it to the foreground,
move it, and then return it to its original layer. These are not mere implementation
variations, or things we can pass off under the banner of polymorphism.
There might well be client software components that will work with one version
of window but not the others.
Because of this problem, most methods, in both the data-driven and behavior-driven
camps, admit some variation of the notion of 'attribute'. OBA describes
an 'attribute' as something the object knows about, deferring the decision
of whether it is stored or computed. CRC distinguishes between things
an object is responsible for 'knowing' and things it 'does'. OMT interprets
an attribute as data stored in an instance of a class.
On reflection, we realize that some notion of 'knows' is required, if only
to understand what the object 'does'. To even understand and specify the
behaviors of an object, we need at least a conceptual model of what happens
to that object as a result of those behaviors being exercised. This conceptual
model describes the 'knowledge' that object must have in some form, and
constitutes an abstract model of its state. Note that the interpretation
of attributes as stored data, as done by some leading methodologies, while
appropriate for models which reflect C++ code, is not appropriate for any
kind of abstract analysis.
In the case of the window, if we admit the notions of the window layer (or
at least some abstract boolean attribute foreground) and window location,
we can state the intended behavior precisely, ruling out the other
versions:
When a window is moved by X, if it was in the foreground,
then its resulting location is changed by X (otherwise nothing happens to
that window).
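As a minimal sketch of this statement in Python, we can write the post-condition
as a check over the two abstract queries. The names Window, AutoRaiseWindow,
foreground, location and check_move are illustrative assumptions, not part of
any real windowing API, and the classes are just two of the possible
implementations discussed above.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One possible implementation; only the two model queries matter to the spec."""
    foreground: bool   # abstract model: is the window in the foreground?
    location: tuple    # abstract model: (x, y) position

    def move(self, dx, dy):
        # This version has no effect unless the window is in the foreground.
        if self.foreground:
            x, y = self.location
            self.location = (x + dx, y + dy)


class AutoRaiseWindow(Window):
    """A different version: brings the window to the foreground, then moves it."""

    def move(self, dx, dy):
        self.foreground = True
        x, y = self.location
        self.location = (x + dx, y + dy)


def check_move(window, dx, dy):
    """Run move and check the post-condition, stated only in terms of the model."""
    old_foreground, old_location = window.foreground, window.location
    window.move(dx, dy)
    if old_foreground:
        expected = (old_location[0] + dx, old_location[1] + dy)
    else:
        expected = old_location  # otherwise nothing happens to that window
    return window.location == expected and window.foreground == old_foreground


print(check_move(Window(False, (10, 10)), 5, 0))          # True
print(check_move(AutoRaiseWindow(False, (10, 10)), 5, 0))  # False
```

Both classes offer the same "move" operation, yet only the first satisfies
the statement for a background window. The difference is visible only because
the specification can refer to foreground and location.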
The Issues of the "Black Box"
Most interesting objects are "stateful", i.e., they have some state
information which either persists or changes across method invocations.
It is very difficult to precisely characterize the behavior of such objects
as pure 'black boxes'. This holds for any stateful component, including
those which are not implemented as simple instances of OOPL classes.
Here is a simple progression we use in our tutorials to make this point
very clearly. I will present a sequence of three different "black-box"
components, to see which of them you would be comfortable with using as-is.
Given the following 'black-box' component definition, would you use this
component?
Could you use such a component? No, because you don't have a clue what
it means.
How about this one below?
The usual answer is "Sure". However, underlying this casual
response we discover a variety of implicit assumptions. Are the elements
returned by 'pop' related to the elements provided by 'push'? Are they identical
elements? If I modify an element that I pop, am I modifying the element
that was originally pushed? Are they similar or identical copies? What assumptions
underlie correct operation of push and pop?
Just to force the point, how about using this much more complex component?
It has quite a different flavor to it, doesn't it? Yet, most of us would
not hesitate to answer "No, Yes, and No", respectively, to these
3 components.
Even though a Stack appears like such a trivial, academic-looking part,
let us try to take apart the assumptions behind using it. In the process,
we will learn some very valuable things about the need for modeling abstract
'attributes', parameterized attributes, as well as familiar visual models
depicting types, attributes, and associations. In case you are tempted to
skip reading any article which talks about things as boring as a stack,
please take a quick look at the later section, titled Common Reactions.
Assuming that the only public operations we wish to support are push and
pop, let's see what we would really like to say about push and pop. We will
do this by examining a scenario for usage of the stack, and sketch out a
conceptual model the client might have of the progression the stack goes
through.
Notice that we will start with considerations of behavior,
but will immediately and iteratively use that behavior to motivate models
that appear more structural in nature.
We would like to describe this stack in a manner precise enough to eliminate
the incorrect implementation above, while still remaining at an abstract
level and not imposing a specific implementation.
Attributes as Abstract Model "Queries"
We can now try to describe this more precisely:
Stack::push (e: Element)
post: "stack grows and element e has become
the top element in the stack."
As soon as we say this, we have admitted the notions of top of stack and
size into our conceptual notion of the stack. We go ahead and add it to
a section titled "model". In order to prevent a misreading of
this as implemented or stored attributes, we recommend reading the model
section as written in the figure, i.e. "the notions of...".
We will use this 'model' section of the component as the only vocabulary
we are permitted to use to document its services precisely, e.g. using pre
and post-conditions. We can formalize push with:
Stack::push (e: Element)
post: "e is the element on top and size has increased"
e = top & size = old(size) + 1
Note that we deliberately wrote this as e=top, instead of top=e. These two
mean exactly the same when specifying an interface, but the latter is too
often read as "assign to a variable which is an implemented piece of
storage named top". We use the notation old(size), to refer
to the size before the operation.
Note also that, in order to be able to refer to size+1, we require that
implementor and client have a shared notion of "+". One convenient
way of doing this is to assume that size is of type integer. This choice
does not imply an implementation choice, but a convenience chosen to ease
the descriptions of push and pop.
After the first operation in the scenario, we can be confident that top
= a; after the second top = b. At the third operation pop, the returned
value should be b. We can generalize this by saying:
Stack::pop (out e: Element)
post: "returned element was from top & stack has shrunk"
e = top & size = old(size) - 1
What should happen at the next pop? Our snapshots suggest that a should
be popped next. However, our models limit us to discussing the size and
top of the stack, and do not permit us to describe the previous a in any
sense.
In fact, our existing models and specification of push and pop permit an
implementation in which every push replaces and loses the current top, as
depicted in this figure!
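The gap can be made concrete with a small Python sketch. The class LossyStack
and the helpers checked_push and checked_pop are hypothetical names introduced
here for illustration. The helpers check exactly the two post-conditions
written so far, using only top and size.

```python
class LossyStack:
    """Keeps only the latest element and a counter, so older elements are lost."""

    def __init__(self):
        self._latest = None
        self._count = 0

    # Model queries: vocabulary for the specification, not necessarily public.
    def top(self):
        return self._latest

    def size(self):
        return self._count

    def push(self, e):
        self._latest = e      # replaces (and loses) the previous top
        self._count += 1

    def pop(self):
        self._count -= 1
        return self._latest


def checked_push(stack, e):
    old_size = stack.size()
    stack.push(e)
    # post: e = top & size = old(size) + 1
    return stack.top() == e and stack.size() == old_size + 1


def checked_pop(stack):
    old_size = stack.size()
    e = stack.pop()
    # post: e = top & size = old(size) - 1
    return e, (e == stack.top() and stack.size() == old_size - 1)


s = LossyStack()
print(checked_push(s, "a"), checked_push(s, "b"))  # True True
print(checked_pop(s))                              # ('b', True)
print(checked_pop(s))                              # ('b', True), although we expected 'a'
```

Every check passes, yet the second pop returns b again. The specification is
too weak because its vocabulary cannot mention the element underneath the top.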
Clearly, what is missing is the statement that, on a pop, the previous
top element is returned, and the second-to-top element has become the top
element. Of course, after the next pop, the third-to-top element becomes
the top. In other words, we have some notion of the sequence in which the
elements were pushed.
Parameterized Model "Queries"
We will model this in a very specific way to make a point about treating
attributes as queries, thereby naturally supporting the notion of queries
with parameters. Rather than simply postulating that there is some notion
of a List, we will simply say: there is a notion of which element was pushed
(and not yet popped) in which sequence. In order to do this, we also need
some notion of the number of elements in the stack.
We can now try to specify push:
Stack::push (e: Element)
post: e = top & size = old(size) + 1 & e = elementAt(size)
We can now try to specify pop:
Stack::pop (out e: Element)
post: top = e & size = old(size) - 1
However, we expect the notion of top to have changed after the pop. Hence,
we wish to refer to the top element before the pop. We do this with:
Stack::pop (out e: Element)
post: old(top) = e & size = old(size) - 1
The old(top) simply means the value of top before this operation.
We can either say that top = elementAt(size) is an invariant, or
explicitly state it separately in the post-conditions of every operation.
Since it should be true at all times, we stated it as an invariant in Figure
9. If we do this, we do not need to repeat it in the post-condition of push,
since top=e implies elementAt(size)=e.
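The strengthened specification can be sketched as executable checks in Python.
All names here (ListStack, LossyStack, snapshot, checked_push, checked_pop,
element_at) are assumptions made for illustration. The sketch also adds one
thing the post-conditions above leave implicit: the usual convention that
elements not mentioned by an operation remain unchanged (marked "frame" in the
code). It assumes pop is only called on a non-empty stack.

```python
class ListStack:
    """One possible implementation, used here only to exercise the checks."""

    def __init__(self):
        self._items = []

    def push(self, e):
        self._items.append(e)

    def pop(self):
        return self._items.pop()

    # Model queries (1-based positions, so that top = elementAt(size)).
    def size(self):
        return len(self._items)

    def element_at(self, i):
        return self._items[i - 1]

    def top(self):
        return self.element_at(self.size())


class LossyStack(ListStack):
    """Faulty version: remembers only the most recently pushed element."""

    def push(self, e):
        # Every position now reports e, i.e. the earlier elements are lost.
        self._items = [e] * (self.size() + 1)


def snapshot(stack):
    """Capture the model before an operation: old(size) and old(elementAt(i))."""
    return [stack.element_at(i) for i in range(1, stack.size() + 1)]


def invariant(stack):
    # inv: top = elementAt(size), for a non-empty stack
    return stack.size() == 0 or stack.top() == stack.element_at(stack.size())


def checked_push(stack, e):
    old = snapshot(stack)
    stack.push(e)
    # post: e = top & size = old(size) + 1
    post = stack.top() == e and stack.size() == len(old) + 1
    frame = snapshot(stack)[:len(old)] == old   # assumed: the rest is unchanged
    return post and frame and invariant(stack)


def checked_pop(stack):
    old = snapshot(stack)
    e = stack.pop()
    # post: old(top) = e & size = old(size) - 1
    post = e == old[-1] and stack.size() == len(old) - 1
    frame = snapshot(stack) == old[:-1]         # assumed: the rest is unchanged
    return e, (post and frame and invariant(stack))


good, bad = ListStack(), LossyStack()
print(checked_push(good, "a"), checked_push(good, "b"), checked_pop(good))
print(checked_push(bad, "a"), checked_push(bad, "b"))
```

With the parameterized query available, the faulty version is rejected at its
second push, while the list-based version passes every check.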
Type Models
A type is a characterization of the visible behavior of some set
of objects. Any object which conforms to this characterization is called
a "member", or, loosely, "instance" of that type. A
type makes absolutely no statement about implementation, strictly avoiding
data representation, stored attributes, and method implementations.
A class can implement a type; more accurately, one or more classes
can implement one or more types. Classes introduce implementation choices,
including data members (instance variables), method bodies, and implementation
inheritance. This crucial distinction is, surprisingly, not recognized by
the more popular methodologies today.
As we have seen, a precise description of behaviors requires some conceptual
model of that component. This model formally defines what any implementation
of this component must "know" (in some form or the other), and
corresponds to a set of typed queries which are agreed to by client and
implementor, but which are not necessarily directly "callable" by
the client.
The Catalysis method uses the term type model to describe models
such as those shown in Figure 9, since such a model exists to help characterize
the type of the containing component. In particular, the models do not describe
classes. Also, as we will show in a subsequent column, these models are
usually depicted visually.
Catalysis makes a clear distinction between type and class. Recent languages
like Java also make this distinction very clearly. Other languages, such
as C++ and Eiffel, tend to equate classes and types, and rely on programming
conventions such as abstract classes, to distinguish them. This can easily
lead to some unnecessary implementation decisions and coupling making their
way into client code. The Smalltalk language treats classes as quite orthogonal
to types (although many Smalltalk texts do not make this in the least bit
clear!).
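The distinction can be sketched in Python, using typing.Protocol (Python 3.8
or later) to stand in for a type. The names StackType, ArrayStack, LinkedStack
and reverse are hypothetical. Note that a Protocol captures only operation
signatures; the behavioral part of the type (the model and its pre and
post-conditions) still has to be stated separately.

```python
from typing import Any, Protocol


class StackType(Protocol):
    """A type: the visible operations only, with no representation."""

    def push(self, e: Any) -> None: ...

    def pop(self) -> Any: ...


class ArrayStack:
    """One class implementing the type, using a Python list."""

    def __init__(self) -> None:
        self._items: list = []

    def push(self, e: Any) -> None:
        self._items.append(e)

    def pop(self) -> Any:
        return self._items.pop()


class LinkedStack:
    """Another class implementing the same type, using linked (element, rest) pairs."""

    def __init__(self) -> None:
        self._head = None

    def push(self, e: Any) -> None:
        self._head = (e, self._head)

    def pop(self) -> Any:
        e, self._head = self._head
        return e


def reverse(items: list, stack: StackType) -> list:
    """Client code written against the type, not against either class."""
    for x in items:
        stack.push(x)
    return [stack.pop() for _ in items]


print(reverse([1, 2, 3], ArrayStack()))
print(reverse([1, 2, 3], LinkedStack()))
```

Neither class inherits from StackType or from the other; both simply conform
to it, so the client depends on no implementation decision.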
A Hypothetical Reconstruction
If you will bear with me, I would like to walk you through a hypothetical
dialog between a developer and a client, in which a model gradually emerges.
Let us pretend that we are discussing this Stack component, and wish to
agree on precisely what it must do. You are playing the client (C), and
I the implementor/developer/analyst (I).
As a very first step, we might recognize the notion that when an item is
pushed, the stack grows.
I: "Tell me a bit about how you understand pushing onto
the stack."
C: "Well, for one thing, when I push an element, it should go into
the stack."
I: "What would that mean, for a stack which already contained some
elements?"
C: "Well, the stack should grow to accommodate the new element."
I: "Would you say you have a fairly clear notion of the size of the
stack? Could I call it 'count' instead of size?"
C: "Sure; it represents the number of elements in the stack."
I: "Is the count of the stack something which you need to access externally?"
C: "No, I will simply push and pop, and I'm not really interested in
its count. Of course, I do expect the stack to grow and shrink."
I: "OK, so we will not need count as part of the interface, but we
do need to share it in our mutual vocabularies, so we will add it to our
shared conceptual model."
At this point, we update our mutual model of what a stack is.
I: "Would that be enough?"
C: "I think so."
I: "Well, would the following implementation work?" (Figure 5)
C: "Definitely not, I expect the last element pushed in to be popped"
I: "Aha! I don't think we can say that simply in terms of count. It looks
like we need some notion of the last element pushed."
And so the dialog proceeds, uncovering the fact that a push grows the stack
and changes the top element, and a pop returns the top element and shrinks
the stack. But what is the top after I do a pop? Clearly, this only permits
us to describe the very first pop, and not any subsequent ones, unless I
have some notion of the second-last un-popped element, and the third-last,
and... i.e. the sequence of pushed elements.
Of course, when I ask students up front whether we need a notion of sequence
of elements, they often feel that this is unnecessarily discussing implementation.
There is an unfortunate tendency to equate unambiguity and precision with
"implementation detail". This is particularly noticeable in some
"analysis" efforts, when teams are comfortable asking high-level
questions, and providing high-level answers with a worrisome degree of hand-waving,
but as soon as a specific issue arises which needs precision and possibly
detail, they shy away from it, claiming it is an 'implementation issue'!
Common Reactions
This development of a model for a stack draws many common reactions. One
of them is: "This is too painful. I cannot find and write down such
lists of attributes and parameterized attributes. Even if I could, I could
never communicate it with my customers." To these I say: "Patience!".
We will see soon that these models do not have to be depicted as flat, unstructured
lists of queries. In fact, most interesting models will be depicted using
very familiar visual models including associations, attributes, states,
etc. We can even use GUI sketches or prototypes to validate such models.
Moreover, the pieces of formal expressions are best used to complement a
prose description, made precise by the model.
Another very common reaction is: "Stacks are so boring. When we do
our modeling, using OMT/Booch/Fusion/etc., we simply assume that we have
primitives like stacks, lists, and the like. We are really more interested
in complex systems, and are not in building stacks or lists." Unfortunately,
if we cannot clearly state what something as simple as a Stack does, how
well are we likely to communicate more complex components? If you just look
back at the initial series of figures in this article, you will see what
I mean. If a Stack seems too academic, consider the simple Window described
in the first section. Or, look at one of the Catalysis case-studies. In practice, formalizing at least selected operations is one of the surest ways of validating the model and testing it for adequacy.
Reflect for a moment upon the implications of avoiding these issues initially.
Every one of these issues will necessarily have to be resolved sooner
or later. The more of such issues we uncover early, the more likely we are
to produce a high-level model that is, in fact, robust and capable of adequately
serving the analysis and design. The more we defer such issues, the more
likely we are to discover them after the high-level models have become entrenched
in other parts of the architecture. Due to the re-work involved, we might
well abandon the high-level models and "fix it in the code". For
any system with a significant product lifetime and a dominant maintenance
cost, this essentially means that the high-level models will be useless
except on some poorly defined "first-pass".
Another common reaction: "but I might not want clients to be able to
get at the size of a stack, or to access elements at any position".
This is absolutely correct. There is no implication that a client has access
to the 'model' elements of a component. However, both client and developer
have to work off a common conceptual model to even define what the component
must do.
Yet another common reaction is: "But, I can implement a Stack without
using a list!". To these I say: "Sure you can. However, the model
says that you cannot implement a stack without some notion of which was
the last (second-last, ...) element pushed but not yet popped. For example,
you could implement it by simply throwing all elements into some un-ordered
collection as they are pushed, together with a time-stamp. To pop, you could
simply remove and return the element with the most recent time-stamp. In
doing this, you actually have a very precise notion of the last (second-last, ...)
element pushed: it is the element in the collection with the most (second-most,
...) recent time-stamp."
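A minimal Python sketch of such a time-stamped implementation follows. The
class name TimestampStack and its method names are assumptions for
illustration, and a logical counter stands in for the time-stamp so that two
pushes can never tie. The model queries size, element_at and top are derived
from the time-stamps rather than stored.

```python
import itertools


class TimestampStack:
    """Stores elements keyed by time-stamp; never relies on the collection's order."""

    def __init__(self):
        self._clock = itertools.count()  # logical time-stamps (assumed, avoids ties)
        self._bag = {}                   # stamp -> element

    def push(self, e):
        self._bag[next(self._clock)] = e

    def pop(self):
        # Remove and return the element with the most recent time-stamp.
        return self._bag.pop(max(self._bag))

    # Model queries, derived rather than stored.
    def size(self):
        return len(self._bag)

    def element_at(self, i):
        # The i-th oldest element still in the stack (1-based).
        return self._bag[sorted(self._bag)[i - 1]]

    def top(self):
        return self.element_at(self.size())


s = TimestampStack()
s.push("a")
s.push("b")
print(s.top(), s.element_at(1), s.pop(), s.pop())  # b a b a
```

No list of elements in push order is kept anywhere, yet every query in the
model still has a well-defined answer.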
Summary
We have illustrated some of the elements of justifying and interpreting
type-models, as used in the Catalysis method. The behavior required from
some component induces a type model of that component. Thus, the strong
distinction previously made between behavior-driven vs. data-driven methodologies
is not a critical factor for Catalysis. Because of this fundamental fact,
Catalysis can provide very concrete steps for building type models based
upon behaviors, and for checking them for adequacy and consistency.
Catalysis is based on 3 key principles: collaborations and mutual models;
multiple views and composition; and refinement of collaborations and models.
These principles build in a very direct way upon the foundation of type
models, and we will discuss them in detail in subsequent columns.
- Collaborations and mutual models: Every interaction implies a
mutual protocol, which necessarily imposes assumptions and guarantees on
all participants. Expressing these precisely requires an underlying shared
vocabulary. Catalysis makes this vocabulary precise by defining a mutual
model, using a modeling toolbox rich enough to permit clear specification
of responsibilities, while not imposing implementation decisions. This allows
us to develop a consistent shared vocabulary between customers and team
members, always motivated by the required behaviors of the (set of)
component(s), to uncover serious misconceptions and interface deficiencies
early through a clear understanding of assumptions, guarantees, and exceptions,
and to leverage off powerful semantics-based testing tools.
- Multiple views and view compositions: Objects participate in
many collaborations, and hence play multiple roles. Each role can be independently
modeled as a separate and simplified view, defining a specific pattern of
collaborations with a supporting model. This effectively permits each object
to be characterized as more than one type, and provides strong support for
the notions of design patterns [Patterns] in the method itself. For example,
when we apply the Observer or Decorator patterns to a particular problem
model, we are taking two separately and abstractly described collaborations
and their models, and composing them with the problem model itself by making
domain objects additionally play roles from these patterns. The notions
of "open object implementations" [Kiczales] can also be described
using multiple views.
Catalysis provides a simple yet powerful mechanism for composing
views and precisely describing the essential dependencies between them.
This allows us to achieve early reuse by composing existing collaboration
patterns, to divide and conquer complex problems more easily, and to exploit
framework-style techniques from models to implementation.
- Abstraction and Refinements: Interactions can be described at
many levels of detail. Abstract descriptions often avoid interface specifics,
work at a coarse level of time granularity, or defer "internal"
collaborations. Catalysis defines clear notions of refinement, permitting
more detailed descriptions to be built in a systematic way from abstract
ones. This allows us to maintain traceability from problem domain to code,
utilize an incremental approach to model construction, and handle development
with mixed levels of detail and completeness.
Acknowledgments
This article contains material from courses and tutorials by ICON
Computing, reproduced by permission. It also contains excerpts
from the textbook on Catalysis, which is currently in preparation.
References
"Practical Rigor and Refinement", D. D'Souza and A. Wills, in
"Fusion in the Real World", Prentice Hall, 1995, D. Coleman et
al (Eds.)
"Catalysis: Practical Rigor and Refinement", at URL:http://www.iconcomp.com