Saturday, 21 August 2010

Is SOA of interest to a Data Architect?

This question came up recently on one of the forums I subscribe to and piqued my interest.

Whether Service Oriented Architecture (SOA) is specifically of interest to a Data Architect depends on what type of Data Architect you aspire to be.

There are at least a dozen easily identifiable variations that all revolve around the definition and processing of data but require different skill-sets. However, the main differentiating factor, for me, is whether the Data Architect is primarily interested in producing architectures for persistent data storage or transient data processing.

I do data processing, but I know quite a few Data Architects who focus solely on designing and building databases and data centres for persistent storage, with little interest in what the data is or what it’s used for.

For this type of Data Architect it's all about infrastructure and technology, i.e. they are mainly interested in technical issues such as availability, backup & recovery, security, hardware performance and a bucket load of similar things to do with the infrastructure that is physically deployed to support the data repository.


For a Persistent Storage Data Architect I'd say that SOA, like pretty much any other secondary architectural framework, will add little benefit other than intellectual interest.

On the other hand, assuming it’s the data processing type of Data Architecture that’s of interest, then it's important to establish the type of data processing environment that you're going to focus on, e.g.:

  • What type of activity profile is required? Is it predominantly:

    • On-Line Transaction Processing (OLTP), typically characterised by high-volume short duration activities, such as a retail operation or financial markets trading.

    • Decision Support (DSS or BI) focussing on analysis and reporting against historic information.

    • Data Warehousing focussing on the long-term archiving of masses of company information for posterity.
  • What type of process distribution is being supported i.e. is it:
    • A "Centralised" environment where each business activity is executed against a single centralised data repository.

    • A "Federated" environment where a single Business Activity potentially draws information from multiple physically separated sources and merges them together in order to carry out the activity.

    • A "Distributed" environment where multiple implementations of the same Business Activity may be executed in a number of different venues based on some external constraint such as geographic location or line of business.

Although these different data processing environments may all be implemented using the same technology platform (e.g. Oracle or SQL Server could be used as the DBMS or Java as the programming language in all cases) they have different non-functional profiles.

For example, OLTP is "high volume / short transaction" processing whereas DSS is "low volume / long transaction" processing, and Federated data requires two-phase commit whereas Centralised data doesn't.
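To make the two-phase commit point a little more concrete, here is a minimal sketch of a coordinator updating several physically separate data sources. The names `Participant` and `two_phase_commit` are hypothetical and purely illustrative; a real implementation would also need durable logging, timeouts and recovery, none of which is shown.

```python
class Participant:
    """A hypothetical data source taking part in a federated update."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates whether the local change can succeed
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote on whether the local change can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        # Phase 2 (success path): make the prepared change permanent.
        self.state = "committed"

    def rollback(self):
        # Phase 2 (failure path): undo anything that was prepared.
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Return True only if every participant voted yes and was committed."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

With a single centralised repository there is only one participant, so the separate voting phase adds nothing, which is one way of reading why Centralised data does not need it.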

Now if we look at the underlying principles of Service Orientation then I'd say that SOA really only lends itself to OLTP and Federated / Distributed data processing environments (which is where all the fun really is) and, if that is the type of data processing environment you're trying to design, then an understanding of Service-Orientation (if not SOA itself) is essential.

For SOA the key principles that are very, very important include:

  • Location Independent - i.e. the physical location of the client relative to the service is irrelevant.
  • Stateless - the Service Provider has no memory of any previous invocations of the Service and once completed (either successfully or otherwise) the previous state of the data environment is discarded. In other words all Services are single-phase commit and do not wait for separate acknowledgement from the Service User.
  • Idempotent - a Service can be called repeatedly without side effect for any previous or subsequent invocation.
  • Loosely coupled - the Service User only uses the parts of a Service that it needs and ignores everything else.
  • Dynamic - the characteristics of the Service being called can change between service calls and it is the Service User's responsibility to maintain compatibility with the Service Provider.
There’s a lot that can be learnt by understanding those principles irrespective of whether you’re ever going to operate in a SOA environment.
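Two of those principles, Stateless and Idempotent, can be illustrated with a small sketch. The functions `set_credit_limit` and `raise_credit_limit` and the accounts mapping are hypothetical, and this is only one common reading of the terms: everything the service needs arrives with the request, and repeating the same request leaves the data in the same state.

```python
def set_credit_limit(accounts, account_number, new_limit):
    """Set a credit limit to an absolute value and return the new data state.

    Nothing is remembered between calls (stateless), and repeating the
    same call produces the same result (idempotent).
    """
    updated = dict(accounts)  # work on a copy; the caller's data is untouched
    updated[account_number] = new_limit
    return updated


def raise_credit_limit(accounts, account_number, amount):
    """Counter-example: a relative update is NOT idempotent."""
    updated = dict(accounts)
    updated[account_number] = updated.get(account_number, 0) + amount
    return updated


accounts = {"123456": 1000}

# Calling the absolute update twice gives the same state as calling it once.
once = set_credit_limit(accounts, "123456", 1500)
twice = set_credit_limit(once, "123456", 1500)

# Calling the relative update twice does not.
bumped_once = raise_credit_limit(accounts, "123456", 500)
bumped_twice = raise_credit_limit(bumped_once, "123456", 500)
```

A Service User that cannot tell whether a call succeeded can safely retry the first kind of operation but not the second.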

Of course, if you’ve done any serious structured systems analysis (e.g. SSADM, OOA&D etc.) then, like most Data Architects, you will probably have all the basic knowledge required to get by in a Service Oriented environment without too much effort, because the art of Analysis & Design doesn’t significantly change no matter what is being deployed.

Wednesday, 11 August 2010

"The Earth Is Flat" and "The Meanings of Meaning"

To a great extent Enterprise Data Architecture is a philosophical activity in that much effort is spent in defining terms and concepts and discussing what they mean rather than just engaging in physical activity.

No matter how much we try, personal preference and experience will always creep into any proposed architecture or data model, along with undocumented assumptions that "go without saying because everyone accepts them" and ill-defined terms whose "meaning is already known to everyone".

Personal preference based on past experience cannot be avoided (nor should it!) because past experience is after all why we are paid to do the job and any underlying assumptions, which are generally a direct result of experience, can only be documented as they are discovered through conversation with others.

It is the last point of "meaning is already known to everyone" that is the tricky bit, as it involves understanding the importance of "The Meanings of Meaning", which is one of the most important concepts in understanding and documenting knowledge as it underpins much of the science of semantic analysis.

The "Meanings of Meaning" states that every communication between two parties has three distinct meanings:

  1. The meaning that the party making the statement had in mind when they made the statement, which is distorted by their own imperfect knowledge of the domain being discussed - the "Writer's Meaning".
  2. The meaning that the party reading the statement applies to it based on their equally imperfect but different knowledge of the domain - the "Reader's Meaning".
  3. What the statement actually means based on perfect knowledge of the domain and the context in which it is used - the "Contextual Meaning".

These three meanings are often different but most people assume that they are the same.

We frequently read articles and documentation that use well known terms without actually providing any definition of what the term means in the context of the article. It's generally assumed by the writer that the reader will know what is meant by a particular term and that the Reader's Meaning will be the same as the Writer's Meaning.

In general conversation it can be even more confusing. I could, for example, make a statement such as "The Earth Is Flat" which may be true or false depending on the context in which it is made.

If I was talking about the planet in general then most people (but not all!) would regard this as incorrect (which might be the politest of responses) because the planet Earth is provably spherical.

However, if I was a farmer in Kansas (US) or Norfolk (UK) looking out over a newly ploughed field then the earth (the soil) might indeed look flat as far as I can see. Another farmer standing next to me might easily agree that the earth was indeed flat and even congratulate me on a job well done.

So, to fully understand the meaning of the statement "The Earth Is Flat" I have to know the context in which it was made, what was being discussed and who was part of the discussion.

In fact, when it comes to understanding the Meanings of Meaning, context is everything, and it is something that a Data Architect needs to be very aware of in order to avoid the misunderstandings that result from applying one's own meaning to a statement and failing to explore what the actual meaning was.

If we put our minds to it we can all probably come up with hundreds of examples of misunderstandings that occurred because two different people used the same label (Noun) to mean different things e.g. when Sales and Finance talk about a Customer they are almost certainly talking about completely different things even though the Sales type of Customer might eventually become the Finance type of Customer.

Most experienced Business Analysts and Data Modellers inherently know this principle but rarely state it - it's an undocumented assumption - but it is sometimes worth mentioning to the business stakeholders as an explanation of why we keep asking them tedious and highly detailed questions about trivial things they have told us.

Tuesday, 10 August 2010

Principles, Patterns, Policies & Processes

A question that I am regularly asked is "What is the difference between Architecture and Design?" which is one of those imponderable questions that is often the subject of great debate within the Information Technology community (though mostly down the pub).

Usually this ends up as a compare and contrast type of debate with conjectures such as:
  • Architecture is about "What" and Design is about "How"
  • Architecture is Objective and Design is Subjective.
  • Architecture is Conceptual and Design is Physical
  • Architecture is about Requirements and Design is about Solutions.
  • Architecture is Platform Independent and Design is Platform Specific
  • ...add your personal favourite
However, the answer I'm most often drawn to is that Architecture is about establishing the framework that analysis, design and implementation must conform to. That is, it is about establishing the Patterns, Principles, Policies & Processes that will be used to govern Application Development.

  • An Architectural Pattern, like its close sibling the Design Pattern, is any type of model, template, or artifact that describes a generic problem and the potential characteristics of any solution. The intent of a pattern is simply to capture the essence of a generic solution in a form that can be easily communicated to those who need the knowledge.

  • The Patterns are both abstract and concrete at the same time. They are Abstract in the sense that they describe a generic solution to a generic problem in generic terms but Concrete in that they can be directly applied to a particular instance of a problem to construct a definite solution.

    They are also reusable and can be applied many times to other problems of a similar nature where, even if not directly applicable, a Pattern can still indicate the general shape of another derived pattern, e.g. the “Powered Vehicle” pattern indicates the general structure of a Car, Lorry, Aeroplane, Train etc. but may need extending or modifying to deal with the specifics of each.

  • Architectural Principles define key characteristics for the design and deployment of the resulting systems.
  • Architectural Principles are usually captured in short succinct statements such as "All persistent data will reside in a designated Database of Record" or "All end-user data maintenance will be supported through a Single Point of Update", with the minimum elaboration necessary to explain what the Architectural Principle means in practical terms.
  • Policies are the non-negotiable constraints that the business itself will apply to any systems or applications that need to be developed. They are really constraints on the Architect (who may not get any input into defining them) as well as the Development team.
    These will cover things like technology decisions, programming languages, application availability requirements, disaster recovery, testing requirements and anything else that the business decides is an operational requirement.
  • Processes cover the mechanisms that will be used to govern the Development with respect to the Architecture, including activities such as Design Reviews, Change Management, Divergence & Convergence and so on.
    The Governance Processes are most likely to be tailored specifically to the organisation according to the development environment and how it likes to manage things. However, it is important to emphasise that the Development activity should conform to the Architectural requirements, NOT the other way round.

When brought together into a single coherent Architectural Framework the 4P's provide the tools necessary to ensure that any implemented systems meet desired "quality" criteria without, hopefully, being too prescriptive in how the target systems will actually be designed and built.

Given that the production of Architecture is really just focused on the production of a framework, the main involvement with Systems Development is in governance and, because we all like a diagram whenever possible, it fits together like this:

An important point, at a bit of a tangent, is that Architecture is not about Task & Resource Management, which is a function of Project / Program Management and, given that there are quite a number of well established methodologies for managing that activity, is not something that I think an architecture needs to consider.

So there we have it - the above is what I see as Architecture and everything else, including physical Design, is part of Development.

That doesn't mean to say that one person cannot carry out both Architecture and Design roles, but it's nice sometimes to define where the boundary of responsibility lies.


Monday, 9 August 2010

Using a Fact Model to verify a Data Model

One of the main problems with verifying that a structured data model correctly defines the facts that have been provided by the Subject Matter Experts (SME) about the business is that frequently the modelling notation, whether UML, ERD, ORM or XSD, may be unfamiliar to the SME so they are unsure about what exactly is being said. Even when the basic concepts of a data model are explained there is still concern that the SME might misinterpret the model and sign-off something that is actually incorrect.

Often these mistakes will result in additional downstream cost once the mistake is discovered and remedial work is carried out. To avoid ambiguity or misunderstandings it can be useful to use a Fact Model to express the facts in "natural language" so that they can be read as a set of individual statements that are more easily comprehended and individually verified by the SME.

For example, take the following model fragment (in the UML notation):

An experienced data modeller or software developer would quickly identify the following facts (amongst others):
  • A Consolidated Account is a type of Sales Account.

  • A Customer Account is a type of Sales Account.

  • A Consolidated Account aggregates one or more Sales Accounts.

  • A Sales Account must have a Registered Address which is a valid Address.

  • A Sales Account must have an Account Number which is a valid Account Number.

  • A Sales Account may have a Credit Limit …

  • A Sales Invoice must be Posted To one and only one Customer Account.

  • ...and so on
As a set of definitive atomic statements, I think it is much easier for the SME to review each of these facts and confirm whether it is correct or not and, if it isn't correct, amend it.
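As a rough illustration of how such statements might be produced from a model rather than written by hand, here is a minimal sketch. The `Subtype` and `Attribute` classes and the `to_fact` function are hypothetical and cover only two kinds of model element, with simplified wording; they are not taken from any particular modelling tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtype:
    child: str
    parent: str


@dataclass(frozen=True)
class Attribute:
    owner: str
    name: str
    mandatory: bool


def article(noun):
    # Crude rule for the indefinite article; good enough for this sketch.
    return "an" if noun[0].lower() in "aeiou" else "a"


def to_fact(element):
    """Render one model element as a single natural-language statement."""
    if isinstance(element, Subtype):
        return (f"{article(element.child).capitalize()} {element.child} "
                f"is a type of {element.parent}.")
    if isinstance(element, Attribute):
        verb = "must have" if element.mandatory else "may have"
        return (f"{article(element.owner).capitalize()} {element.owner} "
                f"{verb} {article(element.name)} {element.name}.")
    raise TypeError(f"unsupported model element: {element!r}")


model = [
    Subtype("Consolidated Account", "Sales Account"),
    Attribute("Sales Account", "Account Number", mandatory=True),
    Attribute("Sales Account", "Credit Limit", mandatory=False),
]
facts = [to_fact(element) for element in model]
```

Each element yields exactly one statement, which keeps the facts atomic and lets the list be pasted straight into a review document.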

Facts can also be used to explain or elaborate less obvious features of a model, such as rules captured as constraints on a Class, Attribute or Relationship. For example, all of these facts are in the model but not part of the diagrammatic representation:
  • An Address must have a Building Number or Building Name.

  • An Account Number is a numeric string of exactly six digits.

  • A Customer Name is an alphanumeric string up to 40 characters in length.

  • ...and so on
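Constraint facts like these translate almost word for word into checks, which is part of what makes them easy to verify. The sketch below is illustrative only: the function names are hypothetical, and where the fact is silent (for example whether a Customer Name may be empty or contain spaces) the choice made is an assumption noted in the comments.

```python
import re


def is_valid_account_number(value):
    # "An Account Number is a numeric string of exactly six digits."
    return re.fullmatch(r"[0-9]{6}", value) is not None


def is_valid_customer_name(value):
    # "A Customer Name is an alphanumeric string up to 40 characters in length."
    # Assumption: letters and digits only, and at least one character;
    # the fact as stated does not say whether spaces are allowed.
    return re.fullmatch(r"[A-Za-z0-9]{1,40}", value) is not None


def is_valid_address(building_number=None, building_name=None):
    # "An Address must have a Building Number or Building Name."
    return bool(building_number) or bool(building_name)
```

Questions such as "are spaces allowed in a Customer Name?" are exactly the kind of detail that surfaces when the SME reads the fact rather than the diagram.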
OK, it can be a bit tedious poring over thousands of statements, but with a complex data model there are a lot of benefits to breaking down the model this way when it needs to be verified with the business stakeholders, some of which are:
  • Each fact is expressed atomically, i.e. it cannot be reduced to even simpler facts, so it can be reviewed independently of any other fact.

  • Because the facts are written in "natural language", the reader has better comprehension of what each fact is actually stating.

  • The Fact Model can be distributed as a text document to multiple recipients who can read and amend it without requiring access to the modelling software used to create the original model.
Not so long ago producing a Fact Model was a standard feature of most data modelling software (and is still a part of some such as Oracle Designer) but it seems to have fallen out of fashion with the newer modelling languages such as UML and XSD.

However, if there is a lot of formal data modelling being performed then a Fact Model can be very useful, and it is well worth investing the time to create an appropriate language (or using one of the established languages such as NORML or SBVR) and reporting capability.