

EDS' Next Big Thing Blog: Read and Respond to What the EDS Fellows Say About Technology


Data Management for SOA

by Fred Cummins

In a Service Oriented Architecture (SOA), services are loosely coupled and are accessed across organizational boundaries. The service units (the business activities that perform the services and manage the associated capabilities) are finer grained than traditional business functions and their supporting enterprise applications. The decoupling of finer-grained capabilities is key to enterprise agility and economies of scale. The implementations of service units both outside and within the enterprise should be independent of other service units so that they can be independently optimized and adapted to new requirements with minimal impact on their users. However, this decoupling and autonomy conflict with the use of shared databases.

Those who focus on data management have, for decades, driven the industry toward consolidation of databases under a philosophy that tighter coupling means greater efficiency and consistency. Putting all the data in one place eliminates duplication and synchronization problems. The data gurus are struggling with reconciling data management with the loose coupling of SOA.

Jill Dyche asserts that "SOA Starts with Data." She advocates creating data services: data hubs exposed as services that manage and provide access to master data. Starting with data services appeals to IT organizations that feel the need to adopt SOA.

However, if the data services duplicate data that exists in the operational service units, the problems of latency, inconsistency and synchronization are not eliminated, and accountability for data integrity may be fragmented among multiple business organizations. If the data services are expected to provide shared data storage for future services, this raises concerns about the performance of the services that use that data and undermines the autonomy of the service units. Finally, the creation of these data services achieves minimal, if any, business benefit.

Dan Gardner in "SOA and compute clouds point to rethinking data entirely: roles and permissions, not rows and tables" observes that much of an enterprise's data is no longer controlled by the IT organization and exists in many forms on PCs, in PDAs, in stakeholder systems and various services on the internet. With SOA and cloud computing, the data stores may be scattered over multiple, distributed computers.

Of course there has always been data outside the confines of the IT systems, but now the volume of data is exploding, and connectivity and the Internet have made more data accessible from diverse sources.

The mass of data outside the control of the enterprise is not a SOA issue. These data should be viewed as a source of insights about the ecosystem, market trends and opinions that may affect the enterprise. These sources must be selected and filtered to obtain meaningful results, but they can't be controlled any more than they ever were.

The potential for exposure of proprietary or confidential data is a security risk but it's not a fundamental change requiring rethinking of data management. The mechanisms and models for management of access control do need some rethinking to deal with the multitude of system and user interactions both within the enterprise and with external stakeholders. The consequential risks are increased by the internet and portability of mass storage media.

The data that must remain the primary focus of attention for SOA are the data produced, consumed and managed by business systems that represent the past, present or future state of the enterprise. From a business perspective, the concerns are not a matter of distributed storage but how the data are validated, managed and protected.

Steve Karlovitz proposes development of a data service layer in "SOAs and Data Management: Understanding the Data Service Layer." It is not clear from his blog how he defines the Data Service Layer that he characterizes as "a single entry point" and "centralized." I see three different interpretations: (1) the data service layer is a data access facility that supports database access by all applications using a canonical view of a shared database, similar to an object-relational transformation facility, (2) data from heterogeneous application databases is replicated and integrated in an enterprise database with a canonical data schema, (3) access to heterogeneous databases is provided through requests expressed as queries on a canonical, virtual database.

The first approach is the traditional shared database that includes data edits and access controls. While isolation of the physical data structures from the application views is helpful, it raises concerns similar to those for Jill Dyche's data services. In addition, many services will continue to use legacy or COTS systems that incorporate their own databases. Heterogeneity of service unit implementation technologies is fundamental to SOA agility. It enables localized service adaptation and adoption of new technologies.

Replication of data in a shared database (option 2) is useful for providing an enterprise view of the state of the enterprise. This is essentially an operational data store or reporting database. Inconsistencies can be reconciled in the loading process. However, there will be delays in the updates from various sources, so achieving a fully consistent view may still be difficult. This replicated data should be used only for queries, because it would be very difficult to manage updates. The master data, "the single version of the truth," is still in the source databases and must be controlled by their owners.

The third approach is the EII (Enterprise Information Integration) solution. A canonical, enterprise data model defines a virtual database that is the target for queries. The queries and responses are translated to obtain an integrated result from heterogeneous databases. EII did not gain much market acceptance when it was introduced several years ago, but with SOA, its time has come. While some EII tools support updates to the heterogeneous databases, updates should still be controlled by the service units that own those databases.
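The EII idea can be made concrete with a small sketch. The following is only an illustration, assuming two hypothetical source databases (order entry and billing) with different local schemas; the names `SourceAdapter`, `query_customers` and all field names are invented for the example and are not taken from any EII product. A query is expressed against the canonical model, and each adapter translates its own rows into canonical form before the results are merged.

```python
from dataclasses import dataclass

# Hypothetical source databases owned by two service units,
# each using its own local field names.
ORDER_ENTRY_ROWS = [
    {"cust_no": 17, "cust_nm": "Acme", "region_cd": "EU"},
    {"cust_no": 42, "cust_nm": "Globex", "region_cd": "US"},
]
BILLING_ROWS = [
    {"customer": 17, "balance_due": 120.0},
    {"customer": 42, "balance_due": 0.0},
]


@dataclass
class SourceAdapter:
    """Translates one source's rows into the canonical (virtual) schema."""
    rows: list
    field_map: dict  # canonical field name -> local field name

    def fetch(self):
        # Read-only access: the source database is never modified here.
        return [
            {canon: row[local] for canon, local in self.field_map.items()}
            for row in self.rows
        ]


ADAPTERS = [
    SourceAdapter(ORDER_ENTRY_ROWS,
                  {"customer_id": "cust_no", "name": "cust_nm", "region": "region_cd"}),
    SourceAdapter(BILLING_ROWS,
                  {"customer_id": "customer", "balance_due": "balance_due"}),
]


def query_customers(adapters, predicate):
    """Answer a query on the virtual 'customer' entity by merging all sources."""
    merged = {}
    for adapter in adapters:
        for record in adapter.fetch():
            merged.setdefault(record["customer_id"], {}).update(record)
    return [record for record in merged.values() if predicate(record)]


# Customers with an outstanding balance, expressed on the canonical model.
owing = query_customers(ADAPTERS, lambda r: r.get("balance_due", 0) > 0)
```

The sketch is deliberately query-only. Updates would still go to the service unit that owns each source, in keeping with the point above.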

So the Data Service Layer (assuming approach 2 or 3) provides a solution for an enterprise view, but it does not provide a solution for management of the data that is shared by multiple service units.

In "The Case for Enterprise Data Services in SOA" Jeff Pollock also defines a layered data services approach, but he is particularly concerned that web services technology (incorporating XML, WSDL, etc.) has too much overhead for large volume data transfers. This is true, but SOA does not demand the universal use of web services technology. Conventional techniques for Extract, Transform and Load (ETL) are still appropriate for bulk data transfers.

Many of these concerns about data management arise as a result of viewing SOA as a technology instead of as a business architecture. A service is provided by an organization, a service unit, that includes not only one or more applications and databases, but also people, intellectual property, and other facilities and resources that are necessary to produce business value, the product of the service. The value of SOA comes from the ability to integrate these service unit capabilities in multiple business contexts, and the ability to optimize and adapt them with minimal impact on their users.

Ideally, each service unit has its own database that defines the state of its operation and supports its activities. This will result in some of the same data occurring in the databases of multiple service units. This may be resolved (1) by exchanging updates or (2) by consolidation.

There are business trade-offs to be considered. For each data element there must be one master source, one service unit that is responsible for the integrity of that data element. Of particular concern is accountability for critical business records. Most often, the responsible service unit is the service unit that creates the data or does the most updates. For example, customer records are typically captured in association with order entry because that is where most updates originate. Updates from other service units must be validated and controlled by the service unit responsible for the master data. The data could be stored and controlled in a separate data service, but that just means there is one more database to synchronize.
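As a minimal sketch of the "one master source" rule, assume the order-entry service unit owns the customer master and that other service units can only propose changes to it. The class `CustomerMaster`, the method `propose_update` and the single validation rule are hypothetical, chosen only to show where validation and accountability sit.

```python
from dataclasses import dataclass, field


class UpdateRejected(Exception):
    """Raised when the owning service unit refuses a proposed update."""


@dataclass
class CustomerMaster:
    """Hypothetical customer master owned by the order-entry service unit."""
    owner: str = "order_entry"
    records: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def propose_update(self, source_unit, customer_id, changes):
        # Other service units never write directly; the owner validates first.
        if customer_id not in self.records:
            raise UpdateRejected(f"unknown customer {customer_id}")
        if "email" in changes and "@" not in changes["email"]:
            raise UpdateRejected("invalid email address")
        self.records[customer_id].update(changes)
        # Record who asked for the change, so accountability is traceable.
        self.audit_log.append((source_unit, customer_id, sorted(changes)))
        return dict(self.records[customer_id])


master = CustomerMaster(records={17: {"name": "Acme", "email": "info@acme.example"}})
updated = master.propose_update("billing", 17, {"email": "ap@acme.example"})
```

Whether rejected updates are queued, returned to the sender or escalated is a business decision that the sketch leaves open.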

On the other hand, consolidation of databases is a trade-off between flexibility and performance. Changes to the database schema must be coordinated among all participating service units. This calls for those service units to be closely affiliated organizationally for balancing of concerns. Organizational affiliation can bring further constraints on autonomy and thus agility and optimization of the service units with issues such as priorities and funding of changes.

Data is but one resource managed by a service unit. There are other resources such as people, machines and materials that are managed and exchanged by service units, and some of these resources may be shared. These resources are not exchanged using XML; they are exchanged through mechanisms appropriate to the nature of the resource and the time, cost and distance for transfer. Different protocols may be employed for exchange of data. What is important is that the data exchanged must be consistent with a shared logical data model and in a form compatible with both the sender and receiver.

SOA as a business architecture is similar to traditional business architectures, except that the communications are faster and the service units are smaller, providing greater efficiency and agility. Traditionally, service requests (orders) were communicated on paper, and each department had its own files and tracked its own work. SOA is an approach to optimization of the design of the enterprise leveraging new technology.

Data management for SOA should be approached as requiring an enterprise logical data model, mechanisms for federation and sharing of data among relatively autonomous service units, and a data management plan that defines responsibilities, flows, master data stores, latency of updates, synchronization strategies and accountability for data integrity and protection. This plan must align with the organizational responsibilities of service units and their data needs, and it must ultimately support an integrated representation of the state of the enterprise: history, current state and future plans.
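One way to make such a plan checkable is to record it as data. The sketch below is an assumption about how that might look, not a prescribed format: `DataElementPolicy`, its fields and `find_plan_problems` are hypothetical names, and the three checks are only examples of rules a plan could enforce.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementPolicy:
    """One line of a hypothetical data management plan."""
    element: str            # name in the enterprise logical data model
    master_unit: str        # service unit accountable for integrity
    replicas: tuple = ()    # service units holding copies
    max_latency_s: int = 0  # agreed delay before replicas see an update
    sync: str = "none"      # e.g. "events" or "nightly_etl"


def find_plan_problems(policies):
    """Return human-readable violations of the plan's basic rules."""
    problems = []
    counts = Counter(p.element for p in policies)
    for element, n in counts.items():
        if n > 1:
            problems.append(f"{element}: {n} master definitions, expected 1")
    for p in policies:
        if p.master_unit in p.replicas:
            problems.append(f"{p.element}: master unit listed as its own replica")
        if p.replicas and p.max_latency_s <= 0:
            problems.append(f"{p.element}: replicas defined without a latency target")
    return problems


plan = [
    DataElementPolicy("customer.address", "order_entry",
                      ("billing", "shipping"), 300, "events"),
    DataElementPolicy("invoice.balance", "billing", ("reporting",), 0, "nightly_etl"),
    DataElementPolicy("customer.address", "shipping"),
]
problems = find_plan_problems(plan)
```

Here the plan is flagged twice: `customer.address` has two claimed masters, and `invoice.balance` has a replica with no agreed update latency.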

Published Monday, June 16, 2008 2:13 PM


Comments

# Posted by Bill Miller Wednesday, June 25, 2008 11:19 PM

Fred,

Some good points here. XAware has been building and distributing composite data services design and middleware runtime for some time now. We have been involved with quite a few SOA projects that have dealt with these issues. The design approach XAware adopted enables services that provide virtual "views" of underlying data. The underlying data architecture continues to be managed appropriately. The contract-first design approach allows business process and application developers to provide an XML model of the data as they want to deal with it, and the data managers to map to that contract so as to publish services that maintain the data architecture. Seems to be working for some large enterprise SOA implementations. One that allows us to talk about their implementation is Synovus (a $35 billion regional bank holding company). They have been very effective at speeding up application development and deployment while maintaining rigorous control over the data architecture using the composite data services approach I have described. More about this in various blogs posted on the project site at xaware.org. Bill Miller, XAware

# Posted by Fred Cummins Tuesday, July 01, 2008 12:57 PM

Bill,

The data abstraction and integration of data from multiple sources is a fundamental requirement for SOA. However, it seems that defining a set of XML schemas requires that all potential needs for the data be anticipated. Can this be extended to providing generalized query capabilities?

Fred

# Posted by Bill Miller Thursday, July 03, 2008 11:12 PM

In practice, it isn't necessary to define a schema that anticipates all potential data needs. The typical approach is to modify or extend existing schemas and data services as needs evolve, or create new schemas and data services as new requirements emerge, often derivatives or mashups of existing ones. Proliferation of derivative services isn't desirable, but evolution of the set of services to meet new and changing needs is. XAware provides a number of capabilities that support this approach. The customer I mentioned, Synovus, uses a strategy of service-enabling their data, and we recommend this same strategy to all our users moving towards SOA. This means that if an application needs data, it accesses that data through a service. If an application is the database of record, it must expose that data as a service. This scenario may not cover ad-hoc reporting or data mining where generalized queries are used. I have had some difficulty with the idea that generalized queries fit with the SOA concept of reusable services. It seems to me that generalized queries, whether or not the query is delivered or results returned through a service, are either ad-hoc and not reusable or, if frequently reused, must be predictable enough to be defined rather than generalized.

Bill

# Posted by Fred Cummins Monday, July 07, 2008 6:20 PM

Bill,

I'm concerned that for strategic planning and decision-making, disruptive change may require data accesses that have not been anticipated.  They may be used only once, but they may be crucial to the business.

# Posted by Bill Miller Tuesday, July 08, 2008 9:18 PM

Fred,

I agree that for strategic planning and decision making the ability to ask any ad-hoc question and the means to query the available data for an answer are necessary. I am not arguing against the need for generalized queries for that use case and some other cases. But I don't see those cases as consistent with SOA, nor do I think that generalized query support of all data should be a design consideration for SOA. The specialized use cases that require generalized queries are best supported by a separate BI design, most likely incorporating a data warehouse.

Bill

# Posted by Fred Cummins Wednesday, July 09, 2008 3:17 PM

Bill,

I view SOA as a business design paradigm.  In this paradigm, the business is composed of services supporting both internal and external customers, so all data is managed by services.  SOA technology supports the integration of services, both internal and external.  

BI should also be a service that draws on the data being managed by other services.  It should ultimately provide a window (or windows) on the enterprise ecosystem (internal and external).
