Implementing an SOA Reference Architecture

A reference architecture tries to capture best practices and common knowledge for a particular subject area or business vertical. Although you can define a reference architecture at any level, for the purpose of this blog we’ll be talking about an SOA reference architecture. When tasked with implementing an SOA architecture, blindly following a reference architecture might not give optimal results. If business requirements are not considered, it may not be the right fit for the issues at hand.

When an SOA architecture is going to be implemented, close attention should be given to business requirements and any technological constraints: having to work with a specific ERP system, for example, or with a legacy datastore that would be too expensive to replace with a whole new system. Based on these facts, a solution architecture should be developed that’s aligned with the business requirements.

Based on the solution architecture, a toolset should be evaluated that maximizes ROI and leaves room for enhancements that might come in the next 1 - 2 years. Evaluating an existing architecture every 1 - 2 years and making small refinements saves the time and effort of big bang replacements and modifications to critical components. When selecting a toolset, having a complete unified platform helps you build the bits you need right away and still have room for additions later. If you’re looking for a complete platform, you probably want to consider the WSO2 middleware platform, which provides a complete open source solution.

I’m biased towards open source solutions, and having a platform that makes connected business possible is mighty cool.

Twitter Scale

High Scalability has a great blog about the architecture of Twitter: how Twitter grew from just a web application to handle thousands of tweets, and the internal architecture of its components. Last week I wrote a blog about how a simple web application might evolve with time as requirements change, how much time a loosely coupled architecture between your components will save you, and how immensely it helps when it comes to handling difficult problems. The above article is a wonderful real life example of how Twitter evolved to handle a massive number of users and tweets. Twitter is all about posting messages that are only 140 characters long. How hard can it be to write a webpage to handle that?! It’s a great example of how simple things can lead to a complex architecture on the backend to give a responsive user experience.

The article has a lot of details about problems specific to Twitter and tweets. A couple of big picture points I want to highlight are (straight from the High Scalability article):

  • Twitter no longer wants to be a web app. Twitter wants to be a set of APIs that power mobile clients worldwide, acting as one of the largest real-time event busses on the planet.
  • Internal clients use roughly the same API as external clients.
  • 1+ millions apps are registered against 3rd party APIs
  • Tweets are forked off in many different ways, mostly to decouple teams from each other. The search, push, interest email, and home timeline teams can work independently of each other.
  • For performance reasons the system has been being decoupled.

Read the original article.

Understanding an Enterprise Service Bus (ESB)

This blog tries to give you an understanding of what an Enterprise Service Bus is and what the typical functions of an ESB are: in what scenarios it can be used, for what purpose, and how it helps you solve similar problems in a consistent, standardized way. It also incorporates a wealth of knowledge from real life production deployments. That’s important. If you have something which can solve world hunger or claims to be the world’s best at something, and it’s not used in real life production scenarios, it’s as good as dead. Such things tend to make great folk stories, like the bedtime stories you tell your geek younglings of how, once upon a time, this great and complex thing was built over a weekend with some pizza and Mountain Dew ;-)


Let’s forget about the ES part in ESB and concentrate on the B. The bus. This is not something new. You probably learned about buses when you read about PC architecture. If you look at how data is transferred from the CPU to memory, and how data is received from the keyboard, mouse and other peripheral devices plugged into a PC, they all use a bus. Why? The main reason is that it eliminates point-to-point links between those systems. In today’s world, where there are thousands of different peripherals that can be plugged into a computer, imagine the mess it would create if all of them needed point-to-point links to the CPU, memory etc… So you have a single bus that everything connects to. It takes care of transferring information back and forth between connected peripherals.

Integration problems

Companies tend to use a lot of software for their day to day activities. This will only increase with time, not decrease. Also, things that were done using manual labour or hardware tend to be converted to software, as this excellent blog explains. Using software is great: you can increase the efficiency of business operations. Even so, there are instances where you want the data in one system to be fed into another, or consolidated in some form, to get a unified overview of what’s going on. Otherwise there’s little to no advantage in having all these systems when they’re disconnected. You’d have to rely on some kind of manual intervention to keep data in sync among them.

Collection of services

Because of the sheer tediousness of this, enterprise software systems started exposing their functionality as web services. Now you have easier ways of connecting one system to another. Importing data from one system and exporting it to another is just a matter of writing some glue code to call several web services. Just like that, your data synchronization issues between two systems can be solved.

When the number of systems increases, you have to maintain point-to-point links, each with different glue code connecting these systems. Again, it gets error prone and tedious when you have more than a handful of systems in place. It’s not going to be a scalable solution to the problem.

As you can see from the above image, it gets very complex and hard to maintain as the number of systems increases. To add to the complexity, each system or exposed service might be operating on its own message format, which you have to map from one format to the other in your glue code when connecting different systems.

This is where a bus architecture is useful again. A service bus. Since it’s connecting all enterprise systems, an enterprise service bus. Although I have no idea how the name came about, it was probably something along those lines :-)

As you can see, this simplifies the process a LOT. Now you can move all the “glue code” logic you had to the ESB.
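
To put rough numbers on that simplification, here is a small sketch. It assumes the worst case where every system needs to talk to every other system, so it overstates things for setups where only some systems are connected. The function names are made up for illustration.

```python
def point_to_point_links(n: int) -> int:
    """Links needed when every system is wired directly to every other one."""
    return n * (n - 1) // 2


def bus_links(n: int) -> int:
    """Links needed when every system connects only to the bus."""
    return n


for n in (3, 5, 10, 20):
    print(f"{n:>2} systems: {point_to_point_links(n):>3} direct links vs {bus_links(n):>2} bus links")
```

The direct links grow with the square of the number of systems, while the bus links grow one at a time.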

Functions of an ESB

When you integrate different systems with glue code, that code has to perform certain functions, and these can differ from one system to another. You need to take all of them into account when you’re selecting an ESB; an ESB should allow you to do all those things. Let’s see what they are.

Expose services

An ESB is, as the name says, a service bus. It should have the ability to expose services. Taking the above diagram as an example, it should connect to the CRM and expose the CRM’s functionality as a service to other systems. CRMService, maybe. We call that service a proxy service because it proxies requests for the actual CRM service.
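
A toy sketch of the proxy service idea, in plain Python rather than any real ESB configuration. The ServiceBus class, the crm_backend function and the message fields are all hypothetical; a real ESB would forward the request over the network.

```python
from typing import Callable, Dict


def crm_backend(request: dict) -> dict:
    """Hypothetical stand-in for the actual CRM service."""
    return {"customer_id": request["customer_id"], "name": "Example Customer"}


class ServiceBus:
    """Toy bus that exposes back-end systems under stable service names."""

    def __init__(self) -> None:
        self._proxies: Dict[str, Callable[[dict], dict]] = {}

    def expose(self, service_name: str, backend: Callable[[dict], dict]) -> None:
        # Register a proxy service: callers know the name, not the back end.
        self._proxies[service_name] = backend

    def call(self, service_name: str, request: dict) -> dict:
        if service_name not in self._proxies:
            raise LookupError(f"no such service: {service_name}")
        # Forward the request unchanged and hand back the back end's response.
        return self._proxies[service_name](request)


bus = ServiceBus()
bus.expose("CRMService", crm_backend)
response = bus.call("CRMService", {"customer_id": 42})
print(response)
```

Other systems only ever see the name CRMService, so the CRM behind it can move or be replaced without touching the callers.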

Message transformation

Since the message format one system accepts can be different from another, an ESB should allow you to transform one message format to another. Typically this means transforming from one XML format to another XML format, but also XML -> JSON, JSON -> XML, Binary -> XML and so on.
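
As a minimal illustration of the XML -> JSON case, here is a sketch using only the Python standard library. The order message and its field names are invented for the example; an ESB would normally do this through configuration rather than hand-written code.

```python
import json
import xml.etree.ElementTree as ET


def order_xml_to_json(xml_text: str) -> str:
    """Map a hypothetical <order> XML message onto a JSON message."""
    root = ET.fromstring(xml_text)
    payload = {
        "orderId": root.findtext("id"),
        "customer": root.findtext("customer"),
        "total": float(root.findtext("total")),
    }
    return json.dumps(payload)


incoming = "<order><id>A-17</id><customer>Acme</customer><total>99.50</total></order>"
outgoing = order_xml_to_json(incoming)
print(outgoing)
```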

Protocol transformation

There are instances where systems expose their services through different protocols: HTTP, HTTPS, JMS, FIX, FTP, SFTP, WebDAV etc… So an ESB should support accepting messages over one protocol and sending them over a different protocol.
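
The sketch below shows the shape of protocol switching without any real network code. An in-memory queue stands in for a message queue destination (what a JMS queue would be), and a plain function stands in for an HTTP endpoint. All names here are assumptions made for illustration.

```python
import queue
from typing import Callable

# In-memory stand-in for a message queue destination.
order_queue: queue.Queue = queue.Queue()


def queue_sender(body: bytes) -> None:
    """Outbound transport: hand the payload to the queue."""
    order_queue.put(body)


def make_http_receiver(send: Callable[[bytes], None]) -> Callable[[bytes], int]:
    """Inbound transport: accept an HTTP-style request body, return a status code."""

    def receive(body: bytes) -> int:
        send(body)  # the payload is untouched; only the transport changes
        return 202  # "Accepted": delivery continues asynchronously

    return receive


http_endpoint = make_http_receiver(queue_sender)
status = http_endpoint(b'{"orderId": "A-17"}')
print(status, order_queue.qsize())
```

The point is that the inbound and outbound sides are chosen independently, so either one can be swapped for a different protocol.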

Routing messages

This is another commonly used function of an ESB. You receive a message and, based on certain values in the message, you want to route it to different systems/services. So an ESB should allow you to traverse the content of the message and filter on any attribute that’s in it.
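
Content based routing can be sketched as a table of predicates checked in order. The service names and message fields below are hypothetical, and the first matching rule wins.

```python
from typing import Callable, List, Tuple

Message = dict
Route = Tuple[Callable[[Message], bool], str]

# Hypothetical routing table: rules are checked top to bottom.
ROUTES: List[Route] = [
    (lambda m: m.get("type") == "order" and m.get("total", 0) >= 1000, "ApprovalService"),
    (lambda m: m.get("type") == "order", "OrderService"),
    (lambda m: m.get("type") == "invoice", "BillingService"),
]


def route(message: Message, default: str = "DeadLetterService") -> str:
    """Return the destination for a message based on its content."""
    for matches, destination in ROUTES:
        if matches(message):
            return destination
    return default  # nothing matched, park the message somewhere visible


print(route({"type": "order", "total": 2500}))
print(route({"type": "invoice"}))
```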

Message cloning/splitting/aggregation

Another useful function is being able to clone an incoming message and send it to one, two or several services that accept the same message format, all at the same time. Also useful are splitting a message before sending it and aggregating messages that come from different services.
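
Here is a small sketch of the three operations on dictionary messages. The pricing service, prices and field names are made up, and the services are called one after the other for simplicity, whereas an ESB would typically send the copies out in parallel.

```python
import copy
from typing import Callable, List

Message = dict
Service = Callable[[Message], Message]

PRICES = {"pen": 2, "book": 12}  # hypothetical data for the example


def price_service(message: Message) -> Message:
    """Hypothetical service that prices a single item."""
    return {"item": message["item"], "price": PRICES[message["item"]]}


def clone_to(message: Message, services: List[Service]) -> List[Message]:
    # Clone: every service gets its own deep copy of the same message.
    return [service(copy.deepcopy(message)) for service in services]


def split(message: Message) -> List[Message]:
    # Split: one message per item, each keeping the original order id.
    return [{"orderId": message["orderId"], "item": item} for item in message["items"]]


def aggregate(responses: List[Message]) -> Message:
    # Aggregate: combine the individual responses into one message.
    return {"parts": len(responses), "total": sum(r["price"] for r in responses)}


order = {"orderId": "A-17", "items": ["pen", "book"]}
summary = aggregate([price_service(part) for part in split(order)])
copies = clone_to(order, [lambda m: m, lambda m: m])
print(summary)
```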

There are many such forms of communications and different ways of processing messages. Based on this knowledge of systems and different ways of connecting and message processing, you can identify certain patterns from these integrations.

Enterprise integration patterns

Fortunately, these have been documented in the excellent Enterprise Integration Patterns book. The book has a pattern catalog that has been extracted by looking at different real life integration scenarios in the industry. An ESB should be able to support or facilitate implementing these patterns. When you see how different patterns can be implemented through an ESB with configuration, it becomes easy to understand how each pattern is to be implemented, and if you need alterations to match your specific use case, you can make them very easily. Here are all the configurations showing how an ESB can be used to implement these enterprise integration patterns.

What Is Platform as a Service?

The tech industry is filled with acronyms. People build new acronyms and buzzwords all the time, which contributes to the confusion. One such acronym is PaaS - Platform as a Service. If you ask 5 people what PaaS means, you will probably hear 5 different stories. All of them would be right! So what is PaaS?

To answer that we need to ask what a platform is. The as-a-Service part is easy. You give something as a service. There are no downloads involved; it’s hosted somewhere on the Internet and is accessible through a browser or some other tool that knows how to communicate with a service hosted on the Internet. So what is a platform? This can mean many things and that’s where the confusion lies. A platform can mean,

  1. An operating system
  2. A programming language and associated libraries/frameworks
  3. A suite of products
  4. An application container (e.g.: application servers)

There are companies and products out there which provide a “PaaS” at all these different levels. They also refer to themselves as PaaS providers, or companies that enable you to use a platform as a service. Which is not wrong considering the different meanings of the word platform. Gartner, your friendly neighborhood research company, has tried to define these terms and some additional terms to clear up this confusion. Gartner defines PaaS as,

A platform as a service (PaaS) offering, usually depicted in all-cloud
diagrams between the SaaS layer above it and the IaaS layer below, is
a broad collection of application infrastructure (middleware) services
(including application platform, integration, business process management
and database services). However, the hype surrounding the PaaS concept
is focused mainly on application PaaS (aPaaS) as the representative of
the whole category.

This clearly states that a PaaS is about an entire middleware platform, not about any specific application server or programming language/framework. Gartner has also introduced some more acronyms to clear up this confusion: aPaaS and iPaaS.

Gartner’s definition of aPaaS is,

Application platform as a service (aPaaS) is a cloud service that offers
development and deployment environments for application services.

That covers offering an application server as a service, letting users develop and deploy on top of it.

Gartner’s definition of iPaaS is,

Integration Platform as a Service (iPaaS) is a suite of cloud services
enabling development, execution and governance of integration flows
connecting any combination of on premises and cloud-based processes,
services, applications and data within individual or across multiple

Even though the Wikipedia page for Google App Engine and the intro document on the Google help site mention it’s a PaaS, Google App Engine is not a PaaS. It’s an aPaaS. To be a PaaS according to the Gartner definition above, it has to have a set of middleware services, like Stratos for example.

AppFactory Picks Up Where SourceForge Left Off

SourceForge, as the title says, is a website for finding, creating and publishing open source software for free. Some very popular projects are still hosted there. If you’re doing a technology related job, chances are you have come across this website more than once. When you create a project in SourceForge you get all the infrastructure you need for the project: a source code repository, a support ticket system for tracking/reporting issues, a forum-like discussion medium, user reviews, a distribution system for releases, user download tracking (with graphs for each version of a project release) and so on. This is all very useful. There are many such systems out there that allow you to create projects, host them and distribute releases. Google Code, Launchpad, GitHub and CodePlex are some of them. This seems like a good system to have if you’re a software development shop. If you have various projects going on, it provides an easier way to get builds to QA, and a feedback system that the QA team can use to report bugs and so on. There are open source projects that you can download and install to get a SourceForge-like system for yourself and your fellow developers. If you develop a lot of internal applications that are used inside an organization, this is immensely helpful for that too.

So that’s mainly about the application development aspects: where your infrastructure is hosted, how issue trackers are configured, what releases have been done etc… At this stage you would probably have configured automated build tools too, to run continuous builds from the source. Then there’s the other side, the application runtime. The application runtime will usually involve multiple environments for staging, QA, and production. At any given time an app can be in any of those stages. There was little to no software that would let you see what’s going on in this runtime space. Certainly no open source ones that I was aware of.

Until now.

This is one gap that AppFactory is trying to fill. Each of those environments can be configured as a separate PaaS deployment. So you have your staging PaaS, QA PaaS and production PaaS. The entire application lifecycle can be managed through a web based portal. Deploying from your staging environment to the QA environment and subsequently into production can all be managed through a web based interface. This follows a checklist approach where you can “tick off” the items that must be carried out before moving from one environment to the other. If the criteria are not met then demotion is also possible. Further, AppFactory includes an issue tracker, source repository, automated builds, application version management, and a place to create resources that will be used in your application like DBs, APIs etc… So it helps at the application development stage too. Visibility into what’s going on right now, which project is at which stage, and which products and versions we have in production is all business critical information to have in a web based dashboard.
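
To make the checklist idea concrete, here is a toy model of promotion and demotion between environments. This is not AppFactory’s API or data model; the class, stage names and checklist items are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

STAGES: List[str] = ["staging", "QA", "production"]


@dataclass
class AppVersion:
    name: str
    version: str
    stage: str = "staging"
    checklist: Dict[str, bool] = field(default_factory=dict)

    def tick(self, item: str) -> None:
        """Mark a checklist item as done."""
        if item not in self.checklist:
            raise KeyError(f"unknown checklist item: {item}")
        self.checklist[item] = True

    def promote(self) -> None:
        """Move to the next environment, but only if every item is ticked."""
        pending = [item for item, done in self.checklist.items() if not done]
        if pending:
            raise ValueError(f"cannot promote, pending items: {pending}")
        index = STAGES.index(self.stage)
        if index == len(STAGES) - 1:
            raise ValueError("already in production")
        self.stage = STAGES[index + 1]
        # The new environment starts with a fresh, unticked checklist.
        self.checklist = {item: False for item in self.checklist}

    def demote(self) -> None:
        """Send the version back to the previous environment."""
        index = STAGES.index(self.stage)
        if index == 0:
            raise ValueError("already in the first stage")
        self.stage = STAGES[index - 1]


app = AppVersion("billing", "1.0.0", checklist={"tests pass": False, "release notes": False})
app.tick("tests pass")
app.tick("release notes")
app.promote()
print(app.stage)
```

Trying to promote with unticked items raises an error, which is the same gate the web based checklist enforces.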

Samisa has written a nice blog on how AppFactory revolutionizes application development. Also, this mindmap about AppFactory puts it into the broader context of what it is and what problems it tries to solve.