Governance in the cloud

One of the IT department’s less official roles has been as gatekeeper to an organisation’s infrastructure. The cost and time-to-market constraints that internal IT sometimes imposes can lead to applications being cancelled, or never being proposed in the first place. By allowing the business to side-step the IT department, though, cloud computing enables departments and individuals within organisations to get new applications up and running quickly, with investment largely focussed on development.

Where internal IT is imposing unreasonable delays and costs, this is going to be great for businesses. There are some major caveats to add, though. In particular, a lot of the governance and ‘red tape’ that internal IT seems to impose is actually about protecting an organisation’s data. By checking that things like backup and recovery have been considered and planned for, IT ensures that an organisation’s data, reputation and ultimately its business are protected. Where those checks are bypassed, it is fair to expect that the ‘boring’ aspects of application development and deployment will not get the attention they really require. The litany of data loss horror stories never seems to abate. Cloud computing service providers may provide the tools to implement effective backups, but that won’t guarantee that developers will use them.

To be clear, the threat here is not that organisations will use cloud computing, which will be a great addition to the IT tool-box. The threat is the same as that posed by applications running on servers sitting under people’s desks; it is the same threat as that posed by data that leaks out on portable drives. The threat is that broken governance can lead to no governance, and that organisations will be compromised as a result.

The solution is for internal IT and their management to build cloud computing into their governance and release management models. In much the same way as for suppliers of physical infrastructure, organisations need to choose their cloud suppliers carefully and build standards for development and deployment. By doing this, they can ensure that all applications, whether hosted internally or in the cloud, are checked for compliance with data protection, availability and security requirements.
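By way of illustration only, such a check can start out as something as small as the sketch below: a hypothetical ‘deployment manifest’ (the field names are mine, not any provider’s or standard’s) is validated against the organisation’s own rules before an application, cloud-hosted or otherwise, is allowed to go live.

```python
# Minimal policy-as-code sketch. The manifest fields (owner, data_classification,
# backup, encryption_at_rest) are illustrative, not any particular vendor's schema.

REQUIRED_FIELDS = ("owner", "data_classification", "backup", "encryption_at_rest")


def check_deployment(manifest: dict) -> list:
    """Return a list of governance failures for a proposed deployment."""
    failures = []

    # Every application, cloud-hosted or internal, must declare the basics.
    for field in REQUIRED_FIELDS:
        if field not in manifest:
            failures.append("missing required field: " + field)

    # Backups must be scheduled and restore-tested, not just 'available from the provider'.
    backup = manifest.get("backup", {})
    if not backup.get("schedule"):
        failures.append("no backup schedule defined")
    if not backup.get("restore_tested"):
        failures.append("restore procedure has never been tested")

    # Confidential data must be encrypted at rest wherever it is hosted.
    if manifest.get("data_classification") == "confidential" and not manifest.get("encryption_at_rest"):
        failures.append("confidential data stored without encryption at rest")

    return failures


if __name__ == "__main__":
    proposed = {
        "owner": "marketing",
        "data_classification": "confidential",
        "backup": {"schedule": "daily"},
        "encryption_at_rest": False,
    }
    for problem in check_deployment(proposed):
        print("BLOCKED:", problem)
```

The point is not the code itself but that the same gate applies whether the application lands on internal servers or on someone else’s cloud.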

There’s something else to say here, though, and that’s to remember just how much due diligence vendors of physical infrastructure are put through before purchase decisions are made. Ultimately, even an SLA isn’t really enough unless you are convinced that the organisation to which you are entrusting your data is able to follow through on its promises. I wonder what the cloud services RFP equivalent of a double disk pull will be?

(Re)fragmenting the IT department

When I started in IT in the mid-1990s, many medium and even large organisations had highly fragmented IT delivery functions. At Ernst and Young in 1994, I worked in a small team delivering IT to the Yorkshire office in the North of England. At the start of the year, we were largely autonomous and able to deliver new services and applications quickly and with only local change control. By the end of the year, we were (along with everyone else in E&Y IT) being outsourced to Sema and amalgamated into a single IT department.

Likewise at the BBC, I started out in the IT team of the ‘Youth and Entertainment Features’ department in BBC Manchester (one of three support and delivery organisations in one building!). By the time I left Auntie in 2004, I had been through three sets of organisational consolidation before finally being outsourced (again!) to Siemens.

The last twenty years have seen a steady process of consolidation of IT delivery in organisations. The relentless trend has been the centralisation of development, infrastructure delivery and support into the corporate IT department. Outside of business units with very high margins and esoteric IT needs, it has become increasingly difficult for business units to develop and deploy applications without the cooperation of the corporate IT department.

I wonder, though, whether cloud service providers open up the risk (or is it an opportunity?) that business units will once again be able to develop, deploy and support new applications independently of the corporate IT department. Where once deploying an unauthorised app would mean running servers under desks or stacking LaCie drives off the back of a desktop to create a private file server, business units can now employ any one of thousands of boutique consultancies or developers to knock up the apps of their dreams, using the cloud to obviate the need to involve corporate IT.

Of course, corporate security and finance policy may well stand in the way, but history suggests that these won’t be too great an impediment. Once more than a few apps have been deployed, we might see the re-growth of the parallel support organisations that the corporate IT department thought it had seen the back of (or, more likely, absorbed) over the last twenty years.

If that happens, there are all sorts of consequences, many of them nasty. Business units, and even businesses as a whole, might be willing to accept those consequences to gain agility and bypass what many see (even if unfairly) as bureaucratic impediments to business thrown up by corporate IT.

The answer for corporate IT is to make sure that it is as easy, or nearly as easy, to develop and deploy new applications through it as it is through the cloud suppliers. Maybe that means using the public cloud as a support for internal systems, or maybe it means developing private clouds (though some are already pouring cold water on that idea). Either way (or some other way…), it’s an interesting time to be in IT.

Building resilience into applications

Storagebod has written that:

“[When new applications are deployed,] often the first contact that the infrastructure team will have is when an application is delivered to be integrated into the infrastructure and they try to get the application to meet it’s NFRs, SLAs etc.

[…]

Turning to the infrastructure to fix application problems, design flaws and oversights [in application design] should become the back-stop; yes, we will still use infrastructure to fix many problems but less often and with a greater understanding of why and what the implications are.”

I agree that it would be nice to see applications and developers bear more of the burden of ensuring they are recoverable, from both operational recovery (OR) and disaster recovery (DR) perspectives. It’s worth noting, though, that in the not so distant past, applications that simply had to work (that were ‘carrier grade’, so to speak) would be developed on operating systems that had the necessary software ‘infrastructure’ to deliver on those NFRs. This raises the question of why we don’t see all applications built in this way. There are a number of reasons, but I’d argue that the primary one is simply that application development is more expensive than infrastructure.

Development of a new application or platform costs a lot of money. Whatever the complexity involved in developing non-functional requirements (NFRs) for things like availability, the pain involved in determining the functional requirements (or features) is far greater. Outside of a small number of edge cases, such as core software components of telecoms networks or manufacturing facilities, it does not make sense to build operational or disaster failure tolerance into the code of an application. Application developers (both internally in companies and in the wider ISV environment) focus on the functionality that delivers value to the business or will contribute to selling their product, not on replication, block-level data validation or data recovery.

Even where developers are interested in building in resilience, they have been faced with a lack of software ‘infrastructure’ to support them. Many of those highly resilient applications in telcos, manufacturers and the like were built on operating systems, or used components, that provided services such as shared-everything clustering and highly resilient, shareable file systems. OpenVMS, which pioneered many of these services, will forever be a niche product (albeit one that supports many extremely critical functions) because of the costs, in cash and flexibility, that are paid when applications are developed on it. Building in resilience makes development (both initial and ongoing) and maintenance of applications more complex and expensive. It also means that the developer has to take responsibility for guaranteeing recoverability, and who in their right mind would want to do that ;)?
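To make that concrete, here is a deliberately naive sketch, in Python, with in-memory dictionaries standing in for real, independent storage back-ends, of the sort of work an application takes on when it manages its own replication. Every name and detail is illustrative; real code would also have to handle partial failures, ordering and recovery, which is precisely the burden most developers would rather leave to the infrastructure.

```python
# Sketch of application-managed replication: write each record to two
# independent stores and verify both copies with a checksum.

import hashlib
import json


class ReplicatedStore:
    """Writes every record to a primary and a replica, refusing the write
    unless both copies can be read back intact."""

    def __init__(self, primary: dict, replica: dict):
        # Two in-memory dicts stand in for independent storage back-ends.
        self.primary = primary
        self.replica = replica

    @staticmethod
    def _checksum(record: dict) -> str:
        return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

    def put(self, key: str, record: dict) -> None:
        expected = self._checksum(record)
        for store in (self.primary, self.replica):
            store[key] = {"record": record, "checksum": expected}

        # Read back and validate both copies; a real implementation would also
        # need to cope with one copy failing mid-write, which is the hard part.
        for name, store in (("primary", self.primary), ("replica", self.replica)):
            copy = store[key]
            if self._checksum(copy["record"]) != copy["checksum"]:
                raise IOError("write to %s failed validation for key %r" % (name, key))

    def get(self, key: str) -> dict:
        # Prefer the primary, fall back to the replica if the copy is damaged.
        for store in (self.primary, self.replica):
            copy = store.get(key)
            if copy and self._checksum(copy["record"]) == copy["checksum"]:
                return copy["record"]
        raise KeyError(key)


if __name__ == "__main__":
    store = ReplicatedStore(primary={}, replica={})
    store.put("invoice-42", {"amount": 99.50, "currency": "GBP"})
    print(store.get("invoice-42"))
```

Even in this toy form, the checksumming, write verification and fall-back reads duplicate services that clustered operating systems and resilient file systems used to provide for free.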

Today, Oracle and MS are building a new variety of this software ‘infrastructure’ into their products, but it’s only being used in a small proportion of developments. Even given the possibility that you might save some money on storage replication, people don’t seem to be using these services all that readily, for the reason, I suspect, that developers (or the management overhead) are still more expensive than those replication licenses.

The only way to change that situation would be (as Storagebod notes) to make delivering resilience at the application layer simple, repeatable and manageable. That’s very much easier said than done, though, and the twenty-odd years it has taken to develop infrastructure resilience services are testament to that. There is one place where the problem is being addressed, and that’s out there in the cloud…

Personally, I think there is often a wider issue of integration between application and infrastructure teams that leads organisations to focus on data recoverability rather than application or service recoverability. That boils down to process and, in some cases, (over-)specialisation, but it’s a question for another day.