Cloud coding pitfalls: Tips for avoiding big, bad bugs

06.05.2016
According to this ACM article, the seven coding constructs that have been the most frequent source of bugs are function calls, assignments, conditions, pointers, uses of NULL, variable declarations, function declarations, and return statements. There are dozens of other conference presentations, books, and taxonomies that provide statistically valid guidance — or at least opinions — on coding practices to avoid.

But so far, I haven’t found anything like that for coding in the cloud.

And make no mistake about it, the distributed, multi-language environment inherent in the cloud presents some real coding challenges. But before we nerd out entirely, let’s do a bit of bug triage. There are three interesting categories of bugs:

While all bugs are annoying, it’s been pretty widely documented that the first category is much less expensive to resolve … and those intramural bugs don’t lower the credibility of the development team or the system they are working on. The truly dangerous bugs are the last ones that might be two orders of magnitude more expensive to fix and cause failures that are in full view of the users.

Let’s go a bit further. Bugs in the first category are typically logic errors that can be caught by tools and automated testing. Bugs in the last category are caused by human frailties: imprecise communication, incomplete documentation, memory failures, long learning curves, sketchy error handling, sloppiness under pressure and plain old sloth. There are no software tools that can fix that laundry list, so the key is to avoid coding issues that make your project more vulnerable to those human foibles.

So let’s inject some humorous counter-examples:

In the most ambitious cloud platforms, language choices proliferate, and for a tricky Salesforce.com application your developers may have to work in six languages at once (yes, really). So tip #1: Don’t change languages frivolously, as cross-language debugging is painful. If there is some wonderful math library in Python you really do need to use, encapsulate it as a Web service and call it via REST.

Some languages are more prone to errors (or more likely to tempt developer slop) than others. Languages with strong variable typing and automatic memory management/garbage collection help avoid a wide range of errors. In contrast, endless articles have been written on VB and C++, but even if you encounter those in the cloud try to contain their evil within a Web service that hopefully you don’t need to work on at all.

In the cloud, your team is likely to work in Javascript and its endless libraries, and that makes them vulnerable to that language’s weak typing and excessive case sensitivity. The road to hell is paved with the excluded middle. Get the best debuggers and static analysis tools you can. If you can move on to languages like Ruby on Rails, a raft of problems disappears.

The worst of the bugs will be discovered and fixed by someone other than you at some point in the future when they’re tasked with extending the system’s functionality.

Here are some tips to make their job easier:

In the rush to get a prototype done, developers may skip some basics, particularly as they may be working in several languages at once. The next tip is to always do the following basic data checks before performing any operations:

A corollary of this tip is to watch out for (and root out) data overloading that some genius slid into a “quick fix” that has long since been forgotten.

Some calculations and logical operations seem to encourage bugs more often than average. Some of them can’t be avoided, so the next tip is to budget enough time to develop test code, scenarios and data to test these troublemakers for proper results (not just coverage):

Just because the cloud isolates code inside Web services doesn’t mean that it prevents spaghettification, particularly when modules are extended to do things that were never envisioned at initial construction time. The next tip: use profilers and network traffic sniffers to spot particularly “chatty” Web services that may indicate methods that need refactoring.

The ultimate spaghetti (meta-ghetti) comes when there’s an incomplete deployment of changes across several modules. Things seem to work, but there will be ephemeral bugs you’ve never seen before and can’t reproduce in your staging system. Pushing from staging to production in large cloud systems is painful at best, and I’ve never seen an automated system work with multiple cloud platforms (e.g., AWS and Salesforce) … even though theoretically it should work. So here’s the final series of tips:

If all this seems like a recasting of issues from the last century, I can’t disagree. We’ve just managed to make ever more sophisticated ways to waste CPU and developer cycles.

(www.cio.com)

David Taber