How to improve network monitoring
I’m an aerospace engineer by degree and an IT executive by practice. Early in my career, I worked on missile hardware and simulators with some of the smartest minds at Marshall Space Flight Center in Huntsville, AL. An adage from those days still drives me today: “Better is the enemy of good enough.”
In rocket science, an astronaut’s life is literally in the balance with every engineering decision. Being perfect is mission critical. But along the way, NASA engineers realized that while perfection is important, it should not be pursued universally, for several key reasons: It is very expensive, it draws out timelines, and it can result in extreme over-engineering.
When it comes to IT network monitoring and the need for rapid response in determining, resolving and ultimately preventing problems, remnants of this old behavior are still in existence. Teams eager to build the best, the most complete, and the most comprehensive solutions can fall into the trap of endless design, constantly adding a new metric, method, or collection point to a system that’s not even deployed.
For IT leaders in today’s economy, this approach is impractical.
With today’s agile design and development practices, we are all pushed to get out a minimum viable product, to fail fast, to break then fix. Success is measured in days – even hours. The most cutting-edge development shops are in a continuous build, continuous test, DevOps mode, getting solutions to market at unheard-of speeds.
So how are today’s IT leaders, who are as intolerant of failure as rocket scientists, supposed to respond to demands for a fast, iterative, rapid-feedback monitoring solution? Here are three ideas.
Pick a platform, not a set of tools. NASA taught us the benefit of the hard work in building reliable launch vehicles that could be used in many creative ways. The now-retired space shuttle was a marvel of engineering. It put into space the GPS array that each of our smartphones uses today, and delivered countless astronauts, experiment materials and components to the space station, over a lifespan of decades.
Your monitoring solution needs to be based on a reliable, extensible platform that provides basic but essential capabilities: event and data collection, message queuing, scale, availability and extensibility, for example.
It is very tempting to whiteboard a set of available tools that together provide this capability and get your smart team to put it all together. Ultimately, however, this approach fails.
Your team will begin to uncover the problems of getting all these pieces to work together seamlessly and reliably. You will make decisions based on a host of third party technologies, each with its own roadmap. Some of those technologies will fail or will disappear. Each will advance capability at a different pace.
Your monitoring team will become a platform development team, and your fundamental mission will fail.
Unified monitoring platforms focus on a single, elastic backend that stores your data safely and efficiently in HBase. A single platform can provide predictive algorithms your team can leverage and tune for your specific needs.
After all, how do you want your team to spend its time – building the perfect platform or improving infrastructure availability and performance for your customers?
Embrace open. The space shuttle used open standards in its interoperability technology. NASA worked diligently on a set of standards for how systems and technology would work together.
More than just focusing on docking collars, NASA needed to make sure power, life support systems, cooling and heating systems, and information buses would connect across a broad range of countries and over a 20-year-plus time frame. Built over two decades ago, the space station is still one of the most complex systems ever designed.
Your investment in monitoring needs the same focus. It’s crucial to have your monitoring system make easy use of today’s standards, and to incorporate tomorrow’s as well.
Open APIs are a must. And the ability to really understand how it all works is paramount to long-term success. Open source technology allows for extremely deep customization of monitoring platforms. You have the option of deciding what is to be monitored, and in what order.
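To make that last point concrete, here is a minimal sketch in Python of what deciding what gets monitored, and in what order, might look like. It is an illustration only, not any particular platform's API: the Check class, the run_checks function and the check names are all hypothetical, and the probes are stand-ins for calls a real platform would expose through its open API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Check:
    """One monitored item. All names here are hypothetical."""
    name: str                    # label shown in the report
    priority: int                # lower numbers run first
    probe: Callable[[], bool]    # returns True when the target is healthy


def run_checks(checks: List[Check]) -> List[Tuple[str, bool]]:
    """Run every check in priority order and return (name, healthy) pairs."""
    results = []
    for check in sorted(checks, key=lambda c: c.priority):
        try:
            healthy = bool(check.probe())
        except Exception:
            # A probe that raises is treated as an unhealthy target.
            healthy = False
        results.append((check.name, healthy))
    return results


# Stand-in probes; a real deployment would query actual devices or services.
checks = [
    Check("disk-space", priority=2, probe=lambda: True),
    Check("core-router", priority=1, probe=lambda: True),
    Check("billing-db", priority=3, probe=lambda: False),
]

report = run_checks(checks)
for name, healthy in report:
    print(f"{name}: {'ok' if healthy else 'PROBLEM'}")
```

The point of the sketch is that the list of checks and their ordering live in your hands, as plain data you can change, rather than being fixed inside a closed tool.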
Watch, listen and react. While there were five space shuttles built, they each evolved differently over time. More importantly, the payloads that were launched varied greatly.
Similarly, your monitoring platform and your monitoring requirements are going to change over time. This is a certainty in today’s IT world. You need flexibility at every level. Your platform needs to be current and have a healthy roadmap. Your ability to extend that platform needs to be agile. And you need an easy method for tuning your platform for changing needs.
So before you invest in creating a network monitoring system, understand that building it is not mission-critical for your business. What is mission-critical is uptime, reliability and transaction speed.
OK, monitoring isn’t exactly rocket science, but at the same time, none of us wants to hear, “Houston, we have a problem”; we want to prevent that problem in the first place. Follow these three pointers and put a platform to work for your network monitoring to catch problems before your end users are affected.
Wilson can be reached at bwilson@zenoss.com.