Monitoring As A...
I haven’t done much blogging again. Business as usual. I have plenty of drafts sitting here, but they all suck. I have plenty to say, but I don’t have the discipline to make it interesting or readable.
However, I think I might be able to make things readable if I write one of those ever-popular “reaction pieces.” You know, the “I read a blog post somewhere and here is why I’m smarter than the author” type deal.
I wrote up a super-short-form response on another website (if you’re clever maybe you can find my super-secret username on some other site), but I wanted to try my hand at it here, too.
The blog entry I read today was published by dataloop.io. I had never heard of them before, but they seem to be a start-up that is working on “Monitoring for Cloud Services”…as a service.
First, know that I’m not here to insult them or their business model. I have never used their product (it’s not available to the public and I’m not in the market), but based on screenshots and descriptions it sounds perfectly reasonable. So, with that out of the way…
Here is the blog entry in question. Again, don’t get me wrong: it is a very good blog entry. It has interesting data and I agree with a lot of the points made in the post. Please, go read it before coming back here.
But, as they say, everyone is trying to sell you something. Specifically, dataloop is trying to sell me on monitoring as a service (hereafter MonAAS). So a lot of the language used in the article - either accidentally or on purpose - is skewed in favor of MonAAS.
Both on their homepage and in the post in question, dataloop paints ‘non-cloud’ (not a quote) monitoring as a Kit Car.
In the post, they point out that “companies use an average of 2 monitoring tools…” and that many larger organizations use more than that. They paint this as a necessarily bad thing, conjuring images of a lonely Ops employee hunched over a big pile of metal parts scratching their head. Then, once that stalwart soul has constructed their car, they need to invest significant time into tuning it to keep it running. And maybe the tires fall off every 100 miles or so.
I’m sorry, I got lost in the metaphor. It’s a good one; I can’t deny that. And I’m sure it’s a reality in many companies. Everyone has seen that Nagios setup…you know the one.
But I can just as easily reframe it and make it awesome.
### …Composable Monitoring Infrastructure
The article uses a very specific example that I’m going to run with, here:
> For example Nagios for alerting, Graphite for dashboards, StatsD for developer metrics, collectD for service metrics and LogStash/Kibana for logs.
Maybe I’m just insane or old fashioned, but that sounds (mostly) perfectly reasonable to me.
Each of these tools is good at something. Plug them into one another and make use of them.
Use Nagios for service/server state checks and similar metrics (whether through something like collectD or just NRPE). Nagios-alikes can monitor anything these days. But also plug Nagios into Graphite - then every time you write a new check to be executed by NRPE you get a nifty graph thrown in for free.
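To make the "graph for free" bit concrete: Nagios plugins print performance data after their status text in a `label=value[UOM];warn;crit;min;max` format, and Graphite's plaintext listener accepts lines of the form `path value timestamp`. Here is a minimal sketch of the glue between the two. The function name, the `nagios` prefix, and the path layout are my own assumptions for illustration, not how any particular Nagios-to-Graphite bridge does it.

```python
import re
import time

# Nagios perfdata entries look like: label=value[UOM];warn;crit;min;max
# Labels containing spaces are wrapped in single quotes.
# This sketch keeps only the label and the numeric value; units and thresholds are ignored.
_PERF_RE = re.compile(r"(?:'([^']+)'|([^\s=';]+))=(-?\d+(?:\.\d+)?)")


def _sanitize(part):
    """Make a string safe to use as a single Graphite path component."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", part)


def perfdata_to_graphite(host, service, perfdata, timestamp=None, prefix="nagios"):
    """Convert one check's perfdata string into Graphite plaintext lines.

    The path layout (prefix.host.service.label) is a hypothetical convention.
    """
    ts = int(time.time() if timestamp is None else timestamp)
    lines = []
    for match in _PERF_RE.finditer(perfdata):
        label = match.group(1) or match.group(2)
        value = float(match.group(3))
        path = ".".join(_sanitize(p) for p in (prefix, host, service, label))
        lines.append(f"{path} {value} {ts}")
    return lines
```

Each returned line, followed by a newline, is what you would write to Graphite's plaintext port (2003 by default). Every new NRPE check that emits perfdata then shows up as a new metric without anyone touching the dashboards.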
Use LogStash to aggregate logs or other events (application errors, etc). It’s great at that! But, again, send things to Graphite. LogStash has output plugins for statsd or even graphite directly. Sure, Kibana can provide you some nice graphs of time-series data like that already, but why not put it into Graphite with everything else and use LogStash for its strength of quickly sorting through all of those giant blobs of JSON you just dropped on it?
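If "send log events to statsd" sounds abstract, the idea is roughly this: every event that comes through the pipeline bumps a counter, and StatsD flushes those counters on to Graphite. A StatsD counter increment on the wire is just `name:1|c`. The sketch below is plain Python, not LogStash configuration, and the `app` and `level` field names are hypothetical; use whatever your events actually carry.

```python
import json


def event_to_statsd(raw_event, prefix="logs"):
    """Turn one JSON log event into a StatsD counter increment line.

    Assumes (hypothetically) that events carry "app" and "level" fields;
    anything missing is bucketed under "unknown".
    """
    event = json.loads(raw_event)
    app = str(event.get("app", "unknown"))
    level = str(event.get("level", "unknown")).lower()
    # StatsD line protocol for a counter: <metric name>:<value>|c
    return f"{prefix}.{app}.{level}:1|c"
```

Sending that string in a UDP datagram to a StatsD daemon (port 8125 by default) is all it takes to get an errors-per-application graph sitting next to everything else in Graphite.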
Nagios itself doesn’t need to be a pain, either. Nagios is easily (well…“easily”) managed via Puppet. It gets even easier if you’re using Sensu instead of Nagios since you just need to set the roles of the provisioned server correctly.
Granted, I’m coming at this from teams and organizations with just as much Ops expertise on-board as Dev expertise (if not more).
I can understand how this could be difficult for start-ups of only a few employees where each one is a developer essential to the main product. Maybe they don’t have someone to spare to set up competent monitoring, or even if they did maybe they just don’t have the expertise.
The other monitoring challenge called out in the post is the “DevOps”/“SOA” argument: that modern organizations are just too damn agile to be able to worry about their own monitoring.
I don’t mean to put words into the post’s mouth, here, and I don’t mean to come off as so mean-spirited, but I think I may be. Apologies.
The crux of the argument is:
> …[Development] teams need monitoring for the services they own, with the ability to add and remove checks as the service changes, and customize what alerts they receive and what dashboards they view.
Which, if you read the “Composable Monitoring Infrastructure” section above here, doesn’t sound like much of a challenge anymore.
A properly-integrated monitoring infrastructure is effortless to orchestrate using your favorite configuration management tool. Your dev teams are using configuration management for their ultra-scalable service, right? Need to add or remove a service check? Adjust the manifest as appropriate. Boom, done.
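Here is roughly what "adjust the manifest" boils down to, sketched in Python rather than actual Puppet. The checks for each host live in one declared data structure, and the Nagios service definitions are generated from it. The `CHECKS` mapping and `render_services` are made up for illustration, and the output assumes a `generic-service` template and a `check_nrpe` command are already defined in your Nagios config.

```python
# Declared state: which NRPE checks each host should have (hypothetical data).
CHECKS = {
    "web01": ["check_load", "check_disk"],
    "db01": ["check_load"],
}


def render_services(checks):
    """Render Nagios service definitions from a host -> checks mapping."""
    blocks = []
    for host in sorted(checks):
        for command in checks[host]:
            blocks.append(
                "define service {\n"
                "    use                  generic-service\n"
                f"    host_name            {host}\n"
                f"    service_description  {command}\n"
                f"    check_command        check_nrpe!{command}\n"
                "}\n"
            )
    return "\n".join(blocks)
```

Adding or removing a check is a one-line change to the declared data, and the generated configuration follows on the next run. A real configuration management tool layers file management and a Nagios reload on top of the same idea.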
I will grant that things may be difficult if you live on Heroku or something, but if you control your own servers in some way (physical, “private cloud”/OpenStack, “public cloud”/AWS) there is no reason your friendly neighborhood Ops engineers can’t provide your development teams with a sensible and centralized (I know that word is evil these days) monitoring infrastructure that can provide benefit to the entire organization through shared information (services interact, right?), shared resources (or is each team paying for New Relic licenses and accounts separately?), and standardization (my old team used X…I don’t know how to use Y).
If your development and operations teams communicate even a little bit - and…don’t let me tell you how to do your job but they probably should - it shouldn’t be difficult to support shared monitoring for even the most nimble of agile organizations.
I guess it all comes down to the same question I ask anyone who tells me they don’t need backups: How important is your data?
Your data, of course, being more accurately metadata about your application. If you are going to keep a close eye on application performance, system performance, user experience/interactions, and all that fun stuff then maybe you can spare a few hours from a few developers and operations personnel and you can figure out what information you need and how best to accomplish it.
Maybe the answer for you is a cloud-based solution, maybe it isn’t. But when an article ends in a sales pitch, you owe it to yourself to think it through first.