Technical Solutions Poorly Solve Social Problems
2650 words, 10 minutes to read
I just wanna lead this article off by saying that I do not have all the answers here. I really wish I did, but I also feel that I shouldn't have to have an answer in mind in order to raise a question. Please also keep in mind that this is coming from someone who has been working in devops for most of their career.
Or: The Social Quandary of Devops
Technology is the cornerstone of our society. As a people we have seen the catalytic things that technology has enabled us to do. Through technology and new and innovative ways of applying it, we can help solve many problems. This leads some to envision technology as a panacea, a mythical cure-all that will make all our problems go away with the right use of it.
This does not extend to social problems. Technical fixes for social problems are how we end up with an inadequate mess that can make the problem a lot worse than it was before. You've almost certainly seen this in action with social media (built on the belief that connecting people is so morally correct that it will bring in a new age of humanity that is objectively good for everyone). The example I want to focus on today is the Devops philosophy. Devops is a technical solution (creating a new department) that helps work around social problems in workplaces (fundamental differences in priorities and end goals), and in the process it doesn't solve either very well.
There are a lot of skillset paths that you can end up with in tech, and a lot of silos in the industry (technical writing, project/product management, etc.), but the two biggest ones are development (making the computer do new things) and systems administration (making computers keep doing those things). These two groups have vastly different priorities, skillsets, needs and future goals, and as a result there is very little natural cross-pollination between the two silos. I have seen this evolve into cultural resentment.
This phenomenon isn't exclusive to inter-department ties; I've also seen it happen intra-department over choice of programming language.
As far as the main differences go, development usually sees what could be. What new things could exist and what steps you need to take to get people there. This usually involves designing and implementing new software. The systems administration side of things is more likely to see it as a matter of integrating things into an existing whole, and then ensuring that whole is reliable and proven so they don't have to worry about it constantly. This causes a slower velocity forward and can result in extra process, slow momentum and stagnation. These two forces naturally come into conflict because they are vastly different things and have vastly different requirements and expectations.
Development may want to use a new version of the compiler to support a language feature that will eliminate a lot of repetitive boilerplate. The sysadmins may not be able to ship that compiler in the production build toolstack because of conflicting dependencies elsewhere, but they may also not want to ship that compiler because of fears over trusting unproven software in production.
This fear sounds really odd at first glance, but this is a paraphrased version of a problem I actually encountered in the real world at one of my first big tech jobs. This place had some unique tech choices such as making their own fork of Ubuntu for "stability reasons", and the process to upgrade tools was a huge pain on the sysadmin side because it meant retesting and deploying a lot of internal tooling, which took a lot longer than the engineering team had patience for. This may not be the best example from a technical standpoint, but things don't have to make sense for them to exist.
This tension builds over a long period of time and can cause problems when the sysadmin team is chronically underfunded. The underfunding comes from the idea that sysadmins are successful when nothing goes wrong: success is defined as a negative, which can make the sysadmin team look like a money pit when they are actually the very thing that keeps the money generator generating money. This can also lead to avoidable burnout, unwarranted anxiety issues and unneeded suffering on both ends of the conflict.
So given the unstoppable force of development and the immovable wall of sysadmin, an organizational compromise was made. This started out as many things with many names, but as the idea rippled throughout people's heads the name "devops" ended up sticking. Devops is a hybrid of traditional software development and systems administration. On paper this should be great. The silos will shrink. People will understand the limits and needs of the others. Managers will be able to have more flexible employees.
Unfortunately though, a lot of the ideas behind devops and the overall philosophy really do require you to radically burn down everything and start from scratch. This tends to really not be conducive to engineering timetables and overall system stability during the age of turbulence.
What's the problem with burning everything down? Fire cleanses all things and purifies away the unworthy!
Not when you're the one being burned!
Wait, so what actually happens then? Does it just end up being a sysadmin team made up out of coders?
Yeah, in practice this ends up being a "new team" or a reboot of an existing team in ways that are suddenly compelling or sexy to executives because a new buzzword is on the scene. Realistically, devops did end up getting a proper definition at a buzzword conference level (being able to handle development and deployment of services from editor to production), but in practice this ends up being just some random developers that you tricked into caring about production now while also telling them that they're better than the sysadmins.
Two jobs for the price of one!
This ends up shafting the sysadmin team even harder because the new fancy devops team has things they can talk about as positives for their quarters, so people can more easily make a case for promotion. As a sysadmin, your "success" case is "bad things didn't happen", which means success can't stand out on reviews. Consider "scaled production faster than our customer acquisition rate" against "set up continuous delivery to ensure velocity on our team, saving 50 hours of effort per week". Which one of those do you think gets you promoted? Which one of those do you think gets headcount for new hires?
This has human costs too. One of my past jobs had me doing more sysadmin-y things (it was marketed as a devops hybrid role, but the "hybrid" part was more of "frantically patch up the sinking ship with code" and not traditional software development). Sleep is really essential to helping you function properly to do your job. During the times when I was pager bitch, there was at least a 1/8 chance that I would be woken up in the middle of the night to handle a problem. I had to change my pager tone 15 times and I still get goosebumps hearing those old sounds nearly a decade later. This ended up being a huge factor in my developing anxiety issues that I still feel today. I ended up getting addicted to weed really bad for a few years. I admit that I'm really not the most robust person in the world, but these things add up.
I guess "addicted to weed" isn't totally accurate or inaccurate here, it's more that I was addicted to the feeling of being high rather than dependent on the drug itself. Either way, it was bad and weed was my cope. It also probably really didn't help that I was starting hormone replacement therapy at the time, so I was going through second puberty as well. This is the kind of human capital cost you pay when dealing with dysfunction like this. I've always been kind of afraid to speak up about this.
However, there are real technical problems that can only really be solved from a devops perspective. Tools like Docker would probably never have happened in the way they did if the devops philosophy didn't exist.
In a way, Docker is one of the perfect examples of the devops philosophy. It allows developers to have their own custom versions of everything. They can use custom compilers that the sysadmins don't have to integrate into everything. They can experiment with new toolstacks, languages and build systems without worrying about how they integrate into existing processes. And in the process it defaults to things that are so hilariously unsafe that you only really realize the problems when they own you. It makes it easy to ship around configurations for services, yes, but it doesn't make supply chain management easy at all.
Wait, what about that? How does that make any sense?
Okay, let's consider this basic Dockerfile that builds a Go service. If you start from very little knowledge of what's going on, you'd probably end up with something like this:
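# The mutable golang:1.17 tag follows the text around this example; it moves with every 1.17.x release.
FROM golang:1.17
# The working directory path is an assumption for illustration.
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download && go mod verify
COPY . .
RUN go build -v -o /usr/local/bin/app ./...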
This allows you to pin the versions of things like the Go compiler without bothering the sysadmin team to make it available, but in the process you also don't know what version of the compiler you are actually running. Let's say that you have all your Docker images built with CI and that CI has an image cache set up (as is the default in many CI systems). On your laptop you may end up getting the latest release of Go 1.17 (at the time of writing, this is version 1.17.8), but CI may have seen this tag before and may have an old version of it cached. This would mean that despite your efforts at making things easy to recreate, you've just accidentally put an ASN.1 parsing DoS into production, even though your local machine will never have this issue! The same goes for a glibc bug, a DNS parsing bug or any issue with one of the packages that make up the image.
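One low-tech way to at least notice this kind of drift is to have the binary report which compiler built it. Here's a minimal sketch, assuming you're fine with a check at startup. `parseGoVersion`, `atLeast` and the `minVersion` floor are names and values made up for this illustration; the only real API involved is `runtime.Version()`.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// parseGoVersion turns a release string like "go1.17.8" into its three
// numeric parts. Anything it cannot parse (development builds, release
// candidates, and so on) returns ok == false.
func parseGoVersion(v string) (parts [3]int, ok bool) {
	if !strings.HasPrefix(v, "go") {
		return parts, false
	}
	fields := strings.Split(strings.TrimPrefix(v, "go"), ".")
	if len(fields) < 2 || len(fields) > 3 {
		return parts, false
	}
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return parts, false
		}
		parts[i] = n
	}
	return parts, true
}

// atLeast reports whether version v is a parsable release at or above floor.
// Unparsable versions are treated as not meeting the floor.
func atLeast(v, floor string) bool {
	a, okA := parseGoVersion(v)
	b, okB := parseGoVersion(floor)
	if !okA || !okB {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return true
}

func main() {
	// Assumption for illustration: the oldest compiler you are willing to run.
	const minVersion = "go1.17.8"

	built := runtime.Version()
	fmt.Printf("built with %s\n", built)
	if !atLeast(built, minVersion) {
		fmt.Printf("warning: expected at least %s, is the CI image cache stale?\n", minVersion)
	}
}
```

This doesn't fix the stale cache, it just makes the problem show up in your logs instead of staying invisible.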
So as a side effect of burning down everything and starting over you don't actually get a lot of the advantages that the old system had in spite of the dysfunction?
Yep! Realistically though you can get around this by using exact sha256 hashes of the precise Docker image you want. However, this isn't the default behavior, so nobody will really know about it. There are ways to work around this with tools like Nix, but that is a topic for another day.
This is what the devops experience feels like: chaining together tools that require careful handling to avoid accidental security flaws in ways that the traditional sysadmin team approach fundamentally avoided by design. By sidestepping the sysadmin team's stability and process, you learn nothing from what they were doing.
This is all of course assuming that at the same time as you go devops, you also avow the grandeur of the cloud. Statistics say that these two usually go hand in hand as the cloud is sold to executives as good for devops.
As for how to get out of this mess though, I'm not sure. Like I said, this is a social problem that people are trying to solve through a business organizational fix. I am a technical solutions kind of person and as such I'm really not the right person to ask about all this. I don't want to propose a solution here. I've thought out several ideas, but I got nowhere with them fast.
I remember at one of my jobs where I was a devops I ended up also having to be the tutor on how fundamental parts of the programming language they were using worked. One service that was handling a lot of production load had issues where it would just panic and die randomly when a very large customer was trying to view a list of things that was two orders of magnitude larger than what other customers of that service had. I eventually ended up figuring out where the issue was, but then I had an even harder time explaining what concurrency does at a fundamental level and how race conditions can make things crash due to undefined behavior. I think it ended up being a 3 line fix too.
I guess the thing that would really help with this is education and helping people hone their skills as developers. I understand that there's a learning curve and not everyone is going to become a programming god overnight, but every little bit sets off butterfly effects that will ripple down in other ways. Any solution that requires everyone be a programming god isn't viable for anyone, including programming gods.
This whole mentorship thing only really works when the company you work for doesn't de-facto punish you for mentoring people like that. If you aren't careful about how you frame this, doing that could make it difficult for you to prove yourself come review time. "Helped other people do their jobs better" doesn't really look good for a promotion committee.
Yeah but what are you supposed to do if that kind of mentorship is what really helps motivate you as a person and is what you really enjoy doing? I don't really see "mentor" as a job title on any postings.
There's always getting tired of trying to change things from within and then writing things out on a publicly visible blog, building up a bunch of articles over time. Then you could use that body of work as a way to meme yourself into hiring pipelines thanks to people sharing your links on aggregators like the orange site. It'd probably help if you also got a reputation as a shitposter. Usually when people are able to openly joke about something, that signals that they are pretty damn experienced in it.
You're describing this blog aren't you.
Like I said though, this is hard. A lot of the problems are actually structural problems in how companies do the science part of computer science. Structural problems cannot be solved overnight. These things take time, effort and patience to truly figure out and in the process you will fail to invent a light bulb many times over. Devops is probably a necessary evil, but I really wish that situations weren't toxic enough in the first place to require that evil.
Facts and circumstances may have changed since publication. Please contact me before jumping to conclusions if something seems wrong or unclear.