How an Open Source Giant forgot how Open Source works
- Written by Zackary Deems
- Category: Red Hat
I spent about five years as a Sales Engineer / Solutions Architect for one of Red Hat's channel partners. That meant that, while I had no systems of my own to manage as part of my day job, I had to really understand all of Red Hat's products so I could stand in front of clients, listen to what projects they had on their radar, and map Red Hat products into those projects. If there simply was no fit... no harm, no foul, time to move on. But I used most of the products in my own projects, so I believed in them and knew what they could do.
A Brief History of RHV
It's a tough task going to clients and convincing them to use a product that the vendor itself doesn't believe in, or market, or put major resources into. Specifically, I'm referring to Red Hat Virtualization... an enterprise virtualization product with the KVM (Kernel-based Virtual Machine) hypervisor at its center. KVM, like many of Red Hat's products, started life at a startup that saw the possibilities in KVM and decided to throw its hat into the virtualization ring to compete against vmware. The timing wasn't far off; vmware just had an extreme head start, more money, and a stable product.
When Red Hat acquired Qumranet in 2008, Solid-ICE (the product that became RHEV) was a kludgey mutt of a product that actually required a Windows machine to run the management plane. As you might expect, Red Hat was anxious to get its new acquisition to market, so it rushed the onboarding process and went to market in 2010 with a very problematic product. Even worse, people were excited for a vmware alternative, and many bought in... only to badly regret doing so. At that point, the product was a mess. Sadly, the damage was done. Sales teams were badly snakebit on the product. Management was badly snakebit on the product. But the RHV BU and development team kept working on it and improving it.
By the time I started positioning RHEV with clients, it was 2014, RHEV 3.3 was out, and it was light-years better than the product that went to market in 2010. I earned my RHCVA certification against RHEV 3.3. It still had the odd quirk in places, but generally speaking, RHEV was a solid competitor to vmware. Price-wise it should not have even been a conversation - customers looking to save money could do the exact same things with RHEV as with vmware, at 1/10 the price. But even Red Hat was not really interested in talking to customers about RHEV. Too many in the hierarchy remembered how bad things were in 2010 and simply weren't willing to take the chance. They pointed to bad sales numbers as justification that it must not be a great product... but if your customer base doesn't know your product exists, it's unlikely they're going to beat down your door to buy it.
Toward the end of my five years as a channel partner, Red Hat started giving away RHV entitlements as part of its Virtual Datacenter bundles on 3-year deals, to give customers an opportunity to start using it while their vmware ELA was in effect, with the intent that they would migrate off of vmware and onto RHV by the time the ELA was up. Sadly, many sales teams were just using this SKU on deals where they sold the bundle, and very few customers, if any, renewed their subscriptions WITH RHV when the time came... often because they didn't even know they HAD RHV.
The messaging from management continued to get more dire, especially after Container Native Virtualization became a thing. OpenShift had quickly become the darling of the company, partly because it is positioned as a frontrunner in the Kubernetes field, and many consider Kubernetes to be the future of IT. Partly because OpenShift is expensive. Either way, with OpenShift taking center stage, and this new virtualization concept that could be bundled into OpenShift, management saw an opportunity to push out the product that had bitten them so badly 10 years earlier.
When Red Hat announced the forthcoming release of RHV 4.4, they indicated that it would be the last production release, and that they were essentially putting their eggs in the OpenShift Virtualization basket. Except that OpenShift Virtualization was in its infancy and was not even intended to be an enterprise virtualization solution. In reality, RHV and OSV were complementary products, but Red Hat decided to finish out release 4.4 and then essentially drop the product on the sidewalk. 4.4 is THE most stable release to date, and it really closed the last few gaps that existed between RHV and vmware. Yet Red Hat decided they were going to just... walk away from it. The upstream community is alive and well, meaning oVirt is going to continue on; Red Hat just won't continue developing against it.
The Final Days of CloudForms
Going back to those days from 2014 on, Red Hat had a very clear strategy and mission. They wanted to provide all of the building blocks that might be used to build a hybrid cloud solution, and/or develop apps to run in a hybrid cloud. They had built a portfolio that filled all of the gaps in the tech stack. You had:
- RHV for Virtualization Workloads
- RHOSP (OpenStack Platform) for Cloud Native workloads
- RHOCP (OpenShift Container Platform) and Atomic/CoreOS for DevOps and Microservices, designed to run atop whatever platform makes sense to you (baremetal/RHV/OpenStack/vmware/AWS, etc)
- RH CloudForms to provide a central management plane and self-service portal to tie it all together and really provide the single pane of glass that companies wanted for managing ALL of their hybrid cloud needs
- RH Storage in two flavors to provide the data backend for all of the above
- The JBoss family of products to provide the various application features required for complex enterprise application development
- 3Scale to give you an API gateway to all of the above and whatever they need to talk to
- And RHEL underneath all of it
The best part about Red Hat's approach was that each of those products could be used independently and would do exactly what you needed it to do... there was none of the artificial drag that most vendors create, where if you purchase product A, it will only do 10% of what it claims to do... you have to purchase products Q, S and Y if you want the other 90%. So customers who were leery of relying on one company for their entire technology stack could easily and safely pick and choose where they were willing to depend on Red Hat.
In 2018 it was announced that IBM would acquire Red Hat. For those of us who drank the Red Hat Kool-Aid and who are Open Source true believers, this was both thrilling and horrifying. It had the POTENTIAL to take Red Hat to the next level, injecting new money into the development process to finally allow some of those troubled products to ride the wave into a new position of strength. It also had the potential to destroy what many of us viewed as a true bastion of Open Source... maybe not wholesomeness, but mindset, for sure.
I took a job with IBM in August of 2018, and the acquisition was announced around the same time. My team was tasked with building an on-prem private cloud, and largely due to the acquisition, the decision was made to use the Red Hat tech stack. That's part of why they hired me, as I was well versed in most of it. The decision to use CloudForms had mostly been made when I started, but my fervor for the product and familiarity with it pushed the decision over the line. So I probably have to assume a significant part of the blame for what happened to the product.
Prior to our efforts on the project, IBM had paid very little attention to CloudForms as a product. They didn't know much about it and really hadn't spent any time looking at it. Numerous people within GTS looked at our decision and tried to take us to task for it, but we kept moving forward and were able to demo an early version of our product. Overnight the conversation changed. People started asking us why we chose it. We demonstrated all of the various features, and much whispering started among folks who hadn't realized just how useful, flexible, and feature-rich it was. Then the worst possible thing happened. IBM decided it wanted CloudForms. Mere months away from our product going GA, IBM took CloudForms from Red Hat, transferred all of the developers to IBM, and forced Red Hat to push all existing customers to IBM for support.
This was bad for everyone involved. IBM had not taken the time to build out any kind of build environment, nor did they have any infrastructure to support providing updates (the way Red Hat did, i.e. yum update). At first it didn't really matter, because Red Hat was still providing updates for its last version of CloudForms. IBM buried CloudForms in one of its many "CloudPaks"... bundles of products that clearly can't stand on their own, but likely add value to somebody when bundled.
Now, tying back to the title of this article... CloudForms, too, is an open source product. Its upstream community is ManageIQ, which, interestingly, is mostly made up of IBMers who came over from Red Hat. The fact that it's Open Source really raised my eyebrows when IBM 'took' the product... because it's not proprietary IP. There's literally nothing stopping Red Hat from continuing to have its own ManageIQ-based product... nothing other than IBM dictating where they spend money. Sadly, IBM still has no update mechanism for Infrastructure Management (their new name for CloudForms). They did, however, take the step of switching to CentOS under the hood in their templates, so you no longer get the benefits of having it live atop RHEL. So you can't patch the application, and the application lives on a community-supported operating system.
Tying it all Together
Back when Red Hat was an independent company, it had a clear vision of how its products fit into the enterprise ecosystem, and it ensured it had an answer for each component of the enterprise hybrid cloud technology stack. Now, under IBM, they have no clear direction other than "We have OpenShift... oh... and some other stuff too", and they've taken two amazing products with existing customer bases and effectively dropped them off at the lost and found. *ANYBODY* could come along and pick up either project, and on day 1, without adding anything in the way of value, they would have a salable, enterprise-grade product that competes extremely well against industry-leading products. Heaven knows I've attempted to get at least one company to do exactly this, but so far I've not been successful.
The beauty of Open Source is that so long as a project continues to have active developers, it will live on even after the companies that built products around it die off. It does also mean that you could be picking up a product at the same time that 10 other companies decide to pick it up and run with it... like Kubernetes... but really, that's good for the product too, because each of those entities will have to infuse their own ideas into it to stake out their unique claim to why their flavor is best.
Red Hat exemplified the ugly side of the business behind Open Source... demonstrating that an Open Source project/product only has value as long as the people who control the money still believe in the project/product. My hope is that both projects continue working to improve, and eventually show just how foolish those bean counters were.
The sysadmin in the age of DevOps
- Written by Zackary Deems
- Category: Technology
I remember reading an article on one of the technology news sites back around 2012/2013 expounding upon the writer's view that the systems administrator role was DEAD... that anybody in systems administration who cared about their livelihood should be looking for what's next, and soon. I don't recall whether the author provided any insight into their own profession, but I clearly recall that the article read like it was preaching about a looming cataclysm for sysadmins, as though it would personally impact the author.
I also remember thinking that the author may not have had quite the firm grasp of the situation that they thought they did. Here we are 10 years later, and systems administrators are just as necessary now as they were back then. In many organizations, the role is pretty much EXACTLY what it was 10 years ago. In others, it's changed a bit, but it's still necessary. So what was it that the author missed, or misunderstood, a decade ago?
Actually, it's pretty simple. Automation has come a long way, but we don't yet have systems building systems intelligently, from nothing. Somebody still has to do the initial legwork to set up the templates or instances, and to maintain them. Somebody has to keep the lights on, perform troubleshooting, stand up new hardware, take down old hardware, and generally do all of the design work for new projects. Somebody also has to write and maintain all of the automation. The hard reality, though, is that the need for keyboard jockeys and simple operators is dwindling. Any job that consists of pushing buttons or monitoring things, like backups, can be almost entirely automated now. Systems can be patched en masse via automation. It's still good to have a trained administrator make the rounds after patching completes to make sure everything is still solid, but that is increasingly the job of the senior admin.
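As a quick illustration of what that en-masse patching can look like, here is a minimal ansible sketch. The inventory group name and the rolling-batch size are assumptions, and a real patching playbook would usually add pre-flight and post-patch health checks.

```yaml
# A minimal sketch of fleet-wide patching, assuming a RHEL-family fleet and an
# inventory group called "all_servers" (both hypothetical).
- name: Patch the whole fleet
  hosts: all_servers
  become: true
  serial: "25%"                  # patch a quarter of the fleet at a time
  tasks:
    - name: Apply all available updates
      ansible.builtin.dnf:
        name: "*"
        state: latest
      register: patch_result

    - name: Reboot if anything was updated   # simplistic; many shops gate this more carefully
      ansible.builtin.reboot:
        reboot_timeout: 900
      when: patch_result is changed
```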
DevOps, microservices and container technology, when properly configured, are eliminating the need for downtime. These are highly complex installations and require highly trained individuals to build and manage them. Cloud environments, public and private, require skilled administrators to manage the moving parts on the backend.
The big difference between systems administration in 2012 and 2022? Those individuals who just need a job, and are technologically inclined, but not passionate about what they do... end up moving on to other professions. These people aren't staying current, they aren't learning the advanced technologies. They ARE being automated out of jobs. The jobs are still there, but the focus is increasingly on advanced skills in addition to solid sysadmin capabilities. Everybody has to do more with less.
Systems Administration is as much a mindset as it is a discipline. It's more than managing systems. I personally have always maintained that sysadmins are the glue that holds the enterprise together, because we HAVE to know how all of the pieces fit together. We have to know the network configurations. We have to know the system configurations. We have to know which systems connect to which other systems, and how. We need to know which databases are used for what, and what needs to access them. We have to have a strong understanding of the security profile for the business, and how it applies to all of the above. We basically have to clearly grasp the 'big picture' of the enterprise.
When I've gone into customers and found them spending large amounts of time putting out fires, it's often because that organization artificially encapsulates its sysadmins into tiny boxes, and ends up without anybody who really knows how it all fits together.
If you find yourself on the path to becoming a systems administrator, or are already one, but are concerned about the future of your position, the best advice I can give is this:
- Understand that properly built and managed systems manage themselves, so if you're constantly putting out fires, it's likely you're missing the 'properly' part of that statement.
- Take responsibility for your enterprise. Own it. The better you understand all of the moving parts, the better you'll be able to manage it.
- Make the effort to learn the big picture. Talk to the network team. Talk to the DBAs. Talk to the applications teams. It doesn't matter if it isn't your responsibility today. Think of it this way... if you're the expert on the enterprise, and you automate yourself out of a job, you'll still HAVE a job because you're invaluable to the company as the person who can keep the lights on. Also, you'll be the best candidate to help design and implement NEW projects because you'll clearly understand how they should fit in.
- Don't be satisfied with your current skill level or training level. You may be awesome, but there's already new technology you're not aware of, and it could show up on your desk tomorrow.
- Always aim to be the best in the industry at your job. It's a tough goal to hit, but it's worth the effort.
- Mentor others - the best way to hone your understanding of anything, is to teach it. You learn pretty quickly just how limited your knowledge was previously.
- AND remember that you probably don't want to be a sysadmin forever... there are other, higher-level positions that will interest you that are NOT management. You may just need to move to another company to find them. Some of those positions include:
- Sales Engineer
- Solutions Architect
- Systems Engineer
- DevOps Engineer
- Systems Architect
- Cloud Architect
- Chief Engineer
- Chief Architect
- Technology Director
- CTO
Some of those other roles require additional skills, and often there's a progression. Keep learning, try to climb out of the cave and interact with real people every so often, and slowly develop some diplomacy. A trained, quality sysadmin is a huge asset when they move into the higher roles, because they have a MUCH better handle on the big picture, which lends itself VERY well to Engineer and Architect roles.
Automation is More Than Wrapping Old Code in YAML
- Written by Zackary Deems
- Category: Automation
Automation is probably the hottest word in IT. It encompasses significantly more than just 'using ansible'... anything we do to take a previously manual task and hand it off to a machine counts as automation. In that sense, this article doesn't necessarily have to reference ansible, but fair warning, the examples I'll use will be about ansible. Just remember that the statements hold true regardless of what your automation methodology might be.
Even though our automation tools have generally been around for 5-10 years, it can be a real challenge to properly apply modern automation techniques to our problems. Part of the challenge is that the focus lands on the manual tasks themselves, rather than the intent behind them. The talented individuals being asked to automate things take the request from the person who wants the thing automated, and they act on it. The person doing the requesting likely has no idea how the automation works or what it can do. So often what you get is an automated process that does the EXACT set of tasks that were previously manual... with NO value added. So yes, we saved some time because Bob doesn't have to go touch all 10 of those servers every 6 months.
But if we understand how the automation works, what it CAN do, and what it CAN'T do, it allows us to take a step back and ask the question: what is the ultimate goal behind this task? Sometimes a specific task takes on a life of its own due to events that happen over the years. If we peel that situation back, we may find that the REASON Bob was logging into each of the database servers every Friday and deleting logs... is because there's no log rotation software in use. Rather than writing automation that just deletes the logs, it likely makes better sense to use system components that will rotate the logs automatically. Maybe the better automation task would be to capture those rotated logs and copy them off to a central log store for analysis.
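To make that concrete, here is a minimal ansible sketch of the 'fix the actual problem' approach: instead of automating Bob's Friday log deletion, drop a logrotate policy on the database servers. The application name, log path, group name, and retention values are all hypothetical.

```yaml
# A minimal sketch, assuming the logs Bob was deleting live under /var/log/myapp/
# ("myapp", the db_servers group, and the retention policy are illustrative, not real).
- name: Rotate database logs instead of deleting them by hand
  hosts: db_servers
  become: true
  tasks:
    - name: Install a logrotate policy for the application logs
      ansible.builtin.copy:
        dest: /etc/logrotate.d/myapp
        mode: "0644"
        content: |
          /var/log/myapp/*.log {
              weekly
              rotate 8
              compress
              missingok
              notifempty
          }
```

From there, shipping the rotated logs to a central store becomes its own small task rather than a manual Friday ritual.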
Instead of having a compliance practice that leverages a system scanner twice a year to scan a system and generate a report that gets sent off for review and the eventual generation of a remediation plan, automation lets us establish a desired 'compliant' configuration and ENFORCE that configuration every day. This requires a deviation from the policy, because the policy is built around the scanning/reporting/remediation process. Getting to a 'continuous compliance' practice requires us to step back from the current multi-team process and ask, "What is it we're ultimately trying to accomplish?" Is the ultimate goal to create a report that gets turned into a remediation plan? Or is the ultimate goal to keep the machine compliant? If it's the latter, would it make more sense to automate the report generation and communication to remove two steps from the process, or would it be more effective to have automation that runs on a schedule, knows the desired configuration, checks all systems to make sure they're in that state... and, if they are not, fixes them?
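Here is a minimal sketch of that 'enforce rather than report' idea, using a single illustrative control (disabling root SSH login). A real compliance playbook would cover many controls and would typically be run on a schedule by cron or an automation controller.

```yaml
# A minimal sketch of continuous compliance, assuming the control we care about is
# "PermitRootLogin no" in /etc/ssh/sshd_config (illustrative only).
- name: Enforce the desired configuration instead of reporting on drift
  hosts: all
  become: true
  tasks:
    - name: Root login over SSH must be disabled
      ansible.builtin.lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PermitRootLogin'
        line: 'PermitRootLogin no'
        validate: '/usr/sbin/sshd -t -f %s'   # refuse to write a config sshd can't parse
      notify: restart sshd

  handlers:
    - name: restart sshd
      ansible.builtin.service:
        name: sshd
        state: restarted
```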
Even in situations where there are shell scripts that are known good and have been used for years, those scripts likely predate the automation tooling and likely don't take advantage of the capabilities inherent to it. If we take the time to rewrite the shell scripts in the native automation language, we gain significant advantages. Using ansible as my example here: when you use native ansible modules to perform tasks, those tasks are idempotent. That's a big word that basically means you can run the task over and over without ever having to worry about anything breaking, because the end state is the same every time. Each module checks whether the task actually needs to be done before it does anything. If the system is already in the desired state, the task reports 'ok', does nothing, and the play moves on to the next task.
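A tiny contrast makes the point. The first task below is just a shell command wrapped in YAML: it runs every time and always reports 'changed'. The second uses a native module, which checks the current state first. The package name and host group are illustrative.

```yaml
# Wrapping old code in YAML vs. using a native, idempotent module (illustrative).
- name: Shell wrapper versus native module
  hosts: web_servers
  become: true
  tasks:
    # NOT idempotent: runs yum every time and reports "changed" on every run
    - name: Install httpd the old way
      ansible.builtin.shell: yum install -y httpd

    # Idempotent: reports "ok" and does nothing if httpd is already installed
    - name: Install httpd with a native module
      ansible.builtin.package:
        name: httpd
        state: present
```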
This becomes even more important when we start building complex automations where we chain multiple tasks on top of each other:
- register a system
- set up software repositories
- install application software
- create users
- create groups
- configure application software
- integrate application with central identity management
When your tasks are idempotent, you don't need to go back and comment out the first five steps if they completed successfully but the sixth failed. Just fix the failure on item 6 and re-run the whole playbook. It will see that the first five tasks are already complete and pick up with the sixth. A minimal sketch of what that chain might look like follows.
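For illustration, here is a minimal sketch of that chain as a single playbook. Every name, variable, repo URL, and path is hypothetical, the registration task assumes a RHEL-family host with the community.general collection installed, and the identity-management step assumes an IPA domain. The point is simply that each step uses a module or guard that can safely be re-run.

```yaml
# A hypothetical end-to-end provisioning chain; all names and variables are illustrative.
- name: Provision an application server end to end
  hosts: app_servers
  become: true
  tasks:
    - name: Register the system
      community.general.redhat_subscription:     # requires the community.general collection
        state: present
        activationkey: "{{ rhsm_activation_key }}"
        org_id: "{{ rhsm_org_id }}"

    - name: Set up the internal software repository
      ansible.builtin.yum_repository:
        name: internal-apps
        description: Internal application packages
        baseurl: "https://repo.example.com/apps/$basearch"
        gpgcheck: true

    - name: Install the application software
      ansible.builtin.package:
        name: myapp
        state: present

    - name: Create the application group         # created before the user that references it
      ansible.builtin.group:
        name: myapp
        state: present

    - name: Create the application user
      ansible.builtin.user:
        name: myapp
        group: myapp
        state: present

    - name: Configure the application
      ansible.builtin.template:
        src: myapp.conf.j2
        dest: /etc/myapp/myapp.conf
        owner: myapp
        group: myapp
        mode: "0640"

    - name: Integrate with central identity management
      ansible.builtin.command: >
        ipa-client-install --unattended
        --principal admin --password "{{ ipa_join_password }}"
      args:
        creates: /etc/ipa/default.conf           # makes the join safe to re-run
```

If the configure step fails because of a typo in the template, you fix the template and re-run; the register, repo, package, group, and user tasks all come back 'ok' and the play resumes at the step that actually needs work.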
Ansible also runs those tasks across a selected inventory of hosts, rather than forcing you to take scripts designed to run on one host and make them work against many.
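The inventory is just a list of hosts and groups, so 'run against multiple hosts' stops being a loop you write in bash and becomes a property of how you invoke the playbook. A hypothetical YAML inventory for the sketch above might look like this:

```yaml
# inventory.yml (hypothetical hosts); run with:  ansible-playbook -i inventory.yml provision.yml
all:
  children:
    app_servers:
      hosts:
        app01.example.com:
        app02.example.com:
        app03.example.com:
```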
Summary
So two ideas to take away from this:
- before blindly automating legacy tasks, take a step back to identify what the task is ultimately trying to accomplish, and focus on automating THAT. It may end up being the same things that were previously scripted... or it may end up going in a completely new direction.
- if the old scripts are actually what need to be run, rewriting them in native automation language will provide new efficiencies and allow integration points that did not exist previously.