Since my very first days in IT almost 35 years ago, I've heard people talk about something called "a single pane of glass." The idea is that somewhere an application must exist that pulls hardware management, data storage, backups, software deployment, licensing, performance management, capacity planning, and availability monitoring into a single application that an engineer or manager could use to manage the entire IT environment from one place. Theoretically it would consist of a single screen showing the status of everything IT is responsible for, and it would let the user drill down into any item for more detail or make changes without ever leaving the application.
The reason I'm dredging this up is that I heard it again in our staff meeting last week, when our new manager was talking about our team's upcoming initiatives. It's an interesting concept, but it raises a couple of questions: 1) Does it actually exist, and 2) Should it actually exist? Let's think about this for a minute.
All of those things listed above are important parts of my world in IT because I deal mostly with hardware and its deployment and operation. All of those capabilities are available to me, but I have to go to a different application for each one of them. The goal of "a single pane of glass" would be for me to see and manage all this stuff in one place.
The Holy Grail would be for all these functions to be integrated and automatic. For example, a performance management function might detect that a particular application is running slowly, automatically look at disk space consumption and memory/CPU utilization, and determine that the performance problem is being caused by the disk containing the operating system being critically low on free space. It would automatically respond by using its storage function to allocate a little more space to get the system running correctly again, and then fire off an email to me advising me of the situation, how it was fixed, and that follow-up would be needed to find out why it happened. That would be pretty cool, especially if it was all done automatically without the need to wake me up at 2:00 AM. Let's pick this situation apart for a second and create a rough design of the "Single Pane of Glass" system. That's a long name, so we'll just call our new system Big Pane.
To get to Holy Grail status, this system would need to detect the problem, gather performance data from all the possible sources of the problem, sift through the data to determine the most likely cause, devise a solution or two, and then implement a possible fix. We'll throw one more wrench into the works here and say that the system is running on Windows Server, so if storage was added then it would somehow need to notify the Windows operating system that the extra storage resources were available for it to begin using.
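To make those steps concrete, here is a minimal Python sketch of the chain for our one example. Every name in it (`Incident`, `run_big_pane`, the stage functions, the 5% threshold, the metric values) is made up for illustration, and the last three stages are empty placeholders. Creating the `Incident` stands in for the detection step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Incident:
    system: str
    symptom: str
    metrics: Dict[str, float] = field(default_factory=dict)
    cause: Optional[str] = None
    fix: Optional[str] = None
    steps: List[str] = field(default_factory=list)


def gather(incident: Incident) -> None:
    # Hypothetical readings; a real system would query each component.
    incident.metrics = {"os_disk_free_pct": 2.0, "cpu_pct": 35.0, "mem_pct": 60.0}


def diagnose(incident: Incident) -> None:
    # One hard-coded rule (assumed threshold): under 5% free on the OS disk.
    if incident.metrics.get("os_disk_free_pct", 100.0) < 5.0:
        incident.cause = "OS disk critically low on free space"


def plan_fix(incident: Incident) -> None:
    if incident.cause is not None:
        incident.fix = "allocate more storage to the OS disk"


def apply_fix(incident: Incident) -> None:
    pass  # placeholder: would ask the storage system for more space


def notify_os(incident: Incident) -> None:
    pass  # placeholder: would tell the operating system to use the new space


def send_email(incident: Incident) -> None:
    pass  # placeholder: would mail the engineer a summary


def run_big_pane(incident: Incident) -> Incident:
    # Run the analysis stages first and record each one.
    for stage in (gather, diagnose, plan_fix):
        stage(incident)
        incident.steps.append(stage.__name__)
    # No known fix: stop and hand the problem to a person.
    if incident.fix is None:
        incident.steps.append("escalate_to_human")
        return incident
    for stage in (apply_fix, notify_os, send_email):
        stage(incident)
        incident.steps.append(stage.__name__)
    return incident
```

Calling `run_big_pane(Incident(system="app01", symptom="slow response"))` walks all six stages for this one canned scenario. Every stage here is a stub, which is exactly where the real work hides.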
Detecting the problem would be easy. We currently have very expensive software that can reach into another software system, perform various functions, measure how long it took, compare it to previous results, and determine if something isn't running as expected. The challenge here is that software systems are not created equally so you have to check them all in different ways to get the true picture of how they're performing. We have about 60 of these software systems and each one works only for a single software package. There is no common denominator determining how performance is measured across all of them. For this reason our new system would need different testing capabilities for each system. Just for parity with my company's existing situation, we'll just say it would need 60 different testing functions. If it actually worked, our management would require that it be expanded to accommodate our other 1440 application systems.
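A rough sketch of what "a different testing function for each system" could look like is a registry of probes, one per application, each timed against its own baseline. The system name `billing`, the probe body, and the 1.5x tolerance are assumptions for illustration, not how our monitoring software actually works.

```python
import time
from typing import Callable, Dict

# One probe per application; nothing is shared between them.
CHECKS: Dict[str, Callable[[], None]] = {}


def check(system_name: str):
    """Decorator that registers a probe function under a system name."""
    def register(fn: Callable[[], None]) -> Callable[[], None]:
        CHECKS[system_name] = fn
        return fn
    return register


@check("billing")
def probe_billing() -> None:
    # Stand-in for a real transaction, such as logging in and running a report.
    sum(range(1000))


def is_slow(system_name: str, baseline_seconds: float,
            tolerance: float = 1.5,
            clock: Callable[[], float] = time.perf_counter) -> bool:
    """Time the system's probe and compare it to a previously measured baseline."""
    start = clock()
    CHECKS[system_name]()
    elapsed = clock() - start
    return elapsed > baseline_seconds * tolerance
```

With 60 systems, `CHECKS` would hold 60 separately written probes, and each would need its own baseline.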
Next it would need to gather some data from all the possible sources of the problem. That's not a really big deal, because just about everything has some kind of API (Application Programming Interface) and you can usually query it for the data you want. The problem is that the APIs are all different, so our system would need to understand every one of them.
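The "they are all different" problem usually ends up as one small adapter per API, each translating that API's answer into a common shape. The two payload layouts below are invented for the example and do not correspond to any real product.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]


def from_storage_api(payload: dict) -> Metrics:
    # Hypothetical storage API that reports raw byte counts.
    return {"disk_free_pct": 100.0 * payload["freeBytes"] / payload["totalBytes"]}


def from_hypervisor_api(payload: dict) -> Metrics:
    # Hypothetical hypervisor API: CPU as a 0-1 fraction, memory in megabytes.
    memory = payload["memory"]
    return {
        "cpu_pct": payload["cpu"]["usage"] * 100.0,
        "mem_pct": 100.0 * memory["used_mb"] / memory["total_mb"],
    }


# One adapter per data source; adding a source means writing another function.
ADAPTERS: Dict[str, Callable[[dict], Metrics]] = {
    "storage": from_storage_api,
    "hypervisor": from_hypervisor_api,
}


def normalize(source: str, payload: dict) -> Metrics:
    """Translate a source-specific payload into the common metric names."""
    return ADAPTERS[source](payload)
```

Each adapter is only a few lines, but someone has to write and maintain one for every API, and again every time a vendor changes its response format.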
Assuming we've gotten this far, Big Pane would need to determine the actual cause. AI hasn't come far enough for a software system to think like an engineer with 30 years of experience, so it would most likely need to get this information from a knowledgebase somewhere. Knowledgebases are not created equal, so you would probably do better creating your own, possibly from problems and solutions you've seen before firsthand. The solution could also be included in the knowledgebase so Big Pane wouldn't have to figure it out from scratch every time. Each system would almost surely need its own knowledgebase entries.
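A homegrown knowledgebase of this kind could be as simple as a list of entries, each pairing a system and a condition with the cause and fix seen before. The entry, the system name `app01`, and the 5% threshold below are illustrative assumptions.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class KBEntry(NamedTuple):
    system: str
    matches: Callable[[Dict[str, float]], bool]
    cause: str
    fix: str


# Hand-written entries, drawn from problems someone has already solved once.
KNOWLEDGEBASE: List[KBEntry] = [
    KBEntry(
        system="app01",
        matches=lambda m: m.get("os_disk_free_pct", 100.0) < 5.0,
        cause="OS disk critically low on free space",
        fix="allocate more storage to the OS disk",
    ),
]


def lookup(system: str, metrics: Dict[str, float]) -> Optional[KBEntry]:
    """Return the first entry for this system whose condition matches, if any."""
    for entry in KNOWLEDGEBASE:
        if entry.system == system and entry.matches(metrics):
            return entry
    return None
```

Anything the knowledgebase has not seen before comes back as `None`, which in practice means a phone call to an engineer.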
Now that Big Pane knows how to fix the problem, it will have to implement the fix. These days you can change almost anything in software so our Holy Grail system could make whatever changes are necessary just by going through the API of the problematic system. Big Pane would need to contact our SAN management module in the SAN Director System and tell it to allocate some more disk to our system. Luckily almost all of our systems use the same SAN for storage so this wouldn't be too difficult. You'd just need to tell it what kind of storage is needed, how much you want, and what system it goes to.
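The three things the storage request needs (what kind, how much, and for which system) map naturally onto a small validated record. This sketch only builds and checks the request; the storage kinds, the 50 GB cap, and the field names are assumptions, and nothing here is the SAN Director System's real interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRequest:
    storage_kind: str
    size_gb: int
    target_system: str


ALLOWED_KINDS = {"ssd", "sas", "nearline"}  # hypothetical storage tiers
MAX_AUTO_GB = 50  # assumed cap on what may be allocated without a human


def build_storage_request(storage_kind: str, size_gb: int,
                          target_system: str) -> StorageRequest:
    """Validate the three inputs and return an immutable request record."""
    if storage_kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown storage kind: {storage_kind!r}")
    if not 0 < size_gb <= MAX_AUTO_GB:
        raise ValueError(f"size must be between 1 and {MAX_AUTO_GB} GB")
    if not target_system:
        raise ValueError("target system is required")
    return StorageRequest(storage_kind, size_gb, target_system)
```

A cap like `MAX_AUTO_GB` is worth having in any automatic fix, so that a misdiagnosis at 2:00 AM cannot quietly eat the whole array.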
Now, you'd think that was all we needed to do, but you'd be wrong. You see, even the newest operating systems don't automatically recognize all changes, especially hardware changes like added disk space, memory, or CPU. You have to restart them or notify them somehow that they can begin using these new resources. The notification process is different for every operating system and sometimes differs between versions of the same operating system. This part might not be a problem if a system is unusable due to a performance problem and will need to be rebooted anyway, but what about really specialized systems that are just running a little slowly?
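As a sketch of that version-to-version difference on Windows Server: releases from 2012 onward ship the PowerShell Storage cmdlets, while older releases rely on `diskpart`. The function below only returns the commands it would run and executes nothing. The function name and the release-year cutoff logic are my own simplification, and the exact commands should be checked against the documentation for the release in question.

```python
from typing import List


def rescan_steps(release_year: int, drive_letter: str = "C") -> List[str]:
    """Return the commands that would make Windows see and use added disk space."""
    if release_year >= 2012:
        # PowerShell Storage module, available from Windows Server 2012.
        return [
            "Update-HostStorageCache",
            f"Resize-Partition -DriveLetter {drive_letter} "
            f"-Size (Get-PartitionSupportedSize -DriveLetter {drive_letter}).SizeMax",
        ]
    # Older releases: the same job as a diskpart script.
    return [
        "rescan",
        f"select volume {drive_letter}",
        "extend",
    ]
```

That is two code paths for one operating system family, before any other operating system enters the picture.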
One of the systems I support is used during open heart surgery. A patient is hooked up to a bunch of probes, sensors, and cameras that collect various pieces of data and display them for the surgeon in the operating room. Some of these sensors are snaked up through an incision in the leg all the way to the heart. If the system stops working during a procedure, the surgery has to pause until it is restarted. In that case you have a bunch of doctors standing there as motionless as possible over a patient lying unconscious on the table with his or her heart fully exposed and several wires running through his or her veins. You don't want this to happen, for obvious reasons.
The last task for Big Pane would be to send me a politely worded email advising me of the problem and how it was fixed. This is always done the same way so it would be very easy.
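This really is the easy part. A minimal sketch using Python's standard `email` package builds the message without sending it; the addresses, subject line, and wording are placeholders I made up.

```python
from email.message import EmailMessage


def build_report(system: str, cause: str, fix: str,
                 to_addr: str = "oncall@example.com",
                 from_addr: str = "bigpane@example.com") -> EmailMessage:
    """Compose the summary email; sending it is left to the caller."""
    msg = EmailMessage()
    msg["Subject"] = f"[Big Pane] {system}: problem fixed automatically"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(
        "Hello,\n\n"
        f"A performance problem was detected on {system}.\n"
        f"Cause: {cause}\n"
        f"Fix applied: {fix}\n\n"
        "Please schedule a follow-up to find out why it happened.\n"
    )
    return msg
```

Handing the result to `smtplib.SMTP(...).send_message(msg)` would deliver it, assuming a mail relay is available.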
Now that we have an idea of what this would take, imagine implementing it for 1500 different software systems simultaneously. And imagine what it would take to keep it up to date with all the regular changes in software, hardware, and configurations for all those applications. Even with a huge investment in development, you'd end up with something of limited usability and it would never be truly complete. And remember, the example we've discussed here is just for one scenario on a single system. Most actual situations are much more complex than this one.
I'm not lazy and I'm not stupid either. The effort and subsequent cost would be much larger than any reward from doing it could ever be. The logical alternative to building this behemoth would be to hire a few good engineers and managers and just pay them well to use several different systems and be proactive in their daily work. This is what we do today. Now I would ask the second question, "Should it exist?", but you probably already know the answer to that by now. Unfortunately, every time an IT executive goes to an industry conference, he or she reads about some proposed Holy Grail system in a SkyMall magazine at the airport and we start this quest all over again. If you're an IT executive and you're reading this, please stop. SkyMall is not the place where you should get ideas for how you're going to spend your organization's IT budget.