Introduction: After a public failure early in the project, the IT team at a prominent university decided to move its web registration system to Linux, a change that nearly tripled the maximum number of users the server could handle.
In 1998, I was working for a new web team at the University of Minnesota, managing its server administration group. The University of Minnesota is a very large university, with nearly 6,000 undergraduate students in each institution. At that time, the school was using an aging mainframe as its student records system. That system was outdated, so changes had to be made.
The mainframe system was not Y2K compliant (LCTT translator's note: it stored years using only two digits, so the computer would record the year 2000 as 1900; see "What Does Y2K Compliant Mean?" for details), so the university was setting up a new student records system delivered by PeopleSoft. The new system did a lot for the University of Minnesota, not only managing student records but also providing other functions. However, it was missing one key feature: students could not register for classes over the web from a browser.
By today's standards this is a major oversight, but in the 1990s the Internet was still a young concept. Amazon had only recently been founded, eBay had been in business for only a year, Google had just been born, and Wikipedia did not yet exist. So it is not surprising that PeopleSoft did not support web registration for courses in 1998. However, the University of Minnesota was the birthplace of Gopher and had already built a web interface for the previous mainframe system, so we felt that web registration was crucial for the new student records system.
Our task on the web team was to build web registration for the new system.
Fortunately, we were not alone. We contacted IBM and, in the second year, started working together to build the new web registration system. IBM was responsible for providing the hardware and software environment to run it: three SP nodes running the latest AIX (IBM's UNIX operating system), IBM Java and IBM WebSphere, with an IBM load balancer sharing the load across the three nodes.
After more than a year of development and testing, our system finally went live. Unfortunately, failures followed.
Too much load
During development, we were unable to accurately simulate and test a real scenario in which many students log in at the same time. This was not for lack of a testing environment. The University of Minnesota had a customized web load testing package, and IBM had its own tools to supplement it. However, the web system was so unfamiliar to us at the time that we did not realize these testing tools were not up to the task.
After several months of testing, we set the expected load of the web system at 240 concurrent users. Unfortunately, actual usage was nearly twice that. On the first day the system went live, more than 400 students logged in at the same time. Because the load far exceeded what we had planned for, all three web servers went down. Under the sustained high load, the servers kept crashing and had to be restarted constantly. As soon as one machine came back up, another crashed and had to be restarted. This went on for a month.
Since registration could not be done reliably over the web, students had to register the old way: come to the registrar's office in person, fill out the registration with a pen, and leave. The local newspaper gloated: "The failure of computer software forced students to register face-to-face!"
Faced with this failure, we did our best to improve the software's performance in the next development cycle. Over the next six months, we worked frantically to increase the load the system could handle. Despite adding more code and adjusting many configuration settings, it still could not support more users. I tried my best, but we still faced failure.
As we feared, the next release failed as well. The servers went down again and again under load. This time the headline changed to: "Web registration system is rubbish".
We were desperate before starting the next six-month iteration. No one knew why the servers kept going down, and by then we expected the problem to be unsolvable on the current setup. We needed to do something about it, but what? Here is what we discussed.
Should we switch to a new platform?
Around that time, IBM had embraced Linux and ported its Java and WebSphere platforms to it. All of these products were certified on Red Hat's Linux distribution, and several of them were already running on our desktop systems. We realized that there was now a complete ecosystem on Linux for running our web registration system, but would it perform better than AIX?
After setting up a test server and running basic load tests, we were surprised to find that a single Linux server could easily handle the load that the three AIX servers had not been able to. Running the same web code on the same IBM Java and WebSphere platform, a single Linux server could support more than 200 users.
We shared this news with the registrar and the CIO, and they agreed to move the web registration system to Linux. Although this was the first time Linux would run in production at the University of Minnesota, failure had become a habit and we were no longer afraid. AIX was going to fail; Linux was our only hope.
We started work on the Linux platform immediately. Colleagues from another group provided several Intel servers for us to use. We installed Red Hat and the related IBM components on the servers, and then ran continuous load tests against the new systems. To our delight, the Linux servers showed no problems.
After two months of intensive development and testing, our new system finally went live, and it was a huge success. Under heavy load, the web registration system performed flawlessly on Linux. The peak number of concurrent users even exceeded 600. Linux saved the University of Minnesota's web registration system.
Lessons learned
When I look back on this project, I see several points you can use to introduce Linux to your own team:
1. Solve the problem and don't deceive yourself
When we proposed using Linux in the enterprise, it was not because we thought Linux was cool. Sure, we were Linux enthusiasts and ran it in our own environments, but at work our job was to solve problems. We were able to use Linux because our registrar and sponsors agreed that Linux was a solution to the problem, not just because Linux was cool and we wanted to use it.
2. Make changes as small as possible
Our success rested on the fact that IBM had already made its Java and WebSphere products available on Linux. This let us move the web system from AIX to Linux without many modifications or adaptations. Between the two setups, only the hardware and the operating system changed, while the other components stayed the same. That was the cornerstone of a successful platform switch.
3. Be honest about the risks and rewards
Our problem was obvious: the web registration system had failed in its first two releases and would likely fail again. When we told our sponsors about our idea of switching from AIX to Linux, we were fully aware of the risks and rewards involved. If we did nothing, we would simply fail again. If we tried switching to Linux, we might succeed, and judging from the initial test results, success was more likely than failure.
And even if the project still failed on Linux, we could quickly switch back to the AIX servers. With this careful analysis and these fallback measures in place, the registrar was finally comfortable letting us try Linux.
4. Communicate concisely
When we switched platforms, we made an overall plan. We wrote down exactly what we planned to do, and why, on a single sheet of paper. The key to this approach was the simplicity of the plan. Leaders do not like to read technical proposals as if they were novels, and they do not want to get bogged down in technical details. So we deliberately planned in detail for execution but described the plan at a high level.
While we were switching platforms, we regularly informed our sponsors of our progress. Once the new system went live, we sent daily updates reporting how many students had successfully registered through the system and what issues had come up.
Although nearly 20 years have passed since this project, the lessons learned are still relevant today. Linux played a critical role, but the most important thing is that we got everyone working toward the same goal of solving a shared problem. I think this experience applies to many of the situations you will face.