Test practice and experience of distributed storage products (1)

Source:   Editor: admin Update Time :2019-04-28

Background introduction 
Over the past few years, I have worked on the testing and development of distributed storage products, from their first launch through successive upgrades. I have taken part in the release of numerous versions that met users’ demand for our storage products. This article summarizes that working experience, and it would be a great honor if readers could take some inspiration from it. The following aspects will be covered.
1. Change of test roles in the development of distributed storage products
2. Test practice of distributed storage products
3. Problems in test practice
4. Experience


Change of test roles
In general, there are several roles in a project, including product manager, development engineer, test engineer, operation engineer and project manager.
Project development process
First, product managers collect users’ requirements, analyze business scenarios, and feed them back to development and test engineers. Second, development and test engineers discuss these requirements and define the functions to go online and their acceptance criteria. Third, project managers formulate the project plan and track progress. Fourth, development engineers write the code and hand it to the test engineers. Fifth, after completing the tests, test engineers hand it to the operation engineers. Finally, operation engineers release the product online.

The different division of test roles in different product periods
The entire development of distributed storage products can be divided into two phases.
Phase I: Rapid iteration and release of initial products
There is one cluster. The team consists of 2 development engineers, 1 test engineer, 1 operation engineer, 1 product manager and 1 project manager.
They publish a new product version every day for a week.
This phase is characterized by fewer clusters, fewer users, fewer visits and fewer functions.
The team focuses on rapid development and iteration, mainly to meet functional requirements, and continuous trial and error is allowed.
Work in this phase is built around rapid release.
Because the functionality under development is relatively simple, 1 test engineer is basically enough to meet business needs.
Problems:
Due to time constraints, testing can only be a trade-off. There is no standardized process, so more problems have to be repaired afterwards. In addition, each new requirement brings more frequent upgrades and tests, so the whole phase tends to fall into a vicious circle.

Test roles:
In this phase, test engineers mainly execute test cases. Beyond unit testing, development engineers take on essentially no testing tasks.
Phase II: Iteration and release of stable product
There are dozens of clusters. The team consists of more than 10 development engineers, 1 test engineer, 3 operation engineers, 2 product managers and 1 project manager.
They publish a new product version every two months.
This phase is characterized by more clusters, more users, more visits and more functions.
The focus is on product stability, and trial and error is no longer allowed.
Problems:
Unlike in the previous phase, upgrading becomes very troublesome because of the large number of clusters. Development engineers, test engineers and operation engineers belong to different departments. For development engineers, the faster a requirement goes online the better, ideally with some code released every day; their KPI is features going online. For test engineers, test time is insufficient and frequent launches bring quality risks; their KPI is product quality. For operation engineers, the fewer releases the better, since every release carries a risk of misoperation; their KPI is product stability. These inconsistent interests easily lead to conflict among the parties.

Test roles:
Test roles need to change once one individual can no longer complete so many test tasks.
The outcome finally agreed on was to extend the release cycle and reduce the release frequency.
Improvement of development process: Introduce design and code reviews, static code scanning, and test coverage requirements, in particular for unit test (UT) coverage, line coverage, and function and branch coverage.
Improvement of test process: Introduce test standards and strengthen automated testing. Test engineers no longer carry out all test tasks; they assign some of them to development engineers and concentrate on evaluating test scope, test plans and test cases.
Improvement of operation process: Introduce an automated release process and strengthen online monitoring.
In this phase, test engineers mainly establish a set of testing mechanisms, while development engineers take on test tasks.

The positioning of test roles from the perspective of company
In the process of product development, the company's positioning for test roles is constantly changing.
At first, test engineers and development engineers are on one team. Then, they belong to different teams. After that, they are on the same team again. Finally, there are no full-time test engineers, only full stack engineers.
Everyone may understand the term full stack engineer differently. I think full stack engineers are able to develop, test and operate. The concept has both supporters and opponents, and both sides have a point, just as we need large, comprehensive department stores as well as small, refined specialty stores. All decisions of the company are premised on supporting business needs. What we need to do is embrace those decisions, strive to be multi-skilled and adapt to the rapidly changing environment.

Test practice of distributed storage products 
What do test engineers do when testing distributed storage products?
1. Content of work
Review of requirements and designs
Testing needs to be involved at every stage. The acceptance criteria need to be known at the time of the design review, which is the most important starting point. Users’ requirements are the benchmark for testing, and the acceptance criteria will deviate if those requirements are not understood.
Test scope
Since the go-live date is fixed and not every test can be run in the limited time, the test scope must be specified. This depends on test engineers' understanding of the whole system and their ability to communicate with development engineers.

Design and development of test cases
Design and development of test cases means writing the code of test tools or test cases based on the requirements. The common methods are described in testing books, so I will not say much about them here.
Design and maintenance of automated test framework
Only automated testing can free people from simple, repetitive and tedious work. A continuous integration mechanism is introduced to find problems in the code promptly.
Determination of test object
This work pins down the version to be tested, in order to ensure that the version finally tested is the version that goes online.
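One way to make this check concrete is to compare a digest of the build that was tested with a digest of the build about to go online. The sketch below is only an illustration, not the process used by the team: the function names and the choice of SHA-256 over package files are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_build(tested: Path, release_candidate: Path) -> bool:
    """True if the release candidate is byte-identical to the tested build."""
    return sha256_of(tested) == sha256_of(release_candidate)
```

Recording the digest in the test report would let the same comparison be repeated at release time.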
Implementation and feedback of tests
After completing the test plan, test engineers write a test report and record the problems found during testing in the bug tracking system. They then collect these results so that project managers can make a quality assessment. Although not comprehensive, the report is an important reference.
Statistical analysis of test results
Test engineers need to summarize test coverage and track the areas that remain uncovered.
Note that sufficient test coverage does not mean that testing is complete. It only means that the code has been executed; manual analysis of test completeness is still required.
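A small illustration of why coverage alone is not enough. The helper names below are hypothetical and the bug is deliberate: a single passing case executes every line of the buggy function, yet a boundary case that no coverage report asks for still gives the wrong answer.

```python
def chunk_count_buggy(size_bytes: int, chunk_size: int) -> int:
    """Number of chunks needed to store an object (deliberately wrong)."""
    return size_bytes // chunk_size  # drops a trailing partial chunk


def chunk_count(size_bytes: int, chunk_size: int) -> int:
    """Corrected version: round up so a partial chunk is counted."""
    return (size_bytes + chunk_size - 1) // chunk_size


# This one case gives the buggy function 100% line coverage and passes.
assert chunk_count_buggy(8, 4) == 2

# A case chosen by analysis rather than by coverage exposes the defect.
assert chunk_count_buggy(9, 4) == 2  # wrong answer: 9 bytes need 3 chunks
assert chunk_count(9, 4) == 3
```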
On-line confirmation and writing of release memorandum
The final online version and its configuration files need to be confirmed. In addition, test engineers should notify partners by email of all functions going online.
Tracking and feedback of online problems
Online problems need to be tracked and fed back so that the same problems do not recur in the next version.
The development and testing of distributed storage products is a huge project. The tests involved need to be categorized and graded, which is why test grades are introduced.
2. Test grade
Test grade; Test resources; Test purpose; Test frequency
Level 1: Unit test; Single machine; Independent of other environments; Complete code function test; Use mocks to remove environmental dependencies; On every code submission.
Level 2: Functional test; Small clusters; Simulate real scenarios; Complete function test; Depends on other modules; On every code submission.
Level 3: System test; Small clusters; Simulate real scenarios; Complete system test; Combination of functions; Depends on other modules; On every code submission.
Level 4: Primary performance test; Medium clusters; Simulate real scenarios; Complete performance test; Focus on latency, QPS, latency-spike (burr) rate, throughput and other indicators; Depends on other environments; On every release.
Level 5: Secondary performance test; Medium clusters; Simulate real scenarios; Complete stress test and failover test; Focus on system performance when CPU, memory, network and other resources are exhausted or unavailable; On every release.
Level 6: Data compatibility and upgrade test; Small clusters; Simulate real scenarios; Complete storage-related and online-release-related tests; On every release.
Level 7: End-to-end simulated user scenario test; Large clusters; Simulate user scenarios; Get test data; On every release.
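The Level 4 indicators in the list above can be derived from raw request latencies. The following is a rough sketch under stated assumptions: the article does not define the burr rate, so it is taken here to be the fraction of requests slower than a multiple of the median latency, and the names `PerfSummary`, `summarize` and `burr_factor` are invented for the example.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PerfSummary:
    qps: float        # completed requests per second
    p50_ms: float     # median latency
    p99_ms: float     # 99th percentile latency (nearest-rank)
    burr_rate: float  # assumed: share of requests slower than burr_factor * p50


def summarize(latencies_ms: Sequence[float], duration_s: float,
              burr_factor: float = 5.0) -> PerfSummary:
    """Summarize one performance run from per-request latencies."""
    if not latencies_ms or duration_s <= 0:
        raise ValueError("need at least one sample and a positive duration")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    p50 = statistics.median(ordered)
    # Nearest-rank percentile: index ceil(0.99 * n) - 1, in integer arithmetic.
    p99 = ordered[-(-99 * n // 100) - 1]
    burrs = sum(1 for x in ordered if x > burr_factor * p50)
    return PerfSummary(qps=n / duration_s, p50_ms=p50, p99_ms=p99,
                       burr_rate=burrs / n)
```

Throughput in bytes per second would need the request sizes as well, which this sketch leaves out.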
The main purpose of test grades is to divide up the work.
Code is not accepted for testing until development engineers have completed the unit tests and functional tests. Of course, this can be traded off when an emergency release is needed.
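As an illustration of a Level 1 unit test that uses a mock to remove environmental dependencies, the sketch below tests a retry helper without any real storage node. `put_with_retry` and the `client.put` interface are hypothetical names chosen for the example; only `unittest.mock` from the Python standard library is real.

```python
from unittest import mock


def put_with_retry(client, key: str, data: bytes, attempts: int = 3) -> int:
    """Call client.put up to `attempts` times; return the attempt that succeeded."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            client.put(key, data)
            return attempt
        except ConnectionError as exc:
            last_error = exc
    raise last_error


def test_retries_after_transient_failure():
    client = mock.Mock()
    # The mocked node fails once, then accepts the write.
    client.put.side_effect = [ConnectionError("node down"), None]
    assert put_with_retry(client, "k", b"v") == 2
    assert client.put.call_count == 2


def test_gives_up_after_all_attempts_fail():
    client = mock.Mock()
    client.put.side_effect = ConnectionError("node down")
    try:
        put_with_retry(client, "k", b"v", attempts=2)
    except ConnectionError:
        assert client.put.call_count == 2
    else:
        raise AssertionError("expected ConnectionError")
```

Because the client is mocked, both tests run on a single machine and can be executed on every code submission.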
Different levels take different amounts of time; a unit test run and a performance test run, for example, do not take the same time. Level 1 and level 2 must both pass, while the later levels can be run selectively.
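The rule that levels 1 and 2 are mandatory while later levels are optional can be written down as a small selection function. This is a sketch under assumptions, not the team's tooling: the trigger names, the mapping of levels to triggers (taken from the test frequency listed for each level) and the function `select_levels` are all invented for the example.

```python
# Which event runs each level, following the frequency listed per level.
LEVEL_TRIGGER = {
    1: "commit", 2: "commit", 3: "commit",
    4: "release", 5: "release", 6: "release", 7: "release",
}
MANDATORY_LEVELS = frozenset({1, 2})


def select_levels(trigger: str, skip: frozenset = frozenset()) -> list:
    """Return the levels to run for a trigger, refusing to skip mandatory ones."""
    if trigger not in LEVEL_TRIGGER.values():
        raise ValueError(f"unknown trigger: {trigger}")
    forbidden = MANDATORY_LEVELS & set(skip)
    if forbidden:
        raise ValueError(f"levels {sorted(forbidden)} cannot be skipped")
    return [level for level, t in sorted(LEVEL_TRIGGER.items())
            if t == trigger and level not in skip]
```

For example, `select_levels("release", skip=frozenset({5}))` leaves out the secondary performance test for a release that is short on time.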
Allocation of test resources