By Gregory Solovey & Phil Gillis
Continuous Integration (CI) is a software development practice where developers integrate their work frequently, usually daily, directly on the project’s main stream. CI consists of two activities: build and test. In this article we describe the test aspects of CI. Continuous Integration Test Framework (CITF) is the Alcatel-Lucent implementation for testing 4G wireless products, developed over several years. We are not offering or promoting this particular system, but rather describe solutions that can be used to build or select similar CITFs.
Before there was CI, there was test automation. Numerous groups, working on the same large project in several locations, developed their own automation methodology and tools. Unit testing was done by developers by adding test code; integration and system testers used a variety of CLI, GUI, SNMP, and load and stress test tools.
After the adoption of the CI paradigm, it soon became apparent that reusing the various independent test solutions at the mainstream level posed a challenge. The teams responsible for the build verification had to know the specifics of all tools, be able to debug and interpret their results promptly, and manually build all the permutations of test environments.
We looked at several existing CI test systems, but none seemed suitable for our situation of testing many configurations of embedded systems. We decided to build our own framework using MySQL and a web interface, with the following goals:
- Manage and allocate pools of resources, grouped by configuration. Have the ability to dynamically assemble multiple test environments for each build, release, and application.
- Create testware standards for test tools, or wrappers for all test tools, to make them look alike (use the same configuration data and convert their results to the standard hierarchical CITF format). Provide common interfaces for debugging and reporting the build status.
- Design a quick and easy way to define new software releases and projects, and integrate new test tools, testware, resource pools, and test environments.
- Intelligently select the appropriate test suites (sanity, regression, feature) whenever a build is completed to validate the integrity of the mainstream, existing, and new functionality.
1. Resource Management
Embedded systems require comprehensive test environments that include a variety of test tools, network elements, load simulators, and a large selection of end devices. Each build has to be tested on multiple test environments and resource permutations. It is impossible to have a dedicated test environment for each release/project/feature. Moreover, some teams do not want to share their resources, some teams have none at all, some teams care about particular resource details, and some do not know these details.
To satisfy these contradictory conditions the following approach was created:
A resource has a name, a pool it can belong to, selection attributes, and ownership attributes. The selection attributes enable resources to be identified in order to be selected at the execution stage. The ownership attributes, if defined, assign the resource to a user or group of users.
A test environment is presented as a set of resource placeholders that should be filled with resources when the test task is issued. Each placeholder is defined as a set of search attributes.
A test task description refers to one test environment and describes the ways to select resource(s) for environment placeholders. There are three ways of finding a resource for a placeholder: by resource name, from a pool, or from the resources that satisfy the search attributes. However, for each resource placeholder only one way should be used.
Every time a build is done, a new instance of a test task (testTaskExe) is issued. Based on its test task description, all placeholders need to be filled with real resources. The CITF performs a two-stage selection procedure:
Stage 1: Select available resource candidates based on a selection method (name, pool, attributes). A resource is considered available if it is not under test and if it is in an operational state.
Stage 2: From these candidates, select the permitted resources for the user who issued the request to test. The resource is permitted to be used if one of three conditions is true:
1. The resource user and group ownerships are not defined
2. If only group ownership is defined, the user should belong to this group
3. If user ownership is defined, it should match the user ID of the person who issued the test
A test task is started upon acquiring all the required resources. The states of its resources will be changed to “running”. The test task will start executing component by component, and each component knows how to build an environment from the selected resources. Upon completing the execution of a task, the resources are returned to the available state. If the returned resource is “unhealthy”, recovery mechanisms restore it to its initial state, making it ready for subsequent test runs.
2. Testware Management
Continuous integration deals with code before the production stage. This means the testware needs to be adaptive to frequent changes of many kinds, such as API and command syntax and semantic changes, and changes in requirements. When these changes have occurred, hardcoded syntax in testware will be hard to find and correct. The only way to achieve testware maintenance is by providing a strict relationship between architecture, requirements, and design documents, and by separating business functionality from implementation details.
2.1 Testware Hierarchy
A hierarchy of testware should reflect the architecture, requirements, and design documents that describe the object-to-test from the structural and behavioral views down to the implementation details. This hierarchy is described below (top-down):
A test (sanity, regression, functional, performance, etc.) is a collection of test sets.
A test set (TS) reflects the structural view of the object-to-test, as described by the system architecture. Examples include: a set of hardware components, a set of services, or a set of network configurations. A TS is a grouping of use cases.
A use case (UC) represents a subsystem from a behavioral (functional) point of view and is related to a specific requirement (scenario, algorithm, or feature). A UC consists of test cases.
A test case (TC) is a single verification act. A TC moves the object-to-test from an initial to a verification state, compares the real and expected results, and returns it back to its initial state. In most cases, returning the system to its initial state makes the TCs independent of each other. A TC is a series of test actions to move the object-to-test through the above phases (set, execution, verification, reset).
A test action (TA) is a single act of interaction between a test tool and the object-to-test. It supports the object-to-test’s interfaces (CLI, GUI, SNMP, HTTP). For example, a test action can be the execution of a single CLI command, a single interaction with the GUI, or the sending of a single HTTP request to the client.
The hierarchical testware presentation translates into unconditional execution of the test cases: all test sets are executed sequentially; all use cases inside each test set are executed sequentially; and all test cases related to each use case are executed sequentially.
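The unconditional, nested execution order can be sketched in a few lines. The dictionary shape and the `run` callables are hypothetical; the point is that every level is walked sequentially, with no branching.

```python
def run_test(test):
    """Execute a test hierarchy unconditionally: every test set, then every
    use case inside it, then every test case, all strictly in sequence."""
    results = []
    for test_set in test["test_sets"]:
        for use_case in test_set["use_cases"]:
            for test_case in use_case["test_cases"]:
                verdict = test_case["run"]()      # set, execute, verify, reset
                results.append((test_set["name"], use_case["name"],
                                test_case["name"], verdict))
    return results
```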
2.2 Testware External Presentation
The testware has to be updated as a result of changing the business rules, the API syntax, or the GUI appearance. To minimize the number of changes, the testware is organized externally into configuration files, test set files, test scripts, and test case libraries. Such presentation separates the implementation details from the business functions, making the testware independent of the environment. The test objects are reusable across releases, projects, and test types.
The relationships between the internal and external presentations are described below:
A configuration file contains a list of TSs that have to be executed. Each TS in a configuration file points to a test set.
A test set is a file containing a collection of test script names.
A test script contains one or more UC descriptions; each of them combines the TC calls. The actual TC descriptions are stored in a test library.
A test library contains the description of TCs as a sequence of TAs. In this manner, various UCs can reuse the same TCs from the test libraries.
A test action is a single act of communication with the interface of the object-to-test. It is presented as “action words”: set, send, push, capture, compare, repeat, etc. The TA parameters are CLI commands or GUI object methods and are implemented in the language of a specific test tool.
The configuration, test set, test script, and library are files that present testware. The narrow specialization of each file type serves a maintenance purpose: a single change in the code of the object-to-test (on the structural, functional, or syntax level) should lead to a single change in testware.
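The chain from configuration file down to test library can be illustrated with in-memory stand-ins for the four file types. All names and contents here are invented for illustration; real CITF files would hold release-specific data.

```python
# Hypothetical stand-ins for the four testware file types.
config = ["basic_calls.ts"]                   # configuration: TSs to execute

test_sets = {                                 # test set: test script names
    "basic_calls.ts": ["attach.script"],
}

scripts = {                                   # test script: UCs as TC calls
    "attach.script": {"UC_attach": ["tc_power_on", "tc_attach"]},
}

library = {                                   # test library: TCs as TA sequences
    "tc_power_on": [("send", "power on"), ("compare", "ok")],
    "tc_attach":   [("send", "attach"), ("capture", "state"),
                    ("compare", "attached")],
}

def expand(config):
    """Resolve the file chain down to executable test actions."""
    plan = []
    for ts_name in config:
        for script_name in test_sets[ts_name]:
            for uc_name, tc_calls in scripts[script_name].items():
                for tc in tc_calls:
                    plan.append((ts_name, uc_name, tc, library[tc]))
    return plan
```

Because each TC lives only in the library, a syntax change in one command is a single edit there, no matter how many UCs call it.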
3. Test Tool Management: Conform or Wrap
Embedded systems applications demand a variety of test tools and various independent test solutions. It is unreasonable to expect that the teams responsible for build verification know the specifics of all tools and are able to debug and interpret their results promptly.
The challenge is to make all test tools (CLI, GUI, load and performance) look alike and appear transparent to the tester. The solution is either to require each tool to conform to the testware framework or to create a wrapper for each test tool that serves as an interface between the test framework and the tool.
Before starting its tests, each wrapper dynamically creates its configuration files, which include references to selected tests, resources to be used, and the location of the results. The configuration files are built from templates, based on the release, project, and resource parameters. A debug file that is created during a test execution needs to follow a standard organization. Upon completion of a test, the results produced by different test tools are converted into a standard hierarchical format and uploaded to a results repository together with logs and traces.
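A wrapper of this kind can be sketched as a small base class. The interface, template fields, and result format below are illustrative assumptions, not CITF's actual API: the idea is that every tool builds its configuration from a template and converts native results into one standard shape.

```python
import abc
import string

class ToolWrapper(abc.ABC):
    """Illustrative common interface that every test tool is wrapped behind."""

    # Built per run from release, project, and resource parameters.
    CONFIG_TEMPLATE = string.Template(
        "tests=$tests\nresources=$resources\nresults_dir=$results_dir\n")

    def make_config(self, tests, resources, results_dir):
        """Dynamically create the tool's configuration file contents."""
        return self.CONFIG_TEMPLATE.substitute(
            tests=",".join(tests),
            resources=",".join(resources),
            results_dir=results_dir)

    @abc.abstractmethod
    def run(self, config):
        """Start the wrapped tool with the generated configuration."""

    @abc.abstractmethod
    def to_citf_results(self, raw):
        """Convert tool-native results into the standard hierarchical format."""

class EchoWrapper(ToolWrapper):
    """Minimal concrete wrapper used only to illustrate the interface."""
    def run(self, config):
        return {"raw": config}

    def to_citf_results(self, raw):
        return {"status": "pass", "source": "echo"}
```

A new tool is integrated by writing one such subclass; the framework never needs to know tool-specific details.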
The debug file should have a common look for all test tools, with an emphasis on what are the stimuli and responses, and how the comparison was made in order to let the tester write CRs that communicate the problem precisely to the developer.
To find a single result from the tens of thousands of test runs per day with a few mouse clicks, the results repository area needs to mirror (Figure 4):
- The build infrastructure, as a set of various release/project/ feature/developer streams
- The build test infrastructure as a set of test environments and applications to test
- The test structure as a tree of test sets, use cases, and test cases
Failed test cases can be filtered for known issues in order to avoid reanalyzing expected failures. The standard results presentation format supports five levels of test hierarchy, which are presented on the web.
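The known-issue filter amounts to partitioning failures against a table of expected ones. The data shapes below are assumptions for illustration; a real filter would key on richer identifiers than a test case name and reason string.

```python
def filter_known_issues(failures, known_issues):
    """Split failed test cases into new failures and expected (known) ones.

    failures:     list of (test_case, failure_reason) pairs
    known_issues: mapping of test_case -> expected failure reason
    """
    new, expected = [], []
    for tc, reason in failures:
        if known_issues.get(tc) == reason:
            expected.append((tc, reason))   # known problem: skip reanalysis
        else:
            new.append((tc, reason))        # needs a tester's attention
    return new, expected
```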
4. Test Process
New releases/projects/features, test environments, and testware have to be created daily. This requires interfaces for working with test-related objects: to create, to edit, to delete, to run, to monitor, and to report.
The management web interface has to provide a quick and easy way to define new software releases and projects, and integrate new test tools, testware, resource pools, and test environments directly into the database. The build server requests a test for a new build through an execution interface and is notified of test results and metrics.
The request to run a test is called a verification request (Figure 6). A verification request specifies a set of test tasks that are independent and can execute in parallel. Each test task (task for short) specifies a set of components, which are sets of tests to run sequentially, and is associated with a test environment, which specifies a set of resources that must be acquired before execution.
Each component calls its test tool to start a test. A test task monitors all components and, based on the database settings, can terminate a component if its execution runs too long, repeat its execution, or skip the execution of the subsequent components.
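The sequential component execution with per-component policies can be sketched as below. The policy names (`repeat`, `skip_rest`) are hypothetical labels for the database settings the text describes.

```python
def run_task(components, policy):
    """Run a task's components sequentially.

    Per-component policy (illustrative):
      'repeat'    -- rerun a failed component once before recording it
      'skip_rest' -- stop the task after this component fails
    """
    results = []
    for comp in components:
        ok = comp["run"]()
        if not ok and policy.get(comp["name"]) == "repeat":
            ok = comp["run"]()              # one automatic rerun
        results.append((comp["name"], ok))
        if not ok and policy.get(comp["name"]) == "skip_rest":
            break                           # skip subsequent components
    return results
```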
A considerable number of failed test cases are not real code errors, but are environment issues, such as random network glitches, testware mis-synchronization, or resource failures. The following built-in testing reliability features help test teams handle such errors:
- Starting test cases from an initial state and, in case of failure, returning the resource to its initial state through the test case’s recovery sequence.
- Using filters to mask expected test failures in case of known problems.
- Automatically rerunning some software components in case of intermittent failure during test execution.
- Recovering a hardware resource from a failure state when the resource is returned to the pool after a failed test execution.
- Manually updating test results after a tester reruns failed test cases, so that the build status is reported promptly.
- Maintaining redundancy of database and web servers, along with frequent backups for quick restoration of functionality in the event of a system outage or database corruption.
Metrics capture a snapshot of the quality of each build. The primary metrics are test object pass/fail result counts and test execution times. These are calculated at the test tool layer.
The second category of metrics is failure reasons, which are used to identify bottlenecks in the CI process. Sometimes failures unrelated to the code can occur, such as network glitches, database errors, testware mis-synchronization, etc. These data, collected and analyzed over time, can identify areas of CITF or the environment that should be improved.
The third category of metrics is coverage: feature, requirements, and code coverage. The percentage of code that was touched by tests does not prove that a test is complete (i.e. covers all possible errors) but rather reveals areas of the code that were not tested at all. The code coverage metrics can be useful in CI if the build consists of many independent layers and modules. An object’s quality can be defined as the quality of the weakest link. Therefore, it is good practice to request the same percentage of code coverage for all system components. This is especially important for new code deliveries, since it is the primary proof (along with requirements traceability) that the necessary automated tests were added along with the new code.
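The weakest-link rule for coverage can be expressed as a simple gate that holds every component to the same bar. The function and threshold here are illustrative, not a CITF feature.

```python
def coverage_gate(component_coverage, required_pct):
    """Weakest-link rule: every component must meet the same coverage bar.

    component_coverage: mapping of component name -> coverage percentage
    Returns (passed, failing_components).
    """
    failing = {c: pct for c, pct in component_coverage.items()
               if pct < required_pct}
    return (len(failing) == 0), failing
```

Applying the same threshold to new code deliveries makes the gate double as evidence that automated tests arrived with the code.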
The fourth category of metrics describes the quality of each build: code review data, code complexity, warnings, and memory leaks.
CI puts heavy demands on testing systems. We found that no commercial solution offered the functionality we needed, which is why we developed our own CI test framework. The development was driven by the demands of the test teams responsible for build validation, whose major constraint was the ability to determine the cause of failure in a short interval. As a result, we deployed five releases of CITF during the five years we have been in operation. We currently support four major releases, approximately thirty projects for each release, ten different test tools (commercial and in-house), and twenty embedded systems applications. We support a pool consisting of hundreds of geographically distributed physical resources. The system verifies tens of builds daily, by running hundreds of thousands of test cases.