University of Minnesota
Software Engineering Center

Software Inspections

Date of Event: 
Thursday, September 5, 1996 - 6:30pm to 8:00pm

We had 17 people attend last month's gathering. After attending to TwinSPIN
business and announcements in the front end, Steve Kan wrapped up his discussion
on Inspections a la IBM at Rochester by talking from 6:30 until 8pm. There
were many questions and much involvement. It was a good meeting, in spite
of the relatively small attendance. Following is a detailed description
of the meeting provided by Gail Bertossi.

Business Meeting

Sponsors:

The cost of using Dunwoody is $50 per meeting. If your company is willing
to subsidize a meeting, contact me, Jesse Freese (882-0800).
Special thanks to all the sponsoring organizations and their representatives.
Are you willing to give a presentation at a SPIN meeting?
Some people volunteered at the meeting. All volunteers should contact Jesse
Freese to book a date and a topic.
Steve Gitelis volunteered to lead a discussion with a talk on test automation
- November?
Burke Maxey volunteered to lead a discussion on "SPI-ROI", that is, measuring
the cost of "quality" - December?
Mel Brauns and Tim Olson agreed to work together to contact SEI for a speaker
to kick off the Twin SPIN this year.
In addition to the topics listed on the agenda, folks wanted to add
"Culture change" as a potential topic/theme for the future.
Feature Presentation:

Steve Kan of IBM, Rochester, brought copies of his presentation 'Software
Inspection: The IBM Rochester Approach' given at the 1996 SEPG National
Meeting. Last May, he gave most of the presentation at the Twin SPIN meeting.
Tonight he finished the presentation and also provided some supplemental
material on using test data to indicate readiness for shipping the product.
This provides a nice lead-in to testing, the tentative topic for the next
meeting.

Steve indicated that projects sought in-process measurements to answer
the following types of questions: How is the product development proceeding?
Is quality improving? Are the defect removal activities on track? How good
is this release compared to previous releases?
One approach is to analyze the defects from a phase (e.g., code) to
determine where they were introduced (code, design, or requirements phase)
as an indication of the readiness of the product. If defects are caught
in the phase where the work was done (code defects in the code phase),
the product is progressing well. If the defects are from an earlier phase,
then the product is probably not ready and significant rework may be required.
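
As a rough illustration of this phase-of-origin analysis, the sketch below
tallies the defects found in one phase by the phase in which they were
introduced. The defect records, the phase names, and the helper functions
`origin_breakdown` and `escaped_fraction` are hypothetical; they are not IBM
data or tooling.

```python
from collections import Counter

# Hypothetical defects found during the code phase. Each entry names the phase
# in which the defect was introduced (illustrative values, not IBM data).
defects_found_in_code_phase = [
    "code", "code", "design", "code", "requirements", "code", "design", "code",
]


def origin_breakdown(origins):
    """Return the share of defects attributed to each phase of origin."""
    counts = Counter(origins)
    total = len(origins)
    return {phase: count / total for phase, count in counts.items()}


def escaped_fraction(origins, current_phase):
    """Share of defects whose origin is not the current phase.

    Assumption: anything not introduced in the current phase escaped from an
    earlier phase, since later phases have not happened yet.
    """
    if not origins:
        return 0.0
    escaped = sum(1 for phase in origins if phase != current_phase)
    return escaped / len(origins)


breakdown = origin_breakdown(defects_found_in_code_phase)
escaped = escaped_fraction(defects_found_in_code_phase, "code")
print(breakdown)
print(f"Share of defects from earlier phases: {escaped:.3f}")
```

A high share of defects from earlier phases would suggest, per the reasoning
above, that significant rework may still be ahead.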
A better approach is to use an effort-outcome matrix to answer the questions.
It is misleading to use defect counts as the single indicator of quality.
The effort spent detecting the defects must be measured as well, because the
defect rate can be artificially decreased simply by stopping testing. Effort
and defect data are needed together to verify that a lower defect discovery
rate reflects a sufficient amount of testing.
Data from the previous release provide a baseline for comparison with
the current release.

Steve showed a table that indicated how effort and defect counts were
related for inspections. (IBM's experience was that the quality of the
inspection has a high correlation with the effort expended in preparation.)

Effort   Defects   Assessment
HI       LO        Good inspection / good product
HI       HI        Good inspection / not bad product
LO       HI        Poor inspection / bad product
LO       LO        Poor inspection / unsure product

When the effort was higher than the baselined effort, the results of the
inspection were accurate and indicative of the quality of the product.
Inspectors were encouraged to keep up the good work. If defect counts were
higher than the baseline, then corrective action could be taken to improve
the product's quality.

When the effort was less than the baselined effort, the results of the
inspection were suspect. If defect counts were high from a low-effort inspection,
the product was known to be bad. But if both the defect count and the effort
were low, the quality of the product was unknown. Corrective action was
indicated whenever the inspection effort was low.
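
The effort/outcome reasoning above can be sketched as a small classifier.
This is a minimal illustration, not IBM's tooling: the function name `assess`,
the units, and the treatment of values exactly equal to the baseline are
assumptions made for the example.

```python
def assess(effort, defects, baseline_effort, baseline_defects):
    """Classify one inspection against the previous release's baseline.

    Returns (quality of the inspection, what it suggests about the product).
    Assumption: effort equal to the baseline counts as HI, and a defect count
    equal to the baseline counts as LO. The source does not define ties.
    """
    effort_high = effort >= baseline_effort
    defects_high = defects > baseline_defects
    if effort_high and not defects_high:
        return ("good inspection", "good product")
    if effort_high and defects_high:
        return ("good inspection", "not bad product")
    if defects_high:
        return ("poor inspection", "bad product")
    return ("poor inspection", "unsure product")


# Hypothetical numbers: preparation hours and defects found per inspection.
result = assess(effort=12.0, defects=3, baseline_effort=10.0, baseline_defects=5)
print(result)  # ('good inspection', 'good product')
```

Only the two HI-effort outcomes are treated as trustworthy. Both LO-effort
outcomes call for corrective action, whatever the defect count.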

An alternate method was tried to assess the quality of inspections without
collecting effort or defect data. Each inspection team was asked to complete
a survey listing a series of items and a rating scheme of 1 (low) to 10
(high), indicating how well the inspection covered each item. The expectation
was to relate the inspection survey results to the results of testing the
same modules. The method did not prove to be useful due to a lack of data
points (1 per inspection) and an initial tendency for inspectors to rate
all items as average.

IBM recorded the number of inspectors at each inspection in addition
to collecting the effort for preparation, meeting, and rework. The data
indicated appropriate team sizes for inspections of various work products:
high level design, 8-10; low level design, 6-8; and code, 3-4.
Using data from inspections and testing and extrapolating the data for
problem reports from the field, IBM Rochester arrived at an approximate
ratio of 1:13:92 for finding and fixing problems during inspections, testing,
and field usage, respectively. This is similar to the ratio of 1:20:82 reported
in the 1980s by IBM Santa Teresa.
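
To make the Rochester ratio concrete, the sketch below applies it to two
made-up distributions of the same 100 problems. It assumes the ratio expresses
the relative cost of finding and fixing one problem at each stage. The problem
counts are invented for illustration and are not IBM data.

```python
# Relative cost of finding and fixing one problem at each stage, taken from
# the 1:13:92 ratio quoted above (inspection : testing : field usage).
RELATIVE_COST = {"inspection": 1, "testing": 13, "field": 92}


def total_relative_cost(found):
    """Sum relative cost units for a mapping of stage -> number of problems."""
    return sum(RELATIVE_COST[stage] * count for stage, count in found.items())


# Two hypothetical ways the same 100 problems might be caught.
mostly_inspected = {"inspection": 70, "testing": 25, "field": 5}
mostly_tested = {"inspection": 20, "testing": 70, "field": 10}

cost_a = total_relative_cost(mostly_inspected)
cost_b = total_relative_cost(mostly_tested)
print(cost_a, cost_b)  # 855 1850
```

Under these assumptions, shifting detection toward inspections cuts the total
relative cost by more than half, mostly because fewer problems reach the field.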

Steve summarized the main points of his presentation as follows:
Software reviews & inspections are effective and efficient ways to
  improve quality.
Reviews & inspections vs. testing:
    reviews/inspections are deductive; testing is inductive
    reviews/inspections find design problems
    reviews/inspections effectively debug error recovery code
    reviews/inspections complement testing
Design reviews/inspections are very important regardless of what process
  model or technology is used.
Good preparation is essential for an effective, efficient inspection.

Supplemental Material
For each release, IBM conducts a kickoff meeting with all affected groups
represented to define the process, release contents, and quality goals.
The process requires inspection of the High Level Design (HLD) with accurate
record keeping. Checklists are provided for good design consideration and
for good inspection practices. Emphasis is placed on I0 (HLD) inspections,
with less emphasis on I2 (code) inspections, because of IBM's experience and
because practitioners accept I0 inspections more readily than I2 inspections.
The quality strategy is to capture data at completion of a release for
analysis, sharing of lessons learned (good & bad), and establishing
a metrics baseline for the next release. The goal is to learn from past
experiences and improve the process and quality of the product.

Q. How is in-process product quality tracked during testing?
A. Steve identified several pre-GA (general availability) and post-GA
indicators used at IBM:
Pre-GA Indicators
defect arrival rate and pattern
testing progress (hours per CPU)
defect severity distribution
critical problem list
defect backlog
system crash history
calls to the support team on crashes and hangs
stability during system test
GA driver quality
early programs
use of product in-house
number of outstanding defects
reliability growth model to estimate field defects
Post-GA Indicators
defect arrival rate and pattern
defect severity distribution
defect backlog
error prone component analysis
critical situations
customer satisfaction
support line analysis
The effort/outcome approach used for inspections was also useful for testing.

Testing effort   Defects   Assessment
HI               LO        Good test / good product
HI               HI        Good test / not bad product
LO               HI        Poor test / bad product
LO               LO        Poor test / unsure product

The goal is to reach HI test effort and LO defect count before development
test is complete. All other outcomes indicate that corrective action is
needed. When the testing effort is HI, the results of the testing are accurate
and can be used to make decisions. When the effort is LO, the data is less
accurate and remedial action is taken.

Additional test metrics include test coverage and test environments
(having simple & complex customer-like environments). Using a baseline
from the previous release, the analyst can chart the data showing % tests
attempted and % tests successful over time. Steve showed a second chart
with problem report arrivals over time; this was the key indicator for
the reliability growth curve. Time was measured in weeks before GA. The
expectation was that the % of successful tests would increase while the number
of problem reports decreased to an acceptable level a few weeks before GA.
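
The test-progress data described above could be tabulated along the following
lines. This is only a sketch: the weekly figures, the planned test count, and
the choice to express both percentages relative to planned tests are
assumptions for illustration.

```python
# Hypothetical plan size and weekly status (illustrative numbers only).
PLANNED_TESTS = 400
weekly_status = [
    # (weeks before GA, tests attempted so far, tests successful so far,
    #  problem reports arriving that week)
    (8, 120, 80, 45),
    (6, 220, 170, 38),
    (4, 320, 280, 20),
    (2, 390, 370, 6),
]


def progress(status, planned):
    """Return rows of (weeks before GA, % attempted, % successful, reports)."""
    rows = []
    for weeks, attempted, successful, reports in status:
        rows.append(
            (weeks, 100 * attempted / planned, 100 * successful / planned, reports)
        )
    return rows


rows = progress(weekly_status, PLANNED_TESTS)
for weeks, pct_attempted, pct_successful, reports in rows:
    print(
        f"{weeks:>2} weeks before GA: {pct_attempted:5.1f}% attempted, "
        f"{pct_successful:5.1f}% successful, {reports} problem reports"
    )
```

Plotting the same columns from the previous release alongside these rows would
give the baseline comparison described above.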

Integration can be a troublesome area if the development organization is
rewarded for integrating on time without accompanying quality data. If schedule
is the only criterion, problems are transferred from development to system
test on schedule. Development organizations should have to verify the product
is problem-free before integration. To address such an issue, IBM has 2
integration managers, uses mini-builds to test the submitted product before
performing the system build, and does system builds on a weekly basis.
This provides the affected parties with an early indication of the quality
of the next system to be built. Projects focus on I0 inspections of high
level design and are required to conduct project-defined verification before
integration. These activities are improving the quality of the product
submitted for system test.

Steve showed 2 charts, used together to indicate the quality of the
test effort & the product before GA. One chart showed total crashes
per system per week; the goal is to reach zero a couple of weeks before GA.
The other chart showed CPU hours per system per day (test effort). As the
number of system crashes approaches zero, the manager uses the effort chart
to verify the number of testing hours is increasing. Ideally, the number
of hours is maximized when the number of crashes reaches zero. With both
charts, the manager can make an informed decision about the quality of
testing and the quality of the product as GA approaches.
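
One way to read the two charts together is sketched below. The function name
`ga_readiness`, the `effort_floor` threshold, and the sample data are
hypothetical. The source describes a manager's judgment, not a fixed rule.

```python
def ga_readiness(crashes_per_week, cpu_hours_per_day, effort_floor):
    """Combine the crash chart and the effort chart into one reading.

    crashes_per_week: total crashes per system per week, oldest first.
    cpu_hours_per_day: CPU hours per system per day for the same weeks.
    effort_floor: assumed minimum effort for the crash data to be trusted.
    Both lists need at least two entries so a trend can be checked.
    """
    latest_crashes = crashes_per_week[-1]
    latest_effort = cpu_hours_per_day[-1]
    effort_not_falling = latest_effort >= cpu_hours_per_day[-2]
    effort_sufficient = latest_effort >= effort_floor
    if latest_crashes == 0 and effort_not_falling and effort_sufficient:
        return "crashes at zero under sustained test effort"
    if latest_crashes == 0:
        return "crashes at zero, but test effort is low or falling: suspect"
    return "crashes still occurring: not ready"


# Hypothetical data for the last five weeks before GA.
crashes = [9, 6, 3, 1, 0]
rising_effort = [10, 14, 18, 20, 22]
falling_effort = [20, 18, 12, 8, 5]

good_case = ga_readiness(crashes, rising_effort, effort_floor=16)
suspect_case = ga_readiness(crashes, falling_effort, effort_floor=16)
print(good_case)
print(suspect_case)
```

The second case mirrors the LO effort / LO defects cell of the effort/outcome
matrix: zero crashes mean little if testing has tapered off.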