Wednesday, February 28, 2007

Version Management During the Software Integration Testing Phase

What I mean by the software integration testing phase is the version management that begins once the development team has released the first testable build. Two points need clarifying: 1) This does not mean that testing only starts after the first release; testing runs through the entire software development life cycle, and it is not necessarily limited to code; it can also include reviews of documents. 2) Why do I say "integration testing"? Because in the book "The Art of Software Testing", "system testing" refers to non-functional testing, so I use "integration testing" to mean testing of the system's functionality (as opposed to unit testing).
This reminds me of the testing I organized for the 安徽通彩网 project a while ago. It was chaotic: the test team and the development team lacked effective communication, largely because version management itself was chaotic. Whenever we talked about a bug, the bug was never pinned to a well-defined software version; we always said "the latest version", but obviously many bugs were not in the latest version. If we always say "the latest version", it really means we have no version management at all. In addition, although the test team adopted an iterative testing approach, the development team had no clearly defined development process. In fact, the testing process a test team adopts should be determined by the development team's process: if development is iterative, each iteration produces a testable build, and naturally the test team should test iteratively as well. On that project I defined each test iteration as three days; within each iteration all test cases had to be re-executed, the test environment had to be rebuilt, and the software versions were, in effect, defined by the test team... All I can say now is that at least the test team followed a testing process, and having one is better than having none.
Yesterday I read the book "Version Control with Subversion". Its section on branching describes a version management process for the release phase of a project, in which branches play the key role (the book defines several common branching patterns, including the release branch and the feature branch); the pattern used here is the release branch. I quote the passage directly:
Here's where version control can help. The typical procedure looks like this:
• Developers commit all new work to the trunk. Day-to-day changes are committed to /trunk: new features, bugfixes, and so on.
• The trunk is copied to a “release” branch. When the team thinks the software is ready for release (say, a 1.0 release), then /trunk might be copied to /branches/1.0.
• Teams continue to work in parallel. One team begins rigorous testing of the release branch, while another team continues new work (say, for version 2.0) on /trunk. If bugs are discovered in either location, fixes are ported back and forth as necessary (whether a bug is found on the trunk or on the branch, its fix should be merged to the other side). At some point, however, even that process stops. The branch is “frozen” for final testing right before a release.
• The branch is tagged and released. When testing is complete, /branches/1.0 is copied to /tags/1.0.0 as a reference snapshot. The tag is packaged and released to customers (the release should be defined jointly by the development and test teams; for example, once the test team declares that all tests have passed, the development team can publish the new release).
• The branch is maintained over time. While work continues on /trunk for version 2.0, bugfixes continue to be ported from /trunk to /branches/1.0. When enough bugfixes have accumulated, management may decide to do a 1.0.1 release: /branches/1.0 is copied to /tags/1.0.1, and the tag is packaged and released (even after the software has officially shipped, bugs may still be found; once they are fixed, a bugfix release like this is published).
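To make the procedure concrete, here is a minimal sketch of the corresponding Subversion commands. The repository URL, revision number, and log messages are hypothetical, and the merge syntax is the pre-1.5 style (Subversion has no merge tracking as of this writing), so the ported revision must be named explicitly:

    # Cut the release branch from the trunk
    svn copy http://svn.example.com/repos/project/trunk \
             http://svn.example.com/repos/project/branches/1.0 \
             -m "Create the 1.0 release branch"

    # Port a bugfix (say, trunk revision r1234) to the branch
    svn checkout http://svn.example.com/repos/project/branches/1.0 branch-1.0
    cd branch-1.0
    svn merge -c 1234 http://svn.example.com/repos/project/trunk
    svn commit -m "Port r1234 from trunk to the 1.0 branch"

    # Tag the tested branch as the release snapshot
    svn copy http://svn.example.com/repos/project/branches/1.0 \
             http://svn.example.com/repos/project/tags/1.0.0 \
             -m "Tag release 1.0.0"

Because Subversion itself does not record which revisions have been merged, naming the ported revision in the log message, as above, is what lets the team see which fixes have already been ported.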

Sunday, February 25, 2007

On Performance Testing

As I see it now, performance testing is a highly specialized topic; it is not something just anyone can do by picking up a test tool, because results obtained that way can be very far from reality, and they end up misleading the developers into spending enormous effort in the wrong direction. For my part, I think two things need to be made clear about performance testing: 1) What is performance testing, and what forms does it take? 2) Which metrics should a performance test measure, and how should the results be analyzed?
A few days ago I found a forum dedicated to software testing, www.sqaforum.com, which has a section on performance testing. It seems many people are, like me, unclear about much of it, especially the relationship among performance test, load test, and stress test. I am reposting a well-received reply:
BASIC DEFINITIONS

This is an excerpt from my forthcoming book on performance and load testing.

While there is no universal consistency in how people use terms like performance test and robustness test, I can say that the definitions provided here are as much in the mainstream as any others.

The Definition of Performance Testing

The purpose of performance testing is to measure a system’s performance under load. As Humpty Dumpty said, a word can mean whatever one chooses it to mean, so it is worth our time to examine what we mean by the words “measure”, “performance” and “load”.

Performance testing is a measurement of performance characteristics, although sometimes the use of the word “testing” confuses people. Some performance professionals feel strongly that it is important to not use the term “performance testing”, but to call it performance measurement instead. They are concerned that this measurement will get confused with feature testing and debugging, which it is not. They point out that measurement is only testing if the collected measurements are checked against pre-established goals for performance, and that measurement is often done without preconceptions of required performance.

These people have a good point: clarity of terminology is important. But since most people use the term “performance testing” we will go with the majority and use it too.

The term performance can mean response time, throughput, availability, error rate, resource utilization, or another system characteristic (or group of them) which we are interested in measuring. (“All promise outruns performance.” Ralph Waldo Emerson)

Performance testing simulates the typical user experience under normal working conditions. The load is a typical, representative mix of demands on the system. (And, of course, there can be several different representative loads -- the work load at 2 p.m., at 2 a.m., etc.) Another name sometimes used for a performance test is a capacity test, though there is a minor difference in these terms as we will see later.

First, the performance testers need to define what the term performance means in a specific test situation -- that is, what the objectives are and what we need to measure in the test. The answer to this question is that we measure performance usually as a weighted mix of three characteristics of a system: throughput, response time and availability. In real-time systems, for example, the users need a guarantee that a task will always be completed within a fixed time limit. Performing a task correctly but a millisecond too late could literally be fatal.

The term load simply means the mix of demands placed on a system while we measure its performance and robustness characteristics. In practice, most loads vary continually, so later we will address the challenge of determining the most appropriate load(s) for testing. The terms work load and benchmark are sometimes used as synonyms for load. A benchmark usually means a standard load, one used to compare the performance of systems, system versions, or hardware environments, but the benchmark is not necessarily the actual mix of demands at any one user installation. The term work load is a synonym for a load, and you see both of the terms in this book: they are interchangeable.

Definition of Load Testing

In contrast to a performance test, a load test is a measurement of performance under heavy load: the peak or worst-case conditions. Because loads can have various sizes, more precise terms for this type of testing are peak-load testing or worst-case-load testing.

A performance test usually is done with a typical, representative load, but this measurement may not tell us much about the system’s behavior under heavy load. For example, let’s assume that the peak load on a system is only 15% more than the average load. The system performance may degrade gracefully – the system runs 15% slower at peak load. Often, though, the performance under load is non-linear: as the load increases by a moderate amount (in this case, 15%), the response time does not increase by a comparable percentage but instead becomes infinite because the system fails under the increased load.

Definition of Stress Testing

A stress test is one which deliberately stresses a system by pushing it beyond its specified limits. The idea is to impose an unreasonable load on the system, an overload, without providing the resources which the system needs to process that load.

In a stress test, one or more of the system resources, such as the processor, memory, or database I/O access channel, often “maxes out” and reaches saturation. (Practically, saturation can happen at less than 100% of the theoretical usable amount of the resource, for many reasons.)

This means that the testware (the test environment, test tools, etc.) must be sufficiently robust to support the stress test. We do not want the testware to fail before we have been able to adequately stress the system.

Many bugs found in stress testing are feature bugs which we cannot see with normal loads but are triggered under stress. This can lead to confusion about the difference between a feature bug and a stress bug. We will address this issue in the upcoming section entitled: “Testing Performance and Robustness versus Features”.

Some testers prize stress testing because it is so fruitful in finding bugs. Others think it is dangerous because it misdirects projects to fix irrelevant bugs. Stress testing often finds many bugs, and fixing these bugs leads to significant delays in the system delivery, which in turn leads to resistance to fixing the bugs. If we find a bug with a test case or in a test environment which we can’t connect to actual use, people are likely to dismiss it with comments like “The users couldn’t do that”, “… wouldn’t do that”, or “… shouldn’t do that”.

Stress, Robustness and Reliability

Although stress, robustness and reliability are similar, the differences among them mean that we test them in related but different ways.

We stress a system when we place a load on it which exceeds its planned capacity. This overload may cause the system to fail, and it is the focus of stress testing.

Systems can fail in many ways, not just from overloading. We define the robustness of a system by its ability to recover from problems; its survivability. Robustness testing tries to make a system fail, so we can observe what happens and whether it recovers. Robustness testing includes stress testing but is broader, since there are many ways in which a system can fail besides overloading.

Reliability is most commonly defined as the mean time between failure (MTBF) of a system in operation, and as such it is closely related to availability. Reliability testing measures MTBF in test mode and predicts what the system reliability will be in live operation.

Robustness and reliability testing are discussed in the companion volume to this book, entitled “System Robustness Testing”.

Ross

---------------

The gist here is that load tests and stress tests are both special cases of performance tests. More information can be found at http://www.qaforums.com/Forum2/HTML/000724.html, and another resource with real practical value is http://www.codeplex.com/PerfTesting.
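To illustrate the distinction in practice, here is a minimal sketch using ApacheBench (ab); the URL, request counts, and concurrency levels are made-up examples, chosen only to show that the test type is determined by the load profile, not by the tool:

    # Performance test: a typical, representative load
    ab -n 1000 -c 10 http://www.example.com/app/page

    # Load test: the expected peak (worst-case) load
    ab -n 10000 -c 100 http://www.example.com/app/page

    # Stress test: deliberately beyond the specified capacity
    ab -n 100000 -c 1000 http://www.example.com/app/page

In each run the metrics to watch are the same (requests per second, response-time percentiles, failed requests); what changes between a performance test, a load test, and a stress test is only the load placed on the system.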