Sunday, February 25, 2007

On Performance Testing

As I see it now, performance testing is a highly specialized subject; it is not something just anyone can do by picking up a testing tool, because results obtained that way are likely to be far from the real picture, and in the end they mislead the developers, who then spend a great deal of effort going in the wrong direction. For my part, I think two things about performance testing need to be clarified first: 1) What is performance testing, and what forms does it take? 2) Which metrics should a performance test measure, and how should the results be analyzed?
A few days ago I found a forum dedicated to software testing, www.sqaforum.com, which has a section on performance testing. It seems many people are, like me, unclear about a lot of this, especially the relationship between performance test, load test and stress test. Below is a reply from that forum that was well received (reposted here):
BASIC DEFINITIONS

This is an excerpt from my forthcoming book on performance and load testing.

While there is no universal consistency in how people use terms like performance test and robustness test, I can say that the definitions provided here are as much in the mainstream as any others.

The Definition of Performance Testing

The purpose of performance testing is to measure a system’s performance under load. As Humpty Dumpty said, a word can mean whatever one chooses it to mean, so it is worth our time to examine what we mean by the words “measure”, “performance” and “load”.

Performance testing is a measurement of performance characteristics, although sometimes the use of the word “testing” confuses people. Some performance professionals feel strongly that it is important to not use the term “performance testing”, but to call it performance measurement instead. They are concerned that this measurement will get confused with feature testing and debugging, which it is not. They point out that measurement is only testing if the collected measurements are checked against pre-established goals for performance, and that measurement is often done without preconceptions of required performance.

These people have a good point: clarity of terminology is important. But since most people use the term “performance testing” we will go with the majority and use it too.

The term performance can mean response time, throughput, availability, error rate, resource utilization, or any other system characteristic (or group of them) that we are interested in measuring. “All promise outruns performance.” -- Ralph Waldo Emerson

Performance testing simulates the typical user experience under normal working conditions. The load is a typical, representative mix of demands on the system. (And, of course, there can be several different representative loads -- the work load at 2 p.m., at 2 a.m., etc.) Another name sometimes used for a performance test is a capacity test, though there is a minor difference in these terms as we will see later.

First, the performance testers need to define what the term performance means in a specific test situation -- that is, what the objectives are and what we need to measure in the test. Usually the answer is that we measure performance as a weighted mix of three system characteristics: throughput, response time and availability. In real-time systems, for example, the users need a guarantee that a task will always be completed within a fixed time limit. Performing a task correctly but a millisecond too late could literally be fatal.
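
For concreteness, here is a minimal Python sketch of such a measurement; the target URL and request count are placeholders I have made up, and a real test would use a proper load-generation tool, but it shows throughput, response time and error rate being derived from the same batch of timed requests:

    # Minimal sketch: time a batch of requests and summarize the results.
    # The URL and request count are illustrative placeholders.
    import time
    import urllib.request

    URL = "http://example.com/"   # hypothetical system under test
    REQUESTS = 50

    latencies, errors = [], 0
    start = time.time()
    for _ in range(REQUESTS):
        t0 = time.time()
        try:
            urllib.request.urlopen(URL, timeout=10).read()
        except Exception:
            errors += 1
        latencies.append(time.time() - t0)
    elapsed = time.time() - start

    latencies.sort()
    print("throughput: %.1f req/s" % (REQUESTS / elapsed))
    print("avg response time: %.3f s" % (sum(latencies) / len(latencies)))
    print("approx. 95th percentile: %.3f s" % latencies[int(0.95 * (len(latencies) - 1))])
    print("error rate: %.1f%%" % (100.0 * errors / REQUESTS))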

The term load simply means the mix of demands placed on a system while we measure its performance and robustness characteristics. In practice, most loads vary continually, so later we will address the challenge of determining the most appropriate load(s) for testing. The terms work load and benchmark are sometimes used as synonyms for load. A benchmark usually means a standard load, one used to compare the performance of systems, system versions, or hardware environments, but the benchmark is not necessarily the actual mix of demands at any one user installation. The term work load is a synonym for a load, and you see both of the terms in this book: they are interchangeable.
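
To make “a mix of demands” concrete, a work load is often written down as a weighted mix of operations; the sketch below is only an illustration (the operation names and weights are invented), showing how a load generator might draw the next simulated user action from such a mix:

    # Minimal sketch of a representative work load as a weighted mix of demands.
    # The operations and weights are invented placeholders.
    import random
    from collections import Counter

    WORK_LOAD = {              # e.g. the 2 p.m. mix; the 2 a.m. mix would differ
        "browse_catalog": 0.60,
        "search": 0.25,
        "place_order": 0.10,
        "admin_report": 0.05,
    }

    def next_operation(mix=WORK_LOAD):
        """Pick the next simulated user action according to the weighted mix."""
        ops, weights = zip(*mix.items())
        return random.choices(ops, weights=weights, k=1)[0]

    print(Counter(next_operation() for _ in range(1000)))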

Definition of Load Testing

In contrast to a performance test, a load test is a measurement of performance under heavy load: the peak or worst-case conditions. Because loads can have various sizes, more precise terms for this type of testing are peak-load testing or worst-case-load testing.

A performance test usually is done with a typical, representative load, but this measurement may not tell us much about the system’s behavior under heavy load. For example, let’s assume that the peak load on a system is only 15% more than the average load. The system performance may degrade gracefully – the system runs 15% slower at peak load. Often, though, the performance under load is non-linear: as the load increases by a moderate amount (in this case, 15%), the response time does not increase by a comparable percentage but instead becomes infinite because the system fails under the increased load.
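
A simple way to see this in practice is to step the offered load up and record the response time at each step; in the minimal sketch below the URL, user counts and request counts are invented placeholders, and a failing request will simply raise an exception, which is itself a useful signal that the knee of the curve has been passed:

    # Minimal sketch: step up the number of simulated users and watch how the
    # average response time grows. URL and load levels are placeholders.
    import concurrent.futures
    import time
    import urllib.request

    URL = "http://example.com/"   # hypothetical system under test

    def one_request(_):
        t0 = time.time()
        urllib.request.urlopen(URL, timeout=30).read()
        return time.time() - t0

    def run_load(users, requests_per_user=10):
        """Fire requests from `users` concurrent workers; return average response time."""
        with concurrent.futures.ThreadPoolExecutor(max_workers=users) as pool:
            times = list(pool.map(one_request, range(users * requests_per_user)))
        return sum(times) / len(times)

    for users in (10, 50, 100, 115):   # e.g. average load, then roughly 15% above it
        print("users=%3d  avg response time=%.3f s" % (users, run_load(users)))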

Definition of Stress Testing

A stress test is one which deliberately stresses a system by pushing it beyond its specified limits. The idea is to impose an unreasonable load on the system, an overload, without providing the resources which the system needs to process that load.

In a stress test, one or more of the system resources, such as the processor, memory, or database I/O access channel, often “maxes out” and reaches saturation. (Practically, saturation can happen at less than 100% of the theoretical usable amount of the resource, for many reasons.)
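
As a crude illustration, a stress driver just pushes far more concurrent work at the system than it is rated for and keeps doing so for a while; in the sketch below the URL, worker count and duration are arbitrary placeholders chosen to exceed a hypothetical specified capacity:

    # Minimal stress-test sketch: deliberately overload the target with far more
    # concurrent requests than it is specified to handle. All numbers are placeholders.
    import threading
    import time
    import urllib.request

    URL = "http://example.com/"   # hypothetical system under test
    WORKERS = 500                 # well beyond the system's specified capacity
    DURATION = 60                 # seconds of sustained overload

    ok = errors = 0
    lock = threading.Lock()

    def hammer(stop_at):
        global ok, errors
        while time.time() < stop_at:
            try:
                urllib.request.urlopen(URL, timeout=5).read()
                with lock:
                    ok += 1
            except Exception:
                with lock:
                    errors += 1

    stop_at = time.time() + DURATION
    threads = [threading.Thread(target=hammer, args=(stop_at,)) for _ in range(WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("completed:", ok, "errors:", errors)

While something like this runs, the interesting observations are usually on the server side: which resource (processor, memory, I/O) saturates first, and whether the system recovers once the overload stops.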

This means that the testware (the test environment, test tools, etc.) must be sufficiently robust to support the stress test. We do not want the testware to fail before we have been able to adequately stress the system.

Many bugs found in stress testing are feature bugs which we cannot see with normal loads but are triggered under stress. This can lead to confusion about the difference between a feature bug and a stress bug. We will address this issue in the upcoming section entitled: “Testing Performance and Robustness versus Features”.

Some testers prize stress testing because it is so fruitful in finding bugs. Others think it is dangerous because it misdirects projects to fix irrelevant bugs. Stress testing often finds many bugs, and fixing these bugs leads to significant delays in the system delivery, which in turn leads to resistance to fixing the bugs. If we find a bug with a test case or in a test environment which we can’t connect to actual use, people are likely to dismiss it with comments like: “The users couldn’t do that”, “... wouldn’t do that” or “... shouldn’t do that.”

Stress, Robustness and Reliability

Although stress, robustness and reliability are similar, the differences among them mean that we test them in related but different ways.

We stress a system when we place a load on it which exceeds its planned capacity. This overload may cause the system to fail, and it is the focus of stress testing.

Systems can fail in many ways, not just from overloading. We define the robustness of a system by its ability to recover from problems -- its survivability. Robustness testing tries to make a system fail, so we can observe what happens and whether it recovers. Robustness testing includes stress testing but is broader, since there are many ways in which a system can fail besides overloading.

Reliability is most commonly defined as the mean time between failure (MTBF) of a system in operation, and as such it is closely related to availability. Reliability testing measures MTBF in test mode and predicts what the system reliability will be in live operation.
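
As a small worked example of the arithmetic (the failure times and repair time below are invented), MTBF can be estimated from the intervals between observed failures, and steady-state availability can then be derived from MTBF and the mean time to repair (MTTR):

    # Tiny worked example: estimate MTBF from observed failure times (in hours
    # since the start of a test run). All numbers are invented for illustration.
    failure_times = [120.0, 260.0, 410.0, 545.0]   # hours at which failures occurred
    uptimes = [b - a for a, b in zip(failure_times, failure_times[1:])]
    mtbf = sum(uptimes) / len(uptimes)             # mean time between failures
    mttr = 2.0                                     # assumed mean time to repair, in hours
    availability = mtbf / (mtbf + mttr)            # standard steady-state relation
    print("MTBF = %.1f hours, availability = %.2f%%" % (mtbf, availability * 100))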

Robustness and reliability testing are discussed in the companion volume to this book, entitled “System Robustness Testing”.

Ross

---------------

The gist of this is that load tests and stress tests are special cases of performance tests. More information is available at http://www.qaforums.com/Forum2/HTML/000724.html, and another resource with very practical guidance is http://www.codeplex.com/PerfTesting.
