Thursday, November 01, 2007

Database Normalization And Design Techniques

One of the most important factors in dynamic web page development is database definition. If your tables are not set up properly, it can cause you a lot of headaches down the road when you have to perform miraculous SQL calls in your PHP code in order to extract the data you want. By understanding data relationships and the normalization of data, you will be better prepared to begin developing your application in PHP. ----
我们可以快速的开发一套系统,但是系统是否具有高可用性,是否支持大量用户的并发访问。前几天就有新闻,说是奥运门票的第二阶段销售开始,结果当天销售门票的网站就崩溃了,因为一下子有太多的用户去访问这个网站。 我想这种情况是可以预期的,谁都会知道抢购奥运门票的情形该有多火爆,所以应该是网站没有做好准备。 这里我主要关注系统的性能问题,即支持大并发量的用户访问,而所有决定系统性能的因素当中,数据库设计应该是最重要的,这是数据架构级别的问题,是决定性的,当然还有其他的因素,比如分布式系统的远程调用, 数据的缓存等等,这些也很重要。
我在网上找了一些资料,整理一下。
Basically, the Rules of Normalization are enforced by eliminating redundancy and inconsistent dependency in your table designs.

  1 引言
   数据库优化的目标无非是避免磁盘I/O瓶颈、减少CPU利用率和减少资源竞争。为了便于读者阅读和理解,笔者参阅了Sybase、Informix和 Oracle等大型数据库系统参考资料,基于多年的工程实践经验,从基本表设计、扩展设计和数据库表对象放置等角度进行讨论,着重讨论了如何避免磁盘 I/O瓶颈和减少资源竞争,相信读者会一目了然。

  2 基于第三范式的基本表设计

  在基于表驱动的信息管理系 统(MIS)中,基本表的设计规范是第三范式(3NF)。第三范式的基本特征是非主键属性只依赖于主键属性。基于第三范式的数据库表设计具有很多优点:一 是消除了冗余数据,节省了磁盘存储空间;二是有良好的数据完整性限制,即基于主外键的参照完整限制和基于主键的实体完整性限制,这使得数据容易维护,也容 易移植和更新;三是数据的可逆性好,在做连接(Join)查询或者合并表时不遗漏、也不重复;四是因消除了冗余数据(冗余列), 在查询(Select)时每个数据页存的数据行就多,这样就有效地减少了逻辑I/O,每个Cash存的页面就多,也减少物理I/O;五是对大多数事务 (Transaction)而言,运行性能好;六是物理设计(Physical Design)的机动性较大,能满足日益增长的用户需求。

  在基本表设计中,表的主键、外键、索引设计占有非常重要的地位,但系统设计人员往往只注重于满足用户要求,而没有从系统优化的高度来认识和重视它们。实际上,它们与系统的运行性能密切相关。现在从系统数据库优化角度讨论这些基本概念及其重要意义:

   (1)主键(Primary Key):主键被用于复杂的SQL语句时,频繁地在数据访问中被用到。一个表只有一个主键。主键应该有固定值(不能为Null或缺省值,要有相对稳定 性),不含代码信息,易访问。把常用(众所周知)的列作为主键才有意义。短主键最佳(小于25bytes),主键的长短影响索引的大小,索引的大小影响索 引页的大小,从而影响磁盘I/O。主键分为自然主键和人为主键。自然主键由实体的属性构成,自然主键可以是复合性的,在形成复合主键时,主键列不能太多, 复合主键使得Join*作复杂化、也增加了外键表的大小。人为主键是,在没有合适的自然属性键、或自然属性复杂或灵敏度高时,人为形成的。人为主键一般是 整型值(满足最小化要求),没有实际意义,也略微增加了表的大小;但减少了把它作为外键的表的大小。

OIDs Should Have No Business Meaning
A very critical issue that needs to be pointed out is that OIDs should have absolutely no business meaning
whatsoever. Nada. Zip. Zilch. Zero. Any column with a business meaning can potentially change, and if
there’s one thing that we learned over the years in the relational world it’s that it’s a fatal mistake to give
your keys meaning. If your users decide to change the business meaning, perhaps they want to add some
digits or make the number alphanumeric, you need to make changes to your database in every single spot
where you use that information. Anything that is used as a primary key in one table is virtually guaranteed
to be used in other tables as a foreign key. What should be a simple change, adding a digit to your
customer number, can be a huge maintenance nightmare. Yuck. In the relational database world, this OID
strategy is referred to as employing surrogate keys.


  (2)外键(Foreign Key):外键的作用是建立关系型数据库中表之间的关系(参照完整性),主键只能从独立的实体迁移到非独立的实体,成为后者的一个属性,被称为外键。

   (3)索引(Index):利用索引优化系统性能是显而易见的,对所有常用于查询中的Where子句的列和所有用于排序的列创建索引,可以避免整表扫描 或访问,在不改变表的物理结构的情况下,直接访问特定的数据列,这样减少数据存取时间;利用索引可以优化或排除耗时的分类*作;把数据分散到不同的页面 上,就分散了插入的数据;主键自动建立了唯一索引,因此唯一索引也能确保数据的唯一性(即实体完整性);索引码越小,定位就越直接;新建的索引效能最好, 因此定期更新索引非常必要。索引也有代价:有空间开销,建立它也要花费时间,在进行Insert、Delete和Update*作时,也有维护代价。索引 有两种:聚族索引和非聚族索引。一个表只能有一个聚族索引,可有多个非聚族索引。使用聚族索引查询数据要比使用非聚族索引快。在建索引前,应利用数据库系 统函数估算索引的大小。

  ① 聚族索引(Clustered Index):聚族索引的数据页按物理有序储存,占用空间小。选择策略是,被用于Where子句的列:包括范围查询、模糊查询或高度重复的列(连续磁盘扫 描);被用于连接Join*作的列;被用于Order by和Group by子句的列。聚族索引不利于插入*作,另外没有必要用主键建聚族索引。

   ② 非聚族索引(Nonclustered Index):与聚族索引相比,占用空间大,而且效率低。选择策略是,被用于Where子句的列:包括范围查询、模糊查询(在没有聚族索引时)、主键或外 键列、点(指针类)或小范围(返回的结果域小于整表数据的20%)查询;被用于连接Join*作的列、主键列(范围查询);被用于Order by和Group by子句的列;需要被覆盖的列。对只读表建多个非聚族索引有利。索引也有其弊端,一是创建索引要耗费时间,二是索引要占有大量磁盘空间,三是增加了维护代 价(在修改带索引的数据列时索引会减缓修改速度)。那么,在哪种情况下不建索引呢?对于小表(数据小于5页)、小到中表(不直接访问单行数据或结果集不用 排序)、单值域(返回值密集)、索引列值太长(大于20bitys)、容易变化的列、高度重复的列、Null值列,对没有被用于Where子语句和 Join查询的列都不能建索引。另外,对主要用于数据录入的,尽可能少建索引。当然,也要防止建立无效索引,当Where语句中多于5个条件时,维护索引 的开销大于索引的效益,这时,建立临时表存储有关数据更有效。

  批量导入数据时的注意事项:在实际应用中,大批量的计算(如电信话 单计费)用C语言程序做,这种基于主外键关系数据计算而得的批量数据(文本文件),可利用系统的自身功能函数(如Sybase的BCP命令)快速批量导 入,在导入数据库表时,可先删除相应库表的索引,这有利于加快导入速度,减少导入时间。在导入后再重建索引以便优化查询。

  (4) 锁:锁是并行处理的重要机制,能保持数据并发的一致性,即按事务进行处理;系统利用锁,保证数据完整性。因此,我们避免不了死锁,但在设计时可以充分考虑 如何避免长事务,减少排它锁时间,减少在事务中与用户的交互,杜绝让用户控制事务的长短;要避免批量数据同时执行,尤其是耗时并用到相同的数据表。锁的征 用:一个表同时只能有一个排它锁,一个用户用时,其它用户在等待。若用户数增加,则Server的性能下降,出现“假死”现象。如何避免死锁呢?从页级锁 到行级锁,减少了锁征用;给小表增加无效记录,从页级锁到行级锁没有影响,若在同一页内竞争有影响,可选择合适的聚族索引把数据分配到不同的页面;创建冗 余表;保持事务简短;同一批处理应该没有网络交互。

  (5)查询优化规则:在访问数据库表的数据(Access Data)时,要尽可能避免排序(Sort)、连接(Join)和相关子查询*作。经验告诉我们,在优化查询时,必须做到:
  ① 尽可能少的行;
  ② 避免排序或为尽可能少的行排序,若要做大量数据排序,最好将相关数据放在临时表中*作;用简单的键(列)排序,如整型或短字符串排序;
  ③ 避免表内的相关子查询;
  ④ 避免在Where子句中使用复杂的表达式或非起始的子字符串、用长字符串连接;
  ⑤ 在Where子句中多使用“与”(And)连接,少使用“或”(Or)连接;
  ⑥ 利用临时数据库。在查询多表、有多个连接、查询复杂、数据要过滤时,可以建临时表(索引)以减少I/O。但缺点是增加了空间开销。
除非每个列都有索引支持,否则在有连接的查询时分别找出两个动态索引,放在工作表中重新排序。

  3 基本表扩展设计
   基于第三范式设计的库表虽然有其优越性(见本文第一部分),然而在实际应用中有时不利于系统运行性能的优化:如需要部分数据时而要扫描整表,许多过程同 时竞争同一数据,反复用相同行计算相同的结果,过程从多表获取数据时引发大量的连接*作,当数据来源于多表时的连接*作;这都消耗了磁盘I/O和CPU时 间。

  尤其在遇到下列情形时,我们要对基本表进行扩展设计:许多过程要频繁访问一个表、子集数据访问、重复计算和冗余数据,有时用户要求一些过程优先或低的响应时间。

  如何避免这些不利因素呢?根据访问的频繁程度对相关表进行分割处理、存储冗余数据、存储衍生列、合并相关表处理,这些都是克服这些不利因素和优化系统运行的有效途径。

  3.1 分割表或储存冗余数据
  分割表分为水平分割表和垂直分割表两种。分割表增加了维护数据完整性的代价。
水 平分割表:一种是当多个过程频繁访问数据表的不同行时,水平分割表,并消除新表中的冗余数据列;若个别过程要访问整个数据,则要用连接*作,这也无妨分割 表;典型案例是电信话单按月分割存放。另一种是当主要过程要重复访问部分行时,最好将被重复访问的这些行单独形成子集表(冗余储存),这在不考虑磁盘空间 开销时显得十分重要;但在分割表以后,增加了维护难度,要用触发器立即更新、或存储过程或应用代码批量更新,这也会增加额外的磁盘I/O开销。

   垂直分割表(不破坏第三范式),一种是当多个过程频繁访问表的不同列时,可将表垂直分成几个表,减少磁盘I/O(每行的数据列少,每页存的数据行就多, 相应占用的页就少),更新时不必考虑锁,没有冗余数据。缺点是要在插入或删除数据时要考虑数据的完整性,用存储过程维护。另一种是当主要过程反复访问部分 列时,最好将这部分被频繁访问的列数据单独存为一个子集表(冗余储存),这在不考虑磁盘空间开销时显得十分重要;但这增加了重叠列的维护难度,要用触发器 立即更新、或存储过程或应用代码批量更新,这也会增加额外的磁盘I/O开销。垂直分割表可以达到最大化利用Cache的目的。

  总之,为主要过程分割表的方法适用于:各个过程需要表的不联结的子集,各个过程需要表的子集,访问频率高的主要过程不需要整表。在主要的、频繁访问的主表需要表的子集而其它主要频繁访问的过程需要整表时则产生冗余子集表。
注意,在分割表以后,要考虑重新建立索引。

  3.2 存储衍生数据
  对一些要做大量重复性计算的过程而言,若重复计算过程得到的结果相同(源列数据稳定,因此计算结果也不变),或计算牵扯多行数据需额外的磁盘I/O开销,或计算复杂需要大量的CPU时间,就考虑存储计算结果(冗余储存)。现予以分类说明:
  若在一行内重复计算,就在表内增加列存储结果。但若参与计算的列被更新时,必须要用触发器更新这个新列。

  若对表按类进行重复计算,就增加新表(一般而言,存放类和结果两列就可以了)存储相关结果。但若参与计算的列被更新时,就必须要用触发器立即更新、或存储过程或应用代码批量更新这个新表。

  若对多行进行重复性计算(如排名次),就在表内增加列存储结果。但若参与计算的列被更新时,必须要用触发器或存储过程更新这个新列。

  总之,存储冗余数据有利于加快访问速度;但违反了第三范式,这会增加维护数据完整性的代价,必须用触发器立即更新、或存储过程或应用代码批量更新,以维护数据的完整性。

  3.3 消除昂贵结合
   对于频繁同时访问多表的一些主要过程,考虑在主表内存储冗余数据,即存储冗余列或衍生列(它不依赖于主键),但破坏了第三范式,也增加了维护难度。在源 表的相关列发生变化时,必须要用触发器或存储过程更新这个冗余列。当主要过程总同时访问两个表时可以合并表,这样可以减少磁盘I/O*作,但破坏了第三范 式,也增加了维护难度。对父子表和1:1关系表合并方法不同:合并父子表后,产生冗余表;合并1:1关系表后,在表内产生冗余数据。

  4 数据库对象的放置策略
  数据库对象的放置策略是均匀地把数据分布在系统的磁盘中,平衡I/O访问,避免I/O瓶颈。

  ⑴ 访问分散到不同的磁盘,即使用户数据尽可能跨越多个设备,多个I/O运转,避免I/O竞争,克服访问瓶颈;分别放置随机访问和连续访问数据。
  ⑵ 分离系统数据库I/O和应用数据库I/O。把系统审计表和临时库表放在不忙的磁盘上。
  ⑶ 把事务日志放在单独的磁盘上,减少磁盘I/O开销,这还有利于在障碍后恢复,提高了系统的安全性。
  ⑷ 把频繁访问的“活性”表放在不同的磁盘上;把频繁用的表、频繁做Join*作的表分别放在单独的磁盘上,甚至把把频繁访问的表的字段放在不同的磁盘上,把访问分散到不同的磁盘上,避免I/O争夺;
   ⑸ 利用段分离频繁访问的表及其索引(非聚族的)、分离文本和图像数据。段的目的是平衡I/O,避免瓶颈,增加吞吐量,实现并行扫描,提高并发度,最大化磁盘 的吞吐量。利用逻辑段功能,分别放置“活性”表及其非聚族索引以平衡I/O。当然最好利用系统的默认段。另外,利用段可以使备份和恢复数据更加灵活,使系 统授权更加灵活。



Resources:

Friday, August 10, 2007

在ubuntu下面配置subversion。

http://wiki.ubuntu.org.cn/SubVersion
这篇文章是ubuntu中文网站上的文档,跟着安装了一遍, 不错, 成功了。

Monday, June 18, 2007

硬盘安装ubuntu7.0.4

因为对6.10的那个ubuntu的启动画面相当不满,所以一直想升级到7.0.4,但是刚刚听别人说7.0.4的启动画面一样的老土,只好先停在这里,再等等吧。。。一个不幸的事情是我的笔记本光驱好像坏了,逼得我现在一门心思的再琢磨怎么从硬盘安装,在网上找了写资料,决定自己先记录下来,虽然现在不升级,正所谓居安思危,有备无患。
-----------------------------
1.从 http://releases.ubuntu.com/feisty/ 下载 ubuntu-7.04-alternate-i386.iso 并放到C:\,并且确认C:为FAT32分区 (这点就要命,我的window分区都是ntfs,好像是因为intz什么的不支持ntfs格式)
2.
下载 http://archive.ubuntu.com/ubuntu/dists/feisty/main/installer-i386/current/images/hd-media/ 里的文件,同样拷贝到C:\
initrd.gz
vmlinuz
3.有的文档说要执行这一步,有的没说,我也写在这里:
下载grub_for_dos-0.4.2,将里面的 grldr提取 复制到 C:\,编辑C:\BOOT.INI,加入一行代码:C:\GRLDR=”GRUB”
4.启动到grub,出现菜单时按下C键,进入grub的命令行模式,输入如下命令,即可启动安装程序:
grub> kernel (hd0,0)/vmlinuz root=/dev/ram ramdisk_size=256000 devfs=mount,dall
grub> initrd (hd0,0)/initrd.gz
grub> boot
天,不知道/dev/ram是什么意思? 为什么不是/dev/hd0呢?
理论上,就应该可以看到安装界面了!

Tuesday, May 29, 2007

Asynchronous calls and remote callbacks using Lingo Spring Remoting

很不错的技术文章,至少可以帮你澄清远程对象引用(传值/传引用),以及一些漂亮的编程技巧。
我觉得最酷的还是lingo居然可以将一个interface中的方法有的暴露成同步有的暴露成异步。 但是在这篇文章的示例代码中,我没有搞清楚他是怎么定义solve是异步,而cancel和registerXX是同步的?

请参考:
http://jroller.com/page/sjivan?entry=asynchronous_calls_and_callbacks_using

As mentioned in my previous blog entry, Lingo is the only Spring Remoting implementation that supports asynchronous calls and remote callbacks. Today I'll cover all the nitty gritty details of the async/callback related functionality along with the limitations and gotchas.

Asynchronous method invocation and callback support by Lingo is an awesome feature and there are several usecases where these are an absolute must. Lets consider a simple and rather common use case : You have a server side application (say an optimizer) for which you want you write a remote client API. The API has methods like solve() which are long running and methods like cancel() which stops the optimizer solve.

A synchronous API under such circumstances is not really suitable since the solve() method could take a really long time to complete. It could be implemented by having the client code spawn their own thread and do its own exception management but this becomes really kludgy. Plus you have to worry out network timeout issues. You might be thinking "I'll just use JMS if I need an asynchronous programming model". You could use JMS but think about the API you're exposing. Its going to be a generic JMS API where the client is registering JMS listeners, and sending messages to JMS destinations using the JMS API. Compare this to a remote API where the client is actually working with the Service interface itself.

Lingo combines the elegance of Spring Remoting with the ability to make asynchronous calls. Lets continue with our Optimizer example and implement a solution using Lingo and Spring. OptimizerService interface

public interface OptimizerService {
void registerCallback(OptimizerCallback callback) throws OptimizerException;

void solve();

void cancel() throws OptimizerException;
}

The solve() method is asynchronous while the cancel() and registerCallback(..) methods are not. Asynchronous methods by convention must not have a return value and also must not throw exceptions. The registerCallback(..) method registers a client callback with the Optimizer. In order to make an argument be a remote callback, the argument must implement java.util.EventListener or java.rmi.Remote. In this example the OptimizerCallback interface extends java.util.EventListener. If the argument does not implement either of these interfaces, it must implement java.io.Serializable and it will then be passed by value.

OptimizerCallback interface

public interface OptimizerCallback extends EventListener {

void setPercentageComplete(int pct);

void error(OptimizerException ex);

void solveComplete(float solution);
}

The callback API has a method for the Optimizer to set the percentage complete, report an error during the solve() process (remember that the solve() method is asynchronous so it cannot throw an exception directly) and finally the solveComplete(..) callback to inform the client that the solve is complete along with the solution.

OptimizerService implementation

public class OptimizerServiceImpl implements OptimizerService {

private OptimizerCallback callback;
private volatile boolean cancelled = false;


private static Log LOG = LogFactory.getLog(OptimizerServiceImpl.class);

public void registerCallback(OptimizerCallback callback) {
LOG.info("registerCallback() called ...");
this.callback = callback;
}


public void solve() {
LOG.info("solve() called ...");
float currentSolution = 0;

//simulate long running solve process
for (int i = 1; i <= 100; i++) { try {
currentSolution += i;
Thread.sleep(1000);
if (callback != null) {
callback.setPercentageComplete(i);
}
if (cancelled) {
break;
}
} catch (InterruptedException e) {
System.err.println(e.getMessage());
}
}
callback.solveComplete(currentSolution);


}

public void cancel() throws OptimizerException {
LOG.info("cancel() called ...");
cancelled = true;
}
}

The solve() method sleeps for a while and makes the call setPercentageComplete(..) on the callback registered by the client. The code is pretty self explanatory here.

Optimizer Application context - optimizerContext.xmlWe now need to export this service using Lingo Spring Remoting. The typical Lingo Spring configuration as described in the Lingo docs and samples is :

xml version="1.0" encoding="UTF-8"?>


<beans>
<bean id="optimizerServiceImpl" class="org.sanjiv.lingo.server.OptimizerServiceImpl" singleton="true"/>

<bean id="optimizerServer" class="org.logicblaze.lingo.jms.JmsServiceExporter" singleton="true">
<property name="destination" ref="optimizerDestination"/>
<property name="service" ref="optimizerServiceImpl"/>
<property name="serviceInterface" value="org.sanjiv.lingo.common.OptimizerService"/>
<property name="connectionFactory" ref="jmsFactory"/>
bean>


<bean id="jmsFactory" class="org.activemq.ActiveMQConnectionFactory">
<property name="brokerURL" value="tcp://localhost:61616"/>
<property name="useEmbeddedBroker">
<value>truevalue>
property>
bean>

<bean id="optimizerDestination" class="org.activemq.message.ActiveMQQueue">
<constructor-arg index="0" value="optimizerDestinationQ"/>
bean>
beans>

In this example, I'm embedding a JMS broker in the Optimizer process. However you are free to use an external JMS broker and change the JMS Connection Factory configuration appropriately.

Note : The above optimizerContext.xml it the typical configuration in the Lingo docs/examples
but is not the ideal configuration. It has some serious limitations which I'll cover in a bit
along with the preferred "server" configuration.

OptimizerServer The "main" class that exports the OptimizerService simply needs to instantiate the "optimizerServer" bean in the optimizerContent.xml file.

public class OptimizerServer {

public static void main(String[] args) {
if (args.length == 0) {
System.err.println("Usage : java org.sanjiv.lingo.server.OptimizerServer ");
System.exit(-1);
}
String applicationContext = args[0];


System.out.println("Starting Optimizer ...");
FileSystemXmlApplicationContext ctx = new FileSystemXmlApplicationContext(applicationContext);

ctx.getBean("optimizerServer");

System.out.println("Optimizer Started.");


ctx.registerShutdownHook();
}
}

The ClientIn order for the client to lookup the remote OptimizerService, we need to configure the client side Spring application context as follows : Client Application Context - clientContext.xml

xml version="1.0" encoding="UTF-8"?>


<beans>
<bean id="optimizerService" class="org.logicblaze.lingo.jms.JmsProxyFactoryBean">
<property name="serviceInterface" value="org.sanjiv.lingo.common.OptimizerService"/>
<property name="connectionFactory" ref="jmsFactory"/>
<property name="destination" ref="optimizerDestination"/>


<property name="remoteInvocationFactory" ref="invocationFactory"/>
bean>


<bean id="jmsFactory" class="org.activemq.ActiveMQConnectionFactory">
<property name="brokerURL" value="tcp://localhost:61616"/>
bean>

<bean id="optimizerDestination" class="org.activemq.message.ActiveMQQueue">
<constructor-arg index="0" value="optimizerDestinationQ"/>
bean>

<bean id="invocationFactory" class="org.logicblaze.lingo.LingoRemoteInvocationFactory">
<constructor-arg>
<bean class="org.logicblaze.lingo.SimpleMetadataStrategy">

<constructor-arg value="true"/>
bean>
constructor-arg>
bean>
beans>

Now all a client needs to do to is obtain a handle of the remote OptimizerService by looking up the bean "optimizerService" configured in clientContext.xml.

OptimizerCallback implementationBefore going over the sample Optimizer client code, lets first write a sample implementation of the OptimizerCallback interface - one which the client will register with the remote Optimizer by invoking the registerCallback(..) method.

public class OptimizerCallbackImpl implements OptimizerCallback {

private boolean solveComplete = false;
private OptimizerException callbackError;
private Object mutex = new Object();


public void setPercentageComplete(int pct) {
System.out.println("+++ OptimzierCallback :: " + pct + "% complete..");
}

public void error(OptimizerException ex) {
System.out.println("+++ OptimzierCallback :: Error occured during solve" + ex.getMessage());
callbackError = ex;
solveComplete = true;
synchronized (mutex) {
mutex.notifyAll();
}
}


public void solveComplete(float soltion) {
System.out.println("+++ OptimzierCallback :: Solve Complete with answer : " + soltion);
solveComplete = true;
synchronized (mutex) {
mutex.notifyAll();
}
}


public void waitForSolveComplete() throws OptimizerException {
while (!solveComplete) {
synchronized (mutex) {
try {
mutex.wait();
if (callbackError != null) {
throw callbackError;
}
} catch (InterruptedException e) {
e.printStackTrace();
break;
}
}
}
}
}

OptimizerClient

public class OptimizerClient {

public static void main(String[] args) throws InterruptedException {


if (args.length == 0) {
System.err.println("Usage : java org.sanjiv.lingo.client.OptimizerClient ");
System.exit(-1);
}

String applicationContext = args[0];
FileSystemXmlApplicationContext ctx = new FileSystemXmlApplicationContext(applicationContext);

OptimizerService optimizerService = (OptimizerService) ctx.getBean("optimizerService");
OptimizerCallbackImpl callback = new OptimizerCallbackImpl();


try {
optimizerService.registerCallback(callback);
System.out.println("Client :: Callback Registered.");

optimizerService.solve();
System.out.println("Client :: Solve invoked.");

Thread.sleep(8 * 1000);
System.out.println("Client :: Calling cancel after 8 seconds.");


optimizerService.cancel();
System.out.println("Client :: Cancel finished.");
//callback.waitForSolveComplete();

} catch (OptimizerException e) {
System.err.println("An error was encountered : " + e.getMessage());
}
}
}

The test client registers a callback and calls the asynchronous method solve(). Note that the solve method in our sample OptimizerService implementation takes ~100 seconds to complete. The client then prints out the message "Client :: Solve invoked.". If the solve() call is indeed invoked asynchronously by Lingo under the hoods, this message should be printed to console immediately and not after 100 seconds. The client then calls cancel() after 8 seconds have elapsed.

Here's the output when we run the Optimizer Server and Client

Notice that the solve method has been called asynchronously and after 8 seconds the client makes the cancel() call however the server does not seem to be receiving this call and continues with its setPercentageComplete(..) callback.

I asked this question on the Lingo mailing list but did not get a response. This misbehaviour was pretty serious because what this meant was that while an asynchronous call like solve() was executed asynchronously by the client, the client was not able to make another call like cancel() until the solve() method completed execution on the server... which defeats the purpose of a method like cancel().

Lingo and ActiveMQ are open source so I rolled up my sleeves and ran the whole thing through a debugger. Debugging multithreaded applications can get tricky but after spending several hours I was able to get the to bottom of this issue.

Recollect that we exported the OptimizerSericve using the class org.logicblaze.lingo.jms.JmsServiceExporter in optimizerContext.xml. On examining the source, I found that this class creates a single JMS Session which listens for messages on the configured destination ("optimizerDestinationQ" in our example) and when messages are received, it invokes a Lingo listener which does the translation of the inbound message into a method invocation on the exported OptimizerServiceImpl service object.

The JMS spec clearly states

A Session object is a single-threaded context for producing and consuming messages.
...
It serializes execution of message listeners registered with its message consumers.

Basically a single JMS Session is not suitable for receiving concurrent messages. I understood why the cancel() method wasn't being invoked until the solve() method completed. But this behavior still didn't make sense from an API usage perspective.

Fortunately Spring 2.0 added support classes for receiving concurrent messages which is exactly what we need (yep, Spring rocks!). There are a few different support classes like DefaultMessageListenerContainer, SimpleMessageListenerContainer, and ServerSessionMessageListener .

The ServerSessionMessageListenerContainer "dynamically manages JMS Sessions, potentially using a pool of Sessions that receive messages in parallel". This class "builds on the JMS ServerSessionPool SPI, creating JMS ServerSessions through a pluggable ServerSessionFactory".

I tried altering optimizerContext.xml to use this class optimizerContextPooledSS.xml

xml version="1.0" encoding="UTF-8"?>


<beans>
<bean id="optimizerServiceImpl" class="org.sanjiv.lingo.server.OptimizerServiceImpl" singleton="true">
bean>

<bean id="optimizerServerListener" class="org.logicblaze.lingo.jms.JmsServiceExporterMessageListener">
<property name="service" ref="optimizerServiceImpl"/>
<property name="serviceInterface" value="org.sanjiv.lingo.common.OptimizerService"/>
<property name="connectionFactory" ref="jmsFactory"/>
bean>

<bean id="optimizerServer" class="org.springframework.jms.listener.serversession.ServerSessionMessageListenerContainer">
<property name="destination" ref="optimizerDestination"/>
<property name="messageListener" ref="optimizerServerListener"/>
<property name="connectionFactory" ref="jmsFactory"/>
bean>


<bean id="jmsFactory" class="org.activemq.ActiveMQConnectionFactory">
<property name="brokerURL" value="tcp://localhost:61616"/>
<property name="useEmbeddedBroker">
<value>truevalue>
property>
bean>

<bean id="optimizerDestination" class="org.activemq.message.ActiveMQQueue">
<constructor-arg index="0" value="optimizerDestinationQ"/>
bean>
beans>

Unfortunately the behavior was still the same - cancel() was not executing on the server until solve() completed. I posted this question on the Spring User list but did not get a response. This class uses the ServerSessionPool SPI so I'm not sure if there is a problem with the Spring class, the ActiveMQ implementation of this SPI or something that I've done wrong.

Anyway I was able to successfully configure the DefaultMessageListenerContainer class and observed the desired behavior. In contrast to ServerSessionMessageListenerContainer, DefaultMessageListenerContainer "creates a fixed number of JMS Sessions to invoke the listener, not allowing for dynamic adaptation to runtime demands". While ServerSessionMessageListenerContainer would have been ideal, DefaultMessageListenerContainer is good enough for most use cases as you'd typically want to have some sort of thread pooled execution on the server anyways.

optimizerContextPooled.xml

xml version="1.0" encoding="UTF-8"?>


<beans>

<bean id="optimizerServiceImpl" class="org.sanjiv.lingo.server.OptimizerServiceImpl" singleton="true">
bean>

<bean id="optimizerServerListener" class="org.logicblaze.lingo.jms.JmsServiceExporterMessageListener">
<property name="service" ref="optimizerServiceImpl"/>
<property name="serviceInterface" value="org.sanjiv.lingo.common.OptimizerService"/>
<property name="connectionFactory" ref="jmsFactory"/>
bean>

<bean id="optimizerServer" class="org.springframework.jms.listener.DefaultMessageListenerContainer">
<property name="concurrentConsumers" value="20"/>
<property name="destination" ref="optimizerDestination"/>
<property name="messageListener" ref="optimizerServerListener"/>
<property name="connectionFactory" ref="jmsFactory"/>
bean>


<bean id="jmsFactory" class="org.activemq.ActiveMQConnectionFactory">
<property name="brokerURL" value="tcp://localhost:61616"/>
<property name="useEmbeddedBroker">
<value>truevalue>
property>
bean>

<bean id="optimizerDestination" class="org.activemq.message.ActiveMQQueue">
<constructor-arg index="0" value="optimizerDestinationQ"/>
bean>

beans>
Note : Although some Lingo examples have the destination created as a Topic(ActiveMQTopic)
with the org.logicblaze.lingo.jms.JmsServiceExporter class, you must use a Queue when
using multiple JMS sessions for concurrent message retreival as a Topic will be received
by all listeners which is not what we want.

Here's the result when using applicationContextPooled.xml

You can download the complete source for this here and run the sample server and client. JRoller doesn't allow uploading .zip files so I've uploaded the sample as a .jar file instead. The source distribution has a Maven 1.x project file. To build, simply run "maven". To run the optimizer sever without pooled JMS listeners, run startOptimizer.bat under dist/bin/. To run with pooled JMS listeners, run startOptimizerPooled.bat and to run the test client, run startClient.bat

I am using this architecture to provide a remote API for our C++ optimizer. The C++ optimizer has a thin JNI layer which loads the Spring application context file and the OptimizerServiceImpl has a bunch of native methods which is tied to the underlying C++ optimizer functionality using the JNI function RegisterNatives(). Do you Lingo? I'd like to hear how others are using Lingo/Spring Remoting.

Monday, March 12, 2007

在ubuntu下配置vsftp。

唉,这个小东西真烦人。 我在网上找了个贴子,跟着试了试,可以用,就直接贴过来了。
http://linux.hiweed.com/node/1080

Wednesday, February 28, 2007

软件整合测试阶段的版本管理。

这里说的软件整合测试阶段指的是开发组发布了第一个可以测试的版本之后的版本管理,有几点需要澄清:1)不是说软件测试是从第一个版本发布之后才开始,软件测试是伴随着整个软件开发的生命周期,当然也不一定是指对代码的测试, 也可以包括对文档的review。 2)为什么这里说软件整合测试?因为在”软件测试的艺术“这本书中,系统测试指的是非功能性的测试,所以我用整合测试来表示对系统功能的测试(不是单元测试)。
想起前段时间组织的对安徽通彩网项目的测试。很混乱,测试团队和开发团队缺乏有效的沟通,很大原因上也是因为版本管理的混乱,当大家在说到一个bug的时候,这个bug并没有定义在一个有效的软件版本下,我们总是说最新版本, 但是很显然很多bug并不是在最新的版本中。如果我们总是说最新版本, 这其实就表示了我们根本没有版本管理。 此外,虽然在测试团队里面采用了跌代的测试方式,但是开发团队里面却并没有一个明确的开发过程定义。实际上,测试团队采用什么样的测试过程应当取决于开发团队的开发过程。 如果开发团队也是一个跌代的开发过程,这表示每个跌代开发团队都会发布一个可测试的软件版本,很自然的,测试团队应该也采用跌代的测试方式。 在安徽通彩网的测试中,我定义了每三天为一个测试迭代,在这个迭代中所有的测试用例需要被重新执行,测试环境需要被重新构建,而且软件版本实际上是由测试团队来定义的。。。我现在能说的就是,好歹测试团队遵循了一个测试过程,有这个过程比没有这个过程要好。
昨天看了一份文档‘Revision Control with Subversion’,其中关于branch的那一段里面定义了一个软件发布阶段的版本管理过程, 在这个过程中branch扮演重要角色(在这份文档中定义了几种branch pattern,其中包括release branch, feature branch),这种branch被定义为release branch,现在直接摘录。
Here's where version control can help. The typical procedure looks like this:
• Developers commit all new work to the trunk. Day-to-day changes are committed to /trunk: new features, bugfixes, and so on.
• The trunk is copied to a “release” branch. When the team thinks the software is ready for release (say, a 1.0 release), then /trunk might be copied to /branches/1.0.
• Teams continue to work in parallel. One team begins rigorous testing of the release branch, while another team continues new work (say, for version 2.0) on /trunk. If bugs are discovered in either location, fixes are ported back and forth as necessary(不管是在trunk或者branch中发现了bug,这个bug的解决都应该被merge到另外一方). At some point, however, even that process stops. The branch is “frozen” for final testing right before a release.
• The branch is tagged and released. When testing is complete, /branches/1.0 is copied to /tags/1.0.0 as a reference snapshot. The tag is packaged and released to customers(版本的发布应该由开发团队和测试团队共同定义,比如测试团队声明所有测试都通过了,那么开发组就可以发布一个新版本).
• The branch is maintained over time. While work continues on /trunk for version 2.0, bugfixes continue to be ported from /trunk to /branches/1.0. When enough bugfixes have accumulated, management may decide to do a 1.0.1 release: /branches/1.0 is copied to /tags/1.0.1, and the tag is packaged and released(就算软件已经正式发布,仍然有可能发现bug,这个bug被解决以后会发布一个bugfix的版本).

Sunday, February 25, 2007

关于性能测试

现在来说, 我认为性能测试是个专业性很强的话题, 不是随便一个人拿个测试工具就可以做的了的, 因为这样测试出来的结果很可能和真实的结果相差很远,最终就是误导开发人员,导致开发人员在错误的方向上耗费大量的精力。 就我来说,我觉得现在关于性能测试需要搞清楚两点:1)什么是性能测试? 有哪些方式? 2)性能测试应该测试哪些指标,如何分析测试结果。
前些天找到了一个专门讨论软件测试的论坛www.sqaforum.com,里面有一个关于性能测试的目录,看来有很多人和我一样对于性能测试的很多东西都不清楚, 尤其是纠缠于performance test, load test, stress test之间的关系,我转贴了一段得到认同的回帖:
BASIC DEFINITIONS

This is an excerpt from my forthcoming book on performance and load testing.

While there is no universal consistency in how people use terms like performance test and robustness test, I can say that the definitions provided here are as much in the mainstream as any others.

The Definition of Performance Testing

The purpose of performance testing is to measure a system’s performance under load. As Humpty Dumpty said, a word can mean whatever one chooses it to mean, so it is worth our time to examine what we mean by the words “measure”, “performance” and “load”.

Performance testing is a measurement of performance characteristics, although sometimes the use of the word “testing” confuses people. Some performance professionals feel strongly that it is important to not use the term “performance testing”, but to call it performance measurement instead. They are concerned that this measurement will get confused with feature testing and debugging, which it is not. They point out that measurement is only testing if the collected measurements are checked against pre-established goals for performance, and that measurement is often done without preconceptions of required performance.

These people have a good point: clarity of terminology is important. But since most people use the term “performance testing” we will go with the majority and use it too.

The term performance can mean response time, throughput, availability, error rate, resource utilization, or another system characteristic (or group of them), which we are interested in measuring. “All promise outruns performance.” Ralph Waldo Emerson

Performance testing simulates the typical user experience under normal working conditions. The load is a typical, representative mix of demands on the system. (And, of course, there can be several different representative loads -- the work load at 2 p.m., at 2 a.m., etc.) Another name sometimes used for a performance test is a capacity test, though there is a minor difference in these terms as we will see later.

First, the performance testers need to define what the term performance means in a specific test situation -- that is, what the objectives are and what we need to measure in the test. The answer to this question is that we measure performance usually as a weighted mix of three characteristics of a system: throughput, response time and availability. In real-time systems, for example, the users need a guarantee that a task will always be completed within a fixed time limit. Performing a task correctly but a millisecond too late could literally be fatal.

The term load simply means the mix of demands placed on a system while we measure its performance and robustness characteristics. In practice, most loads vary continually, so later we will address the challenge of determining the most appropriate load(s) for testing. The terms work load and benchmark are sometimes used as synonyms for load. A benchmark usually means a standard load, one used to compare the performance of systems, system versions, or hardware environments, but the benchmark is not necessarily the actual mix of demands at any one user installation. The term work load is a synonym for a load, and you see both of the terms in this book: they are interchangeable.

Definition of Load Testing

In contrast to a performance test, a load test is a measurement of performance under heavy load: the peak or worst-case conditions. Because loads can have various sizes, more precise terms for this type of testing are peak-load testing or worst-case-load testing.

A performance test usually is done with a typical, representative load, but this measurement may not tell us much about the system’s behavior under heavy load. For example, let’s assume that the peak load on a system is only 15% more than the average load. The system performance may degrade gracefully – the system runs 15% slower at peak load. Often, though, the performance under load is non-linear: as the load increases by a moderate amount (in this case, 15%), the response time does not increase by a comparable percentage but instead becomes infinite because the system fails under the increased load.

Definition of Stress Testing

A stress test is one which deliberately stresses a system by pushing it beyond its specified limits. The idea is to impose an unreasonable load on the system, an overload, without providing the resources which the system needs to process that load.

In a stress test, one or more of the system resources, such as the processor, memory, or database I/O access channel, often “maxes out” and reaches saturation. (Practically, saturation can happen at less than 100% of the theoretical usable amount of the resource, for many reasons.)

This means that the testware (the test environment, test tools, etc.) must be sufficiently robust to support the stress test. We do not want the testware to fail before we have been able to adequately stress the system.

Many bugs found in stress testing are feature bugs which we cannot see with normal loads but are triggered under stress. This can lead to confusion about the difference between a feature bug and a stress bug. We will address this issue in the upcoming section entitled: “Testing Performance and Robustness versus Features”.

Some testers prize stress testing because it is so fruitful in finding bugs. Others think it is dangerous because it misdirects projects to fix irrelevant bugs. Stress testing often finds many bugs, and fixing these bugs leads to significant delays in the system delivery, which in turn leads to resistance to fixing the bugs. If we find a bug with a test case or in a test environment which we can’t connect to actual use, people are likely to dismiss it with comments like: "The users couldn’t do that.", “.. wouldn’t do that” or “... shouldn’t do that.”

Stress, Robustness and Reliability

Although stress, robustness and reliability are similar, the differences among them mean that we test them in related but different ways.

We stress a system when we place a load on it which exceeds its planned capacity. This overload may cause the system to fail, and it is the focus of stress testing.

Systems can fail in many ways, not just from overloading. We define the robustness of a system by its ability to recover from problems; its survivability. Robustness testing tries to make a system fail, so we can observe what happens and whether it recovers. Robustness testing includes stress testing but is broader, since there are many ways in which a system can fail as well as from overloading.

Reliability is most commonly defined as the mean time between failure (MTBF) of a system in operation, and as such it is closely related to availability. Reliability testing measures MTBF in test mode and predicts what the system reliability will be in live operation.

Robustness and reliability testing are discussed in the companion volume to this book, entitled “System Robustness Testing”.

Ross

---------------

这里基本上是说load test, stress test都是performance test的特例,更多的信息可以参考http://www.qaforums.com/Forum2/HTML/000724.html, 还有一个很有个实际指导意义的资源:http://www.codeplex.com/PerfTesting

Wednesday, January 31, 2007

Understanding Transaction Isolation

自从开始作为一个developer, 就可以是接触和了解事务隔离级别(transaction isolation level) , 但是坦白说我还没有真正的理解过这个概念, 总是停留再一种似懂非懂的境界. 期间也去网上找过很多资料, 但是都是搜索中文网页, 老实说, 国内的网站技术文章太少了 ,而且都是一篇文章所有网站抄来抄去, 很悲哀到了现在还是似懂非懂, 难道要糊涂的生, 然后糊涂的死么?
昨天, 当我现在在这里写这些文字的时候,我一直在回想, 昨天是什么促使我突然又开始关注事务隔离级别这个话题. 想不起来了, 昨天晚上公司年夜饭, 喝了不少红酒, 还喝了一杯白酒, 回家之后倒头就睡,过去的记忆仿佛都在昨天被抛弃了. 就是说想不起来了, 想不起来也无所谓, 这件事情还是要做的, 所以今天又google了一把, 关键字是transaction isolation level, 嘿嘿,找到了一些资源, 先看看. 写这些文字的时候, 我还根本没看那些找到的文章, 有点颠倒是非. 先写道这里, 看了之后, 然后再说我是否已经真正理解了事务隔离级别这个概念. 理解了我便要在这里写下我的理解.
下面这篇文章不错, 我决定直接copy过来.
--------------------------

Understanding Transaction Isolation

http://www.expresscomputeronline.com/20040426/techspace01.shtml

Article summary

This is a topic related to SQL Server. However, the concept is applicable to any RDBMS. While auditing many applications, I have found that incomplete or no understanding of ‘Isolation Level’ leads to a lot of real life problems. These include performance degradation, blocking, locking as well as major deadlocks.

This article provides an easy-to-understand view of what ‘Isolation Levels’ really mean and when to use which level.

‘Isolation Level’ is a setting that decides how data which is a part of an ongoing transaction is made visible to other transactions.

Transactions are well understood. So what is the problem?

As all of us know, it is a unit of work. The work may contain many steps but either all steps happen or all steps don’t happen. This is fairly well known. Nothing new here. But consider how transactions occur in real life.

  • I am updating one row in a Begin Tran and Commit section. I have issued Begin Tran as well as the Update statement. Commit is not yet issued – because I want to perform some more actions in the same transactions on some other tables. If some other user wants to read this row that I have updated but not yet committed, what should happen?
  • Consider another scenario. I start a transaction. I calculate a total of a field based upon all rows in a table. Now, I need to add a new record in another table which contains this total. Now, can the original table be changed by some other user after I calculate the total? In which case, there could be a mismatch. Do you want to take such chances?
  • Another scenario. I am working on some transaction table between a range of keys within a transaction—say 10 and 20. There were only 5 records when I read the range – 10, 12, 14, 16, 20. Now I am working on other things in the transaction. Before I could commit the transaction, someone added another row with a key value of 11. Now, my base assumption about what records I read between 10 and 20 and further work upon them itself is wrong. Problem!!! Is it not?
  • I start reading a long table. It is not a transaction at all. But other users want to refer to that table for updating some fields in specific rows. The query takes 20 minutes to read all the rows. What happens to other users who are trying to update the rows? Do they wait for 20 minutes? Or they are allowed to update the rows even when the rows are being read in a large query? What if the query was used to generate a summary report containing grand totals? The total would be wrong because after the summation started, some rows have changed. Some of these rows could have changed after the summation occurred. What’s to be done now?

As you can see all these situations are confusing and prone to inaccuracies. To avoid such problems we have a feature called “Isolation Levels”.

Isolation Levels are applicable to transactions. These decide the visibility of information which is a part of an ongoing transaction.

In very simple terms Isolation Levels decide “What happens when some data being referred to (for reading or writing) within an incomplete transaction is also being referred to (for reading or writing) from another connection (or user – actually it is called another ‘transaction lock space’)?”

To make things sound technical, all these problems have been given nice and complex sounding names.

Data visibility problems that can occur during a transaction

Let us understand some jargon.

  • Transaction Lock Space:

Each transaction performs certain operations (select / insert / update / delete) on one or more rows of one or more tables. Transaction starts with Begin Tran command and ends with Commit. If unsuccessful, it ends with Rollback command. Now, after Begin Tran is issues and before Commit is issued, multiple tables may participate in the transaction related commands. Specific parts of these tables need to be locked during this phase to ensure that other users do not interfere with this transaction. This is called Transaction Lock Space.

  • Uncommitted (dirty) data:

Consider this code snippet (Blue text is code; green indicates comments):

1. Begin Transaction

—we want to change the customer status from active = “yes” to active = “no”

2. Update customer Set active = “No”

where CustomerID = 2324

—some more commandss

3. Select * from customer where CustomerID = 2324

4. Commit Transaction

Now, if the transaction commits, the value will be “No”. If it does not commit (for whatever reason), the value will remain “Yes”.

Consider that the value has already been changed to “No”. But there are more commands to be executed before the entire transaction commits. These commands are time consuming and take, say, 3 minutes. During these three minutes, if some user outside the transaction lock space reads the value of Active field in the Customer table for ID 2324, what should they see? “Yes” or “No”? The answer is simple. The value is changed but not committed. Therefore, external queries should still show “Yes”.

Now what would happen if a Select active from Customer where CustomerID – 2324 returns “No” to another user? What happens if the transaction rolls back? This is called Uncommitted Data. Ideally this should not be visible outside the transaction.

However, consider the same command executed within the transaction (line 4). It should – and it will return “No”.

Now let us consider various problems that can occur. The problems can be of three types:

1. Dirty Read

2. Non-repeatable read

3. Phantom rows

- Dirty reads

This is when connections outside the transaction space can read the uncommitted data. You don’t want this to happen in most cases. The only reason when you may want to allow this is when you are creating a report which assumes a certain amount of inaccuracy. You think this situation is rare? Not really. Suppose the report compares the current month sale with last month sale and returns a percentage difference. Consider that the report takes 5 minutes to generate. During these 5 minutes, more transactions may be getting added to the sales data. Typically 5 transactions would get added to the sales table. The average transaction value is Rs. 1000. The total sale for the month is typically 30-40 lacs. In such cases, you really don’t need to worry about absolute accuracy. You can intentionally allow dirty reads because the impact is really not significant from a business perspective.

- Non-repeatable read

Consider the above code. We change the value of Active to “No” in line 2. Now in line 4 you expect the value to be “No” because we are querying the row ‘within’ the transaction.

Now what would happen if some other transaction was allowed to change the same value to, say, “Maybe”? In line 4, we would expect value of “No” but actually get value of “Maybe”. This problem is called non-repeatable read. The idea is that within a transaction, the same data read any number of times should yield consistent results. If it is not same then the reads are ‘non-repeatable’.

- Phantom Rows

This was explained in the introduction (range 10 to 20 example). The concept is simple. If you have already read a range of data based upon a key value, another transaction inserts a new row which happens to have a value between this range. The original range which was referred to would now become invalid. This is because a “Phantom row” was added. This is also a major problem.

Important learning

Although the above three issues are listed as problems, they may not always be considered as problems! Paradoxical? Not really. As explained for Dirty Read, whether it is a problem or not depends entirely upon the business context. Technology exists to allow these problems to occurs as well as prevent them. The choice is yours. I know this complicates matters. But if these matters were not complicated, what are IT professionals paid for!

Now let us see how to prevent these problems? The answer is “using Isolation Levels”.

Types of Isolation Levels

To solve the problem of … The Isolation Level should be…
Dirty Read Read Committed (Default of SQL Server)

Dirty Read and Non-Repeatable Read Repeatable Read Non-Repeatable Read

Dirty Read and Non-Repeatable

Read and Phantom Rows

Serializable
To retain all three problems Read Uncommitted

Before we discuss each isolation level, let us understand how to set or change it in T-SQL.

Syntax

SET TRANSACTION ISOLATION LEVEL
{ READ COMMITTED
| READ UNCOMMITTED
| REPEATABLE READ
| SERIALIZABLE
}

Remember:

- This setting applies to all transactions which are happening within a single connection.

- Multiple connections can have different Isolation Levels.

- Within a single connection, you can change the Isolation Level for specific transactions.

- You must know the kind of transaction you are performing and its business requirements before you can decide the right isolation level.

- Isolation level does not just affect the current transactions in a given connection. It also affects how other transactions issued within other connections will behave.

- Therefore, you should have a good idea of what kind of concurrent transactions occur in your application when the system is live and many users are active.

This is the single biggest problem which leads to low performance, locking and blocking, as well as deadlocks

- Most developers think of individual transactions as though they are occurring in a single user system.

- It is difficult to envisage all permutations of possible transactions which can occur together. But it is very much possible to plan for it in a structured and informed manner. This requires additional effort, monitoring live systems and tweaking of specific transactions and so on.

- Unfortunately, this additional effort is rarely a part of system deployment! Such problems surface only after the system load increases in such a way that small, unnoticed issues become amplified due to large volume of data and / or large number of concurrent users.

- Only one of the options can be set at a time.

- It remains in effect till you explicitly change the option by issuing another command. This is an important aspect to consider.

Best Practice : When you change the Isolation Level for a transaction, remember to set it back to the original level after the transaction is completed.

Isolation levels

  • Read Uncommitted

This is as good (or bad) as not having any isolation. All data which is uncommitted is readable from any connection. This should not be used unless you have a very good reason to do so.

  • Read Committed

This prevents dirty reads. This does not prevent phantoms or non-repeatable reads. This is the default. Although it is the default, it does not mean that this isolation level is ideal for all transactions. One of the regular tendencies amongst techies is to simply use default values without giving it a second thought! I cannot imagine the number of phantom and non-repeatable reads that must be occurring in the world because someone simply used the ‘default’ value. It is a scary thought to say the least.

This level is obviously more restrictive than the ‘Read Uncommitted’ level.

  • Repeatable read

This prevents dirty reads as well as non-repeatable reads. It does not prevent phantom rows. This is more restrictive than Read Committed level. When I say restrictive what does it mean? It means that the chances of other transactions having to wait for this one to finish are INCREASED. Another way of saying this is – Repeatable Read reduces concurrency compared to Read Committed level.

  • Serializable

This is the most restrictive of the options. This should never be used as the default level. If you use this one as the default, it will most probably create a single user system!

This prevents all three problems.

How do Isolation Levels work?

You will notice that we have not yet addressed this issue at all in this article. The answer to this question is another Pandora’s box. The answer is simple. It uses the appropriate types of locks to achieve the desired effects. We will see what locks are and how they are implemented using Isolation Levels in the next article.

In the meantime, please go through your code and see if you are simply using the default level or are there some transactions which merit a different isolation level. If you feel like changing a level, never do it in production system. It can create havoc. Try it out in a test environment first, satisfy yourself that it has no side effects and then implement it in production system.

Don’t stop there. Monitor the production system. Some problems are never detected in test systems. They may occur in production. Handle them as required.

If you have a packaged product which handles large amount of data and many concurrent users, you must analyse the appropriate usage of Isolation Levels.