Wednesday, February 24, 2010

The flush mode and behaviour of Hibernate

A few days ago, I found some strange problem from TE's log. There is a service names 'sell ticket', which implement below logic,I will present them in pseudocode.
  1. TransactionDao.getByDevAndTraceMessageId(X,Y)
  2. TransactionDao.insert(trans)
  3. SettlementReportDao.getByOperatorAndBatchNoAndMerchant(X,Y,Z)
  4. GameInstanceDao.getByGameIdAndDrawNo(X,Y)
  5. MerchantDao.update(X) //1
  6. MerchantDao.update(Y) //2
  7. TicketDao.insert(X)
  8. LottoEntryDao.insert(X) //1
  9. LottoEntryDao.insert(Y) //2
  10. LottoEntryDao.insert(Z) //3
  11. print log 'finish saving entries'
  12. TransactionDao.update(trans)
  13. commit transaction.
  14. print log 'Start to sync transaction log'
Some long transaction spend about 30 seconds from stop#11 to step#14. As my understanding, from step#11 to step#14 only update te_transaction and then commit transaction, why take so long time?
I have below two considerations:
  1. when update transaction, this transaction is blocked.
  2. when commit transaction, it will cost much
For point#1, it should not be blocked, cause at step#2, this transaction should already captured the lock. For point#2, if all sql statements have been sent to database, then committing should be very quick.

I understand hibernate in a wrong way which cause I get wrong conclusion. Let's think about the flush behaviour of hibernate. Each time when we call a Dao to insert,update,delete a entry, it won't hit database immediately, hibernate will record these actions, update entity states. When do flush, hiberate will send all these sql to database. Remember below points:
  • A select statement will always hit database to retrieve latest data from database, and update the related entity in cache(it should be 2nd level cache, which is managed by hibernate, so hibernate runtime is responsible for ensuring that all updates to the state of all entities in the persistence context which could potentially affect the result of the query are visible to the processing of the query.
     ..actually it isn't always true, there are 2 exceptions. #1)Query entity by id(entity identifier). for example if insert a entity first, then query that entity by ID, hibernate won't hit the underlying database, as the inserted entity is completedly managed by hibernate runtime. #2)If insert a entity, and then query that entity from outside of hibernate runtime, such as direct JDBC. In this case, as hibernate runtime has no awareness of the querying, it won't flush the insert/update/delete SQL to underlying database. ).
  • When execute a select query, hibernate will check if some previous actions(save, update,delete) have changed the status of queried entity(if changed, hibernate marked it as dirty). If so, hibernate will flush all update actions of entity to underlying database.
  • Hibernate will organize all actions into 'insert', 'update','delete' category. When do flushing, hibernate will perform all actions in order 'insert', 'update', 'delete'. It means maybe you invoke a update method before a insert, but in fact, insert will be performed first.
Then what is flush? when hibernate will flush? Flush will sync the entity state between hibernate cache and database. In case of below condition, hibernate will perform flush(We are talking about FlushMode.AUTO).
  • Before select, and some previous actions have changed the queried entity state. Image it, if a transaction query a entity(E1), then change some fields(E2), then query this entity again. As query will always hit database, if doesn't flush, this query will get E1. Obviously it is incorrect, that is why hibernate will perform flush before the second query.
  • Before committing transaction.
  • Manualy invoke flush() method.

"When execute a select enquiry, hibernate will check if some previous actions(save, update,delete) will change the status of queried entity." I will further explain what does this statement means. Lets say there are 2 entities: WorkingKey and Payout(these 2 entities have no any relationship), and:
1) insert a new Payout entity
2) query WorkingKey by some criteria
3) commit transaction.
The test code as below:
public void testHibernateFlush() throws Exception{
  Payout p = DomainMocker.mockPayout();
  logger.debug("before insert payout...");
  logger.debug("before querying working key...");
  this.getWorkingKeyDao().getWorkingKey("20101216", "GPE-111");
Ok, what will be printed out? -------------
[2011-01-17 15:39:08,984][main][DEBUG][BaseTransactionTest] before insert payout...
[2011-01-17 15:39:09,046][main][DEBUG][BaseTransactionTest] before querying working key...
[2011-01-17 15:39:09,359][main][INFO ][sqltiming] select workingkey0_.ID as ID24_, workingkey0_.CREATE_TIME as CREATE2_24_, workingkey0_.UPDATE_TIME as UPDATE3_24_, workingkey0_.VERSION as VERSION24_, workingkey0_.CREATE_DATE as CREATE5_24_, workingkey0_.DATA_KEY as DATA6_24_, workingkey0_.GPE_ID as GPE7_24_, workingkey0_.MAC_KEY as MAC8_24_ from GPE_KEY workingkey0_ where workingkey0_.CREATE_DATE='20101216' and workingkey0_.GPE_ID='GPE-111' {executed in 0 msec}
[2011-01-17 15:39:09,375][main][INFO ][sqltiming] insert into PAYOUT (CREATE_TIME, UPDATE_TIME, VERSION, TOTAL_AMOUNT_B4_TAX, DEV_ID, GAME_INSTANCE_ID, IS_BY_MANUAL, IS_VALID, MERCHANT_ID, OPERATOR_ID, SETTLEMENT_FLAG, SETTLEMENT_TIME, STATUS, TICKET_SERIALNO, TOTAL_AMOUNT, TRANSACTION_ID, TYPE, ID) values (2011-01-17 15:39:08.984, 2011-01-17 15:39:08.984, 0, NULL, 0, 'GII-111', 0, 1, 0, NULL, NULL, NULL, 0, '123456', 1245.01, 'TRANS-8111', 1, '97d660a332774c5fa8a434aa9a18c0bf'[BR] {executed in 16 msec}
[2011-01-17 15:39:09,375][main][DEBUG][WorkingKeyDaoImplTest] Committed transaction after execution of test [testHibernateFlush].
Surprise! You will see 'insert payout...' was executed after 'select workingkey...'. Why? yes, it is due to 'insert Payout' won't change the state of entity 'Workingkey'.

Flush in Spring test framework
I wrote a testcase extending from Spring AbstractTransactionalDataSourceSpringContextTests, by spring test framework, the transaction will be rollbacked automatically when finish testcase.
 /** the persistence technology is JPA(hibernate) */
public void testInsert() {
int init_countOfSP = this.countRowsInTable("RA_TE_TICKET_SCHEMA");
int init_countOfEntry = this.countRowsInTable("RA_TE_ENTRY");
     // generate a SalePackage instance which map to table 'RA_TE_TICKET_SCHEMA'
SalePackage sp = mock();
this.getSalePackageDao().insert(sp); int countOfSP = this.countRowsInTable("RA_TE_TICKET_SCHEMA");
int countOfEntry = this.countRowsInTable("RA_TE_ENTRY");
assertEquals(init_countOfSP + 1, countOfSP);
assertEquals(init_countOfEntry + 3, countOfEntry);
As my understanding, when countRowsInTable("RA_TE_TICKET_SCHEMA") (the underlying SQL is 'select count(0) from RA_TE_TICKET_SCHEMA) hibernate will issue a 'insert salepackage...' to database....but actual result is no any 'insert...' SQL...why??
It is due to countRowsInTable("RA_TE_TICKET_SCHEMA") is executed from jdbcTemplate(spring + jdbc), JPA implementation has no information of countRowsInTable. Finaly this test case will fail, as no salepackage is inserted into database.

Select will always trigger flush??
Continue with above testcase. As countRowsInTable is executed by JDBC directly, i changed my testcase as below:
public void testInsert() {
int init_countOfSP = this.countRowsInTable("RA_TE_TICKET_SCHEMA");
int init_countOfEntry = this.countRowsInTable("RA_TE_ENTRY");
SalePackage sp = mock();
sp = this.getSalePackageDao().findById(SalePackage.class, sp.getId());
int countOfSP = this.countRowsInTable("RA_TE_TICKET_SCHEMA"); int countOfEntry = this.countRowsInTable("RA_TE_ENTRY");
assertEquals(init_countOfSP + 1, countOfSP);
assertEquals(init_countOfEntry + 3, countOfEntry);
I will query SalePackage immediately after insert, and the implementation of findById:
public T findById(Class clazz, String id){ 
return this.getJpaTemplate().find(clazz, id);
The testcase failed again...why? I think it is caused by this.getJpaTemplate().find(clazz, id);. As JPA implementation(hibernate) has its own cache, when findById hibernate will find this entity from its cache, no need to hit database, and the most important is hibernate knows that Entity with id(sp.getid()) is inserted in current transaction, it means no other transactions can see this entity. So hibernate even no need to issue a 'select...' SQL to database, it will retrieve this entity from its local cache directly.
But how about if we implement findById() like this:
public T findById(Class clazz, String id){ 
List sp = this.getJpaTemplate().find("from SalePackage p where'" + id + "'");
if (sp.size() > 0) return (T)sp.get(0);
return null;
The test result is hibernate will issue two SQLs: 'insert salepackage...' and 'select ...with id=..'. As when you call this.getJpaTemplate().find(clazz, id), JPA runtime definitely know that you are query entity by identifier, while if this.getJpaTemplate().find("from SalePackage p where'" + id + "'") hibernate runtime doesn't know it :).

Wow, I found a nice post about hibernate flush, ""