Chris Donnan : Programming - Brooklyn Style
software, trading, family, fun
Posted .net, ESP/ CEP, programming on Sunday, June 15th, 2008.
I have spent a good deal of time looking at and playing with Coherence (Coherence for .net specifically). Several people have been talking about using it in areas near mine. It has a few features that are quite similar to a solution that I have worked on for some time. In any case, I am a fan so far. My question is: is it a good enough event processing engine for the majority of use cases?
Essentially, most use cases for stream processing are solved without things like sliding lookback windows etc. Certainly - those ARE useful features and there ARE LOTS of use cases for them, but I have seen a few classes of applications be solved 100% by types of stream processors that DO NOT do those types of temporal things.
Imagine, for instance, that you want to see all booked trades. This is usually easy enough. Maybe you have some XML on the wind, maybe it is a tib message of some sort (RV/ EMS), maybe you put an object on the distributed cache. In all cases, you are essentially putting your data somewhere and notifying interested parties. I think the old standard for this was essentially: map your object to the tib, and to the database. This means you can save, transport/ notify, etc.
Mapping layers
This is of course 2x mapping code: not so great. With Coherence you have to write the code to serialize your classes to the correct forms (similar in many ways to implementing Microsoft's IXmlSerializable, a la SAX-parsing-looking stuff). In the more recent Coherence for .net, not only can you implement IPortableObject (the serialization methods to get/ put to the cache), but you can also register external serializers. This is excellent in my opinion, as you can have a 100% non-invasive solution. You tell the cache engine what class will serialize your given types, and you are good to go.
This is all well and good, but - you are still essentially going to get the 2x mapping problem… 1x for the cache serialization, 1x for the long term persistence (likely a database). This is not so great :(.
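To make the 'non-invasive' idea concrete, here is a minimal sketch of an external serializer registry. It is not the Coherence API; the `SerializerRegistry` class, the `Trade` type and the field names are all hypothetical, and the point is only that the entity class itself carries no serialization code.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    # Plain entity: knows nothing about how it is serialized.
    trade_id: str
    trader: str
    quantity: int


class SerializerRegistry:
    """Hypothetical registry mapping a type to external (write, read) functions."""

    def __init__(self):
        self._by_type = {}

    def register(self, cls, to_fields, from_fields):
        self._by_type[cls] = (to_fields, from_fields)

    def serialize(self, obj):
        to_fields, _ = self._by_type[type(obj)]
        return to_fields(obj)

    def deserialize(self, cls, fields):
        _, from_fields = self._by_type[cls]
        return from_fields(fields)


registry = SerializerRegistry()
registry.register(
    Trade,
    lambda t: {"id": t.trade_id, "trader": t.trader, "qty": t.quantity},
    lambda f: Trade(f["id"], f["trader"], f["qty"]),
)

wire = registry.serialize(Trade("T1", "chris", 100))
restored = registry.deserialize(Trade, wire)
```

Note that this only covers the cache-side mapping; the database mapping would still be a second piece of code, which is exactly the 2x problem.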
Continuous Queries
The fact that Coherence lets you register a ContinuousQuery is great. The filter syntax is not so pretty:
Filter filter = new AndFilter(new EqualsFilter("getTrader", traderid),
        new EqualsFilter("getStatus", Status.OPEN));
It will work, and it will lend itself to higher abstractions above that lower level AST-like abstraction. It will call you back when your cache gets the relevant insert, update, delete, etc. Good stuff. This is the essence of continuous queries. My basic thought is that this is ‘good enough’ for many, many use cases. Sure, the temporal stuff is good as well, but it is really unneeded for many problems.
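The mechanics are simple enough to sketch. What follows is a toy, not Coherence: `Equals`, `And` and `ObservableCache` are made-up names, entities are plain dicts, and the cache just re-tests each registered filter on every change and calls back when an entity enters, changes within, or leaves the query's result set.

```python
class Equals:
    def __init__(self, field, value):
        self.field, self.value = field, value

    def matches(self, entity):
        return entity.get(self.field) == self.value


class And:
    def __init__(self, *filters):
        self.filters = filters

    def matches(self, entity):
        return all(f.matches(entity) for f in self.filters)


class ObservableCache:
    """Toy cache that notifies registered (filter, callback) pairs on changes."""

    def __init__(self):
        self._data = {}
        self._queries = []

    def register_query(self, flt, callback):
        self._queries.append((flt, callback))

    def put(self, key, entity):
        old = self._data.get(key)
        self._data[key] = entity
        for flt, callback in self._queries:
            was = old is not None and flt.matches(old)
            now = flt.matches(entity)
            if now and not was:
                callback("insert", key, entity)   # entered the result set
            elif now and was:
                callback("update", key, entity)   # changed but still matches
            elif was and not now:
                callback("delete", key, old)      # left the result set

    def remove(self, key):
        old = self._data.pop(key, None)
        if old is None:
            return
        for flt, callback in self._queries:
            if flt.matches(old):
                callback("delete", key, old)


events = []
cache = ObservableCache()
cache.register_query(
    And(Equals("trader", "chris"), Equals("status", "OPEN")),
    lambda kind, key, entity: events.append((kind, key)),
)
cache.put("T1", {"trader": "chris", "status": "OPEN"})
cache.put("T2", {"trader": "bob", "status": "OPEN"})      # never matches
cache.put("T1", {"trader": "chris", "status": "CLOSED"})  # drops out of the query
```

No windows, no lookback, and it already answers 'show me my open trades, live'.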
My least favorite part of this is the mapping layers. These are layers that really do not add any business value; they are insulation, infrastructure, etc. Another missing link with Coherence is the history of changes. Each change is going to change the entity in the cache, but the incremental changes are not held onto unless you write code to do so.
Entity Capabilities
I guess this gets me thinking on how important it is to have entities that are:
- diff’able
- merge’able
- able to support incremental change broadcasting (if you have a 1 MB entity and you modify 1 field, you do not want to broadcast the whole thing)
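A minimal sketch of what those capabilities could look like, assuming entities are flat dicts (real entities with nested structure would need more care). The `diff` and `merge` helpers and the `_REMOVED` marker are hypothetical names for illustration only.

```python
_REMOVED = object()  # marker for a field that was deleted


def diff(old, new):
    """Return only the fields that changed between two versions of an entity."""
    delta = {k: v for k, v in new.items() if k not in old or old[k] != v}
    delta.update({k: _REMOVED for k in old if k not in new})
    return delta


def merge(entity, delta):
    """Apply a delta to an entity, returning a new merged copy."""
    merged = dict(entity)
    for k, v in delta.items():
        if v is _REMOVED:
            merged.pop(k, None)
        else:
            merged[k] = v
    return merged


before = {"id": "T1", "trader": "chris", "qty": 100, "status": "OPEN"}
after = {"id": "T1", "trader": "chris", "qty": 150, "status": "OPEN"}

delta = diff(before, after)     # only the changed field needs to be broadcast
history = [delta]               # keeping the deltas gives a simple change history
replica = merge(before, delta)  # a remote copy catches up by merging the delta
```

The same delta serves three purposes here: it is the small thing you broadcast, the thing you append to a history, and the thing a replica merges.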
I am a fan - of this Coherence business, but there are still other bits of the puzzle that must be thought about. How are you going to keep an audit history? What will you do for long term persistence/ reporting on those entities? How can you apply a sort of bulk change set to your entity? Etc.
-Chris
Posted .net, ESP/ CEP, programming on Monday, May 26th, 2008.
CLINQ to NEsper - I want it ASAP.
I will be toying with this ASAP.
Now if you could just make it FAST CONCURRENT RUBY CONTINUOUS - I would be super satisfied… How do we get there?
Posted .net, ESP/ CEP, programming on Saturday, July 7th, 2007.
NEsper OO CEP/ESP comes to .net. It has been a while in coming - but the Esper 1.3 port to C# seems ready.
-CD
Posted ESP/ CEP, programming on Sunday, December 10th, 2006.
Via Marco on ESP:
This is what we have been talking about - SOA should be event driven! The Tibco folks' take on it.
Posted ESP/ CEP, programming on Sunday, November 12th, 2006.
After a week of playing with Coral 8 - I am in love. Short post - it is simple - go download Coral 8 and be impressed today.
Polished, fast, complete, excellent product vision and implementation, mature, ready for the real world, they thought of all the important ‘enterprisey’ stuff. Great client tools. Simple - yet complete… What else - oh yeah - it is cool!
-Chris
Posted ESP/ CEP on Sunday, October 8th, 2006.
The 1st ever ..
Distributed Event-Based Systems Conference
The area is getting bigger and bigger.
Posted .net, ESP/ CEP, java, messaging on Saturday, October 7th, 2006.
Recently - my current employer has been looking for a solution for what it has been calling ‘continuous queries’. The CEP groups @ Yahoo have been debating about the most appropriate language for processing real time events. It winds up being quite an interesting topic. Depending on what your ‘query-able’ target is - you may want different semantics.
If you pre-suppose that you have an ‘in memory database’ with real time events when an insert, update or delete happens, the SQL-like languages seem to make sense. Since classic SQL is bad at the temporal - these SQL-like languages basically add temporal functionality.
What if you want to mostly query XML messages passing in the wind?? Then some sort of XPath predicate query semantics seem most desirable.
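As a rough sketch of that idea, a subscription can simply be an XPath predicate tested against each message as it goes by. The message shape and the `matches` helper below are invented for illustration, and Python's standard `xml.etree.ElementTree` only supports a limited subset of XPath, so treat this as a toy.

```python
import xml.etree.ElementTree as ET


def matches(message_xml, predicate):
    """True if the XPath-style predicate selects at least one element."""
    root = ET.fromstring(message_xml)
    return root.find(predicate) is not None


message = (
    "<trades>"
    "<trade status='OPEN'><trader>chris</trader><qty>100</qty></trade>"
    "</trades>"
)

# Registered interest: open trades belonging to a given trader.
mine = "./trade[@status='OPEN'][trader='chris']"
theirs = "./trade[@status='OPEN'][trader='bob']"

is_mine = matches(message, mine)
is_theirs = matches(message, theirs)
```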
What if you want to query state changes of in memory java or .net objects?? Do you use an OGNL like query language? JXPath?? Some db4o like Soda query, query by example or native query?? What makes most sense for registering interest in real time distributed events for object based updates? How does a delta update work in this world? Do I REALLY have to turn my nice objects into a relational database-like structure to use real time eventing???
These are just a few of the questions running through my head and the heads of several others in my area. I can never say that I do not get to play with interesting problems.
-Chris
Posted ESP/ CEP on Tuesday, September 26th, 2006.
I have been reading the excellent Distributed Event Based Systems (let's call it DEPS for brevity) from Springer. These guys publish some of the absolute best cutting edge computer science work in the world. Two of my favorite references that I have used for work in the past:
Recent Advances in Memetic Algorithms
Evolutionary Computation in Data Mining
These were phenomenal references while implementing serious cutting edge software. The academic research into optimal algorithms and strategies for computing complex ‘things’ is pivotal in implementing a world class solution. We are in the information age and we have SO many resources available to us. So much research, so much published material etc. It is silly not to use these resources for our real world software solutions!
Anyhow - onto my main point - the basic nuts and bolts of a content filtering event processing system - as stated in DEPS (somewhat paraphrased of course).
The Simple Operations
- Publisher - Publish( a message )
- Subscriber - Subscribe( a subscription )
- Subscriber - Unsubscribe( a subscription )
- Event Broker - Notify( a subscriber, a message)
- Publisher/ Event Broker - Advertise( a message schema )
- Publisher/ Event Broker - Unadvertise( a message schema )
The Collaborating Parts
Subscription
- a subscriber
- a message filter
Message Filter
- some predicates that the message should satisfy
Event Broker
- the part that gets the messages, tests filters, and notifies qualified subscribers
Publisher
- any part that publishes a message to the event broker
Subscriber
- any part that subscribes to some message filter of messages
That's it - all done…
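The operations and parts listed above fit in a few dozen lines. This is a single-process sketch with hypothetical names (`EventBroker`, `Subscription`), messages as dicts and a 'schema' reduced to a plain string; nothing here is taken from the book's own code.

```python
class Subscription:
    def __init__(self, subscriber, message_filter):
        self.subscriber = subscriber          # callable that receives a message
        self.message_filter = message_filter  # predicate: message -> bool


class EventBroker:
    def __init__(self):
        self._subscriptions = []
        self._advertised = set()

    def advertise(self, schema):
        self._advertised.add(schema)

    def unadvertise(self, schema):
        self._advertised.discard(schema)

    def subscribe(self, subscription):
        self._subscriptions.append(subscription)

    def unsubscribe(self, subscription):
        self._subscriptions.remove(subscription)

    def publish(self, schema, message):
        if schema not in self._advertised:
            raise ValueError(f"schema not advertised: {schema}")
        # Content-based filtering: test every filter against the message body.
        for sub in list(self._subscriptions):
            if sub.message_filter(message):
                self.notify(sub.subscriber, message)

    def notify(self, subscriber, message):
        subscriber(message)


received = []
broker = EventBroker()
broker.advertise("trade")
big_trades = Subscription(received.append, lambda m: m["qty"] >= 1000)
broker.subscribe(big_trades)

broker.publish("trade", {"id": "T1", "qty": 50})
broker.publish("trade", {"id": "T2", "qty": 5000})
broker.unsubscribe(big_trades)
broker.publish("trade", {"id": "T3", "qty": 9000})
```

Testing every filter against every message is the naive version; it is exactly the part that has to get clever once there are many subscribers and many brokers.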
So - in essence, it is pub/ sub of some XML messages via a broker/ daemon process (sounds like tib rv eh). From there the ‘content based filtering’ comes into play. How do we sort out the relevant bits from the stream of XML messages? Tib Rv gives you topic based filtering, but you have to receive the message and open it up to filter any further. The ‘content based’ part winds up being key.
The trick with all of this will be to make it scale out. Seems like a lot of XML parsing. The real magic will be in the broker layer(s). There are myriad topologies described in DEPS. Being able to correctly scale out the broker tiers is a task with many smaller parts.
- Covering algorithms to decide if one filter covers (subsumes) another
- Matching algorithms to decide what messages match which filters
- Routing and delivery algorithms
- Inter-Broker semantics (how to deal with cycles, etc)
- Event scoping
- Routing table management
- Efficient XML parsing
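To give a feel for the first two items: in the content-based routing literature, one filter is usually said to cover another when every message matching the second also matches the first, so a broker only needs to forward the more general one upstream. Below is a small sketch assuming a filter is a conjunction of inclusive numeric ranges, written as `{attribute: (low, high)}`; that representation and the function names are assumptions made for illustration.

```python
def matches(flt, message):
    """Matching: does this message satisfy every range constraint in the filter?"""
    return all(
        attr in message and low <= message[attr] <= high
        for attr, (low, high) in flt.items()
    )


def covers(general, specific):
    """Covering: True if every message matching `specific` also matches `general`.

    Conservative check: may return False for a `specific` filter with an
    empty range, even though such a filter matches nothing.
    """
    for attr, (low, high) in general.items():
        if attr not in specific:
            return False  # `specific` leaves this attribute unconstrained
        s_low, s_high = specific[attr]
        if s_low < low or s_high > high:
            return False  # `specific` allows values outside `general`
    return True


cheap = {"price": (0, 100)}
cheap_small_lots = {"price": (10, 50), "qty": (1, 1000)}

general_covers_specific = covers(cheap, cheap_small_lots)
specific_covers_general = covers(cheap_small_lots, cheap)
```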
The list goes on and on. All of the rest is making it efficient. This is an interesting set of prospects. It seems there is a sufficient body of research and a number of academic implementations (Jedi, Rebeca, Siena, and more). The commercial ESP folks seem to have taken it all a more (Stream) SQL route. All interesting. I am curious to see if any commercial products are out there for plain XML message ‘content based filtering’.
More Eventually;
Chris
Posted ESP/ CEP, programming on Thursday, August 31st, 2006.
I am seeing more and more focus on ESP/ CEP stuff these days. The need for a layer of abstraction on top of ‘lots of messages’ is strong. In technology we are already pretty good at ‘lots of data’ flying around, but watching it in a repeatable way with some good constructs is still an unmet need. The need to translate some flying data into meaningful events is key. This is particularly relevant in regards to algorithmic trading I might add.
It seems that the general trend in this space is a sort of SQL like query for event flow.
You will see things like (Apama-esque):

or: (StreamBase-esque)
CREATE STREAM TickTriples AS
SELECT symbol, T1.price AS price1, T2.price AS price2, T3.price AS price3
FROM Ticks T1 -> Ticks T2 -> Ticks T3
WHERE T1.symbol = T2.symbol AND T2.symbol = T3.symbol;
SELECT *
FROM TickTriples
WHERE price1 > 80 AND price2 < 80 AND price3 < 80 AND symbol = "IBM";
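For anyone who has not seen this style before, here is roughly what that query is doing, written out by hand. This is a sketch that assumes `->` means 'three consecutive ticks for the same symbol'; the real product semantics may well differ, and the function names are made up.

```python
from collections import defaultdict, deque


def tick_triples(ticks):
    """Yield (symbol, price1, price2, price3) for each run of 3 ticks per symbol."""
    recent = defaultdict(lambda: deque(maxlen=3))
    for symbol, price in ticks:
        window = recent[symbol]
        window.append(price)
        if len(window) == 3:
            yield (symbol, *window)


def falling_through_80(triples, symbol="IBM"):
    """The second query: first tick above 80, next two below, for one symbol."""
    for sym, p1, p2, p3 in triples:
        if sym == symbol and p1 > 80 and p2 < 80 and p3 < 80:
            yield (sym, p1, p2, p3)


ticks = [
    ("IBM", 82.0),
    ("MSFT", 30.0),
    ("IBM", 79.5),
    ("IBM", 79.0),
    ("IBM", 78.0),
]
alerts = list(falling_through_80(tick_triples(ticks)))
```

The hand-written version is not long, but the state management (one window per symbol) is exactly what the query language hides.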
The Apama version reminds me of TradeStation which is sort of similar in that it consumes streams of events and it can do similar event sequence operations. Obviously - TradeStation is specific to market data, it is older and not so advanced as the Aleris, StreamBases, Coral8s, etc… The ‘StreamSQL’ version has obvious SQL lineage.
To my amusement (and a few others I imagine) - a StreamBase employee and an Apama employee have been ‘going at it’ on one of the CEP email forums. The quibble points are over bold statements about performance, leadership in industry etc. Classic vendor stuff. Long-story-short: the emerging point is that there seems to be no really good set of metrics/ standards by which to compare performance of ESP/ CEP products. That will come soon I bet.
Anyhow - there you have it - CEP/ ESP technology is building. Plenty of tech fun and work to be found here I bet!
-Chris