Chris Donnan


Chris Donnan : Programming – Brooklyn Style

software, trading, family, fun

Real time data mining

For the past few years I have been talking about and working on what I have been calling ‘real time data mining’. Marc is back blogging again and he has an interesting post here, which led me to this guy's interesting blog. They are talking about ESP (event stream processing). In other places this is also called CEP (complex event processing). These are infrastructure technologies for my ‘real time data mining’. I will be doing my best to check out the companies and technologies in this space.
In the past, working on real time algorithmic trading systems, and in particular working with depth-of-market data from the CME, has shown me a few things:

  • Real time data can be plentiful (several GB per day per instrument for depth of market data)
  • Real time data can be fast (many incoming ticks per second per instrument)
  • Current state can be hard to maintain in real time (depth of market data is broadcast up to 20 levels out from the current bid/ask, and you need to keep aggregate calculations based on the current bid/ask spread, which means updating your aggregates, scaling, normalization, formulas etc. rapidly)
  • Get ready to deal with lots of threading :)
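
To make the ‘current state’ point concrete, here is a minimal sketch in Python of keeping running aggregates over a depth-of-market book. It is an illustration only, not the code from any real trading system: the names `BookSide`, `DepthBook` and `on_update`, and the choice of aggregates (total quantity, volume-weighted price, spread, imbalance), are all assumptions made for the example. The idea is that each tick adjusts the totals by the change at one price level rather than recomputing over every level.

```python
from dataclasses import dataclass, field


@dataclass
class BookSide:
    """One side of the book: price -> quantity, plus running totals (hypothetical)."""
    levels: dict = field(default_factory=dict)
    total_qty: float = 0      # sum of quantity over all levels
    notional: float = 0.0     # sum of price * quantity over all levels

    def update(self, price, qty):
        """Set the quantity at a price level; qty == 0 removes the level."""
        delta = qty - self.levels.get(price, 0)
        # Adjust the aggregates by the change only, not by a full rescan.
        self.total_qty += delta
        self.notional += price * delta
        if qty == 0:
            self.levels.pop(price, None)
        else:
            self.levels[price] = qty

    def vwap(self):
        """Volume-weighted average price of this side, or None if empty."""
        return self.notional / self.total_qty if self.total_qty else None


class DepthBook:
    """Bid and ask sides with a few derived values (illustrative only)."""

    def __init__(self):
        self.bids = BookSide()
        self.asks = BookSide()

    def on_update(self, side, price, qty):
        (self.bids if side == "bid" else self.asks).update(price, qty)

    def spread(self):
        """Best ask minus best bid, or None if either side is empty."""
        if not self.bids.levels or not self.asks.levels:
            return None
        # A scan is fine for ~20 levels; a sorted structure would scale better.
        return min(self.asks.levels) - max(self.bids.levels)

    def imbalance(self):
        """(bid qty - ask qty) / (bid qty + ask qty), or None if the book is empty."""
        total = self.bids.total_qty + self.asks.total_qty
        if not total:
            return None
        return (self.bids.total_qty - self.asks.total_qty) / total
```

Feeding it ticks looks like `book.on_update("bid", 99.75, 10)`. This sketch is single-threaded; with one feed thread per instrument and separate readers, the updates would need a lock or a queue in front of them, which is where the threading comes in.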

It sounds like real companies are now able to offer good logical APIs that provide the needed capacity, performance etc. These enabling technologies will enable yet another layer: the real time data mining layer. I have worked on this for real time trading, and I think this is and will be a HUGE area for these technologies.

For a few years, I also worked on the 30th and 60th busiest sites on the web. These guys are basically micro-marketers. I was trying hard to drive them towards a real time data mining model. We did manage to get in place (thanks much to the efforts of my friend John) a good set of real time ‘decision logic’: updateable, rules-based actions that react to real time and aggregated historic data. This is a step in the right direction. These sites, however, also belong in the group of companies that could use a decent set of standardized real time ESP/CEP/Real Time Data Mining, or let me coin ‘RTDM’. Since the BI (business intelligence) space seems to have a zillion buzzwords and acronyms, I will add fuel to the fire.
In any case, cool stuff. I will keep working on my RTDM (particularly optimization and classification).
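
As a rough sketch of what ‘decision logic’ of this kind can look like (not the system described above; the class `DecisionLogic`, its methods and the running-mean aggregate are all hypothetical choices for the example), rules can be kept in a table keyed by name so they can be replaced while events keep flowing, and each rule sees both the live value and an aggregate of the history for that key:

```python
from collections import defaultdict


class DecisionLogic:
    """Named, replaceable rules evaluated against live events and history."""

    def __init__(self):
        # name -> (condition, action); both are plain callables (assumed shape).
        self.rules = {}
        # Per-key running aggregates standing in for 'aggregated historic data'.
        self.history = defaultdict(lambda: {"count": 0, "total": 0.0})

    def set_rule(self, name, condition, action):
        """Add a rule, or replace an existing one with the same name."""
        self.rules[name] = (condition, action)

    def remove_rule(self, name):
        self.rules.pop(name, None)

    def on_event(self, key, value):
        """Evaluate all rules for one event, then fold it into the history."""
        agg = self.history[key]
        mean = agg["total"] / agg["count"] if agg["count"] else None
        fired = [
            action(key, value, mean)
            for condition, action in list(self.rules.values())
            if condition(value, mean)
        ]
        # The event only becomes 'history' after the rules have seen it.
        agg["count"] += 1
        agg["total"] += value
        return fired


engine = DecisionLogic()
# Hypothetical rule: react when a value is more than double its historic mean.
engine.set_rule(
    "spike",
    lambda value, mean: mean is not None and value > 2 * mean,
    lambda key, value, mean: f"promote:{key}",
)
```

Calling `engine.set_rule("spike", ...)` again with a different condition swaps the rule in place, which is the ‘updateable’ part; a real deployment would also need persistence and thread safety, which this sketch leaves out.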

-Chris

