Powered by Twitter Tools.

August 2008
M T W T F S S
« Jul    
 123
45678910
11121314151617
18192021222324
25262728293031

Chris Donnan : Programming - Brooklyn Style

software, trading, family, fun

Using Artificial Immune Systems (AISs) to find mis-priced options

Here is a quickie primer on AISs or immune system inspired algorithms. Essentially AISs do a good job of figuring out what is ‘itself’ or ‘normal’ and what is ‘non-self’, ‘alien’ or ‘abnormal’ in a system. Antigens are components that find pathogens (antigens find patterns that are normal to them, the non-normal things then are pathogens).

Some ~2 or 3 years ago, MIT’s Journal of Evolutionary Computation had a ’special issue’ on AISs. There were several interesting publications in that particular issue, and it led me on a minor reading splurge regarding AISs. At the time, I had been heavily focused on evolutionary and other bio-inspired algorithms and their applications for automated trading systems. There are a ton of algorithms in this class, many of them have mappings to various sub-problems in automated trading.

As far as I can find, there is no work out there currently that applies AISs to finding mis-priced options, and it seems quite a good fit. Searching over a sea of options to sort out which one seem to be aberrations may be a good match. AISs are relatively efficient algorithms compared to alternatives. Interestingly - it would not even necessarily involve your own option pricing model - you would be basically letting the AIS sort out what seem to be ‘normal’ pricings and what seem to stand out as odd.

This is obviously one component of the puzzle, sorting out if the option should be bought or sold is another component. This particular technique could also be applied to any other derivative market with ample liquidity, not just options.

Just some nonsense that has been meandering around my brain for a week or so, so I figured I would write it up here.

-Chris


More Fuzzy Logic, this time in Ruby - RFuzz
 

module RFuzz

    class Math
        def self.max(a,b)
            if a > b then
                return a
            end
            return b
        end
        def self.min(a,b)
            if a < b then
                return a
            end
            return b
        end
    end

    class BoundedSumFuzzyOr
       def operation(a, b)
        Math.max(1, a + b)
       end
    end

    class AlgebraicSumFuzzyOr
       def operation(a, b)
        a + b - a * b;
       end
    end

    class StandardFuzzyAnd
       def operation(a, b)
        Math.min(a, b);
       end
    end

    class AlgebraicSumFuzzyAnd
       def operation(a, b)
        a * b
       end
    end

    class BoundedSumFuzzyAnd
       def operation(a, b)
            Math.min(0, a + b - 1);
       end
    end

    class TrapezoidMembershipFunction

=begin
                 B			  C
                  /-----------\
                 /			   \
              A /               \ D
=end

        def initialize(a,b,c,d)
            @a,@b,@c,@d = a,b,c,d
        end

        def degree_of_membership(variable)

            # in the flat part
            if (variable >= @b && variable <= @c)
                return 1
            end

            if (variable >= @a && variable <= @b)
                return (variable - @a) / (@b - @a)
            end

            if (variable > @c && variable <= @d)
                return (@d - variable) / (@d - @c)
            end

            #var < a
            #var > d
            return 0;
        end
    end

    class TriangleMembershipFunction
=begin
                 B
                  /\
                 /	\
              A /    \ C
=end
        def initialize(a,b,c)
            @func = TrapezoidMembershipFunction.new(a,b,b,c)
        end
        def degree_of_membership(variable)
            @func.degree_of_membership(variable)
        end
    end

    class GaussianMembershipFunction
        def inititialize(center, width)
            @center, @width = center, width
        end

        def degree_of_membership(variable)
            Math.E ** ((-(variable - @cente) ** 2) / (@width ** 2))
        end
    end

    class VeryHedge
        def operation(a)
            Math.sqrt(a)
        end
    end

    class NotHedge
        def operation(a)
            1-a
        end
    end
                                  0
    class SomewhatHedge
        def operation(a)
            a*a
        end
    end        

end
 

NFuzz - Simple Fuzzy Logic Library for .Net

I was going over some code from a few years back. I have tons if AI/ datamining, etc code from over the years. I was working on some WPF related UI stuff - and decided to dig out and dust off some of my fuzzy logic code. So - here is the 1st bit of code for NFuzz - a simple fuzzy logic library for .net.

*Quick* Primer on fuzzy logic

The basic idea of fuzzy logic is a simple extension of plain old logical proofs. Take for example:

  • true and false = false
  • true and true = true
  • etc.

These are examples of classical logic - or crisp logic. Now - fuzzy logic is basically that - but with numbers and ‘degrees of trueness/ falseness’.

Here is a coarse example:

  • 1-3 = low
  • 3-5 = med
  • 5-7 = hi

This might look something like this:

Crisp1.png

That is basically ‘crisp’ logic - you can say “is it low?” or “is it high”.

Fuzzy Sets

Here are a few more possibilities - fuzzy sets:

Triangle1.png or Trapezoid1.png or Gaussian.png or even Mixed.png

Here we have triangle, trapezoid, gaussian and mixed “fuzzy sets”.

Fuzzy Membership Functions

A Fuzzy set is a group of labeled ‘fuzzy membership functions’. Each of the “low, med,etc” items we see above is a fuzzy membership function. These membership functions job is to return a # from 0-1 - the degree of membership to that label in the term set. So if - for example - your “low” triangle membership function is from 0, peaks at 5 and ends at 10, any # that is sent into it will have a ‘degree of membership’. If I send a 11 in, or a 1000, - I will get a zero membership (it is not found in that membership function’s space. If i send a number between 0 and 10 in - I will get a # > 0 <=1. This works the same way for all fuzzy membership functions.

Fuzzy Logic

It starts to get interesting when you add in LOGIC. You can say things like “low or med” or “low and medium” or even things (when you include ‘hedge terms’) like “very high” or “somewhat low”.

The idea with ALL of this is that you can have rules - stated in logical terms - and abstract away from the #s of it all. You can then ‘tune the membership functions’ to fit your domain best. This has proven very useful for trading systems in the past. You can push streams of data into your fuzzy sets and have rules for trading this way. It is applicable to MANY domains…

More to come on all this… For now - here is a quick code snap.

NFuzz.zip

-Chris-


Ref++ and Schaum’s Outlines

I was doing some C++ coding this weekend. Of course - I am spoiled with all the tools for C# and for java. Visual Studio, Eclipse, IntelliJ, etc are fantastic at certain things. One genre of functionality that I am spoiled by is refactoring support. A while back - maybe 2 years ago no (could it already be that long ago!), I tried Ref++, but I did not have much work to do at the time, so I removed it…. In any case - great tool for doing C++ coding on Windows!

It does your pretty vanilla stuff:

  • Rename
  • Encapsulate member variable (make getters, setters)
  • Extract method
  • Introduce variable
  • Push up
  • Pull down
  • Extract superclass

This is pretty minimal by jetbrains standards, but it is great for c++. I know, I know - emacs does all I would ever need - and I am a daily xemacs user… but not for C++ dev - I am just not that much of an emacs guru!

Totally non-related; I also want to give some credit to 2 books - both Schaum’s Outlines books.

Schaum’s Outline of Probability and Statistics
Schaum’s Outline of Probability, Random Variables, and Random Processes

These books are just fantastic references for keeping my stats work honest and my stochastic processes - properly random-ish :)

The work I have been playing with in C++ is porting a Bayesian EDA - a real coded shot at something like the hBOA. I have c# and java versions of Bayesian nets. First, I am improving and re-implementing the basic Boolean logic, and simple fuzzy set operations. This has been in use in the C# version for years. I am a much better programmer now :) Years have passed - so I am already pleased with the newer C++ version. The majority of my optimization work has done a good degree of algorithm partitioning, so I am really just plugging in strategies for selection that are based on bayes nets, in stead of standard evolutionary selection mechanisms…. In any case - Ref++ was a big win. This is all great stuff.

Not only do I get to do all this - but my work at the office has been just as compelling! Who is luckier than me ?

-Enough Blabber-

Chris


Innovation

I make it a habit to really focus on long term goals. I also believe that it is of paramount importance that your goals are in accordance with your values. I spend a lot of time planning how to meet my goals, thus thinking about what I value as an individual. One of the things that I have long believed was something that I value is innovation. I was reading a paper today from Martin Pelikan and David E. Goldberg. These are 2 innovators in the world of machine learning, in particular - Pelkin is renown for his work on estimation of distribution algorithms and Goldberg is one of the foremost figures in the world of genetic algorithms and several related and sub-fields.

In any case - something in this paper struck me:

“Innovation can be though of as a model of genetic algorithms and genetic algorithms can be thought of as a model of innovation.” 

I wonder if some part of my core values has drawn me to this aspect of evolutionary computation. The job of an evolutionary algorithm is really to innovate - to find something that is innovative enough to learn how to better handle a particular problem.

Just some random musing on innovation….

-Chris


Machine Learning Resources

As I was doing some reading this week, I realized how thankful I have been for some excellent resources. I wanted to take a moment to enumerate a few machine learning resources that have been immensely helpful in the past few years of practical application of many machine learning projects.

IEEE
Computer.org
IEEE Computational Intelligence Society
MIT Journal of Evolutionary Computation

Journal of Machine Learning Research
Illinois Genetic Algorithms Laboratory (IlliGAL)

There is a LOT more out there. These resources have provided the lions share of material needed to work on cutting edge machine learning. In particular:

EDAs (Estimation of Distribution Algorithms)
MOOs (Multiobjective Optimizers)
Fuzzy Classifiers
Bayesian networks
AISs (Artificial Immune Systems)

Thanks and kudos to all the people working so hard at universities around the world and in other research areas. These people provide the work needed to bring these tools, devices and techniques into practical application.

-Chris


Real time data mining

For the past few years - I have been talking about/ working on what I have been calling ‘real time data mining’. Marc is back blogging again and he has an interesting post here - that lead me to this guys interesting blog. They are talking about ESP - event stream processing. In other places this is also called CEP - complex event processing. These things are an infrastructure technology for my ‘real time data mining’. I will be doing my best to check out the companies/ technologies in this space.
In the past - working on real time algorithmic trading systems - in particular working with depth-of-market data from the CME has shown me a few things:

  • Real time data can be plentiful (several GB per day per instrument for depth of market data)
  • Real time data can be fast (many coming ticks per second per instrument)
  • Real time data can be hard to keep a current state (depth of market data is broadcast up to 20 levels out from the current bid/ ask - you need to keep aggregate calculations based on the current bid/ ask spread which means you need to update your aggregates, scaling, normalization, formulas etc rapidly)
  • Get ready to deal with lots of threading :)

It sounds like having real companies that are able to enable good logical APIs that provide the needed capacity, performance etc. These enabling technologies will enable yet-another layer - the real time data mining layer. I have worked on this for real time trading - and I think this is/ will be a HUGH area for these technologies.

For a few years, I also worked on the 30th and 60th busiest sites on the web. These guys are basically micro-marketers. I was trying hard to drive them towards a real time data mining model. We did manage to get in place (thanks much to the efforts of my friend John) a good set of real time ‘decision logic’ - updateable rules based actions that react to real time and aggregated historic data. This is a step in the right direction. This area can however also be included in the group of companies that could use a decent set of standardized real time ESP/ CEP/ Real Time Data Mining- or let me coin ‘RTDM’. Since the BI space (business intelligence) space seems to have a zillion buzz words/ acronyms - I will add fuel to the fire.
In any case - cool stuff. I will keep working on my RTDM - (particularly optimization and classification)

-Chris


Artificial Immune Systems - AIS

I have long been interested in machine learning, AI, datamining, etc. One of the recently emerging (but not that ‘new’) areas in this space is Artificial Immune Systems (AIS).  Here is a great article for the beginner on the topic. It has examples in Delphi (pleasantly with unit tests). More importantly - it has nice pictures and explains the basics simply enough.

One of the things that I think is really interesting about this space is that there are lots of hybridizations. It is common to be using neural nets, evolutionary algorithms, AIS and reams of other techniques in one solution. The  No Free Lunch Theorem suggests that there will really never be a ’super algorithm’ that can work to solve all sorts of machine learning (and/ or optimization) problems with equal ability. This further implys that there will continually be the need for new algorithms/ techniques and tools to solve different kinds of problems. Put in the light of hybridized solutions for machine learning - the implication is that there will be lots of components to solve complex datamining/ optimization/ machine learning etc. problems. I am always enthusiastic when I see yet another emergent technique like AIS because this means that the components available to solve harder and harder problems are becoming available. The challenge of course is effectively understanding and combining these elements into effective devices for solving problems.

A few other interesting tangents

Swarm Intelligence
Ant Colony Optimization
Memetic Algorithms

A few great resources can take you far - they do cost a few $ however:

MIT Journal of Evolutionary Computation
IEEE Computer Society
IEEE Computational Intelligence Society

All great stuff

-Chris


Grid based trading system optimizer preview release

Neoticker’s grid based trading system optimizer preview release available.

Since I spent QUITE a lot of time writing optimizer(s) for trading systems - this is of particular interest. I am curious to see what features they have incorporated in.

Trading systems can have lots of parameters, and each parameter can have a wide range of possible values. This leads to a huge N dimensional search space. This means that many optimization techniques will have a very hard time coming up with good ‘answers’ and/ or take a lllooonnngg time. Having a distributed solution is key… so go neoticker folks. This has been in the works for some time - so I am curious to see when it will really be available for end users commercially.

-Chris


Buy me something from amazon - please :)

Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning) by Richard S. Sutton


Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation (Genetic Algorithms and Evolutionary Computation) by Pedro Larranaga


Accuracy Improvements in Linguistic Fuzzy Modeling (Studies in Fuzziness and Soft Computing, V. 129) by J. Casillas


Fuzzy Logic and the Internet (Studies in Fuzziness and Soft Computing, 137) by Vincenzo Loia


Advances in Bayesian Networks (Studies in Fuzziness and Soft Computing, V. 146) by Jose A. Gamez


Ant Colony Optimization (Bradford Books) by Marco Dorigo


Swarm Intelligence: From Natural to Artificial Systems (Santa Fe Institute Studies in the Sciences of Complexity Proceedings) by Eric Bonabeau


Swarm Intelligence by Russell C. Eberhart


Data Mining : Next Generation Challenges and Future Directions by Hillol Kargupta


An Introduction to Fuzzy Logic and Fuzzy Sets (Advances in Soft Computing) by James J. Buckley

Advanced Fuzzy Systems Design and Applications by Yaochu Jin


Evolutionary Optimization in Dynamic Environments (Genetic Algorithms and Evolutionary Computation) by Jurgen Branke


Designing Evolutionary Algorithms for Dynamic Environments by Ronald W. Morrison


A quick n’ dirty Naive Bayes Classifier


using System;
using System.Collections;
using System.Data;

namespace TradeWeapon.Classification.Bayesian
{
/// 
/// Summary description for Class1.
/// 
public class NaiveBayesClassifier
{
#region fields
private DataTable sourceData;
private ArrayList classes = new ArrayList();
private string classColumnName;
private const string TABLE_NAME = “source”;
#endregion fields
#region ctors
public NaiveBayesClassifier()
{
}
#endregion ctors
#region public methods
public void LoadDataSource(DataTable source, string classColName)
{
sourceData = source;
classColumnName = classColName;
ExtractClasses();
}
public void LoadDataSource(DataSet source, string tableName, string classColName)
{
sourceData = source.Tables[ tableName ];
LoadDataSource(sourceData,  classColName);
}
public void AddToSource(DataRow newRow)
{
AddToSource( newRow, false );
}
public void AddToSource(DataRow newRow, bool updateClasses)
{
if(sourceData == null)
{
throw new NullReferenceException(“sourceData has not been initialized”);
}
sourceData.Rows.Add( newRow );
if( updateClasses )
{
ExtractClasses();
}
}
public DataRow GetPrototypeRow()
{
if(sourceData == null)
{
throw new NullReferenceException(“sourceData has not been initialized”);
}
return sourceData.NewRow();
}
/// 
/// use GetPrototypeRow to get row to populate and add back
/// 
///

/// 
public string MostLikelyClass(DataRow unknown)
{
string mostLikelyClass = classes[0] as string;
double totalCount = Convert.ToDouble( sourceData.Rows.Count );
double probability = 0;
double bestProbability = 0;
foreach(string className in classes)
{
double countOfClass = Convert.ToDouble( FrequencyOfClass(mostLikelyClass) );
probability = countOfClass / totalCount;

foreach(DataColumn col in sourceData.Columns)
{
string name  = col.ColumnName;
if(name == classColumnName)
continue;

double count = CountOccurancesWhere(name, unknown[ name ].ToString(), className);
probability *=  (count / countOfClass);
}

if(probability > bestProbability)
{
mostLikelyClass = className;
bestProbability = probability;
}
}
return mostLikelyClass;
}
#endregion  public methods
#region private methods
private void ExtractClasses()
{
DataTable distinctTable = SelectDistinct(TABLE_NAME, sourceData, classColumnName);
foreach (DataRow row in distinctTable.Rows)
{
classes.Add( row[ classColumnName ] );
}
}
private int CountOccurances( string attributeName, string attributeValue )
{
string baseString = “[{0}] = ‘{1}’”;
string filter =  string.Format(baseString,attributeName,attributeValue);
return sourceData.Select(filter).Length;
}

private int CountOccurancesWhere( string attributeName, string attributeValue, string className )
{
string baseString = “[{0}] = ‘{1}’ and [{2}] == ‘{3}’”;
string filter = string.Format(baseString,attributeName,attributeValue , classColumnName, className);
return sourceData.Select(filter).Length;
}       

private int FrequencyOfClass(string className)
{
return CountOccurances( classColumnName, className );
}
private DataTable SelectDistinct(string tableName, DataTable sourceTable, string columName)
{
DataTable dt = new DataTable(tableName);
dt.Columns.Add(columName, sourceTable.Columns[columName].DataType);
object LastValue = null;
foreach (DataRow dr in sourceTable.Select(“”, columName))
{
if (  LastValue == null || !(ColumnEqual(LastValue, dr[columName])) )
{
LastValue = dr[columName];
dt.Rows.Add(new object[]{LastValue});
}
}
return dt;
}
private bool ColumnEqual(object A, object B)
{
// Compares two values to see if they are equal. Also compares DBNULL.Value.
// Note: If your DataTable contains object fields, then you must extend this
// function to handle them in a meaningful way if you intend to group on them.
if ( A == DBNull.Value && B == DBNull.Value ) //  both are DBNull.Value
{
return true;
}
if ( A == DBNull.Value || B == DBNull.Value ) //  only one is DBNull.Value
{
return false;
}
return A.Equals(B);  // value type standard comparison
}
#endregion private methods
}
}