10 Years, 11 Months, 16 Days since my first blog post here

I thought I was due a blog post. It has really been a while.

I have recently read a slew of books; here are my brief comments:

HFT/ Algo-trading related:

Management/ Interpersonal:

Go read.

You can see all of my real reading list + ratings here on Goodreads. (Much more sci-fi, etc.)

There you have it. Finally – an update!

-Chris-

 

 

Hive, Hadoop and the rest of that world….

I recommend watching this presentation over at InfoQ for a solid overview of the space: http://www.infoq.com/presentations/Introducing-Apache-Hadoop . I like the framing here: Hadoop as a data ‘operating system’. Data needs rationalising at the macro level, and Hadoop is the right ‘base layer’ IMO (based on today’s technology offerings). Even though the technology is somewhat ‘old’ at this point, it is much newer than the classic relational SQL databases, and it is now mature enough for general adoption.

Google’s Spanner is worth reading about as the ‘newer’ tech, but it is not ready for mainstream consumption unless you want to build your own.

  1. Set up some Hadoop for yourself – Ubuntu “quick” guide here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
  2. Set up Hive here: https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration

I think the whole ecosystem around Hadoop and Hive is great. Add to that a list of useful related technologies that are available to download and use right now:

  • Flume (get data in!)
  • Sqoop – interop with traditional relational DBs
  • Pig – a more ETL-like tool; not sure yet whether it is redundant with Hive or other techs…
  • OpenTSDB – time series database, useful for capturing data that is … well, a time series (think app metric streams)

This all makes the case for a large-scale data management environment, built on open source tools, that can handle massive amounts of data in many different forms.

 

Weekend reads (market impact models, Markov models, probability and statistics)

 

GCC 4.6.1 on Mac Lion

Good instructions here, but too much klunking around….

On Lion, you will need to download:

# GMP
../configure --prefix=$HOME/my_gcc ABI=64
# MPFR (needs GMP)
../configure --prefix=$HOME/my_gcc --with-gmp=$HOME/my_gcc ABI=64
# MPC (needs GMP and MPFR)
../configure --prefix=$HOME/my_gcc --with-gmp=$HOME/my_gcc --with-mpfr=$HOME/my_gcc ABI=64
# GCC itself; installed binaries get a my- prefix
../configure --prefix=$HOME/my_gcc --enable-checking=release --with-gmp=$HOME/my_gcc --with-mpfr=$HOME/my_gcc --with-mpc=$HOME/my_gcc --program-prefix=my- ABI=64

FastFlow

FastFlow is a parallel programming framework for multi-core platforms based upon non-blocking lock-free/fence-free synchronization mechanisms.

Ooh, aah. This is lovely. Look out, Disruptor: there is nothing new under the sun, apparently.

Very, very nice. A work colleague pointed FastFlow out to me yesterday and I have spent the morning reading and playing. Big fan so far.
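To give a flavour of what "non-blocking lock-free" means in practice, here is a minimal sketch of a bounded single-producer/single-consumer queue, the sort of building block this style of framework is built around. To be clear, this is not FastFlow's code or API: the class name SpscQueue and its push/pop methods are made up for illustration, and it uses C++11 acquire/release atomics rather than the fence-free scheme FastFlow describes. It assumes exactly one thread ever calls push and exactly one thread ever calls pop.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not FastFlow's implementation or API.
// Bounded ring buffer, safe for ONE producer thread and ONE consumer thread.
template <typename T>
class SpscQueue {
public:
    // One extra slot is kept empty so that "full" and "empty" look different.
    explicit SpscQueue(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {}

    // Producer side only. Returns false (without blocking) when the queue is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        buf_[tail] = value;
        // Release: the consumer sees the slot contents before the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side only. Returns false (without blocking) when the queue is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = buf_[head];
        // Release: the producer may reuse the slot only after we have read it.
        head_.store((head + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_;  // next slot to read; written by the consumer
    std::atomic<std::size_t> tail_;  // next slot to write; written by the producer
};
```

No mutex anywhere: each index is written by only one side, so neither thread ever waits on a lock held by the other. A failed push or pop just returns false and the caller decides whether to spin, yield or go do something else.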