Sunday, December 13, 2015

Implementing statistics gatherer design by using Boost library in C++

As a result of Monte Carlo simulation process, we get a lot of simulated values. After this, we usually want to calculate a set of desired statistics from those simulated values. A great statistics gatherer design has been thoroughly presented in The book by Mark Joshi. However, just for the curious and for the sake of trying to create something original (while respecting traditions), I wanted to implement my own statistics gatherer design, in which several different statistical measures could be calculated independently by using the same set of processed data. Moreover, the number of such calculation objects and their implementations for algorithmic details would be fully configurable. Finally, I wanted to avoid any direct coupling between data (object, which holds the source data) and algorithms (objects, which are performing the actual statistical calculations).

Why bother?


In the following example program below, a sample of discounted payoffs from Monte Carlo process has been hard-coded and two statistical measures are then calculated and printed out.

void MonteCarloProcess()
{
 // process has been producing the following discounted payoffs
 // for this sample : average = 10.4881, standard error = 1.58502
 double discountedPayoffs[] = { 18.5705, 3.31508, 0.0, 3.64361, 0.0, 0.0, 
  47.2563, 10.6534, 85.5559, 0.0, 30.2363, 0.0, 17.8391, 2.15396, 0.0, 
  66.587, 0.0, 9.19303, 0.0, 0.0, 24.2946, 29.6556, 0.0, 0.0, 0.0, 65.926, 
  0.0, 14.0329, 1.43328, 0.0, 0.0, 0.0, 1.37088, 0.0, 2.49095, 21.4755, 
  36.5432, 0.0, 16.8795, 0.0, 0.0, 0.0, 19.8927, 11.3132, 37.3946, 10.2666, 
  26.1932, 0.0, 0.551356, 29.7159, 0.0, 31.5357, 0.0, 0.0, 4.64357, 4.45376, 
  21.6076, 12.693, 16.0065, 0.0, 0.0, 0.0, 0.0, 25.9665, 18.7169, 2.55222, 
  25.6431, 8.5027, 0.0, 0.0, 29.8704, 0.0, 22.7266, 22.8463, 0.0, 0.0, 0.0, 
  0.0, 4.90832, 13.2787, 0.0, 0.0, 9.77076, 24.5855, 12.6094, 0.0, 0.0, 1.92343, 
  5.66301, 0.0, 0.0, 13.6968, 0.0, 0.0, 35.2159, 0.0, 8.44648, 7.21964, 0.0, 19.2949 };
 //
 // dump array data into vector
 std::vector<double> data(discountedPayoffs, std::end(discountedPayoffs));
 //
 // calculate and print mean and standard error
 double mean = AlgorithmLibrary::Mean(data);
 double standardError = AlgorithmLibrary::StandardError(data);
 std::cout << mean << std::endl;
 std::cout << standardError << std::endl;
}

There is nothing fundamentally wrong in this example, but a great deal of it could be done in a bit different manner. The scheme presented in this example offers a chance to explore some modern tools for implementing flexible and configurable C++ programs.

Chinese walls


Personally, I would prefer to separate data (discounted payoffs) and algorithms (mean, standard error) completely. Ultimately, I would like to have a design, in which a process (monte carlo) is generating results and adding those results into a separate container. Whenever this process is finished, it will send a reference of this container for several entities, which will calculate all required statistics independently.

There are many ways to implement this kind of a scheme, but I have been heavily influenced by delegates stuff I have learned from C#. Needless to say, the equivalent mechanism for C# delegate in C++ is a function pointer. However, instead of raw function pointer I will use Boost.Function and Boost.Bind libraries.

My new design proposal will have the following components :
  • AlgorithmLibrary for calculating different statistical measures. The header file for this contains collection of methods for calculating different statistical measures.
  • ResultContainer class for storing processed results and function pointers (boost function) which are sending processed results for further calculations.
  • StatisticsElement class for storing a value of statistical measure and function pointer (boost function) for an algorithm, which can be used for calculating required statistical measure.

 

Strangers in the night


In the first stage, StatisticsElement object will be created. This object will host a single statistical measure and a pointer to an algorithm (boost function) for calculating this specific measure. By giving required algorithm as a pointer (boost function), the object is not tied with any hard-coded algorithm. In the case there would be a need for another type for algorithm implementation for calculating required statistical measure, the scheme is flexible enough to be adjusted. In the constructor of this class, we are giving this pointer to an algorithm (boost function). Moreover, this object will be indirectly connected with ResultContainer object with a function pointer (boost function). Function pointer (boost function) will be created and binded (boost bind) with the specific method (process) of StatisticsElement object.

In the second stage, ResultContainer object will be created and all previously created function pointers (boost function, boost bind) for processing calculation results will be added into ResultContainer. This object is ultimately being shared with a process. Process will generate its results (double) and these results will be added into container object. When a process is finished, container object method (sendResults) will be called. The call for this method will trigger a loop, which will iterate through a vector of function pointers (boost function) and sending a reference for a result vector to all connected StatisticsElement objects.

Finally, the client program (in this example : main) will request the calculated statistical measures directly from all StatisticsElement objects. It should be stressed, that these two objects described above, do not have any knowledge about each other at any point. ResultContainer is just storing updates from a process and finally sending results "to somewhere" when the processing is over. StatisticsElement objects are processing their own calculation procedures as soon as they will receive results "from somewhere". It should also be noted, that this design actually implements observer pattern, where ResultContainer object is Observable and StatisticalElement objects are Observers.


Proposal


// ResultContainer.h
#pragma once
#include <vector>
#include <boost\function.hpp>
//
// class for storing processed results and function pointers 
// which are sending results for further processing
class ResultContainer
{
private:
 // container for storing processed results
 std::vector<double> results;
 // container for storing function pointers
 std::vector<boost::function<void
  (const std::vector<double>&)>> resultSenders;
public:
 // method for adding one processed result into container
 void addResult(double result);
 // method for adding one function pointer into container
 void addResultSender(boost::function<void
  (const std::vector<double>&)> resultSender);
 // method for sending all processed results
 void sendResults() const;
};
//
//
//
// StatisticsElement.h
#pragma once
#include <vector>
#include <boost\function.hpp>
//
// class for storing a value of statistical measure and 
// function pointer for an algorithm which can be used 
// for calculating required statistical measure
class StatisticsElement
{
private:
 // statistical measure
 double statisticsValue;
 // function pointer to an algorithm which calculates 
 // required statistical measure
 boost::function<double(const std::vector<double>&)> algorithm;
public:
 // parameter constructor
 StatisticsElement(boost::function<double
  (const std::vector<double>&)> algorithm);
 // method for processing data in order to 
 // calculate required statistical measure
 void process(const std::vector<double>& data);
 // method (overloaded operator) for accessing 
 // calculated statistical measure
 double operator()() const;
};
//
//
//
// AlgorithmLibrary.h
#pragma once
#include <vector>
#include <numeric>
//
// algorithms library for calculating statistical measures
namespace AlgorithmLibrary
{
 // calculate arithmetic average
 double Mean(const std::vector<double>& data) 
 { 
  return std::accumulate(data.begin(), data.end(), 0.0) 
   / data.size(); 
 }
 // calculate standard error estimate
 double StandardError(const std::vector<double>& data) 
 { 
  double mean = AlgorithmLibrary::Mean(data);
  double squaredSum = std::inner_product(data.begin(), 
   data.end(), data.begin(), 0.0);
  return std::sqrt(squaredSum / data.size() - mean * mean) 
   / std::sqrt(data.size());
 }
}
//
//
//
// ResultContainer.cpp
#include "ResultContainer.h"
//
void ResultContainer::addResult(double result)
{
 results.push_back(result);
}
void ResultContainer::addResultSender(boost::function<void
 (const std::vector<double>&)> resultSender)
{
 resultSenders.push_back(resultSender);
}
void ResultContainer::sendResults() const
{
 std::vector<boost::function<void
  (const std::vector<double>&)>>::const_iterator it;
 for(it = resultSenders.begin(); it != resultSenders.end(); it++)
 {
  (*it)(results);
 }
}
//
//
//
// StatisticsElement.cpp
#include "StatisticsElement.h"
//
StatisticsElement::StatisticsElement(boost::function<double
 (const std::vector<double>&)> algorithm) 
 : statisticsValue(0.0), algorithm(algorithm) 
{
 //
}
void StatisticsElement::process(const std::vector<double>& data)
{
 if(algorithm != NULL) statisticsValue = algorithm(data);
}
double StatisticsElement::operator()() const
{
 return statisticsValue;
}
//
//
//
// MainProgram.cpp
#include <boost\bind.hpp>
#include <iostream>
#include "AlgorithmLibrary.h"
#include "ResultContainer.h"
#include "StatisticsElement.h"
//
void MonteCarloProcess(ResultContainer& resultContainer)
{
 // process has been producing the following discounted payoffs
 // for this sample : average = 10.4881, standard error = 1.58502
 double discountedPayoffs[] = { 18.5705, 3.31508, 0.0, 3.64361, 0.0, 0.0, 
  47.2563, 10.6534, 85.5559, 0.0, 30.2363, 0.0, 17.8391, 2.15396, 0.0, 
  66.587, 0.0, 9.19303, 0.0, 0.0, 24.2946, 29.6556, 0.0, 0.0, 0.0, 65.926, 
  0.0, 14.0329, 1.43328, 0.0, 0.0, 0.0, 1.37088, 0.0, 2.49095, 21.4755, 
  36.5432, 0.0, 16.8795, 0.0, 0.0, 0.0, 19.8927, 11.3132, 37.3946, 10.2666, 
  26.1932, 0.0, 0.551356, 29.7159, 0.0, 31.5357, 0.0, 0.0, 4.64357, 4.45376, 
  21.6076, 12.693, 16.0065, 0.0, 0.0, 0.0, 0.0, 25.9665, 18.7169, 2.55222, 
  25.6431, 8.5027, 0.0, 0.0, 29.8704, 0.0, 22.7266, 22.8463, 0.0, 0.0, 0.0, 
  0.0, 4.90832, 13.2787, 0.0, 0.0, 9.77076, 24.5855, 12.6094, 0.0, 0.0, 1.92343, 
  5.66301, 0.0, 0.0, 13.6968, 0.0, 0.0, 35.2159, 0.0, 8.44648, 7.21964, 0.0, 19.2949 };
 //
 // dump array data into vector
 std::vector<double> data(discountedPayoffs, std::end(discountedPayoffs));
 // create vector iterator, loop through data and add items into result container object
 std::vector<double>::const_iterator it;
 for(it = data.begin(); it != data.end(); it++)
 {
  resultContainer.addResult(*it);
 }
 // trigger result processing for all 'connected' statistical element objects
 resultContainer.sendResults();
}
//
int main()
{
 // create : function pointer to mean algorithm, statistics element 
 // and function pointer to process method of this statistics element
 boost::function<double(const std::vector<double>&)> meanAlgorithm = 
  AlgorithmLibrary::Mean;
 StatisticsElement mean(meanAlgorithm);
 boost::function<void(const std::vector<double>&)> resultSenderForMean = 
  boost::bind(&StatisticsElement::process, &mean, _1);
 //
 // create : function pointer to standard error algorithm, statistics element and 
 // function pointer to process method of this statistics element
 boost::function<double(const std::vector<double>&)> standardErrorAlgorithm = 
  AlgorithmLibrary::StandardError;
 StatisticsElement standardError(standardErrorAlgorithm);
 boost::function<void(const std::vector<double>&)> resultSenderForStandardError = 
  boost::bind(&StatisticsElement::process, &standardError, _1);
 //
 // create : result container and add previously created function 
 // pointers (senders) into container
 ResultContainer resultContainer;
 resultContainer.addResultSender(resultSenderForMean);
 resultContainer.addResultSender(resultSenderForStandardError);
 //
 // run (hard-coded) monte carlo process
 MonteCarloProcess(resultContainer);
 //
 // print results from the both statistics elements
 std::cout << mean() << std::endl;
 std::cout << standardError() << std::endl;
 return 0;
}


Help


Concerning the actual installation and configuration of Boost libraries with compiler, there is a great tutorial by eefelix available in youtube. For using Boost libraries, there is a document available, written by Dimitri Reiswich. Personally, I would like to present my appreciations for these persons for their great contribution.

Thanks for reading my blog. Have a pleasant wait for the Christmas.
-Mike