Sunday, January 8, 2017

C++11 : String Replacer for Directory Files using PPL

Sequential algorithms can be easily replaced with the corresponding Parallel Algorithms. However, these algorithms can only be applied, if the loop operations are completely independent and they are not using any shared variables. The second condition can be relaxed by using concurrent data structures. This scheme is perfect for cases, in which we need to process a chain of independent items, such as modifying files in a directory.

The devil is in the details


Let us think a real-life case, in which we have a lot of CSV files in one specific directory. For example, we may have transaction files (containing cash flow tables, date schedule tables, parameters) produced by some front office system batch on a daily basis. These files are then used as feed for a third-party analytical software for some intensive calculations. The only problem is, that some of the information in feed files is not in recognizable form for third-party software. In a nutshell, feed files may contain some specific strings, which should be replaced with another string.


War stories


The next question is how to do this operation quickly and on a safe manner? Needless to say, there are pretty much as many approaches as there are people performing this heartbreaking operation. Personally, I have used

  • find-and-replace strategy with notepad++, until the amount of files to be processed and parameters to be replaced grew a bit too large.
  • custom batch script, until I realized that the script was sometimes not working as expected, by either leaving some cases out from replacement operations or resulting incorrect replacements. The scariest part was, that the script was actually working well most of the time.
  • custom Powershell script (created by someone else), which was using hard-coded configurations for all replacement strings and all strings which should be replaced. All configurations (hosted in source code file) needed to be set in a specific order. This was actually working well up to a point, where those hard-coded configurations should have been changed to correspond changes made in other systems. Moreover, execution times were a bit too high.

Finally, I decided to create my own program for handling this task.


United we fall, divided we stand


After learning a bit of parallel algorithms, I soon realized that this kind of a scheme (due to task independency) would be suitable candidate for implementing parallelism, in order to improve program execution speed. Each source file (containing transaction information) has to be processed separately and all key-value pairs (string-to-be-replaced and corresponding replacement string) can be stored in concurrent-safe concurrent unordered map as string pairs.


FileHandler.h


This header file is consisting free functions for all required file handling operations.

#pragma once
//
#include <filesystem>
#include <fstream>
#include <string>
#include <map>
#include <vector>
#include <algorithm>
#include <sstream>
//
namespace MikeJuniperhillFileSystemUtility
{
 std::string ReadFileContentToString(const std::string &filePathName)
 {
  std::string data;
  std::ifstream in(filePathName.c_str());
  std::getline(in, data, std::string::traits_type::to_char_type(
   std::string::traits_type::eof()));
  return data;
 }
 //
 void WriteStringContentToFile(const std::string& content, const std::string &filePathName)
 {
  std::ofstream out(filePathName.c_str());
  out << content;
  out.close();
 }
 //
 std::vector<std::string> GetDirectoryFiles(const std::string& folderPathName)
 {
  std::vector<std::string> directoryFiles;
  using itr = std::tr2::sys::directory_iterator;
  for (itr it = itr(folderPathName); it != itr(); ++it)
  {
   directoryFiles.push_back(static_cast<std::string>((*it).path()));
  }
  return directoryFiles;
 }
 //
 // replace all substrings found (stringToBeReplaced) with another 
 // string (replacementString) in a given string (string)
 std::string ReplaceAllSubstringsInString(const std::string& string, 
  const std::string& stringToBeReplaced, const std::string& replacementString)
 {
  std::string result;
  size_t find_len = stringToBeReplaced.size();
  size_t pos, from = 0;
  while (std::string::npos != (pos = string.find(stringToBeReplaced, from))) 
  {
   result.append(string, from, pos - from);
   result.append(replacementString);
   from = pos + find_len;
  }
  result.append(string, from, std::string::npos);
  return result;
 }
 // read key-value pairs into std::map
 // (key = string to be found and replaced, value = replacement string)
 std::map<std::string, std::string> ReadKeysFromFileToMap(const std::string& filePathName, char separator)
 {
  std::map<std::string, std::string> keys;
  std::ifstream file(filePathName);
  std::string stream;
  std::string key;
  std::string value;
  //
  while (std::getline(file, stream))
  {
   std::istringstream s(stream);
   std::getline(s, key, separator);
   std::getline(s, value, separator);
   keys.insert(std::pair<std::string, std::string>(key, value));
  }
  return keys;
 }
}
//


Tester.cpp


First, there has to be a directory containing all source files, which are going to be processed by file processing program. Also, there has to be a specific file, which contains key-value pairs (key = string to be found and replaced, value = replacement string) for all desired string replacement cases. Main program is creating file processors into vector container (there will be as many processors as there are source files to be processed). Then, main program is executing all file processors. A single file processor is first reading source file information into string, looping through all key-value pairs and checking whether any occurences are to be found, performing string replacements and finally, writing modified information back to original source file.

#include <ppl.h>
#include <concurrent_vector.h>
#include <concurrent_unordered_map.h>
#include <functional>
#include <memory>
#include <chrono>
#include <iostream>
#include "FileHandler.h"
//
namespace MJ = MikeJuniperhillFileSystemUtility;
//
using Files = std::vector<std::string>;
using Pair = std::pair<std::string, std::string>;
using Keys = std::map<std::string, std::string>;
using GlobalKeys = concurrency::concurrent_unordered_map<std::string, std::string>;
using ProcessorMethod = std::function<void(void)>;
using FileProcessor = std::shared_ptr<ProcessorMethod>;
//
// create file processor
FileProcessor Factory(const GlobalKeys& keys, const std::string& filePathName)
{
 // deferred execution
 ProcessorMethod processor = [=, &keys](void) -> void
 {
  std::string fileUnderProcessing = MJ::ReadFileContentToString(filePathName);
  for each (Pair pair in keys)
  {
   // extract key (string to be found and replaced) and value (replacement string)
   Pair keyValuePair = pair;
   std::string key = std::get<0>(keyValuePair);
   std::string value = std::get<1>(keyValuePair);
   //
   // check if key exists in a given string (containing all information in current file)
   size_t found = fileUnderProcessing.find(key);
   if (found != std::string::npos)
   {
    // if exists, replace key with value
    fileUnderProcessing = MJ::ReplaceAllSubstringsInString(fileUnderProcessing, key, value);
   }
  }
  // finally write change into original file
  MJ::WriteStringContentToFile(fileUnderProcessing, filePathName);
 };
 FileProcessor fileProcessor = FileProcessor(new ProcessorMethod(processor));
 return fileProcessor;
}
//
int main(int argc, char* argv[])
{
 // get required path strings from command line
 std::string keysFilePathName = argv[1];
 std::string fileDirectoryPathName = argv[2];
 //
 // create list of files (files to be modified) and list of key-value pairs 
 // (key = string to be found and replaced, value = replacement string)
 Files files = MJ::GetDirectoryFiles(fileDirectoryPathName);
 char separator = ',';
 Keys keys = MJ::ReadKeysFromFileToMap(keysFilePathName, separator);
 //
 // insert keys into concurrency-safe unordered map
 GlobalKeys Keys;
 std::for_each(keys.begin(), keys.end(),
  [&](Pair pair) -> void { Keys.insert(pair); });
 //
 // create file processors :
 // each processor will be processing exactly one file (given filePathName) 
 // by looping through all keys-value pairs (in GlobalKeys) and searching for 
 // all occurrences of substring (key) and replacing this with another string (value)
 std::vector<FileProcessor> processors;
 std::for_each(files.begin(), files.end(),
  [&](std::string filePathName) -> void { processors.push_back(Factory(Keys, filePathName)); });
 //
 // execute file processor method for all file processors
 auto start = std::chrono::steady_clock::now();
 concurrency::parallel_for_each(processors.begin(), processors.end(),
  [](FileProcessor fp) -> void { (*fp)(); });
 auto end = std::chrono::steady_clock::now();
 auto timeElapsed = std::chrono::duration_cast<std::chrono::seconds>(end - start);
 std::cout << "processed in " << timeElapsed.count() << " seconds" << std::endl;
 //
 return 0;
}
//


User settings


In this particular case, a picture is worth more than thousand words.

A : arguments for executable to be used with command prompt
B : directory for key-value pairs file
C : the content of key-value pairs file
D : directory for source files.























Finally, thanks a lot again for reading this blog.-Mike

Saturday, January 7, 2017

C++11 : Multi-Threaded PathGenerator using PPL

 

FINAL DESTINATION


The circle has been closed. This post is kind of an aggregation, based on the last four posts published on generating random numbers. Initially, I started just with a simple template class for distributional random generator, then continued with a path generator using any one-factor stochastic process and finally, ended up with a multi-threaded distributional random generation scheme using Parallel algorithms. This final post (hopefully) is opening up my larger goal : to be able to generate asset price paths for any one-factor process, using multi-threading scheme.


GROUNDHOG DAY


Again, I have tested the both sequential (for_each) and parallel (parallel_for_each) schemes by using four generators, 10000 paths and 250 time steps for a single run. After this, I repeated this run for 250 times. Conclusion :

  • The average running time for this sample was 17116 milliseconds (sequential) and 8209 milliseconds (parallel). So, parallel scheme will be completed about two times faster. 
  • The actual CPU usage profiles during the simulation processes are behaving exactly as reported in this post. 
  • I also analyzed processed asset price paths for parallel scheme, just to be absolutely sure there are no path duplicates (random number generation would not be independent). Based on my analysis made in Excel, all processed asset price paths are different and there are no duplicates. 

Presented scheme for path generator is again fulfilling my two initial requirements : faster creation of asset price paths following any one-factor process and independency of random generators.


RandomGenerator.h


The basic functionality of this template class has not been changed, except for construction part : second constructor is allowing a client to give any probability distribution for uniform generator from outside of this class. Even there is actually no need for having this kind of optionality in real-life (most of the stuff in Monte Carlo is randomized by using standard normal distribution), I decided to implement this optionality for the sake of completeness.

#pragma once
#include <algorithm>
#include <functional>
#include <vector>
#include <random>
#include <memory>
//
namespace MikeJuniperhillRandomGeneratorTemplate
{
 template <typename Generator = std::mt19937, typename Distribution = std::normal_distribution<double>>
 /// <summary>
 /// Template class for creating random number paths using mt19937 as default uniform 
 /// random generator and Standard Normal as default probability distribution.
 /// </summary> 
 class RandomGenerator
 {
 public:
  /// <summary>
  /// Constructor with explicit seed value
  /// </summary>
  RandomGenerator(unsigned long seed)
  {
   // construct function for processing distributional random number
   randomGenerator = [this](double x)-> double
   {
    x = distribution(uniformGenerator);
    return x;
   };
   // seed generator once
   uniformGenerator.seed(seed);
  }
  /// <summary> 
  /// Constructor for explicit seed value and client-given probability distribution.
  /// </summary>  
  RandomGenerator(unsigned long seed, const Distribution& distribution)
   // constructor delegation
   : RandomGenerator(seed)
  {
   // assign client-given probability distribution
   this->distribution = distribution;
  }
  /// <summary>
  /// Fill a given vector reference with distributional random numbers
  /// </summary> 
  void operator()(std::vector<double>& v) const
  {
   std::transform(v.begin(), v.end(), v.begin(), randomGenerator);
  }
 private:
  std::function<double(double)> randomGenerator;
  Generator uniformGenerator;
  Distribution distribution;
 };
}
//


OneFactorProcess.h


I decided to tag drift and diffusion functions with const declaration, since these functions should not modify the internal state of class data members.

#pragma once
//
namespace MikeJuniperhillOneFactorProcessLibrary
{
 /// <summary>
 /// Abstract base class for all one-factor processes for customizing 
 /// drift and diffusion functions for different types of processes.
 /// </summary>
 class OneFactorProcess
 {
 public:
  virtual double drift(double x, double t) const = 0;
  virtual double diffusion(double x, double t) const = 0;
 };
 //
 /// <summary>
 /// Implementation for Vasicek short-rate model.
 /// </summary>
 class Vasicek : public OneFactorProcess
 {
 public:
  Vasicek(double meanReversion, double longTermRate, double rateVolatility)
   : meanReversion(meanReversion), longTermRate(longTermRate), rateVolatility(rateVolatility) { }
  //
  double drift(double x, double t) const override { return meanReversion * (longTermRate - x); }
  double diffusion(double x, double t) const override { return rateVolatility; }
 private:
  double meanReversion;
  double longTermRate;
  double rateVolatility;
 };
}
//


PathGenerator.h


As in the case with RandomGenerator, the basic functionality of this template class has not been changed either, except for construction part : second constructor is allowing a client to give any probability distribution to be delivered for distributional random generator.

#pragma once
//
#include "RandomGenerator.h"
#include "OneFactorProcess.h"
namespace MJRandom = MikeJuniperhillRandomGeneratorTemplate;
namespace MJProcess = MikeJuniperhillOneFactorProcessLibrary;
//
namespace MikeJuniperhillPathGenerator
{
 template <typename Generator = std::mt19937, typename Distribution = std::normal_distribution<double>>
 class PathGenerator
 {
 public:
  /// <summary>
  /// Constructor for PathGenerator template class.
  /// </summary>
  PathGenerator(double spot, double maturity, unsigned long seed,
   const std::shared_ptr<MJProcess::OneFactorProcess>& process)
   : spot(spot), maturity(maturity), process(process)
  {
   // create random generator
   generator = std::unique_ptr<MJRandom::RandomGenerator<Generator, Distribution>>
    (new MJRandom::RandomGenerator<Generator, Distribution>(seed));
  }
  /// <summary>
  /// Constructor for PathGenerator template class, with a client-given probability distribution
  /// </summary>
  PathGenerator(double spot, double maturity, unsigned long seed,
   const std::shared_ptr<MJProcess::OneFactorProcess>& process, const Distribution& distribution)
   : spot(spot), maturity(maturity), process(process)
  {
   // create random generator with client-given probability distribution
   generator = std::unique_ptr<MJRandom::RandomGenerator<Generator, Distribution>>
    (new MJRandom::RandomGenerator<Generator, Distribution>(seed, distribution));
  }
  /// <summary> 
  /// Fill a given vector reference with asset prices, following a given stochastic process.
  /// </summary>  
  void operator()(std::vector<double>& v) const
  {
   // transform initialized vector into a path containing random numbers
   (*generator)(v);
   //
   double dt = maturity / (v.size() - 1);
   double dw = 0.0;
   double s = spot;
   double t = 0.0;
   v[0] = s; // 1st path element will always be the current spot price
   //
   // transform distributional random number vector into a path containing 
   // asset prices from a given stochastic one-factor process
   for (auto it = v.begin() + 1; it != v.end(); ++it)
   {
    t += dt;
    dw = (*it) * std::sqrt(dt);
    (*it) = s + (*process).drift(s, t) * dt + (*process).diffusion(s, t) * dw;
    s = (*it);
   }
  }
 private:
  double spot;
  double maturity;
  std::shared_ptr<MJProcess::OneFactorProcess> process;
  std::unique_ptr<MJRandom::RandomGenerator<Generator, Distribution>> generator;
 };
}
//


Tester.cpp


Tester program is closely tracking the program presented in previous post. For the sake of additional clarity, I have used new type definitions in order to improve code readability and get rid of some lengthy variable names. The program is again using simple factory method for creating PathGenerator (function wrapped in shared pointer). In this program, OneFactorProcess implementation is created and delivered for factory method for processing. Finally, there is a method for printing processed paths to console for testing purposes.

#include <iostream>
#include <chrono>
#include <ppl.h>
#include <concurrent_vector.h>
#include "PathGenerator.h"
namespace MJGenerator = MikeJuniperhillPathGenerator;
//
// type definitions
using Path = std::vector<double>;
using Paths = concurrency::concurrent_vector<Path>;
using Process = std::shared_ptr<MJProcess::OneFactorProcess>;
using Processor = std::function<void(void)>;
using PathGenerator = std::shared_ptr<Processor>;
//
// thread-safe container for storing asset price paths, processed by path generators
Paths paths;
//
// printer for generated asset price paths
void Printer()
{
 std::for_each(paths.begin(), paths.end(),
  [](Path path) -> void
 {
  std::for_each(path.begin(), path.end(),
   [](double s) -> void
  {
   std::cout << s << ",";
  });
  std::cout << std::endl;
 });
}
//
// factory method :
// return path-generating function as function wrapper
// input arguments are common for all types of generators
PathGenerator Factory(double spot, double maturity, int nPaths, 
 int nSteps, unsigned long seed, const Process& process, Paths& paths)
{
 // create function for processing one-factor paths
 auto generator = [=, &process, &paths]() -> void
 {
  MJGenerator::PathGenerator<> oneFactorProcess(spot, maturity, seed, process);
  Path path(nSteps);
  for (auto i = 0; i != nPaths; ++i)
  {
   oneFactorProcess(path);
   paths.push_back(path);
  }
 };
 // return generator function as function wrapper
 return PathGenerator(new Processor(generator));
}
//
int main()
{
 // create vasicek process
 double longTermRate = 0.05;
 double meanReversion = 0.2;
 double rateVolatility = 0.0075; 
 Process vasicek = Process(new MJProcess::Vasicek(meanReversion, longTermRate, rateVolatility));
 //
 // define parameters and seed values for path generators
 int nGenerators = 4;
 int nPaths = 100;
 int nSteps = (250 + 1);
 std::vector<unsigned long> seed = { 10322854, 65947, 387528, 772399573 };
 //
 // use factory method for creating path generators
 double spot = 0.0095;
 double maturity = 3.0;
 std::vector<PathGenerator> generators;
 for (auto i = 0; i < nGenerators; i++) generators.push_back(
  Factory(spot, maturity, nPaths, nSteps, seed[i], vasicek, paths));
 //
 // parallel processing
 auto start = std::chrono::steady_clock::now();
 concurrency::parallel_for_each(generators.begin(), generators.end(),
  [](PathGenerator pg) -> void { (*pg)(); });
 auto end = std::chrono::steady_clock::now();
 auto timeElapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
 std::cout << timeElapsed.count() << std::endl;
 //
 // print paths
 Printer();
 return 0;
}
//


Finally, thanks again for reading this blog.
-Mike

Sunday, January 1, 2017

C++11 : Multi-Threaded Random Generator using PPL

After some woodshedding with C++11 random and publishing distributional random generator in my blog a couple of weeks ago, the next question was, how to make this thing to happen a bit faster? Since I already have been snooping PPL (Parallel Patterns Library), I realized that Parallel Algorithms in that library would be an excellent candidate for this task. There is a great and comprehensive internet book available on this topic. After getting familiar with some of the issues presented in these sources, one should be able to parallelize algorithms succesfully.

There are some very specific issues, related to random generator seeding. As I started to parallelize my random number generation (creating several generators and processing these using parallel algorithm), I realized that my random generators were not working as expected. After some further woodshedding I came to conclusion, that (in a multi-threaded setting) each generator has to be seeded independently. Finally, concerning my current generator implementation, I made a decision that a client has to provide explicit seed values for all generators. Based on my hands-on experience with a system used for counterparty credit risk simulations (widely ranked as state-of-the-art numerical software), I dare to say that this approach is well justified. Now, the question of what kind of tools will be used for creating the actual seed values, has been left to a client judgement. The rabbit hole of these profound issues related to random generators have been chewed in this blog. Do not enter, if you are not absolutely ready to take the red pill.


ROUGH DIAGNOSTICS


I tested the both sequential (for_each) and parallel (parallel_for_each) schemes by using four generators, 10000 paths and 250 time steps for a single run. After this, I repeated this run for 250 times. The average running time for this sample was 5187 milliseconds (sequential) and 2317 milliseconds (parallel). So, parallel scheme will be completed about two times faster. The actual CPU usages during simulation process for the both schemes are shown in the screenshot below. When using parallel algorithm, computer is able to utilize all four cores in my computer.



I also tested all processed random paths for the both schemes, just to be absolutely sure there are no path duplicates (random number generation would not be independent). Based on my analysis made in Excel, all processed paths for the both schemes are different and there are no duplicates. Presented scheme is hereby fulfilling my two requirements : faster creation of random paths and independency of random generators.


RandomGenerator.h


Current implementation for distributional random path generator is presented below. Note, that the actual seeder creation takes place outside of this class.

#pragma once
#include <algorithm>
#include <functional>
#include <vector>
#include <random>
//
namespace MikeJuniperhillRandomGeneratorTemplate
{
 template <typename Generator = std::mt19937, typename Distribution = std::normal_distribution<double>>
 /// <summary>
 /// Template class for creating random number paths using
 /// Mersenne Twister as default uniform random generator and
 /// Standard Normal (0.0, 1.0) as default probability distribution.
 /// </summary> 
 class RandomGenerator
 {
 public:
  /// <summary>
  /// set explicit seed value
  /// </summary>
  RandomGenerator(unsigned long seed)
  {
   // construct lambda method for processing standard normal random number
   randomGenerator = [this](double x)-> double
   {
    x = distribution(uniformGenerator);
    return x;
   };
   // seed generator
   uniformGenerator.seed(seed);
  }
  /// <summary>
  /// fill vector reference with distributional random numbers
  /// </summary> 
  void operator()(std::vector<double>& v)
  {
   std::transform(v.begin(), v.end(), v.begin(), randomGenerator);
  }
 private:
  std::function<double(double)> randomGenerator;
  Generator uniformGenerator;
  Distribution distribution;
 };
}
//


tester.cpp


First, all simulation parameters and explicit seed values for all generators are defined. Simple factory method is then used to create the actual generator function as lambda function. Next, factory method is returning generator function wrapped inside a shared pointer. Main program is then storing each created generator function into vector container and finally processing all generators (sequentially or parallel). Each generator is storing generated paths into thread-safe nested vector (std::vector inside concurrency::concurrent_vector).

#include <iostream>
#include <memory>
#include <chrono>
#include <ppl.h>
#include <concurrent_vector.h>
#include "RandomGenerator.h"
namespace MJ = MikeJuniperhillRandomGeneratorTemplate;
//
//
// thread-safe nested vector for storing random paths processed by generators
concurrency::concurrent_vector<std::vector<double>> paths;
//
// prototype for generator function
using Generator = std::function<void(void)> ;
//
// factory method :
// return generator function as function wrapper
std::shared_ptr<Generator> Factory(int nPaths, int nSteps, unsigned long seed, 
 concurrency::concurrent_vector<std::vector<double>>& paths)
{
 // create generator function
 auto generator = [=, &paths]()->void
 {
  // step 1 : create distributional random generator
  MJ::RandomGenerator<> randomGenerator(seed);
  // step 2 : create container for a random path
  std::vector<double> path(nSteps);
  // step 3 : create path using distributional generator
  // and import created path into thread-safe container
  for (auto i = 0; i != nPaths; ++i)
  {
   randomGenerator(path);
   paths.push_back(path);
  }
 };
 // return generator function as function wrapper
 return std::shared_ptr<Generator>(new Generator(generator));
}
//
int main()
{
 // define parameters and seed values for generator functions
 int nGenerators = 4;
 int nPaths = 10000;
 int nSteps = 250;
 std::vector<unsigned long> seed = { 10322854, 65947, 387528, 772399573 };
 //
 // use factory method for creating generator function
 std::vector<std::shared_ptr<Generator>> generators;
 for (auto i = 0; i < nGenerators; i++) generators.push_back(Factory(nPaths, nSteps, seed[i], paths));
 //
 // parallel processing
 auto start = std::chrono::steady_clock::now();
 concurrency::parallel_for_each(generators.begin(), generators.end(),
  [](std::shared_ptr<Generator> ppg) -> void { (*ppg)(); });
 //
 //// uncomment for sequential processing
 //std::for_each(generators.begin(), generators.end(),
 // [](std::shared_ptr<Generator> ppg) -> void { (*ppg)(); });
 //
 auto end = std::chrono::steady_clock::now();
 auto timeElapsed = (end - start);
 std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(timeElapsed).count() << std::endl;
 return 0;
}
//


Finally, thanks for reading my blog. I would like to wish Happy New Year for everybody.
-Mike