On 24th October 2009 Christopher D. Clack was awarded the Doctor of Science (ScD) degree from the University of Cambridge - the university’s highest degree, awarded for distinction in the advancement of science, and conferred on scientists "with a proven record of internationally recognised scholarship, including substantial and sustained contributions to scientific knowledge".
This award recognises Clack's contribution to Computer Science in a research career at UCL that started in 1984 with the development of the world's first parallel graph reduction computer system made from stock hardware. Since then, Clack's research has covered four areas:
Distinguishing features of Clack's research have been his ability to engage with scientists of other disciplines, to explore how research developments in Computer Science can benefit science more broadly, and to validate the impact of his research via engagement with industry.
In 2009 Clack secured funding from industry, the Economic and Social Research Council, the Natural Environment Research Council, and the Technology Strategy Board to launch a national Knowledge Transfer Network for financial services.
The InterDyne Simulator is designed to support exploration of interaction dynamics and feedback effects in non-smooth complex systems. It is a general-purpose tool that can be applied to a wide variety of systems, though at UCL its primary use has been to simulate interaction dynamics in the financial markets.
The InterDyne Simulator - Development
Development of the InterDyne Simulator began in 2011, initially implemented using the functional language Miranda (link)(link to book).
To improve performance and to take advantage of improved profiling tools, InterDyne has since been ported to the functional language Haskell (link). The Glasgow Haskell Compiler (GHC) generates optimised and efficient native code.
Several UCL students and researchers have been involved either in the development of InterDyne, or in using InterDyne to run experiments in Interaction Dynamics. These have included (with apologies for any omissions):
Prior to the development of InterDyne, other approaches to simulating Interaction Dynamics were explored. InterDyne occupies a "Goldilocks position": it models neither at too high a level of abstraction (e.g. probabilistic modelling) nor at too low a level (e.g. process modelling). Several student projects contributed to these previous approaches, and they have included:
The InterDyne Simulator - Documents
A draft InterDyne User Manual is available; this provides an introduction to the basic features of InterDyne:
The following working paper illustrates how InterDyne's dynamic interaction model can be used to gain in-depth understanding of financial market instability:
The InterDyne Simulator is not yet properly documented. However, several student projects at UCL have been based on either the development of the underlying technology or the use of the simulator, or both. Some related project dissertations are provided below:
Vikram Bakshi implemented a fairly comprehensive FIX engine (so that components in a simulation of financial markets can interact via messages whose content corresponds fairly closely to reality) and also implemented a component that simulates a subset of the behaviour of the BATS BYX exchange. Note, however, that although an exchange may include a FIX tag within one of its defined input or output messages, it might accept only a subset of the qualifiers defined by FIX. Vikram therefore designed a mechanism by which exchange-specific messages can be defined and then checked for correctness by the compiler (using the type system). The mechanism for interaction with the BATS BYX exchange component is explained briefly in the following document: (link to document).
The InterDyne Simulator - Modelling
An InterDyne model can be viewed either as an analytically-tractable executable specification or as an agent-based model. Taking the latter approach, the model consists of a number of agents communicating via the passing of messages. By defining in advance which agents may communicate with which other agents, a graph structure is created where the nodes are agents and the edges are permissible one-to-one communication paths. Agents may also send a one-to-many message to a broadcast channel; each agent declares in advance the broadcast channels to which it will listen.
InterDyne is run by executing the function "sim" applied to appropriate arguments. The function sim has type

sim :: Int -> [Arg_t] -> [(Agent_t, [Int])] -> IO ()

where the first argument is the number of time steps for the simulation, the second argument is a list of (key, value) pairs ("runtime arguments") that are made available to every agent in the system, and the third argument is a list of information about each agent (namely, a two-tuple containing the agent function of type Agent_t and a list of broadcast channel IDs to which the agent will listen). Output is sent to a file. Each agent is uniquely identified by its position in the list of agent information: the first agent has ID=1, the second has ID=2, and so on. ID=0 is reserved for the "simulator harness" function that mediates messaging and controls the passage of time during simulation. The agent identifiers are used to specify the source and destination of all one-to-one messages.
Agents are functions that consume a potentially-infinite list of inbound messages and generate a potentially-infinite list of outbound messages. At each time step an agent must consume one item from the inbound list and must generate one new item on the outbound list. Each item in these lists is itself a list, so that at each time step an agent may receive multiple inbound messages and may generate multiple outbound messages. If an agent has no messages to receive or send at a given time step then it will receive or generate an empty list. Optionally, an agent may distinguish between an output item that is an empty list by mistake and one that is empty by design: it does this by generating an output item that is a list containing only the distinguished empty message called a "Hiaton".
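For illustration, here is a minimal sketch of an agent that is silent by design at every time step (assuming, as in the examples later on this page, that agents have type Agent_t, and assuming that Hiaton is a nullary constructor of the message type Msg_t - the constructor's exact form is an assumption, not taken from the InterDyne sources):

quietAgent :: Agent_t
quietAgent st args ((t, msgs, bcasts) : rest) myid
  = [Hiaton] : quietAgent st args rest myid    -- empty by design, not by mistake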
The InterDyne Simulator - Modelling Example
In early experiments all messages were sent using agent identifiers. It is still possible to do this, but we have since found it useful to refer to agents by name rather than by ID number, and we now recommend including in the list of "runtime arguments" a function that, when executed, will convert an ID into a name or vice versa. This is included in the runtime arguments so that it can be used by any agent and so that the mapping of names to identifiers is in the control of the experimenter. The following example illustrates how the function sim can be called:
import Simulations.RuntimeArgFunctions as RTAFuns

exampleExperiment :: IO ()
exampleExperiment
  = do
      sim 60 myargs (map snd myagents)
  where
    myargs   = [ convert ]
    myagents = [ ("Trader",   (traderWrapper,   [1])),
                 ("Broker",   (brokerWrapper,   [3])),
                 ("Exchange", (exchangeWrapper, [2,3]))
               ]
    convert  = RTAFuns.generateAgentBimapArg myagents
In the above example sim is instructed to run for 60 time steps; there is only one runtime argument, called convert, and the third argument to sim is a list of agents and the broadcast channels on which they will listen. The convert function is a partial application of the library function generateAgentBimapArg (available in InterDyne v0.25 and later) to myagents; the result is a function that converts a name to an ID and vice versa. The first agent subscribes to broadcast channel 1, the second subscribes to channel 3, and the third subscribes to channels 2 and 3. This example does not illustrate how to define an output file for the results, how to use names instead of integers for broadcast channels, or how to specify the legal communications topology and the delays that should be applied to each communication link. It does, however, indicate the parsimonious style that can be achieved when using Haskell.
Agents are typically (but not always) written in two parts: (i) a "wrapper" function that manages the consumption of inbound messages, the generation of outbound messages, and the update of local state; and (ii) a "logic" function that is called by the wrapper function and which calculates the messages to be sent. The "wrapper" function is the true agent function, and it must be of type Agent_t.
Here is a simple agent wrapper that does nothing (at each time step it consumes an inbound item, and creates an empty outbound item). It does not call a logic function:
f :: Agent_t
f st args ((t, msgs, bcasts) : rest) myid
  = [] : f st args rest myid
The agent function is defined recursively and loops once per time step. It takes four arguments: st is a local state variable (in this example it is never inspected and never changed); args is a copy of the runtime arguments (every agent is passed a copy of all the runtime arguments); the third argument is the list of inbound items, where each item is a 3-tuple containing (i) the current time (an integer), (ii) a list of all one-to-one messages sent to this agent by other agents, and (iii) a list of all broadcast messages available at this time step on all the broadcast channels to which this agent is subscribed; and the final argument myid is the ID of this agent (which is decided by the simulator and should never be changed).
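As a further illustration of the wrapper/logic split described above, the following hypothetical sketch counts inbound one-to-one messages; the Int state and the signature of the logic function are assumptions made for this example, not part of the InterDyne API:

counter :: Agent_t
counter st args ((t, msgs, bcasts) : rest) myid
  = out : counter st' args rest myid
    where
      -- the wrapper delegates the decision about what to send
      (st', out) = counterLogic st msgs

-- logic function: update the running count and send no messages
counterLogic :: Int -> [Msg_t] -> (Int, [Msg_t])
counterLogic n msgs = (n + length msgs, [])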
When considering dynamical interaction between system components, we may wish to (i) define in advance the topology of legal interactions, and (ii) define the interaction delay that should be applied to such legal interactions (noting that delays may be asymmetric - i.e. that the delay from component A to component B may not be the same as the delay from B to A). InterDyne does not require the topology and delays to be defined, but provides support for such definition (from version 0.24 onwards).
To define topology and delays, two runtime arguments must be passed to the "sim" function (both must be present): first, a function that takes two agent IDs (integers that uniquely specify the start point and end point of an interaction) and returns an integer delay in time steps; and second, the maximum delay in the system. From InterDyne version 0.25 the delay function argument uses the data constructor DelayArg and the identifying string "DelayArg"; for backwards compatibility the maximum delay is a Double, but it should represent a whole number of time steps.
The experimenter has complete freedom to define the delay function, and if the interaction specified by the two agent IDs is illegal then the delay function should raise an error. Here is a very simple example for three agents:
import Simulations.RuntimeArgFunctions as RTAFuns

exampleExperiment :: IO ()
exampleExperiment
  = do
      sim 60 myargs (map snd myagents)
  where
    myargs   = [ convert,
                 (Arg      (Str "maxDelay", maxDelay)),
                 (DelayArg (Str "DelayArg", delay)) ]
    myagents = [ ("Trader",   (traderWrapper,   [1])),
                 ("Broker",   (brokerWrapper,   [3])),
                 ("Exchange", (exchangeWrapper, [2,3]))
               ]
    convert  = RTAFuns.generateAgentBimapArg myagents

    -- delay in time steps from the first agent ID to the second;
    -- illegal interactions raise an error
    delay 1 2 = 1
    delay 1 _ = error "illegal interaction"
    delay 2 _ = 2
    delay 3 2 = 3
    delay 3 _ = error "illegal interaction"

    maxDelay  = fromIntegral 3
InterDyne experimenters often find it convenient to drive the delay function from an adjacency matrix. After setting the delays as shown above, the experimenter need do no more: one-to-one messages are automatically delayed by the stated number of time steps, and broadcast messages are split into separate messages for each recipient and delayed by the amount stated for each link.
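For example, here is one hypothetical way to drive the delay function from an adjacency matrix, encoding roughly the same legal links as the example above (Nothing marks an illegal link; this sketch treats self-interaction as illegal, and the delay values are illustrative):

-- delay in time steps from the row agent to the column agent
adjacency :: [[Maybe Int]]
adjacency = [ [Nothing, Just 1,  Nothing],
              [Just 2,  Nothing, Just 2 ],
              [Nothing, Just 3,  Nothing] ]

delay :: Int -> Int -> Int
delay from to
  = case adjacency !! (from - 1) !! (to - 1) of
      Just d  -> d
      Nothing -> error "illegal interaction"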
Clack's research focuses on the application of Computer Science in Finance. For example, Financial Computing research has drawn on Clack's expertise in Genetic Computation (GC), Multi-Agent Simulation (MAS), and Functional Programming:
Dynamic Interaction in Financial Markets
Following the May 6, 2010 Flash Crash there has been increasing interest in the dynamic behaviour of financial markets.
Given that the majority of exchange-based trading is computer-based, often with computers deciding how much to trade, at what prices, and at which venues, and given the rise of High Frequency Traders both as proprietary traders and as market makers, we are particularly interested in the dynamic behaviour of interacting financial algorithms. This includes, inter alia, market-making algorithms, benchmark algorithms, arbitrage algorithms, momentum algorithms, contrarian algorithms, and the matching algorithms used in the exchanges.
Our research is proceeding along the following lines:
To model dynamic interaction in discrete time. We focus on discrete-time modelling because in reality all interaction between computers is conducted in discrete time (albeit very fast discrete time, as data is latched into and out of the interfaces of digital computers), and because attempting to apply a continuous-time model to a discrete-time system can be problematic (cf. Brigo and Mercurio 2000).
To model (i) detailed algorithms and strategies derived from published academic work; (ii) unpublished algorithms derived from private communication and personal experience; and (iii) representative algorithms that distil certain characteristics of real algorithms.
To explore the dynamics of simple dynamical systems by detailed specification of the associated discrete-time equations, by tracing the results of those interacting equations, and by analysing various state spaces of those systems. Of particular interest are issues such as information delay and feedback networks.
To explore the dynamics (including delay and feedback) of complex dynamical systems via simulation. To this end we have developed a Simulator for Interaction Dynamics (our working name for this simulator is "InterDyne"). The simulator is currently used to simulate interaction dynamics in financial markets, but the InterDyne framework is potentially applicable to other domains as well. The InterDyne Simulator supports precise control over communication topology and each communication link may have an associated information delay.
InterDyne has a very rich messaging system that supports a wide range of messages from the very simple (for which examples are given below) to the very complex (for example, detailed FIX messages between components of a financial system). The messaging system is of central importance to InterDyne since messages are the medium for interaction between components.
Messages are all of type Msg_t and are either (i) one-to-one messages, or (ii) broadcast messages. All messages include a pair of integers that indicate (i) the IDs of the sending and receiving agents for one-to-one messages, or (ii) the sending agent ID and the receiving broadcast channel ID for broadcast messages. Examples include:
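By way of a hedged illustration only: the constructor Debugmessage appears in the example below, so a one-to-one debug message from agent 3 to the simulator harness might be written as follows (other constructors, and their payloads, are not reproduced here):

debugMsg :: Msg_t
debugMsg = Debugmessage (3, 0) "example one-to-one message"   -- (sender ID, receiver ID)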
If a message is sent to agent ID 0 it receives special treatment. Agent ID 0 is reserved for the simulator harness, and a message sent to ID 0 is printed to an output file. This is the primary mechanism for defining the output of the simulator. Currently the main output file is the "trace file", which records all messages sent to ID 0 except those with constructor Datamessage - these messages are instead output to a special file "data.csv", which provides a mechanism for structured output that can be viewed as a spreadsheet.
Here is a simple agent wrapper that sends a debug message to the output trace file (via the simulator harness) at every time step. It does not call a logic function:
f :: Agent_t
f st args ((t, msgs, bcasts) : rest) myid
  = [m] : f st args rest myid
    where
      m = Debugmessage (myid, 0) ("Debug " ++ show t)
The financial markets are subject to large and unpredictable changes. Computer algorithms that interact with financial markets have three choices: (i) adapt rapidly to change, (ii) be robust to change, or (iii) fail. But how should a Genetic Computation (GC) stock-picking system respond to volatile markets? After a market shift, the GC will need to re-train on new data, but initially there is insufficient data for re-training. Clack's research team has worked to evolve solutions that are robust to market changes:
The following paper was nominated for a Best Paper award; it is now cited in Hitoshi Iba's recent book surveying the field of Evolutionary Computation in Financial Engineering, was supported by Thomson Reuters, prompted seven invited talks at industry conferences, and was also presented in two invited talks at subsequent GECCO conferences. This novel work was the first to identify a significant instability whereby cautious investment strategies can unwittingly become adventurous, and vice versa, following a shift in market regime.
The following invited paper, in a journal special issue on "Bio-Inspired Learning and Intelligent Systems", presents a Genetic Programming technique for portfolio optimisation; it is the first such technique to propose preserving diversity of dynamic phenotypic behaviour and optimising investment returns rather than price prediction. It is robust to market volatility, reducing down-time while gathering data for retraining, and demonstrates a threefold performance improvement. The technique is being used by SIAM Capital, is included in Hitoshi Iba's recent book on Evolutionary Computation in Financial Engineering, and has been shown to outperform Support Vector Machines (see above).
Financial time series analysis by Genetic Algorithms (GA) is characterised by large datasets with many non-linear dependencies. The default assumption that all genes are non-linearly linked leads to very large search times and intractability. Linkage learners can improve performance by identifying non-linear linkage, and dividing the search space where linkage does not occur. However, these learners are slow. The first paper below presents a novel perturbation-based linkage-learning algorithm that focuses on the semantics of composability, with consequential speed improvements, and the second paper applies linkage detection to financial time series:
Combining Multi-agent Simulation and Genetic Computation improves on simple analysis of historical data by incorporating feedback and learning. The following papers demonstrate how a simulation of hedging pressure in futures markets can cast doubt on predictions from previous analyses:
Agent Based Simulation (ABS) uses many autonomous rule-based "agents" interacting via co-operation and competition to simulate a real-world system; a particular challenge is to understand those interactions that lead to emergent and adaptive behaviours. The work undertaken by Clack with his colleagues has provided novel and effective solutions to research challenges in two areas:
Biological systems are highly complex, and understanding how they respond and adapt to a dynamic environment is particularly challenging. Agent-based simulation can provide a convenient medium for domain experts to express and communicate complex ideas during hypothesis formulation. The following paper provides a high-level introduction to Bioscience Computing:
Multi-agent simulation (MAS) techniques support exploration of non-evolutionary adaptations within a single lifetime. The following papers explore the computational processes that underlie specific biological mechanisms, demonstrating for example how a reasonably simple MAS can exhibit lifetime adaptation, with multiple goal-oriented behaviours arising as emergent behaviour:
Clack collaborated with Eileen Cox (Natural History Museum) to supervise Katie Bentley's work developing a novel agent-based technique for simulating morphological dynamics in living tissue. The following paper addresses diatom morphology, and the technique has since been applied to cancer tumour morphology; this work has led to the founding of a morphogenesis modelling laboratory at Harvard. Unlike previous models, the simulation recreates detailed mechanisms and chemical interactions, and the paper (JCR impact factor 2.071) provides unexpected evidence that a single mechanism produces two different types of morphogenesis. Rib-domination phenomena are faithfully reproduced, and the new model, experimental methodology, and results were assessed against state-of-the-art techniques and SEM data by the Natural History Museum.
Agent Based Simulation Theory
How can complex behaviour in Biology and the Financial Markets be modelled, detected and analysed? Theoretical work in this area develops the new concept of "complex events" that extend in both time and space:
The novel Complex-Event formalism described in the following paper has since been applied by oncologists to model cancerous tumour growth, with specific success in the field of colonic crypt tumorigenesis. The journal is ranked 3rd out of 14 journals covering the field of computer simulation. The formalism supports the modelling of phenomena that are distributed simultaneously across different physical and temporal levels; superiority over alternative modelling techniques has been independently validated, and as further validation this paper demonstrates how it extends the established X-Machines formalism.
Genetic Programming (GP) systems were originally cumbersome and unable to evolve complex programs. But are these necessary impediments? Can GP be modified to become more efficient in the evolution of complex programs including polymorphism, recursion and higher-order functions? Building on earlier work in Functional Programming, a novel approach investigated the use of pure lambda-expression parse trees, augmented the GP system with a polymorphic higher-order type system, and incorporated recursion over finite lists. The results show substantial benefit, increase in expressiveness, decrease in parse tree size, and improved performance:
Traditional Genetic Algorithm (GA) behaviour was also unsatisfactory: to achieve good performance, a crossover operator had to be chosen to match the chromosome encoding. By using explicit and detailed history about which previous modifications were beneficial and which were not, can a general-purpose GA crossover operator be devised that is insensitive to encoding? Selective Crossover is a novel extended GA representation with an enhanced crossover procedure: it monitors contributions to fitness on a per-gene basis and guarantees good behaviour regardless of problem encoding. It works exceptionally well on problems of high epistasis, where it finds a better solution than conventional crossover operators and does so with the fewest evaluations (a repeatable 25% improvement):
Strictness Analysis (SA) determines during compilation whether functions definitely evaluate their arguments, and is essential for PFP systems that retain normal-order termination properties. However, SA time complexity is exponential. Though worst-case behaviour may be exponential, is it possible to find the fixed point efficiently for the common, well-behaved functions? The following two papers show how a simple lattice representation of finite height, with a monotonicity rule and a synergistic search algorithm, leads to a new combinatorially-implosive technique based on tracking "frontiers" in the lattice:
At its outset, Parallel Functional Programming (PFP) promised an elegant way to program parallel systems: if a sequential program were correct then correctness on a parallel system would be guaranteed, provided a large number of practical implementation issues could be resolved. Prior to Clack's research, PFP was in its infancy and substantially inhibited by several issues, most notably how to obtain efficient lazy evaluation semantics. The research undertaken by Clack with his colleagues provided effective and seminal solutions to a wide range of research challenges in PFP:
Theoretical advances were demonstrated through the design and construction of real parallel hardware, with all the required software infrastructure to support PFP applications.
Additionally, Clack's functional programming research has contributed to FP language design, and the wider field of Memory Management (MM):
How can a system architecture best be designed as an efficient target for a PFP compiler? GRIP (Graph Reduction In Parallel) explored the design space using stock processors and custom hardware such as intelligent memory units, and new compilation and run-time system technology was successfully demonstrated.
Elements of GRIP were eventually transferred to industry and incorporated into the ICL "Goldrush" parallel database server.
The GRIP architecture and its programming paradigm for PFP are described in the following papers:
At its outset, Parallel Functional Programming (PFP) offered "painless parallelism" due to the lack of side effects. However, the existing ALICE system at Imperial was disappointingly slow. Did this presage some fundamental problem, or was ALICE simply immature? How could a compiler bridge the gap between the highly abstract functional program and the details of parallel execution, and how could this be done efficiently?
The four-stroke reduction engine (4SRE) was one of the earliest abstract machines for PFP, the first to be implemented on GRIP (see separate section), and the progenitor of several novel features found in subsequent PFP systems: e.g. the "evaluate and die" model for transparent communication of results from child to parent tasks; the encoding of the dump in the heap; several novel mechanisms for throttling parallelism (including "discardable sparks" and preferential resumption of suspended tasks); and the lack of explicit communication between tasks. The four-stroke reduction engine is described in the following paper:
This paper is also reprinted in the following book of seminal papers:
Manual optimisation of run time and memory usage is problematic for lazy Functional Programming (FP) languages. FP programmers do not specify detailed memory allocations, and function evaluations may be interleaved dynamically; the mapping between the source code and what happens at run time is complex. Is it possible to provide meaningful and intuitive profiling information in such a context, and if so, how? This was achieved by attributing time and space costs to portions of the source code as though strict evaluation had occurred, whilst reporting the quantitative measures of the actual lazy evaluation. The following paper provided the design and the first implementation of this technique for an interpreted language, together with suggestions for how to implement it for a compiled language. The technique was adopted and further developed in the profiler of the popular Glasgow Haskell Compiler (GHC).
Dynamic Memory Management (DMM) is essential to the implementation of functional languages and many other modern programming languages. Prior to Clack's research, DMM could be either fast or small (i.e. have low fragmentation) but not both. DMM is a very mature research area, and to identify a new algorithm is a considerable challenge. By using a tree bitmap representation and altering the search semantics, a best-fit free-list memory allocator was created that provides both a tenfold improvement in worst-case performance and excellent best-case performance. The allocator was patented:
PhD students supervised (first supervisor)
Leo Carlos-Sandberg - 2017-
Jose Gonzalez (part time) - 2017-
Yang Gu - 2011-2013 - withdrew
Samet Gogus (Barclays Treasury, part-time) - 2007-2009 - Credit risk - withdrew
Ghada Hassan (Egyptian Culture Bureau) - "Multiobjective Genetic Programming for Financial Portfolio Management in Dynamic Environments" PhD awarded 2010 - now a lecturer at Cairo University
Chih-Chun Chen (EPSRC DTA) - "Complex Event Types for Agent-Based Simulation" PhD awarded 2009 - now a postdoctoral researcher at the Centre d'analyse et de mathématique sociales
David Coffin (EPSRC) - 2004-2006 - Genetic Algorithms search space decomposition - subsequently changed research direction and supervisor in 2006
Wei Yan (industry) - "Evolving Robust Genetic Programming Solutions in Dynamic Environments - with a case study in hedge fund stock selection in an emerging market" PhD awarded
Katie Bentley (industry / DTA) - "Adaptive Behaviour through Morphological Plasticity in Natural and Artificial Systems" PhD awarded 2006 - now Assistant Professor, Computational Biology Laboratory, Beth Israel Deaconess Medical Center, Harvard Medical School
Lee Braine (EPSRC CASE) - "Design and Implementation of an Object-Oriented Functional Language" PhD awarded 2000 - now with the CTO Office, Barclays Investment Bank
Kanta Vekaria (EPSRC 97701326) - "Selective Crossover as an Adaptive Strategy for Genetic Algorithms" PhD awarded 2000 (link to thesis) - now principal engineer at Nokia
Tina Yu - "An Analysis of the Impact of Functional Programming Techniques on Genetic Programming" PhD awarded 1999 (link to thesis) - now Associate Professor at the Memorial University of Newfoundland, Canada
Simon Courtenage (EPSRC 90311629) - "The Analysis of Resource Use in the Lambda-Calculus by Type Inference" PhD awarded 1995 (link to thesis) - now Principal Lecturer at the University of Westminster
Stuart Clayman (SERC) - "Developing and Measuring Parallel Rule-Based Systems in a Functional Programming Environment" PhD awarded 1993 (link to thesis) - now Senior Research Fellow at UCL
David Parrott - "Synthesising Parallel Functional Programs to Improve Dynamic Scheduling" PhD awarded 1993 - now Chief Architecture Officer at Thomson Reuters