This article describes the work of the IX Swap Data Science Department on simulating traders' activity, covering the relevant details and giving simplified explanations of the work performed.
The detailed report, codebase, and simulation data are open-sourced and continuously updated on our GitHub at https://github.com/IXSwap/datascience.
The financial market is a place of abrupt changes and risky decisions, where a trader's actions depend on the general behaviour of other traders and on emotions in decision-making, regardless of the trader's resistance to stress and negative factors.
Crypto markets carry higher risks: market changes are large, guarantees are few or absent, and regulation is lacking.
Crypto markets also suffer from a lack of information on most assets, which, together with the factors above, makes traders' behaviour depend on the currently observed market situation. This behaviour can be described as a snowball that grows in volume as it moves.
In addition, the lack of regulatory structures, centralization, and monitoring services gives traders the opportunity to conduct fraudulent transactions for profit, which is difficult or impossible in traditional financial markets.
For these reasons, effective tests of the crypto market need to rely on two types of simulation of trader behaviour:
 Construction of mathematical models that simulate the behaviour of traders on the market with the ability to customize them depending on the situation under consideration (the degree of traders’ activity, asset liquidity, exchange volume, and so on);
 Using historical summaries and transaction histories to test against real data, with analysis to detect anomalous cases or fraudulent transactions.
Since the real history of trading is a complex topic, a separate article will be devoted to it. This article focuses on the following two aspects:
 Mathematical simulation of traders' behaviour;
 General behavioural patterns and properties of various mathematical distributions for effective simulation of traders' behaviour.
TL;DR
Two problems must be solved to simulate traders' behaviour efficiently with a mathematical approach.
Transaction frequency defines how many transactions happen per specified time interval. This count must be unstable: the number of transactions per interval should deviate from the mean count.
Transaction value defines how many tokens traders request per exchange transaction. Simulated values must follow a distribution similar to that of real transactions.
These two problems are addressed by the following models:
 Transaction Frequency Models:
  Poisson Distribution: generates the number of transactions per specified period around a given mean transaction count, introducing small deviations that make the transaction frequency similar to the real one.
 Transaction Values Models:
  Normal Distribution: generates bell-shaped values (demonstrated in the normal distribution section). It is motivated by the principle that, as the number of samples grows, averages of many distributions converge to (look more and more like) the normal one;
  LogNormal Distribution: generates values with a more compressed/tight distribution shape, so that transactions are concentrated closer to the specified mean value (the presented "bell" is tighter);
  Pareto Distribution: generates values that follow the "80 to 20" principle popular in statistics and economics, where 80% of the samples fall within 20% of the value range;
  Cauchy Distribution: generates values from a specified starting point, with probability decreasing as the value increases. The decay is polynomial rather than exponential, so small values are the most likely while big values keep a small but non-negligible probability.
For transaction values, the models were restricted to generate only positive values, since negative transaction values are impossible. This principle allows tuning the models for each token of each pool individually.
The best models for simulating traders' behaviour are the Cauchy and LogNormal ones. For these two models, scripts were written that find the best parameters by minimizing the harmonic error.
The parameter picking algorithms and the reasons why the LogNormal and Cauchy distributions were chosen are demonstrated and explained below.
Transaction Frequency
The number of operations in a set time interval is unstable and deviates from interval to interval. To simulate the behaviour of traders, a mathematical model is needed that takes into account the average number of transactions over a time interval together with an unstable deviation from that average. Transactions must also have randomly placed timestamps within the specified time interval. To solve these problems, it was decided to use the Poisson distribution.
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events happening in a fixed time interval, when events occur with a constant mean rate and independently of the time since the last event (it can also be applied to other metrics like distance).
The formula for the Poisson Distribution is:

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

where e represents Euler's number, x represents the number of event occurrences (x = 0, 1, 2, ...), and λ is the expected value of x, which is also equal to its variance.
This distribution is responsible for generating the number of transactions in the specified time interval, but each transaction must also be timestamped. For this, it was decided to take the initial time of the considered interval and add to it a randomly determined offset within the interval. This technique generates the time of each transaction randomly and independently of the others.
This structure simulates a transaction rate close to the real one, with unstable placement of timestamps and unstable transaction counts around the specified average value. With the Poisson distribution, the frequency of transactions can be conveniently adjusted to set the degree of activity of traders.
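The two steps described above (draw a Poisson count, then place each transaction at a random offset inside the interval) can be sketched as follows. This is a minimal illustration using only Python's standard library; the function names, the Knuth sampling method, and the example parameters are assumptions made for this sketch and are not taken from the IX Swap codebase.

```python
import math
import random


def poisson_count(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method.

    Suitable for moderate rates only: exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_interval(start, length, lam, rng):
    """Return sorted timestamps of the transactions in [start, start + length)."""
    n = poisson_count(lam, rng)
    # Each timestamp is the interval start plus an independent random offset.
    return sorted(start + rng.random() * length for _ in range(n))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
timestamps = simulate_interval(start=0.0, length=60.0, lam=5.0, rng=rng)
print(len(timestamps), timestamps)
```

Raising or lowering `lam` changes the average activity of traders, while the count in any single interval still fluctuates around that average.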
Transaction Values Generation
Poisson distribution effectively solves the issue of trader activity, but there remains one more aspect to consider – the volume of exchanged assets.
For each transaction, it is necessary to establish the volume of the transferred asset. This task requires a comparative analysis of several mathematical models, a description of the principles of their work and situations in which they can be used.
Normal Distribution
The normal distribution is important in statistics and is often used in the natural and social sciences. According to the central limit theorem, under some conditions, the average of many independent samples of a random variable with a finite mean and variance is itself a random variable whose distribution converges to a normal one as the number of samples increases.
The probability density function of this distribution is shown below:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$

where μ is the mean or expectation of the distribution, σ is the standard deviation, and e is Euler's number. The probability density function of this distribution is also denoted ψ(x).
The presented distribution can be used to generate the values of traders’ transactions, since setting the mean value of the distribution and its standard deviation makes it easy to manipulate the distribution. This makes it possible to easily simulate different market situations.
The problem with this distribution is that it generates negative values, which are impossible for transactions. To resolve this issue, the truncated normal distribution must be used:

$$f(x; \mu, \sigma, a, b) = \frac{\psi(x)}{\Phi(b) - \Phi(a)}, \quad a \le x \le b$$

and f(x; μ, σ, a, b) = 0 outside the interval. Here ψ(x) is the probability density function of the "parent" general normal distribution with mean μ and variance σ², and a and b are the bounds of the truncation interval. There is one more symbol requiring explanation: Φ.
Imagine a situation where we need to determine the probability that a distribution will generate a value less than a specified value of x. This probability is given by the following function:

$$F(x) = \Phi(x) = \int_{-\infty}^{x} \psi(t)\, dt$$

which is called the cumulative distribution function. It is the integral of the density of the "parent" general normal distribution.
These formulas allow using a truncated normal distribution to generate positive transaction values that would correspond to a normal distribution over a limited range of values.
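As an illustration of how Φ enters the sampling, here is a minimal sketch that draws truncated normal values by inverse transform: a uniform number is taken between Φ(a) and Φ(b) and mapped back through the inverse CDF of the parent normal. It uses `statistics.NormalDist` from Python's standard library; the function name and the example parameters are assumptions for this sketch, not the article's implementation.

```python
import random
from statistics import NormalDist


def truncated_normal(mu, sigma, a, b, rng):
    """Sample one value from N(mu, sigma^2) truncated to [a, b].

    Assumes a and b are not so far into the tails that Phi(a) == 0.0
    or Phi(b) == 1.0 in floating point (inv_cdf rejects 0 and 1).
    """
    parent = NormalDist(mu, sigma)
    phi_a, phi_b = parent.cdf(a), parent.cdf(b)  # Phi(a), Phi(b)
    u = phi_a + (phi_b - phi_a) * rng.random()   # uniform on [Phi(a), Phi(b))
    x = parent.inv_cdf(u)
    return min(max(x, a), b)  # guard against floating-point overshoot


rng = random.Random(0)
values = [truncated_normal(100.0, 40.0, 0.0, 300.0, rng) for _ in range(2000)]
print(min(values), max(values))
```

With the lower bound set to zero, every generated transaction value is non-negative while the bulk of the distribution keeps its bell shape.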
LogNormal Distribution
The lognormal distribution is the probability distribution of a random variable whose logarithm is normally distributed. A value x generated from this distribution can be described by the formula:

$$x = e^{\mu + \sigma Z}$$

where Z is a standard normal variable, and μ and σ are the mean and standard deviation of the logarithm of x (not of x itself). Traders' activity has extreme rises and drops, and such cases must be covered; this type of distribution does so.
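The formula translates almost directly into code. The sketch below is a minimal standard-library example; the function name and the parameters (a log-scale mean of ln 100, so that the median transaction is about 100 tokens) are illustrative assumptions.

```python
import math
import random
from statistics import median


def lognormal_value(mu, sigma, rng):
    """Return exp(mu + sigma * Z), where Z is a standard normal draw."""
    z = rng.gauss(0.0, 1.0)
    return math.exp(mu + sigma * z)


rng = random.Random(3)
values = [lognormal_value(math.log(100.0), 0.5, rng) for _ in range(5000)]
# Every value is positive, and the sample median is close to exp(mu) = 100.
print(min(values), median(values))
```

Because the exponential is always positive, no truncation at zero is needed for this model.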
Pareto Distribution
The Pareto distribution is used for generating values that follow the famous "80 to 20" rule, meaning that 80% of the samples fall within 20% of the value range. This resembles real trading, where in most cases transactions are performed with small values (matching the distribution of wealth in society that Pareto described).
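This distribution has the following probability density function:

$$f(x) = \frac{\alpha\, x_m^{\alpha}}{x^{\alpha + 1}}, \quad x \ge x_m$$

where x_m is the minimal possible value of x (also called the scale parameter) and α is the shape parameter.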
Compared to the previous distributions, different market situations are harder to simulate because the parameters are harder to interpret.
For the normal and lognormal distributions, an unstable market situation can be covered with a bigger standard deviation, and traders' desire to get rid of the token leads to a bigger mean value.
In the Pareto case, traders' desire to get rid of the token leads to a bigger scale parameter. An unstable market situation is covered through the shape parameter; note that a smaller shape parameter gives a heavier tail and therefore a wider spread of values.
Pareto is able to generate values even higher than in the Cauchy case (reviewed next), so it requires a mechanism that truncates the distribution at a specified upper bound.
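The article does not specify which truncation mechanism is used. One possible approach, shown here purely as an assumption, is inverse-transform sampling restricted to the part of the CDF below the upper bound. The function name and parameters are hypothetical; the shape value 1.16 is the one commonly associated with the 80/20 rule.

```python
import random
from statistics import median


def truncated_pareto(x_m, alpha, upper, rng):
    """Sample a Pareto(x_m, alpha) value restricted to [x_m, upper].

    Uses the Pareto CDF F(x) = 1 - (x_m / x) ** alpha and draws the
    uniform number only from [0, F(upper)), so no sample exceeds `upper`.
    """
    f_upper = 1.0 - (x_m / upper) ** alpha
    u = rng.random() * f_upper
    x = x_m / (1.0 - u) ** (1.0 / alpha)
    return min(x, upper)  # guard against floating-point overshoot


rng = random.Random(11)
values = [truncated_pareto(10.0, 1.16, 10_000.0, rng) for _ in range(5000)]
print(min(values), median(values), max(values))
```

Most samples stay close to the scale parameter, while a few large transactions appear up to the chosen bound.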
Cauchy Distribution
The Cauchy distribution has this probability density function:

$$f(x; x_0, \gamma) = \frac{1}{\pi \gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^{2} \right]}$$

where x_0 is the location parameter setting the location of the distribution peak and γ is the scale parameter that specifies the half-width at half-maximum. γ is also equal to half of the interquartile range and is sometimes called the probable error.
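The problem with this distribution is that transaction values can only be positive, while the Cauchy distribution also produces negative ones. To generate values similar to real transaction values, the half-Cauchy form is used:

$$f(x; \mu, \sigma) = \frac{2}{\pi \sigma \left[ 1 + \left( \frac{x - \mu}{\sigma} \right)^{2} \right]}, \quad x \ge \mu$$

where µ is a location parameter and σ is a scale parameter.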
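A one-sided Cauchy value can be drawn by inverse transform, since the half-Cauchy CDF is F(x) = (2/π)·arctan((x − µ)/σ). The sketch below uses only the standard library; the function name and parameters are illustrative assumptions (the article's references point to `scipy.stats.halfcauchy`, which could be used instead).

```python
import math
import random
from statistics import median


def half_cauchy(loc, scale, rng):
    """Sample a half-Cauchy value: loc + scale * tan(pi * U / 2), U in [0, 1)."""
    return loc + scale * math.tan(math.pi * rng.random() / 2.0)


rng = random.Random(5)
values = [half_cauchy(0.0, 50.0, rng) for _ in range(5000)]
# The theoretical median is loc + scale, because tan(pi / 4) = 1.
print(min(values), median(values), max(values))
```

The maximum printed by such a run is typically far above the median, which is the heavy-tail behaviour discussed next.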
One problem with Cauchy remains: it can give unrealistically big transaction values, meaning there is a small chance of an anomalous value that does not correspond to the real-world case.
This problem was solved via a value "mapping" mechanism, a graphical representation of which is given in the example figure,
where the generated value is the original Cauchy-generated value and the limit is the upper bound of the possible values. Such an algorithm keeps the original Cauchy distribution almost unchanged (without breaking the probabilities) while producing values only up to a specific limit.
Monte Carlo Transaction Simulator
So far, four approaches to generating transaction values and one approach to generating transaction frequency with random timestamps have been specified. These approaches must be connected in one module able to simulate real-world transactions. The Normal, Lognormal, Pareto and Cauchy distributions are called (and implemented as) transaction value generators, and the Poisson distribution is the transaction frequency generator. All transaction value generators are implemented in a similar way, which makes them easy to integrate into one module and to switch between if required.
The Monte Carlo simulator first generates the number of transactions per specified time interval and defines a timestamp for each transaction; at the second stage it defines the transaction values according to the chosen probability distribution. With all this work done, two problems still require a solution:
 Which distributions best match real-world distributions;
 How to pick the best parameters for the best models so that they come closer to the specified distributions.
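The two-stage simulator described above can be sketched as follows. This is a simplified, standard-library illustration and not the code from the IX Swap repository: the function names, the interchangeable `value_generator` callback, and all parameter values are assumptions made for the example.

```python
import math
import random


def poisson_count(lam, rng):
    """Poisson(lam) sample via Knuth's method (moderate lam only)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def lognormal_value(rng, mu=4.0, sigma=0.5):
    """Transaction value generator based on the lognormal distribution."""
    return rng.lognormvariate(mu, sigma)


def half_cauchy_value(rng, loc=0.0, scale=50.0):
    """Transaction value generator based on the half-Cauchy distribution."""
    return loc + scale * math.tan(math.pi * rng.random() / 2.0)


def simulate(n_intervals, interval_length, lam, value_generator, rng):
    """Return a list of (timestamp, value) pairs ordered by timestamp."""
    transactions = []
    for i in range(n_intervals):
        start = i * interval_length
        # Stage 1: number of transactions and their timestamps.
        n = poisson_count(lam, rng)
        stamps = sorted(start + rng.random() * interval_length for _ in range(n))
        # Stage 2: a value for each transaction from the chosen generator.
        transactions.extend((t, value_generator(rng)) for t in stamps)
    return transactions


rng = random.Random(7)
# 24 one-hour intervals with 10 transactions per hour on average.
txs = simulate(24, 3600.0, 10.0, lognormal_value, rng)
print(len(txs), txs[:3])
```

Because every value generator shares the same call signature, switching models only requires passing a different function, for example `half_cauchy_value` instead of `lognormal_value`.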
Best Correlation Distributions
The lognormal and Cauchy distributions correlate best with the real distribution, as can be seen in the distributions presented below:
From left to right: lognormal distribution, Cauchy distribution, and real transaction values distribution.
Since these distributions are able to match real-life distributions, an algorithm is needed that automatically picks the best parameters for them.
Parameter Search Algorithms
Since the best distributions are the lognormal and Cauchy ones, it was decided to write parameter picking algorithms that find the best parameter combination for them.
The first problem that requires a solution is how the algorithm picks the best possible parameter combination. Every simulation run generates different values, so the resulting distribution has small deviations and one launch can perform better than another purely by chance. To check the overall efficiency, the check must therefore be performed over multiple simulation runs (creating an average picture).
Another question is how the algorithm checks whether one distribution is "similar" to or "matching" another one.
The harmonic mean formula

$$H = \frac{2\, x_1 x_2}{x_1 + x_2}$$

works for two values. This means a harmonic two-error formula can be used to define the best possible parameter combination.
Taking into account the previously mentioned "80 to 20" rule, which can be applied to social and economic processes, the first quartile and the median are the most important elements of the distributions.
The compared distributions are the simulated one and the real one. So the final representation of the error, based on the harmonic mean formula for two numbers, is:

$$E = \frac{2\, e_{Q_1}\, e_{\mathrm{med}}}{e_{Q_1} + e_{\mathrm{med}}}$$

where e_{Q_1} is the error between the first quartiles of the simulated and real distributions and e_{med} is the error between their medians. The model picks as the best parameters those for which the average harmonic error over all launches of the simulation is minimal. It is possible to add a version that finds the harmonic mean of three or more numbers, but in this case the formula changes and the required amount of calculation is higher.
Such a version would give a better match of the simulated distribution to the real one, since it allows comparing the 25th and 75th percentiles together with the medians of both distributions.
The search iterates over a range of parameters, incrementing each parameter from its lower bound to its upper bound by a step parameter. All intermediate results (each parameter set and its average harmonic error) are saved, and the parameter set with the smallest average harmonic error is chosen.
Below is presented an example of the error distribution from which the best parameter values are picked according to the harmonic error.
The distribution shows that there are two smallest error values, and that the higher the resolution (in other words, the more iterations), the smaller the error.
The parameter search algorithm works by checking the specified range of values with the specified step. To use it well, first take a big range with a big step, then a smaller range with smaller steps and a higher simulation count.
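The search described above can be sketched as a plain grid search for the lognormal model. Several points here are assumptions rather than the article's exact method: each error is taken as the absolute difference between the simulated and real statistic, the "real" data is a synthetic stand-in generated inside the script, and the grid, run count, and function names are illustrative only.

```python
import random
from statistics import mean, quantiles


def q1_and_median(values):
    """Return the first quartile and the median of a sample."""
    q1, med, _ = quantiles(values, n=4)
    return q1, med


def harmonic_error(simulated, real):
    """Harmonic mean of the first-quartile error and the median error.

    Assumption: each error is an absolute difference.
    """
    sim_q1, sim_med = q1_and_median(simulated)
    real_q1, real_med = q1_and_median(real)
    e_q1, e_med = abs(sim_q1 - real_q1), abs(sim_med - real_med)
    if e_q1 + e_med == 0.0:
        return 0.0
    return 2.0 * e_q1 * e_med / (e_q1 + e_med)


def grid_search(real, mus, sigmas, runs, sample_size, rng):
    """Average the harmonic error over several runs for each parameter pair."""
    results = {}
    for mu in mus:
        for sigma in sigmas:
            errors = []
            for _ in range(runs):
                sim = [rng.lognormvariate(mu, sigma) for _ in range(sample_size)]
                errors.append(harmonic_error(sim, real))
            results[(mu, sigma)] = mean(errors)
    best = min(results, key=results.get)  # smallest average harmonic error
    return best, results


rng = random.Random(1)
# Synthetic stand-in for real transaction values (NOT real market data).
real = [rng.lognormvariate(4.0, 0.5) for _ in range(2000)]
best, results = grid_search(
    real, mus=[3.0, 3.5, 4.0, 4.5, 5.0], sigmas=[0.25, 0.5, 0.75],
    runs=5, sample_size=500, rng=rng,
)
print(best, results[best])
```

A coarse-to-fine search would call `grid_search` again with a narrower range around `best`, a smaller step, and a higher `runs` value.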
Conclusions
Using the aforementioned mathematical models, it is possible to perform various market simulations and demonstrate the behaviour of different groups of traders. Combining these models with algorithms for finding the optimal parameters brings the simulations closer to real traders' behaviour.
Although different market behaviour can be simulated depending on the chosen factors, the generated transactions cannot demonstrate or cover fraudulent transactions. They can, however, demonstrate attacks on the market if the correct parameters are specified.
While simulations make it possible to test the strength of the market under precisely specified market situations, the real history of transactions will allow testing various market-controlling mechanisms that prevent fraudulent activities, which, besides harming certain groups of traders, may also affect the market as a whole.
About IX Swap
IX Swap is the "Uniswap" for STOs and TSOs, the regulatory and liquidity solution for security tokens and tokenized stocks.
IX Swap will be the FIRST platform to provide liquidity pools and automated market-making functions for the STO/TSO industry. The platform will facilitate the trading of security tokens through licensed custodians and security brokers, which will provide actual ownership and claim over these real-world assets.