Part 2/4 – Traders Activity Simulation

This article series describes the work of the IX Swap data science department, which analysed the activity of AMM pools (the market). Each article covers one major segment of the analysis of real-world AMM (read: Uniswap) pool behaviour and of the stress testing of the IXS mitigation model, with details and simplified explanations of the work performed. The detailed report, codebase, and simulation data are open-sourced and continuously updated on our GitHub at https://github.com/IX-Swap/data-science.

Diving Into the Wild Metaverse of AMMs Article Series

  1. Intro to AMMs and Their Particularities – What an AMM is, how it works, how it compares with traditional financial institutions, and a basic description of the regularization techniques.
  2. Traders Activity Simulation – A couple of basic principles of traders’ behaviour, which mathematical functions and models can be used to simulate it, how they differ, and which modifications were required to improve simulation efficiency.
  3. AMM Market Simulation or Some Rule-Based Magic – A deeper description and analysis of AMM markets: how to simulate AMM behaviour, what those simulations are for, and which AMM parameters have been analysed.
  4. Transaction History Analysis or How Unpredictable People Can Sometimes Be – The reasons for analysing traders’ transaction histories, some striking examples of strange trader behaviour, some detected attacks on pools, and a comparison of real transaction and pool distributions with ones tuned using different AMM market parameters.

Traders Activity Simulation

The financial market is a place of abrupt changes and risky decisions. A trader’s actions depend on the general behaviour pattern of other traders and are exposed to emotion in decision-making, regardless of the trader’s resistance to stress and negative factors.

Crypto markets carry higher risks: market swings are large, there are little to no guarantees, and regulation is lacking.

Crypto markets also suffer from a lack of information on most assets. Together with the factors above, this makes traders’ behaviour depend on the currently observed market situation. This behaviour can be described as a snowball that grows in volume as it rolls.

Also, the lack of regulatory structures, centralization, and monitoring services gives traders the opportunity to conduct fraudulent transactions for profit, something that is difficult or impossible in traditional financial markets.

For these reasons, effective tests of the crypto market need to rely on two types of simulation of trader behaviour:

  • Constructing mathematical models that simulate the behaviour of traders on the market and can be customized to the situation under consideration (the degree of traders’ activity, asset liquidity, exchange volume, and so on);
  • Using historical transaction records to run tests on real data, analysing them to detect anomalous cases or fraudulent transactions.

Since the real trading history is a complex topic, a separate article will be devoted to it. Within this article, we’ll focus on the following two aspects:

  • Mathematical simulation of traders’ behaviour;
  • General behavioural patterns and the properties of various mathematical distributions for effective simulation of traders’ behaviour.

TL;DR

Two problems must be solved to simulate traders’ behaviour efficiently with a mathematical approach. Transaction frequency defines how many transactions happen per specified time interval; the count per interval must be unstable, deviating from the mean count. Transaction value defines how many tokens traders request per exchange transaction; the simulated values must resemble real transaction distributions. The following models address these two problems:

  • Transaction Frequency Models
    • Poisson Distribution – generates the number of transactions per specified period around a mean transaction count, with small deviations from that mean, making the transaction frequency similar to the real one.
  • Transaction Values Models
    • Normal Distribution – generates values following the bell-shaped curve towards which, by the central limit theorem, averages of many independent samples converge (shown in the normal distribution section);
    • Log-Normal Distribution – generates strictly positive values whose logarithm is normally distributed, giving a tighter shape in which most transactions concentrate near the specified central value (the “bell” is tighter);
    • Pareto Distribution – generates values following the “80 to 20” principle popular in statistics and economics, where 80% of the samples fall into 20% of the value range;
    • Cauchy Distribution – generates values from a specified starting point, with probability decreasing as the value increases (a higher probability of small values and a small but non-negligible probability of very large ones).

For transaction values, every model was constrained to generate only positive values, since negative transaction values are impossible. This principle allows tuning the models for each token of each pool individually. The best models for simulating traders’ behaviour are the Cauchy and Log-Normal ones. For these two models, scripts were written that find the best parameters by minimizing a harmonic-mean error. The parameter picking algorithms, and the reasons why the Log-Normal and Cauchy distributions were chosen, are explained below.

Transaction Frequency

The number of transactions in a fixed time interval is unstable and deviates from its average. Simulating traders’ behaviour therefore needs a mathematical model that takes the average number of transactions per time interval and produces unstable deviations around it. The transactions must also be placed at random times within the interval. To solve these problems, the Poisson distribution was chosen.

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed time interval, when events occur at a constant mean rate and independently of the time since the last event (it can also be applied to other measures, such as distance). Its probability mass function is:

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \dots$$

where e is Euler’s number, x is the number of event occurrences, and λ is the expected value of X, which is also equal to its variance.

This distribution generates the number of transactions in the specified time interval, but each transaction must also be timestamped. For this, the start time of the interval is fixed and a randomly drawn offset within the interval is added to it, so each transaction’s time is generated independently of the others.

This structure simulates a transaction rate close to the real one, with irregular timestamps and transaction counts that fluctuate around the specified average. Changing λ conveniently adjusts the transaction frequency, and thus the degree of trader activity.
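The frequency step described above can be sketched with Python’s standard library only. This is a minimal illustration, not the code from the IX Swap repository: the function names `poisson_count` and `simulate_interval`, the use of Knuth’s multiplication method for the Poisson draw, and the uniform offset for timestamps are all assumptions made for the example.

```python
import math
import random


def poisson_count(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) count with Knuth's multiplication method.

    Suitable for moderate lam only: exp(-lam) underflows for very large lam.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_interval(rng: random.Random, start: float, length: float, lam: float):
    """Return sorted transaction timestamps inside [start, start + length)."""
    count = poisson_count(rng, lam)
    # Each timestamp is the interval start plus an independent uniform offset.
    return sorted(start + rng.random() * length for _ in range(count))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
timestamps = simulate_interval(rng, start=0.0, length=3600.0, lam=20.0)
print(len(timestamps), "transactions in the first hour")
```

Raising or lowering `lam` changes the average number of transactions per interval, while the count in any single interval still fluctuates around it.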

Transaction Values Generation

The Poisson distribution effectively solves the issue of trader activity, but one more aspect remains: the volume of exchanged assets. For each transaction, the volume of the transferred asset must be set. This task requires a comparative analysis of several mathematical models, with a description of how they work and the situations in which they can be used.

Normal Distribution

The normal distribution is important in statistics and is often used in the natural and social sciences. According to the central limit theorem, under some conditions, the average of many samples of a random variable with finite mean and variance is itself a random variable whose distribution converges to a normal one as the number of samples increases. The probability density function of this distribution is:

$$\varphi(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$

where μ is the mean or expectation of the distribution, σ is the standard deviation, and e is Euler’s number. The probability density function of this distribution is also denoted φ(x).

The presented distribution can be used to generate the values of traders’ transactions: setting its mean and standard deviation makes it easy to shape the distribution and therefore to simulate different market situations.

The problem with this distribution is that it generates negative values, which are impossible for transactions. To resolve this issue, the truncated normal distribution must be used:

$$\psi(x;\mu,\sigma,a,b) = \begin{cases} \dfrac{\varphi(x)}{\Phi(b) - \Phi(a)}, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$

where ψ is the density of the truncated distribution, φ(x) is the probability density function of the “parent” general normal distribution with mean μ and variance σ², and [a, b] is the truncation interval. One more symbol requires explanation: Φ.

Imagine that we need to determine the probability that the distribution generates a value less than a specified value x. This probability is given by the following function:

$$\Phi(x) = P(X \le x) = \int_{-\infty}^{x} \varphi(t)\, dt$$

which is called the cumulative distribution function, F(x) = Φ(x), and is the integral of the density φ of the “parent” general normal distribution.

These formulas allow using a truncated normal distribution to generate positive transaction values that would correspond to a normal distribution over a limited range of values.
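One simple way to draw from a truncated normal distribution is rejection sampling: draw from the parent normal and keep only values inside the truncation interval. The sketch below assumes this approach for illustration (the original implementation may differ), and the name `truncated_normal` and the parameter values are hypothetical.

```python
import random


def truncated_normal(rng: random.Random, mu: float, sigma: float,
                     a: float, b: float) -> float:
    """Sample from N(mu, sigma^2) conditioned on a <= x <= b.

    Rejection sampling: redraw until the value falls inside [a, b].
    Exact, but slow when [a, b] holds little of the parent's probability mass.
    """
    if not a < b:
        raise ValueError("require a < b")
    while True:
        x = rng.gauss(mu, sigma)
        if a <= x <= b:
            return x


rng = random.Random(42)
# Illustrative parameters: values around 100 tokens, never negative.
values = [truncated_normal(rng, mu=100.0, sigma=80.0, a=0.0, b=300.0)
          for _ in range(1000)]
print(min(values), max(values))
```

Accepted samples follow the parent density rescaled by Φ(b) − Φ(a), which is the truncated density given above. Libraries such as SciPy also provide a ready-made `scipy.stats.truncnorm`.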

Log-Normal Distribution

The log-normal distribution is the probability distribution of a random variable whose logarithm is normally distributed. Under this distribution, a generated value X can be described by the formula:

$$X = e^{\mu + \sigma Z}$$

where Z is a standard normal variable, and μ and σ are the mean and standard deviation of the logarithm of X (not of X itself). Traders’ activity has extreme rises and drops, and this type of distribution covers such cases.
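The usual construction of a log-normal value, X = exp(μ + σZ) with Z standard normal, translates directly into code. The following is a small sketch with a hypothetical function name and illustrative parameters; Python’s built-in `random.lognormvariate(mu, sigma)` produces the same distribution.

```python
import math
import random
import statistics


def lognormal_value(rng: random.Random, mu: float, sigma: float) -> float:
    """Return exp(mu + sigma * Z), where Z is a standard normal draw."""
    z = rng.gauss(0.0, 1.0)
    return math.exp(mu + sigma * z)


rng = random.Random(42)
mu, sigma = math.log(100.0), 0.5  # parameters of log(X), not of X itself
values = [lognormal_value(rng, mu, sigma) for _ in range(5000)]

# Every value is positive, and the median is close to exp(mu) = 100.
print(min(values) > 0, statistics.median(values))
```

Because the exponential is always positive, no truncation at zero is needed for this generator.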

Pareto Distribution

The Pareto distribution generates values following the famous “80 to 20” rule, meaning that 80% of the samples fall into 20% of the value range. This resembles real trading, where most transactions are performed with small values (matching the distribution of wealth in society that Pareto studied). Its probability density function is:

$$f(x) = \frac{\alpha\, x_m^{\alpha}}{x^{\alpha+1}}, \qquad x \ge x_m$$

where x_m is the minimal possible value of x (also called the scale parameter) and α is the shape parameter. Compared to the previous distributions, it is harder to simulate different market situations with Pareto because its parameters are harder to interpret.

With the normal-family distributions above, an unstable market situation can be covered by a bigger standard deviation, and traders’ desire to get rid of a token leads to a bigger mean value. In the Pareto case, an unstable market situation can be covered by a smaller shape parameter (which gives a heavier tail), and traders’ desire to get rid of the token leads to a bigger scale parameter. Pareto is able to generate values even higher than the Cauchy distribution (reviewed next), so it requires a mechanism that “truncates” the distribution at a specified upper bound.
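A Pareto generator with an upper bound can be sketched through the inverse of the cumulative distribution function F(x) = 1 − (x_m / x)^α. The truncation method shown here (restricting the uniform draw so the result never exceeds the bound) is an assumption chosen for illustration, as are the function name and the parameter values.

```python
import random
import statistics


def truncated_pareto(rng: random.Random, x_min: float, alpha: float,
                     upper: float) -> float:
    """Sample a Pareto(x_min, alpha) value conditioned on x <= upper.

    Inverse-CDF sampling: restricting the uniform draw to [0, F(upper))
    keeps the shape below the bound and discards the tail above it.
    """
    if not 0 < x_min < upper or alpha <= 0:
        raise ValueError("require 0 < x_min < upper and alpha > 0")
    cdf_at_upper = 1.0 - (x_min / upper) ** alpha
    u = rng.random() * cdf_at_upper
    # min() guards against floating-point overshoot at the bound.
    return min(upper, x_min / (1.0 - u) ** (1.0 / alpha))


rng = random.Random(42)
values = [truncated_pareto(rng, x_min=10.0, alpha=1.16, upper=10_000.0)
          for _ in range(5000)]
print(min(values), statistics.median(values), max(values))
```

Most samples stay close to `x_min`, while a few reach far towards the bound, which is the behaviour the “80 to 20” description refers to.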

Cauchy Distribution

The Cauchy distribution has the following probability density function:

$$f(x; x_0, \gamma) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^{2}\right]}$$

where x0 is the location parameter, setting the position of the distribution peak, and γ is the scale parameter, specifying the half-width at half-maximum. γ is also equal to half of the interquartile range and is sometimes called the probable error. The problem with this distribution is that transaction values can only be positive, while it also produces negative ones. To generate values similar to real transaction values, the following one-sided (half-Cauchy) form is used:

$$f(x; \mu, \sigma) = \frac{2}{\pi\sigma\left[1 + \left(\frac{x - \mu}{\sigma}\right)^{2}\right]}, \qquad x \ge \mu$$

where µ is a location parameter and σ is a scale parameter. One problem with Cauchy remains: it can give unrealistically big transaction values, so there is a small chance of an anomalous value that does not correspond to any real-world case. This problem was solved with a value “mapping” mechanism, in which the generated value is the original Cauchy-generated value and the limit is the upper bound of the possible values. Such an algorithm keeps the original Cauchy distribution almost unchanged (without breaking the probabilities) while producing values only up to a specific limit.
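A positive-only Cauchy generator with an upper limit can be sketched as follows. Two assumptions are made here and should be read as such: the positive-only variant is taken to be a half-Cauchy distribution starting at the location parameter, and the upper limit is enforced by conditioning on the value being below the limit. The article’s own “mapping” mechanism is not specified in the text, so this is a stand-in, not a reproduction; the function name is hypothetical.

```python
import math
import random
import statistics


def bounded_half_cauchy(rng: random.Random, loc: float, scale: float,
                        limit: float) -> float:
    """Sample a half-Cauchy(loc, scale) value conditioned on x <= limit.

    Half-Cauchy CDF for x >= loc: F(x) = (2 / pi) * atan((x - loc) / scale).
    Drawing u uniformly from [0, F(limit)) and inverting F keeps the relative
    probabilities below the limit unchanged.
    """
    if scale <= 0 or limit <= loc:
        raise ValueError("require scale > 0 and limit > loc")
    cdf_at_limit = (2.0 / math.pi) * math.atan((limit - loc) / scale)
    u = rng.random() * cdf_at_limit
    # min() guards against floating-point overshoot at the limit.
    return min(limit, loc + scale * math.tan(math.pi * u / 2.0))


rng = random.Random(42)
values = [bounded_half_cauchy(rng, loc=0.0, scale=50.0, limit=10_000.0)
          for _ in range(5000)]
print(statistics.median(values), max(values))
```

With a limit far above the scale, the median stays close to `loc + scale`, as it would for the unbounded half-Cauchy, while anomalously large values can no longer appear.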

Monte Carlo Transaction Simulator

So far, four approaches to generating transaction values have been described, along with one approach to generating transaction frequency with random timestamps. These approaches need to be connected in one module able to simulate real-world transactions. The normal, log-normal, Pareto and Cauchy distributions are implemented as transaction value generators, and the Poisson distribution as the transaction frequency generator. All transaction value generators follow a similar design, which makes them easy to integrate into one module and to swap if required. The Monte Carlo simulator first generates the number of transactions per specified time interval and assigns a timestamp to each transaction; at the second stage, it draws the transaction values from the chosen probability distribution. With all this work done, two problems still require a solution:

  1. Which distributions best match real-world distributions;
  2. How to pick the parameters that bring the best models closest to the specified distributions.
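The two-stage simulator described above can be sketched as a function that takes any value generator with a common call signature. The structure below is an assumption about how such a module might look, not the repository’s actual design; all names (`simulate`, `lognormal_values`, `half_cauchy_values`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

ValueGenerator = Callable[[random.Random], float]


def poisson_count(rng: random.Random, lam: float) -> int:
    """Knuth's method; fine for moderate lam (exp(-lam) underflows if huge)."""
    threshold, count, product = math.exp(-lam), 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def lognormal_values(mu: float, sigma: float) -> ValueGenerator:
    return lambda rng: rng.lognormvariate(mu, sigma)


def half_cauchy_values(loc: float, scale: float) -> ValueGenerator:
    return lambda rng: loc + scale * math.tan(math.pi * rng.random() / 2.0)


def simulate(rng: random.Random, start: float, interval: float,
             n_intervals: int, lam: float,
             draw_value: ValueGenerator) -> List[Tuple[float, float]]:
    """Return (timestamp, value) pairs ordered by time."""
    transactions = []
    for i in range(n_intervals):
        t0 = start + i * interval
        # Stage 1: how many transactions, and when.
        count = poisson_count(rng, lam)
        times = sorted(t0 + rng.random() * interval for _ in range(count))
        # Stage 2: how much each transaction exchanges.
        transactions.extend((t, draw_value(rng)) for t in times)
    return transactions


rng = random.Random(2024)
history = simulate(rng, start=0.0, interval=60.0, n_intervals=24, lam=5.0,
                   draw_value=lognormal_values(4.0, 0.8))
print(len(history), history[:2])
```

Swapping `lognormal_values(...)` for `half_cauchy_values(...)` changes the value model without touching the frequency logic, which is the interchangeability the text describes.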

Best Correlation Distributions

The log-normal and Cauchy distributions correlate best with the real distributions, which can be seen in the distributions presented below:

From left to right: the log-normal distribution, the Cauchy distribution, and the real transaction value distribution. Since these distributions are able to match real-life distributions, an algorithm is needed that automatically picks the best parameters for them.

Parameter Search Algorithms

Since the best distributions are the log-normal and Cauchy ones, parameter picking algorithms were written to find the best parameter combination for them. The first problem to solve is how the algorithm picks the best combination. Every simulation run generates different values, so the resulting distribution has small deviations and one run may happen to perform better than another. To judge overall efficiency, the check must therefore be repeated over multiple simulation runs (creating an average picture). The second question is how the algorithm checks whether one distribution is “similar” to, or “matches”, another. The harmonic mean formula for two numbers a and b is:

$$H(a, b) = \frac{2ab}{a + b}$$

so a harmonic mean of two errors can be used to rank parameter combinations. Taking into account the previously mentioned “80 to 20” rule, which can be applied to social and economic processes, the first quartile and the median are the most important elements of the distributions. The compared distributions are the simulated one and the real one, so the final error is the harmonic mean of two numbers: the error between the first quartiles and the error between the medians of the two distributions. The model picks as best the parameters for which this harmonic mean error, averaged over all runs of the simulation, is minimal.

It is possible to add a version that uses a harmonic mean of three or more numbers, but the formula changes and more calculations are required. It would give a better match between the simulated and real distributions, since the 25th and 75th percentiles could then be compared along with the medians of both distributions.

The parameter range is traversed by incrementing each parameter from its lower bound to its upper bound by a step parameter. All intermediate results (each parameter set and its average harmonic error) are saved, and the parameter set with the smallest average harmonic mean error is chosen. An example error distribution over the searched parameter values shows two smallest error values, and the higher the resolution (in other words, the more iterations), the smaller the error becomes.
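The quartile-and-median harmonic error can be sketched as below. The text does not state how each individual error is measured, so this example assumes the absolute difference between the simulated and real statistic; the function names are hypothetical.

```python
import statistics


def quartile_and_median(values):
    """Return (first quartile, median) of a sample."""
    q1, median, _q3 = statistics.quantiles(values, n=4)
    return q1, median


def harmonic_error(simulated, real) -> float:
    """Harmonic mean of the first-quartile error and the median error.

    Assumption: each error is an absolute difference between the two samples.
    """
    sim_q1, sim_median = quartile_and_median(simulated)
    real_q1, real_median = quartile_and_median(real)
    e_q1 = abs(sim_q1 - real_q1)
    e_median = abs(sim_median - real_median)
    if e_q1 + e_median == 0:
        return 0.0  # identical statistics: avoid dividing by zero
    return 2 * e_q1 * e_median / (e_q1 + e_median)


real = list(range(1, 101))
simulated = [v + 10 for v in real]  # same shape, shifted by 10
print(harmonic_error(simulated, real))
```

One property worth keeping in mind is that a harmonic mean is dominated by the smaller of its two inputs, so a parameter set that matches only one of the two statistics well can still score a low error. This may be one reason an error surface shows more than one minimum.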

The parameter search algorithm works by checking a specified range of values with a specified step. It works best in two passes: first a big range with a big step, then a smaller range with smaller steps and a higher simulation count.
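The coarse-then-fine search can be sketched for the log-normal case as follows. This is an illustrative reconstruction under stated assumptions: the error is the harmonic mean of absolute first-quartile and median differences, the “real” data here is synthetic, the fine pass is centred on the coarse result, and every name and range is hypothetical.

```python
import random
import statistics


def harmonic_error(simulated, real) -> float:
    s_q1, s_med, _ = statistics.quantiles(simulated, n=4)
    r_q1, r_med, _ = statistics.quantiles(real, n=4)
    e1, e2 = abs(s_q1 - r_q1), abs(s_med - r_med)
    return 0.0 if e1 + e2 == 0 else 2 * e1 * e2 / (e1 + e2)


def frange(lo: float, hi: float, step: float):
    """Values lo, lo + step, ... that do not exceed hi."""
    n = int((hi - lo) / step + 1e-9)
    return [lo + i * step for i in range(n + 1)]


def grid_search(rng, real, mu_range, sigma_range, step, runs, size):
    """Return (best (mu, sigma), all averaged errors) over the grid."""
    results = {}
    for mu in frange(mu_range[0], mu_range[1], step):
        for sigma in frange(sigma_range[0], sigma_range[1], step):
            # Average over several runs to smooth out sampling noise.
            errors = [
                harmonic_error(
                    [rng.lognormvariate(mu, sigma) for _ in range(size)], real)
                for _ in range(runs)
            ]
            results[(mu, sigma)] = statistics.fmean(errors)
    best = min(results, key=results.get)
    return best, results


rng = random.Random(42)
# Synthetic stand-in for real transaction values.
real = [rng.lognormvariate(4.0, 0.8) for _ in range(1000)]

# Pass 1: big range, big step, few runs.
(mu0, sigma0), _ = grid_search(rng, real, (2.0, 6.0), (0.2, 1.4),
                               step=0.4, runs=2, size=300)
# Pass 2: smaller range around the coarse result, smaller step, more runs.
best, results = grid_search(rng, real, (mu0 - 0.4, mu0 + 0.4),
                            (max(0.05, sigma0 - 0.4), sigma0 + 0.4),
                            step=0.1, runs=4, size=400)
print(best, results[best])
```

Keeping the full `results` dictionary mirrors the idea of saving every intermediate parameter set with its average error, so the error surface can be inspected afterwards.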

Conclusions

Using the aforementioned mathematical models, it is possible to perform various market simulations and demonstrate the behaviour of different groups of traders. The combination of these models with the algorithms for finding optimal parameters brings the simulations closer to real traders’ behaviour.

Although different market behaviour can be simulated depending on the chosen factors, the generated transactions cannot demonstrate or cover fraudulent transactions. They can, however, demonstrate attacks on the market if the correct parameters are specified.

While simulations make it possible to test the strength of the market under precisely specified market situations, the real transaction history allows testing various market-control mechanisms for preventing fraudulent activities, which, besides harming certain groups of traders, may also affect the market as a whole.

. . . .

About IX Swap

IX Swap is the “Uniswap” for STOs and TSOs, the regulatory and liquidity solution for security tokens and tokenized stocks.

IX Swap will be the FIRST platform to provide liquidity pools and automated market-making functions for the STO/TSO industry. The platform will facilitate the trading of security tokens through licensed custodians and security brokers which will provide actual ownership and claim over these real-world assets.
