Quantifying Tail Risk

Modelling Operational Value-at-Risk with the Loss Distribution Approach.

Paul McAteer, MSc, MBA

pcm353@stern.nyu.edu


OpVaR with the Loss Distribution Approach.

Model Design


The management of operational risk follows a standard process: (1) Risk Identification; (2) Risk Evaluation; (3) Risk Monitoring; (4) Risk Mitigation; (5) Risk Reporting. The quantification of risk is a fundamental element for the fulfilment of steps (2) - (5).

Based on empirical loss event data, the loss distribution is estimated by modelling the probability and magnitude of potential losses in the industry. To this end we employ the Loss Distribution Approach (LDA), a technique which convolves the loss frequency distribution with the loss severity distribution to model an objective distribution of aggregate losses attributable to the materialization of specified operational risks. LDA is a favored method among large financial institutions for determining capital adequacy for operational risk1; it is also eminently suitable for our objective of quantifying custodial risk in the cryptocurrency universe2. We describe the model construction process below in detail:

  1. Construct the empirical discrete frequency density of CCC-mitigable loss events.

  2. Construct the empirical continuous severity density of CCC-mitigable loss events.

  3. Use the empirical data to parametrize the probability density functions of the expected loss frequency and expected loss severity of CCC-mitigable events.

  4. LOSS FREQUENCY: Estimate a Poisson frequency distribution, described by the single parameter λ (the expected frequency), using the mean of the empirical loss frequency distribution. The Poisson distribution has the density function:

    h(n) = \frac{\lambda^{n} e^{-\lambda}}{n!}

  5. LOSS SEVERITY: We assume a lognormal distribution for loss severity. The log of the loss severity (denoted ln L) is normally distributed with mean μ and variance σ². The lognormal distribution has the density function:

    g(l) = \frac{1}{\sqrt{2\pi}\,\sigma l}\, \exp\!\left(-\frac{1}{2}\left(\frac{\ln(l) - \mu}{\sigma}\right)^{2}\right)

  6. LOSS DISTRIBUTION: Under the standard assumption that frequency and severity are independent, Monte Carlo simulation is used to generate the total loss distribution as the convolution of the frequency and severity distributions. The simulation algorithm is as follows (a short Python sketch appears after this list):

    1. Take a random draw from the Poisson frequency distribution3, characterised by λ

    2. Suppose that this first iteration simulates n loss events over a 1-year horizon

    3. Take n random draws from the lognormal severity distribution4. Suppose this results in simulated loss amounts (L_1, L_2, …, L_n) over the loss horizon

    4. Sum the n simulated losses to obtain a total simulated annual loss X_1 = L_1 + L_2 + ⋯ + L_n

    5. Repeat steps 1-4 100,000 times

    6. Form the histogram of X_1, …, X_100,000. This represents the simulated total annual loss distribution


  7. OpVaR: Based on the simulated total annual loss distribution, we can state with a specified confidence level that losses will not exceed a certain dollar amount.
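
The following is a minimal Python sketch of the simulation described in steps 4 through 7. The parameter values lam, mu, and sigma are illustrative placeholders rather than the fitted estimates derived in the implementation section, and the loop follows the algorithm literally rather than vectorizing it.

# Minimal sketch of the LDA Monte Carlo convolution (steps 4-7).
# lam, mu and sigma are illustrative placeholders, not fitted estimates.
import numpy as np
from scipy.stats import poisson, lognorm

rng = np.random.default_rng(seed=42)

lam   = 10.0   # expected annual number of loss events (Poisson lambda)
mu    = 14.0   # mean of the log loss severity
sigma = 2.5    # standard deviation of the log loss severity

n_sims = 100_000
annual_losses = np.empty(n_sims)

for i in range(n_sims):
    # Step 1: draw the number of loss events for this simulated year
    n = poisson.rvs(lam, random_state=rng)
    # Steps 2-3: draw n loss severities from the lognormal distribution
    severities = lognorm.rvs(s=sigma, scale=np.exp(mu), size=n, random_state=rng)
    # Step 4: sum the severities to obtain the total simulated annual loss
    annual_losses[i] = severities.sum()

# Step 7: OpVaR is read off as an upper percentile of the simulated distribution
opvar_95 = np.percentile(annual_losses, 95)
print(f"Simulated 95% OpVaR: ${opvar_95:,.0f}")

In practice the frequency draws can be generated in a single vectorized call and the severities simulated per draw, as is done in the implementation below; the literal loop is kept here only to mirror the algorithm as stated.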




1 LDA is one popular method to estimate potential losses and capital requirements in the Advanced Measurement Approach (AMA) under the Basel II regulations.
2 For a more detailed definition of “CCC-mitigable events”, please refer to “Cryptocurrency: Care, Custody, & Control”, New York University Stern School of Business, May 2020, co-authored by Au, Hoffman, Mattingly, McAteer, Takahashi.
3 Poisson-distributed random variates q are generated with the inverse transform method, where p is a uniform random draw on (0, 1):

q_{Poisson} = F^{-1}_{Poisson}(p)
4 Similarly to footnote 3, lognormally distributed random variates are generated with the inverse transform method:
q_{Lognormal} = F^{-1}_{Lognormal}(p)
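
As a small illustration of footnotes 3 and 4, the sketch below generates Poisson and lognormal variates by passing uniform draws through scipy's percent-point function (the inverse CDF); the parameter values are again illustrative placeholders.

# Inverse-transform sampling: map uniform draws p in (0, 1) through the inverse CDF (ppf).
import numpy as np
from scipy.stats import poisson, lognorm

rng = np.random.default_rng(seed=0)
p = rng.uniform(size=5)                                  # uniform draws on (0, 1)

lam, mu, sigma = 10.0, 14.0, 2.5                         # illustrative parameters

q_poisson   = poisson.ppf(p, lam)                        # F^-1_Poisson(p)
q_lognormal = lognorm.ppf(p, s=sigma, scale=np.exp(mu))  # F^-1_Lognormal(p)
print(q_poisson)
print(q_lognormal)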

Model Implementation

In [1]:
# Import the necessary libraries
from scipy import stats
from scipy.stats import lognorm, norm, poisson
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from math import exp
In [2]:
# Read data on operational loss events into DataFrame
excel_data_df = pd.read_excel('LDA DATA.xlsx', sheet_name='Frequency')

# View sheet data
excel_data_df
Out[2]:
YEAR HACKS NEGLIGENCE CCC-MITIGABLE
0 2010 NaN 1.0 1
1 2011 6.0 2.0 8
2 2012 15.0 NaN 15
3 2013 13.0 4.0 17
4 2014 12.0 1.0 13
5 2015 7.0 NaN 7
6 2016 4.0 1.0 5
7 2017 14.0 1.0 15
8 2018 22.0 2.0 24
9 2019 18.0 NaN 18
10 2020 1.0 NaN 5
In [3]:
freq = excel_data_df.groupby("YEAR")['CCC-MITIGABLE'].sum()
print(freq)
YEAR
2010     1
2011     8
2012    15
2013    17
2014    13
2015     7
2016     5
2017    15
2018    24
2019    18
2020     5
Name: CCC-MITIGABLE, dtype: int64
In [4]:
# Estimate lambda 
lamb = np.sum(freq.values) / ((excel_data_df.YEAR[excel_data_df.shape[0]-1] - excel_data_df.YEAR[0])+1)
print(lamb.round(3))
11.636
In [5]:
# Draw random variables from a Poisson distribution specified with lambda = 11.636
p_rand_v = poisson.rvs(lamb, size=10000)
 
# Plot the probability density function
hist = plt.hist(p_rand_v, bins=10000) # Collect Frequency Distribution
plt.close("all")
y = hist[0]/np.sum(hist[0]) # Ensure Frequency Represented as Probability
x = hist[1]

plt.figure(figsize=(10, 6))
plt.bar(x[:-1], y, width=0.7, align='center', color="#2c97f1")

plt.xlabel("Number of Loss Events", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.title("Loss Event Probability Density Function", fontsize=18)
Out[5]:
Text(0.5, 1.0, 'Loss Event Probability Density Function')
In [6]:
# Sort the randomly distributed data:
data = p_rand_v 
data_sorted = np.sort(data)

# calculate the proportional values of samples
p = 1. * np.arange(len(data)) / (len(data) - 1)

# Plot the cumulative density function
plt.figure(figsize=(10, 6))
plt.bar(data_sorted, p)
plt.xlim(0, 26)
plt.ylabel("Number of Loss Events", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.title("Loss Event Cumulative Density Function", fontsize=18)
Out[6]:
Text(0.5, 1.0, 'Loss Event Cumulative Density Function')
In [7]:
# Read data on severity of operational loss events into DataFrame
xl_data_df_sev = pd.read_excel('LDA DATA.xlsx', sheet_name='Severity')

# View the first rows of the sheet data
xl_data_df_sev.head()
Out[7]:
Event Entity Type Date Year $ Losses Type of Loss Type
0 Stone Man Loss Individual(s) 2010-08-09 2010 544 8,999 BTC Negligence
1 Stefan Thomas Loss Individual(s) 2011-06-01 2011 124793 7000 BTC Negligence
2 Allinvain Individual(s) 2011-06-13 2011 445688 25,000 BTC Hack
3 Mt. Gox - June 2011 Exchange 2011-06-19 2011 47123 2,643 BTC Hack
4 MyBitCoin - June 2011 Exchange 2011-06-20 2011 71656 4,019 BTC Hack
In [8]:
# View empirical annual loss data
sev = pd.DataFrame(xl_data_df_sev.groupby("Year")["$ Losses"].sum())
print(sev)
        $ Losses
Year            
2010         544
2011     3001516
2012     4395546
2013    14194854
2014   482342412
2015    39594000
2016   190907000
2017   466765000
2018  2895570000
2019  3223419000
2020       73000
In [9]:
# Ensure the loss data is numeric
xl_data_df_sev["$ Losses"] = pd.to_numeric(xl_data_df_sev["$ Losses"])
In [10]:
# Calculate natural log of losses
xl_data_df_sev["Log Losses"] = np.log(xl_data_df_sev["$ Losses"])
In [11]:
# Calculate first and second moments of log loss data
logmean= xl_data_df_sev["Log Losses"].mean()
logstd= xl_data_df_sev["Log Losses"].std()
logmean, logstd
Out[11]:
(14.383098097090633, 2.839148551041988)
In [12]:
# Read input "x" values for visualization of pdf and cdf of loss severity
dist_data_df = pd.read_excel('LDA DATA.xlsx', sheet_name='Distribution')
x = dist_data_df['x loss']
In [13]:
# Check user defined function (scale = exp(log-mean), sigma = log-standard deviation)...
def lognorm_cdf(x, scale, sigma):
    shape = sigma
    loc   = 0
    return stats.lognorm.cdf(x, shape, loc, scale)
In [14]:
lognorm_cdf(x, exp(logmean), logstd)
Out[14]:
array([0.00423231, 0.01043804, 0.01941052, 0.03423206, 0.06691017,
       0.10472734, 0.15602563, 0.24566565, 0.32850184, 0.42077365,
       0.51763723, 0.6431751 , 0.7294326 , 0.80379001, 0.86419796,
       0.89291645, 0.91044739, 0.9225029 , 0.93140372, 0.9382956 ,
       0.94381773, 0.94835826, 0.95216813, 0.95541756, 0.95822652,
       0.96068228, 0.96284997, 0.96477929, 0.96650889, 0.96806932,
       0.96948504, 0.97077596, 0.9719584 , 0.97304589, 0.97404978,
       0.97497961, 0.97584354, 0.97664851, 0.97740052])
In [15]:
# Against stats function
cdf = lognorm.cdf(x, s=logstd, loc=0, scale=exp(logmean))
cdf
Out[15]:
array([0.00423231, 0.01043804, 0.01941052, 0.03423206, 0.06691017,
       0.10472734, 0.15602563, 0.24566565, 0.32850184, 0.42077365,
       0.51763723, 0.6431751 , 0.7294326 , 0.80379001, 0.86419796,
       0.89291645, 0.91044739, 0.9225029 , 0.93140372, 0.9382956 ,
       0.94381773, 0.94835826, 0.95216813, 0.95541756, 0.95822652,
       0.96068228, 0.96284997, 0.96477929, 0.96650889, 0.96806932,
       0.96948504, 0.97077596, 0.9719584 , 0.97304589, 0.97404978,
       0.97497961, 0.97584354, 0.97664851, 0.97740052])
In [16]:
# Check user defined function (scale = exp(log-mean), sigma = log-standard deviation)...
def lognorm_pdf(x, scale, sigma):
    shape = sigma
    loc   = 0
    return stats.lognorm.pdf(x, shape, loc, scale)
In [17]:
lognorm_pdf(x, exp(logmean), logstd)
Out[17]:
array([4.38896543e-06, 3.89801796e-06, 3.32521266e-06, 2.67244864e-06,
       1.82691403e-06, 1.27850799e-06, 8.42952565e-07, 4.43546288e-07,
       2.54644342e-07, 1.37734732e-07, 7.01887095e-08, 2.62730914e-08,
       1.16581693e-08, 4.87375728e-09, 1.91960463e-09, 1.08269374e-09,
       7.12318190e-10, 5.11168182e-10, 3.87994651e-10, 3.06331313e-10,
       2.49029408e-10, 2.07071277e-10, 1.75310165e-10, 1.50618048e-10,
       1.30996655e-10, 1.15116596e-10, 1.02063179e-10, 9.11889030e-11,
       8.20241221e-11, 7.42209392e-11, 6.75168541e-11, 6.17105556e-11,
       5.66454115e-11, 5.21979667e-11, 4.82697763e-11, 4.47815070e-11,
       4.16686116e-11, 3.88781108e-11, 3.63661695e-11])
In [18]:
# Against stats function
pdf = lognorm.pdf(x, s=logstd, loc=0, scale=exp(logmean))
pdf
Out[18]:
array([4.38896543e-06, 3.89801796e-06, 3.32521266e-06, 2.67244864e-06,
       1.82691403e-06, 1.27850799e-06, 8.42952565e-07, 4.43546288e-07,
       2.54644342e-07, 1.37734732e-07, 7.01887095e-08, 2.62730914e-08,
       1.16581693e-08, 4.87375728e-09, 1.91960463e-09, 1.08269374e-09,
       7.12318190e-10, 5.11168182e-10, 3.87994651e-10, 3.06331313e-10,
       2.49029408e-10, 2.07071277e-10, 1.75310165e-10, 1.50618048e-10,
       1.30996655e-10, 1.15116596e-10, 1.02063179e-10, 9.11889030e-11,
       8.20241221e-11, 7.42209392e-11, 6.75168541e-11, 6.17105556e-11,
       5.66454115e-11, 5.21979667e-11, 4.82697763e-11, 4.47815070e-11,
       4.16686116e-11, 3.88781108e-11, 3.63661695e-11])
In [19]:
# Visualize cdf
plt.figure(figsize=(10, 6))
plt.xlabel("Loss Magnitude ($)", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.title("Cumulative Density Function - Loss Severity", fontsize=18)

cdf = lognorm.cdf(x, s=logstd, loc=0, scale=exp(logmean))

plt.plot(x, cdf, 'r')
Out[19]:
[<matplotlib.lines.Line2D at 0x257603a2dd8>]
In [20]:
# Draw random variates from the Poisson distribution parametrized by lambda
r = np.random.poisson(lamb, size=1000000)
# Collect in dataframe
lda = pd.DataFrame(r, columns=["Simulated Annual Loss Events"])
lda
Out[20]:
Simulated Annual Loss Events
0 13
1 20
2 11
3 12
4 17
... ...
999995 17
999996 12
999997 10
999998 21
999999 11

1000000 rows × 1 columns

In [21]:
# Simulate one year's aggregate loss: draw x severities from the fitted lognormal and sum them
def lda_severity(x):
    y = lognorm.rvs(s=logstd, loc=0, scale=exp(logmean), size=x)
    return sum(y)
In [22]:
# Apply Function to simulate annual losses.
lda['Simulated Annual Loss'] = lda['Simulated Annual Loss Events'].apply(lda_severity)
lda['Simulated Annual Loss'] = lda['Simulated Annual Loss']/1000000  # Convert losses to $ millions
lda['Simulated Annual Loss'] = lda['Simulated Annual Loss'].round(decimals=1)
lda
Out[22]:
Simulated Annual Loss Events Simulated Annual Loss
0 13 120.7
1 20 577.1
2 11 379.9
3 12 270.9
4 17 229.8
... ... ...
999995 17 450.8
999996 12 163.3
999997 10 298.3
999998 21 1002.7
999999 11 1646.7

1000000 rows × 2 columns

In [23]:
arr_ann_loss_sev = lda['Simulated Annual Loss'].to_numpy()
arr_ann_loss_sev
Out[23]:
array([ 120.7,  577.1,  379.9, ...,  298.3, 1002.7, 1646.7])
In [24]:
# Compute simulated OpVaR at the desired quantiles

OpVaR_90 = np.percentile(arr_ann_loss_sev, q=90).round(decimals=3)
OpVaR_95 = np.percentile(arr_ann_loss_sev, q=95).round(decimals=3)
OpVaR_99 = np.percentile(arr_ann_loss_sev, q=99).round(decimals=3)
OpVaR_90 ,OpVaR_95 ,OpVaR_99 
Out[24]:
(1827.3, 3488.205, 13610.306)

Model Results

In [25]:
# Output results in tabular format
from tabulate import tabulate
ltable = [['90%', OpVaR_90],['95%', OpVaR_95]]
header = ['\033[1mConfidence Level\033[0m', '\033[1mAnnual OpVaR ($Millions)\033[0m']
print(tabulate(ltable,headers=header))
Confidence Level      Annual OpVaR ($Millions)
------------------  --------------------------
90%                                     1827.3
95%                                     3488.2
In [27]:
# Visualize Loss Distribution
df_loss_dist = lda[lda['Simulated Annual Loss'] < 4000]
df_loss_dist

Loss_Dist_df = pd.DataFrame(df_loss_dist['Simulated Annual Loss'])
Loss_Dist_df.hist(figsize = (12, 6), bins = 10000, alpha = 1, grid = True)
plt.xlabel('Simulated Annual Losses')
plt.ylabel('Frequency')
plt.title('Loss Distribution with OpVaR @ 95% Confidence Level', fontsize=20)
plt.axvline(x = OpVaR_95, color = 'r', linestyle = '--', label = "95% OpVaR: {0:.2f}".format(OpVaR_95))
plt.xlim(0, 4000)

plt.legend(framealpha=1, frameon=True);
plt.show()
print("     At a 95% confidence level, losses will not exceed ${0:,.1f} million over a 1-year period.".format(OpVaR_95))
     At a 95% confidence level, losses will not exceed $3,488.2 million over a 1-year period.