Quantifying Tail Risk

Modelling Operational Value-at-Risk with the Loss Distribution Approach.

Paul McAteer, MSc, MBA

pcm353@stern.nyu.edu


OpVaR with the Loss Distribution Approach.

Model Design


The management of operational risk follows a standard process: (1) Risk Identification; (2) Risk Evaluation; (3) Risk Monitoring; (4) Risk Mitigation; (5) Risk Reporting. The quantification of risk is a fundamental element for the fulfilment of steps (2) - (5).

Based on empirical loss event data, the loss distribution is estimated by modelling the probability and magnitude of potential losses in the industry. To this end we employ the Loss Distribution Approach (LDA), a technique which convolves the loss frequency distribution with the loss severity distribution to model an objective distribution of aggregate losses attributable to the materialization of specified operational risks. LDA is a favored method among large financial institutions for determining capital adequacy for operational risk1; it is also eminently suitable for our objective of quantifying custodial risk in the cryptocurrency universe2. We describe the model construction process below in detail:

  1. Construct the empirical discrete frequency density of CCC-mitigable loss events.

  2. Construct the empirical continuous severity density of CCC-mitigable loss events.

  3. Use the empirical data to parametrize the probability density functions of the expected loss frequency and expected loss severity of CCC-mitigable events.

  4. LOSS FREQUENCY: Estimate a Poisson frequency distribution, described by the single parameter λ (the expected frequency), using the mean of the empirical loss frequency distribution. The Poisson distribution has the density function:

    h(n) = \frac{\lambda^{n} e^{-\lambda}}{n!}

  5. LOSS SEVERITY: We assume a lognormal distribution for loss severity. The log of the loss severity (denoted ln L) is normally distributed with mean μ and variance σ². The lognormal distribution has the density function:

    g(l) = \frac{1}{\sqrt{2\pi}\,\sigma l}\, \exp\!\left(-\frac{1}{2}\left(\frac{\ln(l) - \mu}{\sigma}\right)^{2}\right)

  6. LOSS DISTRIBUTION: Under the standard assumption that frequency and severity are independent, Monte Carlo simulation is used to generate the total loss distribution as the convolution of the frequency and severity distributions. The simulation algorithm is as follows (a short Python sketch appears after this list):

    1. Take a random draw from the Poisson frequency distribution3, characterised by λ

    2. Suppose that this first iteration simulates n loss events over a 1-year horizon

    3. Take n random draws from the lognormal severity distribution4. Suppose this results in simulated loss amounts (L_1, L_2, …, L_n) over the loss horizon

    4. Sum the n simulated losses to obtain a total simulated annual loss X_1 = L_1 + L_2 + ⋯ + L_n

    5. Repeat steps 1-4 100,000 times

    6. Form the histogram of X_1, …, X_100,000. This represents the simulated total annual loss distribution


  7. OpVaR: Based on the simulated total annual loss distribution, we can state with a specified confidence level that losses will not exceed a certain dollar amount.
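
The following is a minimal Python sketch of the simulation described in steps 4 through 7. The parameter values lam, mu, and sigma are illustrative placeholders rather than the fitted estimates derived in the implementation section, and the loop follows the algorithm literally rather than vectorizing it.

# Minimal sketch of the LDA Monte Carlo convolution (steps 4-7).
# lam, mu and sigma are illustrative placeholders, not fitted estimates.
import numpy as np
from scipy.stats import poisson, lognorm

rng = np.random.default_rng(seed=42)

lam   = 10.0   # expected annual number of loss events (Poisson lambda)
mu    = 14.0   # mean of the log loss severity
sigma = 2.5    # standard deviation of the log loss severity

n_sims = 100_000
annual_losses = np.empty(n_sims)

for i in range(n_sims):
    # Step 1: draw the number of loss events for this simulated year
    n = poisson.rvs(lam, random_state=rng)
    # Steps 2-3: draw n loss severities from the lognormal distribution
    severities = lognorm.rvs(s=sigma, scale=np.exp(mu), size=n, random_state=rng)
    # Step 4: sum the severities to obtain the total simulated annual loss
    annual_losses[i] = severities.sum()

# Step 7: OpVaR is read off as an upper percentile of the simulated distribution
opvar_95 = np.percentile(annual_losses, 95)
print(f"Simulated 95% OpVaR: ${opvar_95:,.0f}")

In practice the frequency draws can be generated in a single vectorized call and the severities simulated per draw, as is done in the implementation below; the literal loop is kept here only to mirror the algorithm as stated.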




1 LDA is one popular method to estimate potential losses and capital requirements in the Advanced Measurement Approach (AMA) under the Basel II regulations.
2 For a more detailed definition of “CCC-mitigable events”, please refer to “Cryptocurrency: Care, Custody, & Control”, New York University Stern School of Business, May 2020, co-authored by Au, Hoffman, Mattingly, McAteer, Takahashi.
3 Poisson-distributed random variates q are generated with the inverse transform method, where p is a uniform random draw on (0, 1):

q_{Poisson} = F^{-1}_{Poisson}(p)
4 Similarly to footnote 3, lognormally distributed random variates are generated with the inverse transform method:
q_{Lognormal} = F^{-1}_{Lognormal}(p)
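
As a small illustration of footnotes 3 and 4, the sketch below generates Poisson and lognormal variates by passing uniform draws through scipy's percent-point function (the inverse CDF); the parameter values are again illustrative placeholders.

# Inverse-transform sampling: map uniform draws p in (0, 1) through the inverse CDF (ppf).
import numpy as np
from scipy.stats import poisson, lognorm

rng = np.random.default_rng(seed=0)
p = rng.uniform(size=5)                                  # uniform draws on (0, 1)

lam, mu, sigma = 10.0, 14.0, 2.5                         # illustrative parameters

q_poisson   = poisson.ppf(p, lam)                        # F^-1_Poisson(p)
q_lognormal = lognorm.ppf(p, s=sigma, scale=np.exp(mu))  # F^-1_Lognormal(p)
print(q_poisson)
print(q_lognormal)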

Model Implementation

In [1]:
# Import the necessary libraries
from scipy import stats
from scipy.stats import lognorm, norm, poisson
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from math import exp
In [2]:
# Read data on operational loss events into DataFrame
excel_data_df = pd.read_excel('LDA DATA.xlsx', sheet_name='Frequency')

# View sheet data
excel_data_df
Out[2]:
YEAR HACKS NEGLIGENCE CCC-MITIGABLE
0 2010 NaN 1.0 1
1 2011 6.0 2.0 8
2 2012 15.0 NaN 15
3 2013 13.0 4.0 17
4 2014 12.0 1.0 13
5 2015 7.0 NaN 7
6 2016 4.0 1.0 5
7 2017 14.0 1.0 15
8 2018 22.0 2.0 24
9 2019 18.0 NaN 18
10 2020 1.0 NaN 5
In [3]:
freq = excel_data_df.groupby("YEAR")['CCC-MITIGABLE'].sum()
print(freq)
YEAR
2010     1
2011     8
2012    15
2013    17
2014    13
2015     7
2016     5
2017    15
2018    24
2019    18
2020     5
Name: CCC-MITIGABLE, dtype: int64
In [4]:
# Estimate lambda 
lamb = np.sum(freq.values) / ((excel_data_df.YEAR[excel_data_df.shape[0]-1] - excel_data_df.YEAR[0])+1)
print(lamb.round(3))
11.636
In [5]:
# Draw random variables from a Poisson distribution specified with lambda = 11.636
p_rand_v = poisson.rvs(lamb, size=10000)
 
# Plot the probability density function
hist = plt.hist(p_rand_v, bins=10000) # Collect Frequency Distribution
plt.close("all")
y = hist[0]/np.sum(hist[0]) # Ensure Frequency Represented as Probability
x = hist[1]

plt.figure(figsize=(10, 6))
plt.bar(x[:-1], y, width=0.7, align='center', color="#2c97f1")

plt.xlabel("Number of Loss Events", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.title("Loss Event Probability Density Function", fontsize=18)
Out[5]:
Text(0.5, 1.0, 'Loss Event Probability Density Function')
In [6]:
# Sort the randomly distributed data:
data = p_rand_v 
data_sorted = np.sort(data)

# calculate the proportional values of samples
p = 1. * np.arange(len(data)) / (len(data) - 1)

# Plot the cumulative density function
plt.figure(figsize=(10, 6))
plt.bar(data_sorted, p)
plt.xlim(0, 26)
plt.ylabel("Number of Loss Events", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.title("Loss Event Cumulative Density Function", fontsize=18)
Out[6]:
Text(0.5, 1.0, 'Loss Event Cumulative Density Function')
In [7]:
# Read data on severity of operational loss events into DataFrame
xl_data_df_sev = pd.read_excel('LDA DATA.xlsx', sheet_name='Severity')

# View the first rows of the sheet data
xl_data_df_sev.head()
Out[7]:
Event Entity Type Date Year $ Losses Type of Loss Type
0 Stone Man Loss Individual(s) 2010-08-09 2010 544 8,999 BTC Negligence
1 Stefan Thomas Loss Individual(s) 2011-06-01 2011 124793 7000 BTC Negligence
2 Allinvain Individual(s) 2011-06-13 2011 445688 25,000 BTC Hack
3 Mt. Gox - June 2011 Exchange 2011-06-19 2011 47123 2,643 BTC Hack
4 MyBitCoin - June 2011 Exchange 2011-06-20 2011 71656 4,019 BTC Hack
In [8]:
# View empirical annual loss data
sev = pd.DataFrame(xl_data_df_sev.groupby("Year")["$ Losses"].sum())
print(sev)
        $ Losses
Year            
2010         544
2011     3001516
2012     4395546
2013    14194854
2014   482342412
2015    39594000
2016   190907000
2017   466765000
2018  2895570000
2019  3223419000
2020       73000
In [9]:
# Ensure the loss data is numeric
xl_data_df_sev["$ Losses"] = pd.to_numeric(xl_data_df_sev["$ Losses"])
In [10]:
# Calculate natural log of losses
xl_data_df_sev["Log Losses"] = np.log(xl_data_df_sev["$ Losses"])
In [11]:
# Calculate first and second moments of log loss data
logmean= xl_data_df_sev["Log Losses"].mean()
logstd= xl_data_df_sev["Log Losses"].std()
logmean, logstd
Out[11]:
(14.383098097090633, 2.839148551041988)
In [12]:
# Read input "x" values for visualization of pdf and cdf of loss severity
dist_data_df = pd.read_excel('LDA DATA.xlsx', sheet_name='Distribution')
x = dist_data_df['x loss']
In [13]:
# Check user defined function (scale = exp(log-mean), sigma = log-standard deviation)...
def lognorm_cdf(x, scale, sigma):
    shape = sigma
    loc   = 0
    return stats.lognorm.cdf(x, shape, loc, scale)
In [14]:
lognorm_cdf(x, exp(logmean), logstd)
Out[14]:
array([0.00423231, 0.01043804, 0.01941052, 0.03423206, 0.06691017,
       0.10472734, 0.15602563, 0.24566565, 0.32850184, 0.42077365,
       0.51763723, 0.6431751 , 0.7294326 , 0.80379001, 0.86419796,
       0.89291645, 0.91044739, 0.9225029 , 0.93140372, 0.9382956 ,
       0.94381773, 0.94835826, 0.95216813, 0.95541756, 0.95822652,
       0.96068228, 0.96284997, 0.96477929, 0.96650889, 0.96806932,
       0.96948504, 0.97077596, 0.9719584 , 0.97304589, 0.97404978,
       0.97497961, 0.97584354, 0.97664851, 0.97740052])
In [15]:
# Against stats function
cdf = lognorm.cdf(x, s=logstd, loc=0, scale=exp(logmean))
cdf
Out[15]:
array([0.00423231, 0.01043804, 0.01941052, 0.03423206, 0.06691017,
       0.10472734, 0.15602563, 0.24566565, 0.32850184, 0.42077365,
       0.51763723, 0.6431751 , 0.7294326 , 0.80379001, 0.86419796,
       0.89291645, 0.91044739, 0.9225029 , 0.93140372, 0.9382956 ,
       0.94381773, 0.94835826, 0.95216813, 0.95541756, 0.95822652,
       0.96068228, 0.96284997, 0.96477929, 0.96650889, 0.96806932,
       0.96948504, 0.97077596, 0.9719584 , 0.97304589, 0.97404978,
       0.97497961, 0.97584354, 0.97664851, 0.97740052])
In [16]:
# Check user defined function (scale = exp(log-mean), sigma = log-standard deviation)...
def lognorm_pdf(x, scale, sigma):
    shape = sigma
    loc   = 0
    return stats.lognorm.pdf(x, shape, loc, scale)
In [17]:
lognorm_pdf(x, exp(logmean), logstd)
Out[17]:
array([4.38896543e-06, 3.89801796e-06, 3.32521266e-06, 2.67244864e-06,
       1.82691403e-06, 1.27850799e-06, 8.42952565e-07, 4.43546288e-07,
       2.54644342e-07, 1.37734732e-07, 7.01887095e-08, 2.62730914e-08,
       1.16581693e-08, 4.87375728e-09, 1.91960463e-09, 1.08269374e-09,
       7.12318190e-10, 5.11168182e-10, 3.87994651e-10, 3.06331313e-10,
       2.49029408e-10, 2.07071277e-10, 1.75310165e-10, 1.50618048e-10,
       1.30996655e-10, 1.15116596e-10, 1.02063179e-10, 9.11889030e-11,
       8.20241221e-11, 7.42209392e-11, 6.75168541e-11, 6.17105556e-11,
       5.66454115e-11, 5.21979667e-11, 4.82697763e-11, 4.47815070e-11,
       4.16686116e-11, 3.88781108e-11, 3.63661695e-11])
In [18]:
# Against stats function
pdf = lognorm.pdf(x, s=logstd, loc=0, scale=exp(logmean))
pdf
Out[18]:
array([4.38896543e-06, 3.89801796e-06, 3.32521266e-06, 2.67244864e-06,
       1.82691403e-06, 1.27850799e-06, 8.42952565e-07, 4.43546288e-07,
       2.54644342e-07, 1.37734732e-07, 7.01887095e-08, 2.62730914e-08,
       1.16581693e-08, 4.87375728e-09, 1.91960463e-09, 1.08269374e-09,
       7.12318190e-10, 5.11168182e-10, 3.87994651e-10, 3.06331313e-10,
       2.49029408e-10, 2.07071277e-10, 1.75310165e-10, 1.50618048e-10,
       1.30996655e-10, 1.15116596e-10, 1.02063179e-10, 9.11889030e-11,
       8.20241221e-11, 7.42209392e-11, 6.75168541e-11, 6.17105556e-11,
       5.66454115e-11, 5.21979667e-11, 4.82697763e-11, 4.47815070e-11,
       4.16686116e-11, 3.88781108e-11, 3.63661695e-11])
In [19]:
# Visualize cdf
plt.figure(figsize=(10, 6))
plt.xlabel("Loss Magnitude ($)", fontsize=12)
plt.ylabel("Probability", fontsize=12)
plt.title("Cumulative Density Function - Loss Severity", fontsize=18)

cdf = lognorm.cdf(x, s=logstd, loc=0, scale=exp(logmean))

plt.plot(x, cdf, 'r')
Out[19]:
[<matplotlib.lines.Line2D at 0x257603a2dd8>]
In [20]:
# Draw random variates from the Poisson distribution parametrized by lambda
r = np.random.poisson(lamb, size=1000000)
# Collect in dataframe
lda = pd.DataFrame(r, columns=["Simulated Annual Loss Events"])
lda
Out[20]:
Simulated Annual Loss Events
0 13
1 20
2 11
3 12
4 17
... ...
999995 17
999996 12
999997 10
999998 21
999999 11

1000000 rows × 1 columns

In [21]:
# Simulate one year's aggregate loss: draw x severities from the fitted lognormal and sum them
def lda_severity(x):
    y = lognorm.rvs(s=logstd, loc=0, scale=exp(logmean), size=x)
    return sum(y)
In [22]:
# Apply Function to simulate annual losses.
lda['Simulated Annual Loss'] = lda['Simulated Annual Loss Events'].apply(lda_severity)
lda['Simulated Annual Loss'] = lda['Simulated Annual Loss']/1000000  # Convert losses to $ millions
lda['Simulated Annual Loss'] = lda['Simulated Annual Loss'].round(decimals=1)
lda
Out[22]:
Simulated Annual Loss Events Simulated Annual Loss
0 13 120.7
1 20 577.1
2 11 379.9
3 12 270.9
4 17 229.8
... ... ...
999995 17 450.8
999996 12 163.3
999997 10 298.3
999998 21 1002.7
999999 11 1646.7

1000000 rows × 2 columns

In [23]:
arr_ann_loss_sev = lda['Simulated Annual Loss'].to_numpy()
arr_ann_loss_sev
Out[23]:
array([ 120.7,  577.1,  379.9, ...,  298.3, 1002.7, 1646.7])
In [24]:
# Compute simulated OpVaR at the desired quantiles

OpVaR_90 = np.percentile(arr_ann_loss_sev, q=90).round(decimals=3)
OpVaR_95 = np.percentile(arr_ann_loss_sev, q=95).round(decimals=3)
OpVaR_99 = np.percentile(arr_ann_loss_sev, q=99).round(decimals=3)
OpVaR_90 ,OpVaR_95 ,OpVaR_99 
Out[24]:
(1827.3, 3488.205, 13610.306)

Model Results

In [25]:
# Output results in tabular format
from tabulate import tabulate
ltable = [['90%', OpVaR_90],['95%', OpVaR_95]]
header = ['\033[1mConfidence Level\033[0m', '\033[1mAnnual OpVaR ($Millions)\033[0m']
print(tabulate(ltable,headers=header))
Confidence Level      Annual OpVaR ($Millions)
------------------  --------------------------
90%                                     1827.3
95%                                     3488.2
In [27]:
# Visualize Loss Distribution
df_loss_dist = lda[lda['Simulated Annual Loss'] < 4000]
df_loss_dist

Loss_Dist_df = pd.DataFrame(df_loss_dist['Simulated Annual Loss'])
Loss_Dist_df.hist(figsize = (12, 6), bins = 10000, alpha = 1, grid = True)
plt.xlabel('Simulated Annual Losses')
plt.ylabel('Frequency')
plt.title('Loss Distribution with OpVaR @ 95% Confidence Level', fontsize=20)
plt.axvline(x = OpVaR_95, color = 'r', linestyle = '--', label = "95% OpVaR: {0:.2f}".format(OpVaR_95))
plt.xlim(0, 4000)

plt.legend(framealpha=1, frameon=True);
plt.show()
print("     At a 95% confidence level, losses will not exceed ${0:,.1f} million over a 1-year period.".format(OpVaR_95))
     At a 95% confidence level, losses will not exceed $3,488.2 million over a 1-year period.