Thursday, January 30, 2014

How do you model the dynamics of asset prices? A self-reflection (3 of 3)

Last time we added jumps to the recipe to enrich the original random walk dynamics. Jump diffusion model is able to fit to market implied vol much better than the diffusion only counterpart. Before moving on, just two reminders:
  1. The numerical pricing procedure becomes more involved, especially for grid/finite differencing methods (because we end up with an integro-differential equation, as opposed to a PDE)
  2. Although we usually speak of "the" jump diffusion model, we are free in specifying the probability distribution of the jump amplitude
How about good old Gaussian?
So is jump diffusion the ultimate holy grail? And is pure diffusion model done with? For the first question, we will see later in the series that there are many other approaches that supersede jump diffusion, but for now let's take a step back and address the second question.

It turns out that the use of mixtures is able to tackle at least some of the shortcomings of the original random walk model. A mixture, not surprisingly, is formed by combining more than one 'sub-distributions.' (usually, but not necessarily, of the same kind) Here we focus on Gaussian mixture. There are a number of possibilities regarding how the sub-distributions are combined, the use of Markov chain being a prominent one. Essentially, every time we need a new random variable to be generated, we first toss a coin to determine which sub-distribution out of the mixture to draw from. In the case of a mixture of two Gaussians, if one sub-distribution has a larger variance than the other, then the mixture would have fatter tail, although the individual sub-distributions do not.

       Candidate: Mixture
  • Markovian
  • Most likely not tractable
  • Excess kurtosis
  • Market incomplete
The use of mixture models in option pricing is rare, possibly because not everyone likes the idea of the hidden Markov chain that comes with the same package. Justifying and interpreting it is easier in the context of econometric modeling that deals with longer time frames, in which case the state changes can be perceived as a change in regimes. For something like intra-day option pricing, the model may require that we draw from alternate sub-distributions every 5 minutes or so, and it's not trivial to make sense of what that means. (other than the fact that it fits the data well)

Another French Mathematician
If producing a fatter tail is all we care about, then what about substituting Gaussian process with something with excess kurtosis? The resulting model would likely not have analytic solution, but we can fall back on numerical methods as our last resort. For example, ask the computer to generate 10000 random numbers from a Student's t distribution and use those in a Monte Carlo simulation.

The idea sounds neat, but one has to be cautious about it. We may not be aware of it, but when we were modeling asset prices with Gaussian or Poisson processes, both of the processes are closed under linear transformation. In other words, aggregating multiple Gaussian (or Poisson) random variables would result in yet another Gaussian (or Poisson) variable. That means when we use Gaussian process to model asset prices, even though it may not be a very faithful description of reality, at least we are being consistent and we know what we are getting ourselves into: namely, if the one-day distribution is an N(0.01, 0.05), then the one-month distribution would be an N(0.25,0.25), etc. Other distributions in general do not have this property. If in our hypothetical Monte Carlo simulation we draw from a t distribution with a degree of freedom of 10 for daily return, how would the monthly return be distributed? How about quarterly?

These problems can be avoided by going for non-Gaussian processes while staying within the territory of Levy processes, which is a broad family of processes that possess the nice property of being closed under linear transformation. (obviously both Gaussian and Poisson processes are members of this family)

Tuesday, January 28, 2014

Other Interview Questions

Q: It's 12:00 now. What time would it be when the minute hand crosses the hour hand again?
A: Between 1200 and 2400, this will happen for 11 times. So divide 12 hr by 11 and add to 1200, i.e. 13:05:27

Q: A cell can do 4 things: 1) Dies; 2) Nothing; 3) Duplicate; or 4) Triplicate. What is P(colony dies out) if we start with one single cell?
A: Solve the polynomial
$ p = \frac {1} {4} + \frac {1}{4} p + \frac {1}{4} p^2 + \frac {1}{4} p^3$

Q: A 6x6 matrix has entries of either +1 or -1. How many unique combinations are there such that all column products and row products are +1?
A: Consider the top left 5x5 sub-matrix. It can take any permutation (as long as each row/column contains no more than 3 same number) because we can rely on the rest of the 6x6 matrix (an "L" shape) to "fix" it. So the answer is $ 2^{25} $

Q: There is an integer X. The product of the digits is 96. What is the largest/smallest possible value of X?
A: smallest = 268, largest = 322222

Q: What is the sum of all digits of all the numbers between 1 and 1000000?
A: Convince yourself that all digits appear for an equal number of times. The answer is 27000001

Q: A bus makes a stop, and 3/4 of all passengers get off, while 7 get on. This repeats for two more times. What is the smallest number of initial passengers?
A: 52

Q: Paint the outside of a cube, then dice it into 27 equal smaller cubes. Pick one at random and roll it. What is P(no paint visible after the roll)?
A: $ 6/27 \times 1/6 + 1/27 $

Q: Two equally talented teams are playing a series of games in which the first team to win 4 games wins the series. What is P(7th game has to be played)?
A: $ \frac {6!}{3!3!} \frac {1}{2^6} = \frac {5}{16}$

Q: I roll a die, spin a roulette and draw a card from a deck of 52. What is the probability that all three give the same number?
A: $ 5 \times \frac 1 6 \times \frac 1 {13} \times 1 {38} $

Q: What is the last digit of 3 to the power 33?
A: Some experimentation would indicate that $3^0=1$,  $3^1=3$, $3^2=9$,  $3^3=27$, $3^4=81$, ..., hence $3^33  mod  10 = 3$

Q: N couples shake hand with others. It is known that 1) They don't shake hand with their own partners; and 2) The first (2N - 1) people surveyed all have different number of handshakes. What is the number of handshake the remaining person made?
A: N - 1 (Can draw graphs to deduce; any analytic solution?)

Q: What is P(3 random points on a circle are within an 180 degree arc)?
A: There are two approaches: 1) Area on an abstract x-y plane, x and y are distances from the first point pinned down (call it z, which is arbitrary); or 2) $ N \times \frac {1}{2^{N-1}} $, with $ N=3$. Both give the answer of 3/4.

Q: How do you simulate a 6-sided die using a fair coin? What's the expected number of toss to produce the number?
A: Use a binary scheme. With 3 tosses, there are 8 possible outcomes, which is more than enough to simulate 6 sides. The trick is to know which 2 to exclude. {HHH,TTT} would be a bad choice because you always have to wait until the 3rd toss to know if you failed. So we should choose something like {HHH,HHT} or {THH,THT}. The expected number of toss is given by solving
$E[N]=\frac 6 8 \times 3 + \frac 2 8 (2+E[N])$

Q: Place 3 points randomly on the perimeter of a square. What is P(they don't form a triangle) and P(all lie on different edges)?
A: $P(they_don't_form_a_triangle) = 4/{4^3} = 1/16$; $P(all_lie_on_different_edges) =\frac {4 \times _3P_3}{4^3} = 3/8$

Q: 10 light bulbs are lined up in a row, and it is known that no two of the adjacent light bulbs can be both on. What is the possible number of arrangements?
A: Fibonacci sequence, 144

Q: You have a large number of red and blue balls (equally many) in a large container. What is P(odd number of red) if you draw a) 3 balls from the container? b) 10 balls from the container?
A: 0.5 regardless of how many you draw. For example if 3 are drawn, the binomial distribution is 1(odd) 3(even) 3(odd) 1(even); if 4 are drawn, the binomial distribution is 1(even) 4(odd) 6(even) 4(odd) 1(even).

Q: Transport as many apples as possible to a destination 1000 km away starting with 3000 apples. For each km of transportation, you lose 1 apple.
A:  The best tactic is to always start with full load so as to maximize number of "apple km" traveled per apple lost. So the 1st stop is optimally at 3000 - 3X = 2000 <=> X = 333; the 2nd stop is optimally at 2000 - 2X = 1000 <=> X = 500. After 833 km, we are left with 1000 apples. Hence we will have 833 apples at the destination.

Q: You are playing against two players A and B individually in a tournament, with A being the better player. If you have to win at least 2 consecutive games out of 3 in order to be the champion, would you prefer the order ABA or BAB?
A: Out of the 8 outcomes {LLL,LLW,LWL,LWW,WLL,WLW,WWL,WWW}, only {WWW,WWL,LWW} are desirable. If q = P(winning against A) and p = P(winning against B) so that p > q,
P(champion with the order ABA) = pqp + pq(1-p) + (1-p)qp = 2pq - ppq
P(champion with the order BAB) = qpq + qp(1-q) + (1-q)pq = 2pq - pqq
So  P(champion with the order BAB) > P(champion with the order ABA)


Friday, January 17, 2014

Short Notes on HMM (Hidden Markov Model)

1. Regarding calibration
Numerical optimization is computatinoal intensive and prone to local max, but it is able to estimate any general model specification. On the other hand, EM algorithms such as Baum-Welch are more reliable in seeking global max, but they are only capable of backing out the parameters of models with distributions that are fully described by moments (Gaussian, Poisson, ...)
2. Problem with HMM, and extensions
As Soltan discusses in his paper, ordinary HMM is by definition prone to instability due to the exponentiated matrix structure. The instability dominates in the long run, meaning that state occupancy duration estimation of ordinary HMM is usually unreliable. There are two ways to go: 1) HSMM (a.k.a. explicit duration HMM, inhomogeneous HMM, etc.), or 2) Treat duration as an extra degree of state space (see Soltan).
References: