## Tuesday, February 23, 2016

### Book Review: Rock Breaks Scissors by Poundstone

http://www.amazon.com/Rock-Breaks-Scissors-Outguessing-Outwitting/dp/1491531185

This is the third book I have read recently that is about forecasting (see here and here for the previous two).

Poundstone did a pretty good job overall. The first chapter of the book is a general discussion on the human psychology of perceiving randomness. The following one half gives specific examples of how to outperform in various more light-hearted and trivial prediction exercises (e.g. multiple choice tests, rock-paper-scissors, passwords etc.), and the remaining one half does the same thing but with more serious topics (e.g. sports betting, web big data, stock market etc.).

Once again I'll try to distill the essence of the book here:
1. The human mind is handicapped in creating and detecting randomness
2. We can take advantage of the above in many situations for a statistically significant gain
3. In situations where randomness rules, the real pros do not care so much about views; they care more about odds

## Sunday, February 7, 2016

### Book Review: Superforecasting by Tetlock and Gardner

The theme of this book is very similar to Nate Silver's The Signal and the Noise. The first few chapters are worth reading, while as it drags on the final one-third of the book is less appealing, as it indulges in corny talks of team works etc.. The new ideas I got from this book, in addition to those already listed in the review above, include:
1. The use of Brier score for rating forecast, because 'good' forecast should be in probabilistic terms
2. As a corollary, one has to make enough forecasts in order to have a statistically significant score (=/= good score)
3. Update predictions as frequently as appropriate
4. There is a vicious cycle going on in the 'Mainstream' of our society: people are not aware of the importance/option to think probabilistic -> they crave for binary, determined answers -> the media are aware of this demand -> the media cultivate the guru/pundit culture -> the people get more addicted to this way of thinking
All in all, it is not as readable a book as Silver's, but I would still recommend it to anyone who is serious about improving and understanding the art of forecasting.

## Thursday, March 19, 2015

### Book Review: The Signal and the Noise by Nate Silver

An anecdotal account without much technical materials, nonetheless an enjoyable read. Through the case studies (weather forecast, earthquake prediction, economic forecast, sports betting, poker, etc.) the readers are gradually introduced to what quantitative prediction is/should be.

If I am to distill three most important points made in the book, they would be
1. Think probability; or even better, think CONDITIONAL probability, aka Bayesian.
2. Giving your prediction a context, or "having a story," would result in forecasts that are much better than blindly dive into datamining.
3. The evaluation of the performance of predictions should not be based (solely) on whether they got it right. Maintaining best practices is more important.
There may not be much originality in the materials, but the writing is decent and there are valuable lessons to take home. Recommended especially for people who work with quantitative data manipulation.

## Tuesday, July 1, 2014

They introduce the $-Game, and explain why it (instead of the Minority Game) is a better model for the financial market. Yeung et al, 2008, "Models of financial markets with extensive participation incentives" They introduce the Wealth Game, and define the temperature of the grand canonical ensemble. More interestingly, they connect this theoretical framework to the real market by calibrating the model to market data, thus identifying where we are on the "phase diagram." De Martino et al, "From perceived risk to collective behavior in minority games" ## Monday, June 30, 2014 ### Volatility Smile and Volatility Models: A Short Note Local Volatility (LV) Model 1. LV preserves market completeness (is it a good/must-have thing?) 2. LV gives forward smiles that are too flat compared to market 3. In fact, LV doesn't really qualify as a model: it fits implied vol perfectly but trivially, with no capability/intention to consider the underlying dynamics 4. LV also produces poor smile dynamics prediction More General Notes 1. The most sought-after quality for a model should be accurate hedging, not mathematical tractability etc. 2. In fact, there is no good reason to limit ourselves to vanillas only when calibrating a vol model; calibrating only to vanillas has been common practice, which is unfortunate 3. The information carried by vanillas is the unconditional (risk-neutral) probabilities that can be recovered by taking second derivatives w.r.t. strikes (i.e. the stock price is here at t=0, what is the probability that it will end up there at t=T?) 4. Conditional probabilities, on the other hand, is not contained in vanillas (i.e. conditional on the stock price being here at t=s 5. Another way of saying this is that vanillas are not sensitive to unconditional distributions 6. Therefore, one way to proceed is to include some exotics in the calibration: forward starting options are sensitive to conditional probabilities; while American digitals (or barriers) are sensitive to jumps 7. As a rule of thumb, jumps are "responsible" for short-dated smile, and stochastic volatility is "responsible" for longer-dated smile Reference: Ayache 2004, Can anyone solve the smile problem? Ayache 2005, Can anyone solve the smile problem? A Post-Scriptum Also, Rebonato's Volatility and Correlation: The Perfect Hedger and the Fox is relevant, and it's a great read, but that would be another episode. ## Saturday, March 1, 2014 ### How do you model the dynamics of asset prices? A self-reflection (Epilogue of 3) As the title suggests, this was supposed to be a three part series. But I feel like I have left out so much that I wanted to discuss, so although the first 3 parts have already provided an overview on what I would call the "classical" quant modeling approach (loosely speaking, one that focuses on describing the asset price time series with a random process), I would take this opportunity to talk about what lies beyond such approach. We are interested particularly in capturing the tails of the distribution. What about putting the tail on the center stage? That is the spirit of extreme value theory. It has had wide application in natural disaster forecast. The basic idea is to posit a probability distribution that specifically fits the tail of the distribution. Contrast this with, say, the stochastic volatility approach in which we tried to fatten the tail, while still keeping our focus on the 'body' of the return distribution. Now we are directly and un-apologetically modeling the tail itself. The advantage is obvious: we have more freedom in fitting the tails closely by dedicating the calibration to it. The short coming is more subtle. In order to perform extreme value analysis, we have to specify the threshold at where the tail starts, so that we can attach a distribution to it. If you define daily return of -20% as the threshold, then you can use some distribution to catch the sub-sample of daily return less than -20%. But how do you choose the threshold? Choose it too deep into the body and you miss the point of using extreme value models. After all, we are trying to focus on the tail by not worrying about modeling the boring part of the distribution. On the other hand, choose it too far off into the tail and you suffer by not having enough historical data points for an accurate calibration. The other issue is that extreme value analysis would not solve the "you don't know what you don't know" problem. You pick a threshold, you do the fitting, and chances are sooner or later a much worse draw down would take you by surprise, proving to you that the tail is even fatter that you have assumed. Now you might say, "Hey, isn't that a necessary evil of any perceivable model in finance? That one has to make assumptions and wait to be surprised?" And I tend to agree to the sad fact. Perhaps except for if we follow a completely different paradigm. Recently some researchers having been looking at crash forecasting using a wide range of unconventional tools. They range from fractal analysis that studies the (change in) scaling of time series; agent-based models that take a bottom-up approach trying to produce asset price dynamics by considering (grossly simplified) behaviors of traders; and critical point/phase change models that are inspired by Ising model in solid state physics. The common theme is the acknowledgement of market non-linearity and the lack of a single, forever stable market state. Unfortunately there is no straight forward way to connect these models to option prices, but with the advancements of numerical methods we may some day see wider applications of them. ## Friday, February 28, 2014 ### Looping the Strings Problem This one is a classic that I believe is found in one of the popular interview books. You start with N pieces of strings that are mixed into a single pile. You start randomly choosing two free ends at a time and joining them together. After N rounds, you will be done as there will be no more free ends left. If the number of closed loops is k, what is F = E[k]? If you go down the combinatorial path, you will soon see how hopeless it is to try figuring out the many different intermediate states. So perhaps combinatorial is not the right tool here. In fact, this problem can be readily solved with a closed-form solution using induction and recurrence. Before arriving at that, let's get some sense and inspiration by stating with a small N. N = 1 is trivial and we see that k = 1 surely, so F = 1. Consider N = 2. During the first round of joining, when one free end is picked, it can be joined to any of the remaining 3 free ends. In one out of these three cases, we end up with k = 2 (namely, if the free end is joined to another end of itself); while in the remaining two cases, we end up with k = 1 (namely, if the free end is joined to any end of the other string). So$ F_2 = \frac 1 3 (1+F_1) + \frac 2 3 (F_1)=\frac 4 3 $. More generally, it's easy to show that$ F_N = \frac 1 {2N-1} (1+F_{N-1}) + \frac {2N-2} {2N-1} (F_{N-1})$This recurring relation can be rearranged to give the difference equation$ F_N = F_{N-1} + \frac 1 {2N-1} $. With the initial condition of$ F_1 = 1$, we can alternatively express it as a finite sum. Note how slow it grows with N (~ log(N)). ## Thursday, January 30, 2014 ### How do you model the dynamics of asset prices? A self-reflection (3 of 3) Last time we added jumps to the recipe to enrich the original random walk dynamics. Jump diffusion model is able to fit to market implied vol much better than the diffusion only counterpart. Before moving on, just two reminders: 1. The numerical pricing procedure becomes more involved, especially for grid/finite differencing methods (because we end up with an integro-differential equation, as opposed to a PDE) 2. Although we usually speak of "the" jump diffusion model, we are free in specifying the probability distribution of the jump amplitude How about good old Gaussian? So is jump diffusion the ultimate holy grail? And is pure diffusion model done with? For the first question, we will see later in the series that there are many other approaches that supersede jump diffusion, but for now let's take a step back and address the second question. It turns out that the use of mixtures is able to tackle at least some of the shortcomings of the original random walk model. A mixture, not surprisingly, is formed by combining more than one 'sub-distributions.' (usually, but not necessarily, of the same kind) Here we focus on Gaussian mixture. There are a number of possibilities regarding how the sub-distributions are combined, the use of Markov chain being a prominent one. Essentially, every time we need a new random variable to be generated, we first toss a coin to determine which sub-distribution out of the mixture to draw from. In the case of a mixture of two Gaussians, if one sub-distribution has a larger variance than the other, then the mixture would have fatter tail, although the individual sub-distributions do not. Candidate: Mixture • Markovian • Most likely not tractable • Excess kurtosis • Market incomplete The use of mixture models in option pricing is rare, possibly because not everyone likes the idea of the hidden Markov chain that comes with the same package. Justifying and interpreting it is easier in the context of econometric modeling that deals with longer time frames, in which case the state changes can be perceived as a change in regimes. For something like intra-day option pricing, the model may require that we draw from alternate sub-distributions every 5 minutes or so, and it's not trivial to make sense of what that means. (other than the fact that it fits the data well) Another French Mathematician If producing a fatter tail is all we care about, then what about substituting Gaussian process with something with excess kurtosis? The resulting model would likely not have analytic solution, but we can fall back on numerical methods as our last resort. For example, ask the computer to generate 10000 random numbers from a Student's t distribution and use those in a Monte Carlo simulation. The idea sounds neat, but one has to be cautious about it. We may not be aware of it, but when we were modeling asset prices with Gaussian or Poisson processes, both of the processes are closed under linear transformation. In other words, aggregating multiple Gaussian (or Poisson) random variables would result in yet another Gaussian (or Poisson) variable. That means when we use Gaussian process to model asset prices, even though it may not be a very faithful description of reality, at least we are being consistent and we know what we are getting ourselves into: namely, if the one-day distribution is an N(0.01, 0.05), then the one-month distribution would be an N(0.25,0.25), etc. Other distributions in general do not have this property. If in our hypothetical Monte Carlo simulation we draw from a t distribution with a degree of freedom of 10 for daily return, how would the monthly return be distributed? How about quarterly? These problems can be avoided by going for non-Gaussian processes while staying within the territory of Levy processes, which is a broad family of processes that possess the nice property of being closed under linear transformation. (obviously both Gaussian and Poisson processes are members of this family) ## Tuesday, January 28, 2014 ### Other Interview Questions Q: It's 12:00 now. What time would it be when the minute hand crosses the hour hand again? A: Between 1200 and 2400, this will happen for 11 times. So divide 12 hr by 11 and add to 1200, i.e. 13:05:27 Q: A cell can do 4 things: 1) Dies; 2) Nothing; 3) Duplicate; or 4) Triplicate. What is P(colony dies out) if we start with one single cell? A: Solve the polynomial$ p = \frac {1} {4} + \frac {1}{4} p + \frac {1}{4} p^2 + \frac {1}{4} p^3$Q: A 6x6 matrix has entries of either +1 or -1. How many unique combinations are there such that all column products and row products are +1? A: Consider the top left 5x5 sub-matrix. It can take any permutation (as long as each row/column contains no more than 3 same number) because we can rely on the rest of the 6x6 matrix (an "L" shape) to "fix" it. So the answer is$ 2^{25} $Q: There is an integer X. The product of the digits is 96. What is the largest/smallest possible value of X? A: smallest = 268, largest = 322222 Q: What is the sum of all digits of all the numbers between 1 and 1000000? A: Convince yourself that all digits appear for an equal number of times. The answer is 27000001 Q: A bus makes a stop, and 3/4 of all passengers get off, while 7 get on. This repeats for two more times. What is the smallest number of initial passengers? A: 52 Q: Paint the outside of a cube, then dice it into 27 equal smaller cubes. Pick one at random and roll it. What is P(no paint visible after the roll)? A:$ 6/27 \times 1/6 + 1/27 $Q: Two equally talented teams are playing a series of games in which the first team to win 4 games wins the series. What is P(7th game has to be played)? A:$ \frac {6!}{3!3!} \frac {1}{2^6} = \frac {5}{16}$Q: I roll a die, spin a roulette and draw a card from a deck of 52. What is the probability that all three give the same number? A:$ 5 \times \frac 1 6 \times \frac 1 {13} \times 1 {38} $Q: What is the last digit of 3 to the power 33? A: Some experimentation would indicate that$3^0=1$,$3^1=3$,$3^2=9$,$3^3=27$,$3^4=81$, ..., hence$3^33  mod  10 = 3$Q: N couples shake hand with others. It is known that 1) They don't shake hand with their own partners; and 2) The first (2N - 1) people surveyed all have different number of handshakes. What is the number of handshake the remaining person made? A: N - 1 (Can draw graphs to deduce; any analytic solution?) Q: What is P(3 random points on a circle are within an 180 degree arc)? A: There are two approaches: 1) Area on an abstract x-y plane, x and y are distances from the first point pinned down (call it z, which is arbitrary); or 2)$ N \times \frac {1}{2^{N-1}} $, with$ N=3$. Both give the answer of 3/4. Q: How do you simulate a 6-sided die using a fair coin? What's the expected number of toss to produce the number? A: Use a binary scheme. With 3 tosses, there are 8 possible outcomes, which is more than enough to simulate 6 sides. The trick is to know which 2 to exclude. {HHH,TTT} would be a bad choice because you always have to wait until the 3rd toss to know if you failed. So we should choose something like {HHH,HHT} or {THH,THT}. The expected number of toss is given by solving$E[N]=\frac 6 8 \times 3 + \frac 2 8 (2+E[N])$Q: Place 3 points randomly on the perimeter of a square. What is P(they don't form a triangle) and P(all lie on different edges)? A:$P(they_don't_form_a_triangle) = 4/{4^3} = 1/16$;$P(all_lie_on_different_edges) =\frac {4 \times _3P_3}{4^3} = 3/8\$

Q: 10 light bulbs are lined up in a row, and it is known that no two of the adjacent light bulbs can be both on. What is the possible number of arrangements?
A: Fibonacci sequence, 144

Q: You have a large number of red and blue balls (equally many) in a large container. What is P(odd number of red) if you draw a) 3 balls from the container? b) 10 balls from the container?
A: 0.5 regardless of how many you draw. For example if 3 are drawn, the binomial distribution is 1(odd) 3(even) 3(odd) 1(even); if 4 are drawn, the binomial distribution is 1(even) 4(odd) 6(even) 4(odd) 1(even).

Q: Transport as many apples as possible to a destination 1000 km away starting with 3000 apples. For each km of transportation, you lose 1 apple.
A:  The best tactic is to always start with full load so as to maximize number of "apple km" traveled per apple lost. So the 1st stop is optimally at 3000 - 3X = 2000 <=> X = 333; the 2nd stop is optimally at 2000 - 2X = 1000 <=> X = 500. After 833 km, we are left with 1000 apples. Hence we will have 833 apples at the destination.

Q: You are playing against two players A and B individually in a tournament, with A being the better player. If you have to win at least 2 consecutive games out of 3 in order to be the champion, would you prefer the order ABA or BAB?
A: Out of the 8 outcomes {LLL,LLW,LWL,LWW,WLL,WLW,WWL,WWW}, only {WWW,WWL,LWW} are desirable. If q = P(winning against A) and p = P(winning against B) so that p > q,
P(champion with the order ABA) = pqp + pq(1-p) + (1-p)qp = 2pq - ppq
P(champion with the order BAB) = qpq + qp(1-q) + (1-q)pq = 2pq - pqq
So  P(champion with the order BAB) > P(champion with the order ABA)

## Friday, January 17, 2014

### Short Notes on HMM (Hidden Markov Model)

1. Regarding calibration
Numerical optimization is computatinoal intensive and prone to local max, but it is able to estimate any general model specification. On the other hand, EM algorithms such as Baum-Welch are more reliable in seeking global max, but they are only capable of backing out the parameters of models with distributions that are fully described by moments (Gaussian, Poisson, ...)
2. Problem with HMM, and extensions
As Soltan discusses in his paper, ordinary HMM is by definition prone to instability due to the exponentiated matrix structure. The instability dominates in the long run, meaning that state occupancy duration estimation of ordinary HMM is usually unreliable. There are two ways to go: 1) HSMM (a.k.a. explicit duration HMM, inhomogeneous HMM, etc.), or 2) Treat duration as an extra degree of state space (see Soltan).
References: