The multinomial probit is obtained with the same modeling that we used while presenting the random utility model. The utility of an alternative is still the sum of two components: \(U_j = V_j + \epsilon_j\).
but the joint distribution of the error terms is now a multivariate normal with mean 0 and with a matrix of covariance denoted \(\Omega\)1.
Alternative \(l\) is chosen if:
\[ \left\{ \begin{array}{rcl} U_1-U_l&=&(V_1-V_l)+(\epsilon_1-\epsilon_l)<0\\ U_2-U_l&=&(V_2-V_l)+(\epsilon_2-\epsilon_l)<0\\ & \vdots & \\ U_J-U_l&=&(V_J-V_l)+(\epsilon_J-\epsilon_l)<0\\ \end{array} \right. \]
wich implies, denoting \(V^l_j=V_j-V_l\):
\[ \left\{ \begin{array}{rclrcl} \epsilon^l_1 &=& (\epsilon_1-\epsilon_l) &<& - V^l_1\\ \epsilon^l_2 &=& (\epsilon_2-\epsilon_l) &<& - V^l_2\\ &\vdots & & \vdots & \\ \epsilon^l_J &=& (\epsilon_J-\epsilon_l) &<& - V^l_J\\ \end{array} \right. \]
The initial vector of errors \(\epsilon\) are transformed using the following transformation:
\[\epsilon^l = M^l \epsilon\]
where the transformation matrix \(M^l\) is a \((J-1) \times J\) matrix obtained by inserting in an identity matrix a \(l^{\mbox{th}}\) column of \(-1\). For example, if \(J = 4\) and \(l = 3\):
\[ M^3 = \left( \begin{array}{cccc} 1 & 0 & -1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & -1 & 1 \\ \end{array} \right) \]
The covariance matrix of the error differences is obtained using the following matrix:
\[ \mbox{V}\left(\epsilon^l\right)=\mbox{V}\left(M^l\epsilon\right) = M^l\mbox{V}\left(\epsilon\right){M^l}^{\top} = M^l\Omega{M^l}^{\top} \]
The probability of choosing \(l\) is then:
\[\begin{equation} P_l =\mbox{P}(\epsilon^l_1<-V_1^l \;\&\; \epsilon^l_2<-V_2^l \;\&\; ... \; \epsilon^l_J<-V_J^l) \end{equation}\]
with the hypothesis of distribution, this writes:
\[\begin{equation} P_l = \int_{-\infty}^{-V_1^l}\int_{-\infty}^{-V_2^l}...\int_{-\infty}^{-V_J^l}\phi(\epsilon^l) d\epsilon^l_1 d\epsilon^l_2... d^l_J \end{equation}\]
with:
\[\begin{equation} \phi\left(\epsilon^l\right)=\frac{1}{(2\pi)^{(J-1)/2}\mid\Omega^l\mid^{1/2}} e^{-\frac{1}{2}\epsilon^l{\Omega^l}^{-1}\epsilon^l} \end{equation}\]
Two problems arise with this model:
The meaning-full parameters are those of the covariance matrix of the error \(\Omega\). For example, with \(J = 3\):
\[ \Omega = \left( \begin{array}{ccc} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \\ \end{array} \right) \]
\[ \Omega^1 = M^1 \Omega {M^1}^{\top}= \left( \begin{array}{cc} \sigma_{11}+\sigma_{22}-2\sigma_{12} & \sigma_{11} + \sigma_{23} - \sigma_{12} -\sigma_{13} \\ \sigma_{11}+\sigma_{23}- \sigma_{12} - \sigma_{13} & \sigma_{11} + \sigma_{33} - 2 \sigma_{13} \\ \end{array} \right) \]
The overall scale of utility being unidentified, one has to impose the value of one of the variance, for example the first one is fixed to 1. We then have:
\[ \Omega^1 = \left( \begin{array}{cc} 1 & \frac{\sigma_{11}+ \sigma_{23} - \sigma_{12} -\sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} \\ \frac{\sigma_{11}+\sigma_{23}- \sigma_{12} - \sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} & \frac{\sigma_{11} + \sigma_{33} - 2 \sigma_{13}}{\sigma_{11}+\sigma_{22}-2\sigma_{12}} \\ \end{array} \right) \]
Therefore, out the 6 structural parameters of the covariance matrix, only 3 can be identified. Moreover, it’s almost impossible to interpret these parameters.
More generally, with \(J\) alternatives, the number of the parameters of the covariance matrix is \((J+1)\times J/2\) and the number of identified parameters is \(J\times(J-1)/2-1\).
Let \(L^l\) be the Choleski decomposition of the covariance matrix of the error differences:
\[ \Omega^l=L^l {L^l}^{\top} \]
This matrix is a lower triangular matrix of dimension \((J-1)\):
\[ L^l= \left( \begin{array}{ccccc} l_{11} & 0 & 0 &... & 0 \\ l_{21} & l_{22} & 0 & ... & 0 \\ l_{31} & l_{32} & l_{33} & ... & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & ... & l_{(J-1)(J-1)} \\ \end{array} \right) \]
Let \(\eta\) be a vector of standard normal deviates:
\[ \eta \sim N(0, I) \]
Therefore, we have:
\[ \mbox{V}\left(L^l\eta\right)=L^lV(\eta){L^l}^{\top}=L^lI{L^l}^{\top}=\Omega^l \]
Therefore, if we draw a vector of standard normal deviates \(\eta\) and apply to it this transformation, we get a realization of \(\epsilon^l\). This joint probability can be written as a product of conditional and marginal probabilities:
\[ \begin{array}{rcl} P_l &=& \mbox{P}(\epsilon^l_1<- V_1^l \;\&\; \epsilon^l_2<-V_2^l \;\&\; ... \;\&\; \epsilon^l_J<-V_J^l))\\ &=& \mbox{P}(\epsilon^l_1<- V_1^l))\\ &\times&\mbox{P}(\epsilon^l_2<-V_2^l \mid \epsilon^l_1<-V_1^l) \\ &\times&\mbox{P}(\epsilon^l_3<-V_3^l \mid \epsilon^l_1<-V_1^l \;\&\; \epsilon^l_2<-V_2^l) \\ & \vdots & \\ &\times&\mbox{P}(\epsilon^l_J<-V_J^l \mid \epsilon^l_1<-V_1^l \;\&\; ... \;\&\; \epsilon^l_{J-1}<-V_{J-1}^l)) \\ \end{array} \]
The vector of error differences deviates is:
\[ \left( \begin{array}{c} \epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J \end{array} \right) = \left( \begin{array}{ccccc} l_{11} & 0 & 0 &... & 0 \\ l_{21} & l_{22} & 0 & ... & 0 \\ l_{31} & l_{32} & l_{33} & ... & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{(J-1)1} & l_{(J-1)2} & l_{(J-1)3} & ... & l_{(J-1)(J-1)} \\ \end{array} \right) \times \left( \begin{array}{c} \eta_1 \\ \eta_2 \\ \eta_3 \\ \vdots \\ \eta_J \end{array} \right) \]
\[ \left( \begin{array}{c} \epsilon^l_1 \\ \epsilon^l_2 \\ \epsilon^l_3 \\ \vdots \\ \epsilon^l_J \end{array} \right) = \left( \begin{array}{l} l_{11}\eta_1 \\ l_{21}\eta_1+l_{22}\eta_2 \\ l_{31}\eta_1+l_{32}\eta_2 + l_{33}\eta_3\\ \vdots \\ l_{(J-1)1}\eta_1+l_{(J-1)2}\eta_2+...+l_{(J-1)(J-1)}\eta_{J-1} \end{array} \right) \]
Let’s now investigate the marginal and conditional probabilities:
This probabilities can easily be simulated by drawing numbers from a truncated normal distribution.
This so called GHK algorithm2 (for Geweke, Hajivassiliou and Keane who developed this algorithm) can be described as follow:
Several points should be noted concerning this algorithm:
We use again the Fishing data frame, with only a subset of three alternatives used. The multinomial probit model is estimated using mlogit with the probit argument equal to TRUE.
library(mlogit)
Fish <- dfidx(Fishing, varying = 2:9, choice = "mode",
idnames = c("chid", "alt"))Fish.mprobit <- mlogit(mode~price | income | catch, Fish, probit = TRUE,
alt.subset=c('beach', 'boat','pier'))summary(Fish.mprobit)
Call:
mlogit(formula = mode ~ price | income | catch, data = Fish,
alt.subset = c("beach", "boat", "pier"), probit = TRUE)
Frequencies of alternatives:choice
beach boat pier
0.18356 0.57260 0.24384
bfgs method
14 iterations, 0h:0m:10s
g'(-H)^-1g = 9.77E-07
gradient close to zero
Coefficients :
Estimate Std. Error z-value Pr(>|z|)
(Intercept):boat 7.2514e-01 3.5809e-01 2.0250 0.0428661 *
(Intercept):pier 6.2393e-01 2.7396e-01 2.2774 0.0227617 *
price -1.2154e-02 1.7697e-03 -6.8681 6.505e-12 ***
income:boat 2.4005e-06 3.6698e-05 0.0654 0.9478448
income:pier -6.5419e-05 4.0832e-05 -1.6022 0.1091198
catch:beach 1.5479e+00 4.3002e-01 3.5995 0.0003188 ***
catch:boat 4.0010e-01 4.1600e-01 0.9618 0.3361595
catch:pier 1.2747e+00 5.5863e-01 2.2819 0.0224968 *
boat.pier 5.4570e-01 4.6263e-01 1.1795 0.2381809
pier.pier 6.9544e-01 2.9294e-01 2.3740 0.0175973 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood: -478.43
McFadden R^2: 0.32751
Likelihood ratio test : chisq = 465.99 (p.value = < 2.22e-16)