Exercise 2: Nested logit model

Kenneth Train and Yves Croissant

2026-06-01

The data set HC from mlogit contains data in R format on the choice of heating and central cooling system for 250 single-family, newly built houses in California.

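The alternatives are:

  - gcc: gas central heat with cooling,
  - ecc: electric central resistance heat with cooling,
  - erc: electric room resistance heat with cooling,
  - hpc: electric heat pump, which also provides cooling,
  - gc: gas central heat without cooling,
  - ec: electric central resistance heat without cooling,
  - er: electric room resistance heat without cooling.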

Heat pumps necessarily provide both heating and cooling such that heat pump without cooling is not an alternative.

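The variables are:

  - depvar: the chosen alternative,
  - ich.alt: the installation cost of the heating portion of the system for alternative alt,
  - och.alt: the operating cost of the heating portion of the system for alternative alt,
  - icca: the installation cost for cooling,
  - occa: the operating cost for cooling,
  - income: the annual income of the household.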

Note that the full installation cost of alternative gcc is ich.gcc + icca, and similarly for the operating cost and for the other alternatives with cooling.
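
As a small illustration of this cost structure, the sketch below builds the full installation cost for one house in long format (one row per alternative). The cost figures and the column name `ic_total` are made up for the example and are not taken from the HC data; the cooling cost is set to zero for the alternatives without cooling, which is what the data preparation in the first question does for `icca` and `occa`.

```r
# Hypothetical installation costs for one house (not values from HC)
house <- data.frame(
  alt  = c("gcc", "ecc", "erc", "hpc", "gc", "ec", "er"),
  ich  = c(10, 8, 9, 14, 10, 8, 9),  # made-up heating installation costs
  icca = 27                          # made-up cooling installation cost
)

# The cooling cost only applies to the alternatives that include cooling
cooling <- house$alt %in% c("gcc", "ecc", "erc", "hpc")
house$icca[!cooling] <- 0

# Full installation cost: heating part plus cooling part (zero without cooling)
house$ic_total <- house$ich + house$icca
house
```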

  1. Run a nested logit model on the data for two nests and one log-sum coefficient that applies to both nests. Note that the model is specified to have the cooling alternatives (gcc, ecc, erc, hpc) in one nest and the non-cooling alternatives (gc, ec, er) in another nest.
library(mlogit)
data("HC", package = "mlogit")
HC <- dfidx(HC, varying = c(2:8, 10:16), choice = "depvar")
cooling.modes <- idx(HC, 2) %in% c('gcc', 'ecc', 'erc', 'hpc')
room.modes <- idx(HC, 2) %in% c('erc', 'er')
# installation / operating costs for cooling do not vary across
# alternatives; they only apply to systems with cooling, so they are
# set to 0 for the other alternatives
HC$icca[! cooling.modes] <- 0
HC$occa[! cooling.modes] <- 0
# create income variables specific to the cooling and to the room alternatives
HC$inc.cooling <- HC$inc.room <- 0
HC$inc.cooling[cooling.modes] <- HC$income[cooling.modes]
HC$inc.room[room.modes] <- HC$income[room.modes]
# create an intercept for cooling modes
HC$int.cooling <- as.numeric(cooling.modes)
# estimate the model with only one nest elasticity
nl <- mlogit(depvar ~ ich + och + icca + occa + inc.room +
                 inc.cooling + int.cooling | 0, HC,
             nests = list(cooling = c('gcc','ecc','erc','hpc'), 
             other = c('gc', 'ec', 'er')), un.nest.el = TRUE)
gaze(nl)
             Estimate Std. Error z-value Pr(>|z|)
ich         -0.005549   0.001442  -3.848 0.000119
och         -0.008579   0.002553  -3.360 0.000779
icca        -0.002251   0.001444  -1.558 0.119121
occa        -0.010895   0.012198  -0.893 0.371788
inc.room    -0.378971   0.099631  -3.804 0.000143
inc.cooling  0.249575   0.059213   4.215  2.5e-05
int.cooling -6.000415   5.562423  -1.079 0.280703
iv           0.585922   0.179708   3.260 0.001112
  1. The estimated log-sum coefficient is \(0.59\). What does this estimate tell you about the degree of correlation in unobserved factors over alternatives within each nest?

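A log-sum coefficient of 1 means no correlation, and the correlation increases as the coefficient falls toward 0. Here the correlation is approximately \(1-0.59=0.41\). It’s a moderate correlation.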

  1. Test the hypothesis that the log-sum coefficient is 1.0 (the value that it takes for a standard logit model). Can the hypothesis that the true model is standard logit be rejected?

We can use a t-test of the hypothesis that the log-sum coefficient is equal to 1. The t-statistic is:

unname( (coef(nl)['iv'] - 1) / sqrt(vcov(nl)['iv', 'iv']))
## [1] -2.304171
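
The statistic can also be turned into a two-sided p-value. The following is a minimal sketch in base R that assumes the statistic is approximately standard normal under the null hypothesis; `t_stat` is simply the value printed above, typed in by hand.

```r
# t-statistic reported above, entered by hand
t_stat <- -2.304171

# Two-sided p-value under a standard normal approximation
p_two_sided <- 2 * pnorm(-abs(t_stat))
p_two_sided

# Compare the absolute statistic with the 95% and 99% critical values
crit_t <- c("95%" = qnorm(0.975), "99%" = qnorm(0.995))
abs(t_stat) > crit_t
```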

The critical value of t for 95% confidence is 1.96, and the t-statistic exceeds it in absolute value, so we can reject the hypothesis at 95% confidence. We can also use a likelihood ratio test because the multinomial logit is a special case of the nested model.

ml <- update(nl, nests = NULL)
lrtest(nl, ml) |> gaze()
## Chisq = 4.323, df: 1, pval = 0.038

Note that the hypothesis is rejected at 95% confidence, but not at 99% confidence.
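
The comparison with the two confidence levels can be checked by hand. This sketch uses only base R and the likelihood ratio statistic printed above; the object names `lr_stat`, `p_value` and `crit` are chosen for the illustration.

```r
# Likelihood ratio statistic reported above, with one restriction (iv = 1)
lr_stat <- 4.323
df <- 1

# p-value from the upper tail of the chi-squared distribution
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)
p_value

# Critical values at the 95% and 99% confidence levels
crit <- c("95%" = qchisq(0.95, df = df), "99%" = qchisq(0.99, df = df))
crit

# TRUE where the statistic exceeds the critical value (hypothesis rejected)
lr_stat > crit
```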

  1. Re-estimate the model with the room alternatives in one nest and the central alternatives in another nest. (Note that a heat pump is a central system.)
nl2 <- update(nl,
              nests = list(central = c('ec', 'ecc', 'gc', 'gcc', 'hpc'), 
              room = c('er', 'erc')))
gaze(nl2)
              Estimate Std. Error z-value Pr(>|z|)
ich          -0.011382   0.005422  -2.099   0.0358
och          -0.018253   0.009323  -1.958   0.0502
icca         -0.003375   0.002693  -1.253   0.2102
occa         -0.020633   0.018973  -1.088   0.2768
inc.room     -0.757216   0.342919  -2.208   0.0272
inc.cooling   0.416894   0.207418   2.010   0.0444
int.cooling -13.824875   7.940308  -1.741   0.0817
iv            1.362007   0.653933   2.083   0.0373
  1. What does the estimate imply about the substitution patterns across alternatives? Do you think the estimate is plausible?

The log-sum coefficient is over 1. This implies that there is more substitution across nests than within nests. I don’t think this is very reasonable, but people can differ on their concepts of what’s reasonable.
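
To make the substitution pattern concrete, here is a minimal base R sketch of the usual nested logit choice probabilities with a single log-sum coefficient. It does not use the fitted model: the function names `nl_probs` and `within_share`, the equal utilities and the 0.1 improvement in the utility of er are all assumptions made for the illustration. The sketch measures which share of the probability gained by er is taken from its nest mate erc, for log-sum coefficients below, equal to and above 1.

```r
# Nested logit probabilities for one decision maker.
# V: named vector of systematic utilities
# nests: named list of character vectors of alternative names
# lambda: a single log-sum coefficient shared by all nests
nl_probs <- function(V, nests, lambda) {
  # for each nest, the sum of exp(V / lambda) over its alternatives
  nest_sums <- sapply(nests, function(alts) sum(exp(V[alts] / lambda)))
  denom <- sum(nest_sums^lambda)
  p <- V  # result has the same names as V
  for (k in names(nests)) {
    alts <- nests[[k]]
    p[alts] <- exp(V[alts] / lambda) * nest_sums[[k]]^(lambda - 1) / denom
  }
  p
}

# Hypothetical utilities, all equal, so that only the nesting matters
alts <- c("ec", "ecc", "gc", "gcc", "hpc", "er", "erc")
V <- setNames(rep(0, length(alts)), alts)
nests <- list(central = c("ec", "ecc", "gc", "gcc", "hpc"),
              room = c("er", "erc"))

# Share of the probability gained by 'er' that is lost by 'erc'
within_share <- function(lambda) {
  p0 <- nl_probs(V, nests, lambda)
  V1 <- V
  V1["er"] <- V1["er"] + 0.1  # made-up improvement of 'er'
  p1 <- nl_probs(V1, nests, lambda)
  unname((p0["erc"] - p1["erc"]) / (p1["er"] - p0["er"]))
}

shares <- sapply(c(0.59, 1, 1.36), within_share)
shares
```

With a coefficient of 1 the model is a standard logit, and erc gives up exactly one sixth of the gain, like each of the other six alternatives. With a coefficient below 1 the share taken from the nest mate is larger, and with a coefficient above 1 it is smaller, which is the sense in which substitution is stronger across nests than within nests.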

  1. Is the log-sum coefficient significantly different from 1?

The t-statistic is:

unname((coef(nl2)['iv'] - 1) / sqrt(vcov(nl2)['iv', 'iv']))
## [1] 0.5535849
lrtest(nl2, ml) |> gaze()
## Chisq = 0.527, df: 1, pval = 0.468

We cannot reject the hypothesis at standard confidence levels.

  1. How does the value of the log-likelihood function compare for this model relative to the model in exercise 1, where the cooling alternatives are in one nest and the non-cooling alternatives in the other nest?
logLik(nl)
## 'log Lik.' -178.1247 (df=8)
logLik(nl2)
## 'log Lik.' -180.0231 (df=8)

The \(\ln L\) is worse (more negative). All in all, this seems like a less appropriate nesting structure.

  1. Rerun the model that has the cooling alternatives in one nest and the non-cooling alternatives in the other nest (like for exercise 1), with a separate log-sum coefficient for each nest.
nl3 <- update(nl, un.nest.el = FALSE)
  1. Which nest is estimated to have the higher correlation in unobserved factors? Can you think of a real-world reason for this nest to have a higher correlation?

The log-sum coefficients are about 0.60 for the cooling nest and 0.45 for the non-cooling nest, so the correlation is around 1-0.60 = 0.40 in the cooling nest and around 1-0.45 = 0.55 in the non-cooling nest. The correlation is therefore higher in the non-cooling nest. Perhaps there is more variation in comfort when there is no cooling, and this variation in comfort is the same for all the non-cooling alternatives.

  1. Are the two log-sum coefficients significantly different from each other? That is, can you reject the hypothesis that the model in exercise 1 is the true model?

We can use a likelihood ratio test with models nl and nl3.

lrtest(nl, nl3) |> gaze()
## Chisq = 0.176, df: 1, pval = 0.675

The restricted model is the one from exercise 1 that has one log-sum coefficient. The unrestricted model is the one we just estimated. The test statistic is 0.176. The critical value of chi-squared with 1 degree of freedom is 3.84 at the 95% confidence level. We therefore cannot reject the hypothesis that the two nests have the same log-sum coefficient.

  1. Rewrite the code to allow three nests. For simplicity, estimate only one log-sum coefficient which is applied to all three nests. Estimate a model with alternatives gcc, ecc and erc in a nest, hpc in a nest alone, and alternatives gc, ec and er in a nest. Does this model seem better or worse than the model in exercise 1, which puts alternative hpc in the same nest as alternatives gcc, ecc and erc?
nl4 <- update(nl, nests=list(n1 = c('gcc', 'ecc', 'erc'), n2 = c('hpc'),
                    n3 = c('gc', 'ec', 'er')))
logLik(nl4)
## 'log Lik.' -180.2633 (df=8)

The \(\ln L\) for this model is \(-180.26\), which is lower (more negative) than for the model with two nests, which got \(-178.12\). Both models have the same number of parameters, so the three-nest model seems worse.
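
The single-coefficient nesting structures can be put side by side. The sketch below uses base R and the log-likelihood values printed in this exercise, typed in by hand; the names `ll`, `k` and `aic` are chosen for the illustration. Because each of these models has 8 parameters, ranking them by log-likelihood or by AIC gives the same order.

```r
# Log-likelihoods reported above for the models with one log-sum coefficient
ll <- c(nl = -178.1247, nl2 = -180.0231, nl4 = -180.2633)
k <- 8  # number of estimated parameters in each model (df = 8)

# AIC = -2 * logLik + 2 * number of parameters
aic <- -2 * ll + 2 * k
aic

# Model with the highest log-likelihood (and the lowest AIC)
names(which.max(ll))
```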