Understand DIC

Bayesian
中文
Author

JW Tsai

Published

March 27, 2024

This mini-study aims to understand how the DIC (deviance information criterion) index works.

The common idea of an information criterion is a measure of fit plus a penalty for model complexity. For DIC, the fit term is the posterior mean deviance \(\bar{D}\), where the deviance \(D\) is \(-2\) times the log-likelihood, and the penalty is \(p_D\), the effective number of parameters. The version of \(p_D\) used here (the same rule as in the JAGS program) is defined as half the posterior variance of the deviance.
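In display form, the quantities we will compute below are

\[ \begin{align} D(\theta) &= -2 \log p(y \mid \theta) \\ p_D &= \tfrac{1}{2}\,\text{Var}\left[D(\theta)\right] \\ \text{DIC} &= \bar{D} + p_D \end{align} \]

where \(\bar{D}\) is the posterior mean of \(D(\theta)\).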

Code
import numpy as np
import pymc as pm
import arviz as az
import pandas as pd
import matplotlib.pyplot as plt

# loading iris data set.
from sklearn.datasets import load_iris
iris = load_iris()

Goals.

This study simply uses the iris data set as a test case. We know that iris has 4 features: sepal length (Sepal.Length), sepal width (Sepal.Width), petal length (Petal.Length), and petal width (Petal.Width).

We will use the simple model Sepal.Length ~ Sepal.Width to see how DIC is computed. In addition, to add a few more parameters, we use the 3 targets (species) to build a hierarchical linear model.

The model is therefore:

\[ \begin{align} \text{Likelihood:} \\ \text{Length} &\sim N(\mu _w, \sigma^2) \\ \mu _w &= \beta _0 + \beta _{1i} \, \text{Width} \\ \\ \text{Priors:} \\ \beta _0, \beta _{1i} &\sim N(0, 5^2) \\ \sigma &\sim \text{Exp}(1) \end{align} \]

Here, \(\beta _{1i}\) is the slope for each level (species), so there will be 3 of them.

jags

Let’s see how to run this model in jags.

Firstly, we load the iris data set (from R's default {datasets} package).

data(iris)

Secondly, we define the data list and the model for the {R2jags} package. {R2jags} lets users write a jags model just like an R function.

  • The data list.
dat_list = list(
  sepal_length = iris$Sepal.Length,
  sepal_width = iris$Sepal.Width,
  species = as.numeric(iris$Species), # jags needs numeric indices, not a factor
  n = 150
)
  • The model (written as an R function).
mod_string <- \(){
  ## priors
  beta0 ~ dnorm(0,1/5^2)
  sigma ~ dexp(1)
  for (j in 1:3){
    beta1[j] ~ dnorm(0,1/5^2)
  }
  
  ## likelihood
  for (i in 1:n){
    mu_w[i] <- beta0 + beta1[species[i]] * sepal_width[i]
    sepal_length[i] ~ dnorm(mu_w[i], 1/sigma^2) 
  }
}

Finally, we run this model through the jags function.

fit <- jags(data = dat_list, 
     parameters.to.save = c('beta0','beta1','sigma'),
     model.file = mod_string
     )

Then, the output of this jags model is shown as follows:

> print(fit, digits = 3)
Inference for Bugs model at "/var/folders/1f/8r50hwmn6m5dwrngfgysq4p40000gn/T//Rtmpbvfmga/modelab6b34cc72fd.txt", fit using jags,
 3 chains, each with 2000 iterations (first 1000 discarded)
 n.sims = 3000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta0      3.338   0.333   2.680   3.117   3.336   3.559   3.981 1.001  2400
beta1[1]   0.488   0.098   0.298   0.424   0.490   0.552   0.685 1.002  1700
beta1[2]   0.938   0.120   0.700   0.856   0.938   1.019   1.175 1.001  2400
beta1[3]   1.091   0.113   0.870   1.015   1.091   1.168   1.310 1.002  1900
sigma      0.444   0.026   0.397   0.426   0.444   0.461   0.496 1.002  1600
deviance 180.851   3.199 176.629 178.531 180.100 182.442 188.810 1.002  1400

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 5.1 and DIC = 186.0
DIC is an estimate of expected predictive error (lower deviance is better).
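
Reading off the table above, jags simply adds the posterior mean deviance and pD: \(180.851 + 5.1 \approx 186.0\). These are the numbers we want to reproduce with pymc.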

pymc

Now, we use pymc to replicate these results. We are interested in two things:

  • RQ1. to compare the estimates of beta0, beta1, and sigma between jags and pymc.
  • RQ2. to compute the (expected) deviance, pD, and DIC.
Code
iris_data = pd.DataFrame(iris['data'])
iris_data.columns = iris['feature_names']
iris_data
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
... ... ... ... ...
145 6.7 3.0 5.2 2.3
146 6.3 2.5 5.0 1.9
147 6.5 3.0 5.2 2.0
148 6.2 3.4 5.4 2.3
149 5.9 3.0 5.1 1.8

150 rows × 4 columns

Code
#seed=1234

target_index, target = pd.Series(iris['target']).factorize()
#width_index, width = iris_data[1].factorize()


coords = {
    'target': iris['target_names'], 
    'target_index': target_index,
    #'width_index': width_index
}
coords
{'target': array(['setosa', 'versicolor', 'virginica'], dtype='<U10'),
 'target_index': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])}

RQ1. Compare jags and pymc

Code
with pm.Model(coords=coords) as iris_model:
    ## data
    sepal_length = pm.Data('sepal_length', iris_data['sepal length (cm)'])
    sepal_width = pm.Data('sepal_width', iris_data['sepal width (cm)'])
    

    ## priors
    beta0 = pm.Normal('β0', 0,5)
    beta1 = pm.Normal('β1', 0,5, shape=3)
    sigma = pm.Exponential('σ',1)

    ## likelihood
    mu_w = beta0 + beta1[target_index] * sepal_width
    Length = pm.Normal('length', mu_w, sigma, observed=sepal_length)

    ## sampling
    iris_post = pm.sample( draws=3000, chains=4, cores=4) 
    pm.compute_log_likelihood(iris_post)
    #ra_4pl_predict = pm.sample_posterior_predictive(ra_4pl_post)
/Users/garden/Library/Python/3.9/lib/python/site-packages/pymc/data.py:433: UserWarning: The `mutable` kwarg was not specified. Before v4.1.0 it defaulted to `pm.Data(mutable=True)`, which is equivalent to using `pm.MutableData()`. In v4.1.0 the default changed to `pm.Data(mutable=False)`, equivalent to `pm.ConstantData`. Use `pm.ConstantData`/`pm.MutableData` or pass `pm.Data(..., mutable=False/True)` to avoid this warning.
  warnings.warn(
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [β0, β1, σ]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 5 seconds.
100.00% [16000/16000 00:04<00:00 Sampling 4 chains, 0 divergences]
100.00% [12000/12000 00:00<00:00]
Code
az.summary(iris_post)
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
β0 3.344 0.329 2.739 3.968 0.007 0.005 2393.0 2772.0 1.0
β1[0] 0.487 0.097 0.296 0.657 0.002 0.001 2425.0 2847.0 1.0
β1[1] 0.935 0.119 0.713 1.159 0.002 0.002 2469.0 2887.0 1.0
β1[2] 1.089 0.111 0.877 1.294 0.002 0.002 2421.0 2802.0 1.0
σ 0.444 0.026 0.395 0.492 0.000 0.000 4241.0 3826.0 1.0

Concluding remarks. For RQ1, the estimates from jags and pymc are essentially identical.

RQ2. Computing DIC.

Firstly, let’s look at the data structure of log_likelihood produced by the pm.compute_log_likelihood() function. It is a three-dimensional array: the first dimension indexes the (4) chains, the second the (3,000) draws, and the third the (150) data points.

Code
iris_post.log_likelihood
<xarray.Dataset>
Dimensions:       (chain: 4, draw: 3000, length_dim_0: 150)
Coordinates:
  * chain         (chain) int64 0 1 2 3
  * draw          (draw) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * length_dim_0  (length_dim_0) int64 0 1 2 3 4 5 6 ... 144 145 146 147 148 149
Data variables:
    length        (chain, draw, length_dim_0) float64 -0.1501 -0.2302 ... -1.553
Attributes:
    created_at:                 2024-03-27T11:15:05.525056
    arviz_version:              0.16.1
    inference_library:          pymc
    inference_library_version:  5.9.0

Secondly, let’s compute the expected deviance (D) from this array. From the jags output, we know the answer should be close to 180.851.

Now, we need to compute the total deviance (\(-2\) log-likelihood, summed over all 150 data points) for each posterior draw.

Tip: sum over the dimension we are interested in. In this case, we sum over length_dim_0 (axis=2), which leaves one deviance value for each of the 4 × 3,000 draws.

Code
y_ll = iris_post.log_likelihood['length'].sum(axis=2)
y_deviance = -2*y_ll
y_deviance
<xarray.DataArray 'length' (chain: 4, draw: 3000)>
array([[180.13014469, 180.20560338, 181.51113052, ..., 180.46864777,
        182.32817656, 184.06096534],
       [179.44237102, 181.94479594, 187.25324731, ..., 179.46417839,
        183.18168135, 180.72672825],
       [177.94773331, 177.25331248, 179.26326997, ..., 185.02514987,
        181.1540135 , 180.74335949],
       [180.20963194, 177.73703712, 177.10687626, ..., 178.72855249,
        178.97973022, 178.32675483]])
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 ... 2993 2994 2995 2996 2997 2998 2999

Then we take its posterior mean, which is about 180.84, very close to the jags value of 180.851.

Code
y_deviance.mean()
<xarray.DataArray 'length' ()>
array(180.83823109)

Thirdly, we need to compute the pD. We know from the jags output that it will be close to 5.1.

Code
y_deviance.var()/2
<xarray.DataArray 'length' ()>
array(5.37651436)

Finally, we can compute the DIC. It should be close to 186.0. There are two equivalent ways to compute it:

  • Using log-likelihood. -2*y_ll.mean() + 2*y_ll.var()
  • Using deviance. y_deviance.mean() + y_deviance.var()/2
Code
DIC = y_deviance.mean() + y_deviance.var()/2
DIC
<xarray.DataArray 'length' ()>
array(186.21474545)

Yes!! Bingo!!
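
As a quick sanity check, the log-likelihood version from the first bullet above gives the same value (a minimal sketch reusing the y_ll array computed earlier):

Code
# DIC directly from the summed log-likelihood:
# mean deviance = -2 * mean(ll), pD = 2 * var(ll)
DIC_ll = -2*y_ll.mean() + 2*y_ll.var()
DIC_ll  # same value as above, about 186.2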

The easy function.

Furthermore, we write a function that prints the DIC summary in the same format as the jags program.

It will look like this:

DIC info (using the rule, pD = var(deviance)/2)
mean deviance = 180.85, pD = 5.1 and DIC = 186.0
DIC is an estimate of expected predictive error (lower deviance is better).
Code
def get_dic(posterior_tensor, var_names):
    # sum the pointwise log-likelihood over the data dimension (axis=2)
    # to get one total log-likelihood per posterior draw
    y_ll = posterior_tensor.log_likelihood[var_names].sum(axis=2).to_numpy()
    y_deviance = -2*y_ll.mean()   # posterior mean deviance, D-bar
    y_pd = 2*y_ll.var()           # pD = var(deviance)/2 = 2*var(log-likelihood)
    y_dic = y_deviance + y_pd     # DIC = D-bar + pD

    y_print =   'DIC info (using the rule, pD = var(deviance)/2) \n' +\
                'mean deviance = {:.3f}, pD = {:.3f} and DIC = {:.3f} \n'.format(y_deviance, y_pd, y_dic) +\
                'DIC is an estimate of expected predictive error (lower deviance is better).'
            
    return print(y_print)
Code
get_dic(iris_post, var_names='length')
DIC info (using the rule, pD = var(deviance)/2) 
mean deviance = 180.838, pD = 5.377 and DIC = 186.215 
DIC is an estimate of expected predictive error (lower deviance is better).

Citation

BibTeX citation:
@online{tsai2024,
  author = {Tsai, JW},
  title = {Understand {DIC}},
  date = {2024-03-27},
  langid = {en}
}
For attribution, please cite this work as:
Tsai, J. (2024, March 27). Understand DIC.