JW Tsai
March 27, 2024

```python
import numpy as np
import pymc as pm
import arviz as az
import pandas as pd
import matplotlib.pyplot as plt

# load the iris data set
from sklearn.datasets import load_iris
iris = load_iris()
```
This mini-study aims to understand how the DIC (deviance information criterion) index works.

The common idea of an information criterion is \(D + 2p_D\), where the deviance \(D\) can also be written as \(-2\) times the log-likelihood. In addition, the version of \(p_D\) (the effective number of parameters) used by DIC in the JAGS program is defined from the posterior variance of the deviance, \(p_D = \operatorname{Var}(D)/2\), which equals twice the variance of the log-likelihood.
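Written out, the three quantities computed later in this post are (notation consistent with the JAGS rule quoted below):

\[
\begin{align}
\bar{D} &= \mathbb{E}_{\theta \mid y}\left[-2 \log p(y \mid \theta)\right] \\
p_D &= \tfrac{1}{2}\operatorname{Var}_{\theta \mid y}\left[D(\theta)\right] \\
\text{DIC} &= \bar{D} + p_D
\end{align}
\]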
This study simply uses the iris data set as a test case. We know that iris has 4 features: sepal length (Sepal.Length), sepal width (Sepal.Width), petal length (Petal.Length), and petal width (Petal.Width).

Today we simply use the model Sepal.Length ~ Sepal.Width to see how DIC is computed. In addition, to add a few more parameters, we use the 3 target species to build a hierarchical linear model.

Therefore, the model is as follows:
\[
\begin{align}
\text{Likelihood:} \\
\text{Length} &\sim N(\mu_w, \sigma^2) \\
\mu_w &= \beta_0 + \beta_{1i} \,\text{Width} \\
\\
\text{Priors:} \\
\beta_0, \beta_{1i} &\sim N(0, 5) \\
\sigma &\sim \text{Exp}(1)
\end{align}
\]
Here, \(\beta_{1i}\) is the parameter for each level (species), so there should be 3 of them.

Let's see how to run this model in `jags`.
First, we call the iris data set (from the R default {datasets} package):

```r
data(iris)
```
Second, we define the data list and the model in the {R2jags} package. {R2jags} allows users to write a jags model just like an R function.
```r
dat_list <- list(
  sepal_length = iris$Sepal.Length,
  sepal_width  = iris$Sepal.Width,
  species      = as.numeric(iris$Species),  # JAGS needs a numeric index (1, 2, 3)
  n            = 150
)

mod_string <- function(){
  ## priors
  beta0 ~ dnorm(0, 1/5^2)   # dnorm takes a precision, so sd = 5
  sigma ~ dexp(1)
  for (j in 1:3){
    beta1[j] ~ dnorm(0, 1/5^2)
  }
  ## likelihood
  for (i in 1:n){
    mu_w[i] <- beta0 + beta1[species[i]] * sepal_width[i]
    sepal_length[i] ~ dnorm(mu_w[i], 1/sigma^2)
  }
}
```
Finally, we run this model through the `jags()` function.

```r
fit <- jags(data = dat_list,
            parameters.to.save = c('beta0', 'beta1', 'sigma'),
            model.file = mod_string)
```
Then, the output of this jags model is as follows:

```
> print(fit, digits = 3)
Inference for Bugs model at "/var/folders/1f/8r50hwmn6m5dwrngfgysq4p40000gn/T//Rtmpbvfmga/modelab6b34cc72fd.txt", fit using jags,
 3 chains, each with 2000 iterations (first 1000 discarded)
 n.sims = 3000 iterations saved
          mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
beta0       3.338   0.333   2.680   3.117   3.336   3.559   3.981 1.001  2400
beta1[1]    0.488   0.098   0.298   0.424   0.490   0.552   0.685 1.002  1700
beta1[2]    0.938   0.120   0.700   0.856   0.938   1.019   1.175 1.001  2400
beta1[3]    1.091   0.113   0.870   1.015   1.091   1.168   1.310 1.002  1900
sigma       0.444   0.026   0.397   0.426   0.444   0.461   0.496 1.002  1600
deviance  180.851   3.199 176.629 178.531 180.100 182.442 188.810 1.002  1400

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 5.1 and DIC = 186.0
DIC is an estimate of expected predictive error (lower deviance is better).
```
Now, we use `pymc` to replicate these results. We are interested in two things:

1. the parameter estimates: `beta0`, `beta1`, and `sigma`;
2. the DIC components: `deviance`, `pD`, and `DIC`.

A sketch of the data preparation follows; the resulting data frame is shown after it.
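A minimal sketch of the data preparation, assuming the `iris` object from `load_iris()` above (the names `iris_data` and `target_index` are the ones reused by the model code below):

```python
# build a data frame of the 4 features and an integer species index
iris_data = pd.DataFrame(iris.data, columns=iris.feature_names)  # 150 rows x 4 columns
target_index = iris.target  # 0 = setosa, 1 = versicolor, 2 = virginica
iris_data
```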
|  | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |
| ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 |

150 rows × 4 columns
The species names and their integer index (`target_index`) look like this:

```
{'target': array(['setosa', 'versicolor', 'virginica'], dtype='<U10'),
 'target_index': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])}
```
Now we translate the jags model into pymc:
```python
with pm.Model() as iris_model:
    ## data
    sepal_length = pm.Data('sepal_length', iris_data['sepal length (cm)'])
    sepal_width = pm.Data('sepal_width', iris_data['sepal width (cm)'])
    ## priors
    beta0 = pm.Normal('β0', 0, 5)
    beta1 = pm.Normal('β1', 0, 5, shape=3)
    sigma = pm.Exponential('σ', 1)
    ## likelihood
    mu_w = beta0 + beta1[target_index] * sepal_width
    Length = pm.Normal('length', mu_w, sigma, observed=sepal_length)
    ## sampling
    iris_post = pm.sample(draws=3000, chains=4, cores=4)
    pm.compute_log_likelihood(iris_post)
```
```
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [β0, β1, σ]
Sampling 4 chains for 1_000 tune and 3_000 draw iterations (4_000 + 12_000 draws total) took 5 seconds.
```
The posterior summary is:

|  | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| β0 | 3.344 | 0.329 | 2.739 | 3.968 | 0.007 | 0.005 | 2393.0 | 2772.0 | 1.0 |
| β1[0] | 0.487 | 0.097 | 0.296 | 0.657 | 0.002 | 0.001 | 2425.0 | 2847.0 | 1.0 |
| β1[1] | 0.935 | 0.119 | 0.713 | 1.159 | 0.002 | 0.002 | 2469.0 | 2887.0 | 1.0 |
| β1[2] | 1.089 | 0.111 | 0.877 | 1.294 | 0.002 | 0.002 | 2421.0 | 2802.0 | 1.0 |
| σ | 0.444 | 0.026 | 0.395 | 0.492 | 0.000 | 0.000 | 4241.0 | 3826.0 | 1.0 |
Concluding remarks. For RQ1, the outputs from jags and pymc show no meaningful differences.
First, let's look at the data structure of `log_likelihood` from the `pm.compute_log_likelihood()` function. It is a three-dimensional tensor: the first dimension is the (4) chains, the second the (3000) draws, and the third the length of the data (150).
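For example, inspecting the group directly (assuming the `iris_post` object from the sampling step above):

```python
iris_post.log_likelihood
```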
```
<xarray.Dataset>
Dimensions:       (chain: 4, draw: 3000, length_dim_0: 150)
Coordinates:
  * chain         (chain) int64 0 1 2 3
  * draw          (draw) int64 0 1 2 3 4 5 6 ... 2994 2995 2996 2997 2998 2999
  * length_dim_0  (length_dim_0) int64 0 1 2 3 4 5 6 ... 144 145 146 147 148 149
Data variables:
    length        (chain, draw, length_dim_0) float64 -0.1501 -0.2302 ... -1.553
Attributes:
    created_at:                 2024-03-27T11:15:05.525056
    arviz_version:              0.16.1
    inference_library:          pymc
    inference_library_version:  5.9.0
```
Second, let's try to compute the expected deviance (\(\bar{D}\)) from this tensor. From the jags output, we know the correct answer should be close to 180.851.

We need to compute the deviance (\(-2\) log-likelihood) for each draw; there are a total of 150 data points in this study.

Tip: sum over the dimension we are interested in. In this case, we sum over `length_dim_0` (`axis=2`), which leaves one deviance value for each of the 4 × 3000 draws, as sketched below.
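A minimal sketch of this step (the names `y_ll` and `y_deviance` are mine, matching the function at the end of the post):

```python
# sum the pointwise log-likelihood over the 150 data points (axis=2),
# then turn it into a per-draw deviance with shape (chain: 4, draw: 3000)
y_ll = iris_post.log_likelihood['length'].sum(dim='length_dim_0')
y_deviance = -2 * y_ll
y_deviance
```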
```
<xarray.DataArray 'length' (chain: 4, draw: 3000)>
array([[180.13014469, 180.20560338, 181.51113052, ..., 180.46864777,
        182.32817656, 184.06096534],
       [179.44237102, 181.94479594, 187.25324731, ..., 179.46417839,
        183.18168135, 180.72672825],
       [177.94773331, 177.25331248, 179.26326997, ..., 185.02514987,
        181.1540135 , 180.74335949],
       [180.20963194, 177.73703712, 177.10687626, ..., 178.72855249,
        178.97973022, 178.32675483]])
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 ... 2993 2994 2995 2996 2997 2998 2999
```
Then take the posterior mean of it; we get about 180.84, close to the jags value of 180.851.
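For example, continuing the sketch above:

```python
y_deviance.mean()   # expected deviance, D-bar
```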
```
<xarray.DataArray 'length' ()>
array(180.83823109)
```
Third, we need to compute the pD. We know the pD should be close to 5.1.
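Applying the JAGS rule directly to the per-draw deviance (again using my names from the sketch above):

```python
y_pd = y_deviance.var() / 2   # JAGS rule: pD = var(deviance) / 2
y_pd
```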
```
<xarray.DataArray 'length' ()>
array(5.37651436)
```
Finally, we can compute the DIC; it should be close to 186.0. There are two kinds of methods to compute it (here `y_ll` is the per-draw summed log-likelihood, both sketched below):

1. from the log-likelihood: `-2*y_ll.mean() + 2*y_ll.var()`
2. from the deviance: `y_deviance.mean() + y_deviance.var()/2`
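The two forms are algebraically identical, since \(\operatorname{Var}(-2\,\text{ll}) = 4\operatorname{Var}(\text{ll})\). A sketch with my variable names from above:

```python
dic_from_ll  = -2 * y_ll.mean() + 2 * y_ll.var()          # method 1: log-likelihood
dic_from_dev = y_deviance.mean() + y_deviance.var() / 2   # method 2: deviance
dic_from_dev                                              # equals dic_from_ll
```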
```
<xarray.DataArray 'length' ()>
array(186.21474545)
```
Yes!! Bingo!!
Furthermore, we write a function that prints the result in the same style as the jags program. The output will look like:
```
DIC info (using the rule, pD = var(deviance)/2)
deviance = 180.85, pD = 5.1 and DIC = 186.0
DIC is an estimate of expected predictive error (lower deviance is better).
```
```python
def get_dic(posterior_tensor, var_names):
    # per-draw log-likelihood: sum the pointwise values over the data dimension (axis=2)
    y_ll = posterior_tensor.log_likelihood[var_names].sum(axis=2).to_numpy()
    y_deviance = -2 * y_ll.mean()   # mean deviance, D-bar
    y_pd = 2 * y_ll.var()           # pD = var(deviance)/2 = 2 * var(log-likelihood)
    y_dic = y_deviance + y_pd
    y_print = 'DIC info (using the rule, pD = var(deviance)/2) \n' +\
        'mean deviance = {:.3f}, pD = {:.3f} and DIC = {:.3f} \n'.format(y_deviance, y_pd, y_dic) +\
        'DIC is an estimate of expected predictive error (lower deviance is better).'
    return print(y_print)
```
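A usage example, assuming the `iris_post` object from above (the printed numbers reproduce the values computed step by step in this post):

```python
get_dic(iris_post, 'length')
# DIC info (using the rule, pD = var(deviance)/2)
# mean deviance = 180.838, pD = 5.377 and DIC = 186.215
# DIC is an estimate of expected predictive error (lower deviance is better).
```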