
Introduction

At the heart of the pgdcm package lies a single, unified Bayesian network model defined in the file DiBelloBN.R. This one piece of NIMBLE code is highly flexible: depending on how you configure your graph (i.e., which node types and compute rules you assign), it can express a standard Diagnostic Classification Model (DCM), an Item Response Theory (IRT) model, a Multidimensional IRT (MIRT) model, or a Higher-Order DCM (HO-DCM).

This vignette is written for readers who want to understand exactly what the model is doing mathematically. We will walk through:

  1. The general equation that governs every node in the network.
  2. The three condensation rules (DINA, DINO, DINM) that determine how parent information is combined.
  3. How the model specializes into IRT, MIRT, DCM, and HO-DCM depending on the graph topology.
  4. The semantic meaning of every estimated parameter.

1. The Big Picture: Everything is a Bayesian Network

A pgdcm model is a Directed Acyclic Graph (DAG) - a network of nodes connected by directional arrows, with no cycles. There are two kinds of nodes:

  • Attribute nodes (latent/unobserved): These represent skills, abilities, or competencies that we cannot directly see. For example, “Addition” or “Reading Comprehension.” We want to estimate whether each student has mastered these.
  • Task nodes (observed): These represent actual test items (questions). We can see whether each student answered correctly (1) or incorrectly (0).

The arrows (edges) in the graph encode dependency: if there is an arrow from Attribute $A$ to Task $T$, it means that mastery of $A$ influences the probability of answering $T$ correctly. If there is an arrow from Attribute $A_1$ to Attribute $A_2$, it means that $A_1$ is a prerequisite for $A_2$.

The question that every model must answer is: Given the pattern of correct/incorrect responses we observe, what is the most likely configuration of latent skills for each student?

Topological Ordering

Because the graph is acyclic, we can always arrange nodes in a topological order - an ordering where every parent appears before its children. The pgdcm package enforces this automatically via enforce_topo_sort(). This ordering is critical because it means we can process nodes from left to right: when we reach any node, all of its parents have already been defined.

The nodes are ordered as:

$$\underbrace{\alpha_1, \alpha_2, \ldots, \alpha_K}_{\text{Attribute nodes}} \;,\; \underbrace{X_1, X_2, \ldots, X_J}_{\text{Task nodes}}$$

where $K$ is the number of attributes and $J$ is the number of items.
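To make the ordering concrete, here is a generic topological sort (Kahn's algorithm) in plain Python. This is only an illustration of the idea; it is not the implementation behind `enforce_topo_sort()`:

```python
from collections import deque

def topo_sort(nodes, edges):
    """Return nodes ordered so that every parent appears before its children."""
    indeg = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indeg[child] += 1
    queue = deque(v for v in nodes if indeg[v] == 0)  # roots go first
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for c in children[v]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle - not a DAG")
    return order

# Attribute A1 is a prerequisite of A2; both feed task X1:
print(topo_sort(["A2", "A1", "X1"],
                [("A1", "A2"), ("A1", "X1"), ("A2", "X1")]))
# ['A1', 'A2', 'X1']
```

Once this order is fixed, the model can be built left to right: by the time any node is reached, all of its parents are already defined.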


2. The Universal Equation: Logistic Regression at Every Node

Every node in the network (whether an attribute or a task) follows the same fundamental equation. The probability that node $v$ “fires” (equals 1) for student $i$ is:

$$P(v_i = 1) = \text{logit}^{-1}\!\Big(\underbrace{a_v}_{\text{slope}} \cdot \underbrace{\psi_v(\text{parents of } v)}_{\text{condensation rule}} - \underbrace{b_v}_{\text{intercept}}\Big)$$

Let us unpack each piece:

2.1 The Logistic (Inverse-Logit) Function

The logistic function $\text{logit}^{-1}(x) = \frac{1}{1 + e^{-x}}$ transforms any real number into a probability between 0 and 1. It is the standard “link” function in logistic regression:

| Input $x$ | $\text{logit}^{-1}(x)$ |
|-----------|------------------------|
| $-5$      | $\approx 0.007$        |
| $-2$      | $\approx 0.12$         |
| $0$       | $0.50$                 |
| $+2$      | $\approx 0.88$         |
| $+5$      | $\approx 0.993$        |

When the input is large and positive, the probability approaches 1. When it is large and negative, the probability approaches 0. At $x = 0$, the probability is exactly $0.50$.
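The table above can be reproduced with a few lines of plain Python (shown here purely to illustrate the arithmetic; the actual model evaluates this inside NIMBLE):

```python
import math

def inv_logit(x):
    """logit^{-1}(x) = 1 / (1 + exp(-x)): maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5, -2, 0, 2, 5):
    print(f"{x:+d} -> {inv_logit(x):.3f}")
```

Note the symmetry: $\text{logit}^{-1}(-x) = 1 - \text{logit}^{-1}(x)$, which is why the table entries mirror each other around 0.50.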

2.2 The Slope ($a_v$): Discrimination

The slope (also called the discrimination parameter) controls how sharply the probability transitions from low to high as the condensed input changes. Think of it as the “sensitivity” of the node:

  • A large slope (e.g., $a_v = 5$) means the node is very good at discriminating between students who have the prerequisite skills and those who do not. The probability jumps rapidly from near 0 to near 1.
  • A small slope (e.g., $a_v = 0.5$) means the node is a weak discriminator - even students with all the right skills have only a moderately higher probability of success.

In the pgdcm code, item slopes are stored in `lambda[j, 1]` and attribute transition slopes are stored in `theta[k, 1]`. They are constrained to be positive via a truncated normal prior $\text{TN}(\mu, \sigma, 0, \infty)$, ensuring that more skill always means higher probability (never inverted).

2.3 The Intercept ($b_v$): Difficulty / Base-Rate

The intercept controls the baseline difficulty of the node - how hard it is to “pass” when the condensation rule ($\psi_v$) outputs exactly zero.

Specifically:

  • When $\psi_v = 0$, the probability resolves to $\text{logit}^{-1}(-b_v)$.
  • A large positive $b_v$ means the node is difficult: the baseline probability of success is low (it is hard to guess, or hard for an average student).
  • A large negative $b_v$ means the node is easy: the baseline probability of success is high.
  • At $b_v = 0$, the baseline probability is exactly 0.50.

In the code, item intercepts are `lambda[j, 2]` and attribute intercepts are `theta[k, 2]`.

Why the Minus Sign?

Notice the equation uses $a_v \cdot \psi_v - b_v$ and not $a_v \cdot \psi_v + b_v$. The minus sign is a psychometric convention that makes $b_v$ directly interpretable as difficulty: higher $b_v$ means a harder item. If we used a plus sign, a positive $b_v$ would make the item easier, which is less intuitive. You will see this convention in most IRT textbooks.

2.4 The Condensation Rule ($\psi_v$): Combining Parent Information

Each node has one or more parent nodes (the nodes with arrows pointing into it). The condensation rule $\psi_v$ takes all the parent values and collapses them into a single number that the logistic regression uses as input.

The condensation rule is computed by the calc_mixed_kernel() function. There are three options:

| Rule | Name | Compute Value | Cognitive Interpretation |
|------|------|---------------|--------------------------|
| DINA | Deterministic Input, Noisy “And” | `"dina"` | All-or-nothing: the student must have every required skill |
| DINO | Deterministic Input, Noisy “Or” | `"dino"` | Any-one-sufficient: having at least one skill is enough |
| DINM | Deterministic Input, Noisy “Mixed” | `"dinm"` | Proportional: each additional skill incrementally helps |

We detail each rule in the next section.


3. The Three Condensation Rules (Pure Discrete Case)

To build intuition, let us first consider the pure discrete case - where all parent attributes are binary (0 or 1). We will extend to continuous parents later.

Suppose item $j$ requires skills $\alpha_1$ and $\alpha_2$ (indicated by 1s in the corresponding Q-matrix row).

3.1 DINA: The Conjunctive (“And”) Gate

The DINA rule asks: Does the student possess ALL required skills?

$$\psi_j^{\text{DINA}} = \prod_{k \in \text{parents}(j)} \alpha_{ik}^{q_{jk}} = \begin{cases} 1 & \text{if the student has mastered every required skill} \\ 0 & \text{otherwise} \end{cases}$$

where $q_{jk}$ is the Q-matrix entry (1 if skill $k$ is required for item $j$, 0 otherwise) and $\alpha_{ik}$ is student $i$’s mastery state (0 or 1) on skill $k$.

Example: Item 5 requires skills $A$ and $B$.

| Student | $\alpha_A$ | $\alpha_B$ | $\psi^{\text{DINA}}$ | Interpretation |
|---------|------------|------------|----------------------|----------------|
| Alice   | 1 | 1 | 1 | Has both skills → high chance of success |
| Bob     | 1 | 0 | 0 | Missing skill $B$ → treated exactly like having no skills |
| Carol   | 0 | 0 | 0 | Missing both → low chance of success |

The DINA rule is strict: mastering 3 out of 4 required skills is treated identically to mastering 0 out of 4. There is no partial credit for partial mastery.

In the code, this is computed as:

gate <- (sum_disc == req_disc)   # Are all required discrete skills present?
val_dina <- gate                 # 1 if yes, 0 if no

With the full logistic equation, the probabilities become:

  • All skills present ($\psi = 1$): $P = \text{logit}^{-1}(a \cdot 1 - b) = \text{logit}^{-1}(a - b)$
  • Any skill missing ($\psi = 0$): $P = \text{logit}^{-1}(a \cdot 0 - b) = \text{logit}^{-1}(-b)$

The gap between these two probabilities is controlled by the slope $a$. A large slope creates a wide gap, meaning the item can clearly distinguish masters from non-masters.
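Numerically, the effect of the slope on this gap is easy to check. The values below are made up purely for illustration (they are not estimates from any real fit):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_correct(a, b, psi):
    """P = logit^{-1}(a * psi - b) for a DINA item (psi is 0 or 1)."""
    return inv_logit(a * psi - b)

# Hypothetical intercept b = 1 in both cases; only the slope changes.
for a in (0.5, 3.0):
    p_master = p_correct(a, 1.0, 1)      # all required skills present
    p_nonmaster = p_correct(a, 1.0, 0)   # any skill missing
    print(f"a={a}: masters {p_master:.2f}, "
          f"non-masters {p_nonmaster:.2f}, gap {p_master - p_nonmaster:.2f}")
```

With the weak slope the two groups are barely separated; with the strong slope the item becomes a sharp mastery detector.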

3.2 DINO: The Disjunctive (“Or”) Gate

The DINO rule asks: Does the student possess AT LEAST ONE required skill?

$$\psi_j^{\text{DINO}} = 1 - \prod_{k \in \text{parents}(j)} (1 - \alpha_{ik})^{q_{jk}} = \begin{cases} 1 & \text{if the student has mastered at least one required skill} \\ 0 & \text{if the student has mastered none of the required skills} \end{cases}$$

Example: Same item requiring skills $A$ and $B$:

| Student | $\alpha_A$ | $\alpha_B$ | $\psi^{\text{DINO}}$ | Interpretation |
|---------|------------|------------|----------------------|----------------|
| Alice   | 1 | 1 | 1 | Has both → high chance |
| Bob     | 1 | 0 | 1 | Has one → still high chance |
| Carol   | 0 | 0 | 0 | Has none → low chance |

DINO is lenient: having just one of the required skills is as good as having all of them. This models scenarios where multiple skills each independently provide a path to the correct answer.

In the code:

dino_gate <- 0.0
if (sum_disc > 0.0) {
    dino_gate <- 1.0        # At least one skill is present
}
val_dino <- dino_gate

3.3 DINM: The Compensatory (Proportional) Rule

The DINM rule asks: What fraction of the required skills does the student possess?

$$\psi_j^{\text{DINM}} = \frac{\sum_{k} q_{jk} \cdot \alpha_{ik}}{\sum_{k} q_{jk}}$$

This produces a value between 0 and 1, representing the proportion of required skills mastered.

Example: Same item requiring skills $A$ and $B$:

| Student | $\alpha_A$ | $\alpha_B$ | $\psi^{\text{DINM}}$ | Interpretation |
|---------|------------|------------|----------------------|----------------|
| Alice   | 1 | 1 | $2/2 = 1.0$ | Has all skills → highest probability |
| Bob     | 1 | 0 | $1/2 = 0.5$ | Has half → moderate probability |
| Carol   | 0 | 0 | $0/2 = 0.0$ | Has none → lowest probability |

DINM provides partial credit: each additional skill smoothly increases the probability of success. This is the most “forgiving” model in terms of rewarding partial knowledge.

In the code:

val_dinm <- sum_total / max(1, sum_input)
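Putting the three rules side by side, here is a plain-Python rendering of the pure discrete case. This sketch mirrors the logic described above, not the actual `calc_mixed_kernel()` NIMBLE function:

```python
def condense(rule, alpha, q):
    """Combine binary skills alpha with a Q-matrix row q under one rule."""
    # Keep only the skill values the Q-matrix marks as required for this item.
    required = [a for a, needed in zip(alpha, q) if needed == 1]
    if rule == "dina":    # all required skills present?
        return 1 if all(required) else 0
    if rule == "dino":    # at least one required skill present?
        return 1 if any(required) else 0
    if rule == "dinm":    # fraction of required skills mastered
        return sum(required) / max(1, len(required))
    raise ValueError(f"unknown rule: {rule}")

# Item requires skills A and B (q = [1, 1]); Bob has only A.
bob = [1, 0]
print(condense("dina", bob, [1, 1]))  # 0
print(condense("dino", bob, [1, 1]))  # 1
print(condense("dinm", bob, [1, 1]))  # 0.5
```

The three rules agree for students who master everything or nothing; they only disagree, as in Bob's case, on partial mastery patterns.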

4. Root Attributes: Base-Rates and Latent Abilities

Every DAG has root nodes - nodes with no incoming arrows (no parents). In a pgdcm model, these are the “foundational” attributes. How they are modeled depends on an important configuration flag: isContinuousHO.

4.1 Standard Discrete Roots (isContinuousHO = 0)

When root attributes are discrete, each one is treated as an independent Bernoulli variable with its own base-rate probability. The model estimates a single parameter $\beta_m$ for each root attribute $m$:

$$P(\alpha_{im} = 1) = \text{logit}^{-1}(-\beta_m)$$

Since this is a root node (no parents, no condensation rule), the probability depends only on the intercept $\beta_m$. This parameter represents the population-level difficulty of mastering this skill:

  • $\beta_m = 0$: the skill is mastered by about 50% of the population.
  • $\beta_m > 0$ (positive): the skill is mastered by fewer than 50% - it is a difficult skill.
  • $\beta_m < 0$ (negative): the skill is mastered by more than 50% - it is an easy/common skill.

In the code, this parameter is beta_root[m], with a normal prior:

beta_root[m] ~ dnorm(mean = beta_prior_mean[m, 1], sd = beta_prior_std[m, 1])

4.2 Continuous Roots (isContinuousHO = 1)

When root attributes are continuous, they represent latent abilities rather than binary mastery states. Each student’s ability on dimension $m$ is drawn from a standard normal distribution:

$$\alpha_{im} \sim \mathcal{N}(0, 1)$$

This is the standard identification constraint used in IRT: fixing the mean to 0 and the variance to 1 ensures the model is identifiable (otherwise, we could shift and scale the ability axis without changing any observable quantity). In this case, no beta_root parameter is estimated - the ability values themselves are directly estimated for each student.

Importantly, the rest of the model doesn’t change. When these continuous values flow into child nodes through the condensation rules, the logistic equation naturally adapts.


5. Case Studies: How the Graph Topology Creates Different Models

Now we show how simply changing the graph structure - without altering any model code - creates fundamentally different psychometric models.

5.1 Traditional DCM (Flat Q-Matrix)

Graph structure:

All attributes are roots (no arrows between attributes), and they point directly to the items. This is the classic Q-matrix structure.

Configuration: All attributes have compute = "dina" and isContinuousHO = 0.

What happens mathematically:

  1. Root attributes: Each skill $\alpha_k$ is an independent Bernoulli variable: $P(\alpha_{ik} = 1) = \text{logit}^{-1}(-\beta_k)$ for $k = 1, \ldots, K$.

  2. Items: There are no dependent attributes (no hierarchy). Each item uses the condensation rule directly: $P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \text{logit}^{-1}\!\left(\lambda_{j1} \cdot \psi_j(\boldsymbol{\alpha}_i) - \lambda_{j2}\right)$

The specific form of $\psi_j$ depends on which compute rule the item is assigned. For a flat DINA Q-matrix with items requiring skills $\{A_1, A_2\}$:

$$\psi_j^{\text{DINA}} = \alpha_{i,1} \cdot \alpha_{i,2}$$

Estimated parameters:

| Parameter | Symbol | Count | Meaning |
|-----------|--------|-------|---------|
| `beta_root[m]` | $\beta_m$ | $K$ | Population mastery rate of each root skill |
| `lambda[j, 1]` | $\lambda_{j1}$ | $J$ | Item discrimination (how separable masters and non-masters are) |
| `lambda[j, 2]` | $\lambda_{j2}$ | $J$ | Item difficulty (baseline challenge level) |
| `attributenodes[i, k]` | $\alpha_{ik}$ | $N \times K$ | Each student’s binary mastery state on each skill |

Note that theta parameters are absent in this flat case. The theta parameters only exist for dependent (non-root) attributes - nodes that have parent attributes feeding into them. Since all attributes here are independent roots, there are no attribute-to-attribute transitions to parameterize.

5.2 Hierarchical DCM / Bayesian Net DCM (Discrete Attribute Hierarchy)

Graph structure:

Some attributes depend on other attributes. For example, “Multiplication” might require “Addition” as a prerequisite. Here, $A_1$ is a root (no parents), while $A_2$ and $A_3$ are dependent attributes that receive input from $A_1$.

Configuration: All attributes have compute = "dina" and isContinuousHO = 0, but the graph contains edges between attributes.

What happens mathematically:

  1. Root attribute: $A_1$ is modeled exactly as before: $P(\alpha_{i1} = 1) = \text{logit}^{-1}(-\beta_1)$

  2. Dependent attributes: Each non-root attribute $A_k$ gets its own theta parameters. The condensation rule $\psi_k$ summarizes the parent skill values, and then a logistic regression determines mastery: $P(\alpha_{ik} = 1 \mid \text{parents}) = \text{logit}^{-1}(\theta_{k,1} \cdot \psi_k - \theta_{k,2})$

    Here:

    • $\theta_{k,1}$ (`theta[k, 1]`) is the transition slope - how strongly mastery of the parent skills influences whether the student masters skill $k$. A large value means that parent mastery is highly predictive of child mastery.
    • $\theta_{k,2}$ (`theta[k, 2]`) is the transition intercept - the baseline difficulty of mastering skill $k$, independent of parent skills. A large positive value means this skill is hard to acquire even when all parent skills are mastered.
  3. Items: Same as the flat case - items see the binary attribute values and apply the condensation rule + lambda parameters.

Estimated parameters:

| Parameter | Symbol | Count | Meaning |
|-----------|--------|-------|---------|
| `beta_root[m]` | $\beta_m$ | $K_{\text{root}}$ | Population mastery rate of each root skill |
| `theta[k, 1]` | $\theta_{k,1}$ | $K - K_{\text{root}}$ | How strongly parent skills predict mastery of dependent skill $k$ |
| `theta[k, 2]` | $\theta_{k,2}$ | $K - K_{\text{root}}$ | Baseline difficulty of acquiring dependent skill $k$ |
| `lambda[j, 1]` | $\lambda_{j1}$ | $J$ | Item discrimination |
| `lambda[j, 2]` | $\lambda_{j2}$ | $J$ | Item difficulty |
| `attributenodes[i, k]` | $\alpha_{ik}$ | $N \times K$ | Each student’s binary mastery state on each skill |

This is the case where theta parameters first appear - whenever attributes have parent attributes, the model needs parameters to describe those transitions.

5.3 Unidimensional IRT (Single Continuous Root)

Graph structure:

A single latent ability $\eta$ directly influences all items. There are no discrete skills.

Configuration: Single attribute with compute = "zscore" and thus isContinuousHO = 1.

What happens mathematically:

  1. Root attribute (ability): For each student $i$: $\eta_i \sim \mathcal{N}(0, 1)$

  2. Items: Each item receives $\eta_i$ as its only input through the condensation rule. Since there is one continuous parent and no discrete parents, the DINA condensation rule simplifies to: $\psi_j = \eta_i$ (The “gate” is open because there are no discrete skills to check.)

    The item response probability becomes: $P(X_{ij} = 1 \mid \eta_i) = \text{logit}^{-1}(\lambda_{j1} \cdot \eta_i - \lambda_{j2})$

This is exactly the Two-Parameter Logistic (2PL) IRT model:

$$P(X_{ij} = 1 \mid \eta_i) = \frac{1}{1 + \exp\!\big(-(a_j \cdot \eta_i - b_j)\big)}$$

where $a_j = \lambda_{j1}$ is the item discrimination and $b_j = \lambda_{j2}$ is the item difficulty.

Estimated parameters:

| Parameter | Symbol | Count | Meaning |
|-----------|--------|-------|---------|
| `lambda[j, 1]` | $a_j$ | $J$ | Item discrimination - how steeply the item’s probability curve rises with ability |
| `lambda[j, 2]` | $b_j$ | $J$ | Item difficulty - the ability level at which a student has a 50% chance of success (when $a_j = 1$) |
| `attributenodes[i, 1]` | $\eta_i$ | $N$ | Each student’s latent ability (a continuous real number, centered at 0) |

Interpreting the Difficulty Parameter $b_j$ in IRT

In the 2PL parameterization $P = \text{logit}^{-1}(a(\eta - b))$, the difficulty $b_j$ is the point on the ability scale where a student has exactly 50% probability of success. The pgdcm parameterization uses $a\eta - b$ rather than $a(\eta - b)$, so the “50% point” occurs at $\eta = b/a$ rather than exactly $\eta = b$. This is a common alternative parameterization found in many Bayesian IRT implementations.
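To recover the textbook difficulty from slope-intercept estimates, divide the intercept by the slope. The numbers below are hypothetical, chosen only to demonstrate the conversion:

```python
import math

def p_pgdcm(a, b, eta):
    """pgdcm-style 2PL slope-intercept form: logit^{-1}(a*eta - b)."""
    return 1.0 / (1.0 + math.exp(-(a * eta - b)))

a, b = 1.5, 0.9                 # hypothetical slope and intercept
b_classic = b / a               # textbook difficulty: the 50% point on the eta scale
print(round(b_classic, 4))      # 0.6
print(round(p_pgdcm(a, b, b_classic), 4))  # 0.5 at eta = b/a
```

Both parameterizations describe the same curve; they simply place the difficulty on different scales, so comparisons with published 2PL estimates should use $b/a$.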

No beta_root or theta parameters are estimated in IRT mode - there are no discrete roots (so no beta_root) and no dependent attributes (so no theta).

5.4 Multidimensional IRT (MIRT)

Graph structure:

Multiple continuous latent abilities, each influencing a subset of items.

Configuration: Multiple attributes with compute = "zscore", giving isContinuousHO = 1 and nrbetaroot > 1.

What happens mathematically:

  1. Root attributes (abilities): For each student $i$ and dimension $m$: $\eta_{im} \sim \mathcal{N}(0, 1)$

  2. Items: An item may depend on one or more ability dimensions. The condensation rule sums the contributions from all relevant continuous parents. For example, if item $j$ depends on both $\eta_1$ and $\eta_2$: $\psi_j = q_{j1} \cdot \eta_{i1} + q_{j2} \cdot \eta_{i2}$

    So: $P(X_{ij} = 1 \mid \boldsymbol{\eta}_i) = \text{logit}^{-1}\!\left(\lambda_{j1} \cdot \left(\sum_{m} q_{jm}\,\eta_{im}\right) - \lambda_{j2}\right)$

This produces a compensatory MIRT model where multiple dimensions contribute additively. Item discrimination $\lambda_{j1}$ scales the total ability input, while individual dimension loadings are encoded in the Q-matrix structure.
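The compensatory behavior is easy to see numerically: a deficit on one dimension can be offset by a surplus on another. A plain-Python sketch with made-up parameter values:

```python
import math

def p_mirt(lam1, lam2, q_row, eta):
    """Compensatory MIRT: logit^{-1}(lam1 * sum_m q_jm * eta_im - lam2)."""
    psi = sum(q * e for q, e in zip(q_row, eta))  # additive ability input
    return 1.0 / (1.0 + math.exp(-(lam1 * psi - lam2)))

# Hypothetical item loading on both dimensions (q = [1, 1]), lam1=1, lam2=0.
print(round(p_mirt(1.0, 0.0, [1, 1], [1.0, -1.0]), 3))  # psi = 0 -> 0.5
print(round(p_mirt(1.0, 0.0, [1, 1], [1.0,  1.0]), 3))  # psi = 2 -> 0.881
```

A student one standard deviation below average on dimension 2 but one above on dimension 1 lands exactly at the 50% point, which is the defining feature of a compensatory model.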

Estimated parameters:

| Parameter | Symbol | Count | Meaning |
|-----------|--------|-------|---------|
| `lambda[j, 1]` | $a_j$ | $J$ | Overall item discrimination (how strongly the item discriminates on the combined ability) |
| `lambda[j, 2]` | $b_j$ | $J$ | Item difficulty |
| `attributenodes[i, m]` | $\eta_{im}$ | $N \times M$ | Each student’s ability on each continuous dimension |

5.5 Higher-Order DCM (HO-DCM)

Graph structure:

A single continuous general ability $\eta$ feeds into discrete binary skills, which in turn determine item responses.

Configuration: Root attribute has compute = "zscore" (continuous), child attributes have compute = "dina" (discrete). This gives isContinuousHO = 1.

What happens mathematically:

This model type demonstrates the full mixed continuous-discrete capability of DiBelloBN. The network has three layers:

Layer 1 - General Ability (Root): $\eta_i \sim \mathcal{N}(0, 1)$

Layer 2 - Discrete Skills (Dependent Attributes):

Each dependent attribute $A_k$ depends on the general ability $\eta$ through a logistic regression. The condensation rule passes $\eta_i$ straight through:

$$\psi_k = \eta_i$$

The probability that student $i$ has mastered skill $k$:

$$P(\alpha_{ik} = 1 \mid \eta_i) = \text{logit}^{-1}(\theta_{k1} \cdot \eta_i - \theta_{k2})$$

where:

  • $\theta_{k1}$ (stored as `theta[k, 1]`) is the transition slope for skill $k$. It controls how sensitive skill mastery is to changes in general ability $\eta$. A large value means that small differences in general ability produce large differences in the probability of mastering this skill. A small value means the skill is only weakly related to general ability.
  • $\theta_{k2}$ (stored as `theta[k, 2]`) is the transition intercept (threshold) for skill $k$. It determines the baseline difficulty of acquiring the skill. A student with general ability $\eta_i$ has a 50% chance of mastering skill $k$ when $\theta_{k1} \cdot \eta_i = \theta_{k2}$, i.e., at the ability level $\eta_i = \theta_{k2} / \theta_{k1}$. A larger $\theta_{k2}$ means the student needs higher general ability to have a reasonable chance of mastering the skill.

Layer 3 - Items (Observed):

Item responses depend on the discrete skill states $\alpha_{ik}$. Since the roots are continuous but items may only see the discrete child attributes, the condensation rule at this level operates on purely discrete inputs (the binary skill values):

$$P(X_{ij} = 1 \mid \boldsymbol{\alpha}_i) = \text{logit}^{-1}(\lambda_{j1} \cdot \psi_j^{\text{DINA}}(\boldsymbol{\alpha}_i) - \lambda_{j2})$$

The `nrbetaroot` trick in the code (`isContinuousHO * nrbetaroot`) ensures that at the item level, the model correctly treats the skill values as discrete, even though $\eta$ was originally continuous. Only the roots themselves “know” they are continuous; the items only see the binary $\alpha_k$ values that resulted from thresholding $\eta$ through the skill-level logistic regressions.
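The three layers can be simulated end to end in a few lines. This is a generative sketch of the structure described above, written in plain Python with invented parameter values (it is not the pgdcm/NIMBLE model):

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_ho_dcm(theta, lam, Q, n_students=5, seed=1):
    """Simulate Layer 1 (ability) -> Layer 2 (skills) -> Layer 3 (DINA items)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_students):
        eta = rng.gauss(0, 1)                                # layer 1: ability
        alpha = [int(rng.random() < inv_logit(t1 * eta - t2))
                 for t1, t2 in theta]                        # layer 2: skills
        x = []
        for (l1, l2), q_row in zip(lam, Q):                  # layer 3: items
            gate = all(a for a, q in zip(alpha, q_row) if q)  # DINA gate
            x.append(int(rng.random() < inv_logit(l1 * gate - l2)))
        data.append((round(eta, 2), alpha, x))
    return data

# Two skills, two items; item j requires the skills flagged in Q[j].
theta = [(2.0, 0.0), (2.0, 1.0)]    # hypothetical transition slopes/intercepts
lam = [(3.0, 1.5), (3.0, 1.5)]      # hypothetical item slopes/intercepts
Q = [[1, 0], [1, 1]]
for eta, alpha, x in simulate_ho_dcm(theta, lam, Q):
    print(eta, alpha, x)
```

Because both skills load on the same $\eta$, simulated mastery states come out correlated across students - exactly the dependence a flat DCM cannot express.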

Estimated parameters:

| Parameter | Symbol | Count | Meaning |
|-----------|--------|-------|---------|
| `attributenodes[i, 1]` | $\eta_i$ | $N$ | General latent ability for each student |
| `theta[k, 1]` | $\theta_{k1}$ | $K-1$ | Transition slope: sensitivity of skill mastery to general ability |
| `theta[k, 2]` | $\theta_{k2}$ | $K-1$ | Transition intercept: difficulty threshold for acquiring the skill |
| `lambda[j, 1]` | $\lambda_{j1}$ | $J$ | Item discrimination |
| `lambda[j, 2]` | $\lambda_{j2}$ | $J$ | Item difficulty |
| `attributenodes[i, k]` | $\alpha_{ik}$ | $N \times (K-1)$ | Binary mastery states for each specific skill |

Why Higher-Order Models?

In a flat DCM, skills are estimated independently - the model does not know or care whether a student who masters “Addition” is also likely to master “Subtraction.” A higher-order model introduces explicit statistical dependence: skills that load heavily on general ability will be correlated in the posterior. This produces more realistic skill profiles and can improve estimation accuracy when skills are genuinely related.


6. The Gated Condensation Rule: Handling Mixed Parents

What happens when a node has both continuous and discrete parents? This arises naturally in a Bayesian network where a continuous root ($\eta$) and discrete attributes ($A_1, A_2$) all point to the same child node.

The calc_mixed_kernel() function handles this with a gating mechanism:

  1. Separate the parents into continuous parents (indices $\leq$ `nrbetaroot`) and discrete parents (indices $>$ `nrbetaroot`).
  2. Check the gate (specific to DINA or DINO).
  3. If the gate is open, pass the continuous signal through.
  4. If the gate is closed, apply a severe penalty ($-10$), which drives the logistic probability to near zero.

Gated DINA (Mixed Non-Compensatory)

$$\psi^{\text{Gated-DINA}} = \begin{cases} \underbrace{\sum_{m \leq M_{\text{root}}} q_{jm} \cdot \eta_{im}}_{\text{sum of continuous inputs}} & \text{if all required discrete skills are mastered} \\[8pt] -10 & \text{otherwise (gate closed)} \end{cases}$$

Semantic meaning: The student must first pass the discrete skills “gatekeeper.” Only if they possess every required binary skill does their continuous ability actually count. Otherwise, no amount of general ability can help - the gate is shut.

Gated DINO (Mixed Disjunctive)

$$\psi^{\text{Gated-DINO}} = \begin{cases} \sum_{m \leq M_{\text{root}}} q_{jm} \cdot \eta_{im} & \text{if at least one required discrete skill is mastered} \\[8pt] -10 & \text{otherwise (gate closed)} \end{cases}$$

Semantic meaning: The student needs any one of the required binary skills to unlock the gate. Once any skill is present, the continuous ability flows through.

DINM with Mixed Parents

The DINM rule does not use gating at all. It simply takes the weighted average across all parents (continuous and discrete alike):

$$\psi^{\text{DINM}} = \frac{\sum_k q_{jk} \cdot \text{parent}_k}{\sum_k q_{jk}}$$

This naturally handles the mix: continuous values contribute proportionally alongside discrete (0/1) values.

Why $-10$ as the Closed-Gate Penalty?

The value $-10$ is not arbitrary - it is chosen so that $\text{logit}^{-1}(a \cdot (-10) - b)$ is vanishingly small for any reasonable slope $a > 0$. For example, with $a = 1$ and $b = 0$: $\text{logit}^{-1}(-10) \approx 0.0000454$. This effectively forces the probability to zero without requiring special-case logic in the NIMBLE model code.
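A compact plain-Python sketch of the gating logic, standing in for the NIMBLE code (the $-10$ constant matches the one described above; all other values are illustrative):

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_kernel(rule, eta_cont, alpha_disc, q_cont, q_disc):
    """Gated DINA/DINO: the continuous signal passes only if the gate opens."""
    signal = sum(q * e for q, e in zip(q_cont, eta_cont))   # continuous sum
    required = [a for a, q in zip(alpha_disc, q_disc) if q]
    gate_open = all(required) if rule == "dina" else any(required)
    return signal if gate_open else -10.0                   # closed gate penalty

# Gate closed under DINA (one required skill missing): even eta = 2.5 is wasted.
psi = gated_kernel("dina", [2.5], [1, 0], [1], [1, 1])
print(psi)                        # -10.0
print(inv_logit(1.0 * psi - 0))   # ~4.54e-05: probability forced to near zero
# Under DINO the single mastered skill opens the gate and the ability flows through.
print(gated_kernel("dino", [2.5], [1, 0], [1], [1, 1]))  # 2.5
```

The same student is locked out under gated DINA but fully credited under gated DINO, which is the whole behavioral difference between the two gates.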


7. The Complete Prior Structure

Every unknown parameter in the model is given a prior distribution - a mathematical statement of our beliefs before seeing any data. The priors in DiBelloBN are:

Root Attribute Intercepts (Discrete Mode)

$$\beta_m \sim \mathcal{N}(\mu_{\beta_m}, \sigma_{\beta_m}) \quad \text{for } m = 1, \ldots, K_{\text{root}}$$

Default: $\mu = 0$, $\sigma = 2$. This is a relatively uninformative prior centered at a 50% mastery rate, allowing the data to speak.

Attribute Transition Parameters (Dependent Attributes)

$$\theta_{k,1} \sim \text{TN}(\mu_{\theta_{k,1}}, \sigma_{\theta_{k,1}}, 0, \infty) \quad \text{(slope, positive only)}$$

$$\theta_{k,2} \sim \mathcal{N}(\mu_{\theta_{k,2}}, \sigma_{\theta_{k,2}}) \quad \text{(intercept)}$$

The truncated normal $\text{TN}(\cdot, \cdot, 0, \infty)$ ensures slopes are positive - more prerequisite mastery should always increase (never decrease) the probability of mastering the dependent skill.

Item Parameters

$$\lambda_{j,1} \sim \text{TN}(\mu_{\lambda_{j,1}}, \sigma_{\lambda_{j,1}}, 0, \infty) \quad \text{(discrimination, positive only)}$$

$$\lambda_{j,2} \sim \mathcal{N}(\mu_{\lambda_{j,2}}, \sigma_{\lambda_{j,2}}) \quad \text{(difficulty)}$$

Same logic: discrimination must be positive so that skill mastery always benefits the student.

Customizing Priors

You can control all priors through the priors argument in build_model_config():

# Option 1: Set uniform priors across all parameters
config <- build_model_config(g, X, priors = list(
    beta = c(0, 2),      # mean, sd for all beta_root
    theta = c(0, 2),     # mean, sd for all theta (slope and intercept)
    lambda = c(0, 2)     # mean, sd for all lambda (slope and intercept)
))

# Option 2: Set per-parameter priors using matrices
config <- build_model_config(g, X, priors = list(
    lambda_mean = matrix(c(1, 0,    # Item 1: slope prior mean = 1, intercept prior mean = 0
                           1, 0,    # Item 2
                           1, 0),   # Item 3
                         nrow = 3, byrow = TRUE),
    lambda_std = matrix(0.0001, nrow = 3, ncol = 2)  # Very tight → locks parameters (scoring mode)
))

8. Summary: How One Model Becomes Many

The following table summarizes how the same DiBelloBN code adapts to different psychometric models purely through graph configuration:

| Model | Root Attributes | Root Compute | Dependent Attributes | Item Compute | Key Parameters |
|-------|-----------------|--------------|----------------------|--------------|----------------|
| Traditional DCM | $K$ discrete | `"dina"` | None | `"dina"` / `"dino"` / `"dinm"` | $\beta_m$, $\lambda_{j,1}$, $\lambda_{j,2}$ |
| Bayesian Net DCM | $\geq 1$ discrete | `"dina"` | $\geq 1$ discrete | `"dina"` / `"dino"` / `"dinm"` | $\beta_m$, $\theta_{k,1}$, $\theta_{k,2}$, $\lambda_{j,1}$, $\lambda_{j,2}$ |
| IRT (2PL) | 1 continuous | `"zscore"` | None | `"dina"` | $\lambda_{j,1}$, $\lambda_{j,2}$, $\eta_i$ |
| MIRT | $M$ continuous | `"zscore"` | None | `"dina"` | $\lambda_{j,1}$, $\lambda_{j,2}$, $\eta_{im}$ |
| HO-DCM | 1 continuous | `"zscore"` | $K$ discrete | `"dina"` / `"dino"` / `"dinm"` | $\theta_{k,1}$, $\theta_{k,2}$, $\lambda_{j,1}$, $\lambda_{j,2}$, $\eta_i$, $\alpha_{ik}$ |

The entire switching logic is governed by two things:

  1. Graph topology - which nodes are roots, which are dependent, and which are tasks.
  2. The compute attribute on each node - "dina", "dino", "dinm", or "zscore"/"continuous".

The build_model_config() function inspects these properties and generates the appropriate NIMBLE constants, initial values, and monitors. The model code itself never changes.


9. Appendix: Quick Reference - Parameter Mapping

When you run map_pgdcm_parameters() after estimation, pgdcm translates the internal NIMBLE parameter names into human-readable labels. Here is the mapping:

| NIMBLE Parameter | Readable Name | Type |
|------------------|---------------|------|
| `beta_root[m]` | `{Attribute_m} - Prior/Intercept` | Root Structure |
| `theta[k, 1]` | `{Attribute_k} - Dependency on Parent` | Attribute Structure (Slope) |
| `theta[k, 2]` | `{Attribute_k} - Intercept` | Attribute Structure (Threshold) |
| `lambda[j, 1]` | `{Task_j} - Slope` | Item Parameter (Discrimination) |
| `lambda[j, 2]` | `{Task_j} - Intercept` | Item Parameter (Difficulty) |
| `attributenodes[i, k]` | `{Student_i} - {Attribute_k}` | Student Mastery |