Dominique Louise

Harry Potter and the institutional conditions for process-oriented learning

Dominique Louise — Tue, 02 Jun 2026 19:54:34 GMT

In my previous writings, I’ve covered quite substantially what it takes at the data level to report reliably on learning processes alongside learning outcomes. But what does process-oriented learning mean in practice, when it has to survive contact with accreditation bodies, workload constraints, and student expectations shaped by years of output-oriented education? This conversation-starter article took me a while to narrow down, but I’m beyond excited to take you along in the narrative on AI and the process-change in Higher Ed.

And since I’m actually attending the HP and The Cursed Child musical this coming weekend, I was already in the mood.


The rise of AI in higher education may shift assessment from evaluating products to evaluating learning processes, which in turn requires institutions to develop data infrastructures capable of capturing, integrating, and analyzing process data as evidence of learning. To embed process-oriented learning in the core of higher education systems, three things would need to change simultaneously: assessment design, the data infrastructure that makes process evidence defensible, and the way students are inducted into a different model of what learning is for than the one they’re currently used to.

First, take today’s assessment culture. The NVAO Assessment Framework’s Standard 4 specifies that institutions demonstrate achievement of intended learning outcomes through results of tests, final projects, and performance of graduates in practice, but the evidence forms listed are deliberately broad: thesis, portfolio, professional product, interim exam, paper, artistic achievement, or a combination. The European Standards and Guidelines, which underpin NVAO’s framework, go even further: assessment should “reflect the learning process and concretise the intended learning outcomes,” a formulation that readily invites alignment between process documentation and outcome demonstration.

The harder constraint is that switching from essays to portfolios relocates rather than resolves the challenge of evaluating learning quality. Students are trained to optimize for summative assessment, and it’s likely that those who are already good at strategic surface learning will be good at performing reflection as well. For example, a portfolio filled with reflections written to match a rubric rather than to document genuine iteration isn’t process evidence, but output-oriented behavior in a different context. This is where the data infrastructure argument from my earlier article on process analysis, “What process-oriented analysis actually requires”, connects to assessment design.

Continuous digital trace data, captured through learning management systems during the learning process itself, is significantly harder to retroactively manufacture than a written reflection. Nicol and Macfarlane-Dick (2006) described the feedback architecture that makes process assessment meaningful as one requiring visibility into learning as it unfolds, not a summary of it after the fact. Building that visibility into the base-level data infrastructure therefore comes before the assessment design problem.

Then there are workload constraints: continuous feedback is expensive. Portfolio review requires more time than marking an exam, and compressing feedback cycles into regular intervals across a semester redistributes workload in ways that can’t be absorbed without planning. Research on hybrid human-AI feedback systems - Molenaar and colleagues have been developing this strand systematically since 2022, connecting it directly to self-regulated learning at scale - shows that AI-supported scaffolding during the learning process can take on a meaningful share of formative feedback while keeping human judgment at the consequential moments. That doesn’t eliminate the workload argument, but it changes the economics considerably.

Lastly, student behavior. The students arriving in Dutch universities have been in an assessment-heavy system since primary school, and that system has only been intensifying. Research tracking Dutch secondary students since 2001 shows the share experiencing significant pressure from schoolwork has nearly tripled over twenty years, reaching roughly 47% by 2021 (HBSC, 2021). Johannes Visser documented in his 2023 account of Dutch grading culture that secondary students sit an average of 102 assessed moments per year. The national secondary student association LAKS captured the resulting learning logic in three words: zweten, weten, vergeten (allow me to creatively translate this as “learn it, earn it, burn it”, meaning that students forget any learned content immediately after passing the test).

Students shaped by that experience arrive in higher education not disengaged but trained. They allocate attention to what’s assessed and treat unassessed process activities as an opportunity cost: a perfectly logical response to the system they came from. Process-oriented assessment asks them to revise that model but doesn’t automatically give them reason to trust the new one. Boud’s (2000) framing of sustainable assessment was precise on this: assessment that genuinely develops lifelong learning capacity requires students to build evaluative judgment about their own work, not just produce work that meets externally defined criteria. That capability develops through repeated cycles of self-assessment and feedback. It doesn’t appear because the module design changed.

The implication is that student onboarding into the assessment logic is a curriculum design task. Explicitly teaching self-regulatory strategies, scaffolding evaluative judgment in lower-stakes contexts before higher-stakes ones, being transparent about why process evidence has independent value: all of these need to be designed into the curriculum alongside the assessment rubrics. Van der Graaf and colleagues (2022) showed that the relationship between self-regulated learning processes and learning outcomes is neither straightforward nor uniform across students, which means the scaffolding can’t be treated as a one-time instructional moment. It requires the kind of longitudinal monitoring that, again, depends on adequate data infrastructure being in place.

None of this is beyond reach, and the Dutch system has more room for it than institutional culture tends to assume. The OER gives program managers extensive latitude in specifying how intended learning outcomes are assessed, and the NVAO framework, read carefully, is compatible with portfolio-based and process-reflective assessment architectures. The research base is solid, the technology for supporting feedback at scale exists and is improving, and student resistance to process-oriented assessment is documented as transitional in programs where the design is coherent and the institutional commitment is sustained.

What’s required is the organizational capacity to treat assessment redesign, data infrastructure, and student induction into evaluative practice as one integrated problem that program managers are positioned to address, but not from any single position in the institution alone. The question I’d like to dive deeper into: which Dutch institutions are currently building that capacity, and who’s paying attention to what they’re learning?

See you next time!

References

Boud, D. (2000). Sustainable assessment: rethinking assessment for the learning society. Studies in Continuing Education, 22(2), 151–167.

ESG (2015). Standards and Guidelines for Quality Assurance in the European Higher Education Area. Brussels: ENQA.

LAKS (2020). Toetscultuur op middelbare scholen. Amsterdam: Landelijk Aktie Komitee Scholieren.

Molenaar, I. (2022). Towards hybrid human-AI learning technologies. European Journal of Education, 57(4), 621–634.

Molenaar, I., de Mooij, S., Azevedo, R., et al. (2023). Measuring self-regulated learning and the role of AI: five years of research using multimodal multichannel data. Computers in Human Behavior, 139, 107540.

Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: a model and seven principles of good feedback practice. Studies in Higher Education, 31(2), 199–218.

NVAO (2018). Assessment Framework for the Higher Education Accreditation System of the Netherlands. The Hague: NVAO.

Stevens, G. W. J. M. et al. (2021). Gezondheid en welzijn van jongeren in Nederland: HBSC 2021. Utrecht: Universiteit Utrecht.

Visser, J. (2023). Is het voor een cijfer? Amsterdam: De Correspondent.

Van der Graaf, J., Lim, L., Fan, Y., Kilgour, J., Moore, J., Gašević, D., Bannert, M., & Molenaar, I. (2022). The dynamics between self-regulated learning and learning outcomes: an exploratory approach and implications. Metacognition and Learning, 17, 693–720.

When the analysis layer outpaces the data infrastructure

Dominique Louise — Thu, 14 May 2026 14:28:07 GMT

This article is the fourth and final part of a series on educational data infrastructure and AI-supported learning. The previous articles established which improvements to data infrastructure are required for universities to answer institutional research questions about process-oriented and AI-supported learning. We now build on the technical foundation by taking a governance perspective on how to scale this institutionally.

With years of experience as an institutional researcher, I’m noticing a change in how institutional education data is used. The arrival of AI-assisted analysis tools (large language models that can write database queries from plain-language descriptions, generate scripts from data tables, or help users without formal analytical training interpret datasets) is lowering the barrier to institutional data analysis in ways that are both genuinely useful and structurally consequential. Take a policy advisor who previously depended on a central data team to answer a question about student retention patterns and who can now (with sufficient access and the right tools) attempt that analysis independently. Or a programme director curious about course-level dropout rates who no longer needs to file a request and wait three weeks. This democratization of analysis is in many ways welcome. But the thing is, it’s happening faster than most data governance frameworks were designed to accommodate.

In a decentralized analysis landscape, the role of data definitions becomes more important. In a centralized model, a data team mediates between raw institutional data and the answers that decision-makers receive. That mediation creates - often invisibly for end users - definitional consistency: ensuring that “international student” means the same thing in a retention analysis as it does in an intake report, that “first year” is calculated from the same reference date across departments, that credits from exchange programmes are included or excluded on a consistent basis. When analysis moves outside that mediation layer, those definitions become assumptions made implicitly by whoever is running the analysis, and could be shaped by whatever the underlying data system happens to contain and however the querying tool happens to interpret it.
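
To make the point about implicit definitions concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the records, the field names, and the two competing definitions of “first year” are invented and do not describe any real institutional system. It only shows how two analysts working from the same records can report different headcounts without either of them making a calculation error.

```python
from datetime import date

# Hypothetical enrolment records; all field names and values are invented.
students = [
    {"id": "s1", "first_enrolment": date(2023, 9, 1), "credits_obtained": 0},
    {"id": "s2", "first_enrolment": date(2023, 2, 1), "credits_obtained": 30},
    {"id": "s3", "first_enrolment": date(2022, 9, 1), "credits_obtained": 45},
    {"id": "s4", "first_enrolment": date(2023, 9, 1), "credits_obtained": 15},
]

def first_year_by_date(student, reference=date(2023, 10, 1)):
    """Definition A: first enrolled fewer than 365 days before the reference date."""
    return (reference - student["first_enrolment"]).days < 365

def first_year_by_credits(student, threshold=60):
    """Definition B: has not yet obtained a full first year of credits."""
    return student["credits_obtained"] < threshold

# Same records, same label ("first-year students"), two different populations.
count_a = sum(first_year_by_date(s) for s in students)
count_b = sum(first_year_by_credits(s) for s in students)
print(count_a, count_b)
```

Neither definition is wrong in itself. The problem only appears when both numbers end up in reports under the same label, which is exactly what a shared, governed definition is meant to prevent.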

AI-assisted analysis scales this problem rather than solving it. A language model generating a query against a definitionally inconsistent dataset will produce a result that looks authoritative (i.e. formatted, labeled, often accompanied by a chart) but that may be comparing populations defined differently across faculties, drawing on fields populated inconsistently across systems, or missing sub-populations entirely because of how a filter was applied. The practice that has come to be called vibe coding in software development circles (i.e. generating functional analyses or applications through conversational AI prompting, often by users without deep technical training) shows exactly this pattern at the application level: outputs that run without error and return plausible-looking numbers, but that carry assumptions the author was unaware of making. The parallel in institutional data analysis is direct. A 2026 EDUCAUSE case study documenting one American university’s experience with AI deployment on institutional data captured this dynamic with the quote: “AI does not solve governance, it exposes it.” Institutions that deployed AI tools expecting them to make sense of fragmented, inconsistently defined data found instead that the fragmentation became visible at scale, in the hands of users who had no frame of reference for evaluating what they were seeing.

The three previous articles in this series have built toward a conclusion about what this means. The event log is the raw material for process-oriented educational analysis. The dimensional model is the semantic governance layer that makes that raw material interpretable consistently. Process mining is the analytical technique that operates on both to reveal actual student trajectories rather than assumed ones. Causal inference is the additional layer required before pattern observations can be treated as explanations. And real-time signals are the downstream possibility that becomes achievable once the foundational layers are stable. AI tools in the analysis layer make the distance between those layers and the end user shorter, but they do not make the layers optional. A well-governed dimensional model with clearly defined conformed dimensions (e.g. shared institutional definitions of what a student is, what an enrolment is, what an average grade consists of) is what makes AI-assisted analysis by non-experts reliable across the institution rather than locally plausible.

This creates a specific challenge for central data and institutional research teams. The traditional value proposition of those teams (i.e. access to data and the expertise to interpret it) is under pressure as access becomes easier and AI lowers the interpretive barrier for non-specialists. Teams that respond by positioning themselves as approval gates will find that pressure compounding, as the tools available to work around central teams continue to improve. The more durable response is a shift in value proposition: from being the team that produces analyses to being the team that makes decentralized analysis reliable. In practice, that means investing in data model documentation that is legible to non-technical users, in governed environments where AI-assisted querying operates within defined semantic boundaries, and in sustained engagement with the faculty and policy users.

Questions about data infrastructure (what event data is captured, how it is governed, what shared definitions apply across faculties, and how analytical layers are sequenced) are policy questions. So let’s get to what this means for institutional leadership and policy leads. As I stated in the first article (“Before AI can transform universities, fix the data”), the governance model that gave Dutch universities their autonomy also gave them the freedom to build incompatible systems. That was a coherent arrangement when the primary demands on those systems were standardized output measures for external reporting. The questions emerging from AI-supported and process-oriented learning are fundamentally different. They concern trajectories, sequences, and causal relationships, and they will increasingly be asked by staff working outside central data teams. Whether this shift will create enough shared urgency to reconsider the governance arrangements that produced today’s fragmentation remains uncertain. What is becoming increasingly difficult to ignore, however, is the growing cost of leaving that fragmentation unresolved.

This cost mechanism has two compounding components. The first is analytical debt: every analysis produced against a fragmented or definitionally inconsistent infrastructure becomes an institutional record (a report, a policy decision, a programme evaluation) built on undocumented assumptions. As analyses accumulate, they become reference points for subsequent ones, which inherit the original definitions without knowing it. The analytical layer builds on itself and inconsistencies compound, which becomes a structural problem when AI tools increase the volume of analysis. The second component is the infrastructure gap relative to the pace of pedagogical change. AI-supported and adaptive learning tools are entering higher education, generating new categories of data and new pressure to evaluate what they are doing. Every semester that passes without an event log infrastructure in place is a semester of timestamped interactions, assessment attempts, and engagement patterns that cannot be retrospectively reconstructed, even though they form the historical baseline against which future process models are calibrated and against which early warning signals become interpretable. These two components interact in a specific way: the arrival of AI analysis tools is creating demand for the kind of process-level institutional insight that depends on historical event data, at precisely the moment when that data is not being captured in usable form. Delaying the infrastructure decision allows the analytical debt to accumulate.

What the data can and cannot tell you about learning

Dominique Louise — Wed, 06 May 2026 18:44:10 GMT

This is the third article in a four-part series on educational data infrastructure and AI-supported learning. It builds on the infrastructure argument of the second article but is written to be accessible to readers without a deep data background. The focus is on what becomes analytically possible once the foundation is in place and where the honest limits lie.

Imagine that the data infrastructure described in the previous article exists: event logs are capturing granular, timestamped student interactions across systems and are governed by a dimensional model with shared definitions that hold consistently across faculties and analyses. We then arrive at the exciting part: determining which kinds of analytical questions we can and cannot answer. Acknowledging limitations here is important for building analytical capability that genuinely supports data-informed educational policy, rather than ending up in a situation in which we either produce confident-looking outputs that flatten important complexity and nuance, or build dashboards that don’t actually help with making decisions.

The analytical layer that sits most naturally above a well-structured educational event log is process mining. Process mining - founded in substantial part by Wil van der Aalst at Eindhoven University of Technology, which makes its Dutch roots worth noting - is a set of techniques for discovering, visualizing and analyzing processes directly from timestamped event data. Rather than asking the data to confirm a process model that analysts defined in advance, process mining constructs a model of the process as it actually occurred in the population, based on the sequences of events observed in the log. In an educational context, this means that instead of measuring how many students progressed through the stages a curriculum was designed around, process mining reveals the paths students actually took: which courses were taken out of the expected sequence, where the population clusters into meaningfully different trajectory types, and which points in the programme generate the strongest deviations from the norm.
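
As a rough illustration of what “constructing the model from the log” means, here is a minimal Python sketch. It assumes a tiny invented event log with hypothetical student IDs, dates and activity names. It counts trajectory variants and directly-follows pairs (how often one activity is immediately followed by another), which is a common starting representation in process discovery. Real process mining tooling does considerably more than this.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (student_id, ISO date, activity). All values are invented.
event_log = [
    ("s1", "2023-11-01", "pass Programming 1"),
    ("s1", "2024-02-01", "pass Statistics"),
    ("s1", "2024-06-15", "pass Programming 2"),
    ("s2", "2023-11-01", "fail Programming 1"),
    ("s2", "2024-01-20", "resit Programming 1"),
    ("s2", "2024-06-15", "pass Statistics"),
    ("s3", "2023-11-01", "pass Programming 1"),
    ("s3", "2024-02-01", "pass Statistics"),
    ("s3", "2024-06-15", "pass Programming 2"),
]

def build_traces(log):
    """Group events per student, ordered by timestamp (ISO dates sort as text)."""
    traces = defaultdict(list)
    for student, _timestamp, activity in sorted(log, key=lambda e: (e[0], e[1])):
        traces[student].append(activity)
    return dict(traces)

def directly_follows(traces):
    """Count how often activity A is immediately followed by activity B."""
    counts = Counter()
    for activities in traces.values():
        counts.update(zip(activities, activities[1:]))
    return counts

traces = build_traces(event_log)
variants = Counter(tuple(t) for t in traces.values())  # observed path types
dfg = directly_follows(traces)

for (a, b), n in dfg.most_common():
    print(f"{a} -> {b}: {n}")
```

Note that nothing in the sketch encodes the designed curriculum order. The paths and their frequencies come entirely from what is observed in the log.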

The distinction between the designed process and the observed process is important. It is the difference between evaluating a curriculum on its own terms and evaluating it on the terms of student behaviour. In my personal experience, most current institutional research projects focus on evaluating the designed process. We measure completion against the designed path and flag deviation as delay. Process mining takes the second perspective. Without assuming the designed path is the right reference point, it asks what the population actually did and identifies which patterns in the observed behaviour are associated with typical outcomes. Programme directors, study advisors and educational policy officers need to make that shift in orientation when they ask process questions. No amount of improvement to the current output-oriented reporting layer can produce this, because the question type requires different data and a different analytical approach.

Let’s take it one step further. A dimensional model can help you discover that students who resit a specific first-year course have a (s)lower graduation rate than those who do not. That is a useful diagnostic observation. What it cannot tell you, however, is whether that course is causing the downstream difficulty. It may be that students who struggle with it are moved into that path by prior characteristics such as secondary school background, programme preparation or other socioeconomic factors. These underlying patterns could have produced the same outcomes regardless of whether those students had followed that course, making it impossible to determine from the association alone whether a curriculum intervention targeting that course would change graduation rates at all. Process mining can add resolution here, by showing that students who resit the course and subsequently graduate follow a measurably different trajectory in the two semesters that follow the resit than those who do not, which gives programme directors a more detailed picture of where intervention might be productive. But it still cannot establish causality.

Determining causal relations requires ruling out alternative explanations for an observed association. In educational data - which is observational rather than experimental, meaning students are not randomly assigned to courses or trajectories - the standard threat to causal inference is confounding. This means that an outside factor (confounder) influences both the independent variable (e.g. student behavior) and dependent variable (e.g. graduation outcome), creating an apparent relationship between them that would disappear if the confounder were controlled for. An article in the Journal of Learning Analytics identifies three pitfalls that observational educational data is particularly susceptible to: confounding bias, overcontrol bias, and collider bias. Establishing credible causal claims from this data requires explicit reasoning about the assumed causal structure and analytical methods designed to isolate effects in observational settings. This is a distinct analytical layer from process mining, and it requires both the event log infrastructure and the dimensional governance layer to function reliably before it can produce results worth acting on.
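
A small worked example may help to show what “would disappear if the confounder were controlled for” looks like in numbers. The Python sketch below uses invented counts and a single hypothetical confounder (prior preparation). It compares graduation rates for students with and without a resit, first across the whole cohort and then within each preparation group. Stratifying like this only works under the assumption that the confounder is measured and that the assumed causal structure is right, which real data rarely guarantees.

```python
# Hypothetical counts: (preparation, had_resit) -> (students, graduates). Invented numbers.
cohort = {
    ("strong", False): (80, 64),
    ("strong", True): (20, 16),
    ("weak", False): (20, 8),
    ("weak", True): (80, 32),
}

def rate(resit, preparation=None):
    """Graduation rate for a resit group, optionally within one preparation stratum."""
    cells = [
        counts
        for (prep, had_resit), counts in cohort.items()
        if had_resit == resit and (preparation is None or prep == preparation)
    ]
    students = sum(n for n, _ in cells)
    graduates = sum(g for _, g in cells)
    return graduates / students

# Naive comparison: ignores preparation entirely.
naive_gap = rate(False) - rate(True)

# Stratified comparison: the same contrast within each preparation group.
stratified_gaps = {
    prep: rate(False, prep) - rate(True, prep) for prep in ("strong", "weak")
}

print(f"naive gap: {naive_gap:.2f}")
for prep, gap in stratified_gaps.items():
    print(f"gap within {prep} preparation: {gap:.2f}")
```

In this constructed cohort the resit group graduates far less often overall, yet within each preparation group the resit makes no difference at all. The apparent effect comes from weakly prepared students being more likely to resit and, independently, less likely to graduate.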

To match the analytical questions to what the data can actually help answer, it’s important to be precise about what each layer enables. Descriptive and diagnostic questions (e.g. what happened, where did it happen, in which populations, in what sequence) are answerable from a well-constructed event log with a dimensional governance layer. Process discovery questions (e.g. what paths actually exist, which deviate from the norm, where the bottlenecks are) require process mining above that foundation. Causal questions (e.g. did this intervention work, does this course cause this outcome) require additional methods and explicit causal reasoning above that again. Presenting descriptive outputs as if they indicate answers to causal questions is one of the most common analytical errors in institutional research, and it is one that better infrastructure makes easier to avoid, precisely because the different layers become distinguishable.

The final layer worth naming, because it is the one most frequently invoked in the context of AI-supported learning, is near-real-time signal generation: the idea that the analytical system could detect, while a semester is still ongoing, that a student’s engagement pattern is beginning to resemble the pattern historically associated with disengagement or dropout, and surface that signal to a study advisor in time to act on it. This is technically achievable, but it requires the event infrastructure to operate at a different time scale than the batch processing that dimensional models typically use. Near-real-time signals require a streaming architecture: a system that routes events as they are generated, evaluates them continuously against models derived from historical data, and produces outputs without waiting for a nightly or weekly data load. The dimensional model and the historical process models derived from it are inputs to this system, not substitutes for it. Building it is a subsequent step, dependent on the foundational layers being stable and well-governed first. And although technically feasible, the empirical track record of early warning systems is mixed. Studies examining real-world deployments consistently find that effectiveness depends less on the predictive model than on the intervention infrastructure surrounding it: whether advisors act on signals, how consistently they do so, and whether the support offered once a student is flagged is adequate to address the underlying difficulty.
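
To show the difference in time scale rather than the technology, here is a deliberately simplified Python sketch. A generator stands in for a streaming platform, the event types are invented, and the threshold is a hypothetical placeholder for whatever a model fitted on historical data would supply. The only point is that each event is evaluated the moment it arrives, instead of after a nightly or weekly load.

```python
from collections import defaultdict

# Hypothetical threshold; in practice this would be derived from historical process models.
MISSED_DEADLINES_THRESHOLD = 3

def signal_stream(events, threshold=MISSED_DEADLINES_THRESHOLD):
    """Yield (student_id, week) as soon as a run of missed deadlines reaches the threshold."""
    consecutive_missed = defaultdict(int)
    for student_id, week, event_type in events:  # events are consumed one at a time
        if event_type == "deadline_missed":
            consecutive_missed[student_id] += 1
            if consecutive_missed[student_id] == threshold:
                yield student_id, week
        elif event_type == "submission":
            consecutive_missed[student_id] = 0  # any submission resets the run

# Invented events in arrival order: (student_id, week, event_type).
incoming = [
    ("s1", 1, "submission"),
    ("s2", 1, "deadline_missed"),
    ("s2", 2, "deadline_missed"),
    ("s1", 2, "deadline_missed"),
    ("s2", 3, "deadline_missed"),
    ("s1", 3, "submission"),
]

signals = list(signal_stream(iter(incoming)))
print(signals)
```

A rule this crude would produce plenty of false alarms in practice. The sketch is about the architecture, not about what a good signal looks like.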

[Drafting Dominique wanted to dive deeper in early warning systems right away, but editing Dominique decided to dedicate a separate article to this topic later.]

What I have tried to establish across these two somewhat technical articles is that the infrastructure question is sequenced. An event log without dimensional governance won’t enable cross-faculty (or cross-institutional) comparison and longitudinal consistency. A dimensional model without an event log foundation cannot answer process questions. Process mining without causal reasoning produces patterns that can be erroneously interpreted as explanations. And real-time signals without stable historical models to compare against produce noise rather than insight. The question for Dutch higher education institutions is how to sequence and govern the investment in these layers and, no less important, how to do so in a way that instills and preserves data ethics, student consent and algorithmic fairness. That is what the fourth article in this series addresses. See you next week!

AI and the process turn in education

Dominique Louise — Sat, 02 May 2026 19:47:05 GMT

There’s a claim gaining traction in higher education circles: that AI tools are fundamentally shifting the orientation of learning away from final products and toward the process of learning itself. It’s an attractive idea, but before institutions restructure assessment and evaluation policies around it, it deserves careful scrutiny from both directions.

The case for the claim

The most straightforward argument starts with scarcity. When any student can generate a competent essay in minutes, the essay loses its function as a reliable signal of learning. Educators are responding - either by design or necessity - by placing greater weight on process artifacts: annotated drafts, reflection logs, iterative peer review, oral defenses. The output hasn’t disappeared, but its primacy is eroding.

There’s a second, more pedagogically interesting dimension. AI tools make the learning process legible in ways previous tools didn’t. When a student iterates with an AI (e.g. refining a prompt, interrogating a generated argument, asking for counterexamples), that conversational trace is itself a record of reasoning. Educators can now ask not just what did you produce, but how did you arrive at it. That’s a genuine shift in what counts as evidence of learning. (see more at https://www.unesco.org/en/articles/whats-worth-measuring-future-assessment-ai-age and https://learning.northeastern.edu/ai-assessment/)

This also aligns with longstanding ideals in learning science. Constructivist and Vygotskian frameworks have argued for decades that learning happens through supported dialogue and iteration, not through the production of polished artifacts. The problem was scalability: one instructor can’t provide individualized Socratic dialogue to two hundred students. AI doesn’t fully solve this, but a patient, always-available thinking partner for every student is no longer a fantasy. (see also https://www.sciencedirect.com/science/article/pii/S2949882126000149)


The case against (or at least, for caution)

The critical weakness in the argument is that students can outsource the process just as easily as the output. A submitted conversation log doesn’t prove meaningful engagement. A reflective journal can be generated. The process, like the essay before it, becomes gameable the moment it becomes assessable. Learning only happens when the process is genuinely internalized, and there’s no structural mechanism that guarantees this. (read more at https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2024.1499495/full)

More fundamentally, AI may be reinforcing output-orientation in a new register rather than displacing it. If assessment frameworks remain unchanged, then AI simply becomes a faster, cheaper tool for meeting the same old demands. The transformation of assessment culture (e.g. what institutions reward, how they define learning, what counts as evidence of competence) is a separate and much slower institutional project.

On the teacher side, the argument rests on an assumption that hasn’t been empirically verified: that AI frees up instructor time, and that instructors reinvest that time in richer formative feedback. In practice, AI tools frequently add work rather than substituting for it: new systems to learn, outputs to verify, academic integrity questions to navigate. The time dividend is not yet a documented pattern.

There’s also an equity dimension that deserves attention. Process-oriented pedagogy tends to work best for students who already possess metacognitive skills (i.e. who know how to reflect, iterate, and self-regulate). Evidence suggests that high-metacognition students use AI to deepen their engagement, while lower-metacognition students use it to circumvent it. The assumption that AI democratizes process-oriented learning is, at best, unproven (read more via https://www.mdpi.com/2227-7102/15/5/637).

Finally, and perhaps most practically: the institutional structures of higher education remain largely output-oriented. A shift to process-orientation in higher education would require changes in how programs are accredited, how degrees signal competence to employers, and how learning outcomes are defined and monitored. None of these are moving at the speed of AI adoption. The result is a structural mismatch: process-oriented pedagogy embedded inside output-oriented institutional machinery, which creates friction rather than transformation.

A more defensible version of the claim

This means the causal mechanism is more conditional than the popular framing suggests. A more precise version of our original claim might be:

AI creates the conditions and the incentive for a shift from output- to process-oriented education. Whether that shift materializes depends on whether institutions redesign their assessment cultures.

I’m writing a series of articles (of which the first two are up on my page) on what it actually takes for higher education institutions to report reliably on learning processes alongside learning outcomes. What data infrastructure, assessment redesign, and policy frameworks does that require? And what does “process-oriented” mean in practice, when it has to survive contact with accreditation bodies, workload constraints, and student expectations shaped by years of output-oriented schooling?

Stay tuned for this coming Wednesday’s post on process analysis - what it can reveal, what it cannot, and what causal questions it leaves open.

What process-oriented analysis actually requires

Dominique Louise — Tue, 28 Apr 2026 08:31:27 GMT

Before proposing a solution to the infrastructure problem described in the first article of this series, it is worth making explicit what the problem actually is at the data level. The argument I made - that current university data systems are “not built for process questions” - is intuitive, but imprecise. To arrive at a solution for the structural cause, rather than an intervention that addresses only the symptoms, let’s take a closer look at what exactly a process-oriented question requires from data that an output-oriented question does not.

“How many students enrolled in the first year of Computer Science in 2023?” is an output question. It requires a count over a defined population at a defined point in time. “Which course combinations in the first year are associated with delayed graduation, and at what point in the academic calendar do students following those paths begin to diverge from those who graduate on time?” is a process question. It requires tracking individual students across time, through a sequence of states, with enough granularity to detect when and how their trajectories change. These are not variations of the same query. They require fundamentally different data structures to answer reliably.
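
To make the contrast concrete, here is a minimal Python sketch. The records, student IDs and event names are invented for illustration and do not reflect any real system’s schema. The first function answers a count; the second needs each student’s events in the order they happened.

```python
from datetime import date

# Hypothetical records: (student_id, programme, event, date). Invented for illustration.
records = [
    ("s1", "CS", "enrolled", date(2023, 9, 1)),
    ("s1", "CS", "passed Calculus", date(2024, 1, 30)),
    ("s2", "CS", "enrolled", date(2023, 9, 1)),
    ("s2", "CS", "failed Calculus", date(2024, 1, 30)),
    ("s2", "CS", "passed Calculus", date(2024, 7, 5)),
    ("s3", "Math", "enrolled", date(2023, 9, 1)),
]

def count_enrolled(rows, programme, year):
    """Output question: a count over a defined population at a defined time."""
    return len({s for s, p, e, d in rows
                if p == programme and e == "enrolled" and d.year == year})

def trajectories(rows, programme):
    """Process question: each student's events, in the order they happened."""
    paths = {}
    for student, p, event, when in sorted(rows, key=lambda r: r[3]):
        if p == programme:
            paths.setdefault(student, []).append(event)
    return paths

print(count_enrolled(records, "CS", 2023))
print(trajectories(records, "CS"))
```

The count never looks at order or timing. The trajectory function cannot work without them, which is the difference in data requirements the rest of this article is about.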

What current systems are built to produce

Most university administrative data systems organize information in one of two ways. Transactional systems record discrete events as they occur: a student registers for a programme, course grades are entered, a programme switch is processed. Periodic snapshot systems aggregate those events into a summary state at fixed intervals, typically at the end of an academic year: credits obtained, weighted average grade, enrollment status. Both patterns are optimized for output accounting. They efficiently and accurately answer the funding questions that have dominated institutional reporting in the Netherlands since the 1980s (i.e. how many, at what rate).

What neither pattern preserves is the sequence. Once a transactional event has been recorded and aggregated into a periodic snapshot, the order in which events occurred, the time elapsed between them, and the paths individual students took through the system are not retained in a form that is analytically accessible. The data is there, in principle, dispersed across systems. Reconstructing a student’s trajectory from it requires extracting records from multiple sources, resolving different identifiers, aligning different time references, and making decisions about which events count as milestones and which do not. That reconstruction is essentially a custom data engineering project, which is why answering a process question in most Dutch universities currently takes weeks, and the answer still arrives with caveats.

The raw material: event logs

The foundation for any process-oriented analysis is an event log. An event log, at its simplest, is an ordered record of timestamped events associated with a case. In an educational context, the case is typically a student or a student-programme combination. Each row represents one event: a course registration, an assessment attempt, a grade entry, a login to the learning management system, a contact moment with a study advisor. The three required elements are a case identifier, an activity label, and a timestamp. Everything else is context.
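
As a rough illustration of that structure, the sketch below models one event as a small Python record with the three required elements plus a free-form context. The class name, field names and sample values are my own assumptions for the example, not a schema taken from any actual university system.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class Event:
    case_id: str         # a student or student-programme combination
    activity: str        # what happened
    timestamp: datetime  # when it happened
    context: dict = field(default_factory=dict)  # everything else

def as_cases(log):
    """Group events per case, each case ordered by timestamp."""
    cases = {}
    for ev in sorted(log, key=lambda e: e.timestamp):
        cases.setdefault(ev.case_id, []).append(ev)
    return cases

# Invented sample rows, deliberately stored out of order.
log = [
    Event("s1:CS", "grade entry", datetime(2024, 1, 30, 16, 0),
          {"course": "Calculus", "grade": 7.5}),
    Event("s1:CS", "course registration", datetime(2023, 9, 4, 9, 15),
          {"course": "Calculus"}),
    Event("s1:CS", "assessment attempt", datetime(2024, 1, 22, 13, 30),
          {"course": "Calculus"}),
]

for ev in as_cases(log)["s1:CS"]:
    print(ev.timestamp.date(), ev.activity)
```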

The critical property of an event log is that it is append-only. New events are added; old events are not overwritten. This preserves the sequence of what happened, which is precisely what periodic snapshot systems discard when they aggregate. Event log data is often generated as a byproduct of existing systems: every interaction with Brightspace or Canvas, every grade entry in Osiris, and every study advisor contact at faculty level produces a timestamped record. But that data is not systematically captured, stored in a form that links cases across systems, or made available for analysis. The raw material for process-oriented analysis exists, but in most cases the infrastructure to retain and connect it does not yet.
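
A small sketch of what "append-only" buys you, under invented data: two hypothetical students with the same events in a different order produce an identical year-end summary, while the log still tells them apart. The class and function names are illustrative assumptions, not an existing library.

```python
from datetime import date

class AppendOnlyLog:
    """Events can be added and read, never changed or removed."""

    def __init__(self):
        self._events = []

    def append(self, case_id, activity, when):
        self._events.append((case_id, activity, when))

    def events(self, case_id):
        # Returns a new sorted list, so callers cannot alter the stored history.
        return sorted((e for e in self._events if e[0] == case_id),
                      key=lambda e: e[2])

def snapshot(events):
    """Year-end style summary: counts per activity, order discarded."""
    summary = {}
    for _, activity, _ in events:
        summary[activity] = summary.get(activity, 0) + 1
    return summary

log = AppendOnlyLog()
log.append("s1", "exam failed", date(2024, 1, 30))
log.append("s1", "exam passed", date(2024, 7, 5))
log.append("s2", "exam passed", date(2024, 1, 30))
log.append("s2", "exam failed", date(2024, 7, 5))

same_snapshot = snapshot(log.events("s1")) == snapshot(log.events("s2"))
same_sequence = ([a for _, a, _ in log.events("s1")]
                 == [a for _, a, _ in log.events("s2")])
print(same_snapshot, same_sequence)
```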

There is an additional complication. Even where event log data is captured, it tends to be captured separately by each system that produces it. LMS events live in the LMS. Assessment events live in the student information system. Advisor contact lives in a CRM or, frequently, in email. Connecting these into a coherent per-student event sequence requires resolving the same definitional and identifier problems the first article described at the sector level, now applied within a single institution across its own systems. The challenge is not only storage, but integration.
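
The integration step can be sketched as follows. Everything here is an assumption made for the example: the identifier map, the usernames and student numbers, and the idea that one source logs in UTC while the other logs in a fixed UTC+1 local time. Real systems will differ, and daylight saving time alone makes the real alignment harder than this.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical identifier map: LMS username -> student number used by the SIS.
LMS_TO_STUDENT_NUMBER = {"jdoe": "1001"}

# Assumed fixed offset for the local-time source (winter dates only, for simplicity).
LOCAL = timezone(timedelta(hours=1))

lms_events = [  # (username, activity, UTC timestamp)
    ("jdoe", "lms login", datetime(2024, 1, 22, 8, 5, tzinfo=timezone.utc)),
    ("guest42", "lms login", datetime(2024, 1, 23, 8, 5, tzinfo=timezone.utc)),
]
sis_events = [  # (student number, activity, local timestamp)
    ("1001", "grade entry", datetime(2024, 1, 30, 16, 0, tzinfo=LOCAL)),
    ("1001", "course registration", datetime(2023, 11, 4, 9, 15, tzinfo=LOCAL)),
]

def integrate(lms, sis, id_map):
    """Merge both sources into one per-student sequence on a shared clock."""
    merged = []
    for username, activity, ts in lms:
        student = id_map.get(username)
        if student is None:
            continue  # unresolved identifiers need a governance decision, not a guess
        merged.append((student, activity, ts.astimezone(timezone.utc), "LMS"))
    for student, activity, ts in sis:
        merged.append((student, activity, ts.astimezone(timezone.utc), "SIS"))
    return sorted(merged, key=lambda e: (e[0], e[2]))

for student, activity, ts, source in integrate(lms_events, sis_events,
                                               LMS_TO_STUDENT_NUMBER):
    print(student, ts.isoformat(), source, activity)
```

Note that the sketch silently skips the event it cannot link. Whether to drop, quarantine or manually resolve such records is exactly the kind of decision that has to be made once, institution-wide, rather than per analysis.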

Why event logs alone are not sufficient

A naive reading of the above might suggest that the solution is simply to capture and connect the event logs: to build the pipeline that extracts, links, and stores timestamped student events in a single place, and then run analyses against that. This is necessary but not sufficient, for a reason that becomes clear when you consider what happens when multiple analysts, from different faculties, at different moments in time, run different analyses against the same event data without shared definitions governing how that data is interpreted.

To use a term from dimensional data warehousing (i.e. the discipline for organizing analytical data systems developed by Ralph Kimball), a conformed dimension is a shared, governed definition of a core entity that carries the same meaning across analyses. What counts as an enrolled student? Does a programme switch reset the clock? Is a failed assessment attempt that was subsequently passed an event in the graduation trajectory or not? These are governance questions that have to be resolved before any analysis runs, and the answers have to be consistent across every analysis that uses the same entities, or the results cannot be compared.
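
A toy illustration of why those answers have to be shared: the sketch below puts two such definitions in one governed place and shows how a faculty-local override changes the result for the same student. The dictionary keys, the 5.5 threshold and the sample enrolments are assumptions chosen for the example, not actual institutional rules.

```python
from datetime import date

# Hypothetical governed definitions, agreed once and reused by every analysis.
CONFORMED = {
    "programme_switch_resets_clock": False,
    "passing_grade": 5.5,  # illustrative threshold
}

def study_start(enrolments, definitions=CONFORMED):
    """enrolments: list of (programme, start_date), in any order."""
    ordered = sorted(enrolments, key=lambda e: e[1])
    if definitions["programme_switch_resets_clock"]:
        return ordered[-1][1]  # clock restarts at the most recent programme
    return ordered[0][1]       # clock runs from the first enrolment

def is_sufficient(grade, definitions=CONFORMED):
    return grade >= definitions["passing_grade"]

# Invented student who switched programme halfway through the first year.
enrolments = [("Math", date(2024, 2, 1)), ("CS", date(2023, 9, 1))]

shared = study_start(enrolments)
local = study_start(enrolments, {**CONFORMED, "programme_switch_resets_clock": True})
print(shared, local)
```

Both answers are defensible on their own. The problem only appears when one faculty reports time-to-graduation with the first definition and another with the second, and the two numbers end up in the same table.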

Process mining has a substantial body of academic literature and a growing set of applied tools, so the reason it has not scaled in Dutch higher education isn’t really technical. The thing is: there are few conformed dimensions anchoring the event data to shared institutional definitions. Every analysis runs against event logs in which “sufficient course result” and “exam attempts” mean subtly different things depending on which system produced the events and which faculty defined the terms. The patterns that process mining discovers under these conditions are real patterns in the data, but they are not reliably patterns in the educational reality the data is supposed to represent.

This is the infrastructure precondition that dimensional modeling addresses. A dimensional model, sitting as a governed consumption layer above the event log, ensures that the raw event data is interpreted consistently before analysis begins. It does not replace process mining, but it creates the conditions under which process mining produces results that can be trusted, compared across faculties, and acted upon at the institutional level. The event log is the raw material. The dimensional model is the semantic layer that governs how that material is read. The process analysis is what happens above both.
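
The three layers can be sketched in a few lines. For the analysis layer I use directly-follows counts, one of the simplest building blocks in process mining, chosen here purely for illustration; the source labels, the mapping and the data are all invented.

```python
from collections import Counter

# Layer 1: raw event log (case_id, source-specific label, ISO date). Invented data.
raw_log = [
    ("s1", "GRADE_FAIL", "2024-01-30"),
    ("s1", "resit passed", "2024-07-05"),
    ("s2", "exam: insufficient", "2024-01-30"),
    ("s2", "GRADE_PASS", "2024-07-05"),
]

# Layer 2: governed mapping from source labels to conformed activities (hypothetical).
CONFORMED_ACTIVITY = {
    "GRADE_FAIL": "attempt failed",
    "exam: insufficient": "attempt failed",
    "GRADE_PASS": "attempt passed",
    "resit passed": "attempt passed",
}

def directly_follows(log, mapping):
    """Layer 3: count how often activity a is directly followed by b within a case."""
    traces = {}
    # ISO date strings sort chronologically, so plain string sorting is enough here.
    for case_id, label, _ in sorted(log, key=lambda e: (e[0], e[2])):
        traces.setdefault(case_id, []).append(mapping[label])
    pairs = Counter()
    for trace in traces.values():
        pairs.update(zip(trace, trace[1:]))
    return pairs

ungoverned = directly_follows(raw_log, {label: label for _, label, _ in raw_log})
governed = directly_follows(raw_log, CONFORMED_ACTIVITY)
print(ungoverned)
print(governed)
```

Read through the raw labels, the two students appear to follow two unrelated paths. Read through the conformed layer, they follow the same one: a failed attempt followed by a passed one.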

That process analysis - what it can reveal, what it cannot, and what causal questions it leaves open - is the subject of the third article in this series, arriving next Wednesday. But first, I’ll take a brief side step this coming Sunday and take you along in a discussion around output- versus process-oriented learning. Stay tuned!

Before AI can transform universities, fix the data

Dominique Louise — Tue, 21 Apr 2026 14:48:17 GMT

“How many students can we expect at programme level based on preliminary enrollments?”, “Would changing admission requirements impact students’ success in their first year at uni?”, “How do students move through Bachelor programmes, and which courses are retaken more often or cause delays?” These are all relevant questions for universities to ask, and it is even more relevant that they can answer them.

Reporting on cross-faculty educational topics such as student enrollment, success and retention is more difficult than it seems. Differences in definitions, fragmented data ownership and rigid data infrastructures cause delays, even at purely descriptive levels of analysis. Requiring all Dutch universities to report simultaneously on trends and patterns in, say, international enrolment becomes a near-impossible task: providing data for a nationwide request takes multiple weeks, and the result still comes with caveats.

Having worked directly inside this problem, I recognize it as a surface symptom of a more structural cause. It derives from a governance system in which the Ministry of Education, Culture and Science (OCW in Dutch) standardized the reporting layer in data systems like DUO, CROHO and Studielink while leaving everything upstream (e.g. how learning is organized, measured, and tracked internally) to institutional autonomy. When all 14 universities fill in an Excel template provided by Universities of the Netherlands (the national representative body - UNL), they pull from 14 different underlying systems with 14 possible interpretations of the topic at hand. Any definitional conflicts are a logical outcome of a system that deliberately avoided standardizing the layer where those definitions are made. In summary: the governance model that gave institutions their autonomy also gave them the freedom to build incompatible systems.

In a landscape subject to political shifts and rapid technological developments, higher education institutions are navigating how to measure and evaluate their contribution to society. It is in line with their social responsibility to stay on top of their own student and learning data, in spite of technical obstacles, and to know the patterns in it thoroughly. Most current data structures are built to answer one kind of question: how many students enrolled, how many credits obtained, how many degrees conferred. From a funding perspective, those were (and still are) the questions that mattered. But as educational developments push institutions further along the data analytics maturity curve towards prescriptive analytics and process-oriented and student-based learning, I think it’s quite timely to signal a mismatch between what we now ask of data and what its infrastructure was originally designed for. Because current systems were built for output-oriented learning, they are structurally unprepared for the process questions that AI-supported learning will require.

If Dutch universities are serious about facilitating process-oriented and adaptive learning, the data infrastructure question is no longer purely theoretical. I haven’t seen it asked yet, let alone addressed in policy frameworks, so I’m wondering: what kind of improvements would process-oriented, AI-supported learning require?