Building ADaM datasets with the admiral package

By the TechWorksLab clinical programming team

Analysis datasets sit at the center of a clinical submission. Every table, every figure, and most of the questions a reviewer asks come back to the ADaM data. When that data is built well, the rest of the work goes smoothly. When it is built in a tangle of one-off scripts, every change ripples outward and the team spends its time tracing problems instead of analyzing results.

The admiral package for R offers a different way to build ADaM. Instead of writing a long program that transforms a dataset from start to finish, you assemble the dataset from a series of small, named derivations. Each derivation does one thing, such as deriving a treatment-emergent flag or a study day, and each one is documented and tested on its own.
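To make the idea concrete, here is a minimal sketch of the pattern in Python. It is not admiral's API: admiral is an R package, and every function name below is hypothetical. The variable names (ASTDT, TRTSDT, ASTDY, TRTEMFL) follow common ADaM usage, and the treatment-emergent rule is deliberately simplified for illustration.

```python
from datetime import date


def derive_study_day(record, date_var, ref_var, new_var):
    """Add a study day relative to a reference date.

    There is no day 0: the reference date is day 1 and the day before is -1.
    """
    result = dict(record)  # leave the input record untouched
    delta = (record[date_var] - record[ref_var]).days
    result[new_var] = delta + 1 if delta >= 0 else delta
    return result


def derive_trtem_flag(record):
    """Flag an event that starts on or after first dose (simplified rule)."""
    result = dict(record)
    result["TRTEMFL"] = "Y" if record["ASTDT"] >= record["TRTSDT"] else None
    return result


def build_adae(records):
    """Assemble the dataset by applying each named derivation in turn."""
    built = []
    for record in records:
        record = derive_study_day(record, "ASTDT", "TRTSDT", "ASTDY")
        record = derive_trtem_flag(record)
        built.append(record)
    return built


# Made-up input records, for illustration only.
adae = build_adae([
    {"USUBJID": "01-001", "ASTDT": date(2024, 1, 10), "TRTSDT": date(2024, 1, 8)},
    {"USUBJID": "01-002", "ASTDT": date(2024, 1, 5), "TRTSDT": date(2024, 1, 8)},
])
```

Each function can be read, documented, and tested without reference to the others, and the build function is little more than the list of steps in order.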

Why a derivation-based approach helps

The value becomes clear the first time the source data changes, which on a live study is often. With a monolithic program, a change in one variable can force you to read the whole script to understand what else it touches. With a set of derivations, the change is local. You find the derivation responsible, you adjust it, and the surrounding logic stays as it was.

Traceability improves for the same reason. A reviewer who asks how a particular analysis value was produced can be pointed to the specific derivation that produced it, along with the inputs it used. There is no need to walk through hundreds of lines of code to reconstruct the path.

A dataset built from named, tested steps is far easier to defend than one built from a single program that only its author fully understands.

What a typical build looks like

A common pattern starts from the relevant SDTM domains and the ADaM specification. The team works through the specification one variable at a time, mapping each to a derivation. Standard derivations, such as those for dates, study days, and common flags, come from the package. Study-specific logic is written as custom derivations that follow the same structure.
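One way to picture the mapping from specification to derivation is a small lookup table, sketched below in Python. This is an assumption about how such a mapping could be organized, not how admiral works: the specification layout, the registry, and all function names are hypothetical, and a real specification would usually live in a spreadsheet or metadata file rather than in code.

```python
from datetime import date


def study_day(record, date_var, ref_var):
    """Study day with no day 0: the reference date is day 1."""
    delta = (record[date_var] - record[ref_var]).days
    return delta + 1 if delta >= 0 else delta


def copy_from(record, source_var):
    """Carry a source value across unchanged."""
    return record[source_var]


# Registry of available derivations, keyed by the name used in the specification.
DERIVATIONS = {"study_day": study_day, "copy_from": copy_from}

# Hypothetical specification: one entry per analysis variable, in build order.
SPEC = [
    {"variable": "TRTA", "derivation": "copy_from",
     "args": {"source_var": "ACTARM"}},
    {"variable": "ASTDY", "derivation": "study_day",
     "args": {"date_var": "ASTDT", "ref_var": "TRTSDT"}},
]


def build_from_spec(records, spec, derivations):
    """Derive every variable listed in the specification, in order."""
    missing = [row["derivation"] for row in spec
               if row["derivation"] not in derivations]
    if missing:
        # Fail before building anything if the specification names an unknown step.
        raise KeyError(f"No derivation registered for: {missing}")
    built = []
    for record in records:
        out = dict(record)
        for row in spec:
            func = derivations[row["derivation"]]
            out[row["variable"]] = func(out, **row["args"])
        built.append(out)
    return built


# Made-up source record, for illustration only.
source = [{"USUBJID": "01-001", "ACTARM": "Drug A",
           "ASTDT": date(2024, 1, 10), "TRTSDT": date(2024, 1, 8)}]
result = build_from_spec(source, SPEC, DERIVATIONS)
```

Because the build reads the specification directly, a variable that appears in the specification without a matching derivation stops the run instead of being silently skipped.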

Three habits tend to separate a clean build from a messy one:

Keep each derivation focused on a single variable or a tightly related group.
Drive the build from the specification, so the code and the documentation never drift apart.
Test derivations against known inputs, so a later change that breaks one is caught at once.
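The third habit can be sketched as a table of known inputs and expected results that is run against a derivation. The Python below is illustrative only: the flag rule is a simplified, hypothetical one, and the function names are not taken from admiral, which is an R package with its own testing setup.

```python
from datetime import date


def derive_trtem_flag(ae_start, trt_start):
    """Return "Y" when the event starts on or after first dose, else None.

    Simplified rule for illustration; a missing date yields no flag.
    """
    if ae_start is None or trt_start is None:
        return None
    return "Y" if ae_start >= trt_start else None


# Known inputs and the result each one should produce.
KNOWN_CASES = [
    (date(2024, 1, 10), date(2024, 1, 8), "Y"),   # starts after first dose
    (date(2024, 1, 8), date(2024, 1, 8), "Y"),    # starts on the day of first dose
    (date(2024, 1, 7), date(2024, 1, 8), None),   # starts before first dose
    (None, date(2024, 1, 8), None),               # missing start date
]


def run_known_cases(derivation, cases):
    """Return every case whose actual result differs from the expected one."""
    failures = []
    for ae_start, trt_start, expected in cases:
        actual = derivation(ae_start, trt_start)
        if actual != expected:
            failures.append((ae_start, trt_start, expected, actual))
    return failures


failures = run_known_cases(derive_trtem_flag, KNOWN_CASES)
```

An empty list of failures means the derivation still behaves as documented. The same table of cases can be rerun after any change to the derivation, so a regression shows up as a specific failing case.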

Working within a validated process

Using an open package in a regulated setting raises a fair question: how do you trust it? The answer is qualification. Before a package is used to produce submission data, it is validated against documented expectations, and the evidence is kept. Once that work is done, the package can be relied on the same way an internal tool would be.

This is where the modular approach pays off again. Because each derivation is small and tested, the qualification effort is focused and repeatable. New versions of the package can be assessed against the same tests rather than reviewed from scratch.

The result

Teams that adopt this approach report the same thing: the first study takes real effort to set up, and every study after that is faster. The derivations become a library. The specifications become templates. A late data change that once meant a long evening becomes a small, contained edit followed by a rerun.

That is the point. The work of ADaM does not disappear, but it moves from firefighting toward something steady and predictable, which is exactly what a submission timeline needs.

If your team is weighing a move to a derivation-based ADaM process, we are happy to talk through what it would take in your environment. You can reach us here.