Managing Metadata

Metadata in CALYPR is formatted using the Fast Healthcare Interoperability Resources (FHIR) schema. If you bring your own FHIR newline-delimited JSON data, create a directory called “META” at the root of your git-drs repository (the directory in which you initialized it) and place your metadata files there.
The META/ folder contains newline-delimited JSON (.ndjson) files representing FHIR resources that describe the project, its data, and related entities. Large files are tracked using Git LFS, and each data file must correspond to a DocumentReference resource. This project follows a standardized structure for managing large research data files and their FHIR metadata in a version-controlled, DRS- and FHIR-compatible format.
Each file must contain only one FHIR resource type; for example, META/ResearchStudy.ndjson contains only ResearchStudy resources. The file name doesn’t have to match the resource type name, with one exception: if you bring your own document references, the file must be named DocumentReference.ndjson. For all other resource types, matching the file name to the resource type is simply good organizational practice.
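
The one-resource-type-per-file rule can be checked locally before running any platform tooling. The following is a minimal sketch, not part of CALYPR or Forge; the function names `resource_types` and `mixed_type_files` are hypothetical, and it assumes every non-blank line is a JSON object with a `resourceType` key.

```python
import json
from pathlib import Path


def resource_types(lines):
    """Return the set of resourceType values found in an iterable of NDJSON lines."""
    types = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        types.add(json.loads(line).get("resourceType"))
    return types


def mixed_type_files(meta_dir="META"):
    """Map each .ndjson file name under meta_dir to its types, if it mixes several."""
    mixed = {}
    for path in sorted(Path(meta_dir).glob("*.ndjson")):
        with path.open(encoding="utf-8") as handle:
            found = resource_types(handle)
        if len(found) > 1:
            mixed[path.name] = found
    return mixed
```

Calling `mixed_type_files()` from the repository root would return an empty dictionary when every file holds a single resource type.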

META/ResearchStudy.ndjson

  • The root of the file directory structure is based on the first ResearchStudy in the file. This is the research study to which the autogenerated document references are connected. Any additional research studies are ignored when populating the miller table file tree.
  • Contains at least one FHIR ResearchStudy resource describing the project.
  • Defines project identifiers, title, description, and key attributes.

META/DocumentReference.ndjson

  • Contains one FHIR DocumentReference resource per Git LFS-managed file.
  • Each DocumentReference.content.attachment.url field must exactly match the relative path of the corresponding file in the repository.
  • Example:
{
  "resourceType": "DocumentReference",
  "id": "docref-file1",
  "status": "current",
  "content": [
    {
      "attachment": {
        "url": "data/file1.bam",
        "title": "BAM file for Sample X"
      }
    }
  ]
}
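
If you write your own DocumentReference.ndjson, records shaped like the example above can be generated from a list of repository-relative paths. This is a rough sketch under stated assumptions: the helper names and the `docref-<file stem>` id scheme are hypothetical (chosen only to mirror the example), and files sharing a stem in different directories would need a different id scheme.

```python
import json
from pathlib import PurePosixPath


def make_document_reference(relative_path, title=None):
    """Build a minimal DocumentReference dict for one repository-relative path."""
    path = PurePosixPath(relative_path)
    return {
        "resourceType": "DocumentReference",
        "id": "docref-" + path.stem,  # hypothetical id scheme, not a CALYPR rule
        "status": "current",
        "content": [
            {"attachment": {"url": str(path), "title": title or path.name}}
        ],
    }


def to_ndjson(resources):
    """Serialize resources as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(resource) + "\n" for resource in resources)
```

For instance, `to_ndjson([make_document_reference("data/file1.bam")])` yields a single line whose `url` is `data/file1.bam`.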

Place your custom FHIR .ndjson files in the META/ directory:

# Copy your prepared FHIR metadata
cp ~/my-data/patients.ndjson META/
cp ~/my-data/observations.ndjson META/
cp ~/my-data/specimens.ndjson META/
cp ~/my-data/document-references.ndjson META/

Other FHIR data

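In addition to ResearchStudy and DocumentReference, the META/ directory can hold files for other FHIR resource types, for example: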

  • Patient.ndjson: Participant records.
  • Specimen.ndjson: Biological specimens.
  • ServiceRequest.ndjson: Requested procedures.
  • Observation.ndjson: Measurements or results.
  • Other valid FHIR resource types as required.

Ensure that your FHIR DocumentReference resources reference the corresponding DRS URIs:

Example DocumentReference linking to S3 file:

{
  "resourceType": "DocumentReference",
  "id": "doc-001",
  "status": "current",
  "content": [{
    "attachment": {
      "url": "drs://calypr-public.ohsu.edu/your-drs-id",
      "title": "sample1.bam",
      "contentType": "application/octet-stream"
    }
  }],
  "subject": {
    "reference": "Patient/patient-001"
  }
}

Validating Metadata

To ensure that the FHIR files you have added to the project are correct and pass schema checking, you can use the Forge tool.

forge validate

Successful output:

✓ Validating META/patients.ndjson... OK
✓ Validating META/observations.ndjson... OK
✓ Validating META/specimens.ndjson... OK
✓ Validating META/document-references.ndjson... OK
All metadata files are valid.

Fix any validation errors and re-run until all files pass.

Forge Data Quality Assurance Commands

If you have provided your own FHIR resources, two commands can help ensure that your FHIR metadata will appear on the CALYPR data platform as expected: validate and check-edge.

Validate:

forge validate META
# or
forge validate META/DocumentReference.ndjson
Validate checks whether the provided directory or file will be accepted by the CALYPR data platform. It catches improper JSON formatting and FHIR schema errors.

Check-edge:

forge check-edge META
# or
forge check-edge META/DocumentReference.ndjson
Check-edge ensures that references within your files (e.g., a Patient ID in an Observation) connect to known vertices and aren't "orphaned".

Validation Process

1. Schema Validation

  • Each .ndjson file in META/ (like ResearchStudy.ndjson, DocumentReference.ndjson, etc.) is read line by line.
  • Every line is parsed as JSON and checked against the corresponding FHIR schema for that resourceType.
  • Syntax errors, missing required fields, or invalid FHIR values trigger clear error messages with line numbers.
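
The line-by-line pass described above can be approximated as follows. This is only a sketch of the idea, not Forge's implementation: it does not validate against the FHIR schema, the function name `check_ndjson_lines` is hypothetical, and the default set of required fields is an assumption for illustration.

```python
import json


def check_ndjson_lines(lines, required=("resourceType", "id")):
    """Return (line_number, message) pairs for lines that fail basic checks."""
    errors = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((number, "expected a JSON object"))
            continue
        for field in required:  # assumed minimal field set, not the FHIR schema
            if field not in record:
                errors.append((number, f"missing field {field!r}"))
    return errors
```

Passing an open file handle as `lines` reports problems with 1-based line numbers, which makes them easy to locate in an editor.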

2. Mandatory Files Presence

  • Confirms that ResearchStudy.ndjson exists and has at least one valid record.
  • Confirms that DocumentReference.ndjson exists and contains at least one record.
  • If either is missing or empty, validation fails.

3. One-to-One Mapping of Files to DocumentReference

  • Scans the working directory for Git LFS-managed files in expected locations (e.g., data/).
  • For each file, locates a corresponding DocumentReference resource whose content.attachment.url matches the file’s relative path.
  • Validates that all LFS files have a matching DocumentReference and that all DocumentReferences point to existing files.
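
The two-way comparison in this step amounts to a set difference in each direction. The sketch below assumes the list of LFS-managed paths is already available (for example from `git lfs ls-files --name-only`) and that the DocumentReference records have been parsed into dictionaries; the function name `compare_files_to_docrefs` is hypothetical.

```python
def compare_files_to_docrefs(lfs_paths, document_references):
    """Return (files without a DocumentReference, urls without a file)."""
    urls = {
        content["attachment"]["url"]
        for doc in document_references
        for content in doc.get("content", [])
        if "url" in content.get("attachment", {})
    }
    tracked = set(lfs_paths)
    # Sorted lists keep the report stable between runs.
    return sorted(tracked - urls), sorted(urls - tracked)
```

Both returned lists are empty when every tracked file has exactly the expected pointer and no DocumentReference points at a missing file.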

4. Project-level Referential Checks

  • Validates that DocumentReference resources reference the same ResearchStudy via relatesTo or other linking mechanisms.
  • If FHIR resources like Patient, Specimen, ServiceRequest, or Observation are present, ensures that their id fields are unique and that DocumentReference correctly refers to those resources (e.g., via subject or related fields).

5. Cross-Entity Consistency

  • If multiple optional FHIR .ndjson files exist, confirms that IDs referenced in one file exist in the others.
  • Detects dangling references (e.g., a DocumentReference.subject reference to a Patient ID that's not in Patient.ndjson).
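
Dangling-reference detection can be illustrated by collecting every `ResourceType/id` pair and then checking each `reference` value against that set. This is a simplified sketch, not how Forge's check-edge is implemented: the function names are hypothetical, it assumes relative references of the form `Patient/patient-001`, and it would also flag absolute-URL or contained (`#...`) references.

```python
def iter_references(node):
    """Recursively yield every string stored under a "reference" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "reference" and isinstance(value, str):
                yield value
            else:
                yield from iter_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_references(item)


def dangling_references(resources):
    """Return (source, target) pairs whose target is not among the resources."""
    known = {f'{r["resourceType"]}/{r["id"]}' for r in resources}
    dangling = []
    for resource in resources:
        source = f'{resource["resourceType"]}/{resource["id"]}'
        for target in iter_references(resource):
            if target not in known:
                dangling.append((source, target))
    return dangling
```

Here `resources` would be the parsed records from all .ndjson files in META/ combined into one list.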

✅ Example Error Output

ERROR META/DocumentReference.ndjson line 4: url "data/some_missing.bam" does not resolve to an existing file
ERROR META/Specimen.ndjson line 2: id "specimen-123" referenced in Observation.ndjson but not defined


🎯 Purpose & Benefits

  • Ensures all files and metadata are in sync before submission.
  • Prevents submission failures due to missing pointers or invalid FHIR payloads.
  • Enables CI integration, catching issues early in the development workflow.

Validation Requirements

Automated tools or CI processes must:

  • Verify presence of META/ResearchStudy.ndjson with at least one record.
  • Verify presence of META/DocumentReference.ndjson with one record per LFS-managed file.
  • Confirm every DocumentReference.content.attachment.url matches an existing file path.
  • Check proper .ndjson formatting.