Multi-modal¶
Here, we’ll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects.
ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.
MuData objects build on top of AnnData objects to store multimodal data.
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --modules bionty
→ initialized lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt
bt.settings.organism = "human"
ln.track()
→ connected lamindb: testuser1/test-multimodal
→ created Transform('zkzCom3nZzt40000', key='multimodal.ipynb'), started new Run('iQZgvwTgUAec8foG') at 2025-12-17 19:54:00 UTC
→ notebook imports: bionty==1.10.0 lamindb==1.17.0
• recommendation: to identify the notebook across renames, pass the uid: ln.track("zkzCom3nZzt4")
Creating MuData Artifacts¶
lamindb provides the Artifact.from_mudata() method to create an Artifact from a MuData object.
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  obs: 'perturbation', 'replicate'
  var: 'name'
  4 modalities
    rna: 200 x 173
      obs: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var: 'name'
    adt: 200 x 4
      obs: 'nCount_ADT', 'nFeature_ADT'
      var: 'name'
    hto: 200 x 12
      obs: 'nCount_HTO', 'nFeature_HTO', 'technique'
      var: 'name'
    gdo: 200 x 111
      obs: 'nCount_GDO'
      var: 'name'
mdata_artifact = ln.Artifact.from_mudata(mdata, key="papalexi.h5mu")
mdata_artifact
→ writing the in-memory object into cache
Artifact(uid='TRoiY2PIJ6duxUCq0000', version=None, is_latest=True, key='papalexi.h5mu', description=None, suffix='.h5mu', kind='dataset', otype='MuData', size=550136, hash='as4mRWTdRo1z6ppZhxQlzw', n_files=None, n_observations=200, branch_id=1, space_id=1, storage_id=2, run_id=1, schema_id=None, created_by_id=2, created_at=<django.db.models.expressions.DatabaseDefault object at 0x7f316cb40a10>, is_locked=False)
# MuData Artifacts have the corresponding otype
mdata_artifact.otype
'MuData'
# MuData Artifacts can easily be loaded back into memory
papalexi_in_memory = mdata_artifact.load()
papalexi_in_memory
MuData object with n_obs × n_vars = 200 × 300
  obs: 'perturbation', 'replicate'
  var: 'name'
  4 modalities
    rna: 200 x 173
      obs: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var: 'name'
    adt: 200 x 4
      obs: 'nCount_ADT', 'nFeature_ADT'
      var: 'name'
    hto: 200 x 12
      obs: 'nCount_HTO', 'nFeature_HTO', 'technique'
      var: 'name'
    gdo: 200 x 111
      obs: 'nCount_GDO'
      var: 'name'
Schema¶
# define labels
perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="Perturbed", type=perturbation).save()
ln.ULabel(name="NT", type=perturbation).save()
replicate = ln.ULabel(name="Replicate", is_type=True).save()
ln.ULabel(name="rep1", type=replicate).save()
ln.ULabel(name="rep2", type=replicate).save()
ln.ULabel(name="rep3", type=replicate).save()
# define obs schema
obs_schema = ln.Schema(
    name="mudata_papalexi21_subset_obs_schema",
    features=[
        ln.Feature(name="perturbation", dtype="cat[ULabel[Perturbation]]").save(),
        ln.Feature(name="replicate", dtype="cat[ULabel[Replicate]]").save(),
    ],
).save()
obs_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_obs_schema",
    features=[
        ln.Feature(name="nCount_RNA", dtype=int).save(),
        ln.Feature(name="nFeature_RNA", dtype=int).save(),
        ln.Feature(name="percent.mito", dtype=float).save(),
    ],
    coerce_dtype=True,
).save()
obs_schema_hto = ln.Schema(
    name="mudata_papalexi21_subset_hto_obs_schema",
    features=[
        ln.Feature(name="nCount_HTO", dtype=float).save(),
        ln.Feature(name="nFeature_HTO", dtype=int).save(),
        ln.Feature(name="technique", dtype=bt.ExperimentalFactor).save(),
    ],
    coerce_dtype=True,
).save()
var_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_var_schema",
    itype=bt.Gene.symbol,
    dtype=float,
).save()
# define composite schema
mudata_schema = ln.Schema(
    name="mudata_papalexi21_subset_mudata_schema",
    otype="MuData",
    slots={
        "obs": obs_schema,
        "rna:obs": obs_schema_rna,
        "hto:obs": obs_schema_hto,
        "rna:var": var_schema_rna,
    },
).save()
! rather than passing a string 'cat[ULabel[Perturbation]]' to dtype, pass a Python object
! rather than passing a string 'cat[ULabel[Replicate]]' to dtype, pass a Python object
! you are trying to create a record with name='nFeature_HTO' but a record with similar name exists: 'nFeature_RNA'. Did you mean to load it?
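The slot keys passed to the composite schema above either name an annotation attribute of the whole MuData object ("obs") or prefix it with a modality ("rna:obs", "rna:var"). As a library-independent illustration of that naming convention, the sketch below splits such keys into a modality and an attribute. The helper split_slot is hypothetical and is not how lamindb parses slots internally.

```python
from typing import Optional, Tuple


def split_slot(key: str) -> Tuple[Optional[str], str]:
    """Split a slot key such as 'rna:obs' into (modality, attribute).

    Keys without a ':' refer to the MuData object as a whole,
    so the modality is returned as None.
    """
    modality, sep, attribute = key.rpartition(":")
    return (modality if sep else None, attribute)


# The four slot keys used in the composite schema.
slots = ["obs", "rna:obs", "hto:obs", "rna:var"]

# Group the annotated attributes by the modality they belong to.
by_modality = {}
for key in slots:
    modality, attribute = split_slot(key)
    by_modality.setdefault(modality, []).append(attribute)

print(by_modality)
```

Here the None entry collects the slots that apply to the top-level object, while "rna" and "hto" collect the per-modality slots.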
mudata_schema.describe()
Schema: mudata_papalexi21_subset_mudata_schema
├── uid: Os9iz5VpznhLzoIB                run: iQZgvwT (multimodal.ipynb)
│   itype: None                          otype: MuData
│   hash: JkaY-Gd-EuKJmNGt_DQUrQ         ordered_set: False
│   maximal_set: False                   minimal_set: True
│   branch: main                         space: all
│   created_at: 2025-12-17 19:54:01 UTC  created_by: testuser1
├── obs: mudata_papalexi21_subset_obs_schema
│   ├── uid: 9ImfeVgmfmqxAjYh                run: iQZgvwT (multimodal.ipynb)
│   │   itype: Feature                       otype: None
│   │   hash: bErvv2KqEZ0eew2TxVvQ_A         ordered_set: False
│   │   maximal_set: False                   minimal_set: True
│   │   branch: main                         space: all
│   │   created_at: 2025-12-17 19:54:01 UTC  created_by: testuser1
│   └── Features (2)
│       └── name          dtype                 optional  nullable  coerce_dtype  default_value
│           perturbation  ULabel[Perturbation]  ✗         ✓         ✗             unset
│           replicate     ULabel[Replicate]     ✗         ✓         ✗             unset
├── rna:obs: mudata_papalexi21_subset_rna_obs_schema
│   ├── uid: bujSIaSwSjCtRYe8                run: iQZgvwT (multimodal.ipynb)
│   │   itype: Feature                       otype: None
│   │   hash: rUXe4m9BYtbwA_MOlCLf3Q         ordered_set: False
│   │   maximal_set: False                   minimal_set: True
│   │   branch: main                         space: all
│   │   created_at: 2025-12-17 19:54:01 UTC  created_by: testuser1
│   └── Features (3)
│       └── name          dtype  optional  nullable  coerce_dtype  default_value
│           nCount_RNA    int    ✗         ✓         ✓             unset
│           nFeature_RNA  int    ✗         ✓         ✓             unset
│           percent.mito  float  ✗         ✓         ✓             unset
├── hto:obs: mudata_papalexi21_subset_hto_obs_schema
│   ├── uid: 9k6PUiZI5AZVSDXR                run: iQZgvwT (multimodal.ipynb)
│   │   itype: Feature                       otype: None
│   │   hash: 1S-zLhmavcJxuXE3vck64w         ordered_set: False
│   │   maximal_set: False                   minimal_set: True
│   │   branch: main                         space: all
│   │   created_at: 2025-12-17 19:54:01 UTC  created_by: testuser1
│   └── Features (3)
│       └── name          dtype                      optional  nullable  coerce_dtype  default_value
│           nCount_HTO    float                      ✗         ✓         ✓             unset
│           nFeature_HTO  int                        ✗         ✓         ✓             unset
│           technique     bionty.ExperimentalFactor  ✗         ✓         ✓             unset
└── rna:var: mudata_papalexi21_subset_rna_var_schema
    ├── uid: JU14ZlsGln7PyhCE                run: iQZgvwT (multimodal.ipynb)
    │   itype: bionty.Gene.symbol            otype: None
    │   hash: rooz5mfOcfQvgjRu-gGnvA         ordered_set: False
    │   maximal_set: False                   minimal_set: True
    │   branch: main                         space: all
    │   created_at: 2025-12-17 19:54:01 UTC  created_by: testuser1
    └── bionty.Gene.symbol
        └── dtype: float
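Every schema in the describe() output reports minimal_set: True and maximal_set: False. The sketch below models one plausible reading of these flags with plain set arithmetic: minimal_set is assumed to mean that all listed features must be present, and maximal_set is assumed to mean that no additional columns are allowed. The function check_columns is hypothetical and only illustrates that assumed semantics; it is not lamindb's validation code.

```python
def check_columns(present, required, minimal_set=True, maximal_set=False):
    """Return (ok, missing, extra) for a collection of column names.

    Assumed semantics, for illustration only:
    - minimal_set=True: every required name must be present
    - maximal_set=True: no names beyond the required ones are allowed
    """
    present, required = set(present), set(required)
    missing = sorted(required - present)
    extra = sorted(present - required)
    ok = not (minimal_set and missing) and not (maximal_set and extra)
    return ok, missing, extra


# Features of the top-level obs schema, plus two of the additional
# modality-prefixed columns that the MuData object carries.
required = ["perturbation", "replicate"]
present = ["perturbation", "replicate", "adt:gene_target", "hto:guide_ID"]

# Extra columns are tolerated when maximal_set is False ...
print(check_columns(present, required))
# ... and rejected when maximal_set is True.
print(check_columns(present, required, maximal_set=True))
```

Under this reading, additional columns in a slot would be reported but would not by themselves make the check fail.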
Validate MuData annotations¶
curator = ln.curators.MuDataCurator(mdata, mudata_schema)
! auto-transposed `var` for backward compat, please indicate transposition in the schema definition by calling out `.T`: slots={'var.T': itype=bt.Gene.ensembl_gene_id}
try:
    curator.validate()
except ln.errors.ValidationError:
    pass
! 37 terms not validated in feature 'columns' in slot 'obs': 'adt:gene_target', 'hto:guide_ID', 'adt:G2M.Score', 'hto:technique', 'hto:G2M.Score', 'hto:orig.ident', 'hto:MULTI_ID', 'adt:replicate', 'hto:replicate', 'hto:S.Score', 'adt:percent.mito', 'adt:NT', 'gdo:HTO_classification', 'adt:guide_ID', 'gdo:MULTI_ID', 'gdo:percent.mito', 'gdo:replicate', 'gdo:gene_target', 'adt:orig.ident', 'hto:Phase', ...
→ fix typos, remove non-existent values, or save terms via: curator.slots['obs'].cat.add_new_from('columns')
! 96 terms not validated in feature 'columns' in slot 'rna:var': 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
12 synonyms found: "CTC-467M3.1" → "MEF2C-AS2", "HIST1H4K" → "H4C12", "CASC1" → "DNAI7", "LARGE" → "LARGE1", "NBPF16" → "NBPF15", "C1orf65" → "CCDC185", "IBA57-AS1" → "IBA57-DT", "KIAA1239" → "NWD2", "TMEM75" → "LINC02912", "AP003419.16" → "RPS6KB2-AS1", "FAM65C" → "RIPOR3", "C14orf177" → "LINC02914"
→ curate synonyms via: .standardize("columns")
for remaining terms:
→ fix organism 'Organism(uid='1dpCL6Td', name='human', ontology_id='NCBITaxon:9606', scientific_name='Homo sapiens', synonyms=None, description=None, branch_id=1, space_id=1, created_by_id=2, run_id=None, source_id=34, created_at=2025-12-17 19:53:59 UTC, is_locked=False)', fix typos, remove non-existent values, or save terms via: curator.slots['rna:var'].cat.add_new_from('columns')
curator.slots["rna:var"].cat.standardize("columns")
curator.slots["rna:var"].cat.add_new_from("columns")
curator.validate()
! 37 terms not validated in feature 'columns' in slot 'obs': 'adt:gene_target', 'hto:guide_ID', 'adt:G2M.Score', 'hto:technique', 'hto:G2M.Score', 'hto:orig.ident', 'hto:MULTI_ID', 'adt:replicate', 'hto:replicate', 'hto:S.Score', 'adt:percent.mito', 'adt:NT', 'gdo:HTO_classification', 'adt:guide_ID', 'gdo:MULTI_ID', 'gdo:percent.mito', 'gdo:replicate', 'gdo:gene_target', 'adt:orig.ident', 'hto:Phase', ...
→ fix typos, remove non-existent values, or save terms via: curator.slots['obs'].cat.add_new_from('columns')
! 12 terms not validated in feature 'columns' in slot 'rna:var': 'CTC-467M3.1', 'HIST1H4K', 'CASC1', 'LARGE', 'NBPF16', 'C1orf65', 'IBA57-AS1', 'KIAA1239', 'TMEM75', 'AP003419.16', 'FAM65C', 'C14orf177'
12 synonyms found: "CTC-467M3.1" → "MEF2C-AS2", "HIST1H4K" → "H4C12", "CASC1" → "DNAI7", "LARGE" → "LARGE1", "NBPF16" → "NBPF15", "C1orf65" → "CCDC185", "IBA57-AS1" → "IBA57-DT", "KIAA1239" → "NWD2", "TMEM75" → "LINC02912", "AP003419.16" → "RPS6KB2-AS1", "FAM65C" → "RIPOR3", "C14orf177" → "LINC02914"
→ curate synonyms via: .standardize("columns")
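The synonym report above pairs outdated gene symbols with their current ones. As a rough, library-independent illustration of what standardizing a list of symbols amounts to, the sketch below applies a few of the reported pairs with a plain dictionary. SYNONYMS and standardize_symbols are hypothetical names and not part of lamindb, which resolves synonyms against the bionty gene registry.

```python
# A few synonym -> current-symbol pairs copied from the validation report.
SYNONYMS = {
    "HIST1H4K": "H4C12",
    "CASC1": "DNAI7",
    "LARGE": "LARGE1",
    "FAM65C": "RIPOR3",
}


def standardize_symbols(symbols, synonyms):
    """Replace known synonyms by their current symbol, keep the rest as-is."""
    return [synonyms.get(symbol, symbol) for symbol in symbols]


# Two outdated symbols mixed with two symbols that need no change.
symbols = ["HIST1H4K", "SPACA1", "CASC1", "VNN1"]
standardized = standardize_symbols(symbols, SYNONYMS)
print(standardized)  # ['H4C12', 'SPACA1', 'DNAI7', 'VNN1']
```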
Register curated Artifact¶
artifact = curator.save_artifact(key="mudata_papalexi21_subset.h5mu")
→ writing the in-memory object into cache
→ returning schema with same hash: Schema(uid='9ImfeVgmfmqxAjYh', name='mudata_papalexi21_subset_obs_schema', description=None, n=2, is_type=False, itype='Feature', otype=None, dtype=None, hash='bErvv2KqEZ0eew2TxVvQ_A', minimal_set=True, ordered_set=False, maximal_set=False, slot=None, branch_id=1, space_id=1, created_by_id=2, run_id=1, type_id=None, validated_by_id=None, composite_id=None, created_at=2025-12-17 19:54:01 UTC, is_locked=False)
→ returning schema with same hash: Schema(uid='bujSIaSwSjCtRYe8', name='mudata_papalexi21_subset_rna_obs_schema', description=None, n=3, is_type=False, itype='Feature', otype=None, dtype=None, hash='rUXe4m9BYtbwA_MOlCLf3Q', minimal_set=True, ordered_set=False, maximal_set=False, slot=None, branch_id=1, space_id=1, created_by_id=2, run_id=1, type_id=None, validated_by_id=None, composite_id=None, created_at=2025-12-17 19:54:01 UTC, is_locked=False)
→ returning schema with same hash: Schema(uid='9k6PUiZI5AZVSDXR', name='mudata_papalexi21_subset_hto_obs_schema', description=None, n=3, is_type=False, itype='Feature', otype=None, dtype=None, hash='1S-zLhmavcJxuXE3vck64w', minimal_set=True, ordered_set=False, maximal_set=False, slot=None, branch_id=1, space_id=1, created_by_id=2, run_id=1, type_id=None, validated_by_id=None, composite_id=None, created_at=2025-12-17 19:54:01 UTC, is_locked=False)
artifact.describe()
Artifact: mudata_papalexi21_subset.h5mu (0000)
├── uid: p476uTrRaQpOj7HL0000            run: iQZgvwT (multimodal.ipynb)
│   kind: dataset                        otype: MuData
│   hash: as4mRWTdRo1z6ppZhxQlzw         size: 537.2 KB
│   branch: main                         space: all
│   created_at: 2025-12-17 19:54:04 UTC  created_by: testuser1
│   n_observations: 200
├── storage/path:
│   /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal/.lamindb/p476uTrRaQpOj7HL0000.h5mu
├── Dataset features
│   ├── obs (2)
│   │   perturbation   ULabel[Perturbation]        NT, Perturbed
│   │   replicate      ULabel[Replicate]           rep1, rep2, rep3
│   ├── rna:obs (3)
│   │   nCount_RNA     int
│   │   nFeature_RNA   int
│   │   percent.mito   float
│   ├── hto:obs (3)
│   │   technique      bionty.ExperimentalFactor   cell hashing
│   │   nCount_HTO     float
│   │   nFeature_HTO   int
│   └── rna:var (184 bionty.Gene.symb…
│       SH2D6          num
│       MEF2C-AS2      num
│       ARHGAP26-AS1   num
│       GABRA1         num
│       H4C12          num
│       HLA-DQB1-AS1   num
│       HLA-DQB1-AS1   num
│       HLA-DQB1-AS1   num
│       HLA-DQB1-AS1   num
│       HLA-DQB1-AS1   num
│       HLA-DQB1-AS1   num
│       HLA-DQB1-AS1   num
│       SPACA1         num
│       VNN1           num
│       CTAGE15        num
│       CTAGE15        num
│       PFKFB1         num
│       TRPC5          num
│       RBPMS-AS1      num
│       CA8            num
└── Labels
    └── .ulabels                ULabel                      Perturbed, NT, rep1, rep2, rep3
        .experimental_factors   bionty.ExperimentalFactor   cell hashing
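The size of 537.2 KB reported here is consistent with the size=550136 bytes shown in the earlier Artifact representation, which carries the same hash. The short check below assumes that the display divides the byte count by 1024; that divisor is an assumption about the formatting, not something the output states.

```python
size_bytes = 550136  # size field from the earlier Artifact representation

# Assumption: "KB" in the describe() output means 1024 bytes.
size_kb = size_bytes / 1024
print(f"{size_kb:.1f} KB")  # 537.2 KB
```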
ln.finish()
! cells [(15, 17)] were not run consecutively
→ finished Run('iQZgvwTgUAec8foG') after 4s at 2025-12-17 19:54:05 UTC
# clean up test instance
bt.settings.organism = None
!rm -r test-multimodal
!lamin delete --force test-multimodal