# Changelog 2025
## 2025-03-10 R 1.0.0
✨ laminr now has feature parity with lamindb. PR @lazappi
- Run `install_lamindb()` to update the default Python environment to lamindb 1.2.
- Replace `db <- connect()` with `ln <- import_module("lamindb")` and see the “Detailed changes” dropdown.
The `ln` object is largely similar to the `db` object in laminr < v1 and matches lamindb’s Python API (`.` → `$`).
Detailed changes
| What | Before | After |
|---|---|---|
| Connect to the default LaminDB instance | `db <- connect()` | `ln <- import_module("lamindb")` |
| Start tracking | `db$track()` | `ln$track()` |
| Get an artifact from another instance | `db$Artifact$using("instance")$get("uid")` | `ln$Artifact$using("instance")$get("uid")` |
| Create an artifact from a path | `db$Artifact(path, key = "my_file")` | `ln$Artifact(path, key = "my_file")` |
| Finish tracking | `db$finish()` | `ln$finish()` |
See the updated “Get started” vignette for more information.
User-facing changes:
- Add an `import_module()` function to import Python modules with additional functionality, e.g., `import_module("lamindb")` for lamindb
- Add functions for accessing more `lamin` CLI commands
- Add a new “Introduction” vignette that replicates the code from the Python lamindb introduction guide
Internal changes:
- Add an internal `wrap_python()` function to wrap Python objects while replacing Python methods with R methods as needed, leaving most work to {reticulate}
- Update the internal `check_requires()` function to handle Python packages
- Add custom `cache()`/`load()` methods to the `Artifact` class
- Add custom `track()`/`finish()` methods to the lamindb module
## 2025-03-09 db 1.2.0
✨ Enable auto-linking entities to projects. Guide PR @falexwolf

```python
ln.track(project="My project")
```
🚸 Better support for spatialdata with `Artifact.from_spatialdata()` and `artifact.load()`. PR1 PR2 @Zethson
🚸 Introduce `.slots` in `Schema`, `Curator`, and `artifact.features` to access schemas and curators by dataset slot. PR @sunnyosun
```python
schema.slots["obs"]       # Schema for the .obs slot of an AnnData
curator.slots["obs"]      # Curator for the .obs slot of an AnnData
artifact.features["obs"]  # feature sets for the .obs slot of an AnnData
```
🏗️ Re-structured the internal API away from monkey-patching Django models. PR @falexwolf
⚠️ Use of internal API
If you used the internal API, you might experience a breaking change. The most drastic change is that all internal registry-related functionality is now re-exported under `lamindb.models`.
🚸 When re-creating an `Artifact`, link subsequent runs instead of updating `.run` and linking previous runs (see the sketch below the table). PR @falexwolf
On the hub.
More details here. @chaichontat
| Before | After | 
|---|---|
| An artifact is only shown as an output for the latest run that created the artifact. Previous runs don’t show it. | All runs that (re-)create an artifact show it as an output. | 
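A minimal sketch of the new behavior (the key is hypothetical; assumes a tracked script):

```python
import lamindb as ln
import pandas as pd

ln.track()  # register this execution as a run
df = pd.DataFrame({"a": [1, 2]})
artifact = ln.Artifact.from_df(df, key="my_datasets/df.parquet").save()
ln.finish()
# re-executing this script re-creates the artifact in a new run; the new run
# is now linked as an additional run of the artifact instead of replacing
# artifact.run and demoting earlier runs to "previous runs"
```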
More changes:
- ✨ Allow using `Artifact.open()` and `Artifact.load()` for `.gz` files PR @Koncopd
- 🐛 Fix passing a path to `ln.track()` when no path is found by `nbproject` PR @Koncopd
- 🐛 Do not overwrite `._state_db` of records when the current instance is passed to `.using` PR @Koncopd
- 🚸 Do not show the track warning for read-only connections PR @Koncopd
- 🚸 Raise `NotImplementedError` in `Artifact.load()` if there is no loader PR @Koncopd
## 2025-02-27 db 1.1.1
- 🚸 Make the `obs` and `var` `DataFrameCurator` objects accessible via `AnnDataCurator.slots` PR @sunnyosun
- 🚸 Better error message upon re-creation of a schema with the same name and a different hash PR @falexwolf
- 🚸 Raise a consistency error if a source path suffix doesn’t match the artifact `key` suffix PR @falexwolf
- 🚸 Automatically add missing columns upon `DataFrameCurator.standardize()` if `nullable` is `True` PR @falexwolf
- 🚸 Allow specifying `fsspec` upload options in `Artifact.save` PR @Koncopd
- 🚸 Populate `Artifact.n_observations` in `Artifact.from_df()` PR @Koncopd
- 🐛 Fix notebook re-run with the same hash PR @falexwolf
## 2025-02-18 db 1.1.0
⚠️ The `FeatureSet` registry got renamed to `Schema`.
All your code is backward compatible. The `Schema` registry encompasses feature sets as a special case.
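For example, what was previously modeled as a feature set is now simply a `Schema` (a hedged sketch; the feature is hypothetical):

```python
import lamindb as ln

feature = ln.Feature(name="perturbation", dtype=ln.ULabel).save()
schema = ln.Schema(features=[feature]).save()  # would previously have been a FeatureSet
```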
✨ Conveniently track functions, including inputs, outputs, and parameters, with a decorator: `ln.tracked()`. PR1 PR2 @falexwolf

```python
import lamindb as ln

@ln.tracked()
def subset_dataframe(
    input_artifact_key: str,  # all arguments are tracked as parameters of the function run
    output_artifact_key: str,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> None:
    artifact = ln.Artifact.get(key=input_artifact_key)
    df = artifact.load()  # auto-tracked as input
    new_df = df.iloc[:subset_rows, :subset_cols]
    ln.Artifact.from_df(new_df, key=output_artifact_key).save()  # auto-tracked as output
```
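Calling the decorated function then creates a run that records the arguments as parameters; a hypothetical invocation (keys are placeholders, and the input artifact must already exist):

```python
subset_dataframe("my_datasets/dataset1.parquet", "my_datasets/dataset1_subset.parquet")
```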
✨ Make sub-types of `ULabel`, `Feature`, `Schema`, `Project`, `Param`, and `Reference`. PR @falexwolf
On the hub.
More details here. @awgaan @chaichontat
```python
# create a ULabel type and instances of that type
perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="DMSO", type=perturbation).save()
ln.ULabel(name="IFNG", type=perturbation).save()
```
✨ Use an overhauled dataset curation flow. @falexwolf @Zethson @sunnyosun
- support persisting validation constraints as a `pandera`-compatible schema
- support validating any feature type, no longer just categoricals
- make the relationship between features, dataset schema, and curator evident
Detailed changes for the overhauled curation flow.
⚠️ The API gained the `lamindb.curators` module as the new way to access `Curator` classes for different data structures.
- This release introduces the schema-based `DataFrameCurator` and `AnnDataCurator`
- The old-style curation flow for categoricals based on `lamindb.Curator.from_objecttype()` continues to work (see the sketch below)
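Roughly, the contrast between the two flows looks like this (a hedged sketch, assuming a DataFrame `df` with a categorical “perturbation” column and the schema from the example further below):

```python
import lamindb as ln

# before: old-style categorical curation (continues to work)
curator = ln.Curator.from_df(df, categoricals={"perturbation": ln.ULabel.name})

# after: schema-based curation against a persisted Schema
schema = ln.Schema.get(name="small_dataset1_obs_level_metadata")
curator = ln.curators.DataFrameCurator(df, schema)
```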
Key PRs.
- ✨ Overhaul curation guides + enable default values and filters on valid categories for features PR @falexwolf
- ✨ Schema-based curators: `AnnDataCurator` PR @falexwolf
- ✨ Schema-based curators: `DataFrameCurator` PR @falexwolf
Enabling PRs.
- ✨ Allow passing `artifact` to `Curator` PR @sunnyosun
- 🎨 A `ManyToMany` between `Schema.components` and `.composites` PR @falexwolf
- ♻️ Mark `Schema` fields as non-editable PR @falexwolf
- ✨ Add the auxiliary field `nullable` to `Feature` PR @falexwolf
- ♻️ Prettify the `AnnDataCurator` implementation PR @falexwolf
- 🚸 Better error for a malformed categorical dtype PR @falexwolf
- 🎨 A `ManyToMany` between `Schema.components` and `.composites` PR @falexwolf
- 🚚 Restore `.feature_sets` as a `ManyToManyField` PR @falexwolf
- 🚚 Rename `CatCurator` to `CatManager` PR @falexwolf
- 🎨 Let `Curator.validate()` throw an error PR @falexwolf
- ♻️ Re-purpose `BaseCurator` as `Curator`, introduce `CatCurator`, and consolidate shared logic under `CatCurator` PR @falexwolf
- ♻️ Refactor `organism` handling in curators PR @falexwolf
- 🔥 Eliminate all logic related to `using_key` in curators PR @falexwolf
- 🚚 Bulk-rename old-style curators to `CatCurator` PR @falexwolf
- 🎨 Self-contained definition of the `CellxGene` schema / validation constraints PR @falexwolf
- 🚚 Move `PertCurator` from `wetlab` here and add a `CellxGene` `Curator` test PR @falexwolf
- 🚚 Move the CellXGene `Curator` from `cellxgene-lamin` here PR @falexwolf
```python
import lamindb as ln
import pandas as pd

schema = ln.Schema(
    name="small_dataset1_obs_level_metadata",
    features=[
        ln.Feature(name="CD8A", dtype=int).save(),  # integer counts for the CD8A marker
        ln.Feature(name="perturbation", dtype=ln.ULabel).save(),  # a categorical feature that validates against the ULabel registry
        ln.Feature(name="sample_note", dtype=str).save(),  # a note for the sample
    ],
).save()

df = pd.DataFrame({
    "CD8A": [1, 4, 0],
    "perturbation": ["DMSO", "IFNG", "DMSO"],
    "sample_note": ["value_1", "value_2", "value_3"],
    "temperature": [22.2, 25.7, 27.3],
})

curator = ln.curators.DataFrameCurator(df, schema)
artifact = curator.save_artifact(key="example_datasets/dataset1.parquet")  # validates compliance with the schema, annotates with metadata
assert artifact.schema == schema  # the validating schema
```
✨ Easily filter on a validating schema. @falexwolf @Zethson @sunnyosun
On the hub.
With the Schema filter button, find all datasets that satisfy a given schema.
```python
schema = ln.Schema.get(name="small_dataset1_obs_level_metadata")  # get a schema
ln.Artifact.filter(schema=schema).df()  # filter all datasets that were validated by the schema
```
✨ `Collection.open()` returns a `pyarrow` dataset. PR @Koncopd
```python
import lamindb as ln
import pandas as pd

df = pd.DataFrame({"feat1": [0, 0, 1, 1], "feat2": [6, 7, 8, 9]})
df[:2].to_parquet("df1.parquet", engine="pyarrow")
df[2:].to_parquet("df2.parquet", engine="pyarrow")

artifact1 = ln.Artifact("df1.parquet", key="df1.parquet").save()
artifact2 = ln.Artifact("df2.parquet", key="df2.parquet").save()

collection = ln.Collection([artifact1, artifact2], key="parquet_col").save()
dataset = collection.open()  # a pyarrow dataset backed by the files in cloud storage
dataset.to_table().to_pandas().head()
```
✨ Support s3-compatible endpoint urls, say, your on-prem MinIO deployment. PR @Koncopd
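A hedged sketch of what this enables (bucket and endpoint are hypothetical; the `endpoint_url` query parameter in the s3 path is the mechanism):

```python
import lamindb as ln

# reference data on an s3-compatible server, e.g. an on-prem MinIO deployment
artifact = ln.Artifact(
    "s3://my-bucket/datasets/df.parquet?endpoint_url=http://10.50.1.20:9000",
    description="a dataset on MinIO",
).save()
```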
Speed up instance creation through squashed migrations.
- ⚡ Squash migrations PR1 PR2 @falexwolf 
Tiledbsoma.
- ✨ Support `endpoint_url` in operations with tiledbsoma PR1 PR2 @Koncopd
- ✨ Add `Artifact.from_tiledbsoma` to populate `n_observations` PR @Koncopd
MappedCollection.
- 🐛 Allow filtering on `np.nan` in `obs_filter` of `MappedCollection` PR @Koncopd
- 🐛 Fix labels for `NaN` in categorical columns for `MappedCollection` PR @Koncopd
SpatialDataCurator.
- 🐛 Fix `var_index` standardization of `SpatialDataCurator` PR1 PR2 @Zethson
- 🐛 Fix sample-level metadata being optional in `SpatialDataCatManager` PR @Zethson
Core functionality.
- ✨ Allow checking the need for syncing without actually syncing PR @Koncopd
- ✨ Check for a corrupted cache in `Artifact.load()` & `Artifact.open()` PR1 PR2 @Koncopd
- 🐛 Account for VSCode appending a language id to markdown cells in notebook tracking PR @falexwolf
- 🐛 Normalize module names for robust checking in `_check_instance_setup()` PR @Koncopd
- 🐛 Fix idempotency of `Feature` creation when `description` is passed and improve filter and get error behavior PR @Zethson
- 🚸 Make a new version upon passing an existing `key` to `Collection` PR @falexwolf
- 🚸 Throw a better error upon checking `instance.modules` when loading a lamindb schema module PR @Koncopd
- 🚸 Validate existing records in the DB irrespective of whether an ontology `source` is passed PR @sunnyosun
- 🚸 Fully guarantee that `Transform`, `Artifact` & `Collection` are not duplicated in concurrent runs PR @falexwolf
- 🚸 Better user feedback during keyword validation in the `Record` constructor PR @Zethson
- 🚸 Improve the warning message when local storage is not found PR @Zethson
- 🚸 Better error message when attempting to save a file while not being connected to an instance PR @Zethson
- 🚸 Raise an error for non-keyword parameters in `Artifact.from_x` methods PR @Zethson
## 2025-01-23 db 1.0.5
## 2025-01-21 db 1.0.4
🚚 Revert `Collection.description` back to an unlimited-length `TextField`. PR @falexwolf
## 2025-01-21 db 1.0.3
🚸 In `track()`, improve logging in RStudio sessions. PR @falexwolf
## 2025-01-20 R 0.4.0
- 🚚 Migrate to lamindb v1 PR @falexwolf 
- 🚸 Improve the user experience for setting up Python & reticulate PR @lazappi 
## 2025-01-20 db 1.0.2
🚚 Improvements for the lamindb v1 migrations. PR @falexwolf
- add a `.description` field to `Schema`
- enable labeling `Run` with `ULabel`
- add `.predecessors` and `.successors` fields to `Project` akin to what’s present on `Transform`
- make `.uid` fields non-editable
## 2025-01-18 db 1.0.1
🐛 Block non-admin users from confirming the dialogue for integrating `lnschema-core`. PR @falexwolf
## 2025-01-17 db 1.0.0
This release makes the API consistent, integrates `lnschema_core` & `ourprojects` into the `lamindb` package, and introduces a breadth of database migrations to enable future features without disruption. You’ll now need at least Python 3.10.
Your code will continue to run as is, but you will receive warnings about a few renamed API components.
| What | Before | After |
|---|---|---|
| Dataset vs. model | `Artifact.type` | `Artifact.kind` |
| Python object for … | `Artifact._accessor` | `Artifact.otype` |
| Number of files | `Artifact.n_objects` | `Artifact.n_files` |
| Consecutiveness field | `Run.is_consecutive` | `Run._is_consecutive` |
| Run initiator | `Run.parent` | `Run.initiated_by_run` |
Migration guide:
- Upon `lamin connect account/instance` you will be prompted to confirm migrating away from `lnschema_core`
- After that, you will be prompted to call `lamin migrate deploy` to apply database migrations
New features:
- ✨ Allow filtering by multiple `obs` columns in `MappedCollection` PR @Koncopd
- ✨ In git sync, also search the git blob hash in non-default branches PR @Zethson
- ✨ Add a relationship with `Project` to everything except `Run`, `Storage` & `User` so that you can easily filter for the entities relevant to your project (see the sketch after this list) PR @falexwolf
- ✨ Capture logs of scripts during `ln.track()` PR1 PR2 @falexwolf @Koncopd
- ✨ Support `"|"`-separated multi-values in `Curator` PR @sunnyosun
- 🚸 Accept `None` in `connect()` and improve the migration dialogue PR @falexwolf
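A hedged sketch of what the `Project` relationship enables (names and keys are hypothetical):

```python
import lamindb as ln

project = ln.Project(name="My project").save()
artifact = ln.Artifact.get(key="my_datasets/df.parquet")  # a hypothetical artifact
artifact.projects.add(project)  # link the artifact to the project
ln.Artifact.filter(projects=project).df()  # all artifacts linked to the project
```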
UX improvements:
- 🚸 Simplify the `ln.track()` experience (see the sketch after this list) PR @falexwolf
  1. you can omit the `uid` argument
  2. you can organize transforms in folders
  3. versioning is fully automated (requirement for 1.)
  4. you can save scripts and notebooks without running them (corollary of 1.)
  5. you avoid the interactive prompt in a notebook and the throwing of an error in a script (corollary of 1.)
  6. you are no longer required to add a title in a notebook
- 🚸 Raise an error when modifying `Artifact.key` in problematic ways PR1 PR2 @sunnyosun @Koncopd
- 🚸 Better error message on running `ln.track()` within a Python terminal PR @Koncopd
- 🚸 Hide the traceback for `InstanceNotEmpty` using a Click exception PR @Zethson
- 🚸 Only auto-search `._name_field` in sub-classes of `CanCurate` PR @falexwolf
- 🚸 Simplify installation & API overview PR @falexwolf
- 🚸 Make `lamin_run_uid` categorical in tiledbsoma stores PR @Koncopd
- 🚸 Raise `ValueError` when trying to search a `None` value PR @Zethson
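In its simplest form, the simplified tracking flow now reads:

```python
import lamindb as ln

ln.track()  # no uid needed; the transform is created and versioned automatically
# ... your analysis ...
ln.finish()
```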
Bug fixes:
- 🐛 Skip deleting storage when deleting outdated versions of folder-like artifacts PR @Koncopd
- 🐛 Let `SOMACurator()` validate and annotate all `.obs` columns PR @falexwolf
- 🐛 Fix renaming of feature sets PR @sunnyosun
- 🐛 Do not raise an exception when default AWS credentials fail PR @Koncopd
- 🐛 Only map synonyms when the field is `name` PR @sunnyosun
- 🐛 Fix `source` in `.from_values` PR @sunnyosun
- 🐛 Fix creating instances with storage in the current local working directory PR @Koncopd
- 🐛 Fix NA values in `Curator.add_new_from()` PR @sunnyosun
Refactors, renames & maintenance:
- 🏗️ Integrate `lnschema-core` into `lamindb` PR1 PR2 @falexwolf @Koncopd
- 🏗️ Integrate `ourprojects` into lamindb PR @falexwolf
- ♻️ Manage `created_at`, `updated_at` on the database level, make `created_by` non-editable PR @falexwolf
- 🚚 Rename transform type “glue” to “linker” PR @falexwolf
- 🚚 Deprecate the `--schema` argument of `lamin init` in favor of `--modules` PR @falexwolf
Detailed list of database migrations
Those not yet announced above will be announced with the functionality they enable.
- ♻️ Add the `contenttypes` Django plugin PR @falexwolf
- 🚚 Prepare the introduction of persistable `Curator` objects by renaming `FeatureSet` to `Schema` on the database level PR @falexwolf
- 🚚 Add a `.type` foreign key to `ULabel`, `Feature`, `FeatureSet`, `Reference`, `Param` PR @falexwolf
- 🚚 Introduce `RunData`, `TidyTable`, and `TidyTableData` in the database PR @falexwolf
All remaining database schema changes were made in this PR @falexwolf. Data migrations happen automatically.
- remove `_source_code_artifact` from `Transform`; it’s been deprecated since 0.75
  - data migration: for all transforms that have `_source_code_artifact` populated, populate `source_code`
- rename `Transform.name` to `Transform.description` because it’s analogous to `Artifact.description`
  - backward compat:
    - in the `Transform` constructor, use `name` to populate `key` in all cases in which only `name` is passed
    - return the same transform based on `key` in case `source_code is None` via `._name_field = "key"`
  - data migrations:
    - there already was a legacy `description` field that was never exposed in the constructor; to be safe, we concatenated potential data in it onto the new `description` field
    - for all transforms that have `key=None` and `name!=None`, use `name` to pre-populate `key`
- rename `Collection.name` to `Collection.key` for consistency with `Artifact` & `Transform` and the high likelihood of you wanting to organize them hierarchically
- a `_branch_code` integer on every record to model pull requests
  - include `visibility` within that code
  - repurpose `visibility=0` as `_branch_code=0`, meaning “archive”
  - put an index on it
  - code a “draft” as `_branch_code = 2`, and “draft prs” as negative branch codes
- rename values `"number"` to `"num"` in `dtype`
- an `._aux` json field on `Record`
- a SmallInteger `run._status_code` that allows writing `finished_at` in clean-up operations so that there is a run time also for aborted runs
- rename `Run.is_consecutive` to `Run._is_consecutive`
- a `_template_id` FK to store the information of the generating template (whether a record is a template is coded via `_branch_code`)
- rename `_accessor` to `otype` to publicly declare the data format as `suffix, accessor`
- rename `Artifact.type` to `Artifact.kind`
- a FK to artifact `run._logfile` which holds logs
- a `hash` field on `ParamValue` and `FeatureValue` to enforce uniqueness without running the danger of failure for large dictionaries
- add a boolean field `._expect_many` to `Feature`/`Param` that defaults to `True`/`False` and indicates whether values for this feature/param are expected to occur a single or multiple times for every single artifact/run
  - for feature
    - if it’s `True` (default), the values come from an observation-level aggregation and a dtype of `datetime` on the observation level means `set[datetime]` on the artifact level
    - if it’s `False`, it’s an artifact-level value and `datetime` means `datetime`; this is an edge case because an arbitrary artifact would always be a set of arbitrary measurements that would need to be aggregated (“one just happens to measure a single cell line in that artifact”)
  - for param
    - if it’s `False` (default), the values mean artifact/run-level values and `datetime` means `datetime`
    - if it’s `True`, the values would be from an aggregation; this seems like an edge case, but say when characterizing a model ensemble trained with different parameters it could be relevant
- remove the `.transform` foreign key from artifact and collection for consistency with all other records; introduce a property and a simple filter statement instead that maintains the same UX
- store provenance metadata for `TransformULabel`, `RunParamValue`, `ArtifactParamValue`
- enable linking projects & references to transforms & collections
- rename `Run.parent` to `Run.initiated_by_run`
- introduce a boolean flag on artifact that’s called `_overwrite_versions`, which indicates whether versions are overwritten or stored separately; it defaults to `False` for file-like artifacts and to `True` for folder-like artifacts
- rename `n_objects` to `n_files` for more clarity
- add a `Space` registry to lamindb with an FK on every `BasicRecord`
- add a name column to `Run` so that a specific run can be used as a named specific analysis
- remove the `_previous_runs` field on everything except `Artifact` & `Collection`