lamindb.Collection¶
- class lamindb.Collection(artifacts: list[Artifact], key: str, description: str | None = None, meta: Any | None = None, reference: str | None = None, reference_type: str | None = None, run: Run | None = None, revises: Collection | None = None)¶
- Bases: Record, IsVersioned, TracksRun, TracksUpdates
- Collections of artifacts.
- Collections provide a simple way of versioning collections of artifacts.
- Parameters:
- artifacts – list[Artifact]. A list of artifacts.
- key – str. A file-path-like key, analogous to the key parameter of Artifact and Transform.
- description – str | None = None. A description.
- revises – Collection | None = None. An old version of the collection.
- run – Run | None = None. The run that creates the collection.
- meta – Artifact | None = None. An artifact that defines metadata for the collection.
- reference – str | None = None. A simple reference, e.g. an external ID or a URL.
- reference_type – str | None = None. A way to indicate the type of the simple reference, e.g. "url".
 
- See also
- Examples
- Create a collection from a list of Artifact objects:
- >>> collection = ln.Collection([artifact1, artifact2], key="my_project/my_collection")
- Create a collection that groups a data & a metadata artifact (e.g., here RxRx: cell imaging):
- >>> collection = ln.Collection(data_artifact, key="my_project/my_collection", meta=metadata_artifact)
- Attributes¶
- property data_artifact: Artifact | None¶
- Access to a single data artifact.
- If the collection has a single data & metadata artifact, this allows access via:
- collection.data_artifact  # first & only element of collection.artifacts
- collection.meta_artifact  # metadata
 - property name: str¶
- Name of the collection. - Splits key on "/" and returns the last element.
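- For instance, assuming a collection created with a nested key as in the constructor example above, name is the trailing path segment (a minimal sketch):
- >>> collection = ln.Collection([artifact1, artifact2], key="my_project/my_collection")
- >>> collection.name
- 'my_collection'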
 - property ordered_artifacts: QuerySet¶
- Ordered QuerySet of .artifacts.
- Accessing the many-to-many field collection.artifacts directly gives you a non-deterministic order.
- The property .ordered_artifacts lets you iterate through a set that is ordered by creation.
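- For example, a minimal sketch of deterministic iteration (printing each artifact's uid, a standard field, purely for illustration):
- >>> for artifact in collection.ordered_artifacts:
- ...     print(artifact.uid)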
 - property stem_uid: str¶
- Universal id characterizing the version family.
- The full uid of a record is obtained by concatenating the stem uid and version information:
- stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
- version_uid = "0000"  # an auto-incrementing 4-digit base62 number
- uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
- Simple fields¶
- uid: str¶
- Universal id, valid across DB instances. 
 - key: str¶
- Name or path-like key. 
 - description: str | None¶
- A description or title. 
 - hash: str | None¶
- Hash of collection content. 
 - reference: str | None¶
- A reference like URL or external ID. 
 - reference_type: str | None¶
- Type of reference, e.g., cellxgene Census collection_id. 
- meta_artifact: Artifact | None¶
- An artifact that stores metadata that indexes a collection. - It has a 1:1 correspondence with an artifact. If needed, you can access the collection from the artifact via a private field: - artifact._meta_of_collection.
 - version: str | None¶
- Version (default None).
- Defines the version of a family of records characterized by the same stem_uid.
- Consider using semantic versioning with Python versioning.
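- As a sketch, a new version created via revises might be tagged with a semantic version string by assigning this field before saving; the artifacts and key below are placeholders:
- >>> collection_v2 = ln.Collection([artifact1, artifact3], key="my_project/my_collection", revises=collection_v1)
- >>> collection_v2.version = "1.1.0"
- >>> collection_v2.save()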
 - is_latest: bool¶
- Boolean flag that indicates whether a record is the latest in its version family. 
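- For example, queries can be restricted to the newest version of each collection via the filter() classmethod documented below (a minimal sketch):
- >>> ln.Collection.filter(is_latest=True).df()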
 - created_at: datetime¶
- Time of creation of record. 
 - updated_at: datetime¶
- Time of last update to record. 
- Relational fields¶
- Class methods¶
- classmethod df(include=None, features=False, limit=100)¶
- Convert to pd.DataFrame.
- By default, shows all direct fields, except updated_at.
- Use the arguments include or features to include other data.
- Parameters:
- include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of the form "ulabels__name", "cell_types__name", etc., or a list of such strings.
- features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.
- limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.
 
- Return type:
- DataFrame
- Examples
- Include the name of the creator in the DataFrame:
- >>> ln.ULabel.df(include="created_by__name")
- Include display of features for Artifact:
- >>> df = ln.Artifact.df(features=True)
- >>> ln.view(df)  # visualize with type annotations
- Only include select features:
- >>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
 - classmethod filter(*queries, **expressions)¶
- Query records. - Parameters:
- queries – One or multiple Q objects.
- expressions – Fields and values passed as Django query expressions. 
 
- Return type:
- Returns:
- A QuerySet.
 - See also - Guide: Query & search registries 
- Django documentation: Queries 
- Examples
- >>> ln.ULabel(name="my label").save()
- >>> ln.ULabel.filter(name__startswith="my").df()
 - classmethod get(idlike=None, **expressions)¶
- Get a single record. - Parameters:
- idlike (int | str | None, default: None) – Either a uid stub, a uid, or an integer id.
- expressions – Fields and values passed as Django query expressions. 
 
- Return type:
- Returns:
- A record. 
- Raises:
- lamindb.errors.DoesNotExist – In case no matching record is found. 
 - See also - Guide: Query & search registries 
- Django documentation: Queries 
- Examples
- >>> ulabel = ln.ULabel.get("FvtpPJLJ")
- >>> ulabel = ln.ULabel.get(name="my-label")
 - classmethod lookup(field=None, return_field=None)¶
- Return an auto-complete object for a field. - Parameters:
- field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to the first string field.
- return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
 
- Return type:
- NamedTuple
- Returns:
- A NamedTuple of lookup information of the field values with a dictionary converter.
- See also
- Examples
- >>> import bionty as bt
- >>> bt.settings.organism = "human"
- >>> bt.Gene.from_source(symbol="ADGB-DT").save()
- >>> lookup = bt.Gene.lookup()
- >>> lookup.adgb_dt
- >>> lookup_dict = lookup.dict()
- >>> lookup_dict['ADGB-DT']
- >>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
- >>> lookup_by_ensembl_id.ensg00000002745
- >>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
 - classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶
- Search. - Parameters:
- string (str) – The input string to match against the field ontology values.
- field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
- limit (int | None, default: 20) – Maximum number of top results to return.
- case_sensitive (bool, default: False) – Whether the match is case sensitive.
 
- Return type:
- Returns:
- A sorted DataFrame of search results with a score in column score; if return_queryset is True, a QuerySet.
- Examples
- >>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
- >>> ln.save(ulabels)
- >>> ln.ULabel.search("ULabel2")
 - classmethod using(instance)¶
- Use a non-default LaminDB instance. - Parameters:
- instance (str | None) – An instance identifier of the form "account_handle/instance_name".
- Return type:
- Examples
- >>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
-               uid  score
- name
- ULabel7  g7Hk9b2v  100.0
- ULabel5  t4Jm6s0q   75.0
- ULabel6  r2Xw8p1z   75.0
- Methods¶
- append(artifact, run=None)¶
- Append an artifact to the collection. - This does not modify the original collection in-place, but returns a new version of the original collection with the appended artifact. - Parameters:
- Return type:
- Examples:
- collection_v1 = ln.Collection([artifact], key="My collection").save()
- collection_v2 = collection_v1.append(another_artifact)  # returns a new version of the collection
- collection_v2.save()  # save the new version
 - cache(is_run_input=None)¶
- Download cloud artifacts in the collection to the local cache.
- Follows syncing logic: only caches outdated artifacts.
- Returns paths to locally cached on-disk artifacts.
- Parameters:
- is_run_input (bool | None, default: None) – Whether to track this collection as run input.
- Return type:
- list[UPath]
 
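- For example, a sketch of caching a collection before local processing; the returned paths are assumed to correspond to the collection's artifacts:
- >>> paths = collection.cache()  # downloads any outdated cloud artifacts
- >>> paths[0]  # local path of one cached artifact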
 - delete(permanent=None)¶
- Delete collection. - Parameters:
- permanent (bool | None, default: None) – Whether to permanently delete the collection record (skips trash).
- Return type:
- None
- Examples
- For any Collection object collection, call:
- >>> collection.delete()
 - describe()¶
- Describe relations of record. - Return type:
- None
 - Examples - >>> artifact.describe() 
 - load(join='outer', is_run_input=None, **kwargs)¶
- Stage and load to memory.
- Returns an in-memory representation if possible, such as a concatenated DataFrame or AnnData object.
- Return type:
- Any
 
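- As a sketch, for a collection of AnnData artifacts the result might be a concatenated AnnData object; whether concatenation succeeds depends on the artifacts' formats:
- >>> adata = collection.load(join="outer")  # join follows the signature above
- >>> adata.obs.head()  # inspect the concatenated observations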
 - mapped(layers_keys=None, obs_keys=None, obsm_keys=None, obs_filter=None, join='inner', encode_labels=True, unknown_label=None, cache_categories=True, parallel=False, dtype=None, stream=False, is_run_input=None)¶
- Return a map-style dataset.
- Returns a PyTorch map-style dataset by virtually concatenating AnnData arrays.
- If your AnnData artifacts are in the cloud, move them into a local cache first via cache().
- __getitem__ of the MappedCollection object takes a single integer index and returns a dictionary with the observation data sample for this index from the AnnData objects in the collection. The dictionary has keys for layers_keys (.X is in "X"), obs_keys, obsm_keys (under f"obsm_{key}"), and also "_store_idx" for the index of the AnnData object containing this observation sample.
- Note
- For a guide, see Train a machine learning model on a collection.
- This method currently only works for collections of AnnData artifacts.
- Parameters:
- layers_keys (str | list[str] | None, default: None) – Keys from the .layers slot. layers_keys=None or "X" in the list retrieves .X.
- obs_keys (str | list[str] | None, default: None) – Keys from the .obs slots.
- obsm_keys (str | list[str] | None, default: None) – Keys from the .obsm slots.
- obs_filter (dict[str, str | list[str]] | None, default: None) – Select only observations with these values for the given obs columns. Should be a dictionary with obs column names as keys and filtering values (a string or a list of strings) as values.
- join (Literal['inner', 'outer'] | None, default: 'inner') – "inner" or "outer" virtual joins. If None is passed, does not join.
- encode_labels (bool | list[str], default: True) – Encode labels into integers. Can be a list with elements from obs_keys.
- unknown_label (str | dict[str, str] | None, default: None) – Encode this label to -1. Can be a dictionary with keys from obs_keys if encode_labels=True or from encode_labels if it is a list.
- cache_categories (bool, default: True) – Enable caching categories of obs_keys for faster access.
- parallel (bool, default: False) – Enable sampling with multiple processes.
- dtype (str | None, default: None) – Convert numpy arrays from .X, .layers and .obsm to this dtype.
- stream (bool, default: False) – Whether to stream data from the array backend.
- is_run_input (bool | None, default: None) – Whether to track this collection as run input.
 
- Return type:
- MappedCollection
- Examples
- >>> import lamindb as ln
- >>> from torch.utils.data import DataLoader
- >>> ds = ln.Collection.get(description="my collection")
- >>> mapped = ds.mapped(obs_keys=["cell_type", "batch"])
- >>> dl = DataLoader(mapped, batch_size=128, shuffle=True)
 - open(is_run_input=None)¶
- Return a cloud-backed pyarrow Dataset.
- Works for pyarrow-compatible formats.
- Return type:
- Dataset
 - Notes - For more info, see tutorial: Slice arrays. 
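- A sketch of downstream use with standard pyarrow calls, assuming the underlying artifacts are in a pyarrow-compatible format such as parquet:
- >>> dataset = collection.open()
- >>> table = dataset.to_table()  # materialize as a pyarrow Table
- >>> df = table.to_pandas()  # convert to a pandas DataFrame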
 - restore()¶
- Restore collection record from trash. - Return type:
- None
- Examples
- For any Collection object collection, call:
- >>> collection.restore()
 - save(using=None)¶
- Save the collection and underlying artifacts to database & storage. - Parameters:
- using (str | None, default: None) – The database to which you want to save.
- Return type:
- Examples
- >>> collection = ln.Collection([artifact1, artifact2], key="my_project/my_collection")
- >>> collection.save()
 - view_lineage(with_children=True)¶
- Graph of data flow. - Return type:
- None
- Notes
- For more info, see use cases: Data lineage.
- Examples
- >>> collection.view_lineage()
- >>> artifact.view_lineage()