lamindb.core.MappedCollection
class lamindb.core.MappedCollection(path_list, layers_keys=None, obs_keys=None, obsm_keys=None, obs_filter=None, join='inner', encode_labels=True, unknown_label=None, cache_categories=True, parallel=False, dtype=None)

Bases: object

Map-style collection for use in data loaders.

This class virtually concatenates AnnData arrays as a PyTorch map-style dataset.

If your AnnData collection is in the cloud, move it into a local cache first for faster access.

__getitem__ of the MappedCollection object takes a single integer index and returns a dictionary with the observation data sample for this index from the AnnData objects in path_list. The dictionary has keys for layers_keys (.X is under "X"), obs_keys, obsm_keys (under f"obsm_{key}"), and "_store_idx", the index of the AnnData object containing this observation sample.

Note

For a guide, see Train a machine learning model on a collection.

For more convenient use of MappedCollection, see mapped().

This currently only works for collections of AnnData objects.

The implementation was influenced by the SCimilarity data loader.

Parameters:
- path_list (list[lamindb.core.types.UPathStr]) – A list of paths to AnnData objects stored in .h5ad or .zarr format.
- layers_keys (str | list[str] | None, default: None) – Keys from the .layers slot. layers_keys=None or "X" in the list retrieves .X.
- obsm_keys (str | list[str] | None, default: None) – Keys from the .obsm slots.
- obs_keys (str | list[str] | None, default: None) – Keys from the .obs slots.
- obs_filter (dict[str, str | list[str]] | None, default: None) – Select only observations with these values for the given obs columns. Should be a dictionary with obs column names as keys and filtering values (a string or a list of strings) as values.
- join (Literal['inner', 'outer'] | None, default: 'inner') – "inner" or "outer" virtual join of variables. If None is passed, variables are not joined.
- encode_labels (bool | list[str], default: True) – Encode labels into integers. Can be a list with elements from obs_keys.
- unknown_label (str | dict[str, str] | None, default: None) – Encode this label to -1. Can be a dictionary with keys from obs_keys if encode_labels=True, or from encode_labels if it is a list.
- cache_categories (bool, default: True) – Enable caching categories of obs_keys for faster access.
- parallel (bool, default: False) – Enable sampling with multiple processes.
- dtype (str | None, default: None) – Convert numpy arrays from .X, .layers and .obsm to this dtype.
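
The indexing contract described above (one integer index in; a dictionary with "X", the obs keys and "_store_idx" out) can be illustrated with a minimal, hypothetical in-memory sketch. ToyMappedCollection and its stores argument (plain dicts instead of .h5ad or .zarr files) are assumptions made for illustration, not lamindb's implementation. Sorting the merged categories before assigning integer codes is also an assumption; only the mapping of unknown_label to -1 comes from the parameter list.

```python
from bisect import bisect_right
from itertools import accumulate


class ToyMappedCollection:
    """Hypothetical in-memory stand-in for MappedCollection (not lamindb code).

    Each store is a dict with "X" (a list of rows) and "obs" (column -> list of labels).
    """

    def __init__(self, stores, obs_key, unknown_label=None):
        self.stores = stores
        self.obs_key = obs_key
        # Cumulative observation counts, e.g. [3, 5] for stores with 3 and 2 rows.
        self.offsets = list(accumulate(len(store["X"]) for store in stores))
        # Assumption: merged categories are sorted, and the unknown label gets no regular code.
        categories = sorted(
            {label for store in stores for label in store["obs"][obs_key]} - {unknown_label}
        )
        self.encoder = {category: code for code, category in enumerate(categories)}
        if unknown_label is not None:
            self.encoder[unknown_label] = -1  # documented: unknown_label is encoded to -1

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # The first store whose cumulative count exceeds idx holds this observation.
        store_idx = bisect_right(self.offsets, idx)
        local_idx = idx - (self.offsets[store_idx - 1] if store_idx > 0 else 0)
        store = self.stores[store_idx]
        label = store["obs"][self.obs_key][local_idx]
        return {
            "X": store["X"][local_idx],
            self.obs_key: self.encoder[label],
            "_store_idx": store_idx,
        }


stores = [
    {"X": [[1, 0], [0, 1], [1, 1]], "obs": {"cell_type": ["B", "T", "unknown"]}},
    {"X": [[2, 2], [3, 3]], "obs": {"cell_type": ["T", "NK"]}},
]
ds = ToyMappedCollection(stores, "cell_type", unknown_label="unknown")
print(len(ds))  # 5
print(ds[3])  # {'X': [2, 2], 'cell_type': 2, '_store_idx': 1}
```

Because the object only needs __len__ and __getitem__, it satisfies the map-style dataset protocol that torch.utils.data.DataLoader consumes. The stores are never copied into one array; a global index is translated into a (store, local index) pair on each access.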
 
Attributes

property closed: bool
Check whether connections to the array streaming backend are closed. Does not matter if parallel=True.

property original_shapes: list[tuple[int, int]]
Shapes of the underlying AnnData objects (with obs_filter applied).

property shape: tuple[int, int]
Shape of the (virtually aligned) dataset.
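
How obs_filter, join, original_shapes and shape relate can be sketched in plain Python. The helpers filter_mask, joined_vars and shapes, and the store layout (an "obs" dict plus a "var_names" list), are hypothetical. Combining conditions on several filter columns with a logical AND and sorting the joined variables are assumptions, not confirmed lamindb behaviour, and join=None is not modelled.

```python
def filter_mask(obs, obs_filter=None):
    """Boolean keep-mask for one store; obs maps column name -> list of values.

    Assumption: conditions on several columns are combined with logical AND.
    """
    n_obs = len(next(iter(obs.values())))
    keep = [True] * n_obs
    for column, values in (obs_filter or {}).items():
        # A single string is treated like a one-element list of allowed values.
        allowed = {values} if isinstance(values, str) else set(values)
        keep = [k and value in allowed for k, value in zip(keep, obs[column])]
    return keep


def joined_vars(var_lists, join="inner"):
    """Variables shared by all stores ("inner") or present in any store ("outer").

    Assumption: the result is sorted; lamindb may order variables differently.
    """
    if join == "inner":
        return sorted(set(var_lists[0]).intersection(*var_lists[1:]))
    if join == "outer":
        return sorted(set().union(*var_lists))
    raise ValueError(f"unsupported join: {join!r}")


def shapes(stores, obs_filter=None, join="inner"):
    """Return (original_shapes, shape) for hypothetical in-memory stores."""
    original_shapes = [
        (sum(filter_mask(store["obs"], obs_filter)), len(store["var_names"]))
        for store in stores
    ]
    n_vars = len(joined_vars([store["var_names"] for store in stores], join))
    total_obs = sum(n_obs for n_obs, _ in original_shapes)
    return original_shapes, (total_obs, n_vars)


stores = [
    {"obs": {"cell_type": ["B", "T", "NK"], "tissue": ["blood", "blood", "lung"]},
     "var_names": ["g1", "g2", "g3"]},
    {"obs": {"cell_type": ["T", "T"], "tissue": ["lung", "blood"]},
     "var_names": ["g2", "g3", "g4"]},
]
print(shapes(stores))  # ([(3, 3), (2, 3)], (5, 2))
print(shapes(stores, join="outer"))  # ([(3, 3), (2, 3)], (5, 4))
print(shapes(stores, obs_filter={"cell_type": ["B", "T"], "tissue": "blood"}))
# ([(2, 3), (1, 3)], (3, 2))
```

In this sketch the filter only changes the observation counts, while the join only changes the number of columns of the virtually aligned dataset.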
Methods

check_vars_non_aligned(vars)
Returns indices of objects with non-aligned variables.
Parameters:
- vars (Index | list) – Check alignment against these variables.
Return type:
list[int]
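
A hypothetical sketch of what "non-aligned" can mean, written over plain lists of variable names rather than AnnData objects. The function name mirrors the method but is not lamindb code; reference_vars stands in for the documented vars argument. Treating a store as aligned only when its variables match vars exactly, including order, is an assumption.

```python
def check_vars_non_aligned(var_lists, reference_vars):
    """Hypothetical sketch over plain lists of variable names (not lamindb code).

    reference_vars plays the role of the documented `vars` argument and may be
    any iterable, such as a list or a pandas Index.
    Assumption: a store is aligned only if its variables equal reference_vars
    exactly, including order.
    """
    reference = list(reference_vars)
    return [
        store_idx
        for store_idx, store_vars in enumerate(var_lists)
        if list(store_vars) != reference
    ]


var_lists = [["a", "b", "c"], ["a", "c", "b"], ["a", "b"], ["a", "b", "c"]]
print(check_vars_non_aligned(var_lists, ["a", "b", "c"]))  # [1, 2]
```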
 
check_vars_sorted(ascending=True)
Returns True if all variables are sorted in all objects.
Return type:
bool
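
The sortedness check can be sketched the same way over plain lists of variable names. This is a hypothetical illustration, not lamindb code, and the reading of ascending=False as "check for descending order" is an assumption, since the parameter is not documented above.

```python
def check_vars_sorted(var_lists, ascending=True):
    """Hypothetical sketch: True if the variable names of every store are sorted.

    Assumption: ascending=False checks for descending order instead.
    """
    return all(
        list(store_vars) == sorted(store_vars, reverse=not ascending)
        for store_vars in var_lists
    )


print(check_vars_sorted([["a", "b"], ["c", "d"]]))  # True
print(check_vars_sorted([["a", "b"], ["d", "c"]]))  # False
print(check_vars_sorted([["b", "a"], ["d", "c"]], ascending=False))  # True
```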
 
close()
Close connections to the array streaming backend. No effect if parallel=True.

get_label_weights(obs_keys, scaler=None, return_categories=False)
Get all weights for the given label keys.
This counts the occurrences of each label and returns a weight for each obs label according to the formula 1 / (number of occurrences of this label in the data). If scaler is provided, the formula is scaler / (scaler + number of occurrences of this label in the data) instead.
Parameters:
- obs_keys (str | list[str]) – A key in the .obs slots or a list of keys. If a list is provided, the labels from the obs keys are concatenated with the "__" delimiter.
- scaler (float | None, default: None) – Use this number to scale the provided weights.
- return_categories (bool, default: False) – If False, returns a weight for each observation, which can be passed directly to a sampler. If True, returns a dictionary with the unique label categories (concatenated if obs_keys is a list) and their weights.
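
The weighting formula can be checked with a small, hypothetical sketch in plain Python. label_weights and its obs_columns argument (one list of labels per obs key, already merged across all objects) are assumptions for illustration, and returning a list rather than an array is a simplification; the formulas and the "__" delimiter follow the description above.

```python
from collections import Counter


def label_weights(obs_columns, scaler=None, return_categories=False):
    """Hypothetical sketch of the documented weighting formula (not lamindb code).

    obs_columns: one list of per-observation labels for each obs key.
    """
    # With several keys, each observation's labels are joined with "__".
    labels = ["__".join(parts) for parts in zip(*obs_columns)]
    counts = Counter(labels)
    if scaler is None:
        weight_of = {label: 1 / n for label, n in counts.items()}
    else:
        weight_of = {label: scaler / (scaler + n) for label, n in counts.items()}
    if return_categories:
        return weight_of  # unique (possibly concatenated) categories -> weight
    return [weight_of[label] for label in labels]  # one weight per observation


print(label_weights([["B", "T", "T", "NK"]]))  # [1.0, 0.5, 0.5, 1.0]
print(label_weights([["B", "T", "T"], ["blood", "lung", "blood"]], return_categories=True))
# {'B__blood': 1.0, 'T__lung': 1.0, 'T__blood': 1.0}
```

Per-observation weights of this kind are what torch.utils.data.WeightedRandomSampler accepts as its weights argument, so rare labels are drawn more often. A scaler damps the effect: larger values push all weights toward 1.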
 
 
get_merged_categories(label_key)
Get merged categories for label_key from all .obs.

get_merged_labels(label_key)
Get merged labels for label_key from all .obs.
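
The difference between merged labels (one entry per observation) and merged categories (unique values) can be sketched with hypothetical helpers over in-memory stores. Neither function is lamindb code; keeping labels in store order and returning sorted categories are assumptions, and a real implementation may also include categories that are declared but unused in a categorical column.

```python
def merged_labels(stores, label_key):
    """Hypothetical sketch: labels of label_key from all stores, in store order."""
    return [label for store in stores for label in store["obs"][label_key]]


def merged_categories(stores, label_key):
    """Hypothetical sketch: unique labels across all stores (sorted by assumption)."""
    return sorted(set(merged_labels(stores, label_key)))


stores = [{"obs": {"cell_type": ["B", "T"]}}, {"obs": {"cell_type": ["T", "NK"]}}]
print(merged_labels(stores, "cell_type"))  # ['B', 'T', 'T', 'NK']
print(merged_categories(stores, "cell_type"))  # ['B', 'NK', 'T']
```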