lamindb.Schema¶

Bases: Record, CanCurate, TracksRun

Schemas.

The simplest schema is a feature set such as the set of columns of a DataFrame.

A composite schema has multiple components, e.g., for an AnnData, one schema for obs and another one for var.

Parameters:

features – Iterable[Record] | None = None An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. If you’d like to pass values, use from_values() or from_df().
components – dict[str, Schema] | None = None A dictionary mapping component names to their corresponding Schema objects for composite schemas.
name – str | None = None A name.
description – str | None = None A description.
dtype – str | None = None The simple type. Defaults to None for sets of Feature records. Otherwise defaults to "num" (e.g., for sets of Gene).
itype – str | None = None The feature identifier type (e.g. Feature, Gene, …).
type – Schema | None = None A type.
is_type – bool = False Distinguish types from instances of the type.
otype – str | None = None An object type to define the structure of a composite schema.
minimal_set – bool = True Whether the schema contains a minimal set of linked features.
ordered_set – bool = False Whether features are required to be ordered.
maximal_set – bool = False If True, no additional features are allowed.
slot – str | None = None The slot name when this schema is used as a component in a composite schema.
coerce_dtype – bool = False When True, attempts to coerce values to the specified dtype during validation, see coerce_dtype.

Note

A feature set can be identified by the hash of its feature uids. It’s stored in the .hash field.

A slot provides a string key to access feature sets. For instance, for the schema of an AnnData object, it would be 'obs' for adata.obs.

See also

from_values(): Create from values.
from_df(): Create from dataframe columns.

Examples

Create a schema (feature set) from df with types:

>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> schema = ln.Schema.from_df(df)

Create a schema (feature set) from features:

>>> features = [ln.Feature(name=feat, dtype="float").save() for feat in ["feat1", "feat2"]]
>>> schema = ln.Schema(features)

Create a schema (feature set) from identifier values:

>>> import bionty as bt
>>> schema = ln.Schema.from_values(adata.var["ensemble_id"], bt.Gene.ensembl_gene_id, organism="mouse").save()

Attributes¶

property coerce_dtype: bool¶

Whether dtypes should be coerced during validation.

For example, an object-dtyped pandas column can be coerced to categorical and would then pass validation if this is True.
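The effect of coercion can be pictured with a small, library-independent sketch. The helper `check_values` and its parameters are hypothetical and not part of lamindb; the sketch only illustrates that the same values can fail a strict dtype check yet pass once a conversion is attempted.

```python
def check_values(values, dtype, coerce=False):
    """Return (passes, values). Hypothetical helper, not a lamindb API."""
    if all(isinstance(v, dtype) for v in values):
        return True, list(values)
    if not coerce:
        return False, list(values)
    try:
        # Attempt the conversion only when coercion was requested.
        return True, [dtype(v) for v in values]
    except (TypeError, ValueError):
        return False, list(values)


raw = ["1", "2", "3"]  # strings that happen to encode integers
strict_ok, _ = check_values(raw, int)
coerced_ok, coerced = check_values(raw, int, coerce=True)
```

Without coercion the check fails because the values are strings; with coercion they are converted first and the check passes.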

property members: QuerySet¶: A queryset for the individual records of the set.

property slots: dict[str, Schema]¶

Slots.

Examples:

# define composite schema
anndata_schema = ln.Schema(
    name="small_dataset1_anndata_schema",
    otype="AnnData",
    components={"obs": obs_schema, "var": var_schema},
).save()

# access slots
anndata_schema.slots
# {'obs': <Schema: obs_schema>, 'var': <Schema: var_schema>}

Simple fields¶

uid: str¶: A universal id (hash of the set of feature values).

name: str | None¶: A name.

description: str | None¶: A description.

n¶: Number of features in the set.

dtype: str | None¶

Data type, e.g., “num”, “float”, “int”. Is None for Feature.

For Feature, types are expected to be heterogeneous and defined on a per-feature level.

itype: str | None¶

A registry that stores feature identifiers used in this schema, e.g., 'Feature' or 'bionty.Gene'.

Depending on the registry, .members stores, e.g., Feature or bionty.Gene records.

Changed in version 1.0.0: Was called registry before.

is_type: bool¶: Distinguish types from instances of the type.

otype: str | None¶: Default Python object type, e.g., DataFrame, AnnData.

hash: str | None¶

A hash of the set of feature identifiers.

For a composite schema, the hash of hashes.
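As a rough illustration of both cases, the sketch below hashes a set of feature uids independently of their order and then derives a composite hash from the component hashes. The function names, the use of SHA-256, and the separators are assumptions made for this example; the scheme lamindb actually uses is not described here.

```python
import hashlib


def hash_uid_set(uids):
    """Order-independent hash of a set of uids (illustrative scheme only)."""
    joined = "|".join(sorted(set(uids)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def hash_composite(component_hashes):
    """Hash of hashes for a composite schema, keyed by slot name."""
    parts = [f"{slot}={h}" for slot, h in sorted(component_hashes.items())]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Made-up uids, for illustration only.
obs_hash = hash_uid_set(["uidA", "uidB"])
var_hash = hash_uid_set(["uidC"])
composite_hash = hash_composite({"obs": obs_hash, "var": var_hash})
```

Sorting before hashing is what makes two feature sets with the same members produce the same hash regardless of the order in which the features were passed.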

minimal_set: bool¶

Whether the schema contains a minimal set of linked features (default True).

If False, no features are linked to this schema.

If True, features are linked and considered as a minimally required set in validation.

ordered_set: bool¶: Whether features are required to be ordered (default False).

maximal_set: bool¶

If False, additional features are allowed (default False).

If True, the minimal set is also a maximal set and no additional features are allowed.
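The interplay of minimal_set, ordered_set and maximal_set can be sketched as a plain column check. The function `check_features` is hypothetical and only mirrors the semantics described for these fields; it is not how lamindb validates.

```python
def check_features(observed, schema_features, minimal_set=True,
                   ordered_set=False, maximal_set=False):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    observed = list(observed)
    if minimal_set:
        # Every schema feature must be present.
        missing = [f for f in schema_features if f not in observed]
        if missing:
            problems.append(f"missing: {missing}")
    if maximal_set:
        # Nothing beyond the schema features may be present.
        extra = [f for f in observed if f not in schema_features]
        if extra:
            problems.append(f"unexpected: {extra}")
    if ordered_set:
        # Compare the relative order of the schema features that are present.
        present = [f for f in observed if f in schema_features]
        expected = [f for f in schema_features if f in observed]
        if present != expected:
            problems.append("order differs from schema")
    return problems


schema_features = ["feat1", "feat2"]
columns = ["feat2", "feat1", "extra"]
default_result = check_features(columns, schema_features)
strict_result = check_features(
    columns, schema_features, ordered_set=True, maximal_set=True
)
```

With the defaults, the extra column and the swapped order are tolerated; with ordered_set=True and maximal_set=True both are reported.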

slot: str | None¶: The slot name when this schema is used as a component in a composite schema, e.g., 'obs' for adata.obs.

created_at: datetime¶: Time of creation of record.

Relational fields¶

space: Space¶: The space in which the record lives.

created_by: User¶: Creator of record.

run: Run | None¶: Run that created record.

type: Schema | None¶

Type of schema.

Allows grouping schemas by type, e.g., all measurements evaluating gene expression vs. protein expression vs. multimodal.

You can define types via ln.Schema(name="ProteinPanel", is_type=True).

Here are a few more examples for type names: 'ExpressionPanel', 'ProteinPanel', 'Multimodal', 'Metadata', 'Embedding'.

components: Schema¶: Components of this schema.

params: Param¶: The params contained in the schema.

features: Feature¶: The features contained in the schema.

records: Schema¶: Records of this type.

composites: Schema¶

The composite schemas that contain this schema as a component.

For example, an AnnData composes multiple schemas: var[DataFrameT], obs[DataFrame], obsm[Array], uns[dict], etc.

validated_artifacts: Artifact¶: The artifacts that were validated against this schema with a Curator.

artifacts: Artifact¶: The artifacts that measure a feature set that matches this schema.

projects: Project¶: Linked projects.

Class methods¶

classmethod df(include=None, features=False, limit=100)¶

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use the arguments include or features to include other data.

Parameters:

include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.
features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.
limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Examples

Include the name of the creator in the DataFrame:

>>> ln.ULabel.df(include="created_by__name")

Include display of features for Artifact:

>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])

classmethod filter(*queries, **expressions)¶

Query records.

Parameters:

queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

Returns:

A QuerySet.

See also

Guide: Query & search registries
Django documentation: Queries

Examples

>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()

classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, source=None)¶

Create a feature set from the validated columns of a DataFrame.

Return type:: Schema | None

classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, source=None, raise_validation_error=True)¶

Create feature set for validated features.

Parameters:

values (list[str] | Series | array) – A list of values, like feature names or ids.
field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference registry to map values.
type (str | None, default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to "float" otherwise.
name (str | None, default: None) – A name.
organism (Record | str | None, default: None) – An organism to resolve gene mapping.
source (Record | None, default: None) – A public ontology to resolve feature identifier mapping.
raise_validation_error (bool, default: True) – Whether to raise a validation error if some values are not valid.

Raises:

ValidationError – If some values are not valid.

Return type:

Schema

Examples

>>> features = [ln.Feature(name=feat, dtype="str").save() for feat in ["feat11", "feat21"]]
>>> schema = ln.Schema.from_values([feature.name for feature in features])

>>> genes = ["ENSG00000139618", "ENSG00000198786"]
>>> schema = ln.Schema.from_values(genes, bt.Gene.ensembl_gene_id, "float")

classmethod get(idlike=None, **expressions)¶

Get a single record.

Parameters:

idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.

Return type:

Record

Returns:

A record.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

See also

Guide: Query & search registries
Django documentation: Queries

Examples

>>> ulabel = ln.ULabel.get("FvtpPJLJ")
>>> ulabel = ln.ULabel.get(name="my-label")
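The way idlike accepts an integer id, a full uid, or a uid stub can be pictured with an in-memory sketch. Everything here is hypothetical: the helper `get_one`, the stand-in exception class, the records, and the assumption that a stub is matched as a prefix of the uid.

```python
class DoesNotExist(LookupError):
    """Stand-in for lamindb.errors.DoesNotExist in this sketch."""


def get_one(records, idlike):
    """Resolve an integer id, a full uid, or a uid stub to exactly one record."""
    if isinstance(idlike, int):
        matches = [r for r in records if r["id"] == idlike]
    else:
        # A stub is treated as a prefix of the uid (assumption).
        matches = [r for r in records if r["uid"].startswith(idlike)]
    if not matches:
        raise DoesNotExist(f"no record matches {idlike!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} records match {idlike!r}")
    return matches[0]


# Hypothetical records; the uids are made up for illustration.
records = [
    {"id": 1, "uid": "FvtpPJLJxxxx", "name": "my-label"},
    {"id": 2, "uid": "a1B2c3D4xxxx", "name": "other-label"},
]
by_stub = get_one(records, "FvtpPJLJ")
by_id = get_one(records, 2)
```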

classmethod inspect(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)¶

Inspect if values are mappable to a field.

Being mappable means that an exact match exists.

Parameters:

values (list[str] | Series | array) – Values that will be checked against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.
mute (bool, default: False) – Whether to mute logging.
organism (str | Record | None, default: None) – An Organism name or record.
source (Record | None, default: None) – A bionty.Source record that specifies the version to inspect against.
strict_source (bool, default: False) – Determines the validation behavior against records in the registry. - If False, validation will include all records in the registry, ignoring the specified source. - If True, validation will only include records in the registry that are linked to the specified source. Note: this parameter won’t affect validation against bionty/public sources.

Return type:

InspectResult

See also

validate()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol)
>>> result.validated
['A1CF', 'A1BG']
>>> result.non_validated
['FANCD1', 'FANCD20']
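The exact-match semantics shared by inspect() and validate() can be sketched without a database. The names `InspectSketch`, `validate_exact` and `inspect_exact` are hypothetical; the real methods query the registry and, per the signature above, also take organism and source into account.

```python
from typing import NamedTuple


class InspectSketch(NamedTuple):
    validated: list
    non_validated: list


def validate_exact(values, registry_values):
    """One boolean per input value: True only for an exact match."""
    known = set(registry_values)
    return [v in known for v in values]


def inspect_exact(values, registry_values):
    """Partition the input values by whether they match exactly."""
    flags = validate_exact(values, registry_values)
    validated = [v for v, ok in zip(values, flags) if ok]
    non_validated = [v for v, ok in zip(values, flags) if not ok]
    return InspectSketch(validated, non_validated)


saved_symbols = ["A1CF", "A1BG", "BRCA2"]
gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
result = inspect_exact(gene_symbols, saved_symbols)
```

FANCD1 ends up in non_validated because only exact matches count here; resolving it as a synonym is the job of standardize().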

classmethod lookup(field=None, return_field=None)¶

Return an auto-complete object for a field.

Parameters:

field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
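A minimal sketch of how such an auto-complete object could be built from a NamedTuple is shown below. The helper `make_lookup`, its key normalization rule, and the dict-based records are assumptions made for illustration, not lamindb's implementation.

```python
import re
from collections import namedtuple


def make_lookup(records, field):
    """Build an auto-complete style object over records[field] (hypothetical)."""
    by_value = {rec[field]: rec for rec in records}
    by_key = {}
    for value, rec in by_value.items():
        # Turn e.g. "ADGB-DT" into the attribute-friendly key "adgb_dt".
        key = re.sub(r"[^0-9a-zA-Z]+", "_", value).strip("_").lower()
        if not key or key[0].isdigit():
            key = "n" + key  # identifiers cannot start with a digit
        by_key[key] = rec

    class Lookup(namedtuple("Lookup", list(by_key))):
        def dict(self):
            # Same records, keyed by the original, unnormalized values.
            return by_value.copy()

    return Lookup(**by_key)


records = [{"symbol": "ADGB-DT"}, {"symbol": "A1CF"}]
lookup = make_lookup(records, "symbol")
```

Attribute access uses the normalized key (lookup.adgb_dt), while lookup.dict() keeps the original values as keys (lookup.dict()["ADGB-DT"]).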

classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶

Search.

Parameters:

string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum number of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of search results, sorted by match quality.

See also

filter()
lookup()

Examples

>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")

classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, public_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None, strict_source=False)¶

Maps input synonyms to standardized names.

Parameters:

values (Iterable) – Identifiers that will be standardized.
field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.
return_field (str | DeferredAttribute | None, default: None) – The field to return. Defaults to field.
return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.
case_sensitive (bool, default: False) – Whether the mapping is case sensitive.
mute (bool, default: False) – Whether to mute logging.
public_aware (bool, default: True) – Whether to standardize from Bionty reference. Defaults to True for Bionty registries.
keep (Literal['first', 'last', False], default: 'first') –
When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
- "first": returns the first mapped standardized name
- "last": returns the last mapped standardized name
- False: returns all mapped standardized names.
When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.

When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.
synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.
organism (str | Record | None, default: None) – An Organism name or record.
source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
strict_source (bool, default: False) – Determines the validation behavior against records in the registry. - If False, validation will include all records in the registry, ignoring the specified source. - If True, validation will only include records in the registry that are linked to the specified source. Note: this parameter won’t affect validation against bionty/public sources.

Return type:

list[str] | dict[str, str]

Returns:

If return_mapper is False – a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.

See also

add_synonym(): Add synonyms.
remove_synonym(): Remove synonyms.

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> standardized_names = bt.Gene.standardize(gene_synonyms)
>>> standardized_names
['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
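The mapping in this example can be reproduced with a small in-memory sketch. The function `standardize_names`, the record layout (dicts with a name and pipe-delimited synonyms), and the abbreviated synonym lists are assumptions for illustration; only the keep="first" behavior is modeled.

```python
def standardize_names(values, records, return_mapper=False, case_sensitive=False):
    """Map synonyms to standardized names (hypothetical helper)."""
    def norm(s):
        return s if case_sensitive else s.lower()

    names = {norm(rec["name"]) for rec in records}
    synonym_to_name = {}
    for rec in records:
        for syn in filter(None, rec["synonyms"].split("|")):
            # keep="first": the first record claiming a synonym wins.
            synonym_to_name.setdefault(norm(syn), rec["name"])

    # Only values that are not already standardized names get remapped.
    mapper = {
        v: synonym_to_name[norm(v)]
        for v in values
        if norm(v) not in names and norm(v) in synonym_to_name
    }
    if return_mapper:
        return mapper
    return [mapper.get(v, v) for v in values]


# Synonym lists are shortened and assumed for this sketch.
records = [
    {"name": "A1CF", "synonyms": ""},
    {"name": "A1BG", "synonyms": ""},
    {"name": "BRCA2", "synonyms": "FANCD1"},
]
gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
standardized = standardize_names(gene_synonyms, records)
mapper = standardize_names(gene_synonyms, records, return_mapper=True)
```

Values with no known synonym, such as FANCD20, pass through unchanged, which matches the output shown above.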

classmethod using(instance)¶

Use a non-default LaminDB instance.

Parameters:: instance (str | None) – An instance identifier of form “account_handle/instance_name”.
Return type:: QuerySet

Examples

>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0

classmethod validate(values, field=None, *, mute=False, organism=None, source=None, strict_source=False)¶

Validate values against existing values of a string field.

Note that this is strict validation: only exact matches count as validated.

Parameters:

values (list[str] | Series | array) – Values that will be validated against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.
mute (bool, default: False) – Whether to mute logging.
organism (str | Record | None, default: None) – An Organism name or record.
source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
strict_source (bool, default: False) – Determines the validation behavior against records in the registry. - If False, validation will include all records in the registry, ignoring the specified source. - If True, validation will only include records in the registry that are linked to the specified source. Note: this parameter won’t affect validation against bionty/public sources.

Return type:

ndarray

Returns:

A vector of booleans indicating if an element is validated.

See also

inspect()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> bt.Gene.validate(gene_symbols, field=bt.Gene.symbol)
array([ True,  True, False, False])

Methods¶

add_synonym(synonym, force=False, save=None)¶

Add synonyms to a record.

Parameters:

synonym (str | list[str] | Series | array) – The synonyms to add to the record.
force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.
save (bool | None, default: None) – Whether to save the record to the database.

See also

remove_synonym(): Remove synonyms.

Examples

>>> import bionty as bt
>>> bt.CellType.from_source(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.add_synonym("T cells")
>>> record.synonyms
'T cells|T-cell|T-lymphocyte|T lymphocyte'

delete()¶

Delete.

Return type:: None

describe(return_str=False)¶

Describe schema.

Return type:: None | str

remove_synonym(synonym)¶

Remove synonyms from a record.

Parameters:: synonym (str | list[str] | Series | array) – The synonym values to remove.

See also

add_synonym(): Add synonyms.

Examples

>>> import bionty as bt
>>> bt.CellType.from_source(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.remove_synonym("T-cell")
>>> record.synonyms
'T lymphocyte|T-lymphocyte'
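Since the synonyms field in these examples is a single pipe-delimited string, adding and removing synonyms amounts to simple string handling. The helpers `with_synonym` and `without_synonym` below are hypothetical and operate on plain strings; they do not touch a record or the database, and the ordering they produce may differ from what lamindb stores.

```python
def with_synonym(synonyms, new):
    """Return the pipe-delimited string with the new synonym(s) appended."""
    items = [s for s in synonyms.split("|") if s]
    for syn in ([new] if isinstance(new, str) else list(new)):
        if "|" in syn:
            raise ValueError("a synonym must not contain '|'")
        if syn not in items:
            items.append(syn)
    return "|".join(items)


def without_synonym(synonyms, old):
    """Return the pipe-delimited string with the given synonym(s) removed."""
    drop = {old} if isinstance(old, str) else set(old)
    return "|".join(s for s in synonyms.split("|") if s and s not in drop)


current = "T-cell|T lymphocyte|T-lymphocyte"
added = with_synonym(current, "T cells")
removed = without_synonym(current, "T-cell")
```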

save(*args, **kwargs)¶

Save.

Return type:: Schema

set_abbr(value)¶

Set value for abbr field and add to synonyms.

Parameters:: value (str) – A value for an abbreviation.

See also

add_synonym()

Examples

>>> import bionty as bt
>>> bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
>>> scrna = bt.ExperimentalFactor.get(name="single-cell RNA sequencing")
>>> scrna.abbr
None
>>> scrna.synonyms
'single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing'
>>> scrna.set_abbr("scRNA")
>>> scrna.abbr
'scRNA'
>>> scrna.synonyms
'scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq'
>>> scrna.save()