lamindb.Transform
- class lamindb.Transform(name: str, key: str | None = None, type: TransformType | None = None, revises: Transform | None = None)
- Bases: Record, IsVersioned
- Data transformations.
- A “transform” can refer to a Python function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (Run). A run has inputs and outputs.
- A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.
- Transforms are versioned so that a given transform version maps onto a given source code version.
- Can I sync transforms to git?
- If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track().
>>> ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb"
>>> ln.track()
- The definition of transforms and runs is consistent with the OpenLineage specification, where a Transform record would be called a “job” and a Run record a “run”.
- Parameters:
- name (str) – A name or title.
- key (str | None, default: None) – A short name or path-like semantic key.
- type (TransformType | None, default: "pipeline") – See TransformType.
- revises (Transform | None, default: None) – An old version of the transform.
 
 - Notes
 - Examples
- Create a transform for a pipeline:
>>> transform = ln.Transform(key="Cell Ranger", version="7.2.0", type="pipeline").save()
- Create a transform from a notebook:
>>> ln.track()
- View predecessors of a transform:
>>> transform.view_lineage()
 - Attributes
 - property name: str
- Name of the transform. Splits key on “/” and returns the last element.
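The rule above can be sketched in plain Python. This is an illustration only, not lamindb's implementation; the helper name_from_key and the example keys are hypothetical.

```python
def name_from_key(key: str) -> str:
    """Return the last "/"-separated element of a path-like key."""
    return key.split("/")[-1]


# A path-like key yields its final segment; a key without "/" is returned unchanged.
print(name_from_key("pipelines/cellranger/run.py"))
print(name_from_key("Cell Ranger"))
```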
 - property stem_uid: str
- Universal id characterizing the version family.
- The full uid of a record is obtained by concatenating the stem uid and version information:
stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
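A runnable sketch of the concatenation described above follows. The function random_base62 here is a stand-in written for this example, and the alphabet order is an assumption; lamindb's own generator may differ.

```python
import secrets
import string

# Assumed alphabet: digits, uppercase, lowercase (62 characters in total).
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char (stand-in helper)."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def make_uid(stem_uid: str, version_uid: str = "0000") -> str:
    """Concatenate the stem uid and the 4-character version uid."""
    return f"{stem_uid}{version_uid}"


stem_uid = random_base62(12)  # 12 characters for a transform
uid = make_uid(stem_uid)
print(len(uid))  # 16
```

All versions of one transform share the first 12 characters of the uid in this scheme, and only the trailing 4 characters change.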
 - Simple fields
 - uid: str
- Universal id.
 - key: str | None
- A name or “/”-separated path-like string. All transforms with the same key are part of the same version family.
 - description: str | None
- A description.
 - type: TransformType
- TransformType (default "pipeline").
 - source_code: str | None
- Source code of the transform.
- Changed in version 0.75: The source_code field is no longer an artifact, but a text field.
 - hash: str | None
- Hash of the source code.
 - reference: str | None
- Reference for the transform, e.g., a URL.
 - reference_type: str | None
- Reference type of the transform, e.g., ‘url’.
 - created_at: datetime
- Time of creation of record.
 - updated_at: datetime
- Time of last update to record.
 - version: str | None
- Version (default None). Defines the version of a family of records characterized by the same stem_uid. Consider using semantic versioning with Python versioning.
 - is_latest: bool
- Boolean flag that indicates whether a record is the latest in its version family.
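To make the relationship between key, version and is_latest concrete, here is a toy model in plain Python. ToyTransform and add_revision are hypothetical names, and the bookkeeping shown is an assumption about how such a flag could be maintained, not a description of lamindb internals.

```python
from dataclasses import dataclass


@dataclass
class ToyTransform:
    """Hypothetical stand-in for a record; not the lamindb class."""

    key: str
    version: str
    is_latest: bool = True


def add_revision(family: list[ToyTransform], key: str, version: str) -> ToyTransform:
    """Append a new version and mark it as the only latest record in the family."""
    for record in family:
        record.is_latest = False
    new = ToyTransform(key=key, version=version)
    family.append(new)
    return new


family: list[ToyTransform] = []
add_revision(family, "pipelines/cellranger", "7.1.0")
add_revision(family, "pipelines/cellranger", "7.2.0")
print([(t.version, t.is_latest) for t in family])
```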
 - Relational fields
 - predecessors: Transform
- Preceding transforms.
- These are auto-populated whenever an artifact or collection serves as a run input, e.g., artifact.run and artifact.transform get populated & saved.
- The table provides a more convenient way to query for predecessors that bypasses querying the Run.
- It also allows manually adding predecessors whose outputs are not tracked in a run.
 - successors: Transform
- Subsequent transforms. See predecessors.
 - Class methods
 - classmethod df(include=None, features=False, limit=100)
- Convert to pd.DataFrame.
- By default, shows all direct fields, except updated_at. Use arguments include or features to include other data.
- Parameters:
- include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.
- features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.
- limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.
 
- Return type:
- DataFrame
 - Examples
- Include the name of the creator in the DataFrame:
>>> ln.ULabel.df(include="created_by__name")
- Include display of features for Artifact:
>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations
- Only include select features:
>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
 - classmethod filter(*queries, **expressions)
- Query records.
- Parameters:
- queries – One or multiple Q objects.
- expressions – Fields and values passed as Django query expressions.
 
- Return type:
- QuerySet
- Returns:
- A QuerySet.
 - See also
- Guide: Query & search registries
- Django documentation: Queries
 - Examples
>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
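The field__lookup=value convention used in the example above can be illustrated without a database. The following toy matcher is a simplified sketch, not Django's or lamindb's implementation: the function matches is hypothetical, and it handles only a single field with a few lookups (exact, startswith, contains), not lookups that span relations.

```python
def matches(record: dict, **expressions) -> bool:
    """Check a record against field__lookup=value expressions (toy version)."""
    for expression, value in expressions.items():
        # "name__startswith" -> field "name", lookup "startswith"
        field, _, lookup = expression.partition("__")
        actual = record[field]
        if lookup in ("", "exact"):
            ok = actual == value
        elif lookup == "startswith":
            ok = actual.startswith(value)
        elif lookup == "contains":
            ok = value in actual
        else:
            raise ValueError(f"unsupported lookup: {lookup}")
        if not ok:
            return False
    return True


records = [{"name": "my label"}, {"name": "other label"}]
hits = [r for r in records if matches(r, name__startswith="my")]
print(hits)  # [{'name': 'my label'}]
```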
 - classmethod get(idlike=None, **expressions)
- Get a single record.
- Parameters:
- idlike (int | str | None, default: None) – Either a uid stub, a uid, or an integer id.
- expressions – Fields and values passed as Django query expressions.
 
- Return type:
- Returns:
- A record. 
- Raises:
- lamindb.errors.DoesNotExist – In case no matching record is found. 
 - See also
- Guide: Query & search registries
- Django documentation: Queries
 - Examples
>>> ulabel = ln.ULabel.get("FvtpPJLJ")
>>> ulabel = ln.ULabel.get(name="my-label")
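One way to picture how an idlike argument could be resolved is sketched below over an in-memory list. Everything here is an assumption made for illustration: the records and their uids are made up, a “uid stub” is treated as a uid prefix, DoesNotExist is a local stand-in for lamindb.errors.DoesNotExist, and the error raised for multiple matches is a choice of this sketch.

```python
class DoesNotExist(Exception):
    """Local stand-in for lamindb.errors.DoesNotExist."""


# Made-up records for illustration only.
RECORDS = [
    {"id": 1, "uid": "FvtpPJLJ0000", "name": "my-label"},
    {"id": 2, "uid": "a1B2c3D40000", "name": "other-label"},
]


def get(idlike=None, **expressions):
    """Return exactly one record matching an id, a uid (or prefix), or field values."""
    if isinstance(idlike, int):
        hits = [r for r in RECORDS if r["id"] == idlike]
    elif isinstance(idlike, str):
        hits = [r for r in RECORDS if r["uid"].startswith(idlike)]
    else:
        hits = [r for r in RECORDS if all(r[k] == v for k, v in expressions.items())]
    if not hits:
        raise DoesNotExist(f"no record matches {idlike!r} {expressions!r}")
    if len(hits) > 1:
        raise ValueError("more than one record matches")
    return hits[0]


print(get("FvtpPJLJ")["name"])  # my-label
print(get(name="other-label")["id"])  # 2
```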
 - classmethod lookup(field=None, return_field=None)
- Return an auto-complete object for a field.
- Parameters:
- field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
- return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
 
- Return type:
- NamedTuple
- Returns:
- A NamedTuple of lookup information of the field values with a dictionary converter.
 - Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
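The examples above access a value such as "ADGB-DT" through the attribute adgb_dt. A rough sketch of how such an auto-complete object could be built from a list of values is shown here. The helpers to_identifier and make_lookup are hypothetical, and the normalization rule (lowercase, non-alphanumeric runs replaced by underscores) is an assumption inferred from the example, not lamindb's documented behavior.

```python
import re
from collections import namedtuple


def to_identifier(value: str) -> str:
    """Lowercase a value and replace runs of non-alphanumerics with underscores."""
    return re.sub(r"[^0-9a-z]+", "_", value.lower()).strip("_")


def make_lookup(values: list[str]):
    """Build a namedtuple whose attributes map back to the original values."""
    fields = [to_identifier(v) for v in values]
    # rename=True swaps invalid or duplicate field names for positional ones (_0, _1, ...).
    Lookup = namedtuple("Lookup", fields, rename=True)
    return Lookup(*values)


lookup = make_lookup(["ADGB-DT", "TP53"])
print(lookup.adgb_dt)  # ADGB-DT
print(lookup.tp53)  # TP53
```

A namedtuple suits this purpose because its attribute names are available for tab completion in interactive sessions.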
 - classmethod search(string, *, field=None, limit=20, case_sensitive=False)
- Search.
- Parameters:
- string (str) – The input string to match against the field ontology values.
- field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
- limit (int | None, default: 20) – Maximum number of top results to return.
- case_sensitive (bool, default: False) – Whether the match is case sensitive.
 
- Return type:
- Returns:
- A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
 - Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
 - classmethod using(instance)
- Use a non-default LaminDB instance.
- Parameters:
- instance (str | None) – An instance identifier of form “account_handle/instance_name”.
- Return type:
 - Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
              uid  score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
 - Methods
 - delete()
- Delete.
- Return type:
- None
 
 - view_lineage(with_successors=False, distance=5)
- View lineage of transforms.
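The parameters are not described further here. As a sketch under the assumption that distance limits the number of hops followed from the starting transform, an upstream traversal over a predecessor mapping could look like this. The graph, the keys, and the function upstream are all hypothetical and do not reflect how lamindb renders lineage.

```python
from collections import deque

# Hypothetical lineage: each transform key maps to its direct predecessors.
PREDECESSORS = {
    "report": ["annotate"],
    "annotate": ["align", "qc"],
    "align": ["ingest"],
    "qc": ["ingest"],
    "ingest": [],
}


def upstream(start: str, distance: int = 5) -> dict[str, int]:
    """Return each upstream transform with its hop count, up to `distance` hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == distance:
            continue  # do not expand beyond the hop limit
        for parent in PREDECESSORS.get(node, []):
            if parent not in seen:
                seen[parent] = seen[node] + 1
                queue.append(parent)
    del seen[start]  # report only the predecessors, not the start itself
    return seen


print(upstream("report", distance=2))  # {'annotate': 1, 'align': 2, 'qc': 2}
```

Breadth-first order guarantees that each transform is recorded with its shortest hop count, so a node reachable by several paths appears only once.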