FEMap

class cinnabar.femap.FEMap

Free energy map holding results from both simulations and bench (experimental) measurements.

Contains a set (non-duplicate entries) of different measurements.

Examples

To construct a FEMap by hand:

>>> # Load/create experimental results
>>> from cinnabar import FEMap, Measurement, ReferenceState
>>> from openff.units import unit
>>> kJpm = unit.kilojoule_per_mole
>>> g = ReferenceState()
>>> experimental_result1 = Measurement(labelA=g, labelB="CAT-13a", DG=-8.83 * kJpm, uncertainty=0.10 * kJpm,
...                                    computational=False)
>>> experimental_result2 = Measurement(labelA=g, labelB="CAT-17g", DG=-9.73 * kJpm, uncertainty=0.10 * kJpm,
...                                    computational=False)
>>> # Load/create calculated results
>>> calculated_result = Measurement(labelA="CAT-13a", labelB="CAT-17g", DG=0.36 * kJpm,
...                                 uncertainty=0.11 * kJpm, computational=True)
>>> # Incrementally created FEMap
>>> fe = FEMap()
>>> fe.add_measurement(experimental_result1)
>>> fe.add_measurement(experimental_result2)
>>> fe.add_measurement(calculated_result)

To read from a CSV file in the legacy format, use:

>>> fe = FEMap.from_csv('../data/example.csv')

Methods

add_absolute_calculation

Add a single ABFE calculation

add_experimental_measurement

Add a single experimental measurement

add_measurement

Add new observation to FEMap, modifies the FEMap in-place

add_relative_calculation

Add a single RBFE calculation

check_weakly_connected

Checks if all computational results in the graph are reachable from other results.

draw_graph

Draw the graph using matplotlib.

from_csv

Construct from legacy csv format

from_networkx

Create FEMap from network representation

generate_absolute_values

Populate the FEMap with absolute computational values.

get_absolute_dataframe

Get a dataframe of all absolute results from all sources.

get_all_to_all_relative_dataframe

Get a dataframe of the all-to-all pairwise relative results using the absolute DG values.

get_cycle_closure_dataframe

Calculate cycle closure errors for all cycles in the network.

get_cycle_closure_edge_statistics_dataframe

For each simulated edge, report how many cycles it appears in and the mean and max cycle closure error of those cycles per source.

get_estimator_metadata

Retrieve stored metadata from a previous generate_absolute_values() call.

get_relative_dataframe

Get a dataframe of all relative results for all sources including experimental and computational.

to_legacy_graph

Produce single graph version of this FEMap

to_networkx

A copy of the FEMap as a networkx Graph

Attributes

degree

Average degree of computational nodes

ligands

All ligands in the graph

n_edges

Number of computational edges

n_ligands

Total number of unique ligands

n_measurements

Total number of both experimental and computational measurements

add_absolute_calculation(label, value: Quantity, uncertainty: Quantity, *, source: str = '', temperature=<Quantity(298.15, 'kelvin')>)

Add a single ABFE calculation

Parameters:
  • label (str | Hashable) – The ligand being measured.

  • value (openff.units.Quantity) – The measured value, in kcal/mol or kJ/mol.

  • uncertainty (openff.units.Quantity) – The uncertainty in the measurement.

  • source (str, default “”) – An identifier for the source of the data, by default this is an empty string.

  • temperature (openff.units.Quantity, default 298.15 * unit.kelvin) – The temperature the measurement was taken at.

add_experimental_measurement(label: str | Hashable, value: Quantity, uncertainty: Quantity, *, source: str = '', temperature=<Quantity(298.15, 'kelvin')>)

Add a single experimental measurement

Parameters:
  • label (str | Hashable) – The ligand being measured.

  • value (openff.units.Quantity) – The measured value, given either as a Ki or IC50, or as a free energy in kcal/mol or kJ/mol. The type of input is inferred from its units.

  • uncertainty (openff.units.Quantity) – The uncertainty in the measurement.

  • source (str, default “”) – An identifier for the source of the data, by default this is an empty string.

  • temperature (openff.units.Quantity, default 298.15 * unit.kelvin) – The temperature the measurement was taken at.
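As an illustration of how a concentration-type value such as Ki relates to a free energy, the sketch below applies the textbook relation DG = RT ln(Ki / c0) with c0 = 1 mol/L. This is an assumption about the kind of conversion involved, not code taken from cinnabar; the helper name `ki_to_dg` and the constant `R_KCAL_PER_MOL_K` are hypothetical and exist only for this example.

```python
import math

# Molar gas constant in kcal/(mol*K). Illustrative constant, not read from cinnabar.
R_KCAL_PER_MOL_K = 0.0019872041


def ki_to_dg(ki_molar: float, temperature_kelvin: float = 298.15) -> float:
    """Convert an inhibition constant in mol/L to a binding free energy in kcal/mol.

    Assumes DG = RT * ln(Ki / c0) with a standard concentration c0 of 1 mol/L.
    """
    if ki_molar <= 0:
        raise ValueError("Ki must be positive")
    return R_KCAL_PER_MOL_K * temperature_kelvin * math.log(ki_molar)


dg_nanomolar = ki_to_dg(1e-9)   # a 1 nM binder
dg_micromolar = ki_to_dg(1e-6)  # a 1 uM binder
print(f"1 nM: {dg_nanomolar:.2f} kcal/mol, 1 uM: {dg_micromolar:.2f} kcal/mol")
```

Tighter binders (smaller Ki) give more negative DG values, which is why the temperature argument matters whenever a concentration is supplied instead of an energy.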

add_measurement(measurement: Measurement)

Add new observation to FEMap, modifies the FEMap in-place

Any other attributes on the measurement are used as annotations

Parameters:

measurement (Measurement) – The measurement to add.

Raises:

ValueError – If a bad type is given.

add_relative_calculation(labelA: str | Hashable, labelB: str | Hashable, value: Quantity, uncertainty: Quantity, *, source: str = '', temperature=<Quantity(298.15, 'kelvin')>)

Add a single RBFE calculation

Parameters:
  • labelA, labelB (str | Hashable) – The ligands being measured. The measurement is taken from ligandA to ligandB, i.e. ligandA is the “old” or lambda=0.0 state, and ligandB is the “new” or lambda=1.0 state.

  • value (openff.units.Quantity) – The measured DDG value, in kcal/mol or kJ/mol.

  • uncertainty (openff.units.Quantity) – The uncertainty in the measurement.

  • source (str, default “”) – An identifier for the source of the data, by default this is an empty string.

  • temperature (openff.units.Quantity, default 298.15 * unit.kelvin) – The temperature the measurement was taken at.

check_weakly_connected() → bool

Checks if all computational results in the graph are reachable from other results.

Returns:

True if the graph is weakly connected, False otherwise.

Return type:

bool

Raises:

ValueError – If the graph contains no computational edges.
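To make the notion of weak connectivity concrete, here is a minimal standalone sketch that ignores edge direction and checks whether every node can be reached from an arbitrary starting node. It is not cinnabar's implementation; the function `is_weakly_connected` and its edge-list input are hypothetical stand-ins for the computational edges of a FEMap.

```python
from collections import defaultdict, deque


def is_weakly_connected(edges):
    """Return True if the directed edges form one component when direction is ignored.

    `edges` is an iterable of (labelA, labelB) pairs.
    """
    edges = list(edges)
    if not edges:
        raise ValueError("graph contains no edges")

    # Build an undirected adjacency map: weak connectivity ignores direction.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search from an arbitrary node.
    start = next(iter(neighbours))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == len(neighbours)


connected = is_weakly_connected([("A", "B"), ("C", "B")])
disconnected = is_weakly_connected([("A", "B"), ("C", "D")])
print(connected, disconnected)
```

In the first edge list, C is linked to A through B even though no directed path runs from A to C; in the second, the two pairs never touch, so no estimator could place all four ligands on a common scale.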

property degree: float

Average degree of computational nodes

draw_graph(title: str = '', filename: str | None = None, highlight_edges: dict[str, list[tuple[str, str]]] | None = None)

Draw the graph using matplotlib.

Parameters:
  • title (str, default “”) – Title for the graph, by default an empty string.

  • filename (str | None, default None) – If provided, the graph will be saved to this file. If None, the graph will be displayed.

  • highlight_edges (dict[str, list[tuple[str, str]]], default None) – Mapping of color -> list of edges to draw in that color. Edges not included are drawn in grey.

classmethod from_csv(filename: Path, units: Quantity | None = None)

Construct from legacy csv format

Parameters:
  • filename (pathlib.Path) – The path to the csv file.

  • units (openff.units.Quantity, default None) – The units to use for values in the file. If None, kcal/mol is assumed.

classmethod from_networkx(graph: MultiDiGraph)

Create FEMap from network representation

Parameters:

graph (nx.MultiDiGraph) – The networkx representation of the FEMap.

Note

Currently, no validation of the input is performed.

generate_absolute_values(estimator: Estimator | None = None)

Populate the FEMap with absolute computational values.

Runs the estimator on this femap for each unique computational source, adds the returned Measurement objects, and stores the EstimatorResult metadata per source for later retrieval via get_estimator_metadata.

Parameters:

estimator (Estimator, default None) – The estimator to use. Defaults to the MLEEstimator.

Raises:

ValueError – If measurements have mixed units or the computational graph for any source is not weakly connected.

See also

get_estimator_metadata

retrieve stored metadata after estimation.

Notes

  • This method modifies the FEMap in-place, adding new measurements and metadata.

  • The estimator is run separately for each unique computational source. Predictions will have a new source tag of the form {estimator_name}({original_source}), e.g. MLE(openff-2.0.0).

get_absolute_dataframe(observable_type: Literal['dg', 'pic50'] = 'dg', temperature: Quantity = <Quantity(298.15, 'kelvin')>) → DataFrame

Get a dataframe of all absolute results from all sources.

Parameters:
  • observable_type ({“dg”, “pic50”}, default “dg”) – The observable type to report values in. Defaults to dg (kcal/mol). Use pic50 to report pIC50 values.

  • temperature (Quantity, default 298.15 * unit.kelvin) – Temperature used for the unit conversion.

Note

The dataframe will have the following columns:

  • label

  • DG (kcal/mol) / pIC50 — depending on observable_type

  • uncertainty (kcal/mol) / uncertainty (unitless)

  • source

  • computational

The dataframe will be sorted by source, computational, and label to ensure consistent ordering of results between sources.
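The role of the temperature argument when reporting pIC50 values can be sketched with the standard relation pIC50 = -DG / (RT ln 10), under which an uncertainty in DG scales by the same factor. The snippet below is a standalone illustration under that assumption (which treats IC50 as interchangeable with a binding constant); it is not cinnabar's conversion code, and `dg_to_pic50` and `dg_uncertainty_to_pic50` are hypothetical helper names.

```python
import math

# Molar gas constant in kcal/(mol*K). Illustrative constant, not read from cinnabar.
R_KCAL_PER_MOL_K = 0.0019872041


def dg_to_pic50(dg_kcal_per_mol: float, temperature_kelvin: float = 298.15) -> float:
    """Convert a binding free energy in kcal/mol to a unitless pIC50."""
    rt_ln10 = R_KCAL_PER_MOL_K * temperature_kelvin * math.log(10)
    return -dg_kcal_per_mol / rt_ln10


def dg_uncertainty_to_pic50(unc_kcal_per_mol: float, temperature_kelvin: float = 298.15) -> float:
    """Scale a DG uncertainty (kcal/mol) into pIC50 units; the conversion is linear."""
    rt_ln10 = R_KCAL_PER_MOL_K * temperature_kelvin * math.log(10)
    return unc_kcal_per_mol / rt_ln10


pic50_room = dg_to_pic50(-9.0)                            # at 298.15 K
pic50_body = dg_to_pic50(-9.0, temperature_kelvin=310.0)  # same DG, higher temperature
pic50_unc = dg_uncertainty_to_pic50(0.3)
print(f"{pic50_room:.2f} {pic50_body:.2f} {pic50_unc:.2f}")
```

The same DG maps to a slightly lower pIC50 at a higher temperature, which is why the conversion temperature should match the conditions of the underlying measurements.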

get_all_to_all_relative_dataframe(symmetrical: bool = True, observable_type: Literal['ddg', 'dpic50'] = 'ddg', temperature: Quantity = <Quantity(298.15, 'kelvin')>) → DataFrame

Get a dataframe of the all-to-all pairwise relative results using the absolute DG values.

Parameters:
  • symmetrical (bool, default True) – If True, include both directions of each pairwise comparison. If False, include only one direction.

  • observable_type ({“ddg”, “dpic50”}, default “ddg”) – The observable type to report values in. Defaults to ddg (kcal/mol). Use dpic50 to report DpIC50 values.

  • temperature (Quantity, default 298.15 * unit.kelvin) – Temperature used for the unit conversion.

Returns:

df – A dataframe containing all pairwise relative results.

Return type:

pd.DataFrame

Note

The dataframe will have the following columns:

  • labelA

  • labelB

  • DDG (kcal/mol) / DpIC50 — depending on observable_type

  • uncertainty (kcal/mol) / uncertainty (unitless)

  • source

  • computational

The dataframe will be sorted by source, computational, labelA, and labelB to ensure that pairing order is consistent. If symmetrical is True, the dataframe will include both (labelA, labelB) and (labelB, labelA) for each pair of labels, with opposite signs for DDG and the same uncertainty.

If an estimator was used to generate the absolute binding affinities from relative results, this function attempts to use the estimator's covariance_matrix when computing the uncertainty; if it is not available, the covariance is set to zero.
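The effect of the covariance term on a pairwise uncertainty can be sketched as follows, assuming the usual propagation rule var(DG_B - DG_A) = var_A + var_B - 2 cov_AB and the sign convention DDG(A to B) = DG_B - DG_A. Both are assumptions made for illustration, and `pairwise_relative` is a hypothetical helper, not part of cinnabar.

```python
import math
from itertools import combinations, permutations


def pairwise_relative(labels, dg, uncertainties, cov=None, symmetrical=True):
    """Build (labelA, labelB, DDG, uncertainty) rows from absolute values.

    `cov` is an optional covariance matrix (list of lists); when it is None the
    covariance between any two ligands is taken to be zero.
    """
    indices = range(len(labels))
    index_pairs = permutations(indices, 2) if symmetrical else combinations(indices, 2)
    rows = []
    for i, j in index_pairs:
        covariance = cov[i][j] if cov is not None else 0.0
        variance = uncertainties[i] ** 2 + uncertainties[j] ** 2 - 2.0 * covariance
        # Guard against tiny negative values caused by rounding.
        rows.append((labels[i], labels[j], dg[j] - dg[i], math.sqrt(max(variance, 0.0))))
    return rows


labels = ["A", "B"]
dg = [-8.0, -9.0]
uncertainties = [0.2, 0.3]
cov = [[0.04, 0.01], [0.01, 0.09]]

with_cov = pairwise_relative(labels, dg, uncertainties, cov=cov)
no_cov = pairwise_relative(labels, dg, uncertainties)
for row in with_cov:
    print(row)
```

A positive covariance between two estimated values lowers the uncertainty of their difference, so falling back to zero covariance yields a larger uncertainty in this example.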

get_cycle_closure_dataframe(max_cycle_length: int = 5) → DataFrame

Calculate cycle closure errors for all cycles in the network.

Parameters:

max_cycle_length (int, default 5) – Only consider cycles up to this length.

Returns:

A pandas DataFrame with the following columns:

  • source

  • cycle

  • cc (kcal/mol)

  • cc_per_edge (kcal/mol)

  • cc_unc_normalized

The dataframe is sorted by source and cycle closure error, descending.

Notes

Three cycle closure metrics are calculated:

  • cc (kcal/mol): the raw cycle closure, i.e. the absolute value of the sum of DDGs around the cycle.

  • cc_per_edge (kcal/mol): the cycle closure divided by the square root of the cycle length, to allow comparison across different cycle lengths; see Baumann et al. (DOI 10.1021/acs.jctc.3c00282).

  • cc_unc_normalized: the cycle closure error divided by its propagated uncertainty, calculated as abs(sum_ddgs) / sqrt(sum_var).

The function does not currently consider two-edge cycles formed by an edge and its reverse, e.g. A->B and B->A.
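The three metrics can be reproduced by hand for a single cycle. The sketch below follows the definitions given in the notes; `cycle_closure_metrics` is a hypothetical helper, and the DDG values and uncertainties are made-up numbers for a cycle A -> B -> C -> A.

```python
import math


def cycle_closure_metrics(ddgs, uncertainties):
    """Return (cc, cc_per_edge, cc_unc_normalized) for one cycle.

    `ddgs` are the signed DDG values of the edges, oriented around the cycle;
    `uncertainties` are the matching standard uncertainties.
    """
    sum_ddgs = sum(ddgs)
    sum_var = sum(u ** 2 for u in uncertainties)
    cc = abs(sum_ddgs)
    cc_per_edge = cc / math.sqrt(len(ddgs))
    cc_unc_normalized = cc / math.sqrt(sum_var)
    return cc, cc_per_edge, cc_unc_normalized


# Hypothetical three-edge cycle A -> B -> C -> A, values in kcal/mol.
cc, cc_per_edge, cc_unc_normalized = cycle_closure_metrics(
    ddgs=[1.2, -0.5, -0.4],
    uncertainties=[0.1, 0.2, 0.2],
)
print(f"cc={cc:.3f} cc_per_edge={cc_per_edge:.3f} cc_unc_normalized={cc_unc_normalized:.3f}")
```

Here the DDGs sum to roughly 0.3 kcal/mol and the propagated uncertainty is also roughly 0.3 kcal/mol, so the closure error is about one standard uncertainty away from zero.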

get_cycle_closure_edge_statistics_dataframe(max_cycle_length: int = 5) → DataFrame

For each simulated edge, report how many cycles it appears in and the mean and max cycle closure error of those cycles per source.

The cycle closure values are based on cc_per_edge (kcal/mol), defined as the absolute cycle closure divided by the square root of the cycle length.

Parameters:

max_cycle_length (int, default 5) – Only consider cycles up to this length.

Returns:

A pandas DataFrame with the following columns:

  • source

  • ligandA

  • ligandB

  • n_cycles

  • mean_cc_per_edge (kcal/mol)

  • max_cc_per_edge (kcal/mol)

The dataframe is sorted by source and mean cycle closure error, descending.
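The per-edge aggregation can be pictured as grouping the cc_per_edge value of every cycle by the edges that cycle contains. The sketch below works on a hand-written list of cycles for a single source; `edge_statistics` and its input layout are hypothetical, and plain dictionaries stand in for the returned DataFrame.

```python
from collections import defaultdict
from statistics import mean


def edge_statistics(cycles):
    """Aggregate cycle closure values per edge.

    `cycles` is a list of (edges, cc_per_edge) tuples, where `edges` is a list
    of (ligandA, ligandB) pairs belonging to that cycle.
    """
    values_by_edge = defaultdict(list)
    for edges, cc_per_edge in cycles:
        for edge in edges:
            values_by_edge[edge].append(cc_per_edge)

    rows = [
        {
            "ligandA": ligand_a,
            "ligandB": ligand_b,
            "n_cycles": len(values),
            "mean_cc_per_edge": mean(values),
            "max_cc_per_edge": max(values),
        }
        for (ligand_a, ligand_b), values in values_by_edge.items()
    ]
    # Largest mean closure error first.
    rows.sort(key=lambda row: row["mean_cc_per_edge"], reverse=True)
    return rows


# Two hypothetical cycles sharing the edge A -> B.
cycles = [
    ([("A", "B"), ("B", "C"), ("C", "A")], 0.2),
    ([("A", "B"), ("B", "D"), ("D", "A")], 0.6),
]
rows = edge_statistics(cycles)
for row in rows:
    print(row)
```

An edge that shows a high mean across several cycles is a more likely culprit than an edge that appears in only one poorly closing cycle, which is the motivation for reporting both n_cycles and the mean.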

get_estimator_metadata(source: str) → EstimatorResult

Retrieve stored metadata from a previous generate_absolute_values() call.

Parameters:

source (str) – The composed source identifier for the estimator results to retrieve, e.g. MLE(openff-2.0.0).

Returns:

The concrete type depends on the estimator used, e.g. MLEEstimatorResult for MLEEstimator.

Return type:

EstimatorResult

Raises:

KeyError – If no metadata is stored for the provided source.

get_relative_dataframe(observable_type: Literal['ddg', 'dpic50'] = 'ddg', temperature: Quantity = <Quantity(298.15, 'kelvin')>) → DataFrame

Get a dataframe of all relative results for all sources including experimental and computational.

Parameters:
  • observable_type ({“ddg”, “dpic50”}, default “ddg”) – The observable type to report values in. Defaults to ddg (kcal/mol). Use dpic50 to report DpIC50 values.

  • temperature (Quantity, default 298.15 * unit.kelvin) – Temperature used for the unit conversion.

Note

The pandas DataFrame will have the following columns:

  • labelA

  • labelB

  • DDG (kcal/mol) / DpIC50 — depending on observable_type

  • uncertainty (kcal/mol) / uncertainty (unitless)

  • source

  • computational

Only simulated relative results are included for the computational results. The dataframe is sorted by source, computational, labelA, and labelB to ensure consistent ordering of results between sources.

property ligands: list

All ligands in the graph

property n_edges: int

Number of computational edges

property n_ligands: int

Total number of unique ligands

property n_measurements: int

Total number of both experimental and computational measurements

to_legacy_graph() → DiGraph

Produce single graph version of this FEMap

This graph will feature:

  • experimental DDG values, calculated as the difference between experimental DG values

  • calculated DG values, estimated via MLE

This matches the legacy format of this object; notably, it:

  • drops multi-edge capability

  • removes units from values

Deprecated: to_legacy_graph is deprecated and will be removed in a future release. Use get_relative_dataframe and get_absolute_dataframe to access the underlying data, or generate_absolute_values to run MLE explicitly. The plot functions plot_DDGs, plot_DGs, and plot_all_DDGs now accept a FEMap directly and no longer require a legacy graph.

to_networkx() → MultiDiGraph

A copy of the FEMap as a networkx Graph

The FEMap is represented as a multi-edged directional graph

Edges have the following attributes:

  • DG: the free energy difference of going from the first edge label to the second edge label

  • uncertainty: uncertainty of the DG value

  • temperature: the temperature at which DG was measured

  • computational: boolean label of the original source of the data

  • source: a string describing the source of data.

Note

All edges appear twice: once as given, and once with the attribute source='reverse' and the sign of the DG value flipped. This allows “pathfinding”-like approaches, in which the DG values along a path are summed correctly.
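Why the sign-flipped reverse edges help with pathfinding can be shown without networkx. The sketch below stores each edge in both directions in a plain dictionary and sums DG along a path; it is a simplified stand-in (one edge per ligand pair, no units, no multi-edges), and `with_reverse_edges` and `path_dg` are hypothetical helpers.

```python
def with_reverse_edges(edges):
    """Build {node: {neighbour: DG}} with a sign-flipped reverse entry per edge.

    `edges` is a list of (labelA, labelB, dg) tuples.
    """
    graph = {}
    for label_a, label_b, dg in edges:
        graph.setdefault(label_a, {})[label_b] = dg
        graph.setdefault(label_b, {})[label_a] = -dg  # reverse direction
    return graph


def path_dg(graph, path):
    """Sum the DG values of consecutive steps along `path`."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


# Hypothetical measurements: A -> B and C -> B, in kcal/mol.
graph = with_reverse_edges([("A", "B", 1.0), ("C", "B", 0.4)])

# A -> C is only reachable by walking B -> C against its measured direction.
dg_a_to_c = path_dg(graph, ["A", "B", "C"])
print(f"{dg_a_to_c:.2f}")
```

Without the reverse entry there would be no step from B to C at all; with it, the walk picks up -0.4 for that step and the total reflects the correct orientation.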