FEMap
- class cinnabar.femap.FEMap
Free Energy map of both simulations and bench measurements
Contains a set (non-duplicate entries) of different measurements.
Examples
To construct a FEMap by hand:
>>> # Load/create experimental results
>>> from openff.units import unit
>>> kJpm = unit.kilojoule_per_mole
>>> g = ReferenceState()
>>> experimental_result1 = Measurement(labelA=g, labelB="CAT-13a", DG=-8.83 * kJpm, uncertainty=0.10 * kJpm,
...                                    computational=False)
>>> experimental_result2 = Measurement(labelA=g, labelB="CAT-17g", DG=-9.73 * kJpm, uncertainty=0.10 * kJpm,
...                                    computational=False)
>>> # Load/create calculated results
>>> calculated_result = Measurement(labelA="CAT-13a", labelB="CAT-17g", DG=0.36 * kJpm,
...                                 uncertainty=0.11 * kJpm, computational=True)
>>> # Incrementally created FEMap
>>> fe = FEMap()
>>> fe.add_measurement(experimental_result1)
>>> fe.add_measurement(experimental_result2)
>>> fe.add_measurement(calculated_result)
To read from a legacy csv file specifically formatted for this, you can use:
>>> fe = FEMap.from_csv('../data/example.csv')
Methods
add_absolute_calculation – Add a single ABFE calculation
add_experimental_measurement – Add a single experimental measurement
add_measurement – Add new observation to FEMap, modifies the FEMap in-place
add_relative_calculation – Add a single RBFE calculation
check_weakly_connected – Checks if all computational results in the graph are reachable from other results.
draw_graph – Draw the graph using matplotlib.
from_csv – Construct from legacy csv format
from_networkx – Create FEMap from network representation
generate_absolute_values – Populate the FEMap with absolute computational values.
get_absolute_dataframe – Get a dataframe of all absolute results from all sources.
get_all_to_all_relative_dataframe – Get a dataframe of the all-to-all pairwise relative results using the absolute DG values.
get_cycle_closure_dataframe – Calculate cycle closure errors for all cycles in the network.
get_cycle_closure_edge_statistics_dataframe – For each simulated edge, report how many cycles it appears in and the mean and max cycle closure error of those cycles per source.
get_estimator_metadata – Retrieve stored metadata from a previous generate_absolute_values() call.
get_relative_dataframe – Get a dataframe of all relative results for all sources including experimental and computational.
to_legacy_graph – Produce single graph version of this FEMap
to_networkx – A copy of the FEMap as a networkx Graph
Attributes
degree – Average degree of computational nodes
ligands – All ligands in the graph
n_edges – Number of computational edges
n_ligands – Total number of unique ligands
n_measurements – Total number of both experimental and computational measurements
- add_absolute_calculation(label, value: Quantity, uncertainty: Quantity, *, source: str = '', temperature=<Quantity(298.15, 'kelvin')>)
Add a single ABFE calculation
- Parameters:
label (str | Hashable) – The ligand being measured.
value (openff.units.Quantity) – The measured value, as kcal/mol, or kJ/mol.
uncertainty (openff.units.Quantity) – The uncertainty in the measurement
source (str, default “”) – An identifier for the source of the data, by default this is an empty string.
temperature (openff.units.Quantity, default 298.15 * unit.kelvin) – The temperature the measurement was taken at.
- add_experimental_measurement(label: str | Hashable, value: Quantity, uncertainty: Quantity, *, source: str = '', temperature=<Quantity(298.15, 'kelvin')>)
Add a single experimental measurement
- Parameters:
label (str | Hashable) – The ligand being measured
value (openff.units.Quantity) – The measured value, as either Ki, IC50, kcal/mol, or kJ/mol. The type of input is determined by the units of the input.
uncertainty (openff.units.Quantity) – The uncertainty in the measurement
source (str, default “”) – An identifier for the source of the data, by default this is an empty string.
temperature (openff.units.Quantity, default 298.15 * unit.kelvin) – The temperature the measurement was taken at.
- add_measurement(measurement: Measurement)
Add new observation to FEMap, modifies the FEMap in-place
Any other attributes on the measurement are used as annotations
- Parameters:
measurement (Measurement) – The measurement to add.
- Raises:
ValueError – If a bad type is given.
- add_relative_calculation(labelA: str | Hashable, labelB: str | Hashable, value: Quantity, uncertainty: Quantity, *, source: str = '', temperature=<Quantity(298.15, 'kelvin')>)
Add a single RBFE calculation
- Parameters:
labelA, labelB (str | Hashable) – The ligands being measured. The measurement is taken from ligandA to ligandB, i.e. ligandA is the “old” or lambda=0.0 state, and ligandB is the “new” or lambda=1.0 state.
value (openff.units.Quantity) – The measured DDG value, as kcal/mol, or kJ/mol.
uncertainty (openff.units.Quantity) – The uncertainty in the measurement.
source (str, default “”) – An identifier for the source of the data, by default this is an empty string.
temperature (openff.units.Quantity, default 298.15 * unit.kelvin) – The temperature the measurement was taken at.
- check_weakly_connected() → bool
Checks if all computational results in the graph are reachable from other results.
- Returns:
True if the graph is weakly connected, False otherwise.
- Return type:
bool
- Raises:
ValueError – If the graph contains no computational edges.
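As a rough illustration of what weak connectivity means here, the sketch below ignores edge direction and checks that every ligand can be reached from every other. It uses only the standard library; the function name `is_weakly_connected` and the edge-list input format are assumptions made for this example and are not part of cinnabar.

```python
from collections import defaultdict, deque


def is_weakly_connected(edges):
    """Return True if the edges form a single component once direction is ignored.

    `edges` is a list of (labelA, labelB) tuples (hypothetical input format).
    """
    if not edges:
        raise ValueError("graph contains no computational edges")

    # Build an undirected adjacency map: weak connectivity ignores direction.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search from an arbitrary starting node.
    start = next(iter(neighbours))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours[node] - seen:
            seen.add(other)
            queue.append(other)

    # Connected only if the search reached every node.
    return len(seen) == len(neighbours)


connected = is_weakly_connected([("CAT-13a", "CAT-17g"), ("CAT-17g", "CAT-13b")])
disconnected = is_weakly_connected([("CAT-13a", "CAT-17g"), ("CAT-13b", "CAT-13c")])
```

In the second call the two edges share no ligand, so the search started from one edge never reaches the other and the result is False.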
- property degree: float
Average degree of computational nodes
- draw_graph(title: str = '', filename: str | None = None, highlight_edges: dict[str, list[tuple[str, str]]] | None = None)
Draw the graph using matplotlib.
- Parameters:
title (str, default “”) – Title for the graph, by default an empty string.
filename (str | None, default None) – If provided, the graph will be saved to this file. If None, the graph will be displayed.
highlight_edges (dict[str, list[tuple[str, str]]], default None) – Mapping of color -> list of edges to draw in that color. Edges not included are drawn in grey.
- classmethod from_csv(filename: Path, units: Quantity | None = None)
Construct from legacy csv format
- Parameters:
filename (pathlib.Path) – The path to the csv file.
units (openff.units.Quantity, default None) – The units to use for values in the file. If None, kcal/mol is used.
- classmethod from_networkx(graph: MultiDiGraph)
Create FEMap from network representation
- Parameters:
graph (nx.MultiDiGraph) – The networkx representation of the FEMap.
Note
Currently absolutely no validation of the input is done.
- generate_absolute_values(estimator: Estimator | None = None)
Populate the FEMap with absolute computational values.
Runs the estimator on this femap for each unique computational source, adds the returned Measurement objects, and stores the EstimatorResult metadata per source for later retrieval via get_estimator_metadata.
- Parameters:
estimator (Estimator, default None) – The estimator to use. Defaults to the MLEEstimator.
- Raises:
ValueError – If measurements have mixed units or the computational graph for any source is not weakly connected.
See also
get_estimator_metadata – Retrieve stored metadata after estimation.
Notes
- This method modifies the FEMap in-place, adding new measurements and metadata.
- The estimator is run separately for each unique computational source. Predictions will have a new source tag of the form {estimator_name}({original_source}), e.g. MLE(openff-2.0.0).
- get_absolute_dataframe(observable_type: Literal['dg', 'pic50'] = 'dg', temperature: Quantity = <Quantity(298.15, 'kelvin')>) → DataFrame
Get a dataframe of all absolute results from all sources.
- Parameters:
observable_type ({“dg”, “pic50”}, default “dg”) – The observable type to report values in. Defaults to dg (kcal/mol). Use pic50 to report pIC50 values.
temperature (Quantity, default 298.15 * unit.kelvin) – Temperature used for the unit conversion.
Note
The dataframe will have the following columns:
- label
- DG (kcal/mol) / pIC50, depending on observable_type
- uncertainty (kcal/mol) / uncertainty (unitless)
- source
- computational
The dataframe will be sorted by source, computational, and label to ensure consistent ordering of results between sources.
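To make the role of the temperature parameter concrete, here is a minimal sketch of a DG to pIC50 conversion. It assumes the standard thermodynamic relation DG = RT ln(IC50 / 1 M), with IC50 standing in for the dissociation constant. The helper names are hypothetical and this is not taken from cinnabar's own conversion code, which may differ in detail.

```python
import math

# Gas constant in kcal/(mol K).
R_KCAL_PER_MOL_K = 1.987204259e-3


def dg_to_pic50(dg_kcal_per_mol, temperature_kelvin=298.15):
    """Convert a binding free energy in kcal/mol to a pIC50-like value.

    Assumes DG = RT ln(IC50 / 1 M), i.e. IC50 approximates the dissociation constant.
    """
    rt = R_KCAL_PER_MOL_K * temperature_kelvin
    return -dg_kcal_per_mol / (rt * math.log(10))


def pic50_to_dg(pic50, temperature_kelvin=298.15):
    """Inverse of dg_to_pic50 under the same assumption."""
    rt = R_KCAL_PER_MOL_K * temperature_kelvin
    return -pic50 * rt * math.log(10)


pic50 = dg_to_pic50(-8.0)        # roughly 5.86 at 298.15 K
round_trip = pic50_to_dg(pic50)  # recovers -8.0 kcal/mol
```

Because RT appears in the denominator, the same DG maps to a slightly different pIC50 at a different temperature, which is why the conversion needs the temperature at all.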
- get_all_to_all_relative_dataframe(symmetrical: bool = True, observable_type: Literal['ddg', 'dpic50'] = 'ddg', temperature: Quantity = <Quantity(298.15, 'kelvin')>) → DataFrame
Get a dataframe of the all-to-all pairwise relative results using the absolute DG values.
- Parameters:
symmetrical (bool, default True) – If True, include both directions of each pairwise comparison. If False, include only one direction.
observable_type ({“ddg”, “dpic50”}, default “ddg”) – The observable type to report values in. Defaults to ddg (kcal/mol). Use dpic50 to report DpIC50 values.
temperature (Quantity, default 298.15 * unit.kelvin) – Temperature used for the unit conversion.
- Returns:
df – A dataframe containing all pairwise relative results.
- Return type:
pd.DataFrame
Note
The dataframe will have the following columns:
- labelA
- labelB
- DDG (kcal/mol) / DpIC50, depending on observable_type
- uncertainty (kcal/mol) / uncertainty (unitless)
- source
- computational
The dataframe will be sorted by source, computational, labelA, and labelB to ensure that pairing order is consistent. If symmetrical is True, the dataframe will include both (labelA, labelB) and (labelB, labelA) for each pair of labels, with opposite signs for DDG and the same uncertainty. If an estimator was used to generate the absolute binding affinities from relative results, this function attempts to use the covariance_matrix when computing the uncertainty; if it is not available, the covariance is set to zero.
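The pairwise construction described above can be sketched as follows. This is an illustration only, not cinnabar's implementation: the function name and input dictionaries are hypothetical, the sign convention DDG(A to B) = DG(B) - DG(A) is an assumption, and the uncertainty uses the usual rule Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y) with missing covariances treated as zero.

```python
import itertools
import math


def all_to_all_relative(labels, dg, variance, covariance=None, symmetrical=True):
    """Pairwise DDG values derived from absolute DG values.

    labels: list of ligand names
    dg, variance: dicts mapping label -> DG (kcal/mol) and its variance
    covariance: optional dict mapping (labelA, labelB) -> covariance;
        missing entries are treated as zero.
    Assumes DDG(A -> B) = DG(B) - DG(A).
    """
    covariance = covariance or {}
    rows = []
    for a, b in itertools.combinations(labels, 2):
        cov = covariance.get((a, b), covariance.get((b, a), 0.0))
        ddg = dg[b] - dg[a]
        # Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)
        unc = math.sqrt(variance[a] + variance[b] - 2.0 * cov)
        rows.append((a, b, ddg, unc))
        if symmetrical:
            # Reverse direction: opposite sign, same uncertainty.
            rows.append((b, a, -ddg, unc))
    return rows


rows = all_to_all_relative(
    ["A", "B", "C"],
    dg={"A": -8.0, "B": -9.0, "C": -7.5},
    variance={"A": 0.04, "B": 0.09, "C": 0.01},
    covariance={("A", "B"): 0.02},
)
```

With three ligands and `symmetrical=True` this yields six rows. A positive covariance between A and B reduces the uncertainty of their difference relative to the zero-covariance fallback.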
- get_cycle_closure_dataframe(max_cycle_length: int = 5) → DataFrame
Calculate cycle closure errors for all cycles in the network.
- Parameters:
max_cycle_length (int, default 5) – Only consider cycles up to this length. Default 5.
- Returns:
The pandas DataFrame will have the following columns
- source
- cycle
- cc (kcal/mol)
- cc_per_edge (kcal/mol)
- cc_unc_normalized
Sorted by source and cycle closure error descending.
Notes
Three cycle closure metrics are calculated:
- cc (kcal/mol): the raw absolute sum of DDGs around the cycle. Units: kcal/mol.
- cc_per_edge (kcal/mol): the cycle closure divided by the square root of the cycle length, to allow comparison across different cycle lengths; see Baumann et al. (DOI 10.1021/acs.jctc.3c00282). Units: kcal/mol.
- cc_unc_normalized: the cycle closure error divided by its propagated uncertainty, calculated as abs(sum_ddgs) / sqrt(sum_var).
The function currently does not consider self loop edges, e.g. A–>B and B–>A edges.
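For a single cycle, the three metrics can be written out in a few lines. The sketch below is an illustration under stated assumptions, not the library's code: the function name is hypothetical, the DDG values are assumed to be already oriented consistently around the cycle, and `sum_var` is taken to be the sum of squared edge uncertainties, i.e. edges are treated as independent.

```python
import math


def cycle_closure_metrics(ddgs, uncertainties):
    """Closure metrics for one cycle.

    ddgs: DDG value (kcal/mol) of each edge, oriented around the cycle
    uncertainties: standard uncertainty (kcal/mol) of each edge
    """
    sum_ddgs = sum(ddgs)
    # Assumption: independent edges, so variances simply add.
    sum_var = sum(u ** 2 for u in uncertainties)
    cc = abs(sum_ddgs)
    return {
        "cc": cc,
        "cc_per_edge": cc / math.sqrt(len(ddgs)),
        "cc_unc_normalized": cc / math.sqrt(sum_var),
    }


metrics = cycle_closure_metrics([1.2, -0.5, -0.4], [0.1, 0.2, 0.2])
```

In this three-edge example the closure error and its propagated uncertainty are both about 0.3 kcal/mol, so the normalized value is close to 1, meaning the cycle closes to within roughly one standard uncertainty.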
- get_cycle_closure_edge_statistics_dataframe(max_cycle_length: int = 5) → DataFrame
For each simulated edge, report how many cycles it appears in and the mean and max cycle closure error of those cycles per source.
The cycle closure values are based on cc_per_edge (kcal/mol), defined as the absolute cycle closure divided by the square root of the cycle length.
- Parameters:
max_cycle_length (int, default 5) – Only consider cycles up to this length. Defaults to 5.
- Returns:
The pandas DataFrame will have the following columns
- source
- ligandA
- ligandB
- n_cycles
- mean_cc_per_edge (kcal/mol)
- max_cc_per_edge (kcal/mol)
Sorted by source and mean cycle closure error descending.
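The per-edge aggregation can be pictured with the following sketch, which handles a single source. The input format (a list of cycles, each with its edges and a precomputed cc_per_edge value) and the function name are assumptions for illustration; the real method enumerates cycles from the FEMap itself.

```python
from collections import defaultdict


def edge_statistics(cycles):
    """Aggregate per-edge cycle closure statistics for one source.

    cycles: list of (edges, cc_per_edge) pairs, where edges is a list of
    (ligandA, ligandB) tuples making up one cycle (hypothetical input format).
    """
    # Collect the cc_per_edge of every cycle each edge takes part in.
    per_edge = defaultdict(list)
    for edges, cc_per_edge in cycles:
        for edge in edges:
            per_edge[edge].append(cc_per_edge)

    rows = [
        {
            "ligandA": a,
            "ligandB": b,
            "n_cycles": len(values),
            "mean_cc_per_edge": sum(values) / len(values),
            "max_cc_per_edge": max(values),
        }
        for (a, b), values in per_edge.items()
    ]
    # Largest mean closure error first.
    rows.sort(key=lambda row: row["mean_cc_per_edge"], reverse=True)
    return rows


cycles = [
    ([("A", "B"), ("B", "C"), ("C", "A")], 0.2),
    ([("A", "B"), ("B", "D"), ("D", "A")], 0.6),
]
stats = edge_statistics(cycles)
```

Here the edge A to B appears in both cycles, so its row reports two cycles with a mean between the two closure values, while edges unique to the poorly closing cycle sort to the top.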
- get_estimator_metadata(source: str) → EstimatorResult
Retrieve stored metadata from a previous generate_absolute_values() call.
- Parameters:
source (str) – The composed source identifier for the estimator results to retrieve, e.g. MLE(openff-2.0.0).
- Returns:
The concrete type depends on the estimator used, e.g. MLEEstimatorResult for MLEEstimator.
- Return type:
EstimatorResult
- Raises:
KeyError – If no metadata is stored for the provided source.
- get_relative_dataframe(observable_type: Literal['ddg', 'dpic50'] = 'ddg', temperature: Quantity = <Quantity(298.15, 'kelvin')>) → DataFrame
Get a dataframe of all relative results for all sources including experimental and computational.
- Parameters:
observable_type ({“ddg”, “dpic50”}, default “ddg”) – The observable type to report values in. Defaults to ddg (kcal/mol). Use dpic50 to report DpIC50 values.
temperature (Quantity, default 298.15 * unit.kelvin) – Temperature used for the unit conversion.
Note
The pandas DataFrame will have the following columns:
- labelA
- labelB
- DDG (kcal/mol) / DpIC50, depending on observable_type
- uncertainty (kcal/mol) / uncertainty (unitless)
- source
- computational
Only simulated relative results are included for the computational results. The dataframe is sorted by source, computational, labelA, and labelB to ensure consistent ordering of results between sources.
- property ligands: list
All ligands in the graph
- property n_edges: int
Number of computational edges
- property n_ligands: int
Total number of unique ligands
- property n_measurements: int
Total number of both experimental and computational measurements
- to_legacy_graph() → DiGraph
Produce single graph version of this FEMap
This graph will feature:
- experimental DDG values calculated as the difference between experimental DG values
- calculated DG values calculated via MLE
This matches the legacy format of this object, notably:
- drops multi edge capability
- removes units from values
Deprecated: to_legacy_graph is deprecated and will be removed in a future release. Use get_relative_dataframe and get_absolute_dataframe to access the underlying data, or generate_absolute_values to run MLE explicitly. The plot functions plot_DDGs, plot_DGs, and plot_all_DDGs now accept a FEMap directly and no longer require a legacy graph.
- to_networkx() → MultiDiGraph
A copy of the FEMap as a networkx Graph
The FEMap is represented as a multi-edged directional graph
Edges have the following attributes:
DG: the free energy difference of going from the first edge label to the second edge label
uncertainty: uncertainty of the DG value
temperature: the temperature at which DG was measured
computational: boolean label of the original source of the data
source: a string describing the source of data.
Note
All edges appear twice, once with the attribute source=’reverse’, and the DG value flipped. This allows “pathfinding” like approaches, where the DG values will be correctly summed.
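To show why storing the sign-flipped reverse of every edge helps with pathfinding, the sketch below builds a small adjacency map by hand and sums DG along a path that traverses one edge against its original direction. It uses plain dictionaries rather than networkx, keeps only one edge per pair of nodes (unlike the multi-edged graph described above), and the function names are hypothetical.

```python
from collections import deque


def add_with_reverse(edges):
    """Return an adjacency map containing each edge and its sign-flipped reverse."""
    graph = {}
    for a, b, dg in edges:
        graph.setdefault(a, {})[b] = dg
        graph.setdefault(b, {})[a] = -dg  # reverse edge with the DG value flipped
    return graph


def path_dg(graph, start, end):
    """Sum DG along the first path found by breadth-first search."""
    queue = deque([(start, 0.0)])
    seen = {start}
    while queue:
        node, total = queue.popleft()
        if node == end:
            return total
        for other, dg in graph[node].items():
            if other not in seen:
                seen.add(other)
                queue.append((other, total + dg))
    raise ValueError(f"no path from {start} to {end}")


# Only A -> B and C -> B were measured; A -> C needs the reverse of C -> B.
graph = add_with_reverse([("A", "B", 1.0), ("C", "B", -0.5)])
dg_a_to_c = path_dg(graph, "A", "C")
dg_c_to_a = path_dg(graph, "C", "A")
```

Going from A to C uses the stored reverse edge B to C with DG = +0.5, so the path sums to 1.5, and the opposite direction gives the same magnitude with the opposite sign.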