{ "cells": [ { "cell_type": "markdown", "id": "f055f678932e4cc", "metadata": {}, "source": [ "# Cycle Closure Error Analysis\n", "\n", "Cycle closure analysis is a useful internal consistency check for relative binding free energy networks. For a closed loop of transformations, the signed sum of the calculated ΔΔG values should be zero (assuming perfect sampling). A large non-zero value indicates that at least one edge in the cycle is inconsistent with the others and may have convergence or sampling issues.\n", "\n", "However, a low cycle closure error can also arise by chance from cancellation of errors, so it does not prove that every edge is well sampled. Cycle closure analysis should therefore be included as part of a larger post-simulation analysis pipeline.\n", "\n", "
\n", "Note: Not all alchemical network generation methods incorporate cycles; see konnektor for examples.\n", "
\n", "\n", "\n", "In this tutorial we will:\n", "\n", "1. Build a ``FEMap`` with two computational sources using the bundled RBFE example data.\n", "2. Calculate cycle closure errors with ``FEMap.get_cycle_closure_dataframe``.\n", "3. Summarise which edges appear in high-closure-error cycles with ``FEMap.get_cycle_closure_edge_statistics_dataframe``.\n", "4. Plot the cycle closure error distribution with ``cinnabar.plotting.plot_cycle_closure``.\n", "\n", "
\n", "Important: Cycle closure analysis only considers cycles involving three or more ligands. It does not include \"self edges\" formed by comparing a forward edge with its backward edge, for example A→B against B→A. Forward/backward consistency should be checked separately before adding data to the map.\n", "
" ] }, { "cell_type": "markdown", "id": "138cf0640cc1a016", "metadata": {}, "source": [ "## Setting up the FEMap\n", "\n", "For this example, we will use RBFE data generated with [OpenFE](https://docs.openfree.energy/en/latest/), which is bundled with ``cinnabar``'s data directory. The dataset contains one set of simulated relative free energy edges (**Method A**). To demonstrate comparing two computational protocols, we also create a second method (**Method B**) by adding ±0.5 kcal/mol Gaussian noise to the Method A values. You can easily swap this out with your own RBFE outputs.\n", "\n", "Cycle closure analysis only needs calculated relative free energy edges, so we do not need to add experimental measurements for this tutorial." ] }, { "cell_type": "code", "execution_count": 1, "id": "6a5f36f0500fa354", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FEMap contains 36 ligands and 116 calculated edges\n", "Computational sources: ['Method A', 'Method B']\n" ] } ], "source": [ "%matplotlib inline\n", "%config InlineBackend.figure_formats = ['svg']\n", "\n", "from collections import defaultdict\n", "\n", "import numpy as np\n", "import pandas as pd\n", "from openff.units import unit\n", "\n", "from cinnabar import FEMap, plotting\n", "\n", "# Load the computational results bundled with cinnabar\n", "rbfe_results = pd.read_csv(\"../cinnabar/data/computational_data.csv\")\n", "\n", "# Fix the random seed so Method B is reproducible\n", "rng = np.random.default_rng(seed=42)\n", "\n", "femap = FEMap()\n", "\n", "# Method A: the original calculated values\n", "for _, result in rbfe_results.iterrows():\n", " femap.add_relative_calculation(\n", " labelA=result[\"Ligand1\"],\n", " labelB=result[\"Ligand2\"],\n", " value=result[\"calc_DDG\"] * unit.kilocalorie_per_mole,\n", " uncertainty=result[\"calc_dDDG(MBAR)\"] * unit.kilocalorie_per_mole,\n", " source=\"Method A\",\n", " )\n", "\n", "# Method B: same edges, perturbed values to mimic another computational protocol\n", "for _, result in rbfe_results.iterrows():\n", " femap.add_relative_calculation(\n", " labelA=result[\"Ligand1\"],\n", " labelB=result[\"Ligand2\"],\n", " value=(result[\"calc_DDG\"] + rng.normal(0, 0.5)) * unit.kilocalorie_per_mole,\n", " uncertainty=result[\"calc_dDDG(MBAR)\"] * unit.kilocalorie_per_mole,\n", " source=\"Method B\",\n", " )\n", "\n", "print(f\"FEMap contains {femap.n_ligands} ligands and {femap.n_edges} calculated edges\")\n", "rel_df = femap.get_relative_dataframe()\n", "print(\"Computational sources:\", rel_df.loc[rel_df[\"computational\"], \"source\"].unique().tolist())" ] }, { "cell_type": "markdown", "id": "e49535da7813b508", "metadata": {}, "source": [ "## Cycle-level closure errors\n", "\n", "``get_cycle_closure_dataframe`` reports one row per detected cycle and source. By default, cycles up to length 5 are considered; this can be changed with ``max_cycle_length``.\n", "\n", "The reported columns are:\n", "\n", "| Label | Description |\n", "|-------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "|``source``| The computational source label, such as ``Method A`` or ``Method B``. |\n", "| ``cycle``| The ligand labels in the identified closed loop. |\n", "| ``cc (kcal/mol)``| The raw unsigned cycle closure error, calculated as the absolute value of the signed sum of ΔΔG values around the cycle. |\n", "| ``cc_per_edge (kcal/mol)``| ``cc (kcal/mol)`` normalised by ``sqrt(number_of_edges_in_cycle)``. This makes cycles with different lengths easier to compare; see equation 5 of [Baumann et al.](https://pubs.acs.org/doi/10.1021/acs.jctc.3c00282). |\n", "| ``cc_unc_normalized``| ``cc (kcal/mol)`` normalised by the propagated cycle uncertainty, ``sqrt(sum(edge_uncertainty**2))``. Values much larger than 1 indicate closure errors that are large relative to the reported edge uncertainties. This is similar to ``cc_per_edge`` but does not assume the edges all have comparable uncertainties. |\n", "\n", "When a cycle traverses an edge in the opposite direction to the stored calculation, ``cinnabar`` automatically changes the sign of that ΔΔG contribution." ] }, { "cell_type": "code", "execution_count": 2, "id": "a22d80f5175345", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Detected 92 cycles across all sources\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourcecyclecc (kcal/mol)cc_per_edge (kcal/mol)cc_unc_normalized
0Method A(CAT-17g, CAT-13i, CAT-13d, CAT-13b)2.401.2000009.982684
1Method A(CAT-17g, CAT-13i, CAT-13d, CAT-17d)1.600.8000007.715885
2Method A(CAT-13m, CAT-13a, CAT-13n, CAT-13k, CAT-4m)1.260.5634894.650745
3Method A(CAT-4b, CAT-4o, CAT-13j, CAT-4m, CAT-13k)1.230.5500735.263914
4Method A(CAT-4b, CAT-4o, CAT-4j, CAT-4m, CAT-13k)1.080.4829915.489949
5Method A(CAT-13a, CAT-17g, CAT-13c, CAT-17i)1.050.5250004.879764
6Method A(CAT-4b, CAT-4o, CAT-4d, CAT-13k)1.020.5100004.740342
7Method A(CAT-4b, CAT-4o, CAT-4k, CAT-4m, CAT-13k)0.880.3935484.243737
8Method A(CAT-4c, CAT-4o, CAT-13j, CAT-4m)0.850.4250004.123106
9Method A(CAT-17g, CAT-17d, CAT-13d, CAT-13b)0.800.4000004.093156
\n", "
" ], "text/plain": [ " source cycle cc (kcal/mol) \\\n", "0 Method A (CAT-17g, CAT-13i, CAT-13d, CAT-13b) 2.40 \n", "1 Method A (CAT-17g, CAT-13i, CAT-13d, CAT-17d) 1.60 \n", "2 Method A (CAT-13m, CAT-13a, CAT-13n, CAT-13k, CAT-4m) 1.26 \n", "3 Method A (CAT-4b, CAT-4o, CAT-13j, CAT-4m, CAT-13k) 1.23 \n", "4 Method A (CAT-4b, CAT-4o, CAT-4j, CAT-4m, CAT-13k) 1.08 \n", "5 Method A (CAT-13a, CAT-17g, CAT-13c, CAT-17i) 1.05 \n", "6 Method A (CAT-4b, CAT-4o, CAT-4d, CAT-13k) 1.02 \n", "7 Method A (CAT-4b, CAT-4o, CAT-4k, CAT-4m, CAT-13k) 0.88 \n", "8 Method A (CAT-4c, CAT-4o, CAT-13j, CAT-4m) 0.85 \n", "9 Method A (CAT-17g, CAT-17d, CAT-13d, CAT-13b) 0.80 \n", "\n", " cc_per_edge (kcal/mol) cc_unc_normalized \n", "0 1.200000 9.982684 \n", "1 0.800000 7.715885 \n", "2 0.563489 4.650745 \n", "3 0.550073 5.263914 \n", "4 0.482991 5.489949 \n", "5 0.525000 4.879764 \n", "6 0.510000 4.740342 \n", "7 0.393548 4.243737 \n", "8 0.425000 4.123106 \n", "9 0.400000 4.093156 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cycle_df = femap.get_cycle_closure_dataframe(max_cycle_length=5)\n", "print(f\"Detected {len(cycle_df)} cycles across all sources\")\n", "cycle_df.head(10)" ] }, { "cell_type": "markdown", "id": "5f67c5ccc69114c1", "metadata": {}, "source": [ "The cycles with the largest ``cc_per_edge (kcal/mol)`` are good starting points for follow-up. We can sort by this metric to get the largest errors." ] }, { "cell_type": "code", "execution_count": 3, "id": "8f4ec95e37079a13", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourcecyclecc (kcal/mol)cc_per_edge (kcal/mol)cc_unc_normalized
0Method A(CAT-17g, CAT-13i, CAT-13d, CAT-13b)2.4000001.2000009.982684
46Method B(CAT-17g, CAT-13c, CAT-17i, CAT-13e)2.0522911.0261469.328598
48Method B(CAT-13m, CAT-13a, CAT-13n, CAT-4i)1.7609790.8804897.183182
49Method B(CAT-17g, CAT-13i, CAT-13d, CAT-13b)1.7179440.8589727.145704
47Method B(CAT-13m, CAT-13a, CAT-13n, CAT-13k, CAT-4m)1.8604000.8319966.866861
1Method A(CAT-17g, CAT-13i, CAT-13d, CAT-17d)1.6000000.8000007.715885
50Method B(CAT-17g, CAT-13i, CAT-13d, CAT-17d)1.3065970.6532996.300973
2Method A(CAT-13m, CAT-13a, CAT-13n, CAT-13k, CAT-4m)1.2600000.5634894.650745
54Method B(CAT-4c, CAT-4o, CAT-4k, CAT-4m)1.1179210.5589616.359631
55Method B(CAT-17g, CAT-13c, CAT-17i, CAT-13g)1.1160940.5580474.425601
\n", "
" ], "text/plain": [ " source cycle cc (kcal/mol) \\\n", "0 Method A (CAT-17g, CAT-13i, CAT-13d, CAT-13b) 2.400000 \n", "46 Method B (CAT-17g, CAT-13c, CAT-17i, CAT-13e) 2.052291 \n", "48 Method B (CAT-13m, CAT-13a, CAT-13n, CAT-4i) 1.760979 \n", "49 Method B (CAT-17g, CAT-13i, CAT-13d, CAT-13b) 1.717944 \n", "47 Method B (CAT-13m, CAT-13a, CAT-13n, CAT-13k, CAT-4m) 1.860400 \n", "1 Method A (CAT-17g, CAT-13i, CAT-13d, CAT-17d) 1.600000 \n", "50 Method B (CAT-17g, CAT-13i, CAT-13d, CAT-17d) 1.306597 \n", "2 Method A (CAT-13m, CAT-13a, CAT-13n, CAT-13k, CAT-4m) 1.260000 \n", "54 Method B (CAT-4c, CAT-4o, CAT-4k, CAT-4m) 1.117921 \n", "55 Method B (CAT-17g, CAT-13c, CAT-17i, CAT-13g) 1.116094 \n", "\n", " cc_per_edge (kcal/mol) cc_unc_normalized \n", "0 1.200000 9.982684 \n", "46 1.026146 9.328598 \n", "48 0.880489 7.183182 \n", "49 0.858972 7.145704 \n", "47 0.831996 6.866861 \n", "1 0.800000 7.715885 \n", "50 0.653299 6.300973 \n", "2 0.563489 4.650745 \n", "54 0.558961 6.359631 \n", "55 0.558047 4.425601 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cycle_df.sort_values(\"cc_per_edge (kcal/mol)\", ascending=False).head(10)" ] }, { "cell_type": "markdown", "id": "64384039a01a1c16", "metadata": {}, "source": [ "## Edge-level cycle closure statistics\n", "\n", "``get_cycle_closure_edge_statistics_dataframe`` maps the cycle-level results back onto the edges. It does not claim that a particular edge caused the cycle closure error; instead, it identifies edges that occur in cycles with high closure errors and may deserve closer inspection.\n", "\n", "The reported columns are:\n", "\n", "| Label | Description |\n", "|-------|-------------|\n", "| ``source``| The computational source label.|\n", "| ``ligandA``/``ligandB``| The ligand labels involved in the edge.|\n", "| ``n_cycles``| The number of detected cycles containing that edge.|\n", "| ``mean_cc_per_edge (kcal/mol)``| The mean ``cc_per_edge (kcal/mol)`` over cycles containing the edge.|\n", "| ``max_cc_per_edge (kcal/mol)``| The largest ``cc_per_edge (kcal/mol)`` over cycles containing the edge.|\n", "\n", "Because this is based on cycles, edges that do not participate in any detected cycle are absent from the dataframe." ] }, { "cell_type": "code", "execution_count": 4, "id": "4fa3e16716ebf48a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourceligandAligandBn_cyclesmean_cc_per_edge (kcal/mol)max_cc_per_edge (kcal/mol)
0Method ACAT-13iCAT-17g21.0000001.200000
1Method ACAT-13dCAT-13i21.0000001.200000
2Method ACAT-13bCAT-13d20.8000001.200000
3Method ACAT-13bCAT-17g20.8000001.200000
4Method ACAT-13dCAT-17d20.6000000.800000
5Method ACAT-17dCAT-17g20.6000000.800000
6Method ACAT-13aCAT-13m20.4367450.563489
7Method ACAT-13aCAT-13n20.4367450.563489
8Method ACAT-13kCAT-13n20.4248530.563489
9Method ACAT-13mCAT-4m20.4248530.563489
\n", "
" ], "text/plain": [ " source ligandA ligandB n_cycles mean_cc_per_edge (kcal/mol) \\\n", "0 Method A CAT-13i CAT-17g 2 1.000000 \n", "1 Method A CAT-13d CAT-13i 2 1.000000 \n", "2 Method A CAT-13b CAT-13d 2 0.800000 \n", "3 Method A CAT-13b CAT-17g 2 0.800000 \n", "4 Method A CAT-13d CAT-17d 2 0.600000 \n", "5 Method A CAT-17d CAT-17g 2 0.600000 \n", "6 Method A CAT-13a CAT-13m 2 0.436745 \n", "7 Method A CAT-13a CAT-13n 2 0.436745 \n", "8 Method A CAT-13k CAT-13n 2 0.424853 \n", "9 Method A CAT-13m CAT-4m 2 0.424853 \n", "\n", " max_cc_per_edge (kcal/mol) \n", "0 1.200000 \n", "1 1.200000 \n", "2 1.200000 \n", "3 1.200000 \n", "4 0.800000 \n", "5 0.800000 \n", "6 0.563489 \n", "7 0.563489 \n", "8 0.563489 \n", "9 0.563489 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "edge_stats_df = femap.get_cycle_closure_edge_statistics_dataframe(max_cycle_length=5)\n", "edge_stats_df.head(10)" ] }, { "cell_type": "markdown", "id": "94ec3074a0efa3fc", "metadata": {}, "source": [ "For a single source, sorting by ``max_cc_per_edge (kcal/mol)`` highlights edges that appear in at least one poorly closing cycle. Sorting by ``n_cycles`` can identify edges whose statistics are based on more cycle evidence." ] }, { "cell_type": "code", "execution_count": 5, "id": "8b9ebdc7a09391c2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sourceligandAligandBn_cyclesmean_cc_per_edge (kcal/mol)max_cc_per_edge (kcal/mol)
0Method ACAT-13iCAT-17g21.0000001.200000
1Method ACAT-13dCAT-13i21.0000001.200000
2Method ACAT-13bCAT-13d20.8000001.200000
3Method ACAT-13bCAT-17g20.8000001.200000
4Method ACAT-13dCAT-17d20.6000000.800000
5Method ACAT-17dCAT-17g20.6000000.800000
26Method ACAT-13kCAT-4m170.2577840.563489
6Method ACAT-13aCAT-13m20.4367450.563489
7Method ACAT-13aCAT-13n20.4367450.563489
8Method ACAT-13kCAT-13n20.4248530.563489
\n", "
" ], "text/plain": [ " source ligandA ligandB n_cycles mean_cc_per_edge (kcal/mol) \\\n", "0 Method A CAT-13i CAT-17g 2 1.000000 \n", "1 Method A CAT-13d CAT-13i 2 1.000000 \n", "2 Method A CAT-13b CAT-13d 2 0.800000 \n", "3 Method A CAT-13b CAT-17g 2 0.800000 \n", "4 Method A CAT-13d CAT-17d 2 0.600000 \n", "5 Method A CAT-17d CAT-17g 2 0.600000 \n", "26 Method A CAT-13k CAT-4m 17 0.257784 \n", "6 Method A CAT-13a CAT-13m 2 0.436745 \n", "7 Method A CAT-13a CAT-13n 2 0.436745 \n", "8 Method A CAT-13k CAT-13n 2 0.424853 \n", "\n", " max_cc_per_edge (kcal/mol) \n", "0 1.200000 \n", "1 1.200000 \n", "2 1.200000 \n", "3 1.200000 \n", "4 0.800000 \n", "5 0.800000 \n", "26 0.563489 \n", "6 0.563489 \n", "7 0.563489 \n", "8 0.563489 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "edge_stats_df[edge_stats_df[\"source\"] == \"Method A\"].sort_values(\n", " [\"max_cc_per_edge (kcal/mol)\", \"n_cycles\"], ascending=[False, False]\n", ").head(10)" ] }, { "cell_type": "markdown", "id": "794e0301", "metadata": {}, "source": [ "### Prioritising Edges With Uncertainty-Normalized Cycle Scores\n", "\n", "We can also prioritise edges for follow-up using the uncertainty-normalized metric ``cc_unc_normalized``. This analysis does not prove that an edge caused the observed cycle closure error; it only identifies edges that often occur in high-error cycles. For each edge, we calculate the fraction of containing cycles whose uncertainty-normalized closure error is above a threshold. Here we use a cutoff of 2, meaning the closure error is more than twice the propagated cycle uncertainty\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "8e913d5a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ligandAligandBfraction_cc_unc_gt_2n_cyclesmax_cc_unc_normalized
37CAT-13hCAT-17i1.033.072659
40CAT-13dCAT-13h1.033.072659
0CAT-13iCAT-17g1.029.982684
1CAT-13dCAT-13i1.029.982684
2CAT-13bCAT-13d1.029.982684
3CAT-13bCAT-17g1.029.982684
4CAT-13dCAT-17d1.027.715885
5CAT-17dCAT-17g1.027.715885
6CAT-13aCAT-13m1.024.650745
7CAT-13aCAT-13n1.024.650745
\n", "
" ], "text/plain": [ " ligandA ligandB fraction_cc_unc_gt_2 n_cycles max_cc_unc_normalized\n", "37 CAT-13h CAT-17i 1.0 3 3.072659\n", "40 CAT-13d CAT-13h 1.0 3 3.072659\n", "0 CAT-13i CAT-17g 1.0 2 9.982684\n", "1 CAT-13d CAT-13i 1.0 2 9.982684\n", "2 CAT-13b CAT-13d 1.0 2 9.982684\n", "3 CAT-13b CAT-17g 1.0 2 9.982684\n", "4 CAT-13d CAT-17d 1.0 2 7.715885\n", "5 CAT-17d CAT-17g 1.0 2 7.715885\n", "6 CAT-13a CAT-13m 1.0 2 4.650745\n", "7 CAT-13a CAT-13n 1.0 2 4.650745" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# just do the analysis for Method A\n", "method_a_cycles = cycle_df[cycle_df[\"source\"] == \"Method A\"]\n", "edge_cycles = defaultdict(list)\n", "\n", "for _, row in method_a_cycles.iterrows():\n", " cycle = list(row[\"cycle\"])\n", " for i, lig in enumerate(cycle):\n", " lig_a = lig\n", " # pick the next ligand in the cycle, wrapping around to the start if necessary\n", " lig_b = cycle[i + 1] if i < len(cycle) - 1 else cycle[0]\n", " edge = tuple(sorted((lig_a, lig_b)))\n", " edge_cycles[edge].append(row[\"cc_unc_normalized\"])\n", "\n", "\n", "# get the fraction of cycles that have cc_unc_normalized > 2\n", "edge_fractions = []\n", "for edge, cc_uncs in edge_cycles.items():\n", " cc_uns = np.array(cc_uncs)\n", " fraction = np.mean(cc_uns > 2)\n", " edge_fractions.append(\n", " {\n", " \"ligandA\": edge[0],\n", " \"ligandB\": edge[1],\n", " \"fraction_cc_unc_gt_2\": fraction,\n", " \"n_cycles\": len(cc_uns),\n", " \"max_cc_unc_normalized\": cc_uns.max(),\n", " }\n", " )\n", "edge_fractions_df = pd.DataFrame(edge_fractions)\n", "edge_fractions_df.sort_values(\n", " [\"fraction_cc_unc_gt_2\", \"n_cycles\", \"max_cc_unc_normalized\"], ascending=[False, False, False]\n", ").head(10)" ] }, { "cell_type": "markdown", "id": "42b17a908cb288a0", "metadata": {}, "source": [ "## Plotting the cycle closure distribution\n", "\n", "``plot_cycle_closure`` plots a histogram of ``cc_per_edge (kcal/mol)``. If the ``FEMap`` contains multiple computational sources, all sources are plotted by default. You can pass ``sources=[...]`` to plot a subset." ] }, { "cell_type": "code", "execution_count": 7, "id": "2f3559b5ab09a5e1", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " 2026-06-17T09:56:04.089595\n", " image/svg+xml\n", " \n", " \n", " Matplotlib v3.10.6, https://matplotlib.org/\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = plotting.plot_cycle_closure(\n", " # the femap containing the edges per source to plot\n", " femap,\n", " # the name of the file to save the plot to, or None to show it inline\n", " filename=None,\n", " # the maximum cycle length to consider when computing cycle closure statistics\n", " max_cycle_length=5,\n", " # the width of the bins in the histogram of cycle closure errors\n", " bin_width=0.25,\n", ")" ] }, { "cell_type": "markdown", "id": "195b5b430284c011", "metadata": {}, "source": [ "You can also focus on a single source. This is useful when a map contains many computational protocols or replicate datasets." ] }, { "cell_type": "code", "execution_count": 8, "id": "9ff9795da57ed015", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", " \n", " \n", " \n", " \n", " 2026-06-17T09:56:04.251632\n", " image/svg+xml\n", " \n", " \n", " Matplotlib v3.10.6, https://matplotlib.org/\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig = plotting.plot_cycle_closure(\n", " femap,\n", " filename=None,\n", " sources=[\"Method B\"],\n", " max_cycle_length=5,\n", " bin_width=0.25,\n", ")" ] }, { "cell_type": "markdown", "id": "ee64b198a905712f", "metadata": {}, "source": [ "## Recap\n", "\n", "- Use ``FEMap.get_cycle_closure_dataframe`` to inspect each detected cycle and its raw, length-scaled, and uncertainty-normalized closure error.\n", "- Use ``FEMap.get_cycle_closure_edge_statistics_dataframe`` to find edges that appear in cycles with high closure errors.\n", "- Use ``plotting.plot_cycle_closure`` to compare the distribution of ``cc_per_edge (kcal/mol)`` across one or more computational sources.\n", "- ``max_cycle_length`` controls the largest cycles included in both dataframe helpers and the plot.\n", "- Forward/backward self edges such as A→B and B→A are not included in this cycle closure analysis and should be checked separately." ] } ], "metadata": { "kernelspec": { "display_name": "openfe_dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.9" } }, "nbformat": 4, "nbformat_minor": 5 }