{ "cells": [ { "cell_type": "markdown", "id": "f119d17fc04de233", "metadata": {}, "source": [ "# Calculating Error and Correlation Metrics Manually\n", "\n", "``cinnabar``'s scatter plots automatically annotate figures with statistics such as **RMSE** or **MUE** based on the recommend best practices for the observable being plotted as described in the [companion paper](https://livecomsjournal.org/index.php/livecoms/article/view/v4i1e1497).\n", "\n", "Sometimes, however, you may need to access those numbers to run a statistical comparison, or feed results into a downstream analysis pipeline and ``cinnabar`` provides a convenient way to do that as well using the API.\n", "\n", "In this tutorial we will demonstrate how to recreate the analysis metrics shown by default in the example scatter plots manually, this involves the following steps:\n", "\n", "1. Extract the predicted and experimental values from a ``FEMap`` using the dataframe helper functions.\n", "2. Compute the appropriate metrics for the given observable using analysis functions available in ``cinnabar.stats``.\n", "3. Obtain bootstrap confidence intervals around each metric using ``cinnabar.stats.bootstrap_statistic``.\n", "\n", "## Available metrics\n", "\n", "The ``cinnabar.stats`` module exposes are wide range of deviation and correlation metrics, the table below maps the short-hand string identifiers (used by ``cinnabar`` internally) for each metric to their corresponding function names and descriptions:\n", "\n", "| Metric | Function name | Description |\n", "|-----------|--------------------------------|------------------------------------------------------------------------------------------------------------------------|\n", "| ``\"RMSE\"`` | ``calculate_rmse`` | Root mean squared error. |\n", "| ``\"MUE\"`` | ``calculate_mue`` | Mean unsigned error. |\n", "| ``\"RAE\"`` | ``calculate_rae`` | Relative absolute error, a measure of the average absolute error relative to the average magnitude of the true values. |\n", "| ``\"NRMSE\"`` | ``calculate_nrmse`` | Normalized root mean squared error, using the true mean to normalize the RMSE. |\n", "| ``\"R2\"`` | ``calculate_r2`` | Coefficient of determination. |\n", "| ``\"rho\"`` | ``calculate_pearson_r`` | Pearson correlation coefficient. |\n", "| ``\"KTAU\"`` | ``calculate_kendalls_tau`` | Kendall Tau correlation coefficient. |\n", "| ``\"PI\"`` | ``calculate_predictive_index`` | Predictive index, weighted Kendall Tau using the difference of the true values to weight ordinal comparisons. |\n", "\n", "\n", "see the {mod}`cinnabar.stats` API documentation for more details.\n", "\n", "## Loading Example RBFE Results\n", "\n", "For this example, we will use RBFE data generated with [OpenFE](https://docs.openfree.energy/en/latest/), which is bundled with ``cinnabar``'s data directory. The dataset contains one set of simulated relative free energy edges (**Method A**) and a corresponding set of experimental absolute free energies. To demonstrate comparing two computational protocols, we also create a second method (**Method B**) by adding ±0.5 kcal/mol Gaussian noise to the Method A values. You can easily swap this out with your own RBFE outputs.\n", "\n", "For more information on creating a ``FEMap`` and adding data to it, see the {doc}`cinnabar API tutorial `." ] }, { "cell_type": "code", "execution_count": 1, "id": "67d2c737119baf13", "metadata": { "ExecuteTime": { "end_time": "2026-06-02T08:42:17.741085Z", "start_time": "2026-06-02T08:42:16.785805Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FEMap contains 36 ligands\n", "Computational sources: ['Method A', 'Method B']\n" ] } ], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import pandas as pd\n", "from openff.units import unit\n", "\n", "from cinnabar import FEMap, stats\n", "\n", "# Load the computational results bundled with cinnabar\n", "rbfe_results = pd.read_csv(\"../cinnabar/data/computational_data.csv\")\n", "\n", "# Fix the random seed so bootstrap results are reproducible\n", "rng = np.random.default_rng(seed=42)\n", "\n", "femap = FEMap()\n", "\n", "# Method A: the original calculated values\n", "for _, result in rbfe_results.iterrows():\n", " # add each calculated relative free energy to the FEMap\n", " femap.add_relative_calculation(\n", " labelA=result[\"Ligand1\"], # string identifier for ligandA\n", " labelB=result[\"Ligand2\"], # string identifier for ligandB\n", " value=result[\"calc_DDG\"] * unit.kilocalorie_per_mole, # the calculated relative free energy with units\n", " uncertainty=result[\"calc_dDDG(MBAR)\"]\n", " * unit.kilocalorie_per_mole, # the uncertainty in the calculated relative free energy with units\n", " source=\"Method A\", # optional string describing the source of the calculation\n", " )\n", "\n", "# Method B: same edges, slightly different values (simulating a second protocol)\n", "for _, result in rbfe_results.iterrows():\n", " # add each calculated relative free energy to the FEMap\n", " femap.add_relative_calculation(\n", " labelA=result[\"Ligand1\"],\n", " labelB=result[\"Ligand2\"],\n", " # add ±0.5 kcal/mol noise\n", " value=(result[\"calc_DDG\"] + rng.normal(0, 0.5)) * unit.kilocalorie_per_mole,\n", " uncertainty=result[\"calc_dDDG(MBAR)\"] * unit.kilocalorie_per_mole,\n", " source=\"Method B\",\n", " )\n", "\n", "# Experimental absolute values (shared by both methods)\n", "experimental_results = pd.read_csv(\"../cinnabar/data/experimental_data.csv\")\n", "for _, exp_row in experimental_results.iterrows():\n", " femap.add_experimental_measurement(\n", " label=exp_row[\"Ligand\"],\n", " value=exp_row[\"expt_DG\"] * unit.kilocalorie_per_mole,\n", " uncertainty=exp_row[\"expt_dDG\"] * unit.kilocalorie_per_mole,\n", " source=\"Experimental\",\n", " )\n", "\n", "# Generate MLE absolute values for both sources in one call.\n", "# This produces sources \"MLE(Method A)\" and \"MLE(Method B)\" in the absolute dataframe.\n", "femap.generate_absolute_values()\n", "print(f\"FEMap contains {femap.n_ligands} ligands\")\n", "rel_df = femap.get_relative_dataframe()\n", "print(\"Computational sources:\", rel_df.loc[rel_df[\"computational\"], \"source\"].unique().tolist())" ] }, { "cell_type": "markdown", "id": "b06918c6f69ab46f", "metadata": {}, "source": [ "## Extracting data from the FEMap\n", "\n", "### Relative ΔΔG data\n", "\n", "``get_relative_dataframe()`` returns a ``DataFrame`` of every directly simulated edge together with the corresponding experimental ΔΔG derived from the absolute experimental values. The dataframe is also grouped and sorted by the ``source`` column to make computing metrics for a particular method straightforward.\n", "\n", "The columns are:\n", "\n", "- ``labelA``, ``labelB``: labels for the two ligands involved in the transformation\n", "- ``DDG (kcal/mol)``: the simulated free energy difference (ΔΔG = ΔG_B − ΔG_A)\n", "- ``uncertainty (kcal/mol)``: the reported uncertainty for the simulation\n", "- ``source``: which method / dataset the row comes from\n", "- ``computational``: ``True`` for calculated results, ``False`` for experimental" ] }, { "cell_type": "code", "execution_count": 2, "id": "57c9e3e60fb9b1fa", "metadata": { "ExecuteTime": { "end_time": "2026-06-03T10:05:58.522288Z", "start_time": "2026-06-03T10:05:46.883344Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelAlabelBDDG (kcal/mol)uncertainty (kcal/mol)sourcecomputational
0CAT-13aCAT-13m-0.950.13Method ATrue
1CAT-13aCAT-17g-0.020.10Method ATrue
2CAT-13aCAT-17i-0.760.11Method ATrue
3CAT-13bCAT-17g0.360.11Method ATrue
4CAT-13cCAT-17i0.260.11Method ATrue
5CAT-13dCAT-13b2.120.12Method ATrue
6CAT-13dCAT-13f0.130.13Method ATrue
7CAT-13dCAT-13h1.460.10Method ATrue
8CAT-13dCAT-13i-0.590.13Method ATrue
9CAT-13dCAT-17a-0.780.07Method ATrue
\n", "
" ], "text/plain": [ " labelA labelB DDG (kcal/mol) uncertainty (kcal/mol) source \\\n", "0 CAT-13a CAT-13m -0.95 0.13 Method A \n", "1 CAT-13a CAT-17g -0.02 0.10 Method A \n", "2 CAT-13a CAT-17i -0.76 0.11 Method A \n", "3 CAT-13b CAT-17g 0.36 0.11 Method A \n", "4 CAT-13c CAT-17i 0.26 0.11 Method A \n", "5 CAT-13d CAT-13b 2.12 0.12 Method A \n", "6 CAT-13d CAT-13f 0.13 0.13 Method A \n", "7 CAT-13d CAT-13h 1.46 0.10 Method A \n", "8 CAT-13d CAT-13i -0.59 0.13 Method A \n", "9 CAT-13d CAT-17a -0.78 0.07 Method A \n", "\n", " computational \n", "0 True \n", "1 True \n", "2 True \n", "3 True \n", "4 True \n", "5 True \n", "6 True \n", "7 True \n", "8 True \n", "9 True " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rel_df = femap.get_relative_dataframe()\n", "# show some method A results\n", "rel_df.head(10)" ] }, { "cell_type": "code", "execution_count": 3, "id": "787e4eb5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelAlabelBDDG (kcal/mol)uncertainty (kcal/mol)sourcecomputational
164CAT-4mCAT-4c1.300.141421experimentalFalse
165CAT-4mCAT-4j0.130.141421experimentalFalse
166CAT-4mCAT-4k1.300.141421experimentalFalse
167CAT-4mCAT-4l-0.190.141421experimentalFalse
168CAT-4mCAT-4n0.060.141421experimentalFalse
169CAT-4mCAT-4p-0.930.141421experimentalFalse
170CAT-4nCAT-13k-0.610.141421experimentalFalse
171CAT-4oCAT-4b-0.250.141421experimentalFalse
172CAT-4oCAT-4d0.270.141421experimentalFalse
173CAT-4pCAT-13k0.380.141421experimentalFalse
\n", "
" ], "text/plain": [ " labelA labelB DDG (kcal/mol) uncertainty (kcal/mol) source \\\n", "164 CAT-4m CAT-4c 1.30 0.141421 experimental \n", "165 CAT-4m CAT-4j 0.13 0.141421 experimental \n", "166 CAT-4m CAT-4k 1.30 0.141421 experimental \n", "167 CAT-4m CAT-4l -0.19 0.141421 experimental \n", "168 CAT-4m CAT-4n 0.06 0.141421 experimental \n", "169 CAT-4m CAT-4p -0.93 0.141421 experimental \n", "170 CAT-4n CAT-13k -0.61 0.141421 experimental \n", "171 CAT-4o CAT-4b -0.25 0.141421 experimental \n", "172 CAT-4o CAT-4d 0.27 0.141421 experimental \n", "173 CAT-4p CAT-13k 0.38 0.141421 experimental \n", "\n", " computational \n", "164 False \n", "165 False \n", "166 False \n", "167 False \n", "168 False \n", "169 False \n", "170 False \n", "171 False \n", "172 False \n", "173 False " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show some experimental results\n", "rel_df.tail(10)" ] }, { "cell_type": "code", "execution_count": 4, "id": "d6e8dbbf", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sources present: ['Method A', 'Method B', 'experimental']\n" ] } ], "source": [ "# show all the sources in the dataframe\n", "print(f\"Sources present: {rel_df['source'].unique().tolist()}\")" ] }, { "cell_type": "markdown", "id": "8e830d2869292b74", "metadata": {}, "source": [ "To compute error metrics for a particular method we need to pair each computational edge with the corresponding experimental value. Both are already in the same DataFrame with matching ``(labelA, labelB)`` keys we simply merge on those columns, while making sure to filter out any rows with missing values (e.g. if an experimental value is missing for a particular edge, that row will be dropped from the merged dataframe)." ] }, { "cell_type": "code", "execution_count": 5, "id": "2a4f2962af83edb6", "metadata": { "ExecuteTime": { "end_time": "2026-06-02T09:01:09.921099Z", "start_time": "2026-06-02T09:01:09.879855Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Paired edges — Method A: 58, Method B: 58\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelAlabelBDDG_calcdDDG_calcDDG_exptdDDG_expt
0CAT-13aCAT-13m-0.950.130.080.141421
1CAT-13aCAT-17g-0.020.10-0.900.141421
2CAT-13aCAT-17i-0.760.11-0.630.141421
3CAT-13bCAT-17g0.360.11-0.620.141421
4CAT-13cCAT-17i0.260.11-0.150.141421
\n", "
" ], "text/plain": [ " labelA labelB DDG_calc dDDG_calc DDG_expt dDDG_expt\n", "0 CAT-13a CAT-13m -0.95 0.13 0.08 0.141421\n", "1 CAT-13a CAT-17g -0.02 0.10 -0.90 0.141421\n", "2 CAT-13a CAT-17i -0.76 0.11 -0.63 0.141421\n", "3 CAT-13b CAT-17g 0.36 0.11 -0.62 0.141421\n", "4 CAT-13c CAT-17i 0.26 0.11 -0.15 0.141421" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# make a helper function to align a source in the dataframe with the experimental values\n", "def get_paired_ddg(df: pd.DataFrame, source: str) -> pd.DataFrame:\n", " \"\"\"Return a DataFrame with matched computational and experimental ΔΔG columns.\"\"\"\n", " # Computational rows for the requested source\n", " calc = df[df[\"source\"] == source][[\"labelA\", \"labelB\", \"DDG (kcal/mol)\", \"uncertainty (kcal/mol)\"]]\n", " calc = calc.rename(columns={\"DDG (kcal/mol)\": \"DDG_calc\", \"uncertainty (kcal/mol)\": \"dDDG_calc\"})\n", "\n", " # Experimental rows (source label is always \"experimental\" inside the relative dataframe)\n", " expt = df[df[\"source\"] == \"experimental\"][[\"labelA\", \"labelB\", \"DDG (kcal/mol)\", \"uncertainty (kcal/mol)\"]]\n", " expt = expt.rename(columns={\"DDG (kcal/mol)\": \"DDG_expt\", \"uncertainty (kcal/mol)\": \"dDDG_expt\"})\n", "\n", " # merge the aligned dataframes\n", " paired = calc.merge(expt, on=[\"labelA\", \"labelB\"], how=\"left\")\n", " # drop any rows with a missing experimental value\n", " paired = paired.dropna(subset=[\"DDG_expt\"])\n", " return paired\n", "\n", "\n", "paired_A = get_paired_ddg(rel_df, \"Method A\")\n", "paired_B = get_paired_ddg(rel_df, \"Method B\")\n", "\n", "print(f\"Paired edges — Method A: {len(paired_A)}, Method B: {len(paired_B)}\\n\")\n", "paired_A.head()" ] }, { "cell_type": "markdown", "id": "b571b3b561808c8c", "metadata": {}, "source": [ "### Absolute ΔG data\n", "\n", "``get_absolute_dataframe()`` returns per-ligand absolute free energies as a ``DataFrame``. After calling ``generate_absolute_values()`` the MLE-derived computational estimates appear alongside the experimental absolute values, directly simulated absolute values are also included if present. The dataframe is grouped and sorted by the ``source`` column to make computing metrics for a particular method straightforward.\n", "\n", "The columns are:\n", "\n", "- ``label``: ligand identifier\n", "- ``DG (kcal/mol)``: the absolute binding free energy, either experimental, MLE-derived or directly simulated\n", "- ``uncertainty (kcal/mol)``: the reported uncertainty\n", "- ``source``: which method / dataset the row comes from\n", "- ``computational``: ``True`` for calculated results, ``False`` for experimental" ] }, { "cell_type": "code", "execution_count": 6, "id": "8f243328d08f79e9", "metadata": { "ExecuteTime": { "end_time": "2026-06-02T09:33:18.806890Z", "start_time": "2026-06-02T09:33:18.761211Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelDG (kcal/mol)uncertainty (kcal/mol)sourcecomputational
0CAT-13a-8.830.1ExperimentalFalse
1CAT-13b-9.110.1ExperimentalFalse
2CAT-13c-9.310.1ExperimentalFalse
3CAT-13d-10.460.1ExperimentalFalse
4CAT-13e-9.950.1ExperimentalFalse
5CAT-13f-9.080.1ExperimentalFalse
6CAT-13g-9.080.1ExperimentalFalse
7CAT-13h-9.620.1ExperimentalFalse
8CAT-13i-9.260.1ExperimentalFalse
9CAT-13j-8.720.1ExperimentalFalse
\n", "
" ], "text/plain": [ " label DG (kcal/mol) uncertainty (kcal/mol) source computational\n", "0 CAT-13a -8.83 0.1 Experimental False\n", "1 CAT-13b -9.11 0.1 Experimental False\n", "2 CAT-13c -9.31 0.1 Experimental False\n", "3 CAT-13d -10.46 0.1 Experimental False\n", "4 CAT-13e -9.95 0.1 Experimental False\n", "5 CAT-13f -9.08 0.1 Experimental False\n", "6 CAT-13g -9.08 0.1 Experimental False\n", "7 CAT-13h -9.62 0.1 Experimental False\n", "8 CAT-13i -9.26 0.1 Experimental False\n", "9 CAT-13j -8.72 0.1 Experimental False" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "abs_df = femap.get_absolute_dataframe()\n", "# print the experimental values\n", "abs_df.head(10)" ] }, { "cell_type": "code", "execution_count": 7, "id": "6b7e0fad", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelDG (kcal/mol)uncertainty (kcal/mol)sourcecomputational
98CAT-4c1.9070470.110923MLE(Method B)True
99CAT-4d-1.2081850.117046MLE(Method B)True
100CAT-4i2.8296560.109747MLE(Method B)True
101CAT-4j1.3810730.100368MLE(Method B)True
102CAT-4k1.6505620.105023MLE(Method B)True
103CAT-4l2.1576890.117528MLE(Method B)True
104CAT-4m0.9032430.089598MLE(Method B)True
105CAT-4n0.2095480.103610MLE(Method B)True
106CAT-4o0.9210200.096058MLE(Method B)True
107CAT-4p-0.2594420.098955MLE(Method B)True
\n", "
" ], "text/plain": [ " label DG (kcal/mol) uncertainty (kcal/mol) source \\\n", "98 CAT-4c 1.907047 0.110923 MLE(Method B) \n", "99 CAT-4d -1.208185 0.117046 MLE(Method B) \n", "100 CAT-4i 2.829656 0.109747 MLE(Method B) \n", "101 CAT-4j 1.381073 0.100368 MLE(Method B) \n", "102 CAT-4k 1.650562 0.105023 MLE(Method B) \n", "103 CAT-4l 2.157689 0.117528 MLE(Method B) \n", "104 CAT-4m 0.903243 0.089598 MLE(Method B) \n", "105 CAT-4n 0.209548 0.103610 MLE(Method B) \n", "106 CAT-4o 0.921020 0.096058 MLE(Method B) \n", "107 CAT-4p -0.259442 0.098955 MLE(Method B) \n", "\n", " computational \n", "98 True \n", "99 True \n", "100 True \n", "101 True \n", "102 True \n", "103 True \n", "104 True \n", "105 True \n", "106 True \n", "107 True " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# print the MLE values calculated using Method B\n", "abs_df.tail(10)" ] }, { "cell_type": "code", "execution_count": 8, "id": "eb5f7524", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sources present: ['Experimental', 'MLE(Method A)', 'MLE(Method B)']\n" ] } ], "source": [ "# get all sources in the dataframe\n", "print(f\"Sources present: {abs_df['source'].unique().tolist()}\")" ] }, { "cell_type": "code", "execution_count": 9, "id": "40db1ff9f0bf359", "metadata": { "ExecuteTime": { "end_time": "2026-06-02T09:57:48.050833Z", "start_time": "2026-06-02T09:57:47.791327Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Paired ligands — MLE(Method A): 36, MLE(Method B): 36\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelDG_calcdDG_calcDG_exptdDG_expt
0CAT-13a0.5032130.066349-8.830.1
1CAT-13b0.3788640.102983-9.110.1
2CAT-13c0.0281100.099052-9.310.1
3CAT-13d-1.2113770.075449-10.460.1
4CAT-13e-1.1368900.099052-9.950.1
\n", "
" ], "text/plain": [ " label DG_calc dDG_calc DG_expt dDG_expt\n", "0 CAT-13a 0.503213 0.066349 -8.83 0.1\n", "1 CAT-13b 0.378864 0.102983 -9.11 0.1\n", "2 CAT-13c 0.028110 0.099052 -9.31 0.1\n", "3 CAT-13d -1.211377 0.075449 -10.46 0.1\n", "4 CAT-13e -1.136890 0.099052 -9.95 0.1" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# make a helper function to align the chosen source with the experimental data\n", "def get_paired_dg(df: pd.DataFrame, source: str) -> pd.DataFrame:\n", " \"\"\"Return a DataFrame with matched MLE computational and experimental ΔG columns.\"\"\"\n", " calc = df[df[\"source\"] == source][[\"label\", \"DG (kcal/mol)\", \"uncertainty (kcal/mol)\"]]\n", " calc = calc.rename(columns={\"DG (kcal/mol)\": \"DG_calc\", \"uncertainty (kcal/mol)\": \"dDG_calc\"})\n", "\n", " expt = df[df[\"source\"] == \"Experimental\"][[\"label\", \"DG (kcal/mol)\", \"uncertainty (kcal/mol)\"]]\n", " expt = expt.rename(columns={\"DG (kcal/mol)\": \"DG_expt\", \"uncertainty (kcal/mol)\": \"dDG_expt\"})\n", "\n", " # merge the aligned dataframes\n", " paired = calc.merge(expt, on=\"label\", how=\"left\")\n", " # drop any rows with a missing experimental value\n", " paired = paired.dropna(subset=[\"DG_expt\"])\n", " return paired\n", "\n", "\n", "# MLE source names are automatically composed as \"MLE()\"\n", "abs_paired_A = get_paired_dg(abs_df, \"MLE(Method A)\")\n", "abs_paired_B = get_paired_dg(abs_df, \"MLE(Method B)\")\n", "\n", "print(f\"Paired ligands — MLE(Method A): {len(abs_paired_A)}, MLE(Method B): {len(abs_paired_B)}\\n\")\n", "abs_paired_A.head()" ] }, { "cell_type": "markdown", "id": "7a8a140f270e8b69", "metadata": {}, "source": [ "## Computing metrics with ``cinnabar.stats``\n", "\n", "Each metric function implemented in ``cinnabar.stats`` has a similar API, they each take two ``numpy`` arrays: ``y_true`` and ``y_pred`` — and return a single scalar. The convention throughout ``cinnabar`` is:\n", "\n", "- ``y_true``: experimental values\n", "- ``y_pred``: calculated / predicted values\n", "\n", "The arguments can be swaped however or used to compare two different calculated results as you wish.\n", "\n", "### Relative ΔΔG metrics\n" ] }, { "cell_type": "code", "execution_count": 10, "id": "7ebc5dbf3314cb06", "metadata": { "ExecuteTime": { "end_time": "2026-06-02T10:04:52.533068Z", "start_time": "2026-06-02T10:04:52.322Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Relative ΔΔG metrics:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodMetricValue
0Method ARMSE1.053002
1Method AMUE0.867586
2Method BRMSE1.134862
3Method BMUE0.898565
\n", "
" ], "text/plain": [ " Method Metric Value\n", "0 Method A RMSE 1.053002\n", "1 Method A MUE 0.867586\n", "2 Method B RMSE 1.134862\n", "3 Method B MUE 0.898565" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_true_A = paired_A[\"DDG_expt\"].to_numpy()\n", "y_pred_A = paired_A[\"DDG_calc\"].to_numpy()\n", "y_pred_B = paired_B[\"DDG_calc\"].to_numpy()\n", "\n", "# calculate the RMSE and MUE for Method A/B relative to experiment and report in a pandas DataFrame\n", "data = []\n", "\n", "for source, y_pred in [(\"Method A\", y_pred_A), (\"Method B\", y_pred_B)]:\n", " for metric, func in [(\"RMSE\", stats.calculate_rmse), (\"MUE\", stats.calculate_mue)]:\n", " value = func(y_true_A, y_pred)\n", " data.append({\"Method\": source, \"Metric\": metric, \"Value\": value})\n", "\n", "\n", "metrics_df = pd.DataFrame(data)\n", "print(\"Relative ΔΔG metrics:\")\n", "metrics_df" ] }, { "cell_type": "markdown", "id": "73331d51e1455d5a", "metadata": {}, "source": [ "You can cross-check these values with the default annotations shown in the scatter plots in the {doc}`cinnabar API tutorial ` and see that they match to 2 decimal places.\n", "\n", "### Absolute ΔG metrics\n", "\n", "The same functions work equally well on the MLE-derived absolute free energies, we can also calculate correlation metrics for these values matching the defaults on the absolute ΔG scatter plots in the API tutorial:" ] }, { "cell_type": "code", "execution_count": 11, "id": "201e83026700cc7a", "metadata": { "ExecuteTime": { "end_time": "2026-06-02T10:21:29.844172Z", "start_time": "2026-06-02T10:21:29.823399Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Absolute ΔG metrics:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodMetricValue
0MLE(Method A)RMSE9.364494
1MLE(Method A)MUE9.326389
2MLE(Method A)R20.614966
3MLE(Method A)rho0.784198
4MLE(Method B)RMSE9.376977
5MLE(Method B)MUE9.326389
6MLE(Method B)R20.561993
7MLE(Method B)rho0.749662
\n", "
" ], "text/plain": [ " Method Metric Value\n", "0 MLE(Method A) RMSE 9.364494\n", "1 MLE(Method A) MUE 9.326389\n", "2 MLE(Method A) R2 0.614966\n", "3 MLE(Method A) rho 0.784198\n", "4 MLE(Method B) RMSE 9.376977\n", "5 MLE(Method B) MUE 9.326389\n", "6 MLE(Method B) R2 0.561993\n", "7 MLE(Method B) rho 0.749662" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_true_A_abs = abs_paired_A[\"DG_expt\"].to_numpy()\n", "y_pred_A_abs = abs_paired_A[\"DG_calc\"].to_numpy()\n", "y_pred_B_abs = abs_paired_B[\"DG_calc\"].to_numpy()\n", "\n", "# calculate the default metrics for Method A/B relative to experiment and report in a pandas DataFrame\n", "data = []\n", "for source, y_pred in [(\"MLE(Method A)\", y_pred_A_abs), (\"MLE(Method B)\", y_pred_B_abs)]:\n", " for metric, func in [\n", " (\"RMSE\", stats.calculate_rmse),\n", " (\"MUE\", stats.calculate_mue),\n", " (\"R2\", stats.calculate_r2),\n", " (\"rho\", stats.calculate_pearson_r),\n", " ]:\n", " value = func(y_true_A_abs, y_pred)\n", " data.append({\"Method\": source, \"Metric\": metric, \"Value\": value})\n", "metrics_df = pd.DataFrame(data)\n", "print(\"Absolute ΔG metrics:\")\n", "metrics_df" ] }, { "cell_type": "markdown", "id": "4b93829b86ee3ceb", "metadata": {}, "source": [ "
\n", "NOTE The accuracy metrics (RMSE and MUE) are significantly worse than those reported on the absolute ΔG scatter plots in the API tutorial. This is due the MLE values being centered around 0 rather than the experimental mean, this offset is automatically removed by default in the scatter plot. The correlation metrics (R² and Pearson r) are unaffected by this shift and match the API tutorial values.\n", "
" ] }, { "cell_type": "code", "execution_count": 12, "id": "9fd89340d3a7ec0c", "metadata": { "ExecuteTime": { "end_time": "2026-06-02T10:32:46.193444Z", "start_time": "2026-06-02T10:32:45.865907Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Relative ΔΔG metrics:\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodMetricValue
0MLE(Method A)RMSE0.843931
1MLE(Method A)MUE0.664549
2MLE(Method A)R20.614966
3MLE(Method A)rho0.784198
4MLE(Method B)RMSE0.972712
5MLE(Method B)MUE0.770764
6MLE(Method B)R20.561993
7MLE(Method B)rho0.749662
\n", "
" ], "text/plain": [ " Method Metric Value\n", "0 MLE(Method A) RMSE 0.843931\n", "1 MLE(Method A) MUE 0.664549\n", "2 MLE(Method A) R2 0.614966\n", "3 MLE(Method A) rho 0.784198\n", "4 MLE(Method B) RMSE 0.972712\n", "5 MLE(Method B) MUE 0.770764\n", "6 MLE(Method B) R2 0.561993\n", "7 MLE(Method B) rho 0.749662" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the RMSE and MUE error again after applying a shift to align the center of the predicted and experimental values\n", "y_true_A_abs_centered = y_true_A_abs - np.mean(y_true_A_abs)\n", "y_pred_A_abs_centered = y_pred_A_abs - np.mean(y_pred_A_abs)\n", "y_pred_B_abs_centered = y_pred_B_abs - np.mean(y_pred_B_abs)\n", "\n", "# calculate the default metrics for Method A/B relative to experiment and report in a pandas DataFrame\n", "data = []\n", "for source, y_pred in [(\"MLE(Method A)\", y_pred_A_abs_centered), (\"MLE(Method B)\", y_pred_B_abs_centered)]:\n", " for metric, func in [\n", " (\"RMSE\", stats.calculate_rmse),\n", " (\"MUE\", stats.calculate_mue),\n", " (\"R2\", stats.calculate_r2),\n", " (\"rho\", stats.calculate_pearson_r),\n", " ]:\n", " value = func(y_true_A_abs_centered, y_pred)\n", " data.append({\"Method\": source, \"Metric\": metric, \"Value\": value})\n", "metrics_df = pd.DataFrame(data)\n", "print(\"Relative ΔΔG metrics:\")\n", "metrics_df" ] }, { "cell_type": "markdown", "id": "4a93e375860fd4ba", "metadata": {}, "source": [ "## Bootstrap confidence intervals\n", "\n", "A sample estimate of a metric is not useful in isolation without an estimate of its uncertainity. ``bootstrap_statistic`` wraps any metric with bootstrap resampling to create a distribution of values from which confidence intervals can be derived (see [here](https://en.wikipedia.org/wiki/Confidence_interval) for more information on confidence intervals and their derivation). It returns a dictionary with the following keys:\n", "\n", "- ``mle``: the statistic on the original (unsampled) data, or sample statistic\n", "- ``mean``: the mean value for the metric over all bootstrap samples\n", "- ``stderr``: standard error of the statistic over all bootstrap samples\n", "- ``low`` / ``high``: the confidence interval bounds (default 95 %)\n", "\n", "The ``statistic`` argument accepts any of the available metrics as short-hand strings: ``\"RMSE\"``, ``\"MUE\"``, ``\"R2\"``, ``\"rho\"``, ``\"KTAU\"``, ``\"RAE\"``, ``\"NRMSE\"``, or ``\"PI\"``.\n", "\n", "You can also pass the experimental and predicted **uncertainty** arrays and ask the bootstrap to incorporate them by setting ``include_true_uncertainty=True`` and/or ``include_pred_uncertainty=True``. This adds Gaussian noise drawn from those uncertainties during each replicate of bootstrap sampling." ] }, { "cell_type": "code", "execution_count": 13, "id": "491d450cfe9d40b4", "metadata": { "ExecuteTime": { "end_time": "2026-06-02T10:45:55.262032Z", "start_time": "2026-06-02T10:45:54.786528Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RMSE bootstrap results for Method A (ΔΔG):\n", " mle : 1.053 kcal/mol\n", " stderr : 0.086 kcal/mol\n", " mean : 1.053 kcal/mol\n", " low : 0.879 kcal/mol\n", " high : 1.220 kcal/mol\n" ] } ], "source": [ "# calculate the RMSE of the DDGs again with a bootstrap CI estimate\n", "rmse_boot_A = stats.bootstrap_statistic(\n", " y_true=y_true_A,\n", " y_pred=y_pred_A,\n", " dy_true=paired_A[\"dDDG_expt\"].to_numpy(),\n", " dy_pred=paired_A[\"dDDG_calc\"].to_numpy(),\n", " statistic=\"RMSE\",\n", " ci=0.95,\n", " nbootstrap=1000,\n", " include_pred_uncertainty=False,\n", ")\n", "\n", "print(\"RMSE bootstrap results for Method A (ΔΔG):\")\n", "for key, value in rmse_boot_A.items():\n", " print(f\" {key:8s}: {value:.3f} kcal/mol\")" ] }, { "cell_type": "markdown", "id": "cf97a28243d3ae24", "metadata": {}, "source": [ "This can then be repeated for every metric of interest using a simple loop as above.\n", "\n", "## Common tasks with ``FEMap`` dataframes\n", "\n", "The DataFrames returned by ``get_relative_dataframe()`` and ``get_absolute_dataframe()`` are standard ``pandas`` DataFrames, so any standard DataFrame operation works on them. A few useful patterns which can help you explore the data and prepare it for downstream analysis:\n" ] }, { "cell_type": "code", "execution_count": 14, "id": "4745ba84e832c640", "metadata": {}, "outputs": [], "source": [ "# List all unique sources\n", "rel_df[\"source\"].unique()\n", "\n", "# Filter to a single method\n", "method_a_rows = rel_df[rel_df[\"source\"] == \"Method A\"]\n", "\n", "# Filter to experimental rows only\n", "expt_rows = rel_df[~rel_df[\"computational\"]]" ] }, { "cell_type": "markdown", "id": "d9d9138ce2de3441", "metadata": {}, "source": [ "## Recap\n", "\n", "- Use ``femap.get_relative_dataframe()`` to obtain the directly simulated ΔΔG edges together with the corresponding experimental ΔΔG values; merge on ``(labelA, labelB)`` to pair them for metric computation.\n", "- Use ``femap.get_absolute_dataframe()`` for per-ligand ΔG values; merge on ``label`` to pair them. Requires ``femap.generate_absolute_values()`` to have been called first for the MLE estimates or directly simulated absolute values to be present.\n", "- All point-estimate metrics are in ``cinnabar.stats`` and accept plain ``numpy`` arrays.\n", "- ``bootstrap_statistic`` wraps any of these with bootstrap resampling and returns the MLE estimate alongside a mean, standard error, and confidence interval." ] } ], "metadata": { "kernelspec": { "display_name": "cinnabar", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 5 }