{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f119d17fc04de233",
   "metadata": {},
   "source": [
    "# Calculating Error and Correlation Metrics Manually\n",
    "\n",
    "``cinnabar``'s scatter plots automatically annotate figures with statistics such as **RMSE** or **MUE** based on the recommend best practices for the observable being plotted as described in the [companion paper](https://livecomsjournal.org/index.php/livecoms/article/view/v4i1e1497).\n",
    "\n",
    "Sometimes, however, you may need to access those numbers to run a statistical comparison, or feed results into a downstream analysis pipeline and ``cinnabar`` provides a convenient way to do that as well using the API.\n",
    "\n",
    "In this tutorial we will demonstrate how to recreate the analysis metrics shown by default in the example scatter plots manually, this involves the following steps:\n",
    "\n",
    "1. Extract the predicted and experimental values from a ``FEMap`` using the dataframe helper functions.\n",
    "2. Compute the appropriate metrics for the given observable using analysis functions available in ``cinnabar.stats``.\n",
    "3. Obtain bootstrap confidence intervals around each metric using ``cinnabar.stats.bootstrap_statistic``.\n",
    "\n",
    "## Available metrics\n",
    "\n",
    "The ``cinnabar.stats`` module exposes are wide range of deviation and correlation metrics, the table below maps the short-hand string identifiers (used by ``cinnabar`` internally) for each metric to their corresponding function names and descriptions:\n",
    "\n",
    "| Metric    | Function name                  | Description                                                                                                            |\n",
    "|-----------|--------------------------------|------------------------------------------------------------------------------------------------------------------------|\n",
    "| ``\"RMSE\"``  | ``calculate_rmse``             | Root mean squared error.                                                                                               |\n",
    "| ``\"MUE\"``   | ``calculate_mue``              | Mean unsigned error.                                                                                                   |\n",
    "| ``\"RAE\"``   | ``calculate_rae``              | Relative absolute error, a measure of the average absolute error relative to the average magnitude of the true values. |\n",
    "| ``\"NRMSE\"`` | ``calculate_nrmse``            | Normalized root mean squared error, using the true mean to normalize the RMSE.                                         |\n",
    "| ``\"R2\"``    | ``calculate_r2``               | Coefficient of determination.                                                                                          |\n",
    "| ``\"rho\"``   | ``calculate_pearson_r``        | Pearson correlation coefficient.                                                                                       |\n",
    "| ``\"KTAU\"``  | ``calculate_kendalls_tau``      | Kendall Tau correlation coefficient.                                                                                   |\n",
    "| ``\"PI\"``    | ``calculate_predictive_index`` | Predictive index, weighted Kendall Tau using the difference of the true values to weight ordinal comparisons.       |\n",
    "\n",
    "\n",
    "see the {mod}`cinnabar.stats` API documentation for more details.\n",
    "\n",
    "## Loading Example RBFE Results\n",
    "\n",
    "For this example, we will use RBFE data generated with [OpenFE](https://docs.openfree.energy/en/latest/), which is bundled with ``cinnabar``'s data directory. The dataset contains one set of simulated relative free energy edges (**Method A**) and a corresponding set of experimental absolute free energies. To demonstrate comparing two computational protocols, we also create a second method (**Method B**) by adding ±0.5 kcal/mol Gaussian noise to the Method A values. You can easily swap this out with your own RBFE outputs.\n",
    "\n",
    "For more information on creating a ``FEMap`` and adding data to it, see the {doc}`cinnabar API tutorial <cinnabar-api>`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "67d2c737119baf13",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-06-02T08:42:17.741085Z",
     "start_time": "2026-06-02T08:42:16.785805Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "FEMap contains 36 ligands\n",
      "Computational sources: ['Method A', 'Method B']\n"
     ]
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "from openff.units import unit\n",
    "\n",
    "from cinnabar import FEMap, stats\n",
    "\n",
    "# Load the computational results bundled with cinnabar\n",
    "rbfe_results = pd.read_csv(\"../cinnabar/data/computational_data.csv\")\n",
    "\n",
    "# Fix the random seed so bootstrap results are reproducible\n",
    "rng = np.random.default_rng(seed=42)\n",
    "\n",
    "femap = FEMap()\n",
    "\n",
    "# Method A: the original calculated values\n",
    "for _, result in rbfe_results.iterrows():\n",
    "    # add each calculated relative free energy to the FEMap\n",
    "    femap.add_relative_calculation(\n",
    "        labelA=result[\"Ligand1\"],  # string identifier for ligandA\n",
    "        labelB=result[\"Ligand2\"],  # string identifier for ligandB\n",
    "        value=result[\"calc_DDG\"] * unit.kilocalorie_per_mole,  # the calculated relative free energy with units\n",
    "        uncertainty=result[\"calc_dDDG(MBAR)\"]\n",
    "        * unit.kilocalorie_per_mole,  # the uncertainty in the calculated relative free energy with units\n",
    "        source=\"Method A\",  # optional string describing the source of the calculation\n",
    "    )\n",
    "\n",
    "# Method B: same edges, slightly different values (simulating a second protocol)\n",
    "for _, result in rbfe_results.iterrows():\n",
    "    # add each calculated relative free energy to the FEMap\n",
    "    femap.add_relative_calculation(\n",
    "        labelA=result[\"Ligand1\"],\n",
    "        labelB=result[\"Ligand2\"],\n",
    "        # add ±0.5 kcal/mol noise\n",
    "        value=(result[\"calc_DDG\"] + rng.normal(0, 0.5)) * unit.kilocalorie_per_mole,\n",
    "        uncertainty=result[\"calc_dDDG(MBAR)\"] * unit.kilocalorie_per_mole,\n",
    "        source=\"Method B\",\n",
    "    )\n",
    "\n",
    "# Experimental absolute values (shared by both methods)\n",
    "experimental_results = pd.read_csv(\"../cinnabar/data/experimental_data.csv\")\n",
    "for _, exp_row in experimental_results.iterrows():\n",
    "    femap.add_experimental_measurement(\n",
    "        label=exp_row[\"Ligand\"],\n",
    "        value=exp_row[\"expt_DG\"] * unit.kilocalorie_per_mole,\n",
    "        uncertainty=exp_row[\"expt_dDG\"] * unit.kilocalorie_per_mole,\n",
    "        source=\"Experimental\",\n",
    "    )\n",
    "\n",
    "# Generate MLE absolute values for both sources in one call.\n",
    "# This produces sources \"MLE(Method A)\" and \"MLE(Method B)\" in the absolute dataframe.\n",
    "femap.generate_absolute_values()\n",
    "print(f\"FEMap contains {femap.n_ligands} ligands\")\n",
    "rel_df = femap.get_relative_dataframe()\n",
    "print(\"Computational sources:\", rel_df.loc[rel_df[\"computational\"], \"source\"].unique().tolist())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b06918c6f69ab46f",
   "metadata": {},
   "source": [
    "## Extracting data from the FEMap\n",
    "\n",
    "### Relative ΔΔG data\n",
    "\n",
    "``get_relative_dataframe()`` returns a ``DataFrame`` of every directly simulated edge together with the corresponding experimental ΔΔG derived from the absolute experimental values. The dataframe is also grouped and sorted by the ``source`` column to make computing metrics for a particular method straightforward.\n",
    "\n",
    "The columns are:\n",
    "\n",
    "- ``labelA``, ``labelB``: labels for the two ligands involved in the transformation\n",
    "- ``DDG (kcal/mol)``: the simulated free energy difference (ΔΔG = ΔG_B − ΔG_A)\n",
    "- ``uncertainty (kcal/mol)``: the reported uncertainty for the simulation\n",
    "- ``source``: which method / dataset the row comes from\n",
    "- ``computational``: ``True`` for calculated results, ``False`` for experimental"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "57c9e3e60fb9b1fa",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-06-03T10:05:58.522288Z",
     "start_time": "2026-06-03T10:05:46.883344Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>labelA</th>\n",
       "      <th>labelB</th>\n",
       "      <th>DDG (kcal/mol)</th>\n",
       "      <th>uncertainty (kcal/mol)</th>\n",
       "      <th>source</th>\n",
       "      <th>computational</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CAT-13a</td>\n",
       "      <td>CAT-13m</td>\n",
       "      <td>-0.95</td>\n",
       "      <td>0.13</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CAT-13a</td>\n",
       "      <td>CAT-17g</td>\n",
       "      <td>-0.02</td>\n",
       "      <td>0.10</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CAT-13a</td>\n",
       "      <td>CAT-17i</td>\n",
       "      <td>-0.76</td>\n",
       "      <td>0.11</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>CAT-13b</td>\n",
       "      <td>CAT-17g</td>\n",
       "      <td>0.36</td>\n",
       "      <td>0.11</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>CAT-13c</td>\n",
       "      <td>CAT-17i</td>\n",
       "      <td>0.26</td>\n",
       "      <td>0.11</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>CAT-13d</td>\n",
       "      <td>CAT-13b</td>\n",
       "      <td>2.12</td>\n",
       "      <td>0.12</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>CAT-13d</td>\n",
       "      <td>CAT-13f</td>\n",
       "      <td>0.13</td>\n",
       "      <td>0.13</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>CAT-13d</td>\n",
       "      <td>CAT-13h</td>\n",
       "      <td>1.46</td>\n",
       "      <td>0.10</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>CAT-13d</td>\n",
       "      <td>CAT-13i</td>\n",
       "      <td>-0.59</td>\n",
       "      <td>0.13</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>CAT-13d</td>\n",
       "      <td>CAT-17a</td>\n",
       "      <td>-0.78</td>\n",
       "      <td>0.07</td>\n",
       "      <td>Method A</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    labelA   labelB  DDG (kcal/mol)  uncertainty (kcal/mol)    source  \\\n",
       "0  CAT-13a  CAT-13m           -0.95                    0.13  Method A   \n",
       "1  CAT-13a  CAT-17g           -0.02                    0.10  Method A   \n",
       "2  CAT-13a  CAT-17i           -0.76                    0.11  Method A   \n",
       "3  CAT-13b  CAT-17g            0.36                    0.11  Method A   \n",
       "4  CAT-13c  CAT-17i            0.26                    0.11  Method A   \n",
       "5  CAT-13d  CAT-13b            2.12                    0.12  Method A   \n",
       "6  CAT-13d  CAT-13f            0.13                    0.13  Method A   \n",
       "7  CAT-13d  CAT-13h            1.46                    0.10  Method A   \n",
       "8  CAT-13d  CAT-13i           -0.59                    0.13  Method A   \n",
       "9  CAT-13d  CAT-17a           -0.78                    0.07  Method A   \n",
       "\n",
       "   computational  \n",
       "0           True  \n",
       "1           True  \n",
       "2           True  \n",
       "3           True  \n",
       "4           True  \n",
       "5           True  \n",
       "6           True  \n",
       "7           True  \n",
       "8           True  \n",
       "9           True  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rel_df = femap.get_relative_dataframe()\n",
    "# show some method A results\n",
    "rel_df.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "787e4eb5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>labelA</th>\n",
       "      <th>labelB</th>\n",
       "      <th>DDG (kcal/mol)</th>\n",
       "      <th>uncertainty (kcal/mol)</th>\n",
       "      <th>source</th>\n",
       "      <th>computational</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>164</th>\n",
       "      <td>CAT-4m</td>\n",
       "      <td>CAT-4c</td>\n",
       "      <td>1.30</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>165</th>\n",
       "      <td>CAT-4m</td>\n",
       "      <td>CAT-4j</td>\n",
       "      <td>0.13</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>166</th>\n",
       "      <td>CAT-4m</td>\n",
       "      <td>CAT-4k</td>\n",
       "      <td>1.30</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167</th>\n",
       "      <td>CAT-4m</td>\n",
       "      <td>CAT-4l</td>\n",
       "      <td>-0.19</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>168</th>\n",
       "      <td>CAT-4m</td>\n",
       "      <td>CAT-4n</td>\n",
       "      <td>0.06</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169</th>\n",
       "      <td>CAT-4m</td>\n",
       "      <td>CAT-4p</td>\n",
       "      <td>-0.93</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>170</th>\n",
       "      <td>CAT-4n</td>\n",
       "      <td>CAT-13k</td>\n",
       "      <td>-0.61</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>171</th>\n",
       "      <td>CAT-4o</td>\n",
       "      <td>CAT-4b</td>\n",
       "      <td>-0.25</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>172</th>\n",
       "      <td>CAT-4o</td>\n",
       "      <td>CAT-4d</td>\n",
       "      <td>0.27</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173</th>\n",
       "      <td>CAT-4p</td>\n",
       "      <td>CAT-13k</td>\n",
       "      <td>0.38</td>\n",
       "      <td>0.141421</td>\n",
       "      <td>experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     labelA   labelB  DDG (kcal/mol)  uncertainty (kcal/mol)        source  \\\n",
       "164  CAT-4m   CAT-4c            1.30                0.141421  experimental   \n",
       "165  CAT-4m   CAT-4j            0.13                0.141421  experimental   \n",
       "166  CAT-4m   CAT-4k            1.30                0.141421  experimental   \n",
       "167  CAT-4m   CAT-4l           -0.19                0.141421  experimental   \n",
       "168  CAT-4m   CAT-4n            0.06                0.141421  experimental   \n",
       "169  CAT-4m   CAT-4p           -0.93                0.141421  experimental   \n",
       "170  CAT-4n  CAT-13k           -0.61                0.141421  experimental   \n",
       "171  CAT-4o   CAT-4b           -0.25                0.141421  experimental   \n",
       "172  CAT-4o   CAT-4d            0.27                0.141421  experimental   \n",
       "173  CAT-4p  CAT-13k            0.38                0.141421  experimental   \n",
       "\n",
       "     computational  \n",
       "164          False  \n",
       "165          False  \n",
       "166          False  \n",
       "167          False  \n",
       "168          False  \n",
       "169          False  \n",
       "170          False  \n",
       "171          False  \n",
       "172          False  \n",
       "173          False  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# show some experimental results\n",
    "rel_df.tail(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d6e8dbbf",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sources present: ['Method A', 'Method B', 'experimental']\n"
     ]
    }
   ],
   "source": [
    "# show all the sources in the dataframe\n",
    "print(f\"Sources present: {rel_df['source'].unique().tolist()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e830d2869292b74",
   "metadata": {},
   "source": [
    "To compute error metrics for a particular method we need to pair each computational edge with the corresponding experimental value.  Both are already in the same DataFrame with matching ``(labelA, labelB)`` keys  we simply merge on those columns, while making sure to filter out any rows with missing values (e.g. if an experimental value is missing for a particular edge, that row will be dropped from the merged dataframe)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "2a4f2962af83edb6",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-06-02T09:01:09.921099Z",
     "start_time": "2026-06-02T09:01:09.879855Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paired edges — Method A: 58, Method B: 58\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>labelA</th>\n",
       "      <th>labelB</th>\n",
       "      <th>DDG_calc</th>\n",
       "      <th>dDDG_calc</th>\n",
       "      <th>DDG_expt</th>\n",
       "      <th>dDDG_expt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CAT-13a</td>\n",
       "      <td>CAT-13m</td>\n",
       "      <td>-0.95</td>\n",
       "      <td>0.13</td>\n",
       "      <td>0.08</td>\n",
       "      <td>0.141421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CAT-13a</td>\n",
       "      <td>CAT-17g</td>\n",
       "      <td>-0.02</td>\n",
       "      <td>0.10</td>\n",
       "      <td>-0.90</td>\n",
       "      <td>0.141421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CAT-13a</td>\n",
       "      <td>CAT-17i</td>\n",
       "      <td>-0.76</td>\n",
       "      <td>0.11</td>\n",
       "      <td>-0.63</td>\n",
       "      <td>0.141421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>CAT-13b</td>\n",
       "      <td>CAT-17g</td>\n",
       "      <td>0.36</td>\n",
       "      <td>0.11</td>\n",
       "      <td>-0.62</td>\n",
       "      <td>0.141421</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>CAT-13c</td>\n",
       "      <td>CAT-17i</td>\n",
       "      <td>0.26</td>\n",
       "      <td>0.11</td>\n",
       "      <td>-0.15</td>\n",
       "      <td>0.141421</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    labelA   labelB  DDG_calc  dDDG_calc  DDG_expt  dDDG_expt\n",
       "0  CAT-13a  CAT-13m     -0.95       0.13      0.08   0.141421\n",
       "1  CAT-13a  CAT-17g     -0.02       0.10     -0.90   0.141421\n",
       "2  CAT-13a  CAT-17i     -0.76       0.11     -0.63   0.141421\n",
       "3  CAT-13b  CAT-17g      0.36       0.11     -0.62   0.141421\n",
       "4  CAT-13c  CAT-17i      0.26       0.11     -0.15   0.141421"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# make a helper function to align a source in the dataframe with the experimental values\n",
    "def get_paired_ddg(df: pd.DataFrame, source: str) -> pd.DataFrame:\n",
    "    \"\"\"Return a DataFrame with matched computational and experimental ΔΔG columns.\"\"\"\n",
    "    # Computational rows for the requested source\n",
    "    calc = df[df[\"source\"] == source][[\"labelA\", \"labelB\", \"DDG (kcal/mol)\", \"uncertainty (kcal/mol)\"]]\n",
    "    calc = calc.rename(columns={\"DDG (kcal/mol)\": \"DDG_calc\", \"uncertainty (kcal/mol)\": \"dDDG_calc\"})\n",
    "\n",
    "    # Experimental rows (source label is always \"experimental\" inside the relative dataframe)\n",
    "    expt = df[df[\"source\"] == \"experimental\"][[\"labelA\", \"labelB\", \"DDG (kcal/mol)\", \"uncertainty (kcal/mol)\"]]\n",
    "    expt = expt.rename(columns={\"DDG (kcal/mol)\": \"DDG_expt\", \"uncertainty (kcal/mol)\": \"dDDG_expt\"})\n",
    "\n",
    "    # merge the aligned dataframes\n",
    "    paired = calc.merge(expt, on=[\"labelA\", \"labelB\"], how=\"left\")\n",
    "    # drop any rows with a missing experimental value\n",
    "    paired = paired.dropna(subset=[\"DDG_expt\"])\n",
    "    return paired\n",
    "\n",
    "\n",
    "paired_A = get_paired_ddg(rel_df, \"Method A\")\n",
    "paired_B = get_paired_ddg(rel_df, \"Method B\")\n",
    "\n",
    "print(f\"Paired edges — Method A: {len(paired_A)}, Method B: {len(paired_B)}\\n\")\n",
    "paired_A.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b571b3b561808c8c",
   "metadata": {},
   "source": [
    "### Absolute ΔG data\n",
    "\n",
    "``get_absolute_dataframe()`` returns per-ligand absolute free energies as a ``DataFrame``.  After calling ``generate_absolute_values()`` the MLE-derived computational estimates appear alongside the experimental absolute values, directly simulated absolute values are also included if present. The dataframe is grouped and sorted by the ``source`` column to make computing metrics for a particular method straightforward.\n",
    "\n",
    "The columns are:\n",
    "\n",
    "- ``label``: ligand identifier\n",
    "- ``DG (kcal/mol)``: the absolute binding free energy, either experimental, MLE-derived or directly simulated\n",
    "- ``uncertainty (kcal/mol)``: the reported uncertainty\n",
    "- ``source``: which method / dataset the row comes from\n",
    "- ``computational``: ``True`` for calculated results, ``False`` for experimental"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "8f243328d08f79e9",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-06-02T09:33:18.806890Z",
     "start_time": "2026-06-02T09:33:18.761211Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>DG (kcal/mol)</th>\n",
       "      <th>uncertainty (kcal/mol)</th>\n",
       "      <th>source</th>\n",
       "      <th>computational</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CAT-13a</td>\n",
       "      <td>-8.83</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CAT-13b</td>\n",
       "      <td>-9.11</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CAT-13c</td>\n",
       "      <td>-9.31</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>CAT-13d</td>\n",
       "      <td>-10.46</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>CAT-13e</td>\n",
       "      <td>-9.95</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>CAT-13f</td>\n",
       "      <td>-9.08</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>CAT-13g</td>\n",
       "      <td>-9.08</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>CAT-13h</td>\n",
       "      <td>-9.62</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>CAT-13i</td>\n",
       "      <td>-9.26</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>CAT-13j</td>\n",
       "      <td>-8.72</td>\n",
       "      <td>0.1</td>\n",
       "      <td>Experimental</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     label  DG (kcal/mol)  uncertainty (kcal/mol)        source  computational\n",
       "0  CAT-13a          -8.83                     0.1  Experimental          False\n",
       "1  CAT-13b          -9.11                     0.1  Experimental          False\n",
       "2  CAT-13c          -9.31                     0.1  Experimental          False\n",
       "3  CAT-13d         -10.46                     0.1  Experimental          False\n",
       "4  CAT-13e          -9.95                     0.1  Experimental          False\n",
       "5  CAT-13f          -9.08                     0.1  Experimental          False\n",
       "6  CAT-13g          -9.08                     0.1  Experimental          False\n",
       "7  CAT-13h          -9.62                     0.1  Experimental          False\n",
       "8  CAT-13i          -9.26                     0.1  Experimental          False\n",
       "9  CAT-13j          -8.72                     0.1  Experimental          False"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "abs_df = femap.get_absolute_dataframe()\n",
    "# print the experimental values\n",
    "abs_df.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "6b7e0fad",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>DG (kcal/mol)</th>\n",
       "      <th>uncertainty (kcal/mol)</th>\n",
       "      <th>source</th>\n",
       "      <th>computational</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>98</th>\n",
       "      <td>CAT-4c</td>\n",
       "      <td>1.907047</td>\n",
       "      <td>0.110923</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99</th>\n",
       "      <td>CAT-4d</td>\n",
       "      <td>-1.208185</td>\n",
       "      <td>0.117046</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>CAT-4i</td>\n",
       "      <td>2.829656</td>\n",
       "      <td>0.109747</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>101</th>\n",
       "      <td>CAT-4j</td>\n",
       "      <td>1.381073</td>\n",
       "      <td>0.100368</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>CAT-4k</td>\n",
       "      <td>1.650562</td>\n",
       "      <td>0.105023</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103</th>\n",
       "      <td>CAT-4l</td>\n",
       "      <td>2.157689</td>\n",
       "      <td>0.117528</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104</th>\n",
       "      <td>CAT-4m</td>\n",
       "      <td>0.903243</td>\n",
       "      <td>0.089598</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>105</th>\n",
       "      <td>CAT-4n</td>\n",
       "      <td>0.209548</td>\n",
       "      <td>0.103610</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>106</th>\n",
       "      <td>CAT-4o</td>\n",
       "      <td>0.921020</td>\n",
       "      <td>0.096058</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>107</th>\n",
       "      <td>CAT-4p</td>\n",
       "      <td>-0.259442</td>\n",
       "      <td>0.098955</td>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      label  DG (kcal/mol)  uncertainty (kcal/mol)         source  \\\n",
       "98   CAT-4c       1.907047                0.110923  MLE(Method B)   \n",
       "99   CAT-4d      -1.208185                0.117046  MLE(Method B)   \n",
       "100  CAT-4i       2.829656                0.109747  MLE(Method B)   \n",
       "101  CAT-4j       1.381073                0.100368  MLE(Method B)   \n",
       "102  CAT-4k       1.650562                0.105023  MLE(Method B)   \n",
       "103  CAT-4l       2.157689                0.117528  MLE(Method B)   \n",
       "104  CAT-4m       0.903243                0.089598  MLE(Method B)   \n",
       "105  CAT-4n       0.209548                0.103610  MLE(Method B)   \n",
       "106  CAT-4o       0.921020                0.096058  MLE(Method B)   \n",
       "107  CAT-4p      -0.259442                0.098955  MLE(Method B)   \n",
       "\n",
       "     computational  \n",
       "98            True  \n",
       "99            True  \n",
       "100           True  \n",
       "101           True  \n",
       "102           True  \n",
       "103           True  \n",
       "104           True  \n",
       "105           True  \n",
       "106           True  \n",
       "107           True  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# print the MLE values calculated using Method B\n",
    "abs_df.tail(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "eb5f7524",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sources present: ['Experimental', 'MLE(Method A)', 'MLE(Method B)']\n"
     ]
    }
   ],
   "source": [
    "# get all sources in the dataframe\n",
    "print(f\"Sources present: {abs_df['source'].unique().tolist()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "40db1ff9f0bf359",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-06-02T09:57:48.050833Z",
     "start_time": "2026-06-02T09:57:47.791327Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paired ligands — MLE(Method A): 36, MLE(Method B): 36\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>DG_calc</th>\n",
       "      <th>dDG_calc</th>\n",
       "      <th>DG_expt</th>\n",
       "      <th>dDG_expt</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>CAT-13a</td>\n",
       "      <td>0.503213</td>\n",
       "      <td>0.066349</td>\n",
       "      <td>-8.83</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>CAT-13b</td>\n",
       "      <td>0.378864</td>\n",
       "      <td>0.102983</td>\n",
       "      <td>-9.11</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>CAT-13c</td>\n",
       "      <td>0.028110</td>\n",
       "      <td>0.099052</td>\n",
       "      <td>-9.31</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>CAT-13d</td>\n",
       "      <td>-1.211377</td>\n",
       "      <td>0.075449</td>\n",
       "      <td>-10.46</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>CAT-13e</td>\n",
       "      <td>-1.136890</td>\n",
       "      <td>0.099052</td>\n",
       "      <td>-9.95</td>\n",
       "      <td>0.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     label   DG_calc  dDG_calc  DG_expt  dDG_expt\n",
       "0  CAT-13a  0.503213  0.066349    -8.83       0.1\n",
       "1  CAT-13b  0.378864  0.102983    -9.11       0.1\n",
       "2  CAT-13c  0.028110  0.099052    -9.31       0.1\n",
       "3  CAT-13d -1.211377  0.075449   -10.46       0.1\n",
       "4  CAT-13e -1.136890  0.099052    -9.95       0.1"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# make a helper function to align the chosen source with the experimental data\n",
    "def get_paired_dg(df: pd.DataFrame, source: str) -> pd.DataFrame:\n",
    "    \"\"\"Return a DataFrame with matched MLE computational and experimental ΔG columns.\"\"\"\n",
    "    calc = df[df[\"source\"] == source][[\"label\", \"DG (kcal/mol)\", \"uncertainty (kcal/mol)\"]]\n",
    "    calc = calc.rename(columns={\"DG (kcal/mol)\": \"DG_calc\", \"uncertainty (kcal/mol)\": \"dDG_calc\"})\n",
    "\n",
    "    expt = df[df[\"source\"] == \"Experimental\"][[\"label\", \"DG (kcal/mol)\", \"uncertainty (kcal/mol)\"]]\n",
    "    expt = expt.rename(columns={\"DG (kcal/mol)\": \"DG_expt\", \"uncertainty (kcal/mol)\": \"dDG_expt\"})\n",
    "\n",
    "    # merge the aligned dataframes\n",
    "    paired = calc.merge(expt, on=\"label\", how=\"left\")\n",
    "    # drop any rows with a missing experimental value\n",
    "    paired = paired.dropna(subset=[\"DG_expt\"])\n",
    "    return paired\n",
    "\n",
    "\n",
    "# MLE source names are automatically composed as \"MLE(<original_source>)\"\n",
    "abs_paired_A = get_paired_dg(abs_df, \"MLE(Method A)\")\n",
    "abs_paired_B = get_paired_dg(abs_df, \"MLE(Method B)\")\n",
    "\n",
    "print(f\"Paired ligands — MLE(Method A): {len(abs_paired_A)}, MLE(Method B): {len(abs_paired_B)}\\n\")\n",
    "abs_paired_A.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a8a140f270e8b69",
   "metadata": {},
   "source": [
    "## Computing metrics with ``cinnabar.stats``\n",
    "\n",
    "Each metric function implemented in ``cinnabar.stats`` has a similar API, they each take two ``numpy`` arrays: ``y_true`` and ``y_pred`` — and return a single scalar.  The convention throughout ``cinnabar`` is:\n",
    "\n",
    "- ``y_true``: experimental values\n",
    "- ``y_pred``: calculated / predicted values\n",
    "\n",
    "The arguments can be swaped however or used to compare two different calculated results as you wish.\n",
    "\n",
    "### Relative ΔΔG metrics\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "7ebc5dbf3314cb06",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-06-02T10:04:52.533068Z",
     "start_time": "2026-06-02T10:04:52.322Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Relative ΔΔG metrics:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Method</th>\n",
       "      <th>Metric</th>\n",
       "      <th>Value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Method A</td>\n",
       "      <td>RMSE</td>\n",
       "      <td>1.053002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Method A</td>\n",
       "      <td>MUE</td>\n",
       "      <td>0.867586</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Method B</td>\n",
       "      <td>RMSE</td>\n",
       "      <td>1.134862</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Method B</td>\n",
       "      <td>MUE</td>\n",
       "      <td>0.898565</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Method Metric     Value\n",
       "0  Method A   RMSE  1.053002\n",
       "1  Method A    MUE  0.867586\n",
       "2  Method B   RMSE  1.134862\n",
       "3  Method B    MUE  0.898565"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_true_A = paired_A[\"DDG_expt\"].to_numpy()\n",
    "y_pred_A = paired_A[\"DDG_calc\"].to_numpy()\n",
    "y_pred_B = paired_B[\"DDG_calc\"].to_numpy()\n",
    "\n",
    "# calculate the RMSE and MUE for Method A/B relative to experiment and report in a pandas DataFrame\n",
    "data = []\n",
    "\n",
    "for source, y_pred in [(\"Method A\", y_pred_A), (\"Method B\", y_pred_B)]:\n",
    "    for metric, func in [(\"RMSE\", stats.calculate_rmse), (\"MUE\", stats.calculate_mue)]:\n",
    "        value = func(y_true_A, y_pred)\n",
    "        data.append({\"Method\": source, \"Metric\": metric, \"Value\": value})\n",
    "\n",
    "\n",
    "metrics_df = pd.DataFrame(data)\n",
    "print(\"Relative ΔΔG metrics:\")\n",
    "metrics_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73331d51e1455d5a",
   "metadata": {},
   "source": [
    "You can cross-check these values with the default annotations shown in the scatter plots in the {doc}`cinnabar API tutorial <cinnabar-api>` and see that they match to 2 decimal places.\n",
    "\n",
    "### Absolute ΔG metrics\n",
    "\n",
    "The same functions work equally well on the MLE-derived absolute free energies, we can also calculate correlation metrics for these values matching the defaults on the absolute ΔG scatter plots in the API tutorial:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "201e83026700cc7a",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-06-02T10:21:29.844172Z",
     "start_time": "2026-06-02T10:21:29.823399Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Absolute ΔG metrics:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Method</th>\n",
       "      <th>Metric</th>\n",
       "      <th>Value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>MLE(Method A)</td>\n",
       "      <td>RMSE</td>\n",
       "      <td>9.364494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MLE(Method A)</td>\n",
       "      <td>MUE</td>\n",
       "      <td>9.326389</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>MLE(Method A)</td>\n",
       "      <td>R2</td>\n",
       "      <td>0.614966</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MLE(Method A)</td>\n",
       "      <td>rho</td>\n",
       "      <td>0.784198</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>RMSE</td>\n",
       "      <td>9.376977</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>MUE</td>\n",
       "      <td>9.326389</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>R2</td>\n",
       "      <td>0.561993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>rho</td>\n",
       "      <td>0.749662</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          Method Metric     Value\n",
       "0  MLE(Method A)   RMSE  9.364494\n",
       "1  MLE(Method A)    MUE  9.326389\n",
       "2  MLE(Method A)     R2  0.614966\n",
       "3  MLE(Method A)    rho  0.784198\n",
       "4  MLE(Method B)   RMSE  9.376977\n",
       "5  MLE(Method B)    MUE  9.326389\n",
       "6  MLE(Method B)     R2  0.561993\n",
       "7  MLE(Method B)    rho  0.749662"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_true_A_abs = abs_paired_A[\"DG_expt\"].to_numpy()\n",
    "y_pred_A_abs = abs_paired_A[\"DG_calc\"].to_numpy()\n",
    "y_pred_B_abs = abs_paired_B[\"DG_calc\"].to_numpy()\n",
    "\n",
    "# calculate the default metrics for Method A/B relative to experiment and report in a pandas DataFrame\n",
    "data = []\n",
    "for source, y_pred in [(\"MLE(Method A)\", y_pred_A_abs), (\"MLE(Method B)\", y_pred_B_abs)]:\n",
    "    for metric, func in [\n",
    "        (\"RMSE\", stats.calculate_rmse),\n",
    "        (\"MUE\", stats.calculate_mue),\n",
    "        (\"R2\", stats.calculate_r2),\n",
    "        (\"rho\", stats.calculate_pearson_r),\n",
    "    ]:\n",
    "        value = func(y_true_A_abs, y_pred)\n",
    "        data.append({\"Method\": source, \"Metric\": metric, \"Value\": value})\n",
    "metrics_df = pd.DataFrame(data)\n",
    "print(\"Absolute ΔG metrics:\")\n",
    "metrics_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b93829b86ee3ceb",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-info\">\n",
    "<b>NOTE</b> The accuracy metrics (RMSE and MUE) are significantly worse than those reported on the absolute ΔG scatter plots in the API tutorial.  This is due the MLE values being centered around 0 rather than the experimental mean, this offset is automatically removed by default in the scatter plot. The correlation metrics (R² and Pearson r) are unaffected by this shift and match the API tutorial values.\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "9fd89340d3a7ec0c",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-06-02T10:32:46.193444Z",
     "start_time": "2026-06-02T10:32:45.865907Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Relative ΔΔG metrics:\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Method</th>\n",
       "      <th>Metric</th>\n",
       "      <th>Value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>MLE(Method A)</td>\n",
       "      <td>RMSE</td>\n",
       "      <td>0.843931</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MLE(Method A)</td>\n",
       "      <td>MUE</td>\n",
       "      <td>0.664549</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>MLE(Method A)</td>\n",
       "      <td>R2</td>\n",
       "      <td>0.614966</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>MLE(Method A)</td>\n",
       "      <td>rho</td>\n",
       "      <td>0.784198</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>RMSE</td>\n",
       "      <td>0.972712</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>MUE</td>\n",
       "      <td>0.770764</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>R2</td>\n",
       "      <td>0.561993</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>MLE(Method B)</td>\n",
       "      <td>rho</td>\n",
       "      <td>0.749662</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          Method Metric     Value\n",
       "0  MLE(Method A)   RMSE  0.843931\n",
       "1  MLE(Method A)    MUE  0.664549\n",
       "2  MLE(Method A)     R2  0.614966\n",
       "3  MLE(Method A)    rho  0.784198\n",
       "4  MLE(Method B)   RMSE  0.972712\n",
       "5  MLE(Method B)    MUE  0.770764\n",
       "6  MLE(Method B)     R2  0.561993\n",
       "7  MLE(Method B)    rho  0.749662"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# calculate the RMSE and MUE error again after applying a shift to align the center of the predicted and experimental values\n",
    "y_true_A_abs_centered = y_true_A_abs - np.mean(y_true_A_abs)\n",
    "y_pred_A_abs_centered = y_pred_A_abs - np.mean(y_pred_A_abs)\n",
    "y_pred_B_abs_centered = y_pred_B_abs - np.mean(y_pred_B_abs)\n",
    "\n",
    "# calculate the default metrics for Method A/B relative to experiment and report in a pandas DataFrame\n",
    "data = []\n",
    "for source, y_pred in [(\"MLE(Method A)\", y_pred_A_abs_centered), (\"MLE(Method B)\", y_pred_B_abs_centered)]:\n",
    "    for metric, func in [\n",
    "        (\"RMSE\", stats.calculate_rmse),\n",
    "        (\"MUE\", stats.calculate_mue),\n",
    "        (\"R2\", stats.calculate_r2),\n",
    "        (\"rho\", stats.calculate_pearson_r),\n",
    "    ]:\n",
    "        value = func(y_true_A_abs_centered, y_pred)\n",
    "        data.append({\"Method\": source, \"Metric\": metric, \"Value\": value})\n",
    "metrics_df = pd.DataFrame(data)\n",
    "print(\"Relative ΔΔG metrics:\")\n",
    "metrics_df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a93e375860fd4ba",
   "metadata": {},
   "source": [
    "## Bootstrap confidence intervals\n",
    "\n",
    "A sample estimate of a metric is not useful in isolation without an estimate of its uncertainity.  ``bootstrap_statistic`` wraps any metric with bootstrap resampling to create a distribution of values from which confidence intervals can be derived (see [here](https://en.wikipedia.org/wiki/Confidence_interval) for more information on confidence intervals and their derivation).  It returns a dictionary with the following keys:\n",
    "\n",
    "- ``mle``: the statistic on the original (unsampled) data, or sample statistic\n",
    "- ``mean``: the mean value for the metric over all bootstrap samples\n",
    "- ``stderr``: standard error of the statistic over all bootstrap samples\n",
    "- ``low`` / ``high``: the confidence interval bounds (default 95 %)\n",
    "\n",
    "The ``statistic`` argument accepts any of the available metrics as short-hand strings: ``\"RMSE\"``, ``\"MUE\"``, ``\"R2\"``, ``\"rho\"``, ``\"KTAU\"``, ``\"RAE\"``, ``\"NRMSE\"``, or ``\"PI\"``.\n",
    "\n",
    "You can also pass the experimental and predicted **uncertainty** arrays and ask the bootstrap to incorporate them by setting ``include_true_uncertainty=True`` and/or ``include_pred_uncertainty=True``. This adds Gaussian noise drawn from those uncertainties during each replicate of bootstrap sampling."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "491d450cfe9d40b4",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2026-06-02T10:45:55.262032Z",
     "start_time": "2026-06-02T10:45:54.786528Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RMSE bootstrap results for Method A (ΔΔG):\n",
      "  mle     : 1.053 kcal/mol\n",
      "  stderr  : 0.086 kcal/mol\n",
      "  mean    : 1.053 kcal/mol\n",
      "  low     : 0.879 kcal/mol\n",
      "  high    : 1.220 kcal/mol\n"
     ]
    }
   ],
   "source": [
    "# calculate the RMSE of the DDGs again with a bootstrap CI estimate\n",
    "rmse_boot_A = stats.bootstrap_statistic(\n",
    "    y_true=y_true_A,\n",
    "    y_pred=y_pred_A,\n",
    "    dy_true=paired_A[\"dDDG_expt\"].to_numpy(),\n",
    "    dy_pred=paired_A[\"dDDG_calc\"].to_numpy(),\n",
    "    statistic=\"RMSE\",\n",
    "    ci=0.95,\n",
    "    nbootstrap=1000,\n",
    "    include_pred_uncertainty=False,\n",
    ")\n",
    "\n",
    "print(\"RMSE bootstrap results for Method A (ΔΔG):\")\n",
    "for key, value in rmse_boot_A.items():\n",
    "    print(f\"  {key:8s}: {value:.3f} kcal/mol\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf97a28243d3ae24",
   "metadata": {},
   "source": [
    "This can then be repeated for every metric of interest using a simple loop as above.\n",
    "\n",
    "## Common tasks with ``FEMap`` dataframes\n",
    "\n",
    "The DataFrames returned by ``get_relative_dataframe()`` and ``get_absolute_dataframe()`` are standard ``pandas`` DataFrames, so any standard DataFrame operation works on them.  A few useful patterns which can help you explore the data and prepare it for downstream analysis:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "4745ba84e832c640",
   "metadata": {},
   "outputs": [],
   "source": [
    "# List all unique sources\n",
    "rel_df[\"source\"].unique()\n",
    "\n",
    "# Filter to a single method\n",
    "method_a_rows = rel_df[rel_df[\"source\"] == \"Method A\"]\n",
    "\n",
    "# Filter to experimental rows only\n",
    "expt_rows = rel_df[~rel_df[\"computational\"]]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9d9138ce2de3441",
   "metadata": {},
   "source": [
    "## Recap\n",
    "\n",
    "- Use ``femap.get_relative_dataframe()`` to obtain the directly simulated ΔΔG edges together with the corresponding experimental ΔΔG values; merge on ``(labelA, labelB)`` to pair them for metric computation.\n",
    "- Use ``femap.get_absolute_dataframe()`` for per-ligand ΔG values; merge on ``label`` to pair them. Requires ``femap.generate_absolute_values()`` to have been called first for the MLE estimates or directly simulated absolute values to be present.\n",
    "- All point-estimate metrics are in ``cinnabar.stats`` and accept plain ``numpy`` arrays.\n",
    "- ``bootstrap_statistic`` wraps any of these with bootstrap resampling and returns the MLE estimate alongside a mean, standard error, and confidence interval."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cinnabar",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}