Application to real data

[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import scipy
import requests
import io
import arviz as az
import tempfile

from pdrm.model_handling import loadTorchSedModel
from pdrm.utils import lif

from utils import compute_hlid
from plot_functions import numerate_plots, plot_adm, plot_D_I

from paleokalmag import PaleoKalmag

# load ArchKalmag14k.r model
r = requests.get(
    "https://nextcloud.gfz-potsdam.de/s/BqN2XD7c9jK3AsF/download", stream=True
)
archkalmag = PaleoKalmag(io.BytesIO(r.raw.read()), name="ArchKalmag14k.r", legacy=True)

# Alternatively, you can download the model and save it locally
# archkalmag = PaleoKalmag(
#     "path/to/ArchKalmag14k.r.npz", name="ArchKalmag14k.r"
# )

pre = "https://nextcloud.gfz-potsdam.de/s/"
post = "/download"
model_links = {
    "BA3": f"{pre}YyYb6d7JEA4KBHF{post}",
    "BYE": f"{pre}cPDpsRePP9zryBX{post}",
    "ByaP2": f"{pre}KQccCnj85CDEXYn{post}",
    "ByaP3": f"{pre}aXBDTgax5ZGXpCg{post}",
    "EIL": f"{pre}W8Qtd8oBm7de3gp{post}",
    "FUR": f"{pre}BN95xNp4zWkCm7n{post}",
    "P2": f"{pre}7xSxCfNAg2PKncp{post}",
    "P3": f"{pre}jZy36Bn6yiWsDsn{post}",
    "GYL": f"{pre}XZFiHcSzmpqWHAH{post}",
    "GP1": f"{pre}MX6tgdQr2wPyw5L{post}",
    "GP2": f"{pre}tcmWbGf5sfRdXXr{post}",
    "GP4": f"{pre}2yKae5GjY7PLMcY{post}",
    "MOR": f"{pre}ZNmedjbGrP3WKg7{post}",
    "U1305": f"{pre}rxMEAanTRGeRjPo{post}",
    "BIR_unflat": f"{pre}naGec8zgSmNbE2P{post}",
}
sample_links = {
    "BA3": f"{pre}BGFFnRsQZRrHkPg{post}",
    "BYE": f"{pre}BLRiwLYxcMfTtLH{post}",
    "ByaP2": f"{pre}RAZzNNAaApKpFNR{post}",
    "ByaP3": f"{pre}4K9BpdDwrzXf2A3{post}",
    "EIL": f"{pre}csoqjPRbo6HpbPo{post}",
    "FUR": f"{pre}XpLMYpx9ik56Gxz{post}",
    "P2": f"{pre}T4zBW3pYyoSGAf2{post}",
    "P3": f"{pre}wnQWN57ZHMF82XL{post}",
    "GYL": f"{pre}WS7snomx2g9wB3G{post}",
    "GP1": f"{pre}sdc6YPrzLEaCL2L{post}",
    "GP2": f"{pre}oacoZR55PecDtHD{post}",
    "GP4": f"{pre}DCjR7QQqjHxkMzF{post}",
    "MOR": f"{pre}eiwCCCj9ii2QNSn{post}",
    "U1305": f"{pre}bBCKE4gP4Nako9X{post}",
    "BIR_unflat": f"{pre}xorEdci4ji7NjMj{post}",
}

adm_links = {
    "BA3": f"{pre}WQaSYe9eNffzyZJ{post}",
    "BYE": f"{pre}qH5Zc685D8g9BYb{post}",
    "ByaP2": f"{pre}LAgSbXkkDKjcgrC{post}",
    "ByaP3": f"{pre}tEipBed36aYASP3{post}",
    "EIL": f"{pre}JitiXQXPrjbjRRe{post}",
    "FUR": f"{pre}mSMrTBnmWZ8wCsz{post}",
    "P2": f"{pre}jYd7ka4ncWeiF6e{post}",
    "P3": f"{pre}p8yWE8SZYgWQg7i{post}",
    "GYL": f"{pre}5pD7kZE5gGK68kF{post}",
    "GP1": f"{pre}JEdgZXQXS3dMdcT{post}",
    "GP2": f"{pre}8XHGZFCpL3xRW2g{post}",
    "GP4": f"{pre}DSJXNmH9xqPY6di{post}",
    "MOR": f"{pre}oKSjPZN7BgxdZF6{post}",
    "U1305": f"{pre}7Seyd4w44sfkZcC{post}",
    "BIR_unflat": f"{pre}sAnaqtoJrSd9S2X{post}",
}

In this notebook we apply the proposed method to several real-world sediment core samples. Each core sample went through a standardized preprocessing procedure, including outlier exclusion and the transformation of relative to absolute declinations. Details can be found in the preprocessing folder, which is organized as follows: each core, sub-core or sub-section has its own subfolder containing all necessary information, i.e. tables with palaeomagnetic data and radiocarbon dates, as well as a Python file that provides the code to read and format the data. Raw data from Geomagia can be found in the complete_specimen_stratigraphic_pmag_rmag_data_180615.csv file, and the code to read and format the Geomagia data in preprocess_geomagia.py. The actual data preprocessing, including outlier exclusion and the transformation of relative to absolute declinations, is done in the preprocessing.ipynb notebook. All files used to generate the age-depth models can be found in the age_depth_modeling subfolder. Additionally, the preprocessing folder contains the ArchKalmag14k.r model in the form of a .npz file.
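
A common way to carry out the relative-to-absolute declination transformation is to shift each record by a constant offset derived from a reference field model. The sketch below illustrates this idea only; the actual procedure used here lives in the preprocessing.ipynb notebook, and align_declination is a hypothetical helper, not part of the repository:

import numpy as np

def align_declination(rel_dec, ref_dec):
    """Shift relative declinations by a constant offset so that their
    circular mean matches the circular mean of reference declinations."""
    def circ_mean(deg):
        # circular mean, robust to declinations wrapping at +/-180 deg
        rad = np.deg2rad(np.asarray(deg))
        return np.rad2deg(np.arctan2(np.sin(rad).mean(), np.cos(rad).mean()))

    offset = circ_mean(ref_dec) - circ_mean(rel_dec)
    # wrap the shifted declinations back into (-180, 180]
    return (np.asarray(rel_dec) + offset + 180.0) % 360.0 - 180.0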

To estimate the lock-in function parameters, run the following line in your terminal:

foo@bar:~$ python estimate_parameters.py {sed_data_name} {max_lid} {n_runs}

where sed_data_name is the name of the sediment data file (without the .csv extension), max_lid is the upper bound (in cm) on the lock-in depth used for the parameter estimation, and n_runs is the number of estimations you want to perform. For example, to run 50 estimations with an upper bound of 100 cm for the sediment core BA3.csv, execute

foo@bar:~$ python estimate_parameters.py BA3 100 50

We recommend using a GPU for faster computation. To adjust the optimizer parameters or the archeomagnetic (arch) data used, change these values directly in estimate_parameters.py.

To run inversions for a list of sediment records with fixed lock-in function parameters, use run_inversion.py.
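
Analogous to the parameter estimation, a hypothetical invocation might look as follows (the actual command-line arguments are defined in run_inversion.py and may differ):

foo@bar:~$ python run_inversion.py BA3 BYE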

The results of the parameter estimation, as well as the models derived using the best estimated lock-in function, can be found in the results folder and are visualized in the next cell. The models are stored on Nextcloud and can be downloaded using the links in the first cell.

These figures provide the following information:

- (A) The fifty estimated lock-in functions. The lock-in function with the highest (best) log-ml value is highlighted in orange and was used for the inversion. The remaining estimated lock-in functions are color-coded, ranging from red (indicating a low log-ml value) to blue (indicating a high log-ml value).
- (B) The predicted sediment observations (mean and one hundred samples) for declination and inclination in the first and second panel, which allow us to assess the accuracy of our predictions. These predictions are derived from the posterior generated using the lock-in function highlighted in orange. The directional palaeomagnetic records are shown along with their respective measurement errors; in cases where a core sample consists of multiple sub-cores or sub-segments, we distinguish them by different colors. Additionally, the posterior mean and uncertainties of ArchKalmag14k.r are visualized in gray.
- (C) The mean (red) and fifty samples (gray) of the age-depth model. Radiocarbon ages are depicted as violin plots, whose color indicates which type of calibration curve was used for the individual ages: blue corresponds to the marine curve and orange to the respective land curve. In most cases the decision of which type of curve to use for a radiocarbon sample was guided by the original publication, with bulk sediments being calibrated by the marine curve and plant or wood remnants by the land curves.
- (D) The distribution of the log-ml values.
- (E) to (G) Density plots of the distributions of three lock-in function characteristics: half lock-in depth, lock-in function height and lock-in function width. The density estimation is conducted using Gaussian kernel density estimation. The parameter associated with the lock-in function that yields the best log-ml value is highlighted in orange.
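
As background for (E): the half lock-in depth is the depth at which half of the signal has been locked in. Below is a minimal sketch of this computation, assuming that lif from pdrm.utils maps the four estimated parameters to a lock-in density over depth (as in the cell below); the actual values are computed by compute_hlid from utils, and half_lockin_depth is a hypothetical stand-in:

import numpy as np
from pdrm.utils import lif

def half_lockin_depth(params, x_max=200.0, n=5000):
    """Depth at which the cumulative lock-in function reaches one half."""
    x = np.linspace(0.0, x_max, n)
    density = lif(list(params))(x)  # evaluate the lock-in density on a grid
    cdf = np.cumsum(density)
    cdf /= cdf[-1]                  # normalize the numerical CDF
    return x[np.searchsorted(cdf, 0.5)]  # first grid point with CDF >= 0.5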

[2]:
# ------------------ Read core data and estimated parameters ------------------
cores, results = {}, {}
# uncomment entries below to also process the remaining cores
core_ids = [
    "BA3",
    "BYE",
    # "ByaP2",
    # "ByaP3",
    # "EIL",
    # "FUR",
    # "P2",
    # "P3",
    # "GYL",
    # "GP1",
    # "GP2",
    # "GP4",
    # "MOR",
    # "U1305",
    # "BIR_unflat",
]
for core_id in core_ids:
    # read core data
    cores[core_id] = pd.read_csv(f"../dat/preprocessed_data/{core_id}.csv")
    # read estimated parameters
    result = pd.read_csv(f"../results/estimated_params/{core_id}_params.csv")
    # flip the sign of the stored values so that a larger log-ml is better
    result["log_ml"] = -result["log_ml"]
    if core_id == "BIR_unflat":
        result = result[result.max_lid == 100]
    else:
        result = result[result.max_lid == max(result.max_lid)]
    results[core_id] = result.sort_values("log_ml").reset_index(drop=True)

# generate plots
xlabels = ["log-ml", r"$\rho_{0.5}$", r"$\rho_h$", r"$\rho_w$"]
cmap = LinearSegmentedColormap.from_list("rb", ["C3", "purple", "C0"], N=256)
for core_id, core in cores.items():
    res = results[core_id]
    print(core_id)
    norm = plt.Normalize(vmin=min(res["log_ml"]), vmax=max(res["log_ml"]))
    fig, axs = plt.subplot_mosaic(
        [
            2 * ["L"] + 4 * ["D"] + 2 * ["ADM"],
            2 * ["L"] + 4 * ["I"] + 2 * ["ADM"],
            8 * ["."],
            2 * ["P0"] + 2 * ["P1"] + 2 * ["P2"] + 2 * ["P3"],
        ],
        figsize=(15, 6),
        gridspec_kw=dict(hspace=0, wspace=0.8, height_ratios=[3, 3, 1.5, 3]),
        dpi=300,
    )
    # ------------------ Lock-in functions ------------------
    x = np.linspace(-3, max(res.max_lid) + 10 if len(res) > 0 else 1, 5000)
    res["hlid"] = res.apply(lambda row: compute_hlid(list(row[0:4])), axis=1)
    res["height"] = res.apply(
        lambda row: np.max(lif(list(row[0:4]))(x)), axis=1
    )
    res["lif_width"] = res.apply(lambda row: np.sum(row[0:4]) - row[0], axis=1)
    best_log_ml = res.log_ml.max()
    for _, row in res.iterrows():
        is_best = row.log_ml == best_log_ml
        axs["L"].plot(
            x,
            lif(list(row.iloc[0:4]))(x),
            c="C1" if is_best else cmap(norm(row["log_ml"])),
            lw=3 if is_best else 1,
            alpha=1 if is_best else 0.8,
            zorder=2 if is_best else 1,
            label="Best lock-in function" if is_best else None,
        )
    handles, labels = axs["L"].get_legend_handles_labels()
    by_label = dict(zip(labels, handles))
    axs["L"].legend(by_label.values(), by_label.keys(), prop={"size": 8})
    axs["L"].set_ylim(0, min(0.15, axs["L"].get_ylim()[1]))
    axs["L"].set_xlabel("Relative depth [cm]")
    numerate_plots(axs["L"], "A")
    # ------------------ Declination and Inclination ------------------
    # Alternatively, the model and samples can be loaded from local files
    # (requires `import glob`):
    # model_names = glob.glob(f"../results/models/{core_id}*.pkl")
    # sample_names = glob.glob(f"../results/models/{core_id}*.npz")
    # if len(model_names) > 0:
    #     model = loadTorchSedModel(model_names[0])
    #     samples = np.load(sample_names[0], allow_pickle=True)["samples"]
    # else:
    #     model, samples = None, None
    # download the model and posterior samples from Nextcloud
    r = requests.get(model_links[core_id])
    model = loadTorchSedModel(io.BytesIO(r.content))
    r = requests.get(sample_links[core_id], stream=True)
    samples = np.load(io.BytesIO(r.raw.read()), allow_pickle=True)["samples"]
    plot_D_I(
        axs["D"],
        axs["I"],
        core,
        model=model,
        samples=samples,
        time_or_depth="depth",
        posterior=False,
        archkalmag14k=archkalmag,
        pred_sed_obs=True,
        limit_y_to_data=True,
    )
    numerate_plots(axs["D"], "B")
    # ------------------ Age-Depth-Model ------------------
    r = requests.get(adm_links[core_id])
    # arviz reads netCDF files from disk, so buffer the download in a temp file
    with tempfile.NamedTemporaryFile(mode="w+b", delete=False) as temp_file:
        temp_file.write(r.content)
    idata = az.from_netcdf(temp_file.name)
    plot_adm(axs["ADM"], core_id, idata)
    numerate_plots(axs["ADM"], "C")
    # ------------------ Parameter distributions ------------------
    for ax, param_name, xlabel in zip(
        [axs[f"P{i}"] for i in range(4)],
        ["log_ml"] + list(res.columns[-3:]),
        xlabels,
    ):
        param = res[param_name]
        # highlight the parameter belonging to the best log-ml value
        ax.axvline(param.iloc[np.argmax(res.log_ml)], c="C1", zorder=2)
        param_range = max(param) - min(param)
        x = np.linspace(
            min(param) - 0.2 * param_range, max(param) + 0.2 * param_range, 5000
        )
        density = scipy.stats.gaussian_kde(param)
        ax.plot(x, density(x), c="black", ls="--")
        n, bins, patches = ax.hist(
            param,
            bins=10,
            density=True,
            color="C0",
            alpha=0.8,
            weights=norm(res["log_ml"]) if param_name != "log_ml" else None,
        )
        if param_name == "log_ml":
            log_ml_centers = [
                np.mean(
                    res[
                        (res[param_name] >= bins[b])
                        & (res[param_name] < bins[b + 1])
                    ]["log_ml"]
                )
                for b in range(len(bins) - 1)
            ]
            for c, p in zip(log_ml_centers, patches):
                plt.setp(p, "facecolor", cmap(norm(c)))
        ax.set_xlabel(xlabel)
    for i, panel_label in enumerate(["D", "E", "F", "G"]):
        numerate_plots(axs[f"P{i}"], panel_label)
    fig.suptitle(f"Results for {core_id}", fontsize=16)
    if axs["D"].get_yticklabels()[1].get_position()[1] < -99:
        for label in axs["D"].get_yticklabels():
            label.set_rotation(90)
            label.set_va("center")
    fig.align_ylabels()
    fig.align_xlabels()
    # plt.savefig(
    #     f"../../../paper_2/6464794eaec25a9379c71989/{core_id}.png",
    #     bbox_inches="tight",
    # )
plt.show()
BA3
BYE
[Figure: Results for BA3]
[Figure: Results for BYE]