Impact of Training Dataset Size on the Accuracy of L-SVR Single-Time-Point Renal Dosimetry for [¹⁷⁷Lu]Lu-PSMA-617 Therapy

Authors

  • Abdurrahman Aziz Wicaksono
  • Jaja Muhammad Jabar
  • Syahril Siregar
  • Deni Hardiansyah

DOI:

https://doi.org/10.24843/BF.2026.v27.i07

Keywords:

[¹⁷⁷Lu]Lu-PSMA-617, Single-Time-Point dosimetry, Machine Learning, Support Vector Regression, Synthetic Data

Abstract

Radiopharmaceutical therapy (RPT) with [¹⁷⁷Lu]Lu-PSMA-617 requires accurate dosimetry of organs at risk (OARs), particularly the kidneys. Single-time-point (STP) dosimetry simplifies the clinical workflow by reducing the number of SPECT/CT acquisitions to one. Machine learning (ML) is a promising approach to STP dosimetry, but its clinical implementation is hindered by the scarcity of training data. This study investigated the relationship between training dataset size and the accuracy of time-integrated activity (TIA) estimation. A total of 5,000 synthetic virtual patients (VPs) were simulated from a published PBMS NLMEM renal biokinetic model at five imaging time points (t = 1.8 h, 18.7 h, 42.6 h, 66.2 h, and 160.3 h), and the time-activity curve (TAC) and reference TIA (rTIA) were calculated for each VP. Datasets of increasing size were drawn from this population by random sampling, and each dataset was split into training (80%) and testing (20%) subsets. A linear support vector regression (L-SVR) model was trained on the STP data at 42.6 h post-injection (the best time point identified in the PBMS NLMEM study) from the training subset, and was then used to generate estimated TIAs (eTIAs) from the STP inputs of the testing subset. Performance was evaluated with the root-mean-square error (RMSE) and mean absolute percentage error (MAPE) of the eTIAs relative to the rTIAs. The accuracy of ML-based STP dosimetry depended on training size: small samples (n = 10) performed poorly (RMSE > 85.98%, MAPE > 89.1%), accuracy improved markedly at n = 500 (RMSE = 14.07%), and it plateaued beyond n = 1,000 (best RMSE = 13.07%). These results indicate that the L-SVR model in this study requires more than 200 samples, with further gains up to n = 2,000. Synthetic data may therefore serve as a methodological bridge between limited clinical datasets and data-intensive ML approaches.
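The workflow summarised above (simulate VPs, compute rTIA, draw datasets of increasing size, split 80/20, train an L-SVR on the 42.6 h activity, score eTIA against rTIA) can be sketched in pure Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The kidney TAC is a hypothetical toy model, A(t) = a(e^(−k_el·t) − e^(−k_up·t)) with k_up > k_el and made-up parameter distributions, standing in for the published PBMS NLMEM model, so its rTIA has the closed form a(1/k_el − 1/k_up). Only the 42.6 h sample is simulated because it is the sole model input. The L-SVR is a hand-written full-batch subgradient-descent fit of the ε-insensitive loss on standardised data. RMSE and MAPE are assumed to be computed on per-patient relative deviations RD_i = 100·(eTIA_i − rTIA_i)/rTIA_i, i.e. RMSE = sqrt(mean RD_i²) and MAPE = mean |RD_i|, which is consistent with both being reported in percent. All function names (`simulate_vp`, `fit_linear_svr`, `run_experiment`, etc.) are hypothetical.

```python
import math
import random
import statistics

T_STP = 42.6  # h post-injection, the single time point used as model input


def simulate_vp(rng):
    """Draw one hypothetical VP (toy model, NOT the published PBMS NLMEM model).

    Toy kidney TAC: A(t) = a * (exp(-k_el*t) - exp(-k_up*t)), k_up > k_el (1/h).
    The parameter distributions are illustrative assumptions only.
    """
    a = rng.lognormvariate(0.0, 0.3)
    k_el = rng.lognormvariate(math.log(0.02), 0.3)
    k_up = k_el + rng.lognormvariate(math.log(0.5), 0.3)  # guarantees k_up > k_el
    return a, k_el, k_up


def activity(vp, t):
    a, k_el, k_up = vp
    return a * (math.exp(-k_el * t) - math.exp(-k_up * t))


def reference_tia(vp):
    """rTIA: closed-form integral of the toy TAC from 0 to infinity."""
    a, k_el, k_up = vp
    return a * (1.0 / k_el - 1.0 / k_up)


def fit_linear_svr(xs, ys, eps=0.1, lam=1e-3, epochs=300, lr=0.5):
    """1-D linear SVR: minimise lam/2*w^2 + mean(max(0, |y - w*x - b| - eps))
    on standardised data by full-batch subgradient descent."""
    mx, sx = statistics.fmean(xs), statistics.pstdev(xs)
    my, sy = statistics.fmean(ys), statistics.pstdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    n = len(zx)
    w = b = 0.0
    for epoch in range(1, epochs + 1):
        gw, gb = lam * w, 0.0
        for x, y in zip(zx, zy):
            r = y - (w * x + b)
            if r > eps:  # point lies above the epsilon tube
                gw -= x / n
                gb -= 1.0 / n
            elif r < -eps:  # point lies below the epsilon tube
                gw += x / n
                gb += 1.0 / n
        step = lr / math.sqrt(epoch)  # diminishing step size
        w -= step * gw
        b -= step * gb

    def predict(x):
        # Map the standardised prediction back to TIA units.
        return my + sy * (w * (x - mx) / sx + b)

    return predict


def relative_errors(etia, rtia):
    return [100.0 * (e - r) / r for e, r in zip(etia, rtia)]


def rmse_pct(rd):
    return math.sqrt(sum(d * d for d in rd) / len(rd))


def mape_pct(rd):
    return sum(abs(d) for d in rd) / len(rd)


def run_experiment(pool, n, rng):
    """Sample n VPs, split 80/20, train on A(T_STP), score eTIA against rTIA."""
    sample = rng.sample(pool, n)
    n_train = int(0.8 * n)
    train, test = sample[:n_train], sample[n_train:]
    model = fit_linear_svr([activity(vp, T_STP) for vp in train],
                           [reference_tia(vp) for vp in train])
    etia = [model(activity(vp, T_STP)) for vp in test]
    rd = relative_errors(etia, [reference_tia(vp) for vp in test])
    return {"n": n, "rmse": rmse_pct(rd), "mape": mape_pct(rd)}


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [simulate_vp(rng) for _ in range(5000)]
    for n in (10, 200, 500, 1000, 2000):
        res = run_experiment(pool, n, rng)
        print(f"n={n:5d}  RMSE={res['rmse']:6.2f}%  MAPE={res['mape']:6.2f}%")
```

Because the biokinetic model is a toy, the printed errors are not expected to reproduce the values reported in the abstract; the sketch only shows where dataset size enters, namely the `n` passed to `run_experiment`. A library estimator such as scikit-learn's `sklearn.svm.LinearSVR` could replace the hand-written `fit_linear_svr`.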

Published

2025-12-19