Project Proposals for 2026 entry

Applicants to our CDT for 2026 start can choose from the following selection of projects.

Applications for 2026 entry are now closed

Please note the deadline for applications for September 2026 entry has now passed.

This project will use the Public Health Scotland Unscheduled Care Data Mart (UCD)—a linked patient-level dataset covering all of Scotland since 2011—to improve the efficiency and equity of unscheduled care. We will map patient pathways across NHS 24, ambulance, emergency, acute, and mental health services using descriptive statistics, pathway visualisation, and machine learning. Predictive models will identify factors affecting outcomes, while clustering will reveal common pathways and bottlenecks. Embedded within PHS, the project will deliver actionable policy recommendations to enhance data collection, optimise patient flows, and guide equitable redesign of urgent and unscheduled care services.

Supervisory team

Syed Ahmar Shah and Saturnino Luz Filho

Project Background

Healthcare systems worldwide face growing pressure to deliver timely, efficient care while managing rising demand and constrained resources. The COVID-19 pandemic exposed critical vulnerabilities, disrupting routine services and overwhelming urgent care. In the UK, the first lockdown caused substantial drops in hospital admissions and major backlogs in elective care, with enduring impacts on waiting times and health outcomes [1–2]. In Scotland, the pandemic created sustained excess demand, prolonged delays, and excess mortality, underscoring the need for a resilient, responsive system capable of recovering from shocks and maintaining routine care [3–4].
In response, the Scottish Government launched the Urgent and Unscheduled Care Collaborative as part of its NHS Recovery Plan to modernise services, strengthen coordination between primary and secondary care, and improve patient flow through initiatives such as Hospital at Home. However, unscheduled care remains fragmented, with patient information dispersed across multiple services, impeding timely decision-making, evaluation, and resource planning. These gaps disproportionately affect socioeconomically disadvantaged and minority ethnic groups, contributing to health inequalities.
Public Health Scotland’s Unscheduled Care Data Mart (UCD) now offers a unique opportunity to link patient-level data across care settings, enabling system-wide analyses to identify bottlenecks and support data-driven redesign of unscheduled care pathways [5].

Project Aims

This project aims to generate data-driven insights to improve the efficiency, equity, and resilience of unscheduled care services in Scotland. Specifically, it will:
1. Map end-to-end patient pathways across NHS 24, ambulance, emergency, acute, and mental health services using the Unscheduled Care Data Mart (UCD);
2. Identify systemic bottlenecks, data gaps, and their impact on patient outcomes;
3. Develop predictive and clustering models to uncover drivers of delays and adverse outcomes; and
4. Produce evidence-based policy recommendations to support service redesign, optimise patient flows, and reduce health inequalities in unscheduled care.

Data and Methodology

This project will leverage the Public Health Scotland (PHS) Unscheduled Care Data Mart (UCD), a comprehensive, linked dataset covering the entire Scottish population since 2011. The UCD integrates patient-level data across NHS 24, the Scottish Ambulance Service, Primary Care Out of Hours, Emergency Departments, Acute Hospital Admissions, Mental Health Admissions, and Death Records, capturing approximately 2.8 million unscheduled care pathways annually. Linkage is enabled via the Community Health Index (CHI) number, allowing complete tracking of individual patient journeys from first contact to discharge or death.

Descriptive and Exploratory Analyses
Initial analyses will involve descriptive statistics and data visualisation to characterise service use and patient flows across unscheduled care. Network and Sankey-style pathway visualisations will map patient transitions between services, highlighting frequent routes, points of delay, and high-demand groups. These analyses will help identify candidate variables and outcomes for subsequent modelling.
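
As a minimal sketch of the pathway-mapping step, the snippet below counts transitions between consecutive services within each pathway. The resulting link table is what a Sankey or network diagram would be drawn from. The record layout (pathway identifier, timestamp, service name) and the example rows are illustrative assumptions, not the actual UCD schema.

```python
from collections import Counter, defaultdict

# Hypothetical contact records: (pathway_id, timestamp, service).
# Field names and values are illustrative, not the UCD schema.
contacts = [
    ("p1", 1, "NHS24"), ("p1", 2, "Ambulance"), ("p1", 3, "ED"),
    ("p2", 1, "NHS24"), ("p2", 2, "OOH"),
    ("p3", 2, "ED"), ("p3", 1, "Ambulance"), ("p3", 3, "Acute"),
]


def transition_counts(records):
    """Count service-to-service transitions within each pathway."""
    by_pathway = defaultdict(list)
    for pathway_id, timestamp, service in records:
        by_pathway[pathway_id].append((timestamp, service))
    counts = Counter()
    for events in by_pathway.values():
        # Order contacts in time, then pair each service with the next one.
        ordered = [service for _, service in sorted(events)]
        counts.update(zip(ordered, ordered[1:]))
    return counts


links = transition_counts(contacts)
for (source, target), n in links.most_common():
    print(f"{source} -> {target}: {n}")
```

Each (source, target, count) triple maps directly onto one link of a Sankey diagram, so the same table can feed whichever plotting library is available in the analysis environment.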

Predictive Modelling
We will develop supervised machine learning models (e.g. logistic regression, random forests, gradient boosting) to predict key outcomes such as hospital admission, waiting time, and length of stay. Models will be evaluated using standard performance metrics (AUROC, accuracy, calibration) and validated via k-fold cross-validation. Regularisation will be applied to prevent overfitting, and feature importance techniques will support model interpretability.
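
The evaluation scaffolding described above can be sketched without committing to a particular model. The snippet below shows a deterministic k-fold split and a rank-based AUROC. Both functions are illustrative helpers written for this example; model fitting is deliberately left out, and in practice shuffled or stratified folds would normally be preferred.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold i holds out every k-th item starting at i."""
    for i in range(k):
        test = list(range(i, n, k))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test


# Toy example: four patients, binary outcome, model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))

for train, test in k_fold_indices(5, 2):
    print(train, test)
```

In a full pipeline, each fold would fit the model on `train`, score `test`, and report the mean and spread of the per-fold AUROC alongside calibration.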

Unsupervised Clustering
Clustering methods (e.g. k-medoids, hierarchical clustering) will be used to identify common patient pathways and systemic bottlenecks. Clusters will be profiled on demographics, comorbidities, and outcomes, and findings will be reviewed with PHS and clinical stakeholders to ensure real-world relevance.
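
One way to cluster pathways is to treat each as a sequence of services, measure dissimilarity with an edit distance, and run k-medoids on the resulting distances. The sketch below is a simple alternating variant with a naive initialisation; the choice of edit distance, the example pathways, and all function names are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two service sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def assign(items, medoids, distance):
    """Assign every item index to its nearest medoid index."""
    clusters = {m: [] for m in medoids}
    for i, item in enumerate(items):
        nearest = min(medoids, key=lambda m: distance(item, items[m]))
        clusters[nearest].append(i)
    return clusters


def k_medoids(items, k, distance, max_iter=20):
    """Alternate between assignment and medoid update until medoids stop changing."""
    medoids = list(range(k))  # naive initialisation: the first k items
    for _ in range(max_iter):
        clusters = assign(items, medoids, distance)
        updated = [
            min(members, key=lambda c: sum(distance(items[c], items[o]) for o in members))
            if members else m  # keep the old medoid if its cluster is empty
            for m, members in clusters.items()
        ]
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(items, medoids, distance)


# Hypothetical pathways, each a sequence of service contacts.
pathways = [
    ("NHS24", "ED", "Acute"),
    ("NHS24", "OOH"),
    ("NHS24", "Ambulance", "ED", "Acute"),
    ("NHS24", "OOH", "ED"),
    ("Ambulance", "ED", "Acute"),
    ("OOH",),
]
medoids, clusters = k_medoids(pathways, 2, edit_distance)
for m, members in clusters.items():
    print(pathways[m], "represents", [pathways[i] for i in members])
```

Because each cluster is summarised by a real pathway (its medoid), the output can be shown directly to clinical stakeholders for review.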

Implementation Approach
The student will be embedded within PHS two days per week, enabling close engagement with data engineers, analysts, and policymakers. This will facilitate timely access to data, iterative feedback on analysis, and co-production of actionable outputs to inform service redesign.

Translational Potential and Expected Impact

Although centred on Scotland, this project addresses challenges shared by healthcare systems worldwide—managing surges in unscheduled care demand, reducing bottlenecks, and improving system resilience. By leveraging large-scale linked datasets to map patient journeys, the project will generate transferable methods and insights relevant to other national health systems. The analytical framework—combining pathway visualisation, predictive modelling, and clustering—can be adapted to different contexts, informing service redesign internationally. Findings will be disseminated through peer-reviewed publications, policy briefs, and international networks, contributing to global efforts to optimise urgent care delivery and strengthen health system preparedness.

Training and Development Outcomes for the Student

This project will provide comprehensive interdisciplinary training spanning data science, health informatics, and applied health services research. The student will gain advanced skills in data engineering, statistical analysis, machine learning, and pathway visualisation using large-scale linked health datasets. Embedding within Public Health Scotland (PHS) two days per week will offer hands-on experience with real-world data pipelines, governance procedures, and policy translation. They will also develop transferable skills in stakeholder engagement, responsible AI, scientific writing, and presenting findings to technical and non-technical audiences. This training will prepare the student for leadership roles in data-driven healthcare innovation.

References

[1]: Shah SA, Robertson C, Sheikh A. Effects of the COVID-19 pandemic on NHS England waiting times for elective hospital care: a modelling study. The Lancet. 2024 Jan 20;403(10423):241-3.
[2]: Shah SA, Brophy S, Kennedy J, Fisher L, Walker A, Mackenna B, Curtis H, Inglesby P, Davy S, Bacon S, Goldacre B. Impact of first UK COVID-19 lockdown on hospital admissions: Interrupted time series study of 32 million people. EClinicalMedicine. 2022 Jul 1;49.
[3]: Shah SA, Jeffrey K, Robertson C, Sheikh A. Impact of COVID-19 pandemic on elective care backlog trends, recovery efforts, and capacity needs to address backlogs in Scotland (2013–2023): a descriptive analysis and modelling study. The Lancet Regional Health–Europe. 2025;50.
[4]: Shah SA, Mulholland RH, Wilkinson S, Katikireddi SV, Pan J, Shi T, Kerr S, Agrawal U, Rudan I, Simpson CR, Stock SJ. Impact on emergency and elective hospital-based care in Scotland over the first 12 months of the pandemic: interrupted time-series analysis of national lockdowns. Journal of the Royal Society of Medicine. 2022 Nov;115(11):429-38.
[5]: Public Health Scotland. Unscheduled Care Datamart (UCD) [Internet]. Available from: https://publichealthscotland.scot/services/national-data-catalogue/national-datasets/search-the-datasets/unscheduled-care-datamart-ucd/. Accessed 3 October 2024.


This PhD project will integrate lipidomic, metabolomic, and proteomic data from skin organoid models to uncover molecular drivers of eczema and identify candidate therapeutic targets. Under the supervision of Prof. Sara Brown (IGC, University of Edinburgh) and Prof. Mark Parsons (EPCC), the student will develop computational pipelines combining network biology, machine learning, and drug-target mapping. By bridging omics data with drug discovery resources, the project aims to define mechanistic pathways underlying skin barrier dysfunction and inflammation. Input from Eczema Outreach Support (EOS) will guide translation toward patient benefit and public communication, advancing responsible, data-driven dermatology.

Supervisory team

Sara Brown and Mark Parsons

Project Partner

TBC

Project Background

Eczema (atopic dermatitis) is a chronic, relapsing inflammatory skin disease affecting millions worldwide. Despite advances in immunomodulatory therapies, the molecular mechanisms driving its onset and persistence remain incompletely understood, particularly regarding lipid and metabolite dysregulation in the skin barrier. Prof. Sara Brown’s group has generated a rich multi-omics dataset, including lipidomics, metabolomics, and proteomics, derived from patient-relevant skin cells and organoid models that mimic human epidermal physiology. These data offer an exceptional opportunity to decode the molecular pathways driving eczema and to identify actionable therapeutic targets.

This interdisciplinary project will leverage computational and systems-biology approaches to integrate these omic layers, define disease-associated molecular signatures, and link them to existing drug–target resources for discovery and repurposing. Collaboration with the Edinburgh Parallel Computing Centre (EPCC) ensures access to secure, high-performance computing environments. Sara Brown is a medical adviser and long-term collaborator of the patient support group Eczema Outreach Support (EOS). EOS will provide translational and patient-centred perspectives, supporting prioritisation of computational findings for real-world benefit and effective public engagement.

Project Aims

1. Integrate lipidomic, metabolomic, and proteomic profiles from skin organoids to model molecular networks underlying skin differentiation, barrier formation, and eczema.
2. Identify key dysregulated pathways and candidate biomarkers associated with barrier dysfunction and inflammation.
3. Develop computational pipelines linking molecular signatures to drug–target interaction databases to propose therapeutic candidates.
4. Establish reproducible, privacy-preserving workflows for multi-omics analysis within secure computing environments (EPCC).

Data and Methodology

The student will analyse existing multi-omics datasets generated by the Brown group, encompassing lipidomics, metabolomics, and proteomics from skin organoid models under varying experimental and disease-relevant conditions.

1. Data Processing and Integration:
Pre-processing will involve normalization, quality control, and batch correction across modalities. Integration strategies will include similarity network fusion, canonical correlation analysis, and deep representation learning to capture cross-layer molecular relationships.
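
As a small illustration of why normalisation precedes integration, the sketch below standardises each feature within a modality and then concatenates the modalities sample by sample. This is plain early integration, a much simpler baseline than similarity network fusion or canonical correlation analysis, and the toy matrices are invented for the example.

```python
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Standardise each feature (column) to mean 0 and unit variance across samples."""
    scaled_columns = []
    for col in zip(*matrix):
        mu, sigma = mean(col), pstdev(col)
        # A constant feature carries no signal; map it to zeros rather than divide by zero.
        scaled_columns.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical toy data: 3 samples (rows), aligned across modalities.
lipids = [[10.0, 200.0], [12.0, 220.0], [14.0, 240.0]]
proteins = [[0.5], [0.7], [0.9]]

# Early integration: concatenate the standardised blocks for each sample.
combined = [l + p for l, p in zip(zscore_columns(lipids), zscore_columns(proteins))]
for row in combined:
    print([round(v, 3) for v in row])
```

Without the standardisation step, the feature measured on the largest numeric scale would dominate any distance or correlation computed on the combined matrix.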

2. Network and Machine Learning Approaches:
Graph-based clustering, network propagation, and representation learning (e.g., graph neural networks, multi-view autoencoders) will be explored to detect modules of co-regulated features. Biological interpretation will rely on pathway enrichment and ontology analyses.

3. Drug Repurposing and Therapeutic Targeting:
Using proteomic signatures, the student will perform connectivity mapping (CMap) and perturbation analysis to identify compounds that reverse disease-associated expression profiles. Network pharmacology approaches will map dysregulated proteins to known drug–target interaction graphs (DrugBank, STITCH, ChEMBL). Structural bioinformatics and docking tools may be explored for selected targets to evaluate compound–target affinity. This integrative pipeline will prioritize drug candidates for experimental validation.
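
The idea behind signature reversal can be sketched as follows: a compound is of interest when its perturbation profile is anti-correlated with the disease signature. The snippet uses a Pearson correlation as a simplified stand-in for the rank-based enrichment scores used in connectivity mapping; the protein and compound names and all values are placeholders, not real measurements.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_reversers(disease_signature, compound_profiles):
    """Rank compounds from most to least anti-correlated with the disease signature."""
    proteins = sorted(disease_signature)
    disease = [disease_signature[p] for p in proteins]
    scores = {
        name: pearson(disease, [profile[p] for p in proteins])
        for name, profile in compound_profiles.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


# Placeholder log fold-changes (disease vs control) and compound-induced changes.
disease = {"P1": 2.0, "P2": 1.0, "P3": -1.0, "P4": -2.0}
compounds = {
    "compound_A": {"P1": -2.0, "P2": -1.0, "P3": 1.0, "P4": 2.0},
    "compound_B": {"P1": 1.0, "P2": 0.5, "P3": -0.5, "P4": -1.0},
    "compound_C": {"P1": 1.0, "P2": -1.0, "P3": -1.0, "P4": 1.0},
}
ranked = rank_reversers(disease, compounds)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

Compounds at the top of the ranking (most negative score) are the candidates whose profiles most closely oppose the disease signature.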

4. Computing Environment:
Analyses will be conducted using EPCC’s secure high-performance computing resources to ensure scalability, reproducibility, and compliance with data governance frameworks. The student will have access to EPCC’s wide range of supercomputing, data science and AI systems.

Deliverables:
A reproducible computational pipeline, interpretable multi-omic networks, and a ranked list of candidate therapeutic targets linked to potential repurposing compounds.

Translational Potential and Expected Impact

This project unites expertise in dermatology (Brown Lab), computational science (EPCC), and translational engagement (EOS), fostering collaboration across academia, clinical research, and the third sector. By producing a scalable computational framework for integrating complex multi-omics data and linking findings to drug discovery pipelines, the project will have broad relevance for inflammatory and metabolic diseases. Outcomes will include novel mechanistic insights into skin development and eczema, prioritized therapeutic targets, and publicly accessible computational tools. EOS’s involvement ensures patient-centred prioritization and effective dissemination to lay audiences, maximising societal and international impact.

Training and Development Outcomes for the Student

The student will gain cross-disciplinary expertise spanning computational biology, systems medicine, and drug discovery informatics. They will develop advanced skills in data integration, network modelling, and high-performance computing through EPCC, as well as bioinformatics and translational research methods under Prof. Brown’s supervision. Interaction with EOS will offer experience in public engagement and third-sector collaboration. The project provides professional development in scientific communication, responsible research, and reproducible software engineering, equipping the candidate for future roles in academia, healthcare data science, or the pharmaceutical sector.

References

Elias MS, Wright SC, Nicholson WV et al. Functional and proteomic analysis of a full thickness filaggrin-deficient skin organoid model [version 2; peer review: 3 approved]. Wellcome Open Res 2019, 4:134 (https://doi.org/10.12688/wellcomeopenres.15405.2)

Brown, Sara J. Keratinocytes Listen, Respond, and Actively Contribute to Crosstalk in the Epidermal Community and Beyond. Journal of Investigative Dermatology, 2024 Volume 144, Issue 12, 2628 - 2630

Budu-Aggrey, A., Kilanowski, A., Sobczyk, M.K. et al. European and multi-ancestry genome-wide association meta-analysis of atopic dermatitis highlights importance of systemic immune regulation. Nat Commun 2023; 14, 6172.

Standl et al. Gene-environment Interaction Affects Risk of Atopic Eczema: Population and In Vitro Studies. Allergy 2025 https://doi.org/10.1111/all.16605


A major challenge and opportunity in genomic medicine is integrating data across scales to identify and link disease-causal variants, molecular mechanisms and cell types/states to clinical outcomes. This is necessary for efficient drug target candidate identification, as well as investigation of heterogeneous clinical outcomes with respect to disease progression trajectories or treatment response. We have developed stat/ML methodologies, Stator and TarGene, for high resolution disease cell type/state identification from single-cell RNA-seq data and disease-causal DNA variant prioritisation from large-scale biobanks, respectively. Here, we aim to develop novel stat/ML methodologies to integrate molecular state quantification with genotype-phenotype inference for application in disease state stratification in immunological disease.

Supervisory team

Ava Khamseh and Sara Brown

Project Partner

Janssen Pharmaceutica NV

Janssen’s primary interest in this project is genotype-phenotype causal inference for the purpose of identifying patient populations in which a treatment may exhibit differential efficacy across distinct subgroups in the presence of multiplicity problems in characterizing such subgroups. Janssen’s Innovative Medicine department has extensive expertise in biostatistics, AI/ML, molecular biomedicine applications, and Real-World Evidence generation, which are of great value to this project. Janssen will delegate a representative to the advisory board of this project.

Project Background

Modern molecular biology, genomics and population medicine take advantage of thousands of variables at contrasting scales. Biology is only rarely conveyed by marginal variation involving a single molecule or phenotype at a time, or pair-wise correlation between two molecules or two phenotypes. We have recently developed two fully general state-of-the-art stat/ML methodologies, backed up by mathematical theory: (1) Stator, to identify cell types and states at high resolution from scRNA-seq data of disease vs healthy controls by taking advantage of high-order expression dependencies, and (2) TarGene, for double-robust quantification of the effects of DNA variants and their interactions on disease outcomes in large-scale genotype-phenotype biobanks, with minimum bias and maximum power. TarGene has been used to integrate population genetics with functional genomics, quantifying epistatic contributions to human traits via transcription factor mechanisms and thus prioritising candidate variants and genes for disease via molecular mechanisms. Given that Stator works on the RNA scale and TarGene on the genotype-to-phenotype scale, we now wish to integrate these data modalities to link DNA variants to gene expression, mechanisms and disease phenotypes, which are expected to be heterogeneous for complex traits. This heterogeneity is in turn expected to lead to differences in disease trajectory, severity and treatment response.

Project Aims

The first aim of the project is to develop novel stat/ML methodologies for linking disease (severity/response) genes derived from genotype-phenotype population studies to cell states and corresponding RNA expression programmes derived from scRNA-seq data. The second aim of the project is to investigate how the identified strata of cell states/genes relate to differences in disease trajectory and/or severity and/or response to treatment. The key element of this project is to prioritise causation with respect to disease-relevance of cell (sub)types and states and genotype-phenotype inference. This is important for identifying genomic contributions in subpopulations across the disease spectrum, so that targeted therapies can be applied.

Data and Methodology

Stator utilises structure learning and model-free non-parametric estimators of higher-order interactions, implemented as a Nextflow software pipeline and Shiny app. TarGene utilises Targeted Learning (TL), involving diverse machine learning libraries and double-robust estimation strategies, such as Targeted Maximum Likelihood Estimation. TL also applies to quantification of treatment effects on disease outcomes under different treatment interventions (for TarGene, DNA variants are the analogue of “treatment interventions” in Real-World Evidence studies). Broadly, the approach is analogous to LD score regression, which integrates GWAS summary statistics and gene expression data to investigate how genes prioritised from population studies of disease can be stratified by combinatorial gene expression in different cell (sub)types or states. The main differences are threefold: (1) Stator offers a higher resolution of cell (sub)types and states, with a focus on cell states, (2) TarGene can be utilised to discover new candidate variants/genes, both with and without functional genomics integration, depending on the type of input data, and (3) the focus here is to identify strata of disease, and link these back to molecular differences amongst the individuals.
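
To make the term "double-robust" concrete, the sketch below implements the augmented inverse-probability-weighted (AIPW) estimator of the average effect of a binary exposure. AIPW is a simpler relative of Targeted Maximum Likelihood Estimation and is shown here only as an illustration; it is not TarGene's implementation, and the inputs (propensity scores and outcome-model predictions) are assumed to come from models fitted elsewhere.

```python
def aipw_effect(y, a, propensity, mu1, mu0):
    """AIPW estimate of E[Y(1)] - E[Y(0)] for a binary exposure a.

    propensity[i] estimates P(a = 1 | covariates of i); mu1[i] and mu0[i]
    are outcome-model predictions for i with and without the exposure.
    """
    total = 0.0
    for yi, ai, ei, m1, m0 in zip(y, a, propensity, mu1, mu0):
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        exposed_term = m1 + ai * (yi - m1) / ei
        unexposed_term = m0 + (1 - ai) * (yi - m0) / (1 - ei)
        total += exposed_term - unexposed_term
    return total / len(y)


# Toy data: exposure assigned with probability 0.5 for everyone.
y = [1, 1, 1, 0]
a = [1, 1, 0, 0]
e = [0.5, 0.5, 0.5, 0.5]

# Reasonable outcome model.
good = aipw_effect(y, a, e, mu1=[1.0] * 4, mu0=[0.5] * 4)
# Deliberately wrong outcome model (all zeros) with a correct propensity model.
robust = aipw_effect(y, a, e, mu1=[0.0] * 4, mu0=[0.0] * 4)
print(good, robust)
```

Both calls return the same estimate on this toy data: the estimator remains consistent when either the propensity model or the outcome model is correct, which is the property the term refers to.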

The methodology proposed is completely general and applicable to a diversity of disease areas. In this project, we develop and apply the proposed approach in the context of immunology, taking atopic dermatitis (AD) as an exemplar. We will utilise publicly available scRNA-seq data of AD and healthy controls, as well as large-scale biobanks such as the UK Biobank, All of Us and Our Future Health.

Translational Potential and Expected Impact

Drug discovery is generally an inefficient and costly process due to limited understanding of tissue heterogeneity, specifically related to identification of disease-relevant cell populations, their biological states, and the molecular mechanisms involved. Beyond initial discovery, treatments are often only successful in subpopulations of patients. There is therefore a need to prioritise causal variants, genes and cell types/states leading to disease trajectories and treatment response for optimal development of drug targets for various patient subpopulations who would otherwise respond differently to various treatments. The focus here is on quantification of heterogeneous genomic contribution to disease outcome and/or treatment response.

Training and Development Outcomes for the Student

On the methodological front of this cross-disciplinary project, the student will develop technical skills in the development and application of rigorous statistical inference (semi-parametric efficiency theory) and machine learning techniques, both throughout the PhD and by attending MSc-level courses in these areas and beyond. On the biomedical front, by working with biomedical data at various scales, the student will develop a deep understanding of molecular biology via scRNA-seq, genotype-phenotype inference in large-scale biobanks and Real-World Evidence generation. The student will further develop essential cross-disciplinary and translational communication skills, with access to a supervisory team with diverse expertise ranging across AI/ML, biostatistics and molecular biomedicine.

References

1. Review article: “A brief history of human disease genetics”, Nature, 2020, https://doi.org/10.1038/s41586-019-1879-7 
2. Review article: “Refining the impact of genetic evidence on clinical success”, Nature, 2024, https://doi.org/10.1038/s41586-024-07316-0 
3. Review article: “Applications of single-cell RNA sequencing in drug discovery and development”, Nature Reviews Drug Discovery, https://doi.org/10.1038/s41573-023-00688-4
4. Stator: “High order expression dependencies finely resolve cryptic states and subtypes in single cell data”, EMBO Molecular Systems Biology, 2025, https://doi.org/10.1038/s44320-024-00074-1 
5. TarGene: “Semiparametric efficient estimation of small genetic effects in large-scale population cohorts”, Oxford Biostatistics, 2025, https://doi.org/10.1093/biostatistics/kxaf030 
6. TarGene application: “Epistatic contributions to human traits via transcription factor mechanisms”. medRxiv, 2025, https://doi.org/10.1101/2025.09.28.25336826 
7. “Atopic Eczema: How Genetic Studies Can Contribute to the Understanding of this Complex Trait”, Journal of Investigative Dermatology, 2022, https://doi.org/10.1016/j.jid.2021.12.020 
8. “Multi-omic triangulation identifies molecular candidates of atopic dermatitis severity”, medRxiv, 2025, https://doi.org/10.1101/2025.08.04.25332125


This project explores the use of radiology reports, combined with medical imaging, from clinical data. Extracting more nuanced information from free text will enable richer phenotyping for research purposes, help identify referral reasons (improving generalisability) and improve image quality assessment. Additionally, through integrating Vision-Language Models, we will improve the prediction of brain health conditions. We will use data collected during general healthcare (facilitating future integration into clinical workflows), and process it within Trusted Research Environments (TREs) to ensure patient privacy.

Supervisory team

Michael Camilleri, Beatrice Alex and Grant Mair

Project Partner

Public Health Scotland

Project Background

The use of health data in research is often constrained to structured entries (e.g. ICD codes [1]), while most of the qualitative and nuanced understanding of patient health is recorded in free text, such as GP notes or radiology reports [2].

At the same time, the recent successes in Natural Language Processing (NLP) [3] provide a relatively untapped opportunity to extract value from such unstructured data. Automated processing of clinical notes can help ascertain existing conditions [4] or, as proposed herein, identify biases in the data [2] which can feed into improving the robustness of AI tools applied to health data. Additionally, integrating language with visual models promises to improve performance of downstream tasks such as disease classification and prediction [5].

This is accelerated by the rising availability of Trusted Research Environments (TREs) [6], which aim to open up clinical data for research purposes while ensuring that any methods developed can be more easily integrated into clinical workflows. Chief among these is the Brain Health Data-Pilot (BHDP) [7], within the Scottish National Safe Haven (NSH), with more than 1.2 million brain scans and linked Electronic Health Records (EHRs) from across Scotland.

Project Aims

The primary goals of this project will be to process free-text radiology reports accompanying medical images (MRI/CT) to: (a) extract key conditions, artefacts and image quality features, (b) identify the reason for the scan (why the subject was referred to have a scan), and (c) as a stretch goal, integrate with an Imaging module as a Vision-Language-Model (VLM) [5] to improve prediction of brain health conditions (e.g. Dementia).

Data and Methodology

This project uses clinical datasets, which provide orders of magnitude more data and heterogeneity than publicly available sources [7], while offering novel research opportunities due to their 'raw' nature. Access to the TRE (ensuring patient privacy) will be facilitated through having eDRIS as our external partner for the Scottish NSH (BHDP). Furthermore, there is scope for using consented data (e.g. Generation Scotland [10] or UK Biobank) as an alternative source of data to complement the above.

Methods

The project has 3 work packages:

1. Enrich Research Value of Radiology reports: The Language Technology Group [8] developed a rule-based system, EDIE-R [4], to identify 24 brain-scan phenotypes. This will provide a starting point to develop newer neural models (e.g. Transformers [9] or Large Language Models [3]) to extract relevant concepts. Using neural models will also allow us to extend to other relevant phenotypes, and also to image quality metrics (e.g. movement artefacts).

2. Understanding Scanning Bias: The next step is to infer the reason for the scan. This will involve eliciting signal from the clinical history portion of the report. Furthermore, this may be missing in some scans, and hence will necessitate learning a mapping from the radiologist report to the referral context in a semi-supervised setting, allowing reasoning about selection bias in scanned individuals.

3. Improving Prediction of Brain Health: This can be extended to disease progression models, incorporating condition codes [11] or MRI/CT scans themselves (using a VLM [5]) to improve prediction of brain health conditions e.g. Dementia. 
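
As a toy illustration of the rule-based starting point in work package 1, the sketch below tags a report with labels using regular expressions and a crude negation check. It is not EDIE-R: the patterns, label names, negation cues and example report are all invented for this example.

```python
import re

# Illustrative patterns only; not the EDIE-R rule set.
PHENOTYPE_PATTERNS = {
    "infarct": re.compile(r"\binfarct\w*\b", re.I),
    "haemorrhage": re.compile(r"\b(haemorrhage|hemorrhage|haematoma)\b", re.I),
    "atrophy": re.compile(r"\batrophy\b", re.I),
    "motion_artefact": re.compile(r"\b(motion|movement) artefact\w*\b", re.I),
}
NEGATION = re.compile(r"\b(no|not|without)\b", re.I)


def tag_report(text):
    """Return the labels asserted in a report.

    A label is skipped when a negation cue precedes its first mention
    in the same sentence.
    """
    found = set()
    for sentence in re.split(r"[.\n]+", text):
        for label, pattern in PHENOTYPE_PATTERNS.items():
            match = pattern.search(sentence)
            if match and not NEGATION.search(sentence[:match.start()]):
                found.add(label)
    return found


report = (
    "No acute haemorrhage. Old left frontal infarct. "
    "Mild generalised atrophy. Images degraded by movement artefact."
)
print(sorted(tag_report(report)))
```

Rules like these are transparent but brittle, which is the motivation for moving to neural models; their output can still serve as weak labels or a baseline when evaluating those models.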

Translational Potential and Expected Impact

The use of clinical data and input from domain experts (and the project partner) will ensure that the aforementioned systems can more easily be deployed in clinical workflows. Concretely, this work will:
1. Develop systems that accelerate health research by increasing the value of free-text reports and that can, in clinical settings, summarise patient trajectory for new consultations.
2. Provide a path to analysing biases in referrals to scanning, improving fairness and trustworthiness of predictive models for diseases.
3. Develop and advance TRE functionality in collaboration with eDRIS.

Training and Development Outcomes for the Student

* Developing skills in applying/implementing deep learning for NLP and medical imaging
* Data Science for curation of raw data within constrained environments (TREs)
* Experience in using and developing the emerging field of TREs, including ethics and governance procedures.
* Experience in working with real-world health data and collaborating with clinical domain experts
* Experience in Patient and Public Involvement to shape the direction of research.

References

- [1] International Statistical Classification of Diseases and Related Health Problems. https://www.who.int/standards/classifications/classification-of-diseases 
- [2] Tang, A.S., Woldemariam, S.R., Miramontes, S. et al. "Harnessing EHR data for health research". Nat Med 30, 1847–1855 (2024). https://doi.org/10.1038/s41591-024-03074-8 
- [3] Artsi Y., Klang E. et al. "Large language models in radiology reporting - A systematic review of performance, limitations, and clinical implications". Intelligence-Based Medicine, 12 (2025), ISSN 2666-5212, https://doi.org/10.1016/j.ibmed.2025.100287 
- [4] Alex, B., Grover, C., Tobin, R. et al. Text mining brain imaging reports. J Biomed Semant 10 (Suppl 1), 23 (2019). https://doi.org/10.1186/s13326-019-0211-7 
- [5] Li X., Li L. et al. "Vision-Language Models in medical image analysis: From simple fusion to general large models". Information Fusion, 118 (2025), ISSN 1566-2535, https://doi.org/10.1016/j.inffus.2025.102995
- [6] Trusted Research Environments. https://www.hdruk.ac.uk/access-to-health-data/trusted-research-environments/ 
- [7] Camilleri M., Gouzou D. et al. "A large dataset of brain imaging linked to health systems data: a whole system national cohort" (in preparation).
- [8] Language Technology Group (website) https://www.ltg.ed.ac.uk/ 
- [9] Tay Y., Dehghani M., et al. "Efficient Transformers: A Survey". ACM Comput. Surv. 55, 6, Article 109 (June 2023), https://doi.org/10.1145/3530811 
- [10] Generation Scotland https://genscot.ed.ac.uk/ 
- [11] Shmatko, A., Jung, A.W., Gaurav, K. et al. Learning the natural history of human disease with generative transformers. Nature (2025). https://doi.org/10.1038/s41586-025-09529-3 



Patients undergoing hemodialysis (HD) exhibit significantly higher mortality rates compared to those who had kidney transplants. This disparity is largely attributed to the accumulation of uremic toxins that standard HD treatments fail to completely remove. Despite this acknowledged issue, systematic identification of specific uremic toxins impacting mortality in patients receiving maintenance HD has not been effectively addressed. This project integrates AI, metabolomics, and biomedical materials science to accelerate the identification of key metabolites and biological pathways involved in the mortality of dialysis patients and to discover biocompatible filtering materials that could enhance HD efficacy in toxin removal. By leveraging data from existing literature and collaborations, this synergistic approach seeks to elucidate the mechanisms behind elevated mortality in HD patients and develop solutions to mitigate these risks, with the ultimate goal of reducing patient mortality.

Supervisory team

Grazia De Angelis, Karl Burgess and Bryan Conway

Project Partner

Kidney Research UK

Project Background

Approximately 2 million individuals globally suffer from kidney failure, necessitating treatment options such as transplantation and dialysis. Transplantation is limited by donor availability, forcing many to rely on HD. Whereas transplant recipients exhibit approximately 80% survival rates five years post-procedure, those undergoing HD have less than a 50% chance of surviving the same period, owing to what is known as “residual uremic syndrome.” This condition results from the incomplete removal of certain uremic toxins during HD and contributes significantly to the higher mortality observed in these patients [1]. Current HD technologies rely on membranes that separate solutes by size and are therefore unable to effectively eliminate larger uremic toxins from the patient's bloodstream. This approach lacks precision and effectiveness because it is designed around small molecules like urea and fails to address other, more harmful toxins.

Project Aims

Our research aims to enhance HD treatment effectiveness and reduce mortality rates through a multidisciplinary strategy. Initially, we must identify metabolites linked to adverse effects, leveraging metabolomics combined with AI to uncover key molecules influencing kidney failure patient outcomes. Prior studies show inconsistent results, highlighting the complexity of metabolite impacts on patient mortality and emphasizing the need for deeper investigation. We plan to use an integrated metabolomics and AI approach to better understand these mechanisms, paving the way for future comprehensive studies and the development of materials tailored to remove toxic metabolites. AI will play a crucial role in rapidly advancing these objectives, tackling the vast scope of toxins and potential materials.

Project Activities

As a PhD student on this project, your primary role will involve:

  • Utilizing data from landmark studies carried out over the past decade, enhancing your understanding of clinical outcomes in hemodialysis.
  • Engaging in molecular simulations to assess databases containing thousands of porous materials, focusing particularly on Covalent Organic Frameworks, to identify those capable of efficiently removing harmful toxins from the bloodstream.
  • Applying sophisticated machine learning techniques to screen these materials on a large scale, a methodology currently being developed by our Engineering group.
  • Synthesizing and/or selecting optimal materials based on the unique properties required for effective toxin removal, thereby directly contributing to the design of more efficient and patient-centered hemodialysis treatments.
  • Collaborating with Kidney Research UK and using their NURTuRE biobank, which provide a rich, real-world context for your research and the opportunity to validate your findings against an extensive range of patient data.
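
The screening idea in the activities above (simulate a subset of materials, then use machine learning to rank the rest) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the descriptor choice, the toy numbers, and the k-nearest-neighbour surrogate stand in for whatever descriptors, simulations, and models the project actually uses.

```python
import math

def knn_predict(query, labelled, k=2):
    """Predict uptake for one material as the mean of its k nearest labelled neighbours."""
    # labelled: list of (descriptor_vector, simulated_uptake) pairs
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(uptake for _, uptake in nearest) / len(nearest)

def rank_candidates(candidates, labelled, k=2):
    """Return candidate names sorted from highest to lowest predicted uptake."""
    scored = {name: knn_predict(desc, labelled, k) for name, desc in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical, made-up descriptors: (pore diameter in nm, void fraction).
# The uptake values stand in for results of molecular simulations.
simulated = [
    ((1.0, 0.40), 10.0),
    ((1.2, 0.45), 12.0),
    ((2.5, 0.70), 30.0),
    ((2.7, 0.75), 34.0),
]
unscreened = {
    "COF-A": (1.1, 0.42),  # close to the two low-uptake materials
    "COF-B": (2.6, 0.72),  # close to the two high-uptake materials
}

ranking = rank_candidates(unscreened, simulated, k=2)
print(ranking)  # ['COF-B', 'COF-A']
```

In a real screen the descriptors would need to be scaled to comparable ranges before computing distances, and the surrogate would be checked against held-out simulations.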

Translational Potential and Expected Impact

This project not only aims to make significant academic contributions but also has the potential to translate into real-world clinical applications that could drastically reduce patient mortality. We expect this project to lay the foundation for interdisciplinary research between the groups involved and to provide evidence for larger studies.

Training and Development Outcomes for the Student

Through this project, the student will gain invaluable skills in both the practical and theoretical aspects of biomedical research. They will develop proficiency in metabolomics and artificial intelligence techniques, learning to interpret complex biological data and to apply machine learning algorithms for real-world applications. Additionally, the student will enhance their capabilities in molecular simulations and materials science, crucial for addressing clinical challenges. Through collaboration with external partners, such as Kidney Research UK, and interdisciplinary teamwork, they will also improve their communication and project management skills. This comprehensive training will prepare them for a successful career in bioinformatics and materials engineering.

References

[1] The Kidney Project, University of California San Francisco, https://pharm.ucsf.edu/kidney 
[2] S. Al Awadhi et al, A Metabolomics Approach to Identify Metabolites Associated With Mortality in Patients Receiving Maintenance Hemodialysis, Kidney Int Rep 2024 9, 2718–26.
[3] S. Kalim et al., A Plasma Long‐Chain Acylcarnitine Predicts Cardiovascular Mortality in Incident Dialysis Patients, J American Heart Association 2, 2013.
[4] Hu, J.-R., et al Serum Metabolites and Cardiac Death in Patients on Hemodialysis, Clin J Am Society of Nephrology 14(5): 747-749, 2019.
[5] https://nurturebiobank.org/ , visited on 4th October 2025.
[6] T. Fabiani et al., In silico screening of nanoporous materials for urea removal in hemodialysis applications, Phys. Chem. Chem. Phys., 2023, 25, 24069.
[7] REDIAL, redefining hemodialysis with data-driven materials innovation, project https://www.suspromgroup.eng.ed.ac.uk/redial 
[8] Zarghamidehagani and De Angelis, Machine learning-driven computational screening of covalent organic frameworks for gas separation applications, Separation and Purification Technology, 2025, 377, 134358.
[9] Zarghamidehagani et al., Chemical engineering contribution to hemodialysis innovation: achieving the wearable artificial kidneys with nanomaterial based dialysate regeneration, Physical Sciences Reviews, 2025, 10(3), pp. 279–299


A scatter plot showing binding capacity (mg g⁻¹) on the vertical axis versus equilibrium concentration Cₑq (mg L⁻¹) on the horizontal axis.
A flowchart-style infographic illustrates a workflow for studying COFs (covalent organic frameworks). The diagram is divided into four sections: Examples of COFs library, COFs descriptor, Molecular simulation, and Machine Learning

Early detection of pancreatic cancers and pre-malignant lesions offers the best chance of cure for pancreatic cancer. Currently, patients at risk are managed through frequent imaging and clinical assessment, processes that are manual, time-consuming, and prone to error. This project will develop an AI system integrating imaging models and clinical data to detect early malignant transformations in the pancreas.

Supervisory team

Eleonora D’Arnese, Amir Vaxman and Damian Mole

Project Partner

NHS Lothian

Project Background

Abnormalities in the pancreas detected on CT carry a risk of malignant transformation and require long-term surveillance. The incidence of such referrals is rising rapidly due to the increased number of scans done for other reasons, placing increasing demand on skilled specialists who must manually compare scans over time. This process is labour-intensive, costly, and prone to error: false negatives can delay treatment or allow cancers to go undetected, while false positives may lead to unnecessary surgery. Moreover, patients with low- or negligible-risk abnormalities are subjected to prolonged and expensive monitoring, impacting their well-being. Early detection of malignant transformation of abnormal areas could substantially improve survival.

Project Aims

The primary goal of this project is to develop an AI-based solution for image analysis and decision-making support in the surveillance and early detection of cancers or pre-cancers in the pancreas. This project will create a new tool that, starting from routinely acquired images and clinical data, will monitor, analyse, and inform decision-making.

Training and Development Outcomes for the Student

The student will train in AI for scientific computation, medical image processing, geometry processing, and clinical imaging diagnostics. The research will begin in a sandbox, developing the algorithms on training examples (which could be synthetic), before progressing to the clinical dataset to develop the algorithms further. Concrete development outcomes are:


1) Acquire fundamental AI, scientific computation, and diagnostic skills.
2) Create a mature proof-of-concept for pancreatic abnormality analysis.
3) Develop an algorithm using real-world clinical data to meet standardized diagnostic metrics.
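
As an illustration of what diagnostic metrics could look like in this setting, the Python sketch below computes sensitivity and specificity, which correspond directly to the false negatives and false positives discussed in the project background. The choice of these two metrics, the function name, and the labels are assumptions for illustration; the project does not specify which metrics will be used.

```python
def diagnostic_metrics(y_true, y_pred):
    """Compute sensitivity and specificity from binary labels (1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negatives": fn,
        "false_positives": fp,
    }

# Made-up labels for eight hypothetical scans.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Here one missed cancer and one false alarm give a sensitivity and a specificity of 0.75 each.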


With the remarkable progress in Artificial Intelligence (AI), particularly in the field of Transformers, machine learning-driven clinical prediction models (CPMs) are gaining prominence in the literature [1]. However, most of these models are yet to be applied in practice for real-world clinical decision-making. To translate these tools into real-world applications, they need to be accessible, adaptable, and actionable. In this project, we will develop usable models and assess their translation potential for decision-making in robot-assisted surgery (RAS). Recent advances in RAS have revolutionized healthcare and allowed the collection of real-time data before, during, and after surgery that can assist critical decision-making around when these surgeries should be offered and what complications might arise from them. A usable predictive model will facilitate this and lead to safer decision-making, reducing the burden on individuals and the healthcare system.

Supervisory team

Sohan Seth and Ewen Harrison

Project Partner

Intuitive

Project Background

Recent years have witnessed significant progress in machine learning-driven clinical prediction models [1]. These models have been shown to be robust, accurate, and well calibrated on various publicly available benchmark datasets, e.g., MIMIC-IV. Using these models in practice, however, is not straightforward: they must also be accessible, adaptable, and actionable. That is, they must handle multimodal data under competing risks and predict several outcomes of interest simultaneously and in real time, while presenting their decisions in a human-interpretable manner to guide practical decisions under various resource and safety constraints. This is particularly difficult in high-stakes environments such as lifesaving surgeries, so these models are yet to be widely applied to decision-making in clinical practice. Recent technological advances have brought robot-assisted surgeries, which make surgery safer and provide real-time measurements, paving the way for data-driven decision-making. But critical decisions remain to be made around the selection of surgery, in particular whether the benefit from surgery outweighs the complications for postoperative care. Having a better sense of factual and counterfactual situations over multiple outcomes and constraints provides a holistic view of treatment that helps with more informed decision-making at an individual level, and with resource allocation at a healthcare level.

Project Aims

We aim to develop predictive models that are accessible, i.e., the model’s decision is understandable to end-users and traceable to the features responsible for it; adaptable, i.e., the model can be transferred to different populations relatively easily and adapted to a changing environment; and actionable, i.e., the model can integrate various data sources at potentially multiple resolutions and provide real-time outcomes from longitudinal data. We also aim to assess the model in uncovering the mechanisms that drive complications, resilience, and recovery, or in testing whether different surgical approaches truly minimise physiological stress across diverse patient groups.

Translational Potential and Expected Impact

The project develops machine learning-driven clinical prediction models with the goal of making them usable in practice. It aims to assess the translation of a recently developed method into a real-world application. The current technology is at Technology Readiness Level 3, and we expect the project to explore its performance on real data beyond publicly available benchmarks, potentially moving it towards Technology Readiness Level 4. In addition, we expect the project to evaluate performance beyond accuracy and calibration, and to establish the method on various usability metrics based on transparency, traceability, accessibility, privacy, adaptability, etc. We expect the project to push the boundaries of translation-ready clinical predictive models and set standards in data and methods practices in healthcare informatics. The successful completion of the project will enable clinicians to make real-world decisions around life-saving surgeries and post-operative care.
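
Calibration, mentioned above alongside accuracy, asks whether predicted risks match observed event rates. The Python sketch below shows two common ways to check it, the Brier score and a binned reliability table. The function names, the number of bins, and the example risks are assumptions for illustration, not the project's evaluation protocol.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted risk and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def reliability_bins(probs, outcomes, n_bins=2):
    """Group predictions into equal-width risk bins.

    Returns (mean predicted risk, observed event rate, count) for each non-empty bin.
    A well-calibrated model has the first two values close to each other in every bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            table.append((mean_p, rate, len(members)))
    return table

# Made-up predicted complication risks and observed outcomes for four patients.
probs = [0.1, 0.3, 0.7, 0.9]
outcomes = [0, 0, 1, 1]

score = brier_score(probs, outcomes)
table = reliability_bins(probs, outcomes, n_bins=2)
print(score, table)
```

In this toy case the low-risk bin has a mean predicted risk of about 0.2 with no observed events, and the high-risk bin about 0.8 with events in every patient.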

Training and Development Outcomes for the Student

We expect the project to train the prospective student in cutting-edge AI tools and health informatics. The project requires developing machine learning models and deploying these models in clinical decision-making. It also involves understanding, pre-processing, and interpreting the clinical variables. The student will be based in the Data Science Unit (DSU) at the School of Informatics. The DSU hosts a diverse range of researchers working in various disciplines, including health, social science, chemistry, and geosciences, which gives the student diverse exposure. The student will also be based in the Surgical Informatics group, which hosts researchers with a range of clinical and health informatics expertise, allowing the student to learn from a different discipline besides informatics.

References

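[1] Shmatko, A., Jung, A.W., Gaurav, K. et al. Learning the natural history of human disease with generative transformers. Nature (2025). https://doi.org/10.1038/s41586-025-09529-3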


Dementia is a progressive syndrome affecting memory, cognition, and spatial navigation, reducing quality of life for people living with dementia (PlwD) and placing strain on carers and healthcare systems. This project aims to improve the quantification of dementia progression, as well as emotion recognition, using data collected in a virtual reality environment that engages PlwD in personalised navigation tasks. Leveraging self-supervised deep learning and explainable artificial intelligence, the student will identify interpretable navigational biomarkers of disease progression and emotional reactions. The findings hold promise for earlier detection, personalised interventions, and scalable, cost-effective support to improve outcomes for PlwD.

Supervisory team

Arno Onken and Vito De Feo

Project Partner

Bike Labyrinth 

Bike Labyrinth will provide biomedical relevance in the form of use cases for virtual environments to assist in improving spatial navigation and memory abilities. The company will also provide access to their development and production facilities for conveying product development needs, and coach the student during monthly meetings.

Project background

Dementia is experienced as an ongoing decline in brain functions, including reasoning, memory, spatial navigation, and keeping track of time. Consequently, People living with Dementia (PlwD) tend to have additional difficulties affecting their cognitive, mental and physical abilities, which not only impact their own quality of life but also pose challenges for their families and carers. For example, in its early stages, Alzheimer's Disease (AD) causes difficulties in dealing with new information. As AD progresses, memory loss affects sufferers’ ability to plan and carry out day-to-day tasks, and problems with spatial navigation make it more difficult for sufferers to reliably find their way back home from familiar places. Physical activity can help remedy some of this decline due to its benefits for brain health; however, taking part in physical activity can be particularly challenging for PlwD. These issues not only reduce the quality of life of PlwD but also lead to unnecessary hospitalisations and delays in hospital discharge, stressing the importance of effective pre-hospitalisation preventive and supportive solutions.

Project aims

The aim of this project is to improve the quantification of dementia progression and emotion recognition using data collected from a virtual reality (VR) environment. To this end, the student will leverage the latest developments in self-supervised deep learning and explainable artificial intelligence to find navigational features that best characterise disease progression and emotional reactions.

Data and Methodology

Dementia is a multi-dimensional syndrome that originates in the brain, affecting different functionalities, such as memory, cognition, spatial and temporal orientation, and emotional regulation. People living with dementia often have multiple comorbidities affecting their physical functioning as well as their overall quality of life and social health.

The external industry partner Bike Labyrinth is developing a VR-enhanced exercise bike, an easy-to-use, engaging and safe training environment. PlwD can train their spatial navigation and memory abilities by moving in a 3D virtual environment simulating their local city. This device does not require permanent involvement of specialist personnel and can be used to evaluate the performance of subjects without the need to create separate physical and cognitive measures. It can be personalised to any specific city and the individual needs of subjects, allowing person-centred training and real-time adaptation. While the bike is still at the prototype stage, the VR environment can already be explored using a simulator.

AI systems have been used to assess the state and predict the progression of cognitive decline (Jiang et al., 2020). Using the VR environment allows us to assess navigational and memory performance of subjects. There are suitable features that have already been used in early diagnosis of AD, namely average steps and path-efficiency (Jiang et al., 2020). We will build on these insights and enhance them using data-driven modelling. We will use self-supervised deep learning techniques to model subject navigation in the virtual environments and use explainable AI techniques such as LIME and SHAP (Hassija et al., 2024, Vimbi et al., 2024) to find interpretable features that characterise progression of dementia. This will allow us to accurately and automatically quantify changes in the performance of the subject as related to pathogenesis of the disease.
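
To make the navigational features mentioned above concrete, the Python sketch below computes a simple path-efficiency score from a recorded trajectory. The definition used here (straight-line distance from start to goal divided by distance actually travelled) is an assumption for illustration and may differ from the one in Jiang et al. (2020); the coordinates are made up.

```python
import math

def path_length(points):
    """Total distance travelled along a sequence of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def path_efficiency(points):
    """Straight-line start-to-goal distance divided by distance travelled.

    1.0 means a perfectly direct route; values near 0 mean a lot of wandering.
    """
    travelled = path_length(points)
    if travelled == 0:
        return float("nan")  # the subject never moved
    return math.dist(points[0], points[-1]) / travelled

# Hypothetical trajectory sampled from the VR environment (arbitrary units).
trajectory = [(0, 0), (3, 0), (3, 4)]

print(path_length(trajectory))      # 7.0
print(path_efficiency(trajectory))  # 5/7, roughly 0.714
```

Hand-crafted scores like this could serve as a baseline against which features learned by self-supervised models are compared.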

Translational Potential and Expected Impact

There are ~50 million people living with dementia worldwide and this is predicted to rise to 152 million by 2050. In the UK, 700,000 family carers look after the 850,000 people living with dementia, a number expected to rise to 1.6 million by 2040. Current UK costs of dementia for older people are £34.7 billion a year, including healthcare (£4.9 billion), social care (£15.7 billion) and unpaid care (£13.9 billion). Total UK dementia care costs are projected to increase to £94.1 billion by 2040. Better quantification of disease progression holds promise to improve quality of life of PlwD.

References

Jiang, J., Zhai, G., & Jiang, Z. (2020, June). Modeling the self-navigation behavior of patients with Alzheimer’s disease in virtual reality. In International Conference on VR/AR and 3D Displays (pp. 121-136). Singapore: Springer Singapore.

Hassija, V., Chamola, V., Mahapatra, A. et al. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn Comput 16, 45–74 (2024). https://doi.org/10.1007/s12559-023-10179-8 

Vimbi, V., Shaffi, N. & Mahmud, M. Interpreting artificial intelligence models: a systematic review on the application of LIME and SHAP in Alzheimer’s disease detection. Brain Inf. 11, 10 (2024). https://doi.org/10.1186/s40708-024-00222-1 


This project will develop an AI-based method and tool for segmenting iron and calcium accumulation throughout the whole brain and in its different forms (tissue deposition, brain microbleeds, superficial siderosis, and haemorrhagic transformations from ischaemic lesions) in a large sample of MRI images acquired from different patient groups. It will assess the degree of mineral accumulation in the segmented areas, offering a proxy for insoluble iron/calcium concentration and degree of aggregation (i.e., clustering) in different subregions (also using AI methods). Finally, it will validate the AI-based computational imaging assessments using complementary biomedical analysis methods in a sample of individuals with brain MRI, retinal images, and tissue samples.

Supervisory team

Maria Valdés Hernández, Blanca Diaz-Castro and Miguel O. Bernabeu Llinares

Project Partners

Pharmatics Ltd

Project Background

Iron is involved in oxygen transport and is essential for maintaining healthy body function. But an excess of it can lead to oxidative stress damage to biomolecules, as well as cellular dysfunction. This process becomes apparent with increasing age, as iron accumulates in the brain and increases the risk of neurodegenerative diseases. Overall, it is the strongest factor influencing cognitive decline in normal ageing. Although this process mainly occurs gradually and silently, it can be detected using magnetic resonance imaging (MRI) even in the preclinical stages, when minor cognitive concerns are starting to occur and before any other clinical symptom appears. In normal ageing this toxic iron accumulation mainly occurs in the globus pallidus, a subregion at the centre of the brain. In individuals with neurodegenerative diseases it has different spatial distribution patterns. We previously developed an automatic method to identify and segment these areas in MRI scans from normal ageing and validated it with a physical phantom. But we could not establish the degree of mineral accumulation in the segmented areas, which matters most for predictive medicine. Moreover, our method was limited to a small brain region, given the computational power available at the time.

Project Aims

This project will develop an AI-based method and tool for segmenting iron and calcium accumulation throughout the whole brain and in its different forms (tissue deposition, brain microbleeds, superficial siderosis, and haemorrhagic transformations from ischaemic lesions) in a large sample of MRI images acquired from different patient groups. It will assess the degree of mineral accumulation in the segmented areas, offering a proxy for insoluble iron/calcium concentration and degree of aggregation (i.e., clustering) in different subregions (also using AI methods). Finally, it will validate the AI-based computational imaging assessments using complementary biomedical analysis methods in a sample of individuals with both brain MRI and tissue samples.

Data and Methodology

The student will use well-phenotyped data with carefully generated ground truth from studies conducted at the Centre for Clinical Brain Sciences (1,2) to develop the iron deposition assessment method, which will output differential probabilistic masks of the various forms of iron deposits throughout the whole brain. The breadth of data available for the project includes routine clinical MRI, vascular function and blood-brain-barrier permeability measurements, and clinical, demographic, and cognitive information from each of the studies’ participants (approximately 1200). The tissue samples from which iron concentration curves will be derived come from ~20 brains from the study on cognitive ageing, which also has brain MRI acquired in five assessment waves, three years apart (2). The tissue samples were imaged at 7T MRI and the co-supervisor has aligned both modalities (3). The co-supervisor of the project has experience in proteomics analyses in relation to small vessel disease, used to discern which imaging phenotypes involve different forms of iron deposition.
Therefore, preliminary data held by the co-supervisor of the project may be useful in further validating the developed method. More tissue-MRI sample pairs have also been acquired from tissue banks, reaching a total of 80 samples (3).
Once the AI assessment method is validated using the in-house data from different studies, MRI data from online repositories will be downloaded to test and re-train the AI model for increased robustness and reduced bias. Finally, given the strong association between these deposits and dementia progression, we will upload the model to the National Safe Haven and apply it to the National Scottish Registry MRI data to estimate how much it improves dementia prediction accuracy over currently available methods.

(1) Clancy et al 2021 https://doi.org/10.1177/2396987320929617 
(2) Taylor et al 2018 https://doi.org/10.1093/ije/dyy022 
(3) Humphreys et al 2019 https://doi.org/10.1177/1747493018799962 
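
As a point of reference for the probabilistic masks described in this section, the Python sketch below thresholds a probability map and compares it with a ground-truth mask using the Dice coefficient, a widely used overlap measure for segmentation. The choice of Dice, the 0.5 threshold, and the tiny 2x2 example are assumptions for illustration; the project does not state which overlap measures will be used.

```python
def binarise(prob_mask, threshold=0.5):
    """Turn a 2D map of deposit probabilities into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_mask]

def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks of the same shape (1.0 = identical)."""
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    intersection = sum(x * y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2 * intersection / total

# Made-up 2x2 example: model probabilities and a manually drawn ground truth.
prob_mask = [[0.9, 0.2],
             [0.6, 0.1]]
ground_truth = [[1, 0],
                [0, 0]]

predicted = binarise(prob_mask)
print(predicted)                      # [[1, 0], [1, 0]]
print(dice(predicted, ground_truth))  # 2/3, roughly 0.667
```

Overlap scores of this kind only measure agreement with the reference segmentation; they say nothing about mineral concentration, which is why the project also validates against tissue samples.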

Translational Potential and Expected Impact

This project offers a rare opportunity to work on a clinically relevant theme, address a clinical need, and work with a breadth of data of different modalities and nature. It goes beyond the conventional use of computational descriptors for validating the AI-based method, using clinically relevant data to ensure its impact and further applicability in clinical research and practice. The project might enhance existing MRI instruments and methods by integrating Artificial Intelligence and has substantial scientific and commercial potential.

Training and Development Outcomes for the Student

The student will be trained in identifying mineral deposition on MRI and in the current knowledge around it, and will be exposed to real-world clinical and research neuroimaging work, as well as the laboratory (biological, proteomics) work underpinning the clinical neuroimages. The student will also be trained in medical image processing methods and exposed to commercial (industry), translational, and research environments with an emphasis on fair AI. By the end of the PhD, the student is expected to have acquired a high level of knowledge of the theme and to have developed a prototype that could be commercially viable as an add-on module for clinical and research MRI platforms.

References

(1) Ji Y, Zheng K, Li S, et al. Insight into the potential role of ferroptosis in neurodegenerative diseases. Front Cell Neurosci. 2022 Oct 27;16:1005182. https://doi.org/10.3389/fncel.2022.1005182 
(2) Valdés Hernández M, Allerhand M, Glatz A, et al. Do white matter hyperintensities mediate the association between brain iron deposition and cognitive abilities in older people? Eur J Neurol. 2016 Jul;23(7):1202-9. https://doi.org/10.1111/ene.13006
(3) Valdés Hernández, M., Ritchie, S., Glatz, A. et al. Brain iron deposits and lifespan cognitive ability. AGE 37, 100 (2015). https://doi.org/10.1007/s11357-015-9837-2 
(4) Valdés Hernández Mdel C, Glatz A, Kiker AJ, et al. Differentiation of calcified regions and iron deposits in the ageing brain on conventional structural MR images. J Magn Reson Imaging. 2014 Aug;40(2):324-33. https://doi.org/10.1002/jmri.24348 
(5) Glatz A, Bastin ME, Kiker AJ, Deary IJ, Wardlaw JM, Valdés Hernández MC. Automated segmentation of multifocal basal ganglia T2*-weighted MRI hypointensities. Neuroimage. 2015 Jan 15;105:332-46. https://doi.org/10.1016/j.neuroimage.2014.10.001 
(6) Clancy U, Garcia DJ, Stringer MS, Thrippleton MJ, Valdés-Hernández MC, Wiseman S, Hamilton OK, Chappell FM, Brown R, Blair GW, Hewins W, Sleight E, Ballerini L, Bastin ME, Maniega SM, MacGillivray T, Hetherington K, Hamid C, Arteaga C, Morgan AG, Manning C, Backhouse E, Hamilton I, Job D, Marshall I, Doubal FN, Wardlaw JM. Rationale and design of a longitudinal study of cerebral small vessel diseases, clinical and imaging outcomes in patients presenting with mild ischaemic stroke: Mild Stroke Study 3. Eur Stroke J. 2021 Mar;6(1):81-88. https://doi.org/10.1177/2396987320929617 
(7) Taylor AM, Pattie A, Deary IJ. Cohort Profile Update: The Lothian Birth Cohorts of 1921 and 1936. Int J Epidemiol. 2018 Aug 1;47(4):1042-1042r. https://doi.org/10.1093/ije/dyy022 
(8) Humphreys CA, Jansen MA, Muñoz Maniega S, González-Castro V, Pernet C, Deary IJ, Al-Shahi Salman R, Wardlaw JM, Smith C. A protocol for precise comparisons of small vessel disease lesions between ex vivo magnetic resonance imaging and histopathology. Int J Stroke. 2019 Apr;14(3):310-320. https://doi.org/10.1177/1747493018799962 



This PhD project addresses the complexity of designing therapeutic protein binders by developing a novel, interpretable abstract representation of proteins informed by Molecular Dynamics (MD) simulations. Current inverse design methods are challenged by the vast sequence-structure space and computational cost. Our primary aim is to create an abstract framework that significantly streamlines the design process, allowing for fast exploration of the protein sequence space to achieve target properties, including specific binding, stability, and desired immunogenicity. This data-driven abstraction, grounded in molecular dynamics, will capture essential features for accurate and efficient design. The approach will be validated using the therapeutically relevant PD-1 system with AstraZeneca. The aim is to overcome limitations in current design methodologies, paving the way for innovations in targeted drug delivery and biosensing.

Supervisory team

Kartic Subr and Chris Wood 

Project Partners

TBC

Project Background

The design of proteins using inverse methods plays a pivotal role in developing therapeutic binders—proteins engineered to attach to specific drug targets with high affinity. This PhD project aims to enhance binder design by integrating molecular dynamics (MD) simulations with simplified, interpretable representations of proteins to design highly specific binders. Protein folding, dictated by amino acid sequences, presents significant challenges due to its complexity, especially when designing proteins for specific interactions.
To facilitate the design of large protein complexes with targeted binding capabilities, this research focuses on creating and validating simplified, interpretable representations of proteins informed by molecular dynamics simulations. These models will streamline the design process, making it feasible to design sequences that achieve desired binding properties at a fraction of the computational cost. The project will utilize computational approaches to optimize these sequences, ensuring that the resulting proteins exhibit the necessary stability, specificity, and immunogenicity profiles. Ultimately, this work intends to produce novel protein binders with significant applications in therapeutics, diagnostics, and synthetic biology. By advancing the methodologies for protein design, this project seeks to overcome limitations in creating efficient binders, paving the way for innovations in targeted drug delivery and biosensing technologies.

Project Aims

The primary aim for this project is to develop an abstract representation for proteins that will enable the design of specific therapeutic binders. The representation will be interpretable and explainable while also enabling fast exploration of the design space, based on target properties such as shape, dynamics, and functionality. By integrating molecular dynamics with data-driven abstraction, the framework will capture essential features required for accurate and efficient binder design. The approach will be tested using the PD-1 system, a well-known immune pathway with established therapeutic relevance in cancer immunotherapy.

Data and Methodology

The proposed methodology will focus on developing an abstract representation for protein binder design, using an anti-PD-1 binder as a model system targeting the PD-1 receptor involved in immune checkpoint pathways.

The research will collect a dataset of known PD-1 structures and existing binders, including peptide sequences, 3D structures, and binding affinities. The dataset will serve as the basis for training machine learning models. The approach involves developing simplified protein representations through techniques like autoencoders, which will distil essential structural features critical for effective binding. Molecular dynamics simulations will validate these models, testing their capacity to predict how novel sequences fold and interact with the PD-1 receptor. The simulations will assess the dynamics and potential energy landscapes of candidate peptides designed using this abstract representation.

The project will develop search algorithms based on genetic algorithms to explore the design space, identifying peptide sequences optimized for binding PD-1. This computational search will focus on sequences predicted to exhibit high specificity and stability together with suitable immunogenicity profiles.

For validation, the project will perform in silico experiments to screen these candidates, employing binding free energy calculations and docking simulations to estimate their affinity for the PD-1 receptor. Successful candidates will move to experimental validation, where peptides will be synthesized and tested at AstraZeneca, Cambridge (external partner) using Surface Plasmon Resonance and isothermal titration calorimetry, measuring real-world binding affinities and kinetics.
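
As a purely illustrative sketch of the genetic-algorithm search described above, the Python snippet below evolves fixed-length peptide strings through selection, single-point crossover and point mutation. The scoring function `toy_score`, its motif and all parameter values are hypothetical placeholders; in the project, the score would presumably come from the learned representation and binding predictions, which are not specified here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def toy_score(seq, motif="WYDG"):
    """Hypothetical fitness: number of positions matching a made-up repeating motif."""
    return sum(1 for i, aa in enumerate(seq) if aa == motif[i % len(motif)])


def mutate(seq, rate, rng):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)


def crossover(a, b, rng):
    """Single-point crossover between two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(score, length=12, pop_size=50, generations=40, rate=0.05, elite=5, seed=0):
    """Return the best sequence found and the best score seen at each generation."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection: keep the top half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children  # elitism: the best sequences survive unchanged
    pop.sort(key=score, reverse=True)
    history.append(score(pop[0]))
    return pop[0], history


best, history = evolve(toy_score)
print(best, history[-1])
```

Because the top `elite` sequences are copied unchanged into each new generation, the best score never decreases. A realistic objective would combine several predicted properties (for example specificity, stability and immunogenicity) rather than a single toy score.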

The feedback loop, incorporating these experimental findings, will continuously refine the model and search algorithms. By integrating computational insights with empirical data, this iterative process aims to enhance the precision and effectiveness of designing PD-1 binders, contributing to advancements in cancer immunotherapy.

Translational Potential and Expected Impact

The proposed methodology offers a general framework for designing therapeutic protein binders using abstract representations. By representing proteins through simple fragment-based abstraction, the method enables broader exploration of conformational space while retaining interpretability. The added interpretability allows for a greater understanding of the mechanisms underlying binding and enables steering the design towards desired pharmacological profiles. Using PD-1 as a motivating example, the framework will demonstrate how simplified, explainable models can accelerate the discovery of selective immune checkpoint binders. Ultimately, this approach aims to shorten the path from computational design to effective therapeutics, improving outcomes for patients.

Training and Development Outcomes for the Student

The student will gain expertise in computational modelling, specifically in protein design and molecular dynamics simulations, enhancing their technical skills. They will develop proficiency in machine learning techniques for data analysis and representation design, directly applicable to modern drug discovery.
Additionally, the student will learn to conduct interdisciplinary research, bridging computational methods with experimental validation. This includes collaborating with laboratories for in vitro testing, providing practical insights into experimental protocols.

The student will also cultivate strong problem-solving abilities and the capacity to translate scientific findings into real-world applications, particularly in drug design and therapy development. Effective communication skills will be enhanced through reporting and presenting research findings to diverse audiences. Furthermore, the student will gain a thorough understanding of ethical research practices, preparing them for responsible scientific leadership in their future career.

References

Castorina LV, Wood CW, Subr K. From Atoms to Fragments: A Coarse Representation for Functional and Efficient Protein Design. bioRxiv; 2025. DOI: 10.1101/2025.03.19.644162.

Tudor-Stefan Cotet, Igor Krawczuk, Filippo Stocco, Noelia Ferruz, Anthony Gitter, Yoichi Kurumida, Lucas de Almeida Machado, Francesco Paesani, Cianna N. Calia, Chance A. Challacombe, Nikhil Haas, Ahmad Qamar, Bruno E. Correia, Martin Pacesa, Lennart Nickel, Kartic Subr, Leonardo V. Castorina, Maxwell J. Campbell, Constance Ferragu, Patrick Kidger, Logan Hallee, Christopher W. Wood, Michael J. Stam, Tadas Kluonis, Süleyman Mert Ünal, Elian Belot, Alexander Naka, Adaptyv Competition Organizers. Crowdsourced Protein Design: Lessons From the Adaptyv EGFR Binder Competition. bioRxiv; 2025. DOI: 10.1101/2025.04.17.648362.

North B, Lehmann A, Dunbrack RL Jr. A new clustering of antibody CDR loop conformations. J Mol Biol. 2011 Feb 18;406(2):228-56. doi: 10.1016/j.jmb.2010.10.030. Epub 2010 Oct 28. PMID: 21035459; PMCID: PMC3065967.


Accurate tracking of symptoms and progression of multiple sclerosis is essential for drug discovery and disease management. However, current measurement tools are tedious, prone to bias, and do not reflect what people with MS experience. Many symptoms remain invisible and unrecognised. Of these, fatigue is the most debilitating and reported by most patients. Yet no methodology to measure and tackle fatigue exists.

Supervisory team

Paul Patras and Thanasis Tsanas

Project Partners

Hoffmann-La Roche

Project Background

Multiple Sclerosis (MS) is a neurodegenerative autoimmune disease that affects approximately 3 million people worldwide. The disease primarily affects the central nervous system, with the immune system attacking the myelin sheath around nerve cells. The symptoms of MS vary broadly and can have debilitating effects. They include loss of vision, numbness, mobility problems and cognitive decline, among others, and they increase as the disease progresses. Recent studies report that the annual cost of MS-related disability exceeds per capita gross domestic product (GDP), which confirms the major societal cost of this condition.

Project Aims

1) Develop new fine-grained, data-driven fatigue monitoring methods that build upon detailed telemetry gathered using wearable devices and personal living space sensors (motion, pressure, LiDAR, etc.).
2) Label the data using correlation analysis with image biomarkers, fluid biomarkers, and patient-reported outcomes.
3) Establish baselines with volunteer patients using medical-grade wearable devices.
4) Develop deep learning models that can analyse multi-modal spatio-temporal data to detect early disease-specific symptoms, health improvements, or decline.
5) Develop lightweight compact/approximate data structures and deep learning models that can be deployed on computationally constrained devices.

Translational Potential and Expected Impact

Ultimately the outcomes of the project seek to improve fatigue management and potentially the efficacy assessment of new drugs. Long-term, the methods developed may assist consultant neurologists in exploring personalised treatment and improve long-term patient outcomes.


Online gaming has become increasingly prevalent, yet research into the effects and impact of long-term gaming on mental health is limited and often lacks an interdisciplinary focus. This project, in collaboration with HealthyGaming, aims to develop a data analysis pipeline that focuses on individual traits and states, game-related decision making, and mental health outcomes. Employing techniques from data science (including machine learning) and neurocognitive sciences (encompassing questionnaires, computational modelling, biometrics, and neuroimaging) we aim to understand what determines the mental health outcomes in gaming and gamification settings. This could lead to proposed interventions with the objective of improving those outcomes.

Supervisory team

Gedi Luksys and Robin Hill

Project Partners

HealthyGaming

Project Background

Due to their increasing popularity, online platforms that act as information gateways across domains such as news, social media and gaming have been gaining prominence in research, helping to better understand decision making patterns, unravel their neurocognitive mechanisms, and determine impact on mental health. Gaming-related decision making takes place at many levels: from a decision to initiate playing a game (and causes as well as triggers of that) to further decisions to continuously invest time, effort, and sometimes money into the play, to decisions within the games such as team interaction and building (as many games are team-based) and many game-specific decisions that have impact on competitive outcomes. Similar dynamics occur on news and social media platforms that employ gamification in a substantial way.

In order to understand how such decision making links to mental health, computational psychiatry focuses on building models of decision making, fitting them to the observed behaviours and linking parameters and variables of such models to biometric markers (e.g. emotional expressions), neuroimaging markers (e.g. brain activity patterns) and responses from standard personality and mental health questionnaires. Such an approach, if effective, can predict neuropsychiatric conditions in a more cost-efficient way than standard clinical assessments.

Project Aims

We aim to understand how individual traits and states can drive gaming-related decisions, what impact competitive feedback has in driving continued involvement, and how all these actions can lead to beneficial or detrimental mental health outcomes for individuals in the medium and long term. In addition, we aim to understand the neurocognitive mechanisms underlying such decisions and what is referred to as “suspension of disbelief”. Finally, we want to find strategies to improve mental health outcomes, which could include advising individuals on better paths for them as well as communities on more sustainable recruitment and engagement strategies.

Data and Methodology

Building on the supervisors’ experience with news-related decision making and human information processing (through the development of MyNewsScan news aggregator platform, mynewsscan.eu), computational modelling and cognitive science, and in conjunction with the industrial partner’s (HealthyGaming) experience with health-related gaming, this project will investigate the impact of gaming and gamification on mental health. In particular, it will focus on incentivisation and decision making (both in-game and in digital adjacent gaming environments).

Our research will involve case studies of gaming that may include both amateur and professional gamers, in single and multiplayer games, as well as gamification in non-gaming platforms, such as MyNewsScan. In cooperation with HealthyGaming, the student will analyse core game and gamification aspects and dynamics, as well as the structure of selected gaming communities. We will also have gaming and gamification metadata (covering the usage and access of the games) as well as in some cases in-game data which we could analyse using data science approaches, including machine learning.

In coordination with mental health professionals and our gaming partners (including but not limited to various esports events around the world), we will use standard personality and mental health questionnaires combined with gaming-related questionnaires. We will explore the roles of modulators such as stress, sleep, and motivation in decision making. We will also develop computational psychiatry models (such as reinforcement learning, motivation and drift diffusion models) that could provide insights into key parameters underlying game and gamification-related decision making, and will aim to validate them using mental health datasets. Finally, we will recruit a sample of gamers in Edinburgh whose decision making could be studied in more depth in the lab using neurocognitive techniques such as collection of biometrics (e.g. eye tracking, heart rate, pupil dilation, skin conductance and emotional expressions) and neuroimaging (particularly EEG) data.
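
To make the drift diffusion model mentioned above more concrete, the following minimal Python sketch simulates two-choice decisions as noisy evidence accumulation between two symmetric bounds. The parameter names and values (`drift`, `boundary`, `non_decision`, `noise`) are illustrative assumptions rather than settings from the project, and the simulation uses a simple Euler discretisation.

```python
import math
import random


def simulate_ddm_trial(drift, boundary, non_decision=0.3, noise=1.0,
                       dt=0.001, max_time=5.0, rng=random):
    """Simulate one trial of a drift diffusion model.

    Evidence starts at 0 and accumulates until it crosses +boundary (choice 1)
    or -boundary (choice 0). Returns (choice, reaction_time in seconds), or
    (None, None) if no bound is reached within max_time.
    """
    evidence, t = 0.0, 0.0
    step_sd = noise * math.sqrt(dt)  # standard deviation of the noise per time step
    while t < max_time:
        evidence += drift * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
        if evidence >= boundary:
            return 1, t + non_decision
        if evidence <= -boundary:
            return 0, t + non_decision
    return None, None


def simulate_ddm(n_trials, drift, boundary, seed=0, **kwargs):
    """Simulate several independent trials with a reproducible random stream."""
    rng = random.Random(seed)
    return [simulate_ddm_trial(drift, boundary, rng=rng, **kwargs) for _ in range(n_trials)]


trials = simulate_ddm(200, drift=2.0, boundary=1.0)
decided = [(choice, rt) for choice, rt in trials if choice is not None]
upper_rate = sum(choice for choice, _ in decided) / len(decided)
mean_rt = sum(rt for _, rt in decided) / len(decided)
print(f"upper-bound choices: {upper_rate:.2f}, mean RT: {mean_rt:.2f} s")
```

Fitting such a model to observed choices and reaction times yields parameters (drift rate, boundary separation, non-decision time) that can then be related to questionnaire scores or biometric markers, which is the kind of linkage the paragraph above describes.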

Translational Potential and Expected Impact

Overall, our research effort will help to identify games that can best be used for psychotherapeutic purposes, thereby improving the well-being of millions of gamers around the world. Through our computational psychiatry and cognitive neuroscience efforts, we also aim to develop effective methodologies for using gaming and gamification-related data to predict mental health patterns and outcomes, which could then lead to a set of proposed interventions with the objective of improving those outcomes. Eventually our work may help improve the management of gaming addiction as well as gambling disorders, device and social media addictions, and AI companionship dependency.

Training and Development Outcomes for the Student

We expect that a successful PhD student will develop a data analysis pipeline, making use of existing data and data collection opportunities and collaborations, synthesising different types of data (such as questionnaires, game and gamification metadata, in-game data) and employing effective data analysis tools (machine learning or computational modelling) to gain insights. The student will also be involved in neurocognitive experiments that will aim to link our behavioural data analysis pipeline to neural substrates and mental health patterns. Due to the methodologically diverse and highly collaborative nature of the project, it will provide numerous technical development, entrepreneurial training and networking opportunities.

References

Huckvale et al., “Toward clinical digital phenotyping: a timely opportunity to consider purpose, quality, and safety”, npj Digital Medicine 2019
https://www.nature.com/articles/s41746-019-0166-1 

Aeberhard et al., “Introducing COSMOS: a Web Platform for Multimodal Game-Based Psychological Assessment Geared Towards Open Science Practice”, Journal of Technology in Behavioural Science 2019
https://link.springer.com/article/10.1007/s41347-018-0071-5 

Paquin et al., “Trajectories of Adolescent Media Use and Their Associations With Psychotic Experiences”, JAMA Psychiatry 2024
https://jamanetwork.com/journals/jamapsychiatry/fullarticle/2817594 

Vosoughi et al., “The spread of true and false news online”, Science 2018
https://science.sciencemag.org/content/359/6380/1146.full 

Kramer et al., “Experimental evidence of massive-scale emotional contagion through social networks”, PNAS 2014
https://www.pnas.org/content/111/24/8788 

Strasser et al. “Glutamine-to-glutamate ratio in the nucleus accumbens predicts effort-based motivated performance in humans”, Neuropsychopharmacology 2020
https://www.nature.com/articles/s41386-020-0760-6 

Luksys et al., “Stress, genotype and norepinephrine in the prediction of mouse behavior using reinforcement learning”, Nature Neuroscience 2009
https://www.nature.com/articles/nn.2374 


Preclinical drug research and development relies heavily on animal models, particularly for central nervous system (CNS) indications. However, classic behavioural assays often fail to translate to the clinic. Home Cage Analysis systems have been developed to provide improved welfare (refinement) by replacing stressful isolated tests with continuous measurements. Supported by recent evidence, the key hypothesis behind this research is that longitudinal measures such as those collected in home cage monitoring align more closely with clinical outcomes. This project will explore, develop and evaluate computer vision and machine learning methods to extract robust behavioural biomarkers from large-scale longitudinal rodent home-cage datasets.

Supervisory team

Douglas Armstrong and Michael Camilleri

Project Partners

Actual Analytics

Project Background

Drug research and development relies heavily on the use of laboratory animal models at key stages in the pipeline to test for efficacy and safety. Both of these attempt to translate the physiology, including behaviour, of the animal model to the clinical condition in humans. Successful translation across species is critically dependent on identifying relevant biomarkers. In some cases molecular biomarkers can be identified, and these have led to significant advances in the replacement of animals with innovative in vitro cellular and/or molecular assays. However, for CNS indications (diseases) and CNS safety indications, translation still relies heavily on behaviour.

Traditional behavioural observational assays produce snapshot data in carefully controlled stimulus-response scenarios [1]. Such data do not take into account the entire richness of the animals’ behaviour [2], are often influenced by the presence of researchers and do not translate well to behaviour “in the wild”. For this reason, there has been a recent shift towards long-term analysis of animals in their home-cage [3]. For example, in a recent safety pharmacology study, the continuous behavioural measures extracted from home cage data correlated much more closely with adverse clinical outcomes than the traditional tests [4].

Project Aims

- Explore, develop and evaluate methods to extract behavioural biomarkers from longitudinal rodent home-cage data. These could be single behaviours, e.g. grooming or seizure, or, more likely, complex interactions of multiple behaviours.
- Correlate behavioural biomarkers with clinical outcomes using a mix of historical end points and ongoing research with collaborators.
- Test the hypothesis that continuous home-cage-derived behavioural biomarkers have improved translational accuracy in pharmaceutical R&D.

Data and Methodology

We do not perform laboratory studies ourselves; rather, we collaborate with end-users. We have a wide range of data in hand, from datasets collected at academic research institutions through to those from groups in the pharmaceutical industry. We have permission to use these data for research and development, the active engagement of these end users, and their agreement in principle to publish any new findings jointly. Essentially, we have an excess of data in place, freedom to operate and favourable agreements to co-publish. Additional datasets are also now in the public domain with appropriate public licenses.

In addition, we have well-established collaborations with a number of pharmaceutical research companies (e.g. [4]) as well as academic users (e.g. [5]) where we can get access to new datasets and validate progress ‘in the wild’. For all of these collaborations we have general agreements in principle for student/research access and co-publishing of research findings. For very specific data access we may need to update agreements, but this poses no significant risk to the project.

During the project we will explore methods at the intersection of computer vision and machine learning to assess which are best suited to extracting behavioural biomarkers. This is a rapidly moving field, but we will build on existing algorithms/methods that we have developed for identifying animals [6] and classifying high-level behaviours of group-housed mice [7].

Translational Potential and Expected Impact

The project and its underpinning hypothesis are fundamentally translational. If successful, we will identify new preclinical biomarkers that correlate better with clinical outcomes than the current traditional measures [8]. While there is room to explore, a possible example would be to define and validate a CNS safety liability biomarker from continuous home cage data that is more accurate than the classical functional observational battery.

Training and Development Outcomes for the Student

- The role and application of preclinical research in laboratory animal models.
- The application of new methodologies to promote the 3Rs, in particular Refinement.
- Computer vision methods for animal identification and tracking.
- Applied AI/ML methodologies for behaviour analytics.
- Development of new methodologies and approaches for the definition of digital biomarkers for translational drug research and development.

References

[1] P. Van Meer and J. Raber, “Mouse behavioural analysis in systems biology.” The Biochemical Journal, vol. 389, no. Pt 3, pp. 593–610, 2005.
[2] A. Gomez-Marin and A. A. Ghazanfar, “The Life of Behavior,” Neuron, vol. 104, no. 1, pp. 25–36, 2019.
[3] S. D. M. Brown and M. W. Moore, “The International Mouse Phenotyping Consortium: past and future perspectives on mouse phenotyping.” Mammalian Genome: official journal of the International Mammalian Genome Society, vol. 23, no. 9-10, pp. 632–640, 2012.
[4] Sillito et al. Rodent home cage monitoring for preclinical safety pharmacology assessment: results of a multi-company validation evaluating nonclinical and clinical data from three compounds. Frontiers in Toxicology, in press.
[5] Bains et al. Analysis of Individual Mouse Activity in Group Housed Animals of Different Inbred Strains Using a Novel Automated Home Cage Analysis System. Front. Behav. Neurosci., 10 June 2016 https://doi.org/10.3389/fnbeh.2016.00106
[6] Camilleri, M.P.J., Zhang, L., Bains, R.S. et al. Persistent animal identification leveraging non-visual markers. Machine Vision and Applications 34, 68 (2023).
[7] Camilleri, M.P.J., Bains, R.S. & Williams, C.K.I. Of Mice and Mates: Automated Classification and Modelling of Mouse Behaviour in Groups Using a Single Model Across Cages. Int J Comput Vis 132, 5491–5513 (2024).
[8] Baran et al. Emerging Role of Translational Digital Biomarkers Within Home Cage Monitoring Technologies in Preclinical Drug Discovery and Development. Front. Behav. Neurosci., 14 February 2022. https://doi.org/10.3389/fnbeh.2021.758274 


SENTINEL will integrate electronic health records, patient-reported outcomes, and wearable signals to predict inflammatory bowel disease outcomes, updating its predictions as new measurements are observed. The student will introduce multi-omics in collaboration with our industrial partner, Nightingale Health, to further improve predictions. Embedded within Edinburgh’s IBD service and the Lees and Vallejos data-science research groups, the work will deliver an interpretable pipeline ready to power SENTINEL’s proactive, EHR-adjacent decision support. The models will be trained on population-based local NHS data (Lothian IBD Registry fully integrated with DataLoch; n=10,000 IBD patients) and Danish national registries.

Supervisory team

Charlie Lees and Catalina Vallejos

Project Partners

Nightingale Health

Role of the external partner
- Provide finger-prick sampling kits and high-throughput NMR metabolomics/protein panels for use during SENTINEL onboarding.
- Advise on assay QC, feature engineering and integration into multi-modal models.
- Host a short placement for the student, offering industry experience with a focus on data standards and translational pipelines.
- Co-supervise on multi-omic integration.

Project Background

IBD is associated with unpredictable flares, unscheduled admissions, major surgeries and delays in therapy optimisation. In Lothian, >10,000 IBD patients are registered and two decades of longitudinal biomarker data (such as CRP and faecal calprotectin) are available, alongside rich phenotyping data, yet these data are siloed and rarely available at the point of clinical care. We have a world-leading IBD service, yet all-too-often IBD care is reactive and crisis-driven.

SENTINEL is our clinician-led response: an EHR-adjacent service that ingests live laboratory feeds to deliver risk predictions and prompts for nurse-led triage and treatment optimisation. A patient companion app will show both historical and recent trends in disease behaviour, whilst predicting future disease course, collecting patient-reported outcomes, and providing hyper-personalised information and signposting.

The analytics are being built by three full-time post-doctoral data scientists within the Lees IBD research team and the Vallejos biomedical data-science group, using a landmarking framework for dynamic risk prediction, with latent‑class mixed models to capture heterogeneous biomarker trajectories.

Our prior work shows that longitudinal faecal calprotectin/CRP profiles characterise disease-course heterogeneity and predict disease progression, outlining modern principles for dynamic IBD monitoring.

This PhD addresses a translational gap: integrating additional modalities, such as patient-reported outcomes, wearable streams and finger-prick multi-omics (Nightingale Health), to enhance prediction of flares, admissions and surgery, and to extend the modelling to non-IBD outcomes with health-system relevance.

Embedded in Edinburgh’s IBD service and the Lees and Vallejos groups, the student will deliver an interpretable pipeline to assist proactive clinical decision support.

Project Aims

Develop an interpretable, EHR-adjacent pipeline that integrates longitudinal laboratory test results (such as CRP and faecal calprotectin), prescribing, patient-reported outcomes, wearables and finger-prick multi-omics to generate dynamic risk predictions for flare, admission and surgery.

Quantify incremental value, calibration, fairness and utility of each modality and algorithmic choice, and externally validate across Lothian and Danish datasets.

Extend the pipeline to non-conventional IBD outcomes (e.g., cardiovascular events, venous thromboembolism, mental-health crises), and produce a decision-curve playbook mapping risk thresholds to nurse-led actions in SENTINEL for equitable, safe deployment.

Data and Methodology

Datasets. The project will use (i) the Lothian IBD Registry (LIBDR) with two decades of routine laboratory data (CRP and faecal calprotectin, among others), prescribing and outcomes, linked via DataLoch; (ii) patient-reported outcomes and wearables collected through the SENTINEL app (pilot phase begins January 2026); and (iii) external validation cohorts from Danish national registries curated within PREDICT.

Finger-prick multi-omics (Nightingale Health NMR metabolomics/proteins) will be layered as samples are processed under existing collaborations.

Core modelling. We will implement dynamic prediction via landmarking coupled to latent-class mixed models (LCMMs) to capture individual and subgroup trajectories in longitudinal biomarker measurements, then augment with additional clinical features and patient-reported outcomes. Our team has already built a generic landmarking framework as ready-to-use software that accommodates flexible modelling strategies, including the option to incorporate modern deep-learning-based approaches. However, it lacks functionality for multimodal data integration.
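
The data-construction step behind landmarking can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's software: the `Patient` record and the `landmark_rows` helper are hypothetical, and the biomarker history is summarised by its last observed value rather than by the LCMM trajectories the project will use. At each landmark time, only patients still at risk are kept, predictors use measurements up to that time only, and the outcome is whether the event occurs within the prediction horizon.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Patient:
    pid: str
    measurements: List[Tuple[float, float]]  # (time in years, biomarker value)
    follow_up: float                          # time of event or censoring, in years
    had_event: bool                           # True if follow_up marks an event


def landmark_rows(patients, landmark, horizon):
    """Build one row per patient still at risk at the landmark time."""
    rows = []
    for p in patients:
        if p.follow_up <= landmark:
            continue  # event or censoring already occurred: not at risk
        history = [(t, v) for t, v in p.measurements if t <= landmark]
        if not history:
            continue  # no biomarker information available yet
        last_value = max(history, key=lambda tv: tv[0])[1]  # last observation carried forward
        window_end = landmark + horizon
        rows.append({
            "pid": p.pid,
            "landmark": landmark,
            "last_value": last_value,
            # follow-up is truncated at the end of the prediction window
            "time": min(p.follow_up, window_end) - landmark,
            "event": p.had_event and p.follow_up <= window_end,
        })
    return rows


patients = [
    Patient("a", [(0.0, 5.0), (0.8, 9.0), (1.5, 20.0)], follow_up=1.8, had_event=True),
    Patient("b", [(0.2, 3.0)], follow_up=0.9, had_event=True),
    Patient("c", [(0.5, 4.0)], follow_up=3.0, had_event=False),
]
for row in landmark_rows(patients, landmark=1.0, horizon=1.0):
    print(row)
```

Repeating this for a grid of landmark times gives a stacked dataset on which a survival or classification model can be fitted. Note that patient "a" contributes only the measurements taken before the landmark, so the later value of 20.0 is never used as a predictor.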

End points. Primary: time-updated risk of flare, unplanned admission and IBD surgery within one year from the prediction time-point; Secondary: steroid exposure, quality of life decline and non-IBD outcomes (cardiovascular events, venous thromboembolism, mental-health crises).

Evaluation. We will use internal-external cross-validation with metrics of discrimination (time-dependent AUC/PR-AUC), calibration, and net benefit (decision curves). We will quantify the incremental value of each modality (e.g. change in time-dependent AUC or Brier score) and differential performance by age, sex, deprivation and ethnicity within an ML fairness framework.
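
Two of the metrics named above can be illustrated with a short Python sketch. It makes a simplifying assumption that is ours, not the project's: the outcome is treated as binary at a fixed horizon with no censoring, whereas the time-to-event setting described here would require censoring-aware versions. Net benefit at a risk threshold p_t is computed with the standard decision-curve formula, TP/n − (FP/n) · p_t/(1 − p_t). The example data are made up.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy for a decision curve: act on everyone regardless of risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up outcomes (1 = event within the horizon) and predicted risks
y_true = [1, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1]

print(brier_score(y_true, y_prob))
for threshold in (0.2, 0.5):
    print(threshold, net_benefit(y_true, y_prob, threshold),
          net_benefit_treat_all(y_true, threshold))
```

A decision curve plots net benefit against the threshold for the model, for the "treat all" strategy, and for "treat none" (net benefit of zero). The incremental value of a modality can be read as the change in these quantities when that modality is added to the model.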

Deployment-readiness. Models will be packaged as services with audit logging, uncertainty quantification, data-shift monitoring and fallback rules for missingness, so outputs can flow to the EHR-adjacent SENTINEL portal for nurse-led triage and clinical review. Generic code (e.g. to incorporate the fairness evaluation) will be added to our landmarking software as an open-source tool.

Translational Potential and Expected Impact

Outputs will be models and code that plug into SENTINEL’s EHR-adjacent portal to drive risk-prioritised lists and nurse-led triage. In Lothian, the service aims to increase 12-month flare-free status by ≥15 percentage points, cut IBD bed-days by ≥20%, and halve time to treatment optimisation, with projected net savings of ~£800 to £2000 per patient-year.

Because the pipeline relies on routine labs that are widely available in healthcare settings, it can also benefit those who rarely engage with the health system directly, while optional patient inputs such as patient-reported outcomes (e.g. IBD symptoms; depression and anxiety scores) and wearables can further improve its predictions.

External validation and the Nightingale Health partnership provide a path from PhD outputs to adoption.

Training and Development Outcomes for the Student

The student will gain skills in: statistical and ML methods for longitudinal and time-to-event data analysis (landmarking, LCMMs, joint models); multi-omic integration; predictive modelling and evaluation, and ML fairness; and post-deployment monitoring and model updating. Clinical immersion will occur within the Edinburgh IBD service and weekly meetings with the Lees and Vallejos groups (co-supervision/mentorship).

They will complete CDT training, present at IBD/data-science meetings, and pursue papers and software releases. A placement with Nightingale Health will provide exposure to high-throughput NMR workflows and interfaces between omics and care pathways, aligned to SENTINEL.

References

Plevris N, Lees CW. Disease Monitoring in IBD: Evolving Principles and Possibilities. Gastroenterology 2022. (Framework for integrative monitoring/targets.)

Constantine-Cooke N et al. Longitudinal Faecal Calprotectin Profiles Characterize Disease Course Heterogeneity in Crohn’s Disease. Clin Gastroenterol Hepatol 2023. (Dynamic trajectories underpin predictions.)

Constantine-Cooke N et al. Large-scale clustering of longitudinal FCP and CRP profiles in IBD. medRxiv 2025. (Joint FCP/CRP modelling.)

Ebert AC et al. IBD and risk of >1,500 comorbidities. Am J Gastroenterol 2025. (Motivates non-IBD outcomes.)

Hracs L et al. Global evolution of IBD across epidemiologic stages. Nature 2025. (Health-system context.)

Elford AT et al. Twenty-Year Trends in Colectomy and Advanced Therapy Prescribing in Lothian. AP&T 2025. (Local real-world trends.)


Recent advances in AI-based modelling of human behaviour enable a novel, flexible and potentially more quantitatively precise method for measuring human cognitive function, detecting cognitive decline, and improving assistive technologies. This PhD project combines mechanistic theories of human cognitive function with deep reinforcement learning to develop models capturing how cognitive decline manifests in observable human behaviour, enabling new types of digital biomarker. The project provides the student with cross-disciplinary experience across both cognitive neuroscience and machine learning, as well as direct opportunities for translation and impact.

Supervisory team

Subramanian Ramamoorthy, Susan Shenkin and Gustav Markkula

Project Partner

NHS Borders

Project Background

Research has shown that cognitive markers for conditions such as mild cognitive decline and Alzheimer’s disease can be obtained from computerised tests and “serious games”, or directly from sources such as in-home activity or smartphone data [1–3], but so far the design of these evaluation methods has been largely heuristic. If we had human behaviour models which accurately represent how cognitive decline affects observable behaviour in these various tasks, this could unlock a dramatic improvement in test specificity. In addition, such models would be beneficial in assistive technologies, such as adaptive interaction and dialogue-based prompting systems, to infer cognitive function directly from an unfolding interaction, and to guide the actions taken by a combination of the person and the assistive system.

Emerging results in cognitive science and machine learning have recently enabled an approach to modelling of human behaviour with high fidelity across a variety of task contexts, by combining mechanistic modelling of human perceptual, cognitive, and motor limitations with deep reinforcement learning [4–6]. This approach, with its emphasis on cognitive modelling of human limitations, and its overall flexibility, holds promise for modelling and measuring how human behaviour is affected by cognitive decline, and could thus be used to improve tools for detecting such decline and better supporting individuals experiencing it.

Project Aims

The overall aim of this PhD project will be to investigate the use of advanced human behaviour models as a potential marker for useful cognitive function (e.g. executive function and working memory). More specifically, human behaviour models will be developed and tested for a few selected tasks, to investigate to what extent the models can capture empirically observed effects of variations in cognitive function on human behaviour in these tasks. These models will then be used to estimate cognitive function directly from observed human behaviour in these tasks.

Translational Potential and Expected Impact

The developed models and methods have direct applicability for measuring and monitoring cognitive function. If the results are positive, steps toward translation and impact can be taken.

As mentioned, a second potential application is integration of the models in assistive technologies, where the models can be used both to infer user cognitive function, and to optimise adaptive interfaces for prompting and assistance. This is of timely interest to practitioners involved in dementia and other age-related conditions.

Training and Development Outcomes for the Student

This project will allow the student to build a rare cross-disciplinary skillset, across the state of the art in both cognitive neuroscience and machine learning.

The student will also benefit from experiencing research that is both scientifically cutting-edge and directly linked to a specific patient group, developing methods that can be of near-term value to these end users.

References

1. Ding, Z., Lee, T. & Chan, A. S. Digital Cognitive Biomarker for Mild Cognitive Impairments and Dementia: A Systematic Review. Journal of Clinical Medicine 11, 4191 (2022).

2. Chen, Y., Gerling, K., Verbert, K. & Vanden Abeele, V. Video Games and Gamification for Assessing Mild Cognitive Impairment: Scoping Review. JMIR Ment Health 12, e71304 (2025).

3. Park, J.-H. Discriminant Power of Smartphone-Derived Keystroke Dynamics for Mild Cognitive Impairment Compared to a Neuropsychological Screening Test: Cross-Sectional Study. J Med Internet Res 26, e59247 (2024).

4. Gershman, S. J., Horvitz, E. J. & Tenenbaum, J. B. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science 349, 273–278 (2015).

5. Oulasvirta, A., Jokinen, J. P. P. & Howes, A. Computational Rationality as a Theory of Interaction. in Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems 1–14 (Association for Computing Machinery, New York, NY, USA, 2022). doi:10.1145/3491102.3517739.

6. Wang, Y., Srinivasan, A. R., Lee, Y. M. & Markkula, G. Modeling Pedestrian Crossing Behavior: A Reinforcement Learning Approach With Sensory Motor Constraints. IEEE Transactions on Intelligent Transportation Systems 1–12 (2025) doi:10.1109/TITS.2025.3581693.

7. O’Reilly, R. C. & Frank, M. J. Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia. Neural Computation 18, 283–328 (2006).

8. Hazy, T. E., Frank, M. J. & O’Reilly, R. C. Computational Neuroscientific Models of Working Memory. (2021).

9. Yoo, A. H. & Collins, A. G. E. How Working Memory and Reinforcement Learning Are Intertwined: A Cognitive, Neural, and Computational Perspective. Journal of Cognitive Neuroscience 34, 551–568 (2022).

10. Ahmed, S., Lytton, W. & Crystal, H. Computational Models of Age-associated Cognitive Slowing and Memory Loss (P6-9.010). Neurology 102, 4069 (2024).


AI4BI-SIDB Projects

Our CDT has partnered with the Simons Initiative for the Developing Brain (SIDB) to offer additional PhD studentships focused on understanding the neurological basis of, and testing new therapies for, monogenic forms of autism and intellectual disability, such as Fragile X Syndrome (FXS), SYNGAP1 haploinsufficiency and CDKL5 deficiency disorder (CDD). Check out available projects below.

Find out more about SIDB

Supervisory team

Raven Hickson, Peter Kind, Marino Pagan, Angus Chadwick 

Project Background

The richness and flexibility of the rat behavioural repertoire make rats well suited as models of the cognitive and social aspects of neurodevelopmental disorders (NDDs). Standard laboratory housing drastically reduces opportunities for rats to express natural behaviours, and therefore vastly diminishes the behavioural repertoire available to study [1]. The Habitat was designed to provide an environment that more closely aligns with the ecology of the Norway rat, to address this mismatch and to provide opportunities to observe the development of adaptive behaviours in their functional environment. Housing in the Habitat results in observable effects on the transcriptome, and living in the Habitat has different behavioural effects in two different models as compared with living in standard housing. Habitat housing also appears to alter behaviour at a micromovement scale, as captured by RatSeq (unpublished data; method described in [2]). However, little is known about which aspects of the Habitat experience may contribute to these effects on behaviour. The Habitat allows the capture of multiple modes of data (RFID tracking, video, audio, etc.) for characterising the animal’s behavioural repertoire, with the ultimate goal of predicting genotype. The overarching goal is to generate testable hypotheses about circuit-level differences between models of NDD and wild-types.

Project Aims

  1. Develop a data analysis pipeline that allows for integration of multimodal (RFID tracking, video, audio, etc.) individual and group-level data collected from the Habitat.
  2. Apply the latest AI and machine learning technologies to the problem of behaviour analysis at individual, dyadic and group levels, over both long and short timescales.
  3. Build behavioural models that allow individuals to be clustered based on behavioural patterns in the Habitat, and test whether these ‘behavioural profiles’ explain variability (inter- and/or intra-individual) in empirical behavioural tasks.
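
The clustering step in aim 3 can be pictured with the short Python sketch below. It is a minimal illustration only, not the project's pipeline: the two per-animal features (fraction of time spent in a burrow zone and a social contact rate, both rescaled to 0-1), the six example values, and the choice of k-means with two clusters are all assumptions invented for this sketch.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length feature tuples."""
    # Deterministic farthest-point initialisation, so the sketch is repeatable.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each animal to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Move each centroid to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical per-animal features: (time in burrow zone, social contact rate).
# These numbers are invented for illustration and are not Habitat data.
animals = [(0.10, 0.20), (0.15, 0.25), (0.12, 0.18),
           (0.80, 0.90), (0.85, 0.95), (0.78, 0.88)]
labels, centroids = kmeans(animals, k=2)
print(labels)
```

With real recordings the features would likely need standardising, and both the clustering method and the number of clusters would need to be justified against the data; the resulting labels could then be compared with performance in the empirical tasks.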

Training Outcomes

  1. Be able to critically examine and synthesize literature from multiple fields (e.g. ecology, neuroscience, psychology, information theory, machine learning) to develop novel approaches to experimental design, data collection, and analysis.
  2. Be able to apply knowledge of machine learning and/or AI technologies to behavioural data collection, integration, and analysis in multiple modalities.
  3. Be able to communicate complex data effectively to colleagues and third-party stakeholders across a range of disciplines to facilitate collaboration.

References

[1] Shemesh, Y., & Chen, A. (2023). A paradigm shift in translational psychiatry through rodent neuroethology. Molecular Psychiatry, 1–11. https://doi.org/10.1038/s41380-022-01913-z

[2] Wiltschko, A. B., Tsukahara, T., Zeine, A., Anyoha, R., Gillis, W. F., Markowitz, J. E., Peterson, R. E., Katon, J., Johnson, M. J., & Datta, S. R. (2020). Revealing the structure of pharmacobehavioral space through motion sequencing. Nature Neuroscience, 23(11), 1433–1443. https://doi.org/10.1038/s41593-020-00706-3


Supervisory team

Emma Wood, Matthias Hennig, Adrian Duszkiewicz

Project Background

Autism spectrum disorder and intellectual disability (ASD/ID) are comorbid conditions characterized by abnormalities in early cognitive development that persist into adulthood. In many cases, their symptoms are linked to de novo or inherited mutations in genes involved in neuronal function (Manoli and State, 2021). It is currently not well understood how such genetic changes are mechanistically related to the circuit dynamics, computations, and behavior that together constitute ASD phenotypes. In this project the head-direction (HD) system, a well-conserved network in mammals that computes an animal's heading direction, will be analysed to address this question.

In mammals, information about heading direction is maintained by a network of neurons across multiple brain regions known as the head-direction system (Laurens and Angelaki, 2018). Head-direction neurons combine two main sources of information: signals from the inner ear and body that track self-motion, and external cues such as visual landmarks that provide orienting reference points. Recent research indicates that the brain integrates these cues in a near-optimal way by giving more weight to the more reliable source at any moment. 

Recent work from our group shows that in a rat model of Fragile X syndrome, a neurodevelopmental disorder characterised by intellectual disability with a high prevalence of autism in humans, this balance is disrupted. In these animals, the head-direction system relies too heavily on external visual landmarks and too little on self-motion. This causes the internal representation of direction to become overly dominated by the visual landmarks, losing the normal partial adjustment that reflects balanced cue integration.
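
The reliability-weighted integration described above can be illustrated with a short Python sketch. It assumes each cue gives a heading estimate with von Mises (circular normal) noise, in which case the combined estimate is the direction of the sum of unit vectors weighted by each cue's concentration (its reliability). The function name, the cue directions and the concentration values are hypothetical and chosen only to show how over-weighting the landmark removes the partial adjustment; this is not the group's model or analysis.

```python
import math


def integrate_cues(visual_deg, self_motion_deg, kappa_visual, kappa_self_motion):
    """Combine two heading cues (degrees), weighting each by its concentration."""
    # Sum the two cue directions as unit vectors scaled by their reliability.
    x = (kappa_visual * math.cos(math.radians(visual_deg))
         + kappa_self_motion * math.cos(math.radians(self_motion_deg)))
    y = (kappa_visual * math.sin(math.radians(visual_deg))
         + kappa_self_motion * math.sin(math.radians(self_motion_deg)))
    # The direction of the summed vector is the combined heading estimate.
    return math.degrees(math.atan2(y, x)) % 360.0


# Hypothetical cue conflict: the landmark indicates 40 degrees, self-motion 0 degrees.
balanced = integrate_cues(40.0, 0.0, kappa_visual=1.0, kappa_self_motion=1.0)
visual_dominated = integrate_cues(40.0, 0.0, kappa_visual=9.0, kappa_self_motion=1.0)

# Equal reliabilities give a partial adjustment (halfway, about 20 degrees);
# a heavily over-weighted landmark pulls the estimate most of the way to 40 degrees.
print(balanced, visual_dominated)
```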

This PhD project will use computational modelling in combination with data from real neural recordings to understand why this imbalance arises. By combining concepts from neuroscience and artificial intelligence, the student will explore how changes in network connectivity or plasticity could explain the altered information weighting seen in the mutant animals.

Project Aims

  1. Analyse existing neural recordings from the head direction (HD) system in Fmr1 KO and control animals to determine neural and circuit-level differences.
  2. Build a computational model of the rat head-direction network, based on a ring attractor circuit that combines visual and self-motion cues, to replicate the experiments and to explore hypotheses that could explain the differences observed in Fmr1 animals.
  3. Use a data-driven approach to rule in and rule out hypotheses, and to generate testable predictions for new experiments.
  4. Use the modelling approach to analyse differences in the development of the HD system in Fmr1 animals. 
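
To make the idea of a ring attractor that combines visual and self-motion cues more concrete, the following Python sketch implements a deliberately simplified version. It is a toy under stated assumptions, not the model the project will build: activity is updated in discrete steps, recurrent weights are local excitation with a von Mises profile, competition is implemented by squaring followed by divisive normalisation, self-motion is modelled by offsetting the recurrent kernel, and the landmark is an additive input whose strength is set by a hypothetical `visual_gain` parameter. All names and parameter values are illustrative.

```python
import math

N = 36  # number of model HD cells, preferred directions every 10 degrees (assumed)
PREFERRED = [2.0 * math.pi * i / N for i in range(N)]
KAPPA = 8.0  # sharpness of recurrent weights and of the landmark input (assumed)


def bump(center_rad):
    """Von Mises-shaped profile over the ring, centred on center_rad, summing to 1."""
    raw = [math.exp(KAPPA * (math.cos(p - center_rad) - 1.0)) for p in PREFERRED]
    total = sum(raw)
    return [v / total for v in raw]


def step(rates, turn_rad=0.0, landmark_rad=None, visual_gain=0.0):
    """One update: recurrent drive plus landmark input, then squaring and normalisation."""
    cue = bump(landmark_rad) if landmark_rad is not None else [0.0] * N
    drive = []
    for i, p in enumerate(PREFERRED):
        # Offsetting the recurrent kernel by turn_rad rotates the bump (self-motion).
        weights = bump(p - turn_rad)
        recurrent = sum(w * r for w, r in zip(weights, rates))
        drive.append(recurrent + visual_gain * cue[i])
    # Squaring sharpens the bump; dividing by the total keeps activity bounded.
    squared = [d * d for d in drive]
    total = sum(squared)
    return [s / total for s in squared]


def decode(rates):
    """Population-vector estimate of the represented heading, in degrees."""
    x = sum(r * math.cos(p) for r, p in zip(rates, PREFERRED))
    y = sum(r * math.sin(p) for r, p in zip(rates, PREFERRED))
    return math.degrees(math.atan2(y, x)) % 360.0


def run(start_deg, steps, turn_deg=0.0, landmark_deg=None, visual_gain=0.0):
    """Start with a bump at start_deg, apply `steps` updates, return decoded heading."""
    rates = bump(math.radians(start_deg))
    landmark = None if landmark_deg is None else math.radians(landmark_deg)
    for _ in range(steps):
        rates = step(rates, math.radians(turn_deg), landmark, visual_gain)
    return decode(rates)


if __name__ == "__main__":
    print("memory without input:", run(90.0, 30))
    print("path integration:", run(90.0, 45, turn_deg=2.0))
    print("weak landmark input:", run(0.0, 5, landmark_deg=40.0, visual_gain=0.05))
    print("strong landmark input:", run(0.0, 5, landmark_deg=40.0, visual_gain=0.5))
```

In this toy, the bump persists without input, rotates with the self-motion signal, and is pulled further toward a conflicting landmark as `visual_gain` increases. A hypothesis about altered cue weighting could be expressed as a change in such a gain, or in the connectivity or plasticity that sets it, and then compared against recordings; the sketch makes no claim about which mechanism operates in Fmr1 KO animals.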

Training Outcomes

This position will offer comprehensive training at the intersection of basic and clinical/translational neuroscience, including exposure to related research across SIDB. Interdisciplinary working is at the core of the centre, and the student will learn to communicate research and findings to different audiences including experimental neuroscientists, clinicians, computational scientists, and non-specialist audiences such as patients and their carers. The student will gain skills in data-driven computational modelling in neuroscience, and in data analysis in neuroscience, including processing of neural/behavioral data. The project will promote reproducible and open science and will offer ample opportunities for training in this area.

References

Hulse, B. K., & Jayaraman, V. (2020). Mechanisms underlying the neural computation of head direction. Annual Review of Neuroscience, 43(1), 31-54.

Laurens, J., & Angelaki, D. E. (2018). The brain compass: a perspective on how self-motion updates the head direction cell attractor. Neuron, 97(2), 275-289.

Manoli, D. S., & State, M. W. (2021). Autism spectrum disorder genetics and the search for pathological mechanisms. American Journal of Psychiatry, 178(1), 30-38.

Redish, A. D., Elga, A. N., & Touretzky, D. S. (1996). A coupled attractor model of the rodent head direction system. Network: Computation in Neural Systems, 7(4), 671.


Supervisory team

Ann Clemens, Laura Sevilla-Lara, Michael Camilleri

Project partner

Wellbeing of Women

The external partner will provide feedback on the translational potential of the project as well as aid in public outreach.

Project Background

The African spiny mouse (Acomys dimidiatus) is the only rodent reported to menstruate [1], [2], making it a highly relevant model for understanding health conditions associated with female hormonal fluctuations. Menstruation in humans is linked to psychiatric conditions including pre-menstrual dysphoric disorder (PMDD), anxiety disorders and exacerbation of symptoms related to neurodevelopmental disorders (NDDs). An estimated 9 in 10 women and girls report mental health issues associated with menstruation [3], so these issues may affect an estimated 45% of the population.

Understanding natural behaviour and neurophysiology of Acomys provides a foundation on which to gain a comprehensive picture of the organismal biology of the menstruating rodent. Study of natural behaviour and neurophysiology in a large group-living, communal species requires sophisticated tracking tools to identify and follow individuals over time and analysis methods to extract meaningful state-dependent neurophysiology associated with natural behaviour.

Project Aims

The project will analyse behavioural recordings of group-living Acomys in their home cage. The location of individuals will be tracked, along with analyses of pose, to identify features of underlying behavioural and hormonal state. Extensive behavioural data have been collected for the project, much of which has undergone manual annotation and preliminary testing with tracking tools. Funding from the lab will maintain the Acomys colony throughout the PhD and support additional data collection.

Aim 1: Analyses will initially compare machine-learning-based tracking tools for behaviour (DeepLabCut [5], SLEAP [6]) and ultrasonic vocalisations (DeepSqueak [7]), which will be validated with manual curation. Novel tracking tools and modifications of existing software may be developed in collaboration with Informatics co-supervisors. Once tracking and vocalisation detection are validated, the extracted data will be analysed to identify cyclical features of behaviour across days. Hormonally modulated behaviour including mating, pregnancy and birth of offspring will be used as reference to identify behavioural features most relevant to the menstrual cycle.
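
One simple way to look for cyclical features across days, once a behavioural measure has been extracted, is to compute its autocorrelation at a range of day lags and look for a peak. The Python sketch below illustrates this under loud assumptions: the daily series is synthetic, its 9-day period is an arbitrary value chosen for the example rather than a measured cycle length, and the function names are hypothetical. Real data would be noisy and would likely call for more careful spectral or statistical methods.

```python
import math


def autocorrelation(series, lag):
    """Autocorrelation of a daily series at a given lag (in days)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def dominant_period(series, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Synthetic daily behavioural feature (for example, a normalised activity score).
# The 9-day period is an arbitrary illustration, not a measured value.
days = 54
feature = [math.sin(2.0 * math.pi * day / 9.0) for day in range(days)]

period = dominant_period(feature, min_lag=2, max_lag=20)
print("estimated period in days:", period)
```

Reference events such as mating or birth, as described in the aim, could then be used to check whether a candidate period lines up with known hormonal milestones.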

Aim 2: Verification and testing of cycle-related behavioural features will be carried out with existing and new behavioural data collection followed by tissue and vaginal smear collection to provide confirmation of hormonal state. The lab has experience in these methods which will be additionally supported by expertise of colleagues in reproductive biology. Preliminary data support this aim. 

Aim 3: Finally, the project will record brain state and behaviour across the menstrual cycle, using silicon-probe recordings from hormonally modulated brain regions. Behavioural tracking, vocalisations and neural spiking data will be aligned to determine how Acomys brain state relates to menstruation-related behaviours. The AI4BI-SIDB student would collaborate with colleagues in the lab to implement analysis of neurophysiology and behaviour using AI.

Training Outcomes

The student will develop skills in applying AI methods to track and analyse behaviour, vocalisations and neurophysiology. With the primary supervisor, the student will acquire skills in analysis of behaviour, reproductive organs and neurophysiology, designing data acquisition setups and performing tracking and analysis of social interactions and vocalisations. Tracking methodology and analysis of neurophysiology will be refined in collaboration and training with the Informatics faculty. Meetings with the external partner will provide feedback on the translational potential of the project as well as aid in public outreach.

References

[1] N. Bellofiore and J. Evans, ‘Monkeys, mice and menses: the bloody anomaly of the spiny mouse’, J Assist Reprod Genet, vol. 36, no. 5, pp. 811–817, May 2019, doi: 10.1007/s10815-018-1390-3.

[2] N. Bellofiore, S. J. Ellery, J. Mamrot, D. W. Walker, P. Temple-Smith, and H. Dickinson, ‘First evidence of a menstruating rodent: the spiny mouse (Acomys cahirinus)’, American Journal of Obstetrics and Gynecology, vol. 216, no. 1, p. 40.e1-40.e11, Jan. 2017, doi: 10.1016/j.ajog.2016.07.041.

[3] Censuswide report for Wellbeing of Women, ‘Wellbeing of Women’, 2023. [Online]. Available: https://www.wellbeingofwomen.org.uk/what-we-do/campaigns/just-a-period/just-a-period-survey-results/

[4] B. A. Fricker and A. M. Kelly, ‘From grouping and cooperation to menstruation: Spiny mice (Acomys cahirinus) are an emerging mammalian model for sociality and beyond’, Horm Behav, vol. 158, p. 105462, Nov. 2023, doi: 10.1016/j.yhbeh.2023.105462.

[5] A. Mathis et al., ‘DeepLabCut: markerless pose estimation of user-defined body parts with deep learning’, Nat Neurosci, vol. 21, no. 9, pp. 1281–1289, Sep. 2018, doi: 10.1038/s41593-018-0209-y.

[6] T. D. Pereira et al., ‘SLEAP: A deep learning system for multi-animal pose tracking’, Nat Methods, vol. 19, no. 4, pp. 486–495, Apr. 2022, doi: 10.1038/s41592-022-01426-1.

[7] K. R. Coffey, R. G. Marx, and J. F. Neumaier, ‘DeepSqueak: a deep learning-based system for detection and analysis of ultrasonic vocalizations’, Neuropsychopharmacology, 2019, doi: 10.1038/s41386-018-0303-6.

[8] L. Hantsoo and C. N. Epperson, ‘Allopregnanolone in premenstrual dysphoric disorder (PMDD): Evidence for dysregulated sensitivity to GABA-A receptor modulating neuroactive steroids across the menstrual cycle’, Neurobiology of Stress, vol. 12, p. 100213, May 2020, doi: 10.1016/j.ynstr.2020.100213.

[9] F. D. Rocha-Almeida, H. Takemoto, and A. M. Clemens, ‘Ontogeny of tactile, vocal and kinship dynamics in rat pup huddling’, Apr. 09, 2025, bioRxiv. doi: 10.1101/2025.04.08.647436.

[10] A. M. Clemens, H. Wang, and M. Brecht, ‘The lateral septum mediates kinship behavior in the rat’, Nature Communications, vol. 11, no. 1, Art. no. 1, Jun. 2020, doi: 10.1038/s41467-020-16489-x.

[11] A. M. Clemens et al., ‘Estrus-Cycle Regulation of Cortical Inhibition’, Current Biology, vol. 29, no. 4, pp. 605-615.e6, Feb. 2019, doi: 10.1016/j.cub.2019.01.045.

[12] M. Mugnaini et al., ‘Supra-orbital whiskers act as wind-sensing antennae in rats’, PLoS Biol, vol. 21, no. 7, p. e3002168, Jul. 2023, doi: 10.1371/journal.pbio.3002168.


[Figure: cycle phases labelled Menses, Follicular, Luteal]