Applicants to our CDT for 2026 start can choose from the following selection of projects.

More projects to come
Please note we are still finalising the full list of projects on offer for September 2026 start. Make sure to check back in a week to see all projects on offer.

Applications now open
Applications for 2026 entry are now open! Apply by 20 January if you'd like to join our CDT in September 2026.

Data-Driven Insights for Improving Patient Journeys in Unscheduled Care: A Comprehensive Analysis of Healthcare Services in Scotland

This project will use the Public Health Scotland Unscheduled Care Data Mart (UCD), a linked patient-level dataset covering all of Scotland since 2011, to improve the efficiency and equity of unscheduled care. We will map patient pathways across NHS 24, ambulance, emergency, acute, and mental health services using descriptive statistics, pathway visualisation, and machine learning. Predictive models will identify factors affecting outcomes, while clustering will reveal common pathways and bottlenecks. Embedded within PHS, the project will deliver actionable policy recommendations to enhance data collection, optimise patient flows, and guide equitable redesign of urgent and unscheduled care services.

Supervisory team
Syed Ahmar Shah and Saturnino Luz Filho

Project Background
Healthcare systems worldwide face growing pressure to deliver timely, efficient care while managing rising demand and constrained resources. The COVID-19 pandemic exposed critical vulnerabilities, disrupting routine services and overwhelming urgent care. In the UK, the first lockdown caused substantial drops in hospital admissions and major backlogs in elective care, with enduring impacts on waiting times and health outcomes [1–2].
In Scotland, the pandemic created sustained excess demand, prolonged delays, and excess mortality, underscoring the need for a resilient, responsive system capable of recovering from shocks and maintaining routine care [3–4].

In response, the Scottish Government launched the Urgent and Unscheduled Care Collaborative as part of its NHS Recovery Plan to modernise services, strengthen coordination between primary and secondary care, and improve patient flow through initiatives such as Hospital at Home. However, unscheduled care remains fragmented, with patient information dispersed across multiple services, impeding timely decision-making, evaluation, and resource planning. These gaps disproportionately affect socioeconomically disadvantaged and minority ethnic groups, contributing to health inequalities.

Public Health Scotland’s Unscheduled Care Data Mart (UCD) now offers a unique opportunity to link patient-level data across care settings, enabling system-wide analyses to identify bottlenecks and support data-driven redesign of unscheduled care pathways [5].

Project Aims
This project aims to generate data-driven insights to improve the efficiency, equity, and resilience of unscheduled care services in Scotland. Specifically, it will:
1. Map end-to-end patient pathways across NHS 24, ambulance, emergency, acute, and mental health services using the Unscheduled Care Data Mart (UCD);
2. Identify systemic bottlenecks, data gaps, and their impact on patient outcomes;
3. Develop predictive and clustering models to uncover drivers of delays and adverse outcomes; and
4. Produce evidence-based policy recommendations to support service redesign, optimise patient flows, and reduce health inequalities in unscheduled care.

Data and Methodology
This project will leverage the Public Health Scotland (PHS) Unscheduled Care Data Mart (UCD), a comprehensive, linked dataset covering the entire Scottish population since 2011.
The UCD integrates patient-level data across NHS 24, the Scottish Ambulance Service, Primary Care Out of Hours, Emergency Departments, Acute Hospital Admissions, Mental Health Admissions, and Death Records, capturing approximately 2.8 million unscheduled care pathways annually. Linkage is enabled via the Community Health Index (CHI) number, allowing complete tracking of individual patient journeys from first contact to discharge or death.

Descriptive and Exploratory Analyses
Initial analyses will involve descriptive statistics and data visualisation to characterise service use and patient flows across unscheduled care. Network and Sankey-style pathway visualisations will map patient transitions between services, highlighting frequent routes, points of delay, and high-demand groups. These analyses will help identify candidate variables and outcomes for subsequent modelling.

Predictive Modelling
We will develop supervised machine learning models (e.g. logistic regression, random forests, gradient boosting) to predict key outcomes such as hospital admission, waiting time, and length of stay. Models will be evaluated using standard performance metrics (AUROC, accuracy, calibration) and validated via k-fold cross-validation. Regularisation will be applied to prevent overfitting, and feature importance techniques will support model interpretability.

Unsupervised Clustering
Clustering methods (e.g. k-medoids, hierarchical clustering) will be used to identify common patient pathways and systemic bottlenecks. Clusters will be profiled on demographics, comorbidities, and outcomes, and findings will be reviewed with PHS and clinical stakeholders to ensure real-world relevance.

Implementation Approach
The student will be embedded within PHS two days per week, enabling close engagement with data engineers, analysts, and policymakers.
This will facilitate timely access to data, iterative feedback on analysis, and co-production of actionable outputs to inform service redesign.

Translational Potential and Expected Impact
Although centred on Scotland, this project addresses challenges shared by healthcare systems worldwide: managing surges in unscheduled care demand, reducing bottlenecks, and improving system resilience. By leveraging large-scale linked datasets to map patient journeys, the project will generate transferable methods and insights relevant to other national health systems. The analytical framework, combining pathway visualisation, predictive modelling, and clustering, can be adapted to different contexts, informing service redesign internationally. Findings will be disseminated through peer-reviewed publications, policy briefs, and international networks, contributing to global efforts to optimise urgent care delivery and strengthen health system preparedness.

Training and Development Outcomes for the Student
This project will provide comprehensive interdisciplinary training spanning data science, health informatics, and applied health services research. The student will gain advanced skills in data engineering, statistical analysis, machine learning, and pathway visualisation using large-scale linked health datasets. Embedding within Public Health Scotland (PHS) two days per week will offer hands-on experience with real-world data pipelines, governance procedures, and policy translation. They will also develop transferable skills in stakeholder engagement, responsible AI, scientific writing, and presenting findings to technical and non-technical audiences. This training will prepare the student for leadership roles in data-driven healthcare innovation.

References
[1]: Shah SA, Robertson C, Sheikh A. Effects of the COVID-19 pandemic on NHS England waiting times for elective hospital care: a modelling study. The Lancet.
2024 Jan 20;403(10423):241-3.
[2]: Shah SA, Brophy S, Kennedy J, Fisher L, Walker A, Mackenna B, Curtis H, Inglesby P, Davy S, Bacon S, Goldacre B. Impact of first UK COVID-19 lockdown on hospital admissions: Interrupted time series study of 32 million people. EClinicalMedicine. 2022 Jul 1;49.
[3]: Shah SA, Jeffrey K, Robertson C, Sheikh A. Impact of COVID-19 pandemic on elective care backlog trends, recovery efforts, and capacity needs to address backlogs in Scotland (2013–2023): a descriptive analysis and modelling study. The Lancet Regional Health–Europe. 2025;50.
[4]: Shah SA, Mulholland RH, Wilkinson S, Katikireddi SV, Pan J, Shi T, Kerr S, Agrawal U, Rudan I, Simpson CR, Stock SJ. Impact on emergency and elective hospital-based care in Scotland over the first 12 months of the pandemic: interrupted time-series analysis of national lockdowns. Journal of the Royal Society of Medicine. 2022 Nov;115(11):429-38.
[5]: Public Health Scotland. Unscheduled Care Datamart (UCD) [Internet]. Available from: https://publichealthscotland.scot/services/national-data-catalogue/national-datasets/search-the-datasets/unscheduled-care-datamart-ucd/. Accessed 3 October 2024.

Identifying Emerging Zoonotic Disease Hotspots

This project aims to address threats to human health associated with zoonotic and emerging infectious diseases by developing new AI-based solutions for identifying disease hotspots and supporting interventions to help mitigate their spread. Using Highly Pathogenic Avian Influenza as a motivating example, the project will develop techniques for identifying potential pandemic risk hotspots by integrating information from a variety of sources, including human population data, remote sensing data, and species observation data.
The ultimate goal is to provide new tools to anticipate potential emerging disease risks and thus limit their impact on human health.

Supervisory team
Oisin Mac Aodha and Rowland Kao

Project Partner
Animal and Plant Health Agency (APHA)
The APHA is an executive agency of the Department for Environment, Food and Rural Affairs (Defra) of the United Kingdom. They work to safeguard animal and plant health for the benefit of people, the environment, and the economy. They will provide advice about data and methods, in addition to mentorship of the student. Importantly, they will also link the work to real-world use cases relevant to human health.

Project Background
It is estimated that 60%–75% of emerging infectious diseases in humans originate from zoonotic pathogens from wildlife [1]. Increased interactions between humans and animals, driven by climate change and habitat loss and fragmentation, are intensifying the prevalence of zoonotic diseases [2]. As a result, predicting and combatting their emergence is a public health priority [3].

To mitigate the worst of these impacts, we need computational methods that can identify disease hotspots and pinpoint risk factors for disease spread. This is especially important in the context of Highly Pathogenic Avian Influenza, a likely candidate for the next pandemic [4]. Driven by advances in AI [5], attempts have been made to predict likely spillover events from specific species and pathogens in specific regions [6].
However, our ability to predict these events globally across larger species groups and diseases is hampered by the lack of information available about the spatial ranges of species, their preferences for different habitat types, and changes in their propensity to come into contact with each other and humans as a result of habitat loss.

Project Aims
This project aims to develop new tools, powered by recent advances in multi-modal AI, to predict regions that are most at risk for the emergence of zoonotic diseases. The goal is to provide practical solutions to benefit human health. To achieve this, the project will (i) develop spatial distribution modelling techniques to generate estimates of global biodiversity at geographical scales relevant to the circulation of pathogens and landscape management decisions, (ii) identify likely zoonotic disease hotspots using existing datasets of infection events, and (iii) develop methods to generate land management recommendations that increase resilience to the spread of infectious disease.

Translational Potential and Expected Impact
The main outputs of this project will be (i) new models and data products for spatial prioritisation of zoonotic disease spread risk, which will help practitioners carry out interventions that increase resilience to zoonotic diseases at local scales, and (ii) new models and data products for estimating the spatial distributions of 100,000 different species. The data and code generated by the project will be made available under open and permissive licences to researchers, respecting the licences of the original training data. These outputs will provide valuable information for human health as well as ecological research and further the objectives of the UK Biological Security Strategy [12].

Training and Development Outcomes for the Student
In addition to the training provided by the CDT, the recruited student will benefit by being integrated into OMA’s and RK’s groups.
When the student starts, OMA will help them identify any knowledge gaps and co-develop an action plan so that they can acquire the necessary domain knowledge over years one and two. They will attend OMA’s larger weekly group meetings in the SoI to present work in progress and receive feedback. They will also be connected to the larger Edinburgh Infectious Diseases network, in addition to OMA and RK’s network of collaborators, which will provide opportunities for future collaboration.

References
[1] Jones et al., Global trends in emerging infectious diseases, Nature 2008
[2] Wang et al., Emerging zoonotic viral diseases, Rev Sci Tech 2014
[3] Allen et al., Global hotspots and correlates of emerging zoonotic diseases, Nature Communications 2017
[4] Possas et al., Highly pathogenic avian influenza: pandemic preparedness for a scenario of high lethality with no vaccines, Front Public Health 2025
[5] Guo et al., Innovative applications of artificial intelligence in zoonotic disease management, Science in One Health 2023
[6] Sedricke Lapuz et al., Mapping the Potential Risk of Coronavirus Spillovers in a Global Hotspot, Global Change Biology 2025
[7] Cole et al., Spatial Implicit Neural Representations for Global-Scale Species Mapping, ICML 2023
[8] Lange et al., Active Learning-Based Species Range Estimation, NeurIPS 2023
[9] Beery et al., Species distribution modeling for machine learning practitioners: A review, Conference on Computing and Sustainable Societies 2021
[10] Daroya et al., WildSAT: Learning Satellite Image Representations from Wildlife Observations, ICCV 2025
[11] iNaturalist, https://www.inaturalist.org
[12] UK Biological Security Strategy, https://www.gov.uk/government/publications/uk-biological-security-strategy/uk-biological-security-strategy-html
[13] Gamża et al., Using sequence data to study spatial scales of interactions driving spread of Highly Pathogenic Avian Influenza in Great Britain, arXiv 2024

From sequence to function: next-generation deep learning tools for precision gene therapies

The success of gene therapies relies on the ability to precisely control the function of engineered DNA sequences. This project aims to harness artificial intelligence and machine learning to predict molecular function directly from genetic sequences, using a combination of high-throughput genotype-phenotype data and validation with our partner Trogenix. We will develop computational frameworks capable of learning the structural and contextual dependencies within biological sequences, and how these correlate with delivery of therapeutic payloads. The resulting models will enable accurate sequence-to-function predictions to optimize the design of gene therapies against aggressive cancers and other challenging conditions.

Supervisory team
Diego Oyarzún and Grzegorz Kudla

Project Partner
Trogenix

About the Project
We are seeking candidates to join our multidisciplinary team to tackle one of the most pressing challenges in gene therapy: predicting how DNA sequences determine molecular function. Gene therapy holds transformative potential for treating a range of serious health conditions, including aggressive cancers and genetic disorders previously deemed untreatable. The core of our project is to develop next-generation AI and machine learning models that can accurately predict the function of DNA sequences. Partnering with the leading biotechnology company Trogenix, we aim to create fit-for-purpose frameworks that can decipher the complex associations between DNA and molecular phenotypes, enabling precision in the design of gene therapies.

We will build predictors of function trained on libraries of regulatory DNA sequences using a combination of deep learning, geometric learning, and genomic language modelling, in tandem with global optimization algorithms for robust sequence design.
We aim to develop context-aware technology, suitable for low-N training, that ensures therapeutic payloads are delivered at the right dose, at the right time and in the right tissue. As a member of our team, you will gain unparalleled experience and training in a vibrant ecosystem, with access to cross-domain knowledge as well as numerous networks and resources for career growth.

Computational analysis bridging multi-omic data from skin organoid culture to therapeutic targets for atopic eczema

This PhD project will integrate lipidomic, metabolomic, and proteomic data from skin organoid models to uncover molecular drivers of eczema and identify candidate therapeutic targets. Under the supervision of Prof. Sara Brown (IGC, University of Edinburgh) and Prof. Mark Parsons (EPCC), the student will develop computational pipelines combining network biology, machine learning, and drug-target mapping. By bridging omics data with drug discovery resources, the project aims to define mechanistic pathways underlying skin barrier dysfunction and inflammation. Input from Eczema Outreach Support (EOS) will guide translation toward patient benefit and public communication, advancing responsible, data-driven dermatology.

Supervisory team
Sara Brown and Mark Parsons

Project Partner
TBC

Project Background
Eczema (atopic dermatitis) is a chronic, relapsing inflammatory skin disease affecting millions worldwide. Despite advances in immunomodulatory therapies, the molecular mechanisms driving its onset and persistence remain incompletely understood, particularly regarding lipid and metabolite dysregulation in the skin barrier. Prof. Sara Brown’s group has generated a rich multi-omics dataset, including lipidomics, metabolomics, and proteomics, derived from patient-relevant skin cells and organoid models that mimic human epidermal physiology.
These data offer an exceptional opportunity to decode the molecular pathways driving eczema and to identify actionable therapeutic targets.

This interdisciplinary project will leverage computational and systems-biology approaches to integrate these omic layers, define disease-associated molecular signatures, and link them to existing drug–target resources for discovery and repurposing. Collaboration with the Edinburgh Parallel Computing Centre (EPCC) ensures access to secure, high-performance computing environments. Sara Brown is a medical adviser to, and long-term collaborator of, the patient support group Eczema Outreach Support (EOS). EOS will provide translational and patient-centred perspectives, supporting prioritisation of computational findings for real-world benefit and effective public engagement.

Project Aims
1. Integrate lipidomic, metabolomic, and proteomic profiles from skin organoids to model molecular networks underlying skin differentiation and barrier formation in eczema.
2. Identify key dysregulated pathways and candidate biomarkers associated with barrier dysfunction and inflammation.
3. Develop computational pipelines linking molecular signatures to drug–target interaction databases to propose therapeutic candidates.
4. Establish reproducible, privacy-preserving workflows for multi-omics analysis within secure computing environments (EPCC).

Data and Methodology
The student will analyse existing multi-omics datasets generated by the Brown group, encompassing lipidomics, metabolomics, and proteomics from skin organoid models under varying experimental and disease-relevant conditions.
1. Data Processing and Integration: Pre-processing will involve normalization, quality control, and batch correction across modalities. Integration strategies will include similarity network fusion, canonical correlation analysis, and deep representation learning to capture cross-layer molecular relationships.
2. 
Network and Machine Learning Approaches: Graph-based clustering, network propagation, and representation learning (e.g., graph neural networks, multi-view autoencoders) will be explored to detect modules of co-regulated features. Biological interpretation will rely on pathway enrichment and ontology analyses.
3. Drug Repurposing and Therapeutic Targeting: Using proteomic signatures, the student will perform connectivity mapping (CMap) and perturbation analysis to identify compounds that reverse disease-associated expression profiles. Network pharmacology approaches will map dysregulated proteins to known drug–target interaction graphs (DrugBank, STITCH, ChEMBL). Structural bioinformatics and docking tools may be explored for selected targets to evaluate compound–target affinity. This integrative pipeline will prioritize drug candidates for experimental validation.
4. Computing Environment: Analyses will be conducted using EPCC’s secure high-performance computing resources to ensure scalability, reproducibility, and compliance with data governance frameworks. The student will have access to EPCC’s wide range of supercomputing, data science and AI systems.

Deliverables: A reproducible computational pipeline, interpretable multi-omic networks, and a ranked list of candidate therapeutic targets linked to potential repurposing compounds.

Translational Potential and Expected Impact
This project unites expertise in dermatology (Brown Lab), computational science (EPCC), and translational engagement (EOS), fostering collaboration across academia, clinical research, and the third sector. By producing a scalable computational framework for integrating complex multi-omics data and linking findings to drug discovery pipelines, the project will have broad relevance for inflammatory and metabolic diseases. Outcomes will include novel mechanistic insights into skin development and eczema, prioritized therapeutic targets, and publicly accessible computational tools.
EOS’s involvement ensures patient-centred prioritization and effective dissemination to lay audiences, maximising societal and international impact.

Training and Development Outcomes for the Student
The student will gain cross-disciplinary expertise spanning computational biology, systems medicine, and drug discovery informatics. They will develop advanced skills in data integration, network modelling, and high-performance computing through EPCC, as well as bioinformatics and translational research methods under Prof. Brown’s supervision. Interaction with EOS will offer experience in public engagement and third-sector collaboration. The project provides professional development in scientific communication, responsible research, and reproducible software engineering, equipping the candidate for future roles in academia, healthcare data science, or the pharmaceutical sector.

References
Elias MS, Wright SC, Nicholson WV et al. Functional and proteomic analysis of a full thickness filaggrin-deficient skin organoid model [version 2; peer review: 3 approved]. Wellcome Open Res 2019, 4:134 (https://doi.org/10.12688/wellcomeopenres.15405.2)
Brown, Sara J. Keratinocytes Listen, Respond, and Actively Contribute to Crosstalk in the Epidermal Community and Beyond. Journal of Investigative Dermatology, 2024 Volume 144, Issue 12, 2628–2630
Budu-Aggrey, A., Kilanowski, A., Sobczyk, M.K. et al. European and multi-ancestry genome-wide association meta-analysis of atopic dermatitis highlights importance of systemic immune regulation. Nat Commun 2023; 14, 6172.
Standl et al. Gene-environment Interaction Affects Risk of Atopic Eczema: Population and In Vitro Studies. Allergy 2025 https://doi.org/10.1111/all.16605

Clinically actionable insights into endometriosis symptom trajectories using longitudinal self-reports, biological samples, and data from digital technologies

Endometriosis is a chronic debilitating condition affecting about 10% of women of reproductive age.
There is an unmet clinical need to facilitate accurate, timely diagnosis, remote symptom monitoring, and intervention assessments. The project will focus on mining the largest longitudinal multimodal datasets in endometriosis (data collection by our team is ongoing as part of two large-scale grants) to provide new clinically actionable insights into how self-reports, home-collected biological samples, and data from wearable sensors can facilitate endometriosis telemonitoring.

Supervisory team
Thanasis Tsanas, Andrew Horne and Philippa Saunders

Project Partner
Roche Diagnostics

Project Background
Endometriosis is a chronic condition associated with debilitating pain, fatigue, and heterogeneous symptom manifestation. It affects ~10% of women of reproductive age and may take ~8 years to diagnose, and monitoring of symptom progression typically relies on sparse clinical assessments. There is an urgent call for action to capitalize on recent biological and technical developments to improve diagnosis and symptom monitoring [1]. We have recently proposed developing a pioneering framework to transform endometriosis assessment capitalizing on digital technologies [2].

Standardised patient-reported outcome measures (PROMs), in which people living with endometriosis regularly self-report on their symptoms, are increasingly used to monitor the progression of symptom severity. Similarly, regularly collected biological samples may offer insights into symptom trajectories over time. The use of digital health technologies can provide additional continuous and passively collected data, which can be mined to obtain new insights complementing clinical reports, lab tests, and PROMs. We recently reported on the largest endometriosis study of its kind, demonstrating how self-reports and wearable sensors can provide longitudinal insights into symptom trajectories and objective surgical intervention assessments [3].
Specifically, we have developed new signal processing and statistical machine learning algorithms for assessing physical activity, sleep, and diurnal rhythm variability, demonstrating how these could complement and inform clinical assessments.

Project Aims
The recruited student will further extend the algorithmic framework developed in the group to mine multimodal data (PROMs, lab-based results and clinical reports, data from wearables), to provide new clinically useful insights into endometriosis towards facilitating (a) longitudinal symptom monitoring, (b) objective intervention assessments, and (c) cohort stratification, capitalizing on our recently collected data and ongoing international data collection (£6m EUMetriosis project).

Ultimately, the goal is to develop clinical decision support tools for endometriosis assessment that will be embedded within the NHS/EXPPECT team (led by co-supervisor Prof. Andrew Horne) and potentially translated by the industrial project partner (Roche).

Data and Methodology
The student will explore multimodal datasets from recently completed and ongoing large international studies including (i) the ENDO1000, EUmetriosis and ADVANTAGE projects, which the supervisory team are leading (collectively >500 people living with endometriosis, collected longitudinally, comprising PROMs, biological samples and actigraphy data), and (ii) additional unique actigraphy datasets to facilitate algorithm development with external measures of ground truth (e.g. in terms of actigraphy and polysomnography data, >100 participants already collected).
These are unique resources that the student will have direct access to from the point they start the project: they will not need to do any ethics/data collection, and their focus will be exclusively on data analysis.

They will develop and apply signal processing, time-series analysis, and multimodal data processing and information fusion algorithms to provide new clinical insights and facilitate longitudinal symptom monitoring in visceral pain. The student will need to have or develop an in-depth understanding of statistics, signal processing, and machine learning algorithms, including for feature engineering, feature selection, model selection and validation. Moreover, the student will need to have or develop strong programming skills in a high-level programming language (e.g. MATLAB, R, or Python). Specifically, the student will develop methods to mine the questionnaires (such as item response assessments) and the data from wearables (actigraphy-based algorithms). They will also need to develop machine learning (feature selection, statistical mapping, information fusion) algorithms to provide insights into how the different modalities contribute to assessing endometriosis symptoms such as pain and fatigue longitudinally, and how these change as a result of interventions (e.g. dietary or surgical).

The student will receive additional input, if required, from colleagues based at partnering institutions and, regarding sleep and circadian health, from the Circadian Mental Health Network (Prof. Tsanas is a Co-I in the network and can make introductions if required).

Translational Potential and Expected Impact
The proposed PhD project builds on strong national and international partnerships of ongoing projects that the supervisory team lead: (1) ADVANTAGE, a £4.3m grant, and (2) EUMetriosis, a £6m grant.
The former is UK-based with partners at the University of Cambridge, UCL, etc., and the latter is international (led by colleagues from Belgium, with data collection in the UK and Croatia).

The student will focus on the data analytics, and there is a clear expectation that the project will assess how findings generalize to international cohorts (which in turn gives the resulting work strong potential to appear in high-impact-factor journals and to have impact).

Training and Development Outcomes for the Student
We will train a T-shaped researcher with an understanding of both the technical work (signal processing, machine learning, programming) and the biomedical aspects (from PPIE to engaging with the clinical team in the NHS/EXPPECT), and also of the clinical translation of work through the collaboration with Roche. We envisage the PhD graduate will have developed much sought-after skills in biomedical data science and will be exceptionally well placed to pursue their career in academia or industry.

References
[1] P.T.K. Saunders, A.W. Horne: Endometriosis: new insights and opportunities for relief of symptoms, Biology of Reproduction, (in press), https://doi.org/10.1093/biolre/ioaf164
[2] K. Edgley, A.W. Horne, P.T.K. Saunders, A. Tsanas: Symptom tracking in endometriosis using digital technologies: knowns, unknowns and future prospects, Cell Reports Medicine, Vol. 4(9), 101192, 2023
[3] K. Edgley, P.T.K. Saunders, L.H.R. Whitaker, A.W. Horne, A. Tsanas: Insights into endometriosis symptom trajectories and assessment of surgical intervention outcomes using longitudinal actigraphy, npj Digital Medicine, Vol. 8:236, 2025
[4] K. Woodward, E. Kanjo, A. Tsanas: Combining deep transfer learning with signal-image encoding for multi-modal mental wellbeing classification, ACM Transactions on Computing for Healthcare, Vol. 
5(1):3, 2024

Agent-Based Active Learning Model for Knowledge-Guided Molecular Design

This project will develop an agent-based active learning framework that integrates human medicinal chemistry expertise into AI-driven molecular design. By embedding domain knowledge within iterative learning cycles, the project aims to create models that not only predict compound performance but also account for synthesisability and design feasibility. The resulting agent-based “human-in-the-loop” system will enable adaptive compound selection informed by both data and expert reasoning. The outcome will be an interpretable, industrially deployable tool that bridges computational discovery and experimental validation, advancing the translation of AI innovations into real-world drug discovery workflows.

Supervisory team
Antonia Mey and Andrea Weisse

Project Partner
BioAscent
BioAscent will contribute to the success of this project through mentorship, knowledge exchange, and an industrial internship placement. The student will gain exposure to real-world discovery pipelines, compound design, and the practical constraints that shape medicinal chemistry decisions. Our scientists will provide guidance on chemical feasibility assessment, design strategy, and how AI-driven compound selection can be effectively implemented within industrial workflows.

Project Background
Drug discovery remains a costly, time-intensive process, with high attrition rates often resulting from the synthesis of compounds that are theoretically promising but practically unfeasible. Artificial intelligence has demonstrated strong predictive capabilities in molecular property estimation, yet these models frequently overlook the implicit knowledge of experienced medicinal chemists, such as judgment on synthesisability, structural novelty, and project-specific priorities.

Active learning provides a mechanism for models to iteratively query new data, focusing on the most informative compounds to test.
However, traditional implementations rely on purely statistical reasoning, which can diverge from the nuanced decision-making of human chemists, and they often lack explainability.
This project seeks to integrate medicinal chemists’ knowledge into active learning frameworks through agent-based modelling, enabling algorithms to reason and adapt more like expert practitioners. By incorporating synthesisability assessments and heuristic rules derived from expert feedback, the system will create a more realistic, human-aligned decision process. Medicinal chemistry and industrial expertise from the University of Edinburgh and BioAscent will ensure that the developed models are grounded in practical constraints and can be validated within real-world discovery pipelines.

Project Aims
The project aims to design and evaluate an agent-based active learning framework that integrates human expertise into molecular design. Specific objectives include:
1. Developing methods to encode medicinal chemist knowledge into AI decision-making loops.
2. Implementing active learning agents capable of balancing exploration (novel structures) and exploitation (synthetic feasibility).
3. Testing and validating the system using real-world compound datasets and expert-in-the-loop simulations, and providing explainable reasoning for model choices.
4. Demonstrating how expert-informed active learning improves both the efficiency and industrial relevance of AI-driven compound selection.

Translational Potential and Expected Impact
This project directly targets the translation of academic AI research into deployable drug discovery tools. By embedding chemist expertise into active learning systems, the resulting framework will improve compound prioritisation, reduce experimental waste, and accelerate lead optimisation. Industrial partners can apply the methodology to enhance decision-making efficiency and integrate AI seamlessly within discovery workflows.
The project will produce open, interpretable models and validated industrial case studies, contributing to the broader adoption of human-aligned AI systems in medicinal chemistry and advancing the UK’s position in data-driven biomedical innovation.

Training and Development Outcomes for the Student
The student will acquire interdisciplinary expertise spanning machine learning, computational chemistry, and medicinal chemistry. Training will include advanced AI model development, cheminformatics, and experimental design principles. Through collaboration with BioAscent, the student will gain valuable industrial experience via an internship and ongoing mentorship, developing practical insight into real-world discovery pipelines. The project’s interdisciplinary nature will cultivate transferable skills in data science, research ethics, communication, and innovation management, preparing the student for a career at the interface of AI research and pharmaceutical R&D.

References
1. Schneider, G. (2018) Automating drug discovery. Nat Rev Drug Discov 17, 97–113.
2. Gorantla, R. et al. (2024) J. Chem. Inf. Model. 64, 6, 1955–1965.
3. Ramos, M. et al. (2025) Chem. Sci., 16, 2514–2572.
4. MacDermott-Opeskin, H. et al. (2025) 10.26434/chemrxiv-2025-zd9mr-v4

From molecular mechanisms and cell states to Real-World Evidence and back in immunological disease
A major challenge and opportunity in genomic medicine is integrating data across scales to identify and link disease-causal variants, molecular mechanisms and cell types/states to clinical outcomes. This is necessary for efficient identification of candidate drug targets, as well as for investigating heterogeneous clinical outcomes with respect to disease progression trajectories or treatment response.
We have developed stat/ML methodologies, Stator and TarGene, for high-resolution disease cell type/state identification from single-cell RNA-seq data and disease-causal DNA variant prioritisation from large-scale biobanks, respectively. Here, we aim to develop novel stat/ML methodologies to integrate quantification of molecular states with genotype-phenotype inference, for application in disease state stratification in immunological disease.

Supervisory team
Ava Khamseh and Sara Brown

Project Partner
TBC

Project Background
Modern molecular biology, genomics and population medicine take advantage of thousands of variables at contrasting scales. Biology is only rarely conveyed by marginal variation involving a single molecule or phenotype at a time, or by pair-wise correlation between two molecules or two phenotypes. We have recently developed two fully general state-of-the-art stat/ML methodologies, backed up by mathematical theory: (1) Stator, to identify cell types and states at high resolution from scRNA-seq data of disease vs healthy controls by taking advantage of high-order expression dependencies, and (2) TarGene, for double-robust quantification of the effects of DNA variants and their interactions on disease outcomes in large-scale genotype-phenotype biobanks, with minimum bias and maximum power. TarGene has been used to integrate population genetics with functional genomics, quantifying epistatic contributions to human traits via transcription factor mechanisms and thus prioritising candidate variants and genes for disease via molecular mechanisms. Given that Stator works on the RNA scale, and TarGene on the genotype-to-phenotype scale, we now wish to integrate these data modalities to link DNA variants to gene expression, mechanisms and disease phenotypes, which are expected to be heterogeneous for complex traits.
This is then expected to lead to differences in disease trajectory, severity and treatment response.

Project Aims
The first aim of the project is to develop novel stat/ML methodologies for linking disease (severity/response) genes derived from genotype-phenotype population studies to cell states and corresponding RNA expression programmes derived from scRNA-seq data. The second aim of the project is to investigate how the identified strata of cell states/genes relate to differences in disease trajectory and/or severity and/or response to treatment. The key element of this project is to prioritise causation with respect to disease-relevance of cell (sub)types and states and genotype-phenotype inference. This is important for identifying genomic contributions to subpopulations across the disease spectrum, so that targeted therapies can be applied.

Data and Methodology
Stator utilises structure learning and model-free non-parametric estimators of higher-order interactions, implemented as a Nextflow software pipeline and a Shiny app. TarGene utilises Targeted Learning (TL), involving diverse machine learning libraries and double-robust estimation strategies, such as Targeted Maximum Likelihood Estimation. TL also applies to quantification of treatment effects on disease outcomes under different treatment interventions (for TarGene, DNA variants are the analogue of “treatment interventions” in Real-World Evidence studies). Broadly, the approach is analogous to LD score regression, which integrates GWAS summary statistics and gene expression data to investigate how genes prioritised from population studies of disease can be stratified by combinatorial gene expression in different cell (sub)types or states.
The main differences are 3-fold: (1) Stator offers a higher resolution of cell (sub)types and states, with a focus on cell states, (2) TarGene can be utilised to discover new candidate variants/genes, both with and without functional genomics integration, depending on the type of input data, (3) the focus here is to identify strata of disease, and link these back to molecular differences amongst the individuals.
The methodology proposed is completely general and applicable to a diversity of disease areas. In this project, we develop and apply the proposed approach in the context of immunology, taking atopic dermatitis (AD) as an exemplar. We will utilise publicly available scRNA-seq data of AD and healthy controls, as well as large-scale biobanks such as the UK Biobank, All of Us and Our Future Health.

Translational Potential and Expected Impact
Drug discovery is generally an inefficient and costly process due to limited understanding of tissue heterogeneity, specifically related to identification of disease-relevant cell populations, their biological states, and the molecular mechanisms involved. Beyond initial discovery, treatments are often only successful in subpopulations of patients. There is therefore a need to prioritise causal variants, genes and cell types/states leading to disease trajectories and treatment response, for optimal development of drug targets for patient subpopulations who would otherwise respond differently to various treatments. The focus here is on quantification of heterogeneous genomic contribution to disease outcome and/or treatment response.

Training and Development Outcomes for the Student
On the methodological front of this cross-disciplinary project, the student will develop technical skills in the development and application of rigorous statistical inference (semi-parametric efficiency theory) and machine learning techniques, throughout the PhD and by attending MSc-level courses in these areas and beyond.
On the biomedical front, applying these methods to biomedical data at various scales, the student will develop a deep understanding of molecular biology via scRNA-seq, genotype-phenotype inference in large-scale biobanks and Real-World Evidence generation. The student will further develop essential cross-disciplinary and translational communication skills, with access to a supervisory team whose expertise ranges across AI/ML, biostatistics and molecular biomedicine.

References
1. Review article: “A brief history of human disease genetics”, Nature, 2020, https://doi.org/10.1038/s41586-019-1879-7
2. Review article: “Refining the impact of genetic evidence on clinical success”, Nature, 2024, https://doi.org/10.1038/s41586-024-07316-0
3. Review article: “Applications of single-cell RNA sequencing in drug discovery and development”, Nature Reviews Drug Discovery, https://doi.org/10.1038/s41573-023-00688-4
4. Stator: “High order expression dependencies finely resolve cryptic states and subtypes in single cell data”, EMBO Molecular Systems Biology, 2025, https://doi.org/10.1038/s44320-024-00074-1
5. TarGene: “Semiparametric efficient estimation of small genetic effects in large-scale population cohorts”, Oxford Biostatistics, 2025, https://doi.org/10.1093/biostatistics/kxaf030
6. TarGene application: “Epistatic contributions to human traits via transcription factor mechanisms”, medRxiv, 2025, https://doi.org/10.1101/2025.09.28.25336826
7. “Atopic Eczema: How Genetic Studies Can Contribute to the Understanding of this Complex Trait”, Journal of Investigative Dermatology, 2022, https://doi.org/10.1016/j.jid.2021.12.020
8.
“Multi-omic triangulation identifies molecular candidates of atopic dermatitis severity”, medRxiv, 2025, https://doi.org/10.1101/2025.08.04.25332125

AI-Based Design and Cell-Free Synthesis of Next-Generation Phage Therapeutics
This interdisciplinary project will develop an AI-based pipeline to engineer bacteriophage specificity, moving beyond discovery to active design. Leveraging the "Phrameworks" cell-free assembly platform developed with the external partner, Biophoundry, the student will train ML models to identify highly conserved regions on the bacterial surface. Using AI-based protein design methods, the student will design novel Receptor Binding Domains (RBDs) for the phage tail fibre, which will be assembled using Biophoundry’s proprietary "Trinity" technology for experimental validation. This computational-experimental cycle aims to develop effective antibacterial therapies by generating synthetic phages with a broad host range and a reduced risk of resistance evolution.

Supervisory team
Chris Wood and David Gally

Project Partner
Biophoundry
Biophoundry will serve as the industrial co-supervisor, leveraging their pioneering expertise in phage engineering and cell-free synthetic biology. Their primary contribution, aside from expertise provided through their co-supervision, will be providing access to their proprietary PHAX Foundry platform. This end-to-end platform unifies AI-driven design with cell-free production, and they will supply proprietary genomic and structural data from their T7 and K1f model phage systems to guide the student's machine learning model development. They will also provide intellectual and scientific input into both the generative AI models and the cell-free synthesis methodologies, alongside supervision focused on the translational pathway and commercial viability of the research for developing engineered phage assets.

Project Background
Antimicrobial resistance (AMR) is a global health crisis, demanding innovative therapeutic solutions.
Phage therapy, the use of viruses to kill bacteria, is a powerful alternative to traditional antibiotics, yet its narrow host range and susceptibility to bacterial resistance mechanisms limit its clinical use. Our approach tackles this limitation by developing a generalisable, non-host-dependent design and manufacturing platform, based on cutting-edge protein design methods and sector-leading cell-free phage assembly methods. The student will design novel receptor binding domains (RBDs) and test them in collaboration with Biophoundry using their “Trinity” platform, which facilitates the rapid exchange of RBDs. The goal is to establish a predictive AI-based phage design pipeline, guided by evolutionary data to ensure sustained therapeutic efficacy against diverse pathogens.

Project Aims
1) Develop deep learning models to predict the binding affinity and killing efficacy of phage Receptor Binding Domains (RBDs) against diverse bacterial strains.
2) Design a generative ML model to propose novel RBD amino acid sequences optimised for broad-spectrum killing, targeting functionally constrained regions on bacterial cell-surface proteins.
3) Experimentally validate the best ML-designed RBDs using the Trinity engineering system and the Phrameworks cell-free assembly platform in collaboration with Biophoundry.

Data and Methodology
The project will draw on pre-existing data from Biophoundry and the supervisory team, as well as publicly available data, including:
1) Genomic and structural data for model phages T7 and K1f.
2) Genomic data from large, diverse panels of K. pneumoniae (100 strains) and Uropathogenic E.
coli provided by the Gally lab.
3) Results from a 100x100 cross-infection experiment mapping host range provided by Biophoundry, which will be augmented by synthetic training data already identified by Biophoundry’s PHAX pipeline.
The student will develop a robust ML methodology in the following stages:
Dataset Generation - This involves creating sophisticated representations of bacterial receptor targets and phage RBDs using techniques such as structural prediction/modelling, sequence analysis and protein language model embeddings. These will be combined with the cross-infection data provided by Biophoundry to create the initial dataset.
Predictive Modelling - Machine learning models will be trained to predict bacterial receptor targets using data provided by Biophoundry, augmented with publicly available data.
RBD Design - A generative model will be designed to propose novel RBD sequences that optimise for target binding, host range, and compatibility with the Trinity engineering platform.
Wet-Lab Validation - The student will work closely with Biophoundry to synthesise and validate the top ML-designed RBDs. These will be integrated into the phage scaffold and assembled via the Phrameworks cell-free system. Efficacy and host range will be assessed using high-throughput plaque assays against the target bacterial panels, performed with the Gally Lab. Data generated experimentally will be fed back into the design pipeline to improve the models.

Translational Potential and Expected Impact
This work will deliver a high-value, translational platform for the rapid, intelligent design and generalisable manufacture of bacteriophage therapies. By shifting phage development from empirical discovery to precision, ML-guided engineering, we offer a scalable solution to the AMR crisis. The focus on conserved receptor targets yields broad-spectrum agents, while the cell-free assembly platform eliminates host-dependence in manufacturing.
The external partner, Biophoundry, is perfectly positioned to translate the intellectual property and validated ML pipeline into commercial drug assets, ensuring immediate societal and economic impact in infectious disease treatment.

Training and Development Outcomes for the Student
The student will receive truly interdisciplinary training, becoming an expert in the convergence of AI and synthetic biology.
Core ML Skills: Advanced training in deep learning architectures (GNNs, Transformers), protein sequence modelling, and generative design.
Core Biomedical Skills: Expertise in synthetic biology (cell-free systems, phage engineering), molecular virology, and microbiology, including bacterial resistance mechanisms.
The placement/collaboration with Biophoundry will provide invaluable experience in the drug development lifecycle, commercialisation strategy, IP management, and industrial-scale project delivery, making the student highly competitive for both academic and industrial careers.

References
https://doi.org/10.1073/pnas.2313574121
https://doi.org/10.1002/pro.5148
https://doi.org/10.1021/acssynbio.2c00244
https://doi.org/10.1038/s41586-025-09429-6
https://doi.org/10.1126/sciadv.adt6432
https://doi.org/10.1101/2025.09.12.675911

Unlocking the Image: Enhancing the use of Medical Scans for Brain Health Prediction through Radiology Report Analysis
This project explores the use of radiology reports in combination with medical imaging from clinical data. Extracting more nuanced information from free text will enable richer phenotyping for research purposes, help identify referral reasons (improving generalisability) and improve image quality assessment. Additionally, by integrating Vision-Language Models, we will improve the prediction of brain health conditions.
We will use data collected during general healthcare (facilitating future integration into clinical workflows), and process it within Trusted Research Environments (TREs) to ensure patient privacy.

Supervisory team
Michael Camilleri, Beatrice Alex and Grant Mair

Project Partner
Public Health Scotland

Project Background
The use of health data in research is often constrained to structured entries (e.g. ICD codes [1]), while most of the qualitative and nuanced understanding of patient health is recorded in free text, such as GP notes or radiology reports [2].
At the same time, the recent successes in Natural Language Processing (NLP) [3] provide a relatively untapped opportunity to extract value from such unstructured data. Automated processing of clinical notes can help ascertain existing conditions [4] or, as proposed herein, identify biases in the data [2], which can feed into improving the robustness of AI tools applied to health data. Additionally, integrating language with visual models promises to improve performance on downstream tasks such as disease classification and prediction [5].
This is accelerated by the rising availability of Trusted Research Environments (TREs) [6], which aim to open up clinical data for research purposes while ensuring that any methods developed can be more easily integrated into clinical workflows. Chief among these is the Brain Health Data-Pilot (BHDP) [7], within the Scottish National Safe Haven (NSH), with more than 1.2 million brain scans and linked Electronic Health Records (EHRs) from across Scotland.

Project Aims
The primary goals of this project will be to process free-text radiology reports accompanying medical images (MRI/CT) to: (a) extract key conditions, artefacts and image quality features, (b) identify the reason for the scan (why the subject was referred to have a scan), and (c) as a stretch goal, integrate with an Imaging module as a Vision-Language-Model (VLM) [5] to improve prediction of brain health conditions (e.g.
Dementia).

Data and Methodology
This project uses clinical datasets, which provide orders of magnitude more data and heterogeneity than publicly available sources [7], while offering novel research opportunities due to their 'raw' nature. Access to the TRE (ensuring patient privacy) will be facilitated through having eDRIS as our external partner for the Scottish NSH (BHDP). Furthermore, there is scope for using consented data (e.g. Generation Scotland [10] or UK Biobank) as an alternative source of data to complement the above.

Methods
The project has 3 work packages:
1. Enrich Research Value of Radiology reports: The Language Technology Group [8] developed a rule-based system, EDIE-R [4], to identify 24 brain-scan phenotypes. This will provide a starting point to develop newer neural models (e.g. Transformers [9] or Large-Language Models [3]) to extract relevant concepts. Using neural models will also allow us to extend to other relevant phenotypes, and also to image quality metrics (e.g. movement artefacts).
2. Understanding Scanning Bias: The next step is to infer the reason for the scan. This will involve eliciting signal from the clinical history portion of the report. Because this history may be missing for some scans, this will require learning a mapping from the radiologist report to the referral context in a semi-supervised setting, allowing reasoning about selection bias in scanned individuals.
3. Improving Prediction of Brain Health: This can be extended to disease progression models, incorporating condition codes [11] or MRI/CT scans themselves (using a VLM [5]) to improve prediction of brain health conditions, e.g. Dementia.

Translational Potential and Expected Impact
The use of clinical data and input from domain experts (and the project partner) will ensure that the aforementioned systems can more easily be deployed in clinical workflows. Concretely, this work will:
1.
Develop systems that accelerate health research by increasing the value of free-text reports, and which can, in clinical settings, summarise patient trajectories for new consultations.
2. Provide a path to analysing biases in referrals to scanning, improving the fairness and trustworthiness of predictive models for diseases.
3. Develop and advance TRE functionality in collaboration with eDRIS.

Training and Development Outcomes for the Student
* Developing skills in applying/implementing deep learning for NLP and medical imaging
* Data Science for curation of raw data within constrained environments (TREs)
* Experience in using and developing the emerging field of TREs, including ethics and governance procedures.
* Experience in working with real-world health data and collaborating with clinical domain experts
* Experience in Patient and Public Involvement to shape the direction of research.

References
- [1] International Statistical Classification of Diseases and Related Health Problems. https://www.who.int/standards/classifications/classification-of-diseases
- [2] Tang, A.S., Woldemariam, S.R., Miramontes, S. et al. "Harnessing EHR data for health research". Nat Med 30, 1847–1855 (2024). https://doi.org/10.1038/s41591-024-03074-8
- [3] Artsi Y., Klang E. et al. "Large language models in radiology reporting - A systematic review of performance, limitations, and clinical implications". Intelligence-Based Medicine, 12 (2025), ISSN 2666-5212, https://doi.org/10.1016/j.ibmed.2025.100287
- [4] Alex, B., Grover, C., Tobin, R. et al. Text mining brain imaging reports. J Biomed Semant 10 (Suppl 1), 23 (2019). https://doi.org/10.1186/s13326-019-0211-7
- [5] Li X., Li L. et al. "Vision-Language Models in medical image analysis: From simple fusion to general large models". Information Fusion, 118 (2025), ISSN 1566-2535, https://doi.org/10.1016/j.inffus.2025.102995
- [6] Trusted Research Environments.
https://www.hdruk.ac.uk/access-to-health-data/trusted-research-environments/
- [7] Camilleri M., Gouzou D. et al. "A large dataset of brain imaging linked to health systems data: a whole system national cohort" (in preparation).
- [8] Language Technology Group (website) https://www.ltg.ed.ac.uk/
- [9] Tay Y., Dehghani M., et al. "Efficient Transformers: A Survey". ACM Comput. Surv. 55, 6, Article 109 (June 2023), https://doi.org/10.1145/3530811
- [10] Generation Scotland https://genscot.ed.ac.uk/
- [11] Shmatko, A., Jung, A.W., Gaurav, K. et al. Learning the natural history of human disease with generative transformers. Nature (2025). https://doi.org/10.1038/s41586-025-09529-3

Addressing patient mortality in hemodialysis via AI applied to metabolomics and material science
Patients undergoing hemodialysis (HD) exhibit significantly higher mortality rates than those who have received kidney transplants. This disparity is largely attributed to the accumulation of uremic toxins that standard HD treatments fail to completely remove. Despite this acknowledged issue, systematic identification of the specific uremic toxins impacting mortality in patients receiving maintenance HD has not been effectively addressed. This project integrates AI, metabolomics, and biomedical materials science to accelerate the identification of key metabolites and biological pathways involved in the mortality of dialysis patients and to discover biocompatible filtering materials that could enhance HD efficacy in toxin removal.
By leveraging data from existing literature and collaborations, this synergistic approach seeks to elucidate the mechanisms behind elevated mortality in HD patients and develop solutions to mitigate these risks, with the ultimate goal of reducing patient mortality.

Supervisory team
Grazia De Angelis, Karl Burgess and Bryan Conway

Project Partner
Kidney Research UK

Project Background
Approximately 2 million individuals globally suffer from kidney failure, necessitating treatment options such as transplantation and dialysis. Transplantation is limited by donor availability, forcing many to rely on HD. Whereas transplant recipients exhibit approximately 80% survival rates five years post-procedure, those undergoing HD have less than a 50% chance of surviving the same period due to what is known as “residual uremic syndrome.” This condition results from the incomplete removal of certain uremic toxins during HD, significantly contributing to the higher mortality observed in these patients [1]. Current HD technologies rely on membranes whose selectivity is limited by size, leaving them unable to effectively eliminate larger uremic toxins from the patient's bloodstream. This approach lacks precision and effectiveness, as it is designed around small molecules like urea and fails to address other, more harmful toxins.

Project Aims
Our research aims to enhance HD treatment effectiveness and reduce mortality rates through a multidisciplinary strategy. Initially, we must identify metabolites linked to adverse effects, leveraging metabolomics combined with AI to uncover key molecules influencing outcomes for patients with kidney failure. Prior studies show inconsistent results, highlighting the complexity of metabolite impacts on patient mortality and emphasizing the need for deeper investigation. We plan to use an integrated metabolomics and AI approach to better understand these mechanisms, paving the way for future comprehensive studies and the development of materials tailored to remove toxic metabolites.
AI will play a crucial role in rapidly advancing these objectives, tackling the vast scope of toxins and potential materials.

Project Activities
As a PhD student on this project, your primary role will involve:
Utilizing data from landmark studies carried out over the past decade, enhancing your understanding of clinical outcomes in hemodialysis.
Engaging in molecular simulations to assess databases containing thousands of porous materials, focusing particularly on Covalent Organic Frameworks, to identify those capable of efficiently removing harmful toxins from the bloodstream.
Applying sophisticated machine learning techniques to screen these materials on a large scale, a methodology currently being developed by our Engineering group.
Synthesizing and/or selecting optimal materials based on the unique properties required for effective toxin removal, thereby directly contributing to the design of more efficient and patient-centered hemodialysis treatments.
Collaboration with Kidney Research UK and access to their NURTuRE biobank provide a rich, real-world context for your research, offering the opportunity to validate your findings against an extensive range of patient data.

Translational Potential and Expected Impact
This project aims to make significant academic contributions and also holds the potential to translate into real-world clinical applications that could drastically reduce patient mortality. We expect this project to lay the basis for interdisciplinary research between the involved groups and to provide evidence for larger studies.

Training and Development Outcomes for the Student
Through this project, the student will gain invaluable skills in both the practical and theoretical aspects of biomedical research. They will develop proficiency in metabolomics and artificial intelligence techniques, learning to interpret complex biological data and to apply machine learning algorithms for real-world applications.
Additionally, the student will enhance their capabilities in molecular simulations and materials science, crucial for addressing clinical challenges. Through collaboration with external partners, such as Kidney Research UK, and interdisciplinary teamwork, they will also improve their communication and project management skills. This comprehensive training will prepare them for a successful career in bioinformatics and materials engineering.

References
[1] The Kidney Project, University of California San Francisco, https://pharm.ucsf.edu/kidney
[2] S. Al Awadhi et al., A Metabolomics Approach to Identify Metabolites Associated With Mortality in Patients Receiving Maintenance Hemodialysis, Kidney Int Rep 2024 9, 2718–26.
[3] S. Kalim et al., A Plasma Long‐Chain Acylcarnitine Predicts Cardiovascular Mortality in Incident Dialysis Patients, J American Heart Association 2, 2013.
[4] Hu, J.-R., et al., Serum Metabolites and Cardiac Death in Patients on Hemodialysis, Clin J Am Society of Nephrology 14(5): 747-749, 2019.
[5] https://nurturebiobank.org/ , visited on 4th October 2025.
[6] T. Fabiani et al., In silico screening of nanoporous materials for urea removal in hemodialysis applications, Phys. Chem. Chem. Phys., 2023, 25, 24069.
[7] REDIAL, redefining hemodialysis with data-driven materials innovation, project https://www.suspromgroup.eng.ed.ac.uk/redial
[8] Zarghamidehagani and De Angelis, Machine learning-driven computational screening of covalent organic frameworks for gas separation applications, Separation and Purification Technology, 2025, 377, 134358.
[9] Zarghamidehagani et al., Chemical engineering contribution to hemodialysis innovation: achieving the wearable artificial kidneys with nanomaterial based dialysate regeneration, Physical Sciences Reviews, 2025, 10(3), pp. 279–299

AI for Enhanced Decision-Making for Imaged Abnormalities of the Pancreas
Early detection of pancreas cancers and pre-malignant lesions offers the best chance of cure for pancreatic cancer.
Currently, patients at risk are managed through frequent imaging and clinical assessment, processes that are manual, time-consuming, and prone to error. This project will develop an AI system integrating imaging models and clinical data to detect early malignant transformations in the pancreas.

Supervisory team
Eleonora D’Arnese, Amir Vaxman and Damian Mole

Project Partner
NHS Lothian

Project Background
Abnormalities in the pancreas detected on CT carry a risk of malignant transformation and require long-term surveillance. The incidence of such referrals is rising rapidly due to the increased number of scans done for other reasons, placing increasing demand on skilled specialists who must manually compare scans over time. This process is labour-intensive, costly, and prone to error: false negatives can delay treatment or allow cancers to go undetected, while false positives may lead to unnecessary surgery. Moreover, patients with low- or negligible-risk abnormalities are subjected to prolonged and expensive monitoring, impacting their well-being. Early detection of malignant transformation of abnormal areas could substantially improve survival.

Project Aims
The primary goal of this project is to develop an AI-based image analysis and decision-making augmentation solution for the surveillance and early detection of cancers or pre-cancers in the pancreas. This project will create a new tool that, starting from routinely acquired images and clinical data, will monitor, analyse, and inform decision-making.

Training and Development Outcomes for the Student
The student will train in: AI for scientific computation, medical image processing, geometry processing, and clinical imaging diagnostics. The research will begin with sandboxed training examples (which could be synthetic) to develop the algorithms, before progressing to the clinical dataset for further development.
Concrete development outcomes are: 1) Acquire fundamental AI, scientific computation, and diagnostic skills. 2) Create a mature proof-of-concept for pancreatic abnormality analysis. 3) Develop an algorithm using real-world clinical data to meet standardized diagnostic metrics.

Machine learning driven clinical prediction models using multimodal data for robot-assisted surgical informatics

With the remarkable progress in Artificial Intelligence (AI), particularly in the field of Transformers, machine learning-driven clinical prediction models (CPMs) are gaining prominence in the literature [1]. However, most of these models are yet to be applied in practice for real-world clinical decision-making. To translate these tools into real-world applications, they need to be accessible, adaptable, and actionable. In this project, we will develop usable models and assess their translation potential for decision-making in robot-assisted surgery (RAS). Recent advances in RAS have revolutionized healthcare and allowed the collection of real-time data before, during, and after surgery that can assist critical decisions about when these surgeries should be offered and what complications might arise from them. A usable predictive model will facilitate this and lead to safer decision-making, reducing the burden on individuals and the healthcare system.

Supervisory team
Sohan Seth and Ewen Harrison

Project Partner
Intuitive

Project Background
Recent years have witnessed significant progress in machine learning driven clinical prediction models [1]. These models have been shown to be robust, accurate and well calibrated on various publicly available benchmark datasets, e.g., MIMIC-IV. 
Using these models in practice, however, is not straightforward: it additionally requires them to be accessible, adaptable, and actionable, so that they can deal with multimodal data under competing risks, predict various outcomes of interest simultaneously in real time, and present their decisions in a human-interpretable manner to guide practical decisions under various resource and safety constraints. This is challenging, particularly in high-stakes environments such as lifesaving surgeries. As a result, these models are yet to be widely applied to decision-making in clinical practice. Recent technological advancements have brought the advent of robot-assisted surgeries, making surgery safer and providing real-time measurements that pave the way for data-driven decision-making. But critical decisions remain to be made around the selection of surgery, in particular whether the benefit from surgery outweighs the complications for postoperative care. A better sense of factual and counterfactual situations over multiple outcomes and constraints provides a holistic view of treatment that supports more informed decision-making at an individual level and resource allocation at a healthcare level.

Project Aims
We aim to develop predictive models that are accessible, i.e., the model’s decision is understandable to the end-users and traceable to the features responsible for the decision; adaptable, i.e., the model can be transferred to different populations relatively easily and adapted to a changing environment; and actionable, i.e., the model can integrate various data sources at potentially multiple resolutions and can provide real-time outcomes from longitudinal data. 
We aim to assess the model’s usefulness in uncovering the mechanisms that drive complications, resilience, and recovery, and in testing whether different surgical approaches truly minimise physiological stress across diverse patient groups.

Translational Potential and Expected Impact
The project develops machine learning driven clinical prediction models with the goal of making them usable in practice. It aims to assess the translation of a recently developed method into a real-world application. The current technology is at Technology Readiness Level 3, and we expect the project to explore its performance on real data beyond publicly available benchmarks, potentially moving it towards Technology Readiness Level 4. We also expect the project to evaluate performance beyond accuracy and calibration, and to establish the method on various usability metrics based on transparency, traceability, accessibility, privacy, adaptability, etc. We expect the project to push the boundaries of translation-ready clinical predictive models and set standards in data and methods practices in healthcare informatics. The successful completion of the project will enable clinicians to make real-world decisions around life-saving surgeries and post-operative care.

Training and Development Outcomes for the Student
We expect the project to train the prospective student in cutting-edge AI tools and health informatics. The project requires developing machine learning models and deploying these models in clinical decision-making. The project also involves understanding the clinical variables and their pre-processing and interpretation. The student will be based in the Data Science Unit (DSU) at the School of Informatics. The DSU hosts a diverse range of researchers working in various disciplines, including health, social science, chemistry, and geosciences. This gives the student broad exposure across disciplines. 
The student will also be based in the Surgical Informatics group, which hosts researchers with a range of clinical and health informatics expertise, allowing the student to learn from a discipline beyond informatics.

References
[1] https://doi.org/10.1038/s41586-025-09529-3

AI-Driven Multimodal Alignment for Predicting Treatment Outcomes in Renal Cell Carcinoma

Immune checkpoint inhibitors (ICIs) have markedly improved survival for several cancers, but safe, effective deployment in the NHS requires better tools and data to optimise use and manage toxicities. A major unmet need is robust biomarkers that distinguish responders from non-responders, predict immune-related adverse events and guide personalised therapy.

This project addresses that gap by integrating RNA sequencing, whole-exome sequencing and immunofluorescence imaging to predict treatment outcomes and discover novel biomarkers. It will develop advanced AI-based multimodal learning methods to align and combine these diverse data types, aiming to deliver a more accurate, comprehensive picture of tumour–immune interactions.

Supervisory team
Ajitha Rajan, Siddarth N. and Alexander Laird

Project Partners
NHS Lothian, Francis Crick Institute and University of Calgary

The external partner will provide patient data including RNA-seq, exome-seq and immunohistochemistry data, accompanied by high-quality clinical data with the treatment given. NHS Lothian and Francis Crick Institute will also provide clinical expertise in understanding the data modalities, guidance on the alignment of modalities, and interpretation and validation of results from the AI models. Javier Alfaro at the University of Calgary will help with preprocessing raw sequencing data and interpreting the RNA-seq and exome-seq data.

Project Background
Renal cell carcinoma (RCC) is the most common form of kidney cancer and the eighth most prevalent cancer in the UK. 
Immunotherapy, which activates the immune system to target cancer cells, has transformed outcomes in several malignancies, including melanoma, lung cancer, and metastatic RCC (mRCC). However, most patients with mRCC ultimately develop intrinsic/acquired resistance to ICI and die of their disease. Moreover, immune-related adverse events (irAEs) can restrict the safe use of immune checkpoint inhibitors, affecting treatment efficacy and patient quality of life.

Identifying robust biomarkers that predict treatment response, resistance, and irAEs remains an urgent and unmet clinical need. Addressing this gap would enable more precise patient stratification, improve therapeutic outcomes, and optimise allocation of healthcare resources.

This project will develop novel AI innovations in multimodal data integration and modelling using in-depth patient profiles containing RNA sequencing, whole-exome sequencing, and immunofluorescence data. The key contribution will be understanding the role of these modalities in immunotherapy response, what information they convey, and how they align with each other, in order to discover biomarkers and predict treatment outcomes.

Understanding and analysing patient data will involve close collaboration with Dr. Alex Laird, a consultant urological surgeon at the Western General Hospital, and clinical researchers at Francis Crick Institute.

Project Aims
- Biomarker discovery and treatment outcomes (progression-free survival)
- Assess unimodal efficacy
- Multimodal alignment and efficacy for biomarkers
- Uncertainty quantification in the unimodal and multimodal settings for prediction
- Evaluation using patient data

Data and Methodology
Data - Data from 122 patients for all three modalities (incl. clinical and blood) is already available to use. Additional data sources (e.g. through Glasgow hospitals) will also be explored in the first year of the project. 
The supervision team has experience in data sharing agreements and accessing NHS data for renal cancer (prior H2020 project: KATY). The PhD will also use the existing multimodal data on renal cell carcinoma (multi-omics, histopathology and clinical data) collected as part of the MANIFEST project (funded by MRC and led by Francis Crick Institute, of which Dr. Amy Strange, Prof. Ajitha Rajan and Dr. Alex Laird are a part) to design predictive AI models for biomarker discovery and for predicting treatment outcomes. In particular, the MANIFEST project has access to the following clinical trial data:

RAMPART – Sample size 551. The study looks at two new immunotherapy treatments. The aim was to find out whether taking one drug (durvalumab) or a combination of two drugs (durvalumab and tremelimumab) for one year can prevent or delay kidney cancer from coming back compared to the current standard of care (active monitoring after surgery).
PRISM [5] – Sample size 192. The aim of the PRISM study is to assess whether less frequent dosing of ipilimumab (12-weekly versus 3-weekly), in combination with nivolumab, is associated with a favourable toxicity profile without adversely impacting efficacy.
MITRE [6] – Sample size 81. The MITRE study explores and validates a microbiome signature in a larger scale prospective study across several different cancer types.

The Cancer Genome Atlas Program (TCGA) will also be used.

Methodology - Research challenges to be addressed in this project are as follows:
1. Unimodal: Explore SOTA methods for prediction and review results across modalities (late fusion) to:
- Understand which modalities provide the best predictors
- Understand the relationships between modalities
- Understand the optimal combination and ordering of a pipeline of models which could be mapped to clinical care (and collection of samples).
2. Integration: Explore data integration to see where and how modalities can be combined to provide further insight (via intermediate fusion). 
Situations involving varying amounts of overlap across modalities will require developing novel approaches. For example, similar modalities such as omics might allow earlier integration (i.e., jointly learning representations), whereas other cases involving distinct image modalities may be integrated later. A hierarchy of integrations can ensure fusion between modalities is effective at each level.
3. Alignment: Evaluate the extent to which information can be aligned between different modalities and whether such alignment enhances the prediction of treatment outcomes. This involves using information-theoretic measures to identify and quantify alignment, investigating methods to merge aligned information at different stages, and learning such alignment from scratch. Techniques such as AJIVE [3] and latent-space exploration will be employed to assess and interpret cross-modal relationships.
4. UQ: Develop metrics to associate model predictions with a degree of confidence for clinical correlations. Explore Stochastic Weight Averaging-Gaussian (SWAG) [1] and EpiNets [2] as initial techniques.
5. Evaluation: Conduct an evaluation study with clinicians to evaluate the accuracy of the model outputs and explanations for treatment outcomes with patient data from NHS Lothian and MANIFEST cohorts.

Translational Potential and Expected Impact
- A suite of unimodal models to predict immunotherapy response and toxicities
- Multimodal fusion for modular integration of unimodal immunotherapy response models, and an understanding of which modalities carry shared and disjoint information on immunotherapy response
- Uncertainty quantification as a human oversight measure for confident predictions

The project has considerable scope for impact in the clinic and in industry through the network of clinical partners from the recently completed H2020 project KATY, other NHS collaborators, and the extensive network of industry partners in the MANIFEST project.

Training and Development Outcomes for the Student
The PhD student will develop expertise in multimodal AI for 
predicting renal cancer treatment response, gaining skills in integrating imaging, omics, and clinical data using deep learning, multi modal alignment and uncertainty quantification of AI methods. They will learn robust model development, validation, and reproducibility practices while ensuring ethical and responsible use of patient data. Domain knowledge in renal cell carcinoma, biomarkers, and treatment mechanisms will be strengthened. The student will enhance scientific communication, collaboration with clinicians, and project management abilities, contributing to multidisciplinary research outputs and publications, preparing for careers in academia, healthcare AI, or precision oncology.References1. Maddox, W.J., Izmailov, P., Garipov, T., Vetrov, D.P. and Wilson, A.G., 2019. A simple baseline for bayesian uncertainty in deep learning. Advances in neural information processing systems, 32.2. Osband, I., Wen, Z., Asghari, S. M., Dwaracherla, V., Ibrahimi, M., Lu, X., & Van Roy, B. (2023). Epistemic neural networks. Advances in Neural Information Processing Systems, 36, 2795-2823.3. Feng, Qing, et al. "Angle-based joint and individual variation explained." Journal of multivariate analysis 166 (2018): 241-265.4. Qu, Linhao, et al. "Multi-modal data binding for survival analysis modeling with incomplete data and annotations." International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer Nature Switzerland, 2024.5. Buckley HL, Collinson FJ, Ainsworth G, Poad H, Flanagan L, Katona E, Howard HC, Murden G, Banks RE, Brown J, Velikova G, Waddell T, Fife K, Nathan PD, Larkin J, Powles T, Brown SR, Vasudev NS. PRISM protocol: a randomised phase II trial of nivolumab in combination with alternatively scheduled ipilimumab in first-line treatment of patients with advanced or metastatic renal cell carcinoma. BMC Cancer. 2019 Nov 14;19(1):1102. doi: 10.1186/s12885-019-6273-1. PMID: 31727024; PMCID: PMC6854710.6. 
Thompson NA, Stewart GD, Welsh SJ, Doherty GJ, Robinson MJ, Neville BA, Vervier K, Harris SR, Adams DJ, Dalchau K, Bruce D, Demiris N, Lawley TD, Corrie PG. The MITRE trial protocol: a study to evaluate the microbiome as a biomarker of efficacy and toxicity in cancer patients receiving immune checkpoint inhibitor therapy. BMC Cancer. 2022 Jan 24;22(1):99. doi: 10.1186/s12885-021-09156-x. PMID: 35073853; PMCID: PMC8785032.

BRAID: Breast Radiological AI-Integrated Cancer Diagnosis – A Clinician-Centric Framework

Deep learning has demonstrated strong performance in medical imaging, yet its clinical adoption remains limited due to the opaque, black-box nature of many models. In high-stakes settings like cancer diagnostics, accuracy alone is insufficient; clinicians need clear, interpretable explanations to ensure patient safety and build confidence in AI-assisted decisions. This project will therefore develop a clinician-driven AI framework focused specifically on breast cancer diagnosis: a transparent, explainable, and robust solution to support effective, safe, and trustworthy decision-making in real clinical settings.

Supervisory team
Ajitha Rajan, Eleonora D'Arnese and Rishi Ramaesh

Project Partners
NHS Lothian and Erasmus MC (Rotterdam)

NHS Lothian will support this project by helping gain access to anonymised patient mammograms and breast MRI data, together with associated clinical and demographic information, through Public Health Scotland and eDRiS in accordance with ethical and data governance regulations.

Rishi's clinical team, including consultant radiologists and breast imaging specialists, will contribute clinical insights to guide model design and interpretation, ensuring alignment with diagnostic workflows and clinical reasoning. They will also assist in defining clinically relevant causal relationships to inform the project’s causal graph design. 
Additionally, they will participate in the evaluation of AI outputs, providing structured feedback from several radiologists to assess interpretability, usability, and clinical impact. Finally, Rishi, as NHS Innovation fellow, will support translation of the project into clinical practice.

Jacob Visser at Erasmus MC in Rotterdam will provide guidance on identifying diagnostically relevant image features and clinical variables to inform model development and causal analysis. He will also assist in defining clinically meaningful causal relationships that can be used to build and validate the project’s causal reasoning framework, ensuring that the resulting AI models align with real-world diagnostic reasoning. As the project progresses, Erasmus MC radiologists will participate in the clinical evaluation of AI-generated outputs, offering qualitative and quantitative feedback on interpretability, trustworthiness, and clinical utility.

Having experts at two different sites will also mitigate expert and location bias for the project.

Project Background
Breast cancer is the most common type of cancer in women, with around 55,000 people being diagnosed with the disease yearly. Currently, UK women between 50 and 71 are invited to be screened every 2 to 3 years to help detect cases. This equates to around 2.1 million breast cancer screens carried out annually, helping to prevent around 1,300 deaths. Accurate and timely diagnosis is critical in the management of breast cancer, with early detection through imaging significantly improving patient outcomes, reducing the need for invasive interventions, and easing the financial and operational burden on healthcare services. However, recent reports have highlighted persistent challenges in interpreting medical imaging within healthcare services like the NHS. 
In response, several artificial intelligence (AI) tools are being trialled in hospitals to assist radiographers by triaging images, prioritising abnormal findings, and expediting urgent cases. While these developments demonstrate the potential of AI to enhance diagnostic workflows, the opaque, black-box nature of many deep learning-based systems poses a significant barrier to clinical integration. For AI tools to be fully adopted and trusted in sensitive, high-stakes settings such as breast cancer imaging, it is essential to develop interpretable, transparent models that provide clear, understandable reasoning alongside diagnostic outputs.

Project Aims
The project aims to develop a clinician-driven AI framework for breast cancer diagnosis: one that is transparent, explainable, and robust, to support effective, safe, and trustworthy decision-making. The approach will prioritise clinical interpretability and reasoning, aiming to build models that perform well while providing meaningful insights that clinicians can trust and act upon. The framework will require the development and integration of transparent and interpretable AI models, causal reasoning, and robustness, the last of which will rely on generative AI-based synthetic images.

Data and Methodology
Data - We aim to have 100K mammograms and 10K breast MRIs from patients in Scotland. There are more mammograms because mammography is the primary modality for routine scanning. We plan to gain HSC-PBPP approval for accessing this data before January 2027. The supervisory team have experience in obtaining HSC-PBPP approval for imaging data from Public Health Scotland through eDRiS and working in the national safe haven. This prior experience will mitigate the risk around access to patient image data.

Methodology - To ensure model transparency and clinical interpretability, a set of clinically meaningful concepts for both mammograms and MRIs will be defined. 
These sets will be derived from the BI-RADS atlas and refined through collaboration with radiologists to ensure clinical relevance. These concepts will form the basis of a concept bottleneck model, a two-stage classification architecture designed to enhance interpretability. The first model predicts the presence of individual clinical concepts directly from images, while the second model takes these predicted concepts and outputs the final diagnostic label, the BI-RADS score (similar to our recent work in [1]). In addition, causal structures will be defined to reflect expert understanding of the relationships between imaging features and diagnostic outcomes, as specified in the BI-RADS lexicon. The resulting causal graphs will encode how specific imaging features (e.g., calcifications) causally contribute to BI-RADS scores, distinguishing them from mere associations.

Finally, to evaluate the robustness of the proposed solution to misdiagnosis, a generative AI pipeline will be developed to produce synthetic mammograms and MRIs that simulate real-world diagnostic uncertainties and misinterpretations (similar to our work for chest X-rays in [2]). These adversarial cases will be generated by perturbing the concept vectors, i.e. modifying the presence, absence, or expression of clinical features based on clinical input. These altered vectors will be used to produce synthetic reports describing the perturbed findings. These reports, in turn, will condition image generation models to produce corresponding synthetic scans, ensuring coherence between the report and the visual content.

Translational Potential and Expected Impact
The project aims to deliver a trustworthy, interpretable, and robust AI system for breast cancer diagnosis, co-designed with clinicians and validated on real patient data. This has the potential to greatly improve healthcare by enabling faster, more accurate diagnoses, reducing patient wait times, and easing the burden on radiologists. 
Currently, two specialists are required per mammogram; the proposed solution could reduce this to one without compromising safety, thanks to its transparent, explainable outputs and ability to triage and highlight abnormalities. This will allow radiologists to focus on complex cases, reduce diagnostic errors, and generate significant operational and economic benefits.

Training and Development Outcomes for the Student
The PhD will train the student in developing transparent, interpretable, and robust AI for breast cancer diagnosis. They will gain expertise in machine learning, explainable AI, causal inference, and generative modelling for synthetic medical images. Training includes medical image analysis, ethical data governance, and interdisciplinary collaboration with clinicians to ensure clinical relevance. The student will develop strong research, communication, and project management skills through publications, presentations, and teamwork with NHS and academic partners. By completion, they will be equipped to lead research in trustworthy AI for healthcare, bridging technical innovation and clinical translation.

References
1. Amy Rafferty, Rishi Ramaesh, and Ajitha Rajan. Explainability Through Human-Centric Design for XAI in Lung Cancer Detection. The 34th International Joint Conference on Artificial Intelligence (IJCAI-25), Human-Centred AI track.
2. Amy Rafferty, Rishi Ramaesh and Ajitha Rajan. CoRPA: Adversarial Image Generation for Chest X-rays Using Concept Vector Perturbations and Generative Models. In 13th IEEE International Conference on Healthcare Informatics (ICHI 2025).

Quantifying dementia progression and emotion recognition in a virtual reality environment

Dementia is a progressive syndrome affecting memory, cognition, and spatial navigation, reducing quality of life for people living with dementia (PlwD) and placing strain on carers and healthcare systems. 
This project aims to improve the quantification of dementia progression, as well as emotion recognition, using data collected in a virtual reality environment that engages PlwD in personalised navigation tasks. Leveraging self-supervised deep learning and explainable artificial intelligence, the student will identify interpretable navigational biomarkers of disease progression and emotional reactions. The findings hold promise for earlier detection, personalised interventions, and scalable, cost-effective support to improve outcomes for PlwD.

Supervisory team
Arno Onken and Vito De Feo

Project Partner
Bike Labyrinth

Bike Labyrinth will provide biomedical relevance in the form of use cases for virtual environments to assist in improving spatial navigation and memory abilities. The company will also provide access to their development and production facilities for conveying product development needs, and coach the student during monthly meetings.

Project background
Dementia is experienced as an ongoing decline in brain functions, including reasoning, memory, spatial navigation, and keeping track of time. Consequently, People living with Dementia (PlwD) tend to have additional difficulties affecting their cognitive, mental and physical abilities, which not only impacts their own quality of life but also poses challenges for their families and carers. For example, in its early stages, Alzheimer's Disease (AD) causes difficulties in dealing with new information. As AD progresses, memory loss affects sufferers’ ability to plan and carry out day-to-day tasks, and problems with spatial navigation make it more difficult for sufferers to reliably find their way back home from familiar places. Physical activity can help remedy some of this decline due to its benefits for brain health; however, taking part in physical activity can be particularly challenging for PlwD. 
These issues not only reduce the quality of life of PlwD but also lead to unnecessary hospitalisations and delays in hospital discharge, underlining the importance of effective preventive and supportive solutions before hospitalisation.

Project aims
The aim of this project is to improve the quantification of dementia progression and emotion recognition using data collected from a virtual reality (VR) environment. To this end, the student will leverage the latest developments in self-supervised deep learning and explainable artificial intelligence to find navigational features that best characterise disease progression and emotional reactions.

Data and Methodology
Dementia is a multi-dimensional syndrome that originates in the brain, affecting different functionalities, such as memory, cognition, spatial and temporal orientation, and emotional regulation. People living with dementia often have multiple comorbidities affecting their physical functioning as well as their overall quality of life and social health.

The external industry partner Bike Labyrinth is developing a VR-enhanced exercise bike, an easy-to-use, engaging and safe training environment. PlwD can train their spatial navigation and memory abilities by moving in a 3D virtual environment simulating their local city. This device does not require permanent involvement of specialist personnel and can be used to evaluate the performance of subjects without the need to create separate physical and cognitive measures. It can be personalised to any specific city and the individual needs of subjects, allowing person-centred training and real-time adaptation. While the bike is still at the prototype stage, the VR environment can already be explored using a simulator.

AI systems have been used to assess the state and predict the progression of cognitive decline (Jiang et al., 2020). Using the VR environment allows us to assess the navigational and memory performance of subjects. 
Suitable features have already been used in early diagnosis of AD, namely average steps and path-efficiency (Jiang et al., 2020). We will build on these insights and enhance them using data-driven modelling. We will use self-supervised deep learning techniques to model subject navigation in the virtual environments and use explainable AI techniques such as LIME and SHAP (Hassija et al., 2024, Vimbi et al., 2024) to find interpretable features that characterise progression of dementia. This will allow us to accurately and automatically quantify changes in the performance of the subject as related to pathogenesis of the disease.

Translational Potential and Expected Impact
There are ~50 million people living with dementia worldwide and this is predicted to rise to 152 million by 2050. In the UK, 700,000 family carers look after the 850,000 people living with dementia, and the number of people living with dementia is expected to rise to 1.6 million by 2040. Current UK costs of dementia for older people are £34.7 billion a year, including healthcare (£4.9 billion), social care (£15.7 billion) and unpaid care (£13.9 billion). Total UK dementia care costs are projected to increase to £94.1 billion by 2040. Better quantification of disease progression holds promise to improve quality of life of PlwD.

References
Jiang, J., Zhai, G., & Jiang, Z. (2020, June). Modeling the self-navigation behavior of patients with Alzheimer’s disease in virtual reality. In International Conference on VR/AR and 3D Displays (pp. 121-136). Singapore: Springer Singapore.
Hassija, V., Chamola, V., Mahapatra, A. et al. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn Comput 16, 45–74 (2024). https://doi.org/10.1007/s12559-023-10179-8
Vimbi, V., Shaffi, N. & Mahmud, M. Interpreting artificial intelligence models: a systematic review on the application of LIME and SHAP in Alzheimer’s disease detection. Brain Inf. 11, 10 (2024). 
https://doi.org/10.1186/s40708-024-00222-1

AI-based assessment and validation of brain mineral deposition in its different forms detected from routine clinical brain magnetic resonance images

This project will develop an AI-based method and tool for segmenting iron and calcium accumulation throughout the whole brain and in its different forms (tissue deposition, brain microbleeds, superficial siderosis, and haemorrhagic transformations from ischaemic lesions) in a large sample of MRI scans acquired from different patient groups. It will assess the degree of mineral accumulation in the segmented areas, offering a proxy for insoluble iron/calcium concentration and degree of aggregation (i.e., clustering) in different subregions (also using AI methods). Finally, it will validate the AI-based imaging computational assessments using complementary biomedical analysis methods in a sample of individuals with brain MRI, retinal images, and tissue samples.

Supervisory team
Maria Valdés Hernández, Blanca Diaz-Castro and Miguel O. Bernabeu Llinares

Project Partners
Pharmatics Ltd

Project Background
Iron is involved in oxygen transport and is essential for maintaining healthy body function. But an excess of it can lead to oxidative stress damage to biomolecules, as well as cellular dysfunction. This process becomes apparent with increasing age, as iron accumulates in the brain and increases the risk of neurodegenerative diseases. Overall, it is the strongest factor influencing cognitive decline in normal ageing. Although this process mainly occurs gradually and silently, it can be detected using magnetic resonance imaging (MRI) even in the preclinical stages, when minor cognitive concerns are starting to occur and before any other clinical symptom appears. In normal ageing this toxic iron accumulation mainly occurs in the globus pallidus, a subregion at the centre of the brain. In individuals with neurodegenerative diseases it has different spatial distribution patterns. 
We previously developed an automatic method to identify and segment the areas of iron accumulation in normal ageing MRI scans and validated it with a physical phantom. But we could not establish the degree of mineral accumulation in the segmented areas, which is most important for predictive medicine. Moreover, our method was limited to a small brain region, given the computational power available at the time.
Project Aims
This project will develop an AI-based method and tool for segmenting iron and calcium accumulation throughout the whole brain and in its different forms (tissue deposition, brain microbleeds, superficial siderosis, and haemorrhagic transformations from ischaemic lesions) in a large sample of MRI images acquired from different patient groups, assess the degree of mineral accumulation in the areas segmented offering a proxy for insoluble iron/calcium concentration and degree of aggregation (i.e., clustering) in different subregions (also using AI methods), and validate the AI-based imaging computational assessments using complementary biomedical analysis methods in a sample of individuals with both brain MRI and tissue samples.
Data and Methodology
The student will use well-phenotyped data with carefully generated ground truth from studies conducted at the Centre for Clinical Brain Sciences (1,2) to develop the iron deposition assessment method, which will give as output differential probabilistic masks of various forms of iron deposits throughout the whole brain. The breadth of data available for the project includes routine clinical MRI, vascular function and blood-brain-barrier permeability measurements, and clinical, demographic, and cognitive information from each of the studies’ participants (approximately 1200). The tissue samples from which iron concentration curves are derived come from ~20 brains from the study on cognitive ageing, which also has brain MRI acquired in five assessment waves, three years apart (2). 
The tissue samples were imaged at 7T MRI and the co-supervisor has aligned both modalities (3). The co-supervisor of the project has experience in proteomics analyses in relation to small vessel disease, to discern which imaging phenotypes involve different forms of iron deposition. Therefore, preliminary data held by the co-supervisor of the project may be useful in further validating the developed method. More tissue-MRI pair samples have also been acquired from tissue banks, reaching a total of 80 samples (3).
Once the AI assessment method is validated using the in-house data from different studies, MRI data from online repositories will be downloaded to test and re-train the AI model for increased robustness and reduced bias. Finally, given the strong association between these deposits and dementia progression, we will upload the model to the National Safe Haven and apply it to the National Scottish Registry MRI data to estimate its dementia prediction accuracy relative to that achieved using currently available methods.
(1) Clancy et al 2021 https://doi.org/10.1177/2396987320929617 (2) Taylor et al 2018 https://doi.org/10.1093/ije/dyy022 (3) Humphreys et al 2019 https://doi.org/10.1177/1747493018799962
Translational Potential and Expected Impact
This project offers a rare opportunity to work on a clinically relevant theme, address a clinical need, and work with a breadth of data of different modalities and kinds. It goes beyond the conventional use of computational descriptors for validating the AI-based method, using clinically relevant data to ensure its impact and further applicability in clinical research and practice. 
The project might enhance existing MRI instruments and methods by integrating Artificial Intelligence and has substantial scientific and commercial potential.
Training and Development Outcomes for the Student
The student will be trained in MRI mineral deposition identification and in the current knowledge around it, and will be exposed to real-world clinical and research neuroimaging work, as well as the laboratory (biological, proteomics) work underpinning the clinical neuroimages. The student will also be trained in medical image processing methods, and exposed to commercial (industry), translational and research environments with an emphasis on fair AI. By the end of the PhD, the student is expected to have acquired a high level of knowledge on the theme and to have developed a prototype that can be commercially viable as an add-on module for clinical and research MRI platforms.
References
(1) Ji Y, Zheng K, Li S, et al. Insight into the potential role of ferroptosis in neurodegenerative diseases. Front Cell Neurosci. 2022 Oct 27;16:1005182. https://doi.org/10.3389/fncel.2022.1005182 (2) Valdés Hernández M, Allerhand M, Glatz A, et al. Do white matter hyperintensities mediate the association between brain iron deposition and cognitive abilities in older people? Eur J Neurol. 2016 Jul;23(7):1202-9. https://doi.org/10.1111/ene.13006 (3) Valdés Hernández, M., Ritchie, S., Glatz, A. et al. Brain iron deposits and lifespan cognitive ability. AGE 37, 100 (2015). https://doi.org/10.1007/s11357-015-9837-2 (4) Valdés Hernández Mdel C, Glatz A, Kiker AJ, et al. Differentiation of calcified regions and iron deposits in the ageing brain on conventional structural MR images. J Magn Reson Imaging. 2014 Aug;40(2):324-33. https://doi.org/10.1002/jmri.24348 (5) Glatz A, Bastin ME, Kiker AJ, Deary IJ, Wardlaw JM, Valdés Hernández MC. Automated segmentation of multifocal basal ganglia T2*-weighted MRI hypointensities. Neuroimage. 2015 Jan 15;105:332-46. 
https://doi.org/10.1016/j.neuroimage.2014.10.001 (6) Clancy U, Garcia DJ, Stringer MS, Thrippleton MJ, Valdés-Hernández MC, Wiseman S, Hamilton OK, Chappell FM, Brown R, Blair GW, Hewins W, Sleight E, Ballerini L, Bastin ME, Maniega SM, MacGillivray T, Hetherington K, Hamid C, Arteaga C, Morgan AG, Manning C, Backhouse E, Hamilton I, Job D, Marshall I, Doubal FN, Wardlaw JM. Rationale and design of a longitudinal study of cerebral small vessel diseases, clinical and imaging outcomes in patients presenting with mild ischaemic stroke: Mild Stroke Study 3. Eur Stroke J. 2021 Mar;6(1):81-88. https://doi.org/10.1177/2396987320929617 (7) Taylor AM, Pattie A, Deary IJ. Cohort Profile Update: The Lothian Birth Cohorts of 1921 and 1936. Int J Epidemiol. 2018 Aug 1;47(4):1042-1042r. https://doi.org/10.1093/ije/dyy022 (8) Humphreys CA, Jansen MA, Muñoz Maniega S, González-Castro V, Pernet C, Deary IJ, Al-Shahi Salman R, Wardlaw JM, Smith C. A protocol for precise comparisons of small vessel disease lesions between ex vivo magnetic resonance imaging and histopathology. Int J Stroke. 2019 Apr;14(3):310-320. https://doi.org/10.1177/1747493018799962 Developing Novel Data-Driven Tools and Methodologies to Understand Inequalities in Maternity Vaccination Uptake in Scotland This PhD project aims to investigate inequalities in maternity vaccine uptake in Scotland using advanced health data science methods. Leveraging national electronic health records and the DataLoch Respiratory Registry, the study will develop computational models to identify patterns in vaccine delivery and access across demographic and clinical factors. Complementary qualitative research will explore maternal attitudes and healthcare delivery models, particularly among underserved groups. 
By integrating biomedical, clinical, and behavioural data, the project will create a novel, data-driven framework to inform targeted public health interventions and improve maternal and neonatal health outcomes across diverse populations.
Supervisory team
Ting Shi, Louisa Pollock and Cheryl Gibbons
Project Partners
Public Health Scotland
Project Background
Vaccination during pregnancy is a well-established public health strategy that protects both mother and infant from potentially severe infectious diseases. Vaccines against influenza, pertussis, and, most recently, respiratory syncytial virus (RSV) have been shown to significantly reduce disease burden in the infant period. In the UK, these vaccines are recommended for all pregnant individuals as part of routine antenatal care. However, uptake remains inconsistent. In Scotland, uptake varies substantially between NHS Health Boards and across demographic groups. Notably, uptake of the maternal RSV vaccine was approximately 50% during its first year of introduction, but it ranged from 38.2% in the most deprived group to 56.1% in the least deprived (a 17.9 percentage point difference), and from 23.1% to 57.6% across NHS Health Boards. This could indicate variation or gaps in vaccine delivery and engagement.
Understanding the factors driving this variation is crucial for improving health outcomes and ensuring equitable access to care. Yet these drivers remain poorly understood, partly due to challenges in how maternity vaccination data are captured, standardised, and made available for analysis. Data are often fragmented across systems, with limited integration between vaccination records and clinical or sociodemographic information. Additionally, differences in service delivery models, access to care, and patient attitudes likely contribute to disparities but have not been comprehensively studied. 
This project addresses these gaps through the development of new tools and methodologies to enable more effective analysis and targeted intervention.Project AimsThis PhD project will develop innovative tools and methodologies to explore and explain variation in maternity vaccination uptake, with a focus on health inequalities and access. Specifically, our objectives are to:1. Design a new methodological pipeline that integrates and links multiple data sources, including health board-level vaccination records, service delivery data, and patient-level EHRs.2. Develop the visualisation tools to identify gaps in access and inform targeted interventions.3. Use qualitative methodologies, including interviews and focus groups, to capture maternal perceptions, beliefs, and barriers to vaccination, particularly in underserved groups.Data and MethodologyThis project will adopt a mixed-methods approach, integrating advanced computational modelling with qualitative research to investigate inequalities in maternity vaccine uptake. The primary quantitative component will involve the use of routinely collected, linked electronic health records (EHRs) accessed through Scotland’s robust data infrastructure, including the DataLoch Respiratory Registry and the electronic Data Research and Innovation Service (eDRIS).These datasets will be linked at the patient level using the Community Health Index (CHI) number to enable population-scale analyses of vaccine uptake in relation to gestational age, appointment timing, healthcare setting (e.g. hospital vs community), delivery staff (maternity vs immunisation teams), and key socio-demographic factors such as age, ethnicity, deprivation, and rurality. Machine learning models and statistical techniques (e.g. 
logistic regression, clustering, classification algorithms) will be used to identify patterns, high-risk subgroups, and potential intervention points.To complement the quantitative analysis, qualitative methods will explore maternal attitudes and barriers to vaccination. This will involve interviews and/or focus groups with pregnant individuals and healthcare providers, with a particular focus on underserved groups. Insights from this component will help interpret data patterns and support the development of equitable, context-sensitive recommendations.Patient and public involvement (PPI) will be integrated throughout the project. PPI members will contribute to shaping the research questions, ensuring relevance to public health priorities, and co-developing effective dissemination strategies to reach diverse audiences, including the general public, clinicians, and policymakers.Translational Potential and Expected ImpactThis project will deliver a replicable, data-driven framework for understanding and addressing inequalities in maternity vaccine uptake. By integrating computational modelling with qualitative insights and lived experience, the findings will inform tailored public health strategies and support NHS and government efforts to optimise vaccine delivery. The methodologies developed will be scalable across the UK and internationally, enabling targeted interventions for underserved groups. Embedding Patient and Public Involvement (PPI) ensures real-world relevance and impact. Ultimately, the project will contribute to more equitable, efficient, and responsive maternal vaccination programmes and inform national vaccination policy.Training and Development Outcomes for the StudentCore technical areas of learning will include epidemiology, health data science, computational modelling, machine learning and computational approaches in using large datasets from different sources and modalities as well as qualitative research, and science-policy translation. 
The student will develop or extend their expertise in programming languages such as R or Python. We will encourage developing and sharing code with the wider scientific community through platforms such as GitHub. The student will have the opportunity to learn in an academic and public health setting, understanding the applied aspects and context of epidemiology and data analyses. Soft skills in scientific communication and collaboration will be fostered via the interdisciplinary supervisory team, participation in conferences, and publications.
Efficient design of binders using surrogate models
This PhD project addresses the complexity of designing therapeutic protein binders by developing a novel, interpretable abstract representation of proteins informed by Molecular Dynamics (MD) simulations. Current inverse design methods are challenged by the vast sequence-structure space and computational cost. Our primary aim is to create an abstract framework that significantly streamlines the design process, allowing for fast exploration of the protein sequence space to achieve target properties, including specific binding, stability, and desired immunogenicity. This data-driven abstraction, grounded in molecular dynamics, will capture essential features for accurate and efficient design. The approach will be validated using the therapeutically relevant PD-1 system with AstraZeneca. The aim is to overcome limitations in current design methodologies, paving the way for innovations in targeted drug delivery and biosensing.
Supervisory team
Kartic Subr and Chris Wood
Project Partners
AstraZeneca
Project Background
The design of proteins using inverse methods plays a pivotal role in developing therapeutic binders: proteins engineered to attach to specific drug targets with high affinity. 
This PhD project aims to enhance binder design by integrating molecular dynamics (MD) simulations with simplified, interpretable representations of proteins to design highly specific binders. Protein folding, dictated by amino acid sequences, presents significant challenges due to its complexity, especially when designing proteins for specific interactions.To facilitate the design of large protein complexes with targeted binding capabilities, this research focuses on creating and validating simplified, interpretable representations of proteins informed by molecular dynamics simulations. These models will streamline the design processes, making it feasible to design sequences that achieve desired binding properties at a fraction of the computation. The project will utilize computational approaches to optimize these sequences, ensuring that the resulting proteins exhibit the necessary stability, specificity, and immunogenicity profiles. Ultimately, this work intends to produce novel protein binders with significant applications in therapeutics, diagnostics, and synthetic biology. By advancing the methodologies for protein design, this project seeks to overcome limitations in creating efficient binders, paving the way for innovations in targeted drug delivery and biosensing technologies.Project AimsThe primary aim for this project is to develop an abstract representation for proteins that will enable the design of specific therapeutic binders. The representation will be interpretable and explainable while also enabling fast exploration of the design space, based on target properties such as shape, dynamics, and functionality. By integrating molecular dynamics with data-driven abstraction, the framework will capture essential features required for accurate and efficient binder design. 
The approach will be tested using the PD-1 system, a well-known immune pathway with established therapeutic relevance in cancer immunotherapy.Data and MethodologyThe proposed methodology will focus on developing an abstract representation for protein binder design, using an anti-PD1 as a model system targeting the PD-1 receptor involved in immune checkpoint pathways.The research will collect a dataset of known PD-1 structures and existing binders, including peptide sequences, 3D structures, and binding affinities. The dataset will serve as the basis for training machine learning models. The approach involves developing simplified protein representations through techniques like autoencoders, which will distil essential structural features critical for effective binding. Molecular dynamics simulations will validate these models, testing their capacity to predict how novel sequences fold and interact with the PD-1 receptor. The simulations will assess the dynamics and potential energy landscapes of candidate peptides designed using this abstract representation.The project will develop search algorithms based on genetic algorithms to explore the design space, identifying peptide sequences optimized for binding PD-1. This computational search will focus on sequences predicted to exhibit high specificity, stability and immunogenicity profiles.For validation, the project will perform in silico experiments to screen these candidates, employing binding free energy calculations and docking simulations to estimate their affinity for the PD-1 receptor. Successful candidates will move to experimental validation, where peptides will be synthesized and tested at AstraZeneca, Cambridge (external partner) using Surface Plasmon Resonance and isothermal titration calorimetry, measuring real-world binding affinities and kinetics.The feedback loop, incorporating these experimental findings, will continuously refine the model and search algorithms. 
By integrating computational insights with empirical data, this iterative process aims to enhance the precision and effectiveness of designing PD-1 binders, contributing to advancements in cancer immunotherapy.Translational Potential and Expected ImpactThe proposed methodology offers a general framework for designing therapeutic protein binders using abstract representations. By representing proteins through simple fragment-based abstraction, the method enables broader exploration of conformational space while retaining interpretability. The added interpretability allows for a greater understanding of the mechanisms underlying binding and enables steering the design towards desired pharmacological profiles. Using PD-1 as a motivating example, the framework will demonstrate how simplified, explainable models can accelerate the discovery of selective immune checkpoint binders. Ultimately, this approach aims to shorten the path from computational design to effective therapeutics, improving outcomes for patients.Training and Development Outcomes for the StudentThe student will gain expertise in computational modelling, specifically in protein design and molecular dynamics simulations, enhancing their technical skills. They will develop proficiency in machine learning techniques for data analysis and representation design, directly applicable to modern drug discovery.Additionally, the student will learn to conduct interdisciplinary research, bridging computational methods with experimental validation. This includes collaborating with laboratories for in vitro testing, providing practical insights into experimental protocols.The student will also cultivate strong problem-solving abilities and the capacity to translate scientific findings into real-world applications, particularly in drug design and therapy development. Effective communication skills will be enhanced through reporting and presenting research findings to diverse audiences. 
Furthermore, the student will gain a thorough understanding of ethical research practices, preparing them for responsible scientific leadership in their future career.ReferencesCastorina LV, Wood CW, Subr K. From Atoms to Fragments: A Coarse Representation for Functional and Efficient Protein Design. bioRxiv; 2025. DOI: 10.1101/2025.03.19.644162.Crowdsourced Protein Design: Lessons From the Adaptyv EGFR Binder CompetitionTudor-Stefan Cotet, Igor Krawczuk, Filippo Stocco, Noelia Ferruz, Anthony Gitter, Yoichi Kurumida, Lucas de Almeida Machado, Francesco Paesani, Cianna N. Calia, Chance A. Challacombe, Nikhil Haas, Ahmad Qamar, Bruno E. Correia, Martin Pacesa, Lennart Nickel, Kartic Subr, Leonardo V. Castorina, Maxwell J. Campbell, Constance Ferragu, Patrick Kidger, Logan Hallee, Christopher W. Wood, Michael J. Stam, Tadas Kluonis, Süleyman Mert Ünal, Elian Belot, Alexander Naka, Adaptyv Competition OrganizersbioRxiv 2025.04.17.648362; doi: https://doi.org/10.1101/2025.04.17.648362North B, Lehmann A, Dunbrack RL Jr. A new clustering of antibody CDR loop conformations. J Mol Biol. 2011 Feb 18;406(2):228-56. doi: 10.1016/j.jmb.2010.10.030. Epub 2010 Oct 28. PMID: 21035459; PMCID: PMC3065967. Embedded AI for neurodegenerative disease monitoring Accurate tracking of symptoms and progression of multiple sclerosis is essential for drug discovery and disease management. However, current measurement tools are tedious, prone to bias, and do not reflect what people with MS experience. Many symptoms remain invisible and unrecognised. Of these, fatigue is the most debilitating and reported by most patients. Yet no methodology to measure and tackle fatigue exists.Supervisory teamPaul Patras and Thanasis TsanasProject PartnersHoffmann-La RocheProject BackgroundMultiple Sclerosis (MS) is a neurodegenerative autoimmune disease that affects approximately 3 million people worldwide. 
The disease primarily affects the central nervous system, with the immune system attacking the myelin sheath around nerve cells. The symptoms of MS vary broadly and can have debilitating effects. These include loss of vision, numbness, mobility problems, and cognitive decline, among others, which worsen as the disease progresses. Recent studies report that the annual cost of MS-related disability exceeds per capita gross domestic product (GDP), which confirms the major societal cost of this condition.
Project Aims
1) Develop new fine-grained, data-driven fatigue monitoring methods that build upon detailed telemetry gathered using wearable devices and personal living space sensors (motion, pressure, LiDAR, etc.).
2) Label data using correlation analysis with imaging biomarkers, fluid biomarkers, and patient-reported outcomes.
3) Establish baselines with volunteer patients using medical-grade wearable devices.
4) Develop deep learning models that can analyse multi-modal spatio-temporal data to detect early disease-specific symptoms, health improvements, or decline.
5) Develop lightweight compact/approximate data structures and deep learning models that can be deployed on computationally constrained devices.
Translational Potential and Expected Impact
Ultimately, the outcomes of the project seek to improve fatigue management and potentially the efficacy assessment of new drugs. In the long term, the methods developed may assist consultant neurologists in exploring personalised treatment and improve long-term patient outcomes.
This article was published on 2025-11-05