
Raphael Schmitt is an AI and NLP researcher working on large-scale model development, applied information retrieval, and clinical AI systems. He has developed RoBERTa-style language models such as GottBERT, GeistBERT, ChristBERT, and PortBERT, and designed scalable pipelines for medical information retrieval and clinical text mining. His work spans AI infrastructure, database-backed IR systems, medical informatics, applied software engineering, and GPU-based volume rendering and imaging. He combines his technical expertise with a psychological perspective and a personal foundation shaped by Christian faith, martial arts, and diving, supporting clarity, resilience, and grounded decision-making. He is available to support organizations with AI-driven projects, digital transformation, and modern software development.
Download Consulting Profile
Previous roles that shaped my professional expertise
Selected work in AI, NLP, and healthcare

A focused Hebrew RoBERTa project delivering the first large Hebrew model with state-of-the-art benchmark performance; pre-trained on a TPUv4-128 pod and evaluated on private hardware.

The first large Turkish RoBERTa-style model, developed after PortBERT with extensive evaluations on private GPUs and the LRZ BayernKI H100 cluster. The study highlights the importance of corpus variance over sheer size.

A domain-adaptive German medical RoBERTa model, exploring continued pre-training and from-scratch training with specialized vocabularies.

A Portuguese RoBERTa model evaluated during a research stay in the Azores, with an emphasis on efficiency-focused models.

A continued-pretraining extension of GottBERT, developed during a period of transition and finalized as a preprint before being presented at GlobalNLP@RANLP 2025.

A CUDA-based GPU volume rendering engine with a MATLAB interface.

A PostgreSQL Row-Level Security framework for i2b2, including policy generation, automated testing with pgTAP, performance benchmarking, and a Docker-based deployment stack for clinical data warehouses.

A set of reusable React Admin data providers for PostgREST and FHIR, developed to streamline UI development.

A lightweight, API-driven MeSH explorer using PostgREST and React Admin, offering a simple English and German interface for hierarchical navigation.

The first published German RoBERTa-based model family with a clear development path: from its 2020 preprint to the extended EMNLP 2024 version.

Refactoring and dockerization of the MIRACUM NGS analysis pipeline, enabling standardized, reproducible workflows for precision oncology and Molecular Tumor Boards.

A modern medical search engine developed from scratch with a fully open, extensible IR architecture.

Building the ESID Registry add-ons with modern reporting, secure APIs, PostgreSQL/RLS, GraphQL/React, and the Json2Xlsx workflow.

A clinically driven registry rebuilt from a legacy Access database with a custom UI, modular plausibility logic, and an export pipeline.

Efficient octree traversal and ray-based operations for 3D robotic mapping, developed as part of my bachelor thesis.

A family business that shaped my early digital and entrepreneurial skills.