The first large Turkish RoBERTa-style model, developed after PortBERT. It was evaluated extensively on private GPUs and the LRZ BayernKI H100 cluster, and the study highlights that corpus diversity matters more than sheer size.
A domain-adaptive German medical RoBERTa model; the study explores both continued pre-training and training from scratch with specialized vocabularies.
A continued-pretraining extension of GottBERT, developed during a period of transition, finalized as a preprint, and later presented at GlobalNLP@RANLP 2025.