About Me

Philipp S. Sommer

Research software engineer for climate science

  • climate models
  • numerics
  • python
  • open science
  • palaeo
  • pollen
  • statistics
  • physics

My Career

During my Bachelor's degree I realized that I really wanted to do research on how our world is evolving. This wish was even stronger than my joy in the beauty of the mathematics behind physics. Building on strong theoretical skills, I started coding and developing climate models. This has been my path ever since, and I take great pleasure in data analysis and visualization: we can extract so much new knowledge from these large data sets through visual and statistical exploration. Hence I am always keen to learn and develop new techniques and to share them with others.

Helmholtz-Zentrum Hereon

Helmholtz Coastal Data Center (HCDC)

Dec. 2019
Data Scientist

University of Lausanne, Institute of Earth Surface Dynamics (IDYST)

Numerical Tools and Software Solutions for Palaeoclimate Analysis

Dec. 2015
PhD student

Max-Planck-Institute for Meteorology/University of Hamburg

Master in Integrated Climate System Sciences

Sep. 2013
Master student

Greenpeace Germany e.V.

Biodiversity and Climate team

Sep. 2012
Intern

University of Heidelberg

Bachelor in Physics

Sep. 2009
Bachelor student

My Skills

My Projects

HCDC

{"en"=>"Helmholtz Coastal Data Center"}

{"en"=>"The Helmholtz Coastal Data Center (HCDC) is the central point of contact\nfor scientific data management at the Institute of Coastal Research. The\nnew central working group possesses broad expertise and consists of\ngeoscientists, programmers and web developers as well as database and\nmetadata managers. One aim of the unit is to merge data from coastal and\nmarine research with the help of a central data portal and to provide\nsustainable access to different user groups. HCDC therefore equally\naddresses the scientific field, the interested public and decision makers\nwho manage coastal regions and the marine environment.\n", "de"=>"Das Helmholtz Coastal Data Center (HCDC) ist die zentrale Anlaufstelle\nfür wissenschaftliches Datenmanagement am Institut für Küstenforschung.\nDie neue zentrale Arbeitsgruppe verfügt über eine breite Expertise und\nsetzt sich aus Geowissenschaftlern, Programmierern, Webentwicklern sowie\nDatenbank- und Metadatenmanagern zusammen. Ein Ziel dieser Einheit ist\nes, Daten der Küsten- und Meeresforschung mit Hilfe eines zentralen\nDatenportals zusammenzuführen und unterschiedlichen Nutzerkreisen\nnachhaltig nutzbar zu machen. HCDC spricht daher gleichermaßen die\nWissenschaft, die interessierte Öffentlichkeit und Entscheidungsträger\nfür das Management von Küstenregionen und der marinen Umwelt an.\n"}

psyplot

{"en"=>"Python framework for interactive data visualization", "de"=>"Python software zur interaktiven Datenvisualisierung"}

{"en"=>"psyplot is a cross-platform open source python project that mainly\ncombines the plotting utilities of matplotlib and the data management of\nthe xarray package and integrates them into a software that can be used\nvia command-line and via a GUI.\n\nThe main purpose is to have a framework that allows a fast, attractive,\nflexible, easily applicable, easily reproducible and especially an\ninteractive visualization and analysis of data.\n\nThrough various plugins psyplot visualizes georeferenced climate model\ndata on rectangular and unstructured grids, regressions, stratigraphic\ndiagrams and more.\n\nExamples are available at\n[psyplot.github.io/examples](https://psyplot.github.io/examples).\n", "de"=>"psyplot ist ein plattformübergreifendes Open Source Python Projekt das\nhauptsächlich die Visualisierungsfunktionen von matplotlib mit den\nDatenmanagementfunktionen des xarray Pakets vereint. Die entstehende\nSoftware kann dabei sowohl von der Kommandozeile, als auch durch eine\nGraphische Benutzeroberfläche bedient werden.\n\nDer Hauptzweck dieses Projektes ist es, eine Umgebung zu schaffen, die\neine schnelle, attraktive, flexible, einfach anzuwendende, einfach zu\nreproduzierende und ganz besonders eine interaktive Visualisierung und\nAnalyse von Klimamodelldaten ermöglicht.\nDurch verschiedene Zusatzprogramme visualisiert psyplot georeferenzierte\nKlimamodelldaten auf rechteckförmigen und unstrukturierten Gittern.\nAußerdem können Regressionskurven, sowie stratigraphische Diagramme\nerstellt werden, und mehr.\n\nAnwendungsbeispiele gibt es auf\n[psyplot.github.io/examples](https://psyplot.github.io/examples)\n"}

DJAC

{"en"=>"Web portal to manage an academic community and to foster collaboration", "de"=>"Online portal um eine wissenschaftliche Community zu managen und die Zusammenarbeit zu fördern"}

{"en"=>"Collaboration Platforms for harmonization and building a shared understanding\nof communities are essential components in today's academic environment.\nWith the help of modern software tools and advancing digitization, our\ncommunities can improve collaboration via event, project and file\nmanagement, and communication. The variety of tasks and tools needed in\ninterdisciplinary communities, however, pose a considerable obstacle for\ncommunity members.\n\nWe see them in the administration, and especially when on-boarding new\nmembers with different levels of experience (from student to senior\nscientist). Here, user-friendly, technical support is needed.\nAs part of my work, I am involved in many communities, particularly in\nthe Climate Limited-area Modelling Community (CLM-Community) and the\nHelmholtz Metadata Collaboration (HMC). With the input of these\n(and more) communities, I developed the DJAC-Platform, an open-source,\nPython (Django)-based website. DJAC manages communities from a single\ninstitute to an (inter-)national community with hundreds and more\nparticipating research institutions.\n", "de"=>"Kollaborationsplattformen zur Harmonisierung und zum Aufbau einer\n_Knowledge Base_ in einer _Community_ sind wesentliche Bestandteile des\nheutigen akademischen Umfelds.\n\nMit Hilfe moderner Softwaretools und der fortschreitenden Digitalisierung\nkönnen unsere Communities die Zusammenarbeit durch Veranstaltungs-,\nProjekt- und Dateimanagement sowohl Verwaltung, also auch Kommunikation\nverbessern. Die Vielfalt der Aufgaben und Werkzeuge, die in\ninterdisziplinären Communities stellen jedoch eine erhebliche Hürde für\nCommunity-Mitglieder dar.\n\nWir sehen sie in der Verwaltung und insbesondere beim Onboarding neuer\nMitgliedern mit unterschiedlichem Erfahrungsstand (vom Studenten bis zum\nerfahrenen Wissenschaftler). Hier ist eine benutzerfreundliche, technische\nUnterstützung erforderlich. Ich bin bei meiner Arbeit an vielen\nGemeinschaften beteiligt, insbesondere an der Climate Limited-area\nModeling Community (CLM-Community) und der Helmholtz Metadata\nCollaboration (HMC). Mit dem Input dieser (und weiterer) Gemeinschaften\nhabe ich die DJAC-Plattform entwickelt, eine Open-Source, Python\n(Django)-basierte Website. DJAC verwaltet Gemeinschaften von einem\neinzelnen Institut bis einer (inter-)nationalen Gemeinschaft mit\nhunderten und mehr teilnehmenden Forschungs-Einrichtungen.\n"}

Kubernetes OpenShift Deployment

{"en"=>"Software templates and Helm Charts for secure and sustainable deployment via Kubernetes", "de"=>"Software Templates und Helm Charts für ein sichere und nachhaltige Deployments über Kubernetes"}

{"en"=>"Web applications have become increasingly vital in scientific settings,\nfacilitating knowledge transfer, collaboration, and data publication.\nEstablishing a secure IT infrastructure to accommodate a diverse range of\nweb applications and technologies however poses significant challenges.\nAddressing this issue, I developed a secure yet adaptable approach to deploy\nand maintain multiple web applications.\n\nThe concept leverages Kubernetes\nOpenShift, employing a standardized, git-centric workflow to document web\napplications, responsibilities, and technologies. Security measures include\nregular image updates, micro-segmentation, and least privilege roles. The\nconcept embraces reproducibility and division of labor through the\nutilization of cookiecutter templates, Helm charts, and Ansible. It\nincorporates essential concepts such as CI/CD, scalability, and\nreproducibility, benefiting the entire research institution.\n", "de"=>"Webanwendungen sind in der Wissenschaft immer wichtiger geworden,\nSie erleichtern den Wissenstransfer, die Zusammenarbeit und die Veröffentlichung von Daten.\nDie Einrichtung einer sicheren IT-Infrastruktur für ein breites Spektrum von\nWebanwendungen und -technologien unterzubringen, stellt jedoch eine große Herausforderung dar.\nDeshalb habe ich ein sicheres und dennoch anpassungsfähiges Konzept für die Bereitstellung\nmehrere Webanwendungen entwickelt.\n\nDas Konzept nutzt Kubernetes OpenShift und verwendet einen\nstandardisierten, Git-zentrierten Arbeitsablauf um Web Anwendungen,\nVerantwortlichkeiten und Technologien zu dokumentieren. Zu den Sicherheitsmaßnahmen gehören\nregelmäßige Image-Updates, Mikro-Segmentierung und _least privilege_ Rollen.\nDas Konzept umfasst Reproduzierbarkeit und Arbeitsteilung durch die\nVerwendung von Cookiecutter-Vorlagen, Helm Charts und Ansible. Es\numfasst wesentliche Konzepte wie CI/CD, Skalierbarkeit und\nReproduzierbarkeit, wovon die gesamte Forschungseinrichtung profitiert.\n"}


HORNET

{"en"=>"Holocene Climate Reconstruction for the Northern Hemisphere Extra-tropics", "de"=>"Klimarekonstruktionen der Nördlichen Hemisphäre während des Holozäns"}

{"en"=>"This is the central project of the _Davis Group_. My contributions to\nthis project are the development of new methods on how we can reconstruct\nthe climate from hundreds of thousands of pollen samples, as well as\nfurther analysis of the dataset, e.g. the reconstruction of atmospheric\nteleconnection patterns, such as the North Atlantic Oscillation (NAO).\nAnother task is to develop a web-viewer in order to share our database\nwith the rest of the scientific community.\n", "de"=>"Dies ist das zentrale Projekt der _Davis Gruppe_. Mein Beitrag zu diesem\nProjekt ist die Entwicklung neuer Methoden zur Rekonstruktion des Klimas\nbasierend auf hundertausenden Pollenproben. Außerdem nutze ich die\nenstandende Rekonstruktion für weitergehende Analysen, wie zum Beispiel\ndie Entwicklung atmosphärischer Stukturen der oberen Atmosphäre, z.B. der\nNordatlantische Oszillation. Eine weitere Aufgabe ist die Entwicklung\neines Internetportals über das wir unsere Datenbank mit dem Rest der\nWelt teilen können.\n"}

straditize

{"en"=>"A software for a semi-automatic digitization of pollen diagrams or other types of stratigraphic diagrams using the command line or a graphical user interface.", "de"=>"Software für die semi-automatische Digitalisierung von Pollendiagrammen und anderen stratigrafischen Diagrammen das per Kommandozeile oder per graphischer Benutzeroberfläche bedient werden kann"}

{"en"=>"STRADITIZE (Stratigraphic Diagram Digitizer) is an open-source program\nthat allows stratigraphic figures to be digitized in a single\nsemi-automated operation. It is designed to detect multiple plots of\nvariables analyzed along the same vertical axis, whether this is a\nsediment core or any similar depth/time series.\n\nMore at [straditize.readthedocs.io](https://straditize.readthedocs.io).\n", "de"=>"STRADITIZE (Stratigraphic Diagram Digitizer) ist ein Open Source Programm\nmit dem stratigraphische Diagramme in einem einzigen semi-automatischen\nVorgang digitalisiert werden können. Es ist darauf ausgerichted, mehrere\nTeildiagramme zu entdecken welche die selbe vertikale Axis teilen.\n\nMehr Informtation gibt es auf\n[straditize.readthedocs.io](https://straditize.readthedocs.io).\n"}

EMPD

{"en"=>"The Eurasian Modern Pollen Database", "de"=>"Die 'Eurasian Modern Pollen Database'"}


GWGEN

{"en"=>"A global weather generator for daily data", "de"=>"Ein globaler Wettergenerator für tägliche Wetterdaten"}

{"en"=>"This synthesis of FORTRAN and Python is a globally applicable weather\ngenerator parameterized through a global dataset of weather station data\nwith more than 50 million individual daily records. It downscales wind\nspeed, precipitation, temperature, and cloud cover from monthly to daily\nresolution.\n\nMore at [arve-research.gihub.io/gwgen](https://arve-research.github.io/gwgen/)\n", "de"=>"Diese Synthese aus FORTRAN und Python ist ein global anwendbarer\nWettergenerator der anhand eines globalen Datensatzes täglicher\nWetterstationsdaten mit über 50 Millionen individueller Aufzeichnungen\nparameterisiert wurde. Er skaliert Windgeschwindigkeit, Niederschlag,\nTemperatur und Bewölkung von monatlicher auf tägliche Auflösung.\n\nMehr informationen gibt es auf\n[arve-research.gihub.io/gwgen](https://arve-research.github.io/gwgen/).\n"}


IUCm

{"en"=>"A model to simulate urban growth and transformation with the objective of minimising the energy required for transportation.", "de"=>"Ein Model zur Simulation urbaner Wachstums- und Transformationsprozesse bei gleichzeitiger Minimierung von benötigter Transport-Energie"}

{"en"=>"The Integrated Urban Complexity model (IUCm) is a relatively simple\nprobablistic computational model to compute 'climate-smart urban forms',\nthat cut down emissions related to energy consumption from urban mobility.\n\nMore at [iucm.readthedocs.io](https://iucm.readthedocs.io).\n", "de"=>"Das 'Integrated Urban Complexity model (IUCm)' is ein relativ einfaches\nprobabilistisches Computermodell um sogennante 'climate-smart urban forms'\nzu berechnen. Diese verringern transport-bedingte Emissionen durch den\nEnergieverbrauch von urbaner Mobilität.\n\nMehr Informtation gibt es auf\n[iucm.readthedocs.io](https://iucm.readthedocs.io).\n"}


docrep

{"en"=>"Python package for docstring repetition", "de"=>"Python Programm für die Wiederverwertung von Dokumentationen"}

{"en"=>"The documentation repetition module (docrep) targets developpers that\ndevelop complex and nested Python APIs and helps them to create a\nwell-documented software.\n\nMore at [docrep.readthedocs.io](https://docrep.readthedocs.io).\n", "de"=>"Das 'documentation repetition module (docrep)' ist ein Python Programm\nfür die Entwickler komplexer und vernetzter Python Programme. Es hilft\nihnen gut dokumentierte Software zu erstellen durch die Wiederverwertung\nvon bisher bestehender Dokumentation.\n\nMehr Information gibt es auf\n[docrep.readthedocs.io](https://docrep.readthedocs.io).\n"}



Publications

Peer-reviewed

  1. Akhtar, N., Geyer, B., Rockel, B., Sommer, P. S., & Schrum, C. (2021). Accelerating deployment of offshore wind energy alter wind climate and reduce future power generation potentials. Scientific Reports, 11(1), 11826. https://doi.org/10.1038/s41598-021-91283-3
    Abstract

    The European Union has set ambitious CO2 reduction targets, stimulating renewable energy production and accelerating deployment of offshore wind energy in northern European waters, mainly the North Sea. With increasing size and clustering of offshore wind farms (OWFs), wake effects, which alter wind conditions and decrease the power generation efficiency of wind farms downwind, become more important. We use a high-resolution regional climate model with implemented wind farm parameterizations to explore offshore wind energy production limits in the North Sea. We simulate near-future wind farm scenarios considering existing and planned OWFs in the North Sea and assess power generation losses and wind variations due to wind farm wakes. The annual mean wind speed deficit within a wind farm can reach 2-2.5 m s−1 depending on the wind farm geometry. The mean deficit, which decreases with distance, can extend 35-40 km downwind during prevailing southwesterly winds. Wind speed deficits are highest during spring (mainly March-April) and lowest during November-December. The large size of wind farms and their proximity affect not only the performance of their own downwind turbines but also that of neighboring downwind farms, reducing the capacity factor by 20% or more, which increases energy production costs and economic losses. We conclude that wind energy can be a limited resource in the North Sea. The limits and potentials for optimization need to be considered in climate mitigation strategies, and cross-national optimization of offshore energy production plans is inevitable.

  2. Kadow, C., Illing, S., Lucio-Eceiza, E. E., Bergemann, M., Ramadoss, M., Sommer, P. S., Kunst, O., Schartner, T., Pankatz, K., Grieger, J., Schuster, M., Richling, A., Thiemann, H., Kirchner, I., Rust, H. W., Ludwig, T., Cubasch, U., & Ulbrich, U. (2021). Introduction to Freva – A Free Evaluation System Framework for Earth System Modeling. Journal of Open Research Software, 9. https://doi.org/10.5334/jors.253
    Abstract

    Freva – Free Evaluation System Framework for Earth system modeling is an efficient solution to handle evaluation systems of research projects, institutes or universities in the climate community. It is a scientific software framework for high performance computing that provides all its available features both in a shell and web environment. The main system design is equipped with the programming interface, history of evaluations, and a standardized model database. Plugin – a generic application programming interface allows scientific developers to connect their analysis tools with the evaluation system independently of the programming language. History – the configuration sub-system stores every analysis performed with the evaluation system in a database. Databrowser – an implemented metadata system with its advanced but easy-to-handle search tool supports scientists and their plugins to retrieve the required information of the database. The combination of these three core components increases the scientific outcome and enables transparency and reproducibility for research groups using Freva as their framework for evaluation of Earth system models.

  3. Kaufman, D., McKay, N., Routson, C., Erb, M., Davis, B., Heiri, O., Jaccard, S., Tierney, J., Dätwyler, C., Axford, Y., Brussel, T., Cartapanis, O., Chase, B., Dawson, A., de Vernal, A., Engels, S., Jonkers, L., Marsicek, J., Moffa-Sánchez, P., … Zhilich, S. (2020). A global database of Holocene paleotemperature records. Scientific Data, 7(1), 115. https://doi.org/10.1038/s41597-020-0445-3
    Abstract

    A comprehensive database of paleoclimate records is needed to place recent warming into the longer-term context of natural climate variability. We present a global compilation of quality-controlled, published, temperature-sensitive proxy records extending back 12,000 years through the Holocene. Data were compiled from 679 sites where time series cover at least 4000 years, are resolved at sub-millennial scale (median spacing of 400 years or finer) and have at least one age control point every 3000 years, with cut-off values slackened in data-sparse regions. The data derive from lake sediment (51%), marine sediment (31%), peat (11%), glacier ice (3%), and other natural archives. The database contains 1319 records, including 157 from the Southern Hemisphere. The multi-proxy database comprises paleotemperature time series based on ecological assemblages, as well as biophysical and geochemical indicators that reflect mean annual or seasonal temperatures, as encoded in the database. This database can be used to reconstruct the spatiotemporal evolution of Holocene temperature at global to regional scales, and is publicly available in Linked Paleo Data (LiPD) format.

  4. Kaufman, D., McKay, N., Routson, C., Erb, M., Dätwyler, C., Sommer, P. S., Heiri, O., & Davis, B. (2020). Holocene global mean surface temperature, a multi-method reconstruction approach. Scientific Data, 7(1), 201. https://doi.org/10.1038/s41597-020-0530-7
    Abstract

    An extensive new multi-proxy database of paleo-temperature time series (Temperature 12k) enables a more robust analysis of global mean surface temperature (GMST) and associated uncertainties than was previously available. We applied five different statistical methods to reconstruct the GMST of the past 12,000 years (Holocene). Each method used different approaches to averaging the globally distributed time series and to characterizing various sources of uncertainty, including proxy temperature, chronology and methodological choices. The results were aggregated to generate a multi-method ensemble of plausible GMST and latitudinal-zone temperature reconstructions with a realistic range of uncertainties. The warmest 200-year-long interval took place around 6500 years ago when GMST was 0.7 °C (0.3, 1.8) warmer than the 19th Century (median, 5th, 95th percentiles). Following the Holocene global thermal maximum, GMST cooled at an average rate −0.08 °C per 1000 years (−0.24, −0.05). The multi-method ensembles and the code used to generate them highlight the utility of the Temperature 12k database, and they are now available for future use by studies aimed at understanding Holocene evolution of the Earth system.

  5. Davis, B. A. S., Chevalier, M., Sommer, P., Carter, V. A., Finsinger, W., Mauri, A., Phelps, L. N., Zanon, M., Abegglen, R., Åkesson, C. M., Alba-Sánchez, F., Anderson, R. S., Antipina, T. G., Atanassova, J. R., Beer, R., Belyanina, N. I., Blyakharchuk, T. A., Borisova, O. K., Bozilova, E., … Zimny, M. (2020). The Eurasian Modern Pollen Database (EMPD), version 2. Earth System Science Data, 12(4), 2423–2445. https://doi.org/10.5194/essd-12-2423-2020
    Abstract

    The Eurasian (née European) Modern Pollen Database (EMPD) was established in 2013 to provide a public database of high-quality modern pollen surface samples to help support studies of past climate, land cover, and land use using fossil pollen. The EMPD is part of, and complementary to, the European Pollen Database (EPD) which contains data on fossil pollen found in Late Quaternary sedimentary archives throughout the Eurasian region. The EPD is in turn part of the rapidly growing Neotoma database, which is now the primary home for global palaeoecological data. This paper describes version 2 of the EMPD in which the number of samples held in the database has been increased by 60 % from 4826 to 8134. Much of the improvement in data coverage has come from northern Asia, and the database has consequently been renamed the Eurasian Modern Pollen Database to reflect this geographical enlargement. The EMPD can be viewed online using a dedicated map-based viewer at https://empd2.github.io and downloaded in a variety of file formats at https://doi.pangaea.de/10.1594/PANGAEA.909130 (Chevalier et al., 2019).

  6. Chevalier, M., Davis, B. A. S., Heiri, O., Seppä, H., Chase, B. M., Gajewski, K., Lacourse, T., Telford, R. J., Finsinger, W., Guiot, J., Kühl, N., Maezumi, S. Y., Tipton, J. R., Carter, V. A., Brussel, T., Phelps, L. N., Dawson, A., Zanon, M., Vallé, F., … Kupriyanov, D. (2020). Pollen-based climate reconstruction techniques for late Quaternary studies. Earth-Science Reviews, 210, 103384. https://doi.org/10.1016/j.earscirev.2020.103384
    Abstract

    Fossil pollen records are well-established indicators of past vegetation changes. The prevalence of pollen across environmental settings including lakes, wetlands, and marine sediments, has made palynology one of the most ubiquitous and valuable tools for studying past environmental and climatic change globally for decades. A complementary research focus has been the development of statistical techniques to derive quantitative estimates of climatic conditions from pollen assemblages. This paper reviews the most commonly used statistical techniques and their rationale and seeks to provide a resource to facilitate their inclusion in more palaeoclimatic research. To this end, we first address the fundamental aspects of fossil pollen data that should be considered when undertaking pollen-based climate reconstructions. We then introduce the range of techniques currently available, the history of their development, and the situations in which they can be best employed. We review the literature on how to define robust calibration datasets, produce high-quality reconstructions, and evaluate climate reconstructions, and suggest methods and products that could be developed to facilitate accessibility and global usability. To continue to foster the development and inclusion of pollen climate reconstruction methods, we promote the development of reporting standards. When established, such standards should 1) enable broader application of climate reconstruction techniques, especially in regions where such methods are currently underused, and 2) enable the evaluation and reproduction of individual reconstructions, structuring them for the evolving open-science era, and optimising the use of fossil pollen data as a vital means for the study of past environmental and climatic variability. We also strongly encourage developers and users of palaeoclimate reconstruction methodologies to make associated programming code publicly available, which will further help disseminate these techniques to interested communities.

  7. Cremades, R., & Sommer, P. S. (2019). Computing climate-smart urban land use with the Integrated Urban Complexity model (IUCm 1.0). Geoscientific Model Development, 12(1), 525–539. https://doi.org/10.5194/gmd-12-525-2019
    Abstract

    Cities are fundamental to climate change mitigation, and although there is increasing understanding about the relationship between emissions and urban form, this relationship has not been used to provide planning advice for urban land use so far. Here we present the Integrated Urban Complexity model (IUCm 1.0) that computes “climate-smart urban forms”, which are able to cut emissions related to energy consumption from urban mobility in half. Furthermore, we show the complex features that go beyond the normal debates about urban sprawl vs. compactness. Our results show how to reinforce fractal hierarchies and population density clusters within climate risk constraints to significantly decrease the energy consumption of urban mobility. The new model that we present aims to produce new advice about how cities can combat climate change.

  8. Sommer, P., Rech, D., Chevalier, M., & Davis, B. (2019). straditize: Digitizing stratigraphic diagrams. Journal of Open Source Software, 4(34), 1216. https://doi.org/10.21105/joss.01216
  9. Weitzel, N., Wagner, S., Sjolte, J., Klockmann, M., Bothe, O., Andres, H., Tarasov, L., Rehfeld, K., Zorita, E., Widmann, M., Sommer, P., Schädler, G., Ludwig, P., Kapp, F., Jonkers, L., García-Pintado, J., Fuhrmann, F., Dolman, A., Dallmeyer, A., & Brücher, T. (2018). Diving into the past – A paleo data-model comparison workshop on the Late Glacial and Holocene. Bulletin of the American Meteorological Society. https://doi.org/10.1175/bams-d-18-0169.1
    Abstract

    An international group of approximately 30 scientists with background and expertise in global and regional climate modeling, statistics, and climate proxy data discussed the state of the art, progress, and challenges in comparing global and regional climate simulations to paleoclimate data and reconstructions. The group focused on achieving robust comparisons in view of the uncertainties associated with simulations and paleo data.

  10. Sommer, P. S. (2017). The psyplot interactive visualization framework. The Journal of Open Source Software, 2(16). https://doi.org/10.21105/joss.00363
  11. Sommer, P. S., & Kaplan, J. O. (2017). A globally calibrated scheme for generating daily meteorology from monthly statistics: Global-WGEN (GWGEN) v1.0. Geosci. Model Dev., 10(10), 3771–3791. https://doi.org/10.5194/gmd-10-3771-2017
    Abstract

    While a wide range of Earth system processes occur at daily and even subdaily timescales, many global vegetation and other terrestrial dynamics models historically used monthly meteorological forcing both to reduce computational demand and because global datasets were lacking. Recently, dynamic land surface modeling has moved towards resolving daily and subdaily processes, and global datasets containing daily and subdaily meteorology have become available. These meteorological datasets, however, cover only the instrumental era of the last approximately 120 years at best, are subject to considerable uncertainty, and represent extremely large data files with associated computational costs of data input/output and file transfer. For periods before the recent past or in the future, global meteorological forcing can be provided by climate model output, but the quality of these data at high temporal resolution is low, particularly for daily precipitation frequency and amount. Here, we present GWGEN, a globally applicable statistical weather generator for the temporal downscaling of monthly climatology to daily meteorology. Our weather generator is parameterized using a global meteorological database and simulates daily values of five common variables: minimum and maximum temperature, precipitation, cloud cover, and wind speed. GWGEN is lightweight, modular, and requires a minimal set of monthly mean variables as input. The weather generator may be used in a range of applications, for example, in global vegetation, crop, soil erosion, or hydrological models. While GWGEN does not currently perform spatially autocorrelated multi-point downscaling of daily weather, this additional functionality could be implemented in future versions.

Conference contributions

  1. Sommer, P. S., Baldewein, L., Takyar, H., Chaudhary, R., Hadizadeh, M., Dibeh, H., Böcke, M., Lorenz, C., Dinter, T., Pinkernell, S., Getzlaff, K., & Kleeberg, U. (2023, May). ESM Data Exploration with the Model Data Explorer. https://doi.org/10.5194/egusphere-egu23-3624
    Abstract

    Making Earth-System-Model (ESM) data accessible is challenging due to the large amount of data that we are facing in this realm. The upload is time-consuming, expensive, and technically complex, and every institution has its own procedures. Non-ESM experts face many problems, and pure data portals are hardly usable for inter- and trans-disciplinary communication of ESM data and findings, as this level of accessibility often requires specialized web or computing services. With the Model Data Explorer, we want to simplify the generation of web services from ESM data, and we provide a framework that allows us to make the raw model data accessible to non-ESM experts. Our decentralized framework implements the possibility for efficient remote processing of distributed ESM data. Users interface with an intuitive map-based front-end to compute spatial or temporal aggregations, or select regions to download the data. The data generators (i.e. the scientists with access to the raw data) use a light-weight and secure python library based on the Data Analytics Software Framework (DASF, https://digital-earth.pages.geomar.de/dasf/dasf-messaging-python) to create a back-end module. This back-end module runs close to the data, e.g. on the HPC resource where the data is stored. Upon request, the module generates and provides the required data for the users in the web front-end. Our approach is intended for scientists and scientific usage! We aim for a framework where web-based communication of model-driven data science can be maintained by the scientific community. The Model Data Explorer ensures fair reward for the scientific work and adherence to the FAIR principles without too much overhead or loss in scientific accuracy. The Model Data Explorer is currently under development at the Helmholtz-Zentrum Hereon, together with multiple scientific and data management partners in other German research centers. The full list of contributors is constantly updated and can be accessed at https://model-data-explorer.readthedocs.io.

  2. Sommer, P. S., Geyer, B., Steger, C., Söding, E., Pörsch, A., Baldewein, L., & Kleeberg, U. (2023). The DJango Academic Community Platform (DJAC): An open-source website for transparent discussion, collaboration and management in an academic community. https://doi.org/10.5281/zenodo.7660330
    Abstract

    Collaboration platforms for harmonization and building a shared understanding of communities are essential components in today's academic environment. With the help of modern software tools and advancing digitization, our communities can improve collaboration via event, project and file management, and communication. The variety of tasks and tools needed in interdisciplinary communities, however, poses a considerable obstacle for community members. We see these obstacles in administration, and especially when on-boarding new members with different levels of experience (from student to senior scientist). Here, user-friendly technical support is needed. We are involved in many communities, particularly in the Climate Limited-area Modelling Community (CLM-Community) and the Helmholtz Metadata Collaboration (HMC). With the input of these (and more) communities, we are currently working on the DJAC-Platform, an open-source, Python (Django)-based website. DJAC manages communities ranging from a single institute to an (inter-)national community with hundreds or more participating research institutions. DJAC is available at codebase.helmholtz.cloud.

  3. Sommer, P. S., Wichert, V., Eggert, D., Dinter, T., Getzlaff, K., Lehmann, A., Werner, C., Silva, B., Schmidt, L., & Schäfer, A. (2021). A new distributed data analysis framework for better scientific collaborations. International Series of Online Research Software Events (SORSE). https://doi.org/10.5281/zenodo.4575652
    Abstract

    A common challenge for projects with multiple involved research institutes is a well-defined and productive collaboration. All parties measure and analyze different aspects, depend on each other, share common methods, and exchange the latest results, findings, and data. Today this exchange is often impeded by a lack of ready access to shared computing and storage resources. In our talk, we present a new and innovative remote procedure call (RPC) framework. We focus on a distributed setup, where project partners do not necessarily work at the same institute and do not have access to each other's resources. We present an application programming interface (API) developed in Python that enables scientists to collaboratively explore and analyze sets of distributed data. It offers the functionality to request remote data through a comfortable interface, and to share and invoke single computational methods or even entire analytical workflows and their results. The prototype enables researchers to make their methods accessible as a backend module running on their own infrastructure. Hence researchers from other institutes may apply the available methods through a lightweight python or JavaScript API. In the end, the overhead for both the backend developer and the remote user is very low. The effort of implementing the necessary workflow and API usage is comparable to writing the code in a non-distributed setup. Besides that, data do not have to be downloaded locally; the analysis can be executed "close to the data" using the institutional infrastructure where the eligible data set is stored. With our prototype, we demonstrate distributed data access and analysis workflows across institutional borders to enable effective scientific collaboration. This framework has been developed in a joint effort of the DataHub and Digital Earth initiatives within the Research Centers of the Helmholtz Association of German Research Centres, HGF.

  4. Sommer, P. S. (2021, December). Interactive visualization of climate model data via Python or GUI with psyplot. https://doi.org/10.5194/dach2022-280
    Abstract

    psyplot (https://psyplot.github.io) is an open-source data visualization framework that integrates rich computational and mathematical software packages (such as xarray and matplotlib) into a flexible framework for visualization. It differs from most visual analytics software in that it focuses on extensibility, in order to flexibly tackle the different types of analysis questions that arise in pioneering research. The design of the framework's high-level API enables a simple and standardized usage from the command line, python scripts or Jupyter notebooks. A modular plugin framework enables a flexible development of the framework that can potentially go in many different directions. The additional enhancement with a graphical user interface (GUI) makes it the only visualization framework that can be handled conveniently from the command line or scripts, as well as via point-and-click. It additionally allows building further desktop applications on top of the existing framework. In this presentation, I will show the main functionalities of psyplot, with a special focus on the visualization of unstructured grids (such as the ICON model by the German Weather Service (DWD)), and the usage of psyplot on the HPC facilities of the DKRZ (mistral, jupyterhub, remote desktop, etc.). My demonstration will cover the basic structure of the psyplot framework and how to use psyplot in python scripts (and Jupyter notebooks). I will give a quick demo of the psyplot GUI and psy-view, an ncview-like interface built upon psyplot, and talk about different features such as reusing plot configurations and exporting figures.

  5. Sommer, P. S., Petrik, R., Geyer, B., Kleeberg, U., Sauer, D., Baldewein, L., Luckey, R., Möller, L., Dibeh, H., & Kadow, C. (2020, March). Integrating Model Evaluation and Observations into a Production-Release Pipeline. https://doi.org/10.5194/egusphere-egu2020-19298
    Abstract

    The complexity of Earth System and Regional Climate Models represents a considerable challenge for developers. Tuning, but also improving, one aspect of a model can unexpectedly decrease the performance of others and introduce hidden errors. Reasons are in particular the multitude of output parameters and the shortage of reliable and complete observational datasets. One possibility to overcome these issues is a rigorous and continuous scientific evaluation of the model. This requires standardized model output and, most notably, standardized observational datasets. Additionally, in order to reduce the extra burden for the single scientist, this evaluation has to be as close as possible to the standard workflow of the researcher, and it needs to be flexible enough to adapt to new scientific questions. We present the Free Evaluation System Framework (Freva) implementation within the Helmholtz Coastal Data Center (HCDC) at the Institute of Coastal Research in the Helmholtz-Zentrum Geesthacht (HZG). Various plugins for the Freva software, namely the HZG-EvaSuite, use observational data to perform a standardized evaluation of the model simulation. We present a comprehensive data management infrastructure that copes with the heterogeneity of observations and simulations. This web framework comprises a FAIR and standardized database of both large-scale and in-situ observations, exported to a format suitable for data-model intercomparisons (particularly netCDF following the CF conventions). Our pipeline links the raw data of the individual model simulations (i.e. the production of the results) to the finally published results (i.e. the released data). Another benefit of the Freva-based evaluation is the enhanced exchange between the different compartments of the institute, particularly between the model developers and the data collectors, as Freva contains built-in functionalities to share and discuss results with colleagues. We will furthermore use the tool to strengthen the active communication with the data and software managers of the institute to generate or adapt the evaluation plugins.

  6. Sommer, P. S., Davis, B. A. S., & Chevalier, M. (2019). Github and Open Research Data; an example using the Eurasian Modern Pollen Database. EGU General Assembly Conference Abstracts, 21, 5669. https://meetingorganizer.copernicus.org/EGU2019/EGU2019-5669.pdf
    Abstract

    Established in 2011, the Eurasian Modern Pollen Database (EMPD) is a standardized, fully documented and quality-controlled dataset of over 8000 modern pollen samples which can be openly accessed, and to which scientists can also contribute and help maintain. The database has recently been upgraded to include an intuitive client-based JavaScript web interface hosted on the version control system GitHub, allowing data and metadata to be accessed and viewed using a clickable map. We present how we implement the FAIR principles, such as well-documented access and handling of data and metadata using the free GitHub services for open-source development, as well as other critical points for open research data, such as data accreditation and referencing. Our community-based framework allows automated and transparent quality checks through continuous integration, fast and intuitive access to the data, as well as transparency for data contributors and users concerning changes and bugs in the EMPD. Furthermore, it allows stable and long-lasting access to the web interface (and the data) without any funding requirements for servers or the risk of security holes.

  7. Sommer, P. S., Davis, B. A. S., Chevalier, M., Ni, J., & Tipton, J. (2019). The HORNET project: applying ’big data’ to reconstruct the climate of the Northern Hemisphere during the Holocene. 20th Congress of the International Union for Quaternary Research (INQUA). https://app.oxfordabstracts.com/events/574/program-app/submission/94623
    Abstract

    Pollen data remains one of the most widely geographically distributed, publicly accessible and most thoroughly documented sources of quantitative palaeoclimate data. It represents one of the primary terrestrial proxies in understanding the spatial pattern of past climate change at centennial to millennial timescales, and a great example of ’big data’ in the palaeoclimate sciences. The HORNET project is based on the synthesis and analysis of thousands of fossil and modern pollen samples to create a spatially and seasonally explicit record of climate change covering the whole Northern Hemisphere over the last 12,000 years, using a common reconstruction and error accounting methodology. This type of study has been made possible only through long-term community led efforts to advance the availability of ’open big data’, and represents a good example of what can now be achieved within this new paradigm. Primary pollen data for the HORNET project was collected not only from open public databases such as Neotoma, Pangaea and the European Pollen Database, but also by encouraging individual scientists and research groups to share their data for the purposes of the project and these open databases, and through the use of specifically developed digitisation tools which can bring previously inaccessible data into this open digital world. The resulting project database includes over 3000 fossil pollen sites, as well as 16000 modern pollen samples for use in the pollen-climate calibration transfer-function. Building and managing such a large database has been a considerable challenge that has been met primarily through the application and development of open source software, which provide important cost and resource effective tools for the analysis of open data. The HORNET database can be interfaced through a newly developed, simple, freely accessible, and intuitive clickable map based web interface. This interface, hosted on the version control system Github, has been used mainly for quality control, method development and sharing the results and source database. Additionally, it provides the opportunity for other applications such as the comparison with other reconstructions based on other proxies, which we have also included in the database. We present the challenges in building and sharing such a large open database within the typically limited resources and funding that most scientific projects operate.

  8. Sommer, P. S. (2018). Psyplot: Interactive data analysis and visualization with Python. EGU General Assembly Conference Abstracts, 20, 4701. http://adsabs.harvard.edu/abs/2018EGUGA..20.4701S
    Abstract

    The development, usage and analysis of climate models often requires the visualization of the data. This visualization should ideally be nice looking, simple in application, fast, easily reproducible and flexible. There exists a wide range of software tools to visualize model data, which however often lack easy scriptability, have low flexibility or simply are far too complex for a quick look into the data. Therefore, we developed the open-source visualization framework psyplot that aims to cover the visualization in the daily work of earth system scientists working with data of the climate system. It is built (mainly) upon the python packages matplotlib, cartopy and xarray and integrates the visualization process into data analysis. The data can either be stored in a NetCDF, GeoTIFF, or any other format that is handled by the xarray package. Due to its interactive nature however, it may also be used with data that is currently being processed and not yet stored on the hard disk. Visualizations of rastered data on the globe are supported for rectangular grids (following or not following the CF Conventions) or on a triangular grid (following the CF Conventions (like the earth system model ICON) or the unstructured grid conventions (UGRID)). Furthermore, the package visualizes scalar and vector fields and enables the user to easily manage and format multiple plots at the same time. Psyplot can either be used with only a few lines of code from the command line in an interactive python session, via python scripts or through a graphical user interface (GUI). Finally, the framework developed in this package enables a very flexible configuration and an easy integration into other scripts using matplotlib.

  9. Sommer, P. S., Davis, B. A. S., & Chevalier, M. (2018). STRADITIZE: An open-source program for digitizing pollen diagrams and other types of stratigraphic data. EGU General Assembly Conference Abstracts, 20, 4433. http://adsabs.harvard.edu/abs/2018EGUGA..20.4433S
    Abstract

    In an age of digital data analysis, gaining access to data from the pre-digital era - or any data that is only available as a figure on a page - remains a problem and an under-utilized scientific resource. Whilst there are numerous programs available that allow the digitization of scientific data in a simple x-y graph format, we know of no semi-automated program that can deal with data plotted with multiple horizontal axes that share the same vertical axis, such as pollen diagrams and other stratigraphic figures that are common in the Earth sciences. STRADITIZE (Stratigraphic Diagram Digitizer) is a new open-source program that allows stratigraphic figures to be digitized in a single semi-automated operation. It is designed to detect multiple plots of variables analyzed along the same vertical axis, whether this is a sediment core or any similar depth/time series. The program is written in python and supports mixtures of many different diagram types, such as bar plots, line plots, as well as shaded, stacked, and filled area plots. The package provides an extensively documented graphical user interface for a point-and-click handling of the semi-automatic process, but can also be scripted or used from the command line. Other features of STRADITIZE include text recognition to interpret the names of the different plotted variables, the automatic and semi-automatic recognition of picture artifacts, as well as an automatic measurement finder to exactly reproduce the data that has been used to create the diagram. Evaluation of the program has been undertaken by comparing the digitization of published figures with the original digital data. This generally shows very good results, although this is inevitably reliant on the quality and resolution of the original figure.

  10. Sommer, P. S., Chevalier, M., & Davis, B. A. S. (2018, July). STRADITIZE: An open-source program for digitizing pollen diagrams and other types of stratigraphic data. AFQUA - The African Quaternary. https://afquacongress.wixsite.com/afqua2018
    Abstract

    Straditize (Stratigraphic Diagram Digitizer) is a new open-source program that allows stratigraphic diagrams to be digitized in a single semi-automated operation. It is specifically designed for figures that have multiple horizontal axes plotted against a shared vertical axis (e.g. depth/age), such as pollen diagrams.

  11. Sommer, P., & Kaplan, J. (2017). Quantitative Modeling of Human-Environment Interactions in Preindustrial Time. PAGES OSM 2017, Abstract Book, 129–129.
  12. Sommer, P., & Kaplan, J. (2016). Fundamental statistical relationships between monthly and daily meteorological variables: Temporal downscaling of weather based on a global observational dataset. EGU General Assembly Conference Abstracts, 18, EPSC2016–18183. http://adsabs.harvard.edu/abs/2016EGUGA..1818183S
    Abstract

    Accurate modelling of large-scale vegetation dynamics, hydrology, and other environmental processes requires meteorological forcing on daily timescales. While meteorological data with high temporal resolution is becoming increasingly available, simulations for the future or distant past are limited by lack of data and poor performance of climate models, e.g., in simulating daily precipitation. To overcome these limitations, we may temporally downscale monthly summary data to a daily time step using a weather generator. Parameterization of such statistical models has traditionally been based on a limited number of observations. Recent developments in the archiving, distribution, and analysis of "big data" datasets provide new opportunities for the parameterization of a temporal downscaling model that is applicable over a wide range of climates. Here we parameterize a WGEN-type weather generator using more than 50 million individual daily meteorological observations, from over 10’000 stations covering all continents, based on the Global Historical Climatology Network (GHCN) and Synoptic Cloud Reports (EECRA) databases. Using the resulting "universal" parameterization and driven by monthly summaries, we downscale mean temperature (minimum and maximum), cloud cover, and total precipitation, to daily estimates. We apply a hybrid gamma-generalized Pareto distribution to calculate daily precipitation amounts, which overcomes much of the inability of earlier weather generators to simulate high amounts of daily precipitation. Our globally parameterized weather generator has numerous applications, including vegetation and crop modelling for paleoenvironmental studies.

  13. Sommer, P. (2016). Psyplot: Visualizing rectangular and triangular Climate Model Data with Python. EGU General Assembly Conference Abstracts, 18, 18185. http://adsabs.harvard.edu/abs/2016EGUGA..1818185S
    Abstract

    The development and use of climate models often requires the visualization of geo-referenced data. Creating visualizations should be fast, attractive, flexible, easily applicable and easily reproducible. There is a wide range of software tools available for visualizing raster data, but they often are inaccessible to many users (e.g. because they are difficult to use in a script or have low flexibility). In order to facilitate easy visualization of geo-referenced data, we developed a new framework called "psyplot," which can aid earth system scientists with their daily work. It is purely written in the programming language Python and primarily built upon the python packages matplotlib, cartopy and xray. The package can visualize data stored on the hard disk (e.g. NetCDF, GeoTIFF, or any other file format supported by the xray package), or directly from memory or Climate Data Operators (CDOs). Furthermore, data can be visualized on a rectangular grid (following or not following the CF Conventions) and on a triangular grid (following the CF or UGRID Conventions). Psyplot visualizes 2D scalar and vector fields, enabling the user to easily manage and format multiple plots at the same time, and to export the plots into all common picture formats and movies covered by the matplotlib package. The package can currently be used in an interactive python session or in python scripts, and will soon be developed for use with a graphical user interface (GUI). Finally, the psyplot framework enables flexible configuration, allows easy integration into other scripts that use matplotlib, and provides a flexible foundation for further development.

  14. Sommer, P., & Kaplan, J. (2016, May). Fundamental statistical relationships between monthly and daily meteorological variables: Temporal downscaling of weather based on a global observational dataset. Workshop on Stochastic Weather Generators. https://www.lebesgue.fr/content/sem2016-climate-program
    Abstract

    Accurate modelling of large-scale vegetation dynamics, hydrology, and other environmental processes requires meteorological forcing on daily timescales. While meteorological data with high temporal resolution is becoming increasingly available, simulations for the future or distant past are limited by lack of data and poor performance of climate models, e.g., in simulating daily precipitation. To overcome these limitations, we may temporally downscale monthly summary data to a daily time step using a weather generator. Parameterization of such statistical models has traditionally been based on a limited number of observations. Recent developments in the archiving, distribution, and analysis of big data datasets provide new opportunities for the parameterization of a temporal downscaling model that is applicable over a wide range of climates. Here we parameterize a WGEN-type weather generator using more than 50 million individual daily meteorological observations, from over 10’000 stations covering all continents, based on the Global Historical Climatology Network (GHCN) and Synoptic Cloud Reports (EECRA) databases. Using the resulting “universal” parameterization and driven by monthly summaries, we downscale mean temperature (minimum and maximum), cloud cover, and total precipitation, to daily estimates. We apply a hybrid gamma-generalized Pareto distribution to calculate daily precipitation amounts, which overcomes much of the inability of earlier weather generators to simulate high amounts of daily precipitation. Our globally parameterized weather generator has numerous applications, including vegetation and crop modelling for paleoenvironmental studies.