About Me

Philipp S. Sommer

Research software engineer for climate science

  • climate models
  • numerics
  • python
  • open science
  • palaeo
  • pollen
  • statistics
  • physics

My Career

During my Bachelor's degree I realized that I really wanted to do research on how our world is evolving. This wish was even stronger than my delight in the beauty of the mathematics behind physics. Building on my strong theoretical background, I started coding and developing climate models, and this has been my path ever since. I take great pleasure in data analysis and visualization: through visual and statistical exploration we can extract so much new knowledge from these large data sets. Hence I am always keen to learn and develop new techniques and to share them with others.

University of Lausanne, Institute of Earth Surface Dynamics (IDYST)

Numerical Tools and Software Solutions for Palaeoclimate Analysis

Dec. 2015
PhD student

Max-Planck-Institute for Meteorology/University of Hamburg

Master in Integrated Climate System Sciences

Sep. 2013
Master student

Greenpeace Germany e.V.

Biodiversity and Climate team

Sep. 2012
Intern

University of Heidelberg

Bachelor in Physics

Sep. 2009
Bachelor student

My Skills

My Projects

HORNET

Holocene Climate Reconstruction for the Northern Hemisphere Extra-tropics

A key scientific objective of this SNF-funded project is to use data to identify the relative contributions of the summer and winter seasons to Northern Hemisphere interglacial warming, and particularly the relative roles of an orbitally driven increase in insolation in summer and a dynamically driven increase in the poleward heat flux in winter. We investigate the role of atmospheric dynamics by comparing reconstructed regional climate anomalies with the patterns generated by modern analogue circulation patterns. The project will make available a high-quality, gridded and seasonally resolved reconstruction of Northern Hemisphere climate change during the Holocene. This will provide a state-of-the-art baseline for climate model evaluation, in particular for assessing the ability of models to reproduce regional climate change, which remains a key uncertainty in simulations of future climate change.

psyplot

Python framework for interactive data visualization

psyplot is a cross-platform, open-source Python project that combines the plotting utilities of matplotlib with the data management of the xarray package and integrates them into a framework that can be used both from the command line and from a GUI. It forms the basis for many of my data analysis tasks.
Its main purpose is to provide a framework that allows fast, attractive, flexible, easily applicable, easily reproducible and, especially, interactive visualization and analysis of data.
Through various plugins, psyplot visualizes georeferenced climate model data on rectangular and unstructured grids, regressions, stratigraphic diagrams and more. The plugins are:
  • psy-simple for simple visualizations
  • psy-maps for visualizations of georeferenced data
  • psy-reg for the visualization of statistical fits
  • psyplot-gui for a graphical user interface
  • psy-strat for the visualization of stratigraphic data, such as pollen diagrams
Examples are available at psyplot.readthedocs.io.
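A minimal sketch of what a psyplot session can look like. The file name temperature.nc, the variable name t2m and the colormap below are placeholders, and the snippet assumes psyplot and the psy-maps plugin are installed:

```python
import os

try:
    import psyplot.project as psy
    HAS_PSYPLOT = True
except ImportError:   # psyplot is an optional, third-party package
    HAS_PSYPLOT = False

# 'temperature.nc' and the variable name 't2m' are placeholders
if HAS_PSYPLOT and os.path.exists('temperature.nc'):
    # plot a georeferenced map of 2 m temperature with the psy-maps plugin
    project = psy.plot.mapplot('temperature.nc', name='t2m', cmap='RdBu_r')
    project.export('t2m.png')   # save the figure to disk
    psy.close('all')            # close the project and release its figures
```

The same plot can also be created non-interactively via the psyplot command-line interface or interactively through the psyplot GUI.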

straditize

Software for the semi-automatic digitization of pollen diagrams and other types of stratigraphic diagrams, via the command line or a graphical user interface.

STRADITIZE (Stratigraphic Diagram Digitizer) is an open-source program that allows stratigraphic figures to be digitized in a single semi-automated operation. It is designed to detect multiple plots of variables analyzed along the same vertical axis, whether this is a sediment core or any similar depth/time series.
More at straditize.readthedocs.io.


GWGEN

A global weather generator for daily data

This synthesis of FORTRAN and Python is a globally applicable weather generator parameterized through a global dataset of weather station data with more than 50 million individual daily records. It downscales wind speed, precipitation, temperature, and cloud cover from monthly to daily resolution.
More at arve-research.github.io/gwgen
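The core downscaling idea can be sketched in a few lines of Python. This is a toy WGEN-style illustration, not GWGEN's actual, globally parameterized algorithm; all parameter values are illustrative:

```python
import random

def downscale_precip(monthly_total, wet_days, ndays=30, shape=0.8, seed=42):
    """Distribute a monthly precipitation total over daily values.

    Toy WGEN-style sketch: decide wet/dry days from the wet-day
    probability, draw wet-day amounts from a gamma distribution, and
    rescale so the daily values sum to the monthly total.
    """
    rng = random.Random(seed)
    p_wet = wet_days / ndays                 # probability that a day is wet
    amounts = [rng.gammavariate(shape, 1.0) if rng.random() < p_wet else 0.0
               for _ in range(ndays)]
    total = sum(amounts)
    if total == 0.0:                         # degenerate case: no wet day drawn
        amounts[0] = monthly_total
        return amounts
    scale = monthly_total / total            # conserve the monthly total
    return [a * scale for a in amounts]

daily = downscale_precip(monthly_total=90.0, wet_days=12)
print(round(sum(daily), 6))                  # -> 90.0
```

GWGEN additionally conditions each day on the previous day's state and uses a hybrid gamma-generalized Pareto distribution for precipitation amounts (see the publications below).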


IUCm

A model to simulate urban growth and transformation with the objective of minimising the energy required for transportation.

The Integrated Urban Complexity model (IUCm) is a relatively simple probabilistic computational model to compute “climate-smart urban forms” that cut down emissions related to energy consumption from urban mobility.
More at iucm.readthedocs.io


docrep

Python package for docstring repetition

The documentation repetition module (docrep) targets developers of complex and nested Python APIs and helps them create well-documented software.
More at docrep.readthedocs.io
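The pattern behind docrep can be illustrated with a plain-Python sketch. docrep's real API (its DocstringProcessor class) is considerably richer; this only shows the underlying idea of writing shared parameter documentation once and injecting it into several docstrings:

```python
# Shared parameter documentation, written once (illustrative content)
_shared_params = """\
da : xarray.DataArray
    The data to visualize
cmap : str
    Name of the colormap"""

def document(func):
    """Decorator that fills %(shared_params)s placeholders in a docstring."""
    func.__doc__ = func.__doc__ % {"shared_params": _shared_params}
    return func

@document
def plot_map(da, cmap="viridis"):
    """Plot a map.

    Parameters
    ----------
    %(shared_params)s
    """

@document
def plot_line(da, cmap="viridis"):
    """Plot a line.

    Parameters
    ----------
    %(shared_params)s
    """

print("xarray.DataArray" in plot_map.__doc__)   # -> True
```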

sphinx-nbexamples

Include Jupyter notebooks in a sphinx documentation

This Python package creates an example gallery from a set of Jupyter notebooks for documentation generated with Python's Sphinx package.
More examples at sphinx-nbexamples.readthedocs.io
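A hypothetical conf.py fragment for such a gallery. The directory names are placeholders, and the option names should be checked against the sphinx-nbexamples documentation:

```python
# Sphinx conf.py fragment (hypothetical paths; verify option names against
# the sphinx-nbexamples documentation)
extensions = [
    'sphinx_nbexamples',
]

# turn the notebooks in ../examples into a rendered example gallery
example_gallery_config = dict(
    examples_dirs=['../examples'],   # where the *.ipynb notebooks live
    gallery_dirs=['examples'],       # where the generated pages are written
)
```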


model-organization

Python package for a transparent organization of computational models



funcargparse

Python package to create command line parsers from docstrings
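The idea can be sketched with the standard library alone. This is not funcargparse's actual API, just an illustration of deriving an argparse parser, including help texts, from a numpydoc-style docstring:

```python
import argparse
import inspect

def parser_from_doc(func):
    """Build an argparse parser whose help texts come from the function's
    numpydoc-style docstring (toy sketch of the funcargparse idea)."""
    doc = inspect.getdoc(func) or ""
    summary, _, params_block = doc.partition("Parameters\n----------\n")
    parser = argparse.ArgumentParser(description=summary.strip())
    lines = params_block.splitlines()
    for i, line in enumerate(lines):
        # parameter headers look like "name : type"; descriptions are indented
        if ":" in line and not line.startswith(" "):
            name = line.split(":")[0].strip()
            help_text = lines[i + 1].strip() if i + 1 < len(lines) else ""
            parser.add_argument("--" + name, help=help_text)
    return parser

def compute(x=None, y=None):
    """Add two numbers.

    Parameters
    ----------
    x : float
        The first operand
    y : float
        The second operand
    """

parser = parser_from_doc(compute)
args = parser.parse_args(["--x", "1", "--y", "2"])
print(args.x, args.y)   # -> 1 2
```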




Publications

Follow me on ResearchGate

Peer-reviewed

  1. Cremades, R., & Sommer, P. S. (2019). Computing climate-smart urban land use with the Integrated Urban Complexity model (IUCm 1.0). Geoscientific Model Development, 12(1), 525–539. https://doi.org/10.5194/gmd-12-525-2019
    Abstract

    Cities are fundamental to climate change mitigation, and although there is increasing understanding about the relationship between emissions and urban form, this relationship has not been used to provide planning advice for urban land use so far. Here we present the Integrated Urban Complexity model (IUCm 1.0) that computes “climate-smart urban forms”, which are able to cut emissions related to energy consumption from urban mobility in half. Furthermore, we show the complex features that go beyond the normal debates about urban sprawl vs. compactness. Our results show how to reinforce fractal hierarchies and population density clusters within climate risk constraints to significantly decrease the energy consumption of urban mobility. The new model that we present aims to produce new advice about how cities can combat climate change.

  2. Sommer, P., Rech, D., Chevalier, M., & Davis, B. (2019). straditize: Digitizing stratigraphic diagrams. Journal of Open Source Software, 4(34), 1216. https://doi.org/10.21105/joss.01216
  3. Weitzel, N., Wagner, S., Sjolte, J., Klockmann, M., Bothe, O., Andres, H., … Brücher, T. (2018). Diving into the past – A paleo data-model comparison workshop on the Late Glacial and Holocene. Bulletin of the American Meteorological Society. https://doi.org/10.1175/bams-d-18-0169.1
    Abstract

    An international group of approximately 30 scientists with background and expertise in global and regional climate modeling, statistics, and climate proxy data discussed the state of the art, progress, and challenges in comparing global and regional climate simulations to paleoclimate data and reconstructions. The group focused on achieving robust comparisons in view of the uncertainties associated with simulations and paleo data.

  4. Sommer, P. S. (2017). The psyplot interactive visualization framework. The Journal of Open Source Software, 2(16). https://doi.org/10.21105/joss.00363
  5. Sommer, P. S., & Kaplan, J. O. (2017). A globally calibrated scheme for generating daily meteorology from monthly statistics: Global-WGEN (GWGEN) v1.0. Geosci. Model Dev., 10(10), 3771–3791. https://doi.org/10.5194/gmd-10-3771-2017
    Abstract

    While a wide range of Earth system processes occur at daily and even subdaily timescales, many global vegetation and other terrestrial dynamics models historically used monthly meteorological forcing both to reduce computational demand and because global datasets were lacking. Recently, dynamic land surface modeling has moved towards resolving daily and subdaily processes, and global datasets containing daily and subdaily meteorology have become available. These meteorological datasets, however, cover only the instrumental era of the last approximately 120 years at best, are subject to considerable uncertainty, and represent extremely large data files with associated computational costs of data input/output and file transfer. For periods before the recent past or in the future, global meteorological forcing can be provided by climate model output, but the quality of these data at high temporal resolution is low, particularly for daily precipitation frequency and amount. Here, we present GWGEN, a globally applicable statistical weather generator for the temporal downscaling of monthly climatology to daily meteorology. Our weather generator is parameterized using a global meteorological database and simulates daily values of five common variables: minimum and maximum temperature, precipitation, cloud cover, and wind speed. GWGEN is lightweight, modular, and requires a minimal set of monthly mean variables as input. The weather generator may be used in a range of applications, for example, in global vegetation, crop, soil erosion, or hydrological models. While GWGEN does not currently perform spatially autocorrelated multi-point downscaling of daily weather, this additional functionality could be implemented in future versions.

Conference contributions

  1. Sommer, P. S., Davis, B. A. S., & Chevalier, M. (2019). Github and Open Research Data; an example using the Eurasian Modern Pollen Database. In EGU General Assembly Conference Abstracts (Vol. 21, p. 5669). Retrieved from https://meetingorganizer.copernicus.org/EGU2019/EGU2019-5669.pdf
    Abstract

    Established in 2011, the Eurasian Modern Pollen Database (EMPD) is a standardized, fully documented and quality-controlled dataset of over 8000 modern pollen samples which can be openly accessed, and to which scientists can also contribute and help maintain. The database has recently been upgraded to include an intuitive client-based JavaScript web-interface hosted on the version control system Github, allowing data and metadata to be accessed and viewed using a clickable map. We present how we manage the FAIR principles, such as well-documented access and handling of data and metadata using the free Github services for open source development, as well as other critical points for open research data, such as data accreditation and referencing. Our community-based framework allows automated and transparent quality checks through continuous integration, fast and intuitive access to the data, as well as transparency for data contributors and users concerning changes and bugs in the EMPD. Furthermore, it allows a stable and long-lasting access to the web interface (and the data) without any funding requirements for servers or the risk of security holes.

  2. Sommer, P. S., Davis, B. A. S., Chevalier, M., Ni, J., & Tipton, J. (2019). The HORNET project: applying ’big data’ to reconstruct the climate of the Northern Hemisphere during the Holocene. In 20th Congress of the International Union for Quaternary Research (INQUA). International Union for Quaternary Research. Retrieved from https://app.oxfordabstracts.com/events/574/program-app/submission/94623
    Abstract

    Pollen data remains one of the most widely geographically distributed, publicly accessible and most thoroughly documented sources of quantitative palaeoclimate data. It represents one of the primary terrestrial proxies in understanding the spatial pattern of past climate change at centennial to millennial timescales, and a great example of ’big data’ in the palaeoclimate sciences. The HORNET project is based on the synthesis and analysis of thousands of fossil and modern pollen samples to create a spatially and seasonally explicit record of climate change covering the whole Northern Hemisphere over the last 12,000 years, using a common reconstruction and error accounting methodology. This type of study has been made possible only through long-term community led efforts to advance the availability of ’open big data’, and represents a good example of what can now be achieved within this new paradigm. Primary pollen data for the HORNET project was collected not only from open public databases such as Neotoma, Pangaea and the European Pollen Database, but also by encouraging individual scientists and research groups to share their data for the purposes of the project and these open databases, and through the use of specifically developed digitisation tools which can bring previously inaccessible data into this open digital world. The resulting project database includes over 3000 fossil pollen sites, as well as 16000 modern pollen samples for use in the pollen-climate calibration transfer-function. Building and managing such a large database has been a considerable challenge that has been met primarily through the application and development of open source software, which provide important cost and resource effective tools for the analysis of open data. The HORNET database can be interfaced through a newly developed, simple, freely accessible, and intuitive clickable map based web interface. 
This interface, hosted on the version control system Github, has been used mainly for quality control, method development and sharing the results and source database. Additionally, it provides the opportunity for other applications such as the comparison with other reconstructions based on other proxies, which we have also included in the database. We present the challenges in building and sharing such a large open database within the typically limited resources and funding that most scientific projects operate.

  3. Sommer, P. S. (2018). Psyplot: Interactive data analysis and visualization with Python. In EGU General Assembly Conference Abstracts (Vol. 20, p. 4701). Retrieved from http://adsabs.harvard.edu/abs/2018EGUGA..20.4701S
    Abstract

The development, usage and analysis of climate models often requires the visualization of the data. This visualization should ideally be nice looking, simple in application, fast, easily reproducible and flexible. There exists a wide range of software tools to visualize model data, which, however, often lack easy scriptability, have low flexibility, or are simply far too complex for a quick look into the data. Therefore, we developed the open-source visualization framework psyplot that aims to cover the visualization in the daily work of earth system scientists working with data of the climate system. It is built (mainly) upon the Python packages matplotlib, cartopy and xarray and integrates the visualization process into data analysis. The data can be stored in NetCDF, GeoTIFF, or any other format that is handled by the xarray package. Due to its interactive nature, however, psyplot may also be used with data that is currently being processed and not yet stored on the hard disk. Visualizations of rastered data on the globe are supported for rectangular grids (following or not following the CF Conventions) and for triangular grids (following the CF Conventions, like the earth system model ICON, or the unstructured grid conventions (UGRID)). Furthermore, the package visualizes scalar and vector fields and makes it easy to manage and format multiple plots at the same time. Psyplot can be used with only a few lines of code from the command line in an interactive Python session, via Python scripts, or through a graphical user interface (GUI). Finally, the framework developed in this package enables a very flexible configuration and an easy integration into other scripts that use matplotlib.

  4. Sommer, P. S., Davis, B. A. S., & Chevalier, M. (2018). STRADITIZE: An open-source program for digitizing pollen diagrams and other types of stratigraphic data. In EGU General Assembly Conference Abstracts (Vol. 20, p. 4433). Retrieved from http://adsabs.harvard.edu/abs/2018EGUGA..20.4433S
    Abstract

In an age of digital data analysis, gaining access to data from the pre-digital era - or any data that is only available as a figure on a page - remains a problem and an under-utilized scientific resource. Whilst there are numerous programs available that allow the digitization of scientific data in a simple x-y graph format, we know of no semi-automated program that can deal with data plotted with multiple horizontal axes that share the same vertical axis, such as pollen diagrams and other stratigraphic figures that are common in the Earth sciences. STRADITIZE (Stratigraphic Diagram Digitizer) is a new open-source program that allows stratigraphic figures to be digitized in a single semi-automated operation. It is designed to detect multiple plots of variables analyzed along the same vertical axis, whether this is a sediment core or any similar depth/time series. The program is written in Python and supports mixtures of many different diagram types, such as bar plots, line plots, as well as shaded, stacked, and filled area plots. The package provides an extensively documented graphical user interface for a point-and-click handling of the semi-automatic process, but can also be scripted or used from the command line. Other features of STRADITIZE include text recognition to interpret the names of the different plotted variables, the automatic and semi-automatic recognition of picture artifacts, as well as an automatic measurement finder to exactly reproduce the data that has been used to create the diagram. Evaluation of the program has been undertaken comparing the digitization of published figures with the original digital data. This generally shows very good results, although this is inevitably reliant on the quality and resolution of the original figure.

  5. Sommer, P. S., Chevalier, M., & Davis, B. A. S. (2018). STRADITIZE: An open-source program for digitizing pollen diagrams and other types of stratigraphic data. In AFQUA - The African Quaternary. Nairobi (Kenya): AFQUA. Retrieved from https://afquacongress.wixsite.com/afqua2018
    Abstract

    Straditize (Stratigraphic Diagram Digitizer) is a new open-source program that allows stratigraphic diagrams to be digitized in a single semi-automated operation. It is specifically designed for figures that have multiple horizontal axes plotted against a shared vertical axis (e.g. depth/age), such as pollen diagrams.

  6. Sommer, P., & Kaplan, J. (2017). Quantitative Modeling of Human-Environment Interactions in Preindustrial Time. In PAGES OSM 2017, Abstract Book (pp. 129–129).
  7. Sommer, P., & Kaplan, J. (2016). Fundamental statistical relationships between monthly and daily meteorological variables: Temporal downscaling of weather based on a global observational dataset. In EGU General Assembly Conference Abstracts (Vol. 18, pp. EPSC2016–18183). Retrieved from http://adsabs.harvard.edu/abs/2016EGUGA..1818183S
    Abstract

    Accurate modelling of large-scale vegetation dynamics, hydrology, and other environmental processes requires meteorological forcing on daily timescales. While meteorological data with high temporal resolution is becoming increasingly available, simulations for the future or distant past are limited by lack of data and poor performance of climate models, e.g., in simulating daily precipitation. To overcome these limitations, we may temporally downscale monthly summary data to a daily time step using a weather generator. Parameterization of such statistical models has traditionally been based on a limited number of observations. Recent developments in the archiving, distribution, and analysis of "big data" datasets provide new opportunities for the parameterization of a temporal downscaling model that is applicable over a wide range of climates. Here we parameterize a WGEN-type weather generator using more than 50 million individual daily meteorological observations, from over 10’000 stations covering all continents, based on the Global Historical Climatology Network (GHCN) and Synoptic Cloud Reports (EECRA) databases. Using the resulting "universal" parameterization and driven by monthly summaries, we downscale mean temperature (minimum and maximum), cloud cover, and total precipitation, to daily estimates. We apply a hybrid gamma-generalized Pareto distribution to calculate daily precipitation amounts, which overcomes much of the inability of earlier weather generators to simulate high amounts of daily precipitation. Our globally parameterized weather generator has numerous applications, including vegetation and crop modelling for paleoenvironmental studies.

  8. Sommer, P. (2016). Psyplot: Visualizing rectangular and triangular Climate Model Data with Python. In EGU General Assembly Conference Abstracts (Vol. 18, p. 18185). Retrieved from http://adsabs.harvard.edu/abs/2016EGUGA..1818185S
    Abstract

The development and use of climate models often requires the visualization of geo-referenced data. Creating visualizations should be fast, attractive, flexible, easily applicable and easily reproducible. There is a wide range of software tools available for visualizing raster data, but they often are inaccessible to many users (e.g. because they are difficult to use in a script or have low flexibility). In order to facilitate easy visualization of geo-referenced data, we developed a new framework called "psyplot," which can aid earth system scientists with their daily work. It is purely written in the programming language Python and primarily built upon the python packages matplotlib, cartopy and xray. The package can visualize data stored on the hard disk (e.g. NetCDF, GeoTIFF, any other file format supported by the xray package), or directly from the memory or Climate Data Operators (CDOs). Furthermore, data can be visualized on a rectangular grid (following or not following the CF Conventions) and on a triangular grid (following the CF or UGRID Conventions). Psyplot visualizes 2D scalar and vector fields, enabling the user to easily manage and format multiple plots at the same time, and to export the plots into all common picture formats and movies covered by the matplotlib package. The package can currently be used in an interactive python session or in python scripts, and will soon be developed for use with a graphical user interface (GUI). Finally, the psyplot framework enables flexible configuration, allows easy integration into other scripts that use matplotlib, and provides a flexible foundation for further development.

  9. Sommer, P., & Kaplan, J. (2016). Fundamental statistical relationships between monthly and daily meteorological variables: Temporal downscaling of weather based on a global observational dataset. In Workshop on Stochastic Weather Generators. Vannes (France): University of Bretagne Sud. Retrieved from https://www.lebesgue.fr/content/sem2016-climate-program
    Abstract

Accurate modelling of large-scale vegetation dynamics, hydrology, and other environmental processes requires meteorological forcing on daily timescales. While meteorological data with high temporal resolution is becoming increasingly available, simulations for the future or distant past are limited by lack of data and poor performance of climate models, e.g., in simulating daily precipitation. To overcome these limitations, we may temporally downscale monthly summary data to a daily time step using a weather generator. Parameterization of such statistical models has traditionally been based on a limited number of observations. Recent developments in the archiving, distribution, and analysis of big data datasets provide new opportunities for the parameterization of a temporal downscaling model that is applicable over a wide range of climates. Here we parameterize a WGEN-type weather generator using more than 50 million individual daily meteorological observations, from over 10’000 stations covering all continents, based on the Global Historical Climatology Network (GHCN) and Synoptic Cloud Reports (EECRA) databases. Using the resulting “universal” parameterization and driven by monthly summaries, we downscale mean temperature (minimum and maximum), cloud cover, and total precipitation, to daily estimates. We apply a hybrid gamma-generalized Pareto distribution to calculate daily precipitation amounts, which overcomes much of the inability of earlier weather generators to simulate high amounts of daily precipitation. Our globally parameterized weather generator has numerous applications, including vegetation and crop modelling for paleoenvironmental studies.