Welcome to 𝓔𝓿𝓸Sem

𝓔𝓿𝓸Sem is a scientific project meant to explore the “Evolving Semantics” at play in the world's languages. It brings together in one place the vast knowledge acquired by generations of scholars in the domain of etymology, for a variety of language families. Our purpose is to observe empirically how languages have built semantic connections between concepts through the historical evolution of their lexicons.

As of January 2026, 𝓔𝓿𝓸Sem features a total of 30,359 concepts, expressed by 260,022 words from 3,345 languages. These words descend from 29,664 etyma from 170 protolanguages.
 


Here is how you can cite the 𝓔𝓿𝓸Sem database:

Alexandre François, Mathieu Dehouck, Konstantin Henke & Siva Kalyan. () 𝓔𝓿𝓸Sem: A database of dialexification across language families. Online database. Lattice & HéLiCéO, CNRS — ENS‒PSL, Paris. https://tiny.cc/EvoSem [access date: ]

If you wish to know more about 𝓔𝓿𝓸Sem — why and how it was created, or how to read its graphs and tables — you can read our paper:

Mathieu Dehouck, Alexandre François, Siva Kalyan, Martial Pastor & David Kletz. (2023) 𝓔𝓿𝓸Sem: A database of polysemous cognate sets. In Nina Tahmasebi et al. (conv.), Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, 66–75. Singapore. Association for Computational Linguistics.

 


Download area

The data underlying 𝓔𝓿𝓸Sem can be downloaded from this zipped folder:

EvoSem dataset, version 1.0 (March 2025). The zipped folder contains a “ReadMe” file with instructions on how to use the dataset.