Welcome to 𝓔𝓿𝓸Sem

𝓔𝓿𝓸Sem is a scientific project that explores the “Evolving Semantics” at play in the world's languages. It brings together in one place the vast knowledge acquired by generations of scholars in the domain of etymology, for a variety of language families. Our purpose is to observe empirically how languages have built semantic connections between concepts through the historical evolution of their lexicons.

As of January 2025, 𝓔𝓿𝓸Sem features a total of 29,552 concepts, expressed by 214,858 words from 3,258 languages. These words descend from 24,916 etyma from 135 protolanguages.
 


Here is how you can cite the 𝓔𝓿𝓸Sem database:

Alexandre François, Siva Kalyan, Mathieu Dehouck, Martial Pastor & David Kletz. () 𝓔𝓿𝓸Sem: A database of dialexification across language families. Online database. CNRS—LaTTiCe, Paris. https://tiny.cc/EvoSem [access date: ]

If you wish to know more about 𝓔𝓿𝓸Sem — why and how it was created, or how to read its graphs and tables — you can read our paper:

Mathieu Dehouck, Alexandre François, Siva Kalyan, Martial Pastor & David Kletz. (2023) 𝓔𝓿𝓸Sem: A database of polysemous cognate sets. In Nina Tahmasebi et al. (conv.), Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, 66–75. Singapore: Association for Computational Linguistics.

 


Download area

The data underlying 𝓔𝓿𝓸Sem can be downloaded from this zipped folder:

EvoSem dataset, version 1.0 (March 2025). The zipped folder contains a “ReadMe” file with instructions on how to use the dataset.
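
Once the zipped folder has been downloaded, its contents can be inspected before extraction. The following is a minimal Python sketch using only the standard library. The local file name EvoSem_v1.0.zip and the helper names are hypothetical, and the internal layout of the archive is not described here, so the code only assumes that the “ReadMe” file has a name beginning with “readme” (in any letter case).

```python
import zipfile
from pathlib import PurePosixPath


def list_archive(zip_path):
    """Return the names of all members stored in the zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


def find_readme(zip_path):
    """Return members whose file name starts with 'readme' (case-insensitive).

    Assumption: the ReadMe file is named along the lines of 'ReadMe.txt';
    the actual name inside the EvoSem archive may differ.
    """
    return [
        name
        for name in list_archive(zip_path)
        if PurePosixPath(name).name.lower().startswith("readme")
    ]


def read_member_text(zip_path, member, encoding="utf-8"):
    """Read one archive member as text (the UTF-8 encoding is an assumption)."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member).decode(encoding, errors="replace")


# Example usage, with a hypothetical local file name:
#   for name in find_readme("EvoSem_v1.0.zip"):
#       print(read_member_text("EvoSem_v1.0.zip", name))
```

The “ReadMe” file itself remains the authoritative guide to the dataset; this sketch only helps locate and display it.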