Database APIs

APIs for different linguistic databases can be accessed with lingtypology.db_apis.

In [1]:
import lingtypology.db_apis

1. General

Lingtypology attempts to provide a unified API for the supported language databases, so the classes in this module share some common attributes and methods. This section describes them, with examples for Autotyp, Wals and Phoible.
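The toy sketch below makes the idea of a shared interface concrete in plain Python. The class names ToyDatabaseAPI and ToyVowelDB, the _fetch_rows helper and the column-oriented dict returned by get_json are all hypothetical and do not come from lingtypology. Only the names citation, show_citation and get_json echo this section. The two vowel counts are copied from the Phoible SPA output shown later.

```python
from abc import ABC, abstractmethod


class ToyDatabaseAPI(ABC):
    """Hypothetical base class sketching a shared database interface.

    Not lingtypology's code: only the names citation, show_citation
    and get_json echo the attributes described in this section.
    """

    citation = ""          # citation text printed when data is requested
    show_citation = True   # set to False to suppress the printed citation

    @abstractmethod
    def _fetch_rows(self):
        """Return the data as a list of dicts, one dict per row."""

    def get_json(self):
        """Print the citation (unless disabled) and return the data as a dict."""
        if self.show_citation:
            print(self.citation)
        rows = self._fetch_rows()
        if not rows:
            return {}
        # Assumed column-oriented layout: {column: [value, value, ...]}
        return {column: [row[column] for row in rows] for column in rows[0]}


class ToyVowelDB(ToyDatabaseAPI):
    """Hypothetical database reusing two vowel counts from the Phoible output."""

    citation = "Toy vowel database (illustration only)."

    def _fetch_rows(self):
        return [
            {"language": "Korean", "vowels": 18},
            {"language": "Ket", "vowels": 14},
        ]


db = ToyVowelDB()
db.show_citation = False  # suppress the printed citation
print(db.get_json())
```

Because every toy database inherits get_json and the citation switch from the base class, code written against one of them works unchanged with another. That is the benefit a unified API aims for.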

In [2]:
from lingtypology.db_apis import Autotyp, Wals, Phoible

1.1. features_list

You can get the list of available features from the database using this attribute.

In [3]:
Autotyp().features_list[:10]  # Truncated to the first 10 features to save space
Out[3]:
['Agreement',
 'Alienability',
 'Alignment',
 'Alignment_case_splits',
 'Alignment_per_language',
 'Clause_linkage',
 'Clause_word_order',
 'Clusivity',
 'GR_per_language',
 'Gender']

Note: Phoible has no features_list attribute because it has no features. Instead, it has a subsets_list attribute that lists the available subsets of the Phoible data.

In [4]:
Phoible().subsets_list
Out[4]:
['all', 'UPSID', 'SPA', 'AA', 'PH', 'GM', 'RA', 'SAPHON']

1.2. get_df and get_json

These two methods access the database and return the data as a pandas.DataFrame (get_df) or a dict (get_json). Example of usage:

In [5]:
Autotyp('Agreement', 'Clusivity').get_df().head()
Bickel, Balthasar, Johanna Nichols, Taras Zakharko,
Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler,
Lennart Bierkandt, Fernando Zúñiga & John B. Lowe.
2017. The AUTOTYP typological databases.
Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0
Out[5]:
language LID VPolyagreement.Presence.v2 VPolyagreement.Presence.v1 InclExclAsPerson.Presence InclExclAny.Presence InclExclType InclExclAsMinAug.Presence
0 Ambulas 6 False False False False no i/e False
1 Abkhazian 7 True True False False no i/e False
2 Achinese 9 True False False True plain i/e type False
3 Western Keres 10 True True False False no i/e False
4 Hokkaido Ainu 12 True True False True plain i/e type False

Note: for Phoible and Autotyp you can use the strip_na parameter (list, default: []) to drop rows that have an empty cell in any of the given columns. Compare the following.
Without strip_na (empty cells are replaced with '~N/A~'):

In [6]:
Phoible().get_df().head()
Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-04.)
Out[6]:
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
1 KOREAN (UPSID 423) Korean (37.5, 128.0) kore1280 Eurasia 32 21 11 ~N/A~ http://web.phonetik.uni-frankfurt.de/L/L2170.html https://phoible.org/languages/kore1280
2 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
3 KET (UPSID 399) Ket (63.7551, 87.5466) kett1243 Eurasia 25 18 7 ~N/A~ http://web.phonetik.uni-frankfurt.de/L/L2706.html https://phoible.org/languages/kett1243
4 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252

With the tones column passed to strip_na (rows 1 and 3, which had '~N/A~' in tones, are dropped):

In [7]:
Phoible().get_df(strip_na=['tones']).head()
Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-04.)
Out[7]:
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
2 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
4 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252
6 Kabardian (SPA 4) Kabardian (43.5082, 43.3918) kaba1278 Eurasia 56 49 7 0 https://archive.org/details/kbd_SPA1979_phon https://phoible.org/languages/kaba1278
8 Georgian (SPA 5) Georgian (41.850396999999994, 43.78613) nucl1302 Eurasia 35 29 6 0 https://archive.org/details/kat_SPA1979_phon https://phoible.org/languages/nucl1302
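The effect of strip_na on the outputs above can be sketched in plain Python. The sketch assumes each row is a dict and that empty cells hold the '~N/A~' placeholder. It only mimics the observable result and is not how lingtypology implements the parameter. The rows are trimmed copies of the Phoible output above.

```python
NA = "~N/A~"  # placeholder shown for empty cells in the outputs above


def strip_na(rows, columns):
    """Drop rows that have the NA placeholder in any of the given columns.

    Hypothetical plain-Python helper; with columns=[] every row is kept,
    matching the documented default of strip_na.
    """
    return [row for row in rows if all(row[col] != NA for col in columns)]


rows = [
    {"contribution_name": "Korean (SPA 1)", "tones": 0},
    {"contribution_name": "KOREAN (UPSID 423)", "tones": NA},
    {"contribution_name": "Ket (SPA 2)", "tones": 0},
    {"contribution_name": "KET (UPSID 399)", "tones": NA},
]

kept = strip_na(rows, ["tones"])
print([row["contribution_name"] for row in kept])
```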

Note: by default, get_df and get_json print the citation. To disable this, set the show_citation attribute to False.

In [8]:
p = Phoible()
p.show_citation = False
p.get_df(strip_na=['tones']).head()
Out[8]:
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
2 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
4 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252
6 Kabardian (SPA 4) Kabardian (43.5082, 43.3918) kaba1278 Eurasia 56 49 7 0 https://archive.org/details/kbd_SPA1979_phon https://phoible.org/languages/kaba1278
8 Georgian (SPA 5) Georgian (41.850396999999994, 43.78613) nucl1302 Eurasia 35 29 6 0 https://archive.org/details/kat_SPA1979_phon https://phoible.org/languages/nucl1302

1.3. citation

You can get the citation for each database from the citation attribute, e.g.:

In [9]:
from lingtypology.db_apis import Autotyp
print(Autotyp().citation)
Bickel, Balthasar, Johanna Nichols, Taras Zakharko,
Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler,
Lennart Bierkandt, Fernando Zúñiga & John B. Lowe.
2017. The AUTOTYP typological databases.
Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0

Note: for Wals, the citation attribute contains a separate citation for every feature. For a general citation of Wals as a whole, use general_citation.

In [10]:
w = Wals('1a', '2a')
print(w.citation)
Citation for feature 1A:
Ian Maddieson. 2013. Consonant Inventories.
In: Dryer, Matthew S. & Haspelmath, Martin (eds.)
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info/chapter/1, Accessed on 2019-06-04.)

Citation for feature 2A:
Ian Maddieson. 2013. Vowel Quality Inventories.
In: Dryer, Matthew S. & Haspelmath, Martin (eds.)
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info/chapter/2, Accessed on 2019-06-04.)


In [11]:
print(w.general_citation)
Dryer, Matthew S. & Haspelmath, Martin (eds.) 2013.
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info, Accessed on 2019-06-04.)

2. Wals

Wals data can be accessed (online) using lingtypology.db_apis.Wals.

In [12]:
from lingtypology.db_apis import Wals
In [13]:
wals_page = Wals('1a', '2a').get_df()
wals_page.head()
Citation for feature 1A:
Ian Maddieson. 2013. Consonant Inventories.
In: Dryer, Matthew S. & Haspelmath, Martin (eds.)
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info/chapter/1, Accessed on 2019-06-04.)

Citation for feature 2A:
Ian Maddieson. 2013. Vowel Quality Inventories.
In: Dryer, Matthew S. & Haspelmath, Martin (eds.)
The World Atlas of Language Structures Online.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://wals.info/chapter/2, Accessed on 2019-06-04.)

Out[13]:
wals_code language genus family coordinates _1A_area _1A _1A_num _1A_desc _2A_area _2A _2A_num _2A_desc
0 kiw Kiwai (Southern) Kiwaian Kiwaian (-8.0, 143.5) Phonology 1. Small 1 Small Phonology 2. Average (5-6) 2 Average (5-6)
1 xoo !Xóõ Tu Tu (-24.0, 21.5) Phonology 5. Large 5 Large Phonology 2. Average (5-6) 2 Average (5-6)
2 ani //Ani Khoe-Kwadi Khoe-Kwadi (-18.9166666667, 21.9166666667) Phonology 5. Large 5 Large Phonology 2. Average (5-6) 2 Average (5-6)
3 abi Abipón South Guaicuruan Guaicuruan (-29.0, -61.0) Phonology 2. Moderately small 2 Moderately small Phonology 2. Average (5-6) 2 Average (5-6)
4 abk Abkhaz Northwest Caucasian Northwest Caucasian (43.0833333333, 41.0) Phonology 5. Large 5 Large Phonology 1. Small (2-4) 1 Small (2-4)

Map example for feature 1A:

In [14]:
m = lingtypology.LingMap(wals_page.language)
m.add_custom_coordinates(wals_page.coordinates)
m.add_features(wals_page._1A)
m.legend_title = 'Consonant Inventory'
m.colors = lingtypology.gradient(5, 'yellow', 'green')
m.create_map()
Out[14]:
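Above, lingtypology.gradient(5, 'yellow', 'green') supplies five colors, one per category of feature 1A. To illustrate what such a gradient is, here is a sketch of linear interpolation between two colors in RGB space. The helper linear_gradient is hypothetical and takes hex strings rather than color names. lingtypology's own gradient may compute its colors differently.

```python
def linear_gradient(n, start_hex, end_hex):
    """Return n hex colors evenly spaced between start_hex and end_hex.

    Hypothetical helper: plain linear RGB interpolation, not
    lingtypology.gradient itself (which accepts color names).
    """
    def to_rgb(hex_color):
        hex_color = hex_color.lstrip("#")
        return [int(hex_color[i:i + 2], 16) for i in (0, 2, 4)]

    start, end = to_rgb(start_hex), to_rgb(end_hex)
    if n == 1:
        return ["#" + start_hex.lstrip("#").lower()]
    colors = []
    for step in range(n):
        t = step / (n - 1)  # 0.0 for the first color, 1.0 for the last
        rgb = [round(s + (e - s) * t) for s, e in zip(start, end)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


print(linear_gradient(5, "#ffff00", "#008000"))  # yellow to (web) green
```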

3. Autotyp

Autotyp data can be accessed (online) using lingtypology.db_apis.Autotyp.

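Unlike in Wals, where columns are named after the feature, each table name passed to Autotyp adds several columns named after the variables in that table (e.g. 'Agreement' adds the VPolyagreement.* columns):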

In [15]:
Autotyp_table = Autotyp('Gender', 'Agreement').get_df(strip_na=['Gender.binned4'])
Autotyp_table.head()
Bickel, Balthasar, Johanna Nichols, Taras Zakharko,
Alena Witzlack-Makarevich, Kristine Hildebrandt, Michael Rießler,
Lennart Bierkandt, Fernando Zúñiga & John B. Lowe.
2017. The AUTOTYP typological databases.
Version 0.1.0 https://github.com/autotyp/autotyp-data/tree/0.1.0
Out[15]:
language LID Gender.n Gender.binned4 Gender.Presence VPolyagreement.Presence.v2 VPolyagreement.Presence.v1
0 Godoberi 1531 3 3 genders True False False
1 Bininj Kun-Wok 655 4 4 genders True True True
2 Luvale 553 10 more than 4 genders True True False
3 North-Central Dargwa 2949 3 3 genders True True True
4 Gaagudju 82 4 4 genders True True True

Now we can draw a map of the gender data for these languages.

In [16]:
m = lingtypology.LingMap(Autotyp_table.language)
m.add_features(Autotyp_table['Gender.binned4'])
m.colors = lingtypology.gradient(4, color1='yellow', color2='red')
m.legend_title = 'Genders'
m.create_map()
Out[16]:

4. AfBo

In [17]:
from lingtypology.db_apis import AfBo
In [18]:
adj = AfBo('adjectivizer').get_df()
adj.head()
Seifart, Frank. 2013.
AfBo: A world-wide survey of affix borrowing.
Leipzig: Max Planck Institute for Evolutionary Anthropology.
(Available online at http://afbo.info, Accessed on 2019-06-04.)
Out[18]:
language_recipient language_donor reliability adjectivizer
0 Resígaro Bora high 0
1 Gurindji Kriol Gurindji high 0
2 Copper Island Aleut Russian high 0
3 Sakha Mongolian high 4
4 Kalderash Romani Romanian high 1
In [19]:
m = lingtypology.LingMap(adj.language_recipient)
m.add_features(adj['adjectivizer'], numeric=True)
m.legend_title = 'Adj'
m.create_map()
Out[19]:

5. SAILS

In [20]:
from lingtypology.db_apis import Sails

To get a pandas.DataFrame of features and descriptions:

In [21]:
Sails().features_descriptions.head()
Out[21]:
Feature Description
0 ICU17 Is plurality in independent pronouns expressed...
1 ICU16 Is plurality in independent pronouns expressed...
2 ICU15 Is plurality in independent pronouns expressed...
3 ICU14 Is an associative or collective plural disting...
4 ICU13 Are nouns denoting inanimates marked for plural?

To get descriptions of particular features:

In [22]:
Sails().feature_descriptions('ICU10', 'ICU11')
Out[22]:
Feature Description
0 ICU10 Is nominal plural marking obligatory?
1 ICU11 Are nouns denoting humans marked for plural?

To get the SAILS data as a dict, use the get_json method. To get it as a pandas.DataFrame, run:

In [23]:
sails = Sails('ICU3', 'ICU4')
df = sails.get_df()
df.head()
You probably should cite it, but I don't understand how. Please, consult https://sails.clld.org/
Out[23]:
language coordinates ICU3 ICU3_desc ICU4 ICU4_desc
0 Baniva <zip object at 0x7fc233a96788> ~N/A~ ~N/A~ ~N/A~ ~N/A~
1 Apolista <zip object at 0x7fc233a96788> ~N/A~ ~N/A~ ~N/A~ ~N/A~
2 Yavitero <zip object at 0x7fc233a96788> ~N/A~ ~N/A~ ~N/A~ ~N/A~
3 Resígaro <zip object at 0x7fc233a96788> ~N/A~ ~N/A~ ~N/A~ ~N/A~
4 Tol <zip object at 0x7fc233a96788> ~N/A~ ~N/A~ ~N/A~ ~N/A~

Map example:

In [24]:
m = lingtypology.LingMap(df.language)
m.add_features(df.ICU3_desc)
m.legend_title = sails.feature_descriptions('ICU3').Description.at[0]
m.start_location = (9, -79)
m.start_zoom = 5
m.legend_position = 'bottomleft'
m.create_map()
Out[24]:

6. Phoible

In [25]:
from lingtypology.db_apis import Phoible

Unlike the other databases, Phoible does not take features. Instead, you pass a subset. Take a look:

In [26]:
p = Phoible()
p.get_df().head()
Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-04.)
Out[26]:
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
1 KOREAN (UPSID 423) Korean (37.5, 128.0) kore1280 Eurasia 32 21 11 ~N/A~ http://web.phonetik.uni-frankfurt.de/L/L2170.html https://phoible.org/languages/kore1280
2 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
3 KET (UPSID 399) Ket (63.7551, 87.5466) kett1243 Eurasia 25 18 7 ~N/A~ http://web.phonetik.uni-frankfurt.de/L/L2706.html https://phoible.org/languages/kett1243
4 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252

Some languages have several entries because the Phoible data consists of several subsets. You can get the list of available subsets:

In [27]:
p.subsets_list
Out[27]:
['all', 'UPSID', 'SPA', 'AA', 'PH', 'GM', 'RA', 'SAPHON']

...and pass one of them to the class:

In [28]:
p = Phoible(subset='SPA')
df = p.get_df(strip_na=['tones'])
df.head()
Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-04.)
Out[28]:
contribution_name language coordinates glottocode macroarea phonemes consonants vowels tones source inventory_page
0 Korean (SPA 1) Korean (37.5, 128.0) kore1280 Eurasia 40 22 18 0 https://archive.org/details/kor_SPA1979_phon https://phoible.org/languages/kore1280
1 Ket (SPA 2) Ket (63.7551, 87.5466) kett1243 Eurasia 32 18 14 0 https://archive.org/details/ket_SPA1979_phon https://phoible.org/languages/kett1243
2 Lak (SPA 3) Lak (42.1328, 47.0809) lakk1252 Eurasia 69 60 9 0 https://archive.org/details/lbe_SPA1979_phon https://phoible.org/languages/lakk1252
3 Kabardian (SPA 4) Kabardian (43.5082, 43.3918) kaba1278 Eurasia 56 49 7 0 https://archive.org/details/kbd_SPA1979_phon https://phoible.org/languages/kaba1278
4 Georgian (SPA 5) Georgian (41.850396999999994, 43.78613) nucl1302 Eurasia 35 29 6 0 https://archive.org/details/kat_SPA1979_phon https://phoible.org/languages/nucl1302
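In the outputs above, every contribution_name ends with its subset in parentheses, e.g. 'Korean (SPA 1)' or 'KOREAN (UPSID 423)'. If that pattern holds, choosing a subset amounts to filtering on that tag, as in this sketch. The helpers subset_of and select_subset are hypothetical. lingtypology's subset argument may select rows in a different way.

```python
import re

# Assumption: contribution_name ends with "(SUBSET NUMBER)", as in the outputs above.
SUBSET_RE = re.compile(r"\((\w+) \d+\)$")


def subset_of(contribution_name):
    """Return the subset tag (e.g. 'SPA') of a contribution name, or None."""
    match = SUBSET_RE.search(contribution_name)
    return match.group(1) if match else None


def select_subset(rows, subset):
    """Keep the rows that belong to one subset; 'all' keeps every row."""
    if subset == "all":
        return list(rows)
    return [row for row in rows if subset_of(row["contribution_name"]) == subset]


rows = [
    {"contribution_name": "Korean (SPA 1)", "vowels": 18},
    {"contribution_name": "KOREAN (UPSID 423)", "vowels": 11},
    {"contribution_name": "Ket (SPA 2)", "vowels": 14},
    {"contribution_name": "KET (UPSID 399)", "vowels": 7},
]

print([row["contribution_name"] for row in select_subset(rows, "SPA")])
```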

You can also get non-aggregated data (one row per phoneme) by setting aggregated=False when initializing the class.

In [29]:
Phoible(aggregated=False).get_df().head()
Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-04.)
Out[29]:
InventoryID Glottocode ISO6393 LanguageName SpecificDialect GlyphID Phoneme Allophones Marginal SegmentClass ... retractedTongueRoot advancedTongueRoot periodicGlottalSource epilaryngealSource spreadGlottis constrictedGlottis fortis raisedLarynxEjective loweredLarynxImplosive click
0 1 kore1280 kor Korean ~N/A~ 0061 a a ~N/A~ vowel ... - - + - - - 0 - - 0
1 1 kore1280 kor Korean ~N/A~ 0061+02D0 ~N/A~ vowel ... - - + - - - 0 - - 0
2 1 kore1280 kor Korean ~N/A~ 00E6 æ ɛ æ ~N/A~ vowel ... - - + - - - 0 - - 0
3 1 kore1280 kor Korean ~N/A~ 00E6+02D0 æː æː ~N/A~ vowel ... - - + - - - 0 - - 0
4 1 kore1280 kor Korean ~N/A~ 0065 e e ~N/A~ vowel ... - - + - - - 0 - - 0

5 rows × 48 columns
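The aggregated table has one row per inventory with phoneme counts. In the outputs above, phonemes equals consonants + vowels + tones (e.g. 40 = 22 + 18 + 0 for Korean (SPA 1)). The non-aggregated table instead has one row per phoneme. The plain-Python sketch below derives the former from the latter. The aggregate helper is hypothetical and assumes SegmentClass takes the values 'consonant', 'vowel' and 'tone'. It is not how lingtypology or PHOIBLE builds the aggregated data. The input is just a few rows from the outputs in this section, so the counts are partial.

```python
from collections import Counter, defaultdict


def aggregate(segments):
    """Count segments per inventory and segment class.

    Hypothetical sketch relating one-row-per-phoneme data to the
    per-inventory counts of the aggregated table.
    """
    counts = defaultdict(Counter)
    for seg in segments:
        counts[seg["InventoryID"]][seg["SegmentClass"]] += 1
    return {
        inventory: {
            "phonemes": sum(c.values()),
            "consonants": c["consonant"],  # Counter returns 0 for missing keys
            "vowels": c["vowel"],
            "tones": c["tone"],
        }
        for inventory, c in counts.items()
    }


# A few rows from the non-aggregated outputs above (so the counts are partial).
segments = [
    {"InventoryID": 1, "Phoneme": "a", "SegmentClass": "vowel"},
    {"InventoryID": 1, "Phoneme": "æ", "SegmentClass": "vowel"},
    {"InventoryID": 1, "Phoneme": "æː", "SegmentClass": "vowel"},
    {"InventoryID": 1, "Phoneme": "e", "SegmentClass": "vowel"},
    {"InventoryID": 219, "Phoneme": "kʷʼ", "SegmentClass": "consonant"},
]

print(aggregate(segments))
```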

Map example:

In [30]:
m = lingtypology.LingMap(df.language)
m.colormap_colors = ('white', 'red')
m.add_features(df.tones, numeric=True)
m.start_zoom = 1
m.legend_title = 'Tones'
m.create_map()
Out[30]:

Another example (slow because of the large amount of data):

In [31]:
df = Phoible(subset='UPSID', aggregated=False).get_df()
# Keep only ejective segments
df = df[df.raisedLarynxEjective == '+']
# Keep one row (the first) per language
df = df.drop_duplicates(subset='Glottocode')
df.head()
Moran, Steven & McCloy, Daniel (eds.) 2019.
PHOIBLE 2.0.
Jena: Max Planck Institute for the Science of Human History.
(Available online at http://phoible.org, Accessed on 2019-06-04.)
Out[31]:
InventoryID Glottocode ISO6393 LanguageName SpecificDialect GlyphID Phoneme Allophones Marginal SegmentClass ... retractedTongueRoot advancedTongueRoot periodicGlottalSource epilaryngealSource spreadGlottis constrictedGlottis fortis raisedLarynxEjective loweredLarynxImplosive click
7570 198 afad1236 aal KOTOKO ~N/A~ 0063+02BC ~N/A~ False consonant ... 0 0 - - - + - + - -
7802 206 ahte1237 aht AHTNA ~N/A~ 006B+02BC ~N/A~ False consonant ... 0 0 - - - + - + - -
7920 211 qawa1238 alc QAWASQAR ~N/A~ 006B+02BC ~N/A~ False consonant ... 0 0 - - - + - + - -
8131 218 hame1242 amf HAMER ~N/A~ 0071+02BC ~N/A~ False consonant ... 0 0 - - - + - + - -
8157 219 amha1245 amh AMHARIC ~N/A~ 006B+02B7+02BC kʷʼ ~N/A~ False consonant ... 0 0 - - - + - + - -

5 rows × 48 columns
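For readers less familiar with pandas, the two steps of In [31] can be sketched in plain Python. First, keep the segments whose raisedLarynxEjective value is '+'. Then keep the first row for each Glottocode, since drop_duplicates uses keep='first' by default. The function name languages_with_feature is hypothetical. The rows are a few columns taken from the outputs above.

```python
def languages_with_feature(segments, feature="raisedLarynxEjective"):
    """Return the first '+' segment per Glottocode, preserving order.

    Plain-Python counterpart of
    df[df[feature] == '+'].drop_duplicates(subset='Glottocode').
    """
    seen = set()
    result = []
    for seg in segments:
        if seg[feature] != "+":
            continue  # segment does not have the feature
        if seg["Glottocode"] in seen:
            continue  # language already represented by an earlier row
        seen.add(seg["Glottocode"])
        result.append(seg)
    return result


# Rows taken from the outputs above (only the relevant columns).
segments = [
    {"Glottocode": "kore1280", "LanguageName": "Korean", "raisedLarynxEjective": "-"},
    {"Glottocode": "ahte1237", "LanguageName": "AHTNA", "raisedLarynxEjective": "+"},
    {"Glottocode": "amha1245", "LanguageName": "AMHARIC", "raisedLarynxEjective": "+"},
]

print([seg["Glottocode"] for seg in languages_with_feature(segments)])
```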

In [32]:
m = lingtypology.LingMap(df.Glottocode, glottocode=True)
m.title = 'Languages with Ejectives'
m.create_map()
Out[32]: