8/3/2019 Estructura Social y Lenguaje
1/32
SOCIAL AND LINGUISTIC STRUCTURE 1
Language structure is partly determined by social structure.
Gary Lupyan
University of Pennsylvania
Rick Dale
University of Memphis
-
Abstract
The languages of the world differ greatly both in their syntactic and morphological
systems and in the social and ecological environments in which they exist. In the present work,
we challenge the long-held assumption that language grammars are unrelated or only spuriously
related to the social environments in which they are found (Chomsky, 1995) [1]. Based on a
statistical analysis of 2,236 languages, we report strong relationships between linguistic factors
related to morphological complexity and demographic/socio-historical factors such as the
number of language users, geographic spread, and degree of language contact. The analyses
suggest that languages spoken by large groups have simpler inflectional morphology than
languages spoken by smaller groups as measured on a variety of factors such as case systems and
complexity of conjugations. Additionally, languages spoken by large groups are much more
likely to use lexical strategies in place of inflectional morphology to encode evidentiality,
negation, aspect, and possession. These results are explained using principles borrowed from
-
Although the most populous languages are spoken by millions of people spread over vast
geographic areas, most languages are spoken by relatively few individuals over comparatively
small areas. The median number of speakers for the 6,912 languages catalogued by the
Ethnologue is only 7,000, compared to the mean of over 828,000 [2]. Similarly, for the 2,236
languages in our sample (Figure 1), the median area over which a language is spoken is about the
size of Luxembourg or San Diego, California (948 km²). The mean area is about the size of
Austria or the US state of Maryland (33,795 km²). Languages also differ dramatically in the
proportion of individuals who speak the language natively (L1 speakers) to those who learned it
later in life (L2 speakers) (Table 1). Although there are numerous counter-examples
(Supplementary Note 1), languages spoken by millions of people have a greater likelihood of
coming into contact with other languages and of having numerous nonnative speakers compared
to languages spoken by only a few thousand people. This is not surprising: a language spoken by
more people is more likely to encompass a larger and more diverse area and include speakers
-
differ in the amount of information conveyed through inflectional morphology compared to the
amount of information conveyed through non-grammatical devices such as word order and
lexical constructions. For example, compare the morphological marking of aspect in Russian, Ya vypil chai (I
PERFECTIVE+drank tea), to the English lexical strategy, I finished drinking the tea. Other
domains exhibiting such differences between lexical and morphological strategies include tense,
aspect, evidentiality, negation, plurality, and expressions of possibility.
Languages with richer morphological systems are said to be more overspecified [5-7]. For
instance, of the languages that encode the past tense inflectionally, about 20% have past tenses
that explicitly mark remoteness distinctions. For example, Yagua, a language of Peru, has
inflections that differentiate 5 levels of remoteness. A verb denoting an event that happened only
a few hours ago takes the suffix -jásiy; an event that happened a day previous to the utterance
requires a different suffix, -jay; an event that occurred a week to a month ago, a still different
suffix, -siy, etc. [8]. Of course, languages without these grammatical distinctions can express them
-
The degree and specificity of inflectional encoding can reach astounding levels. In
Karok, a language of Northwestern California, we find grammaticalized verbal suffixes for
various types of containment: pa:-kirih "throw into fire", pa:-kurih "throw into water", pa:-ruprih
"throw in through a solid" (the affixes are unrelated to the nouns water, fire, etc.) [10]. Clearly, such
elaboration does not arise from communicative necessity. Researchers have long been puzzled
by the reasons why some languages abound in such overspecification, while others eschew it,
particularly in cases of closely related languages. For example, in comparing English and
German we find that "where the surface structures of English and German contrast, English tends
to leave more to context" [6]; thus, "German speakers are forced to make certain semantic
distinctions which can regularly be left unspecified in English" (ref. 6, p. 28). For example,
German obligatorily specifies the direction of motion in the place adverbs here/there/where.
Compare: hier/her; dort/hin; wo/wohin. English can specify direction using to and from (where
to versus where from), but such specification is optional and is generally left to context [11].
-
learning are morphologically simpler, less redundant, and more regular/transparent [2,6,11,12,13,37].
This argument has been made most forcefully and convincingly for Creole languages [17], but it has
been speculated that in any situation in which a language is learned by a substantial number of
adults, the language becomes simplified due to the "lousy language learning abilities of the human adult" [19].
The evidence for such linguistic simplification has been descriptive, consisting of selected
examples and examinations of the grammatical inventories of a small number of languages [6,12,19,21]. Thus,
at present, there is no convincing evidence of global relationships between linguistic structure
and non-linguistic factors and no framework within which to understand such relationships. An
additional limitation of previous work is that it fails to explain why morphological complexity
and grammatical overspecification arise in the first place. That is, why aren't all languages as
morphologically simple as those that have been argued to be heavily shaped by adult learning
(e.g., English [11])?
The present work aims to: (1) establish whether non-spurious relationships exist
-
speakers of esoteric languages are more likely to (1) be nonnative speakers or have learned the
language from nonnative speakers, and (2) use the language to speak to outsiders, that is, individuals from
different ethnic and/or linguistic backgrounds. The exoteric niche includes languages like
English, Swahili, and Hindi, while the esoteric niche includes languages like Tatar, Elfdalian,
and Algonquin. The analyses described below aim to test whether systematic relationships exist
between grammars and social contexts within which languages are spoken.
Methods
To assess relationships between social and linguistic structure we constructed a dataset
that combined social/demographic and typological information for 2,236 languages. The dataset
was constructed by combining typological data from the World Atlas of Language Structures
(WALS) [24] with the following demographic and ecological variables: speaker population,
geographic spread, and number of linguistic neighbors. Because direct measures for the
-
counting the number of languages whose global polygons are contained in, overlapping with, or
contacting a given language's polygons. For example, although English originates in the British
Isles, the fact that it is spoken in North America and Australia means that its neighbors include
the extant indigenous languages on those continents.
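
The neighbor count described above can be illustrated with a minimal sketch. It is not the procedure used in the study: as a loudly stated simplification, each language's polygons are replaced here by axis-aligned bounding boxes, and the names `Box`, `boxes_touch`, and `count_neighbors` are hypothetical. Two languages count as neighbors when any pair of their boxes is contained in, overlaps with, or touches the other.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: a rectangle (min_lon, min_lat, max_lon, max_lat) stands in
# for each real language polygon; this is a simplification for illustration.
Box = Tuple[float, float, float, float]


def boxes_touch(a: Box, b: Box) -> bool:
    """True if box a contains, overlaps, or shares a boundary with box b."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def count_neighbors(regions: Dict[str, List[Box]]) -> Dict[str, int]:
    """Count, for each language, the other languages whose regions meet it."""
    counts = {}
    for lang, boxes in regions.items():
        neighbors = set()
        for other, other_boxes in regions.items():
            if other == lang:
                continue
            if any(boxes_touch(a, b) for a in boxes for b in other_boxes):
                neighbors.add(other)
        counts[lang] = len(neighbors)
    return counts


# Hypothetical toy data: language "A" is spoken in two separate regions.
toy_regions = {
    "A": [(0, 0, 10, 10), (50, 50, 60, 60)],
    "B": [(5, 5, 7, 7)],          # contained in A's first region
    "C": [(10, 0, 20, 10)],       # shares an edge with A's first region
    "D": [(55, 55, 70, 70)],      # overlaps A's second region
    "E": [(100, 100, 110, 110)],  # isolated
}
print(count_neighbors(toy_regions))
```

Because a language spoken in several disjoint regions accumulates neighbors from each of them, a widely dispersed language such as "A" ends up with the highest count, which mirrors the English example in the text.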
Selecting Typological Features for Analysis
Our analysis focused on typological factors most relevant to inflectional morphology
with particular emphasis on continuous variables such as the number of inflectional case
markings or the inflectional synthesis of verbs (the number of different types of information that
can be inflectionally encoded by verbal affixes, measured in categories per word) [26]. An
additional guide for feature selection was the ability to make a priori predictions about the level
of morphological complexity of a given feature. For instance, plurality (feature 16) can be coded
using prefixes, suffixes, some combination of the two, a plural word, a plural clitic,
-
Results
Table 2 shows the results of three models used to explore the relationships between
typological features and measures of population, geographic spread, and degree of linguistic
contact (Supplementary Note 4). For most (20/26) of the WALS features that were most relevant
to inflectional morphology, demographic variables (population, area over which a language is
spoken, and degree of linguistic contact) combined with geographic covariates
(latitude/longitude) proved to be better predictors of the linguistic features than geographic
location alone (Supplementary Note 5). The results provide overwhelming evidence against the
null hypothesis that language structure is unrelated to demographic factors. Across a wide range
of linguistic features, a systematic relationship between demographic and typological variables
was found. Although the three demographic predictors are not independent (intercorrelations
range from .5 to .6), including all three predictors helps to ensure that linguistic-demographic
relationships are not spurious. We summarize the findings below (parenthetical numbers
-
2. Contain fewer case markings (3), and have case systems with a higher degree of case syncretism (4) (further reducing the number of morphological distinctions).
Nominative/accusative alignment is more prevalent than ergative/absolutive alignment
(5).
3. Have fewer grammatical categories marked on the verb (6) and are less likely to have idiosyncratic verbal morphology such as verbal person markings that alternate between
marking agent or patient depending on semantic context (7).
4. Are more likely to not possess noun/verb agreement or have agreement limited to agents (8) and are more likely to possess no person markings on adpositions (9). As with case
-
(16). For languages with optional markers, analytic (word) strategies are more common
than inflectional strategies (affixes or clitics). (c) Are less likely to have a separate
associative plural (e.g., He and his friends) (17). (d) Are more likely to have a dedicated
question particle (18).
7. (a) Are less likely to encode the future tense morphologically (19) or possess remoteness distinctions in the past tense (20). In contrast, languages spoken in the exoteric niche are
somewhat more likely to mark the perfective/imperfective distinction in their morphology
(21), although this relationship disappears when language geography is partialled out. (b)
Are more likely to mark singular imperatives on verbs using inflections than have no
morphological markings for imperatives at all, but are less likely to contain more
elaborate markings that differentiate between singular and plural imperatives (22). (c)
Are less likely to have inflections that mark possession (23) or the optative mood (24).
-
relationship is particularly striking when averaged by the largest language families (Figure 3a,
Pearson r = .48) and by continents (Figure 3b, Pearson r = .96).
In a subsequent analysis, we constructed an overall complexity measure by adding up the
number of features for which each language relies on lexical versus morphological coding and
subtracting the total from 0 (Supplementary Note 6). There was a strong relationship between
complexity and speaker population, F(1,1246)=71.20, p
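
The overall complexity measure described above can be sketched as follows. This is one possible reading, stated as an assumption: the score counts the features a language codes lexically rather than morphologically and subtracts that count from 0, so that heavier reliance on lexical strategies gives a lower (more negative) complexity score. The function name and the feature labels are hypothetical.

```python
from typing import Dict


def complexity_score(feature_codings: Dict[str, str]) -> int:
    """Return 0 minus the number of features coded lexically.

    ASSUMPTION: each feature is labeled either "lexical" or
    "morphological"; the actual coding scheme is given in the paper's
    Supplementary Note 6, which is not reproduced here.
    """
    lexical_count = sum(
        1 for coding in feature_codings.values() if coding == "lexical"
    )
    return 0 - lexical_count


# Hypothetical languages with made-up codings, for illustration only.
language_x = {"future": "lexical", "possession": "lexical", "negation": "lexical"}
language_y = {"future": "morphological", "possession": "morphological", "negation": "lexical"}

print(complexity_score(language_x))
print(complexity_score(language_y))
```

Under this reading, a language that relies on morphology for every feature scores 0, the maximum, and each lexically coded feature lowers the score by one.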
-
As noted above, semantic distinctions coded lexically are more likely to be optionally
expressed than those coded inflectionally (e.g., lexical versus inflectional encoding of tense).
Thus, languages that are less grammatically specified tend to rely more on extra-linguistic
information such as pragmatics and context [12]. Reduced reliance on morphology also has the
effect of increasing the transparency between word-forms and meanings (form-meaning
compositionality) [3]. Consider the high occurrence of exceptions in the inflectionally marked past
tense forms of English compared to the perfect regularity of the modally marked future tense.
One reason for the inverse relationship between morphology and form-meaning compositionality
is that inflections such as affixes are, by definition, phonologically bound to the stem, which
increases opportunities for phonological compression and sound change to disrupt regular
mappings between form and meaning. Thus, although it is logically possible to have complex
inflectional morphology that is highly regular (frequently classified as agglutination), in practice,
coarticulation, historical sound change, and other phonological/articulatory processes often
-
niche morphologically simpler than languages spoken in the esoteric niche? (2) Why are
languages spoken in the esoteric niche so morphologically complex, given that such a high level
of specification seems unnecessary for communication?
We propose that the level of morphological specification is a product of languages
adapting to the learning constraints and the unique communicative needs of the speaker
population. As a language spreads over a larger area (e.g., as a result of colonization) and is
being learned by a greater number of outside learners, complex morphological paradigms
become simplified [11,17,19]. Complex morphological paradigms appear to present particular learning
challenges for adult learners even when their native languages make use of similar paradigms [31].
This appeal to learning constraints of adult learners as an explanation for morphological
simplification has also been proposed in the descriptive analyses of Trudgill [20] and in McWhorter's
"interrupted transmission" hypothesis [7], which have previously been supported only by selected
examples.
-
mother tongue from parents to offspring. For example, in a survey of 188 individuals in
Senegal who listed Bambara as their native language, Bambara was the father's native language
in 16%, the mother's in 19%, the native language of both parents in 26%, and the native
language of neither parent in 39% [32]. It is thus common for children to receive input of what they
consider to be their native language from nonnative learners. Vehicular languages like Bambara
(as well as colonial languages like French in Gabon) are often dominant enough to impose
themselves within families even when they are not the native language of the parents. Although
children are learning these languages from a young age and are, in theory, fully capable of
learning whatever inflectional system the language possesses, much of their input may come
from nonnative speakers. Thus, whatever aspects of Bambara were difficult for the parents to
learn are more likely to be passed on to the offspring in a revised form.
Many have commented on the puzzle of baroque accretion so common to languages [33].
We propose that the surface complexity of languages adapted to the esoteric niche may arise as a
-
The reconstruction-error is the number of bits required to repair any errors that occur
when S as communicated by A is reconstructed by L.
Total Cost = Code-cost + Model-cost + Reconstruction-error
Let us assume that the reconstruction error is constant (Supplementary Note 7). Minimizing the
code-cost increases the model-cost. To take an example from a familiar domain: one can reduce
the size of a music file by compressing it, but decreasing its size in this way requires more
powerful decoders to read the file. Reading a CD is far simpler than reading an MP3. Let the
code-cost correspond to the surface level grammatical specification. Thus, requiring speakers to
specify tense, number, aspect, evidentiality, and mood on a verb, which we have shown to be
more common to languages spoken in the esoteric niche (e.g., feature 6), corresponds to a
greater code-cost. A decrease in the model-cost under such circumstances would suggest that
morphological overspecification may increase redundancy (Supplementary Note 8) and,
provided that infants benefit from such an increase, may simplify language acquisition [36,37]
-
"walk" and "walked" can be compressed by storing "walk" and "ed" in a dictionary and
referencing "ed" for any regularly inflected verb, producing a storage savings whenever an
inflected verb occurs (of course the addition of inflections can increase the overall size of the
uncompressed document). In the absence of an inflectional past tense marker, no such savings
occurs.
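
The link between redundancy and compressibility can be illustrated with a short sketch. It rests on assumptions that are not taken from the study: Python's `zlib` (DEFLATE) stands in for whatever compressor was actually used, the ratio is defined as original size divided by compressed size (so a higher value means more redundancy), and both sample texts are invented.

```python
import random
import zlib


def compression_ratio(text: str) -> float:
    """Original size divided by compressed size; higher means more redundant.

    ASSUMPTION: zlib at maximum compression level is an illustrative
    stand-in, not the compressor used in the paper.
    """
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw, 9)
    return len(raw) / len(compressed)


# Invented sample with heavily repeated material, including the "ed" ending.
redundant_text = "she walked and talked and jumped " * 100

# Invented comparison text of the same length with little repeated structure.
rng = random.Random(0)
letters = "abcdefghijklmnopqrstuvwxyz "
varied_text = "".join(rng.choice(letters) for _ in range(len(redundant_text)))

print(f"redundant: {compression_ratio(redundant_text):.2f}")
print(f"varied:    {compression_ratio(varied_text):.2f}")
```

The repeated text yields the larger ratio because the compressor can replace recurring strings with short back-references, which is the same dictionary-style saving described for "walk" and "ed".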
Table 3 shows the obtained correlations between the measure of redundancy
(compression ratio) and the demographic variables used in our main analysis. As shown in
Figure 6, languages spoken by more people and/or over a larger area are less compressible than
languages spoken by fewer people (Supplementary Note 9). Additional analyses that partial out
the original file size and the number of unique and total words did not eliminate the negative
relationship between exotericity and compressibility. To ensure that the redundancy differences
arose from differences in morphological specification, we replaced each unique word with a
unique number, e.g., "walk" and "walked" might be consistently replaced throughout the
-
underspecification more effectively than infants and thus it is infants that would benefit most
from linguistic redundancy [36,37,40]. The paradoxical prediction that morphological
overspecification, while clearly difficult for adults, facilitates infant language acquisition,
remains to be empirically tested. Supplementary materials present some evidence that the most
frequent typologies (e.g., case suffixes are much more widespread/frequent than case prefixes)
correspond to those most easily learned by children whereas typologies common to high-
population (i.e., exoteric) languages are most learnable by adults.
We have argued that, depending on the number of speakers, geographic spread, and
linguistic contact, languages are placed under different learnability and communication
pressures. Languages spoken by millions of people over a diverse region are (1) under a greater
pressure to be learnable by outsiders and (2) under a greater pressure to be understood by
strangers, that is, individuals with whom the speaker does not share much common ground. Languages
appear to respond to these pressures by simplifying their morphology, increasing productivity of
-
The Linguistic Niche Hypothesis adds a new perspective to a question that has puzzled
people for millennia: why are there so many different languages? One currently accepted
answer is that as a population splits into several groups, dialect differences emerge and gradually
render the languages mutually incomprehensible [30]. This linguistic drift account is analogous to
genetic drift in evolutionary biology. Crucially, biological speciation events are also produced by
ecological speciation, in which genetic diversity is increased between cohabitating populations
when populations adapt to different ecological niches [41]. The present work suggests that
languages may undergo a similar process of adaptation to a niche. On this view, linguistic
diversity is not simply a product of passive drift, but also of active speciation as languages
adapt either to a small socially cohesive community of native speakers or to a large, diverse
group that includes nonnative learners. The present levels of morphological complexity in a
language may thus be informative of the socio-historical context in which the language evolved.
-
Table 1: Speakers (millions) [2]
Language | L1 | L2 | %L1
Malay | 30 | 170 | .15
English | 330 | 812 | .29
French | 65 | 50 | .57
Amharic | 27 | 7 | .79
Abkhaz | 0.11 | .006 | .95
Siberian Yupik Eskimo | 0.001 | ~0 | ~1
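
The %L1 column appears to be the proportion L1 / (L1 + L2). That reading is an assumption, although it reproduces the tabulated values to two decimals. A minimal check, using only the figures from Table 1 (Siberian Yupik Eskimo is omitted because its L2 count is given only approximately) and a hypothetical helper name:

```python
# Speaker counts in millions, copied from Table 1: (L1, L2).
speakers_millions = {
    "Malay": (30, 170),
    "English": (330, 812),
    "French": (65, 50),
    "Amharic": (27, 7),
    "Abkhaz": (0.11, 0.006),
}


def l1_proportion(l1: float, l2: float) -> float:
    """Share of all speakers who are native (L1) speakers.

    ASSUMPTION: %L1 = L1 / (L1 + L2); the table does not state the formula.
    """
    return l1 / (l1 + l2)


for language, (l1, l2) in speakers_millions.items():
    print(f"{language}: {l1_proportion(l1, l2):.2f}")
```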
-
Table 2
Feature | Observed Pattern | Model: Population (Log Speakers) | Model: Area (Log km²) | Model: Ling. Contact (Log ling. neighbors) | Effect size
Morphological Type
1. Fusion of inflectional formatives (20) | Isolating > Concatenating | ** | x | . | 17.69
2. Inflectional Morphology (26) | Little or None > Present | ** | . | . | 37.26
Cases
3. Number of Cases (49) | Fewer Cases > More Cases | ** | x | x | f²=.08
4. Case Syncretism (28) | Core/Non-Core Cases > Core Only > No Syncretism | ** | * | * | 11.03
5. Alignment of Case markings of Full NPs (98) | Nom/Acc > Erg/Abs | ** | ** | ** | 25.16
Verb Morphology
6. Inflectional Synthesis of the Verb (categories per word) (22) | Few Forms > Many Forms | ** | * | * | f²=.15
7. Alignment of Verbal Person Marking (100) | Neutral Ergative=Accusative > Context Dependent | ** | * | x | 31.78
Agreement
-
> None
17. Associative Plural (36) | No assoc. Plural > Assoc. Plural | ** | . | . | 3.74
18. Polar Question coding (92) | Question particle > No Question particle | ** | ** | ** | 15.06
Tense, Possession, Aspect, and Mood
19. Future Tense (67) | No Morph > Morph. | ** | * | * | 15.95
20. Past Tense (66) | Simple Past > No Morph Past > 2-3 Remoteness Dist. > 3+ Remoteness Dist. | ** | * | * | 34.41
21. Perfective/Imperfective (65) | Morph. Distinction > No Morph Distinction | . | * | . | 4.50
22. Morphological Imperative (70) | Sing only > Not Morph. Marked Sing & Plural Sing. Syncretic with Plural | ** | x | x | 26.52
23. Coding of Possessives (57) | No possessive affix > Possessive Affix | ** | ** | ** | 30.53
24. Optative (73) | Not Marked > Morphologically Marked | . | ** | x | 18.54
Articles and Demonstratives
25. Definite/Indefinite Articles (38-39) | None Both (Lexical) = Only Def. or Only Indef. Both (Affixes) | . | ** | . | 23.52
26. Distance distinctions in demonstratives (41) | No distance contrasts > 2 Contrasts 2+ Contrasts | ** | . | ** | 13.83
Effect Size is the log-likelihood ratio from a comparison of the intercept-only model with a model
-
Table 3
Measure | Population (Log Speakers) | Area (Log km²) | Ling. Contact (Log ling. neighbors)
Total Words | -.01 | -.17 (.10) | -.11
Unique Words | .15 (.14) | .17 (.10) | .04
Size in bytes | -.23 (.02) | -.19 (.06) | -.12
Compression Ratio (CR) | -.56 (
-
References
1. Chomsky, N. The Minimalist Program. 300 (The MIT Press: 1995).
2. Gordon, R.G. Ethnologue: Languages of the World, 15th Edition. 1272 (SIL International: 2005).
3. Wray, A. & Grace, G. The consequences of talking to strangers: Evolutionary corollaries of socio-cultural influences on linguistic form. Lingua 117, 543-578 (2007).
4. Greenberg, J.H. Universals of language. (MIT Press: 1966).
5. Dahl, Ö. The Growth and Maintenance of Linguistic Complexity. (John Benjamins Publishing Co: 2004).
6. Hawkins, J.A. A Comparative Typology of English and German: Unifying the Contrasts. (Univ of Texas Pr: 1986).
7. McWhorter, J. Language Interrupted: Signs of Non-Native Acquisition in Standard Language Grammars. 304 (Oxford University Press, USA: 2007).
8. Payne, D.L. & Payne, T.E. Yagua. Handbook of Amazonian Languages 2, 249-474 (1990).
9. Dahl, Ö. & Velupillai, V. The past tense. Haspelmath et al. (2005). at
10. Bright, W. The Karok language. (University of California Press: 1957).
11. McWhorter, J. What happened to English? Diachronica 19, 217-272 (2002).
-
22. Dahl, Ö. The Growth and Maintenance of Linguistic Complexity. 333 (John Benjamins Publishing Co: 2004).
23. Thurston, W. How exoteric languages build a lexicon: esoterogeny in West New Britain. Papers from the Fifth International Conference on Austronesian Linguistics 555-579
24. Haspelmath, M. et al. The world atlas of language structures online. (Max Planck Digital Library: Munich,).
25. Seamless Digital Chart of the World. at
26. Nichols, J. Linguistic Diversity in Space and Time. 374 (University Of Chicago Press: 1999).
27. Bybee, J.L. Morphology: A Study of the Relation Between Meaning and Form. (J. Benjamins: 1985).
28. Bybee, J.L., Perkins, R.D. & Pagliuca, W. The Evolution of Grammar: Tense, Aspect, and Modality in the Languages of the World. (University Of Chicago Press: 1994).
29. Dressler, W.U. Word formation as part of natural morphology. Leitmotifs in Natural Morphology 99-126 (1987).
30. Sapir, E. Language: An Introduction to the Study of Speech. (Dover Publications: 1921).
31. Klein, W. & Perdue, C. The Basic Variety (or: Couldn't natural languages be much simpler?). Second Language Research 13, 301-347 (1997).
32. Calvet, L. Towards an Ecology of World Languages. 304 (Polity: 2006).
33. Bickerton, D. Roots of Language. 351 (Karoma Publishers, Incorporated: 1985).
-
Figure Captions.
Figure 1. Geographic distribution of the 2,236 languages included in the present study.
Figure 2. a: The relationship between population and the number of cases. b: The relationship
between population and the number of categories per word. The regression lines are flanked by
95% CIs. The ranges on the x-axis correspond
to the coding of these features in the World Atlas of Language Structures.
Figure 3. a: Categories-per-word (inflectional synthesis of the verb; feature 6 in Table 2) plotted
against the mean number of speakers for the largest language families (those containing at
least 32 languages). b: Inflectional synthesis of the verb collapsed by continent. The
regression line is flanked by 95% CIs. Eurasia corresponds to the region 38°N-71°20'N /
29°E-172°W.
Figure 4. Languages spoken by more people have simpler inflectional morphology. X-axis scores
represent a measure of lexical devices compared to the use of inflectional morphology.
Symbols represent means; bars show 95% confidence intervals of the median. Bar width is
proportional to sample size for each score.