Glossika Blog - English Edition

A Map to the Syntax of All Spoken Languages

Reading Level: C2 (advanced)

Over the course of the last decade I have been working on a list of universal structures for as many of the world's languages as possible. I have now completed most of that work in both syntax and semantics, as the two fields are deeply intertwined. From this work it is obvious that semantics affects syntax. For example, a verb of motion affects its object with velocity, direction, or rotation, thereby giving rise to the prepositional and/or locational structures tied to such verbs. We can even see the mutual influence of culture and syntax, which I will touch on in this article.

Is Syntax Worth Learning?

With a better understanding of how all languages are structured, you can use this information to decipher how a language works, regardless of what you know about the language, where it's spoken, its size, whether it's written or not, or what it sounds like.

Learning to speak a foreign language fluently as native speakers do requires understanding how to encode the world around you using the infrastructure of the particular language. Most grammars focus on surface grammar rules, whereas this paper discusses the underlying syntactic-semantic structure of human communication, free from the constraints of any particular language. This is a top-down approach.

The semantic representation of the world can be broken down into a couple dozen broad semantic fields depending on the level of granularity you choose. I recommend as little granularity as possible so that easily understood patterns emerge. The semantic entities within the string of communication have relationships with each other that we label with syntactic roles.

When we observe and describe the world, we construct semantic entities and string them together in an arbitrary order to construct a sentence. Our choice of putting one word before another depends on the constraints of specific language surface rules, such as the rules of eng English or rus Russian. In this paper I seek the description of universal roles completely independent of such surface rules.

The surface (grammar) rules for each language are actually well-defined in academic literature. It is the link between the observation of the real world and a syntactic-semantic construct that is not yet well-defined. One could say that each language can be deconstructed in reverse back to its core construct and then reconstructed into another human language as an improved definition of "translation".


Defining Parts of Speech

We have all heard of nouns and verbs. But we need to reevaluate how we think of these concepts. They are not concrete concepts but rather fluid concepts. Think of them as occurring at several positions on a continuum. We cannot say that every sentence must contain a verb, or that every language has nouns, but we can say that every sentence in every language has an element occurring somewhere on this continuum. This continuum is represented graphically under Noun or Verb? below.

At one end of the continuum are stable states (nouns) and at the other end are unstable states (verbs). Between these two states a variety of things can occur, including various forms of adjectives and adverbs.

This means that any event that is uttered contains something on this continuum and that the entities on this continuum are contained within a "speech time".

Coordinating conjunctions fall outside of this continuum and instead connect two speech times. Some languages encode coordinating conjunctions into the verbal structure grammatically (e.g. kor Korean, for which insertion rules would be required for that specific grammar), in which case no separate coordinating conjunction need exist outside the speech time.

Defining Event, Reference, and Speech Time

When somebody talks, this is speech time, which I label as 詎.

Reference time is the relation between the speech time and the talked-about action 場. Therefore, we can set up the following relationships:

1. ∀(φ)↦∃(詎) The complete structure is ∀(φ), which maps to the existence of a speech-time utterance ∃(詎).

2. ∃(詎)↦指 The utterance maps to a reference. This reference is the relationship between the action and the speech time.

3. 指↦指⊏場 The reference contains the events within the speech.

4. 場↦場⊔指⊔場′ The event itself has a beginning point and an ending point containing the reference.

The following lines show the timeline of an event 1 from left to right, and possible observations 2 and discussions 3 of that event. The reference point does not overtly encode tense; tense is determined by knowing the combination of the event, the reference, and the speech time.

1 - ..... 場 ..... - ..... 場′ ..... - (the beginning and end of the event)

2 指 ..... 指 ..... 指 ..... 指 ..... 指 (reference time: past, present, future?)

3 詎 ..... - ..... 詎 ..... - ..... 詎 (possible speech times)
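As a rough sketch, these orderings can be expressed in a few lines of Python; the numeric timeline and the three-way rule below are illustrative assumptions of mine, not part of the formalism itself:

```python
from dataclasses import dataclass

@dataclass
class Event:
    start: float  # 場, the beginning of the event
    end: float    # 場′, the end of the event

def tense(event: Event, reference: float) -> str:
    """Derive a tense label from where the reference time 指 falls
    relative to the event's boundaries (illustrative sketch)."""
    if reference < event.start:
        return "future"   # the reference precedes the event entirely
    if reference > event.end:
        return "past"     # the event is already over at reference time
    return "present"      # the reference falls within the event
```

A reference point between 場 and 場′ yields a present reading; outside those bounds, a future or past one.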

The start and end of an event are considered instantaneous and semelfactive, having no duration at all, which is why the time of speech occurs either before, between, or after these points in time. Some complete events are also semelfactive, such as the bubble burst, with no particular start or end. Semantics of this type cannot appear in durative form: *bursting, unless the semantics are borrowed in a metaphorical sense: [X ate so much that he appeared to be] bursting at the seams, or as an iterative action: X keeps popping all the bubbles. Although pop is lexically entirely different from burst, it is actually the complementary transitive verb for the same event, and is therefore lexicalized as the same word in many other human languages.


Defining Syntactic Relationships

The problem with syntactic trees is that a hierarchical relationship is assumed to exist between all parts of speech within a sentence. This is not universal. Some languages display agreement between dependent clauses and head clauses, and other such variations (e.g. as found in lkt Lakhota and dbl Dyirbal).

To develop a truly universal descriptive system of human language, we need to establish a completely free-standing syntactic infrastructure. This structure must not have any reliance on word order. This means everything has a specific role and is marked as such. This reduces sub-clauses and recursion.

The hierarchical structure of X̅ (X-bar) syntax and phrase structure is not conducive to such an independent structure. The independent structure allows agreement across any element within the action 場.

Syntactic description should be as simple as possible. The problem with such grammars as Minimalist Grammar and Minimalist Insertion Grammar is that highly convoluted explanations are required just to process the simplest of concepts. I think of these grammars as surface grammars for specific languages rather than a universal tool for observing and describing the world, so it is important to understand what kind of grammar is appropriate for what you're trying to achieve.

Defining Subjects and Objects

The terms subject and object are misnomers: convenient labels used by the layman to describe syntax and grammatical relations. These terms describe only surface realizations of the grammar and have no relationship with the underlying roles, which are defined by semantics. Therefore, subjects and objects have no relationship with semantics.

The terms "subject" and "object" refer only to the arbitrary position of any syntactic role in relation to the verb core. Some languages have strict ordering of all elements in a sentence, such as deu German, so trying to label the position of "subject" and "object" in such languages is complex.

The fact that I can say "the glass shattered" with 'glass' in the subject position has nothing to do with the agent / patient / experiential / etc. roles in the sentence. This is not a passive sentence, but rather an unmarked ergative sentence (made up of certain kinds of intransitive verbs). The 'glass' didn't do anything of its own volition and therefore could not be the agent or source of such an occurrence. Despite this, it occurs in the "subject" position of the sentence in eng English, and therefore adopts surface rules specific to that language.


Compare the following example of "A car accident happened" in several languages, and pay attention to whether the accident is a subject or object:

rus: Произошла авария. proizošla avarija. Syntax: verb + subject

zho: 發生了車禍。 fasheng-le chehuo Syntax: verb + object

kor: 사고가 일어났어요. sago-ga irŏnassŏyo. Syntax: theme + verb

eng: A car accident happened. Syntax: subject + Verb

eng English surface rules do not allow: *Happened a car accident.

zho Chinese surface rules do not allow: *車禍發生了。 chehuo fasheng-le Syntax: subject + verb

Although the semantics are exactly the same in every language, the syntax produces a different surface realization for each language. However, in a universal syntactic-semantic framework, the above sentences all describe one and the same event: the existence (occurrence or happening) of a theme θ (a car accident). We can write this as follows:

場↦{∃(θ)}
Since this event is existential in nature and not bound by telicity, the tense that manifests in each language is unstable. The reference time is now, and the theme does exist; therefore the aspect is completed. This completed aspect gives rise to the use of a perfective verb in rus Russian and a past tense in eng English.

We can add granularity to the syntactic function, by using a reference time of now thus:

場↦現{∃(θ)} 現 A car accident(θ) happened{∃}.

If the car accident was still in process of happening (though impossible if this were a semelfactive occurrence) at the moment of speech time, we would define the verb as durative v̿:

場↦現{∃v̿(θ)} 現 A car accident(θ) is occurring{∃v̿} (as I speak).

Since every element is a single character and can be arranged in any order, our processing algorithms gain a great range of freedom. We can extract any kind of possible collocation at the deepest and most meta level of syntax. Rearranging the elements in any order changes neither the semantics nor the syntax:

場↦{(θ)∃v̿}現 現 A car accident(θ) is occurring{∃v̿} (as I speak).
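A minimal sketch of this order-freedom in Python: if we strip the grouping symbols and compare the remaining atoms as a set, any rearrangement of the same elements is recognized as the same construct (the symbol handling here is an illustrative fragment, not our production algorithm):

```python
def normalize(expr: str) -> frozenset:
    """Collapse a syntactic-semantic formula to its set of atomic
    symbols, ignoring order and grouping (illustrative sketch)."""
    return frozenset(ch for ch in expr if ch not in "{}()")

# Any ordering of the same atoms normalizes identically:
assert normalize("現{∃(θ)}") == normalize("{(θ)∃}現")
```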

In every language, this theme θ will be assigned to a different grammatical position: a subject behind the verb in Russian, an object behind the verb in Chinese, a theme before the verb in Korean, a subject before the verb in English. These rules are defined by each language individually and represent a "translation" from the real-world observation.
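These language-specific assignments can be sketched as simple linearization templates; the slot names and data layout below are my own illustrative choices:

```python
# Surface-order templates for "a car accident happened",
# following the comparison above (illustrative sketch)
SURFACE_ORDER = {
    "rus": ("V", "S"),   # verb + subject
    "zho": ("V", "O"),   # verb + object
    "kor": ("TH", "V"),  # theme + verb
    "eng": ("S", "V"),   # subject + verb
}

def linearize(lang: str, fillers: dict) -> str:
    """Map the single underlying event onto one language's surface order."""
    return " ".join(fillers[slot] for slot in SURFACE_ORDER[lang])
```

For example, `linearize("rus", {"V": "proizošla", "S": "avarija"})` yields the Russian verb-first order.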

The effect of whether something occurs in subject or object position on a particular culture is questionable and inconclusive. I believe that speakers of any language can easily understand or acquire comprehension of the underlying roles without distortion from their own language or culture.

Defining Independent Syntactic Entities

At the core of every action is the verb phrase. In Role and Reference Grammar adjuncts may be added outside of the verb phrase, but we treat certain kinds of adjuncts here as functions of the core verb which I demonstrate below.

This means that a verb of motion can take arguments such as agent A, location λ, comitative ω, etc. Please note that not every preposition in European languages maps neatly to these roles, nor do the many cases that manifest in Uralic languages. Roles are dependent upon the underlying semantics of the sentence. Therefore, semantics drives syntactic structure and the relationship is a very tight one.

If this verb of motion is not time-bound (i.e. atelic) and is a dynamic verb v͌, I encode this as follows:

場↦{動v͌(Aλω)} (again, all the elements have free order)

A possible translation of this phrase into eng English is:

eng A man (A) is walking (動v͌) to a store (λ) with a friend (ω).

We have not defined reference time here, but it appears that the action and the speech time coincide: 場⊔指⊔場′. The phrase "to a store" is treated as a single "noun-like" entity here, and in fact does appear as only one word in many languages, if we define a word as a string of letters not interrupted by a space. The same "noun-like" treatment applies to "with a friend".

The accusative/directional "to a store" is also an example of an achievement.

Here is an alternative structure using a telic verb with a possible eng English translation.

場↦{動v͆(Aλω時)} eng: A man (A) walks (動v͆) (and arrives) at a store (λ) with a friend (ω) after five minutes (時). This is an example of an accomplishment.

Or: eng A man arrived at a store with [his] friend after walking for five minutes. Labels such as determiners (DET) are language-specific rules and too granular, but they can be derived from the underlying structure. Here it would be most appropriate to translate this as 'his friend' rather than 'a friend' in eng. Most languages would omit the DET label altogether, whereas some may refer back to the agent using a reflexive, such as rus свой.
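The motion-verb frame above can be sketched as a role-to-filler mapping that an eng-specific surface rule then linearizes; the prepositions and slot order below are eng surface assumptions of mine:

```python
# Role symbols from the article: A agent, λ location, ω comitative, 時 temporal
def realize_eng(core: str, args: dict) -> str:
    """Linearize a free-order frame such as {動v͌(Aλω時)} into
    English surface order (rough sketch of eng surface rules)."""
    parts = [
        args.get("A"),                                   # agent first in eng
        core,                                            # the core verb
        f"to {args['λ']}" if "λ" in args else None,      # directional/location
        f"with {args['ω']}" if "ω" in args else None,    # comitative
        f"after {args['時']}" if "時" in args else None,  # temporal
    ]
    return " ".join(p for p in parts if p)
```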

Since I have discarded the use of the terms "subject" and "object", how about the terms "noun" and "verb"? What evidence do we have that nouns and verbs are concrete concepts? In the next section I discuss how to re-define these terms.


Noun or Verb?

What we call nouns (things) and verbs (actions) appear on a continuum between stationary and moving. Concepts change easily between nouns and verbs and everywhere in between. Adjectives exist at an intermediary point between nouns and verbs. Adjectives can behave either like verbs (predicates) or like nouns (nominal adjectives). Adjectives can also take on active and passive roles just like verbs (compare English 'interesting' and 'interested').

Every semantic concept in human language therefore has a continuum state between noun and verb. Not all concepts are realized in every role. For example, we can speak of "hope", "to be hopeful", and "to hope", thereby fulfilling all three roles. But we cannot speak of "window", "to be windowed", and "to window" unless we change the way the verb operates. Some languages may have a single word for "to put in windows" based on the noun "window". In fact, I can create such a verb in English by saying "We'll be windowing the new house today." Prolific writers like Stephen King coin verbs in exactly this way as a literary device for dramatic effect.

In some languages adjectives stand independently between nouns and verbs:

stable N ..... ..... A ..... ..... V unstable

In some languages, adjectives function more like nouns:

stable N ..... A ..... ..... ..... V unstable

In many languages, adjectives function more like verbs:

stable N ..... ..... ..... A ..... V unstable

It appears that most languages are able to adjust the positions of N/A/V on this continuum farther to the left or right.

Both nouns and verbs can manifest in many forms. Verbs can be existential (∃) as the most stable form, closer to nouns. Verbs can be stative (v͇), acting like adjectives. And verbs can be unstable or dynamic (v͌), as actions. Here is where they appear on this continuum (now replacing adjective with stative verb):

stable N ..... ∃ ..... ..... v͇ ..... v͌ unstable

Existentials (∃), stative verbs (v͇), and dynamic verbs (v͌) can all take the core position in a sentence, and they do not invoke one another. These elements usually require one or more arguments, although zero arguments occur as well. Stable entities (nouns) rarely appear in isolation, and therefore almost always manifest as an argument of a verb. In non-agglutinative languages such as eng, these arguments manifest as noun phrases consisting of multiple words.


Nouns (and non-functional adverbs, "adjuncts") seem to be the only things that can appear as arguments, though adjuncts tend to be independent. Everything else requires arguments and is therefore considered a "core verb".

One may ask how to deal with compound verb phrases if these verbs do not invoke one another. This is resolved via verb functions, which is where most adverbs end up in our syntactic description, and is described in great detail under Verb Functions below.

Here is an example of a semantic entity that manifests across the spectrum.

When we say "it is dark", in terms of 'rays of light' we can deduce that darkness is a temporary state. If we refer to an object's attribute, then it is a permanent state. In English we use the "be" verb with the adjective to indicate the stative nature (note: "-stat-" means "being" or "standing"; compare Italian "stato" (been) and Latin "status"). We can treat "be dark" either as a predicate (acting as a verb) or as an adjective (which is what "dark" is classified as in English). I label the two words together "be dark" as v͇. I label adjectives that appear together with nouns as nominal adjectives å.

stable N ..... ∃ ..... å ..... v͇ ..... v͌ unstable

N = darkness

∃ = there is darkness, let there be darkness (this includes hortatives)

å = dark (as in 'a dark thing')

v͇ = be dark, get dark, become dark, feels dark

v͌ = to darken

Also known as a predicate adjective, v͇ can take on various predicate verbs in English, usually related to the senses (seems, finds, feels, looks, sounds, tastes, smells), change of state (gets, becomes), or behavior (acts, resembles, appears). Some of these verbs take a predicate noun δ as a possible argument. Van Valin has a complete list that also includes verbs such as costs in this category.

eng: I find it strange. 'I' is the experiencer ε; 'to find strange' the stative verb v͇; 'it' the theme θ.

In many of the world's languages, a stative verb v͇ takes on aspect rather than tense (though tense can be added). Aspect refers to whether a change in state has happened; tense refers to when an event happened, regardless of any change.

For example, in many languages, to express that "it is now dark" is the same as "it has already become dark" and may use a past tense or perfect aspect, meaning that the process of becoming dark has finished. English makes use of the word "now" semantically as aspect rather than tense, whereas many languages would only use the word "now" temporally. Without the word "now", the English "it is dark" disregards aspect and only focuses on the temporary state during the tense of the verb.

Languages that lack aspect express "it is dark now" using past tense to refer to the completed change in the present state. The English now is your cue for knowing this completed change and is equivalent to saying "already". Since now as a function of change is encoded in the core verb, translating or saying this now in many other languages would be considered an error or a redundancy.

This feature is commonly found in East Asian languages. Japanese marks tense, so it applies past tense to verbs to indicate this change.

jpn: 疲れる
tsukareru
to get/be tired

jpn: 疲れました
tsukaremashita (past tense)
I'm tired now/already.

Chinese has aspect by using 了, so it applies this to indicate the change in state:

zho: 我累
wo lei
I tire / I am tiring.

zho: 我累了
wo lei-le
I'm tired now/already.
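The shared change-of-state meaning can be sketched as one function whose output morphology differs per language; the surface forms come from the examples above, while the composition rule itself is an illustrative assumption:

```python
def change_of_state(lang: str, core: str) -> str:
    """Mark a completed change of state on a core predicate (sketch)."""
    if lang == "jpn":
        return core + "ました"   # past tense marks the completed change
    if lang == "zho":
        return core + "了"       # perfective aspect marks it
    return core + " now"         # eng: 'now' carries the aspectual cue
```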

In Slavic languages, nominal adjectives can take a large variety of endings. Predicate adjectives in Slavic languages appear as "short-form" (simplified versions of nominal adjectives):

rus: согласный человек
soglasnyj čelovek
an agreeable person (nominal adjective)

rus: Я согласен
ja soglasen
I(male) [am-in-agreement] (predicate adjective)


Arguments and Valency

Verbs are said to carry a valency, which tells us how many arguments are tied to that verb. The valency given is usually the minimum, since further arguments can be added to the base number.

Intransitive verbs have a valency of 1 (the agent, the experiencer, or in ergative sentences the patient -- frequently occurring in subject position across languages).

Transitive and ditransitive verbs have objects, therefore having a minimum valency of 2.
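Minimum valency can be sketched as a lexicon lookup; the verb entries below are illustrative:

```python
# Minimum valency per verb (illustrative entries)
MIN_VALENCY = {"rain": 0, "walk": 1, "teach": 1, "give": 2}

def arity_ok(verb: str, args: tuple) -> bool:
    """A clause supplies at least the verb's minimum valency;
    further arguments can always be added on top of the base."""
    return len(args) >= MIN_VALENCY[verb]
```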

Zero Valency

It can be argued that impersonal constructions describing weather in English, such as "it's sunny", which require "it" in subject position, are in fact zero-valency verbs.

In some cases it is not so easily defined. Many languages describe "it's raining" as "rain falling" with a noun+verb construct, others as simply verb, yet others as a "state":

1. {v͌()} (Active verb with zero arguments)

Examples: kat: წვიმს (ʦ⁼ᶹims); lit: Lỹja; slk: Prší; ron: Plouă; ell: Βρέχει (vréxi); swa: inanyesha; dru: uda-udal-e; trk: q[m]uyux; tgl: u[mu]ulan

2. {v͌(θ)} (Active verb with a theme argument)

Examples: srp: Pȁdā kȉša; nan: 落雨 (lo̍h-hōo)

3. {v͇()} (Stative verb with zero arguments)

Examples: ssf: quraz-iza; xsy: 'o[mo]ral-ila (The xsy SaiSiyat example contains an active-verb infix, but the perfective ending gives it a stative-verb structure parallel to the ssf Thao example.)

4. {∃()} (Existential with zero or one argument)

(seeking examples)

If the rain falls [itself] from the clouds, is it an ergative construction? If raining is a state, can it actually be a stative verb without any arguments? Or does rain just exist as an existential in some cultures?

Arguments as Roles

Agents, patients, beneficiaries (benrecs), experientials, predicate nouns, themes, and comitatives have already been explained. Causative topics are discussed under Verb Functions.

Other roles involving humans include: vocative, genitive, and self. Objects can also be oblique, partitive, or incorporative (as with a verb like fly that incorporates another object as a tool in order to execute the verb).

In addition to these are locationals and directionals: ablative, locative, and spatial/directional.

Temporals are given an argument where they are not redundant and not already expressed as functions of the verb. Certain kinds of adverbs that describe method/manner not encoded in the verb are positioned as an argument.

Arguments can also occur as an amount, a degree, or as a measure.

It was necessary to also add environmental Natural Causes as an argument.

Verb Functions

If we look specifically at eng English, many verb functions require ad hoc constructions with other verbs (or modals) that cause the core verb to change into an infinitive or a gerund, or perhaps a base form with no change at all. English also frequently makes use of adverbs to express such functions.

Phrase Functions

Includes: 1) conditional, if; 2) purpose followed by a verb phrase; 3) reason for doing; 4) derived clause, sub-clause, {that, which}; 5) therefore; 6) because; 7) and; 8) not; 9) but; 10) either; 11) or.

Intention Functions

Includes: 1) will; 2) prepare, about to do; 3) want; 4) would; 5) should; 6) attempt/try/fail; 7) certainly; 8) absolutely; 9) voluntarily, on purpose, purposefully; 10) involuntarily, on accident, happened by itself; 11) deliberately, with motive/intention; 12) accidentally, without motive/intention; 13) willingly; 14) agree; 15) unwillingly agree; 16) forcibly agree (under duress).


Modality Functions

Includes: 1) able; 2) could; 3) might; 4) must, need, necessary; 5) may, supposedly; 6) possibly, possible that.

Valency-Increasing Functions (Jussives and Causatives)

We add a "topic" to the sentence, another kind of agent that is not the actual agent of the core verb. The topic is encoded as an argument and the functions encoded with the verb.

These include: 1) help (an agent do something); 2) make (an agent do something), by command/order; 3) have (an agent do something), by request; 4) get (an agent to do something voluntarily), by asking softly or persuasively; 5) let (an agent do something), by allowing; 6) manipulate/force (an agent to do something), by telling; 7) show (an agent how to do something), though this frequently does not manifest as a causative in many languages, but rather "do something + beneficiary onlooker".

Though neither jussive nor causative in nature, indirect speech is also a valency-increasing function.


Valency 1:

場↦{v͌v̿(Α)} He teaches.

Valency 2:

場↦{v͌v̿(Αθ)} He teaches math.

場↦{⪂v͌v̿(TΑ)} I asked him to teach.

Valency 3:

場↦{v͌v̿(Αθβ)} He teaches children math.

場↦{⪄v͌v̿(TΑθ)} I got him to teach math.

Valency 4:

場↦{幫v͌v̿(TΑθβ)} I helped him teach the children math.

Complex Valency:

場↦{⪀v͇(TΑβæ̿{v͌v̿(θβ)})} I allowed her to show him how to teach the children math.
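The valency-increasing pattern in these examples can be sketched as a function that prefixes a topic argument T to an existing frame (a deliberate simplification of the notation above):

```python
def causativize(frame: tuple) -> tuple:
    """Valency-increasing function: add a topic argument T, turning
    e.g. the frame (Α, θ) of 'he teaches math' into (T, Α, θ) as in
    'I got him to teach math' (illustrative sketch)."""
    return ("T",) + frame
```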


Valency-Decreasing Functions (Hortatives)

This may decrease valency, depending on the semantics.

Includes: 1) let's (hortative, suggestion to do something), 2) suggest and recommend doing.


Cognition Functions

Many of these cognitive verbs, although able to stand as independent core verbs, are almost always coupled with core verbs. In English we find that the core verbs are frequently implied but omitted in discourse.

Includes: 1) hope/wish; 2) fear/afraid of; 3) assume/consider; 4) decide to; 5) forget/remember to; 6) believe; 7) think of/would like to; 8) like to; 9) dare to; 10) claim to; 11) admit to; 12) deny; 13) allege/etc.


Habitual Functions

Includes: 1) habit of doing, used to doing; 2) used to do.


Appearance Functions

Appearance includes direct and indirect perception of events. You may know of an event only indirectly through someone else, or merely have heard about it. Some events happen clandestinely or are intentionally made to look false, and such events have a range of perception that can be expressed.

Includes: 1) apparition/apparent (sounds like, looks like, tastes like, smells like, feels like, appears to be, know/heard of, know/heard that); 2) expectedly; 3) unexpectedly; 4) obviously; 5) falsely, pretend to; 6) secretly; 7) supposedly secretly; 8) obviously secretly; 9) deliberately secretly.


Reciprocal Functions

Sometimes the verb binds the agent and patient/beneficiary/comitative through reciprocation.

Includes: 1) expressly apart (agent and patient separated); 2) expressly together (agent and patient); 3) expressly reciprocal (one another, with each other).

Manner of Action

There are a myriad of ways an action unfolds, from the inchoative to the perfective and every step in between (imperfective).

Includes: 1) iterative, action happens again; 2) continuously iterative, again and again, repeated action, including repeated semelfactive verbs (hitting, smacking, ...); 3) simultaneously, during, when, while; 4) not simultaneously, afterwards, and then (syntax: 場1 ≢ 場2: 場1 and then 場2, 場1 before 場2, 場2 after 場1); 5) stay doing, keep doing; 6) start to, begin to; 7) stop doing (not a command, as in "stop smoking"; see also "purpose followed by a verb phrase" for structures like "to stop to smoke", meaning "in order to"); 8) finish doing, all the way done.
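As a sketch, manner-of-action functions can be treated as wrappers around a core verb phrase; the eng templates below are illustrative only, since many languages encode these functions morphologically:

```python
# Manner-of-action functions as templates around a core (illustrative)
MANNER = {
    "iterative": "{} again",
    "continuative": "keeps {}",
    "inchoative": "starts {}",
    "terminative": "finishes {}",
}

def apply_manner(function: str, core: str) -> str:
    """Wrap a core verb phrase in a manner-of-action function."""
    return MANNER[function].format(core)
```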


Time and Frequency Functions

Other than those mentioned in the previous section, frequency or explicit mention of the time of action is often encoded in separate words in the sentence. If these additional words are not mere reinforcements of the core verb structure (such as the "aspectual change" by eng "now"), then we encode the adverb phrase as an additional argument.


Tense

As demonstrated, aspect and modality have already been encoded in the categories above. What is missing is tense, which is encoded in the complex relationship between the reference time and the action 場. Since we encode this relationship, explicit marking of tense is not a necessary grammatical function.

Languages that explicitly encode tense translate the reference time in arbitrary ways specific to that particular language, which is beyond the scope of this paper.


Registers

Registers often deal with social class. There are several things to note here:

1) Gender; 2) Social position; 3) Respect levels; 4) Clusivity

Registers frequently manifest on pronouns, but can also be encoded into verbs, in that completely separate verbs are required in different registers (e.g. in jav Javanese, ind Indonesian, tha Thai, vie Vietnamese, kor Korean, jpn Japanese, and sometimes even eng English). In some cases, social position can cause pronouns to be omitted entirely.

Gender has become a hotly debated topic as language use shifts in some societies. We have already witnessed degenderization in languages like swe Swedish, where it has happened grammatically and naturally over time. English is currently undergoing a forced linguistic change based on specific social influences. So although some languages have a male/female mixed register (e.g. spa Spanish, hrv Croatian), it is important to account for a neuter register, both as speaker and as listener.

In many languages, gender plays a role either as speaker, or as listener, or as both (compare usage in pol Polish and ara Arabic).

Inclusivity and exclusivity mostly occur on 1st-person pronouns. An example is the "we" used by a married couple in English: though unmarked in English, it conveys an "exclusive" nuance. Some language families, such as Austronesian, have separate words for these pronouns. In addition to inclusive and exclusive pronouns, it is important to account for possible registers such as married, divorced, or a "potential inclusive" for partnerships.
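Clusivity can be sketched as a pronoun-selection rule; the Tagalog forms tayo (inclusive) and kami (exclusive) are well attested, while the function shape is my own illustration:

```python
def first_plural(lang: str, includes_listener: bool) -> str:
    """Select a 1st-person plural pronoun by clusivity (sketch)."""
    if lang == "tgl":
        return "tayo" if includes_listener else "kami"
    return "we"  # eng leaves clusivity unmarked
```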


Semantic Fields

The goal of this grammar is to achieve advanced capability at sorting any language in terms of complexity, vocabulary, and collocations, and, with this new level of granularity, to develop new Natural Language Processing tools. We have already made use of our syntax-semantics mapping in Glossika's Machine Learning framework. The framework not only allows us to track complex relationships in tense and aspect, but also captures more accurate nuances in human discourse.

Being too granular, however, leads to problems when running algorithms: they fail to produce the expected results. If your data is too granular, you need to build a larger database of training data. At our initial stages, we have over-tagged everything in anticipation of more training data, and we ignore specific types of granularity when running algorithms.

As you may have noticed, we use single-length characters to represent every aspect of syntax and semantics in this paper, which is why we rely on Chinese characters in our functions for the sake of efficiency.

When considering the lexicon of any language, a glance at any dictionary shows that the typical "word" entry has multiple meanings attached to it. One of these meanings may have its own lexical entry elsewhere in the dictionary under a different word, known as a "synonym". For example, "get up" may be listed under "get", though these should be considered separate lexical entries; a synonym may be found under "wake up" and even "awake", though each of these may differ in terms of valency and transitivity, perhaps even ergativity (whether marked or not).

The number of words in a dictionary does not equal the number of meanings found in human communication.

How many "meanings" exist in human communication? This is extremely difficult to answer, but as we move to a more detailed markup of our database, in theory we get closer to knowing the answer. The problem is that we cannot be too granular: there are literally millions of objects in our world that each have a name, so we find it meaningless to encode every noun in existence, like the exact names for all the pieces, parts, and transistors found in every machine and computing device, not to mention every other thing known to man. Instead, we can lump like things together into a specific class of nouns.

More importantly, we look at verbs, their extrapolation from unstable to stable (e.g. the verb "to freeze" has a corresponding noun "ice" -- even though the two look nothing like each other in eng, they do in zho and many other languages), and then map their possible valencies. Valencies tend to select specific types or "fields" of semantics rather than specific "objects". For example, if a verb acts upon David, then why can't it act upon Mary and John, and him or her, ad infinitum? If so, we can deduce that the verb acts upon "living people". Can it act upon plants and animals as well? Are plants and animals in complementary distribution (semantically) with living people? We can prove otherwise, which gives rise to each having its own semantic "field".
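The field-based reasoning above can be sketched as selectional restrictions: a verb sense selects a semantic field for its argument rather than a list of individual objects (all entries below are illustrative):

```python
# Illustrative selectional restrictions and field assignments
SELECTS = {"feed": "living", "repair": "artifact"}
FIELD_OF = {"David": "living", "Mary": "living", "clock": "artifact"}

def compatible(verb: str, patient: str) -> bool:
    """A patient is acceptable when its semantic field matches
    the field the verb selects for."""
    return SELECTS[verb] == FIELD_OF[patient]
```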

Although it seems that Anna Wierzbicka's semantic primitives would be a great fit for our goal, and do prove useful for determining the specific breakdown of all kinds of complex ideas, they play no role in our organization of semantic fields, listed here:

Existence , measures and numbers , times and dates , astronomy , geology/geometry/geography and inorganic matter , directions and positions , shapes and things , clothing , motion , food , biological organisms , the body , botany , zoology , physical senses , knowledge and learning , communication , volition/needs/success/action , social constructs and trade , the mind and entertainment and beliefs . We have mapped dozens of sub-fields for each of these.

If we split up each lexical entry in a given language by each definition, with an ascribed valency matrix, we find that the individual semantic meanings in a given language exceed the number of lexical entries by an order of magnitude. After doing so, a finite number of base syntactic patterns emerges, and by grouping semantics into specific fields, a finite number of syntactic-semantic patterns that represent a complete map of human communication, independent of any particular human language.

It is precisely these groupings into "patterns" that enable humans to communicate fluently in any language. Mastering these sets of patterns enables the human to manipulate sentences, and therefore ideas. Mastering more granular "vocabulary" enables the human to expand expression and speak ever more precisely.

This is how Machine Learning works as well. Pattern recognition at low granularity passes through successive convolutions as more and more detail is added to the machine's "understanding". In other words, we have taken the methods by which machines learn and reverse-engineered human language to discover the underlying patterns that drive fluency and expression in humans. This is what the Glossika algorithms deliver on our training platform.


Acknowledgements

Those who have had the biggest influence on this work include the following individuals:

Daniel Everett, Robert Van Valin, Michael Tomasello, Thomas Givon, Tim Hunter, Jeffrey Lidz, Tim Fernando, Maria Bittner, Anna Wierzbicka.

Michael Campbell

Polyglot, phonologist, linguist specialising in Formosan, PAN, Sinitic, Slavic, typology, IPA, and L2. Does GSR training daily.
