One-Shot Learning: The End of Big Data?

June 27, 2017

Recently, a Bayesian probabilistic model outperformed neural networks and humans in classifying handwritten characters using very small datasets. One-shot learning is a type of machine learning that learns an object class from just one or a few examples. This is similar to how humans learn to identify objects: we build rich, abstract templates of objects from only a few examples and then use those templates to classify new ones.
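To make the idea concrete, here is a toy sketch of one-shot classification (not the BPL model itself): each class is represented by a single stored example, and a new item is assigned to whichever class's stored example it most resembles. The names `templates` and `classify_one_shot` are illustrative assumptions.

```python
# Toy one-shot classification: one stored example ("template") per class,
# and a test item is labeled with the class of the nearest template.
import numpy as np

def classify_one_shot(test_item, templates):
    """Return the label of the template closest to test_item (Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        dist = np.linalg.norm(test_item - template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# A single example per class is enough to define the classifier.
templates = {
    "A": np.array([1.0, 0.0, 1.0]),
    "B": np.array([0.0, 1.0, 0.0]),
}
print(classify_one_shot(np.array([0.9, 0.1, 0.8]), templates))  # -> "A"
```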

Noam Chomsky coined the term “poverty of the stimulus” (POS), the argument that natural language is unlearnable given the relatively limited amount of data we are exposed to while acquiring it. The argument implies that learning grammar must be assisted by some innate linguistic capacity.

Nativists claim that humans are born with innate tools specific to learning language. This view lent itself well to Chomsky’s generative grammar theories. Generative grammar is a hierarchical set of rules that allows an infinite number of possible sentences to be generated from only a few lexical items.

The focus of this article is that machines are becoming increasingly proficient at classification given very small datasets. Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum published a paper in 2015 demonstrating that a Bayesian probabilistic generative model was able to classify a large class of visual concepts from just a single example. Traditionally, models are exposed to many training examples, often thousands of positive and negative instances. They called their framework Bayesian program learning (BPL). Here, a program is essentially a template for a class.

BPL draws on compositionality, causality, and learning to learn, ideas that have been key topics in semantics for decades. One of the first things you learn in a semantics class is that abstract concepts can be built from composite parts. Simple templates represent concepts, constructing them compositionally from parts, subparts, and spatial relations. BPL also captures the causal process that generates instances of a class, reflecting its natural abstract causal structure. The model encodes this compositionality and causality using hierarchical priors, which are previous experiences the model draws on to speed up learning of new concepts, and learns to classify from them. The resulting models can be used recursively, forming a generative structure similar to a grammar.

BPL uses existing primitive features to create subparts, then subparts to create parts. These parts, and their relations, compose an object template, which is part of a hierarchy of knowledge used for identifying that class.
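As a rough sketch of this hierarchy (using assumed names, not the authors' implementation), a character template can be represented as parts built from subparts, which are in turn built from primitives, with relations describing how parts attach to one another:

```python
# Minimal data-structure sketch of BPL-style compositionality (names are
# illustrative assumptions): primitives combine into subparts, subparts into
# parts, and parts plus spatial relations form a character template.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SubPart:
    primitives: List[str]          # e.g. short pen strokes drawn from a learned library

@dataclass
class Part:
    subparts: List[SubPart]        # a part (stroke) is a sequence of subparts

@dataclass
class Relation:
    kind: str                      # e.g. "independent", "attach-start", "attach-along"
    part_index: int                # which earlier part this one relates to

@dataclass
class CharacterTemplate:
    parts: List[Part]
    relations: List[Relation] = field(default_factory=list)

# A two-stroke character: the second stroke attaches to the start of the first.
template = CharacterTemplate(
    parts=[Part([SubPart(["arc", "line"])]), Part([SubPart(["line"])])],
    relations=[Relation("independent", 0), Relation("attach-start", 0)],
)
```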

The model learns by fitting each conditional distribution to a set of characters from background alphabets, creating character tokens from parts and relations. Each token is rendered as a binary image, with pixel values interpreted as independent Bernoulli probabilities. A bottom-up method generates a range of candidate parses, and the most promising ones are refined using continuous optimization and local search. This produces a set of templates that are scored against test images, where a higher score indicates a better classification.
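The scoring step can be sketched as follows, assuming a hypothetical render_ink_probabilities function that turns a fitted template into per-pixel ink probabilities; the binary test image is then scored under independent Bernoulli pixel likelihoods, and the best-scoring class wins:

```python
# Hedged sketch of template scoring: the test image is evaluated under
# independent Bernoulli pixel probabilities produced by each candidate template.
# render_ink_probabilities is a stand-in for the model's actual rendering step.
import numpy as np

def bernoulli_log_likelihood(binary_image, ink_probs, eps=1e-6):
    """Log P(image | template) with each pixel treated as an independent Bernoulli variable."""
    p = np.clip(ink_probs, eps, 1.0 - eps)
    return np.sum(binary_image * np.log(p) + (1 - binary_image) * np.log(1 - p))

def classify(test_image, candidate_templates, render_ink_probabilities):
    """Pick the class whose refined template explains the test image best."""
    scores = {
        label: bernoulli_log_likelihood(test_image, render_ink_probabilities(template))
        for label, template in candidate_templates.items()
    }
    return max(scores, key=scores.get)  # higher log-likelihood means a better fit
```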

Chomsky proposed that grammar is learned from sparse exposure and that it is generative. Lexical items are analogous to the primitives used to compose sentences. Concepts can likewise be formed compositionally within a grammar-like structure, applied recursively. These concepts can contain parts and their relations, which are added to the knowledge hierarchy.

 

Source: Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.

 
