Deep Learning 2.0, illustrated! Envisioning an AI that combines Hinton’s GLOM & Bengio’s causal reasoning

An AI Visionary
6 min read · Apr 15, 2021
Figure 1: From Deep Learning 1.0 to Deep Learning 2.0

Deep Learning 2.0: An AI that learns knowledge concepts & their interactions

This article presents a new vision of AI based on breakthrough ideas from two Turing Award winners. What do you do when you are inspired by two powerful ideas, one from Hinton and another from Bengio?

Hinton’s idea paper on GLOM, which envisions an AI that can understand whole-part hierarchies, is one approach that can lead us into Deep Learning 2.0. Yoshua Bengio’s vision explores the idea of an AI that learns explicit knowledge.

What are the abilities of a future Deep Learning paradigm? Illustrated!

  1. Cognition of concepts: Does the network understand semantic concepts? For example, can a robot take an autonomous decision based on action recognition, using a semantic understanding of actions and their reactions (Newton’s law: every action has an equal and opposite reaction)?
  2. Deep understanding of cause & effect: Can the AI employ causal knowledge to forecast the future (for example, in video generation)?
  3. Resilience against adversarial attacks based on cognitive understanding: Is the neural network model resilient to AI security vulnerabilities such as the fast gradient sign attack? While a CNN model exposes itself to such adversaries because of the way it learns its weights, can resilience be built upon knowledge concepts rather than a weight vector? (A minimal sketch of this attack follows the list.)
  4. Out-of-distribution challenge: What is the generalization power of a neural network? Is it possible to design a futuristic AI that a doctor can trust for a diagnosis even when the sample is out of distribution? Will mere selective prediction suffice, or will a deeper understanding of the semantic context and of human anatomy be needed to activate a medical decision?
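For concreteness, here is a minimal PyTorch sketch of the fast gradient sign attack mentioned in point 3. The `model`, `inputs`, `labels`, and `epsilon` names are placeholders, not from any specific system:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, inputs, labels, epsilon=0.03):
    """Fast gradient sign method: nudge each input pixel in the
    direction that increases the classification loss."""
    inputs = inputs.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(inputs), labels)
    loss.backward()
    # One signed-gradient step creates the adversarial example.
    adv = inputs + epsilon * inputs.grad.sign()
    return adv.clamp(0.0, 1.0).detach()  # keep pixels in a valid range
```

A model whose decision rests only on its weight vector can be flipped by such a tiny perturbation, which is exactly the vulnerability that concept-based resilience aims to remove.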

What could Deep Learning 2.0 be?

Future Deep Learning systems will transform implicit knowledge into explicit knowledge using the idea of GLOM, and then combine it with causal knowledge using Transformers!

Let’s understand this vision with an illustrative example. A Deep Learning 2.0 system architecture is presented in Figure 2.

Figure 2: An illustrative architecture for the envisioned Deep Learning 2.0. In this idea, the model learns to pay attention to knowledge concepts, and also to the interactions between these concepts. This vision of understanding wholes and parts is inspired by Hinton’s GLOM, while Bengio’s causal knowledge is used to predict and paint the future. Transformers (the attention mechanism) are employed to learn the interactions between these concepts.

In a video generation system using Action Prediction (Figure 3), if you drop a ball, the AI should predict that the ball will fall to the floor, because implicit knowledge “gates” the model away from generating a future where the ball goes upward like a helium balloon. But how do we build such a “gating mechanism” (seen in Figure 4) that uses cognitive knowledge of cause & effect?

Figure 3: Action Prediction: Visualizing what will happen in the future. (Image credits: MIT)
Figure 4: How to encode Cause and Effect into a neural network? (Image credits: Bengio at GTC 2021)

Let’s take another example: the out-of-distribution challenge in medical diagnosis, already raised above. During medical diagnosis, how can an AI avoid a false prediction by drawing on conceptual knowledge of the medical domain (Figure 5)? Can explicit knowledge prevent false positives, through an understanding of concepts rather than the mere weight vector of a CNN? By training on image and caption pairs, as in a dataset such as COCO, a Deep Learning 2.0 model would try to learn the knowledge concepts, and then explain itself with the help of such explicit knowledge.

Figure 5: Out-of-distribution challenge (Image credits: ScienceDirect)

How to design a Deep Learning 2.0 paradigm?

A Deep Learning 2.0 paradigm rests on two tenets:

1. An approach to learn knowledge concepts such as actions, objects, persons, causal rules, and the hierarchical relationships between wholes & parts. This is inspired by Hinton’s GLOM; the idea can be extended to concepts such as human actions, objects & their locations, human poses, and causal reasoning knowledge.

2. An approach to learn the interactions among concepts: cross-attention between concepts, inspired by how Transformers learn a concept based on its context and its interactions with that context.

So, by encoding experiences into abstract concepts and their interactions in context, Deep Learning 2.0 can be built on the above two tenets. This idea is illustrated in Figure 6.

Figure 6: A realizable roadmap towards Deep Learning 2.0. Deep Learning 2.0 is based on learning concepts rather than learning weights. The concepts and their relationships with each other are either learnt or coded by humans.

Traditional neural networks activate a neuron by taking the scalar product of a vector of incoming activities with a vector of weights, then passing that scalar product through a non-linear activation function. Transformers, on the other hand, activate based on the product of two activity vectors, which allows a deeper level of attention-based learning between concepts. Hinton’s GLOM can be used to learn the hierarchy of wholes & parts of knowledge concepts at various levels of abstraction. Further, the attention mechanism can learn the interactions between concepts in a context. Alternatively, a human in the loop can explicitly program the interactions between the high-level concepts.
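To make this contrast concrete, here is a minimal NumPy sketch; the vectors and dimensions are illustrative:

```python
import numpy as np

d = 8
x = np.random.randn(d)               # incoming activity vector

# Classical neuron: activity . weights, then a non-linearity.
w = np.random.randn(d)               # learned weight vector, fixed after training
neuron_out = np.maximum(0.0, x @ w)  # ReLU(scalar product)

# Attention: the "weight" is itself an activity vector (a key),
# so the score is the product of two activity vectors.
q = np.random.randn(d)               # query derived from one concept
k = np.random.randn(d)               # key derived from another concept
score = (q @ k) / np.sqrt(d)         # scaled dot-product attention score
```

The key difference: the neuron’s `w` is frozen after training, while the attention score depends on two vectors that both change with the input, so the “weighting” adapts to the context.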

Can you illustrate the Deep Learning 2.0 architecture with an example?

Let’s consider a Deep Learning 2.0 solution for sports video forecasting. Where will the ball go next? Can I form a game strategy from such analysis, and coach the player with a visualization of the possible future events given the current action?

Figure 2 illustrates this Deep Learning 2.0 system architecture using Transformers.

In this example,

  1. The knowledge concepts are
  • Concept #1: <Action> Human drops an object
  • Concept #2: <Object> Ball
  • Concept #3: <Location> 10 m at t = 1 s, 5 m at t = 2 s
  • Concept #4: <Predicted location> 2 m at the next timestep

2. Pairwise interactions (based on attention’s Query, Key, Value), sketched in code after this list:

  • Interaction between Concept #3 and Concept #1: based on the human action of dropping the ball, the location of the ball can be predicted using causal knowledge
  • Interaction between Concept #3 and Concept #2: based on the nature of the object (ball vs. balloon) and knowledge of cause & effect, the location of the object can be predicted
  • Interaction between Concept #3 and Concept #4: based on sequence modelling of how the ball behaved in previous timesteps, the ball’s future location can be predicted while holistically including the abstract knowledge of gravitational force.
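The following NumPy sketch shows how such pairwise interactions could be scored with Query/Key/Value attention over concept embeddings. The concept names, dimensions, and random projections are illustrative assumptions, not a published architecture:

```python
import numpy as np

d = 16
rng = np.random.default_rng(0)

# One embedding per knowledge concept from the list above.
concepts = ["action_drop", "object_ball", "location_t1", "location_t2"]
X = rng.standard_normal((len(concepts), d))   # (4, d) concept activities

# Learned query/key/value projections (random here, purely illustrative).
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Pairwise interaction strengths between every pair of concepts.
scores = Q @ K.T / np.sqrt(d)                 # (4, 4)
scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

# Each concept is updated by attending to the others; e.g. the row for
# "location_t2" weighs how much the action, the object, and the past
# location each contribute to predicting the next location.
updated = attn @ V                            # (4, d)
```

In a trained system the rows of `attn` would play the role of the pairwise interactions listed above, with the projections learnt from data rather than sampled at random.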

Practical applications for Deep Learning 2.0:

  1. Path planning for autonomous vehicles

Where will the cyclist be in 10 seconds at the traffic signal, given that their pose is hurried? And importantly, can we visualize the future location on screen to win the human driver’s confidence in a semi-autonomous decision?

Image credits: Motion planner CVPR 2019

2. Interpretable medical diagnoses on out-of-distribution samples

3. Face/iris biometrics resilient against adversarial attacks

4. Real time robotic planning

Summary

A more trustworthy Deep Learning 2.0 paradigm offers explainability via explicit knowledge concepts, resilience against adversarial attacks, trustworthy behaviour even when a test sample is out of distribution, reasoning based on cause & effect, and modelling of the future using knowledge concepts & their interactions.

Two practical ways to implement a Deep Learning 2.0 product would be:

1. Transformer-based (attention mechanism) learning of GLOM-like knowledge concepts of wholes & parts

2. Pairwise interactions between the knowledge concepts, learnt via the Transformer (Query, Key, Value) approach. As an interim roadmap, explicit human-instructed rules can be employed to define the interactions among the concepts, as in the toy sketch below.
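As an illustration of that interim option, here is a toy Python sketch of a hand-coded cause & effect rule gating a model’s prediction; the rule table and function are hypothetical:

```python
# Hand-coded causal rules keyed on (action, object) pairs; a human
# expert supplies these instead of the model learning them.
CAUSAL_RULES = {
    ("drop", "ball"):    "falls",
    ("drop", "balloon"): "rises",
}

def gate_prediction(action: str, obj: str, model_prediction: str) -> str:
    """Veto the model's prediction when it contradicts a known rule."""
    expected = CAUSAL_RULES.get((action, obj))
    if expected is not None and expected != model_prediction:
        return expected  # explicit knowledge overrides the weights
    return model_prediction

print(gate_prediction("drop", "ball", "rises"))  # -> "falls"
```

This is the “gating mechanism” idea from the video generation example earlier: explicit knowledge blocks futures that contradict cause & effect, until the rules themselves can be learnt.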

