Neural Turing Machines

Neural Turing Machines can learn simple algorithms that generalize "far beyond" the training data. Follow-up work has appeared since publication, but this paper introduces several of the important underlying mechanisms.

Overview

The paper focuses on the experiments and the results -- this notebook provides an overview of some of the implementation details.

Architecture

The authors used the proposed Neural Turing Machine (NTM) to solve a range of basic data-manipulation tasks. An external controller network tackles each task through the NTM's API of blurry (differentiable) read and write operations.
Simplified architecture.

The controller has a number of heads, each of which reads from or writes to the memory. The number and purpose of the heads are fixed (part of the network configuration). The controller's external API is simply a sequence of input vectors describing a problem; once a termination symbol is reached, the controller outputs an answer to the given problem. There is no intermediate supervision on how the controller uses the NTM to organize information. The authors hypothesized, and at least to some degree demonstrated, that the controller and the NTM in concert learn to construct generalizable programs.

The read operation is a weighted sum of the rows of the stateful memory M \in \mathbb{R}^{N \times M} (N locations, each a vector of size M), as determined by a mask w \in \mathbb{R}^{N} whose entries are non-negative and sum to one.

r = \sum_i w_i M_i \in \mathbb{R}^{M}
Get representation using attention across the memory.
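The read can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the memory contents and mask below are arbitrary example values.

```python
import numpy as np

# Example memory with N = 4 locations, each a vector of size M = 3.
N, M = 4, 3
rng = np.random.default_rng(0)
memory = rng.standard_normal((N, M))

# A read mask over the N locations: non-negative, sums to one.
w = np.array([0.1, 0.6, 0.2, 0.1])

# Read vector r = sum_i w_i * M_i -- an attention-weighted sum of rows.
r = w @ memory  # shape (M,)
```

With a sharply peaked mask this approaches reading a single row; a flatter mask blends several rows together.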

The write is more complicated; akin to the gating mechanism in an LSTM, each memory location is first partially erased and then has an add vector blended in. The update is applied across the entire memory, but again, it is controlled using the same kind of mask as above.

The NTM uses a novel mechanism, combining content-based and location-based addressing, to address content inside its memory.

Addressing mechanism to create a mask used for reading and writing.
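The content-based part of the addressing can be sketched as cosine similarity between a key emitted by the controller and every memory row, sharpened by a scalar beta and normalized with a softmax. This is a simplified sketch (it omits the location-based interpolation, shifting, and sharpening steps), and the memory, key, and beta values are illustrative:

```python
import numpy as np

def content_mask(memory, key, beta):
    """Content-based addressing: softmax over beta-scaled cosine similarity."""
    # Cosine similarity between the key and each memory row.
    sims = memory @ key / (
        np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    )
    scores = beta * sims
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

memory = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
key = np.array([1.0, 0.0])

# A large beta concentrates the mask on the best-matching row.
w = content_mask(memory, key, beta=10.0)
```

The resulting mask is then fed into the read and write operations above; larger beta values make the lookup behave more like an exact key-value retrieval.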