Neural Turing Machines are able to learn simple algorithms that generalize "far beyond" the training data. Since publication there have been several follow-up works, but this paper introduced the core mechanisms.
Overview
The paper focuses on the experiments and results -- this notebook provides an overview of some of the implementation details.
Architecture
The authors use the proposed Neural Turing Machine (NTM) to solve a range of basic data-manipulation tasks. An external controller network uses the NTM's API of blurry (differentiable) read and write operations to tackle each task.
The controller has a number of heads, each of which either reads from or writes to the memory. The number and purpose of the heads is fixed (part of the network configuration). The external API of the controller as a whole is simply a sequence of input vectors (describing a problem); once a termination symbol is reached, the controller outputs an answer to the given problem. There is no intermediate guidance on how the controller uses the NTM to organize information. The authors hypothesized, and at least to some degree demonstrated, that the controller and the NTM in concert learn to construct generalizable programs.
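The sequence-level protocol above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `run_episode`, `ToyCopy`, and all other names are hypothetical, and a hand-written toy controller stands in for the learned network.

```python
import numpy as np

def run_episode(step_fn, problem_vectors, term_symbol, answer_len):
    """Feed a problem description, then a termination symbol, then read out
    the answer. step_fn maps one input vector to one output vector."""
    for x in problem_vectors:
        step_fn(x)                       # describe the problem; outputs ignored
    step_fn(term_symbol)                 # signal end of input
    zero = np.zeros_like(problem_vectors[0])
    return [step_fn(zero) for _ in range(answer_len)]

class ToyCopy:
    """Toy stand-in for a learned controller: memorizes the problem vectors
    and replays them after the termination symbol (a copy task)."""
    def __init__(self, term):
        self.term, self.tape, self.replaying = term, [], False
    def __call__(self, x):
        if self.replaying:
            return self.tape.pop(0)
        if np.array_equal(x, self.term):
            self.replaying = True
        else:
            self.tape.append(x)
        return np.zeros_like(x)

seq = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
term = np.array([1.0, 1.0])
out = run_episode(ToyCopy(term), seq, term, answer_len=len(seq))
```

The point of the protocol is that nothing between the input sequence and the final answer is supervised; a trained NTM controller would have to discover its own use of memory within the same interface.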
The read operation is a weighted sum over the rows of the stateful memory M_t \in \mathbb{R}^{N \times M} (N locations, each an M-dimensional vector), as determined by a normalized weighting w_t \in \mathbb{R}^{N} with \sum_i w_t(i) = 1: the read vector is r_t = \sum_i w_t(i) M_t(i).
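In NumPy the read operation is a single matrix-vector product. A minimal sketch (the shapes follow the paper; the variable names are my own):

```python
import numpy as np

N, M = 8, 4                      # N memory locations, each an M-vector
rng = np.random.default_rng(0)
memory = rng.random((N, M))      # M_t: the stateful memory matrix

# Normalized, non-negative weighting over locations (sums to 1),
# e.g. as produced by a softmax over addressing scores.
w = rng.random(N)
w /= w.sum()

# r_t = sum_i w_t(i) M_t(i): a convex combination of memory rows.
r = w @ memory
assert r.shape == (M,)
```

Because every location contributes in proportion to its weight, the operation is differentiable with respect to both `w` and `memory` -- this is the "blurry" part of the API that makes the whole machine trainable by gradient descent.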