Teaching a chatbot to read and comprehend has been elusive until recently. While DeepMind seems to have figured out how to search documents to find answers, Microsoft Research has a chatbot that can answer questions in a number of languages.
Machine Reading Using Neural Machines
Teaching a chatbot to read, process and comprehend natural language documents and images is a coveted goal in modern AI. We see growing interest in machine reading comprehension (MRC) due to potential industrial applications as well as technological advances, especially in deep learning and the availability of various MRC datasets that can benchmark different MRC systems. Despite the progress, many fundamental questions remain unanswered: Is question answering (QA) the proper task to test whether a machine can read? What is the right QA dataset to evaluate the reading capability of a machine? For speech recognition, the Switchboard dataset was a research goal for 20 years – why is there such a proliferation of datasets for machine reading? How important is model interpretability and how can it be measured? This session will bring together experts at the intersection of deep learning and natural language processing to explore these topics.
How to Make a Text Summarizer – Intro to Deep Learning #10
Build an AI Reader – Machine Learning for Hackers #7
Google released SyntaxNet, an open-source neural network framework implemented in TensorFlow that provides a foundation for Natural Language Understanding (NLU) systems. The release includes all the code needed to train new SyntaxNet models on your own data, as well as Parsey McParseface, an English parser that we have trained for you and that you can use to analyze English text.
Parsey McParseface is built on powerful machine learning algorithms that learn to analyze the linguistic structure of language, and that can explain the functional role of each word in a given sentence. Because Parsey McParseface is the most accurate such model in the world, we hope that it will be useful to developers and researchers interested in automatic extraction of information, translation, and other core applications of NLU.
Get the code from Siraj’s AI Reader video at https://github.com/llSourcell/AI_Reader
Learn more about Parsey McParseface at:
Generate Text using an LSTM Recurrent Network
Microsoft LUIS for Natural Language Understanding?
Language Understanding (LUIS) is a Microsoft cloud-based service that applies custom machine learning to a user’s conversational, natural language text to predict overall meaning and pull out relevant, detailed information.
A client application for LUIS can be any conversational application that communicates with a user in natural language to complete a task. Examples of client applications include social media apps, chatbots, and speech-enabled desktop applications.
Your client application (such as a chatbot) sends user text of what a person wants in their own words to LUIS in an HTTP request. LUIS applies your learned model to the natural language to make sense of the user input and returns a JSON format response. Your client application uses the JSON response to fulfill the user’s requests.
What is a LUIS app?
A LUIS app is a domain-specific language model you design. You can start your app with a prebuilt domain model, build your own, or blend pieces of a prebuilt domain with your own custom information.
A model begins with a list of general user intentions, called intents, such as “Book Flight” or “Contact Help Desk.” You provide example user phrases, called utterances, for each intent, and then mark the significant words or phrases in the utterances, called entities.
Prebuilt domain models include all these pieces for you and are a great way to start using LUIS quickly.
Once your model is built and published, your client application sends utterances to the LUIS endpoint API and receives the prediction results as JSON responses.
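The request/response loop above can be sketched in Python. The JSON shape below (`topScoringIntent`, `entities`) follows the LUIS v2 endpoint format, but the intent and entity values are invented for illustration, and the HTTP call is replaced by a canned response so the sketch is self-contained:

```python
import json

# A LUIS-style JSON reply, as a client app would receive it from the
# endpoint API. The intent and entity values here are hypothetical.
raw = json.dumps({
    "query": "book a flight to Paris tomorrow",
    "topScoringIntent": {"intent": "BookFlight", "score": 0.97},
    "entities": [
        {"entity": "paris", "type": "Location"},
        {"entity": "tomorrow", "type": "builtin.datetimeV2.date"},
    ],
})

def handle_luis_response(body: str):
    """Pull the predicted intent and the entities out of a LUIS JSON reply."""
    result = json.loads(body)
    intent = result["topScoringIntent"]["intent"]
    entities = {e["type"]: e["entity"] for e in result["entities"]}
    return intent, entities

intent, entities = handle_luis_response(raw)
print(intent, entities)  # the client app routes on these to fulfil the request
```

In a real client, `raw` would be the body of the HTTP response from your published endpoint; everything after that point is the same.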
Teaching Machines to Read and Comprehend
Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large-scale training and test datasets have been missing for this type of evaluation. In this work, we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.
- Build a supervised reading comprehension dataset using a news corpus.
- Compare the performance of neural models with state-of-the-art natural language processing models on the reading comprehension task.
- Link to the paper
- Estimate conditional probability p(a|c, q), where c is a context document, q is a query related to the document, and a is the answer to that query.
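Concretely, a model scores each candidate entity and normalises the scores with a softmax to get p(a|c, q); the predicted answer is the argmax. A minimal numpy sketch (the candidate scores below are arbitrary stand-ins for model outputs):

```python
import numpy as np

def predict_answer(candidates, scores):
    """p(a|c, q) via a softmax over per-entity scores; answer = argmax."""
    scores = np.asarray(scores, dtype=float)
    probs = np.exp(scores - scores.max())  # stable softmax
    probs /= probs.sum()
    return candidates[int(np.argmax(probs))], probs

candidates = ["@entity1", "@entity2", "@entity3"]
answer, probs = predict_answer(candidates, [0.2, 2.5, 0.7])
print(answer)  # the highest-scoring entity
```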
Question answering dataset featured in “Teaching Machines to Read and Comprehend”: https://github.com/deepmind/rc-data/
- Use online newspapers (CNN and DailyMail) and their matching summaries.
- Parse summaries and bullet points into Cloze style questions.
- Generate corpus of document-query-answer triplets by replacing one entity at a time with a placeholder.
- Data is anonymised and randomised using coreference systems, abstract entity markers, and random permutation of the entity markers.
- The processed dataset is more sharply focused on evaluating reading comprehension, as models cannot exploit co-occurrence statistics or prior knowledge about the entities.
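The anonymisation step above can be sketched as follows. The sentences and entity names are invented, and the real pipeline also runs coreference resolution before marking entities, which is omitted here:

```python
import random

def anonymise(document, query, answer, entities, seed=0):
    """Replace entities with randomly permuted @entityN markers and blank
    the answer entity in the query, yielding a document-query-answer triple."""
    rng = random.Random(seed)
    shuffled = entities[:]
    rng.shuffle(shuffled)  # random permutation of the entity markers
    marker = {e: "@entity%d" % i for i, e in enumerate(shuffled)}
    for e, m in marker.items():
        document = document.replace(e, m)
        query = query.replace(e, m)
    cloze_query = query.replace(marker[answer], "@placeholder")
    return document, cloze_query, marker[answer]

doc = "Obama met Putin in Moscow."
summary = "Putin hosted Obama in Moscow"   # a bullet point from the summary
d, q, a = anonymise(doc, summary, "Putin", ["Obama", "Putin", "Moscow"])
print(d)
print(q, "->", a)
```

Because the markers are permuted per example, a model cannot memorise which marker tends to be the answer; it has to read the document.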
Cloze style questions
Cloze style questions are fill in the blank questions. Words may be deleted from the text in question either mechanically (every nth word) or selectively, depending on exactly what aspect it is intended to test for. The methodology is the subject of an extensive academic literature; nonetheless, teachers commonly devise ad hoc tests.
A language teacher may give the following passage to students:
Today, I went to the ________ and bought some milk and eggs. I knew it was going to rain, but I forgot to take my ________, and ended up getting wet on the way.
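The mechanical variant (deleting every nth word) is simple enough to sketch directly; the passage below is a made-up example:

```python
def cloze(text, n):
    """Blank every nth word, as in a mechanically built Cloze test."""
    words = text.split()
    return " ".join("____" if (i + 1) % n == 0 else w
                    for i, w in enumerate(words))

print(cloze("Today I went to the store and bought some milk", 5))
# blanks the 5th and 10th words
```

Selective deletion, by contrast, needs a human (or a model) to pick which words best test the intended skill.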
- Majority Baseline
- Picks the most frequently observed entity in the context document.
- Exclusive Majority
- Picks the most frequently observed entity in the context document which is not observed in the query.
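Both baselines reduce to frequency counting over the document's entity markers, as in this sketch (the entity lists are invented):

```python
from collections import Counter

def majority(doc_entities):
    """Most frequently observed entity in the context document."""
    return Counter(doc_entities).most_common(1)[0][0]

def exclusive_majority(doc_entities, query_entities):
    """Most frequent document entity that is not observed in the query."""
    counts = Counter(e for e in doc_entities if e not in set(query_entities))
    return counts.most_common(1)[0][0]

doc = ["@entity1", "@entity2", "@entity1", "@entity3", "@entity1", "@entity2"]
print(majority(doc))                          # @entity1
print(exclusive_majority(doc, ["@entity1"]))  # @entity2
```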
Symbolic Matching Models
- Frame-Semantic Parsing
- Parse the sentence to find predicates to answer questions like “who did what to whom”.
- Extract entity-predicate triples (e1, V, e2) from the query q and the context document d.
- Resolve queries with a small set of rules that match query triples against document triples, from exact matches down to progressively more permissive back-off rules.
- Word Distance Benchmark
- Align placeholder of Cloze form questions with each possible entity in the context document and calculate the distance between the question and the context around the aligned entity.
- Sum the distance of every word in q to its nearest aligned word in d.
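A simplified sketch of that scoring idea follows. The alignment, distance cap, and example sentences are my own choices, not the paper's exact benchmark:

```python
def word_distance(doc_tokens, query_tokens, candidate, max_dist=8):
    """Score a candidate entity: align @placeholder to it, then sum, over
    query words, the distance to the nearest matching document word.
    Lower is better; max_dist caps the penalty for unmatched words."""
    q = [candidate if t == "@placeholder" else t for t in query_tokens]
    anchors = [i for i, t in enumerate(doc_tokens) if t == candidate]
    if not anchors:
        return float("inf")
    best = float("inf")
    for a in anchors:          # try every occurrence of the candidate
        total = 0
        for w in q:
            ds = [abs(i - a) for i, t in enumerate(doc_tokens) if t == w]
            total += min(min(ds), max_dist) if ds else max_dist
        best = min(best, total)
    return best

doc = "@entity1 scored twice as @entity2 lost the final".split()
query = "@placeholder scored twice".split()
scores = {c: word_distance(doc, query, c) for c in ("@entity1", "@entity2")}
print(min(scores, key=scores.get))  # the candidate whose context best matches
```

The benchmark works precisely because, in news data, the query often has heavy lexical overlap with one region of the document.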
Neural Network Models
- Deep LSTM Reader
- Test the ability of Deep LSTM encoders to handle significantly longer sequences.
- Feed the document-query pair as a single long sequence, one word at a time.
- Use Deep LSTM cell with skip connections from input to hidden layers and hidden layer to output.
- Attentive Reader
- Employ attention model to overcome the bottleneck of fixed width hidden vector.
- Encode the document and the query using separate bidirectional single layer LSTM.
- Query encoding is obtained by concatenating the final forward and backwards outputs.
- Document encoding is obtained by a weighted sum of output vectors (obtained by concatenating the forward and backwards outputs).
- The weights can be interpreted as the degree to which the network attends to a particular token in the document.
- Model completed by defining a non-linear combination of document and query embedding.
- Impatient Reader
- As an add-on to the attentive reader, the model can re-read the document as each query token is read.
- Model accumulates the information from the document as each query token is seen and finally outputs a joint document query representation in the form of a non-linear combination of document embedding and query embedding.
- The Attentive and Impatient Readers outperform all other models, highlighting the benefits of attention modelling.
- Frame-Semantic pipeline does not scale to cases where several methods are needed to answer a query.
- Moreover, they provide poor coverage as a lot of relations do not adhere to the default predicate-argument structure.
- Word Distance approach outperformed the Frame-Semantic approach as there was significant lexical overlap between the query and the document.
- The paper also includes heat maps over the context documents to visualise the attention mechanism.
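The attention-weighted document encoding described for the Attentive Reader can be sketched in numpy. Dimensions are toy-sized, the encoder outputs are random stand-ins for real bidirectional LSTM states, and the scoring function is a plain dot product rather than the paper's tanh MLP:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4                      # document length, encoding size
y_doc = rng.normal(size=(T, d))  # per-token bidirectional LSTM outputs (stand-ins)
u = rng.normal(size=d)           # query encoding: concat of fwd/bwd finals (stand-in)

# Attention weights: the degree to which the network attends to each token.
scores = y_doc @ u               # simplified; the paper scores with a tanh MLP
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()

r = alpha @ y_doc                # document encoding: weighted sum of token outputs
g = np.tanh(r + u)               # non-linear combination of document and query
```

Because `alpha` sums to one over tokens, plotting it over the document directly gives the heat maps mentioned above.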
Teaching Machines to Read and Comprehend Citations:
Implementation of Teaching Machines to Read and Comprehend code on GitHub:
This repository contains an implementation of the two models (the Deep LSTM and the Attentive Reader) described in Teaching Machines to Read and Comprehend by Karl Moritz Hermann et al., NIPS 2015. This repository also contains an implementation of a Deep Bidirectional LSTM.
The three models implemented in this repository are:
deepmind_deep_lstm reproduces the experimental settings of the DeepMind paper for the LSTM reader
deepmind_attentive_reader reproduces the experimental settings of the DeepMind paper for the Attentive reader
deep_bidir_lstm_2x128 implements a two-layer bidirectional LSTM reader
We trained the three models for 2 to 4 days on a Titan Black GPU.
We would like to thank the developers of Theano, Blocks and Fuel at MILA for their excellent work.
We thank Simon Lacoste-Julien from SIERRA team at INRIA, for providing us access to two Titan Black GPUs.
Theano implementation of Deep LSTM Reader & Attentive Reader from Google DeepMind’s paper Teaching Machines to Read and Comprehend – Hermann et al. (2015):
- Python 2.7
- Scikit-learn (for computing F1 score)
Acknowledgment: This code uses a portion of the data reading interface written by Danqi Chen.
Try teaching an AI yourself on Google Colab:
Learning to Learn
In a 2016 paper, Learning to learn by gradient descent by gradient descent, authors from Google DeepMind, the University of Oxford, and the Canadian Institute for Advanced Research showed how to learn the optimizer itself; DeepMind released the accompanying code as Learning to Learn in TensorFlow:
Learning to Learn Abstract
The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.
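The core idea is to replace the hand-designed update rule θ ← θ − α∇f(θ) with an update produced by a learned model g. A minimal sketch on a one-variable quadratic; note that `g` below is a hypothetical fixed stand-in, whereas the paper trains an LSTM whose update adapts to the problem:

```python
def grad(theta):
    """Gradient of f(theta) = (theta - 3)^2, minimised at theta = 3."""
    return 2.0 * (theta - 3.0)

# Hand-designed optimizer: fixed-step gradient descent.
theta = 0.0
for _ in range(50):
    theta -= 0.1 * grad(theta)
print(round(theta, 3))  # converges toward 3.0

# Learned optimizer: the update is the *output of a model* g(grad).
# Here g is a stand-in function; in the paper g is a trained LSTM
# that also carries hidden state between steps.
def g(gradient):
    return -0.1 * gradient

theta = 0.0
for _ in range(50):
    theta += g(grad(theta))
```

Training `g` amounts to backpropagating the optimizee's loss through the unrolled optimization steps, which is what `train.py` below does.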
python train.py --problem=mnist --save_path=./mnist
save_path: If present, the optimizer will be saved to the specified path every time the evaluation performance is improved.
num_epochs: Number of training epochs.
log_period: Epochs before mean performance and time is reported.
evaluation_period: Epochs before the optimizer is evaluated.
evaluation_epochs: Number of evaluation epochs.
problem: Problem to train on. See Problems section below.
num_steps: Number of optimization steps.
unroll_length: Number of unroll steps for the optimizer.
learning_rate: Learning rate.
second_derivatives: If true, the optimizer will try to compute second derivatives through the loss function specified by the problem.
python evaluate.py --problem=mnist --optimizer=L2L --path=./mnist
path: Path to saved optimizer, only relevant if using the L2L optimizer.
learning_rate: Learning rate, only relevant if using the Adam optimizer.
num_epochs: Number of evaluation epochs.
seed: Seed for random number generation.
problem: Problem to evaluate on. See Problems section below.
num_steps: Number of optimization steps.
The training and evaluation scripts support the following problems (see util.py for more details):
simple: One-variable quadratic function.
simple-multi: Two-variable quadratic function, where one of the variables is optimized using a learned optimizer and the other one using Adam.
quadratic: Batched ten-variable quadratic function.
mnist: MNIST classification using a two-layer fully connected network.
cifar: CIFAR-10 classification using a convolutional neural network.
cifar-multi: CIFAR-10 classification using a convolutional neural network, where two independent learned optimizers are used: one for parameters from convolutional layers and the other for parameters from fully connected layers.
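For intuition, the quadratic problem minimises a batched loss of the form ‖Wθ − y‖² for random W and y. A numpy analogue of that loss follows; the real problems.py builds the equivalent TensorFlow operation instead:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n = 3, 10                  # batched ten-variable quadratic
W = rng.normal(size=(batch, n, n))
y = rng.normal(size=(batch, n))

def quadratic_loss(theta):
    """Mean over the batch of ||W theta - y||^2."""
    residual = np.einsum("bij,bj->bi", W, theta) - y
    return float((residual ** 2).sum(axis=1).mean())

theta0 = np.zeros((batch, n))
print(quadratic_loss(theta0))     # at theta = 0 this is just the mean ||y||^2
```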
New problems can be implemented very easily. You can see in train.py that the meta_minimize method from the MetaOptimizer class is given a function that returns the TensorFlow operation that generates the loss function we want to minimize (see problems.py for an example).

It’s important that all operations with Python side effects (e.g. queue creation) are done outside of the function passed to meta_minimize. The cifar10 function in problems.py is a good example of a loss function that uses TensorFlow queues.
Disclaimer: This is not an official Google product.