The following are the top 10 research advances in machine learning in 2020, as recommended by recordtrend.com.
Which machine learning developments from last year deserve your attention? Here is what a DeepMind research scientist has to say.
In 2020, because of COVID-19, many people had to work and study from home, and a large number of academic conferences on artificial intelligence moved online. Even so, AI technology made a great deal of progress last year. Sebastian Ruder, a research scientist at DeepMind, recently summarized the year for the machine learning community.
Note first that the selection of highlights reflects the fields the author is most familiar with, so the chosen topics lean towards representation learning, transfer learning, and natural language processing (NLP). Readers who see things differently are welcome to leave comments.
Sebastian Ruder lists the top ten machine learning research advances of 2020 as follows:
01. Large models and efficient models
The development of language models from 2018 to 2020
(image from the State of AI Report 2020)
What happened in 2020?
The past year brought many language and dialogue models of unprecedented size, such as Meena (Adiwardana et al., 2020), Turing-NLG, BST (Roller et al., 2020) and GPT-3 (Brown et al., 2020). At the same time, researchers have become aware that training such models consumes excessive amounts of energy (Strubell et al., 2019), and have turned to smaller models that still perform well. Recent progress in this direction comes from pruning (Sajjad et al., 2020; Sanh et al., 2020), quantization (Fan et al., 2020b), distillation (Sanh et al., 2019; Sun et al., 2020) and compression (Xu et al., 2020).
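As an illustration of one of the techniques named above, distillation, here is a minimal sketch of a distillation loss in plain Python. It is not taken from any of the cited papers: the function names, the temperature value, and the use of a pure KL term are assumptions made for illustration. Practical setups usually combine this term with a standard cross-entropy loss on the true labels and often rescale it by the squared temperature.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Illustrative only: the temperature of 2.0 is an arbitrary assumption.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
```

The loss is zero when the small (student) model reproduces the large (teacher) model's output distribution exactly and grows as the two distributions diverge.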
Other studies focus on making the Transformer architecture itself more efficient. Models in this line include the Performer (Choromanski et al., 2020) and Big Bird (Zaheer et al., 2020), which appear in the first figure of the original article. That figure shows the performance (y-axis), speed (x-axis) and memory footprint (circle size) of different models on the Long Range Arena benchmark (Tay et al., 2020).
Tools such as the experiment-impact-tracker (Henderson et al., 2020) have made it easier to track the energy efficiency of models. They have also encouraged competitions and benchmarks that evaluate models on efficiency, such as the SustaiNLP workshop at EMNLP 2020, the EfficientQA competition at NeurIPS 2020, and HULK (Zhou et al., 2020).
Scaling up models lets us keep pushing the limits of what deep learning can do. To be deployed in the real world, however, models have to be efficient. The two directions benefit each other: compressing large models yields efficient models with strong performance (Li et al., 2020), and more efficient methods can lead to stronger, larger models (Clark et al., 2020).
Given these concerns about efficiency and availability, I think future research will focus not only on model performance and parameter count but also on energy efficiency. This will allow a more comprehensive evaluation of new methods and help narrow the gap between machine learning research and practical application.
02. Retrieval augmentation
Unsupervised pre-training with REALM: the retriever and encoder are pre-trained jointly.
Large models learn a surprising amount of world knowledge from their pre-training data, which enables them to reproduce facts (Jiang et al., 2020) and answer questions without access to external context (Roberts et al., 2020). However, storing this knowledge implicitly in the model parameters is inefficient, and ever larger models are needed to retain enough information. In contrast, some recent methods train a retrieval model and a large language model jointly, and obtain strong results on knowledge-intensive NLP tasks such as open-domain question answering (Guu et al., 2020; Lewis et al., 2020) and language modeling (Khandelwal et al., 2020).
The main advantage of these methods is that retrieval is integrated directly into language model pre-training. This makes the language model more efficient, because it can off-load the recall of facts and focus on learning the more challenging aspects of natural language understanding. Accordingly, the best systems in the NeurIPS 2020 EfficientQA competition relied on retrieval (Min et al., 2020).
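The retrieve-then-read idea behind these systems can be sketched in a few lines of Python. This is a toy stand-in, not the method of any cited paper: the systems above use learned dense retrievers trained jointly with the language model, whereas this sketch uses a fixed bag-of-words cosine similarity, and the function names and the `Context / Question / Answer` input layout are assumptions for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(passages,
                    key=lambda p: cosine(query_vec, bag_of_words(p)),
                    reverse=True)
    return ranked[:k]

def build_reader_input(query, passages, k=1):
    """Prepend retrieved evidence to the question (hypothetical layout)."""
    evidence = "\n".join(retrieve(query, passages, k))
    return f"Context: {evidence}\nQuestion: {query}\nAnswer:"
```

Because the evidence is passed to the reader explicitly, the facts do not have to be stored in the model parameters, and the retrieved passages can be shown alongside the prediction.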
Retrieval used to be the standard approach for many generation tasks, such as text summarization and dialogue, and has largely been superseded by abstractive generation (Allahyari et al., 2017). Retrieval-augmented generation combines the advantages of both: the factual correctness and faithfulness of the retrieved segments, and the relevance and composition of the generated text.
Retrieval-augmented generation should be especially useful for the failure cases that have plagued neural generation models in the past, such as hallucination (Nie et al., 2019). It can also make systems easier to interpret by directly providing the evidence for a prediction.
03. Few-shot learning
Prompt-based fine-tuning with templated prompts and demonstrations
(Gao et al., 2020)
In the past few years, thanks to progress in pre-training, the number of training examples needed for a given task has kept falling (Peters et al., 2018; Howard et al., 2018). We are now at the stage where a few dozen examples are enough to demonstrate a given task (Bansal et al., 2020). A natural paradigm for few-shot learning is to reframe the task as language modeling. The most prominent example is the in-context learning approach of GPT-3, which makes predictions from a few input-output pairs and a prompt, with no gradient updates.
However, this approach has limitations: it requires a huge model (without any updates, the model has to rely on its existing knowledge), the amount of knowledge the model can use is limited by its context window, and prompts have to be written by hand.
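To make the setup concrete, here is a minimal sketch of how a few input-output pairs and a query might be assembled into a single prompt for a language model. The `Input:` / `Output:` layout and the function name are assumptions chosen for illustration, not the format used by GPT-3 or any cited paper.

```python
def build_few_shot_prompt(demonstrations, query, instruction=""):
    """Concatenate demonstrations and the query into one prompt string.

    demonstrations: list of (input_text, output_text) pairs.
    The model is expected to continue the text after the final "Output:".
    """
    blocks = [instruction] if instruction else []
    for text, label in demonstrations:
        blocks.append(f"Input: {text}\nOutput: {label}")
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)
```

Every demonstration occupies part of the context window, which is why the number of examples, and therefore the amount of task knowledge that can be supplied this way, is bounded.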
Recent work tries to make few-shot learning more effective by using smaller models, integrating fine-tuning, and automatically generating natural language prompts (Schick and Schütze, 2020; Gao et al., 2020; Shin et al., 2020). This work is closely related to the wider field of controllable neural text generation, which seeks to exploit the generative capabilities of pre-trained models.
For an overview of that field, see Lilian Weng's blog post on the topic.
Few-shot learning lets a model adapt quickly to many tasks, but updating all of the model's weights for each task is wasteful. It is preferable to make localized updates that concentrate the changes in a small number of parameters. Several approaches make such fine-tuning more efficient and practical, including adapters (Houlsby et al., 2019; Pfeiffer et al., 2020a; Üstün et al., 2020), adding a sparse parameter vector (Guo et al., 2020), and modifying only the bias values (Ben-Zaken et al., 2020).
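The last option in the list above, updating only the bias terms, can be illustrated on a toy linear model. This sketch is an assumption-laden simplification: the cited work applies the idea to the bias terms of a Transformer, whereas here there is a single bias, a squared-error loss, and hypothetical function names.

```python
def predict(weights, bias, features):
    """Linear model: dot(weights, features) + bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def bias_only_step(weights, bias, batch, learning_rate=0.1):
    """One gradient step on mean squared error that changes only the bias.

    batch: list of (features, target) pairs.
    The weights are treated as frozen and returned unchanged.
    """
    grad_bias = sum(2 * (predict(weights, bias, x) - y) for x, y in batch) / len(batch)
    return weights, bias - learning_rate * grad_bias
```

Only one number per bias term has to be stored for each new task, while the frozen weights can be shared across all tasks.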
Being able to teach a model a task from only a few examples greatly lowers the barrier to applying machine learning and NLP models. It lets models adapt quickly to new domains and opens up applications where data is expensive to collect.
In many real-world scenarios, it is possible to collect thousands of training examples. Models should therefore scale seamlessly from learning with a few examples to learning with thousands, and should not be limited by, for example, their context length. Models fine-tuned on entire training sets already exceed human performance on many popular tasks such as SuperGLUE, so improving their few-shot performance is a natural next step.
04. Contrastive learning
Instance discrimination compares features from different transformations of the same image
(Caron et al., 2020)
Contrastive learning trains a model to tell similar examples apart from dissimilar ones. With this approach, a machine learning model can be trained, for example, to distinguish similar images from different ones.
Recently, contrastive learning has become increasingly popular for self-supervised representation learning in computer vision and speech (van den Oord, 2018; Hénaff et al., 2019). The latest generation of powerful self-supervised methods for visual representation learning relies on contrastive learning with an instance discrimination task: different images are treated as negative pairs, and multiple views of the same image are treated as positive pairs. Recent methods refine this general framework further: SimCLR (Chen et al., 2020) defines the contrastive loss over augmented examples; Momentum Contrast (He et al., 2020) tries to ensure a large and consistent set of pairs; SwAV (Caron et al., 2020) uses online clustering; and BYOL uses only positive pairs (Grill et al., 2020). Chen and He (2020) further proposed a simpler formulation that relates to the previous methods.
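The positive-pair versus negative-pair objective described above is often written as an InfoNCE-style loss. The sketch below is a generic plain-Python version for a single anchor, not the exact loss of SimCLR, MoCo, SwAV or BYOL; the function names and the temperature of 0.5 are assumptions for illustration.

```python
import math

def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, temperature=0.5):
    """Cross-entropy of picking the positive among positive + negatives.

    Similarities are cosine similarities divided by a temperature.
    """
    a = normalize(anchor)
    candidates = [positive] + list(negatives)
    logits = [dot(a, normalize(c)) / temperature for c in candidates]
    peak = max(logits)  # log-sum-exp with the max subtracted for stability
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]
```

The loss is small when the anchor is closer to its positive view than to any negative, and large when a negative is the closest candidate.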
Recently, Zhao et al. (2020) found that data augmentation is crucial for contrastive learning. This may explain why unsupervised contrastive learning with large pre-trained models has not been successful in NLP, where data augmentation is less common. They also hypothesize that instance discrimination works better than supervised pre-training in computer vision because it does not try to make the features of all instances in a class similar, but retains the information of each instance. This is less of a problem in NLP, where unsupervised pre-training involves classification over thousands of word types. In NLP, Gunel et al. (2020) recently used contrastive learning for supervised fine-tuning.
The cross-entropy objective between one-hot labels and output logits that is commonly used in language modeling has some limitations, such as poor generalization to imbalanced classes (Cao et al., 2019). Contrastive learning is an alternative, complementary paradigm that can help alleviate some of these problems.
Combining contrastive learning with masked language modeling may enable us to learn richer and more robust representations. It could help models handle outliers and rare syntactic and semantic phenomena, which are a challenge for current NLP models.
05. Evaluation beyond accuracy
CheckList templates and tests probing the understanding of negation in sentiment analysis
(Ribeiro et al., 2020)
State-of-the-art (SOTA) models in NLP have surpassed human performance on many tasks, but can we believe that such models achieve real natural language understanding (Yogatama et al., 2019; Bender and Koller, 2020)? In fact, current models are far from this goal. Yet the simple performance metrics in use today fail to reflect the limitations of these models. There are two key themes in this area: a) curating examples that are difficult for current models; b) moving beyond simple metrics such as accuracy towards more fine-grained evaluation.
For the former, a common method is to use adversarial filtering during dataset creation (Zellers et al., 2018) to filter out the examples that current models predict correctly. Recent studies propose more efficient adversarial filtering methods (Sakaguchi et al., 2020; Le Bras et al., 2020) and an iterative dataset creation process (Nie et al., 2020; Bartolo et al., 2020), in which examples are filtered and models are retrained over several rounds. Dynabench provides a subset of such evolving benchmarks.
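The filtering step described above (adversarial filtering in Zellers et al., 2018) can be sketched as a single round that keeps only the examples the current model gets wrong. This is a simplified illustration, not the published procedure, which iterates with retrained models; `keyword_model` is a hypothetical weak baseline invented for the example.

```python
def adversarial_filter(examples, model_predict):
    """Keep only the (text, label) pairs that the current model mispredicts."""
    return [(text, label) for text, label in examples
            if model_predict(text) != label]

def keyword_model(text):
    """Hypothetical weak baseline: 'negative' whenever the word 'not' appears."""
    return "negative" if "not" in text.lower().split() else "positive"
```

In an iterative setup, the model would be retrained on the surviving examples and the filter applied again, so that the dataset keeps targeting the current model's weaknesses.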
The approach to the second point is similar in spirit. Here, minimal pairs (also known as counterfactual examples or contrast sets) are usually created (Kaushik et al., 2020; Gardner et al., 2020; Warstadt et al., 2020). These minimal pairs perturb examples in a minimal way and often change the gold label. Ribeiro et al. (2020) formalized some of the underlying intuitions in the CheckList framework, which allows such test cases to be created semi-automatically. In addition, characterizing examples by different attributes allows a more fine-grained analysis of a model's strengths and weaknesses (Fu et al., 2020).
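The idea of generating test cases from templates can be sketched as follows. This does not use the CheckList library or its API; the template syntax, function names, and the negation example are assumptions chosen to mirror the kind of test the figure caption mentions.

```python
from itertools import product

def expand_template(template, slots):
    """Fill a template with every combination of slot values.

    slots: dict mapping a placeholder name to a list of candidate values.
    """
    names = list(slots)
    combos = product(*(slots[name] for name in names))
    return [template.format(**dict(zip(names, combo))) for combo in combos]

def failure_rate(cases, expected_label, model_predict):
    """Return the fraction of cases the model gets wrong, and those cases."""
    if not cases:
        return 0.0, []
    failures = [case for case in cases if model_predict(case) != expected_label]
    return len(failures) / len(cases), failures
```

For example, expanding `"This {noun} is not {adj}."` with a few nouns and positive adjectives yields negated sentences that should all be labeled negative; the failure rate on this set isolates how well a sentiment model handles negation, something a single accuracy number hides.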
To build more capable machine learning models, we need to know not only whether a model is better than a previous system, but also what kinds of errors it makes and which problems the metrics fail to reflect. Fine-grained diagnosis of model behavior makes it easier to identify a model's defects and propose solutions. Similarly, fine-grained evaluation can be used to compare the strengths and weaknesses of different methods.
06. Practical concerns of language models
Models generate toxic content from seemingly innocuous prompts
(Gehman et al., 2020)
In 2019, the analysis of language models (LMs) focused mainly on the syntactic, semantic and world knowledge that such models capture. By contrast, analyses over the past year revealed many practical concerns.
For example, pre-trained LMs are prone to generating toxic language (Gehman et al., 2020) and leaking information (Song & Raghunathan, 2020). They are also susceptible to backdoors after fine-tuning, which let an attacker manipulate the model's predictions (Kurita et al., 2020; Wallace et al., 2020), and vulnerable to model and data extraction attacks (Krishna et al., 2020; Carlini et al., 2020).
Pre-trained models are well known to capture biases about protected attributes such as gender (Bolukbasi et al., 2016; Webster et al., 2020). Sun et al. (2019) provide a survey of methods for mitigating gender bias.
Large pre-trained models released by large companies are actively deployed in real-world scenarios, so we should be more aware of the biases and harmful consequences of these models.
As larger models are developed and released, it is important to make these concerns about bias and fairness part of the development process from the start.
07. NLP beyond English
The uneven distribution of labeled and unlabeled language data across the world's languages
(Joshi et al., 2020)
Multilingual NLP had many highlights in 2020. Masakhane, an organization that aims to strengthen NLP research for African languages, gave the keynote at the Fifth Conference on Machine Translation (WMT20), one of the most inspiring talks of last year. In addition, new general-purpose benchmarks for other languages emerged this year, including XTREME (Hu et al., 2020), XGLUE (Liang et al., 2020), IndoNLU (Wilie et al., 2020), and IndicGLUE (Kakwani et al., 2020). Existing datasets were also extended to other languages, for example:
SQuAD: XQuAD (Artetxe et al., 2020), MLQA (Lewis et al., 2020), FQuAD (d’Hoffschmidt et al., 2020);
Natural Questions: TyDiQA (Clark et al., 2020), MKQA (Longpre et al., 2020);
MNLI: OCNLI (Hu et al., 2020), FarsTail (Amirkhani et al., 2020);
the CoNLL-09 dataset: X-SRL (Daza and Frank, 2020);
the CNN/Daily Mail dataset: MLSUM (Scialom et al., 2020).
Most of these datasets, along with data in many other languages, are accessible through Hugging Face Datasets. Powerful models covering around 100 languages have emerged, including XLM-R (Conneau et al., 2020), RemBERT (Chung et al., 2020), and InfoXLM (Chi et al., 2020); see the XTREME leaderboard for details. A large number of language-specific BERT models have been trained for languages other than English, such as AraBERT (Antoun et al., 2020) and IndoBERT (Wilie et al., 2020). For more information, see Nozza et al. (2020) and Rust et al. (2020). With efficient multilingual frameworks such as AdapterHub (Pfeiffer et al., 2020), Stanza (Qi et al., 2020) and Trankit (Nguyen et al., 2020), building and applying models for many of the world's languages has become much easier.
In addition, there are two thought-provoking position papers: "The State and Fate of Linguistic Diversity and Inclusion in the NLP World" (Joshi et al., 2020) and "Decolonising Speech and Language Technology" (Bird, 2020). The first emphasizes the urgency of working on languages other than English, and the second cautions against treating language communities and their data as commodities.
Extending NLP research beyond English has many benefits and can have a real impact on society. Given the data and models now available in different languages, the stage is set for meaningful progress on languages other than English. At the same time, it remains an exciting task to develop models that can cope with the most challenging settings and to determine in which cases the assumptions underlying current models fail.
08. Image Transformers
In the Vision Transformer paper, the researchers apply a Transformer encoder to flattened image patches.
Transformers have achieved great success in NLP, but until recently they were less successful in computer vision, a field dominated by convolutional neural networks (CNNs). In early 2020, DETR (Carion et al., 2020) still used a CNN to compute image features, but later models were completely convolution-free. Image GPT (Chen et al., 2020) applies the GPT-2 recipe to pre-train directly from pixels and outperforms a supervised Wide ResNet. Later models reshape an image into patches that are treated as "tokens". Vision Transformer (ViT; Dosovitskiy et al., 2020) is trained on millions of labeled images, each consisting of such patches, and outperforms state-of-the-art CNNs. Image Processing Transformer (IPT; Chen et al., 2020) sets a new SOTA on low-level image tasks by pre-training with a contrastive loss on corrupted ImageNet examples. Data-efficient image Transformer (DeiT; Touvron et al., 2020) is trained on ImageNet via distillation.
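The step of reshaping an image into patch "tokens" can be sketched in plain Python. This is a simplified illustration under stated assumptions: a single-channel image given as a list of rows, non-overlapping square patches, and a hypothetical function name. In the actual models each flattened patch is additionally passed through a learned linear projection and combined with a position embedding, which is not shown here.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (a list of rows) into flattened square patches.

    Patches are returned in row-major order, each as a flat list of pixels.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[row][col]
                     for row in range(top, top + patch_size)
                     for col in range(left, left + patch_size)]
            patches.append(patch)
    return patches
```

A 4 x 4 image with a patch size of 2 becomes a sequence of four tokens of four pixels each, which a Transformer encoder can then process like a sequence of word embeddings.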
Interestingly, the DeiT authors found that CNNs make better teachers, which is similar to findings on distilling an inductive bias into BERT (Kuncoro et al., 2020). In speech, by contrast, Transformers are not applied directly to the audio signal but usually take the output of a CNN encoder as input (Moritz et al., 2020; Gulati et al., 2020; Conneau et al., 2020).
Compared with CNNs and RNNs, Transformers have less inductive bias. Although they are theoretically less powerful than RNNs (Weiss et al., 2018; Hahn et al., 2020), given sufficient data and scale, Transformers eventually outperform their more inductively biased competitors.
In the future, Transformers will likely become more and more popular in computer vision. They are especially suitable where enough compute and data are available for unsupervised pre-training. In smaller-scale settings, CNNs will probably remain the preferred method and a strong baseline.
09. Natural science and machine learning
The AlphaFold architecture, based on self-attention
Last year, DeepMind's AlphaFold achieved breakthrough performance in the CASP protein folding challenge. Beyond that, there was other significant progress in applying machine learning to the natural sciences. MetNet (Sønderby et al., 2020) showed that machine learning can outperform numerical weather prediction for precipitation forecasting; Lample and Charton (2020) used neural networks to solve differential equations better than commercial computer algebra systems; and Bellemare et al. (2020) used reinforcement learning to navigate balloons in the stratosphere.
In addition, ML has been widely applied to COVID-19. For example, it has been used to forecast the spread of COVID-19 (Kapoor et al.), to predict structures associated with COVID-19, to translate relevant data into 35 different languages (Anastasopoulos et al.), and to answer questions about COVID-19 in real time (Lee et al.).
For an overview of COVID-19 related NLP applications, see the Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020.
The natural sciences are arguably the most impactful application area for ML. Improvements there touch many aspects of life and can have a profound impact on the world. With progress in core areas such as protein folding, the application of ML to the natural sciences will only accelerate. We look forward to more research that moves the world forward.
10. Reinforcement learning
Performance of Agent57 and MuZero compared with state-of-the-art agents, measured by how they fare against the human benchmark throughout training (Badia et al., 2020).
For the first time, a single deep reinforcement learning agent, Agent57 (Badia et al., 2020), surpassed human performance on all 57 Atari games, a long-standing benchmark in deep reinforcement learning. The agent's versatility comes from a neural network that lets it switch between exploratory and exploitative policies.
Another milestone for reinforcement learning in games is MuZero, developed by Schrittwieser et al. It predicts the aspects of the environment that matter most for accurate planning. Without any knowledge of the game dynamics, MuZero achieves SOTA performance on Atari and performs strongly on Go, chess and shogi.
Finally, Munchausen RL agents (Vieillard et al., 2020) improve on the SOTA through a simple, theoretically grounded modification.
Reinforcement learning algorithms have many practical implications (Bellemare et al., 2020). Improvements to the fundamental algorithms in this field can have a large practical impact through better planning, environment modeling and action prediction.
With classic benchmarks such as Atari essentially solved, researchers may look for more challenging settings to test their algorithms, such as generalizing to out-of-distribution tasks, improving sample efficiency, and multi-task learning.
By Sebastian Ruder
Compiled by: Heart of the Machine
From: Heart of the Machine
Link to the original text: https://ruder.io/research-highlights-2020/
Image from pexels