Technical papers that show the key under-the-hood technologies in AI - 2024-05-10
1. The Annotated Transformer (Attention is All You Need - https://arxiv.org/pdf/1706.03762)
https://nlp.seas.harvard.edu/annotated-transformer/
The Transformer has been on a lot of people's minds over the last five years. This post presents an annotated version of the paper in the form of a line-by-line implementation. It reorders and deletes some sections from the original paper and adds comments throughout. This document itself is a working notebook, and should be a completely usable implementation. Code is available here (https://github.com/harvardnlp/annotated-transformer/)
2. The First Law of Complexodynamics
https://scottaaronson.blog/?p=762
https://scottaaronson.blog/
The blog of Scott Aaronson - "If you take nothing else from this blog: quantum computers won't solve hard problems instantly by just trying all solutions in parallel"
3. The Unreasonable Effectiveness of Recurrent Neural Networks
https://karpathy.github.io/2015/05/21/rnn-effectiveness/
"We'll train RNNs to generate text character by character and ponder the question "how is that even possible?"
BTW, together with this post I am also releasing code that allows you to train character-level language models based on multi-layer LSTMs. (https://github.com/karpathy/char-rnn)
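To give a sense of what "character by character" means in practice, here is a minimal sketch (not the released char-rnn code, which is Lua/Torch): a tiny character-level LSTM language model in PyTorch, trained on a toy string to predict the next character. All names and sizes are illustrative.

```python
# Minimal character-level LSTM language model (illustrative sketch only).
import torch
import torch.nn as nn

text = "hello world, hello rnn"                # toy corpus; any string works
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, idx, state=None):
        h, state = self.lstm(self.embed(idx), state)
        return self.head(h), state             # logits over the next character

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
x, y = data[None, :-1], data[None, 1:]          # inputs and next-char targets
for step in range(200):
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```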
4. Understanding LSTM Networks
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
5. Recurrent Neural Network Regularization
https://arxiv.org/pdf/1409.2329.pdf
Present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units: dropout is applied only to the non-recurrent connections.
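A minimal sketch of that idea as I read it: dropout on the non-recurrent (layer-to-layer) connections of a stacked LSTM, while the recurrent state transitions are left untouched. The class and parameter names below are illustrative, not from the paper.

```python
# Dropout only on vertical (non-recurrent) connections of a stacked LSTM (sketch).
import torch
import torch.nn as nn

class DropoutStackedLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers=2, p=0.5):
        super().__init__()
        self.cells = nn.ModuleList(
            nn.LSTMCell(input_size if i == 0 else hidden_size, hidden_size)
            for i in range(num_layers)
        )
        self.drop = nn.Dropout(p)   # used only on inputs flowing upward between layers

    def forward(self, x):           # x: (seq_len, batch, input_size)
        states = [None] * len(self.cells)
        outputs = []
        for x_t in x:
            inp = x_t
            for i, cell in enumerate(self.cells):
                inp = self.drop(inp)            # dropout on the non-recurrent connection
                h, c = cell(inp, states[i])
                states[i] = (h, c)              # recurrent connection kept intact
                inp = h
            outputs.append(inp)
        return torch.stack(outputs)

seq = torch.randn(10, 4, 32)                    # (time, batch, features)
out = DropoutStackedLSTM(32, 64)(seq)           # (10, 4, 64)
```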
6. Keeping Neural Networks Simple by Minimizing the Description Length of the Weights
https://www.cs.toronto.edu/~hinton/absps/colt93.pdf
Supervised neural networks generalize well if there is much less information in the weights than there is in the output vectors of the training cases.
7. Pointer Networks
https://arxiv.org/pdf/1506.03134.pdf
Introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence.
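A rough sketch of the pointing mechanism: the decoder's attention logits over encoder positions are used directly as the output distribution, so each output token is an index into the input sequence. Shapes and names below are illustrative.

```python
# Pointer-style attention head: logits over input positions (illustrative sketch).
import torch
import torch.nn as nn

class PointerHead(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_enc = nn.Linear(dim, dim, bias=False)
        self.w_dec = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, enc, dec):               # enc: (B, n, d), dec: (B, d)
        scores = self.v(torch.tanh(self.w_enc(enc) + self.w_dec(dec)[:, None, :]))
        return scores.squeeze(-1)               # (B, n): logits over input positions

enc = torch.randn(2, 7, 64)                     # 7 input elements per example
dec = torch.randn(2, 64)                        # current decoder state
probs = PointerHead(64)(enc, dec).softmax(-1)   # distribution over the 7 positions
```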
8. ImageNet Classification with Deep Convolutional Neural Networks
https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf
AlexNet
code: https://github.com/ulrichstern/cuda-convnet
9. Order Matters: Sequence to Sequence for Sets
https://arxiv.org/pdf/1511.06391
The order in which we organize input and/or output data matters significantly when learning an underlying model.
10. GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism
https://arxiv.org/pdf/1811.06965
Introduce GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers.
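This is not the GPipe library itself, just a toy illustration of the micro-batch idea: split a mini-batch into micro-batches and push each one through a sequence of stages, so that with stages placed on different devices their work could overlap (device placement and re-materialization are omitted here).

```python
# Toy micro-batch pipeline over a sequence of stages (illustrative sketch).
import torch
import torch.nn as nn

stages = nn.ModuleList([nn.Linear(16, 16), nn.Linear(16, 16), nn.Linear(16, 4)])

def pipeline_forward(x, num_microbatches=4):
    outputs = []
    for micro in x.chunk(num_microbatches):     # 1) split the batch into micro-batches
        h = micro
        for stage in stages:                    # 2) run stage by stage (devices omitted)
            h = stage(h)
        outputs.append(h)
    return torch.cat(outputs)                   # 3) reassemble the full batch

y = pipeline_forward(torch.randn(32, 16))       # (32, 4)
```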
11. Deep Residual Learning for Image Recognition
https://arxiv.org/pdf/1512.03385
ResNet
12. Multi-Scale Context Aggregation by Dilated Convolutions
https://arxiv.org/pdf/1511.07122
A new convolutional network module that is specifically designed for dense prediction.
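A small sketch of why dilation suits dense prediction: a dilated 3x3 convolution enlarges the receptive field without pooling or losing resolution. In PyTorch this is just the dilation argument of Conv2d; the sizes below are illustrative.

```python
# Dilated 3x3 convolution: larger receptive field, same spatial resolution.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)
conv = nn.Conv2d(3, 8, kernel_size=3, dilation=2, padding=2)  # effective 5x5 receptive field
print(conv(x).shape)   # torch.Size([1, 8, 64, 64]) - resolution preserved
```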
13. Neural Message Passing for Quantum Chemistry
https://arxiv.org/pdf/1704.01212
Message Passing Neural Networks (MPNNs)
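A minimal sketch of one message-passing step in the MPNN abstraction (a message function over edges followed by a node update); the particular message and update functions below are toy choices, not the paper's.

```python
# One message-passing step on a small directed graph (illustrative sketch).
import torch
import torch.nn as nn

num_nodes, dim = 4, 8
h = torch.randn(num_nodes, dim)                          # node states
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])   # directed edges (src, dst)

msg_fn = nn.Linear(dim, dim)       # message computed from the sending node's state
upd_fn = nn.GRUCell(dim, dim)      # node update from aggregated messages

messages = torch.zeros_like(h)
src, dst = edges[:, 0], edges[:, 1]
messages.index_add_(0, dst, msg_fn(h[src]))              # sum incoming messages per node
h_new = upd_fn(messages, h)                              # updated node states
```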
14. Attention Is All You Need
https://arxiv.org/pdf/1706.03762
Attention and Transformer
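The core operation, scaled dot-product attention, written out directly as Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; shapes are illustrative.

```python
# Scaled dot-product attention (illustrative sketch of the core equation).
import math
import torch

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)    # (..., n_q, n_k)
    return scores.softmax(dim=-1) @ v                    # weighted sum of values

q = torch.randn(2, 5, 64)   # batch of 2, 5 queries, d_k = 64
k = torch.randn(2, 7, 64)   # 7 keys
v = torch.randn(2, 7, 64)   # 7 values
out = scaled_dot_product_attention(q, k, v)              # (2, 5, 64)
```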
15. Neural Machine Translation by Jointly Learning to Align and Translate
https://arxiv.org/pdf/1409.0473
Allow a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
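A sketch of the soft-search: score every encoder state against the current decoder state, normalize the scores into an alignment, and take the weighted sum as the context for predicting the next target word. This is additive (Bahdanau-style) attention with illustrative names and sizes.

```python
# Additive attention over encoder states, yielding a soft-searched context vector.
import torch
import torch.nn as nn

d = 32
enc_states = torch.randn(2, 9, d)            # encoder states for a 9-word source sentence
dec_state = torch.randn(2, d)                # current decoder state
score = nn.Sequential(nn.Linear(2 * d, d), nn.Tanh(), nn.Linear(d, 1))

pairs = torch.cat([enc_states, dec_state[:, None, :].expand(-1, 9, -1)], dim=-1)
weights = score(pairs).squeeze(-1).softmax(dim=-1)       # alignment over source words
context = (weights[..., None] * enc_states).sum(dim=1)   # (2, d) context vector
```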
16. Identity Mappings in Deep Residual Networks
https://arxiv.org/pdf/1603.05027
Analyze the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
code: https://github.com/KaimingHe/resnet-1k-layers
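A sketch of a pre-activation residual block in this spirit (BN then ReLU then conv, twice), where the skip path is a pure identity with no activation after the addition, so x_{l+1} = x_l + F(x_l) and signals pass through the addition unchanged in both directions.

```python
# Pre-activation residual block with an identity skip connection (sketch).
import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.f(x)   # identity skip: no activation after the addition

x = torch.randn(1, 16, 8, 8)
y = PreActBlock(16)(x)          # same shape as x
```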
17. A Simple Neural Network Module for Relational Reasoning
https://arxiv.org/pdf/1706.01427
Use Relation Networks (RNs) as a simple plug-and-play module to solve problems that fundamentally hinge on relational reasoning
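A sketch of the RN composition RN(O) = f(sum over i,j of g(o_i, o_j)): a shared MLP g processes every pair of objects, the results are summed, and f maps the sum to the answer. How the objects are constructed (e.g., from CNN feature maps) is omitted; shapes are illustrative.

```python
# Relation Network module over a set of objects (illustrative sketch).
import torch
import torch.nn as nn

n_obj, d_obj, d_hidden, d_out = 6, 16, 64, 10
g = nn.Sequential(nn.Linear(2 * d_obj, d_hidden), nn.ReLU())   # pairwise relation function
f = nn.Sequential(nn.Linear(d_hidden, d_out))                  # readout over the summed relations

objects = torch.randn(2, n_obj, d_obj)                   # batch of object sets
oi = objects[:, :, None, :].expand(-1, n_obj, n_obj, -1)
oj = objects[:, None, :, :].expand(-1, n_obj, n_obj, -1)
pairs = torch.cat([oi, oj], dim=-1)                      # all (o_i, o_j) pairs
answer = f(g(pairs).sum(dim=(1, 2)))                     # (2, d_out)
```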
18. Variational Lossy Autoencoder
https://arxiv.org/pdf/1611.02731
Present a simple but principled method to learn such global representations by combining Variational Autoencoder (VAE) with neural autoregressive models such as RNN, MADE and PixelRNN/CNN.
19. Relational Recurrent Neural Networks
https://arxiv.org/pdf/1806.01822
Relational Memory Core (RMC) - which employs multi-head dot product attention to allow memories to interact
20. Quantifying the Rise and Fall of Complexity in Closed Systems: The Coffee Automaton
https://arxiv.org/pdf/1405.6903
21. Neural Turing Machines
https://arxiv.org/pdf/1410.5401
22. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
https://arxiv.org/pdf/1512.02595
23. Scaling Laws for Neural Language Models
https://arxiv.org/pdf/2001.08361
24. A Tutorial Introduction to the Minimum Description Length Principle
https://arxiv.org/pdf/math/0406077
25. Machine Super Intelligence
https://www.vetta.org/documents/Machine_Super_Intelligence.pdf
26. Kolmogorov Complexity and Algorithmic Randomness
https://www.lirmm.fr/~ashen/kolmbook-eng-scan.pdf
27. CS231n Convolutional Neural Networks for Visual Recognition
https://cs231n.github.io/