Five reasons to explore deep learning:
1. learning representations; 2. the need for distributed representations (the curse of dimensionality); 3. unsupervised feature and weight learning; 4. learning multiple levels of representation; 5. why now (RBMs, new training methods, and so on have appeared)
1. the basics
1.1 from logistic regression to neural nets
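The perspective here is interesting. Logistic regression is itself a neural network with a single neuron (a perceptron). A (three-layer) neural network is then several logistic regression models placed side by side, each producing its own output, with a softmax layer added on top to turn the whole thing into a classifier.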
From Maxent Classifiers to Neural Networks
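The functional form of a maximum entropy (maxent) model can be rewritten in sigmoid form (exactly so in the two-class case), so a maxent model is likewise equivalent to a neural network with a single neuron. In practice, maxent can also be used as the softmax layer.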
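A small numerical check of the maxent/sigmoid equivalence, as a sketch only: it assumes a two-class maxent model with one weight vector per class (`w_pos` and `w_neg` are hypothetical names, and the values are random). The two-class softmax probability should match a sigmoid applied to the difference of the two class scores.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # feature vector (random, for illustration)
w_pos = rng.normal(size=4)      # hypothetical weights for the positive class
w_neg = rng.normal(size=4)      # hypothetical weights for the negative class

# Maxent / softmax over two classes ...
p_maxent = softmax(np.array([w_pos @ x, w_neg @ x]))[0]
# ... equals a single sigmoid neuron with weights (w_pos - w_neg).
p_logistic = sigmoid((w_pos - w_neg) @ x)

print(p_maxent, p_logistic)
```

The two printed values agree up to floating-point error, which is the sense in which a two-class maxent classifier is a single sigmoid neuron.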
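Training neural networks: (1) stochastic gradient descent (SGD); (2) conjugate gradient or L-BFGS
Why do neural networks need non-linearities? If every layer were linear, a multi-layer network would have no more representational power than a single-layer network.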
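The argument about non-linearities can be checked numerically. The sketch below uses arbitrary random matrices `W1` and `W2` (hypothetical, bias terms omitted for brevity): two stacked linear layers give exactly the same output as one layer with the merged matrix `W2 @ W1`, while inserting a tanh between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first layer: 3 inputs -> 4 hidden units
W2 = rng.normal(size=(2, 4))   # second layer: 4 hidden units -> 2 outputs
x = rng.normal(size=3)

# Two stacked linear layers ...
deep_linear = W2 @ (W1 @ x)
# ... are the same function as one linear layer with merged weights.
single_linear = (W2 @ W1) @ x

# With a tanh in between, the merged matrix no longer reproduces the output.
deep_nonlinear = W2 @ np.tanh(W1 @ x)

linear_layers_collapse = np.allclose(deep_linear, single_linear)
nonlinear_differs = not np.allclose(deep_nonlinear, single_linear)
print(linear_layers_collapse, nonlinear_differs)
```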
1.2 word representation
one-hot representation;
distributional representation;
class-based representation (hard class -- cluster, or soft class -- LDA);
word embedding
1.3 unsupervised word vector learning
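feed-forward computation: how do we compute the probability (strictly, an unnormalised score) of a phrase s (cat chills on a mat)?
Build a three-layer neural network: the input layer holds the words (with their real-valued vectors), the middle layer is a hidden layer, and the output layer is a single node whose value represents the score of the phrase.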
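During training, take an n-gram window and build the network above to output the n-gram score s. At the same time, construct a negative example from the current n-gram and compute its score s_c with the same network. The objective is then to minimise
J = max(0, 1 - s + s_c)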
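This ranking objective comes from Collobert and Weston's work. Google's word2vec is trained in a similar spirit (contrasting observed words with sampled negatives), although its negative-sampling objective uses a logistic loss rather than this hinge loss.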
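To optimise this objective, compute the gradients and update the network weights layer by layer with backpropagation (BP), using gradient descent.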
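A minimal sketch of the feed-forward score and the objective J above. Everything concrete here is an assumption for illustration: the toy vocabulary, the sizes, the parameter names (`L`, `W`, `b`, `u`), the tanh hidden layer, and the choice to build the negative example by swapping the centre word for another word. Only the forward pass and the loss are shown, not the gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "chills", "on", "a", "mat", "dog", "table"]
dim, window, hidden = 5, 5, 8

# Hypothetical parameters: word embeddings, hidden layer, scoring vector.
L = rng.normal(scale=0.1, size=(len(vocab), dim))
W = rng.normal(scale=0.1, size=(hidden, window * dim))
b = np.zeros(hidden)
u = rng.normal(scale=0.1, size=hidden)

def score(word_ids):
    """Concatenate the word vectors, apply one tanh hidden layer, return a scalar."""
    x = L[word_ids].reshape(-1)
    h = np.tanh(W @ x + b)
    return float(u @ h)

def ranking_loss(pos_ids, neg_ids):
    """J = max(0, 1 - s + s_c): the true window should outscore the corrupted one by 1."""
    return max(0.0, 1.0 - score(pos_ids) + score(neg_ids))

pos = [vocab.index(w) for w in ["cat", "chills", "on", "a", "mat"]]
# Assumed corruption scheme: replace the centre word with a different word.
neg = list(pos)
neg[window // 2] = vocab.index("table")

J = ranking_loss(pos, neg)
print(J)
```

With untrained, near-zero weights both scores are close to 0, so J starts near 1; training would push s above s_c until the margin is satisfied and J reaches 0.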
1.4 backpropagation training
This part introduces the basic principles of BP.
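1.5 learning word-level classifiers: POS and NER
The network structure is similar to the n-gram network trained in 1.3, except that it "replaces the single scalar score with a softmax/Maxent classifier": the top layer is a softmax layer that acts as the classifier.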
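The interesting twist in deep learning is that the input features are also learned: unlike the conventional BP setting, the input vectors (the word embeddings) are updated during training as well.
Word embeddings also help share information across resources (lexicons): the information sources are fused at the word level.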
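1.6 sharing statistical strength
semi-supervised learning: pretrain with unsupervised learning first, then fine-tune with supervised learning. One reason pretraining can succeed: in principle we want the conditional probability p(c|x), but pretraining models p(x), and the structure learned for p(x) tends to be useful for modelling p(c|x).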
autoencoder: a multi-layer NN whose target output equals its input
PCA = linear manifold = linear auto-encoder
An ordinary (non-linear) autoencoder corresponds to non-linear PCA
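The PCA / linear auto-encoder correspondence can be illustrated as follows. This is a sketch on random data, assuming a linear encoder and decoder with tied weights taken from the top-k principal directions (computed with an SVD rather than learned by gradient descent); all names are hypothetical. The reconstruction error then equals the variance in the discarded directions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy data
X = X - X.mean(axis=0)                                    # centre the data

k = 2
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T                 # (5, k): top-k principal directions

code = X @ V_k                 # linear "encoder": project onto the directions
X_hat = code @ V_k.T           # linear "decoder": map the code back to input space

reconstruction_error = np.sum((X - X_hat) ** 2)
discarded_variance = np.sum(S[k:] ** 2)
print(reconstruction_error, discarded_variance)
```

A non-linear autoencoder replaces the two matrix products with non-linear layers, which lets the code follow a curved manifold instead of a linear subspace.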
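Note: the word "manifold" here is roughly like a "photocopy": there are small variations along certain directions, but overall the object stays consistent with the original.
Minimizing reconstruction error forces the latent representation of "similar inputs" to stay on the manifold.
Autoencoder improvements: for discrete inputs, use cross-entropy or log-likelihood as the criterion; the undercomplete, sparse, denoising and contractive variants address different weaknesses, with sparsity obtained by forcing the hidden units to stay at or near 0.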
2. recursive NN
2.1 motivation
An RNN (recursive neural network) can learn the syntactic structure of a sentence, but only in the form of a binary tree.
2.2 RNN for parsing
See "learning meanings for sentences".
2.3 theory: bp through structure
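The coverage is brief, but the basic procedure is the same as BP.
For each node in the parse tree, the node's label can be computed by adding a softmax layer on top of the node's vector representation; this layer is used both for training and for labelling.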
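The per-node computation can be sketched as below. This is a forward pass only, under assumptions that are not in the notes: a single shared composition matrix `W` with a tanh non-linearity, a softmax layer `W_label` applied to every internal node, random untrained parameters, and a hand-written binary tree given as nested tuples.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_labels = 4, 3

# Hypothetical parameters shared by all nodes of the tree.
W = rng.normal(scale=0.1, size=(dim, 2 * dim))         # composes two children
b = np.zeros(dim)
W_label = rng.normal(scale=0.1, size=(n_labels, dim))  # softmax layer per node

word_vecs = {w: rng.normal(scale=0.1, size=dim) for w in ["the", "cat", "sat"]}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def encode(tree, label_probs):
    """Return the vector of a tree: a word (leaf) or a (left, right) pair.

    The label distribution of every internal node is appended to
    label_probs in bottom-up order.
    """
    if isinstance(tree, str):
        return word_vecs[tree]
    left, right = tree
    children = np.concatenate([encode(left, label_probs),
                               encode(right, label_probs)])
    parent = np.tanh(W @ children + b)
    label_probs.append(softmax(W_label @ parent))
    return parent

node_label_probs = []
root = encode((("the", "cat"), "sat"), node_label_probs)
print(root.shape, len(node_label_probs))
```

Because each parent has exactly two children and the same `W` is reused at every node, the tree has to be binary, which matches the restriction noted in 2.1.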
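Experiments show that this method works fairly well on short sentences and rather poorly on long ones.
Several applications are also covered: paraphrase detection and scene parsing (applying NLP-style parsing to images in order to analyse image structure).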
2.4 recursive auto-encoders
Similar to the RNN, except that the objective is no longer a supervised score but the reconstruction error.
semi-supervised autoencoders: a cross-entropy term is added to the objective
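2.5 applications to sentiment detection and paraphrase detection
sentiment detection: in contrast to bag-of-words methods, this uses the automatically learned vectors described in the tutorial (a classifier is then built on top of them to decide whether the sentiment is "positive" or "negative")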
paraphrase detection:how to compare the meanings of two sentences?
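recursive auto-encoder for full-sentence paraphrase detection (Socher 2011): use the method of 2.3 to compute the parse trees of the two sentences and the vectors of their non-leaf nodes; together with the leaf nodes, compute the similarity between every pair of nodes from the two trees to form a similarity matrix; an NN on top of this matrix then computes how likely the pair is to be a paraphrase.
Personal question: sentences differ in length, so the similarity matrices differ in size (in both dimensions). How matrices of different sizes are fed to the same NN to compute the similarity value is not explained in the slides, so one has to read Socher's original paper.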
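2.6 compositionality through recursive matrix-vector spaces
Above, every internal node of the parse tree is represented by a vector; in the method of this subsection each node also carries a matrix in addition to the vector. The method is fairly complex and is only covered briefly.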
3. applications
3.1 applications
3.1.1 neural language model
LM: Bengio 2003
ASR: Mikolov 2011 (recurrent neural network language model; the same author later released word2vec)
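output bottleneck: the output of an NNLM is usually a vector whose dimensionality depends on the vocabulary size, in the simplest case a one-hot representation. Alternatively, the output is the vector of the word to be predicted in the n-gram, but that vector must then be compared for similarity against every word in the vocabulary to determine which word was predicted.
To address this, Mikolov borrows the idea of class-based language models: the NNLM first outputs a word class, and p(word | class, context) is then used to recover p(word | context) = p(class | context) * p(word | class, context).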
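A sketch of the class-based factorisation, p(word | context) = p(class | context) * p(word | class, context). The vocabulary, the hard class assignment `word_class`, the context vector `h` and both weight matrices are hypothetical and random; the point is only that each word's probability needs one softmax over the classes plus one softmax over the words of a single class, rather than a softmax over the whole vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["cat", "dog", "mat", "rug", "sat", "ran"]
word_class = np.array([0, 0, 1, 1, 2, 2])   # hypothetical hard class per word
n_classes, hidden = 3, 4

h = rng.normal(size=hidden)                     # context representation (assumed given)
W_class = rng.normal(size=(n_classes, hidden))  # class output layer
W_word = rng.normal(size=(len(vocab), hidden))  # word output layer

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def word_prob(word_id):
    """p(word | context) = p(class | context) * p(word | class, context)."""
    c = word_class[word_id]
    p_class = softmax(W_class @ h)[c]
    members = np.flatnonzero(word_class == c)   # words that share this class
    p_in_class = softmax(W_word[members] @ h)   # normalise within the class only
    return p_class * p_in_class[members.tolist().index(word_id)]

probs = np.array([word_prob(i) for i in range(len(vocab))])
print(probs.sum())
```

The probabilities still sum to 1 over the vocabulary, so the factorised model remains a proper distribution.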
SMT: this is also approached from the LM angle, replacing the n-gram LM previously used in SMT with an NNLM
3.1.2 structured embedding of knowledge bases
Bengio AAAI 2011
3.1.3 assorted speech and NLP applications
learn multiple word vectors: handles polysemy by representing a word with several word vectors
......
3.2 resources (tutorials and code)
• See “Neural Net Language Models” Scholarpedia entry
• Deep Learning tutorials: http://deeplearning.net/tutorials
• Stanford deep learning tutorials with simple programming assignments and reading list
http://deeplearning.stanford.edu/wiki/
• Recursive Autoencoder class project
http://cseweb.ucsd.edu/~elkan/250B/learningmeaning.pdf
• Graduate Summer School: Deep Learning, Feature Learning
http://www.ipam.ucla.edu/programs/gss2012/
• ICML 2012 Representation Learning tutorial http://www.iro.umontreal.ca/~bengioy/talks/deep-learning-tutorial-2012.html
• Paper references in separate pdf
software
• Theano (Python CPU/GPU) mathematical and deep learning library http://deeplearning.net/software/theano
• Can do automatic, symbolic differentiation
• Senna: POS, Chunking, NER, SRL
• by Collobert et al. http://ronan.collobert.com/senna/
• State-of-the-art performance on many tasks
• 3500 lines of C, extremely fast and using very little memory
• Recurrent Neural Network Language Model
http://www.fit.vutbr.cz/~imikolov/rnnlm/
• Recursive Neural Net and RAE models for paraphrase detection, sentiment analysis, relation classification
www.socher.org
3.3 deep learning tricks
• Stochastic gradient descent and setting learning rates
• Main hyper-parameters
• Learning rate schedule & Early stopping
• Minibatches
• Parameter initialization
• Number of hidden units
• L1 or L2 weight decay
• Sparsity regularization
• Debugging → Finite difference gradient check (Yay)
• How to efficiently search for hyper-parameter configurations
tanh(z) = 2 logistic(2z) − 1
tanh is better than sigmoid (logistic) in deep learning
Ordinary gradient descent is a batch method, very slow, should never be used. Use 2nd order batch method such as LBFGS.
learning rate: Better results can generally be obtained by allowing learning rates to decrease, typically in O(1/t)
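Two of the items above lend themselves to a quick numerical check: the tanh/logistic identity and the finite difference gradient check. The sketch below uses a toy model chosen only for illustration (a single tanh unit with squared error; `loss`, `analytic_grad` and `numeric_grad` are hypothetical helper names) and compares the hand-derived gradient with centred finite differences.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Identity: tanh(z) = 2 * logistic(2z) - 1
z = np.linspace(-3.0, 3.0, 13)
tanh_identity_holds = np.allclose(np.tanh(z), 2.0 * logistic(2.0 * z) - 1.0)

def loss(w, x, y):
    """Squared error of a single tanh unit (toy model for the check)."""
    return 0.5 * (np.tanh(w @ x) - y) ** 2

def analytic_grad(w, x, y):
    """Gradient of the loss w.r.t. w, derived by hand via the chain rule."""
    a = np.tanh(w @ x)
    return (a - y) * (1.0 - a ** 2) * x

def numeric_grad(w, x, y, eps=1e-5):
    """Centred finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2.0 * eps)
    return g

rng = np.random.default_rng(0)
w, x, y = rng.normal(size=3), rng.normal(size=3), 0.5
max_diff = np.max(np.abs(analytic_grad(w, x, y) - numeric_grad(w, x, y)))
print(tanh_identity_holds, max_diff)
```

A large `max_diff` would point to a bug in the hand-written gradient; for a correct gradient the difference should be many orders of magnitude smaller than the gradient itself.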
parameter initialization:
Initialize hidden layer biases to 0 and output (or reconstruction) biases to optimal value if weights were 0
Initialize weights ~ Uniform(-r, r), with r inversely proportional to fan-in (previous layer size) and fan-out (next layer size)
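One common concrete choice for r, given here as an assumption since the notes do not state a formula, is r = sqrt(6 / (fan_in + fan_out)), the range proposed by Glorot and Bengio for tanh units. The helper name `init_layer` and the layer sizes are hypothetical.

```python
import numpy as np

def init_layer(fan_in, fan_out, rng):
    """Uniform(-r, r) weights with r shrinking as fan-in and fan-out grow; zero biases."""
    r = np.sqrt(6.0 / (fan_in + fan_out))   # assumed formula, suited to tanh units
    W = rng.uniform(-r, r, size=(fan_out, fan_in))
    b = np.zeros(fan_out)                   # hidden biases start at 0
    return W, b, r

rng = np.random.default_rng(0)
W, b, r = init_layer(fan_in=50, fan_out=20, rng=rng)
print(W.shape, r)
```

Keeping the range tied to the layer sizes keeps the pre-activations in the roughly linear part of tanh at the start of training, whatever the width of the layer.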