Arora S. Theory of Deep Learning 2023
Category: Other
Total size: 24.55 MB
Added: 2025-03-10 23:39:07
Seeders: 9, Leechers: 4
Info Hash: A02EE4431E26397542EDC05E866E625372866E91
Description:
Textbook in PDF format
Basic Setup and some math notions
List of useful math facts
Basics of Optimization
Gradient descent (GD)
Stochastic gradient descent (SGD)
Accelerated Gradient Descent
Running time: Learning Rates and Update Directions
Convergence rates under smoothness conditions
Correspondence of theory with practice
Note on overparametrized linear regression and kernel regression
Overparametrized least squares linear regression
Kernel least-squares regression
Note on Backpropagation and its Variants
Problem Setup
Backpropagation (Linear Time)
Auto-differentiation
Notable Extensions
Basics of generalization theory
Occam's razor formalized for ML
Some simple upper bounds on generalization error
Data dependent complexity measures
Understanding limitations of the union-bound approach
A Compression-based framework
PAC-Bayes bounds
Exercises
Tractable Landscapes for Nonconvex Optimization
Preliminaries and challenges in nonconvex landscapes
Cases with a unique global minimum
Symmetry, saddle points and locally optimizable functions
Case study: top eigenvector of a matrix
Escaping Saddle Points
Preliminaries
Perturbed Gradient Descent
Saddle Points Escaping Lemma
Algorithmic Regularization
Linear models in regression: squared loss
Matrix factorization
Linear Models in Classification
Homogeneous Models with Exponential Tailed Loss
Induced bias in function space
Ultra-wide Neural Networks and Neural Tangent Kernels
Evolution equation for net parameters
NTK: Simple 2-layer example
Explaining Optimization and Generalization of Ultra-wide Neural Networks via NTK
NTK formula for Multilayer Fully-connected Neural Network
NTK in Practice
Exercises
Interpreting output of Deep Nets: Credit Attribution
Influence Functions
Shapley Values
Data Models
Saliency Maps
Inductive Biases due to Algorithmic Regularization
Matrix Sensing
Deep neural networks
Landscape of the Optimization Problem
Role of Parametrization
SDE approximation of SGD and its implications
Understanding gradient noise in SGD
Stochastic processes: Informal Treatment
Notion of closeness between stochastic processes
Stochastic Variance Amplified Gradient (SVAG)
Effect of Normalization in Deep Learning
Warmup Example: How Normalization Helps Optimization
Normalization schemes and scale invariance
Exponential learning rate schedules
Convergence analysis for GD on Scale-Invariant Loss
Unsupervised learning: Distribution Learning
Possible goals of unsupervised learning
Training Objective for Learning Distributions: Log Likelihood
Variational method
Autoencoders and Variational Autoencoders (VAEs)
Normalizing Flows
Stable Diffusion
Language Models (LMs)
Transformer Architecture
Explanation of Cross-Entropy Loss
Scaling Laws and Emergence
(Mis)understanding, Excess entropy, and Cloze Questions
How to generate text from an LM
Instruction tuning
Aligning LLMs with human preferences
Mathematical Framework for Skills and Emergence
Analysis of Emergence (uniform cluster)
Generative Adversarial Nets
Distance between Distributions
Introducing GANs
"Generalization" for GANs vs Mode Collapse
Self-supervised Learning
Adversarial Examples and efforts to combat them
Basic Definitions
Provable defense via randomized smoothing
Examples of Theorems, Proofs, Algorithms, Tables, Figures
Examples of Theorems and Lemmas
Example of Long Equation Proofs
Example of Algorithms
Example of Figures
Example of Tables
Exercise
Bibliography