Deep Learning Notations

by Elvis Saravia


Aim: This notebook collects notation widely used in deep learning papers and online educational materials. It follows the notation of the Deep Learning book by Ian Goodfellow, Yoshua Bengio and Aaron Courville. I also provide sample PyTorch code to show the kinds of data structures and concepts this notation may represent.

Uses: You can use this notebook as a cheatsheet when writing research papers, presentations, and blog posts. It is also a good resource for reviewing mathematical notation used widely in deep learning research and related fields. The example code is in PyTorch, but as an exercise you can try writing similar code in NumPy or TensorFlow. (The code shouldn't be too different.)

Requirements: PyTorch


In [1]:
import torch

Numbers and Arrays

A scalar

$a$ - a scalar (integer or real)
Latex: $a$

In [2]:
a = 2
print(a)
2

A vector

$\boldsymbol a$ - a vector
Latex: $\boldsymbol a$

In [3]:
### 1D vector (column vector)
a = torch.Tensor([1,2]) 
print(a)
 1
 2
[torch.FloatTensor of size 2]

In [4]:
### row vector, stored as a 2D tensor of size 1x2
a = torch.Tensor([[1,2]])
print(a)
 1  2
[torch.FloatTensor of size 1x2]

A matrix

$\boldsymbol A$ - a matrix
Latex: $\boldsymbol A$

In [5]:
A = torch.Tensor([[1,2,4],[4,5,6]])
print(A)
 1  2  4
 4  5  6
[torch.FloatTensor of size 2x3]

A tensor

$\mathsf A$ - a tensor
Latex: $\mathsf A$

In [6]:
A = torch.Tensor([[[1., 2.], [3., 4.]],
                  [[5., 6.], [7., 8.]]])
print(A)
(0 ,.,.) = 
  1  2
  3  4

(1 ,.,.) = 
  5  6
  7  8
[torch.FloatTensor of size 2x2x2]

Identity matrix

$\boldsymbol I_n$ - identity matrix with $n$ rows and $n$ columns
Latex: $\boldsymbol I_n$

In [7]:
I = torch.eye(4)
print(I)
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1
[torch.FloatTensor of size 4x4]

Standard Basis Vector

$\boldsymbol e^{(i)}$ - standard basis vector $[0,...,0,1,0,...,0]$ with a 1 at position $i$
Latex: $\boldsymbol e^{(i)}$

In [8]:
i = 5 # index
e = torch.zeros(9)
e[i]=1
print(e)
 0
 0
 0
 0
 0
 1
 0
 0
 0
[torch.FloatTensor of size 9]

Diagonal Matrix

$\text{diag}(\boldsymbol a)$ - a square, diagonal matrix with diagonal entries given by $\boldsymbol a$
Latex: $\text{diag}(\boldsymbol a)$

In [9]:
torch.diag(torch.randn(4))
Out[9]:
-0.1878  0.0000  0.0000  0.0000
 0.0000 -0.6244  0.0000  0.0000
 0.0000  0.0000 -0.5755  0.0000
 0.0000  0.0000  0.0000 -0.4487
[torch.FloatTensor of size 4x4]

Random Variables

$\rm a$ - a scalar random variable
Latex: $\rm a$

$\bf a$ - a vector-valued random variable
Latex: $\bf a$

$\rm {a_i}$ - element $i$ of the random vector $\bf a$
Latex: $\rm {a_i}$

$\bf A$ - a matrix-valued random variable
Latex: $\bf A$
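
As a rough illustration (not part of the original notation list), the cell below draws one realization each of a scalar, a vector-valued, and a matrix-valued random variable. The standard normal distribution is an assumption made only for the demo; the notation itself says nothing about which distribution a variable follows.

```python
import torch

torch.manual_seed(0)  # fix the seed so the draws are reproducible

# Assumption: a standard normal distribution, chosen only for illustration.
a_scalar = torch.randn(())     # one draw of a scalar random variable
a_vector = torch.randn(3)      # one draw of a vector-valued random variable
a_i = a_vector[1]              # element i = 1 of that draw
A_matrix = torch.randn(2, 3)   # one draw of a matrix-valued random variable

print(a_scalar.shape, a_vector.shape, A_matrix.shape)
```

Note that each tensor here is a single sample, while the symbols above denote the random variables themselves.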


Sets and Graphs

A set

$\mathbb{A}$ - a set
Latex: $\mathbb{A}$

$\mathbb{R}$ - the set of real numbers
Latex: $\mathbb{R}$

$\{ 0,1\}$ - the set containing $0$ and $1$
Latex: $\{ 0,1\}$

$\{ 0,1,...,n\}$ - the set of all integers between $0$ and $n$
Latex: $\{ 0,1,...,n\}$

$\left[ a, b\right]$ - the real interval including $a$ and $b$
Latex: $\left[ a, b\right]$

$(a,b ]$ - the real interval excluding $a$ but including $b$
Latex: $(a,b ]$

$\mathbb{A} \backslash \mathbb{B}$ - set subtraction, i.e., the set containing the elements of $\mathbb{A}$ that are not in $\mathbb{B}$
Latex: $\mathbb{A} \backslash \mathbb{B}$
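
The set notation above can be sketched in plain Python, with no PyTorch needed. Finite sets map onto the built-in `set` type, and interval membership is a pair of comparisons. The names `set_A`, `set_B`, `in_closed`, and `in_half_open` are hypothetical helpers introduced only for this example.

```python
# Finite sets map directly onto Python's built-in set type.
set_A = {0, 1, 2, 3}
set_B = {2, 3, 4}

n = 5
integers_0_to_n = set(range(n + 1))  # {0, 1, ..., n}, both endpoints included

difference = set_A - set_B           # A \ B: elements of A that are not in B


def in_closed(x, lo, hi):
    """Return True if x lies in the closed interval [lo, hi]."""
    return lo <= x <= hi


def in_half_open(x, lo, hi):
    """Return True if x lies in the half-open interval (lo, hi]."""
    return lo < x <= hi


print(difference)
print(in_closed(0.0, 0.0, 1.0), in_half_open(0.0, 0.0, 1.0))
```

The left endpoint `0.0` belongs to $[0, 1]$ but not to $(0, 1]$, which is the only difference between the two interval checks.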

$\mathcal{G}$ - a graph
Latex: $\mathcal{G}$

$Pa_{\mathcal{G}}(\rm x_{i})$ - the parents of $\rm x_{i}$ in $\mathcal{G}$
Latex: $Pa_{\mathcal{G}}(\rm x_{i})$
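
One possible way to make the parent notation concrete is to store a directed graph as a list of (parent, child) edges. The graph below and the helper `parents` are made up for illustration; they are not taken from the book or from any library.

```python
# Hypothetical directed graph over variables x0..x3, stored as a list of
# (parent, child) edges. The structure is invented for this example.
edges = [("x0", "x2"), ("x1", "x2"), ("x2", "x3")]


def parents(node, edges):
    """Return the set of nodes that have an edge pointing into `node`."""
    return {src for src, dst in edges if dst == node}


print(parents("x2", edges))  # the two nodes with edges into x2
print(parents("x0", edges))  # x0 has no incoming edges
```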


Indexing

$a_i$ - element $i$ of vector $\boldsymbol a$ (in the code below, indexing starts at 0)
Latex: $a_i$

In [10]:
i = 1
a = torch.Tensor([1,2,3,4,5])
print(a[i])
2.0

$\boldsymbol a_{-i}$ - all elements of vector $\boldsymbol a$ except for element $i$
Latex: $\boldsymbol a_{-i}$

In [11]:
i = 2 # i.e., element 3
# filter by position, not by value, so duplicates of a[i] are kept
[b for k, b in enumerate(a) if k != i]
Out[11]:
[1.0, 2.0, 4.0, 5.0]

$A_{ij}$ - element $i,j$ of a matrix $\boldsymbol A$
Latex: $A_{ij}$

In [12]:
A = torch.randn((4,4))
i, j = 2,2
print(A, A[i][j])
-1.7500  1.1511 -0.5790  1.2555
-0.1847  0.1199 -2.0554 -0.1277
 0.7025  0.2841  0.4543  0.2571
 1.0044  0.3005 -0.3800 -0.9919
[torch.FloatTensor of size 4x4]
 0.454309344291687

$\boldsymbol A_{i,:}$ - row $i$ of matrix $\boldsymbol A$
Latex: $\boldsymbol A_{i,:}$

In [13]:
i = 2 # i.e., row 3
A[i,:]
Out[13]:
 0.7025
 0.2841
 0.4543
 0.2571
[torch.FloatTensor of size 4]

$\boldsymbol A_{:,i}$ - column $i$ of matrix $\boldsymbol A$
Latex: $\boldsymbol A_{:,i}$

In [14]:
i = 2 # i.e., column 3
A[:,i]
Out[14]:
-0.5790
-2.0554
 0.4543
-0.3800
[torch.FloatTensor of size 4]

$\mathsf A_{i,j,k}$ - element $(i,j,k)$ of a 3-D tensor $\mathsf A$
Latex: $\mathsf A_{i,j,k}$

In [15]:
i, j , k = 1,1,2 
A = torch.randn((2,2,3))
print(A)
print(A[i, j ,k])
(0 ,.,.) = 
 -0.5824  1.4592 -0.7219
  1.8931  2.7500 -0.9853

(1 ,.,.) = 
  0.5265 -0.5071 -0.8086
 -0.1803  0.2604 -0.3469
[torch.FloatTensor of size 2x2x3]

-0.3468764126300812

$\mathsf A_{:,:,i}$ - 2-D slice of a 3-D tensor
Latex: $\mathsf A_{:,:,i}$

In [16]:
i = 2 
A[:,:,i]
Out[16]:
-0.7219 -0.9853
-0.8086 -0.3469
[torch.FloatTensor of size 2x2]

Linear Algebra Operations

$\boldsymbol A^\top $ - transpose of matrix $\boldsymbol A$
Latex: $\boldsymbol A^\top $

In [17]:
A = torch.randn((3,2))
print(A)
print(A.t())
-0.5358  0.0759
 0.1394 -0.8398
-1.1586 -0.2389
[torch.FloatTensor of size 3x2]


-0.5358  0.1394 -1.1586
 0.0759 -0.8398 -0.2389
[torch.FloatTensor of size 2x3]

$\boldsymbol A^+$ - the Moore-Penrose pseudoinverse of matrix $\boldsymbol A$
Latex: $\boldsymbol A^+$
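
A minimal sketch of the pseudoinverse, assuming a recent PyTorch release that provides `torch.linalg.pinv` (the older version that produced the outputs in this notebook may not have it). A non-square matrix is used because it has no ordinary inverse, and the check at the end is one of the defining Penrose conditions, $\boldsymbol A \boldsymbol A^+ \boldsymbol A = \boldsymbol A$.

```python
import torch

torch.manual_seed(0)
A = torch.randn(3, 2)          # non-square, so A has no ordinary inverse
A_pinv = torch.linalg.pinv(A)  # Moore-Penrose pseudoinverse, size 2x3

print(A_pinv.shape)
# Penrose condition A A^+ A = A, which holds up to floating-point error.
print(torch.allclose(A @ A_pinv @ A, A, atol=1e-4))
```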

$\boldsymbol A^{-1}$ - the inverse matrix of the square matrix $\boldsymbol A$
Latex: $\boldsymbol A^{-1}$

In [18]:
A = torch.randn((2,2))
print(A)
print(torch.inverse(A))
-1.5843 -1.2536
-1.6495  1.2179
[torch.FloatTensor of size 2x2]


-0.3047 -0.3136
-0.4127  0.3963
[torch.FloatTensor of size 2x2]

$\boldsymbol A \odot \boldsymbol B$ - element-wise (Hadamard) product of $\boldsymbol A$ and $\boldsymbol B$
Latex: $\boldsymbol A \odot \boldsymbol B$

In [19]:
A = torch.randn((2,2))
B = torch.randn((2,2))
print(A.mul(B))
-0.0398 -0.7343
 0.5065 -0.2811
[torch.FloatTensor of size 2x2]

$\text{det}(\boldsymbol A)$ - determinant of $\boldsymbol A$
Latex: $\text{det}(\boldsymbol A)$
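
As with the other operations, the determinant can be sketched in a few lines with `torch.det`. A fixed matrix is used here instead of a random one so the result can be checked by hand; the variable name `d` is just a placeholder.

```python
import torch

# A fixed 2x2 matrix so the result can be verified by hand:
# det(A) = 1*4 - 2*3 = -2
A = torch.tensor([[1., 2.],
                  [3., 4.]])
d = torch.det(A)
print(d)
```

The printed value should be close to -2, possibly with a small floating-point error.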