Some of the most useful operators in linear algebra are norms. Informally, the norm of a vector tells us how big a vector is. The notion of size under consideration here concerns not dimensionality but rather the magnitude of the components.

In linear algebra, a vector norm is a function f that maps a vector to a scalar, satisfying a handful of properties.

The first property says that if we scale all the elements of a vector x by a constant factor α, its norm also scales by the absolute value of the same constant factor:

$$f(\alpha \mathbf{x}) = |\alpha| f(\mathbf{x}).$$

The second property is the familiar triangle inequality:

$$f(\mathbf{x} + \mathbf{y}) \leq f(\mathbf{x}) + f(\mathbf{y}).$$

The third property simply says that the norm must be non-negative:

$$f(\mathbf{x}) \geq 0.$$

That makes sense, as in most contexts the smallest size for anything is 0. The final property requires that the smallest norm is achieved, and only achieved, by a vector consisting of all zeros:

$$\forall i, [\mathbf{x}]_i = 0 \Leftrightarrow f(\mathbf{x}) = 0.$$
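The four properties can be spot-checked numerically. The following is a minimal sketch, assuming the Euclidean length (square root of the sum of squares) as one candidate for f. The helper `l2_norm` and the sample vectors are illustrative choices, not part of the original text, and a check on a few values is of course not a proof.

```python
import math

def l2_norm(x):
    """Euclidean length of a sequence of numbers (hypothetical helper)."""
    return math.sqrt(sum(v * v for v in x))

# Sample inputs chosen only for illustration.
x = [3.0, -4.0]
y = [1.0, 2.0]
alpha = -2.0

# Property 1: scaling x by alpha scales the norm by |alpha|.
scaled = [alpha * v for v in x]
homogeneous = math.isclose(l2_norm(scaled), abs(alpha) * l2_norm(x))

# Property 2: triangle inequality, f(x + y) <= f(x) + f(y).
summed = [a + b for a, b in zip(x, y)]
triangle = l2_norm(summed) <= l2_norm(x) + l2_norm(y)

# Property 3: the norm is never negative.
non_negative = l2_norm(x) >= 0

# Property 4: the all-zeros vector has norm 0, and a nonzero vector does not.
zero_only = l2_norm([0.0, 0.0]) == 0.0 and l2_norm(x) > 0

print(homogeneous, triangle, non_negative, zero_only)
```

All four flags should come out true for these inputs.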

You might notice that norms sound a lot like measures of distance. And if you remember Euclidean distances (think Pythagoras' theorem) from grade school, then the concepts of non-negativity and the triangle inequality might ring a bell.

In fact, the Euclidean distance is a norm: specifically, it is the L2 norm. Suppose that the elements of the n-dimensional vector x are $x_1, \ldots, x_n$. The L2 norm of x is the square root of the sum of the squares of the vector elements:

$$\|\mathbf{x}\|_2 = \sqrt{\sum_{i=1}^n x_i^2}.$$

In deep learning, we work more often with the squared L2 norm.
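As a small worked example, the L2 norm and its squared variant can be computed with the standard library alone. This is a sketch under the assumption that a vector is a plain Python list. The function names `l2_norm` and `squared_l2_norm` are hypothetical, chosen here for illustration.

```python
import math

def l2_norm(x):
    """Square root of the sum of squared elements."""
    return math.sqrt(sum(v * v for v in x))

def squared_l2_norm(x):
    """Sum of squared elements; skips the square root."""
    return sum(v * v for v in x)

u = [3.0, -4.0]
print(l2_norm(u))          # 5.0
print(squared_l2_norm(u))  # 25.0
```

The squared form avoids the square root, so it is simply a sum of squares of the elements.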

You will also frequently encounter the L1 norm, which is expressed as the sum of the absolute values of the vector elements:

$$\|\mathbf{x}\|_1 = \sum_{i=1}^n |x_i|.$$

Compared with the L2 norm, the L1 norm is less influenced by outliers. To calculate the L1 norm, we compose the absolute value function with a sum over the elements.
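The composition of absolute value and sum can be written directly. A minimal sketch, again assuming a vector is a plain Python list. The name `l1_norm` is a hypothetical helper, not an API from any library.

```python
def l1_norm(x):
    """Sum of the absolute values of the elements."""
    return sum(abs(v) for v in x)

u = [3.0, -4.0]
print(l1_norm(u))  # 7.0
```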

Both the L2 norm and the L1 norm are special cases of the more general Lp norm:

$$\|\mathbf{x}\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}.$$
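To see how the two earlier norms fall out of the general form, here is a sketch of an Lp norm that raises each absolute value to the power p, sums, and takes the p-th root. The function `lp_norm` and its argument check are assumptions made for this example. The restriction to p >= 1 reflects the standard condition under which this expression satisfies the triangle inequality.

```python
import math

def lp_norm(x, p):
    """General Lp norm of a list of numbers, for p >= 1 (hypothetical helper)."""
    if p < 1:
        raise ValueError("p must be at least 1 for this to be a norm")
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

u = [3.0, -4.0]

# p = 1 reduces to the sum of absolute values.
print(math.isclose(lp_norm(u, 1), 7.0))

# p = 2 reduces to the Euclidean length.
print(math.isclose(lp_norm(u, 2), 5.0))
```

Both comparisons should print `True`. They use `math.isclose` because the fractional power may introduce floating-point rounding.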

Reference: Dive into Deep Learning Release 0.16.2 (Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander J. Smola Mar 20,)