Moving Average/Probability Distribution

Probability Distribution as Distribution of Importance


The definition of expected value provides the mathematical foundation for moving averages in both the discrete and the continuous setting, and the mathematical theory is an application of basic principles of probability theory. Nevertheless, the notion of probability is a bit misleading, because the semantics of a moving average do not refer to the probability of events. The probability must instead be regarded as a distribution of importance. In a time series, for example, less importance is assigned to older data, which does not mean that older data is less likely than recent data. In general, the events that generate the collected data are not considered from a probabilistic perspective.

Moving averages can define importance by:

  • proximity in time (old and recent data)
  • proximity in space (see application of the moving average on images above)

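To quantify this proximity, a metric or norm on the underlying vector space $V$ can be used. A greater distance to a reference point $x_o \in V$ leads to less importance, e.g. by

$$w(x) := \frac{1}{1 + \|x - x_o\|}$$

The weight for the importance is 1 for $x = x_o$. With increasing distance, measured by the norm $\|x - x_o\|$, the weight decreases towards 0. Standardization with $s := \sum_{x \in T} w(x)$ as the sum of all weights for discrete moving averages (as mentioned for EMA), where $T$ is the finite set of points with collected data and $p(x) := \frac{w(x)}{s}$, leads to the property of probability distributions:

$$\sum_{x \in T} p(x) = \frac{1}{s} \sum_{x \in T} w(x) = 1$$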
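
The distance-based weighting and its standardization can be illustrated with a minimal Python sketch. The function names `raw_weight` and `importance_distribution`, the sample points, and the choice of the Euclidean norm with the weight $1/(1+\text{distance})$ are assumptions made for illustration only; any weight that is 1 at the reference point and decreases towards 0 would serve the same purpose.

```python
import math


def raw_weight(point, reference):
    """Unnormalized importance: 1 at the reference point, tending to 0 far away.

    Assumes the Euclidean norm and the weight 1 / (1 + distance).
    """
    return 1.0 / (1.0 + math.dist(point, reference))


def importance_distribution(points, reference):
    """Divide each raw weight by the sum s of all raw weights."""
    raw = [raw_weight(x, reference) for x in points]
    s = sum(raw)  # standardization constant
    return [w / s for w in raw]


# Hypothetical sample points in the plane, at distances 0, 5 and 10
# from the reference point.
reference = (0.0, 0.0)
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

weights = importance_distribution(points, reference)
for x, p in zip(points, weights):
    print(x, round(p, 4))
print("sum of weights:", round(sum(weights), 12))
```

After standardization the weights are nonnegative and sum to 1, so they formally satisfy the properties of a probability mass function even though they express importance rather than likelihood.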


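Furthermore, there are other moving averages that incorporate negative weights. This leads to the fact that

$p(x) < 0$ for some $x$, so the weights no longer form a probability mass function. This can happen when the positive or negative impact $I(x)$ of the collected data $D(x)$ is folded into the weight and thereby into the probability mass function. Assigning impact factors of the collected data to the probability/importance values mixes two different properties. This should be avoided, and the impact $I(x)$ on $D(x)$ should be kept separate for a transparent definition of the moving average over the set $T$ of points with collected data, i.e.
$$MA = \sum_{x \in T} p(x) \cdot I(x) \cdot D(x)$$ with $p(x) \geq 0$ and $\sum_{x \in T} p(x) = 1$.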
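
The separation of impact and importance can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `moving_average_with_impact`, the sample values, and the use of signs $+1$ and $-1$ as impact factors are hypothetical and not taken from the text above.

```python
def moving_average_with_impact(data, impact, weights, tol=1e-9):
    """Weighted average with impact factors kept apart from the weights.

    The weights must be a valid importance distribution (nonnegative,
    summing to 1); any sign is carried by the impact factors instead.
    """
    if not (len(data) == len(impact) == len(weights)):
        raise ValueError("data, impact and weights must have equal length")
    if any(p < 0 for p in weights):
        raise ValueError("weights must be nonnegative")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    return sum(p * i * d for p, i, d in zip(weights, impact, data))


# Hypothetical values: the second observation has a negative impact.
data = [10.0, 20.0, 30.0]
impact = [1.0, -1.0, 1.0]
weights = [0.2, 0.3, 0.5]

result = moving_average_with_impact(data, impact, weights)
print(result)
```

Folding the sign into the weights instead (0.2, -0.3, 0.5) would give the same numerical result, but those weights sum to 0.4 and include a negative value, so they could no longer be read as a distribution of importance.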