Interpolation of Sparse Pointwise and Integral Geographical Data over Time

Martin Papke*

*Universität Koblenz-Landau, Germany

Abstract

Geographical data, for example distribution densities of species as discussed in [1], are often given both as pointwise and as integral values. We aim to give two interpolation algorithms which can handle both data types together. As such data tend to change over time, we extend our algorithms to allow a time stamp to be attached to the data and to give newer data more weight in the interpolation. This allows us to model time-dependent functions and data. We compare both approaches and discuss their advantages and disadvantages. The possibility of giving both pointwise and integral data extends the basic ideas from [2] by allowing for integral data points.

Keywords: Interpolation, Applied Mathematics, Numerical Mathematics

Introduction


Let $T$ be a time set. Assume we have collected data about an unknown mapping $f$ that changes over time; $f_t$ is the mapping at time $t$. A collected datum $(d, t)$ indicates that $d$ was collected at time $t$. As time set we use $T \subseteq \mathbb{R}$, where $t^*$ denotes the current time stamp; all $t > t^*$ are time stamps of the future. Collected data $(d, t)$ always carry a time stamp $t \le t^*$.

For the following sections we use the time index $t$ as the last argument of the function $f$ and write $f(x, t)$ for $f_t(x)$. If $f$ does not change over time we have $f(x, t) = f(x, t')$ for all $t, t' \in T$. If the domain of $f$ does not involve the time set $T$, the function $f$ is regarded as static in time.

We propose two algorithms to handle sparse pointwise and integral data as they arise in geographical problems. The first algorithm uses a least squares idea; the second is based on convex combinations.

Setting and Notations


We consider a triangulation $\mathcal{T}$ of a region $\Omega \subseteq \mathbb{R}^2$. We denote the set of nodes of $\mathcal{T}$ by $\{x_1, \dots, x_n\}$. For each triangle $T \in \mathcal{T}$, we denote by $i_{T,1}, i_{T,2}, i_{T,3}$ the indices of $T$'s nodes, that is, the nodes of $T$ are $x_{i_{T,1}}$, $x_{i_{T,2}}$ and $x_{i_{T,3}}$. For each node $x_i$, we denote by $x_j$, $j \in N(i)$, the neighbours of $x_i$, that is, the nodes which are connected to $x_i$ by an edge of the triangulation.

We are given two types of data for an unknown function $f \colon \Omega \to \mathbb{R}$. Firstly, point values $p_i$, which are measured values of $f$ at nodes $x_i$. Secondly, we are given integrals $I_T$ of $f$ over triangles $T \in \mathcal{T}$.

Our aim is to construct a function $u$ which interpolates the given values in the sense that

$$u(x_i) = p_i \quad \text{for each point datum } p_i,$$

$$\int_T u \, \mathrm{d}A = I_T \quad \text{for each integral datum } I_T,$$

taking into account that our data are, for example, values of a density distribution of a species and hence determine $f$ only locally.

Example: Single Point Data Collection


For example, a camera is placed at a single point in a habitat and automatically takes snapshots of all animals that pass this point. The aggregated data provide pointwise values of the density distribution of a species. Including the time index $t$, the collected data provide pointwise values of the density distribution for every month.

Combining Different Data Collections


Two different data collections at the same time $t$ for a given population density can indicate different population estimates, which can arise for example when using the capture-recapture method for estimating populations, as done in [3] and [4].

The least squares approach


The first algorithm we propose to tackle this kind of problem is a least squares approach, which allows for more than one data point at a fixed site of our lattice.

As geographical data tend to be time dependent, we want to add a time stamp to each data point and consider a time-dependent function $u(x, t)$. We therefore replace the above equations by

$$u(x_i, t) = p_{i,t} \quad \text{for each point datum } p_{i,t} \text{ with time stamp } t,$$

$$\int_T u(\cdot, t) \, \mathrm{d}A = I_{T,t} \quad \text{for each integral datum } I_{T,t} \text{ with time stamp } t.$$

At a given time $t^*$ we will only consider values with a time stamp $t \le t^*$, which allows live data to be taken into account. As the importance of measurements decays over time, we attach to each equation a weight, given for a data point with time stamp $t$ by $w(t^* - t)$, denoting by $w$ the weighting function $w(s) = e^{-\alpha s}$, where $\alpha > 0$ denotes the speed of decay. $\alpha$ has to be chosen in a way that adapts to the given problem; a possible choice could be an estimate for the time derivative of the model function $f$, because $f$ having strong fluctuations in time means that the importance of past values decays more rapidly.
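A minimal sketch of such a weighting, in Python. The exponential form and the parameter name `alpha` follow the description above; the concrete choice of decay function is left open by the text, so this is one possible instance, not a prescribed one:

```python
import numpy as np

# One possible decay weighting w(s) = exp(-alpha * s); the parameter
# `alpha` is the "speed of decay" from the text.
def weight(t_star, t, alpha=1.0):
    """Weight of a measurement with time stamp t, evaluated at time t_star."""
    s = np.asarray(t_star - t, dtype=float)
    # Future measurements (s < 0) carry no information about the present
    # and receive weight zero.
    return np.where(s >= 0.0, np.exp(-alpha * np.maximum(s, 0.0)), 0.0)
```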

Constructing a linear least squares system


For a given time $t^*$ we construct an overdetermined linear system of equations for the node values $u_i = u(x_i, t^*)$ of our function $u$, which we solve in a weighted least squares sense. That is, we transform both point and integral data into equations for the values of $u$ at the nodes $x_i$: each datum gives rise to one linear equation. Another set of equations models the smoothness of our function $u$.

Equations for the point data


For each given point datum $p_i$ with time stamp $t \le t^*$ (future measurements are not considered to give information for the current time) we add the equation

$$u_i = p_i \quad \text{with weight } w(t^* - t).$$

The weight is chosen such that it has its maximal value when $t = t^*$ and decays as the measurement ages.

Equations for the volume data


To give an equation for a given volume datum we first have to approximate the integral by point values of the function $u$. We use the two-dimensional variant of the trapezoidal rule, that is, we approximate the integral of a function $f$ over a triangle $T$ by the volume of the trapezoidal body determined by the values of $f$ at the nodes of $T$, hence

$$\int_T f \, \mathrm{d}A \approx \frac{|T|}{3} \left( f(x_{i_{T,1}}) + f(x_{i_{T,2}}) + f(x_{i_{T,3}}) \right),$$

where $|T|$ denotes the area of the triangle.
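In code, this quadrature rule is a few lines; the following sketch assumes nodes are stored as a coordinate array and triangles as index triples (the names `nodes`, `tri`, `values` are illustrative):

```python
import numpy as np

def triangle_area(p1, p2, p3):
    # Shoelace formula for the area of a triangle given its three vertices.
    return 0.5 * abs((p2[0] - p1[0]) * (p3[1] - p1[1])
                     - (p3[0] - p1[0]) * (p2[1] - p1[1]))

def trapezoidal_integral(nodes, tri, values):
    """Approximate the integral of f over one triangle.

    nodes  : (n, 2) array of node coordinates
    tri    : index triple (i, j, k) of the triangle's vertices
    values : (n,) array of nodal values of f
    """
    i, j, k = tri
    area = triangle_area(nodes[i], nodes[j], nodes[k])
    # |T|/3 * (f_i + f_j + f_k), as in the formula above.
    return area * (values[i] + values[j] + values[k]) / 3.0
```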

Hence, we get the equation

$$\frac{|T|}{3} \left( u_{i_{T,1}} + u_{i_{T,2}} + u_{i_{T,3}} \right) = I_T \quad \text{with weight } w(t^* - t),$$

where the weight is chosen exactly as in the point value case.

Smoothness equations


If we have only a small set of data points, our system will be underdetermined and the calculated function may have strong fluctuations. To prevent that, we add for each node $x_i$ the equation

$$u_i = \frac{1}{|N(i)|} \sum_{j \in N(i)} u_j \quad \text{with weight } w_s,$$

stating that we want the value at each node to be approximately the mean of its neighbouring values; the smoothness weight $w_s$ controls how strongly this regularization acts.

The resulting linear system $Au = b$ is solved in a weighted least squares sense, giving us point values for $u$ which we interpolate by piecewise linear functions on the triangulation.

Implementation


We represent the triangulation by a structure consisting of the nodes, given by their coordinates, and the triangles, given by the indices of their vertices. From that we calculate the neighbours of each node and store them in the lattice structure as well. Given the data, we assemble for each time $t^*$ a matrix $A$ and a right-hand side $b$ collecting the equations.
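The following is a minimal sketch of this assembly, assuming the exponential weight from above; the data-structure names (`point_data`, `volume_data`, `neighbours`, `smooth_weight`) are illustrative, not taken from the paper's code:

```python
import numpy as np

def assemble_system(nodes, triangles, neighbours,
                    point_data, volume_data,
                    t_star, alpha=1.0, smooth_weight=0.1):
    """Assemble the weighted overdetermined system A u = b at time t_star.

    nodes       : (n, 2) array of node coordinates
    triangles   : list of index triples (i, j, k)
    neighbours  : neighbours[i] is the list of node indices adjacent to node i
    point_data  : list of (node_index, measured_value, time_stamp)
    volume_data : list of (triangle_index, integral_value, time_stamp)
    """
    def area(i, j, k):
        # Shoelace formula for the area of triangle (i, j, k).
        (x1, y1), (x2, y2), (x3, y3) = nodes[i], nodes[j], nodes[k]
        return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

    def w(t):
        # Exponential decay weight; future data are skipped in the loops below.
        return np.exp(-alpha * (t_star - t))

    n = len(nodes)
    rows, rhs = [], []

    # Point data: one equation u_i = p_i per datum, scaled by its weight.
    for i, p, t in point_data:
        if t > t_star:
            continue
        row = np.zeros(n)
        row[i] = 1.0
        rows.append(w(t) * row)
        rhs.append(w(t) * p)

    # Volume data: trapezoidal rule |T|/3 * (u_i + u_j + u_k) = I_T.
    for tri_index, integral, t in volume_data:
        if t > t_star:
            continue
        i, j, k = triangles[tri_index]
        row = np.zeros(n)
        row[[i, j, k]] = area(i, j, k) / 3.0
        rows.append(w(t) * row)
        rhs.append(w(t) * integral)

    # Smoothness: u_i minus the mean of its neighbours should vanish.
    for i in range(n):
        row = np.zeros(n)
        row[i] = 1.0
        for j in neighbours[i]:
            row[j] -= 1.0 / len(neighbours[i])
        rows.append(smooth_weight * row)
        rhs.append(0.0)

    return np.vstack(rows), np.asarray(rhs)

# Since the weights are folded into the rows and the right-hand side,
# an ordinary least squares solve yields the weighted solution:
# A, b = assemble_system(...)
# u, *_ = np.linalg.lstsq(A, b, rcond=None)
```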

The convex combination approach


As a second possibility to attack our problem, we propose an algorithm that handles point and integral data as distinct interpolation problems and afterwards takes a convex combination of the two solutions to obtain a solution to the full problem.

The pointwise interpolation


At each given point we first combine all data we have at this point into one value by taking a convex combination of the measured values, where each value is weighted with $w(t^* - t)$ as in the first algorithm. That is, for a node $x_i$ in our lattice we look at all values $p_{i,k}$ given for that point with time stamps $t_k \le t^*$ and form the weighted average

$$\bar{p}_i = \frac{\sum_k w(t^* - t_k) \, p_{i,k}}{\sum_k w(t^* - t_k)}$$

of the values measured at this particular point.
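A short sketch of this combination step, again assuming the exponential weight from the least squares section (all names are illustrative):

```python
import numpy as np

def combined_point_value(values, stamps, t_star, alpha=1.0):
    """Combine repeated measurements at one node into a single value."""
    values = np.asarray(values, dtype=float)
    stamps = np.asarray(stamps, dtype=float)
    keep = stamps <= t_star                      # ignore future measurements
    w = np.exp(-alpha * (t_star - stamps[keep]))
    # Normalized weights make this a convex combination of the values.
    return np.sum(w * values[keep]) / np.sum(w)
```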

Interpolation of the integral values


In the same way as we interpolated the point values, we now interpolate the given integral values. Hence, we first assign to each triangle of our lattice a unique integral value by weighting the given ones. Afterwards, we interpolate these values linearly, assigning the value divided by the area of the particular cell to the center of gravity of each triangle. That is, we approximate the integral by means of the midpoint rule, i.e.

$$\int_T f \, \mathrm{d}A \approx |T| \, f(c_T),$$

where $c_T$ denotes $T$'s center of gravity.

The value that we want the integral to have is calculated exactly as in the point case, that is, we assign

$$\bar{I}_T = \frac{\sum_k w(t^* - t_k) \, I_{T,k}}{\sum_k w(t^* - t_k)}.$$
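A sketch of the resulting sampling step: each combined triangle integral is converted into a point value at the triangle's center of gravity via the midpoint rule. The names `nodes`, `triangles`, `integrals` are illustrative:

```python
import numpy as np

def centroid_samples(nodes, triangles, integrals):
    """Turn triangle integrals into point samples at the centroids.

    nodes     : (n, 2) array of node coordinates
    triangles : list of index triples (i, j, k)
    integrals : combined integral value for each triangle
    """
    points, values = [], []
    for (i, j, k), I in zip(triangles, integrals):
        centroid = (nodes[i] + nodes[j] + nodes[k]) / 3.0
        # Shoelace formula for the triangle's area |T|.
        (x1, y1), (x2, y2), (x3, y3) = nodes[i], nodes[j], nodes[k]
        area = 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
        points.append(centroid)
        values.append(I / area)   # midpoint rule gives f(c_T) ≈ I_T / |T|
    return np.array(points), np.array(values)
```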

Obtaining the solution to the full problem


Let $u_p$ denote the interpolating function obtained in step one and $u_I$ denote the interpolation of the integral values. We now let

$$u = \frac{n_p}{n_p + n_I} \, u_p + \frac{n_I}{n_p + n_I} \, u_I,$$

where $n_p$ and $n_I$ are the numbers of given point and integral values, respectively.
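As a sketch, the final blend is a one-liner; the count-proportional weights are our reading of the text, and `u_p`, `u_I` are assumed to be callables returned by the two interpolation steps:

```python
def blend(u_p, u_I, n_p, n_I):
    """Convex combination of the two interpolants, weighted by data counts."""
    lam = n_p / (n_p + n_I)
    return lambda x: lam * u_p(x) + (1.0 - lam) * u_I(x)
```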

Another idea that did not work out


As a further possibility to attack our problem, we tried an algorithm that starts by interpolating the point data and then matches the integral data by a refinement of the lattice: the value at the center of gravity of each cell is chosen so as to match the integral datum. The pointwise interpolation and the interpolation of the integral values are carried out exactly as in the convex combination approach above.


Full solution


We then chose a function $u$ that interpolates both the point and the integral values. This did not lead to a good result, because the fluctuation of the function $u$ was so large that it could not be regarded as an approximation of a smooth function.

Examples


As an example we generate, for a simple triangulation of the region $\Omega$, 4000 random point data and 1000 random volume data over 10 seconds, and interpolate the data at a fixed time step. As random data we start with a simple function whose graph is shifted diagonally in time, and we add some normally distributed noise. The points and time values to interpolate are also generated randomly.
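A sketch of such a data generator follows. The concrete model function and the noise level `SIGMA` are placeholders: the text only specifies a simple function whose graph shifts diagonally in time, plus normally distributed noise.

```python
import numpy as np

rng = np.random.default_rng(42)

def model(x, y, t):
    # A placeholder g(x - t, y - t): the graph of g shifts diagonally
    # as t advances; any smooth g would serve the same purpose.
    return np.exp(-((x - t) % 1.0 - 0.5) ** 2 - ((y - t) % 1.0 - 0.5) ** 2)

N_POINT, N_VOLUME, T_END, SIGMA = 4000, 1000, 10.0, 0.05

t = rng.uniform(0.0, T_END, N_POINT)                  # random time stamps
x, y = rng.uniform(0.0, 1.0, (2, N_POINT))            # random sample sites
p = model(x, y, t) + rng.normal(0.0, SIGMA, N_POINT)  # noisy point data
```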

Conclusion


Extending the ideas from [2], we allow for integral values to be given. Representing the data points as a weighted least squares system would allow us to use an iterative least squares solver to reuse the calculations we have already done, for example the solver LSQR discussed in [5].


References

  1. Franklin, Janet (1995). "Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients". Progress in Physical Geography: Earth and Environment 19 (4): 474–499. doi:10.1177/030913339501900403.
  2. Li, Lixin; Revesz, Peter (2004). "Interpolation methods for spatio-temporal geographic data". Computers, Environment and Urban Systems 28 (3): 201–227. doi:10.1016/S0198-9715(03)00018-8. ISSN 0198-9715.
  3. Schouten, Leo J.; Straatman, Huub; Kiemeney, Lambertus A. L. M.; Gimbrère, Charles H. F.; Verbeek, André L. M. (1994). "The Capture-Recapture Method for Estimation of Cancer Registry Completeness: A Useful Tool?". International Journal of Epidemiology 23 (6): 1111–1116. doi:10.1093/ije/23.6.1111. ISSN 0300-5771.
  4. Brenner, Hermann (1995). "Use and limitations of the capture-recapture method in disease monitoring with two dependent sources". Epidemiology 6 (1): 42–48. doi:10.1097/00001648-199501000-00009. ISSN 1044-3983.
  5. Paige, Christopher C.; Saunders, Michael A. (1982). "LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares". ACM Transactions on Mathematical Software 8 (1): 43–71. doi:10.1145/355984.355989. ISSN 0098-3500.