Multidimensional Tiling

Multidimensional tiling is the mechanism used to partition the domain of temporal values in buckets or tiles of varying number of dimensions. In the case of a single dimension, the domain can be partitioned by value or by time using buckets of the same size or duration, respectively. For temporal numbers, the domain can be partitioned in two-dimensional tiles of the same size for the value dimension and the same duration for the time dimension. For temporal points, the domain can be partitioned in space in two- or three-dimensional tiles, depending on the number of dimensions of the spatial coordinates. Finally, for temporal points, the domain can be partitioned in space and time using three- or four-dimensional tiles. Furthermore, the temporal values can also be fragmented according to a multidimensional grid defined over the underlying domain.

Multidimensional tiling can be used for various purposes. For example, it can be used for computing multidimensional histograms, where the temporal values are aggregated according to the underlying partition of the domain. On the other hand, multidimensional tiling can be used for distributing a dataset across a cluster of servers, where each server contains a partition of the dataset. The advantage of this partition mechanism is that it preserves proximitity in space and time, unlike the traditional hash-based partition mechanisms used in big data environments.

Figure 9.4. Multidimensional tiling for temporal floats.


Figure 9.4, “Multidimensional tiling for temporal floats.” illustrates multidimensional tiling for temporal floats. The two-dimensional domain is split into tiles having the same size for the value dimension and the same duration for the time dimension. Suppose that this tiling scheme is used for distribute a dataset across a cluster of six servers, as suggested by the gray pattern in the figure. In this case, the values are fragmented so each server will receive the data of contiguous tiles. This implies in particular that four nodes will receive one fragment of the temporal float depicted in the figure. One advantage of this distribution of data based on multidimensional tiling is that it reduces the data that needs to be exchanged between nodes when processing queries, a process typically referred to as reshuffling.

Many of the functions in this section are set-returning functions (also known as a table functions) since they typically return more than one value. In this case, the functions are marked with the symbol.

Bucket Functions

  • Return a set of buckets that cover the span with buckets of the same size or duration aligned with the origin

    bucketList(bounds unitspan,size number,origin number=0) → {(index,span)}

    bucketList(bounds tstzspan,duration interval,origin timestamptz='2000-01-03') →

    {(index,span)}

    The result is a set of pairs (index,span). If the origin is not specified, it is set by default to 0 for number spans and Monday, January 3, 2000 for periods. The indices start by 1.

    SELECT (bl).index, (bl).span
    FROM (SELECT bucketList(intspan '[15, 25]', 2) AS bl) t;
    -- 1 | [14,16)
    -- 2 | [16,18)
    -- 3 | [18,20)
    -- ...
    SELECT bucketList(floatspan '[-10, -1]', 2.5, -7);
    -- (1,"[-12, -9.5)")
    -- (2,"[-9.5, -7)")
    -- (3,"[-7, -4.5)")
    -- ...
    SELECT (bl).index, (bl).span
    FROM (SELECT bucketList(tstzspan '[2001-01-15, 2001-01-25]','2 days') AS bl) t;
    -- 1 | [2001-01-15, 2001-01-17)
    -- 2 | [2001-01-17, 2001-01-19)
    -- 3 | [2001-01-19, 2001-01-21)
    -- ...
    SELECT bucketList(tstzspan '[2001-01-15, 2001-01-25]', '2 days', '2001-01-02');
    -- (1,"[2001-01-14, 2001-01-16)")
    -- (2,"[2001-01-16, 2001-01-18)")
    -- (3,"[2001-01-18, 2001-01-20)")
    -- ...
    
  • Return the start value of the bucket that contains the input number

    valueBucket(value number,size number,origin number=0) → number

    If the origin is not specified, it is set by default to 0.

    SELECT valueBucket(3, 2);
    -- 2
    SELECT valueBucket(3.5, 2.5, 1.5);
    -- 1.5
    
  • Return the span in the bucket space that contains the input number

    spanBucket(value number,size number,origin number=0) → span

    If the origin is not specified, it is set by default to 0.

    SELECT spanBucket(2, 2);
    -- [2, 4)
    SELECT spanBucket(2, 2, 1);
    -- [1, 3)
    SELECT spanBucket(2, 2.5);
    -- [0, 2.5)
    SELECT spanBucket(2, 2.5, 1.5);
    -- [1.5, 4)
    
  • Return the start timestamp of the bucket that contains the input timestamp

    timeBucket(time timestamptz,duration interval,origin timestamptz='2000-01-03') →

    timestamptz

    If the origin is not specified, it is set by default to Monday, January 3, 2000.

    SELECT timeBucket(timestamptz '2001-05-01', interval '2 days');
    -- 2001-04-29 01:00:00
    SELECT timeBucket(timestamptz '2001-05-01', interval '2 days', timestamptz '2001-01-01');
    -- 2001-04-30 01:00:00
    
  • Return the period in the bucket space that contains the input timestamp

    periodBucket(time timestamptz,duration interval,origin timestamptz='2000-01-03') →

    tstzspan

    If the origin is not specified, it is set by default to Monday, January 3, 2000.

    SELECT periodBucket('2001-01-04', interval '1 week');
    -- [2001-01-03, 2001-01-10)
    SELECT periodBucket('2001-01-04', interval '1 week', '2001-01-07');
    -- [2000-12-31, 2001-01-07)
    

Tile Functions

  • Return a set of tiles that covers the box with multidimensional tiles of the same size and duration

    tileList(bounds tbox,size float,duration interval,vorigin float=0,

    torigin timestamptz='2000-01-03') → {(index,tile)}

    tileList(bounds stbox,xsize float,[ysize float,zsize float,]

    sorigin geompoint='Point(0 0 0)') → {(index,tile)}

    tileList(bounds stbox,xsize float,[ysize float,zsize float,]duration interval,

    sorigin geompoint='Point(0 0 0)',torigin timestamptz='2000-01-03') → {(index,tile)}

    The result is a set of pairs (index,tile). If the origin of the value and/or time dimensions are not specified, they are set by default to 0 or 'Point(0 0 0)' for the value dimension (depending on the box type) and to Monday, January 3, 2000 for the time dimension.

    In the case of a spatiotemporal grid, ysize and zsize are optional, the size for the missing dimensions is assumed to be equal to xsize. The SRID of the tile coordinates is determined by the input box and the sizes are given in the units of the SRID. If the origin for the spatial coordinates is given, which must be a point, its dimensionality and SRID should be equal to the one of box, otherwise an error is raised.

    SELECT (gr).index, (gr).tile
    FROM (SELECT tileList(tfloat '[15@2001-01-15, 25@2001-01-25]'::tbox, 2.0, '2 days')
      AS gr) t;
    -- 1 | TBOX XT([14,16),[2001-01-15,2001-01-17))
    -- 2 | TBOX XT([16,18),[2001-01-15,2001-01-17))
    -- 3 | TBOX XT([18,20),[2001-01-15,2001-01-17))
    -- ...
    SELECT tileList(tfloat '[15@2001-01-15,25@2001-01-25]'::tbox, 2.0, '2 days', 11.5);
    -- (1,"TBOX XT([13.5,15.5),[2001-01-15,2001-01-17))")
    -- (2,"TBOX XT([15.5,17.5),[2001-01-15,2001-01-17))")
    -- (3,"TBOX XT([17.5,19.5),[2001-01-15,2001-01-17))")
    -- ...
    SELECT tileList(tgeompoint '[Point(3 3)@2001-01-15,
      Point(15 15)@2001-01-25]'::stbox, 2.0);
    -- (1,"STBOX X((2,2),(4,4))")
    -- (2,"STBOX X((4,2),(6,4))")
    -- (3,"STBOX X((6,2),(8,4))")
    -- ...
    SELECT tileList(tgeompoint 'SRID=3812;[Point(3 3)@2001-01-15,
      Point(15 15)@2001-01-25]'::stbox, 2.0, geometry 'Point(3 3)');
    -- (1,"SRID=3812;STBOX X((3,3),(5,5))")
    -- (2,"SRID=3812;STBOX X((5,3),(7,5))")
    -- (3,"SRID=3812;STBOX X((7,3),(9,5))")
    -- ...
    SELECT tileList(tgeompoint '[Point(3 3 3)@2001-01-15,
      Point(15 15 15)@2001-01-25]'::stbox, 2.0, geometry 'Point(3 3 3)');
    -- (1,"STBOX Z((3,3,3),(5,5,5))")
    -- (2,"STBOX Z((5,3,3),(7,5,5))")
    -- (3,"STBOX Z((7,3,3),(9,5,5))")
    -- ...
    SELECT tileList(tgeompoint '[Point(3 3)@2001-01-15,
      Point(15 15)@2001-01-25]'::stbox, 2.0, interval '2 days');
    -- (1,"STBOX XT(((2,2),(4,4)),[2001-01-15,2001-01-17))")
    -- (2,"STBOX XT(((4,2),(6,4)),[2001-01-15,2001-01-17))")
    -- (3,"STBOX XT(((6,2),(8,4)),[2001-01-15,2001-01-17))")
    -- ...
    SELECT tileList(tgeompoint '[Point(3 3 3)@2001-01-15,
      Point(15 15 15)@2001-01-25]'::stbox, 2.0, interval '2 days',
      'Point(3 3 3)', '2001-01-15');
    -- (1,"STBOX ZT(((3,3,3),(5,5,5)),[2001-01-15,2001-01-17))")
    -- (2,"STBOX ZT(((5,3,3),(7,5,5))),[2001-01-15,2001-01-17)")
    -- (3,"STBOX ZT(((7,3,3),(9,5,5))),[2001-01-15,2001-01-17)")
    -- ...
    
  • Return the tile of the multidimensional grid that contains the value and the timestamp

    tile(value float,time timestamptz,size float,duration interval,

    vorigin float=0.0,torigin timestamptz='2000-01-03') → tbox

    tile(point geometry,xsize float,[ysize float,zsize float],

    sorigin geompoint='Point(0 0 0)') → stbox

    tile(point geometry,time timestamptz,xsize float,[ysize float,zsize float,]duration

    interval,sorigin geompoint='Point(0 0 0)',torigin timestamptz='2000-01-03') → stbox

    If the origin of the value and/or time dimensions are not specified, they are set by default to 0 or Point(0 0 0) for the value dimension and Monday, January 3, 2000 for the time dimension, respectively.

    In the case of a spatiotemporal grid, ysize and zsize are optional, the size for the missing dimensions is assumed to be equal to xsize. The SRID of the tile coordinates is determined by the input point and the sizes are given in the units of the SRID. If the origin for the spatial coordinates is given, which must be a point, its dimensionality and SRID should be equal to the one of box, otherwise an error is raised.

    SELECT tile(15, '2001-01-15', 2, interval '2 days');
    -- TBOX XT([14,16),[2001-01-15,2001-01-17))
    SELECT tile(15, '2001-01-15', 2, interval '2 days', 1, '2001-01-02');
    -- TBOX XT([15,17),[2001-01-14,2001-01-16))
    SELECT tile(geometry 'Point(1 1 1)', 2.0);
    -- STBOX Z((0,0,0),(2,2,2))
    SELECT tile(geometry 'Point(1 1)', '2001-01-01', 2.0, interval '2 days');
    -- STBOX XT(((0,0),(2,2)),[2001-01-01, 2001-01-03))
    SELECT tile(geometry 'Point(1 1)', '2001-01-01', 2.0, interval '2 days',
      'Point(1 1)', '2001-01-02');
    -- STBOX XT(((1,1),(3,3)),[2000-12-31, 2001-01-02))
    

Split Functions

These functions fragment a temporal value with respect to a sequence of buckets (see the section called “Bucket Functions”) or tiles (see the section called “Tile Functions”).

  • Fragment the temporal number with respect to span buckets

    valueSplit(tnumber,size number,origin number=0) → {(number,tnumber)}

    The result is a set of pairs (number,tnumber). If the origin of values is not specified, it is set by default to 0.

    SELECT (sp).number, (sp).tnumber
    FROM (SELECT valueSplit(tint '[1@2001-01-01, 2@2001-01-02, 5@2001-01-05, 10@2001-01-10]',
      2) AS sp) t;
    --  0 | {[1@2001-01-01 00:00:00+01, 1@2001-01-02 00:00:00+01)}
    --  2 | {[2@2001-01-02 00:00:00+01, 2@2001-01-05 00:00:00+01)}
    --  4 | {[5@2001-01-05 00:00:00+01, 5@2001-01-10 00:00:00+01)}
    -- 10 | {[10@2001-01-10 00:00:00+01]}
    SELECT valueSplit(tfloat '[1@2001-01-01, 10@2001-01-10)', 2.0, 1.0);
    -- (1,"{[1@2001-01-01 00:00:00+01, 3@2001-01-03 00:00:00+01)}")
    -- (3,"{[3@2001-01-03 00:00:00+01, 5@2001-01-05 00:00:00+01)}")
    -- (5,"{[5@2001-01-05 00:00:00+01, 7@2001-01-07 00:00:00+01)}")
    -- (7,"{[7@2001-01-07 00:00:00+01, 9@2001-01-09 00:00:00+01)}")
    -- (9,"{[9@2001-01-09 00:00:00+01, 10@2001-01-10 00:00:00+01)}")
    
  • Fragment the temporal value with respect to time buckets

    timeSplit(ttype,duration interval,origin timestamptz='2000-01-03') →

    {(time,temp)}

    The result is a set of pairs (time,temp). If the origin of time is not specified, it is set by default to Monday, January 3, 2000.

    SELECT (ts).time, (ts).temp
    FROM (SELECT timeSplit(tfloat '[1@2001-02-01, 10@2001-02-10)', '2 days') AS ts) t;
    -- 2001-01-31 | [1@2001-02-01, 2@2001-02-02)
    -- 2001-02-02 | [2@2001-02-02, 4@2001-02-04)
    -- 2001-02-04 | [4@2001-02-04, 6@2001-02-06)
    -- ...
    SELECT (ts).time, astext((ts).temp) AS temp
    FROM (SELECT timeSplit(tgeompoint '[Point(1 1)@2001-02-01, Point(10 10)@2001-02-10]',
      '2 days', '2001-02-01') AS ts) AS t;
    -- 2001-02-01 | [POINT(1 1)@2001-02-01, POINT(3 3)@2001-02-03)
    -- 2001-02-03 | [POINT(3 3)@2001-02-03, POINT(5 5)@2001-02-05)
    -- 2001-02-05 | [POINT(5 5)@2001-02-05, POINT(7 7)@2001-02-07)
    -- ...
    

    Notice that we can fragment a temporal value in cyclic (instead of linear) time buckets. The following two examples show how to fragment a temporal value by hour and by day of the week.

    SELECT (ts).time::time AS hour, merge((ts).temp) AS temp
    FROM (SELECT timeSplit(tfloat '[1@2001-01-01, 10@2001-01-03]', '1 hour') AS ts) t
    GROUP BY hour ORDER BY hour;
    /* 00:00:00 | {[1@2001-01-01 00:00:00+01, 1.1875@2001-01-01 01:00:00+01),
                   [5.5@2001-01-02 00:00:00+01, 5.6875@2001-01-02 01:00:00+01)} */
    /* 01:00:00 | {[1.1875@2001-01-01 01:00:00+01, 1.375@2001-01-01 02:00:00+01),
                   [5.6875@2001-01-02 01:00:00+01, 5.875@2001-01-02 02:00:00+01)} */
    /* 02:00:00 | {[1.375@2001-01-01 02:00:00+01, 1.5625@2001-01-01 03:00:00+01),
                   [5.875@2001-01-02 02:00:00+01, 6.0625@2001-01-02 03:00:00+01)} */
    /* 03:00:00 | {[1.5625@2001-01-01 03:00:00+01, 1.75@2001-01-01 04:00:00+01),
                   [6.0625@2001-01-02 03:00:00+01, 6.25@2001-01-02 04:00:00+01)} */
    /* ... */
    SELECT EXTRACT(DOW FROM (ts).time) AS dow_no, TO_CHAR((ts).time, 'Dy') AS dow,
      asText(round(merge((ts).temp), 2)) AS temp
    FROM (SELECT timeSplit(tgeompoint '[Point(1 1)@2001-01-01, Point(10 10)@2001-01-14)',
      '1 hour') AS ts) t
    GROUP BY dow, dow_no ORDER BY dow_no;
    /* 0 | Sun | {[POINT(1 1)@2001-01-01, POINT(1.69 1.69)@2001-01-02),
                 [POINT(5.85 5.85)@2001-01-08, POINT(6.54 6.54)@2001-01-09)} */
    /* 1 | Mon | {[POINT(1.69 1.69)@2001-01-02, POINT(2.38 2.38)@2001-01-03),
                 [POINT(6.54 6.54)@2001-01-09, POINT(7.23 7.23)@2001-01-10)} */
    /* 2 | Tue | {[POINT(2.38 2.38)@2001-01-03, POINT(3.08 3.08)@2001-01-04),
                 [POINT(7.23 7.23)@2001-01-10, POINT(7.92 7.92)@2001-01-11)} */
    /* ... */
    
  • Fragment the temporal number with respect to the tiles in a value-time grid

    valueTimeSplit(tumber,size number,duration interval,vorigin number=0,

    torigin timestamptz='2000-01-03') → {(number,time,tnumber}

    The result is a set of triples (number,time,tnumber). If the origin of values and/or time are not specified, they are set by default to 0 and to Monday, January 3, 2000, respectively.

    SELECT (sp).number, (sp).time, (sp).tnumber
    FROM (SELECT valueTimeSplit(tint '[1@2001-02-01, 2@2001-02-02, 5@2001-02-05,
      10@2001-02-10]', 5, '5 days') AS sp) t;
    --  0 | 2001-02-01 | {[1@2001-02-01, 2@2001-02-02, 2@2001-02-05)}
    --  5 | 2001-02-01 | {[5@2001-02-05, 5@2001-02-06)}
    --  5 | 2001-02-06 | {[5@2001-02-06, 5@2001-02-10)}
    -- 10 | 2001-02-06 | {[10@2001-02-10]}
    SELECT (sp).number, (sp).time, (sp).tnumber
    FROM (SELECT valueTimeSplit(tfloat '[1@2001-02-01, 10@2001-02-10)', 5.0, '5 days', 1.0,
      '2001-02-01') AS sp) t;
    -- 1 | 2001-01-01 | [1@2001-01-01, 6@2001-01-06)
    -- 6 | 2001-01-06 | [6@2001-01-06, 10@2001-01-10)
    
  • Fragment the temporal point with respect to the tiles in a spatial grid

    spaceSplit(tgeompoint,xsize float,[ysize float,zsize float,]

    origin geompoint='Point(0 0 0)',bitmatrix boolean=true) → {(point,tpoint)}

    The result is a set of pairs (point,tpoint). If the origin of the space dimension is not specified, it is set by default to 'Point(0 0 0)'. The arguments ysize and zsize are optional, the size for the missing dimensions is assumed to be equal to xsize. If the argument bitmatrix is not specified, then the computation will use a bit matrix to speed up the process.

    SELECT ST_AsText((sp).point) AS point, astext((sp).tpoint) AS tpoint
    FROM (SELECT spaceSplit(tgeompoint '[Point(1 1)@2001-03-01, Point(10 10)@2001-03-10]',
      2.0) AS sp) t;
    -- POINT(0 0) | {[POINT(1 1)@2001-03-01, POINT(2 2)@2001-03-02)}
    -- POINT(2 2) | {[POINT(2 2)@2001-03-02, POINT(4 4)@2001-03-04)}
    -- POINT(4 4) | {[POINT(4 4)@2001-03-04, POINT(6 6)@2001-03-06)}
    -- ...
    SELECT ST_AsText((sp).point) AS point, astext((sp).tpoint) AS tpoint
    FROM (SELECT spaceSplit(tgeompoint '[Point(1 1 1)@2001-03-01,
      Point(10 10 10)@2001-03-10]', 2.0, geometry 'Point(1 1 1)') AS sp) t;
    -- POINT Z(1 1 1) | {[POINT Z (1 1 1)@2001-03-01, POINT Z (3 3 3)@2001-03-03)}
    -- POINT Z(3 3 3) | {[POINT Z (3 3 3)@2001-03-03, POINT Z (5 5 5)@2001-03-05)}
    -- POINT Z(5 5 5) | {[POINT Z (5 5 5)@2001-03-05, POINT Z (7 7 7)@2001-03-07)}
    -- ...
    
  • Fragment the temporal point with respect to the tiles in a spatiotemporal grid

    spaceTimeSplit(tgeompoint,xsize float,[ysize float,ysize float,]

    duration interval,sorigin geompoint='Point(0 0 0)',

    torigin timestamptz='2000-01-03',bitmatrix boolean=true) → {(point,time,tpoint)}

    The result is a set of triples (point,time,tpoint). If the origin of the space and time dimensions are not specified, they are set by default to 'Point(0 0 0)' and Monday, January 3, 2000, respectively. The arguments ysize and zsize are optional, the size for the missing dimensions is assumed to be equal to xsize. If the argument bitmatrix is not specified, then the computation will use a bit matrix to speed up the process.

    SELECT ST_AsText((sp).point) AS point, (sp).time, astext((sp).tpoint) AS tpoint
    FROM (SELECT spaceTimeSplit(tgeompoint '[Point(1 1)@2001-02-01, Point(10 10)@2001-02-10]',
      2.0, interval '2 days') AS sp) t;
    -- POINT(0 0) | 2001-01-31 | {[POINT(1 1)@2001-02-01, POINT(2 2)@2001-02-02)}
    -- POINT(2 2) | 2001-01-31 | {[POINT(2 2)@2001-02-02]}
    -- POINT(2 2) | 2001-02-02 | {[POINT(2 2)@2001-02-02, POINT(4 4)@2001-02-04)}
    -- ...
    SELECT ST_AsText((sp).point) AS point, (sp).time, astext((sp).tpoint) AS tpoint
    FROM (SELECT spaceTimeSplit(tgeompoint '[Point(1 1 1)@2001-02-01,
      Point(10 10 10)@2001-02-10]', 2.0, interval '2 days', 'Point(1 1 1)',
      '2001-03-01') AS sp) t;
    -- POINT Z(1 1 1) | 2001-02-01 | {[POINT Z(1 1 1)@2001-02-01, POINT Z(3 3 3)@2001-02-03)}
    -- POINT Z(3 3 3) | 2001-02-01 | {[POINT Z(3 3 3)@2001-02-03]}
    -- POINT Z(3 3 3) | 2001-02-03 | {[POINT Z(3 3 3)@2001-02-03, POINT Z (5 5 5)@2001-02-05)}
    -- ...