Multidimensional tiling is the mechanism used to partition the domain of temporal values in buckets or tiles of varying number of dimensions. In the case of a single dimension, the domain can be partitioned by value or by time using buckets of the same size or duration, respectively. For temporal numbers, the domain can be partitioned in two-dimensional tiles of the same size for the value dimension and the same duration for the time dimension. For temporal points, the domain can be partitioned in space in two- or three-dimensional tiles, depending on the number of dimensions of the spatial coordinates. Finally, for temporal points, the domain can be partitioned in space and time using three- or four-dimensional tiles. Furthermore, the temporal values can also be fragmented according to a multidimensional grid defined over the underlying domain.
Multidimensional tiling can be used for various purposes. For example, it can be used for computing multidimensional histograms, where the temporal values are aggregated according to the underlying partition of the domain. On the other hand, multidimensional tiling can be used for distributing a dataset across a cluster of servers, where each server contains a partition of the dataset. The advantage of this partition mechanism is that it preserves proximitity in space and time, unlike the traditional hash-based partition mechanisms used in big data environments.
Figure 9.4, “Multidimensional tiling for temporal floats.” illustrates multidimensional tiling for temporal floats. The two-dimensional domain is split into tiles having the same size for the value dimension and the same duration for the time dimension. Suppose that this tiling scheme is used for distribute a dataset across a cluster of six servers, as suggested by the gray pattern in the figure. In this case, the values are fragmented so each server will receive the data of contiguous tiles. This implies in particular that four nodes will receive one fragment of the temporal float depicted in the figure. One advantage of this distribution of data based on multidimensional tiling is that it reduces the data that needs to be exchanged between nodes when processing queries, a process typically referred to as reshuffling.
Many of the functions in this section are set-returning functions (also known as a table functions) since they typically return more than one value. In this case, the functions are marked with the symbol.
Return a set of buckets that cover the span with buckets of the same size or duration aligned with the origin
bucketList(bounds unitspan,size number,origin number=0) → {(index,span)}
bucketList(bounds tstzspan,duration interval,origin timestamptz='2000-01-03') →
{(index,span)}
The result is a set of pairs (index,span)
. If the origin is not specified, it is set by default to 0 for number spans and Monday, January 3, 2000 for periods. The indices start by 1.
SELECT (bl).index, (bl).span FROM (SELECT bucketList(intspan '[15, 25]', 2) AS bl) t; -- 1 | [14,16) -- 2 | [16,18) -- 3 | [18,20) -- ... SELECT bucketList(floatspan '[-10, -1]', 2.5, -7); -- (1,"[-12, -9.5)") -- (2,"[-9.5, -7)") -- (3,"[-7, -4.5)") -- ... SELECT (bl).index, (bl).span FROM (SELECT bucketList(tstzspan '[2001-01-15, 2001-01-25]','2 days') AS bl) t; -- 1 | [2001-01-15, 2001-01-17) -- 2 | [2001-01-17, 2001-01-19) -- 3 | [2001-01-19, 2001-01-21) -- ... SELECT bucketList(tstzspan '[2001-01-15, 2001-01-25]', '2 days', '2001-01-02'); -- (1,"[2001-01-14, 2001-01-16)") -- (2,"[2001-01-16, 2001-01-18)") -- (3,"[2001-01-18, 2001-01-20)") -- ...
Return the start value of the bucket that contains the input number
valueBucket(value number,size number,origin number=0) → number
If the origin is not specified, it is set by default to 0.
SELECT valueBucket(3, 2); -- 2 SELECT valueBucket(3.5, 2.5, 1.5); -- 1.5
Return the span in the bucket space that contains the input number
spanBucket(value number,size number,origin number=0) → span
If the origin is not specified, it is set by default to 0.
SELECT spanBucket(2, 2); -- [2, 4) SELECT spanBucket(2, 2, 1); -- [1, 3) SELECT spanBucket(2, 2.5); -- [0, 2.5) SELECT spanBucket(2, 2.5, 1.5); -- [1.5, 4)
Return the start timestamp of the bucket that contains the input timestamp
timeBucket(time timestamptz,duration interval,origin timestamptz='2000-01-03') →
timestamptz
If the origin is not specified, it is set by default to Monday, January 3, 2000.
SELECT timeBucket(timestamptz '2001-05-01', interval '2 days'); -- 2001-04-29 01:00:00 SELECT timeBucket(timestamptz '2001-05-01', interval '2 days', timestamptz '2001-01-01'); -- 2001-04-30 01:00:00
Return the period in the bucket space that contains the input timestamp
periodBucket(time timestamptz,duration interval,origin timestamptz='2000-01-03') →
tstzspan
If the origin is not specified, it is set by default to Monday, January 3, 2000.
SELECT periodBucket('2001-01-04', interval '1 week'); -- [2001-01-03, 2001-01-10) SELECT periodBucket('2001-01-04', interval '1 week', '2001-01-07'); -- [2000-12-31, 2001-01-07)
Return a set of tiles that covers the box with multidimensional tiles of the same size and duration
tileList(bounds tbox,size float,duration interval,vorigin float=0,
torigin timestamptz='2000-01-03') → {(index,tile)}
tileList(bounds stbox,xsize float,[ysize float,zsize float,]
sorigin geompoint='Point(0 0 0)') → {(index,tile)}
tileList(bounds stbox,xsize float,[ysize float,zsize float,]duration interval,
sorigin geompoint='Point(0 0 0)',torigin timestamptz='2000-01-03') → {(index,tile)}
The result is a set of pairs (index,tile)
. If the origin of the value and/or time dimensions are not specified, they are set by default to 0 or 'Point(0 0 0)'
for the value dimension (depending on the box type) and to Monday, January 3, 2000 for the time dimension.
In the case of a spatiotemporal grid, ysize
and zsize
are optional, the size for the missing dimensions is assumed to be equal to xsize
. The SRID of the tile coordinates is determined by the input box and the sizes are given in the units of the SRID. If the origin for the spatial coordinates is given, which must be a point, its dimensionality and SRID should be equal to the one of box, otherwise an error is raised.
SELECT (gr).index, (gr).tile FROM (SELECT tileList(tfloat '[15@2001-01-15, 25@2001-01-25]'::tbox, 2.0, '2 days') AS gr) t; -- 1 | TBOX XT([14,16),[2001-01-15,2001-01-17)) -- 2 | TBOX XT([16,18),[2001-01-15,2001-01-17)) -- 3 | TBOX XT([18,20),[2001-01-15,2001-01-17)) -- ... SELECT tileList(tfloat '[15@2001-01-15,25@2001-01-25]'::tbox, 2.0, '2 days', 11.5); -- (1,"TBOX XT([13.5,15.5),[2001-01-15,2001-01-17))") -- (2,"TBOX XT([15.5,17.5),[2001-01-15,2001-01-17))") -- (3,"TBOX XT([17.5,19.5),[2001-01-15,2001-01-17))") -- ... SELECT tileList(tgeompoint '[Point(3 3)@2001-01-15, Point(15 15)@2001-01-25]'::stbox, 2.0); -- (1,"STBOX X((2,2),(4,4))") -- (2,"STBOX X((4,2),(6,4))") -- (3,"STBOX X((6,2),(8,4))") -- ... SELECT tileList(tgeompoint 'SRID=3812;[Point(3 3)@2001-01-15, Point(15 15)@2001-01-25]'::stbox, 2.0, geometry 'Point(3 3)'); -- (1,"SRID=3812;STBOX X((3,3),(5,5))") -- (2,"SRID=3812;STBOX X((5,3),(7,5))") -- (3,"SRID=3812;STBOX X((7,3),(9,5))") -- ... SELECT tileList(tgeompoint '[Point(3 3 3)@2001-01-15, Point(15 15 15)@2001-01-25]'::stbox, 2.0, geometry 'Point(3 3 3)'); -- (1,"STBOX Z((3,3,3),(5,5,5))") -- (2,"STBOX Z((5,3,3),(7,5,5))") -- (3,"STBOX Z((7,3,3),(9,5,5))") -- ... SELECT tileList(tgeompoint '[Point(3 3)@2001-01-15, Point(15 15)@2001-01-25]'::stbox, 2.0, interval '2 days'); -- (1,"STBOX XT(((2,2),(4,4)),[2001-01-15,2001-01-17))") -- (2,"STBOX XT(((4,2),(6,4)),[2001-01-15,2001-01-17))") -- (3,"STBOX XT(((6,2),(8,4)),[2001-01-15,2001-01-17))") -- ... SELECT tileList(tgeompoint '[Point(3 3 3)@2001-01-15, Point(15 15 15)@2001-01-25]'::stbox, 2.0, interval '2 days', 'Point(3 3 3)', '2001-01-15'); -- (1,"STBOX ZT(((3,3,3),(5,5,5)),[2001-01-15,2001-01-17))") -- (2,"STBOX ZT(((5,3,3),(7,5,5))),[2001-01-15,2001-01-17)") -- (3,"STBOX ZT(((7,3,3),(9,5,5))),[2001-01-15,2001-01-17)") -- ...
Return the tile of the multidimensional grid that contains the value and the timestamp
tile(value float,time timestamptz,size float,duration interval,
vorigin float=0.0,torigin timestamptz='2000-01-03') → tbox
tile(point geometry,xsize float,[ysize float,zsize float],
sorigin geompoint='Point(0 0 0)') → stbox
tile(point geometry,time timestamptz,xsize float,[ysize float,zsize float,]duration
interval,sorigin geompoint='Point(0 0 0)',torigin timestamptz='2000-01-03') → stbox
If the origin of the value and/or time dimensions are not specified, they are set by default to 0 or Point(0 0 0)
for the value dimension and Monday, January 3, 2000 for the time dimension, respectively.
In the case of a spatiotemporal grid, ysize
and zsize
are optional, the size for the missing dimensions is assumed to be equal to xsize
. The SRID of the tile coordinates is determined by the input point and the sizes are given in the units of the SRID. If the origin for the spatial coordinates is given, which must be a point, its dimensionality and SRID should be equal to the one of box, otherwise an error is raised.
SELECT tile(15, '2001-01-15', 2, interval '2 days'); -- TBOX XT([14,16),[2001-01-15,2001-01-17)) SELECT tile(15, '2001-01-15', 2, interval '2 days', 1, '2001-01-02'); -- TBOX XT([15,17),[2001-01-14,2001-01-16)) SELECT tile(geometry 'Point(1 1 1)', 2.0); -- STBOX Z((0,0,0),(2,2,2)) SELECT tile(geometry 'Point(1 1)', '2001-01-01', 2.0, interval '2 days'); -- STBOX XT(((0,0),(2,2)),[2001-01-01, 2001-01-03)) SELECT tile(geometry 'Point(1 1)', '2001-01-01', 2.0, interval '2 days', 'Point(1 1)', '2001-01-02'); -- STBOX XT(((1,1),(3,3)),[2000-12-31, 2001-01-02))
These functions fragment a temporal value with respect to a sequence of buckets (see the section called “Bucket Functions”) or tiles (see the section called “Tile Functions”).
Fragment the temporal number with respect to span buckets
valueSplit(tnumber,size number,origin number=0) → {(number,tnumber)}
The result is a set of pairs (number,tnumber)
. If the origin of values is not specified, it is set by default to 0.
SELECT (sp).number, (sp).tnumber FROM (SELECT valueSplit(tint '[1@2001-01-01, 2@2001-01-02, 5@2001-01-05, 10@2001-01-10]', 2) AS sp) t; -- 0 | {[1@2001-01-01 00:00:00+01, 1@2001-01-02 00:00:00+01)} -- 2 | {[2@2001-01-02 00:00:00+01, 2@2001-01-05 00:00:00+01)} -- 4 | {[5@2001-01-05 00:00:00+01, 5@2001-01-10 00:00:00+01)} -- 10 | {[10@2001-01-10 00:00:00+01]} SELECT valueSplit(tfloat '[1@2001-01-01, 10@2001-01-10)', 2.0, 1.0); -- (1,"{[1@2001-01-01 00:00:00+01, 3@2001-01-03 00:00:00+01)}") -- (3,"{[3@2001-01-03 00:00:00+01, 5@2001-01-05 00:00:00+01)}") -- (5,"{[5@2001-01-05 00:00:00+01, 7@2001-01-07 00:00:00+01)}") -- (7,"{[7@2001-01-07 00:00:00+01, 9@2001-01-09 00:00:00+01)}") -- (9,"{[9@2001-01-09 00:00:00+01, 10@2001-01-10 00:00:00+01)}")
Fragment the temporal value with respect to time buckets
timeSplit(ttype,duration interval,origin timestamptz='2000-01-03') →
{(time,temp)}
The result is a set of pairs (time,temp)
. If the origin of time is not specified, it is set by default to Monday, January 3, 2000.
SELECT (ts).time, (ts).temp FROM (SELECT timeSplit(tfloat '[1@2001-02-01, 10@2001-02-10)', '2 days') AS ts) t; -- 2001-01-31 | [1@2001-02-01, 2@2001-02-02) -- 2001-02-02 | [2@2001-02-02, 4@2001-02-04) -- 2001-02-04 | [4@2001-02-04, 6@2001-02-06) -- ... SELECT (ts).time, astext((ts).temp) AS temp FROM (SELECT timeSplit(tgeompoint '[Point(1 1)@2001-02-01, Point(10 10)@2001-02-10]', '2 days', '2001-02-01') AS ts) AS t; -- 2001-02-01 | [POINT(1 1)@2001-02-01, POINT(3 3)@2001-02-03) -- 2001-02-03 | [POINT(3 3)@2001-02-03, POINT(5 5)@2001-02-05) -- 2001-02-05 | [POINT(5 5)@2001-02-05, POINT(7 7)@2001-02-07) -- ...
Notice that we can fragment a temporal value in cyclic (instead of linear) time buckets. The following two examples show how to fragment a temporal value by hour and by day of the week.
SELECT (ts).time::time AS hour, merge((ts).temp) AS temp FROM (SELECT timeSplit(tfloat '[1@2001-01-01, 10@2001-01-03]', '1 hour') AS ts) t GROUP BY hour ORDER BY hour; /* 00:00:00 | {[1@2001-01-01 00:00:00+01, 1.1875@2001-01-01 01:00:00+01), [5.5@2001-01-02 00:00:00+01, 5.6875@2001-01-02 01:00:00+01)} */ /* 01:00:00 | {[1.1875@2001-01-01 01:00:00+01, 1.375@2001-01-01 02:00:00+01), [5.6875@2001-01-02 01:00:00+01, 5.875@2001-01-02 02:00:00+01)} */ /* 02:00:00 | {[1.375@2001-01-01 02:00:00+01, 1.5625@2001-01-01 03:00:00+01), [5.875@2001-01-02 02:00:00+01, 6.0625@2001-01-02 03:00:00+01)} */ /* 03:00:00 | {[1.5625@2001-01-01 03:00:00+01, 1.75@2001-01-01 04:00:00+01), [6.0625@2001-01-02 03:00:00+01, 6.25@2001-01-02 04:00:00+01)} */ /* ... */ SELECT EXTRACT(DOW FROM (ts).time) AS dow_no, TO_CHAR((ts).time, 'Dy') AS dow, asText(round(merge((ts).temp), 2)) AS temp FROM (SELECT timeSplit(tgeompoint '[Point(1 1)@2001-01-01, Point(10 10)@2001-01-14)', '1 hour') AS ts) t GROUP BY dow, dow_no ORDER BY dow_no; /* 0 | Sun | {[POINT(1 1)@2001-01-01, POINT(1.69 1.69)@2001-01-02), [POINT(5.85 5.85)@2001-01-08, POINT(6.54 6.54)@2001-01-09)} */ /* 1 | Mon | {[POINT(1.69 1.69)@2001-01-02, POINT(2.38 2.38)@2001-01-03), [POINT(6.54 6.54)@2001-01-09, POINT(7.23 7.23)@2001-01-10)} */ /* 2 | Tue | {[POINT(2.38 2.38)@2001-01-03, POINT(3.08 3.08)@2001-01-04), [POINT(7.23 7.23)@2001-01-10, POINT(7.92 7.92)@2001-01-11)} */ /* ... */
Fragment the temporal number with respect to the tiles in a value-time grid
valueTimeSplit(tumber,size number,duration interval,vorigin number=0,
torigin timestamptz='2000-01-03') → {(number,time,tnumber}
The result is a set of triples (number,time,tnumber)
. If the origin of values and/or time are not specified, they are set by default to 0 and to Monday, January 3, 2000, respectively.
SELECT (sp).number, (sp).time, (sp).tnumber FROM (SELECT valueTimeSplit(tint '[1@2001-02-01, 2@2001-02-02, 5@2001-02-05, 10@2001-02-10]', 5, '5 days') AS sp) t; -- 0 | 2001-02-01 | {[1@2001-02-01, 2@2001-02-02, 2@2001-02-05)} -- 5 | 2001-02-01 | {[5@2001-02-05, 5@2001-02-06)} -- 5 | 2001-02-06 | {[5@2001-02-06, 5@2001-02-10)} -- 10 | 2001-02-06 | {[10@2001-02-10]} SELECT (sp).number, (sp).time, (sp).tnumber FROM (SELECT valueTimeSplit(tfloat '[1@2001-02-01, 10@2001-02-10)', 5.0, '5 days', 1.0, '2001-02-01') AS sp) t; -- 1 | 2001-01-01 | [1@2001-01-01, 6@2001-01-06) -- 6 | 2001-01-06 | [6@2001-01-06, 10@2001-01-10)
Fragment the temporal point with respect to the tiles in a spatial grid
spaceSplit(tgeompoint,xsize float,[ysize float,zsize float,]
origin geompoint='Point(0 0 0)',bitmatrix boolean=true) → {(point,tpoint)}
The result is a set of pairs (point,tpoint)
. If the origin of the space dimension is not specified, it is set by default to 'Point(0 0 0)'
. The arguments ysize
and zsize
are optional, the size for the missing dimensions is assumed to be equal to xsize
. If the argument bitmatrix
is not specified, then the computation will use a bit matrix to speed up the process.
SELECT ST_AsText((sp).point) AS point, astext((sp).tpoint) AS tpoint FROM (SELECT spaceSplit(tgeompoint '[Point(1 1)@2001-03-01, Point(10 10)@2001-03-10]', 2.0) AS sp) t; -- POINT(0 0) | {[POINT(1 1)@2001-03-01, POINT(2 2)@2001-03-02)} -- POINT(2 2) | {[POINT(2 2)@2001-03-02, POINT(4 4)@2001-03-04)} -- POINT(4 4) | {[POINT(4 4)@2001-03-04, POINT(6 6)@2001-03-06)} -- ... SELECT ST_AsText((sp).point) AS point, astext((sp).tpoint) AS tpoint FROM (SELECT spaceSplit(tgeompoint '[Point(1 1 1)@2001-03-01, Point(10 10 10)@2001-03-10]', 2.0, geometry 'Point(1 1 1)') AS sp) t; -- POINT Z(1 1 1) | {[POINT Z (1 1 1)@2001-03-01, POINT Z (3 3 3)@2001-03-03)} -- POINT Z(3 3 3) | {[POINT Z (3 3 3)@2001-03-03, POINT Z (5 5 5)@2001-03-05)} -- POINT Z(5 5 5) | {[POINT Z (5 5 5)@2001-03-05, POINT Z (7 7 7)@2001-03-07)} -- ...
Fragment the temporal point with respect to the tiles in a spatiotemporal grid
spaceTimeSplit(tgeompoint,xsize float,[ysize float,ysize float,]
duration interval,sorigin geompoint='Point(0 0 0)',
torigin timestamptz='2000-01-03',bitmatrix boolean=true) → {(point,time,tpoint)}
The result is a set of triples (point,time,tpoint)
. If the origin of the space and time dimensions are not specified, they are set by default to 'Point(0 0 0)'
and Monday, January 3, 2000, respectively. The arguments ysize
and zsize
are optional, the size for the missing dimensions is assumed to be equal to xsize
. If the argument bitmatrix
is not specified, then the computation will use a bit matrix to speed up the process.
SELECT ST_AsText((sp).point) AS point, (sp).time, astext((sp).tpoint) AS tpoint FROM (SELECT spaceTimeSplit(tgeompoint '[Point(1 1)@2001-02-01, Point(10 10)@2001-02-10]', 2.0, interval '2 days') AS sp) t; -- POINT(0 0) | 2001-01-31 | {[POINT(1 1)@2001-02-01, POINT(2 2)@2001-02-02)} -- POINT(2 2) | 2001-01-31 | {[POINT(2 2)@2001-02-02]} -- POINT(2 2) | 2001-02-02 | {[POINT(2 2)@2001-02-02, POINT(4 4)@2001-02-04)} -- ... SELECT ST_AsText((sp).point) AS point, (sp).time, astext((sp).tpoint) AS tpoint FROM (SELECT spaceTimeSplit(tgeompoint '[Point(1 1 1)@2001-02-01, Point(10 10 10)@2001-02-10]', 2.0, interval '2 days', 'Point(1 1 1)', '2001-03-01') AS sp) t; -- POINT Z(1 1 1) | 2001-02-01 | {[POINT Z(1 1 1)@2001-02-01, POINT Z(3 3 3)@2001-02-03)} -- POINT Z(3 3 3) | 2001-02-01 | {[POINT Z(3 3 3)@2001-02-03]} -- POINT Z(3 3 3) | 2001-02-03 | {[POINT Z(3 3 3)@2001-02-03, POINT Z (5 5 5)@2001-02-05)} -- ...