We describe next the main steps in the generation of the BerlinMOD scenario. The generator uses multiple parameters that can be set to customize the generation process. We explain in detail these parameters in the section called “Tuning the Generator Parameters”. It is worth noting that the procedures explained in this section have been slightly simplified with respect to the actual procedures by removing ancillary details concerning the generation of tracing messages at various verbosity levels.
We start by creating a first set of tables for containing the generated data as follows.
CREATE TABLE VehicleNodes(VehicleId int PRIMARY KEY, HomeNode bigint NOT NULL, WorkNode bigint NOT NULL, NoNeighbours int); CREATE TABLE Vehicles(VehicleId int PRIMARY KEY, Licence text, VehicleType text, Model text); CREATE TABLE Destinations(VehicleId int, SourceNode bigint, TargetNode bigint, PRIMARY KEY (VehicleId, SourceNode, TargetNode)); CREATE TABLE Licences(VehicleId int PRIMARY KEY, Licence text, VehicleType text); CREATE TABLE Neighbourhoods(VehicleId int, SeqNo int, Node bigint NOT NULL, PRIMARY KEY (VehicleId, SeqNo)); -- Get the number of nodes SELECT COUNT(*) INTO noNodes FROM Nodes; FOR vehId IN 1..noVehicles LOOP -- Fill the Vehicles table IF nodeChoice = 'Network Based' THEN homeNode = random_int(1, noNodes); workNode = random_int(1, noNodes); ELSE homeNode = berlinmod_selectHomeNode(); workNode = berlinmod_selectWorkNode(); END IF; IF homeNode IS NULL OR workNode IS NULL THEN RAISE EXCEPTION ' The home and the work nodes cannot be NULL'; END IF; INSERT INTO Vehicles VALUES (vehId, homeNode, workNode); -- Fill the Destinations table INSERT INTO Destinations(VehicleId, SourceNode, TargetNode) VALUES (vehId, homeNode, workNode), (vehId, workNode, homeNode); -- Fill the Licences table licence = berlinmod_createLicence(vehId); type = berlinmod_vehicleType(); model = berlinmod_vehicleModel(); INSERT INTO Licences VALUES (vehId, licence, type, model); -- Fill the Neighbourhoods table INSERT INTO Neighbourhoods WITH Temp AS ( SELECT vehId AS VehicleId, n2.NodeId AS Node FROM Nodes n1, Nodes n2 WHERE n1.NodeId = homeNode AND n1.NodeId <> n2.NodeId AND ST_DWithin(n1.Geom, n2.Geom, P_NEIGHBOURHOOD_RADIUS) ) SELECT vehId, ROW_NUMBER() OVER () AS SeqNo, Node FROM Temp; END LOOP; CREATE UNIQUE INDEX Vehicles_VehicleId_idx ON Vehicles USING BTREE(VehicleId); CREATE UNIQUE INDEX Neighbourhoods_pkey_idx ON Neighbourhoods USING BTREE(VehicleId, SeqNo); UPDATE Vehicles v SET NoNeighbours = (SELECT COUNT(*) FROM Neighbourhoods n WHERE n.VehicleId = v.VehicleId);
We start by storing in the Vehicles
table the home and the work node of each vehicle. Depending on the value of the variable nodeChoice
, we chose these nodes either with a uniform distribution among all nodes in the network or we call specific functions that take into account population and employment statistics in the area covered by the generation. We then keep track in the Destinations
table of the two trips to and from work and we store in the Licences
table information describing the vehicle. Finally, we compute in the Neighbourhoods
table the set of nodes that are within a given distance of the home node of every vehicle. This distance is stated by the parameter P_NEIGHBOURHOOD_RADIUS
, which is set by default to 3 Km.
We create now auxiliary tables containing benchmarking data. The number of rows these tables is determined by the parameter P_SAMPLE_SIZE
, which is set by default to 100. These tables are used by the BerlinMOD benchmark to assess the performance of various types of queries.
CREATE TABLE Points(PointId int PRIMARY KEY, Geom geometry(Point)); INSERT INTO Points WITH Temp AS ( SELECT PointId, random_int(1, noNodes) AS Node FROM generate_series(1, P_SAMPLE_SIZE) PointId ) SELECT t.PointId, n.Geom FROM Temp t, Nodes n WHERE t.Node = n.PointId; CREATE TABLE Regions(RegionId int PRIMARY KEY, Geom geometry(Polygon)); INSERT INTO Regions WITH Temp AS ( SELECT RegionId, random_int(1, noNodes) AS Node FROM generate_series(1, P_SAMPLE_SIZE) RegionId ) SELECT t.RegionId, ST_Buffer(n.Geom, random_int(1, 997) + 3.0, random_int(0, 25)) AS Geom FROM Temp t, Nodes n WHERE t.Node = n.RegionId; CREATE TABLE Instants(InstantId int PRIMARY KEY, Instant timestamptz); INSERT INTO Instants SELECT InstantId, startDay + (random() * noDays) * interval '1 day' AS Instant FROM generate_series(1, P_SAMPLE_SIZE) InstantId; CREATE TABLE Periods(PeriodId int PRIMARY KEY, period tstzspan); INSERT INTO Periods WITH Temp AS ( SELECT PeriodId, startDay + (random() * noDays) * interval '1 day' AS Instant FROM generate_series(1, P_SAMPLE_SIZE) PeriodId ) SELECT PeriodId, span(instant, instant + abs(random_gauss()) * interval '1 day', true, true) AS Period FROM Temp;
We generate now the leisure trips. There is at most one leisure trip in the evening of a week day and at most two leisure trips each day of the weekend, one in the morning and another one in the afternoon. Each leisure trip is composed of 1 to 3 destinations. The leisure trip starts and ends at the home node and visits successively these destinations. In our implementation, the various subtrips from a source to a destination node of a leisure trip are encoded independently, contrary to what is done in Secondo where a leisure trip is encoded as a single trip and stops are added between successive destinations.
CREATE TABLE LeisureTrip(VehicleId int, StartDate date, TripNo int, SeqNo int, SourceNode bigint, TargetNode bigint, PRIMARY KEY (VehicleId, StartDate, TripNo, SeqNo)); -- Loop for every vehicle FOR vehId IN 1..noVehicles LOOP -- Get home node and number of neighbour nodes SELECT home, NoNeighbours INTO homeNode, noNeigh FROM Vehicles v WHERE v.VehicleId = i; day = startDay; -- Loop for every generation day FOR dayNo IN 1..noDays LOOP weekday = date_part('dow', day); -- Generate leisure trips (if any) -- 1: Monday, 5: Friday IF weekday BETWEEN 1 AND 5 THEN noLeisTrips = 1; ELSE noLeisTrips = 2; END IF; -- Loop for every leisure trip in a day (1 or 2) FOR leis IN 1..noLeisTrips LOOP -- Generate a leisure trip with a 40% probability IF random() <= 0.4 THEN -- Select a number of destinations between 1 and 3 IF random() < 0.8 THEN noDest = 1; ELSIF random() < 0.5 THEN noDest = 2; ELSE noDest = 3; END IF; sourceN = homeN; FOR dest IN 1..noDest + 1 LOOP IF dest <= noDest THEN targetNode = berlinmod_selectDestNode(i, noNeigh, noNodes); ELSE targetNode = homeNode; END IF; IF targetNode IS NULL THEN RAISE EXCEPTION ' Destination node cannot be NULL'; END IF; INSERT INTO LeisureTrip VALUES (vehId, day, leis, dest, sourceN, targetN); INSERT INTO Destinations(VehicleId, SourceNode, TargetNode) VALUES (vehId, sourceNode, targetNode) ON CONFLICT DO NOTHING; sourceNode = targetNode; END LOOP; END IF; END LOOP; day = day + 1 * interval '1 day'; END LOOP; END LOOP; CREATE INDEX Destinations_vehicle_idx ON Destinations USING BTREE(VehicleId);
For each vehicle and each day, we determine the number of potential leisure trips depending on whether it is a week or weekend day. A leisure trip is generated with a probability of 40% and is composed of 1 to 3 destinations. These destinations are chosen so that 80% of the destinations are from the neighbourhood of the vehicle and 20% are from the complete graph. The information about the composition of the leisure trips is then added to the LeisureTrip
and Destinations
tables.
We then call pgRouting to generate the path for each source and destination nodes in the Destinations
table.
CREATE TABLE Paths( -- This attribute is needed for partitioning the table for big scale factors vehicle int, -- The following attributes are generated by pgRouting start_vid bigint, end_vid bigint, seq int, node bigint, edge bigint, -- The following attributes are filled from the Edges table Geom geometry NOT NULL, speed float NOT NULL, category int NOT NULL, PRIMARY KEY (VehicleId, start_vid, end_vid, seq)); -- Select query sent to pgRouting IF pathMode = 'Fastest Path' THEN query1_pgr = 'SELECT id, source, target, cost_s AS cost,' 'reverse_cost_s as reverse_cost FROM edges'; ELSE query1_pgr = 'SELECT id, source, target, length_m AS cost,' 'length_m * sign(reverse_cost_s) as reverse_cost FROM edges'; END IF; -- Get the total number of paths and number of calls to pgRouting SELECT COUNT(*) INTO noPaths FROM (SELECT DISTINCT source, target FROM Destinations) AS t; noCalls = ceiling(noPaths / P_PGROUTING_BATCH_SIZE::float); FOR i IN 1..noCalls LOOP query2_pgr = format('SELECT DISTINCT source, target FROM Destinations ' 'ORDER BY source, target LIMIT %s OFFSET %s', P_PGROUTING_BATCH_SIZE, (i - 1) * P_PGROUTING_BATCH_SIZE); INSERT INTO Paths(VehicleId, start_vid, end_vid, seq, node, edge, Geom, speed, category) WITH Temp AS ( SELECT start_vid, end_vid, path_seq, node, edge FROM pgr_dijkstra(query1_pgr, query2_pgr, true) WHERE edge > 0 ) SELECT d.VehicleId, start_vid, end_vid, path_seq, node, edge, -- adjusting direction of the edge traversed CASE WHEN t.node = e.source THEN e.Geom ELSE ST_Reverse(e.Geom) END AS Geom, e.maxspeed_forward AS speed, berlinmod_roadCategory(e.tag_id) AS category FROM Destinations d, Temp t, Edges e WHERE d.source = t.start_vid AND d.target = t.end_vid AND e.id = t.edge; END LOOP; CREATE INDEX Paths_vehicle_start_vid_end_vid_idx ON Paths USING BTREE(VehicleId, start_vid, end_vid);
The variable pathMode
determines whether pgRouting computes either the fastest or the shortest path from a source to a destination node. Then, we determine the number of calls to pgRouting. Indeed, depending on the available memory of the computer, there is a limit in the number of paths to be computed by pgRouting in a single call. The paths are stored in the Paths
table. In addition to the columns generated by pgRouting, we add the geometry (adjusting the direction if necessary), the maximum speed, and the category of the edge. The BerlinMOD data generator considers three road categories: side road, main road, and freeway. The OSM road types are mapped to one of these categories in the function berlinmod_roadCategory
.
We are now ready to generate the trips.
DROP TYPE IF EXISTS step CASCADE; CREATE TYPE step AS (linestring geometry, maxspeed float, category int); CREATE FUNCTION berlinmod_createTrips(noVehicles int, noDays int, startDay date, disturbData boolean) RETURNS void LANGUAGE plpgsql STRICT AS $$ DECLARE /* Declaration of variables and parameters ... */ BEGIN DROP TABLE IF EXISTS Trips; CREATE TABLE Trips(VehicleId int, day date, seq int, source bigint, target bigint, Trip tgeompoint, trajectory geometry, PRIMARY KEY (VehicleId, day, seq)); -- Loop for each vehicle FOR i IN 1..noVehicles LOOP -- Get home -> work and work -> home paths SELECT home, work INTO homeNode, workNode FROM Vehicles v WHERE v.VehicleId = i; SELECT array_agg((Geom, speed, category)::step ORDER BY seq) INTO homework FROM Paths WHERE VehicleId = i AND start_vid = homeNode AND end_vid = workNode; SELECT array_agg((Geom, speed, category)::step ORDER BY seq) INTO workhome FROM Paths WHERE VehicleId = i AND start_vid = workNode AND end_vid = homeNode; d = startDay; -- Loop for each generation day FOR j IN 1..noDays LOOP weekday = date_part('dow', d); -- 1: Monday, 5: Friday IF weekday BETWEEN 1 AND 5 THEN -- Crete trips home -> work and work -> home t = d + time '08:00:00' + CreatePauseN(120); createTrip(homework, t, disturbData); INSERT INTO Trips VALUES (i, d, 1, homeNode, workNode, trip, trajectory(trip)); t = d + time '16:00:00' + CreatePauseN(120); trip = createTrip(workhome, t, disturbData); INSERT INTO Trips VALUES (i, d, 2, workNode, homeNode, trip, trajectory(trip)); tripSeq = 2; END IF; -- Get the number of leisure trips SELECT COUNT(DISTINCT tripNo) INTO noLeisTrip FROM LeisureTrip L WHERE L.VehicleId = i AND L.day = d; -- Loop for each leisure trip (0, 1, or 2) FOR k IN 1..noLeisTrip LOOP IF weekday BETWEEN 1 AND 5 THEN t = d + time '20:00:00' + CreatePauseN(90); leisNo = 1; ELSE -- Determine whether it is a morning/afternoon (1/2) trip IF noLeisTrip = 2 THEN leisNo = k; ELSE SELECT tripNo INTO leisNo FROM LeisureTrip L WHERE L.VehicleId = i AND L.day = d LIMIT 1; END IF; -- Determine the start time IF leisNo = 1 THEN t = d + time '09:00:00' + CreatePauseN(120); ELSE t = d + time '17:00:00' + CreatePauseN(120); END IF; END IF; -- Get the number of subtrips (number of destinations + 1) SELECT count(*) INTO noSubtrips FROM LeisureTrip L WHERE L.VehicleId = i AND L.tripNo = leisNo AND L.day = d; FOR m IN 1..noSubtrips LOOP -- Get the source and destination nodes of the subtrip SELECT source, target INTO sourceNode, targetNode FROM LeisureTrip L WHERE L.VehicleId = i AND L.day = d AND L.tripNo = leisNo AND L.seq = m; -- Get the path SELECT array_agg((Geom, speed, category)::step ORDER BY seq) INTO Path FROM Paths p WHERE VehicleId = i AND start_vid = sourceNode AND end_vid = targetNode; trip = createTrip(Path, t, disturbData); tripSeq = tripSeq + 1; INSERT INTO Trips VALUES (i, d, tripSeq, sourceNode, targetNode, trip, trajectory(trip)); -- Add a delay time in [0, 120] min using a bounded Gaussian distribution t = endTimestamp(trip) + createPause(); END LOOP; END LOOP; d = d + 1 * interval '1 day'; END LOOP; END LOOP; RETURN; END; $$
We create a type step
which is a record composed of the geometry, the maximum speed, and the category of an edge. The procedure loops for each vehicle and each day and calls the procedure createTrip
for creating the trips. If the day is a weekday, we generate the trips from home to work and from work to home starting, respectively, at 8 am and 4 pm plus a random non-zero duration of 120 minutes using a uniform distribution. We then generate the leisure trips. During the week days, the possible evening leisure trip starts at 8 pm plus a random random non-zero duration of 90 minutes, while during the weekend days, the two possible morning and afternoon trips start, respectively, at 9 am and 5 pm plus a random non-zero duration of 120 minutes. Between the multiple destinations of a leisure trip we add a delay time of maximum 120 minutes using a bounded Gaussian distribution.
Finally, we explain the procedure that create a trip.
CREATE OR REPLACE FUNCTION createTrip(edges step[], startTime timestamptz, disturbData boolean) RETURNS tgeompoint LANGUAGE plpgsql STRICT AS $$ DECLARE /* Declaration of variables and parameters ... */ BEGIN srid = ST_SRID((edges[1]).linestring); p1 = ST_PointN((edges[1]).linestring, 1); x1 = ST_X(p1); y1 = ST_Y(p1); curPos = p1; t = startTime; instants[1] = tgeompoint_inst(p1, t); curSpeed = 0; l = 2; noEdges = array_length(edges, 1); -- Loop for every edge FOR i IN 1..noEdges LOOP -- Get the information about the current edge linestring = (edges[i]).linestring; maxSpeedEdge = (edges[i]).maxSpeed; category = (edges[i]).category; -- Determine the number of segments SELECT array_agg(Geom ORDER BY Path) INTO points FROM ST_DumpPoints(linestring); noSegs = array_length(points, 1) - 1; -- Loop for every segment FOR j IN 1..noSegs LOOP p2 = points[j + 1]; x2 = ST_X(p2); y2 = ST_Y(p2); -- If there is a segment ahead in the current edge compute the angle of the turn -- and the maximum speed at the turn depending on this angle IF j < noSegs THEN p3 = points[j + 2]; alpha = degrees(ST_Angle(p1, p2, p3)); IF abs(mod(alpha::numeric, 360.0)) < P_EPSILON THEN maxSpeedTurn = maxSpeedEdge; ELSE maxSpeedTurn = mod(abs(alpha - 180.0)::numeric, 180.0) / 180.0 * maxSpeedEdge; END IF; END IF; -- Determine the number of fractions segLength = ST_Distance(p1, p2); IF segLength < P_EPSILON THEN RAISE EXCEPTION 'Segment % of edge % has zero length', j, i; END IF; fraction = P_EVENT_LENGTH / segLength; noFracs = ceiling(segLength / P_EVENT_LENGTH); -- Loop for every fraction k = 1; WHILE k < noFracs LOOP -- If the current speed is zero, apply an acceleration event IF curSpeed <= P_EPSILON_SPEED THEN -- If we are not approaching a turn IF k < noFracs THEN curSpeed = least(P_EVENT_ACC, maxSpeedEdge); ELSE curSpeed = least(P_EVENT_ACC, maxSpeedTurn); END IF; ELSE -- If the current speed is not zero, apply a deceleration or a stop event -- with a probability proportional to the maximun speed IF random() <= P_EVENT_C / maxSpeedEdge THEN IF random() <= P_EVENT_P THEN -- Apply a stop event curSpeed = 0.0; ELSE -- Apply a deceleration event curSpeed = curSpeed * random_binomial(20, 0.5) / 20.0; END IF; ELSE -- Otherwise, apply an acceleration event IF k = noFracs AND j < noSegs THEN maxSpeed = maxSpeedTurn; ELSE maxSpeed = maxSpeedEdge; END IF; curSpeed = least(curSpeed + P_EVENT_ACC, maxSpeed); END IF; END IF; -- If speed is zero add a wait time IF curSpeed < P_EPSILON_SPEED THEN waitTime = random_exp(P_DEST_EXPMU); IF waitTime < P_EPSILON THEN waitTime = P_DEST_EXPMU; END IF; t = t + waitTime * interval '1 sec'; ELSE -- Otherwise, move current position towards the end of the segment IF k < noFracs THEN x = x1 + ((x2 - x1) * fraction * k); y = y1 + ((y2 - y1) * fraction * k); IF disturbData THEN dx = (2 * P_GPS_STEPMAXERR * rand()) - P_GPS_STEPMAXERR; dy = (2 * P_GPS_STEPMAXERR * rand()) - P_GPS_STEPMAXERR; errx = errx + dx; erry = erry + dy; IF errx > P_GPS_TOTALMAXERR THEN errx = P_GPS_TOTALMAXERR; END IF; IF errx < - 1 * P_GPS_TOTALMAXERR THEN errx = -1 * P_GPS_TOTALMAXERR; END IF; IF erry > P_GPS_TOTALMAXERR THEN erry = P_GPS_TOTALMAXERR; END IF; IF erry < -1 * P_GPS_TOTALMAXERR THEN erry = -1 * P_GPS_TOTALMAXERR; END IF; x = x + dx; y = y + dy; END IF; curPos = ST_SetSRID(ST_Point(x, y), srid); curDist = P_EVENT_LENGTH; ELSE curPos = p2; curDist = segLength - (segLength * fraction * (k - 1)); END IF; travelTime = (curDist / (curSpeed / 3.6)); IF travelTime < P_EPSILON THEN travelTime = P_DEST_EXPMU; END IF; t = t + travelTime * interval '1 sec'; k = k + 1; END IF; instants[l] = tgeompoint_inst(curPos, t); l = l + 1; END LOOP; p1 = p2; x1 = x2; y1 = y2; END LOOP; -- If we are not already in a stop, apply a stop event with a probability -- depending on the category of the current edge and the next one (if any) IF curSpeed > P_EPSILON_SPEED AND i < noEdges THEN nextCategory = (edges[i + 1]).category; IF random() <= P_DEST_STOPPROB[category][nextCategory] THEN curSpeed = 0; waitTime = random_exp(P_DEST_EXPMU); IF waitTime < P_EPSILON THEN waitTime = P_DEST_EXPMU; END IF; t = t + waitTime * interval '1 sec'; instants[l] = tgeompoint_inst(curPos, t); l = l + 1; END IF; END IF; END LOOP; RETURN tgeompoint_seq(instants, true, true, true); END; $$
The procedure receives as first argument a path from a source to a destination node, which is an array of triples composed of the geometry, the maximum speed, and the category of an edge of the path. The other arguments are the timestamp at which the trip starts and a Boolean value determining whether the points composed the trip are disturbed to simulate GPS errors. The output of the function is a temporal geometry point following this path. The procedure loops for each edge of the path and determines the number of segments of the edge, where a segment is a straight line defined by two consecutive points. For each segment, we determine the angle between the current segment and the next one (if any) to compute the maximum speed at the turn. This is determined by multiplying the maximum speed of the segment by a factor proportional to the angle so that the factor is 1.00 at both 0° and 360° and is 0.0 at 180°. Examples of values of degrees and the associated factor are given next.
0: 1.00, 5: 0.97, 45: 0.75, 90: 0.50, 135: 0.25, 175: 0.03 180: 0.00, 185: 0.03, 225: 0.25, 270: 0.50, 315: 0.75, 355: 0.97, 360: 0.00
Each segment is divided in fractions of length P_EVENT_LENGTH
, which is by default 5 meters. We then loop for each fraction and choose to add one event that can be an acceleration, a deceleration, or a stop event. If the speed of the vehicle is zero, only an accelation event can happen. For this, we increase the current speed with the value of P_EVENT_ACC
, which is by default 12 Km/h, and verify that the speed is not greater than the maximum speed of either the edge or the next turn for the last fraction. Otherwise, if the current speed is not zero, we apply a deceleration or a stop event with a probability proportional to the maximum speed of the edge, otherwise we apply an acceleration event. After applying the event, if the speed is zero we add a waiting time with a random exponential distribution with mean P_DEST_EXPMU
, which is by default 1 second. Otherwise, we move the current position towards the end of the segment and, depending on the variable disturbData
, we disturbe the new position to simulate GPS errors. The timestamp at which the vehicle reaches the new position is determined by dividing the distance traversed by the current speed. Finally, at the end of each segment, if the current speed is not zero, we add a stop event depending on the categories of the current segment and the next one. This is determined by a transition matrix given by the parameter P_DEST_STOPPROB
.