The bundled file ais.csv includes the Geom column, so it is loaded into all columns of the table with the COPY command as follows:
COPY AISInput FROM '/Workshop/ais_data/ais.csv' DELIMITER ',' CSV HEADER;
It is possible that the above command fails with a permission error. The reason for this is that COPY is a server capability, while the CSV file is on the client side. To overcome this issue, one can use the \copy command of psql as follows:
psql -d DanishAIS -c "\copy AISInput FROM '/Workshop/ais_data/ais.csv' DELIMITER ',' CSV HEADER;"
The full daily files published by the Danish Maritime Authority do not include the Geom column, which is computed below. Such a file is loaded by listing the columns explicitly so that Geom is left empty:
COPY AISInput(T, TypeOfMobile, MMSI, Latitude, Longitude, NavigationalStatus, ROT, SOG, COG, Heading, IMO, CallSign, Name, ShipType, CargoType, Width, Length, TypeOfPositionFixingDevice, Draught, Destination, ETA, DataSourceType, SizeA, SizeB, SizeC, SizeD) FROM '/home/mobilitydb/DanishAIS/aisdk-2023-06-01.csv' DELIMITER ',' CSV HEADER;
For bigger datasets, one could alternatively use the program pgloader.
We clean up some of the fields in the table and create spatial points with the following command.
UPDATE AISInput SET NavigationalStatus = CASE NavigationalStatus WHEN 'Unknown value' THEN NULL END, IMO = CASE IMO WHEN 'Unknown' THEN NULL END, ShipType = CASE ShipType WHEN 'Undefined' THEN NULL END, TypeOfPositionFixingDevice = CASE TypeOfPositionFixingDevice WHEN 'Undefined' THEN NULL END, Geom = ST_SetSRID(ST_MakePoint(Longitude, Latitude), 4326);
The above query took around 1.5 min on my desktop. Let's visualize the spatial points on QGIS.
Clearly, there are noise points that are far away from Denmark or even outside earth. This module will not discuss a thorough data cleaning. However, we do some basic cleaning in order to be able to construct trajectories:
Filter out points that are outside the window defined by bounds point(-16.1,40.18) and point(32.88, 84.17). This window is obtained from the specifications of the projection in https://epsg.io/25832.
Filter out the rows that have the same identifier (MMSI, T)
CREATE TABLE AISInputFiltered AS SELECT DISTINCT ON(MMSI, T) * FROM AISInput WHERE Longitude BETWEEN -16.1 AND 32.88 AND Latitude BETWEEN 40.18 AND 84.17; -- Query returned successfully: 11545496 rows affected, 00:45 minutes execution time. SELECT COUNT(*) FROM AISInputFiltered; --11545496