Loading the Data

The bundled file ais.csv includes the Geom column, so it is loaded into all columns of the table with the COPY command as follows:

COPY AISInput
FROM '/Workshop/ais_data/ais.csv' DELIMITER ',' CSV HEADER;

It is possible that the above command fails with a permission error. The reason for this is that COPY is a server capability, while the CSV file is on the client side. To overcome this issue, one can use the \copy command of psql as follows:

psql -d DanishAIS -c "\copy AISInput FROM '/Workshop/ais_data/ais.csv' DELIMITER ',' CSV HEADER;"

The full daily files published by the Danish Maritime Authority do not include the Geom column, which is computed below. Such a file is loaded by listing the columns explicitly so that Geom is left empty:

COPY AISInput(T, TypeOfMobile, MMSI, Latitude, Longitude, NavigationalStatus,
  ROT, SOG, COG, Heading, IMO, CallSign, Name, ShipType, CargoType, Width, Length,
  TypeOfPositionFixingDevice, Draught, Destination, ETA, DataSourceType,
  SizeA, SizeB, SizeC, SizeD)
FROM '/home/mobilitydb/DanishAIS/aisdk-2023-06-01.csv' DELIMITER ',' CSV HEADER;

For bigger datasets, one could alternatively use the program pgloader.

We clean up some of the fields in the table and create spatial points with the following command.

UPDATE AISInput SET
  NavigationalStatus = CASE NavigationalStatus WHEN 'Unknown value' THEN NULL END,
  IMO = CASE IMO WHEN 'Unknown' THEN NULL END,
  ShipType = CASE ShipType WHEN 'Undefined' THEN NULL END,
  TypeOfPositionFixingDevice = CASE TypeOfPositionFixingDevice
  WHEN 'Undefined' THEN NULL END,
  Geom = ST_SetSRID(ST_MakePoint(Longitude, Latitude), 4326);

The above query took around 1.5 min on my desktop. Let's visualize the spatial points on QGIS.

Figure 1.1. Visualizing the input points

Visualizing the input points

Clearly, there are noise points that are far away from Denmark or even outside earth. This module will not discuss a thorough data cleaning. However, we do some basic cleaning in order to be able to construct trajectories:

CREATE TABLE AISInputFiltered AS
SELECT DISTINCT ON(MMSI, T) *
FROM AISInput
WHERE Longitude BETWEEN -16.1 AND 32.88 AND Latitude BETWEEN 40.18 AND 84.17;
-- Query returned successfully: 11545496 rows affected, 00:45 minutes execution time.
SELECT COUNT(*) FROM AISInputFiltered;
--11545496