Basic Data Exploration

The total distance traveled by all ships:

SELECT SUM(length(Trip)) FROM Ships;
-- 807319558.5805709

This query uses the length function to compute per trip the sailing distance in meters. We then aggregate over all trips and calculate the sum. Let's have a more detailed look, and generate a histogram of trip lengths:

WITH buckets (bucketNo, RangeKM) AS (
  SELECT 1, floatspan '[0, 0]' UNION
  SELECT 2, floatspan '(0, 50)' UNION
  SELECT 3, floatspan '[50, 100)' UNION
  SELECT 4, floatspan '[100, 200)' UNION
  SELECT 5, floatspan '[200, 500)' UNION
  SELECT 6, floatspan '[500, 1500)' UNION
  SELECT 7, floatspan '[1500, 10000)' ),
histogram AS (
  SELECT bucketNo, RangeKM, count(MMSI) as freq
  FROM buckets left outer join Ships on (length(Trip)/1000) <@ RangeKM
  GROUP BY bucketNo, RangeKM
  ORDER BY bucketNo, RangeKM
)
SELECT bucketNo, RangeKM, freq,
  repeat('▪', ( freq::float / max(freq) OVER () * 30 )::int ) AS bar 
FROM histogram;
-- Total query runtime: 2.8 secs

 bucketno |    rangekm    | freq |              bar
----------+---------------+------+--------------------------------
        1 | [0, 0]        |  796 | ▪▪▪▪▪▪
        2 | (0, 50)       | 3752 | ▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪
        3 | [50, 100)     |  571 | ▪▪▪▪▪
        4 | [100, 200)    |  452 | ▪▪▪▪
        5 | [200, 500)    |  569 | ▪▪▪▪▪
        6 | [500, 1500)   |  100 | ▪
        7 | [1500, 10000) |   22 |
(7 rows)

Surprisingly there are trips with zero length. These are clearly noise that can be deleted. Also there are very many short trips, that are less than 50 km long. On the other hand, there are few long trips that are more than 1,500 km long. They look like noise. Normally one should validate more, but to simplify this module, we consider them as noise, and delete them.

DELETE FROM Ships
WHERE length(Trip) = 0 OR length(Trip) >= 1500000;
-- DELETE 820

Now the Ships table looks like Figure 1.3, “Ship trajectories after filtering”.

Figure 1.3. Ship trajectories after filtering

Ship trajectories after filtering

Let's have a look at the speed of the ships. There are two speed values in the data; the speed calculated from the spatiotemporal trajectory speed(Trip), and the SOG attribute. Optimally, the two will be the same. A small variance would still be OK, because of sensor errors. Note that both are temporal floats. In the next query, we compare the averages of the two speed values for every ship:

SELECT ABS(twavg(SOG) * 1.852 - twavg(speed(Trip))* 3.6 ) SpeedDifference
FROM Ships WHERE SOG IS NOT NULL AND
  ABS(twavg(SOG) * 1.852 - twavg(speed(Trip))* 3.6 ) > 10.0
ORDER BY SpeedDifference DESC;

  speeddifference
-------------------
 241.42049907716665
 134.61257387558112
 112.36643046964278
 110.10490793777619
  81.66118732332465
   81.5725669336415
  69.85832834619002
 57.232404771295045
 52.943341619001586
 52.921746684116904
...

The twavg computes a time-weighted average of a temporal float. It basically computes the area under the curve, then divides it by the time duration of the temporal float. By doing so, the speed values that remain for longer durations affect the average more than those that remain for shorter durations. Note that SOG is in knot, and Speed(Trip) is in m/s. The query converts both to km/h.

The query shows the first 10 ship trajectories of the 82 in the table that have a difference of more than 10 km/h. These trajectories are shown in Figure 1.4, “Ship trajectories with big difference between speed(Trip) and SOG. Again they look like noise, so we remove them with the following query

DELETE FROM Ships 
WHERE ABS(twavg(SOG) * 1.852 - twavg(speed(Trip))* 3.6 ) > 10;

Figure 1.4. Ship trajectories with big difference between speed(Trip) and SOG

Ship trajectories with big difference between speed(Trip) and SOG

Now we do a similar comparison between the calculated azimuth from the spatiotemporal trajectory, and the attribute COG:

SELECT ABS(twavg(COG) - twavg(azimuth(Trip)) * 180.0/pi()) AzimuthDifference 
FROM Ships 
WHERE ABS(twavg(COG) - twavg(azimuth(Trip)) * 180.0/pi()) > 45.0 
ORDER BY AzimuthDifference DESC;

 azimuthdifference
--------------------
  355.4200584570843
   348.213417943632
  333.7458943572906
  321.5644829906112
  309.6935360677792
  308.4444213637132
  295.5019204058323
  294.7215887580075
  267.8656764337898
 267.09343294055583
...

Here we see that the COG is not as accurate as was the case for the SOG attribute. More than 1600 trajectories have an azimuth difference bigger than 45 degrees. Figure 1.5, “Ship trajectories with big difference between azimuth(Trip) and COG visualizes them. Some of them look like noise, but some look fine. For simplicity, we keep them all.

Figure 1.5. Ship trajectories with big difference between azimuth(Trip) and COG

Ship trajectories with big difference between azimuth(Trip) and COG