Get started

Installation

On most systems, the easiest way to install Terracotta is through the Conda package manager. Just install conda, clone the repository, and execute the following command to create a new environment containing all dependencies and Terracotta:

$ conda env create -f environment.yml

If you already have Python 3.8 (or above) installed, you can just run

$ pip install -e .

in the root of the Terracotta repository instead.

See also

If you are using Windows 10 and find yourself struggling with installing Terracotta, check out our Windows 10 installation guide!

Usage in a nutshell

The simplest way to use Terracotta is to cycle through the following commands:

  1. terracotta optimize-rasters to pre-process your raster files;

  2. terracotta ingest to create a database;

  3. terracotta serve to spawn a server; and

  4. terracotta connect to connect to this server.

The following sections guide you through these steps in more detail.

Data exploration through Terracotta

If you have some raster files lying around (e.g. in GeoTiff format), you can use Terracotta to serve them up.

Note

Terracotta benefits greatly from the cloud-optimized GeoTiff format. If your raster files are not cloud-optimized or you are unsure, you can preprocess them with terracotta optimize-rasters.

Assume you are in a folder containing some files named with the pattern S2A_<date>_<band>.tif. You can start a Terracotta server via

$ terracotta serve -r {}_{date}_{band}.tif

which will serve your data at http://localhost:5000. Try opening endpoints such as http://localhost:5000/keys or http://localhost:5000/datasets in your browser and see what happens.

Because it is cumbersome to explore a Terracotta instance by manually constructing URLs, we have built a tool that lets you inspect it interactively:

$ terracotta connect localhost:5000

If you did everything correctly, a new window should open in your browser that lets you explore the dataset.

Creating a raster database

For Terracotta to perform well, it is important that some metadata, such as the extent of your datasets or the range of their values, is computed and ingested into a database. There are two ways to populate this metadata store:

1. Through the CLI

A simple but limited way to build a database is to use terracotta ingest. All you need to do is to point Terracotta to a folder of (cloud-optimized) GeoTiffs:

$ terracotta ingest \
     /path/to/gtiffs/{sensor}_{name}_{date}_{band}.tif \
     -o terracotta.sqlite

This will create a new database with the keys sensor, name, date, and band (in this order), and ingest all files matching the given pattern into it.

For available options, see

$ terracotta ingest --help

Note: The CLI ingest command relies on naming conventions to match files against the specified key pattern. Matched values are restricted to alphanumerics (i.e. letters and numbers); any other character (e.g. the _ in the example above, but also -, +, etc.) is treated as a separator between keys. So if your filename looks like sar_2019-06-24.tif, the pattern {sensor}_{date}.tif will not match.

Alternatives include renaming the files (e.g. to sar_20190624.tif), using a different pattern (e.g. {sensor}_{year}-{month}-{day}.tif), or using the Python API instead of the CLI to perform the ingest.
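To see why the dashed filename fails, the matching rule can be sketched with the standard library. This is an illustrative approximation of how a key pattern translates into a regular expression, not Terracotta's actual implementation: each {key} placeholder matches alphanumerics only, and everything else is taken literally.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Approximate the translation of a key pattern such as
    '{sensor}_{date}.tif' into a regular expression: each {key}
    becomes a named group matching alphanumerics only."""
    regex = ""
    pos = 0
    for placeholder in re.finditer(r"{(\w+)}", pattern):
        # everything between placeholders is matched literally
        regex += re.escape(pattern[pos:placeholder.start()])
        regex += f"(?P<{placeholder.group(1)}>[a-zA-Z0-9]+)"
        pos = placeholder.end()
    regex += re.escape(pattern[pos:])
    return re.compile(regex + "$")


pattern = pattern_to_regex("{sensor}_{date}.tif")

# the dashes inside the date are treated as key separators, so no match
assert pattern.match("sar_2019-06-24.tif") is None

# an all-alphanumeric date matches fine
match = pattern.match("sar_20190624.tif")
assert match is not None and match.group("date") == "20190624"
```

Renaming the file or adding extra keys for year, month, and day both sidestep the restriction by keeping every matched value purely alphanumeric.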

2. Using the Python API

Terracotta’s driver API gives you fine-grained control over ingestion and retrieval. Metadata can be computed at three different times:

  1. Automatically during a call to driver.insert (fine for most applications);

  2. Manually using driver.compute_metadata (in case you want to decouple computation and IO, or if you want to attach additional metadata); or

  3. On demand when a dataset is requested for the first time (this is what we want to avoid through ingestion).

A first ingestion script using the Python API

The following script first defines three variables that we need to work with a terracotta database:

  • the database filename

  • the metadata keys that we will use

  • the metadata key-values and paths to raster files that we want to insert

Then it uses the Python API to connect to the database and insert the metadata.

example-first-ingestion-script.py
import os
from typing import Dict, List

import terracotta

# Define the location of the SQLite database
# (this will be created if it doesn't already exist)
DB_NAME = "./terracotta.sqlite"

# Define the list of keys that will be used to identify datasets.
# (these must match the keys of the "key_values" dicts defined in
# RASTER_FILES)
KEYS = ["type", "rp", "rcp", "epoch", "gcm"]

# Define a list of raster files to import
# (this is a list of dictionaries, each with a file path and the
# values for each key - make sure the order matches the order of
# KEYS defined above)
#
# This part of the script could be replaced with something that
# makes sense for your data - it could use a glob expression to
# find all TIFFs and a regular expression pattern to extract the
# key values, or it could read from a CSV, or use some other
# reference or metadata generating process.
RASTER_FILES = [
    {
        "key_values": {
            "type": "river",
            "rp": 250,
            "rcp": 4.5,
            "epoch": 2030,
            "gcm": "NorESM1-M",
        },
        "path": "./data/river__rp_250__rcp_4x5__epoch_2030__gcm_NorESM1-M.tif",
    },
    {
        "key_values": {
            "type": "river",
            "rp": 500,
            "rcp": 8.5,
            "epoch": 2080,
            "gcm": "NorESM1-M",
        },
        "path": "./data/river__rp_500__rcp_8x5__epoch_2080__gcm_NorESM1-M.tif",
    },
]


def load(db_name: str, keys: List[str], raster_files: List[Dict]):
    # get a TerracottaDriver that we can use to interact with
    # the database
    driver = terracotta.get_driver(db_name)

    # create the database file if it doesn't exist already
    if not os.path.isfile(db_name):
        driver.create(keys)

    # check that the database has the same keys that we want
    # to load
    assert list(driver.key_names) == keys, (driver.key_names, keys)

    # connect to the database
    with driver.connect():
        # insert metadata for each raster into the database
        for raster in raster_files:
            driver.insert(raster["key_values"], raster["path"])


if __name__ == "__main__":
    load(DB_NAME, KEYS, RASTER_FILES)

Download the script

You could start adapting this script by changing the variable definitions to match the data you want to ingest. A next step could be to read the metadata from a CSV or parse it from the filenames of the raster data.
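As a sketch of that next step, the following snippet parses key values out of filenames that follow the double-underscore naming scheme used in the script above. The pattern and type conversions are assumptions tailored to that example scheme; adapt them to your own naming convention.

```python
import re
from pathlib import Path

# Matches filenames like the ones in the script above, e.g.
# "river__rp_250__rcp_4x5__epoch_2030__gcm_NorESM1-M.tif"
FILENAME_PATTERN = re.compile(
    r"(?P<type>[a-zA-Z]+)"
    r"__rp_(?P<rp>\d+)"
    r"__rcp_(?P<rcp>\d+x\d+)"
    r"__epoch_(?P<epoch>\d+)"
    r"__gcm_(?P<gcm>[\w-]+)"
    r"\.tif$"
)


def parse_raster_file(path: str) -> dict:
    """Build a {"key_values": ..., "path": ...} entry from a filename."""
    match = FILENAME_PATTERN.match(Path(path).name)
    if match is None:
        raise ValueError(f"{path} does not match the expected naming scheme")
    key_values = match.groupdict()
    # convert to the value types used in RASTER_FILES above
    key_values["rp"] = int(key_values["rp"])
    key_values["rcp"] = float(key_values["rcp"].replace("x", "."))
    key_values["epoch"] = int(key_values["epoch"])
    return {"key_values": key_values, "path": path}


entry = parse_raster_file(
    "./data/river__rp_250__rcp_4x5__epoch_2030__gcm_NorESM1-M.tif"
)
assert entry["key_values"]["rp"] == 250
assert entry["key_values"]["rcp"] == 4.5
```

Combined with a glob over your data directory, this replaces the hard-coded RASTER_FILES list entirely.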

A more advanced ingestion script using the Python API

The following script populates a database with raster files located in a local directory. It extracts the appropriate keys from each file name, ingests the rasters into a database, and uploads the rasters and the resulting database to an S3 bucket.

example-ingestion-script.py
#!/usr/bin/env python3

import os
import re
import glob

import boto3
import tqdm

import terracotta as tc

s3 = boto3.resource('s3')

# settings
DB_NAME = 'terracotta.sqlite'
RASTER_GLOB = r'/path/to/rasters/*.tif'
RASTER_NAME_PATTERN = r'(?P<sensor>\w{2})_(?P<tile>\w{5})_(?P<date>\d{8})_(?P<band>\w+).tif'
KEYS = ('sensor', 'tile', 'date', 'band')
KEY_DESCRIPTIONS = {
    'sensor': 'Sensor short name',
    'tile': 'Sentinel-2 tile ID',
    'date': 'Sensing date',
    'band': 'Band or index name'
}
S3_BUCKET = 'tc-testdata'
S3_RASTER_FOLDER = 'rasters'
S3_PATH = f's3://{S3_BUCKET}/{S3_RASTER_FOLDER}'

driver = tc.get_driver(DB_NAME)

# create an empty database if it doesn't exist
if not os.path.isfile(DB_NAME):
    driver.create(KEYS, KEY_DESCRIPTIONS)

# sanity check
assert driver.key_names == KEYS

available_datasets = driver.get_datasets()
raster_files = list(glob.glob(RASTER_GLOB))
pbar = tqdm.tqdm(raster_files)

for raster_path in pbar:
    pbar.set_postfix(file=raster_path)

    raster_filename = os.path.basename(raster_path)

    # extract keys from filename
    match = re.match(RASTER_NAME_PATTERN, raster_filename)
    if match is None:
        raise ValueError(f'Input file {raster_filename} does not match raster pattern')

    keys = match.groups()

    # skip already processed data
    if keys in available_datasets:
        continue

    with driver.connect():
        # since the rasters will be served from S3, we need to pass the correct remote path
        driver.insert(keys, raster_path, override_path=f'{S3_PATH}/{raster_filename}')
        s3.meta.client.upload_file(raster_path, S3_BUCKET,
                                   f'{S3_RASTER_FOLDER}/{raster_filename}')

# upload database to S3
s3.meta.client.upload_file(DB_NAME, S3_BUCKET, DB_NAME)

Download the script

Note

The above scripts are just examples to show you some capabilities of the Terracotta Python API. More sophisticated solutions could e.g. attach additional metadata to database entries, process many rasters in parallel, or accept parameters from the command line.
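To illustrate the parallelization idea, here is a minimal stdlib sketch. ingest_one is a hypothetical stand-in for the per-raster work from the scripts above (metadata computation, insertion, upload), not part of the Terracotta API; also note that a single SQLite driver connection should not be shared across workers.

```python
from concurrent.futures import ThreadPoolExecutor


def ingest_one(raster_path: str) -> str:
    # Hypothetical stand-in for the per-raster work from the scripts
    # above (compute metadata, insert into the database, upload to S3).
    # Each worker should open its own driver connection.
    return raster_path.upper()  # stand-in result


raster_files = ["a.tif", "b.tif", "c.tif"]

# metadata computation and uploads are largely I/O-bound, so a thread
# pool is a reasonable first choice
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(ingest_one, raster_files))

assert results == ["A.TIF", "B.TIF", "C.TIF"]
```

For CPU-heavy metadata computation, a ProcessPoolExecutor with the same interface may scale better.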

Serving data from a raster database

After creating a database, you can use terracotta serve to serve the rasters inserted into it:

$ terracotta serve -d /path/to/database.sqlite

To explore the server, you can once again use terracotta connect:

$ terracotta connect localhost:5000

However, the server spawned by terracotta serve is intended for development and data exploration only. For sophisticated production deployments, have a look at our tutorials.

If you are unsure which kind of deployment to choose, we recommend trying out a serverless deployment on AWS Lambda, via the remote SQLite driver.