A serverless Terracotta deployment on AWS Lambda

Warning

While it is possible to use Terracotta entirely within AWS’ free tier, using AWS to deploy Terracotta will probably incur some charges to your account. Make sure to check the pricing policy of all relevant services for your specific region.

Environment setup

The easiest way to deploy Terracotta to AWS Lambda is by using Zappa. Zappa takes care of packaging Terracotta and its dependencies, and creates endpoints on AWS Lambda and API Gateway for us.

See also

Zappa works best on Linux. Windows 10 users can use the Windows Subsystem for Linux to deploy Terracotta.

Assuming you alredy have Terracotta installed, follow these steps to setup a deployment environment:

  1. Create and activate a new virtual environment (here called tc-deploy), e.g. via

    $ pip install virtualenv --user
    $ virtualenv ~/envs/tc-deploy --python=python3.10
    $ source ~/envs/tc-deploy/bin/activate
    

    If you do not have Python 3.10 installed, one way to get it is via the deadsnakes PPA (on Ubuntu):

    $ sudo add-apt-repository ppa:deadsnakes/ppa
    $ sudo apt update
    $ sudo apt install python3.10-dev
    

    Alternatively, you can use pyenv or conda.

  2. Install all relevant dependencies and Terracotta via

    $ pip install -r zappa_requirements.txt
    $ pip install -e .
    

    in the root of the Terracotta repository.

  3. Install the AWS command line tools via

    $ pip install awscli
    
  4. Configure access to AWS by running

    $ aws configure
    

    This requires that you have an account on AWS and a valid IAM user with programmatic access to all relevant resources.

Make sure that you have proper access to S3 and AWS Lambda before continuing, e.g. by running

$ aws s3 ls

Optional: Setup a MySQL server on RDS

Setting up a dedicated MySQL server for your Terracotta database is slightly more cumbersome than relying on SQLite, but has some decisive advantages:

  • Removes the overhead of downloading the SQLite database.

  • The contents of the database are accessible from the outside, and ingesting additional data is more straightforward.

  • Multiple Terracotta instances can use the same database server.

To set up a MySQL server on AWS, just follow these steps:

  1. Head over to RDS and create a new MySQL instance. You can either use one of the free-tier, dedicated MySQL servers, or the AWS Aurora MySQL flavor.

    The default settings for RDS are unfortunately far from optimal for Terracotta. You should tweak them by creating a new “parameter group” and setting

    wait_timeout = 1
    max_connections = 16000
    

    Don’t forget to apply the parameter group to your RDS instance.

  2. By default, your Terracotta Lambda function will not have access to the RDS instance. To allow access, you will have to add it to the same security group and subnets as your RDS instance. You can achieve this by adding a section like this one to your zappa_settings.toml (see below):

    [development.vpc_config]
    SubnetIds = ["subnet-xxxxxxxx","subnet-yyyyyyyy", "subnet-zzzzzzzz"]
    SecurityGroupIds = ["sg-xxxxxxxxxxxxxxxxx"]
    

    You can extract the correct IDs by clicking on your RDS instance.

  3. By adding the Lambda function to a VPC, it loses access to S3. To re-enable it, go to the VPC settings and create an endpoint for the VPC of the Lambda function, pointing to AWS S3 (e.g. com.amazonaws.eu-central-1.s3).

You are now ready to continue with the following step!

Populate data storage and database

The recommended way to ingest your optimized raster files into the database is through the Terracotta Python API. To initialize your database, just run something like

>>> import terracotta as tc

>>> # for sqlite
>>> driver = tc.get_driver('tc.sqlite')

>>> # for mysql
>>> driver = tc.get_driver('mysql://user:password@hostname/database')

>>> key_names = ('type', 'date', 'band')
>>> driver.create(key_names)

You can then ingest your raster files into the database:

>>> rasters = {
...     ('index', '20180101', 'ndvi'): 'S2_20180101_NDVI.tif',
...     ('reflectance', '20180101', 'B04'): 'S2_20180101_B04.tif',
... }
>>> for keys, raster_file in rasters.items():
...     driver.insert(keys, raster_file,
...                   override_path=f's3://tc-data/rasters/{raster_file}')

Verify that everything went well by executing

>>> driver.get_datasets()
{
    ('index', '20180101', 'ndvi'): 's3://tc-data/rasters/S2_20180101_NDVI.tif',
    ('reflectance', '20180101', 'B04'): 's3://tc-data/rasters/S2_20180101_B04.tif',
}

Finally, just make sure that your raster files end up in the place where Terracotta is looking for them (the paths returned by get_datasets()). You can e.g. use the AWS CLI:

$ aws s3 sync /path/to/rasters s3://tc-data/rasters
$ aws s3 cp /path/to/tc.sqlite s3://tc-data/tc.sqlite # if using sqlite

To verify whether everything went well, you can start a local Terracotta server:

$ terracotta serve s3://tc-data/tc.sqlite
$ terracotta connect localhost:5000

Deploy via Zappa

The Terracotta repository contains a template with sensible default values for most Zappa settings:

zappa_settings.toml.in
[development]
app_function = "terracotta.server.app.app"
aws_region = "eu-central-1"
profile_name = "default"
project_name = "tc-test"
runtime = "python3.9"
s3_bucket = "zappa-teracotta-dev"
exclude = [
    "*.gz",
    "*.rar",
    "boto3*",
    "botocore*",
    "s3transfer*",
    "awscli*",
    ".mypy_cache",
    ".pytest_cache",
    ".eggs"
]
debug = true
log_level = "DEBUG"
xray_tracing = true
touch_path = "/keys"

timeout_seconds = 30
memory_size = 500

    [development.aws_environment_variables]
    TC_DRIVER_PATH = "s3://tc-testdata/terracotta.sqlite"
    TC_DRIVER_PROVIDER = "sqlite-remote"
    TC_REPROJECTION_METHOD = "linear"
    TC_RESAMPLING_METHOD = "average"
    TC_XRAY_PROFILE = "true"

    [development.callbacks]
    settings = "zappa_settings_callback.check_integrity"


[production]
extends = "development"

# WARNING: using a cache cluster incurs additional costs (not covered by AWS free tier)
cache_cluster_enabled = true
cache_cluster_size = 0.5
cache_cluster_ttl = 3600

debug = false
log_level = "WARNING"
xray_tracing = true

Copy or rename zappa_settings.toml.in to zappa_settings.toml and insert the correct path to your Terracotta database into the environment variables. To execute the deployment, run

$ source ~/envs/tc-deploy/bin/activate
$ zappa deploy development

Congratulations, your Terracotta instance should now be reachable! You can verify the deployment via terracotta connect.