Distributed Data Infrastructure

This page describes the data management services in the LEXIS Platform. The component that ties all of these services together is called the Distributed Data Infrastructure (DDI). The DDI is responsible for managing data and metadata in the LEXIS Platform. It provides a set of APIs for data upload, download, staging, transfer, and metadata management.

The DDI handles all data in the form of datasets. A dataset can contain an entire tree of files and folders. Each dataset has a set of metadata values that are indexed in an OpenSearch instance and stored as iRODS metadata.
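The dataset concept can be illustrated with a minimal Python sketch. It is only an in-memory model for explanation, not the DDI's actual schema: the class name `Dataset`, the fields `dataset_id`, `files` and `metadata`, and the example values are all assumptions made for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Dataset:
    """Hypothetical model of a DDI dataset (illustrative, not the real schema)."""

    dataset_id: str
    # Relative paths of the files in the dataset; folders are implied by the paths.
    files: list[PurePosixPath] = field(default_factory=list)
    # Metadata values; in the DDI these are indexed in OpenSearch
    # and stored as iRODS metadata.
    metadata: dict[str, str] = field(default_factory=dict)

    def folders(self) -> set[PurePosixPath]:
        """Return every folder implied by the file paths."""
        result: set[PurePosixPath] = set()
        for path in self.files:
            for parent in path.parents:
                # Skip the dataset root itself, which pathlib reports as ".".
                if parent != PurePosixPath("."):
                    result.add(parent)
        return result


# Example values are made up for illustration.
example = Dataset(
    dataset_id="example-dataset",
    files=[
        PurePosixPath("input/mesh.dat"),
        PurePosixPath("input/config/run.yaml"),
    ],
    metadata={"title": "Example dataset"},
)
```

Here `example.folders()` yields the two folders `input` and `input/config`, showing how a single dataset carries both a file tree and its own metadata values.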

This section describes the services that implement these APIs and how they are deployed at the individual locations.

Services in Distributed Data Infrastructure

The services and APIs in the DDI are divided into two groups:

  • Services deployed in the LEXIS Platform Core
    • Metadata API

    • Transfer API including Transfer Worker

    • Staging API

    • Synchronisation script

    • OpenSearch and additional services (e.g. Redis, PostgreSQL)

  • Services deployed on the target sites
    • iRODS zone

    • Staging worker

The DDI is built around locations. Each location represents a system connected to the platform. The platform supports several types of locations:

  • HPC cluster over SSH (SFTP)

  • NFS / local POSIX

  • iRODS zones
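
The location types listed above can be modelled as a small enumeration. The following Python sketch is illustrative only: the names `LocationType`, `Location` and `group_by_type`, as well as the example location names, are hypothetical and do not reflect the DDI's real configuration format or API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class LocationType(Enum):
    """Location types named in this page (identifiers are hypothetical)."""

    SSH_SFTP = "HPC cluster over SSH (SFTP)"
    POSIX = "NFS / local POSIX"
    IRODS = "iRODS zone"


@dataclass(frozen=True)
class Location:
    """A system connected to the platform (illustrative model only)."""

    name: str
    type: LocationType


def group_by_type(locations: list[Location]) -> dict[LocationType, list[str]]:
    """Group location names by their type, preserving input order."""
    grouped: dict[LocationType, list[str]] = {}
    for location in locations:
        grouped.setdefault(location.type, []).append(location.name)
    return grouped


# Example location names are made up for illustration.
locations = [
    Location("cluster-a", LocationType.SSH_SFTP),
    Location("zone-a", LocationType.IRODS),
    Location("cluster-b", LocationType.SSH_SFTP),
]
grouped = group_by_type(locations)
```

Modelling the type as an enumeration keeps the set of supported location kinds closed and explicit, so code that dispatches on the type can be checked against all three cases.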