iRODS zone
iRODS zones are currently the main system for storing data and their metadata in the platform.
The Integrated Rule-Oriented Data System (iRODS) abstracts physical data storage systems into datasets and collections, which resemble ordinary files and folders. It also implements a custom binary data transfer protocol that uses multiple parallel TCP streams to move data between instances; these transfers are encrypted using TLS. Each zone uses an SQL database to store the hierarchy of datasets and collections together with their metadata.
The basic unit of an iRODS deployment is a zone, which has its own set of users and storage assigned on a local storage array. Each zone maintains its own tree of datasets and collections with their metadata, and enforces its own access rules. Multiple zones can form a federation and use the same parallel transfer protocol to move data between them. Each location providing storage resources to the platform is therefore assumed to deploy its own iRODS zone and to federate with others as needed.
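Because each zone maintains its own tree, an absolute iRODS logical path starts with the zone name, which is how an object in a federated zone is told apart from a local one. The following minimal sketch illustrates this; the zone names and the helper functions zone_of and is_remote are hypothetical and not part of iRODS or the LEXIS APIs.

```python
from pathlib import PurePosixPath


def zone_of(logical_path: str) -> str:
    """Return the zone name, i.e. the first component of an absolute logical path."""
    parts = PurePosixPath(logical_path).parts
    # An absolute path splits into ("/", "<zone>", ...); anything else is rejected.
    if len(parts) < 2 or parts[0] != "/":
        raise ValueError(f"not an absolute logical path with a zone: {logical_path!r}")
    return parts[1]


def is_remote(logical_path: str, local_zone: str) -> bool:
    """True if the path belongs to a zone other than the local one."""
    return zone_of(logical_path) != local_zone


# Hypothetical zone names, for illustration only.
print(zone_of("/zoneA/home/alice/data.csv"))
print(is_remote("/zoneB/home/bob/run1", local_zone="zoneA"))
```

A check of this kind only classifies paths; whether a remote path is actually reachable depends on the federation being configured between the two zones.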
Integration with LEXIS
The LEXIS Platform builds a set of APIs that use iRODS to store and transfer data and their metadata. These APIs partly build on work by EUDAT, mainly the services B2SAFE, B2HANDLE and B2STAGE. From EUDAT, the platform mainly uses two capabilities: obtaining a PID for a dataset through B2HANDLE, and traceable replication between remote locations.
Persistent unique identifiers (PIDs)
PIDs play a crucial role in the FAIR data management principles and in any sustainable data management plan. Within EUDAT systems, data can be addressed directly (and then, for example, retrieved) via B2HANDLE PIDs, and a few items of “key metadata” are stored directly in each PID entry. In LEXIS, the B2HANDLE client is deployed on the iCAT servers as a Python library. PIDs can thus be assigned to any object or collection within iRODS, which helps make the public results of LEXIS Workflows “FAIR”.
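B2HANDLE PIDs are Handle System identifiers of the form prefix/suffix, which can be resolved through the public proxy at hdl.handle.net. As a rough illustration of how such a PID addresses data, the sketch below splits a handle and builds its resolver URL. It does not use the B2HANDLE library itself; the helper names and the example handle are made up for illustration.

```python
from urllib.parse import quote


def split_handle(handle: str) -> tuple:
    """Split a handle into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    return prefix, suffix


def resolver_url(handle: str, proxy: str = "https://hdl.handle.net") -> str:
    """Build the URL under which the proxy resolves the handle."""
    split_handle(handle)  # validate the shape before building the URL
    return f"{proxy}/{quote(handle, safe='/')}"


# Hypothetical handle; real prefixes are assigned to the issuing organisation.
print(resolver_url("21.T12345/example-dataset"))
```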
Managing users and projects through the iRODS-Keycloak syncing mechanism
One current limitation of iRODS is that its API cannot directly accept OpenID/JWT tokens as a means of authentication. Token-based authentication is therefore realised via a token broker and user metadata stored in the iRODS user account. Each user account in an iRODS zone corresponds to a user of the LEXIS Platform and contains the subject ID (SID) of that user as stored in the LEXIS Platform Keycloak. Using this mechanism, calls to the iRODS API can be authenticated by tokens issued by the LEXIS AAI service.
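The lookup step of such a broker can be sketched as follows: read the subject from the token and find the iRODS account that carries the same SID. This is a simplified illustration, not the LEXIS token broker. It assumes the SID corresponds to the standard JWT "sub" claim, it uses a hypothetical in-memory mapping in place of the metadata stored in iRODS, and it deliberately omits signature verification, which a real broker must perform before trusting any claim.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature (sketch only)."""
    try:
        _header, payload, _signature = token.split(".")
    except ValueError:
        raise ValueError("expected a three-part JWT") from None
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def irods_user_for(token: str, sid_to_user: dict) -> str:
    """Map the token's subject to an iRODS user name via a SID lookup table."""
    sid = jwt_claims(token)["sub"]  # assumption: the SID is the "sub" claim
    try:
        return sid_to_user[sid]
    except KeyError:
        raise LookupError(f"no iRODS account carries SID {sid!r}") from None
```

In this sketch sid_to_user stands in for the SID metadata attached to each iRODS user account; how the actual broker queries that metadata is not specified here.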
This mechanism relies on a periodic synchronisation of LEXIS projects and their associated users to each zone, according to the zone's assignment to particular LEXIS Projects. A synchronisation script queries the Keycloak API and synchronises the projects and users. For projects, it maintains and enforces a structure of collections (folders) in each iRODS zone; for users, it keeps the user accounts and their SIDs synchronised in each associated zone. This allows access rights to be maintained across the federation according to the user’s role in each project.
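The user half of such a synchronisation amounts to reconciling two views of the same accounts. The sketch below computes what would need to change in a zone, given the users reported by Keycloak and those currently present in the zone. It is an illustration under assumptions, not the LEXIS script: both inputs are plain dictionaries from user name to SID, and SyncPlan and plan_user_sync are hypothetical names. Fetching the data from Keycloak and applying the plan to iRODS are left out.

```python
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    create: dict = field(default_factory=dict)      # user name -> SID to create
    update_sid: dict = field(default_factory=dict)  # user name -> corrected SID
    remove: set = field(default_factory=set)        # user names no longer assigned


def plan_user_sync(keycloak_users: dict, zone_users: dict) -> SyncPlan:
    """Compare the desired state (Keycloak) with the actual state (zone)."""
    plan = SyncPlan()
    for name, sid in keycloak_users.items():
        if name not in zone_users:
            plan.create[name] = sid
        elif zone_users[name] != sid:
            plan.update_sid[name] = sid
    # Accounts present in the zone but absent from Keycloak are candidates for removal.
    plan.remove = set(zone_users) - set(keycloak_users)
    return plan


# Hypothetical data, for illustration only.
plan = plan_user_sync(
    keycloak_users={"alice": "sid-1", "bob": "sid-2"},
    zone_users={"bob": "sid-old", "carol": "sid-3"},
)
print(plan)
```

Computing a plan first and applying it afterwards keeps each periodic run idempotent: running it again on an already synchronised zone yields an empty plan.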