iRODS zone
Currently, iRODS zones are the main system for storing data and its metadata in the platform.
The Integrated Rule-Oriented Data System (iRODS) abstracts physical data storage systems into datasets and collections, which resemble ordinary files and folders. It also implements a custom binary data transfer protocol that uses multiple parallel TCP streams to transfer data between instances. Each zone uses an SQL server to store the hierarchy of datasets/collections and their metadata. Data transfers are encrypted using TLS.
The basic iRODS deployment unit is a so-called zone, which has its own set of users and storage assigned on a local storage array. Each zone maintains its own tree of datasets/collections and their metadata and enforces its own access rules. Multiple zones can set up a federation and use the same parallel transfer protocol to move data between them. Therefore, it is assumed that each location providing storage resources to the platform deploys its own iRODS zone and federates with others as needed.
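Because each zone owns its own tree, an absolute iRODS logical path starts with the name of the zone it belongs to. The following minimal sketch shows how a client could tell local from federated paths. The zone names `siteA` and `siteB` and the helper names are hypothetical and are not part of the platform.

```python
from typing import NamedTuple


class LogicalPath(NamedTuple):
    zone: str      # first path component names the owning zone
    relative: str  # remainder of the path inside that zone


def split_logical_path(path: str) -> LogicalPath:
    """Split an absolute iRODS logical path into its zone and the rest."""
    if not path.startswith("/"):
        raise ValueError(f"not an absolute logical path: {path!r}")
    parts = path.strip("/").split("/", 1)
    if not parts[0]:
        raise ValueError("path does not name a zone")
    return LogicalPath(zone=parts[0], relative=parts[1] if len(parts) > 1 else "")


def is_remote(path: str, local_zone: str) -> bool:
    """True if the path belongs to a federated zone rather than the local one."""
    return split_logical_path(path).zone != local_zone


# Hypothetical usage: a client attached to zone "siteA".
print(is_remote("/siteB/projects/demo/data.bin", local_zone="siteA"))
```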
Reasons for choosing iRODS
When designing the data management system for the LEXIS Platform and choosing the right data back-end technology, several requirements had to be fulfilled. These included:
Unified access to LEXIS data with file-system-like semantics
Reliability and redundancy
Support for diverse storage back-end systems
Support for the LEXIS AAI
Support for storage policies, for example selective data mirroring
Support for metadata and persistent identifiers in the system
Support for system access via REST APIs
Excellent open systems in this sector include, for example, iRODS, Onedata, Rucio and dCache. However, the best-fitting system for the LEXIS Platform was iRODS. It stands out for its intuitive file-system-like view on all data, its flexibility in storage and mirroring policies and in the metadata it stores, its high-availability setup, its support for various storage back-ends, the range of iRODS clients available and, most of all, its integration in feature-rich European projects.
Managing users and projects through iRODS-Keycloak syncing mechanism
In the previous version, one limitation of the iRODS system was its inability to use OpenID/JWT tokens directly as a means of authentication to its API. Authentication was instead realised via a token broker and user metadata stored in the user account.
In the latest version of the DDI, the iRODS HTTP API service is integrated, which supports OpenID/JWT authentication natively. This allows every call to the API to be authenticated directly, without an additional broker service. The HTTP API provides a unified RESTful interface to the iRODS system that can be used by various clients, such as the DDI, the portal, and others.
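As an illustration of direct token authentication, the sketch below prepares a request that lists a collection and passes the token in the `Authorization` header. It is a sketch under assumptions rather than the platform's client code: the base URL is a placeholder, and the `collections` endpoint with the `op=list` and `lpath` parameters follows the public iRODS HTTP API documentation as far as we are aware, so it should be checked against the deployed version.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_list_request(base_url: str, token: str, logical_path: str) -> Request:
    """Prepare (but do not send) a GET request that lists a collection.

    base_url is the versioned root of the HTTP API service; its exact
    form depends on the deployment and is an assumption here.
    """
    # Endpoint and parameter names are assumptions to verify per deployment.
    query = urlencode({"op": "list", "lpath": logical_path})
    return Request(
        f"{base_url.rstrip('/')}/collections?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Hypothetical usage; host, version segment and token are placeholders.
req = build_list_request(
    "https://irods.example.org/irods-http-api/VERSION",
    token="eyJ...",
    logical_path="/siteA/projects",
)
print(req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. No broker call is needed beforehand, because the token travels with each request.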
However, iRODS still requires a mechanism to periodically synchronise the LEXIS projects and their associated users across zones, based on each user's assignment to a specific LEXIS project. A synchronisation script queries the UserOrg service and synchronises the projects and users accordingly.
For data resources, it maintains and enforces a structure of collections (folders) in each iRODS zone. For users, it keeps the user accounts synchronised in each associated zone. For projects, it maintains iRODS groups to which the users are assigned. A project is assigned to a data resource via group ACL permissions on the collections. This approach makes it possible to maintain access rights across the federation according to the users’ roles in each project.
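The user and group part of such a synchronisation can be sketched as a comparison between the desired state (projects and their members, as reported by UserOrg) and the state found in one zone. This is a minimal sketch and not the actual script: the `ZoneState` type, the action names and the shape of the input data are all hypothetical, and the handling of collections, ACLs and stale accounts is left out.

```python
from dataclasses import dataclass, field


@dataclass
class ZoneState:
    """Users and group memberships as they currently exist in one zone."""
    users: set[str] = field(default_factory=set)
    groups: dict[str, set[str]] = field(default_factory=dict)  # group -> members


def plan_sync(desired: dict[str, set[str]], zone: ZoneState) -> list[tuple[str, ...]]:
    """Return the ordered actions that bring `zone` in line with `desired`.

    `desired` maps each project (one iRODS group) to its set of user names.
    """
    actions: list[tuple[str, ...]] = []
    # Users must exist before they can be added to a group.
    wanted_users = set().union(*desired.values())
    for user in sorted(wanted_users - zone.users):
        actions.append(("create_user", user))
    for project in sorted(desired):
        members = desired[project]
        if project not in zone.groups:
            actions.append(("create_group", project))
        current = zone.groups.get(project, set())
        for user in sorted(members - current):
            actions.append(("add_to_group", project, user))
        for user in sorted(current - members):
            actions.append(("remove_from_group", project, user))
    return actions


# Hypothetical example: bob joined proj_a, carol left it.
desired = {"proj_a": {"alice", "bob"}}
zone = ZoneState(users={"alice", "carol"}, groups={"proj_a": {"alice", "carol"}})
for action in plan_sync(desired, zone):
    print(action)
```

Computing a plan first and applying it afterwards keeps the run idempotent: once a zone matches the desired state, the plan is empty and a periodic run changes nothing.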
Tracking data resources across iRODS zones
In version 2.4.0 of the DDI, a new iRODS zone structure was introduced. Each data resource, representing storage allocated to a computational project, is mapped to its own iRODS collection.
This structure enables tracking of individual data resources, allowing us to notify users when a resource reaches its capacity limit. In addition, because LEXIS projects are represented as iRODS groups, data resources can be easily shared between LEXIS projects when needed.
Overall, this mechanism reflects real-world computational allocations and storage resources within the iRODS zone.
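Since every data resource has its own collection, a capacity check reduces to comparing the size of that collection with the allocated storage. The sketch below illustrates the idea under stated assumptions: the `DataResource` fields, the 90 % threshold and the example paths are hypothetical, and how usage figures are gathered from iRODS is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataResource:
    collection: str      # logical path of the resource's collection
    used_bytes: int      # total size of the data stored below it
    capacity_bytes: int  # storage allocated to the computational project


def resources_over_threshold(resources, threshold=0.9):
    """Return (collection, fill ratio) for resources at or above `threshold`."""
    flagged = []
    for res in resources:
        if res.capacity_bytes <= 0:
            continue  # no allocation recorded, nothing to compare against
        ratio = res.used_bytes / res.capacity_bytes
        if ratio >= threshold:
            flagged.append((res.collection, ratio))
    return flagged


# Hypothetical example with two data resources in zone "siteA".
resources = [
    DataResource("/siteA/resources/res-001", used_bytes=95, capacity_bytes=100),
    DataResource("/siteA/resources/res-002", used_bytes=10, capacity_bytes=100),
]
for collection, ratio in resources_over_threshold(resources):
    print(f"{collection} is {ratio:.0%} full")
```

The users to notify would be the members of the iRODS groups that hold permissions on the flagged collection.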