Version: 2.0

How it works

Introduction

The main purpose of the Data Manager (rs-dam microservice) is to populate the REGARDS meta catalog. The microservice is composed of several modules: the crawler module, the dam module (main core), the indexer module, the model module and the opensearch module.

To do so, this microservice uses:

  • Data sources to retrieve products from several catalogs. Data sources (AIP, GeoJSON, external database) are configured through a UI or an XML file (see REGARDS UI).
  • Data models to transform and standardize crawled products before adding them to the meta catalog, which is represented by the Elasticsearch index.
  • Data access rights to calculate the access rights of each product in the meta catalog. Access rights are the permissions granted to a group of users for accessing the set of products that constitutes a dataset (see REGARDS UI).

This microservice is required to expose products managed by the OAIS Products Manager (rs-ingest microservice), the GeoJson Products Manager (rs-fem microservice), or products accessible from an external database or a web service.

Each Elasticsearch index stores products for each project or tenant created in the REGARDS application.

Meta catalog population

A scheduler is launched to iterate through each configured datasource, and Data Management performs the following tasks:

  1. Retrieve new products from the catalog: Using an implementation of the IDataSourcePlugin interface, the system retrieves new products and transforms the datasource-specific product format into the REGARDS standard format.

  2. Insert or update products in the Elasticsearch index: New or updated products are inserted into or modified within the appropriate Elasticsearch index.

  3. Update access groups for products: If needed, access groups are updated for products using an implementation of the IDataObjectAccessFilterPlugin interface to apply any necessary product filtering.

  4. Compute calculated attributes: If the data model contains calculated attributes, these are computed using an implementation of the IComputedAttribute interface.

info

Product harvesting is sequential: data sources are processed one after another.

Crawling is performed sequentially. The execution frequency is configurable by the user (see Monitoring UI). The system determines which datasource REGARDS ingests next.
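The four tasks above can be sketched as a single sequential loop. This is only an illustrative sketch in Python, not the rs-dam implementation: `crawl_all`, the plain dictionary standing in for the Elasticsearch index, and the callables standing in for the IDataSourcePlugin, IDataObjectAccessFilterPlugin and IComputedAttribute extension points are all hypothetical names.

```python
from typing import Callable, Dict, Iterable, List

Product = Dict[str, object]


def crawl_all(
    datasources: Dict[str, Callable[[], Iterable[Product]]],
    index: Dict[str, Product],
    access_filter: Callable[[Product], List[str]],
    computed_attributes: Dict[str, Callable[[Product], object]],
) -> List[str]:
    """Process each datasource in turn and return the order in which they ran."""
    processed = []
    # Sequential crawling: one datasource after another, never in parallel.
    for name, fetch in datasources.items():
        for product in fetch():                      # 1. retrieve products (already standardized)
            index[str(product["id"])] = product      # 2. insert or update in the index
            product["groups"] = access_filter(product)             # 3. update access groups
            for attribute, compute in computed_attributes.items():  # 4. computed attributes
                product[attribute] = compute(product)
        processed.append(name)
    return processed


# Hypothetical datasources and plugins, for illustration only.
index: Dict[str, Product] = {}
order = crawl_all(
    {
        "aip": lambda: [{"id": "a1", "size": 2}],
        "db": lambda: [{"id": "d1", "size": 3}],
    },
    index,
    access_filter=lambda product: ["public"],
    computed_attributes={"size_doubled": lambda product: product["size"] * 2},
)
```

The stored product is updated in place, so steps 3 and 4 are visible in the index without a second write. A real indexer would issue update requests instead.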

Retrieve new products from data sources

To manage different data sources, an extension point (see the implementations of the IDataSourcePlugin interface) handles the specific requirements for loading products in the REGARDS format:

  • Products: A DATA Entity, as defined by a model in REGARDS
  • A set of products: A DATASET Entity, as defined by a model in REGARDS

The following types of crawlers are available:

  • AIP Crawlers: These crawlers allow crawling of SIPs from the rs-ingest microservice. Incremental ingestion uses the last data update.
  • Feature Crawlers: These crawlers allow crawling of features from the rs-fem microservice. Incremental ingestion uses the last data update.
  • Database Crawlers: These crawlers allow crawling of data from an external database, with the following modes:
    • Non-incremental ingestion (not recommended)
    • Incremental ingestion based on the last data update
    • Incremental ingestion based on the data identifier

  • Web Source Crawlers: These crawlers allow crawling of data from an OpenSearch web source. Incremental ingestion is based on the last data update date.

The user selects the incremental ingestion mode during datasource creation.
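The difference between the ingestion modes can be illustrated with a small selection function. This is a sketch under stated assumptions: the row layout (`id`, `last_update`), the mode names and the `select_rows` helper are hypothetical and do not come from REGARDS.

```python
from datetime import date
from typing import Dict, Iterable, List


def select_rows(rows: Iterable[Dict], mode: str, cursor=None) -> List[Dict]:
    """Return the rows a crawl would pick up for the given ingestion mode.

    mode "none":        non-incremental, every row is read again.
    mode "last_update": rows updated strictly after the cursor date.
    mode "identifier":  rows whose identifier is strictly greater than the cursor.
    """
    if mode == "none" or cursor is None:
        return list(rows)
    key = {"last_update": "last_update", "identifier": "id"}[mode]
    return [row for row in rows if row[key] > cursor]


rows = [
    {"id": 1, "last_update": date(2024, 1, 1)},
    {"id": 2, "last_update": date(2024, 1, 5)},
    {"id": 3, "last_update": date(2024, 1, 3)},
]
by_id = [row["id"] for row in select_rows(rows, "identifier", 1)]
by_date = [row["id"] for row in select_rows(rows, "last_update", date(2024, 1, 4))]
everything = select_rows(rows, "none")
```

In this sketch, the identifier mode never revisits rows that were already ingested, whereas the date mode picks up any row modified since the cursor.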

The configuration of the extension point plugin can be used to define, as needed, the type of ingestion, the data source refresh rate (in seconds), and the overlap duration (in seconds) to prevent data loss.
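One plausible reading of the overlap duration is that each crawl looks slightly further back than the previous ingestion date, so that products written while the previous crawl was running are not missed. The sketch below assumes that interpretation; `ingestion_lower_bound` is a hypothetical helper, not a REGARDS function.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ingestion_lower_bound(
    last_ingestion: Optional[datetime], overlap_seconds: int
) -> Optional[datetime]:
    """Earliest update date to request from the datasource on the next crawl."""
    if last_ingestion is None:
        return None  # first crawl: no lower bound, everything is ingested
    # Step back by the overlap so late-arriving updates are read again.
    return last_ingestion - timedelta(seconds=overlap_seconds)


last_run = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
lower_bound = ingestion_lower_bound(last_run, overlap_seconds=60)
first_run_bound = ingestion_lower_bound(None, overlap_seconds=60)
```

Products falling inside the overlap window are read twice, which is harmless as long as indexing is an insert-or-update operation.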

Next, a mapping must be created between the datasource products and the REGARDS model data before indexing the products.
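As a rough illustration of such a mapping, the sketch below renames datasource fields to model attribute names. The column names, the `acquisitionDate` attribute and the `apply_mapping` helper are hypothetical; the actual mapping is defined through the REGARDS configuration.

```python
from typing import Dict, Mapping


def apply_mapping(row: Mapping[str, object], mapping: Mapping[str, str]) -> Dict[str, object]:
    """Rename datasource fields to model attribute names; unmapped fields are dropped."""
    return {
        attribute: row[column]
        for column, attribute in mapping.items()
        if column in row
    }


# Hypothetical datasource row and mapping.
row = {"obs_id": 42, "obs_date": "2024-01-01T00:00:00Z", "internal_flag": "x"}
mapping = {"obs_id": "providerId", "obs_date": "acquisitionDate"}
product = apply_mapping(row, mapping)
```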

info

Configuration options are available for the various connectors used by the crawler to reach an external database (see UI). The PostgreSQL connector is available as: postgresql-db-connection (1.0-SNAPSHOT).

Insert or Update new products in meta catalog

Dataset and Data entities are stored in Elasticsearch, with exactly one index for each project/tenant of the REGARDS application.

The Data entities are never stored in the REGARDS database. The Dataset entities are stored in the REGARDS database with the following information:

  • Creation date and update date
  • URN (Uniform Resource Name) identifier (example: URN:AIP:DATASET:validation:39c574a0-2ad6-4f47-9f4a-251d494892b1:V1)
  • Model of the products in this dataset
  • Identifier of the dataset model
  • Identifier of the plugin used to load products from a data source
  • Sub-setting criterion applied to the dataset for Elasticsearch
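The example URN above can be split into its components with a few lines of Python. This sketch only handles the six-field shape shown in the example; the field names of `ParsedUrn` are assumptions made for illustration, and optional suffixes such as an order or a revision are not handled.

```python
import uuid
from typing import NamedTuple


class ParsedUrn(NamedTuple):
    identifier: str      # e.g. "AIP"
    entity_type: str     # e.g. "DATASET" or "DATA"
    tenant: str          # project/tenant name
    entity_id: uuid.UUID
    version: str         # last field, e.g. "V1"


def parse_urn(urn: str) -> ParsedUrn:
    """Split a six-field URN such as the example given in the list above."""
    parts = urn.split(":")
    if len(parts) != 6 or parts[0] != "URN":
        raise ValueError(f"unsupported URN shape: {urn!r}")
    _, identifier, entity_type, tenant, entity_id, version = parts
    # uuid.UUID raises ValueError if the fifth field is not a valid UUID.
    return ParsedUrn(identifier, entity_type, tenant, uuid.UUID(entity_id), version)


parsed = parse_urn("URN:AIP:DATASET:validation:39c574a0-2ad6-4f47-9f4a-251d494892b1:V1")
```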

Access rights calculation for dataset

note

Access rights are defined for each dataset and group of users as follows:

  • Dataset and Data access
  • Dataset access
  • Full access to dataset, but partial access to Data (filtered by dynamic plugins)
  • No access

Any change in access rights between a group of users and a dataset has an impact on the meta catalog stored in Elasticsearch, because access rights are recorded on each dataset and each product.

Access rights are recalculated when:

  • Data is modified (dataset update, data object added or removed, ...)
  • A user group is modified

Dynamic plugins (see the extension point with the IDataObjectAccessFilterPlugins interface) cause access rights to be recalculated periodically. Access rights are applied to the data filtered by the OpenSearch query. Recalculation runs once a day by default; the schedule is configurable in the microservice properties through the regards.access.rights.update.cron property, whose value is in standard cron format.

Elasticsearch index representation

The following tables show the structure of the entities stored in the REGARDS Elasticsearch index.

Entity for DATA type

| Name | Type | Description |
| --- | --- | --- |
| type | text | Entity type: DATA |
| creationDate | Date (format: date_optional_time) | Creation date of entity |
| lastUpdate | Date (format: date_optional_time) | Update date of entity |
| dataSourceId | long | Data source identifier |
| datasetModelNames | text | List of dataset model names |
| groups | text | List of group names for access right |
| id | long | Entity technical identifier for database |
| internal | boolean | true if the entity of DATA type is internal (created from an AIP); false if it is external (created from an external database) |
| ipId | text | Identifier of Uniform Resource Name type (format: URN:StringId:DATA:tenant:UUID(entityId),order[:REVrevision]) |
| metadata | Object (see details below) | Information about a group's access to a specific dataset for data objects |
| model | Object | Entity model |
| model.description | text | Model description |
| model.id | long | Model technical identifier for database |
| model.name | text | Model name (identical to the model property of feature) |
| model.type | text | Model type: DATA |
| newPoint | geo_point | Bounding box north west point |
| setPoint | geo_point | Bounding box south east point |
| openSearchSubsettingClause | text | Representation of the above subsetting clause as an OpenSearch string request |
| tags | text | List of tags (including the related dataset) |
| wgs84 | geo_shape | Geometry projection on WGS84 CRS |
| feature | Object (see details below) | Raw entity feature |

Metadata for DATA type of entity

| Name | Type | Description |
| --- | --- | --- |
| groups | Map | Map of group names with access right for dataset |
| groups.<name>.dataset | text | Identifier of Uniform Resource Name type for dataset |
| groups.<name>.dataAccessRight | boolean | true if the group has access right for the dataset; otherwise false |
| modelNames | Map | Map of model names with dataset URN |
| modelNames.<name>.<URN> | text | Identifier of Uniform Resource Name type for dataset |
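To make the shape of this metadata concrete, the sketch below reads a `groups` map laid out as in the table above and lists the groups whose `dataAccessRight` flag is set. The group names and the `groups_with_data_access` helper are hypothetical; this illustrates the document structure only, not how REGARDS evaluates access rights.

```python
from typing import Dict, Set


def groups_with_data_access(metadata: Dict) -> Set[str]:
    """Names of the groups whose dataAccessRight flag is true."""
    groups = metadata.get("groups", {})
    return {
        name
        for name, entry in groups.items()
        if entry.get("dataAccessRight") is True
    }


# Hypothetical metadata for one DATA entity.
dataset_urn = "URN:AIP:DATASET:validation:39c574a0-2ad6-4f47-9f4a-251d494892b1:V1"
metadata = {
    "groups": {
        "public": {"dataset": dataset_urn, "dataAccessRight": False},
        "scientists": {"dataset": dataset_urn, "dataAccessRight": True},
    }
}
allowed = groups_with_data_access(metadata)
```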

Feature for DATA type of entity

| Name | Type | Description |
| --- | --- | --- |
| sessionOwner | text | Session owner |
| Session | text | Session name |
| virtualId | text | Virtual identifier of URN type, used to indicate whether this is the last version (format: URN:StringId:DATA:tenant:UUID(entityId):LAST) |
| providerId | text | Provider identifier |
| entityType | text | Entity type: DATA |
| label | text | Entity label (sometimes identical to the provider identifier property) |
| model | text | Model name of entity (identical to the name property of model) |
| files | Object | Product-related entity files (example: thumbnail, quicklook, rawdata...) |
| tags | text | List of tags (including the dataset identifier) |
| last | boolean | true if this is the last version; otherwise false |
| version | text | Entity version |
| id | text | Identifier of Uniform Resource Name type (identical to the ipId property) |
| geometry | Object | Information package geometry in GeoJSON RFC 7946 format |
| geometry.coordinates | double | Geometry coordinates |
| geometry.type | text | Geometry type (Point, MultiPoint, LineString, Polygon, MultiPolygon...) |
| geometry.bbox | array | Geometry bounding box. List of point coordinates [xmin, ymin, xmax, ymax] of Double type. |
| geometry.crs | text | Coordinate reference system. If not specified, WGS84 is considered the default CRS |
| normalizedGeometry | Object | Geometry normalized for use on a cylindrical projection |
| normalizedGeometry.coordinates | double | Normalized geometry coordinates |
| normalizedGeometry.type | text | Normalized geometry type (Point, MultiPoint, LineString, Polygon, MultiPolygon...) |
| normalizedGeometry.bbox | array | Geometry bounding box. List of point coordinates [xmin, ymin, xmax, ymax] of Double type. |
| normalizedGeometry.crs | text | Coordinate reference system. If not specified, WGS84 is considered the default CRS |
| type | text | Feature |
| crs | text | Coordinate Reference System (default value: WGS84) |
| properties | Object | DATA model attributes |
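The relationship between `geometry.coordinates` and `geometry.bbox` can be illustrated by deriving the [xmin, ymin, xmax, ymax] list from a GeoJSON geometry. This is a minimal sketch: the `bbox` and `iter_positions` helpers are hypothetical, the geometry is assumed to be non-empty, and geometries crossing the antimeridian are not handled.

```python
from typing import Dict, Iterator, List, Sequence


def iter_positions(coords: Sequence) -> Iterator[Sequence[float]]:
    """Yield every [x, y, ...] position, whatever the nesting depth of the geometry."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for item in coords:
            yield from iter_positions(item)


def bbox(geometry: Dict) -> List[float]:
    """Bounding box as [xmin, ymin, xmax, ymax] for a non-empty GeoJSON geometry."""
    positions = list(iter_positions(geometry["coordinates"]))
    xs = [position[0] for position in positions]
    ys = [position[1] for position in positions]
    return [min(xs), min(ys), max(xs), max(ys)]


polygon = {
    "type": "Polygon",
    "coordinates": [[[0.0, 0.0], [4.0, 0.0], [4.0, 2.0], [0.0, 2.0], [0.0, 0.0]]],
}
point = {"type": "Point", "coordinates": [1.5, -3.0]}
polygon_bbox = bbox(polygon)
point_bbox = bbox(point)
```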

Entity for DATASET type

| Name | Type | Description |
| --- | --- | --- |
| type | text | Entity type: DATASET |
| creationDate | Date (format: date_optional_time) | Creation date of entity |
| lastUpdate | Date (format: date_optional_time) | Update date of entity |
| dataModel | text | Model of Data type for entities included in this dataset |
| dataSourceId | long | Data source identifier |
| groups | text | List of group names for access right |
| id | long | Entity technical identifier for database |
| internal | boolean | true if an entity of DATA type is internal (created from an AIP); false if it is external (created from an external database) |
| ipId | text | Identifier of Uniform Resource Name type (format: URN:StringId:DATASET:tenant:UUID(entityId),order[:REVrevision]) |
| metadata | Object (see details below) | Information about a group's access to a specific dataset for data objects |
| model | Object | Entity model |
| model.description | text | Model description |
| model.id | long | Model technical identifier for database |
| model.name | text | Model name (identical to the model property of feature) |
| model.type | text | Model type: DATASET |
| newPoint | geo_point | Bounding box north west point |
| setPoint | geo_point | Bounding box south east point |
| openSearchSubsettingClause | text | Representation of the above subsetting clause as an OpenSearch string request |
| plgConfDataSource | Object | Plugin configuration for the extension point (IDataSourcePlugin interface) |
| plgConfDataSource.active | boolean | Whether the plugin is active |
| plgConfDataSource.businessId | text | Plugin business identifier |
| plgConfDataSource.label | text | Plugin label |
| plgConfDataSource.parameters | nested | Configuration parameters of the plugin |
| plgConfDataSource.pluginId | text | Plugin identifier |
| plgConfDataSource.priorityOrder | long | Priority order of the plugin |
| plgConfDataSource.version | text | Plugin version |
| tags | text | List of tags |
| wgs84 | geo_shape | Geometry projection on WGS84 CRS |
| feature | Object (see details below) | Raw entity feature |

Metadata for DATASET type of entity

| Name | Type | Description |
| --- | --- | --- |
| dataObjectsGroups | Map | Map of group names with access right for dataset |
| dataObjectsGroups.<name>.groupName | text | Group name |
| dataObjectsGroups.<name>.dataFileAccess | boolean | true if the group has access right for files of products; otherwise false |
| dataObjectsGroups.<name>.dataObjectAccess | boolean | true if the group has access right for objects of products; otherwise false |
| dataObjectsGroups.<name>.dataAccess | boolean | true if the group has access right for data of products; otherwise false |
| dataObjectsGroups.<name>.metaDataObjectAccessFilterPluginBusinessId | String | Plugin identifier for the extension point: IDataObjectAccessFilterPlugins |

Feature for DATASET type of entities

| Name | Type | Description |
| --- | --- | --- |
| dataObjectsFilesAccessGranted | boolean | true if access to data object files is granted; otherwise access is denied |
| dataObjectsAccessGranted | boolean | true if access to data objects is granted; otherwise access is denied |
| licence | text | Licence for dataset |
| virtualId | text | Virtual identifier of URN type, used to indicate whether this is the last version (format: URN:StringId:DATASET:tenant:UUID(entityId):LAST) |
| providerId | text | Provider identifier |
| entityType | text | Entity type: DATASET |
| id | text | Identifier of Uniform Resource Name type (format: URN:StringId:DATASET:tenant:UUID(entityId),order[:REVrevision]) |
| label | text | Label of dataset |
| model | text | Model name of entity (identical to the name property of model) |
| files | Object | Product-related entity files |
| tags | text | List of tags |
| version | integer | Entity version |
| type | text | Feature |
| properties | Object | DATA model attributes |