Version: 2.0

REGARDS dataprovider microservice conception

note

This guide uses the word "product" from the Dataprovider point of view. From the REGARDS point of view, the actual product is the one handled by Ingest.

Dataprovider acquires products by scanning folders through acquisition chains. A chain is configured to generate products from one or several scanned files.

An acquisition chain has several steps:

Data provider plugins

Start an acquisition chain

An acquisition chain can be started in three ways:

  • periodically, at fully configurable intervals
  • manually, from the admin UI or directly through the REST API
  • by restart (only the errors detected during a previous acquisition are retried), from the admin UI or directly through the REST API

Scan step

A scan explores a folder (or a complete directory tree, depending on the plugin used) and analyses each file.

Several scan paths can be configured. Each scan path is associated with a "since date": files dated earlier than it are not processed.

To scan a folder, a scan plugin job is used. There are several ways to scan a folder.

info

After a scan, the since date associated with the scan path is updated to the date of the most recent scanned file. These files are therefore not scanned again if the chain is relaunched; only files modified afterwards will be scanned in future runs.
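The since-date mechanism can be illustrated with a minimal Java sketch. It is not the REGARDS scan plugin: the `ScannedFile`, `ScanPath` and `SinceDateScan` names are hypothetical, and the sketch assumes that a file is selected only when its last modification date is strictly later than the since date.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical types, for illustration only (not classes from the REGARDS code base).
final class ScannedFile {
    final String path;
    final Instant lastModified;

    ScannedFile(String path, Instant lastModified) {
        this.path = path;
        this.lastModified = lastModified;
    }
}

final class ScanPath {
    final String directory;
    Instant sinceDate; // null means "never scanned": every file is selected

    ScanPath(String directory, Instant sinceDate) {
        this.directory = directory;
        this.sinceDate = sinceDate;
    }
}

final class SinceDateScan {
    /** Selects files modified after the since date, then moves the since date forward. */
    static List<ScannedFile> scan(ScanPath scanPath, List<ScannedFile> candidates) {
        List<ScannedFile> selected = new ArrayList<>();
        Instant newest = scanPath.sinceDate;
        for (ScannedFile file : candidates) {
            // Assumption: a file dated exactly at the since date was already handled.
            if (scanPath.sinceDate != null && !file.lastModified.isAfter(scanPath.sinceDate)) {
                continue;
            }
            selected.add(file);
            if (newest == null || file.lastModified.isAfter(newest)) {
                newest = file.lastModified;
            }
        }
        // The since date becomes the date of the most recent file seen so far.
        scanPath.sinceDate = newest;
        return selected;
    }
}
```

Calling `scan` twice on the same unchanged file list returns an empty list the second time, which is the behaviour described above for a relaunched chain.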

Scanned files will be used in the next steps.

Validation step

The validation step only validates each file according to the algorithm of a plugin. Check here for more details about the validation plugin. All valid files go on to the next step.

Product creation step

Files are wrapped inside a product. Depending on the configuration, several files can be linked to a single product. In the configuration, you can mark some files as optional.

The product plugin generates a unique product name. See here for more information about this plugin.
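As a rough illustration of how a product name ties several files to one product, the Java sketch below groups file names by the name a naming function returns. The `ProductGroupingSketch` class and the "file name without extension" rule are assumptions made for the example; the actual rule depends on the product plugin configured in the chain.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only (not the REGARDS product plugin API).
final class ProductGroupingSketch {

    /** Assumed naming rule: the file name without its last extension. */
    static String nameWithoutExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    /** Groups file names by the product name computed for each of them. */
    static Map<String, List<String>> group(List<String> fileNames,
                                           Function<String, String> productNamer) {
        return fileNames.stream()
                .collect(Collectors.groupingBy(productNamer, TreeMap::new, Collectors.toList()));
    }
}
```

With this assumed rule, `obs1.dat` and `obs1.xml` end up in the same product `obs1`, while `obs2.dat` forms a product of its own.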

Products with the same id can exist when an existing file is modified. In this case, a new version of the product is created.

These products are stored in the dataprovider database with a particular state (see the productState attribute of Product objects in Java):

  • ACQUIRING: the product is not complete (files are missing)
  • COMPLETED: the product is complete without its optional files (some files can still be added)
  • FINISHED: the product is complete, optional files included (no more files can be added to the product)
  • UPDATED: the product was already complete before the new file was acquired (meaning a file has been modified)
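
The way these states relate to mandatory and optional files can be sketched in Java as follows. The enum values are the ones listed above; the `ProductStateSketch` class, its parameters and the order of the checks are assumptions made for illustration, not the Dataprovider implementation.

```java
// Values taken from the list above.
enum ProductState { ACQUIRING, COMPLETED, FINISHED, UPDATED }

// Hypothetical helper, for illustration only.
final class ProductStateSketch {

    /**
     * @param replacesExistingFile assumption: true when the newly acquired file
     *                             is a modified version of a file the product already had
     */
    static ProductState compute(int mandatoryExpected, int mandatoryAcquired,
                                int optionalExpected, int optionalAcquired,
                                boolean replacesExistingFile) {
        if (mandatoryAcquired < mandatoryExpected) {
            return ProductState.ACQUIRING; // mandatory files are still missing
        }
        if (replacesExistingFile) {
            return ProductState.UPDATED; // a file of a complete product was modified
        }
        if (optionalAcquired >= optionalExpected) {
            return ProductState.FINISHED; // nothing more can be added
        }
        return ProductState.COMPLETED; // optional files can still arrive
    }
}
```

For a product expecting two mandatory files and one optional file, acquiring both mandatory files gives COMPLETED, and acquiring the optional file as well gives FINISHED.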
info

In the acquisition chain configuration, you can configure the product to create. You can specify how many files a product contains, along with some other information about each file:

  • file type
  • file is mandatory or not
  • file mime type

SIP generation step

SIP generation is done by a plugin. Check here for more details.

The purpose of this step is to create a SIP product with the metadata associated with the files of the product.

Once the product has been generated in SIP format, it is sent to the OAIS catalog (the rs-ingest microservice) by AMQP.

The ingest chain to use can be selected in the data provider acquisition chain configuration.

The versioning method described in the rs-ingest documentation is also configured in the acquisition chain on the Data provider microservice.

Ingest step

The Ingest microservice processes each product and sends an acknowledgement as an AMQP message, indicating whether the REGARDS product was correctly ingested or not.

The SIP ingestion AMQP API can be found here.

Dataprovider catches these AMQP messages and updates the Dataprovider product state.

Following the Dataprovider product workflow

A product (in dataprovider) has a ProductSipState attribute. Its possible values are:

  • NOT_SCHEDULED: the SIP is not yet scheduled because the related product is not Finished or Completed (see the product creation step)
  • NOT_SCHEDULED_INVALID: the SIP is not scheduled because the related product is Invalid (see the product creation step). The data provider has to resubmit its files to fix the problem.
  • SCHEDULED: SIP generation has been scheduled as a job
  • SCHEDULED_INTERRUPTED: SIP generation was interrupted by the user
  • GENERATION_ERROR: the SIP has not been generated because an error occurred
  • SUBMITTED: the SIP has been generated and submitted to INGEST
  • INGESTION_FAILED: the SIP has been generated but INGEST refuses to process it
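
A few of these transitions can be sketched in Java. The enum values are the ones listed above, but the `SipStateSketch` class and its methods are hypothetical. This section does not name the state reached after a successful ingestion, so the sketch leaves the state unchanged in that case rather than guessing.

```java
import java.util.EnumSet;
import java.util.Set;

// Values taken from the list above.
enum ProductSipState {
    NOT_SCHEDULED, NOT_SCHEDULED_INVALID, SCHEDULED, SCHEDULED_INTERRUPTED,
    GENERATION_ERROR, SUBMITTED, INGESTION_FAILED
}

// Hypothetical helper, for illustration only (not the Dataprovider implementation).
final class SipStateSketch {

    // Assumption: these two states are the ones that denote a failure.
    private static final Set<ProductSipState> FAILURES =
            EnumSet.of(ProductSipState.GENERATION_ERROR, ProductSipState.INGESTION_FAILED);

    static boolean isFailure(ProductSipState state) {
        return FAILURES.contains(state);
    }

    /** State after the SIP generation job has run. */
    static ProductSipState afterGenerationJob(boolean generated) {
        return generated ? ProductSipState.SUBMITTED : ProductSipState.GENERATION_ERROR;
    }

    /** State after the acknowledgement sent by the Ingest microservice. */
    static ProductSipState afterIngestResponse(ProductSipState current, boolean accepted) {
        if (current != ProductSipState.SUBMITTED) {
            return current; // assumption: responses are only expected for submitted SIPs
        }
        // The state for an accepted SIP is not listed in this section: left unchanged here.
        return accepted ? current : ProductSipState.INGESTION_FAILED;
    }
}
```

A helper such as `isFailure` could be used, for instance, to select the products that a chain restart needs to process again, although the exact selection rule is not specified here.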