REGARDS Dataprovider microservice design
This guide describes products from the Dataprovider point of view. From the REGARDS point of view, the real product is the Ingest product.
The Dataprovider acquires products by scanning folders through acquisition chains. Each chain is configured to build products from one or more scanned files.
An acquisition chain runs through the following steps:
Start an acquisition chain
Acquisition chains can be triggered:
- periodically, at fully configurable intervals.
- manually, from the admin UI or directly through the REST API.
- by a restart, which reprocesses only the errors detected during a previous acquisition, from the admin UI or directly through the REST API.
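To make the difference between a full run and a restart concrete, here is a minimal sketch. It is not the REGARDS implementation: AcquisitionChain, run, restart and schedulePeriodic are hypothetical names, a "processor" predicate stands in for the real chain steps, and periodic triggering is assumed to work like a standard ScheduledExecutorService with a fixed delay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class ChainTriggerSketch {

    /** Hypothetical chain: counts its runs and remembers which files ended in error. */
    static class AcquisitionChain {
        final AtomicInteger runs = new AtomicInteger();
        final Set<String> failedFiles = ConcurrentHashMap.newKeySet();

        /** Full run (periodic or manual trigger): every given file is processed. */
        void run(List<String> files, Predicate<String> processor) {
            runs.incrementAndGet();
            for (String file : files) {
                if (processor.test(file)) {
                    failedFiles.remove(file);   // processed successfully
                } else {
                    failedFiles.add(file);      // kept for a later restart
                }
            }
        }

        /** Restart: only the files that failed in a previous run are reprocessed. */
        void restart(Predicate<String> processor) {
            run(new ArrayList<>(failedFiles), processor);
        }
    }

    /** Periodic trigger: runs the chain every intervalMillis, starting immediately. */
    static ScheduledFuture<?> schedulePeriodic(ScheduledExecutorService scheduler,
                                               Runnable chainRun, long intervalMillis) {
        return scheduler.scheduleWithFixedDelay(chainRun, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }
}
```

A manual trigger simply calls run once; a periodic trigger wraps the same call in a Runnable handed to schedulePeriodic. Using a fixed delay rather than a fixed rate is a design choice of this sketch: it prevents two runs of the same chain from overlapping.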
Scan step
Scanning explores a folder (and its complete subtree if needed, depending on the plugin used) and analyses each file.
Several scan paths can be configured. Each scan path is associated with a "since date": files dated earlier than this date are not processed.
Folders are scanned by a job running a scan plugin. Several plugins are available, offering different ways to scan a folder.
After a scan, the since date of the scan path is updated to the date of the last scanned file. As a result, these files are not scanned again if the chain is relaunched: only new or modified files will be scanned in subsequent runs.
The scanned files are then used by the following steps.
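The since-date mechanism can be sketched with java.nio.file as follows. This is an illustration, not an actual scan plugin: ScanSketch, ScanResult and scan are hypothetical names, the file date is assumed to be the last-modified time, and files are kept only when strictly newer than the since date, so that the last scanned file is not picked up a second time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ScanSketch {

    /** Files selected by one scan, plus the since date to store for the next run. */
    static final class ScanResult {
        final List<Path> files;
        final Instant newSinceDate;

        ScanResult(List<Path> files, Instant newSinceDate) {
            this.files = files;
            this.newSinceDate = newSinceDate;
        }
    }

    /**
     * Scans one scan path. Files not strictly newer than sinceDate are skipped; the returned
     * since date is the most recent date among the selected files (unchanged if none).
     */
    static ScanResult scan(Path scanPath, Instant sinceDate, boolean recursive) {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;   // whole tree or top folder only
        try (Stream<Path> walk = Files.walk(scanPath, maxDepth)) {
            List<Path> files = walk
                    .filter(Files::isRegularFile)
                    .filter(p -> lastModified(p).isAfter(sinceDate))
                    .sorted()
                    .collect(Collectors.toList());
            Instant newSince = files.stream()
                    .map(ScanSketch::lastModified)
                    .max(Comparator.naturalOrder())
                    .orElse(sinceDate);
            return new ScanResult(files, newSince);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static Instant lastModified(Path file) {
        try {
            return Files.getLastModifiedTime(file).toInstant();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling scan a second time with the returned newSinceDate returns no files until something in the folder is added or modified.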
Validation step
The validation step simply validates each file using the algorithm of a validation plugin. Check here for more details about validation plugins. All valid files go on to the next step.
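A validation plugin boils down to one yes/no decision per file. The sketch below assumes a hypothetical IValidationPlugin contract (not the actual REGARDS plugin interface) and an example plugin that accepts files by extension; real plugins may check content, checksums, naming rules and so on.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ValidationSketch {

    /** Hypothetical validation plugin contract: one decision per scanned file. */
    interface IValidationPlugin {
        boolean validate(Path file);
    }

    /** Example plugin: accepts only files whose name ends with one of the allowed extensions. */
    static class ExtensionValidationPlugin implements IValidationPlugin {
        private final Set<String> extensions;

        ExtensionValidationPlugin(Set<String> extensions) {
            this.extensions = Set.copyOf(extensions);
        }

        @Override
        public boolean validate(Path file) {
            String name = file.getFileName().toString();
            return extensions.stream().anyMatch(name::endsWith);
        }
    }

    /** Keeps only the files accepted by the plugin; these go on to product creation. */
    static List<Path> validFiles(List<Path> scanned, IValidationPlugin plugin) {
        return scanned.stream().filter(plugin::validate).collect(Collectors.toList());
    }
}
```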
Product creation step
Files are wrapped into products. Depending on the configuration, several files can be linked to a single product, and some of these files can be declared optional.
The product plugin generates a unique product name. See here for more information about this plugin.
Several products can share the same identifier when an existing file is modified: in that case, a new version of the product is created.
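The following sketch illustrates both ideas: a product plugin deriving one product name from several file names, and a new version being created when a file the product already holds is acquired again. Everything here is an assumption for illustration: IProductPlugin, PrefixProductPlugin, acquire, the naming rule (everything before the last underscore) and the way a modification is detected (the same file name arriving again).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ProductNamingSketch {

    /** Hypothetical product plugin contract: derive a product name from a file name. */
    interface IProductPlugin {
        String getProductName(String fileName);
    }

    /** Example: "S1_0001_data.nc" and "S1_0001_meta.xml" both map to product "S1_0001". */
    static class PrefixProductPlugin implements IProductPlugin {
        @Override
        public String getProductName(String fileName) {
            int cut = fileName.lastIndexOf('_');
            return cut > 0 ? fileName.substring(0, cut) : fileName;
        }
    }

    /** Minimal product: a name, a version and the files it wraps. */
    static class Product {
        final String name;
        final int version;
        final Set<String> files = new TreeSet<>();

        Product(String name, int version) {
            this.name = name;
            this.version = version;
        }
    }

    /** Latest version of each product, keyed by product name. */
    final Map<String, Product> latest = new HashMap<>();

    /** Adds a file to its product; a file already held by the product triggers a new version. */
    Product acquire(String fileName, IProductPlugin plugin) {
        String name = plugin.getProductName(fileName);
        Product current = latest.get(name);
        if (current == null) {
            current = new Product(name, 1);
        } else if (current.files.contains(fileName)) {
            Product next = new Product(name, current.version + 1);
            next.files.addAll(current.files);   // new version starts from the previous content
            current = next;
        }
        current.files.add(fileName);
        latest.put(name, current);
        return current;
    }
}
```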
Products are stored in the Dataprovider database with one of the following states (see the productState attribute of the Java Product class):
- ACQUIRING: the product is not complete (some mandatory files are missing).
- COMPLETED: the product contains all its mandatory files but not all its optional ones; files can still be added.
- FINISHED: the product is complete, optional files included; no more files can be added.
- UPDATED: the product was already complete when a new file was acquired (an existing file has been modified).
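The rules above can be written as a single decision function. This is a sketch under assumptions: computeState and its parameters are hypothetical, products are described by the sets of mandatory, optional and already acquired file types, and the UPDATED case is modelled by two flags supplied by the caller.

```java
import java.util.Set;

public class ProductStateSketch {

    enum ProductState { ACQUIRING, COMPLETED, FINISHED, UPDATED }

    /**
     * Computes the product state from the expected mandatory/optional file types and the
     * types acquired so far. wasCompleteBefore and fileModified model the UPDATED case.
     */
    static ProductState computeState(Set<String> mandatory, Set<String> optional, Set<String> acquired,
                                     boolean wasCompleteBefore, boolean fileModified) {
        if (wasCompleteBefore && fileModified) {
            return ProductState.UPDATED;          // a file of a complete product changed
        }
        if (!acquired.containsAll(mandatory)) {
            return ProductState.ACQUIRING;        // mandatory files still missing
        }
        return acquired.containsAll(optional)
                ? ProductState.FINISHED           // nothing else can be added
                : ProductState.COMPLETED;         // optional files may still arrive
    }
}
```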
In the acquisition chain configuration, you define the products to create: how many files each product contains and, for each file:
- its file type
- whether it is mandatory
- its MIME type
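A possible in-memory model of this configuration is sketched below. FileInfo, ProductConfig, match and fileTypes are hypothetical names, and the regular expression used to recognise which expected file a scanned file corresponds to is an assumption of this sketch; the actual configuration format is defined by the acquisition chain settings.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ProductConfigSketch {

    /** One expected file of a product. The name pattern is a hypothetical matching rule. */
    static final class FileInfo {
        final String fileType;
        final boolean mandatory;
        final String mimeType;
        final Pattern namePattern;

        FileInfo(String fileType, boolean mandatory, String mimeType, String nameRegex) {
            this.fileType = fileType;
            this.mandatory = mandatory;
            this.mimeType = mimeType;
            this.namePattern = Pattern.compile(nameRegex);
        }
    }

    /** Product definition of an acquisition chain: the files a product is made of. */
    static final class ProductConfig {
        final List<FileInfo> files;

        ProductConfig(List<FileInfo> files) {
            this.files = List.copyOf(files);
        }

        /** Returns the expected file that a scanned file name corresponds to, if any. */
        Optional<FileInfo> match(String fileName) {
            return files.stream()
                    .filter(f -> f.namePattern.matcher(fileName).matches())
                    .findFirst();
        }

        /** File types that are mandatory (true) or optional (false). */
        Set<String> fileTypes(boolean mandatory) {
            return files.stream()
                    .filter(f -> f.mandatory == mandatory)
                    .map(f -> f.fileType)
                    .collect(Collectors.toSet());
        }
    }
}
```

The mandatory and optional type sets are exactly what is needed to tell an ACQUIRING product from a COMPLETED or FINISHED one.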
SIP generation step
SIP generation is performed by a plugin. Check here for more details.
The purpose of this step is to create a SIP (Submission Information Package) from the product, with metadata associated with the product's files.
Once the product has been generated in SIP format, it is sent to the OAIS catalog (the rs-ingest microservice) through AMQP.
The ingest chain to use is selected in the Dataprovider acquisition chain configuration.
The versioning method described in the rs-ingest documentation is also configured in the acquisition chain of the Dataprovider microservice.
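The flow of this step (plugin generates the SIP, then the SIP is sent to Ingest together with the ingest chain and versioning settings of the acquisition chain) can be sketched as below. This is not the REGARDS SIP model or AMQP API: Sip, ISipGenerationPlugin, SubmissionRequest, IPublisher and generateAndSubmit are hypothetical, and IPublisher stands in for whatever AMQP publisher is actually used.

```java
import java.util.List;
import java.util.Map;

public class SipSketch {

    /** Minimal SIP: provider id (here, the product name), data files and descriptive metadata. */
    static final class Sip {
        final String providerId;
        final List<String> dataFiles;
        final Map<String, String> metadata;

        Sip(String providerId, List<String> dataFiles, Map<String, String> metadata) {
            this.providerId = providerId;
            this.dataFiles = List.copyOf(dataFiles);
            this.metadata = Map.copyOf(metadata);
        }
    }

    /** Hypothetical SIP generation plugin contract. */
    interface ISipGenerationPlugin {
        Sip generate(String productName, List<String> files);
    }

    /** Message for Ingest: the SIP plus settings taken from the acquisition chain configuration. */
    static final class SubmissionRequest {
        final String ingestChain;
        final String versioningMode;
        final Sip sip;

        SubmissionRequest(String ingestChain, String versioningMode, Sip sip) {
            this.ingestChain = ingestChain;
            this.versioningMode = versioningMode;
            this.sip = sip;
        }
    }

    /** Hypothetical stand-in for the AMQP publisher that reaches rs-ingest. */
    interface IPublisher {
        void publish(SubmissionRequest request);
    }

    /** Generates the SIP with the plugin, then publishes it with the chain's ingest settings. */
    static SubmissionRequest generateAndSubmit(String productName, List<String> files,
                                               ISipGenerationPlugin plugin, String ingestChain,
                                               String versioningMode, IPublisher publisher) {
        Sip sip = plugin.generate(productName, files);
        SubmissionRequest request = new SubmissionRequest(ingestChain, versioningMode, sip);
        publisher.publish(request);
        return request;
    }
}
```

Keeping the publisher behind an interface lets the generation logic be tested with an in-memory list instead of a message broker.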
Ingest step
The Ingest microservice processes each product and sends back an acknowledgement AMQP message indicating whether the REGARDS product was correctly ingested.
The SIP ingestion AMQP API is described here.
The Dataprovider listens to these AMQP messages and updates the state of the corresponding Dataprovider product.
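On the Dataprovider side, handling an acknowledgement amounts to looking up the product and changing its SIP state. In this sketch, IngestAck, onAck and submitted are hypothetical, the message payload is reduced to a product identifier and a success flag, and INGESTED is a hypothetical name for the success state (the state list in this guide does not name it).

```java
import java.util.HashMap;
import java.util.Map;

public class IngestAckSketch {

    /** Subset of SIP states used here; INGESTED is a hypothetical name for the success state. */
    enum SipState { SUBMITTED, INGESTED, INGESTION_FAILED }

    /** Hypothetical acknowledgement payload: which product, and whether ingestion succeeded. */
    static final class IngestAck {
        final String providerId;
        final boolean success;

        IngestAck(String providerId, boolean success) {
            this.providerId = providerId;
            this.success = success;
        }
    }

    /** SIP state of each submitted product, keyed by provider id. */
    final Map<String, SipState> states = new HashMap<>();

    void submitted(String providerId) {
        states.put(providerId, SipState.SUBMITTED);
    }

    /** What an AMQP listener would call for each acknowledgement. Unknown products are ignored. */
    void onAck(IngestAck ack) {
        states.computeIfPresent(ack.providerId,
                (id, current) -> ack.success ? SipState.INGESTED : SipState.INGESTION_FAILED);
    }
}
```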
Following the Dataprovider product workflow
A Dataprovider product also has a ProductSipState attribute, which takes the following values:
- NOT_SCHEDULED: SIP generation is not scheduled yet because the related product is neither FINISHED nor COMPLETED (see the product creation step).
- NOT_SCHEDULED_INVALID: SIP generation is not scheduled because the related product is invalid (see the product creation step). The data provider has to resubmit its files to fix the problem.
- SCHEDULED: SIP generation has been scheduled as a job.
- SCHEDULED_INTERRUPTED: SIP generation was interrupted by a user.
- GENERATION_ERROR: the SIP was not generated because an error occurred.
- SUBMITTED: the SIP has been generated and submitted to Ingest.
- INGESTION_FAILED: the SIP has been generated but Ingest refused to process it.
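The first transitions of this workflow can be sketched as two small functions. The enum values come from the list above, plus INVALID for the product state mentioned by NOT_SCHEDULED_INVALID; the function names are hypothetical. Following the list literally, only COMPLETED and FINISHED products get their SIP generation scheduled, so an UPDATED product stays NOT_SCHEDULED in this sketch.

```java
public class ProductSipStateSketch {

    /** Product states from the product creation step, plus INVALID. */
    enum ProductState { ACQUIRING, COMPLETED, FINISHED, UPDATED, INVALID }

    /** SIP states as listed in this guide. */
    enum ProductSipState {
        NOT_SCHEDULED, NOT_SCHEDULED_INVALID, SCHEDULED, SCHEDULED_INTERRUPTED,
        GENERATION_ERROR, SUBMITTED, INGESTION_FAILED
    }

    /** SIP state once the product state is known: only COMPLETED or FINISHED products are scheduled. */
    static ProductSipState onProductStateChange(ProductState productState) {
        switch (productState) {
            case INVALID:
                return ProductSipState.NOT_SCHEDULED_INVALID;
            case COMPLETED:
            case FINISHED:
                return ProductSipState.SCHEDULED;
            default:
                return ProductSipState.NOT_SCHEDULED;
        }
    }

    /** SIP state at the end of the generation job. */
    static ProductSipState onGenerationJobEnd(boolean interrupted, boolean generated) {
        if (interrupted) {
            return ProductSipState.SCHEDULED_INTERRUPTED;
        }
        return generated ? ProductSipState.SUBMITTED : ProductSipState.GENERATION_ERROR;
    }
}
```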