Version: 2.4.0

REGARDS file access microservice

Introduction

The file-access microservice is responsible for performing physical file operations on storage systems. Unlike RS-FILE-CATALOG, which manages file metadata and lifecycle, the file-access microservice focuses exclusively on the actual interaction with storage backends.

This separation of concerns enables:

  • Better scalability through distributed workers
  • Support for multiple storage technologies through plugins
  • Independent scaling of catalog management and physical storage operations

Architecture Overview

File-access follows a worker-based architecture where the microservice coordinates operations but delegates actual file transfers to specialized workers. This design enables horizontal scalability and isolates storage operations from the main service to prevent performance degradation during periods of high file storage activity.

Core Components

File Access Microservice

The main microservice is responsible for:

  • Receiving storage/retrieval/deletion requests from file-catalog or file-packager
  • Validating requests and determining the appropriate storage plugin
  • Dispatching work to storage workers via AMQP for file storage operations
  • Executing file retrieval, deletion, and availability checks directly through plugins
  • Collecting worker responses and forwarding results to requesters
  • Managing storage plugin configurations

For storage operations, the microservice delegates the work to workers, which download each file from its source and upload it to the target. The microservice handles retrieval, deletion, and availability checks itself through its plugins.
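The split between delegated and direct operations can be sketched as a small routing function. This is an illustration in Python, not the REGARDS Java code: the `Operation` enum and `route` function are hypothetical names chosen for this example.

```python
from enum import Enum


class Operation(Enum):
    """Hypothetical operation kinds, named after the operations listed above."""
    STORE = "store"
    RETRIEVE = "retrieve"
    DELETE = "delete"
    CHECK_AVAILABILITY = "check_availability"


def route(operation: Operation) -> str:
    """Return where an operation runs: 'worker' (via AMQP) or 'plugin' (in-process)."""
    # Only storage is delegated to workers; every other operation
    # is executed directly through the storage plugin.
    return "worker" if operation is Operation.STORE else "plugin"


for op in Operation:
    print(f"{op.value}: {route(op)}")
```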

Storage Workers

Storage workers are independent applications that:

  • Listen for storage requests on dedicated AMQP queues
  • Download files from source URLs
  • Upload files to target storage locations
  • Extract file metadata (size, checksum, dimensions for images)
  • Report success or failure back to the file access microservice

Each storage location type requires a corresponding worker implementation. For example, the S3GlacierStorage plugin is paired with the s3-glacier-storage-worker.

Workers can be scaled independently based on workload. Multiple instances of the same worker type can process requests in parallel, ensuring high throughput for storage operations.
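Parallel processing by several instances of the same worker type follows the competing-consumers pattern. The sketch below imitates it with an in-process queue and threads standing in for the AMQP queue and the worker applications. The names `worker` and `run_workers` are hypothetical, and a real deployment would use an AMQP client rather than `queue.Queue`.

```python
import queue
import threading


def worker(name, requests, results):
    """Consume requests from the shared queue until a stop marker (None) arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        # A real worker would download and upload the file here.
        results.append((name, item))


def run_workers(file_urls, instance_count):
    """Process all URLs with `instance_count` competing worker instances."""
    requests = queue.Queue()
    results = []  # list.append is thread-safe in CPython
    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}", requests, results))
        for i in range(instance_count)
    ]
    for thread in threads:
        thread.start()
    for url in file_urls:
        requests.put(url)
    for _ in threads:
        requests.put(None)  # one stop marker per worker instance
    for thread in threads:
        thread.join()
    return results
```

Each request is handled by exactly one instance, so adding instances raises throughput without duplicating work.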

Storage Plugins

Storage plugins define how to interact with specific storage technologies. They implement the IStorageLocation interface and provide methods for:

  • Retrieving files: Synchronously reading files from storage (online storage only)
  • Deleting files: Removing files from storage
  • Managing restoration: Checking availability and restoring archived files (nearline storage only)
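The shape of such a plugin contract can be sketched with an abstract base class. This is a Python analogue for illustration only: the method names and signatures are assumptions and do not reproduce the real IStorageLocation interface, and `InMemoryStorage` is a hypothetical stand-in for an online storage backend.

```python
from abc import ABC, abstractmethod


class StorageLocation(ABC):
    """Hypothetical plugin contract covering the three method groups above."""

    @abstractmethod
    def retrieve(self, path: str) -> bytes:
        """Synchronously read a file (online storage only)."""

    @abstractmethod
    def delete(self, path: str) -> None:
        """Remove a file from storage."""

    @abstractmethod
    def is_available(self, path: str) -> bool:
        """Tell whether a file can be read right now."""


class InMemoryStorage(StorageLocation):
    """Toy online storage: every stored file is immediately available."""

    def __init__(self):
        self._files = {}

    def put(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def retrieve(self, path: str) -> bytes:
        return self._files[path]

    def delete(self, path: str) -> None:
        self._files.pop(path, None)

    def is_available(self, path: str) -> bool:
        return path in self._files
```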

Available plugin types include Local, S3, and S3Glacier.

Storage vs Retrieval

Storage operations are delegated to workers for scalability, but retrieval operations (reading files) are handled directly by plugins within the microservice for better response times.

Workflows

File Storage Workflow

The storage workflow proceeds in the following order:

  • File-catalog sends a FileStorageRequestReadyToProcessEvent. File-access identifies the target storage location and loads the corresponding plugin.
  • The plugin generates worker configuration objects containing storage credentials, target paths, and file source URLs.
  • The microservice publishes StorageWorkerRequestEvent messages to the worker manager, which routes them to available workers.
  • Workers download files from source URLs, verify integrity (checksum), extract metadata, and upload files to target storage locations. Small files below the packaging threshold are stored locally instead of being uploaded immediately.
  • Workers send ResponseEvent messages back with success/failure status, storage URLs, and extracted metadata.
  • The microservice transforms these into StorageResponseEvent messages and sends them back to file-catalog.
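The worker-side part of this workflow (verify, extract metadata, then upload or defer) can be sketched as a single function. It is a simplified illustration under several assumptions: the download is replaced by bytes passed in directly, SHA-256 is used although the source does not name the checksum algorithm, and `WorkerResult` and `process_storage_request` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerResult:
    """Hypothetical summary of what a worker reports back."""
    success: bool
    size: int
    checksum: str
    storage_url: Optional[str] = None
    deferred_for_packaging: bool = False
    error: Optional[str] = None


def process_storage_request(data, expected_checksum, target_url, packaging_threshold):
    """Verify integrity, extract basic metadata, then upload or defer the file."""
    size = len(data)
    checksum = hashlib.sha256(data).hexdigest()  # algorithm is an assumption
    if checksum != expected_checksum:
        return WorkerResult(False, size, checksum, error="checksum mismatch")
    if size < packaging_threshold:
        # Small file: kept locally and left for later packaging.
        return WorkerResult(True, size, checksum, deferred_for_packaging=True)
    # A real worker would upload to the target storage here.
    return WorkerResult(True, size, checksum, storage_url=target_url)
```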

Plugin and Worker Coordination

The relationship between plugins and workers is crucial for understanding file-access:

Plugin Responsibilities

  • Run inside the microservice JVM
  • Define storage-specific logic and configuration
  • Handle retrieval operations (reading files from storage)
  • Handle deletion operations (removing files from storage)
  • Handle availability checks (for nearline storage)
  • Generate worker configuration objects for storage operations

Worker Responsibilities

  • Run as independent applications
  • Handle storage operations only
  • Download files from source URLs
  • Upload files to target storage locations
  • Extract file metadata during storage
  • Can be scaled horizontally based on workload
  • Report results asynchronously via AMQP

Communication Flow

[file-catalog] 
↓ AMQP: FileStorageRequestReadyToProcessEvent
[file-access]
↓ AMQP: StorageWorkerRequestEvent
[Worker Manager]
↓ Routes to appropriate worker
[Storage Worker]
↓ Downloads file from source
↓ Uploads file to storage
↓ AMQP: ResponseEvent
[file-access]
↓ Processes response
↓ AMQP: StorageResponseEvent
[file-catalog]
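The same flow can be written down as data, which makes it easy to check which messages each component must handle. The hop table below is transcribed from the diagram above. The helper names `is_round_trip` and `messages_received_by` are hypothetical.

```python
# Each hop is (sender, message, receiver), taken from the flow diagram.
FLOW = [
    ("file-catalog", "FileStorageRequestReadyToProcessEvent", "file-access"),
    ("file-access", "StorageWorkerRequestEvent", "Worker Manager"),
    ("Worker Manager", "StorageWorkerRequestEvent", "Storage Worker"),
    ("Storage Worker", "ResponseEvent", "file-access"),
    ("file-access", "StorageResponseEvent", "file-catalog"),
]


def is_round_trip(flow):
    """True when every hop starts where the previous one ended
    and the last hop returns to the original sender."""
    contiguous = all(prev[2] == nxt[0] for prev, nxt in zip(flow, flow[1:]))
    return contiguous and flow[0][0] == flow[-1][2]


def messages_received_by(component, flow):
    """List the message types a component has to consume, in order."""
    return [message for _, message, receiver in flow if receiver == component]
```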

Key Design Patterns

Plugin-Based Storage Abstraction

Storage operations are abstracted through plugins that implement a common interface. This pattern:

  • Enables support for multiple storage technologies (local, S3, S3 Glacier)
  • Isolates storage-specific logic from the main microservice
  • Allows adding new storage types without modifying core code
  • Provides configuration management for each storage location
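One common way to add storage types without touching core code is a plugin registry. The sketch below shows that pattern in Python. It is not how REGARDS registers its plugins, and every name in it (`register_plugin`, `load_plugin`, the configuration keys) is hypothetical.

```python
_REGISTRY = {}


def register_plugin(plugin_type):
    """Class decorator that records a plugin under its storage type name."""
    def decorator(cls):
        _REGISTRY[plugin_type] = cls
        return cls
    return decorator


@register_plugin("Local")
class LocalStorage:
    def __init__(self, config):
        self.root_path = config["root_path"]


@register_plugin("S3")
class S3Storage:
    def __init__(self, config):
        self.endpoint = config["endpoint"]


def load_plugin(plugin_type, config):
    """Core code looks plugins up by name and never imports them directly."""
    try:
        plugin_class = _REGISTRY[plugin_type]
    except KeyError:
        raise ValueError(f"no plugin registered for storage type {plugin_type!r}") from None
    return plugin_class(config)
```

Supporting a new storage type then means adding one decorated class, while `load_plugin` stays unchanged.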

Worker-Based Scalability

File storage is delegated to independent worker applications. This pattern:

  • Enables horizontal scaling of storage operations
  • Isolates storage activity from the main microservice JVM
  • Allows independent deployment and scaling of worker types
  • Prevents storage operations from degrading microservice performance

Small File Optimization

Small files receive special handling before storage:

  • Files below a threshold are stored locally first
  • They are later packaged by file-packager for efficient storage
  • Packaging reduces the number of API calls to expensive nearline storage systems
  • Fewer, larger uploads improve storage efficiency and lower cost
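The saving in API calls can be illustrated by planning uploads for a batch of files. In this sketch, files at or above the threshold are uploaded one by one, while smaller files are grouped into archives. The grouping rule and the `archive_capacity` parameter are assumptions made for the example, since the actual packaging strategy of file-packager is not described here.

```python
def plan_uploads(file_sizes, threshold, archive_capacity):
    """Split files into direct uploads and archives of small files.

    Returns (direct, archives), where `archives` is a list of lists of sizes.
    """
    direct = [size for size in file_sizes if size >= threshold]
    archives = []
    current, current_size = [], 0
    for size in (s for s in file_sizes if s < threshold):
        # Start a new archive when the next file would exceed the capacity.
        if current and current_size + size > archive_capacity:
            archives.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        archives.append(current)
    return direct, archives


direct, archives = plan_uploads([10, 20, 5000, 30, 40, 8000], threshold=100, archive_capacity=60)
upload_calls = len(direct) + len(archives)
```

With these inputs, six files result in four uploads: two direct uploads and two archives.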

Storage Location Configuration

Each storage location configured in REGARDS requires:

  • A Storage Plugin: Configured in file-access with:

    • Plugin type (Local, S3, S3Glacier)
    • Storage endpoint and credentials
    • Root storage path
    • File naming strategy
    • Small file packaging settings
    • Deletion policy
  • One or More Workers: Deployed as separate applications with:

    • Connection to AMQP for receiving requests
    • Access to the microservice configuration (for storage credentials)
    • Sufficient resources (CPU, memory, network bandwidth) for file operations

Workers must be compatible with their corresponding plugins. For example, the s3-glacier-storage-worker must be used with the S3GlacierStorage plugin.
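A deployment check for this pairing could look like the sketch below. Only the S3 Glacier pair is taken from the text. The `KNOWN_PAIRS` table, the `StorageLocationConfig` fields, and the `validate` function are hypothetical and would need to be filled in from the actual plugin and worker catalogue.

```python
from dataclasses import dataclass

# Plugin name -> required worker name. Only this pair is stated in the text.
KNOWN_PAIRS = {"S3GlacierStorage": "s3-glacier-storage-worker"}


@dataclass(frozen=True)
class StorageLocationConfig:
    """Hypothetical minimal description of one storage location."""
    name: str
    plugin: str
    worker: str


def validate(config):
    """Return a list of problems; an empty list means the pairing is valid."""
    expected = KNOWN_PAIRS.get(config.plugin)
    if expected is None:
        return [f"{config.name}: no known worker for plugin {config.plugin!r}"]
    if config.worker != expected:
        return [
            f"{config.name}: plugin {config.plugin!r} expects worker "
            f"{expected!r}, got {config.worker!r}"
        ]
    return []
```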

Error Handling and Retry Logic

File-access handles failures in three ways: workers report storage errors, failed requests can be retried, and every response carries a status.

Storage Errors

  • Download failures: Workers report errors if they cannot download source files
  • Upload failures: Workers report errors if storage systems reject uploads
  • Checksum mismatches: Workers verify file integrity and report checksum errors
  • Worker unavailability: If no workers are active, requests are marked as errors

Retry Mechanism

  • Failed requests remain in the file-catalog database
  • File-catalog can retry failed operations
  • Workers implement exponential backoff for transient errors
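Exponential backoff for transient errors can be sketched as follows. The attempt count, base delay, and delay cap are illustrative assumptions, not the values used by the REGARDS workers, and `TransientError` is a hypothetical exception type.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a network timeout)."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller report the error
            # Delays grow as base_delay * 1, 2, 4, ... and are capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

The `sleep` parameter is injectable so that the delays can be observed in tests without actually waiting.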

Response Types

  • SUCCESS: File was successfully stored/deleted
  • ERROR: Operation failed with specific error code and message
  • DELAYED: Operation was accepted but will be processed later (for throttling)
  • SKIPPED: Operation was skipped (e.g., worker inactive)
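A requester can use these statuses to see which requests are still open. The sketch below tallies a batch of responses and lists every request that did not end in SUCCESS. The enum mirrors the four response types above, while `summarize` and the `(request_id, status)` tuple shape are hypothetical.

```python
from collections import Counter
from enum import Enum


class ResponseStatus(Enum):
    SUCCESS = "SUCCESS"
    ERROR = "ERROR"
    DELAYED = "DELAYED"
    SKIPPED = "SKIPPED"


def summarize(responses):
    """Take (request_id, status) pairs; return (counts per status, open request ids)."""
    responses = list(responses)
    counts = Counter(status for _, status in responses)
    # Anything other than SUCCESS has not been completed yet.
    open_ids = [rid for rid, status in responses if status is not ResponseStatus.SUCCESS]
    return counts, open_ids
```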

Integration Points

With File Catalog

  • Input: FileStorageRequestReadyToProcessEvent requesting physical storage
  • Output: StorageResponseEvent confirming storage completion or failure
  • Small Files: Responses indicate if files were stored or deferred for packaging

With File Packager

  • Input: FileStorageRequestReadyToProcessEvent for archive storage
  • Output: StorageResponseEvent confirming archive storage

With Storage Workers

  • Input: ResponseEvent messages from workers confirming operations
  • Output: StorageWorkerRequestEvent dispatching storage operations
  • Metadata: Workers extract and return file metadata during storage