The Ideal Archive

The Ideal Digital Archive

This post explores an archive solution that balances redundancy, metadata, encryption, and cost. The goal is a solution that can cater to complex data requirements and large storage volumes while remaining cost-effective.

The archival system has a lot to do. The control center (the user interface) simplifies data management, even when the data is offline or on a slow connection. It provides different relocation methods to improve access, security, and/or reliability.

The ideal solution should bring together the following:

Web-based UI to manage folder hierarchies, file attributes, and search.
Spanning storage systems that combine online storage systems' speed, cost efficiency, and flexibility.

Free cloud storage systems such as Google Drive and OneDrive.
Low-cost storage systems such as Backblaze and Wasabi.
USB and other direct-attached storage systems.
Cloud storage systems such as S3, Google Cloud Storage, Azure Storage, and some backup systems.

File encryption on the client rather than volume-based encryption. This introduces a range of key management challenges.

The following high-level features should cover everything needed from any archiving solution. More details are provided on other pages to describe their meaning and variations. Please comment on key requirements for your perfect archiving solution that aren't covered anywhere on this blog.

Redundancy

Control the number of copies
Control independence in redundant copies (physical location, vendor, etc.)
Price scales according to the total storage volume.
Supports local hard disks for speed and low cost per GB.
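
As a rough illustration of controlling both the number of copies and their independence, the sketch below checks a set of copies against a simple policy. The `Copy` class, the `meets_policy` function, the thresholds, and the vendor and location labels are all hypothetical names chosen for this example; a real system would likely track more dimensions of independence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    vendor: str    # hypothetical label, e.g. "backblaze" or "local-usb"
    location: str  # hypothetical label, e.g. "home" or "us-west"


def meets_policy(copies, min_copies=3, min_vendors=2, min_locations=2):
    """Return True if the copies satisfy the count and independence targets."""
    vendors = {c.vendor for c in copies}
    locations = {c.location for c in copies}
    return (
        len(copies) >= min_copies
        and len(vendors) >= min_vendors
        and len(locations) >= min_locations
    )


copies = [
    Copy("local-usb", "home"),
    Copy("backblaze", "us-west"),
    Copy("wasabi", "eu-central"),
]
print(meets_policy(copies))
```

Three copies with the same vendor and location would fail this check even though the copy count is met, which is the point of treating independence separately from the number of copies.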

Encryption

Encryption on the client (before upload)
Files are encrypted with separate keys.
Encryption can be controlled; data is protected for privacy by default.
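
One possible way to give every file its own key without storing thousands of keys is to derive each key from a single master key. The sketch below shows only the derivation step, using HMAC-SHA256 from the Python standard library; it does not perform the encryption itself. The function name `derive_key`, the `purpose` label, and the file identifiers are assumptions made for illustration, and a production design might prefer a standard construction such as HKDF.

```python
import hashlib
import hmac
import secrets


def derive_key(master_key: bytes, file_id: str, purpose: str = "data") -> bytes:
    """Derive a 32-byte key bound to one file and one purpose."""
    # A NUL separator keeps (purpose, file_id) pairs unambiguous.
    label = purpose.encode("utf-8") + b"\x00" + file_id.encode("utf-8")
    return hmac.new(master_key, label, hashlib.sha256).digest()


# Assumption: the master key lives in a key store on the client,
# never alongside the archived data.
master_key = secrets.token_bytes(32)

key_a = derive_key(master_key, "photos/2019/img_0001.jpg")
key_b = derive_key(master_key, "photos/2019/img_0002.jpg")
print(key_a != key_b)
```

Changing the `purpose` label yields an unrelated key for the same file, which is one way to keep the key for a file's contents separate from the key for anything else stored about it.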

Standards

Where possible, rely on common tools or libraries to avoid lock-in to a particular technology/vendor.
Otherwise, rely on clearly described formats and protocols, preferably commonly used open standards.
At a minimum, data from the resulting archive should be accessible with a simple text editor and basic scripting.
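
To make the "simple text editor and basic scripting" requirement concrete, here is a sketch of a manifest kept as JSON Lines (one JSON object per line). The field names `path`, `size`, and `stored_on`, and the store labels, are hypothetical and only illustrate the idea of a plainly readable format.

```python
import json

# Hypothetical manifest: one file record per line, readable in any text editor.
MANIFEST = """\
{"path": "tax/2020/return.pdf", "size": 1000, "stored_on": ["usb-1", "backblaze"]}
{"path": "photos/2019/img_0001.jpg", "size": 2500, "stored_on": ["backblaze"]}
{"path": "notes/todo.txt", "size": 400, "stored_on": ["usb-1"]}
"""


def read_manifest(text):
    """Parse each non-empty line as one JSON record."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def paths_on(records, store):
    """List the paths that have a copy on the given store."""
    return [r["path"] for r in records if store in r["stored_on"]]


records = read_manifest(MANIFEST)
print(sum(r["size"] for r in records))
print(paths_on(records, "usb-1"))
```

Because each line stands alone, a damaged or truncated manifest still leaves the remaining lines usable, and tools such as `grep` can answer simple questions without any custom software.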

Metadata

File discovery and navigation using attributes and tags
Metadata can be accessed independently of data.
Metadata on encrypted files is encrypted with a different key.
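
Discovery by tags can work entirely on metadata, without touching the (possibly offline or encrypted) file contents. The sketch below builds a small in-memory tag index; the record layout and the names `build_tag_index` and `find` are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical metadata records; the file contents are never read here.
RECORDS = [
    {"path": "tax/2020/return.pdf", "tags": ["finance", "2020"]},
    {"path": "tax/2020/receipts.zip", "tags": ["finance", "2020", "receipts"]},
    {"path": "photos/2019/img_0001.jpg", "tags": ["family", "2019"]},
]


def build_tag_index(records):
    """Map each tag to the set of paths that carry it."""
    index = defaultdict(set)
    for rec in records:
        for tag in rec["tags"]:
            index[tag].add(rec["path"])
    return index


def find(index, *tags):
    """Return the paths that carry all of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


index = build_tag_index(RECORDS)
print(sorted(find(index, "finance", "2020")))
```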

Flexible Operations

Embeddable, so it can run without a dedicated server.
Asynchronous communication, so distributed systems can operate at different times.
Queue-based communication, so distributed systems can operate without direct connection.
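
One simple way to get asynchronous, queue-based communication without a dedicated server is a spool directory: the producer drops message files into a folder (which could sit on a USB disk or a synced cloud folder), and the consumer processes them whenever it next runs. The sketch below is a minimal version under that assumption; `enqueue`, `drain`, and the message fields are hypothetical names, and it ignores concerns such as concurrent consumers.

```python
import json
import os
import tempfile
import time
import uuid


def enqueue(spool_dir, message):
    """Write a message into the spool directory as its own JSON file."""
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
    tmp_path = os.path.join(spool_dir, name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(message, f)
    final_path = os.path.join(spool_dir, name)
    # Rename so a reader never sees a half-written .json file.
    os.replace(tmp_path, final_path)
    return final_path


def drain(spool_dir):
    """Yield and remove pending messages, roughly in the order written."""
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".json"):
            continue  # skip in-progress .tmp files
        path = os.path.join(spool_dir, name)
        with open(path, encoding="utf-8") as f:
            message = json.load(f)
        os.remove(path)
        yield message


with tempfile.TemporaryDirectory() as spool:
    enqueue(spool, {"action": "upload", "path": "a.txt"})
    enqueue(spool, {"action": "delete", "path": "b.txt"})
    for msg in drain(spool):
        print(msg)
```

The producer and consumer never need to be online at the same time; they only need to share the directory at some point.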

Process/Workflow/Collaboration

Extensible rule system to recognize patterns in files and folders and apply attributes accordingly.
Data collection rules to ask users to choose attributes for file groups before importing the data.
Approval-based workflows for folder groups, especially a workflow to support the archival (deletion) process. This can be used to ensure sufficient metadata is collected for files before they are encrypted and locally deleted.
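
A rule system of this kind could start as a list of glob patterns, each paired with the attributes to apply when a path matches. The sketch below uses `fnmatchcase` from the Python standard library; the rules, attribute names, and values (such as `retention`) are hypothetical examples rather than a proposed schema.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (glob pattern, attributes to apply on match).
# Later rules override earlier ones when they set the same attribute.
RULES = [
    ("*.jpg", {"type": "photo"}),
    ("tax/*", {"category": "finance", "retention": "7y"}),
    ("*.pdf", {"type": "document"}),
]


def attributes_for(path, rules=RULES):
    """Collect the attributes of every rule whose pattern matches the path."""
    attrs = {}
    for pattern, rule_attrs in rules:
        if fnmatchcase(path, pattern):
            attrs.update(rule_attrs)
    return attrs


print(attributes_for("tax/2020/return.pdf"))
print(attributes_for("notes.txt"))
```

Paths that match no rule come back with no attributes, which is a natural trigger for asking the user to supply them before import.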

Other thoughts on ideal backup strategies:

https://www.backblaze.com/blog/the-3-2-1-backup-strategy/
