Meta Stream Model

Data Exchange Structure Overview

The following main data structures are used in data exchange:

  1. Files/Folders (protobuf streams)
  2. Request Data (JSON data)
    1. Optional protobuf extension
  3. Response Data (JSON data with multiple response parts)
    1. Extension Data (protobuf streams)
    2. File Data (raw file data)

File/Folder Stream Data

File System metadata is written as a compressed protobuf stream. This extensible format represents the main file hierarchy, classification and core file metadata.

The stream data is ordered to minimise the storage and processing costs of applying incremental updates to table-based hierarchies. For example, readers can batch reads and writes to process 20,000 file metadata entries per second into a SQLite database.
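As an illustration of the batching idea, here is a minimal sketch that writes decoded node entries into SQLite in per-batch transactions. The `node` table layout, the `apply_nodes` and `batched` helpers, and the tuple shape `(id, parent_id, name, file_length)` are assumptions made for this example; they are not the actual schema or reader API, and the protobuf decoding step is left out.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def apply_nodes(conn, nodes, batch_size=5000):
    """Insert or update node rows, one transaction per batch.

    `nodes` yields (id, parent_id, name, file_length) tuples.
    The table layout is hypothetical and only covers a few fields.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER,"
        " name TEXT NOT NULL,"
        " file_length INTEGER)"
    )
    total = 0
    for chunk in batched(nodes, batch_size):
        with conn:  # commits once per batch instead of once per row
            conn.executemany(
                "INSERT OR REPLACE INTO node (id, parent_id, name, file_length)"
                " VALUES (?, ?, ?, ?)",
                chunk,
            )
        total += len(chunk)
    return total
```

Grouping many rows into one transaction is what keeps the per-entry cost low; the batch size here is an arbitrary starting point rather than a measured optimum.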

Node Data Item:

  • Id (32-bit int, unique per file system)
  • ParentId (nullable 32-bit int, refers to a node Id in the same file system)
  • Name
  • ModifiedDate
  • PermissionMask
  • ClassificationMask (12 bytes)
  • OwnerId
  • GroupId
  • FileLength
  • Hashes (any combination of MD5, Sha1, Sha256, City64, XxHash64)
  • MetaExtensions (extensible; some examples are:)
    • Height/Width
    • Duration
    • ID3 (Title, Artist, Album, Year, Comment, Genre)
    • Doc (Title, Author, Comments, Tags, LastModified, Created)
    • Key (Encryption key information)
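The field list above could be modelled in memory roughly as follows. This is a sketch only: the Python types chosen for PermissionMask, OwnerId, GroupId, Hashes and MetaExtensions are assumptions, the class name `NodeItem` is hypothetical, and the real definition lives in the protobuf schema, which is not shown here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class NodeItem:
    """Hypothetical in-memory view of one node data item."""

    id: int                      # 32-bit int, unique per file system
    parent_id: Optional[int]     # None when the node has no parent
    name: str
    modified_date: datetime
    permission_mask: int         # assumed to be an integer bit mask
    classification_mask: bytes   # 12 bytes
    owner_id: int                # assumed integer id
    group_id: int                # assumed integer id
    file_length: int
    # Keyed by algorithm name, e.g. "MD5" or "Sha256" (assumed layout).
    hashes: Dict[str, bytes] = field(default_factory=dict)
    # Keyed by extension name, e.g. "ID3" or "Doc" (assumed layout).
    meta_extensions: Dict[str, dict] = field(default_factory=dict)

    def __post_init__(self):
        # Signedness is not stated, so accept any value that fits in 32 bits.
        if not -2**31 <= self.id < 2**32:
            raise ValueError("id does not fit in 32 bits")
        if len(self.classification_mask) != 12:
            raise ValueError("classification_mask must be exactly 12 bytes")
```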

Request Data

Puddler Requests are instructions to collect data from a Drive (or bucket, or other data source). Examples of these request types are:
  1. Re-scan the main file hierarchy.
  2. Collect Hash data for a subset of (or all) files in a drive.
  3. Collect Meta Extensions for a subset of files.
  4. Archive files by copying them to an external archive (optionally encrypting or compressing them).
  5. Share files by copying them to a Cloud Data store and generating some share links.
  6. Generate thumbnails or other derived data.
Puddler writes each request to the event-store whether it is being handled asynchronously or not. By reading each request and response in sequence, a new node can rebuild the database of knowledge.
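The replay idea can be sketched as a fold over an ordered event log. The event shape used here (JSON lines with `kind`, `request_id`, `type` and `status` fields) and the `replay` function are purely hypothetical, chosen for illustration; the actual event-store format is not described in this document.

```python
import json


def replay(lines):
    """Rebuild request state from an ordered log of JSON events.

    Returns (pending, completed): requests that have no response yet,
    and (request, response) pairs in the order the responses arrived.
    """
    pending = {}
    completed = []
    for line in lines:
        event = json.loads(line)
        if event["kind"] == "request":
            pending[event["request_id"]] = event
        elif event["kind"] == "response":
            # Match the response to its request; None if it was never logged.
            request = pending.pop(event["request_id"], None)
            completed.append((request, event))
    return pending, completed
```

A new node would apply each completed pair to its local database in sequence; requests still in `pending` are the ones whose results have not been recorded.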

Response Data

Puddler responses always include a JSON file header, but may also include other data files. Examples of responses are:
  1. "Stream Data" results: The JSON file contains just the status and a reference to the File/Folder data streams.
  2. "Archive request" results have the same structure as "Stream Data" results. However, the archive results indicate which files were added to the archive and their storage details (encryption key IDs, etc.).
  3. "Share" results are the links to the shared files.
  4. Thumbnail results are a list of raw files written directly into the cloud drive. Depending on the type of cloud drive, video playback may also be available.
