Classification

Core Classifications

Classifications indicate the "value" of files and tell Puddler how to handle them. At the highest level, handling rules may work like this:
  • Ignore files delivered by some external provider.
  • Create redundant (backup) copies of files created by the owner, but keep files with privacy concerns separate.
  • Keep files linked to a contract or commercial enterprise separate from personal files.
  • Organise project-related files using either tags or a folder hierarchy (depending on user preference).
Puddler defines a set of core classifications, listed in the tables below, that can be referenced in Puddler instructions. Puddler also allows manual tagging with custom tags, and custom rules can define their own classifications, which flow into the core database.

Examples of Puddler instructions that reference classifications:
  • Archive any private, contractual, or key file with encryption to b2://puddler_private/.
  • Archive any non-private, non-professional, non-system, non-cloud-synced file to b2://puddler_core/.
  • Archive any professional data with encryption to b2://puddler_professional/.
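As a sketch of how such instructions could be evaluated, the example below models core classifications as bit flags and expresses each of the three instructions above as a predicate. The names FileClass, FolderClass, ArchiveTarget and archive_targets are hypothetical and are not Puddler's API; the sketch also assumes that a file's own classifications and those of its containing folder (needed for Cloud Sync) are both available when the rules run.

```python
from dataclasses import dataclass
from enum import Flag, auto


class FileClass(Flag):
    """Hypothetical subset of the core file classifications."""
    NONE = 0
    SYSTEM = auto()
    KEY = auto()
    CONTRACTUAL = auto()
    PERSONAL = auto()
    PROFESSIONAL = auto()
    PRIVATE = auto()


class FolderClass(Flag):
    """Hypothetical subset of the core folder classifications."""
    NONE = 0
    SYSTEM = auto()
    CLOUD_SYNC = auto()


@dataclass(frozen=True)
class ArchiveTarget:
    url: str
    encrypt: bool


def archive_targets(file_cls: FileClass, folder_cls: FolderClass) -> list[ArchiveTarget]:
    """Evaluate the three example instructions; a file may match several."""
    targets = []
    # Any private, contractual or key file, with encryption.
    if file_cls & (FileClass.PRIVATE | FileClass.CONTRACTUAL | FileClass.KEY):
        targets.append(ArchiveTarget("b2://puddler_private/", encrypt=True))
    # Any non-private, non-professional, non-system, non-cloud-synced file.
    excluded_file = FileClass.PRIVATE | FileClass.PROFESSIONAL | FileClass.SYSTEM
    excluded_folder = FolderClass.SYSTEM | FolderClass.CLOUD_SYNC
    if not (file_cls & excluded_file) and not (folder_cls & excluded_folder):
        # The instruction does not ask for encryption here.
        targets.append(ArchiveTarget("b2://puddler_core/", encrypt=False))
    # Any professional data, with encryption.
    if file_cls & FileClass.PROFESSIONAL:
        targets.append(ArchiveTarget("b2://puddler_professional/", encrypt=True))
    return targets
```

Because each instruction is evaluated independently, a contractual file that is neither private nor professional matches both the first and the second rule. Whether Puddler would archive such a file twice or apply some precedence between instructions is not specified here.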



Folder Classifications

Name Description
System A folder specifically linked to the function of the operating system. On Windows this is typically just C:/Windows; on Unix systems this includes /dev, /sys, /etc, /sbin, /proc and /boot.
User System A folder containing user-specific files that configure or enable system behavior. This includes:
  • Cookies
  • Favorites, History etc
  • InternetCache
  • Recent
  • SendTo, StartMenu, NetworkShortcuts folders (and other files containing special launch scripts)
  • Templates
  • Fonts, Resources
Recovery A folder that is set up for the recovery of lost data: the Recycle Bin on Windows and lost+found on Linux.
System App Data A folder that contains data dedicated to an application or group of applications, kept separate from the application executables themselves. Its contents are most typically transient, and most files here will not be useful. On Windows this is typically C:/ProgramData and the folders within C:/Users/*/AppData.
ProjectDb A folder (typically project specific) that contains data related to a known project application or repository. These folders often contain derived data, data from remote repositories or transient data. Examples: ".vs", "obj", "node_modules", ".android", ".nuget" etc.
System App A folder where applications are installed. Folders classified as System or System App Data are typically not also System App folders. On Windows this is C:/Program Files and C:/Program Files (x86); on Unix systems this includes /bin, /lib, /opt and /usr.
Temp A folder specifically designated by the operating environment for storing temporary files. This includes the "InternetCache" special folder.
Link A symlink or reparse point that might indicate the files are a copy of other files existing on the same drive.
Cloud Sync A folder that is already synchronized to some cloud storage. This includes Google Drive, OneDrive and Git repository folders. It does not include drives that are backed up at the root level.
Version Control Any folder that is the root of a Git, CVS or SVN repository. Note that such a folder may or may not also have the CloudSync tag.
User Home Any folder that is a user's home directory.
User Documents Any folder that is designated as the preferred store for a user's documents (by the operating system, shell configuration etc.).
User Downloads Any folder that is designated as the preferred store for a user's downloads (by the operating system, shell configuration etc.).
User Media Any folder that is designated as the preferred store for a user's audio, video or pictures (by the operating system, shell configuration etc.).
User Project Any folder that is recognized as a project, meaning it contains a set of files that together compose a project. This is determined from the project files of known systems such as Adobe Premiere, Eclipse or Visual Studio.
Manually Reviewed Folders receive this mark once they have gone through a human workflow that confirms their classification. Generally this means that one of the following flags has been applied: UserDocumentsChild | UserDownloadsChild | UserMediaChild | UserProjectChild. If this flag is set but none of those flags are set, the subfolders still need to be reviewed.
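The path-based folder classifications above can be approximated by matching a folder's path against the example locations given in each row. The sketch below is hypothetical: its constants contain only the examples from the table, it assumes Windows is installed on C:, and it treats a .git, .svn or CVS entry among the folder's children as a repository marker rather than touching the file system (a simplification, since CVS and older SVN versions place their folder in every checked-out directory, not only the root). Classifications such as User Project or Cloud Sync need other evidence and are left out.

```python
from enum import Flag, auto


class FolderClass(Flag):
    NONE = 0
    SYSTEM = auto()
    RECOVERY = auto()
    SYSTEM_APP_DATA = auto()
    PROJECT_DB = auto()
    SYSTEM_APP = auto()
    VERSION_CONTROL = auto()


# Example locations from the table only; assumes Windows lives on C:.
SYSTEM_ROOTS = {"C:/Windows", "/dev", "/sys", "/etc", "/sbin", "/proc", "/boot"}
APP_ROOTS = {"C:/Program Files", "C:/Program Files (x86)", "/bin", "/lib", "/opt", "/usr"}
APP_DATA_ROOTS = {"C:/ProgramData"}
RECOVERY_NAMES = {"$recycle.bin", "lost+found"}
PROJECT_DB_NAMES = {".vs", "obj", "node_modules", ".android", ".nuget"}
VCS_MARKERS = {".git", ".svn", "CVS"}


def _norm(path: str) -> str:
    """Use "/" separators and compare Windows drive paths case-insensitively."""
    p = path.replace("\\", "/").rstrip("/")
    return p.lower() if p[1:2] == ":" else p


def _under(path: str, roots: set[str]) -> bool:
    """True if path equals one of the roots or lies somewhere below it."""
    p = _norm(path)
    return any(p == r or p.startswith(r + "/") for r in map(_norm, roots))


def classify_folder(path: str, child_names: frozenset[str] = frozenset()) -> FolderClass:
    """Return every folder classification suggested by the path and its children."""
    cls = FolderClass.NONE
    parts = _norm(path).split("/")
    name = parts[-1].lower()
    if _under(path, SYSTEM_ROOTS):
        cls |= FolderClass.SYSTEM
    if _under(path, APP_ROOTS):
        cls |= FolderClass.SYSTEM_APP
    # C:/ProgramData, or C:/Users/<user>/AppData and anything below it.
    if _under(path, APP_DATA_ROOTS) or (parts[:2] == ["c:", "users"] and parts[3:4] == ["appdata"]):
        cls |= FolderClass.SYSTEM_APP_DATA
    if name in RECOVERY_NAMES:
        cls |= FolderClass.RECOVERY
    if name in PROJECT_DB_NAMES:
        cls |= FolderClass.PROJECT_DB
    if child_names & VCS_MARKERS:
        cls |= FolderClass.VERSION_CONTROL
    return cls
```

Returning a flag set rather than a single value mirrors the tables, where a folder can carry several classifications at once (for example, a Version Control folder that is also Cloud Sync).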

File Classifications

Name Description
System A file specifically linked to the function of the operating system (for example, hiberfil.sys).
Link A symlink or reparse point that indicates the original file is at another location.
ProbablyDerived The file is probably derived from another set of files. This can be hard to determine accurately, so the flag only indicates that the file is more likely derived than not; it is not a precise determination.
ActuallyDerived The file is actually derived from another set of files and can therefore be reconstructed from the original data.
Audio An audio file *.mp3, *.wav etc.
Video A video file *.mpg, *.avi etc.
Image An image file *.jpg, *.gif etc.
Code A source code file *.cs, *.java etc.
Document An editable document *.docx, *.txt etc.
DocumentOutput A read-only document *.pdf, *.ps etc.
Executable An executable binary *.exe, *.dll, *.so etc.
Script A non-binary executable *.ps1, *.bat etc.
Launcher The file is used by an application to launch something. This could be a putty config, rdp file etc.
Archive A compressed file *.zip, *.7z, *.arj etc., or any other sort of backup.
Encrypted An encrypted file (a known format with internal encryption, not a file system encrypted file)
Key A key (a file in a known format used for encryption/decryption): *.pfx etc.
Downloadable A file that is known to be globally available (files downloaded from public websites will have this flag).
Contractual A file that has contractual meaning and so needs to be kept. This could be an actual contract or an insurance, property or financial agreement. This flag is hard to set automatically, but AI tooling can suggest it to the user for confirmation.
Receipt A file that is a receipt or some other kind of evidence for the owning user.
Author The file was produced directly by the user in some way. Generally this means any photo taken by, or document written by, the user. Within a given project this marker is generally applied to only one file, so an Adobe Premiere or Visual Studio project would mark only one file. It is most typically set automatically: files that come from the user's camera get this flag, but files from the internet don't. The accuracy of this flag may be low.
Personal A file that relates to the user in their personal capacity, e.g. a family photo (set manually).
Professional A file that relates to the user in their professional capacity (employment contract, work receipt, actual work document etc.).
Critical The user deems this file critical; it should be kept with physical and commercial redundancy.
Private The content should be kept secret and therefore encrypted.
ApplicationBinary The file is part of the operation of an application (DLL, font, vendor-specific data).
ApplicationTransient The file is a transient file that can be ignored (cache data etc.).
ApplicationConfiguration The file is a user specific configuration file.
ApplicationUnknown The file is a known application file, but its purpose is not one of those listed above.
TransientBackup Files tagged with this flag should be genuinely useful, such as an Adobe Premiere backup or MS Word backup file. If a file is merely transient data that the application uses during its operation and has no other purpose, it should have only the ApplicationTransient flag.
Database A database or file associated with data storage.
AutomatedReview These flags have been reviewed by the automated system. This means direct access to the file has been used to inspect the file content and confirm the type information.
ManualReview These flags have been reviewed manually by the user.
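The format-based file classifications can be sketched as an extension lookup, with an optional content check standing in for AutomatedReview: the flag is added only when the file's leading bytes match a known signature for the type implied by its extension. Everything here is hypothetical (FileClass, classify_file and the small signature table); the extension map covers only the examples from the table, .ps1 is assumed for PowerShell scripts, and flags such as Contractual, Author or Personal cannot be derived this way.

```python
from enum import Flag, auto
from pathlib import PureWindowsPath


class FileClass(Flag):
    NONE = 0
    SYSTEM = auto()
    AUDIO = auto()
    VIDEO = auto()
    IMAGE = auto()
    CODE = auto()
    DOCUMENT = auto()
    DOCUMENT_OUTPUT = auto()
    EXECUTABLE = auto()
    SCRIPT = auto()
    ARCHIVE = auto()
    KEY = auto()
    AUTOMATED_REVIEW = auto()


# Example extensions from the table (plus the assumed .ps1).
EXTENSIONS = {
    ".mp3": FileClass.AUDIO, ".wav": FileClass.AUDIO,
    ".mpg": FileClass.VIDEO, ".avi": FileClass.VIDEO,
    ".jpg": FileClass.IMAGE, ".gif": FileClass.IMAGE,
    ".cs": FileClass.CODE, ".java": FileClass.CODE,
    ".docx": FileClass.DOCUMENT, ".txt": FileClass.DOCUMENT,
    ".pdf": FileClass.DOCUMENT_OUTPUT, ".ps": FileClass.DOCUMENT_OUTPUT,
    ".exe": FileClass.EXECUTABLE, ".dll": FileClass.EXECUTABLE, ".so": FileClass.EXECUTABLE,
    ".ps1": FileClass.SCRIPT, ".bat": FileClass.SCRIPT,
    ".zip": FileClass.ARCHIVE, ".7z": FileClass.ARCHIVE, ".arj": FileClass.ARCHIVE,
    ".pfx": FileClass.KEY,
}
SYSTEM_FILES = {"hiberfil.sys"}

# Leading "magic" bytes for a few well-known formats.
MAGIC = {
    FileClass.IMAGE: (b"\xff\xd8\xff", b"GIF87a", b"GIF89a"),
    FileClass.DOCUMENT_OUTPUT: (b"%PDF-", b"%!PS"),
    FileClass.ARCHIVE: (b"PK\x03\x04", b"7z\xbc\xaf\x27\x1c"),
    FileClass.EXECUTABLE: (b"MZ", b"\x7fELF"),
}


def classify_file(name: str, head: bytes = b"") -> FileClass:
    """Classify by name/extension; head holds the file's first bytes, if read."""
    path = PureWindowsPath(name)  # accepts both "/" and "\" separators
    if path.name.lower() in SYSTEM_FILES:
        return FileClass.SYSTEM
    cls = EXTENSIONS.get(path.suffix.lower(), FileClass.NONE)
    # Only mark as automatically reviewed when the content confirms the type.
    if cls in MAGIC and head.startswith(MAGIC[cls]):
        cls |= FileClass.AUTOMATED_REVIEW
    return cls
```

A file whose content does not match its extension simply keeps the extension-based flag without AutomatedReview, leaving it as a candidate for ManualReview.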
