Classification
Core Classifications
- Ignore files delivered by some external provider.
- Create redundant (backup) copies of files created by the owner, but keep files with privacy concerns separate.
- Keep files linked to a contract or commercial enterprise separate from personal files.
- Organise files related to projects using either tags or a hierarchy, depending on user preference.
- Archive any private, contractual, or key file with encryption to b2://puddler_private/.
- Archive any non-private, non-professional, non-system, non-cloud-synced file to b2://puddler_core/.
- Archive any professional data with encryption to b2://puddler_professional/.
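The archive rules above can be sketched as a small routing function. This is only a rough illustration, not a finished implementation: the names `FileFlag`, `Destination` and `archive_destination` are hypothetical, the Downloadable flag stands in for "delivered by some external provider", and the order in which the rules are checked (private before professional) is an assumption, since the rules do not say which bucket wins when several apply. The core bucket is treated as unencrypted only because the rules do not ask for encryption there.

```python
from enum import Flag, auto
from typing import NamedTuple, Optional


class FileFlag(Flag):
    """Hypothetical subset of the file flags, just enough for routing."""
    NONE = 0
    SYSTEM = auto()
    KEY = auto()
    DOWNLOADABLE = auto()
    CONTRACTUAL = auto()
    PROFESSIONAL = auto()
    PRIVATE = auto()


class Destination(NamedTuple):
    bucket: str
    encrypt: bool


def archive_destination(flags: FileFlag,
                        in_cloud_sync_folder: bool = False) -> Optional[Destination]:
    """Return where a file should be archived, or None if it is skipped."""
    # Assumption: externally provided files are the ones flagged Downloadable.
    if flags & FileFlag.DOWNLOADABLE:
        return None
    # Private, contractual and key files go to the private bucket, encrypted.
    if flags & (FileFlag.PRIVATE | FileFlag.CONTRACTUAL | FileFlag.KEY):
        return Destination("b2://puddler_private/", True)
    # Professional data goes to its own bucket, encrypted.
    if flags & FileFlag.PROFESSIONAL:
        return Destination("b2://puddler_professional/", True)
    # System files and files already synced to a cloud are not archived.
    if flags & FileFlag.SYSTEM or in_cloud_sync_folder:
        return None
    # Everything else goes to the core bucket (assumed unencrypted).
    return Destination("b2://puddler_core/", False)
```

For example, `archive_destination(FileFlag.PROFESSIONAL | FileFlag.PRIVATE)` picks the private bucket under the assumed ordering; swapping the two checks would send the same file to the professional bucket instead.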
Folder Classifications
Name | Description |
---|---|
System | A folder specifically linked to the function of the operating system. On Windows this is typically just C:/Windows. On Unix systems this includes /dev, /sys, /etc, /sbin, /proc and /boot. |
User System | A folder containing user-specific files that configure or enable system behavior. This includes: |
Recovery | A folder that is set up for recovery of lost data, such as the Recycle Bin on Windows and lost+found on Linux. |
System App Data | A folder that contains data dedicated to an application or group of applications, when that folder is kept separate from the application executables themselves. This data is typically transient and most files here will not be useful. On Windows this is typically C:/ProgramData and folders within C:/Users/*/AppData. |
ProjectDb | A folder (typically project specific) that contains related data for a known project application or repository. These folders will often contain derived data, data from remote repositories, or transient data. Examples: ".vs", "obj", "node_modules", ".android", ".nuget" etc. |
System App | A folder where applications are installed. Typically, folders that are either System or Application Data are not also Application folders. On Windows this will be /Program Files and /Program Files (x86). On Unix systems this includes /bin, /lib, /opt and /usr. |
Temp | A folder specifically designated by the operating environment for storage of temporary files. This includes the "InternetCache" Special Folders. |
Link | A symlink or reparse point that might indicate the files are a copy of other files existing on the same drive. |
Cloud Sync | A folder that is already synchronized to some cloud storage. This includes GoogleDrive, OneDrive and Git repository folders. Does not include drives that are backed up at the root level. |
Version Control | Any folder that is the root of a Git, CVS or SVN repository. Note that this may or may not also have the CloudSync tag. |
User Home | Any folder that is a user's home directory. |
User Documents | Any folder that is designated as a preferred store for a user's documents (either by the operating system or shell configuration etc.). |
User Downloads | Any folder that is designated as a preferred store for a user's downloads (either by the operating system or shell configuration etc.). |
User Media | Any folder that is designated as a preferred store for a user's audio, video or pictures (either by the operating system or shell configuration etc.). |
User Project | Any folder that is recognized as a project, meaning it contains a series of files that together compose a project. This is determined by project files from known systems such as Adobe Premiere, Eclipse or Visual Studio. |
Manually Reviewed | Folders receive this mark when the folder has gone through some human workflow to confirm its classification. Generally this means that one of the following flags has been applied: UserDocumentsChild, UserDownloadsChild, UserMediaChild or UserProjectChild. If Manually Reviewed is set but none of those flags are set, then the subfolders need to be reviewed. |
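A few of the folder rules in this table are mechanical enough to sketch in code. The snippet below is a minimal illustration for Unix-style paths only, using the directory names listed in the table; the `FolderFlag` enum and the function names are hypothetical, and matching on the top-level directory or on the folder name alone is a simplifying assumption. The second function shows one reading of the Manually Reviewed rule.

```python
from enum import Flag, auto
from pathlib import PurePosixPath


class FolderFlag(Flag):
    """Hypothetical subset of the folder flags from the table."""
    NONE = 0
    SYSTEM = auto()
    SYSTEM_APP = auto()
    PROJECT_DB = auto()
    MANUALLY_REVIEWED = auto()
    USER_DOCUMENTS_CHILD = auto()
    USER_DOWNLOADS_CHILD = auto()
    USER_MEDIA_CHILD = auto()
    USER_PROJECT_CHILD = auto()


# Directory names taken from the table (Unix examples only).
SYSTEM_ROOTS = {"/dev", "/sys", "/etc", "/sbin", "/proc", "/boot"}
SYSTEM_APP_ROOTS = {"/bin", "/lib", "/opt", "/usr"}
PROJECT_DB_NAMES = {".vs", "obj", "node_modules", ".android", ".nuget"}

CHILD_FLAGS = (FolderFlag.USER_DOCUMENTS_CHILD | FolderFlag.USER_DOWNLOADS_CHILD
               | FolderFlag.USER_MEDIA_CHILD | FolderFlag.USER_PROJECT_CHILD)


def classify_unix_folder(path: str) -> FolderFlag:
    """Guess folder flags from an absolute Unix path alone."""
    p = PurePosixPath(path)
    flags = FolderFlag.NONE
    if p.is_absolute() and len(p.parts) > 1:
        top = "/" + p.parts[1]  # e.g. "/usr" for "/usr/lib"
        if top in SYSTEM_ROOTS:
            flags |= FolderFlag.SYSTEM
        if top in SYSTEM_APP_ROOTS:
            flags |= FolderFlag.SYSTEM_APP
    # Assumption: a ProjectDb folder is recognised purely by its name.
    if p.name in PROJECT_DB_NAMES:
        flags |= FolderFlag.PROJECT_DB
    return flags


def subfolders_need_review(flags: FolderFlag) -> bool:
    """True when a folder is reviewed but has none of the *Child flags."""
    return FolderFlag.MANUALLY_REVIEWED in flags and not (flags & CHILD_FLAGS)
```

A real classifier would also need the Windows paths, the special-folder lookups and the link and cloud-sync checks, none of which can be decided from the path string alone.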
File Classifications
Name | Description |
---|---|
System | A file specifically linked to the function of the operating system (hiberfil.sys) |
Link | A symlink or reparse point that indicates the original file is at another location. |
ProbablyDerived | A file that is probably derived from another set of files. It can be hard to determine this accurately, so it's really just a flag to indicate that the file is probably derived rather than a precise determination. |
ActuallyDerived | A file that is actually derived from another set of files and as such can be reconstructed from the original data. |
Audio | An audio file *.mp3, *.wav etc. |
Video | A video file *.mpg, *.avi etc. |
Image | An image file *.jpg, *.gif etc. |
Code | A source code file *.cs, *.java etc. |
Document | An editable document *.docx, *.txt etc. |
DocumentOutput | A read-only document *.pdf, *.ps etc. |
Executable | An executable binary *.exe, *.dll, *.so etc. |
Script | An executable non-binary file *.ps1, *.bat etc. |
Launcher | The file is used by an application to launch something. This could be a putty config, rdp file etc. |
Archive | A compressed file *.zip, *.7z, *.arj etc., or any other sort of backup. |
Encrypted | An encrypted file (a known format with internal encryption, not a file system encrypted file) |
Key | A key (a file in a known format for encryption/decryption): *.pfx etc. |
Downloadable | A file that is known to be globally available. (Files downloaded from public websites will have this flag.) |
Contractual | A file that has contractual meaning and so needs to be kept. This could be an actual contract or an insurance, property or financial agreement. This flag is hard to set automatically, but AI tooling can suggest it to the user for confirmation. |
Receipt | A file that is a receipt or some kind of evidence for the owning user. |
Author | The file was produced in some way directly by the user. Generally this means any photo taken by, or document written by, the user. This marker is generally applied to only one file in a given project, so an Adobe Premiere or Visual Studio project would have only one file marked. It is most typically set automatically, so files that come from the user's camera get this flag, but files from the internet don't. The accuracy of this field may be low. |
Personal | A file that relates to the user in their personal capacity (family photo). (Set manually.) |
Professional | A file that relates to the user in their professional capacity (employment contract, work receipt, actual work document etc.). |
Critical | The user deems this file critical; it should be kept with physical and commercial redundancy. |
Private | The content should be kept secret and therefore encrypted. |
ApplicationBinary | The file has something to do with the operation of an application (dll, font, vendor-specific data). |
ApplicationTransient | The file is a transient file that can be ignored (cache data etc.). |
ApplicationConfiguration | The file is a user-specific configuration file. |
ApplicationUnknown | The file is a known application file, but its purpose is not one of those listed above. |
TransientBackup | Files tagged with this should be actually useful, such as an Adobe Premiere backup or MS Word backup file. If it's just a transient file that the application uses during its operation and has no other purpose, then it should just have the ApplicationTransient flag. |
Database | A database or file associated with data storage. |
AutomatedReview | These flags have been reviewed by the automated system. This means direct access to the file has been used to inspect the file content and confirm the type information. |
ManualReview | These flags have been reviewed manually by the user. |
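The type flags in this table that list file extensions lend themselves to a simple lookup as a first pass. The sketch below is only that: a hypothetical `guess_type_flag` helper that maps a file name to one of the flag names above using a handful of the extensions mentioned in the table. A guess based on the extension is not a confirmed classification; in the terms of the table, it is what an AutomatedReview or ManualReview pass would later check against the file content.

```python
from pathlib import PurePath
from typing import Optional

# Extension -> flag name, using a selection of examples from the table.
# A real mapping would be far longer; this is illustrative only.
EXTENSION_FLAGS = {
    ".mp3": "Audio", ".wav": "Audio",
    ".mpg": "Video", ".avi": "Video",
    ".jpg": "Image", ".gif": "Image",
    ".cs": "Code", ".java": "Code",
    ".docx": "Document", ".txt": "Document",
    ".pdf": "DocumentOutput",
    ".exe": "Executable", ".dll": "Executable", ".so": "Executable",
    ".bat": "Script",
    ".zip": "Archive", ".7z": "Archive", ".arj": "Archive",
    ".pfx": "Key",
}


def guess_type_flag(filename: str) -> Optional[str]:
    """Return the flag name suggested by the extension, or None if unknown."""
    suffix = PurePath(filename).suffix.lower()  # case-insensitive match
    return EXTENSION_FLAGS.get(suffix)


if __name__ == "__main__":
    for name in ("Holiday.JPG", "backup.7z", "cert.pfx", "notes"):
        print(name, "->", guess_type_flag(name))
```

Flags such as Personal, Contractual or Critical describe meaning rather than format, so they cannot be derived this way and still need the manual or AI-assisted steps described in the table.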