How Content Classification Works in Microsoft 365

Sudakshina Bhattacharjee | 17 Apr 2020

Imagine all the content that your organisation creates, revises, stores and shares... and the level of manual admin involved in keeping it all organised.

In this perpetually busy world, where multitasking is the name of the game, we find ourselves having less and less time to identify, analyse and classify our most valuable business asset: our content.

If we do not classify our content continually, we risk ignoring the strategic value and intelligence that it could give us.

How content classification currently works

Traditional and partially automated classification methods are not enough to manage huge volumes of data. Content is typically classified in one of two ways: supervised or progressive.

Supervised classification relies on an Enterprise Content Management (ECM) system in which people manually classify various forms of business content and store it in containers within repositories.

Progressive classification bypasses the need for manual supervision. It uses Machine Learning (ML), a branch of AI that enables systems to learn and improve from experience without being explicitly programmed, to identify distinctive and recurring patterns in data so that content can be accessed instantly.

This can bring down costs significantly, as less investment is required to train staff and management to maintain the classification scheme.
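To make "learning from examples instead of explicit rules" concrete, here is a minimal sketch of progressive classification using a toy multinomial naive Bayes classifier. It is an illustration only, not how Microsoft 365 classifies content: the class name, the categories and the training sentences are all hypothetical, and a real system would learn from far more content and richer features.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case, split on whitespace and strip basic punctuation."""
    words = (w.strip(".,:;!?").lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayesClassifier:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, text: str, category: str) -> None:
        # Learning step: count which words appear in which category.
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, text: str) -> str:
        # Pick the category with the highest log-probability for this text.
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = "", -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training examples standing in for previously classified content.
clf = NaiveBayesClassifier()
clf.train("invoice amount due payment terms", "Finance")
clf.train("payment overdue invoice number", "Finance")
clf.train("employee contract start date salary", "HR")
clf.train("employee leave request annual holiday", "HR")

print(clf.classify("invoice payment reminder"))  # Finance
print(clf.classify("employee holiday request"))  # HR
```

Nobody wrote a rule saying "invoice means Finance"; the classifier inferred it from the examples, which is the property that lets progressive classification scale without manual supervision.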

The benefits of progressive classification

When applied correctly, progressive classification can improve the user experience, because manual data processing is replaced with an intelligent, automated system that businesses can configure to adapt to their evolving requirements.

By leveraging artificial intelligence and Machine Learning, repositories of unstructured data held in legacy systems can be processed automatically, yielding value in the form of new insights or previously unknown information.

Documents from various business lines are grouped for instant access and amendment. Some of these documents typically contain sensitive information, and these can be further clustered into subgroups based on the metadata that the AI/ML application detects: for example, invoices that require urgent attention, or employee information that no longer needs to be retained.
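The two subgroups in that example can be sketched as a simple rule-based grouping over metadata. This assumes an AI/ML step has already extracted fields such as a due date or a retain-until date; the field names, the seven-day "urgent" window and the sample documents are hypothetical, and plain rules stand in here for the clustering a real product would perform.

```python
from datetime import date, timedelta

# Hypothetical metadata that an AI/ML step is assumed to have already extracted.
documents = [
    {"name": "INV-1001.pdf", "type": "invoice", "due_date": date(2020, 4, 20)},
    {"name": "INV-1002.pdf", "type": "invoice", "due_date": date(2020, 6, 30)},
    {"name": "j_smith_contract.docx", "type": "employee_record", "retain_until": date(2019, 12, 31)},
    {"name": "a_khan_contract.docx", "type": "employee_record", "retain_until": date(2026, 1, 31)},
]

URGENT_WINDOW = timedelta(days=7)  # hypothetical threshold for "urgent"


def subgroup(doc: dict, today: date) -> str:
    """Assign one document to a subgroup using only its metadata."""
    if doc["type"] == "invoice" and doc["due_date"] - today <= URGENT_WINDOW:
        return "invoices needing urgent attention"
    if doc["type"] == "employee_record" and doc["retain_until"] < today:
        return "employee information no longer requiring retention"
    return "no action"


def group_documents(docs: list[dict], today: date) -> dict[str, list[str]]:
    """Collect document names under their subgroup."""
    groups: dict[str, list[str]] = {}
    for doc in docs:
        groups.setdefault(subgroup(doc, today), []).append(doc["name"])
    return groups


groups = group_documents(documents, today=date(2020, 4, 17))
for name, members in groups.items():
    print(f"{name}: {members}")
```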

Bear in mind: all this is done via automation. Yes, things are getting exciting!


How sensitivity and retention labelling work in Microsoft 365

With intelligent automation, organisations can finally say goodbye to manually filing and archiving documents in folders, whether digitally or physically (yes, physical filing still happens!), especially documents that contain sensitive and confidential information.

By leveraging the principles of progressive classification, Microsoft 365 enables your organisation to classify content with sensitivity and retention labels. It applies AI and ML to detect content that:

  • needs to be classified as containing sensitive information
  • needs to be retained for a fixed duration of time
  • can only be accessed by certain individuals in your organisation

Sensitivity labelling in Microsoft 365

Sensitive documents typically include information that is bound by government data protection laws or compliance requirements from regulatory bodies. This could include confidential information which organisations are required to keep records of, such as medical or banking information.

Microsoft likens a sensitivity label to a stamp that’s applied to content. Such a label is:

  • Customisable: You can create categories for the different levels of sensitive content that your organisation has, such as ‘Personal’, ‘Confidential’ or ‘Highly Confidential’.
  • Clear Text: Because labels are stored in clear text within the content’s metadata, third-party apps and external services can read them and apply their own protective actions where required.
  • Persistent: A sensitivity label, including its protection settings, is stored in the metadata of the email or document and roams with it. Retention or compliance policies are applied to such content using this metadata.
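The three properties can be illustrated with a small model. This is a sketch under stated assumptions, not Microsoft's file format: the `Document` class, the `sensitivity_label` metadata key and the third-party service's decisions are all hypothetical. It shows why a label kept as plain metadata lets an outside service act on it, and why it travels with a copy of the content.

```python
import copy
from dataclasses import dataclass, field

# Customisable: the organisation defines its own label names.
LABELS = ("Personal", "Confidential", "Highly Confidential")


@dataclass
class Document:
    name: str
    body: str
    # Clear text: metadata is plain key/value strings that any app can read.
    metadata: dict[str, str] = field(default_factory=dict)


def apply_label(doc: Document, label: str) -> None:
    if label not in LABELS:
        raise ValueError(f"Unknown label: {label}")
    doc.metadata["sensitivity_label"] = label  # hypothetical metadata key


def third_party_action(doc: Document) -> str:
    """A hypothetical external service that decides what to do from the label alone."""
    label = doc.metadata.get("sensitivity_label")
    if label == "Highly Confidential":
        return "block external sharing"
    if label == "Confidential":
        return "warn before sharing"
    return "allow"


original = Document("payroll.xlsx", "salary figures")
apply_label(original, "Highly Confidential")

# Persistent: a copy of the document carries the same metadata, so the label roams with it.
shared_copy = copy.deepcopy(original)
shared_copy.name = "payroll (copy).xlsx"
print(third_party_action(shared_copy))  # block external sharing
```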

The content that can be protected using sensitivity labels and retention policies in Microsoft 365 includes:

  • Emails in Microsoft Exchange
  • Chats in Microsoft Teams
  • Documents, spreadsheets, slide decks and projects stored in OneDrive, SharePoint and Office 365 Groups

Retention labels and Retention Policies in Microsoft 365

There is a crucial difference between a retention label and a retention policy. Allow us to elaborate.

Your organisation may have to take specific actions on certain documents, such as:

  • work visas issued to employees that can’t be edited or deleted;
  • press material that requires deleting after a certain duration;
  • tax forms that are required to be retained for a minimum period of time.

In the olden days, you would manually stick labels on such paperwork and then file them in a physical cabinet somewhere.

In today’s digital age, you would apply a retention label to the electronic documents stored in your content repository within SharePoint or OneDrive for Business in Microsoft 365. Retention labels let you take the right action on the right document.

You can, of course, apply retention labels manually. They can also be applied automatically when content matches certain conditions, such as specific types of sensitive information or keywords that match a search query.

We think automation is essential: you don’t need to train your users on the classifications or rely on them to classify content correctly, which frees up their time to concentrate on work that matters.
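Automatic labelling by condition can be sketched as a list of rules checked against each item's text. This is a hypothetical illustration, not the Microsoft 365 engine: the label names, keywords and rule order are assumptions. The "sensitive information type" here is a 16-digit card-like number confirmed with the Luhn checksum, a common way to cut false positives on card numbers.

```python
import re
from typing import Callable, Optional

# 16-digit card-like numbers, optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum: rejects most random 16-digit strings that are not card numbers."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    candidates = (re.sub(r"[ -]", "", m) for m in CARD_PATTERN.findall(text))
    return any(luhn_valid(c) for c in candidates)


def contains_keyword(text: str, keywords: tuple[str, ...]) -> bool:
    lowered = text.lower()
    return any(k in lowered for k in keywords)


# Hypothetical auto-apply rules: (retention label, condition). First match wins.
RULES: list[tuple[str, Callable[[str], bool]]] = [
    ("Financial Record", contains_card_number),
    ("Work Visa Record", lambda t: contains_keyword(t, ("work visa", "visa number"))),
]


def auto_label(text: str) -> Optional[str]:
    for label, condition in RULES:
        if condition(text):
            return label
    return None  # no condition matched; leave the item unlabelled


print(auto_label("Refund to card 4111 1111 1111 1111 approved"))  # Financial Record
print(auto_label("Order ref 1234 5678 9012 3456"))                # None (fails Luhn)
print(auto_label("Copy of employee work visa attached"))          # Work Visa Record
```

Because users never pick a label themselves, consistency depends entirely on how well the conditions are written, which is why the checksum is there rather than a bare digit pattern.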

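A retention label policy is a group of retention labels made available in specific locations. For example, all work visas with the sensitivity label ‘Highly Confidential’ could be given a retention label, published through such a policy, that prohibits the content from being deleted for ‘X’ number of years. A retention policy, by contrast, applies the same retention settings to everything in a location, such as all mailboxes or all SharePoint sites, rather than to individual items.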
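The relationship between labels, label policies and the actions they enforce can be modelled as plain data plus a few checks. This is a minimal sketch assuming simplified semantics: the class and function names are hypothetical, the retention periods are illustrative stand-ins for ‘X’ years, and real Microsoft 365 retention settings offer more options than shown here. The three labels mirror the work visa, press material and tax form examples above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionLabel:
    name: str
    retain_years: int        # minimum period the item must be kept
    delete_after: bool       # delete automatically once the period ends?
    is_record: bool = False  # records can't be edited or deleted while retained


@dataclass(frozen=True)
class RetentionLabelPolicy:
    labels: tuple[RetentionLabel, ...]
    locations: tuple[str, ...]  # where the labels are made available


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def can_delete(label: RetentionLabel, labelled_on: date, today: date) -> bool:
    # Deletion is blocked until the retention period has ended.
    return today >= add_years(labelled_on, label.retain_years)


def can_edit(label: RetentionLabel, labelled_on: date, today: date) -> bool:
    # Only records are locked against edits during the retention period.
    return not label.is_record or can_delete(label, labelled_on, today)


def is_due_for_deletion(label: RetentionLabel, labelled_on: date, today: date) -> bool:
    return label.delete_after and can_delete(label, labelled_on, today)


def label_available(policy: RetentionLabelPolicy, label_name: str, location: str) -> bool:
    return location in policy.locations and any(lbl.name == label_name for lbl in policy.labels)


# Hypothetical labels; the periods are illustrative only.
work_visa = RetentionLabel("Work Visa", retain_years=7, delete_after=False, is_record=True)
press = RetentionLabel("Press Material", retain_years=1, delete_after=True)
tax_form = RetentionLabel("Tax Form", retain_years=6, delete_after=False)

hr_policy = RetentionLabelPolicy(labels=(work_visa, tax_form), locations=("HR site",))

labelled = date(2020, 4, 17)
print(can_delete(work_visa, labelled, date(2024, 1, 1)))        # False
print(can_edit(work_visa, labelled, date(2024, 1, 1)))          # False
print(is_due_for_deletion(press, labelled, date(2021, 4, 17)))  # True
print(label_available(hr_policy, "Press Material", "HR site"))  # False
```

Keeping the policy separate from the labels reflects the distinction drawn above: the label carries the retention behaviour, while the policy only decides where that label can be used.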

Azure Information Protection (AIP) Labels

Azure Information Protection labels are applied to information that is shared within Microsoft 365 but stored outside it. These labels can be applied by users, by administrators, or by both.

Think of AIP labels as an advanced form of sensitivity labelling. They are advanced because emails and documents classified with them remain identifiable regardless of whom they are shared with or where they are stored. Labels can also be visual, such as headers, footers or watermarks. The metadata added to the content is in clear text, which ensures that other services, such as data loss prevention solutions, can identify the classification and take relevant actions.

This is a broad subject. Look out for another post soon where we will share with you our recommendations on how to leverage content classification using Microsoft 365.