3 Ways Unstructured Data Analysis Will Affect Every Auditor
It is no secret that, in today’s global economy, the cup of data overfloweth. As our means of collecting, analyzing, and interpreting data grow, the availability and sheer volume of data expand accordingly. It’s both a boon and a bane.
On the one hand, the insights this ever-multiplying body of data can reveal to us are enormous, unprecedented, and game-changing. Auditors leveraging big data to inform business decisions are already displaying a competitive advantage, and the industry is just getting started.
On the other hand, this deluge of data is a curse; there’s simply too much of it, and separating the proverbial wheat from the chaff is increasingly challenging. The problem compounds when we add unstructured data to the mix.
Consider this: the data we know – the data most auditors work with every day – is considerable in volume, yet comparatively easy to understand. This structured data is quantitative and organized in a fixed format, with defined parameters. Think spreadsheets, databases, and ERP systems, and the data contained therein. Think balance sheets and expense reports.
That organized, structured data is naturally of high value, yet it represents only about a fifth of the data available to organizations.
The rest – the other 80% of what market analysts at IDC project will be 175 zettabytes (or 175 trillion gigabytes) of global data by 2025 – is what we call unstructured data. For this, think of everything outside of organized data, including:
- Those emails you responded to this morning.
- The text messages you send from any device.
- Your active smart home device, which is listening to you at all times.
- Those search queries you enter into Google.
- The post you recently liked and commented on.
- That unfinished novel on your laptop.
- The article you are reading right now.
In other words, unstructured data is simply any information that can’t be contained in spreadsheets or relational databases. It’s qualitative data, and it is inherently disorganized and difficult to manage, monitor, and maintain. And most of it is worthless. But a good chunk of it is deadly useful; it contains insights that can radically reshape the audit function and unlock new pathways to prosperity.
It’s important to note here that while no one is likely to hack into your personal computer and comb through your unfinished novel for actionable insights, the information you consume and share through your employer’s digital media and channels (emails, PDFs, Slack, documents, etc.) is generally all fair game for collection by the organization when it comes to unstructured data analysis.
Throughout 2020, we’ll be exploring the what, why, and how of unstructured data in greater depth, but today, let’s look at three simple ways unstructured data analysis is already changing the ways we view audits.
1. A new lens on fraud detection and analysis
Behind every fraud is a human dimension: a motive, a means, and an unerring human capacity to screw something up along the way. In Enron’s case, it was in part a propensity to claim projected profits as actuals. In Bernie Madoff’s case, it was a whistleblower’s insight and the imperfect application of a helpful little rule called Benford’s Law. Fraud motives range from greed to desperation to survival (though, let’s be honest: it’s generally greed), and fraud typically arises from the right alignment of opportunity, pressure, and rationalization, as this article explores. But we can’t get a whole picture of a fraud and the narrative behind it without peering into the unstructured data surrounding it: employee emails, texts, and social media activity, for example.
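For readers who haven’t met Benford’s Law: it holds that in many naturally occurring sets of numbers, the leading digit d shows up with probability log10(1 + 1/d), so roughly 30% of values start with a 1 and fewer than 5% start with a 9. The Python sketch below is only an illustration of that idea, not the method used in any particular investigation. The function names and the notion of feeding it a plain list of amounts are assumptions made for the example, and a large deviation is a prompt for follow-up, never proof of fraud.

```python
import math
from collections import Counter


def leading_digit(amount):
    """Return the first significant digit of a number (0 if the number is zero)."""
    # Scientific notation always puts the first significant digit up front.
    return int(f"{abs(amount):.15e}"[0])


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_deviation(amounts):
    """Observed minus expected share for each leading digit (hypothetical helper)."""
    digits = [leading_digit(a) for a in amounts]
    digits = [d for d in digits if d != 0]  # zero amounts carry no leading digit
    if not digits:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(digits)
    expected = benford_expected()
    return {d: counts.get(d, 0) / len(digits) - expected[d] for d in range(1, 10)}


if __name__ == "__main__":
    # Made-up amounts, purely to show the call; real tests need far more data.
    sample = [100, 150, 1999, 20, 3]
    for digit, gap in first_digit_deviation(sample).items():
        print(digit, round(gap, 3))
```

Positive values mean a digit appears more often than Benford’s Law predicts, and negative values mean less often. With only a handful of amounts the gaps are meaningless; the comparison only becomes informative over large populations of transactions.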
Unstructured data analysis tools can peer into all of the proprietary and public data that might be related to a fraud, giving certified fraud examiners (CFEs) more ammunition to detect and address it. Did John the investment banker share glowing reviews of a company online and then sell his shares in the company before its stock dropped substantially? Is the company paying out insurance claims with big fraud markers? Is Mary in accounting disproportionately mentioning a certain client or individual in emails? These aren’t smoking guns, but the behaviours behind these communications all colour fraud detection and investigation.
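To make the “Mary in accounting” question concrete, here is a minimal Python sketch of one way to spot a sender who mentions a term far more often than colleagues do. Everything in it is an assumption for illustration: the `(sender, body)` tuples, the function names, the client name “Acme Holdings”, and the crude rule of flagging anyone whose mention rate is at least twice the overall rate. A real tool would work on far richer data and use sturdier statistics.

```python
import re
from collections import Counter


def mention_share(emails, term):
    """Fraction of each sender's emails that mention the term (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    sent, hits = Counter(), Counter()
    for sender, body in emails:
        sent[sender] += 1
        if pattern.search(body):
            hits[sender] += 1
    return {sender: hits[sender] / sent[sender] for sender in sent}


def disproportionate_senders(emails, term, ratio=2.0):
    """Senders whose mention rate is at least `ratio` times the overall rate."""
    emails = list(emails)
    shares = mention_share(emails, term)
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    total_hits = sum(1 for _, body in emails if pattern.search(body))
    if total_hits == 0:
        return []  # nobody mentions the term, so nobody stands out
    overall = total_hits / len(emails)
    return sorted(s for s, share in shares.items() if share >= ratio * overall)


if __name__ == "__main__":
    # Invented messages, purely to show the call.
    inbox = [
        ("mary", "Acme Holdings invoice attached"),
        ("mary", "Re: Acme Holdings payment terms"),
        ("john", "Quarterly close checklist"),
        ("john", "Budget review notes"),
    ]
    print(disproportionate_senders(inbox, "Acme Holdings", ratio=1.5))
```

As with the questions above, a flagged name is not a smoking gun. It is a pointer to where a human investigator might look next.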
But it’s impossible to read every company email, you say. It isn’t, but who wants to do that manually? Enter unstructured data analysis, our automated window into the massive yet finite universe of unstructured data. The right tool, supported by the right audit data analytics software, will let us pore over this data for the right indicators in seconds, not centuries.
2. Augmenting culture and CSR audits
Maybe it was the realization that millennials now make up the biggest segment of the workforce. Maybe it was corporate social responsibility (CSR) foibles shedding new light on disastrous working conditions and supply chain ethics (think of the rubble-covered GAP t-shirts in the aftermath of the Dhaka garment factory collapse). Maybe it was the new level of visibility into corporate activities through social media.
Or maybe it was all of the above, but one thing is clear: at some point in the past decade, organizations started taking their role in the world a little more seriously. Take the emergence of CSR audits and the culture audit, a complete examination of and report on the state of an organization’s ethics, assumptions, norms, morals, values, behaviours, and anything else that contributes to the essence of what it is, including how employees and the public feel about it.
In other words, the data that informs a culture audit is almost all qualitative and unstructured. This is where unstructured data analysis tools can be implemented to help execute a culture audit and substantiate its findings.
3. Deep insight through the (imperfect) art of sentiment analysis
The most obvious wellspring of publicly available unstructured data that most of us use in our daily lives is – you guessed it – social media. On social media channels, employees aren’t constrained by corporate protocol. Customers have free rein to vent their frustration with a product or service to all corners of the globe within seconds. And influencers can make or break an entrepreneur’s life’s work just as quickly.
Our standard means of assessing brand sentiment across social channels is typically very manual. However, many strides have been made in sentiment analysis, and automated tools now give us a glimpse into the collective sentiment towards our businesses and our activities.
For example, with the right unstructured data analysis tool, an organization is able to analyze publicly available data (think Twitter feeds) to get a sense of the overall sentiment harboured towards it. This information can be enormously valuable not just for augmenting culture audits, but also for hiring, assessing stock prices, analyzing fraud, improving operations, and understanding corporate identity.
Sentiment analysis is an imperfect art, though it is gradually improving. For example, an employee could write the words “I hate my job” in a Twitter post, yet attach a picture showing the happy employee with a giant gift they just received from their boss. In this case, the comment was meant ironically: the picture clearly indicates that they love their job. But a machine reading the text alone wouldn’t detect the inverted sentiment in this example, at least not yet. As with all areas of machine learning, strides are being made in establishing social context and negation, but the exercise of sentiment analysis will never be perfect. Increasingly useful, but not perfect.
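The limitation is easy to see in a toy example. The Python sketch below is a deliberately simple word-list scorer with a basic negation rule. It is an assumption-laden illustration, not how any commercial sentiment tool works: the word lists, the function name, and the rule that a negator flips the next sentiment word are all invented for the example.

```python
import re

# Tiny illustrative word lists; real lexicons hold thousands of entries.
POSITIVE = {"love", "great", "happy", "good"}
NEGATIVE = {"hate", "awful", "bad", "terrible"}
NEGATORS = {"not", "never", "no", "don't", "can't", "isn't"}


def sentiment_score(text):
    """Score text: +1 per positive word, -1 per negative word, flipped after a negator."""
    # Normalise curly apostrophes so "don’t" matches "don't".
    tokens = re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True  # flip the next sentiment-bearing word
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


if __name__ == "__main__":
    for post in ["I hate my job", "I don't hate my job", "I love my job"]:
        print(post, "->", sentiment_score(post))
```

The scorer handles the plain negation in “I don't hate my job”, but it still reads the ironic “I hate my job” post as negative, because nothing in the text tells it about the gift in the picture. That missing context is exactly the gap described above.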
The bottom line
Harnessing unstructured data seems to carry enormous potential: to make better detectives of CFEs; to understand why markets just don’t like your company; to learn what your employees really think of you. But the bottom line here is that unstructured data analysis is just one tool in an auditor’s ever-broadening toolkit. Far from a panacea, unstructured data analysis isn’t perfect and it doesn’t pretend to be. But like all audit intelligence, it deepens our capacity to contextualize things like fraud and sentiment. In this Klondike rush of data, unstructured data is the surface sediment in which the gold nuggets of insight are found.
Paul Leavoy is a writer who has covered enterprise management technology for over a decade. Currently, he researches and writes on data analytics and internal audit technology for CaseWare IDEA. Contact Paul directly or follow @CasewareIDEA to learn more.