How to See Beyond the Data

Data analytics and AI technology have worked their way into many industries, from marketing to medicine. The legal field has also found these tools helpful for organizing and analyzing mountains of legal case data.

Legal analytics has the potential to be an incredibly useful tool for lawyers and law firms. The technology can help lawyers make more compelling arguments by sifting through obscure records to find the facts and figures that apply to each case. Analytics can also improve a firm’s reputation by backing up success reports with numbers.

But all the data in the world isn’t useful if the person looking at it can’t make sense of what they’re seeing.

In data analysis, you have to be very cautious about where your data comes from and how it’s used. Drawing conclusions from bad data can derail your efforts, and even cause harm.

How Do You Know If Your Data Is Good?

There are, generally speaking, five criteria for knowing whether your data is ready to be analyzed:

The question you’re asking of the data is as specific as possible.
The data used is relevant to that question.
The data you have is accurate.
The data has as few missing values as possible; it “connects” the values being measured into a meaningful set.
The dataset is large enough to support the conclusions you want to draw.

Your data should also be timely, meaning it is the most recent you could gather, and consistent, meaning it uses the same format throughout so that records can be cross-referenced against one another.

How to Maintain a Pipeline of Quality Data

To get your data to the point where it meets the criteria above, you’ll need to build a system that consistently gives you great information to work with. There are seven steps you can take to accomplish that. We’ll describe four of them briefly here.

1: Rigorous control of incoming data.

Examine the format and patterns of the data coming in. Make sure each record is consistent. Look at the way the data is distributed and check for abnormalities. Check to see that there are as few values missing from that data as possible.

This sounds like a lot of work, but in most cases a good data profiling tool can automate the process. It can check the data when you first receive it and alert you to inconsistencies and abnormalities.
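
As a rough illustration of what such a profiling check does, here is a minimal Python sketch. The records, field names, and expected formats are all assumptions made up for this example, and a real profiling tool would do far more.

```python
import re
from collections import Counter

# Hypothetical incoming records; the field names are assumptions for illustration.
records = [
    {"case_id": "KY-2019-0001", "court": "Kentucky Supreme Court", "filed": "2019-03-14"},
    {"case_id": "KY-2019-0002", "court": "", "filed": "2019-04-02"},
    {"case_id": "ky/2019/3", "court": "Kentucky Court of Appeals", "filed": "04/18/2019"},
]

# Expected format for each field (assumed conventions, not a standard).
PATTERNS = {
    "case_id": re.compile(r"[A-Z]{2}-\d{4}-\d{4}"),
    "filed": re.compile(r"\d{4}-\d{2}-\d{2}"),
}
REQUIRED = ("case_id", "court", "filed")


def profile(rows):
    """Count missing values and format violations per field."""
    missing = Counter()
    malformed = Counter()
    for row in rows:
        for field in REQUIRED:
            value = (row.get(field) or "").strip()
            if not value:
                missing[field] += 1
            elif field in PATTERNS and not PATTERNS[field].fullmatch(value):
                malformed[field] += 1
    return {"rows": len(rows), "missing": dict(missing), "malformed": dict(malformed)}


report = profile(records)
print(report)
```

Running a check like this on every incoming batch gives you a quick count of what is missing or inconsistent before the data goes any further.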

2: A pipeline design that avoids duplicates.

Sometimes, especially when large amounts of data are being sifted, duplicate records can be created by different teams working toward different purposes. This can throw off your results, but you can avoid it if you design your data pipeline correctly.

Create a data governance program that establishes clear ownership of data and promotes data sharing with other departments. Centralize data asset management and modeling, then review that process regularly. Make sure that data pipelines at the enterprise level are also clearly designed and shared across your organization.
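
Governance is mostly an organizational matter, but a simple safeguard inside the pipeline can catch duplicates that slip through. The sketch below assumes that a record is identified by its docket number and court, which is a choice made for this example. The sample records are invented.

```python
def normalize_key(record):
    """Build a comparison key that ignores case and surrounding whitespace."""
    return (record["docket"].strip().upper(), record["court"].strip().lower())


def deduplicate(rows):
    """Keep the first record seen for each key and collect the rest as duplicates."""
    seen = {}
    duplicates = []
    for row in rows:
        key = normalize_key(row)
        if key in seen:
            duplicates.append(row)
        else:
            seen[key] = row
    return list(seen.values()), duplicates


# Invented records: two teams entered the same case with different formatting.
rows = [
    {"docket": "19-CI-001", "court": "Fayette Circuit Court", "source": "litigation team"},
    {"docket": "19-ci-001 ", "court": "fayette circuit court", "source": "marketing team"},
    {"docket": "19-CI-002", "court": "Fayette Circuit Court", "source": "litigation team"},
]
unique, duplicates = deduplicate(rows)
print(len(unique), "unique;", len(duplicates), "duplicate")
```

Collecting the duplicates instead of silently dropping them lets the data owner see which team created them and fix the process upstream.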

3: Data integrity enforcement.

Make sure your system is designed to maintain the integrity of its data as it grows. Law, in particular, has a massive amount of data that has to be digitized in order to be useful in AI applications. That data needs to be complete, accurate, and easily referenceable even when it encompasses terabytes upon terabytes of information.
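
One common way to enforce integrity is to let the database reject bad records outright. The following sketch uses Python's built-in sqlite3 module. The table layout, column names, and sample values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE courts (
    court_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE cases (
    case_id  INTEGER PRIMARY KEY,
    docket   TEXT NOT NULL UNIQUE,
    court_id INTEGER NOT NULL REFERENCES courts(court_id),
    filed    TEXT NOT NULL
             CHECK (filed GLOB '[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]')
);
""")
conn.execute("INSERT INTO courts (court_id, name) VALUES (1, 'Kentucky Supreme Court')")
conn.execute(
    "INSERT INTO cases (docket, court_id, filed) VALUES ('2019-SC-0001', 1, '2019-03-14')"
)


def try_insert(docket, court_id, filed):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO cases (docket, court_id, filed) VALUES (?, ?, ?)",
            (docket, court_id, filed),
        )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert("2019-SC-0001", 1, "2019-05-01"),   # duplicate docket number
    try_insert("2019-SC-0002", 99, "2019-05-01"),  # court 99 does not exist
    try_insert("2019-SC-0003", 1, "05/01/2019"),   # date in the wrong format
    try_insert("2019-SC-0004", 1, "2019-05-01"),   # valid record
]
for line in results:
    print(line)
```

Because the rules live in the database itself, they keep applying no matter how large the dataset grows or which tool writes to it.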

4: Data lineage traceability.

No matter how big the system or how large the dataset, you should be able to troubleshoot a problem in a predictable timeframe. Being able to trace data back to its roots makes this process much easier.

Clearly documenting and modeling each dataset from the start will make it easier to trace a problem back to its source when you find an error. Keeping track of both the metadata (information about each dataset, including how datasets relate to one another) and the data itself will give you a concrete record to use when troubleshooting. Tracking can be accomplished with timestamps, tables, and link trees, among other methods.
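
A lineage record can be as simple as a catalog that stores, for each dataset, a timestamp and the datasets it was derived from. The sketch below is one possible way to do that in Python. The dataset names and the catalog structure are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Dataset:
    name: str
    description: str
    parents: list = field(default_factory=list)  # names of upstream datasets
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


catalog = {}


def register(name, description, parents=()):
    """Record a dataset together with the datasets it was derived from."""
    catalog[name] = Dataset(name, description, list(parents))


def trace(name):
    """Return every upstream dataset of `name`, nearest ancestors first."""
    lineage = []
    queue = list(catalog[name].parents)
    while queue:
        current = queue.pop(0)
        if current not in lineage:
            lineage.append(current)
            queue.extend(catalog[current].parents)
    return lineage


# Hypothetical datasets, registered in the order they were created.
register("court_filings_raw", "Scanned filings as received")
register("court_filings_ocr", "Text extracted from scans", ["court_filings_raw"])
register("judge_directory", "Judges and their courts")
register("rulings_by_judge", "Filings joined to judges", ["court_filings_ocr", "judge_directory"])

print(trace("rulings_by_judge"))
# ['court_filings_ocr', 'judge_directory', 'court_filings_raw']
```

When a figure in the final dataset looks wrong, the trace tells you which upstream datasets to inspect and in what order.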

The Takeaway: Don’t Be Fooled by “Pretty” Data

Parsing the data in front of you into something meaningful starts with understanding it. Know what you’re looking for, and make sure the data you have is relevant to your question. If you’re trying to find an obscure piece of Kentucky law, for example, you wouldn’t search through a database of cases from New York. You’d start with Kentucky’s court system and narrow it down further and further until you got just the facts you needed.

Understanding what makes data good or bad will help you see beyond the figures you’re presented with. Is everything in that interactive infographic relevant to the question? Are there missing values in that data visualization, and if so, how many?

You should also be able to understand the context of the data presented. A chart might show an uptick in crime until you zoom out to a wider time frame and see it was an aberrant spike cherry-picked to fit a narrative. Bad data can be dressed up to look very pretty.
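
The effect of a cherry-picked time frame is easy to reproduce with a little arithmetic. The monthly counts below are invented purely to illustrate the point and do not come from any real source.

```python
# Invented monthly counts for illustration only; not real crime figures.
monthly_counts = [50, 49, 51, 50, 52, 48, 50, 51, 49, 55,
                  62, 70, 51, 50, 49, 52, 50, 48, 51, 50]


def percent_change(window):
    """Percent change from the first value in the window to the last."""
    return round(100 * (window[-1] - window[0]) / window[0], 1)


narrow = monthly_counts[8:12]   # four months chosen to end at the peak
wide = monthly_counts           # the full twenty months

print("narrow window:", percent_change(narrow), "%")  # 42.9 %
print("wide window:", percent_change(wide), "%")      # 0.0 %
```

The same numbers support an alarming headline or no story at all, depending only on where the chart starts and stops.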

Legal analytics is a powerful tool that can save an enormous amount of work. Just be sure you aren’t being fooled by data that’s pretty instead of accurate.