Predictive Coding, Part I: Your Introduction To Predictive Coding

February 11, 2014

In 2006, the Federal Rules of Civil Procedure formalized case law and made electronically stored information fully discoverable.

It might as well have opened Pandora’s box.

While attorneys relished the new opportunity to review emails, text messages and other electronic information, it quickly became apparent that the sheer volume and complexity of “Big Data” was more than anyone could chew. One report from Duke University, for example, found that for a case that went to trial, there was an average of 4.98 million pages of documents that required review. Of those, only 4,772 – about 0.1 percent, or roughly one page in a thousand – ended up containing information that was relevant to the case.

Obviously, e-discovery conducted the way traditional paper-document discovery is conducted will be immensely costly and time-consuming. That is why predictive coding is increasingly seen as a way to regain control of potentially runaway discovery processes.

In short, predictive coding uses automated means to sift through voluminous documents and select the ones likely to be relevant for human review. Criteria such as timeframes, keywords, senders or recipients are entered into software that sorts the documents electronically. The resulting documents can be grouped by similarity and prioritized by various factors. The optimal end result: far fewer documents for attorneys to review, and ones that are more likely to be relevant.
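To make the sorting step concrete, here is a minimal sketch in Python of how documents might be scored against reviewer-supplied criteria and queued for human review. It is a toy illustration only, not how any particular e-discovery product works: the `Document` fields, the `relevance_score` and `prioritize` functions, the one-point-per-criterion scoring, the `min_score` cutoff and the sample emails are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical fields; real document sets carry far more metadata.
    doc_id: str
    sender: str
    sent: date
    text: str


def relevance_score(doc, keywords, senders, start, end):
    """Count how many of the reviewer's criteria a document meets."""
    words = set(doc.text.lower().split())
    score = sum(1 for kw in keywords if kw.lower() in words)  # one point per keyword hit
    if doc.sender in senders:
        score += 1  # sent by a person of interest
    if start <= doc.sent <= end:
        score += 1  # falls inside the relevant timeframe
    return score


def prioritize(docs, keywords, senders, start, end, min_score=2):
    """Return documents meeting at least min_score criteria, best first."""
    scored = [(relevance_score(d, keywords, senders, start, end), d) for d in docs]
    kept = [(s, d) for s, d in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in kept]


# Made-up sample data for illustration.
docs = [
    Document("A1", "cfo@example.com", date(2012, 3, 5), "Revised merger terms attached"),
    Document("A2", "hr@example.com", date(2012, 3, 6), "Office picnic is on Friday"),
    Document("A3", "ceo@example.com", date(2012, 6, 9), "Call me about the merger"),
]

review_queue = prioritize(
    docs,
    keywords=["merger", "terms"],
    senders={"cfo@example.com"},
    start=date(2012, 1, 1),
    end=date(2012, 12, 31),
)
print([d.doc_id for d in review_queue])  # ['A1', 'A3']
```

Here the picnic email drops out entirely, and the email that matches the most criteria lands at the top of the attorneys' review queue.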

Now, the process described above is the way predictive coding would work in a dream scenario. The fact is, predictive coding is, at this point, complex and unwieldy in its own ways. It is simply not necessary for every case, and so far it has been reserved for a small percentage of lengthy, complicated cases.