August 21, 2013
Litigation has transformed over the past two decades from discovering documents in file cabinets to discovering a smoking gun in an email or Twitter post. Nearly all information in today’s digital world is created and maintained exclusively in electronic form. This deluge of data has created a document review tsunami.
Currently, keyword search is the primary means of document retrieval during discovery, even though various studies have concluded that it has limited reliability at returning the most responsive documents in a litigation matter. Condemning keyword search is unfair in this context, however, because it was the only viable means of retrieval until recently. Furthermore, Rule 1 of the Federal Rules of Civil Procedure does not require perfection; instead, it weighs efficiency, cost, and justice against one another.
Fortunately, a new toolset called predictive coding is coming online. It offers a way to manage the overwhelming amount of data and is likely to change at least part of the economics underlying discovery. With the right business model, law firms and corporate counsel can implement an e-discovery or litigation readiness program using predictive coding that acts as a source of revenue, cuts out middleman vendors, and ultimately lowers the added expenses for the client.
In a nutshell, predictive coding is software that is trained by a user to predict which documents in a document set will be responsive and which will be non-responsive. Predictive coding goes by many names, including computer-assisted review and technology-assisted review.
Predictive coding aims to reduce the number of documents reviewed by ranking the documents according to a calculated level of responsiveness. Instead of looking at every email written by a custodian over a three-year period, predictive coding uses a number of factors, including keywords, writing style, subject matter, and even punctuation style, to determine which chain of documents is most relevant to the matter. The underlying algorithms vary between software brands.
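To make the ranking idea concrete, here is a minimal sketch in Python. It assumes each document already carries a responsiveness score between 0 and 1. The scores, the document IDs, the `ScoredDocument` type, and the 0.5 cutoff are all hypothetical and chosen for illustration; commercial products calculate and calibrate their scores in their own ways.

```python
from typing import NamedTuple


class ScoredDocument(NamedTuple):
    doc_id: str
    score: float  # hypothetical responsiveness score between 0 and 1


def rank_for_review(docs, cutoff):
    """Sort documents from most to least responsive, then split at the cutoff."""
    ranked = sorted(docs, key=lambda d: d.score, reverse=True)
    review_queue = [d for d in ranked if d.score >= cutoff]
    set_aside = [d for d in ranked if d.score < cutoff]
    return review_queue, set_aside


# Illustrative scores only; a real tool would produce these from its model.
docs = [
    ScoredDocument("email-001", 0.12),
    ScoredDocument("email-002", 0.91),
    ScoredDocument("email-003", 0.47),
    ScoredDocument("email-004", 0.78),
]

review_queue, set_aside = rank_for_review(docs, cutoff=0.5)
print([d.doc_id for d in review_queue])  # ['email-002', 'email-004']
```

In this toy example, reviewers would start with two documents instead of four. Where to draw the cutoff is a judgment call for the legal team, not something the software decides on its own.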
Technically speaking, predictive coding software is trained by a senior attorney or partner to look for documents similar to those the training attorney deems responsive. The attorney reviews a relatively small “seed set” of documents and codes each one as either responsive or not responsive. The software identifies the properties of these coded documents and uses them to build a model that estimates the relevance of the remaining documents.
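The seed-set workflow can be sketched with a small text classifier. The example below uses a multinomial naive Bayes model with add-one smoothing, written in plain Python. This is a stand-in technique chosen for brevity, not the algorithm of any particular product, and it looks only at words rather than the richer features described above. The seed documents, the labels, and the function names `train` and `responsiveness` are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(seed_set):
    """Count words per label from (text, is_responsive) pairs coded by the attorney."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_responsive in seed_set:
        doc_counts[is_responsive] += 1
        word_counts[is_responsive].update(tokenize(text))
    if not doc_counts[True] or not doc_counts[False]:
        raise ValueError("seed set needs both responsive and non-responsive examples")
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def responsiveness(model, text):
    """Estimate P(responsive | text) with naive Bayes and add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in (True, False):
        log_p = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in the seed set are ignored
                log_p += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = log_p
    # Convert the two log scores into a probability for the responsive label.
    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])


# Hypothetical seed set: the attorney codes each document True or False.
seed_set = [
    ("Please shred the pricing agreement before the audit", True),
    ("Delete the pricing emails about the agreement", True),
    ("Lunch on Friday at noon?", False),
    ("The office party is on Friday", False),
]
model = train(seed_set)

unreviewed = ["Did you delete the pricing agreement?", "Party on Friday"]
scores = {text: responsiveness(model, text) for text in unreviewed}
for text, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{score:.2f}  {text}")
```

With this toy seed set, the email about deleting the pricing agreement scores above 0.5 and the party email scores below it. A real seed set would contain far more documents, and the resulting model would need to be validated before anyone relied on it.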