ESI: The Odyssey — Part 1: Know Your Hash Value

“Know Your Hash Value”

The very nature of electronic evidence poses numerous, unique challenges to the discovery process and to the admission of that evidence. ESI, like any information, is discoverable in litigation, but one of the things that makes ESI different is its sheer volume. The vast majority of newly produced information is created and stored electronically. Combine this with the relative affordability of cloud storage, and a staggering number of electronic documents are being created and stored, even by small business clients. Thus, discovery in litigation, even in cases in which the amount in controversy is relatively modest, can embrace huge quantities of data.

So what can litigants and counsel dealing with an enormous universe of documents do to make the e-discovery process a little less daunting? Harnessing the powers of the hash value is a great place to start. A hash value is a numeric value of a fixed length that, for practical purposes, uniquely identifies data. A hash value can be assigned to a file, a group of files, or a portion of a file, and is computed by applying an algorithm to the contents of the data set. The hash value is typically generated for the evidence container at the time of collection and may be embedded in the container file or saved in an associated audit log file. The data, whether the collection copy or any other copy made subsequently, can be “hashed” repeatedly and will return the same hash value every time, assuming the contents have not been altered.
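To make the idea concrete, the following is a minimal sketch in Python using the standard library's hashlib module. It assumes SHA-256 as the algorithm (the article does not name one), and the file names and contents are hypothetical. It shows that an unaltered copy returns the same hash value as the original, while a copy with a small change does not.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def hash_file(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks.

    SHA-256 is an assumption for this sketch; a collection tool may
    use a different algorithm.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large evidence files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Hash a hypothetical original, an exact copy, and an altered copy."""
    with tempfile.TemporaryDirectory() as workdir:
        original = Path(workdir) / "contract.txt"
        original.write_bytes(b"Payment is due within 30 days.")

        exact_copy = Path(workdir) / "contract_copy.txt"
        shutil.copyfile(original, exact_copy)

        altered = Path(workdir) / "contract_altered.txt"
        altered.write_bytes(b"Payment is due within 90 days.")

        return hash_file(original), hash_file(exact_copy), hash_file(altered)


if __name__ == "__main__":
    original_hash, copy_hash, altered_hash = demo()
    print("original:", original_hash)
    print("copy:    ", copy_hash)
    print("altered: ", altered_hash)
```

The original and the copy print the same value; the altered file prints a different one even though only a single character changed.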

In addition to acting as unique identifiers of data, hash values have a variety of uses, the most common at this point being identifying and removing duplicate documents in the review environment. Using hash values for de-duping purposes can reduce the reviewable document set by a staggering percentage. Anyone who has had to perform a document review containing emails has had the experience of reading the same email (and its attachment or attachments) over and over again, as any given email can become part of a thread and may have been sent to ten different custodians. A review platform, however, can generate hash values to de-duplicate emails and build thread views. Say an investigation collected the emails of 10 custodians, all of whom received an identical email with identical attachments. By using hash value identification and de-duplication, whoever is performing the document review would only have to read that email and its attachments once, rather than ten times. You can imagine the cost and time savings this generates in document review.
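The ten-custodian scenario can be sketched as follows. This is an illustration only, not how any particular review platform works: it assumes SHA-256 and hashes the raw message content, whereas a real platform would likely hash selected email fields so that mailbox-specific differences do not defeat the match. The custodian names and messages are hypothetical.

```python
import hashlib
from collections import defaultdict


def deduplicate(documents):
    """Group documents by SHA-256 digest.

    `documents` is an iterable of (custodian, content_bytes) pairs.
    Returns a dict mapping each digest to the custodians holding that content.
    """
    groups = defaultdict(list)
    for custodian, content in documents:
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(custodian)
    return dict(groups)


def build_sample_collection():
    """Ten hypothetical custodians share one email; one also has a unique email."""
    shared_email = b"Subject: Change order 7\n\nPlease see the attached revision."
    collection = [(f"custodian_{n:02d}", shared_email) for n in range(1, 11)]
    collection.append(("custodian_03", b"Subject: Lunch\n\nNoon on Friday?"))
    return collection


if __name__ == "__main__":
    collection = build_sample_collection()
    groups = deduplicate(collection)
    print(f"{len(collection)} collected items, {len(groups)} unique documents to review")
    for digest, custodians in groups.items():
        print(digest[:12], "held by", len(custodians), "custodian(s)")
```

Keeping the list of custodians for each hash value, rather than discarding the duplicates outright, preserves the record of who held each document while the reviewer reads it only once.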

Further, if you hash the file, you now have a “digital fingerprint” to establish the precise content of the file when produced, whether or not you produce the file in native format. The producing party gets to maintain a pristine production set, just like having a true and correct set of a paper production in a warehouse back in the day. This is huge for authentication purposes, since any subsequent unaltered copy of that file will have a hash value identical to that of the original file, allowing the hash value to serve as proof of authenticity.
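One way to picture the “pristine production set” is as a manifest of hash values recorded at the time of production, against which any later copy can be checked. The sketch below assumes SHA-256 and represents files as an in-memory mapping of hypothetical names to contents; it is an illustration of the comparison, not a description of any certification procedure.

```python
import hashlib


def sha256_hex(content):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(production):
    """Record a digest for every file in the production set (name -> bytes)."""
    return {name: sha256_hex(content) for name, content in production.items()}


def verify(manifest, received):
    """Return the names whose content is missing or no longer matches the manifest."""
    mismatches = []
    for name, expected in manifest.items():
        content = received.get(name)
        if content is None or sha256_hex(content) != expected:
            mismatches.append(name)
    return sorted(mismatches)


if __name__ == "__main__":
    # Hypothetical production set recorded at the time of production.
    production = {
        "invoice_001.pdf": b"Invoice total: $12,500",
        "email_0042.msg": b"Subject: Schedule update",
    }
    manifest = build_manifest(production)

    # A later copy in which one file has been changed.
    later_copy = dict(production)
    later_copy["invoice_001.pdf"] = b"Invoice total: $21,500"

    print("unaltered copy mismatches:", verify(manifest, production))
    print("altered copy mismatches:  ", verify(manifest, later_copy))
```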

While there has not yet been a reported civil case dealing directly with the authenticating powers of the hash value, the issue has arisen in the criminal context. In United States v. Gasperini, 729 Fed. App’x 112 (2d Cir. 2018), Fabio Gasperini was charged by the United States with misdemeanor computer intrusion and was ultimately convicted. 729 Fed. App’x at 113. The United States entered into evidence copies of the hard drives that had been seized by Italian authorities. Id. at 114. The United States established that the evidence was authentic through the testimony of an Italian investigator who participated in making the copies and who testified that the “hash values” of the two hard drives were identical. Id. The court held that the district court had properly treated the copy of the hard drive as authenticated. Although the evidence was not entered under the recently amended Federal Rule of Evidence 902(14), the Second Circuit nevertheless used the language of amended FRE 902(14) to uphold the district court’s decision. Id. at 115. Specifically, the court cited the Advisory Committee’s notes, which state “Today, data copied from electronic devices, storage media, and electronic files are ordinarily authenticated by ‘hash value.’ … This amendment allows self-authentication by a certification of a qualified person that she checked the hash value of the proffered item and that it was identical to the original.” Id.

Courts have only just begun to recognize the hash value as a means of authenticating ESI, but it is clear that hash values are, and will continue to be, a primary vehicle for ensuring the authenticity of ESI. Moreover, the use of the hash value for de-duplication in voluminous electronic document productions promises to become far more widespread and stands to save litigants and counsel enormous resources in document review. For all these reasons, it is high time we start to understand the value of the hash value.

Shields | Mott LLP utilizes e-discovery software for the analysis and development of construction law litigation in the New Orleans area.