What is Metadata?
Quite simply, it is data about data. In the context of electronic documents,
metadata is information about the document that is routinely recorded
by the software that created the document but which is not shown
on the face of the document.
Metadata
can be accessed in several ways: by viewing the properties of the
document in the application that created it (for example MS Word)
or by using specially written software.
Typical
metadata includes:
· The name of the document;
· The author of the document, as determined by the computer
system. This information is not necessarily a reliable guide to
who actually created the document or who last worked on it. Documents
may pass through many revisions and be copied and sent to many different
people. The authorship information will, however, not necessarily
change. For a complete analysis it would be necessary to have the
underlying information about how and when the system recorded the
name of that individual or company as the document author.
· The company from which the document originates. This information
is subject to the same caveats and limitations as the authorship
information;
· The location of the file on the computer system;
· When the file was created – a record of the time and
date when the file was created at the location from which it has
been opened;
· When the file was last accessed – a record of the
time and date when the file was last opened;
· When the file was last modified – a record of the
time and date when the contents of the file were last changed. This
is usually a reliable indicator of when the file was last worked on
and data was added or deleted;
· By whom the file was last saved.
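As a rough illustration of the file system time stamps listed above, the following Python sketch reads the values that the operating system records for a file. The function name `file_times` and the dictionary keys are hypothetical, chosen only for this example. Note that the meaning of `st_ctime` depends on the platform: on Windows it is the creation time, whereas on Unix-like systems it is the time of the last change to the file's metadata.

```python
import os
from datetime import datetime, timezone


def file_times(path):
    """Return the file system metadata recorded for `path` as a dict."""
    st = os.stat(path)

    def to_utc(ts):
        # Convert a POSIX time stamp (seconds since 1970) to an aware UTC datetime.
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    return {
        "location": os.path.abspath(path),
        "size_bytes": st.st_size,
        "last_accessed": to_utc(st.st_atime),
        "last_modified": to_utc(st.st_mtime),
        # Windows: creation time. Unix-like systems: last metadata change.
        "created_or_changed": to_utc(st.st_ctime),
    }
```

The values are returned in UTC so that they can be compared across machines; they are only as accurate as the clock of the computer that recorded them.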
Additional
metadata that can be extracted by use of specialist software includes:
· When the document was last saved;
· When it was last printed; and
· The identity of the last 10 authors and document locations.
This can be a very valuable piece of information as it can indicate
how a document got to its current location and who previously worked
on it.
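By way of illustration only, the sketch below shows how embedded properties of this kind might be read programmatically. It assumes a file in the newer Word `.docx` format, which is a ZIP archive holding a `docProps/core.xml` part; it does not cover the older binary `.doc` format or the list of the last 10 authors mentioned above. The function name `docx_core_properties` and the labels in `FIELDS` are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical labels mapped to the elements that hold each value.
FIELDS = {
    "author": "dc:creator",
    "last_saved_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "last_saved": "dcterms:modified",
    "last_printed": "cp:lastPrinted",
}


def docx_core_properties(path):
    """Return the embedded core properties of a .docx file as a dict."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        # A property that was never recorded is simply absent from the XML.
        props[label] = element.text if element is not None else None
    return props
```

As with the author and company fields discussed above, these values show what the software recorded, not necessarily who actually worked on the document.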
The time and date stamps on files can be extremely valuable, but
care must be taken to check the time on the computer’s internal
clock, as this can often be significantly different from “real”
time. If computers are being operated in different time zones,
the time zone setting on the computer from which the document originates
must also be checked and any necessary conversion made. This
is particularly important if documents are being sent across different
time zones (for example, from the US to the UK) and precise times
are important to the case.
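The correction described above can be sketched in Python as follows. This is a minimal example under stated assumptions: the function name `normalise_timestamp`, the UTC offsets and the five-minute clock error are all hypothetical values chosen for illustration, and in a real examination they would have to be established from the source computer itself.

```python
from datetime import datetime, timedelta, timezone


def normalise_timestamp(recorded, utc_offset_hours, clock_error=timedelta(0)):
    """Convert a time stamp recorded on a source computer to UTC.

    recorded         -- naive datetime exactly as recorded on the source computer
    utc_offset_hours -- the source computer's time zone setting, as an offset from UTC
    clock_error      -- how far the computer's clock ran ahead of real time
                        (negative if it ran behind)
    """
    source_zone = timezone(timedelta(hours=utc_offset_hours))
    corrected = recorded - clock_error  # remove the known clock error first
    return corrected.replace(tzinfo=source_zone).astimezone(timezone.utc)


# Hypothetical example: a stamp of 09:30 from a US computer set to UTC-4
# whose clock was found to be five minutes fast.
stamp = datetime(2004, 6, 1, 9, 30)
utc_time = normalise_timestamp(stamp, -4, timedelta(minutes=5))
uk_time = utc_time.astimezone(timezone(timedelta(hours=1)))  # UK summer time

print(utc_time.isoformat())  # 2004-06-01T13:25:00+00:00
print(uk_time.isoformat())   # 2004-06-01T14:25:00+01:00
```

Fixed offsets are used here to keep the sketch self-contained; they do not account for daylight saving changes, which must be checked for the date in question.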
Metadata can be extracted from any live document or any recovered
deleted document. It is, however, essential that a proper forensic
copy of the original file is made before metadata is analysed: this
must be done to preserve the original time and date stamps and other
potentially important information about the file. It is likely that
a non-forensic copy will contain inaccurate information because,
if the file has been opened and then copied from the original computer
system, time and date stamps and possibly other information will
be changed.
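The principle can be illustrated with a short Python sketch that copies a file, carries over the original time stamps and checks by cryptographic hash that the copy is identical to the source. It is not a substitute for proper forensic imaging, which is normally carried out with dedicated tools and write-protection; the function names `sha256_of` and `copy_and_verify` are hypothetical.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst, keep the original time stamps and verify the content."""
    # Reading the source can update its last-accessed time on some systems,
    # so the recorded values are captured before the file is opened.
    before = os.stat(src)
    source_hash = sha256_of(src)

    shutil.copy2(src, dst)  # copies the content and, where possible, file metadata
    # Apply the access and modification times captured before any reading.
    os.utime(dst, ns=(before.st_atime_ns, before.st_mtime_ns))

    if sha256_of(dst) != source_hash:
        raise ValueError("copy does not match the original file")
    return source_hash
```

Recording the returned hash alongside the copy allows anyone to confirm later that the file examined is the same as the one originally collected.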
The information contained in metadata can be combined with other
information obtained from other investigative techniques (for example
e-mail and phone record analysis) to add significant value to an
investigation.
Case study
A UK company was faced with claims by a former employee alleging
unfair dismissal and sex discrimination. These claims were disputed.
Computer forensics experts were asked to carry out a forensic examination
of a computer belonging to our client and used by the employee, looking,
among other things, for copies of two letters.
The
letters in question were addressed to the employee's line manager.
Both were dated 22 September 2003. The first was a draft and had
been sent for comment by email to a senior member of the employee's
team on 22 September 2003. The second had been handed as hard copy
to the employee's line manager on the afternoon of 26 September
2003. The second letter was identical to the first except that it
contained a paragraph alleging sex discrimination.
It
became very important during the litigation to establish exactly
when and, if possible, on whose advice the second letter had been
created. The employee was claiming that both the draft and the final
version of the letters had been written on the same day.
A
forensic image of the computer was taken and this was searched,
using key words and time and date searches, for the two letters.
Copies of both letters were found and it was possible to determine
very quickly from the basic properties encoded into the documents
that the first had been created late in the evening on 21 September
(a Sunday) and the second during the morning of 26 September.
By
deeper analysis of the metadata in these documents it was possible
to tell who the authors were and when the documents had been printed.
It was also possible to determine the file path of the earlier versions
of the documents. That is to say, it was possible to establish how the
documents had got onto the computer.
This
analysis confirmed the creation dates and also showed that an early
version of the second letter had been copied from an external storage
device (probably a USB pen drive). This letter had been copied from
the device to the desktop and then to the “my documents”
folder on the hard disk of the computer.
Metadata
about the original author of the letters and the company from which
they had originated was cross-referenced with records from the mobile
phone issued to the employee and emails found on the hard disk.
Taking
all this information together, we were able to say with reasonable
certainty from whom the employee had received the letter and who
had provided advice about the crucial paragraph which, as the metadata
showed, could not have been written on the date on the face of the
letter.
Metadata and disclosure
“Electronic documents contain “hidden” information not
reflected in paper documents… such hidden information…
complicates review of the documents in advance of production as
the reviewing persons must know how and where to look for such information
and determine whether reviewing for such information is needed”
In summary, the disclosure rules in the Civil Procedure Rules (CPR) state that:
• A document is “anything on which information of any
description is recorded”. This clearly includes electronic
documents.
• Standard disclosure entails a “reasonable” search
for all documents adversely affecting either party’s case
or supporting the other party’s case.
• Specific disclosure, relating to specific documents or classes
of documents, can be ordered at the request of the parties.
Relevant
factors to be considered in deciding if a search is reasonable are:
· The number of documents involved;
· The nature and complexity of the proceedings;
· The ease and expense of retrieval; and
· The significance of any document likely to be located.
Bearing
these factors in mind, under what circumstances might a party be
obliged to produce a document in its original electronic format
preserving the original metadata?
“The
[CPR] contain no guidance whatsoever as to the type of search which
would be reasonable in the context of electronic data… there
are no hard and fast rules that could possibly be devised which
would be appropriate in every piece of litigation or, for that matter,
even in every piece of commercial litigation… it should, it
is submitted, be very rare for any disclosure exercise to require
recourse to replicant data, back up data or residual data …
a possible exception might be a fraud case”
It seems to be accepted that standard disclosure ought to encompass
a search of the active data on a computer system.
There may be many cases, and not just those where fraud is alleged, where
information contained in metadata will be immensely valuable in
providing evidence on an important issue.
In these cases, it is important, in our opinion, that those documents
the provenance of which is disputed are identified at as early
a stage as possible and agreement obtained (probably at the case
management conference) or orders sought for their preservation in
their original electronic format.
It
is likely that, in the case of a disputed document, a search would
have to be undertaken for residual copies of that document. This
is because the existence of earlier drafts and the circumstances
of their creation may be crucial. This will inevitably mean the
use of computer forensic techniques to recover residual data. That
has cost and time implications but these need not be as great as
imagined if the exercise is approached in a pragmatic and reasonable
fashion: it is possible for write-protected copies of individual
files or groups of files to be made in a forensically sound manner.
Once identified and located, the document should be copied in a forensically
sound way to preserve its metadata and a procedure agreed upon for
extraction and examination of the metadata. It may be that the party
requesting the search for the disputed document will have to, initially
at least, bear the cost of production.
In
commercial litigation, it is our experience that in appropriate
cases, particularly where fraud is alleged, the courts are likely
to be sympathetic to applications for specific disclosure of document
metadata.