To determine which documents to "OCR", one should first consider why to OCR at all. For a certain percentage of the collection, OCR is wasted money. The trick is separating the wheat from the chaff. If the database contains bibliographic coding, this may be enough to identify certain key "OCR worthy" documents.
Note #1: One does not need the OCR to manually bibliographically code documents. Further, the fewer the fields coded, the lower the cost. One may use bibliographic coding as a way to identify which documents to OCR.
Note #2: Some services rely on software to generate their "bib coding" from the OCR. These same services may wish to generate the OCR, too.
If the collection is not too large, one should consider OCRing every page immediately. Relative to the cost of litigating the case (and the financials at stake, should one lose the case), OCR may be an inexpensive investment. Your team can use OCR to find potentially key documents when performing subjective searches. It can even help identify potentially privileged documents. It will not, however, differentiate between documents that merely contain a given name and documents authored by that person.
Always provide the OCR vendor with a list of all important names and keywords. The OCR software uses your lists to generate a better quality product. Provide the OCR vendor with these lists as soon as possible.
Hint: One may wish to identify all the different types of documents so that certain types escape the OCR process (and cost). If your collection has bibliographic coding, one may be able to exclude certain files from OCR treatment. As an example, one may choose to OCR only documents by a given author or of a given document type, such as a memo or contract.
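The hint above can be sketched in a few lines of Python. This is only an illustration, assuming the bibliographic coding has been exported as one record per document. The field names (doc_id, doc_type, author, pages) and the sample values are hypothetical, not taken from any particular review platform.

```python
# Hypothetical bibliographic coding export: one dict per document.
records = [
    {"doc_id": "CTRL0001", "doc_type": "memo", "author": "Smith", "pages": 3},
    {"doc_id": "CTRL0002", "doc_type": "invoice", "author": "Jones", "pages": 1},
    {"doc_id": "CTRL0003", "doc_type": "contract", "author": "Lee", "pages": 12},
    {"doc_id": "CTRL0004", "doc_type": "photo", "author": "", "pages": 1},
]

# Assumed selection criteria supplied by the legal team.
OCR_DOC_TYPES = {"memo", "contract"}
OCR_AUTHORS = {"Jones"}


def select_for_ocr(records, doc_types, authors):
    """Return the records whose document type or author is on an OCR list."""
    return [
        r for r in records
        if r["doc_type"].lower() in doc_types or r["author"] in authors
    ]


selected = select_for_ocr(records, OCR_DOC_TYPES, OCR_AUTHORS)
ocr_ids = [r["doc_id"] for r in selected]
ocr_pages = sum(r["pages"] for r in selected)

print("Documents to OCR:", ocr_ids)
print("Pages to OCR:", ocr_pages)
```

The page count is the number most vendors price against, so totaling it for the selected set gives a quick way to compare the cost of a selective OCR pass with OCRing the whole collection.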
Managing Expectations
Turnaround times are important. If the legal team knows that it takes 36 hours to generate and ship a production, they can schedule their time accordingly. For ongoing issues, keep the team apprised on a regular basis, not just when a problem manifests. Don't forget to put this information in an email for future reference.
Rolling productions, computer forensics, discovery collection and the processing of electronic discovery all have something in common: knowing the final invoice at the start of the project may not be possible. By all means get an estimate. But a regular update, such as a weekly cost tally, is essential. No one likes big, expensive surprises.
One way to help reach this goal is to use a Litigation Budget spreadsheet. The spreadsheet should show all potential costs to litigate the case, not just the cost to collect ediscovery. Lieb's book, "Litigation Support Department", includes such a spreadsheet. The attorneys can use the spreadsheet to understand how costs will scale along with the amount of discovery processed.
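A weekly cost tally of the kind described above is simple arithmetic. The following Python sketch shows one way to keep a running total against the original estimate. The estimate and the weekly figures are invented for illustration, and the layout is not taken from the spreadsheet in Lieb's book.

```python
# Hypothetical figures, in whole dollars.
ESTIMATE = 20_000
weekly_costs = [("week 1", 4_200), ("week 2", 6_150), ("week 3", 7_900)]


def running_tally(weekly_costs, estimate):
    """Return (label, cost, total_to_date, remaining_estimate) per week."""
    rows = []
    total = 0
    for label, cost in weekly_costs:
        total += cost
        rows.append((label, cost, total, estimate - total))
    return rows


rows = running_tally(weekly_costs, ESTIMATE)
for label, cost, total, remaining in rows:
    print(f"{label}: spent {cost}, total to date {total}, remaining {remaining}")
```

A shrinking "remaining" column gives the team an early warning well before the final invoice arrives.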
Date Fields vs. Text Fields
Use a "date" type field for storing dates, not a "text" type field. A "date" field is one that requires the value to be (1) a valid date and (2) written in a consistent format. A "text" field simply contains text and accepts anything typed into it.
A date format field requires every record entry to follow the same "YYYY/MM/DD" format. It doesn't cost anything to use this option. Some of the benefits gained by using this format include the ability to sort records chronologically and filter by date range. Further, the team can then export all the records to TimeMap and create multiple chronologies in mere minutes.
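To illustrate what a "date" field enforces, here is a minimal Python sketch that validates values against the YYYY/MM/DD format, then sorts and filters the valid ones. The control numbers and date values are hypothetical, and a real review database would perform this validation itself at data entry.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y/%m/%d"  # the YYYY/MM/DD format described above


def parse_date(text):
    """Return a date if text is a valid, zero-padded YYYY/MM/DD value, else None."""
    try:
        parsed = datetime.strptime(text, DATE_FORMAT)
    except ValueError:
        return None
    # strptime tolerates "1999/3/5"; require the exact padded form as well.
    if parsed.strftime(DATE_FORMAT) != text:
        return None
    return parsed.date()


# Hypothetical entries as they might appear in a free-text field.
raw = [
    ("CTRL0001", "1999/03/15"),
    ("CTRL0002", "March 15, 1999"),  # not in the required format
    ("CTRL0003", "1998/11/02"),
    ("CTRL0004", "1999/02/30"),      # right format, but not a real date
]

valid = []
rejected = []
for doc_id, text in raw:
    parsed = parse_date(text)
    if parsed is None:
        rejected.append(doc_id)
    else:
        valid.append((doc_id, parsed))

# Chronological sort and a date-range filter, both trivial once dates are real dates.
valid.sort(key=lambda item: item[1])
in_1999 = [d for d, when in valid if date(1999, 1, 1) <= when <= date(1999, 12, 31)]

print("Chronological:", [d for d, _ in valid])
print("Rejected:", rejected)
print("Dated in 1999:", in_1999)
```

The two rejected entries are exactly the kind of values a "text" field would silently accept and a "date" field would refuse.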
Image Stamping
Always keep one set of images that are not stamped, redacted or modified in any way. To repeat: do not stamp your original images. For preproduction databases, feel free to stamp an "internal control number" or "preproduction bates number" on a copy. If you keep an untouched copy, subsequent productions need not show any internal control numbers or gaps in the internal control numbers. If a document gets produced multiple times, each copy will contain only those production stamps the team desires. If your team redacts a document, they still want to be able to see the full image themselves.
Hint!: In your internal database, be certain to store all preproduction and production numbers. There will come a day when a person will need to search by one name or another. Do not "paint yourself into a corner".
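One way to picture the hint above is a lookup that resolves any number ever stamped on a document back to the same record. The Python sketch below assumes each record stores its internal control number plus a list of production numbers. All field names and numbers are hypothetical.

```python
# Hypothetical records: one internal control number, zero or more production numbers.
documents = [
    {"control_no": "CTRL0001", "production_nos": ["ABC000101", "ABC2-000007"]},
    {"control_no": "CTRL0002", "production_nos": ["ABC000102"]},
    {"control_no": "CTRL0003", "production_nos": []},  # never produced
]


def build_index(documents):
    """Map every control or production number to its document record."""
    index = {}
    for doc in documents:
        for number in [doc["control_no"], *doc["production_nos"]]:
            index[number] = doc
    return index


index = build_index(documents)

# A request that cites a second-production number still finds the original record.
found = index.get("ABC2-000007")
print(found["control_no"] if found else "not found")
```

If the production numbers were never stored, no lookup of this kind is possible, which is the corner the hint warns against.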
CD Inventory
If Litigation Support makes a CD, keep a copy. Keep a set of all productions in a Litigation Support media inventory. For the inevitable "surprise requests", it is very easy to copy a CD one already has handy. It may be difficult and time consuming to recreate CDs from scratch. In some cases it may be impossible.
Keep a copy with Litigation Support. One should not need to call multiple attorneys and paralegals to find a copy. This falls under the risk management header. Each CD costs about ten cents these days, and creating two CDs instead of one takes very little extra time.
Hint!: Music stores sell binders designed to hold anywhere from 16 to 256 CDs. These binders cost very little money. They are ideal for your inventory. CD jewel cases are expensive, crack and take up a lot of space.
Technical Specifications
Make sure the vendor can match your firm's technical specifications before it starts to process and deliver product. Unless your law firm specifies inclusion of certain information on the media label, it may not appear. I've seen CD labels with the law firm name and address but neither the vendor's contact information nor media content information. At least I knew the CD was for the firm.
The same need for the firm to specify requirements holds true for file, folder and volume naming. Should the images be .TIF or .PDF format? What image resolution does the firm desire? Does your vendor know what "path" your firm uses? A lot of Litigation Support's time is spent modifying vendor product so the technician can load the data into the firm's document review software. If litigation support's time is billable, this means the client is paying twice for the same product.
You get what you ask for. If you don't specify what you want, don't be surprised if you don't get what you need.
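Once a naming standard is written down, incoming vendor media can be checked against it mechanically before anyone tries to load the data. The Python sketch below assumes a made-up standard (three uppercase letters, six digits, lowercase .tif extension). Your firm's actual specification will differ.

```python
import re

# Hypothetical firm standard: e.g. ABC000001.tif
NAME_PATTERN = re.compile(r"[A-Z]{3}\d{6}\.tif")


def check_names(filenames):
    """Return the file names that do not match the firm's naming standard."""
    return [name for name in filenames if not NAME_PATTERN.fullmatch(name)]


# Hypothetical listing of image files from a vendor CD.
delivered = [
    "ABC000001.tif",
    "ABC000002.TIF",  # wrong extension case
    "abc000003.tif",  # wrong prefix case
    "ABC000004.pdf",  # wrong image format
    "ABC000005.tif",
]

problems = check_names(delivered)
print("Files not matching the standard:", problems)
```

A list of nonconforming files is also a concrete thing to send back to the vendor, rather than fixing the product in-house on the client's time.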
Note #1: The Litigation Support Technical Standards document from http://www.eDiscovery.org is an easy way to outline firm standards.
Note #2: If the vendor product is wrong, it is better to be consistently wrong from CD to CD. Nothing ruins an afternoon more quickly than receipt of 10 CDs where each one has unique issues.