
Michael Cizmar
President, Managing Director @ MC+A
The challenge of LLMs
Large language models (LLMs) excel at general-purpose tasks, but for specific, repeatable use cases, a fine-tuned model will outperform them. This results in more accurate outputs, faster inference, and lower costs. If you’re interested in a demonstration, feel free to reach out to us.
Background
Jocko Willink was once asked what exercise someone should do if they wanted to improve the number of pull-ups they could do. His response was, simply, “Pull-ups.” This direct answer emphasized doing the very thing the individual was avoiding. The same principle applies to diet and exercise—if you want to lose weight, you need to diet and exercise. There’s no way around that fact (though new medications may aid the diet aspect).
Carrying this principle forward, there is no free lunch when it comes to using AI in your processes. In the legal world, tasks like objective coding in eDiscovery processing can cost between $3 and $4 per 1,000 pages. This may tempt you to simply wrap your own process around an LLM call, but doing so is not free and may well cost you more time and money.
What is Objective Coding
Objective coding (according to Wikipedia) is the process of generating summary or keyword data from a document. Unlike subjective coding, objective coding deals with factual information, such as the document’s original date, author, and type. This is especially crucial in litigation, where each party must efficiently organize vast volumes of data. Objective coding is a critical step in a law firm’s organizational process during discovery.
Search (now AI) consultants have frequently performed this task, and with the rise of LLMs, it might seem like a solved problem. While LLMs offer powerful ad-hoc capabilities through zero-shot questions, achieving high accuracy requires fine-tuning with specific data.

Prompting your way to Objective Coding
Recently, we experimented with using Microsoft’s GraphRag for this process. GraphRag performs tasks like entity extraction on unstructured text and stores the results in a knowledge graph. While we’ll explore knowledge graphs in future posts, our focus here is on the challenges we encountered.
It quickly became evident that LLMs on their own were not accurate enough for the task. Moreover, the prompts generated by GraphRag totaled around 3,000 tokens before the document text was even processed. To put this in perspective:
When you process a document, it is tokenized and split into chunks—typically 500 tokens per chunk. If 500 tokens of your document are wrapped in 3,000 tokens of prompt, that results in 3,500 input tokens sent to the LLM. The LLM’s response forms the output tokens. Ideally, it is a concise answer, but you are paying for both input and output tokens.
Let’s break down the potential costs using the Enron dataset as our example (using GPT-4):
- The Enron dataset contains approximately 500,000 emails, estimated at 500 million to 1 billion tokens.
- Inbound: 1,000,000 API calls * 3,500 input tokens * ($0.025 per 1,000 tokens) = $87,500
- Outbound: 1,000,000 API calls * 500 output tokens * ($0.01 per 1,000 tokens) = $5,000
- Estimated total cost: $92,500
Regardless of how many tokens you can process per minute, with 500 million tokens at 500 document tokens per call, you’re looking at roughly 1 million API calls, each carrying about 3,500 inbound tokens once the prompt is included. The sketch below reproduces this arithmetic.
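As a rough sanity check, here is a minimal back-of-the-envelope sketch of that arithmetic. The token counts and per-1,000-token prices are the estimates used in this post, not quoted pricing.

```python
# Back-of-the-envelope cost estimate for LLM-based objective coding.
# All figures are the rough estimates used in this post, not official pricing.

CORPUS_TOKENS = 500_000_000   # low-end estimate for the Enron emails
CHUNK_TOKENS = 500            # document tokens per API call
PROMPT_TOKENS = 3_000         # prompt overhead added around each chunk
OUTPUT_TOKENS = 500           # assumed size of each response
PRICE_IN_PER_1K = 0.025       # $ per 1,000 input tokens (estimate)
PRICE_OUT_PER_1K = 0.01       # $ per 1,000 output tokens (estimate)

calls = CORPUS_TOKENS // CHUNK_TOKENS                                 # ~1,000,000 calls
input_cost = calls * (CHUNK_TOKENS + PROMPT_TOKENS) / 1_000 * PRICE_IN_PER_1K
output_cost = calls * OUTPUT_TOKENS / 1_000 * PRICE_OUT_PER_1K

print(f"calls:  {calls:,}")                            # 1,000,000
print(f"input:  ${input_cost:,.0f}")                   # $87,500
print(f"output: ${output_cost:,.0f}")                  # $5,000
print(f"total:  ${input_cost + output_cost:,.0f}")     # $92,500
```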
In our case, with GraphRag, we tried multiple setups, including Ollama with various models and Azure OpenAI. With Azure, we hit a tokens-per-minute limit that slowed down ingestion. That limit can presumably be raised as your Azure usage grows, but it slowed our ingestion and undercuts the advantages of the “on-demand cloud.”
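For context, a common way to work within tokens-per-minute limits is to retry with exponential backoff whenever the service signals it is rate limited. The sketch below is a generic illustration, not our production pipeline; `call_model` is a hypothetical placeholder for whatever client call you use.

```python
import random
import time

def call_with_backoff(call_model, *args, max_retries=6, **kwargs):
    """Retry a model call when the service signals it is rate limited.

    `call_model` is a hypothetical placeholder for your own client call
    (e.g. an Azure OpenAI chat completion); it should raise an exception
    whose type or message indicates a 429 / rate-limit condition.
    """
    for attempt in range(max_retries):
        try:
            return call_model(*args, **kwargs)
        except Exception as exc:  # narrow this to your client's rate-limit error type
            if "429" not in str(exc) and "rate" not in str(exc).lower():
                raise
            # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus noise.
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("rate limited after retries")
```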
That is a substantial cost for a single task that, in our tests, achieved only around 50% accuracy. On top of that, each call took between 5 and 15 seconds to process, which adds up quickly across a million calls.
A Better Approach
Given the cost and accuracy limitations of the native LLM approach, we began exploring vision models to improve performance. As Supreme Court Justice Potter Stewart famously said about obscenity, “I know it when I see it.” Reproducing this human intuition is key to making AI effective in this process.
Many legal documents have structure that provides crucial context, and that context is often lost when the document is converted to plain text for an LLM. By switching to a vision model and sending both the text and an image of each document (a short sketch follows below), we achieved remarkable improvements:
- Response time reduced to about 400ms, running locally
- Accuracy increased to over 95%
- These results were achieved with an initial training set of just 100 documents
This approach leverages the visual cues and layout of documents, mimicking how humans process information and leading to more accurate and efficient coding.
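To make the idea concrete, here is a minimal sketch of layout-aware document classification, assuming a model such as LayoutLMv3 that consumes both the page image and its OCR’d text and positions. The checkpoint, labels, and file name are illustrative only; this is not our exact production setup.

```python
# Minimal sketch: layout-aware document classification with LayoutLMv3.
# Assumes `transformers`, `torch`, `Pillow`, and `pytesseract` are installed;
# the labels, checkpoint, and file name below are illustrative only.
import torch
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForSequenceClassification

LABELS = ["email", "contract", "invoice", "memo"]  # hypothetical coding fields

# The processor runs OCR on the page image, producing words and bounding boxes,
# so the model sees both the text and where it sits on the page.
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base")
model = LayoutLMv3ForSequenceClassification.from_pretrained(
    "microsoft/layoutlmv3-base", num_labels=len(LABELS)
)  # in practice, fine-tune this head on your ~100 labeled example documents

image = Image.open("page_001.png").convert("RGB")
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

print(LABELS[int(logits.argmax(dim=-1))])
```

Fine-tuning a small, layout-aware model like this is what lets the coding run locally in hundreds of milliseconds per page rather than paying for a multi-second hosted LLM call.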

Closing Thoughts
In conclusion, while general-purpose LLMs offer powerful capabilities, specialized tasks like objective coding benefit significantly from more tailored approaches. Our vision model-based solution demonstrates that by combining AI with domain-specific knowledge, we can achieve higher accuracy, faster processing times, and lower costs. If you’re facing similar challenges in document processing or other specialized tasks, we invite you to reach out and explore how our innovative AI solutions can benefit your organization.
Go Further with Expert Consulting
Launch your technology project with confidence. Our experts allow you to focus on your project’s business value by accelerating the technical implementation with a best practice approach. We provide the expert guidance needed to enhance your users’ search experience, push past technology roadblocks, and leverage the full business potential of search technology.