NewGenApps

What does an AI consulting firm actually do?

In short: An AI consulting firm scopes the right problems, designs and builds working solutions, and — the distinction that matters — takes that work through verified production so your team can run it. The firms worth hiring keep senior practitioners on the problem from first scoping call to live system; many sell a strategy document and hand the build to a separate team. The difference between those two models is the difference between a recommendation on a slide and a system your business operates and trusts.


What does an AI consulting firm actually do?

An AI consulting firm identifies which AI problems are worth solving for a specific business, designs and implements a solution, and verifies that the solution works reliably before it goes live — then ensures the client's team can operate and maintain it.

The job has three parts. First, problem selection: deciding which use cases justify the cost and risk of AI, which are better served by conventional software, and which should not be built at all. Most organizations have more ideas than viable candidates. The first filter is elimination.

Second, solution design and build: architecture, data pipelines, model selection, integration with the systems the use case actually touches, and the evaluation harnesses that let you measure whether it works on real inputs rather than curated ones.

Third, production verification and handover: evidence that the system holds up under real conditions — not under demo conditions — and documentation your team can operate from after the engagement closes.

The defining characteristic of a firm that does this well is that the senior practitioners who scoped the work are the ones who build and prove it. No reassignment to a team you never met during the sale; no separate delivery vendor brought in after the strategy is signed. The point where a strategy firm hands the build to someone else is exactly where the constraints discovered in code stop reaching the person with the authority to change the design.

If you want to understand how this applies to an actual engagement, start with AI consulting.


What is the difference between AI consulting and AI development?

AI consulting decides what to build and whether to build it; AI development builds it. The two overlap at the implementation phase — and the best AI consulting firms do both.

Capability | AI consulting (pure strategy model) | AI consulting + development (full-arc model)
Decides what to build | Yes | Yes
Builds the solution | No; hands off to a development vendor | Yes; same practitioners
Verifies it in production | Rarely | Yes; part of the engagement
Leaves your team able to run it | Depends on handoff quality | Yes; explicit handover gate

The line between the two roles blurs in practice, but in the two-vendor model the handoff across that line is the problem. A consulting firm that stops at "decide what to build" and passes the work to a separate development shop creates a coordination gap. The team that designed the solution is not the team that discovers its limitations in code, and the constraints that surface during build never make it back to the person with the authority to rethink the design.

The firms that carry both roles are not doing two jobs — they are closing a gap that exists structurally in the two-vendor model. For more on what the build-vs-buy decision looks like in practice, see what "production build" actually means.


What does a good AI consulting engagement look like, step by step?

A structured AI consulting engagement runs in four stages: opportunity selection, proof of concept, production build with evaluation, and operational handover.

Each stage is a filter. Each one is allowed to kill the project. The willingness to stop is what separates a delivery firm from a vendor protecting a contract.

  1. Opportunity selection. The engagement starts before any model is trained or any code is written. The consulting team audits candidate use cases against three criteria: data availability, measurable business value, and AI-appropriate risk profile. A use case where a confident wrong answer is expensive (a customer-facing claim, a financial figure, a legal or compliance statement) demands a different, costlier design than one where a wrong answer is cheap to catch. The output is a ranked shortlist with an explicit success metric for each viable case and a written reason for each case that was deferred.

  2. Proof of concept. A working prototype scoped to validate the core technical assumption — not a demo, not a slide. The POC runs against real, messy data, not a curated set optimized for the room. It is evaluated, not just demonstrated: measured against the success metric from stage one on inputs the system has not been tuned to. A demo that works in the room and a POC that holds up on held-out real data are categorically different artifacts. Conflating them is the most common reason a "successful pilot" never reaches production. Gartner predicted, in a July 2024 press release, that at least 30% of generative-AI projects would be abandoned after proof of concept by end of 2025, citing poor data quality, inadequate risk controls, and unclear business value — a figure that points squarely at the gap between a POC passing and a POC being production-ready (Gartner, 29 July 2024).

  3. Production build with evaluation. This is where most of the real engineering lives, and where advisory-only firms stop. Production-grade delivery adds, at minimum, four things. First, integration with the systems and data the use case actually touches: production data is incomplete, inconsistent, and changing, so a notebook on a fixed file and a live pipeline are different engineering problems. Second, evaluation harnesses that score outputs continuously against the success metric, not once at launch. Third, guardrails and failure handling calibrated to the cost of being wrong: grounding and retrieval where factual accuracy matters, input/output validation, human review on high-stakes paths, and fallbacks for model uncertainty. The OWASP Top 10 for LLM Applications (2025) catalogs the failure classes a serious build must defend against, among them prompt injection, sensitive-data leakage, and excessive agency. Fourth, monitoring instrumentation so the team can see degradation in accuracy, groundedness, and cost before users do. McKinsey found, in its November 2025 State of AI report, that only about 7% of organizations had fully scaled AI across the enterprise, with most stuck between pilot and production (McKinsey QuantumBlack, November 2025). Orchestrating retrieval, evaluation, guardrails, and model components is the mechanism of this stage, not its purpose. The purpose is narrower and harder: a system that holds up under real conditions, with a documented record that it was verified before users depended on it.

  4. Operational handover. The engagement closes when the client's team can run the system without the consulting firm on call. This means documented architecture, runbooks, the evaluation suite handed over so the client can re-prove the system after a change, and at least one operated production cycle with the internal team driving. Proven is not a launch-day event; it is a capability the client keeps. A firm that cannot hand over the means of verification has delivered something the client cannot re-check — which is to say, something the client cannot fully trust.
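
As a rough illustration of how the first three stages act as gates, the Python sketch below screens candidates against the three stage-one criteria, then checks a system against the stage-one success metric on held-out examples. Every name in it (UseCase, select_opportunities, passes_evaluation_gate, the field names, the toy data) is hypothetical and chosen for illustration. Reducing each criterion to a boolean or a single number, ranking by estimated value, and using exact-match accuracy as the metric are simplifying assumptions, not a description of any firm's actual method; a real audit would weigh risk against design cost rather than apply a yes/no flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """A candidate use case; all fields are illustrative assumptions."""
    name: str
    has_usable_data: bool      # criterion 1: data availability
    estimated_value: float     # criterion 2: measurable business value (arbitrary units)
    risk_is_acceptable: bool   # criterion 3: AI-appropriate risk profile
    success_threshold: float   # explicit success metric: minimum held-out accuracy


def select_opportunities(candidates):
    """Stage 1: return (shortlist ranked by value, {deferred name: written reason})."""
    shortlist, deferred = [], {}
    for case in candidates:
        if not case.has_usable_data:
            deferred[case.name] = "no usable data"
        elif case.estimated_value <= 0:
            deferred[case.name] = "no measurable business value"
        elif not case.risk_is_acceptable:
            deferred[case.name] = "risk profile not suited to this design"
        else:
            shortlist.append(case)
    shortlist.sort(key=lambda c: c.estimated_value, reverse=True)
    return shortlist, deferred


def passes_evaluation_gate(system, held_out, threshold):
    """Stages 2-3: score `system` on held-out (input, expected) pairs.

    Returns (passed, accuracy). The held-out pairs must be examples the
    system was not tuned on, otherwise the score says little.
    """
    if not held_out:
        raise ValueError("held-out set must not be empty")
    correct = sum(1 for item, expected in held_out if system(item) == expected)
    accuracy = correct / len(held_out)
    return accuracy >= threshold, accuracy


# Toy candidates, invented for this example.
candidates = [
    UseCase("invoice extraction", True, 120.0, True, 0.90),
    UseCase("ticket triage", True, 200.0, True, 0.85),
    UseCase("legal advice bot", True, 300.0, False, 0.99),
    UseCase("churn prediction", False, 80.0, True, 0.80),
]
shortlist, deferred = select_opportunities(candidates)


def toy_system(text):
    """Stand-in for a real model: upper-cases its input."""
    return text.upper()


# Four held-out examples; the third is one the toy system gets wrong.
held_out = [("a", "A"), ("b", "B"), ("c", "X"), ("d", "D")]
passed, accuracy = passes_evaluation_gate(
    toy_system, held_out, shortlist[0].success_threshold
)
```

In this toy run the top-ranked case fails its gate (0.75 accuracy against a 0.85 threshold), which is the outcome a gate exists to surface. The same function can be rerun on the same held-out set after any change, which is one simple way to picture the evaluation suite handed over in stage 4.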


Who actually does the work on an AI consulting engagement?

At most firms, a senior practitioner scopes the work and a more junior team delivers it. This is the standard professional-services arrangement: senior time is what gets sold, supervision sits at the top, and execution happens further down. That arrangement is well suited to structured analysis a capable generalist can produce under review.

Production AI is a different kind of deliverable. The judgment that decides whether it succeeds shows up in the build, not the plan: choosing which use cases are feasible, designing an evaluation that measures the right thing on real data, deciding which failure modes need guardrails and which are cheap to catch, recognizing when a proof of concept's apparent success is an artifact of a curated test set. These are senior calls, and in a delivery model that pushes execution to the most junior tier, they land with the people least equipped to make them — and least likely to escalate when something is off.

NewGenApps keeps its named senior practitioners doing the work rather than supervising it. The person who scopes the use case is the person committing production code and signing off on the evaluation. Scoping holds up better because the person who discovers the constraints during build is also the person accountable for the design.

Read more about how we staff engagements on how we work.


What should you ask an AI consulting firm before you hire them?

Before committing to an engagement, ask four questions that reveal whether the firm takes work through production or stops at strategy.

  1. Who will be doing the technical work day to day — and can I meet them before we sign? This reveals whether the senior people you met in the sales process are the delivery team, or whether you are buying access to a partner's reputation and receiving a junior team's time.

  2. How does the engagement end — what does handover look like? A firm with a genuine production model can describe the handover gate specifically: what documentation is produced, what the evaluation suite looks like, and how the client team is trained to operate the system. A firm that exits after a POC will not have a clear answer.

  3. How do you evaluate that the system works before it goes live? This reveals whether the firm has a structured evaluation practice — defined success metrics, held-out test sets, red-teaming, edge-case coverage — or relies on informal judgment. A firm that says "we test it and it looks right" has not built the evidence a production system needs.

  4. What does a past engagement that went into production look like — without naming the client? A firm with a genuine production track record can describe the architecture, the evaluation approach, and the challenges encountered — without disclosing client identity. One that has not taken work to production cannot describe what that experience is actually like. Sector, scale, and geography descriptors are sufficient; confidentiality is not a reason to be unable to characterize the work at all.


Frequently asked questions

Is AI consulting the same as AI development?

No. AI consulting determines which problems justify an AI solution and how to design it; AI development builds the solution. The most capable firms combine both: the same team that scopes the problem builds and verifies the production system. Firms that only consult hand the build to a separate vendor, creating a structural gap between design intent and delivered system.

How long does an AI consulting engagement typically take?

A scoped proof of concept can run four to eight weeks. A production build — which includes evaluation, integration, and operational handover — typically adds eight to sixteen weeks on top of that, depending on integration complexity and the maturity of the client's data infrastructure. Engagements that promise AI in production in two weeks are either narrowly scoped or skipping the evaluation phase.
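
Taken together, those ranges imply roughly 12 to 24 weeks from the start of a proof of concept to handover. A minimal sketch of that arithmetic follows, assuming the two phases run back to back with no gap or overlap, and leaving out opportunity selection, for which no duration is given here. The variable names are illustrative.

```python
# Week ranges quoted in the answer above, as (low, high) pairs.
poc_weeks = (4, 8)
build_weeks = (8, 16)  # added on top of the proof of concept

# Assumption: the phases run back to back, so the bounds simply add.
total_weeks = (poc_weeks[0] + build_weeks[0], poc_weeks[1] + build_weeks[1])

print(f"POC plus production build: {total_weeks[0]} to {total_weeks[1]} weeks")
```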

What kinds of problems do AI consulting firms typically work on?

The most common categories are: document processing and extraction (turning unstructured text into structured data), decision-support systems (AI-assisted triage, prioritization, or classification), conversational interfaces grounded in a company's own data, and workflow automation where deterministic rules fail on edge cases. The common thread is that these are problems where the output needs to be accurate enough to be operationally useful — which is why evaluation is the critical gate, not the build itself.

What makes an AI consulting firm worth hiring?

Three things: senior practitioners doing the delivery work, not supervising it; a structured evaluation practice — they can describe how they verify a system works before it goes live; and a production track record — they can describe, without naming clients, engagements that went live and that the client's team runs today. A firm that can only describe strategy work or proofs of concept has not closed the production gap.

How is AI consulting different from hiring an AI vendor or buying an AI product?

An AI vendor sells a pre-built product configured for your context. An AI consulting firm diagnoses the problem first and determines whether a pre-built product fits, a custom build is warranted, or the problem should not be solved with AI at all. The consulting function is the evaluation and selection layer; the development function is the build layer when custom work is justified. A competent firm will sometimes conclude the right answer is to buy, not build — and say so.


If you want to see what this looks like as an engagement, start with AI consulting or read how we work.
