How to Evaluate an Actuary's Report: A Self-Insured's Scorecard

A section-by-section scoring framework for judging whether the methods, diagnostics, assumptions, and range in your reserve report are sound, not just present.

By Sam · April 2026 · 14 min read

You received the actuarial report. Maybe you already compared it against the documentation checklist and confirmed the required items are present: methods by accident year, development factors, trend selections, range, data reconciliation. That is a necessary first step, but it is not sufficient.

A report can include every required item and still produce a number you should not trust. The method selections can be present but poorly reasoned. The diagnostic review can exist in name but consist of boilerplate. The range can appear in a table without any meaningful construction methodology behind it. Presence is not quality.

This article provides a scoring framework: five sections of the report, evaluated not for whether they exist but for whether they are sound. It draws on every dimension of the reserving process (methods, diagnostics, industry-specific considerations, structural issues, professional standards, and the monitoring relationship) to give you a single tool for reading the report you actually received and deciding whether you can rely on it.

How to use the scorecard

For each of the five sections below, the scorecard describes three tiers:

Strong: The section meets or exceeds what a well-documented, well-reasoned analysis should contain. You can evaluate the actuary’s judgment and form your own view of whether the choices are reasonable.
Adequate: The section contains the required information but lacks the specificity or supporting rationale needed for full evaluation. You should request clarification before booking the number.
Red flag: The section is missing, generic, or internally inconsistent. You should not book the number until the deficiency is resolved.

The tiers are not letter grades. A report with all five sections at “adequate” is workable; one section at “red flag” may be fixable with a single conversation. The purpose is to identify where the report gives you the information you need and where it does not.
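One way to record the tiers so the assessment can be documented and acted on is sketched below in Python. The section keys, the `Tier` enum, and the `review_actions` helper are illustrative names, not part of any standard. The only rules encoded are the ones stated above: a red flag should be resolved before booking, and an adequate section calls for clarification.

```python
from enum import IntEnum


class Tier(IntEnum):
    RED_FLAG = 0
    ADEQUATE = 1
    STRONG = 2


# Hypothetical keys for the five report sections scored in this article.
SECTIONS = (
    "method_selection",
    "diagnostic_review",
    "trend_and_assumptions",
    "range_and_sensitivity",
    "data_and_documentation",
)


def review_actions(scores):
    """Group sections by the follow-up each tier calls for."""
    missing = [s for s in SECTIONS if s not in scores]
    if missing:
        raise ValueError(f"unscored sections: {missing}")
    resolve = [s for s in SECTIONS if scores[s] == Tier.RED_FLAG]
    clarify = [s for s in SECTIONS if scores[s] == Tier.ADEQUATE]
    return {
        "resolve_before_booking": resolve,
        "request_clarification": clarify,
        "booking_blocked": bool(resolve),
    }


if __name__ == "__main__":
    example = {s: Tier.ADEQUATE for s in SECTIONS}
    example["diagnostic_review"] = Tier.RED_FLAG
    print(review_actions(example))
```

The output is a to-do list for the review conversation, not a grade: one red flag blocks booking until it is resolved, however strong the other sections are.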

Section 1: Method selection

The methods section is the most consequential part of the report. It determines which signal the actuary trusts for each accident year and line of business, and for the most recent two or three accident years, that choice can swing the IBNR estimate by 20% or more.

Strong: The report shows, for each accident year, which method produced the selected ultimate: chain ladder (paid, reported, or both), Bornhuetter-Ferguson, expected claims, frequency-severity, or a blend. It shows the results of alternative methods alongside the selected method. For recent accident years where the methods diverge, it explains why one method was weighted more heavily. The rationale is specific: “BF was selected for accident year 2025 because the 12-month paid chain ladder CDF of 8.5 produces a highly leveraged estimate,” not “BF was selected for immature years.”
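To see why a CDF of 8.5 is called highly leveraged, the sketch below compares paid chain ladder with Bornhuetter-Ferguson using their standard formulas. The CDF is the one quoted above. The paid amounts and the $1,000,000 a priori expected ultimate are hypothetical figures chosen only for illustration.

```python
def chain_ladder_ultimate(paid_to_date, cdf):
    """Chain ladder: ultimate = paid to date x cumulative development factor."""
    return paid_to_date * cdf


def bf_ultimate(paid_to_date, cdf, expected_ultimate):
    """Bornhuetter-Ferguson: paid to date plus the expected unpaid portion."""
    pct_unpaid = 1.0 - 1.0 / cdf
    return paid_to_date + expected_ultimate * pct_unpaid


CDF_12_MONTHS = 8.5            # from the example rationale above
EXPECTED_ULTIMATE = 1_000_000  # hypothetical a priori expected ultimate

if __name__ == "__main__":
    # Two hypothetical paid-to-date amounts that differ by $50,000.
    for paid in (100_000, 150_000):
        cl = chain_ladder_ultimate(paid, CDF_12_MONTHS)
        bf = bf_ultimate(paid, CDF_12_MONTHS, EXPECTED_ULTIMATE)
        print(f"paid={paid:>9,.0f}  chain ladder={cl:>12,.0f}  BF={bf:>12,.0f}")
```

With these inputs, a $50,000 difference in paid losses moves the chain ladder ultimate by $425,000 (8.5 times the difference) but moves the BF ultimate by only $50,000. That asymmetry is what a specific rationale for the most recent accident year should acknowledge.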

Friedland puts it directly: “there is no single ‘right’ way for the actuary to select ultimate claims” (p. 348). That is precisely why the report must show the choice and its reasoning. If two reasonable methods produce materially different answers and the report does not explain the selection, you are trusting the actuary’s judgment without seeing it.

Adequate: The report identifies the selected method by accident year and shows alternatives, but does not explain why one was preferred over another. You can see the results but must infer the reasoning.

Red flag: The report provides a single blended ultimate per accident year with no method attribution. Or the report shows multiple methods but labels one “selected” without any explanation of the selection logic. Or the same method is used for every accident year regardless of maturity, with no discussion of whether that is appropriate.

What to check: Compare the method selections to what you know about your program. If the actuary selected reported chain ladder for the most recent accident year, ask whether case reserve adequacy has been stable. If they selected paid chain ladder, ask whether settlement speed has been stable. If the answer to either question is unknown, the method selection is unsupported.

Section 2: Diagnostic review

The diagnostic section reveals whether the actuary investigated the conditions that make the selected methods valid or invalid. A chain ladder estimate assumes that future development will follow historical patterns. If something in your operation changed, that assumption may no longer hold, and the actuary should have tested whether it does.

Strong: The report identifies specific operational changes that occurred during the experience period (or confirms that none occurred, with supporting evidence). Changes investigated include TPA transitions, case reserve adequacy shifts, settlement speed changes, claims management program changes, retention or limit changes, and claim mix shifts. For each change identified, the report describes how the actuary addressed it: whether they applied a Berquist-Sherman adjustment, reorganized the data, excluded affected years, or concluded the change was immaterial and explained why. The diagnostic framework should be visible in the report, not buried in internal files.

Adequate: The report acknowledges that the actuary reviewed operational changes and states findings at a high level, but does not describe the investigation methodology or the specific adjustments. You know changes were considered but cannot evaluate how they were treated.

Red flag: The report makes no mention of operational changes, or includes only a generic disclaimer (“we are not aware of any changes that would materially affect the analysis”). For a self-insured program where TPA transitions, management changes, and claims practice shifts are common, that assumption should be supported, not asserted.

What to check: Think about what you know happened in your program during the experience period. Did the TPA change? Did the claims manager change? Did settlement authority thresholds change? Did you implement a new return-to-work or medical management program? If you know something changed and the report does not mention it, the diagnostic review missed it. The leading indicators that predict adverse development (reporting lag, attorney involvement rate, closure rate) should show up in the discussion if they moved.

Section 3: Trend and assumption support

Trend assumptions drive the expected claim ratios used in BF and expected claims methods, severity and frequency projections, and the exposure adjustments that convert historical experience to current-level estimates. For a program with three immature accident years estimated using BF, the expected claim ratio and the trends behind it are often the single most influential assumptions in the report.

Strong: The report states each material trend rate (severity, frequency, medical inflation, indemnity inflation) and shows how it was derived. Derivation might come from the program’s own data (five-year average severity trend), an industry benchmark (NCCI rate level changes), or a combination. The expected claim ratio for BF is stated, its derivation is shown (prior-year ultimate trended forward, industry benchmark adjusted for program experience, or similar), and the sensitivity of the estimate to the ratio is disclosed. For line-specific programs, the trend components should reflect the line’s characteristics: indemnity, medical, and allocated loss adjustment expense components for workers compensation; severity trend for auto bodily injury.

Adequate: The report states the expected claim ratio and trend rates but does not show the derivation. You know what numbers were used but cannot evaluate whether they are reasonable for your program.

Red flag: The report uses BF or expected claims for recent years but does not disclose the expected claim ratio. Or the report applies trend rates without stating them. Or the expected claim ratio is described as “based on industry benchmarks” with no further specification of which benchmarks, what adjustments were made, or how the program’s own experience was incorporated.

What to check: Compare the expected claim ratio to your program’s actual loss ratios for mature accident years. If the ratio the actuary used for accident year 2025 is 65% but your actual ratios for 2020 through 2023 (fully developed) are all between 70% and 80%, ask what justifies the lower assumption. Conversely, if the ratio seems high relative to recent experience, ask what trend or benchmark is driving it upward.
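The comparison described above can be sketched as a simple screen. The function name and the individual mature-year ratios below are hypothetical; they were chosen only to match the 65% versus 70%-to-80% example. The screen ignores trend, exposure changes, and retention changes, so a result of "below" or "above" is a prompt for a question, not a finding.

```python
def ecr_versus_mature_years(selected_ecr, mature_ratios):
    """Report where the selected ratio sits relative to mature-year ratios.

    mature_ratios maps accident year -> developed loss ratio.
    Returns "below", "within", or "above".
    """
    if not mature_ratios:
        raise ValueError("need at least one mature accident year")
    low = min(mature_ratios.values())
    high = max(mature_ratios.values())
    if selected_ecr < low:
        return "below"
    if selected_ecr > high:
        return "above"
    return "within"


if __name__ == "__main__":
    # Hypothetical developed ratios for the mature years in the example.
    mature = {2020: 0.72, 2021: 0.78, 2022: 0.70, 2023: 0.75}
    print(ecr_versus_mature_years(0.65, mature))
```

A "below" result here corresponds to the first question in the paragraph above: what justifies an assumption lower than every mature year in the program's own history.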

Section 4: Range construction and sensitivity

A point estimate tells you where the actuary thinks the reserve should land. A range tells you how uncertain that landing is. But not all ranges are created equal. A range that is constructed from the spread of alternative methods or alternative assumptions is informative. A range that is the point estimate plus or minus a round percentage is not.

Strong: The report explains how the range was constructed. The low and high endpoints correspond to identified methods, assumption sets, or scenarios (for example: low = chain ladder for all years; high = BF with a conservative expected claim ratio for recent years). The report identifies the material risks of adverse deviation: the specific assumptions that, if wrong, would push the result toward the high end. Sensitivity analysis shows how much the estimate moves when the highest-leverage assumptions change (for example, a 5-point change in the expected claim ratio for the most recent accident year moves the total IBNR by a stated dollar amount or percentage). For public entity or hospital programs with long tails, the sensitivity section should address tail-factor uncertainty explicitly.
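The sensitivity disclosure described above amounts to a small calculation. The sketch below assumes the most recent accident year is estimated with BF on reported losses, so that IBNR equals the expected ultimate times the percentage unreported. The exposure base, the 70% ratio, and the reported CDF of 2.0 are hypothetical inputs, not figures from any report.

```python
def bf_ibnr(exposure_base, expected_claim_ratio, reported_cdf):
    """BF IBNR = expected ultimate x share of claims not yet reported."""
    expected_ultimate = exposure_base * expected_claim_ratio
    pct_unreported = 1.0 - 1.0 / reported_cdf
    return expected_ultimate * pct_unreported


def ecr_sensitivity(exposure_base, expected_claim_ratio, reported_cdf, shift=0.05):
    """Show how IBNR moves when the expected claim ratio shifts by `shift`."""
    base = bf_ibnr(exposure_base, expected_claim_ratio, reported_cdf)
    shifted = bf_ibnr(exposure_base, expected_claim_ratio + shift, reported_cdf)
    return {
        "base_ibnr": base,
        "shifted_ibnr": shifted,
        "change": shifted - base,
        "pct_change": (shifted - base) / base,
    }


if __name__ == "__main__":
    # Hypothetical: $2,000,000 exposure base, 70% ratio, reported CDF of 2.0.
    result = ecr_sensitivity(2_000_000, 0.70, 2.0)
    for name, value in result.items():
        print(f"{name}: {value:,.4f}")
```

A strong report states the equivalent of `change` and `pct_change` for its own highest-leverage assumptions, so you do not have to reconstruct them.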

Adequate: The report provides a range with endpoints tied to methods or assumption variations, but does not include explicit sensitivity analysis or does not identify the specific risks of adverse deviation.

Red flag: The range endpoints are round numbers or round percentages of the point estimate with no connection to methods or assumptions. Or the report provides a single number with no range at all. Or the report states that “actual results may differ from estimates” without identifying specific, addressable risks.

What to check: Look at where the selected estimate sits within the range. If it sits at the low end, ask what assumptions would need to hold for the low estimate to be the outcome. If it sits at the midpoint, ask what drives the spread. The range is useful only if you can trace its endpoints to decisions the actuary made. Also verify whether the range separates pure IBNR (claims not yet reported) from IBNER (further development on claims already reported); the uncertainty profile differs for each component.
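Two of these checks are mechanical enough to sketch: where the selected estimate sits in the range, and whether both endpoints look like round percentages of it. The function names, the 5-point step, and the dollar figures are all hypothetical. A round-percentage match is only a reason to ask how the range was built, since a method-based range can land on round numbers by coincidence.

```python
def range_position(low, selected, high):
    """Return 0.0 if selected is at the low end, 1.0 if at the high end."""
    if low >= high or not (low <= selected <= high):
        raise ValueError("expected low <= selected <= high with low < high")
    return (selected - low) / (high - low)


def looks_like_round_percentage(low, selected, high, step=0.05, tol=0.001):
    """True if both endpoints sit a multiple of `step` away from selected."""
    for endpoint in (low, high):
        pct_away = abs(endpoint / selected - 1.0)
        nearest = round(pct_away / step) * step
        if nearest == 0 or abs(pct_away - nearest) > tol:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical range that is exactly the point estimate plus or minus 10%.
    print(range_position(4_500_000, 5_000_000, 5_500_000))
    print(looks_like_round_percentage(4_500_000, 5_000_000, 5_500_000))
    # Hypothetical range with uneven endpoints, as method spreads often are.
    print(range_position(4_620_000, 5_000_000, 5_910_000))
    print(looks_like_round_percentage(4_620_000, 5_000_000, 5_910_000))
```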

Section 5: Data foundation and documentation

The four sections above evaluate the analytical content of the report. This section evaluates the foundation beneath it: the data the actuary used, the reconciliation performed, and the disclosures that make the report auditable.

Strong: The report identifies every data source (TPA claim extracts, payroll records, exposure data, premium data, benchmarks), the valuation date, and any known data limitations. It confirms that the actuary reconciled the triangles to summary data and discloses any discrepancies. Management reliances are stated explicitly: what the actuary relied on you to tell them (that all known claims were reported, that no material litigation is pending, that no operational changes are planned). Material changes from the prior analysis are identified with direction and approximate impact. The report states whether the estimate is an actuarial central estimate under ASOP 43 or some other basis, and meets the disclosure requirements of ASOP 41 (Actuarial Communications).

Adequate: The report identifies the data sources and valuation date but does not discuss reconciliation or data limitations. Management reliances are not stated, or are stated generically. The estimate type is implied but not explicitly labeled.

Red flag: The report does not identify the source of the triangles or the valuation date. Or the triangles in the report do not reconcile to the data your TPA provided. Or the report contains no management reliances, which almost certainly means reliances exist but were not documented. Or the report is silent on whether the number is a central estimate, a management target, or something else.

What to check: Pull the latest claim summary from your TPA and compare the totals to the diagonal of the actuary’s triangle. If paid losses on the actuary’s triangle for the most recent accident year do not match what your TPA shows as of the same valuation date, there is a data problem that must be resolved before the estimate is reliable.
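The reconciliation can be sketched as follows, assuming the triangle is held as cumulative paid losses by accident year, with each row ending at the current valuation date. The data layout, the function names, and all of the figures are hypothetical.

```python
def latest_diagonal(triangle):
    """Take the most recent cumulative value for each accident year."""
    return {ay: row[-1] for ay, row in triangle.items()}


def reconcile(triangle, tpa_paid, tolerance=0.0):
    """Return accident years where triangle and TPA totals disagree.

    Values are triangle minus TPA. A year missing from one source is
    treated as zero there, so it appears as a full-amount difference.
    """
    diagonal = latest_diagonal(triangle)
    differences = {}
    for ay in sorted(set(diagonal) | set(tpa_paid)):
        diff = diagonal.get(ay, 0.0) - tpa_paid.get(ay, 0.0)
        if abs(diff) > tolerance:
            differences[ay] = diff
    return differences


if __name__ == "__main__":
    # Hypothetical cumulative paid triangle (rows shorten for recent years).
    triangle = {
        2023: [100_000.0, 180_000.0, 210_000.0],
        2024: [120_000.0, 200_000.0],
        2025: [90_000.0],
    }
    # Hypothetical TPA paid totals as of the same valuation date.
    tpa_paid = {2023: 210_000.0, 2024: 200_000.0, 2025: 95_000.0}
    print(reconcile(triangle, tpa_paid))
```

An empty result means the diagonal ties out. Any accident year that appears in the result is a question for the actuary and the TPA before the estimate is relied on.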

The self-insured scoring gap

Insurance companies receive reserve opinions that follow NAIC-driven formats with built-in regulatory review. If a carrier’s appointed actuary omits a required disclosure, the state insurance department may flag it during the annual statement review.

Self-insured programs have no comparable backstop. The format varies by firm. The level of detail varies by engagement. There is no regulatory reviewer scoring the report against ASOP requirements. The result is wide variation in report quality, and the buyer is the only quality control.

Pointing out this scoring gap is not an accusation. Many actuarial firms produce thorough, well-documented reserve studies for self-insured clients. The issue is that the buyer often has no framework for distinguishing a thorough report from a thin one. Two reports can arrive with similar formatting and similar-sounding language, and one can contain genuine diagnostic analysis while the other contains boilerplate wrapped around a mechanical calculation. Without a scorecard, both look the same.

The five sections above are the framework. If you can evaluate each section and articulate what is strong, what is adequate, and what is missing, you have the basis for a productive conversation with your actuary and a defensible position when you book the number.

Putting the scorecard to work

The scorecard is most useful in three moments.

When you receive the annual report. Walk through the five sections before booking the number. If any section is at “red flag,” request the missing information before the number flows into the balance sheet. If sections are at “adequate,” decide whether the missing specificity is material enough to pursue or whether you can accept the estimate with noted limitations.

When you compare year-over-year reports. Apply the scorecard to this year’s report and last year’s side by side. If last year’s report selected BF for the three most recent accident years and this year switches to chain ladder, the methods section should explain the change. If the diagnostic section was strong last year and is now generic, ask what changed in the actuary’s process. Year-over-year regression in report quality is a signal worth investigating.

When you evaluate a new actuarial firm. If you are considering changing actuaries, ask each candidate to provide a sample report (anonymized) and score it against the five sections. A firm that produces strong reports for other clients will likely produce strong reports for you. A firm whose sample report has red flags in two sections is unlikely to improve once they win the engagement.

What a buyer should ask their actuary

These five questions map directly to the scorecard sections:

For each of the two most recent accident years, which method did you select and why? This tests whether the methods section contains genuine reasoning or just labels.
What operational changes did you investigate, and what adjustments (if any) did you make? This tests whether the diagnostic review was a real investigation or a formality.
How did you derive the expected claim ratio for the BF estimate, and how sensitive is the total IBNR to that ratio? This tests assumption support and sensitivity disclosure.
What defines the low and high ends of the range? This tests whether the range is constructed from analytical alternatives or manufactured from round percentages.
Did the triangles reconcile to the TPA data, and are there any data limitations I should know about? This tests the data foundation and closes the loop on management reliances.

If the actuary can answer all five from the report itself, the report is well-documented. If they need to go back to their files to answer any of them, the report is missing information that professional standards expect it to disclose.

What to require in documentation

Incorporate the scorecard into your engagement process:

Include the five scorecard dimensions in the engagement letter as deliverable expectations. This sets the standard before the work begins.
At report delivery, score each section. Document your assessment and share it with the actuary as part of the review conversation.
Track scores year over year. Consistent “strong” marks indicate a reliable engagement. Declining scores warrant a conversation about process and resources.
If you are in a captive structure with a board that reviews the reserve, the scorecard gives the board a structured basis for its review, so it does not have to rely on the actuary’s verbal summary.
For programs that use interim monitoring, the scorecard applies in a lighter form at each quarterly review. The methods and trend sections are inherited from the annual study; the diagnostic and data sections should still be evaluated for each interim deliverable.
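Tracking scores year over year can be as simple as comparing two sets of tiers and listing the sections that declined. The sketch below uses plain strings for the tiers; the section keys and the `regressions` helper are hypothetical names.

```python
# Rank the three tiers so that a lower number means a weaker section.
TIER_RANK = {"red flag": 0, "adequate": 1, "strong": 2}


def regressions(prior, current):
    """Return sections whose tier declined, as (prior tier, current tier)."""
    declined = {}
    for section, prior_tier in prior.items():
        current_tier = current.get(section)
        if current_tier is None:
            continue  # section not scored this year; nothing to compare
        if TIER_RANK[current_tier] < TIER_RANK[prior_tier]:
            declined[section] = (prior_tier, current_tier)
    return declined


if __name__ == "__main__":
    last_year = {
        "method_selection": "strong",
        "diagnostic_review": "strong",
        "data_and_documentation": "adequate",
    }
    this_year = {
        "method_selection": "strong",
        "diagnostic_review": "adequate",
        "data_and_documentation": "adequate",
    }
    print(regressions(last_year, this_year))
```

Each section this returns is a topic for the conversation about process and resources described above.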