Over the past several weeks, as we've reviewed how to re-create the whole market pricing exercise, we've critically examined what's "broken" with the current system and how it affects the entire market pricing concept. One issue that came to light quickly, and that is rarely discussed, is the sampling strategy that survey houses use when they administer their salary surveys. Let me explain what this is and why it's a critical problem that, in large part, invalidates the results of the survey.
In her landmark book "How to Conduct Surveys: A Step-by-Step Guide", Arlene Fink (Professor of Medicine and Public Health at the University of California, Los Angeles) devotes an entire chapter to sampling, given its importance to the validity and reliability of a survey.
In general, there are two types of sampling used by surveyors: Random and Convenience. We know of no salary survey house that utilizes a Random Sampling methodology when they conduct their salary surveys; all rely on a Convenience Sampling methodology due to its ease of administration and lower cost.
Random (probability) sampling is preferred over convenience sampling because it produces more representative, less biased data that can be validly generalized from the sample to the full population. Practically speaking, this means that a salary survey employing a convenience sampling strategy to gather its data cannot state that the statistics it calculates from that survey -- for instance, the median salary for a given job -- accurately represent the corresponding figures for the market. Simplified statements like "we have a large sample size" don't correct for the original sin of employing a faulty sampling strategy: a larger sample drawn the same biased way only yields a more precise estimate of the wrong number.
Any survey house representative who suggests otherwise either (a) doesn't understand statistics, or (b) is knowingly trying to sell you a product -- data, in this case -- that doesn't meet the quality standards being advertised.
Here's a little info on both types of sampling methodologies and why convenience sampling is such a significant problem, essentially invalidating the results of salary surveys generated by survey houses:
Definitions
Random sampling means every unit in the target population has a known, typically equal, probability of being selected, often via lottery-style or algorithmic randomization. Convenience sampling selects whoever is easiest to reach (e.g., current clients or former survey participants), so not everyone in the population has a chance to be included.
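The two definitions can be sketched in a few lines of Python. This is a minimal illustration, not any survey house's actual procedure: the employer list, the `is_client` flag, and both function names are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Every unit has the same known chance (n / len(population)) of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def convenience_sample(population, n, is_easy_to_reach):
    """Take the first n easy-to-reach units; everyone else has zero chance."""
    reachable = [unit for unit in population if is_easy_to_reach(unit)]
    return reachable[:n]

# Hypothetical population: 1,000 employers, 200 of which are existing clients.
employers = [{"id": i, "is_client": i < 200} for i in range(1000)]

random_picks = simple_random_sample(employers, 50)
convenient_picks = convenience_sample(employers, 50, lambda e: e["is_client"])

print(sum(e["is_client"] for e in random_picks), "clients in the random sample")
print(sum(e["is_client"] for e in convenient_picks), "clients in the convenience sample")
```

In the convenience sample every participant is an existing client by construction, so the 800 non-client employers can never appear in the data, no matter how many participants are collected.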
Key advantages of random sampling
Representativeness and external validity
Because selection is random from a defined population, a random sample is much more likely to mirror the population’s characteristics, making it appropriate for generalizing results. As such, the median base pay rate for a job from a randomly sampled salary survey can be expected to fall close to the actual median base pay rate for the population as a whole, within a margin of error that can be estimated.
By contrast, convenience samples over‑represent people who are easy to reach, so the odds are that the results won't reflect the broader population. This is why, when comparing the results of market data for a job across different salary surveys, even when holding the scope of the cuts constant, you almost always see different results.
Reduced selection bias
Random selection removes the researcher’s discretion in who is included, which greatly reduces systematic selection bias.
Convenience samples are explicitly chosen based on accessibility or willingness to participate, so they are “open to bias” and can systematically skew results.
Because they are expensive, salary surveys from the large survey houses typically over-represent large companies with big discretionary budgets that can afford the surveys and under-represent smaller and start-up companies.
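The effect of this kind of over-representation can be illustrated with a small simulation. Everything in it is an assumption made for illustration: the pay levels, the 20/80 split between large-company and small-company incumbents in the market, and the 80% share of large-company participants in the convenience sample.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical market: 2,000 large-company and 8,000 small-company incumbents.
# The pay levels are invented for illustration only.
large = [rng.gauss(110_000, 15_000) for _ in range(2_000)]
small = [rng.gauss(85_000, 15_000) for _ in range(8_000)]
population = large + small
true_median = statistics.median(population)

def random_sample_median(n):
    """Median of a simple random sample drawn from the whole market."""
    return statistics.median(rng.sample(population, n))

def convenience_sample_median(n, share_large=0.8):
    """Median of a sample in which large companies are over-represented."""
    n_large = int(n * share_large)  # assumed share of large-company participants
    picks = rng.sample(large, n_large) + rng.sample(small, n - n_large)
    return statistics.median(picks)

print("true market median:", round(true_median))
for n in (100, 1_000):
    print(
        f"n={n}:",
        "random sample", round(random_sample_median(n)),
        "| convenience sample", round(convenience_sample_median(n)),
    )
```

Under these assumptions the random-sample median moves closer to the market median as the sample grows, while the convenience-sample median stays well above it at both sizes. That is the sense in which a large sample size does not repair a biased selection process.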
Ability to use inferential statistics
Probability samples (including random samples) meet the assumptions needed for inferential statistics, allowing estimation of sampling error and confidence intervals and valid tests of hypotheses about the population. In other words, the base-pay median calculated from a random-sample survey is expected to be close to the base-pay median of the population, and the survey can state how close with a quantified margin of error.
With convenience sampling, inclusion probabilities are unknown, so you generally cannot make statistically defensible inferences or population-level estimates from the data. In other words, you cannot claim that the median base-pay rate from your survey is the same as, or even approximates, that of the population.
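As a rough sketch of what "estimating sampling error" can look like in practice, the function below computes a percentile-bootstrap confidence interval for a median. The bootstrap is one common technique chosen here for illustration, not a method named in the text above; the function name and the salary figures are hypothetical.

```python
import random
import statistics

def bootstrap_median_ci(sample, n_resamples=2_000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the median of `sample`."""
    rng = random.Random(seed)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = medians[int(tail * n_resamples)]
    upper = medians[int((1 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical base-pay figures, standing in for a random sample of 201 incumbents.
data_rng = random.Random(1)
salaries = [data_rng.gauss(90_000, 18_000) for _ in range(201)]

low, high = bootstrap_median_ci(salaries)
print("sample median:", round(statistics.median(salaries)))
print("95% interval:", round(low), "to", round(high))
```

The interval is only meaningful as a statement about the market when the salaries come from a probability sample. Run on a convenience sample, the same code still prints an interval, but that interval describes the participants who happened to respond, not the population.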
Replicability and methodological credibility
Random sampling based on a defined frame and a clear randomization procedure can be replicated by other researchers, supporting reliability and scientific credibility.
Convenience samples depend heavily on time, place, and who happened to be available, making results difficult to replicate and weakening perceived rigor.
Stronger basis for high‑stakes decisions
Because random samples reduce bias and support generalization, they are more appropriate when findings will inform major policy, product, or financial decisions. Pay decisions are exactly that kind of financial decision, which makes random sampling the right fit for salary surveys.
Convenience sampling is better suited to low‑stakes, exploratory, or pilot work where speed and cost matter more than precise estimates (e.g., early usability tests, hypothesis generation). The problem with using convenience sampling for exploratory or "pulse" surveys, however, lies in how the results are communicated: they are often presented as iron-clad fact rather than approximation. So, for example, the results of the Mercer March 2025 US Compensation Planning Survey (the Mercer QuickPulse US Compensation Planning Survey), which did not include a formal methodology section (e.g., sampling frame or data collection model), should be taken with a grain of salt.
How Does the Data from BACABA's Market Pricing Services Compare?
BACABA's data comes from the Economic Research Institute's (ERI's) database of compensation for over 18,000 job titles, 1,000 industries and 9,400 locations, which draws on the multiple sources identified below. ERI goes to great pains to overcome the problems that arise from convenience sampling, and presents its full methodology as follows:
"In general, most individual surveys report participants, but do not tie specific data to those participants. All compensation research firms, including ERI, wish to safeguard the privacy of individual survey participants. In general, ERI does not confirm whether a specific employer's data is included in any particular Assessor Series application analysis, that is, unless the employer has publicly released this information. Participation may have been via ERI's patented on-line survey, ERI Salary Surveys, ERI field job analyses, ERI's eDOT Skills Project, Occupational Assessor's cybernetic selected characteristics of occupations contribution to the latter, digitized reading of IRS public documents, US SEC proxies, 10-Ks, and 8-Ks, manual digitization of public UK/Euro countries' companies' annual reports, Canadian SEDAR data (under license), and/or other data licensed for use in the Assessor Series from organizations such as Statistics Canada, national statistics offices of other countries, and others. All of these sources comply with US DOJ/FTC Antitrust Safety Zone Statements by meeting the following conditions: 1) provider participation in surveys is managed by a third-party; 2) the information provided by survey participants is data more than three months old; and 3) there are at least five providers reporting data upon which each disseminated statistic is based, no individual provider's data represents more than 25% on a weighted basis of that statistic, and any information disseminated is sufficiently aggregated such that it would not allow recipients to identify the prices charged or compensation paid by any particular provider (unless part of the public record).
ERI also provides total population statistics that will help subscribers to evaluate whether an adequate population of incumbent employees within the area for which employers are competing for talent has been surveyed. In this regard, ERI is peerless. ERI’s combination of multiple survey data means that they are analyzing the largest populations possible, in most cases much more than 30% of the employers in a given area. There are currently over 46 million US and Canadian employees included in ERI's Salary Assessor database. Since they have analyzed so many sources in order to report updated consensus results, they expect their pay data to be more representative of market norms than any one specific published survey, particularly if it relies on a smaller sample (e.g., SEC proxies alone) or is out of date. According to the statistical laws of large numbers, Central Tendency and Bernoulli’s Law, Assessors that aggregate multiple overlapping sources covering virtually entire populations will be more accurate in normative terms than any one survey of a more limited sample."
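The numeric conditions in the safety-zone language quoted above (data more than three months old, at least five providers, no provider above 25% on a weighted basis) can be expressed as a simple check. This is a hypothetical sketch, not ERI's implementation: the function name, the use of incumbent counts as weights, and the 90-day stand-in for "three months" are all assumptions. It does not cover the third-party management or aggregation conditions, which are not reducible to a threshold.

```python
from datetime import date

def meets_safety_zone_thresholds(
    incumbents_by_provider,
    data_as_of,
    today,
    min_providers=5,
    max_share=0.25,
    min_age_days=90,  # assumed stand-in for "more than three months old"
):
    """Check the numeric safety-zone thresholds for one published statistic.

    `incumbents_by_provider` maps a provider name to its weight
    (assumed here to be the number of incumbents it reported).
    """
    weights = list(incumbents_by_provider.values())
    total = sum(weights)
    if total <= 0:
        return False
    enough_providers = len(weights) >= min_providers
    no_dominant_provider = all(w / total <= max_share for w in weights)
    old_enough = (today - data_as_of).days > min_age_days
    return enough_providers and no_dominant_provider and old_enough

# Hypothetical inputs for a single job's median statistic.
balanced = {"A": 20, "B": 20, "C": 20, "D": 20, "E": 20}
dominated = {"A": 60, "B": 10, "C": 10, "D": 10, "E": 10}

print(meets_safety_zone_thresholds(balanced, date(2025, 1, 1), date(2025, 6, 1)))
print(meets_safety_zone_thresholds(dominated, date(2025, 1, 1), date(2025, 6, 1)))
```

Note that these thresholds address confidentiality and antitrust concerns. Passing them says nothing about whether the underlying sample is representative of the market.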