Identifying the generating distribution of business and economics data: an empirical method
Journal of the Academy of Business and Economics, March, 2004 by Laurence R. Takeuchi, Joseph Richards
ABSTRACT
A general framework for empirically identifying familial membership within the Pearson system of distributions is proposed. Specifically, the research addresses the problem of constructing a point estimate and a non-parametric confidence interval estimate of familial membership. Most identification procedures, e.g., the Chi-Square and Empirical Distribution Function goodness-of-fit tests, require the data to support or negate membership in a specific hypothesized family. The presented approach instead uses bootstrap re-sampling techniques to help identify the likely generating family of the data.
1. INTRODUCTION
The study of systems of probability distributions first appeared in the literature in the late 1800s, when Karl Pearson (1895) began to question the basic assumption of normal theory. His empirical studies revealed non-normal characteristics to be inherent features of many populations. Initially, many of Pearson's contemporaries doubted the need for curves (systems of distributions) other than the normal density. However, by the turn of the century, theoreticians and empiricists had accepted the possibility of non-normality and had begun to explore various typologies for alternative distributions. Of these typologies, the Pearson (1902a; 1902b) system of probability distributions became the most cited.
The Pearson system, as well as complementary systems of distributions such as Johnson's $S_U$ and $S_B$ system (Johnson 1949), is particularly relevant to research in economic and business disciplines when the phenomenon under inquiry is assumed to be governed by a probability law. Often, the specific law in question is unknown, but its realizations (data) are available. In such instances these systems provide a general familial space of distributions in which one member, i.e., a point in the space, can be identified as the "most likely" candidate generating the observed phenomenon. Thus, knowledge of general systems of distributions (families) can aid the researcher in addressing the basic problem: Given a sample of independent, identically distributed (iid) observations, how does one empirically determine the implied generating family?
In much economics and business research, however, the problem of familial identification is circumvented by imposing an explicit a priori functional form for the generating family, e.g., X is normally distributed. Mathematical tractability, e.g., closure of normally distributed variables under addition, often determines the choice of the assumed family, rather than empirical or theoretical arguments. Simplifying the problem in this manner reduces the general research issue of familial identification to parameter estimation within a given family; in the case of the normal density family, the estimation of $\mu$ and $\sigma$. Imposing an a priori distribution can be costly in terms of the congruity of the research model with the phenomena being studied, and hence the usefulness of the research. While there are many procedures to reject a specific distribution, such as normality, a more useful approach would be to determine the family or families of distributions that are consistent with the data.
We can formally state the simplified problem as follows: Let the random component of the phenomenon of interest be represented by a random variable X whose observations are iid. Further, let the distribution of X belong to a family of distributions $\{f(x;\omega) : \omega \in \Omega\}$. Each member $f(x;\omega)$ of $\{f(x;\omega) : \omega \in \Omega\}$ is uniquely defined (identified) by the values of the parameter set $\omega \in \Omega$. For example, assume that the distribution of a random variable X follows the normal probability law $\psi(x;\mu,\sigma)$. The law $\psi(x;\mu,\sigma)$, however, can be viewed as a general family of distributions $\{\psi(x;\omega) : \omega \in \Omega\}$, where $\omega = (\mu,\sigma)$ and $\Omega = \{(\mu,\sigma) \mid -\infty < \mu < \infty,\ 0 < \sigma < \infty\}$. Since a unique member of this family exists for each pair $(\mu,\sigma) \in \Omega$, the simplified research problem is to decide on the basis of data which member, or members, of the assumed family "best" represents the distribution of X. Thus, the problem of within-family identification is one of statistical estimation: estimation of the underlying parameter(s) that uniquely identify the member of the assumed family implied by the data. It leaves unanswered, however, the question of the empirical or conceptual validity of the a priori familial specification. This question is important when parameter estimation, and hence membership inference, depends on the general family in question.
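To make within-family identification concrete, the following is a minimal sketch that assumes the normal family and picks out one member by maximum-likelihood estimation of the mean and standard deviation. The function name `fit_normal` and the simulated sample are illustrative assumptions, not part of the paper.

```python
import math
import random


def fit_normal(xs):
    """Maximum-likelihood estimates (mu_hat, sigma_hat) under an assumed normal family."""
    n = len(xs)
    mu_hat = sum(xs) / n
    # The ML estimator of sigma divides by n (not n - 1).
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat


# Hypothetical data: 1,000 simulated draws; the seed is fixed for repeatability.
rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(1000)]

mu_hat, sigma_hat = fit_normal(sample)
print(f"identified member: normal with mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

The estimate selects a single point in the parameter space, but it says nothing about whether the normal family was an appropriate choice in the first place.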
This paper proposes a statistical procedure to identify the likely candidate(s) of probability distribution(s) that generated a set of observed data. The procedure is useful when data realizations are assumed to be governed by a single, unknown, continuous distribution having membership in the Pearson system of probability densities. Using this procedure, a researcher can construct a point estimate that identifies a single Pearson class and, more informatively, a joint confidence interval estimate that identifies the class or classes (families) of distributions that could have generated the observed data. The procedure employs a computationally intensive technique referred to as the "bootstrap" (Efron, 1985; Efron, 1982; Efron & Tibshirani, 1993; Efron & Tibshirani, 1986).
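The general idea can be sketched as follows. This section does not spell out the authors' statistic or interval construction, so the sketch swaps in the classical Pearson criterion $\kappa = \beta_1(\beta_2+3)^2 / [4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)]$, where $\beta_1$ is squared skewness and $\beta_2$ is kurtosis, and simply tabulates how often each Pearson type appears across bootstrap resamples. The function names, the simulated sample, and the number of resamples are all assumptions made for illustration.

```python
import random
from collections import Counter


def beta_stats(xs):
    """Sample moment ratios: beta1 (squared skewness) and beta2 (kurtosis)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


def pearson_type(beta1, beta2):
    """Classify (beta1, beta2) with the classical kappa criterion (assumed here)."""
    if beta1 == 0:
        return "symmetric (normal, II or VII)"
    denom = 4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)
    if denom == 0:
        return "III"  # kappa is infinite on the Type III (gamma) line
    kappa = beta1 * (beta2 + 3) ** 2 / denom
    if kappa < 0:
        return "I"
    if kappa < 1:
        return "IV"
    if kappa == 1:
        return "V"
    return "VI"


def bootstrap_types(xs, n_boot=500, seed=0):
    """Share of bootstrap resamples (drawn with replacement) falling in each type."""
    rng = random.Random(seed)
    n = len(xs)
    counts = Counter(
        pearson_type(*beta_stats(rng.choices(xs, k=n))) for _ in range(n_boot)
    )
    return {t: c / n_boot for t, c in counts.items()}


# Hypothetical data: 200 simulated draws from a beta distribution.
data_rng = random.Random(1)
sample = [data_rng.betavariate(2, 5) for _ in range(200)]

point_estimate = pearson_type(*beta_stats(sample))
shares = bootstrap_types(sample)
print("point estimate:", point_estimate)
for t, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"  {t}: {share:.1%} of bootstrap replicates")
```

With continuous data the boundary cases (symmetric, III, V) have essentially zero probability of occurring in a resample, so the tabulated shares fall almost entirely on Types I, IV and VI. The shares give a rough picture of which families are consistent with the data; they are not a substitute for a formally constructed confidence region.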


