Statistics in our life


Science of Observation (also called (Data) Analytics or Data Science)

Statistics as Science of Observations

  "Statistics" is also used for Science of Observations, whereby a student of this discipline learns Data management, Probability of Events, Estimations, Testing of Hypothesis, Sampling Techniques, Quality Control, Operations Research, Design of Experiments, Factor Analysis, Correlations, Regressions... etc...  There is a lot of material available on net and in different sciences of Data like Data Analytics, Data Science for this part of Statistics. So its not required to rewrite the same thing again over here as there are many good authors who had written more beautifully and in more easy and comprehend able form. Instead I will go by solving problems and speaking the theory concerning them and quoting available references for them. This is the only fastest way to comprehend the subject from practical view point also.       

There are sampling distributions, often used in Statistics for various reasons, whose tables are provided or referred to. One should know how these tables are read and used. Let us discuss some of the most important tables used in Statistics, especially in testing of hypotheses.

Z = Standard Normal Distribution (Reading the Z-table), mean = 0, variance = 1

Often the user needs to read statistical tables, generally to test a hypothesis or to solve some other case. Many references are available on the net; we can refer to one such here. The commonly known Z-score statistic follows the Standard Normal Distribution N(0,1) under some assumptions ....Click here

A video on Z-score significance can also be viewed on YouTube (click here), or ....Click here to see a video by "statisticsfun".

The Z-score for a sample mean is defined by z = (x̄ – μ) / (σ / √n) ~ N(0,1), where x̄ is the mean of a sample of size n. To read the table, generally two types of table are available.

We often need to memorize that

the Z critical value for a one-sided (right-tail) test at the 5% level of significance = 1.645,

and the Z critical value for a two-sided test at the 5% level of significance = ±1.96.

There are other types of Z-tables as well, giving the area from minus infinity up to an ordinate point: for a given area a, the table gives the ordinate Za such that a = P(Z ≤ Za), the integral of the N(0,1) density from −∞ to Za. Such tables are called cumulative probability distribution tables, in this case cumulative standard normal probability distribution tables. In the body of the table the cumulative probability is given, whereas the ordinates are given (or can be approximated) on the top and left stubs.
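These table readings can also be reproduced with a short computation. A minimal sketch in Python (assuming the SciPy library is available):

    # Reading the Z-table programmatically (sketch, assumes scipy is installed)
    from scipy.stats import norm

    # Critical values: the inverse of the cumulative distribution function
    z_one_sided = norm.ppf(0.95)    # one-sided, 5% level -> 1.645
    z_two_sided = norm.ppf(0.975)   # two-sided, 5% level -> 1.96

    # Cumulative table lookup: a = P(Z <= Za)
    a = norm.cdf(1.96)              # -> 0.975

    print(z_one_sided, z_two_sided, a)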


t-distribution with n degrees of freedom

Student's t-test statistic tables are generally given with the degrees of freedom (df) on the left stub and three header lines at the top: the first line gives ta, the second the values of α for a one-tailed test, and the third the equivalent values of α for a two-tailed test, where α is the level of significance. To see one such table .....Click here

In the body of the table, the corresponding ordinates, or critical values, are given.

For understanding more on the t-statistic, there is enough material on the net. To read one such reference, (i) Click here......   or (ii) Click here........
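As with the Z-table, these lookups can be reproduced computationally. A minimal sketch, again assuming SciPy:

    # t critical values for a given df and significance level (sketch)
    from scipy.stats import t

    df = 10
    alpha = 0.05
    t_one_tail = t.ppf(1 - alpha, df)       # one-tailed critical value
    t_two_tail = t.ppf(1 - alpha / 2, df)   # two-tailed critical value
    print(t_one_tail, t_two_tail)           # ~1.812 and ~2.228 for df = 10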

Chi-Square Statistic (χ2) Table

The Chi-Square table is generally given with α (the level of significance, defining the critical region) at the top and ν (the degrees of freedom, df) on the left stub. In the body of the table, the ordinates (critical values) are given. So, to get any critical value, one goes to the table value at the intersection of the particular row and column; this value is the χ2 for that ν and that α, which we had taken as row and column. To view, please refer .....Click here....
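The same row-and-column lookup, done as a computation (a sketch assuming SciPy):

    # Chi-square critical value at the intersection of df (row) and alpha (column)
    from scipy.stats import chi2

    nu = 5
    alpha = 0.05
    chi2_crit = chi2.ppf(1 - alpha, nu)  # upper-tail critical value
    print(chi2_crit)                     # ~11.07 for nu = 5, alpha = 0.05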

F (ν1, ν2) Statistic Table


Tabulated are critical values for the F-distribution. The column headings give the numerator degrees of freedom ν1 and the row headings the denominator degrees of freedom ν2. Lower one-sided critical values may be found from these tables by reversing the degrees of freedom and taking the reciprocal of the tabled value at the same significance level: the lower α critical value of F(ν1, ν2) equals 1 divided by the upper α critical value of F(ν2, ν1). To view the F-table, we have a lot of material on the net. For ready reference, please visit one such site for a pdf..... Click here, or another one here. Click here
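The reversing-and-reciprocal rule can be checked directly. A minimal sketch, assuming SciPy:

    # F critical values and the lower-tail reciprocal identity (sketch)
    from scipy.stats import f

    v1, v2 = 4, 12          # numerator and denominator degrees of freedom
    alpha = 0.05
    f_upper = f.ppf(1 - alpha, v1, v2)            # upper critical value from the table
    f_lower = f.ppf(alpha, v1, v2)                # lower one-sided critical value
    # Identity: lower value = 1 / upper value with the degrees of freedom reversed
    print(f_lower, 1 / f.ppf(1 - alpha, v2, v1))  # the two numbers agree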

Random Numbers and Random Number Table

  Random numbers are important and are used extensively in sampling design to ensure objectivity.

Random number digits have equal probability of occurrence in a list of numbers composed of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9, in such a way that each digit in the list has no predictable relationship to the digits that precede or follow it. Invariably, all sampling plans in statistics are based on the assumption that some or all sampling units constituting the sample are chosen at random, to bring objectivity into the sampling plan. This ensures that, at the time of selection, every sampling unit of the lot (target population) has an equal chance of being selected as part of the sample. The technique can also be modified so that a sampling unit is selected with probability proportional to some other specific variable, such as size. For achieving these goals, the statistician requires random number tables.

How to use a Random Number table in a typical case of Simple Random Sampling: Click here to view AP Statistics: How to Sample with a Random Number Table

Note: Economy vs Ease in sampling:
  The randomization process through the "remainder method" at the second stage unit (SSU) level is a cumbersome process for the enumerators. To explain it, let us take the following example. Example: Draw a sample, SRSWOR, of 4 households (HHs) from a list of 40 HHs. Let the 2-digit random numbers be 32, 78, 68, 91, 53, 01, 23, 03.
Method-1: The sample will be { (32), (78)X, (68)X, (91)X, (53)X, (01), (23), (03) }, rejecting a number if it is > 40 (X means not acceptable). The sample is the 32nd, 1st, 23rd and 3rd HHs. Eight random numbers are used here.
Method-2: The sample will be { (32), (38) [arrived at as the remainder when 78 is divided by 40], (28), (91)X (as > 80), (13) }, rejecting a number only if it exceeds the highest multiple of 40 below 99 (which is 80 here; 99 is chosen because it is the highest 2-digit number, as we are sampling with 2-digit random numbers; this rejection is the tail-end correction). The sample by this method is the 32nd, 38th, 28th and 13th HHs. Five random numbers are used here.
Method-2 (the remainder method) is more economical than Method-1, whereas Method-1 is easier to operate. In some sampling designs, Method-2 is employed at both the First Stage Unit (FSU) and SSU levels. It may be assessed that Method-2 is useful for FSUs (say, for selection of villages), because economy really matters there; but for SSUs (say, for selection of households within villages), Method-1 might be the better option, as it is easier for the data collectors to operate in the field. Generally FSU selections are done at the headquarters level and by computer, where even a difficult procedure will not be prone to errors; at the field level, however, where enumerators work manually, they may be allowed the relatively easy option.
Nowadays various data collection packages are available, and this automation can be included in the package itself; sometimes it is so integrated, as in the National Family Health Survey (NFHS), that SSUs are also selected at headquarters by more computationally skilled people or by computer. There, the distinction between selection by Method-1 and Method-2 is immaterial.
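Both procedures are easy to express in code. A minimal sketch of the two methods under the assumptions of the example above (2-digit random numbers, N = 40, duplicates rejected as required by SRSWOR):

    # Method-1 (rejection) vs Method-2 (remainder with tail-end correction) - sketch
    def method_1(random_numbers, N, n):
        """Accept a number only if it lies between 1 and N (and is not a repeat)."""
        sample, used = [], 0
        for r in random_numbers:
            used += 1
            if 1 <= r <= N and r not in sample:
                sample.append(r)
            if len(sample) == n:
                break
        return sample, used

    def method_2(random_numbers, N, n, digits=2):
        """Map r to its remainder mod N; reject only beyond the highest multiple of N."""
        limit = (10 ** digits - 1) // N * N   # highest multiple of 40 below 99 -> 80
        sample, used = [], 0
        for r in random_numbers:
            used += 1
            if 1 <= r <= limit:
                unit = r % N or N             # remainder 0 maps to the Nth unit
                if unit not in sample:
                    sample.append(unit)
            if len(sample) == n:
                break
        return sample, used

    rns = [32, 78, 68, 91, 53, 1, 23, 3]
    print(method_1(rns, 40, 4))  # ([32, 1, 23, 3], 8)  - eight numbers used
    print(method_2(rns, 40, 4))  # ([32, 38, 28, 13], 5) - five numbers used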

Glimpse of Statistics as Science of Observations  
  Watch "Statistics made easy ! ! ! Learn about the t-test, the chi square test, the p value and more" on YouTube by Greg Martin. 


Statistics as Science of Observations in various disciplines of study

One dimension of studying Statistics as the Science of Observations (SOO) is to study it abstractly, as a pure subject, through chapters like Probability of Events, Estimation, Testing of Hypotheses, Sampling Techniques, Quality Control, Operations Research, Design of Experiments, Factor Analysis, Correlation, Regression, etc., and then apply them in the applied world. Students graduating in Statistics usually learn this way. Mathematical proofs take precedence here, and applied matters remain at lower preference. Such graduates can evolve theories, but they are often unable to translate them into the applied world, as they give less time to the application side. This branch of the science is also called Mathematical Statistics, and such statisticians are abstract statisticians.

The second dimension is to study Statistics within various disciplines, like Agricultural Statistics, Medical Statistics, Psychometric Analysis, Bio-Statistics, etc. Here the applied version of the science is studied; a student graduating this way knows the practicality of different statistical formulas and protocols and can interpret the data of their field. He, however, cannot mathematically evolve the theory. The applied statistician is like the driver of the car that is the science of Statistics, and the abstract statistician is like the engineer of this car!

Some other ways Statistics Science can be comprehended 

 Statistics science, besides abstract and applied, has some more dimensions through which it can be comprehended. One way is to go by lessons or topics, and another is to go by examples, i.e., by answering or solving problems. Learning by lessons or topics is slightly different from learning by chapters ("topics" is used for lessons as well as chapters). In learning by chapters, the science is learnt by themes; in learning by lessons, it is learnt at a more granular level, perhaps at the level of algorithms. So learning Probability is learning by chapter, while learning conditional probability is learning by lesson. However, for those topics which are less known, because they are not studied in detail in the usual graduate classes, learning them will be learning by topics; an example is learning "National Accounts Statistics". Such topics are very important for carrying out the day-to-day work of any government or organisation, and their study material remains available within the ambit of the concerned organisation or government. They are generally not taught in colleges, and if taught, then not in such detail. This is a gap in our education system! Whatever, generally, is taught is not used, and whatever is used is not taught fully. Some such examples of topics or lessons from the statistics science viewpoint are:

  • National Accounts Statistics,
  • Poverty Evaluation,
  • Gender Statistics   etc.


Statistics Science (learning by problem solving) [Statistics Practitioner]

  The fourth dimension of learning Statistics science is by the problem-solving method. It gives an immediate solution to the problem at hand, but keeps a large part of statistics science opaque to the practitioner. Here, one has to do a lot of researched reading to tackle the problem. Operations Research itself was developed this way, during war-time, to solve the problems that arose. A person who learns this way may better be called a practitioner, because it develops the skill to solve the current situation. They document these best practices and practice accordingly. Some such problems often needed by the Statistics practitioner are listed below (a small worked sketch for the first of them follows the list):

  • What should be sample size ?
  • How to design Schedule of Questionnaire for this survey?
  • How to minimize transportation cost?
  • Which sampling design will be better?
  • How to stratify the sampling to get most out of this sampling design ?  
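
For the first question above, a minimal sketch of the standard sample-size calculation for estimating a proportion. The formula n = z^2 * p * (1 - p) / e^2 is the textbook one; the default values used are illustrative assumptions:

    # Sample size for estimating a proportion with a given margin of error (sketch)
    import math

    def sample_size(p=0.5, e=0.05, z=1.96):
        """p: anticipated proportion, e: margin of error, z: two-sided critical value."""
        return math.ceil(z * z * p * (1 - p) / (e * e))

    print(sample_size())  # 385 for p = 0.5, e = 5%, 95% confidence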


 Statistical Packages: Indispensable tools for the Statistics Practitioner

   While practicing Statistics science, it is much preferred that the practitioner should have good knowledge of statistical packages, which are nowadays available in numbers and are required by them. Packages help in speedy calculations without requiring the inside nitty-gritty of the science behind them. Some packages worth mentioning are as follows:

  • SAS
  • SPSS
  • Microsoft Excel 
  • Stata 

  Yet another set of packages also exists, which gives specific capabilities to the practitioner for specific jobs, like

  • CS-Pro: for data collection, questionnaire or schedule designing, validation and verification, and ultimately tabulation. Used extensively for censuses and also for sample surveys.
  • Survey-CTO: for data collection, questionnaire designing, validation and verification, and ultimately tabulation. Used extensively for questionnaire-based surveys.
  • Google Forms: for collecting and organizing information. Many small feedback surveys are organised with this Google application. It is integrated with many Google products and easily available.

  Yet again, another set of packages exists in different fields of science, like monsoon studies, measure evaluations, cause-of-death studies, etc., which use Statistics within their theoretical models.

Databases 

   There are a number of database packages, which store data, validate and verify them, and help in building reports and tables. Since databases have other uses besides statistical analysis, like serving a Management Information System (MIS), automating a process, or providing intelligence to a system, the subject has developed as a separate topic in Computer Science, evolving concepts like the structured database (DB), relational DB, NoSQL DB, object-oriented databases and distributed storage and processing frameworks (e.g., Hadoop). Some DB packages are:

  • Microsoft Access
  • Microsoft SQL server
  • MySQL
  • MongoDB
  • Redis
  • PostgreSQL 


Topics in Mathematical Statistics / Applied Statistics

The following topics are studied in Mathematical Statistics / Applied Statistics:  ( A site worth visiting is statisticshowto.com)

 (i) Probability & Distribution: Classical and axiomatic definitions of Probability and consequences. Law of total probability, Conditional probability, Bayes' theorem and applications. Discrete and continuous random variables. Distribution functions and their properties. Standard discrete and continuous probability distributions - Bernoulli, Uniform, Binomial, Poisson, Geometric, Rectangular, Exponential, Normal, Cauchy, Hyper geometric, Multinomial,  Laplace, Negative binomial, Beta, Gamma, Lognormal. Random vectors, Joint and marginal distributions, conditional distributions, Distributions of functions of random variables. Modes of convergences of sequences of random variables - in distribution, in probability, with probability one and in mean square. Mathematical expectation and conditional expectation. Characteristic function, moment and probability generating functions, Inversion, uniqueness and continuity theorems. Borel 0-1 law, Kolmogorov's 0-1 law. Tchebycheff's and Kolmogorov's inequalities. Laws of large numbers and central limit theorems for independent variables.  

Learn from Probability, Set Theory Symbols, Basic Probability & Statistics
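As one small worked illustration from this topic, Bayes' theorem applied to a diagnostic-test setting (the numbers below are illustrative assumptions):

    # Bayes' theorem: P(D|+) = P(+|D)P(D) / [P(+|D)P(D) + P(+|not D)P(not D)]
    p_d = 0.01          # prevalence of condition D (assumed)
    p_pos_d = 0.95      # sensitivity: P(test positive | D) (assumed)
    p_pos_not_d = 0.05  # false positive rate: P(test positive | not D) (assumed)

    p_pos = p_pos_d * p_d + p_pos_not_d * (1 - p_d)  # law of total probability
    p_d_pos = p_pos_d * p_d / p_pos                  # posterior probability
    print(round(p_d_pos, 3))  # ~0.161: even after a positive test, D remains unlikely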

 (ii) Statistical Methods: Collection, compilation and presentation of data, charts, diagrams and histograms. Frequency distributions. Measures of location, dispersion, skewness and kurtosis. Bivariate and multivariate data. Association and contingency. Curve fitting and orthogonal polynomials. Bivariate normal distribution. Regression - linear, polynomial. Distribution of the correlation coefficient, Partial and multiple correlation, Intraclass correlation, Correlation ratio. Standard errors and large sample tests. Sampling distributions of sample mean, sample variance, t, chi-square and F; tests of significance based on them, Small sample tests. Non-parametric tests - Goodness of fit, sign, median, run, Wilcoxon, Mann-Whitney, Wald-Wolfowitz and Kolmogorov-Smirnov. Order statistics - minimum, maximum, range and median. Concept of Asymptotic Relative Efficiency.

Learn from : Statistical Methods (more abstract type) 

Model Oriented Estimations: Linear Least Squares Regression, Nonlinear Least Squares Regression, Weighted Least Squares Regression, LOESS (aka LOWESS)
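A minimal sketch of ordinary and weighted linear least squares, assuming NumPy (the data values are illustrative):

    # Ordinary vs weighted linear least squares (sketch, assumes numpy)
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])
    w = np.array([1.0, 1.0, 1.0, 0.5, 0.5])   # lower weight on the last points

    X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Weighted LS: scale each row of X and y by the square root of its weight
    sw = np.sqrt(w)
    beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

    print(beta_ols, beta_wls)  # [intercept, slope] for each fit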

  (iii) Numerical Analysis: Finite differences of different orders: Δ, E and D operators, factorial representation of a polynomial, separation of symbols, sub-division of intervals, differences of zero. Concept of interpolation and extrapolation: Newton-Gregory's forward and backward interpolation formulae for equal intervals, divided differences and their properties, Newton's formula for divided differences, Lagrange's formula for unequal intervals, central difference formulae due to Gauss, Stirling and Bessel, concept of error terms in interpolation formulae. Inverse interpolation: different methods of inverse interpolation. Numerical integration: Trapezoidal rule, Simpson's one-third and three-eighths rules and Weddle's rule. Summation of series: whose general term (i) is the first difference of a function, (ii) is in geometric progression. Numerical solutions of differential equations: Euler's Method, Milne's Method, Picard's Method and Runge-Kutta Method.
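As a small illustration of the interpolation part, a sketch of the Newton-Gregory forward interpolation formula for equally spaced points (the data are illustrative):

    # Newton-Gregory forward interpolation for equally spaced x (sketch)
    def newton_forward(xs, ys, x):
        h = xs[1] - xs[0]                 # equal spacing assumed
        n = len(ys)
        # Build the forward-difference table, one column per order
        diffs = [list(ys)]
        for _ in range(1, n):
            prev = diffs[-1]
            diffs.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
        u = (x - xs[0]) / h
        term, result = 1.0, diffs[0][0]
        for k in range(1, n):
            term *= (u - (k - 1)) / k     # u(u-1)...(u-k+1)/k!
            result += term * diffs[k][0]
        return result

    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [1.0, 2.0, 5.0, 10.0]            # values of f(x) = x^2 + 1
    print(newton_forward(xs, ys, 1.5))    # 3.25 = 1.5^2 + 1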

(iv) Linear Models: Theory of linear estimation, Gauss-Markov linear models, estimable functions, error and estimation space, normal equations and least square estimators, estimation of error variance, estimation with correlated observations, properties of least square estimators, generalized inverse of a matrix and solution of normal equations, variances and covariances of least square estimators. One-way and two-way classifications, fixed, random and mixed effects models. Analysis of variance (two-way classification only), multiple comparison tests due to Tukey, Scheffe, Student-Newman-Keuls and Duncan.

(v) Statistical Inference and Hypothesis Testing: Characteristics of a good estimator. Estimation methods of maximum likelihood, minimum chi-square, moments and least squares. Optimal properties of maximum likelihood estimators. Minimum variance unbiased estimators. Minimum variance bound estimators. Cramer-Rao inequality. Bhattacharya bounds. Sufficient estimators, factorization theorem. Complete statistics. Rao-Blackwell theorem. Confidence interval estimation. Optimum confidence bounds. Resampling, Bootstrap and Jackknife. Hypothesis testing: Simple and composite hypotheses. Two kinds of error. Critical region. Different types of critical regions and similar regions. Power function. Most powerful and uniformly most powerful tests. Neyman-Pearson fundamental lemma. Unbiased tests. Randomized tests. Likelihood ratio test. Wald's SPRT, OC and ASN functions. Elements of decision theory.

(vi) Sampling Techniques: Concept of population and sample, need for sampling, complete enumeration versus sampling, basic concepts in sampling, sampling and non-sampling error, methodologies in sample surveys (questionnaires, sampling design and methods followed in field investigation) by NSSO. Subjective or purposive sampling, probability sampling or random sampling, simple random sampling with and without replacement, estimation of population mean, population proportions and their standard errors. Stratified random sampling, proportional and optimum allocation, comparison with simple random sampling for fixed sample size. Covariance and variance functions. Ratio, product and regression methods of estimation, estimation of population mean, evaluation of bias and variance to the first order of approximation, comparison with simple random sampling. Systematic sampling (when population size N is an integer multiple of sample size n), estimation of population mean and standard error of this estimate, comparison with simple random sampling. Sampling with probability proportional to size (with and without replacement), Des Raj and Das estimators for n=2, Horvitz-Thompson estimator. Equal size cluster sampling: estimators of population mean and total and their standard errors, comparison of cluster sampling with SRS in terms of the intra-class correlation coefficient. Concept of multistage sampling and its application, two-stage sampling with equal number of second stage units, estimation of population mean and total. Double sampling in ratio and regression methods of estimation. Concept of interpenetrating sub-sampling.

  (vii) Econometrics: Nature of econometrics, the general linear model (GLM) and its extensions, ordinary least squares (OLS) estimation and prediction, generalized least squares (GLS) estimation and prediction, heteroscedastic disturbances, pure and mixed estimation. Autocorrelation, its consequences and tests. Theil's BLUS procedure, estimation and prediction, the multicollinearity problem, its implications and tools for handling it, ridge regression. Linear regression and stochastic regression, instrumental variable estimation, errors in variables, autoregressive linear regression, lagged variables, distributed lag models, estimation of lags by the OLS method, Koyck's geometric lag model. Simultaneous linear equations model and its generalization, identification problem, restrictions on structural parameters, rank and order conditions. Estimation in simultaneous equations models, recursive systems, 2SLS estimators, limited information estimators, k-class estimators, 3SLS estimator, full information maximum likelihood method, prediction and simultaneous confidence intervals.

  (viii) Applied Statistics: Index Numbers: Price relatives and quantity or volume relatives, link and chain relatives, composition of index numbers; Laspeyres', Paasche's, Marshall-Edgeworth and Fisher index numbers; chain base index numbers, tests for index numbers, construction of index numbers of wholesale and consumer prices. Income distribution - Pareto and Engel curves, concentration curve. Methods of estimating national income, inter-sectoral flows, inter-industry table, role of CSO. Demand Analysis. Time Series Analysis: Economic time series, different components, illustration, additive and multiplicative models, determination of trend, seasonal and cyclical fluctuations. Time series as a discrete parameter stochastic process, autocovariance and autocorrelation functions and their properties. Exploratory time series analysis, tests for trend and seasonality, exponential and moving average smoothing. Holt and Winters smoothing, forecasting based on smoothing. Detailed study of the stationary processes: (1) moving average (MA), (2) autoregressive (AR), (3) ARMA and (4) AR integrated MA (ARIMA) models. Box-Jenkins models, choice of AR and MA periods. Discussion (without proof) of estimation of mean, autocovariance and autocorrelation functions under large sample theory, estimation of ARIMA model parameters. Spectral analysis of weakly stationary processes, periodogram and correlogram analyses, computations based on the Fourier transform.

 (ix) Demography and Vital Statistics: Sources of demographic data: census, registration, ad-hoc surveys, hospital records; demographic profiles of the Indian Census. Complete life table and its main features, uses of life tables. Makeham's and Gompertz curves. National life tables. UN model life tables. Abridged life tables. Stable and stationary populations. Measurement of Fertility: Crude birth rate, general fertility rate, age specific birth rate, total fertility rate, gross reproduction rate, net reproduction rate. Measurement of Mortality: Crude death rate, standardized death rates, age-specific death rates, infant mortality rate, death rate by cause. Internal migration and its measurement, migration models, concept of international migration. Net migration. Intercensal and postcensal estimates. Projection methods including logistic curve fitting. Decennial population census in India.

  (x) Statistical Quality Control: Statistical process and product control: Quality of a product, need for quality control, basic concepts of process control, process capability and product control, general theory of control charts, causes of variation in quality, control limits, sub-grouping, summary of out-of-control criteria; charts for attributes: p chart, np chart, c chart, u chart; charts for variables: R, (X̄, R) and (X̄, σ) charts. Basic concepts of process monitoring and control; process capability and process optimization. General theory and review of control charts for attribute and variable data; O.C. and A.R.L. of control charts; control by gauging; moving average and exponentially weighted moving average charts; CUSUM charts using V-masks and decision intervals; economic design of the X̄ chart. Acceptance sampling plans for attributes inspection; single and double sampling plans and their properties; plans for inspection by variables for one-sided and two-sided specifications.

(xi) Multivariate Analysis: Multivariate normal distribution and its properties. Random sampling from the multivariate normal distribution. Maximum likelihood estimators of parameters, distribution of the sample mean vector. Wishart matrix - its distribution and properties, distribution of the sample generalized variance, null and non-null distributions of the multiple correlation coefficient. Hotelling's T² and its sampling distribution, application in tests on the mean vector for one and more multivariate normal populations and also on equality of the components of a mean vector in a multivariate normal population. Classification problem: Standards of good classification, procedure of classification based on multivariate normal distributions. Principal components, dimension reduction, canonical variates and canonical correlation - definition, use, estimation and computation.

 (xii) Design and Analysis of Experiments: Analysis of variance for one-way and two-way classifications, need for design of experiments, basic principles of experimental design (randomization, replication and local control), complete analysis and layout of the completely randomized design, randomized block design and Latin square design, missing plot technique. Split plot design and strip plot design. Factorial experiments and confounding in 2^n and 3^n experiments. Analysis of covariance. Analysis of non-orthogonal data. Analysis of missing data.

 (xiii) Operations Research and Reliability: Definition and scope of Operations Research: phases in Operations Research, models and their solutions, decision-making under uncertainty and risk, use of different criteria, sensitivity analysis. Transportation and assignment problems. Bellman's principle of optimality, general formulation, computational methods and application of dynamic programming to LPP. Decision-making in the face of competition, two-person games, pure and mixed strategies, existence of solution and uniqueness of value in zero-sum games, finding solutions in 2x2, 2xm and mxn games. Analytical structure of inventory problems, EOQ formula of Harris, its sensitivity analysis and extensions allowing quantity discounts and shortages. Multi-item inventory subject to constraints. Models with random demand, the static risk model. P and Q systems with constant and random lead times. Queuing models - specification and effectiveness measures. Steady-state solutions of M/M/1 and M/M/c models with associated distributions of queue length and waiting time. M/G/1 queue and the Pollaczek-Khinchine result. Sequencing and scheduling problems: 2-machine n-job and 3-machine n-job problems with identical machine sequence for all jobs. Branch and Bound method for solving the travelling salesman problem. Replacement problems - block and age replacement policies. PERT and CPM - basic concepts. Probability of project completion. Reliability concepts and measures, components and systems, coherent systems, reliability of coherent systems. Life distributions, reliability function, hazard rate, common univariate life distributions - exponential, Weibull, gamma, etc. Bivariate exponential distributions. Estimation of parameters and tests in these models. Notions of aging - IFR, IFRA, NBU, DMRL and NBUE classes and their duals. Loss of memory property of the exponential distribution. Reliability estimation based on failure times in variously censored life-tests and in tests with replacement of failed items. Stress-strength reliability and its estimation.

 (xiv) Survival Analysis and Clinical Trials: Concepts of time, order and random censoring, likelihood in the distributions - exponential, gamma, Weibull, lognormal, Pareto, linear failure rate - and inference for these distributions. Life tables, failure rate, mean residual life, and their elementary classes and properties. Estimation of the survival function - actuarial estimator, Kaplan-Meier estimator, estimation under the assumption of IFR/DFR, tests of exponentiality against non-parametric classes, total time on test. Two sample problem - Gehan test, log rank test. Semi-parametric regression for failure rate - Cox's proportional hazards model with one and several covariates, rank test for the regression coefficient. Competing risk model, parametric and non-parametric inference for this model. Introduction to clinical trials: the need for and ethics of clinical trials, bias and random error in clinical studies, conduct of clinical trials, overview of Phase I-IV trials, multicenter trials. Data management: data definitions, case report forms, database design, data collection systems for good clinical practice. Design of clinical trials: parallel vs. cross-over designs, cross-sectional vs. longitudinal designs, review of factorial designs, objectives and endpoints of clinical trials, design of Phase I trials, design of single-stage and multi-stage Phase II trials, design and monitoring of Phase III trials with sequential stopping. Reporting and analysis: analysis of categorical outcomes from Phase I-III trials, analysis of survival data from clinical trials.

Or summarily:      

  • Random Variable and Probability
  • Descriptive Statistics
  • Discrete and continuous random variables
  • Distribution functions (pmf and pdf)
  • Distribution functions (Normal, Chi-Square, Student's t, etc.)
  • Central Limit Theorem
  • Testing of Hypothesis
  • Non-Parametric Analysis
  • Multivariate Analysis
  • Correlation
  • Regression Analysis
  • Factor Analysis
  • Analysis of Variance
  • Estimation Theory
  • Time Series Analysis
  • Index numbers
  • Sampling Theory
  • Statistical Quality Control
  • Demography & Vital Statistics
  • Operations Research

 

 The less-talked-about topics during graduation, but more useful in governance, are:

  • National Accounts: Definition, basic concepts, issues, strategy, collection of data and release.
  • Population Census: Need, data collected, periodicity, methods of data collection, dissemination, agencies involved.
  • Surveys/Indicators: Socio-Economic Surveys, Household Consumer Expenditure Survey, Disability Survey, National Family Health Survey, Tourism Survey, Global Youth Tobacco Survey, bio-behavioral surveys like HIV and AIDS surveys, Longitudinal Survey, Government Budgeting and Economic Survey, cost of production surveys, production estimation surveys of various crops, milk, egg, meat and wool, Labour Survey.
  • Gender Statistics, Poverty Measurement, Sustainable Development Goals
  • International Statistical Systems: Role of various agencies
  • Central Bank (like Reserve Bank of India) Statistics: CRR, SLR, exchange rates and baskets of currencies


Deductive and Inductive Reasoning

 In mathematics, we usually apply deductive rules to prove theorems. Deductive reasoning starts from a general truth (rule) and deduces a specific statement. Since general statements lead to specific statements, there is no chance of uncertainty in the status of the specific statement, because its status depends entirely on the general statements. Examples:
(1) An angle A is acute if 0° < A < 90°; B = 30°; then B is acute.
(2) If it rains, then there are clouds in the sky. If we observe that there are no clouds in the sky, we reason that there is no rain. This reasoning turns the first statement into its contrapositive: abstractly, if "A implies B" is true, then "complement of B implies complement of A" is also true.
(3) A implies B, and B implies C; then A implies C.
 All the above three are deductive reasoning. There is no concept of uncertainty in deductive reasoning because we are going from the general to the specific, also called the top-down approach.
 On the contrary, inductive reasoning means reasoning from specific observations to broader generalizations and theories. We call this a "bottom-up" approach (NOT "bottoms up"). Testing of hypotheses based on a sample drawn from a population (the target universe of the sampling process) is inductive statistics science. Similarly, estimating a parameter of a population from sample observations is also a science of inductive reasoning. We are trying to learn about an unknown universe after observing a part of it. Uncertainty plays a part here, and it is measured by the level of significance, the confidence level, the confidence interval, etc.
In inductive reasoning, if we decide something about the universe against an alternative decision, then there are chances of committing either of two errors. Rejecting the decision (the null hypothesis) when it is actually correct is the type one error; retaining it when the alternative is actually correct is the type 2 error. The probability of a type one error is called alpha, the level of significance.
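
A minimal simulation sketch of the type one error, assuming NumPy: under a true null hypothesis, a two-sided z-test at the 5% level should wrongly reject in about 5% of repeated samples.

    # Simulating the type one error rate of a two-sided z-test (sketch)
    import numpy as np

    rng = np.random.default_rng(0)
    mu0, sigma, n, trials = 0.0, 1.0, 25, 100_000
    rejections = 0
    for _ in range(trials):
        x = rng.normal(mu0, sigma, n)         # data generated under H0
        z = (x.mean() - mu0) / (sigma / np.sqrt(n))
        if abs(z) > 1.96:                     # two-sided 5% critical value
            rejections += 1
    print(rejections / trials)                # close to alpha = 0.05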