Hypothesis testing is an essential part of statistics for data analytics. It is a statistical method for making decisions about a population based on experimental (sample) data. In simple terms, a hypothesis is an educated guess, claim, or statement about a property of a population.

**The goal of Hypothesis Testing:**

To analyze a sample in an attempt to distinguish between population characteristics that are likely to occur and those that are unlikely to occur.

**Key terms and concepts**

**Null Hypothesis (H0):**

The null hypothesis is the status quo: a general statement or default position that there is no relationship between two measured phenomena, or no association among groups.

**Alternative Hypothesis (H1):**

The alternative hypothesis is the hypothesis contrary to the Null Hypothesis.

It is usually taken to be that the observations are the result of a real effect.

**Null and Alternative Hypotheses Examples:**

| Industry | Scenario | Null Hypothesis | Alternative Hypothesis |
| --- | --- | --- | --- |
| Process Industry | The shop floor manager of a dairy company feels that the milk packaging process unit for 1-litre packs is working fine and does not need any calibration (SD = 10 ml). | µ = 1 litre | µ ≠ 1 litre |
| Credit Risk | The credit team of a bank has been taking lending decisions based on an in-house developed credit scorecard. Their claim to fame in the organization is that their scorecard has helped reduce NPAs by at least 0.25%. | Scorecard has not helped in reducing the NPA level: π (scorecard NPA) – π (No scorecard NPA) = 0.25% | π (scorecard NPA) – π (No scorecard NPA) > 0.25% |
| Motor Industry | An electric car manufacturer claims that its newly launched e-car gives an average mileage of 125 MPGe (miles per gallon gasoline equivalent). | µ = 125 | µ < 125 |

**Type I and Type II Error**

| Decision | Null hypothesis is true | Null hypothesis is false |
| --- | --- | --- |
| Reject | Type I Error (α) | No error |
| Do not reject (accept) | No error | Type II Error (β) |

Rejecting the null hypothesis when it is true is a Type I error.

E.g., a manufacturer's quality control department rejects a lot even though it has met the market-acceptable quality level. This is the producer's risk.


Failing to reject (accepting) the null hypothesis when it is false is a Type II error.

E.g., a consumer accepts a lot when it is faulty. This is the consumer's risk.


**P value:**

In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was observed, assuming that the null hypothesis is correct.

**Significance level:**

The probability of rejecting the null hypothesis when it is true is called the significance level α.

**Note:** If the p-value is less than or equal to the significance level (α), the observed data are inconsistent with the assumption that the null hypothesis is correct, so the null hypothesis is rejected (this does not automatically mean that the alternative hypothesis can be accepted as true).

α is the probability of a Type I error and β is the probability of a Type II error.

The experimenters (you and I) have the freedom to set the α level for a particular hypothesis test.

That level is called the level of significance for the test. Changing α can (and often does) affect the result of the test, that is, whether you reject or fail to reject H0.
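To make the meaning of α concrete, here is a minimal simulation sketch. It repeatedly draws samples for which H0 is true and counts how often a test at α = 0.05 rejects anyway. The sample size, mean, standard deviation, and number of repetitions are assumptions chosen only for illustration; they do not come from the case studies in this article.

```r
# Minimal sketch: estimate the Type I error rate by simulation.
# Assumptions (hypothetical): normal data, n = 30, true mean = 144, sd = 20,
# and 5000 repetitions.
set.seed(42)
alpha  <- 0.05
n_sims <- 5000

# Each repetition draws a sample for which H0 (mu = 144) is TRUE
# and records the two-sided p-value of a one-sample t-test.
p_values <- replicate(n_sims, {
  x <- rnorm(30, mean = 144, sd = 20)
  t.test(x, mu = 144)$p.value
})

# Share of repetitions in which a true H0 was (wrongly) rejected.
type1_rate <- mean(p_values <= alpha)
print(type1_rate)
```

The printed rate should land close to the chosen α, which is the sense in which α is "the probability of a Type I error".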

For a fixed sample size, as α increases, β decreases, and vice versa.

The only way to decrease both α and β is to increase the sample size. To make both quantities equal to zero, the sample size would have to be infinite; in effect, you would have to sample the entire population.

**Possible Errors in Hypothesis Testing**

The confidence coefficient (1-α) is the probability of not rejecting H0 when it is True.

The Confidence level of a hypothesis test is (1-α) * 100%.

The power of a statistical test (1-β) is the probability of rejecting H0 when it is false.
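The link between sample size, β, and power can be sketched with base R's `power.t.test()`. The effect size (`delta = 5`) and standard deviation (`sd = 14`) below are hypothetical values picked for illustration, not figures taken from this article.

```r
# Sketch: how power (1 - beta) changes with sample size at a fixed alpha.
# Assumed values (hypothetical): true shift delta = 5, standard deviation sd = 14.
sample_sizes <- c(10, 30, 100, 300)

power_by_n <- sapply(sample_sizes, function(n) {
  power.t.test(n = n, delta = 5, sd = 14, sig.level = 0.05,
               type = "one.sample", alternative = "one.sided")$power
})

# beta is the Type II error probability at the assumed true shift.
result <- data.frame(n     = sample_sizes,
                     power = round(power_by_n, 3),
                     beta  = round(1 - power_by_n, 3))
print(result)
```

With α held at 0.05, power rises and β falls as n grows, which is why a larger sample is the way to reduce both error probabilities at once.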

**How to correct errors in Hypothesis Testing**

Steps to follow:

Determine a p-value when testing a null hypothesis.

If the alternative hypothesis is the less-than alternative, you reject H0 only if the test statistic falls in the left tail of the distribution (for example, below -2). Similarly, if Ha is the greater-than alternative, you reject H0 only if the test statistic falls in the right tail (for example, above 2).

**Upper tailed, Lower Tailed, Two-Tailed Tests**

H1: µ > µ0, where µ0 is the comparator or null value and an increase is hypothesized. This type of test is called an upper-tailed test.

H1: µ < µ0, where a decrease is hypothesized and this type of test is called a lower-tailed test.

H1: µ ≠ µ0, where a difference is hypothesized and this type of test is called a two-tailed test.
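The three alternatives differ only in which tail area is reported as the p-value. The sketch below computes all three for a single t statistic; the statistic (1.224674) and degrees of freedom (29) are the ones reported in the one-sample case study in this article, reused here purely as an example.

```r
# Sketch: p-values for the three alternatives, given one t statistic.
# tstat and df are taken from the one-sample case study (n = 30).
tstat <- 1.224674
df    <- 29

p_upper <- pt(tstat, df = df, lower.tail = FALSE)           # H1: mu > mu0
p_lower <- pt(tstat, df = df, lower.tail = TRUE)            # H1: mu < mu0
p_two   <- 2 * pt(abs(tstat), df = df, lower.tail = FALSE)  # H1: mu != mu0

print(c(upper = p_upper, lower = p_lower, two_sided = p_two))
```

Note that the upper-tailed and lower-tailed p-values add up to 1, and the two-tailed p-value is twice the smaller of the two.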

**Types of T-test in Hypothesis Test:**

**One sample t-test.**

A one-sample t-test is used to compare the mean of a single sample with a specified theoretical (hypothesized) population mean µ.

**Unpaired two-sample t-test (Independent t-test)**

Independent (or unpaired two sample) t-test is used to compare the means of two unrelated groups of samples.

**Paired t-test**

The paired Student's t-test is used to compare the means of two related samples, that is, when you have two values (a pair of values) for each subject.
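As a rough sketch, all three variants can be run with base R's `t.test()`. The vectors `before`, `after`, and `other` below are simulated, hypothetical data; only the shape of each call is the point.

```r
# Sketch of the three t-test variants using base R's t.test().
# All data here are simulated and purely hypothetical.
set.seed(1)
before <- rnorm(30, mean = 68, sd = 20)           # first measurement
after  <- before + rnorm(30, mean = 4, sd = 14)   # second measurement, same subjects
other  <- rnorm(25, mean = 75, sd = 20)           # an unrelated group

one_sample  <- t.test(before, mu = 70)               # sample mean vs. a fixed value
independent <- t.test(before, other)                 # two unrelated groups (Welch by default)
paired      <- t.test(after, before, paired = TRUE)  # two measurements per subject

print(c(one_sample  = one_sample$p.value,
        independent = independent$p.value,
        paired      = paired$p.value))
```

A paired test is equivalent to a one-sample test on the per-subject differences, which is why the two groups must have the same length and order.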

**Let’s Look at some Case studies:**

**t-Test Application: One Sample**

Experian Marketing Services reported that the typical American spends a mean of 144 minutes (2.4 hours) per day accessing the Internet via a mobile device. (Source: The 2014 Digital Marketer, available at ex.pn/1kXJifX.) To test the validity of this statement, you select a sample of 30 friends and family. The results for the time spent per day accessing the Internet via a mobile device (in minutes) are stored in the Internet_Mobile_Time.csv file.

(a) Is there evidence that the population mean time spent per day accessing the Internet via a mobile device is different from 144 minutes? Use the p-value approach and a level of significance of 0.05.

(b) What assumption about the population distribution is needed to conduct the test in (a)?

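Solution in R

```
> setwd("D:/Hypothesis")
> mydata <- read.csv("Internet_Mobile_Time.csv", header = TRUE)
> xbar <- mean(mydata$Minutes)
> s <- sd(mydata$Minutes)
> n <- length(mydata$Minutes)
> Mu <- 144  # mean under the null hypothesis
> tstat <- (xbar - Mu) / (s / sqrt(n))
> tstat
[1] 1.224674
> # two-sided p-value; abs() keeps it correct for a negative statistic
> Pvalue <- 2 * pt(abs(tstat), df = n - 1, lower.tail = FALSE)
> Pvalue
[1] 0.2305533
> if (Pvalue < 0.05) "Reject H0" else "Fail to reject H0"
[1] "Fail to reject H0"
```

Since the p-value (0.2306) is greater than 0.05, we fail to reject the null hypothesis: there is not enough evidence that the population mean time differs from 144 minutes.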
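The manual calculation above can be cross-checked against the built-in `t.test()`. The real `Minutes` column is not reproduced in this article, so the sketch below uses a simulated stand-in vector `minutes` (hypothetical values only); the point is that both routes give the same statistic and p-value.

```r
# Sketch: manual one-sample t statistic checked against t.test().
# `minutes` is simulated, hypothetical data standing in for the Minutes column.
set.seed(2014)
minutes <- rnorm(30, mean = 150, sd = 40)
mu0 <- 144  # mean under the null hypothesis

tstat  <- (mean(minutes) - mu0) / (sd(minutes) / sqrt(length(minutes)))
pvalue <- 2 * pt(abs(tstat), df = length(minutes) - 1, lower.tail = FALSE)

builtin <- t.test(minutes, mu = mu0)  # two-sided by default

print(c(manual_t = tstat,  builtin_t = unname(builtin$statistic)))
print(c(manual_p = pvalue, builtin_p = builtin$p.value))
```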

**Independent t-test two sample**

A hotel manager looks to enhance the initial impressions that hotel guests have when they check in. Contributing to initial impressions is the time it takes to deliver a guest’s luggage to the room after check-in. A random sample of 20 deliveries on a particular day was selected each from Wing A and Wing B of the hotel. The data collated is given in Luggage.csv file. Analyze the data and determine whether there is a difference in the mean delivery times in the two wings of the hotel. (use alpha = 0.05).

Solution in R

```
> t.test(WingA, WingB, var.equal = TRUE, alternative = "greater")

	Two Sample t-test

data:  WingA and WingB
t = 5.1615, df = 38, p-value = 4.004e-06
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 1.531895      Inf
sample estimates:
mean of x mean of y
  10.3975    8.1225
```

```
> t.test(WingA, WingB)

	Welch Two Sample t-test

data:  WingA and WingB
t = 5.1615, df = 37.957, p-value = 8.031e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.38269 3.16731
sample estimates:
mean of x mean of y
  10.3975    8.1225
```

The first call is an upper-tailed test with pooled variance; the second is the default two-sided Welch test, which matches the question of whether the mean delivery times differ. In both cases the p-value is far below 0.05, so the null hypothesis of equal mean delivery times in the two wings is rejected.

```
> boxplot(WingA, WingB, col = c("Red", "Pink"), horizontal = TRUE)
```

**Case Study: Titan Insurance Company**

The Titan Insurance Company has just installed a new incentive payment scheme for its life policy sales force. It wants to have an early view of the success or failure of the new scheme. Indications are that the sales force is selling more policies, but sales always vary in an unpredictable pattern from month to month and it is not clear that the scheme has made a significant difference.

Life insurance companies typically measure the monthly output of a salesperson as the total sum assured for the policies sold by that person during the month. For example, suppose salesperson X has, in the month, sold seven policies for which the sums assured are £1000, £2500, £3000, £5000, £5000, £10000, £35000. X's output for the month is the total of these sums assured, £61,500. Titan's new scheme is that the sales force receives low regular salaries but is paid large bonuses related to output (i.e. to the total sum assured of policies sold). The scheme is expensive for the company, but it is looking for sales increases which more than compensate. The agreement with the sales force is that if the scheme does not at least break even for the company, it will be abandoned after six months.

The scheme has now been in operation for four months. It has settled down after fluctuations in the first two months due to the changeover.

To test the effectiveness of the scheme, Titan has taken a random sample of 30 salespeople, measured their output in the penultimate month before the changeover, and then measured it in the fourth month after the changeover (they have deliberately chosen months not too close to the changeover). The outputs of the salespeople are shown in Table 1.

| SALESPERSON | Old_Scheme | New_Scheme |
| --- | --- | --- |
| 1 | 57 | 62 |
| 2 | 103 | 122 |
| 3 | 59 | 54 |
| 4 | 75 | 82 |
| 5 | 84 | 84 |
| 6 | 73 | 86 |
| 7 | 35 | 32 |
| 8 | 110 | 104 |
| 9 | 44 | 38 |
| 10 | 82 | 107 |
| 11 | 67 | 84 |
| 12 | 64 | 85 |
| 13 | 78 | 99 |
| 14 | 53 | 39 |
| 15 | 41 | 34 |
| 16 | 39 | 58 |
| 17 | 80 | 73 |
| 18 | 87 | 53 |
| 19 | 73 | 66 |
| 20 | 65 | 78 |
| 21 | 28 | 41 |
| 22 | 62 | 71 |
| 23 | 49 | 38 |
| 24 | 84 | 95 |
| 25 | 63 | 81 |
| 26 | 77 | 58 |
| 27 | 67 | 75 |
| 28 | 101 | 94 |
| 29 | 91 | 100 |
| 30 | 50 | 68 |

Data preparation

Since the given data are in thousands of pounds (£000), multiply each value by 1,000 to work in pounds.

**Problem 1**

Describe the five percent significance test you would apply to these data to determine whether the new scheme has significantly raised outputs. What conclusion does the test lead to?

Solution:

Since the question is whether the new scheme has significantly **raised** the output, this is an example of a one-tailed t-test.

**Note:** *A two-tailed test would have been used if the question were whether the "new scheme has significantly **changed** the output".*

Mean of amount assured before the introduction of the scheme ≈ 68,033

Mean of amount assured after the introduction of the scheme ≈ 72,033

Difference in means = 72,033 – 68,033 = 4,000

Let

μ1 = average sum assured per salesperson BEFORE the changeover.

μ2 = average sum assured per salesperson **AFTER** the changeover.

H0: μ1 = μ2, i.e. μ2 – μ1 = 0

HA: μ1 < μ2, i.e. μ2 – μ1 > 0 (the true difference of means is greater than zero).

Since the same salespeople are measured before and after the changeover, and the population standard deviation is unknown, a **paired sample t-test** will be used.

Since the p-value (= 0.06529) is higher than 0.05, we fail to reject the null hypothesis. **The new scheme has NOT significantly raised outputs.**
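The test described above can be reproduced as a short sketch with `t.test()`, using the Table 1 values entered by hand (in £000). The vector names `old_scheme` and `new_scheme` are chosen here for illustration.

```r
# Sketch: paired, upper-tailed t-test on the Table 1 data (values in £000).
old_scheme <- c(57, 103, 59, 75, 84, 73, 35, 110, 44, 82,
                67, 64, 78, 53, 41, 39, 80, 87, 73, 65,
                28, 62, 49, 84, 63, 77, 67, 101, 91, 50)
new_scheme <- c(62, 122, 54, 82, 84, 86, 32, 104, 38, 107,
                84, 85, 99, 39, 34, 58, 73, 53, 66, 78,
                41, 71, 38, 95, 81, 58, 75, 94, 100, 68)

# H0: mu_d = 0 versus HA: mu_d > 0, where d = new - old for each salesperson.
result <- t.test(new_scheme, old_scheme, paired = TRUE, alternative = "greater")
print(result)

decision <- if (result$p.value < 0.05) "Reject H0" else "Fail to reject H0"
print(decision)
```

Passing `new_scheme` first makes the differences "after minus before", so `alternative = "greater"` corresponds to an increase in output.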

**Problem 2**

Suppose it has been calculated that for Titan to break even, the average output must increase by £5000. If this figure is taken as the alternative hypothesis, what is:

(a) The probability of a Type I error?

(b) The p-value of the hypothesis test if we test for a difference of £5000?

(c) The power of the test?

Solution:

**2.a. The probability of a Type I error?**

Solution: Probability of a Type I error = significance level = 0.05 or 5%

**2.b. What is the p-value of the hypothesis test if we test for a difference of £5000?**

Solution:

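Let μ2 = average sum assured per salesperson AFTER the changeover.

μ1 = average sum assured per salesperson BEFORE the changeover.

μd = μ2 – μ1

H0: μd ≤ 5000

HA: μd > 5000

This is a right-tailed test.

**P-value = 0.6499**

Since the p-value is far above 0.05, we fail to reject H0: the data give no evidence that the average increase exceeds £5000.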
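A sketch of how this p-value can be obtained with `t.test()` follows. It re-enters the Table 1 data in £000, so the break-even figure of £5000 becomes `mu = 5`; the vector names are illustrative.

```r
# Sketch: right-tailed paired t-test against a hypothesized increase of £5000.
# Table 1 data in £000, so the null value is mu = 5.
old_scheme <- c(57, 103, 59, 75, 84, 73, 35, 110, 44, 82,
                67, 64, 78, 53, 41, 39, 80, 87, 73, 65,
                28, 62, 49, 84, 63, 77, 67, 101, 91, 50)
new_scheme <- c(62, 122, 54, 82, 84, 86, 32, 104, 38, 107,
                84, 85, 99, 39, 34, 58, 73, 53, 66, 78,
                41, 71, 38, 95, 81, 58, 75, 94, 100, 68)

# H0: mu_d <= 5 versus HA: mu_d > 5, with d = new - old.
result <- t.test(new_scheme, old_scheme, paired = TRUE,
                 mu = 5, alternative = "greater")
print(result$statistic)  # negative, because the observed mean gain is below 5
print(result$p.value)
```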

**2.c. Power of the test.**

Solution:

Let μ2 = average sum assured per salesperson AFTER the changeover, μ1 = average sum assured per salesperson BEFORE the changeover, and μd = μ2 – μ1.

H0: μd = 0

HA: μd > 0

H0 will be rejected if the test statistic > t_critical.

With α = 0.05 and df = 29, the critical value for the t statistic (t_critical) is 1.699127.

Hence, H0 will be rejected for test statistics ≥ 1.699127.

In terms of the sample mean difference, H0 will be rejected for 𝑥̅ ≥ 4368.176, which is t_critical multiplied by the standard error of the mean difference.


The probability of a Type II error is P(Do not reject H0 | H0 is false).

The null hypothesis is H0: μd = 0, tested against HA: μd > 0.

Probability of a Type II error at μd = 5000

= P(Do not reject H0 | H0 is false)

= P(Do not reject H0 | μd = 5000) = P(𝑥̅ < 4368.176 | μd = 5000)

= P(t < (4368.176 – 5000) / (s_d/√n)), where s_d is the sample standard deviation of the differences and n = 30

= P(t < -0.245766)

= 0.4037973

Now, β = 0.4037973, so

Power of test = 1 – β = 1 – 0.4037973

= **0.5962027**
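The quantities in this derivation can be recomputed from the Table 1 data with a short sketch. It follows the same route as the text: find the critical mean difference under H0: μd = 0, then evaluate the chance of falling below it when the true mean difference is £5000, using a central t distribution for that second step. R's `power.t.test()` uses the noncentral t distribution instead, so its answer would differ slightly. Data are in £000 and the variable names are illustrative.

```r
# Sketch: critical value, Type II error (beta) and power for the Titan data.
# Table 1 data in £000; the alternative of interest is mu_d = 5 (i.e. £5000).
old_scheme <- c(57, 103, 59, 75, 84, 73, 35, 110, 44, 82,
                67, 64, 78, 53, 41, 39, 80, 87, 73, 65,
                28, 62, 49, 84, 63, 77, 67, 101, 91, 50)
new_scheme <- c(62, 122, 54, 82, 84, 86, 32, 104, 38, 107,
                84, 85, 99, 39, 34, 58, 73, 53, 66, 78,
                41, 71, 38, 95, 81, 58, 75, 94, 100, 68)

d     <- new_scheme - old_scheme
n     <- length(d)
se    <- sd(d) / sqrt(n)   # standard error of the mean difference
alpha <- 0.05

t_crit    <- qt(1 - alpha, df = n - 1)  # upper-tail critical t value
xbar_crit <- 0 + t_crit * se            # reject H0 when mean(d) >= xbar_crit

# beta = P(mean(d) < xbar_crit | mu_d = 5), central t approximation as in the text
beta  <- pt((xbar_crit - 5) / se, df = n - 1)
power <- 1 - beta

print(c(t_crit = t_crit, xbar_crit_pounds = xbar_crit * 1000,
        beta = beta, power = power))
```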

Conclusion:

A hypothesis should always state what you expect to happen. It involves both an independent and a dependent variable, and it should be testable and measurable. It may or may not be correct; hypothesis testing is how you assess it against the data.

Hypothesis testing uses statistics to assess how compatible the observed data are with a given hypothesis. The usual process of hypothesis testing consists of four steps:

1. Formulate the null hypothesis (commonly, that the observations are the result of pure chance) and the alternative hypothesis (commonly, that the observations show a real effect combined with a component of chance variation).
2. Identify a test statistic that can be used to assess the null hypothesis.
3. Compute the p-value, which is the probability that a test statistic at least as extreme as the one observed would be obtained, assuming that the null hypothesis is correct. The smaller the p-value, the stronger the evidence against the null hypothesis.
4. Compare the p-value to an acceptable significance level α (sometimes called an alpha value). If p ≤ α, the observed effect is statistically significant and the null hypothesis is rejected in favour of the alternative hypothesis.