Interview Questions on Data Analytics, Statistics, and More
1. What is Data Analytics?
2. What is Statistics?
3. What are the types of Statistics?
4. What is Descriptive Statistics?
5. What is Inferential Statistics?
- You can use inferential statistics to make estimations and test hypotheses about the whole population based on sample data.
- Sampling Error in Inferential Statistics
There are two types of estimates you can make about a population:
- Point Estimates: The sample mean is a point estimate of the population mean.
- Interval Estimates: Provide a range of values.
- The confidence interval is the most common type of interval estimate. Each confidence interval has an associated confidence level: the percentage of intervals, built the same way from repeated samples, that would be expected to contain the true population parameter.
c. Hypothesis Testing
- What is Hypothesis Testing?
Hypotheses or predictions are tested using statistical tests.
Steps in Hypothesis Testing:
- Define and state the null and alternate hypotheses.
- Select the level of significance.
- Identify the test statistic.
- Formulate a decision rule.
- Make a decision and report the results.
Things to remember about keywords when defining null and alternate hypotheses:
- z-table and t-table.
d. Statistical Tests
- z-test, t-test for quantitative variables, chi-square for categorical data.
e. Central Limit Theorem
If all samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately normal, and the approximation improves as the sample size grows.
- For a normal distribution, Mean = Median = Mode.
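The theorem can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the sample size of 50, and the 5,000 repetitions are assumptions chosen for the demo, not values from these notes. It draws repeated samples from a skewed population and compares the mean and spread of the sample means with the values the theorem suggests.

```python
import random
import statistics


def sample_means(draw, sample_size, num_samples, seed=0):
    """Return the means of `num_samples` samples, each of `sample_size` draws."""
    rng = random.Random(seed)  # fixed seed so the demo is repeatable
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(num_samples)
    ]


# Assumed population: exponential with mean 1 and standard deviation 1 (right-skewed).
SAMPLE_SIZE = 50
means = sample_means(lambda rng: rng.expovariate(1.0), SAMPLE_SIZE, num_samples=5000)

center = statistics.fmean(means)
spread = statistics.stdev(means)
expected_spread = 1 / SAMPLE_SIZE ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {center:.3f} (population mean is 1.0)")
print(f"std of sample means:  {spread:.3f} (sigma/sqrt(n) is {expected_spread:.3f})")
print(f"median of sample means: {statistics.median(means):.3f}")
```

Even though the population is far from normal, the sample means should cluster around the population mean, with a spread close to sigma divided by the square root of n and a median close to the mean.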
f. Obtain z-value for a given Confidence Level
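Instead of reading a z-table, the critical value can be computed from the inverse of the standard normal CDF. A minimal sketch, assuming a two-sided interval and a confidence level given as a fraction; the function name `z_for_confidence` is made up for illustration.

```python
from statistics import NormalDist


def z_for_confidence(confidence_level):
    """Two-sided critical z-value for a confidence level such as 0.95."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    alpha = 1 - confidence_level
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
```

The printed values should match the familiar table entries of roughly 1.645, 1.960, and 2.576.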
g. Find the Confidence Interval:
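\( CI = \bar{x} \pm z \times \frac{s}{\sqrt{n}} \), where \( \bar{x} \) is the sample mean, \( s \) the sample standard deviation, and \( n \) the sample size.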
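As a rough illustration, the interval can be computed by centring it on the sample mean and adding or subtracting z times the standard error. The sketch below uses the figures from the iPhone survey in the Example Problems (n = 36, mean $416, sample standard deviation $180). The function name is hypothetical, and with a sample standard deviation and a small sample a t-value would give a slightly wider interval; the z-based form is used here only because it is the one these notes give.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(sample_mean, sample_sd, n, confidence_level=0.95):
    """z-based confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = sample_sd / sqrt(n)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


standard_error = 180 / sqrt(36)  # 180 / 6 = 30
low, high = confidence_interval(416, 180, 36)

print(f"standard error: {standard_error:.2f}")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

With these inputs the standard error is 30 and the interval runs from about $357.20 to about $474.80.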
h. Selecting an appropriate sample size:
- The level of confidence desired.
- The margin of error the researcher will tolerate.
- The variation in the population being studied.
Formula:
\( n = \left(\frac{z \times \sigma}{E}\right)^2 \), where \( \sigma \) is the standard deviation and \( E \) is the margin of error.
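The sample-size formula is easy to evaluate directly. A minimal sketch, again borrowing the iPhone survey figures from the Example Problems (standard deviation $180, margin of error $10, 95% confidence); the function name is an assumption, and the result is rounded up because a fractional observation cannot be collected.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(std_dev, margin_of_error, confidence_level=0.95):
    """Smallest n for which z * std_dev / sqrt(n) is within the margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return ceil((z * std_dev / margin_of_error) ** 2)


n_needed = required_sample_size(std_dev=180, margin_of_error=10)
print(f"required sample size: {n_needed}")
```

For these inputs the formula gives a little under 1,245, so 1,245 owners would need to be surveyed.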
1. Regression
- Regression Analysis in Excel, Explained
- Summary Output
- ANOVA
- Calculate Formula: Simple or multiple linear regression.
- Predict Price of House: (X1 = House Size, X2 = Number of Bedrooms, X3 = Age).
- Visualize Correlation in Graph.
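To make the regression idea concrete outside Excel, here is a minimal least-squares sketch with a single predictor (house size only). The data values are invented purely for illustration, and the helper name is hypothetical. The multiple-regression case with X2 and X3 would normally use a numerical library rather than this hand-rolled version.

```python
from math import sqrt


def fit_simple_regression(x, y):
    """Least-squares slope, intercept, and correlation r for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical data: house size in square feet, price in thousands of dollars.
house_size = [1000, 1500, 2000, 2500, 3000]
price = [200, 270, 330, 410, 470]

slope, intercept, r = fit_simple_regression(house_size, price)
predicted = intercept + slope * 1800  # predicted price for an 1,800 sq ft house

print(f"price = {intercept:.1f} + {slope:.3f} * size   (r = {r:.3f})")
print(f"predicted price for 1800 sq ft: {predicted:.1f}")
```

The slope is the estimated change in price per additional square foot, and r summarises how tightly the points follow the fitted line.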
Shell Scripting
- Why Shell Scripting?
- Bash Command Line?
- How to execute shell script?
- Variables in Shell Script?
- Read User Input, Arguments?
Steps to Scrape Data from Sources
Model Development
- What is Machine Learning?
- Types of Machine Learning?
- What is a Machine Learning Algorithm?
- What are Supervised Learning Algorithms?
- Python Libraries for Machine Learning.
- Regression using Python
- Visualize, Interpret findings.
Data Visualization
- Given Data
- Coding to visualize the output
Example Problems
1. A survey of 36 randomly selected iPhone owners showed that the purchase price has a mean of $416 with a sample standard deviation of $180.
- Compute the standard error of the sample mean.
- Compute the 95% confidence interval for the mean.
- How large of a sample size is needed to estimate the population mean within a margin of error of $10?
2. CSTAD company manufactures and assembles desks and other office equipment. The weekly production of model C369 desks follows a normal distribution with a mean of 200 and a standard deviation of 16. Recently, new production methods have been introduced, and new employees have been hired. The Vice President of manufacturing would like to investigate whether there has been an increase in the weekly production of the model C369 desk.
- Is the mean number of desks produced at CSTAD different from 200 at the 0.05 significance level?
- Perform hypothesis testing with a sample of 50 weeks where the mean of the sample is 220.
Steps:
- Define the null and alternate hypotheses.
- Select the level of significance.
- Select the test statistic.
- Formulate the decision rule.
- Make a decision and interpret the results.
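The steps can be sketched for the CSTAD problem as a one-sample z-test, which fits here because the population standard deviation (16) is given. The function name and its `alternative` parameter are assumptions for illustration. The question as worded ("different from 200") is two-sided, while the Vice President's interest in an increase would correspond to the one-sided "greater" option.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05,
                      alternative="two-sided"):
    """Return (z, critical value, p-value, reject H0?) for a one-sample z-test."""
    std_normal = NormalDist()
    # Test statistic: distance of the sample mean from H0 in standard errors.
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    if alternative == "two-sided":      # H1: mean != pop_mean
        critical = std_normal.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        reject = abs(z) > critical
    elif alternative == "greater":      # H1: mean > pop_mean
        critical = std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(z)
        reject = z > critical
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    return z, critical, p_value, reject


# H0: mean = 200, H1: mean != 200, alpha = 0.05, n = 50, sample mean = 220.
z, critical, p_value, reject = one_sample_z_test(220, 200, 16, 50)

print(f"z = {z:.2f}, critical value = +/-{critical:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```

The test statistic is about 8.84, far beyond the critical value of about 1.96, so the null hypothesis is rejected at the 0.05 level under either alternative.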