Question 1: Explain the idea of a “set” and describe common set operations with their main properties.
Meaning of a set
In elementary mathematics and statistics, a set is simply a well–defined collection of distinct objects. The objects inside a set are called its elements or members. Examples include the set of working days in a week, the set of students in a class, and the set of prime numbers less than 20.
Ways of representing sets
- Listing (roster) method – We list all elements inside curly brackets. Example: the set of working days can be written as {Mon, Tue, Wed, Thu, Fri}.
- Rule (set–builder) method – We describe the property that all elements share. Example: the same set can be written as {x | x is a working day in a week}.
- Venn diagram – Sets are drawn as closed curves (usually circles) inside a rectangle that represents the “universal set”. Overlaps show common elements, disjoint circles show no common element.
Important set operations
- Union (A ∪ B) – All elements that belong to set A, or to set B, or to both. Example: if A = {1,2,3} and B = {3,4,5}, then A ∪ B = {1,2,3,4,5}.
- Intersection (A ∩ B) – Only those elements which belong to both sets. In the same example, A ∩ B = {3}.
- Difference (A − B) – Elements that are in A but not in B. Here, A − B = {1,2} and B − A = {4,5}.
- Complement (A′) – All elements of the universal set U that do not belong to A. For example, if U is the set of all students in college and A is the set of hostel students, then A′ is the set of day scholars.
- Subset (A ⊆ B) – Every element of A is also an element of B. For instance, the set of even numbers less than 10 is a subset of the set of natural numbers less than 10.
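These operations map directly onto Python's built-in `set` type; a minimal sketch using the example sets above (the universal set `U` here is a hypothetical addition, needed only for the complement):

```python
# Set operations with Python's built-in set type
A = {1, 2, 3}
B = {3, 4, 5}
U = {1, 2, 3, 4, 5, 6, 7}  # hypothetical universal set

union = A | B             # A ∪ B
intersection = A & B      # A ∩ B
difference = A - B        # A − B
complement_A = U - A      # A′ relative to U
is_subset = A <= U        # subset check: A ⊆ U
```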
Key properties of set operations
- Commutative – A ∪ B = B ∪ A and A ∩ B = B ∩ A.
- Associative – (A ∪ B) ∪ C = A ∪ (B ∪ C) and similarly for intersection.
- Distributive – A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C); and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
- Identity – A ∪ ∅ = A and A ∩ U = A, where ∅ is the empty set and U is the universal set.
- De Morgan’s laws – (A ∪ B)′ = A′ ∩ B′ and (A ∩ B)′ = A′ ∪ B′. These laws are very useful in probability and logic.
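De Morgan's laws can be verified directly on small concrete sets; a quick sketch (the sets chosen here are illustrative):

```python
# Verifying De Morgan's laws on concrete sets
A = {1, 2, 3}
B = {3, 4, 5}
U = {1, 2, 3, 4, 5, 6}  # hypothetical universal set

lhs1 = U - (A | B)          # (A ∪ B)′
rhs1 = (U - A) & (U - B)    # A′ ∩ B′
lhs2 = U - (A & B)          # (A ∩ B)′
rhs2 = (U - A) | (U - B)    # A′ ∪ B′
```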
Practical note
In data analysis, sets and their operations help us describe groups of respondents, customers or products. For example, if A is the set of customers who purchased product X and B is the set of those who purchased product Y, then A ∩ B immediately gives us “cross–buyers”.
Question 2: Discuss measures of central tendency for grouped data. Which measure is suitable for finding customers’ preferred toothpaste brand and why?
Central tendency for grouped data
When data are arranged into classes (for example, income groups, age groups, marks intervals), we speak of grouped data. The common measures of central tendency for such data are:
- Arithmetic mean – The “balance point” of the distribution. For grouped data it is calculated using class mid–points and class frequencies. It uses all observations and is very useful for further algebraic work, but it can be influenced by extreme values.
- Median – The value that divides the ordered data into two equal halves. For grouped data, we find the class in which the middle observation lies and then use the median formula. It is less affected by outliers and suitable for skewed distributions (for example, income data).
- Mode – The most frequently occurring value or class. For grouped data, the modal class is the class with the highest frequency. A formula using the frequencies of the modal class and neighbouring classes gives a more accurate value.
Choosing a measure in a brand–choice study
Suppose a seller wants to know which toothpaste brand people in one colony prefer. Responses might look like: Brand A, Brand B, Brand C, etc. These are categories, not numerical values.
- We cannot meaningfully compute a mean or median of “A, B, C”.
- What matters here is: which brand is chosen by the largest number of people?
Therefore, the appropriate measure of central tendency in this situation is the mode. The brand name with the highest frequency in the survey represents the “typical” or most preferred choice. In marketing practice, brand managers often keep a close eye on how the modal brand changes across time or across neighbourhoods.
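Finding the modal brand amounts to simple counting; a sketch with made-up survey responses, using Python's `collections.Counter`:

```python
from collections import Counter

# Hypothetical survey responses: brand chosen by each respondent
responses = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

counts = Counter(responses)
modal_brand, modal_freq = counts.most_common(1)[0]  # brand with highest frequency
```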
Question 3: Explain Pearson’s product–moment correlation coefficient and find the correlation between number of cars sold and revenue for three companies.
Idea of Pearson’s correlation
Pearson’s product–moment correlation coefficient (usually written as r) measures the strength and direction of the linear relationship between two quantitative variables. Its value lies between −1 and +1.
Formula
For a sample of paired observations \((x_i, y_i)\),
$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})} {\sqrt{\Bigl[\sum (x_i - \bar{x})^2\Bigr]\Bigl[\sum (y_i - \bar{y})^2\Bigr]}} $$
Here, x̄ and ȳ are the sample means of the two variables.
Important properties
- −1 ≤ r ≤ +1.
- If r > 0, there is a positive relationship (when one variable increases, the other tends to increase); if r < 0, there is a negative relationship.
- If r = +1 or r = −1, there is a perfect linear relation; r = 0 suggests no linear association.
- r is unit-free; changing units (for example, from rupees to thousands of rupees) does not affect its value.
- It is sensitive to outliers; a single extreme value can strongly change r.
Numerical example: cars and revenue
Data (number of cars sold in thousands and revenue in ₹ crores):
- Company A: x = 63, y = 13
- Company B: x = 29, y = 8
- Company C: x = 28, y = 9
First, compute the means:
x̄ = (63 + 29 + 28) / 3 = 40
ȳ = (13 + 8 + 9) / 3 = 10
Now calculate deviations and products:
- For A: (x − x̄) = 23, (y − ȳ) = 3, product = 69
- For B: (x − x̄) = −11, (y − ȳ) = −2, product = 22
- For C: (x − x̄) = −12, (y − ȳ) = −1, product = 12
So,
Σ(x − x̄)(y − ȳ) = 69 + 22 + 12 = 103
Σ(x − x̄)² = 23² + (−11)² + (−12)² = 529 + 121 + 144 = 794
Σ(y − ȳ)² = 3² + (−2)² + (−1)² = 9 + 4 + 1 = 14
Therefore,
$$ r = \frac{103}{\sqrt{794 \times 14}} = \frac{103}{\sqrt{11116}} \approx \frac{103}{105.4} \approx 0.98 $$
Interpretation
The coefficient is close to +1, which indicates a very strong positive linear relationship between cars sold and revenue for these three companies. In other words, higher car sales are strongly associated with higher revenue, which is what we would expect in a real automobile business.
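The calculation above can be reproduced in a few lines of Python, following the deviation formula directly:

```python
import math

# Data from the example: cars sold (thousands) and revenue (₹ crores)
x = [63, 29, 28]
y = [13, 8, 9]

n = len(x)
mx, my = sum(x) / n, sum(y) / n                              # means: 40 and 10
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))         # Σ(x−x̄)(y−ȳ)
sxx = sum((a - mx) ** 2 for a in x)                          # Σ(x−x̄)²
syy = sum((b - my) ** 2 for b in y)                          # Σ(y−ȳ)²
r = sxy / math.sqrt(sxx * syy)                               # Pearson's r
```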
Question 4: What is bivariate analysis? How can a teacher examine whether using a smart board improves students’ marks?
Meaning of bivariate analysis
When we analyse the relationship between two variables at the same time—such as income and expenditure, age and blood pressure, or study hours and marks—we are doing bivariate analysis. It helps us to see whether changes in one variable are associated with changes in another.
Example context
A teacher wants to check whether using a smart board in class actually helps to improve students’ marks in an examination.
Suggested procedure
- Define variables – One variable is the teaching method (traditional vs. smart board). The other variable is students’ marks.
- Design the study – A simple design is a before–after study with the same group of students:
- Conduct a test after teaching a unit using the traditional method.
- Later, teach another comparable unit using a smart board and conduct a similar test.
- Collect data – Record test scores for each student in both conditions and any important background variables (for example, past performance) that might influence marks.
Statistical techniques
- Paired comparison – If the same students are tested twice (before and after smart board use), compute the difference in marks for each student and analyse the average difference. A paired t–test can be used to check whether the improvement is statistically significant.
- Two–group comparison – If there are two different sections (control and smart–board section), compare their mean marks using an independent–samples t–test or even a simple comparison of means with a suitable graph.
- Graphical tools – Boxplots or side–by–side bar charts of average marks for the two methods help in communicating the findings to colleagues or the school administration.
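The paired comparison can be sketched in pure Python: compute each student's improvement, then the paired t statistic t = d̄ / (s_d/√n). The marks below are made up purely for illustration:

```python
import math
import statistics

# Hypothetical marks for five students, before and after smart-board teaching
before = [52, 60, 45, 70, 58]
after = [58, 63, 50, 74, 60]

diffs = [a - b for a, b in zip(after, before)]   # per-student improvement
n = len(diffs)
d_bar = statistics.mean(diffs)                   # average improvement
s_d = statistics.stdev(diffs)                    # sample SD of the differences
t_stat = d_bar / (s_d / math.sqrt(n))            # paired t statistic
```

The calculated `t_stat` would then be compared with the critical t value for n − 1 degrees of freedom at the chosen significance level.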
Practical note
In real classrooms, it is important to control for other factors as far as possible—same teacher, similar syllabus, and tests of comparable difficulty. Otherwise the observed difference in marks may not be due only to the smart board.
Question 5: Describe the procedure for constructing a composite index using the simple ranking method.
Why composite indices?
Often we want a single number that summarises performance using several indicators—for example, ranking districts by “quality of life” using data on income, literacy, health and housing. A composite index brings these different indicators together.
Steps in the simple ranking method
- Step 1 – Choose units and indicators – Decide which units (states, districts, firms, households) and which indicators (for example, literacy rate, infant mortality rate, per capita income) will be used.
- Step 2 – Decide the direction of impact – For some indicators, higher values are “good” (literacy, income); for others, lower values are “good” (infant mortality, crime rate). This will affect ranking.
- Step 3 – Rank each unit on each indicator –
- If higher is better, give rank 1 to the highest value, rank 2 to the next, and so on.
- If lower is better, reverse the ranking (lowest value gets rank 1).
- Step 4 – Combine the ranks – For each unit, sum the ranks across all indicators. Some analysts prefer to take the average rank (sum divided by number of indicators).
- Step 5 – Form the composite index – The unit with the lowest total (or average) rank is considered the best. You may also convert the ranks into a 0–100 scale if a percentage–like index is required.
- Step 6 – Interpret results – Comment on which units are performing better or worse overall and on which indicators they are strong or weak. This helps policy makers to prioritise interventions.
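The steps above can be sketched in Python with hypothetical district data, one indicator where higher is better (literacy) and one where lower is better (infant mortality):

```python
# Hypothetical indicator values for three districts
districts = ["D1", "D2", "D3"]
literacy = {"D1": 85, "D2": 70, "D3": 78}           # higher is better
infant_mortality = {"D1": 30, "D2": 25, "D3": 40}   # lower is better

def ranks(values, higher_is_better):
    """Rank 1 = best; assumes no ties, for simplicity."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {unit: pos + 1 for pos, unit in enumerate(ordered)}

r_lit = ranks(literacy, higher_is_better=True)        # Step 3
r_imr = ranks(infant_mortality, higher_is_better=False)
total_rank = {d: r_lit[d] + r_imr[d] for d in districts}  # Step 4
best = min(total_rank, key=total_rank.get)                # Step 5: lowest total wins
```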
Real–life use
Many government and NGO reports use this method for quick comparative assessments—for example, ranking schools, health centres or districts. The method is simple, transparent and easy to explain to non–specialists, though it does not use the actual distances between values (only their order).
Question 6: Discuss common methods of analysing qualitative data.
Nature of qualitative data
Qualitative data come from interviews, focus group discussions, open–ended survey questions, observation notes, diaries and similar sources. They consist of words, images and stories rather than numbers.
Typical analytical approaches
- Thematic analysis – Collect all responses, read them several times, and identify recurring ideas or patterns. Codes (short labels) are attached to sentences or paragraphs—such as “fear of failure”, “support from family”, “lack of facilities”. These codes are then grouped into larger themes.
- Content analysis – Used when dealing with large volumes of texts such as newspaper articles, policy documents or social media posts. The researcher counts how many times certain categories or ideas appear, sometimes converting them into simple statistics or graphs.
- Narrative analysis – Here the focus is on people’s life stories or case histories—how they describe events, turning points and meanings. This is often used in psychological and sociological research.
- Framework or matrix–based analysis – Key themes are listed in rows and individual cases or groups are put in columns. Evidence from transcripts is summarised in the cells of this matrix. This helps in systematically comparing different participants.
- Grounded theory approach – Instead of starting with a fixed theory, the researcher lets patterns “emerge” from the data. Through repeated coding, categorising and comparison, a conceptual model is gradually built.
Practical considerations
Good qualitative analysis requires careful documentation: saving audio files, keeping field notes, writing memos on emerging ideas and being transparent about how categories were developed. Software packages (for example, NVivo, Atlas.ti) can help manage and code large data sets, but the thinking still has to be done by the researcher.
Question 7: Why is classification of data necessary? Explain with examples different types of classification.
Need for classification
- Raw data collected from surveys or experiments are often long, unorganised lists. Classification groups similar observations together so that patterns can be seen.
- It simplifies complex information and allows meaningful summaries like tables, graphs and averages.
- It helps in comparison between groups—for example, urban vs. rural, male vs. female, different income groups.
Main types of classification
- Chronological (time–based) classification – Data are arranged according to time: years, months, quarters or hours. Example: annual rainfall in a district from 2015 to 2024, or monthly sales of a shop.
- Geographical (spatial) classification – Arrangement by place: countries, states, districts, villages. Example: literacy rates in different states, number of tourists in different cities.
- Qualitative classification – Grouping based on attributes or qualities that cannot be measured on a numeric scale. Example: classifying workers by gender (male/female), education level (illiterate, primary, secondary, graduate) or occupation (farmer, labourer, service, business).
- Quantitative classification – Grouping based on measurable characteristics such as income, age, marks or height. The variable is usually divided into class intervals such as “income ₹10,000–₹20,000”, “₹20,000–₹30,000” etc.
Experience from practice
In actual project reports, a mix of these classifications is often used—for example, showing monthly sales figures for different regions, which combines time and place. Well–thought–out classification makes the rest of the analysis much easier and clearer.
Question 8: What is a hypothesis? Outline the main steps involved in hypothesis testing.
Meaning of hypothesis
A hypothesis is a tentative statement about a relationship between variables, which we want to test using data. For example: “Average monthly expenditure on medicines in urban households is higher than in rural households” or “There is no association between gender and internet usage among students.”
Key steps in hypothesis testing
- Step 1 – Formulate null and alternative hypotheses – The null hypothesis (H0) usually states that there is no effect or no difference (for example, “mean urban expenditure = mean rural expenditure”). The alternative hypothesis (H1) represents the claim we want to support (for example, “urban mean > rural mean”).
- Step 2 – Choose level of significance – Decide the probability of making a “type I error”, that is, rejecting H0 when it is actually true. Common choices are 5% or 1%.
- Step 3 – Select an appropriate test statistic – Depending on the type of data and sample size, we select a z–test, t–test, chi–square test, F–test, etc. The test statistic has a known sampling distribution under H0.
- Step 4 – Determine the decision rule – Using statistical tables or software, find the “critical value(s)” corresponding to the chosen significance level. This defines the rejection region for H0.
- Step 5 – Compute the test statistic from sample data – Substitute the sample values into the formula and obtain the calculated value of the test statistic.
- Step 6 – Take a decision – Compare the calculated value with the critical value:
- If the calculated value falls in the rejection region, reject H0 and conclude that the data support H1.
- Otherwise, do not reject H0; the sample does not provide enough evidence against it.
- Step 7 – Interpret in simple language – Translate the statistical decision into everyday words: for example, “At the 5% level of significance we find sufficient evidence that urban households spend more on medicines than rural households.”
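The seven steps can be traced through a one-sample z test on made-up figures (H0: μ = 100 against H1: μ > 100, with σ assumed known):

```python
import math
from statistics import NormalDist

# Hypothetical setup: H0: mu = 100 vs H1: mu > 100, known sigma, n = 36
mu0, sigma, n = 100, 15, 36
sample_mean = 106          # made-up sample result
alpha = 0.05               # Step 2: 5% level of significance

z_calc = (sample_mean - mu0) / (sigma / math.sqrt(n))   # Step 5: test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)                # Step 4: one-tailed critical value
reject_h0 = z_calc > z_crit                             # Step 6: decision
```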
Question 9: Describe important methods of collecting primary data.
Primary data and its importance
Primary data are data collected directly from the original source for the specific purpose of the study—for example, a fresh survey of farmers on fertiliser usage. They are often more accurate and relevant than secondary data, but costlier and more time–consuming to collect.
Main methods
- Direct personal interviews – The investigator meets respondents face–to–face and asks questions. This method permits clarification of doubts and observation of non–verbal behaviour, but it is expensive in terms of time and travel.
- Telephone or online interviews – Useful when respondents are geographically scattered. Costs are lower, but it may be difficult to build rapport; people may also refuse unknown calls.
- Questionnaire by post or online form – A list of questions is sent to respondents who fill it in themselves and return it. This works well when respondents are literate and motivated. It is economical, but non–response can be high and clarifying doubts is difficult.
- Observation – Information is obtained by watching people or events directly, with or without their knowledge. For example, counting how many customers actually pick up a new product from the shelf. It is useful for recording actual behaviour, not just what people say.
- Experiments and field trials – The researcher deliberately changes some conditions (for example, price, packing, teaching method) and observes the effect on outcomes (sales, test scores). Experiments can establish cause–effect relations but require careful design.
- Focus group discussions – Small groups (usually 6–10 people) discuss a topic under the guidance of a moderator. This is useful in exploring attitudes and suggestions before designing a large–scale survey.
Practical note
In real projects, researchers often combine methods—for example, a few focus groups to refine questions, followed by a structured questionnaire survey, plus some observation in the field.
Question 10: Explain how you would construct a histogram using hypothetical data.
What is a histogram?
A histogram is a graphical representation of the distribution of a continuous variable. It looks like a series of adjacent rectangles (bars) with no gaps between them, where the area of each bar is proportional to the frequency of the class interval it represents.
Steps to prepare a histogram (with example)
- Step 1 – Collect and group data – Suppose we have marks of 50 students in a test out of 100. We decide class intervals such as: 0–20, 20–40, 40–60, 60–80, 80–100. Count how many students fall in each interval. Assume the frequencies are: 2, 8, 20, 15, 5.
- Step 2 – Draw axes – On squared paper or in software, draw horizontal (X–axis) and vertical (Y–axis) lines. Put class boundaries (0, 20, 40, 60, 80, 100) on the X–axis and frequencies on the Y–axis.
- Step 3 – Choose a suitable scale – For example, on the Y–axis, 1 cm may represent 5 students. This should allow the highest frequency (20) to fit comfortably.
- Step 4 – Draw rectangles – For each class interval:
- Draw a rectangle with its base covering the class interval on the X–axis.
- The height of the rectangle should correspond to its frequency (for instance, height for 40–60 should represent 20 students).
- Step 5 – Label and title – Give an informative title like “Histogram of Test Marks”, label the axes (Marks, Number of students) and mention the class intervals and scale clearly.
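The grouping in Step 1 can be sketched in Python, where `bisect` places each mark into its class interval (the marks below are a small illustrative sample, not the 50-student data):

```python
import bisect

# Hypothetical marks; class boundaries as in Step 1
marks = [12, 35, 47, 55, 62, 71, 78, 85, 90, 44]
edges = [0, 20, 40, 60, 80, 100]
labels = ["0-20", "20-40", "40-60", "60-80", "80-100"]

freq = {label: 0 for label in labels}
for m in marks:
    # bisect_right puts a boundary value (e.g. 40) into the upper class
    idx = bisect.bisect_right(edges, m) - 1
    idx = min(idx, len(labels) - 1)   # keep a mark of exactly 100 in the top class
    freq[labels[idx]] += 1
```

These frequencies are then what the heights of the histogram bars represent.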
Interpretation
Looking at the shape of the histogram, you can comment whether the marks are concentrated in the middle (bell–shaped), skewed to one side, or show multiple peaks. In school and college examinations, histograms are often used to identify whether a test was too easy or too difficult.
Question 11: State the main properties of the normal distribution.
The normal distribution in brief
The normal distribution is a continuous probability distribution that appears very frequently in statistics—for example, in the distribution of heights, measurement errors, and many sample means. Its graph is the familiar “bell–shaped curve”.
Key properties
- Bell–shaped and symmetric – The curve is single–peaked and perfectly symmetric about its centre.
- Mean, median and mode coincide – All three measures of central tendency have the same value at the centre of the distribution.
- Defined by two parameters – Any normal distribution is completely specified by its mean μ and standard deviation σ.
- Total area under the curve equals 1 – This means the probability of the variable taking some value in the entire real line is 1.
- Asymptotic tails – The curve approaches the horizontal axis as we move away from the mean, but never actually touches it.
- Empirical rule (68–95–99.7 rule) – Approximately 68% of observations lie within μ ± 1σ, about 95% within μ ± 2σ, and about 99.7% within μ ± 3σ.
- Standard normal form – If we convert a normal variable (X) to $$ Z = \frac{X - \mu}{\sigma}, $$ the new variable (Z) has mean 0 and standard deviation 1. Tables for the standard normal distribution are widely used for probability calculations and confidence intervals.
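Both the standardisation and the 68–95–99.7 rule can be checked with Python's `statistics.NormalDist` (the mean and SD below are hypothetical):

```python
from statistics import NormalDist

# Standardising a hypothetical observation
mu, sigma = 50, 10
x = 65
z = (x - mu) / sigma          # standard normal form

# Checking the empirical rule on the standard normal distribution
std = NormalDist()            # mean 0, standard deviation 1
within_1sd = std.cdf(1) - std.cdf(-1)   # share of area within μ ± 1σ
within_2sd = std.cdf(2) - std.cdf(-2)   # share of area within μ ± 2σ
```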
Practical importance
Because many statistical procedures (for example, confidence intervals and hypothesis tests) are based on normality assumptions, understanding these properties is essential for data analysts.
Question 12: Short notes on (1) coefficient of determination, (2) sources of secondary data, (3) focus group discussions, (4) characteristics of a good questionnaire, (5) non–sampling errors.
1- Coefficient of determination
The coefficient of determination, denoted by R², shows what proportion of the variation in one variable is “explained” by its linear relationship with another variable. If R² = 0.80, we can say that 80% of the variation is explained by the model and 20% remains unexplained. It is the square of Pearson’s correlation coefficient in the case of simple linear regression.
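In simple linear regression, R² can therefore be obtained by squaring Pearson's r; a quick check using the car-sales figures from Question 3:

```python
import math

# Data from Question 3: cars sold (thousands) and revenue (₹ crores)
x = [63, 29, 28]
y = [13, 8, 9]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / math.sqrt(
    sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
r_squared = r ** 2   # proportion of variation "explained" by the linear relation
```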
2- Sources of secondary data
Secondary data are data that have already been collected by someone else for a different purpose, but which can be reused. Common sources include:
- Government publications (census, national sample surveys, economic surveys).
- Reports of international organisations (World Bank, UN agencies, OECD).
- Company annual reports and industry associations’ publications.
- Research reports, theses and academic journals.
- Digital databases and websites maintained by statistical offices and regulatory bodies.
Secondary data save time and money, but the researcher must judge their reliability, coverage and relevance carefully.
3- Focus group discussions
A focus group discussion is a guided conversation with a small group of participants (often 6–10) about a specific topic. A trained moderator asks open–ended questions and encourages participants to talk to each other. This method is useful for exploring attitudes, language people use, and the range of opinions before designing a questionnaire. For example, a company may conduct focus groups with college students to understand their expectations from a new mobile app.
4- Characteristics of a good questionnaire
- Questions are clear, simple and free from technical jargon.
- Each question asks about only one issue at a time (no “double–barrelled” questions).
- The sequence moves smoothly from easy, general questions to more specific or sensitive ones.
- Response options are exhaustive and mutually exclusive where appropriate.
- Layout is neat, with adequate space for answers and clear instructions.
- Questions are pre–tested on a small group so that confusing wording can be corrected.
5- Non–sampling errors
Non–sampling errors are mistakes that can occur in any survey, whether or not a sample is used. They may arise at any stage—data collection, processing or analysis. Examples include:
- Coverage error – Some units are not included in the list from which the sample is drawn (for example, households in slums missing from the sampling frame).
- Non–response error – Selected persons do not respond or refuse to be interviewed, and their behaviour differs from those who respond.
- Measurement error – Due to poorly worded questions, interviewer bias, or respondents’ memory lapses and social desirability bias.
- Processing error – Mistakes during coding, data entry or analysis.
Careful planning, training of field staff, pre–testing tools and systematic quality checks are essential to minimise non–sampling errors in real–world surveys.