Statistics for Social Understanding

With Stata and SPSS

By Nancy E. Whittier, Tina Wildhagen, and Howard J. Gold

Glossary


A

aggregate data - Data for which the unit of measurement is the group, not the individual; for example, counties, countries, or organizations.

B

bar graph - A graphical representation of frequencies or percentages for each category of a nominal- or ordinal-level variable.

big data - Data that emerge as a by-product of the electronic tracking of people’s behavior online and in the real world.

C

case - A single member of a data set; each individual or group under study.

closed-ended survey item - A survey item that provides respondents with predefined response categories.

cluster sample - A method of sampling where one randomly samples clusters of cases instead of individuals and then randomly samples individuals from within these clusters.

codebook - A document that provides essential information about each variable in a data set.

continuous variable - A variable with values that can be continually subdivided.

cumulative percentage - The percentage of cases that are equal to or lower than a particular value for a variable.

D

descriptive statistics - Statistical techniques for describing the patterns found in a set of data.

discrete variable - A variable measured in whole numbers that cannot be broken down further.

E

ecological fallacy - The error of drawing inferences about individuals based on the groups to which they belong.

experimental control - The random assignment of research participants to treatment and control groups to ensure that participants in one group are not systematically different from those in the other group.

F

frequency - The number of cases in a sample falling into each category of a variable.

frequency distribution - A table that presents the frequencies, percentages, and cumulative percentages for each category of a variable.
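
As an illustration of how frequency, percentage, and cumulative percentage fit together in one table, here is a minimal Python sketch. The function name, the response data, and the category order are all hypothetical, chosen only for this example; the book itself works in Stata and SPSS.

```python
from collections import Counter

def frequency_distribution(values, order):
    """Return rows of (category, frequency, percentage, cumulative percentage).

    `order` lists the categories from lowest to highest so that the
    cumulative percentage is meaningful (ordinal or interval-ratio data).
    """
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for category in order:
        freq = counts.get(category, 0)
        cumulative += freq
        rows.append((category, freq, 100 * freq / total, 100 * cumulative / total))
    return rows

# Hypothetical responses to an ordinal survey item (invented for illustration).
responses = ["low", "medium", "medium", "high", "low", "medium", "high", "medium"]
table = frequency_distribution(responses, order=["low", "medium", "high"])

for category, freq, pct, cum_pct in table:
    print(f"{category:<8}{freq:>4}{pct:>8.1f}{cum_pct:>8.1f}")
```

The last row's cumulative percentage is always 100, because every case is at or below the highest category.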

frequency polygon - A graphical representation of the distribution of an interval-ratio-level variable that connects a line through the midpoint for each range of values for the variable.

G

growth mindset - An approach that views intelligence as something that develops over time through hard work and effort.

H

histogram - A graphical representation of the distribution of an interval-ratio-level variable.

I

inferential statistics - Statistics that examine whether information from a sample can be generalized to a population.

interval-level variable - A numerical variable where the distance between each consecutive value of the variable is identical, with no true value of zero.

interval-ratio variable - A numerical or quantitative variable where the distance between each consecutive value of the variable is identical; either an interval-level or ratio-level variable.

L

level of measurement - Refers to whether a variable’s values are nominal, ordinal, or interval-ratio; determines what statistical techniques can be applied to variables.

M

multistage cluster sampling - A form of cluster sampling in which the random selection of clusters passes through several stages before selecting a random sample of individuals.

N

nominal-level variable - A variable that is not numerical and whose categories cannot be rank-ordered.

non-probability sample - A sample in which cases are self-selected or are not drawn randomly.

nonresponse bias - A form of bias occurring when individuals who are invited to take a survey vary systematically in the likelihood that they will complete the survey.

O

open-ended survey item - A survey item that does not provide respondents with response categories.

ordinal-level variable - A variable with values that can be rank-ordered but that are not numerical and where the distance between each value of the variable is not identical.

P

percent change - A method of understanding the magnitude of a change in percentage over time.
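
The entry above does not spell out a formula. A minimal sketch, assuming the conventional calculation (the change divided by the starting value, multiplied by 100), might look like this in Python; the function name and the numbers are hypothetical.

```python
def percent_change(old, new):
    """Change from `old` to `new`, expressed as a percentage of `old`."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / old * 100

# Hypothetical example: a percentage that rises from 40 to 50.
print(percent_change(40.0, 50.0))  # 25.0
```

In this example the rise is 10 percentage points, but the percent change is 25 because the change is measured relative to the starting value of 40.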

percentage - A standardized version of frequency that divides the number of cases in each category of a variable by the overall number of cases and multiplies by 100.

percentile - The position of any given case relative to the overall distribution for a variable, expressed in a percentage rank.
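
Conventions for computing a percentile rank vary. The sketch below assumes one common convention, matching the cumulative percentage entry in this glossary: the percentage of cases with values equal to or lower than the case's value. The function name and the scores are hypothetical.

```python
def percentile_rank(values, score):
    """Percentage of cases with a value equal to or lower than `score`."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)

# Hypothetical exam scores for ten cases.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 85))  # 70.0
```

Seven of the ten scores are 85 or lower, so under this convention a score of 85 sits at the 70th percentile.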

pie chart - A graphical representation of frequencies or percentages for each category of a nominal- or ordinal-level variable.

population - Every individual or case in a category of interest.

probability sample - A sample in which every member of the population has an equal probability of being selected for the sample and the selection of cases from the population is made randomly.

R

random sampling - A sampling method in which every member of a population has an equal probability of being selected for the sample.

rate - The frequency of an event or outcome relative to the number of times that the event or outcome could have occurred in a given group.

ratio - The size of one category relative to another.
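
The difference between a rate and a ratio is easiest to see in the arithmetic. The following Python sketch is illustrative only: the function names, the `per` base (rates are often reported per 1,000 or per 100,000), and the counts are assumptions, not taken from the book.

```python
def rate(events, possible, per=1000):
    """Events relative to the number of times the event could have occurred,
    scaled to a base such as 1,000."""
    return events * per / possible

def ratio(count_a, count_b):
    """Size of category A relative to category B."""
    return count_a / count_b

# Hypothetical counts: 30 events in a group of 12,000 people.
print(rate(30, 12000, per=1000))  # 2.5 events per 1,000 people

# Hypothetical counts: 120 cases in one category, 80 in another.
print(ratio(120, 80))  # 1.5, i.e. 1.5 cases in A for every case in B
```

A rate compares a count with the whole group that could have experienced the event, while a ratio compares two categories with each other.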

raw frequency - The number of cases in each category of a variable.

recoding - Combining or collapsing categories of a variable, using statistical software.
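
Recoding is normally done with a software command in Stata or SPSS. As a language-neutral illustration of the idea, the Python sketch below collapses a four-category item into two categories; the mapping, the function name, and the responses are hypothetical.

```python
# Hypothetical mapping from original categories to collapsed categories.
RECODE_MAP = {
    "strongly agree": "agree",
    "agree": "agree",
    "disagree": "disagree",
    "strongly disagree": "disagree",
}

def recode(values, mapping):
    """Return a new list with each value replaced by its collapsed category.

    A value missing from `mapping` raises KeyError, so unmapped
    categories are noticed rather than silently dropped.
    """
    return [mapping[v] for v in values]

original = ["strongly agree", "agree", "disagree", "agree", "strongly disagree"]
collapsed = recode(original, RECODE_MAP)
print(collapsed)  # ['agree', 'agree', 'disagree', 'agree', 'disagree']
```

The original list is left untouched, which mirrors the common practice of recoding into a new variable rather than overwriting the old one.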

relative frequency - The size of each response category relative to the overall number of cases; expressed as a percentage or proportion.

reliability - The extent to which the values of a variable are unaffected by the measurement process or instrument.

research question - A question, answerable with data, that asks how two or more variables are related.

S

sample - A group of individuals or cases drawn from the larger population of interest.

sampling frame - A full list of the members of a population.

scale - An ordinal-level variable that asks respondents to place themselves somewhere on a continuum, such as ranging from “strongly agree” to “strongly disagree.”

secondary data - Data that have been collected previously, usually by someone else and often for a purpose that differs from an individual researcher’s.

simple random sample - A method of sampling that starts with a list of all members of a population and randomly draws a desired number of them.

stem-and-leaf plot - A graphical representation of the distribution of an interval-ratio-level variable and the specific value for each case in the data set.

stratified random sample - A method of sampling that allows the researcher to randomly sample from subgroups in a population to ensure that the sample is representative of population subgroups that are of interest.
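
The contrast between a simple random sample and a stratified random sample can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the sampling frame of 100 students, and the choice to draw the same fraction from every stratum (proportional allocation), which is only one of several ways to stratify.

```python
import random
from collections import defaultdict

def simple_random_sample(frame, n, seed=None):
    """Randomly draw n members, without replacement, from the full list."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def stratified_random_sample(frame, stratum_of, fraction, seed=None):
    """Randomly draw the same fraction of members from each subgroup."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in frame:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # at least one case per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame: 60 first-year students and 40 seniors.
frame = [(i, "first-year" if i < 60 else "senior") for i in range(100)]

srs = simple_random_sample(frame, 10, seed=1)
strat = stratified_random_sample(frame, lambda m: m[1], 0.1, seed=1)

print(sum(1 for _, group in srs if group == "senior"))    # varies with the draw
print(sum(1 for _, group in strat if group == "senior"))  # always 4
```

The simple random sample may over- or under-represent seniors by chance, while the stratified sample contains exactly 6 first-year students and 4 seniors on every draw.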

T

time series chart - A graphical representation of the change in a variable over time.

U

unit of analysis - The object of study, either individuals or groups.

V

validity - The extent to which variables actually measure what they claim to measure.
