Descriptive research is a type of research that is used to describe the characteristics of a population. It collects data that are used to answer a wide range of what, when, and how questions pertaining to a particular population or group. For example, descriptive studies might be used to answer questions such as: What percentage of Head Start teachers have a bachelor's degree or higher? What is the average reading ability of 5-year-olds when they first enter kindergarten? What kinds of math activities are used in early childhood programs? When do children first receive regular child care from someone other than their parents? When are children with developmental disabilities first diagnosed and when do they first receive services? What factors do programs consider when making decisions about the type of assessments that will be used to assess the skills of the children in their programs? How do the types of services children receive from their early childhood program change as children age?
Descriptive research does not answer questions about why a certain phenomenon occurs or what the causes are. Answers to such questions are best obtained from randomized and quasi-experimental studies. However, data from descriptive studies can be used to examine the relationships (correlations) among variables. While the findings from correlational analyses are not evidence of causality, they can help to distinguish variables that may be important in explaining a phenomenon from those that are not. Thus, descriptive research is often used to generate hypotheses that should be tested using more rigorous designs.
A variety of data collection methods may be used alone or in combination to answer the types of questions guiding descriptive research. Some of the more common methods include surveys, interviews, observations, case studies, and portfolios. The data collected through these methods can be either quantitative or qualitative. Quantitative data are typically analyzed and presented using descriptive statistics. Using quantitative data, researchers may describe the characteristics of a sample or population in terms of percentages (e.g., percentage of the population that belongs to different racial/ethnic groups, percentage of low-income families that receive different government services) or averages (e.g., average household income, average scores on reading, mathematics, and language assessments). Qualitative data, such as narrative data collected as part of a case study, may be organized and classified to identify patterns of behaviors, attitudes, and other characteristics of groups.
Descriptive studies have an important role in early care and education research. Studies such as the National Survey of Early Care and Education and the National Household Education Surveys Program have greatly increased our knowledge of the supply of and demand for child care in the U.S. The Head Start Family and Child Experiences Survey and the Early Childhood Longitudinal Study Program have provided researchers, policy makers and practitioners with rich information about school readiness skills of children in the U.S.
Each of the methods used to collect descriptive data has its own strengths and limitations. The following are some of the strengths and limitations of descriptive research studies in general.
Researchers conduct experiments to study cause and effect relationships and to estimate the impact of child care and early childhood programs on children and their families. There are two basic types of experiments:
An experiment is a study in which the researcher manipulates the treatment, or intervention, and then measures the outcome. It addresses the question “if we change X (the treatment or intervention), what happens to Y (the outcome)?” Conducted both in the laboratory and in real life situations, experiments are powerful techniques for evaluating cause-and-effect relationships. The researcher may manipulate whether research subjects receive a treatment (e.g., attendance in a Head Start program: yes or no) or the level of treatment (e.g., hours per day in the program).
Suppose, for example, a group of researchers was interested in the effect of government-funded child care subsidies on maternal employment. They might hypothesize that the provision of government-subsidized child care would promote such employment. They could then design an experiment in which some mothers would be provided the option of government-funded child care subsidies and others would not. The researchers might also manipulate the value of the child care subsidies in order to determine if higher subsidy values might result in different levels of maternal employment.
The group of participants that receives the intervention or treatment is known as the "treatment group," and the group that does not is known as the “control group” in randomized experiments and “comparison group” in quasi-experiments.
The key distinction between randomized experiments and quasi-experiments is that in a randomized experiment, participants are randomly assigned to either the treatment or the control group, whereas in a quasi-experiment they are not.
Random assignment ensures that all participants have the same chance of being in a given experimental condition. Randomized experiments (also known as RCTs, or randomized controlled trials) are considered the most rigorous approach, or the “gold standard,” for identifying causal effects because they theoretically eliminate all preexisting differences between the treatment and control groups. However, some differences might still occur due to chance. In practice, therefore, researchers often control for observed characteristics that might differ between individuals in the treatment and control groups when estimating treatment effects. The use of control variables improves the precision of treatment effect estimates.
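As a simple sketch of how random assignment might be carried out in practice (the participant labels, group sizes, and seed below are invented for illustration):

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Twenty hypothetical participants
participants = [f"P{i:02d}" for i in range(1, 21)]

# Shuffle, then split in half: every participant has the same
# chance of landing in either experimental condition.
shuffled = participants[:]
random.shuffle(shuffled)
treatment_group = shuffled[:10]
control_group = shuffled[10:]

print(len(treatment_group), len(control_group))  # 10 10
```

Because the split is driven entirely by the shuffle, any preexisting differences between the two groups are due to chance alone.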
Despite being the “gold standard” in causal study design, randomized experiments are not common in social science research because it is often impossible or unethical to randomize individuals to experimental conditions. Cluster-randomized experiments, in which groups (e.g., schools or classes) rather than individuals are randomized, often encounter fewer ethical objections and are therefore more feasible in real life. They also prevent treatment spillover to the control group. For example, if students in the same class were randomly assigned to either the treatment or control group, with the treatment being a new curriculum, teachers might introduce features of the treatment (i.e., the new curriculum) when working with students in the control group in ways that could affect the outcomes.
One drawback of cluster-randomized experiments is a reduction in statistical power. That is, the likelihood that a true effect is detected is reduced with this design.
Quasi-experiments are characterized by the lack of randomized assignment. They may or may not have comparison groups. When there are both comparison and treatment groups in a quasi-experiment, the groups differ not only in terms of the experimental treatment they receive, but also in other, often unknown or unknowable, ways. As a result, there may be several "rival hypotheses" competing with the experimental manipulation as explanations for observed results.
There are a variety of quasi-experiments. Below are some of the most common types in social and policy research, arranged from weakest to strongest in their ability to address threats to the claim that the relationship between the treatment and the outcome of interest is causal.
One-group pretest-posttest
A single group that receives the treatment is observed at two time points, one before the treatment and one after the treatment. Changes in the outcome of interest are presumed to be the effect of the treatment. For example, a new fourth grade math curriculum is introduced and students' math achievement is assessed in the fall and spring of the school year. Improved scores on the assessment are attributed to the curriculum. The biggest weakness of this design is that a number of events can happen around the time of the treatment and influence the outcome. There can be multiple plausible alternative explanations for the observed results.
Interrupted time series
A single group that receives the treatment is observed at multiple time points both before and after the treatment. A change in the trend around the time of the treatment is presumed to be the treatment effect. For example, individuals participating in an exercise program might be weighed each week before and after a new exercise routine is introduced. A downward trend in their weight around the time the new routine was introduced would be seen as evidence of the effectiveness of the treatment. This design is stronger than one-group pretest-posttest because it shows the trend in the outcome variable both before and after the treatment instead of a simple two-point-in-time comparison. However, it still suffers from the same weakness: other events can happen at the time of the treatment and be alternative causes of the observed outcome.
Static-group comparison
A group that has experienced some treatment is compared with one that has not. Observed differences between the two groups are assumed to be the result of the treatment. For example, fourth graders in some classrooms in a school district are introduced to a new math curriculum while fourth graders in other classrooms in the district are not. Differences in the math scores of the two groups assessed in the spring of the school year only are assumed to be the result of the new curriculum. The weakness of this design is that the treatment and comparison groups may not be truly comparable because participants are not randomly assigned to the groups and there may be important differences in the characteristics and experiences of the groups, only some of which may be known. If the two groups differ in ways that affect the outcome of interest, the causal claim cannot be presumed.
Difference-in-differences
Both treatment and comparison groups are measured before and after the treatment. The difference between the two before-after differences is presumed to be the treatment effect. This design is an improvement on the static-group comparison because it compares outcomes measured both before and after the treatment is introduced rather than two post-treatment outcomes. For example, the fourth graders in the prior example are assessed in both the fall (pre-treatment) and spring (post-treatment). Differences in the fall-to-spring score changes between the two fourth grade groups are seen as evidence of the effect of the curriculum. Because the design compares changes within each group, the treatment and comparison groups do not have to be perfectly comparable. The biggest challenge for the researcher is to defend the parallel trend assumption, namely that the change in the treatment group would have been the same as the change in the comparison group in the absence of the treatment.
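The arithmetic behind a difference-in-differences estimate can be sketched in a few lines of Python. The fall and spring mean scores below are invented for the fourth-grade curriculum example:

```python
# Hypothetical fall (pre) and spring (post) mean math scores.
treat_pre, treat_post = 52.0, 70.0   # classrooms with the new curriculum
comp_pre, comp_post = 50.0, 62.0     # classrooms with the old curriculum

treat_change = treat_post - treat_pre   # change in the treatment group: 18.0
comp_change = comp_post - comp_pre      # change in the comparison group: 12.0

# The comparison group's change stands in for what would have happened to
# the treatment group without the treatment (the parallel trend assumption).
did_estimate = treat_change - comp_change
print(did_estimate)  # 6.0
```

Here the curriculum's estimated effect is the 6-point gain over and above the gain the comparison classrooms experienced anyway.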
Regression discontinuity
Participants are assigned to experimental conditions based on whether their scores are above or below a cut point for a quantitative variable. For example, students who score below 75 on a math test are assigned to the treatment group with the treatment being an intensive tutoring program. Those who score at or above 75 are assigned to the comparison group. The students who score just above or below the cut point are considered to be on average identical because their score differences are most likely due to chance. These students therefore act as if they were randomly assigned. The difference in the outcome of interest (e.g., math ability as measured by a different test after the treatment) between the students right around the cut point is presumed to be the treatment effect.
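A highly simplified sketch of the tutoring example follows; real regression discontinuity analyses fit regression lines on each side of the cut point, but a naive comparison of students within a narrow bandwidth around it conveys the idea. All scores below are invented:

```python
# Hypothetical (pretest score, later outcome) pairs.
students = [(60, 62), (70, 74), (73, 78), (74, 79),
            (76, 72), (77, 74), (80, 78), (90, 88)]

cutoff, bandwidth = 75, 3  # students scoring below 75 receive tutoring

# Compare only students just below vs. just above the cut point,
# who are assumed comparable apart from receiving the treatment.
tutored = [post for pre, post in students if cutoff - bandwidth <= pre < cutoff]
not_tutored = [post for pre, post in students if cutoff <= pre < cutoff + bandwidth]

effect = sum(tutored) / len(tutored) - sum(not_tutored) / len(not_tutored)
print(effect)  # 5.5
```

Students far from the cut point (the 60 and the 90) are excluded because they are unlikely to be comparable to students near it.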
Regression discontinuity is an alternative to randomized experiments when the latter design is not possible. It is the only recognized quasi-experimental design that meets the Institute of Education Sciences standards for establishing causal effects. Although considered to be a strong quasi-experimental design, it needs to meet certain conditions.
See the following for additional information on randomized and quasi-experimental designs.
An instrumental variable is a variable that is correlated with the independent variable of interest and only affects the dependent variable through that independent variable. The IV approach can be used in both randomized experiments and quasi-experiments.
In randomized experiments, the IV approach is used to estimate the effect of treatment receipt, which is different from treatment offer. Many social programs can only offer participants the treatment, or intervention, but not mandate them to use it. For example, parents are randomly assigned by way of lottery to a school voucher program. Those in the treatment group are offered vouchers to help pay for private school, but ultimately it is up to the parents to decide whether or not they will use the vouchers. If the researcher is interested in estimating the impact of voucher usage, namely the effect of treatment receipt, the IV approach is one way to do so. In this case, the IV is the treatment assignment status (e.g., a dummy variable with 1 being in the treatment group and 0 being in the control group), which is used to predict the probability of a parent using the voucher, which is in turn used as the independent variable of interest to estimate the effect of voucher usage.
In quasi-experiments, the IV approach is used to address the issue of endogeneity, namely that the treatment status is determined by participants themselves (self-selection) or by criteria established by the program designer (treatment selection). Endogeneity is an issue that plagues quasi-experiments and is often a source of threats to the causal claim. The IV approach can be used to tease out the causal impact of an endogenous variable on the outcome. For example, researchers used cigarette taxes as an instrumental variable to estimate the effect of maternal smoking on birth outcomes (Evans and Ringel, 1999). Cigarette taxes affect how much pregnant mothers smoke but not birth outcomes. They therefore meet the condition of being an IV, which correlates with the independent variable/treatment (i.e., maternal smoking habit) and only affects the dependent variable (i.e., birth outcomes) through that independent variable. The estimated effect is, strictly speaking, a local average treatment effect, namely the effect of treatment (maternal smoking) among those mothers affected by the IV (cigarette taxes). It does not include mothers whose smoking habit is not affected by the price of cigarettes (e.g., chain smokers who may be addicted to nicotine).
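With a binary instrument, the simplest IV estimate is the Wald estimator: the instrument's effect on the outcome divided by its effect on the treatment. The sketch below uses invented data loosely patterned on the smoking example (it is not from Evans and Ringel, 1999):

```python
# Invented illustration: z = 1 if the mother lives in a high-cigarette-tax
# state, x = 1 if she smoked during pregnancy, y = birth weight in pounds.
z = [1, 1, 1, 1, 0, 0, 0, 0]
x = [0, 0, 1, 1, 1, 1, 1, 0]
y = [7.5, 7.0, 6.0, 6.5, 6.0, 6.5, 5.5, 6.0]

def mean(values):
    return sum(values) / len(values)

x1 = mean([xi for zi, xi in zip(z, x) if zi == 1])  # smoking rate, high tax
x0 = mean([xi for zi, xi in zip(z, x) if zi == 0])  # smoking rate, low tax
y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])  # mean birth weight, high tax
y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])  # mean birth weight, low tax

# Wald estimator: the instrument's effect on the outcome, scaled by its
# effect on the treatment (a local average treatment effect).
late = (y1 - y0) / (x1 - x0)
print(late)  # -3.0
```

In this toy data, higher taxes lower the smoking rate (0.5 vs. 0.75) and raise mean birth weight (6.75 vs. 6.0), yielding an estimated effect of smoking of −3 pounds among mothers whose smoking responds to taxes.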
An instrumental variable needs to meet certain conditions to provide a consistent estimate of a causal effect.
See the following for additional information on instrumental variables.
The two types of validity are internal and external. It is often difficult to achieve both in social science research experiments.
One potential threat to internal validity in experiments occurs when participants either drop out of the study or refuse to participate in the study. If individuals with particular characteristics drop out or refuse to participate more often than individuals with other characteristics, this is called differential attrition. For example, suppose an experiment was conducted to assess the effects of a new reading curriculum on the reading achievement of 10th graders. Schools were randomly assigned to use the new curriculum in all classrooms (treatment schools) or to continue using their current curriculum (control schools). If many of the slowest readers in treatment schools left the study before it was completed (e.g., dropped out of school or transferred to a school in another state), schools with the new curriculum would experience an increase in average reading scores. The reason they experienced an increase in reading scores, however, is because weaker readers left the school, not because the new curriculum improved students' reading skills. The effects of the curriculum on the achievement of 10th graders might be overestimated if the control schools did not experience the same type of attrition.
For example, a study shows that a new curriculum improved reading comprehension of third-grade children in Iowa. To assess the study's external validity, the researcher would consider whether this new curriculum would also be effective with third graders in New York or with children in other elementary grades.
This section describes the statistics and methods used to describe the characteristics of the members of a sample or population, to explore the relationships between variables, to test research hypotheses, and to visually represent data. Terms relating to the topics covered are defined in the Research Glossary.
Descriptive statistics can be useful for two purposes:
The four most common descriptive statistics are:
One of the most basic ways of describing the characteristics of a sample or population is to classify its individual members into mutually exclusive categories and to count the number of cases in each of the categories. In research, variables with discrete, qualitative categories are called nominal or categorical variables. The categories can be given numerical codes, but they cannot be ranked, added, or multiplied. Examples of nominal variables include gender (male, female), preschool program attendance (yes, no), and race/ethnicity (White, African American, Hispanic, Asian, American Indian). Researchers calculate proportions, percentages and ratios in order to summarize the data from nominal or categorical variables and to allow for comparisons to be made between groups.
Proportion—The number of cases in a category divided by the total number of cases across all categories of a variable.
Percentage—The proportion multiplied by 100 (or the number of cases in a category divided by the total number of cases across all categories of a variable, times 100).
Ratio—The number of cases in one category to the number of cases in a second category.
A researcher selects a sample of 100 students from a Head Start program. The sample includes 20 White children, 30 African American children, 40 Hispanic children and 10 children of mixed-race/ethnicity.
Proportion of Hispanic children in the program = 40 / (20+30+40+10) = .40.
Percentage of Hispanic children in the program = .40 x 100 = 40%.
Ratio of Hispanic children to White children in the program = 40/20 = 2.0, or the ratio of Hispanic to White children enrolled in the Head Start program is 2 to 1.
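The Head Start example above can be reproduced in a few lines of Python, which makes the three formulas concrete:

```python
# Sample counts from the Head Start example.
counts = {"White": 20, "African American": 30, "Hispanic": 40, "Mixed": 10}
total = sum(counts.values())  # 100 children in the sample

# Proportion: cases in a category divided by total cases.
proportion_hispanic = counts["Hispanic"] / total            # 0.4
# Percentage: proportion times 100.
percentage_hispanic = proportion_hispanic * 100             # 40.0
# Ratio: cases in one category to cases in another.
ratio_hispanic_to_white = counts["Hispanic"] / counts["White"]  # 2.0

print(proportion_hispanic, percentage_hispanic, ratio_hispanic_to_white)
```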
Proportions, percentages and ratios are used to summarize the characteristics of a sample or population that fall into discrete categories. Measures of central tendency are the most basic and, often, the most informative description of a population's characteristics, when those characteristics are measured using an interval scale. The values of an interval variable are ordered where the distance between any two adjacent values is the same but the zero point is arbitrary. Values on an interval scale can be added and subtracted. Examples of interval scales or interval variables include household income, years of schooling, hours a child spends in child care and the cost of child care.
Measures of central tendency describe the "average" member of the sample or population of interest. There are three measures of central tendency:
Mean—The arithmetic average of the values of a variable. To calculate the mean, all the values of a variable are summed and divided by the total number of cases.
Median—The value within a set of values that divides the values in half (i.e. 50% of the variable's values lie above the median, and 50% lie below the median).
Mode—The value of a variable that occurs most often.
The annual incomes of five randomly selected people in the United States are $10,000, $10,000, $45,000, $60,000, and $1,000,000.
Mean Income = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = $225,000.
Median Income = $45,000.
Modal Income = $10,000.
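Python's standard library computes all three measures directly; the sketch below reproduces the income example:

```python
import statistics

# The five incomes from the example above.
incomes = [10_000, 10_000, 45_000, 60_000, 1_000_000]

print(statistics.mean(incomes))    # 225000  (sum / count)
print(statistics.median(incomes))  # 45000   (middle value)
print(statistics.mode(incomes))    # 10000   (most frequent value)
```

Note how the single $1,000,000 income pulls the mean far above the median, which is why the median is preferred for skewed distributions like income.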
The mean is the most commonly used measure of central tendency. Medians are generally used when a few values are extremely different from the rest of the values (this is called a skewed distribution). For example, the median income is often the best measure of the average income because, while most individuals earn between $0 and $200,000 annually, a handful of individuals earn millions.
Measures of dispersion provide information about the spread of a variable's values. There are three key measures of dispersion:
Range is simply the difference between the smallest and largest values in the data. Researchers often simply report the smallest and largest values themselves (e.g., 75–100).
Variance is a commonly used measure of dispersion, or how spread out a set of values is around the mean. It is calculated from the squared differences between each value and the mean (for a sample, the sum of squared differences is divided by n − 1 rather than n). The variance is the standard deviation squared.
Standard deviation, like variance, is a measure of the spread of a set of values around the mean of the values. The wider the spread, the greater the standard deviation and the greater the range of the values from their mean. A small standard deviation indicates that most of the values are close to the mean. A large standard deviation on the other hand indicates that the values are more spread out. The standard deviation is the square root of the variance.
Five randomly selected children were administered a standardized reading assessment. Their scores on the assessment were 50, 50, 60, 75, and 90, with a mean score of 65.
Range = 90 - 50 = 40.
Variance = [(50 − 65)² + (50 − 65)² + (60 − 65)² + (75 − 65)² + (90 − 65)²] / (5 − 1) = 1,200 / 4 = 300.
Standard Deviation = √300 ≈ 17.32.
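The same three measures can be computed with Python's standard library (`statistics.variance` and `statistics.stdev` use the sample formulas, dividing by n − 1):

```python
import statistics

scores = [50, 50, 60, 75, 90]

value_range = max(scores) - min(scores)        # 90 - 50 = 40
sample_variance = statistics.variance(scores)  # 300 (divides by n - 1)
sample_sd = statistics.stdev(scores)           # sqrt(300), about 17.32

print(value_range, sample_variance, round(sample_sd, 2))
```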
Skewness and Kurtosis
The range, variance and standard deviation are measures of dispersion and provide information about the spread of the values of a variable. Two additional measures provide information about the shape of the distribution of values.
Skewness measures whether some values of a variable are extremely different from the majority of the values; it refers to the tendency of the values of a variable to depart from symmetry. A distribution is symmetric if one half of the distribution is the mirror image of the other half. For example, the distribution of annual income in the U.S. is skewed because most people make between $0 and $200,000 a year, but a handful of people earn millions. A variable is positively skewed (skewed to the right) if the extreme values are higher than the majority of values. A variable is negatively skewed (skewed to the left) if the extreme values are lower than the majority of values. In the example of students' standardized test scores, the distribution is slightly positively skewed.
Kurtosis measures how outlier-prone a distribution is. Outliers are values of a variable that are much smaller or larger than most of the values found in a dataset. The kurtosis of a normal distribution is 3, so statistical packages typically report excess kurtosis (kurtosis minus 3), which is 0 for a normal distribution. If the excess kurtosis is different from 0, then the distribution produces outliers that are either more extreme (positive excess kurtosis) or less extreme (negative excess kurtosis) than those produced by the normal distribution.
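Both quantities can be computed by hand from the five reading scores used earlier; the sketch below uses the standard moment-based formulas (the same quantities statistical packages typically report):

```python
import math

scores = [50, 50, 60, 75, 90]
n = len(scores)
mean = sum(scores) / n                                    # 65.0
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)  # population SD

# Third standardized moment: positive values indicate a right (positive) skew.
skewness = sum((x - mean) ** 3 for x in scores) / n / sd ** 3

# Fourth standardized moment minus 3, so a normal distribution scores 0.
excess_kurtosis = sum((x - mean) ** 4 for x in scores) / n / sd ** 4 - 3

print(round(skewness, 2), round(excess_kurtosis, 2))  # 0.52 -1.26
```

The positive skewness confirms the earlier observation that the score distribution is slightly positively skewed, and the negative excess kurtosis indicates a distribution less outlier-prone than the normal.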
Measures of association indicate whether two variables are related. Two measures are commonly used:
Chi-Square test of independence is used to evaluate whether there is an association between two variables. (The chi-square test can also be used as a measure of goodness of fit, to test if data from a sample come from a population with a specific distribution, as an alternative to Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests.)
To test for associations, a chi-square is calculated in the following way: Suppose a researcher wants to know whether there is a relationship between gender and two types of jobs, construction worker and administrative assistant. To perform a chi-square test, the researcher counts the number of female administrative assistants, the number of female construction workers, the number of male administrative assistants, and the number of male construction workers in the data. These counts are compared with the number that would be expected in each category if there were no association between job type and gender (the expected count for each cell is its row total multiplied by its column total, divided by the overall total). The association between the two variables is determined to be significant (the null hypothesis is rejected) if the value of the chi-square test is greater than or equal to the critical value, found in a chi-square table, for a given significance level (typically .05) and the degrees of freedom associated with the test. The degrees of freedom for the chi-square are calculated using the following formula: df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns in a contingency or cross-tabulation table. For example, the critical value at the .05 level for a 2 x 2 table with 1 degree of freedom ([2 − 1][2 − 1] = 1) is 3.841.
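The calculation described above can be sketched in Python; the 2 x 2 counts below are invented for the gender-by-job-type example:

```python
# Hypothetical 2 x 2 counts.
#                 construction  admin
observed = [[60, 40],   # men
            [20, 80]]   # women

row_totals = [sum(row) for row in observed]        # [100, 100]
col_totals = [sum(col) for col in zip(*observed)]  # [80, 120]
grand_total = sum(row_totals)                      # 200

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if gender and job type were independent:
        # row total * column total / overall total.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (2 - 1) * (2 - 1)  # (rows - 1)(columns - 1) = 1
print(round(chi_square, 2), chi_square >= 3.841)  # 33.33 True
```

Because 33.33 exceeds the critical value of 3.841 for 1 degree of freedom at the .05 level, the null hypothesis of no association would be rejected for this invented data.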
Correlation coefficient is used to measure the strength and direction of the relationship between numeric variables (e.g., weight and height).
The findings reported by researchers are typically based on data collected from a single sample that was drawn from the population of interest (e.g., a sample of children selected from the population of children enrolled in Head Start or Early Head Start). If additional random samples of the same size were drawn from this population, the estimated percentages and means calculated using the data from each of these other samples might differ by chance somewhat from the estimates produced from one sample. Researchers use one of several tests to evaluate whether their findings are statistically significant.
Statistical significance refers to the probability or likelihood that the difference between groups or the relationship between variables observed in statistical analyses is not due to random chance (e.g., that differences between the average scores on a measure of language development between 3- and 4-year-olds are likely to be “real” rather than just observed in this sample by chance). If there is a very small probability that an observed difference or relationship is due to chance, the results are said to reach statistical significance. This means that the researcher concludes that there is a real difference between two groups or a real relationship between the observed variables.
Significance tests and the associated p-value only tell us how likely it is that a statistical result (e.g., a difference between the means of two or more groups, or a correlation between two variables) is due to chance. The p-value is the probability that the results of a statistical test are due to chance. In the social and behavioral sciences, a p-value less than or equal to .05 is usually interpreted to mean that the results are statistically significant (that the statistical results would occur by chance 5 times or fewer out of 100), although sometimes researchers use a p-value of .10 to indicate whether a result is statistically significant. The lower the p-value, the less likely a statistical result is due to chance. Lower p-values therefore provide a more rigorous criterion for concluding significance.
Researchers use a variety of approaches to test whether their findings are statistically significant. The choice depends on several factors, including the number of groups being compared, whether the groups are independent from one another, and the type of variables used in the analysis. Three of the more widely used tests, the t-test, the F-test, and the chi-square test, are described briefly below.
Chi-Square test is used when testing for associations between categorical variables (e.g., differences in whether a child has been diagnosed as having a cognitive disability by gender or race/ethnicity). It is also used as a goodness-of-fit test to determine whether data from a sample come from a population with a specific distribution.
t-test is used to compare the means of two independent samples (independent t-test), the means of one sample at different times (paired sample t-test) or the mean of one sample against a known mean (one sample t-test). For example, when comparing the mean assessment scores of boys and girls or the mean scores of 3- and 4-year-old children, an independent t-test would be used. When comparing the mean assessment scores of girls only at two time points (e.g., fall and spring of the program year), a paired t-test would be used. A one sample t-test would be used when comparing the mean scores of a sample of children to the mean score of a population of children. The t-test is appropriate for small sample sizes (less than 30), although it is often used when testing group differences for larger samples. It is also used to test whether correlation and regression coefficients are significantly different from zero.
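An independent t-test with a pooled variance can be computed by hand; the assessment scores below are invented for the boys-versus-girls example:

```python
import math
import statistics

# Hypothetical assessment scores for two independent groups.
boys = [48, 52, 55, 60, 65]
girls = [58, 62, 66, 70, 74]

n1, n2 = len(boys), len(girls)
mean1, mean2 = statistics.mean(boys), statistics.mean(girls)
var1, var2 = statistics.variance(boys), statistics.variance(girls)  # n - 1

# Pooled variance assumes the two groups share a common variance.
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_stat = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

# |t| = 2.43 exceeds the two-tailed .05 critical value of 2.306 for df = 8,
# so the group difference would be judged significant for this invented data.
print(round(t_stat, 2), df)  # -2.43 8
```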
F-test is an extension of the t-test and is used to compare the means of three or more independent samples (groups). The F-test is used in Analysis of Variance (ANOVA) to examine the ratio of the between groups to within groups variance. It is also used to test the significance of the total variance explained by a regression model with multiple independent variables.
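The between-groups-to-within-groups ratio at the heart of a one-way ANOVA can also be sketched directly; the three small groups below are invented:

```python
# Hypothetical scores for three independent groups.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)  # 3.0

# Between-groups sum of squares: how far each group mean sits
# from the grand mean, weighted by group size.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-groups sum of squares: spread of scores around their own group mean.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1               # 2
df_within = len(all_scores) - len(groups)  # 6

# F is the ratio of the two mean squares.
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f_stat)  # 3.0
```

For this invented data, F = 3.0 falls short of the .05 critical value of about 5.14 for (2, 6) degrees of freedom, so the group means would not be judged significantly different.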
Significance tests alone do not tell us anything about the size of the difference between groups or the strength of the association between variables. Because significance test results are sensitive to sample size, studies with different sample sizes with the same means and standard deviations would have different t statistics and p values. It is therefore important that researchers provide additional information about the size of the difference between groups or the association and whether the difference/association is substantively meaningful.
See the following for additional information about descriptive statistics and tests of significance:
There are several graphical and pictorial methods that enhance understanding of individual variables and the relationships between variables. Graphical and pictorial methods provide a visual representation of the data. Some of these methods include:
Researchers use different analytical techniques to examine complex relationships between variables. There are three basic types of analytical techniques:
Regression analysis assumes that the dependent, or outcome, variable is directly affected by one or more independent variables. There are four important types of regression analyses:
See the following for additional information about regression methods and models:
Grouping methods are techniques for classifying observations into meaningful categories. Two of the most common grouping methods are discriminant analysis and cluster analysis.
See the following for additional information about grouping methods:
Multiple equation modeling, which is an extension of regression, is used to examine the causal pathways from independent variables to the dependent variable. For example, what are the variables that link (or explain) the relationship between maternal education (independent variable) and children's early reading skills (dependent variable)? These variables might include the nature and quality of mother-child interactions or the frequency and quality of shared book reading.
There are two main types of multiple equation models:
Path analysis is an extension of multiple regression that allows researchers to examine multiple direct and indirect effects of a set of variables on a dependent, or outcome, variable. In path analysis, a direct effect measures the extent to which the dependent variable is influenced by an independent variable. An indirect effect measures the extent to which an independent variable's influence on the dependent variable is due to another variable.
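The decomposition of direct and indirect effects follows simple arithmetic: an indirect effect is the product of the path coefficients along its route, and the total effect is the direct effect plus the indirect effects. The standardized coefficients below are invented for the maternal education example:

```python
# Hypothetical standardized path coefficients:
# maternal education -> shared book reading -> early reading skills.
a = 0.50  # maternal education -> frequency of shared book reading
b = 0.40  # shared book reading -> children's early reading skills
c = 0.20  # direct path: maternal education -> early reading skills

indirect_effect = a * b              # effect routed through book reading
total_effect = c + indirect_effect   # direct plus indirect

print(indirect_effect, total_effect)  # 0.2 0.4
```

In this sketch, half of maternal education's total effect on early reading operates indirectly through shared book reading.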
Structural equation modeling (SEM)
Structural equation modeling expands path analysis by allowing for multiple indicators of unobserved (or latent) variables in the model. Latent variables are variables that are not directly observed (measured), but instead are inferred from other variables that are observed or directly measured. For example, children's school readiness is a latent variable with multiple indicators of children's development across multiple domains (e.g., children's scores on standardized assessments of early math, literacy, and language, and scores based on teacher reports of children's social skills and problem behaviors).
There are two parts to a SEM analysis. First, the measurement model is tested. This involves examining the relationships between the latent variables and their measures (indicators). Second, the structural model is tested in order to examine how the latent variables are related to one another. For example, a researcher might use SEM to investigate the relationships between different types of executive functions and word reading and reading comprehension for elementary school children. In this example, the latent variables word reading and reading comprehension might be inferred from a set of standardized reading assessments and the latent variables cognitive flexibility and inhibitory control from a set of executive function tasks. The measurement model of SEM allows the researcher to evaluate how well children's scores on the standardized reading assessments combine to identify children's word reading and reading comprehension. Assuming that the results of these analyses are acceptable, the researcher would move on to an evaluation of the structural model, examining the predicted relationships between two types of executive functions and two dimensions of reading.
SEM has several advantages over traditional path analysis:
See the following for additional information about multiple equation models: