The Research Connections website is undergoing renovations. All data and resources will continue to be available while we work on the website. Sign up for the Research Connections newsletter to receive the latest updates. You can also watch our video.

Related Information:

# Study Design and Analysis

## Descriptive Research Studies

Descriptive research is a type of research that is used to describe the characteristics of a population. It collects data that are used to answer a wide range of what, when, and how questions pertaining to a particular population or group. For example, descriptive studies might be used to answer questions such as: What percentage of Head Start teachers have a bachelor's degree or higher? What is the average reading ability of 5-year-olds when they first enter kindergarten? What kinds of math activities are used in early childhood programs? When do children first receive regular child care from someone other than their parents? When are children with developmental disabilities first diagnosed and when do they first receive services? What factors do programs consider when making decisions about the type of assessments that will be used to assess the skills of the children in their programs? How do the types of services children receive from their early childhood program change as children age?

Descriptive research does not answer questions about why a certain phenomenon occurs or what the causes are. Answers to such questions are best obtained from randomized and quasi-experimental studies. However, data from descriptive studies can be used to examine the relationships (correlations) among variables. While the findings from correlational analyses are not evidence of causality, they can help to distinguish variables that may be important in explaining a phenomenon from those that are not. Thus, descriptive research is often used to generate hypotheses that should be tested using more rigorous designs.

A variety of data collection methods may be used alone or in combination to answer the types of questions guiding descriptive research. Some of the more common methods include surveys, interviews, observations, case studies, and portfolios. The data collected through these methods can be either quantitative or qualitative. Quantitative data are typically analyzed and presenting using descriptive statistics. Using quantitative data, researchers may describe the characteristics of a sample or population in terms of percentages (e.g., percentage of population that belong to different racial/ethnic groups, percentage of low-income families that receive different government services) or averages (e.g., average household income, average scores of reading, mathematics and language assessments). Quantitative data, such as narrative data collected as part of a case study, may be used to organize, classify, and used to identify patterns of behaviors, attitudes, and other characteristics of groups.

Descriptive studies have an important role in early care and education research. Studies such as the National Survey of Early Care and Education and the National Household Education Surveys Program have greatly increased our knowledge of the supply of and demand for child care in the U.S. The Head Start Family and Child Experiences Survey and the Early Childhood Longitudinal Study Program have provided researchers, policy makers and practitioners with rich information about school readiness skills of children in the U.S.

Each of the methods used to collect descriptive data have their own strengths and limitations. The following are some of the strengths and limitations of descriptive research studies in general.

Strengths:

• Study participants are questioned or observed in a natural setting (e.g., their homes, child care or educational settings).
• Study data can be used to identify the prevalence of particular problems and the need for new or additional services to address these problems.
• Descriptive research may identify areas in need of additional research and relationships between variables that require future study. Descriptive research is often referred to as "hypothesis generating research."
• Depending on the data collection method used, descriptive studies can generate rich datasets on large and diverse samples.

Limitations:

• Descriptive studies cannot be used to establish cause and effect relationships.
• Respondents may not be truthful when answering survey questions or may give socially desirable responses.
• The choice and wording of questions on a questionnaire may influence the descriptive findings.
• Depending on the type and size of sample, the findings may not be generalizable or produce an accurate description of the population of interest.

## Causal Study Design

Researchers conduct experiments to study cause and effect relationships and to estimate the impact of child care and early childhood programs on children and their families. There are two basic types of experiments:

• Randomized experiments
• Quasi-experiments

An experiment is a study in which the researcher manipulates the treatment, or intervention, and then measures the outcome. It addresses the question “if we change X (the treatment or intervention), what happens to Y (the outcome)?” Conducted both in the laboratory and in real life situations, experiments are powerful techniques for evaluating cause-and-effect relationships. The researcher may manipulate whether research subjects receive a treatment (e.g., attendance in a Head Start program: yes or no) or the level of treatment (e.g., hours per day in the program).

Suppose, for example, a group of researchers was interested in the effect of government-funded child care subsidies on maternal employment. They might hypothesize that the provision of government-subsidized child care would promote such employment. They could then design an experiment in which some mothers would be provided the option of government-funded child care subsidies and others would not. The researchers might also manipulate the value of the child care subsidies in order to determine if higher subsidy values might result in different levels of maternal employment.

The group of participants that receives the intervention or treatment is known as the "treatment group," and the group that does not is known as the “control group” in randomized experiments and “comparison group” in quasi-experiments.

The key distinction between randomized experiments and quasi-experiments lies in the fact that in a randomized experiment, participants are randomly assigned to either the treatment or the control group whereas participants are not in a quasi-experiment.

### Randomized Experiments

Random assignment ensures that all participants have the same chance of being in a given experimental condition. Randomized experiments (also known as RCT or randomized control trials) are considered to be the most rigorous approach, or the “gold standard,” to identifying causal effects because they theoretically eliminate all preexisting differences between the treatment and control groups. However, some differences might occur due to chance. In practice, therefore, researchers often control for observed characteristics that might differ between individuals in the treatment and control groups when estimating treatment effects. The use of control variables improves the precision of treatment effect estimates.

#### Cluster-randomized experiments

Despite being the “gold standard” in causal study design, randomized experiments are not common in social science research because it is often impossible or unethical to randomize individuals to experimental conditions. Cluster-randomized experiments, in which groups (e.g., schools or classes) instead of individuals are randomized, often encounter less objections out of ethical concerns and therefore are more feasible in real life. They also prevent treatment spill over to the control group. For example, if students in the same class are randomly assigned to either the treatment or control group with the treatment being a new curriculum, teachers may introduce features of the treatment (i.e., new curriculum) when working with students in the control group in ways that might affect the outcomes.

One drawback of cluster-randomized experiments is a reduction in statistical power. That is, the likelihood that a true effect is detected is reduced with this design.

### Quasi-Experiments

Quasi-experiments are characterized by the lack of randomized assignment. They may or may not have comparison groups. When there are both comparison and treatment groups in a quasi-experiment, the groups differ not only in terms of the experimental treatment they receive, but also in other, often unknown or unknowable, ways. As a result, there may be several "rival hypotheses" competing with the experimental manipulation as explanations for observed results.

There are a variety of quasi-experiments. Below are some of the most common types in social and policy research, arranged in the order of weak to strong in terms of their capabilities of addressing threats to a statement that the relationship between the treatment and the outcome of interest is causal.

#### One group only

• One-group pretest-posttest
A single group that receives the treatment is observed at two time points, one before the treatment and one after the treatment. Changes in the outcome of interest are presumed to be the effect of the treatment. For example, a new fourth grade math curriculum is introduced and students' math achievement is assessed in the fall and spring of the school year. Improved scores on the assessment are attributed to the curriculum. The biggest weakness of this design is that a number of events can happen around the time of the treatment and influence the outcome. There can be multiple plausible alternative explanations for the observed results.

• Interrupted time series
A single group that receives the treatment is observed at multiple time points both before and after the treatment. A change in the trend around the time of the treatment is presumed to be the treatment effect. For example, individuals participating in an exercise program might be weighed each week before and after a new exercise routine is introduced. A downward trend in their weight around the time the new routine was introduced would be seen as evidence of the effectiveness of the treatment. This design is stronger than one-group pretest-posttest because it shows the trend in the outcome variable both before and after the treatment instead of a simple two-point-in-time comparison. However, it still suffers the same weakness that other events can happen at the time of the treatment and be the alternative causes of the observed outcome.

#### Two groups

• Static-group comparison
A group that has experienced some treatment is compared with one that has not. Observed differences between the two groups are assumed to be the result of the treatment. For example, fourth graders in some classrooms in a school district are introduced to a new math curriculum while fourth graders in other classrooms in the district are not. Differences in the math scores of the two groups assessed in the spring of the school year only are assumed to be the result of the new curriculum. The weakness of this design is that the treatment and comparison groups may not be truly comparable because participants are not randomly assigned to the groups and there may be important differences in the characteristics and experiences of the groups, only some of which may be known. If the two groups differ in ways that affect the outcome of interest, the causal claim cannot be presumed.

• Difference-in-differences
Both treatment and comparison groups are measured before and after the treatment. The difference between the two before-after differences is presumed to be the treatment effect. This design is an improvement of the static-group comparison because it compares outcomes that are measured both before and after the treatment is introduced instead of two post-treatment outcomes. For example, the fourth graders in the prior example are assessed in both the fall (pre-treatment) and spring (post-treatment). Differences in the fall-spring scores between the two fourth grade groups are seen as evidence of the effect of the curriculum. For this reason, the treatment and comparison groups in difference-in-differences do not have to be perfectly comparable. The biggest challenge for the researcher is to defend the parallel trend assumption, namely the change in the treatment group would be the same as the change in the comparison group in the absence of the treatment.

• Regression discontinuity
Participants are assigned to experimental conditions based on whether their scores are above or below a cut point for a quantitative variable. For example, students who score below 75 on a math test are assigned to the treatment group with the treatment being an intensive tutoring program. Those who score at or above 75 are assigned to the comparison group. The students who score just above or below the cut point are considered to be on average identical because their score differences are most likely due to chance. These students therefore act as if they were randomly assigned. The difference in the outcome of interest (e.g., math ability as measured by a different test after the treatment) between the students right around the cut point is presumed to be the treatment effect.

Regression discontinuity is an alternative to randomized experiments when the latter design is not possible. It is the only recognized quasi-experimental design that meets the Institute of Education Sciences standards for establishing causal effects. Although considered to be a strong quasi-experimental design, it needs to meet certain conditions.

### Resources

See the following for additional information on randomized and quasi-experimental designs.

### Instrumental Variables (IV) Approach

An instrumental variable is a variable that is correlated with the independent variable of interest and only affects the dependent variable through that independent variable. The IV approach can be used in both randomized experiments and quasi-experiments.

In randomized experiments, the IV approach is used to estimate the effect of treatment receipt, which is different from treatment offer. Many social programs can only offer participants the treatment, or intervention, but not mandate them to use it. For example, parents are randomly assigned by way of lottery to a school voucher program. Those in the treatment group are offered vouchers to help pay for private school, but ultimately it is up to the parents to decide whether or not they will use the vouchers. If the researcher is interested in estimating the impact of voucher usage, namely the effect of treatment receipt, the IV approach is one way to do so. In this case, the IV is the treatment assignment status (e.g., a dummy variable with 1 being in the treatment group and 0 being in the control group), which is used to predict the probability of a parent using the voucher, which is in turn used as the independent variable of interest to estimate the effect of voucher usage.

In quasi-experiments, the IV approach is used to address the issue of endogeneity, namely that the treatment status is determined by participants themselves (self-selection) or by criteria established by the program designer (treatment selection). Endogeneity is an issue that plagues quasi-experiments and often a source of threats to the causal claim. The IV approach can be used to tease out the causal impact of an endogenous variable on the outcome. For example, researchers used cigarette taxes as an instrumental variable to estimate the effect of maternal smoking on birth outcomes (Evans and Ringel, 1999). Cigarette taxes affect how much pregnant mothers smoke but not birth outcomes. They therefore meet the condition of being an IV, which correlates with the independent variable/treatment (i.e., maternal smoking habit) and only affects the dependent variable (i.e., birth outcomes) through that independent variable. The estimated effect is, strictly speaking, a local average treatment effect, namely the effect of treatment (maternal smoking) among those mothers affected by the IV (cigarette taxes). It does not include mothers whose smoking habit is not affected by the price of cigarettes (e.g., chain smokers who may be addicted to nicotine).

An instrumental variable needs to meet certain conditions to provide a consistent estimate of a causal effect.

#### Resources

See the following for additional information on instrumental variables.

### Validity of Results from Causal Designs

The two types of validity are internal and external. It is often difficult to achieve both in social science research experiments.

• Internal Validity
• Internal validity refers to the strength of evidence of a causal relationship between the treatment (e.g., child care subsidies) and the outcome (e.g., maternal employment).
• When subjects are randomly assigned to treatment or control groups, we can assume that the treatment caused the observed outcomes because the two groups should not have differed from one another at the start of the experiment.
• For example, take the child care subsidy example above. Since research subjects were randomly assigned to the treatment (child care subsidies available) and control (no child care subsidies available) groups, the two groups should not have differed at the outset of the study. If, after the intervention, mothers in the treatment group were more likely to be working, we can assume that the availability of child care subsidies promoted maternal employment.

One potential threat to internal validity in experiments occurs when participants either drop out of the study or refuse to participate in the study. If individuals with particular characteristics drop out or refuse to participate more often than individuals with other characteristics, this is called differential attrition. For example, suppose an experiment was conducted to assess the effects of a new reading curriculum on the reading achievement of 10th graders. Schools were randomly assigned to use the new curriculum in all classrooms (treatment schools) or to continue using their current curriculum (control schools). If many of the slowest readers in treatment schools left the study before it was completed (e.g., dropped out of school or transferred to a school in another state), schools with the new curriculum would experience an increase in the average reading scores. The reason they experienced an increase in reading scores, however, is because weaker readers left the school, not because the new curriculum improved students' reading skills. The effects of the curriculum on the achievement of 10th graders might be overestimated, if schools in the control schools did not experience the same type of attrition.

• External Validity
• External validity, or generalizability, is also of particular concern in social science experiments.
• It can be very difficult to generalize experimental results to groups that were not included in the study.
• Studies that randomly select participants from the most diverse and representative populations are more likely to have external validity.

For example, a study shows that a new curriculum improved reading comprehension of third-grade children in Iowa. To assess the study's external validity, the researcher would consider whether this new curriculum would also be effective with third graders in New York or with children in other elementary grades.

### Advantages and Disadvantages of Experimental and Quasi-Experimental Designs

#### Randomized experiments

• Advantages
• Yield the most accurate assessment of cause and effect.
• Typically have strong internal validity.
• Ensure that the treatment and control groups are truly comparable and that treatment status is not determined by participant characteristics that might influence the outcome.
• Disadvantages
• In social policy research, it can be impractical or unethical to conduct randomized experiments.
• They typically have limited external validity due to the fact that they often rely on volunteers and are implemented in a somewhat artificial experimental setting with a small number of participants.
• Despite being the “gold standard” for identifying causal impacts, they can also be faced with threats to internal validity such as attrition, contamination, cross-overs, and Hawthorne effects.

#### Quasi-experiments

• Advantages
• Often have stronger external validity than randomized experiments because they are typically implemented in real-world settings and on larger scale.
• May be more feasible than randomized experiments because they have fewer time and logistical constraints often associated with randomized experiments.
• Avoid the ethical concerns associated with random assignment.
• Are often less expensive than randomized experiments.
• Disadvantages
• They often have weaker internal validity than randomized experiments.
• The lack of randomized assignment means that the treatment and control groups may not be comparable and that treatment status may be driven by participant characteristics or other experiences that might influence the outcome.
• Conclusions about causality are less definitive than randomized experiments due to the lack of randomization and reduced internal validity.
• Despite having weaker internal validity, they are often the best option available when it is impractical or unethical to conduct randomized experiments.

## Data Analysis

Different statistics and methods used to describe the characteristics of the members of a sample or population, explore the relationships between variables, to test research hypotheses, and to visually represent data are described. Terms relating to the topics covered are defined in the Research Glossary.

• Descriptive Statistics
• Tests of Significance
• Graphical/Pictorial Methods
• Analytical Techniques

### Descriptive Statistics

Descriptive statistics can be useful for two purposes:

1. To provide basic information about the characteristics of a sample or population. These characteristics are represented by variables in a research study dataset.
2. To highlight potential relationships between these characteristics, or the relationships among the variables in the dataset.

The four most common descriptive statistics are:

#### Proportions, Percentages and Ratios

One of the most basic ways of describing the characteristics of a sample or population is to classify its individual members into mutually exclusive categories and counting the number of cases in each of the categories. In research, variables with discrete, qualitative categories are called nominal or categorical variables. The categories can be given numerical codes, but they cannot be ranked, added, or multiplied. Examples of nominal variables include gender (male, female), preschool program attendance (yes, no), and race/ethnicity (White, African American, Hispanic, Asian, American Indian). Researchers calculate proportions, percentages and ratios in order to summarize the data from nominal or categorical variables and to allow for comparisons to be made between groups.

Proportion—The number of cases in a category divided by the total number of cases across all categories of a variable.

Percentage—The proportion multiplied by 100 (or the number of cases in a category divided by the total number of cases across all categories of a value times 100).

Ratio—The number of cases in one category to the number of cases in a second category.

Example:

A researcher selects a sample of 100 students from a Head Start program. The sample includes 20 White children, 30 African American children, 40 Hispanic children and 10 children of mixed-race/ethnicity.

Proportion of Hispanic children in the program = 40 / (20+30+40+10) = .40.

Percentage of Hispanic children in the program = .40 x 100 = 40%.

Ratio of Hispanic children to White children in the program = 40/20 = 2.0, or the ratio of Hispanic to White children enrolled in the Head Start program is 2 to 1.

#### Measures of Central Tendency

Proportions, percentages and ratios are used to summarize the characteristics of a sample or population that fall into discrete categories. Measures of central tendency are the most basic and, often, the most informative description of a population's characteristics, when those characteristics are measured using an interval scale. The values of an interval variable are ordered where the distance between any two adjacent values is the same but the zero point is arbitrary. Values on an interval scale can be added and subtracted. Examples of interval scales or interval variables include household income, years of schooling, hours a child spends in child care and the cost of child care.

Measures of central tendency describe the "average" member of the sample or population of interest. There are three measures of central tendency:

Mean—The arithmetic average of the values of a variable. To calculate the mean, all the values of a variable are summed and divided by the total number of cases.

Median—The value within a set of values that divides the values in half (i.e. 50% of the variable's values lie above the median, and 50% lie below the median).

Mode—The value of a variable that occurs most often.

Example:

The annual incomes of five randomly selected people in the United States are \$10,000, \$10,000, \$45,000, \$60,000, and \$1,000,000.

Mean Income = (10,000 + 10,000 + 45,000 + 60,000 + 1,000,000) / 5 = \$225,000.

Median Income = \$45,000.

Modal Income = \$10,000.

The mean is the most commonly used measure of central tendency. Medians are generally used when a few values are extremely different from the rest of the values (this is called a skewed distribution). For example, the median income is often the best measure of the average income because, while most individuals earn between \$0 and \$200,000 annually, a handful of individuals earn millions.

#### Measures of Dispersion

Measures of dispersion provide information about the spread of a variable's values. There are three key measures of dispersion:

Range is simply the difference between the smallest and largest values in the data. Researchers often report simply the values of the range (e.g., 75 – 100).

Variance is a commonly used measure of dispersion, or how spread out a set of values are around the mean. It is calculated by taking the average of the squared differences between each value and the mean. The variance is the standard deviation squared.

Standard deviation, like variance, is a measure of the spread of a set of values around the mean of the values. The wider the spread, the greater the standard deviation and the greater the range of the values from their mean. A small standard deviation indicates that most of the values are close to the mean. A large standard deviation on the other hand indicates that the values are more spread out. The standard deviation is the square root of the variance.

Example:

Five randomly selected children were administered a standardized reading assessment. Their scores on the assessment were 50, 50, 60,75 and 90 with a mean score of 65.

Range = 90 - 50 = 40.

Variance = [(50 - 65)2 + (50 - 65)2 + (60 - 65)2 + (75 - 65)2 + (90 - 65)2] / 5 = 300.

Standard Deviation = Square Root (150,540,000,000) = 17.32.

Skewness and Kurtosis

The range, variance and standard deviation are measures of dispersion and provide information about the spread of the values of a variable. Two additional measures provide information about the shape of the distribution of values.

Skew is a measure of whether some values of a variable are extremely different from the majority of the values. Skewness refers to the tendency of the values of a variable to depart from symmetry. A distribution is symmetric if one half of the distribution is exactly equal to the other half. For example, the distribution of annual income in the U.S. is skewed because most people make between \$0 and \$200,000 a year, but a handful of people earn millions. A variable is positively skewed (skewed to the right) if the extreme values are higher than the majority of values. A variable is negatively skewed (skewed to the left) if the extreme values are lower than the majority of values. In the example of students' standardized test scores, the distribution is slightly positively skewed.

Kurtosis measures how outlier-prone a distribution is. Outliers are values of a variable that are much smaller or larger than most of the values found in a dataset. The kurtosis of a normal distribution is 0. If the kurtosis is different from 0, then the distribution produces outliers that are either more extreme (positive kurtosis) or less extreme (negative kurtosis) than are produced by the normal distribution.

#### Measures of Association

Measures of association indicate whether two variables are related. Two measures are commonly used:

• Chi-square test of independence
• Correlation

Chi-Square test of independence is used to evaluate whether there is an association between two variables. (The chi-square test can also be used as a measure of goodness of fit, to test if data from a sample come from a population with a specific distribution, as an alternative to Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests.)

• It is most often used with nominal data (i.e., data that are put into discrete categories: e.g., gender [male, female] and type of job [unskilled, semi-skilled, skilled]) to determine whether they are associated. However, it can also be used with ordinal data.
• Assumes that the samples being compared (e.g., males, females) are independent.
• Tests the null hypothesis of no difference between the two variables (i.e., type of job is not related to gender).

To test for associations, a chi-square is calculated in the following way: Suppose a researcher wants to know whether there is a relationship between gender and two types of jobs, construction worker and administrative assistant. To perform a chi-square test, the researcher counts the number of female administrative assistants, the number of female construction workers, the number of male administrative assistants, and the number of male construction workers in the data. These counts are compared with the number that would be expected in each category if there were no association between job type and gender (this expected count is based on statistical calculations). The association between the two variables is determined to be significant (the null hypothesis is rejected), if the value of the chi-square test is greater than or equal to the critical value for a given significance level (typically .05) and the degrees of freedom associated with the test found in a chi-square table. The degrees of freedom for the chi-square are calculated using the following formula: df = (r-1)(c-1) where r is the number of rows and c is the number of columns in a contingency or cross-tabulation table. For example, the critical value for a 2 x 2 table with 1 degree of freedom ([2-1][2-1]=1) is 3.841.

Correlation coefficient is used to measure the strength and direction of the relationship between numeric variables (e.g., weight and height).

• The most common correlation coefficient is the Pearson's product-moment correlation coefficient (or simply Pearson's r), which can range from -1 to +1.
• Values closer to 1 (either positive or negative) indicate that a stronger association exists between the two variables.
• A positive coefficient (values between 0 and 1) suggests that larger values of one of the variables are accompanied by larger values of the other variable. For example, height and weight are usually positively correlated because taller people tend to weigh more.
• A negative association (values between 0 and -1) suggests that larger values of one of the variables are accompanied by smaller values of the other variable. For example, age and hours slept per night are often negatively correlated because older people usually sleep fewer hours per night than younger people.

#### Tests of Significance

The findings reported by researchers are typically based on data collected from a single sample that was drawn from the population of interest (e.g., a sample of children selected from the population of children enrolled in Head Start or Early Head Start). If additional random samples of the same size were drawn from this population, the estimated percentages and means calculated using the data from each of these other samples might differ by chance somewhat from the estimates produced from one sample. Researchers use one of several tests to evaluate whether their findings are statistically significant.

Statistical significance refers to the probability or likelihood that the difference between groups or the relationship between variables observed in statistical analyses is not due to random chance (e.g., that differences between the average scores on a measure of language development between 3- and 4-year-olds are likely to be “real” rather than just observed in this sample by chance). If there is a very small probability that an observed difference or relationship is due to chance, the results are said to reach statistical significance. This means that the researcher concludes that there is a real difference between two groups or a real relationship between the observed variables.

Significance tests and the associated p- value only tell us how likely it is that a statistical result (e.g., a difference between the means of two or more groups, or a correlation between two variables) is due to chance. The p-value is the probability that the results of a statistical test are due to chance. In the social and behavioral sciences, a p-value less than or equal to .05 is usually interpreted to mean that the results are statistically significant (that the statistical results would occur by chance 5 times or fewer out of 100), although sometimes researchers use a p-value of .10 to indicate whether a result is statistically significant. The lower the p-value, the less likely a statistical result is due to chance. Lower p-values are therefore a more rigorous criteria for concluding significance.

Researchers use a variety of approaches to test whether their findings are statistically significant or not. The choice depends on several factors, including the number of groups being compared, whether the groups are independent from one another, and the type of variables used in the analysis. Three widely used tests are the t-test, F-test, and Chi-square test.

Three of the more widely used tests of statistical significance are described briefly below.

• Chi-Square test is used when testing for associations between categorical variables (e.g., differences in whether a child has been diagnosed as having a cognitive disability by gender or race/ethnicity). It is also used as a goodness-of-fit test to determine whether data from a sample come from a population with a specific distribution.

• t-test is used to compare the means of two independent samples (independent t-test), the means of one sample at different times (paired sample t-test) or the mean of one sample against a known mean (one sample t-test). For example, when comparing the mean assessment scores of boys and girls or the mean scores of 3- and 4-year-old children, an independent t-test would be used. When comparing the mean assessment scores of girls only at two time points (e.g., fall and spring of the program year) a paired t-test would be used. A one sample t-test would be used when comparing the mean scores of a sample of children to the mean score of a population of children. The t- test is appropriate for small sample sizes (less than 30) although it is often used when testing group differences for larger samples. It is also used to test whether correlation and regression coefficients are significantly different from zero.

• F-test is an extension of the t-test and is used to compare the means of three or more independent samples (groups). The F-test is used in Analysis of Variance (ANOVA) to examine the ratio of the between groups to within groups variance. It is also used to test the significance of the total variance explained by a regression model with multiple independent variables.

Significance tests alone do not tell us anything about the size of the difference between groups or the strength of the association between variables. Because significance test results are sensitive to sample size, studies with different sample sizes with the same means and standard deviations would have different t statistics and p values. It is therefore important that researchers provide additional information about the size of the difference between groups or the association and whether the difference/association is substantively meaningful.

#### Resources

See the following for additional information about descriptive statistics and tests of significance:

### Graphical/Pictorial Methods

There are several graphical and pictorial methods that enhance understanding of individual variables and the relationships between variables. Graphical and pictorial methods provide a visual representation of the data. Some of these methods include:

• Bar charts
• Pie charts
• Line graphs
• Scatter plots
• Geographical Information Systems (GIS)
• Sociograms

#### Bar charts

• Bar charts visually represent the frequencies or percentages with which different categories of a variable occur.
• Bar charts are most often used when describing the percentages of different groups with a specific characteristic. For example, the percentages of boys and girls who participate in team sports. However, they may also be used when describing averages such as the average boys and girls spend per week participating in team sports.
• Each category of a variable (e.g., gender [boys and girls], children's age [3, 4, and 5]) is displayed along the bottom (or horizontal or X axis) of a bar chart.
• The vertical axis (or Y axis) includes the values of the statistic on that the groups are being compared (e.g., percentage participating in team sports).
• A bar is drawn for each of the categories along the horizontal axis and the height of the bar corresponds to the frequency or percentage with which that value occurs.

#### Pie charts

• A pie chart (or a circle chart) is one of the most commonly used methods for graphically presenting statistical data.
• As its name suggests, it is a circular graphic, which is divided into slices to illustrate the proportion or percentage of a sample or population that belong to each of the categories of a variable.
• The size of each slice represents the proportion or percentage of the total sample or population with a specific characteristic (found in a specific category). For example, the percentage of children enrolled in Early Head Start who are members of different racial/ethnic groups would be represented by different slices with the size of each slice proportionate to the group's representation in the total population of children enrolled in the Early Head Start program.

#### Line graphs

• A line graph is a type of chart which displays information as a series of data points connected by a straight line.
• Line graphs are often used to show changes in a characteristic over time.
• It has an X-axis (horizontal axis) and a Y axis (vertical axis). The time segments of interest are displayed on the X-axis (e.g., years, months). The range of values that the characteristic of interest can take are displayed along the Y-axis (e.g., annual household income, mean years of schooling, average cost of child care). A data point is plotted coinciding with the value of the Y variable plotted for each of the values of the X variable, and a line is drawn connecting the points.

#### Scatter plots

• Scatter plots display the relationship between two quantitative or numeric variables by plotting one variable against the value of another variable
• The values of one of the two variables are displayed on the horizontal axis (x axis) and the values of the other variable are displayed on the vertical axis (y axis)
• Each person or subject in a study would receive one data point on the scatter plot that corresponds to his or her values on the two variables. For example, a scatter plot could be used to show the relationship between income and children's scores on a math assessment. A data point for each child in the study showing his or her math score and family income would be shown on the scatter plot. Thus, the number of data points would equal the total number of children in the study.

#### Geographic Information Systems (GIS)

• A Geographic Information System is computer software capable of capturing, storing, analyzing, and displaying geographically referenced information; that is, data identified according to location.
• Using a GIS program, a researcher can create a map to represent data relationships visually. For example, the National Center for Education Statistics creates maps showing the characteristics of school districts across the United States such as the percentage of children living in married couple households, median family incomes and percentage of population that speaks a language other than English. The data that are linked to school district location come from the American Community Survey.

#### Sociograms

• Display networks of relationships among variables, enabling researchers to identify the nature of relationships that would otherwise be too complex to conceptualize.
See the following for additional information about different graphic methods:

### Analytical Techniques

Researchers use different analytical techniques to examine complex relationships between variables. There are three basic types of analytical techniques:

• Regression Analysis
• Grouping Methods
• Multiple Equation Models

#### Regression Analysis

Regression analysis assumes that the dependent, or outcome, variable is directly affected by one or more independent variables. There are four important types of regression analyses:

1. Ordinary least squares (OLS) regression
• OLS regression (also known as linear regression) is used to determine the relationship between a dependent variable and one or more independent variables.
• OLS regression is used when the dependent variable is continuous. Continuous variables, in theory, can take on any value with a range. For example, family child care expenses, measured in dollars, is a continuous variable.
• Independent variables may be nominal, ordinal or continuous. Nominal variables, which are also referred to as categorical variables, have two or more non-numeric or qualitative categories. Examples of nominal variables are children's gender (male, female), their parents' marital status (single, married, separated, divorced), and the type of child care children receive (center-based, home-based care). Ordinal variables are similar to nominal variables except it is possible to order the categories and the order has meaning. For example, children's families’ socioeconomic status may be grouped as low, middle and high.
• When used to estimate the associations between two or more independent variables and a single dependent variable, it is called multiple linear regression.
• In multiple regression, the coefficient (i.e., standardized or unstandardized regression coefficient for each independent variable) tells you how much the dependent variable is expected to change when that independent variable increases by one, holding all the other independent variables constant.
2. Logistic regression
• Logistic regression (or logit regression) is a special form of regression analysis that is used to examine the associations between a set of independent or predictor variables and a dichotomous outcome variable. A dichotomous variable is a variable with only two possible values, e.g. child receives child care before or after the Head Start program day (yes, no).
• Like linear regression, the independent variables may be either interval, ordinal, or nominal. A researcher might use logistic regression to study the relationships between parental education, household income, and parental employment and whether children receive child care from someone other than their parents (receives nonparent care/does not receive nonparent care).
3. Hierarchical linear modeling (HLM)
• Used when data are nested. Nested data occur when several individuals belong to the same group under study. For example, in child care research, children enrolled in a center-based child care program are grouped into classrooms with several classrooms in a center. Thus, the children are nested within classrooms and classrooms are nested within centers.
• Allows researchers to determine the effects of characteristics for each level of nested data, classrooms and centers, on the outcome variables. HLM is also used to study growth (e.g., growth in children’s reading and math knowledge and skills over time).
4. Duration models
• Used to estimate the length of time before a given event occurs or the length of time spent in a state. For example, in child care policy research, duration models have been used to estimate the length of time that families receive child care subsidies.
• Sometimes referred to as survival analysis or event history analysis.

#### Resources

See the following for additional information about regression methods and models:

Grouping methods are techniques for classifying observations into meaningful categories. Two of the most common grouping methods are discriminant analysis and cluster analysis.

Discriminant analysis

• Identifies characteristics that distinguish between groups. For example, a researcher could use discriminant analysis to determine which characteristics identify families that seek child care subsidies and which identify families that do not.
• It is used when the dependent variable is a categorical variable (e.g., family receives child care subsidies [yes, no], child enrolled in family care [yes, no], type of child care child receives [relative care, non-relative care, center-based care]). The independent variables are interval variables (e.g., years of schooling, family income).

Cluster analysis

• Used to classify similar individuals together. It uses a set of measured variables to classify a sample of individuals (or organizations) into a number of groups such that individuals with similar values on the variables are placed in the same group. For example, cluster analysis would be used to group together parents who hold similar views of child care or children who are suspended from school.
• Its goal is to sort individuals into groups in such a way that individuals in the same group (cluster) are more similar to each other than to individuals in other groups.
• The variables used in cluster analysis may be nominal, ordinal or interval.

#### Resources

See the following for additional information about grouping methods:

• Review of Discriminant Function Analysis
• Review of Cluster Analysis

#### Multiple Equation Models

Multiple equation modeling, which is an extension of regression, is used to examine the causal pathways from independent variables to the dependent variable. For example, what are the variables that link (or explain) the relationship between maternal education (independent variable) and children's early reading skills (dependent variable)? These variables might include the nature and quality of mother-child interactions or the frequency and quality of shared book reading.

There are two main types of multiple equation models:

• Path analysis
• Structural equation modeling

Path analysis

Path analysis is an extension of multiple regression that allows researchers to examine multiple direct and indirect effects of a set of variables on a dependent, or outcome, variable. In path analysis, a direct effect measures the extent to which the dependent variable is influenced by an independent variable. An indirect effect measures the extent to which an independent variable's influence on the dependent variable is due to another variable.

• A path diagram is created that identifies the relationships (paths) between all the variables and the direction of the influence between them.
• The paths can run directly from an independent variable to a dependent variable (e.g., X→Y), or they can run indirectly from an independent variable, through an intermediary, or mediating, variable, to the dependent variable (e.g. X1→X2→Y).
• The paths in the model are tested to determine the relative importance of each.
• Because the relationships between variables in a path model can become complex, researchers often avoid labeling the variables in the model as independent and dependent variables. Instead, two types of variables are found in these models:
• Exogenous variables are not affected by other variables in the model. They have straight arrows emerging from them and not pointing to them.
• Endogenous variables are influenced by at least one other variable in the model. They have at least one straight arrow pointing to them.

Structural equation modeling (SEM)

Structural equation modeling expands path analysis by allowing for multiple indicators of unobserved (or latent) variables in the model. Latent variables are variables that are not directly observed (measured), but instead are inferred from other variables that are observed or directly measured. For example, children's school readiness is a latent variable with multiple indicators of children's development across multiple domains (e.g., children's scores on standardized assessments of early math and literacy, language, scores based on teacher reports of children's social skills and problem behaviors).

There are two parts to a SEM analysis. First, the measurement model is tested. This involves examining the relationships between the latent variables and their measures (indicators). Second, the structural model is tested in order to examine how the latent variables are related to one another. For example, a researcher might use SEM to investigate the relationships between different types of executive functions and word reading and reading comprehension for elementary school children. In this example, the latent variables word reading and reading comprehension might be inferred from a set of standardized reading assessments and the latent variables cognitive flexibility and inhibitory control from a set of executive function tasks. The measurement model of SEM allows the researcher to evaluate how well children's scores on the standardized reading assessments combine to identify children's word reading and reading comprehension. Assuming that the results of these analyses are acceptable, the researcher would move on to an evaluation of the structural model, examining the predicted relationships between two types of executive functions and two dimensions of reading.

SEM has several advantages over traditional path analysis:

• Use of multiple indicators for key variables reduces measurement error.
• Can test whether the effects of variables in the model and the relationships depicted in the entire model are the same for different groups (e.g., are the direct and indirect effects of parent investments on children's school readiness the same for White, Hispanic and African American children).
• Can test models with multiple dependent variables (e.g., models predicting several domains of child development).

See the following for additional information about multiple equation models: