First, what is a meta-analysis?
Meta-analyses aggregate a comparable, method-independent measure across many studies that aim to tap into the same underlying phenomenon. To achieve this, we go through three steps:
Step 1: Collect studies on a single phenomenon
Step 2: Convert study results into effect sizes and their variance
Step 3: Analyze
→ Is there an effect? (Or do results cluster around 0? Is publication bias present?)
→ Are there variables influencing the effect?
So for the first step, we conduct a (hopefully exhaustive) literature review. Meta-analyses are not bound by a specific narrative, but aim to incorporate all studies that tap into the same phenomenon.
In the second step, we express the outcome of a single experiment in a way that captures how big an effect is and how much it varies. A very common effect size measure is Cohen's d*, which is based on standardized mean differences. In a typical infant study, babies might hear two types of trials and the responses to each are compared. In most papers it is sufficient that the difference between the trial types reaches statistical significance, but in a meta-analysis we care about the size of this single observed effect and its variance.**
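As a concrete illustration, here is how Cohen's d and its sampling variance could be computed from the summary statistics of a two-group comparison. The numbers are invented looking-time data, and this sketch assumes two independent groups; within-subject designs (the same infant in both conditions) require a different variance formula that accounts for the correlation between conditions.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference between two independent groups,
    together with its approximate sampling variance."""
    # Pool the standard deviations of the two groups
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    # Common large-sample approximation to the variance of d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var_d

# Hypothetical looking times (seconds), 16 infants per trial type
d, var_d = cohens_d(m1=9.8, sd1=2.4, n1=16, m2=8.5, sd2=2.1, n2=16)
```

Each study in the meta-analysis contributes one such (d, variance) pair; smaller variance (typically from larger samples) means the study will carry more weight later on.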
The third step can be split into two parts. First, we want to know whether we are measuring something real. The weighted effect sizes taken together should be different from 0. Using effect sizes and their standard error, we can also check for publication biases.***
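To make the first part concrete, here is a minimal fixed-effect (inverse-variance weighted) pooling sketch with invented effect sizes. Real meta-analyses usually fit random-effects models with dedicated tools (e.g., the metafor package in R), so treat this as the simplest possible version of the idea:

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect pooled effect size: each study is weighted by the
    inverse of its variance, so precise studies count more."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    z = pooled / se  # test statistic against a true effect of 0
    return pooled, se, z

# Hypothetical effect sizes and variances from five studies
effects = [0.4, 0.1, 0.6, 0.3, 0.5]
variances = [0.05, 0.10, 0.08, 0.04, 0.12]
est, se, z = pooled_effect(effects, variances)
# |z| > 1.96 suggests the pooled effect differs from 0 at alpha = .05
```

The same ingredients (effect sizes and their standard errors) also feed into publication-bias diagnostics such as funnel plots, where small, imprecise studies with suspiciously large effects stand out.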
There might be factors changing effect sizes in a systematic way. In infant studies, age is a very good suspect. But we could look at almost anything that is frequently reported in papers.
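A simple way to probe such a moderator is a meta-regression: regress effect sizes on the candidate variable, weighting each study by its precision. The sketch below uses weighted least squares on made-up data with infant age as the moderator; full meta-regression models additionally estimate between-study heterogeneity, which is omitted here.

```python
import numpy as np

# Hypothetical studies: effect size, its variance, and mean age in months
d = np.array([0.2, 0.3, 0.5, 0.6, 0.8])
v = np.array([0.10, 0.08, 0.06, 0.05, 0.04])
age = np.array([3.0, 5.0, 7.0, 9.0, 12.0])

# Weighted least squares: weight each study by 1/variance
w = 1 / v
X = np.column_stack([np.ones_like(age), age])  # intercept + age
W = np.diag(w)
intercept, slope = np.linalg.solve(X.T @ W @ X, X.T @ W @ d)
# A positive slope suggests the effect grows with age
```

With real data, the slope (and its standard error) tells us whether, say, vowel discrimination reliably strengthens over the first year.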
Why should we bother?
We want to model the emergence and development of infant abilities, so a number of questions need to be answered: What can we put into models, what happens when, and how do we justify our decisions? Is it enough to find a study that shows an ability? As soon as the magical threshold of a significant p value is crossed, we can cite it, right?
Actually, this is not ideal, to put it mildly. A single study might be the 1 in 20 with a significant p value where no true effect is present. Or it only applies to a specific population (e.g., American English learning infants). Or the stimuli are very particular… In short, we might rely on studies that cannot be replicated. To know the underlying truth that we then can attempt to model, we would ideally draw our conclusions from a sample as large as possible (= a meta-analysis).
Further, an effect might be small or very variable, independent of the population and the stimuli. If we model an ability that is difficult to measure with our (admittedly often very imprecise) infant measures, can we really assume that it is robust and reliably emerges at a certain age?
In addition, specific factors could influence an effect. This might either be crucial to our theories (e.g., vowel discrimination depends on nativeness) or be an important nuisance factor that we should not just dismiss (the method used to measure an ability significantly affects effect sizes).
Finally, we can get a better estimate of what happens when, as meta-analyses cover a range of ages rather than one or, at best, a few age groups. In fact, meta-analyses offer a unique opportunity to observe developmental trajectories!
A case study: Infant vowel discrimination
A recent meta-analysis encompassing data from over 40 years of research looked at the emerging ability to discriminate sounds, with a focus on vowels. Commonly, we assume that the ability to distinguish two sounds of the native tongue, including vowels, improves as infants tune into their surrounding language. At the same time, the ability to tell apart two sounds (maybe also vowels?) from other languages should decrease. It is in fact the latter observation that lies at the core of many current approaches to early language acquisition.
While I did not author this meta-analysis, I can look at all the data, because everything is available and even updateable here: inphondb.acristia.org****
First of all, good news: the weighted observed effect sizes are different from 0. But we do have a publication bias, which is discussed in the paper.
The image***** shows that effect sizes for native and nonnative stimuli diverge, and that they differ by 6-7 months, an age typically cited as the onset of native-like vowel discrimination. But even more striking is the vast variability in effect sizes (plotted along the y axis) and their variances (indicated by point size; larger = lower variance). It seems unwise to base firm conclusions on any single one (or a small sample) of these points.
The meta-analysis uncovered a number of methodological factors that significantly influence results. A special report can be found on the website. This means that we have to be careful when interpreting studies, as our measures are very noisy and even differ in their ability to pick up phenomena.
Another case study on infant word segmentation from native speech would fill a whole blog post, but if you are already curious, take a look here: inworddb.acristia.org
What now?
The modeling community can take a few steps to benefit from meta-analyses and help promote them:
- First, be aware that such things exist (but not for all phenomena - yet!)
- Do not build your castle, erm, model (or assumptions therein) on a single experiment
- Instead, value replications (e.g., by citing them!)
- Hop on the meta-train! We can always use support in entering papers (which you read anyway for your model)
- Explore existing datasets that have been made available
- Get a better idea of developmental trajectories
* There are three groups of effect sizes: (1) effect sizes based on means, which includes Cohen's d; (2) effect sizes based on binary data; and (3) effect sizes based on correlations. Since most developmental studies in the lab compare mean responses of two groups, or of the same infant in two (or more) conditions, I am limiting this blog post to Cohen's d. A gentle introduction to effect sizes is provided in this chapter and the following ones.
** To get a feel for Cohen's d, I highly recommend playing with the visualization by RPsychologist.
*** In fact, there are many ways to check for publication bias in a meta-analysis, and there is an active discussion about which to use. Since this ventures into the territory of advanced meta-analysis, I am leaving the topic out here.
**** Submit your filedrawer studies! Or new findings!
This article and its reviews are distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and redistribution in any medium, provided that the original author and source are credited.