13 Psychological research basics

“A deeper understanding of judgements and choices also requires a richer vocabulary than is available in everyday language.”

— Daniel Kahneman, Thinking, Fast and Slow

“Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt.”
(“The limits of my language mean the limits of my world.”)

— Ludwig Wittgenstein, Tractatus Logico-Philosophicus

“Extraordinary claims require extraordinary evidence.”

— Carl Sagan, Broca’s Brain

In this lab, we will cover issues related to Chapters 1–3 and 5 in Beth Morling’s book. The first three chapters in Beth’s book are an introduction to scientific reasoning, and Chapter 5 focuses on identifying good measurement.

If you took A-level psychology, some of the content may repeat things you have already heard about. Nevertheless, I would encourage you to read these chapters carefully and compare their content to what you were told in school. Check whether there are things you hadn’t heard about before. Ask yourself whether what you read is consistent with what you previously learnt.

Of course, not everyone took A-level psychology, so another aim of these chapters is to get everyone on the same page with regard to psychological research basics. Please note that your knowledge of these basics will be tested in a summative quiz (see Chapter 15).

13.1 Empiricism

An important point made in Chapter 1 of Beth’s book is that psychologists are empiricists. In a nutshell, this means:

Important

Psychological reasoning must be based on data, not opinions or intuitions.

Just to be clear: There is nothing wrong with opinions or intuitions. On the contrary. However, they can only ever be the starting point of psychological research, not its endpoint. We need to test our opinions and intuitions empirically, that is, by conducting studies that generate data.

Also, not all data are the same. That is, studies can produce data that differ in quality. The quality of data (and the quality of associated data analyses) are issues we will return to repeatedly in this module.

Being an empiricist changes how you view the world. It means that you should not naïvely accept claims, but critically question them, using your own intellect. As Kant put it in 1784:

“Habe Mut, dich deines eigenen Verstandes zu bedienen.”
(“Have courage to use your own reason.”)

— Immanuel Kant, What Is Enlightenment?

Important

If you encounter a claim, here are some of the questions you should ask:

What does this actually mean?
Given what you already know, how credible is the claim?
What evidence is there to support the claim?

If there appears to be evidence to support the claim:

Is the explanation provided plausible? What could be alternative explanations for the effect? E.g., does the study have methodological shortcomings?
Is the effect statistically significant?
How big is the effect size?
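
To make the last question more concrete: one common way to express an effect size for the difference between two groups is a standardised mean difference (Cohen's d). The following is a minimal sketch, not a full analysis. The function name and the scores are invented for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardised mean difference between two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores on some outcome measure (invented, not real data).
treatment = [4, 5, 6, 5, 4, 6]
control = [6, 7, 6, 8, 7, 6]

d = cohens_d(treatment, control)
print(f"Cohen's d = {d:.2f}")
```

The sign of d only tells you which group scored higher. Its absolute size tells you how far apart the groups are, measured in standard deviations.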

13.2 Lab 2 Activities

The anti-anxiety patch

Here is a recent article about an “anti-anxiety sticker” published in the Guardian.

Activity

Read the article and discuss the claims made by the developers of the device with the student sitting next to you. What do you think about the device? How would you design a study to investigate whether or not it actually works?

Lab 2 practice quiz

Activity

You are again welcome to complete this activity together with the student sitting next to you. You may also use the remaining sections in this chapter to help you answer the quiz questions. If you have not read all of them by the end of this lab class, please read them in your own time.

Link to the Lab 2 practice quiz.

13.3 Measured and manipulated variables

A measured variable is one whose levels naturally occur and are simply observed and recorded by the researcher. A manipulated variable is a variable whose levels are controlled by the researcher.

Some variables can only be measured (e.g., height or intelligence), whereas others could be measured or manipulated. If, say, you are interested in the effect of caffeine consumption on exam performance, you could ask participants how much coffee they drank before the exam (measured) or you could assign them to different levels of caffeine intake before the exam (manipulated).
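
When a variable is manipulated, the researcher decides which participant gets which level, typically by random assignment. The sketch below shows one simple way this could be done for the caffeine example. The participant IDs, dose levels and the function name are hypothetical.

```python
import random


def assign_conditions(participants, conditions, seed=None):
    """Randomly assign participants to conditions with (near-)equal group sizes."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin across the conditions.
    return {p: conditions[i % len(conditions)] for i, p in enumerate(shuffled)}


participants = [f"P{i:02d}" for i in range(1, 13)]  # hypothetical participant IDs
caffeine_mg = [0, 100, 200]  # hypothetical levels of the manipulated variable

assignment = assign_conditions(participants, caffeine_mg, seed=42)
for participant in sorted(assignment):
    print(participant, assignment[participant])
```

If caffeine intake were measured instead, there would be no assignment step: you would simply record whatever each participant reports.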

13.4 Conceptual and operational definitions

Let us look at an example to illustrate the difference between conceptual and operational definitions. Neisser et al. (1996) proposed the following conceptual definition of intelligence: “[Intelligence is the] ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought.” (Neisser et al., 1996, p. 77)

That’s all very nice, you might say, but how exactly do I measure intelligence? This is exactly what operational definitions are about. They describe what you need to do (i.e., which operations to perform) to measure intelligence. Various answers to this question have been provided. One example is Raven’s Progressive Matrices:

Figure: Two examples similar to those used in Raven’s Progressive Matrices. Participants must identify the item that completes the overall pattern in a rule-based manner. From Little et al. (2014).

Here the operational definition involves presenting participants with complex patterns they have to complete. Working out the correct answer arguably involves reasoning and problem solving. Whether or not this is indeed an appropriate way to measure intelligence is a matter of validity (see below).

Note that conceptual and operational definitions are not independent. If, for example, your conceptual definition of intelligence suggests that visuospatial and verbal abilities are at least partially separable, your operational definition of intelligence must reflect this distinction. That is, your test must include items that are assumed to measure visuospatial abilities and items that are assumed to measure verbal abilities.

Operationalisation can take rather different forms, each with their own advantages and drawbacks. Imagine you wanted to investigate the frequency of texting while driving (your conceptual variable). You could simply ask people how often they text and drive. These data would be simple to collect, but might suffer from social desirability bias. Alternatively, you could hire research assistants to directly observe drivers in their cars. This would likely give you a better idea of the actual texting frequency in cars, but would be a very time-consuming and costly way of collecting the data. Finally, you could ask people how often their friends text and drive. You might hope that this approach removes some of the social desirability bias, while at the same time making the data easy to collect.

13.5 Types of claims

There are three prototypical types of claims that can be made in the context of psychological studies:

Frequency claims: The aim is to measure a single variable as accurately as possible. Example: 39% of teens admit to texting while driving.
Association claims: Investigate associations or correlations between variables. Example: Coffee consumption linked to lower depression in women.
Causal claims: Claim that manipulations of one variable (the independent variable or IV) causally influence another (the dependent variable or DV). Example: Spatial working memory training improves navigation skills.

13.6 Reliability and validity

Two key concepts for evaluating psychological research are reliability and validity. Reliability refers to how consistent a certain measurement is. Validity, on the other hand, refers to how well it measures what it is supposed to measure.

A good example to illustrate how the two are related is a set of kitchen scales. Imagine you weigh a 1 kg bag of flour on your kitchen scales ten times, and each time it tells you that the flour weighs 800 g. This measurement is highly reliable (because it is identical on every repeat), but it is not valid (because the actual weight is 1 kg).
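
The scales example can be expressed in a few lines. This is only an illustrative sketch using the numbers from the example. Here the spread of the readings stands in for (un)reliability and the systematic error stands in for (in)validity.

```python
from statistics import mean, stdev

true_weight_g = 1000        # the bag of flour actually weighs 1 kg
readings_g = [800] * 10     # ten repeated readings from the kitchen scales

spread = stdev(readings_g)                # no variation across repeats: consistent
bias = mean(readings_g) - true_weight_g   # systematic error: consistently wrong

print(f"Spread of readings: {spread} g")
print(f"Systematic error:   {bias} g")
```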

There are different approaches to measuring reliability:

Test-retest reliability: Can be used for experimental or questionnaire studies. The basic idea is to repeat the study after a delay with the same participants and to investigate how similarly the participants performed on both attempts.
Internal reliability/consistency: Typically used for questionnaires. The basic idea here is to split a questionnaire into different parts (e.g., two halves) and to investigate how well these parts correlate with each other.
Interrater reliability: Typically used where independent observers rate certain behaviours. The idea is to investigate how similar the ratings across observers are.
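
In practice, these comparisons are usually summarised with a correlation coefficient. The sketch below computes a Pearson correlation for a test-retest and a split-half (odd versus even items) comparison. All scores are invented, and the helper `pearson_r` is written out here for illustration rather than taken from any particular package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


# Test-retest: the same six (hypothetical) participants tested twice.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]
test_retest_r = pearson_r(time1, time2)

# Split-half: five (hypothetical) participants answering a six-item questionnaire.
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
]
odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6
split_half_r = pearson_r(odd_half, even_half)

print(f"Test-retest r = {test_retest_r:.2f}")
print(f"Split-half r  = {split_half_r:.2f}")
```

Interrater reliability could be examined in the same way by correlating the ratings of two observers, although other agreement statistics are also used for that purpose.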

Note that all types of reliability predict a high positive correlation if measurements are reliable:

Scatterplots showing examples of high test-retest reliability, internal reliability and interrater reliability.

Activity

In your own time: Can you explain what the data points in each of the plots represent? Once you have answered this question, ask an AI for the answer. You can do so by describing the plots or by uploading a screenshot of the plots. Do you and the AI agree? If you don’t agree and you’re not sure who’s right, let us know in the next lab!

In Chapter 3, Beth refers to four “big validities” (also see the overview on p. 135):

Construct validity: How well has the researcher defined and measured (frequency and association claims) or manipulated (causal claims) the variables of interest?
Statistical validity: How precise is our estimate and how big is the effect size?
External validity: How well do the results generalise to different people, times and places?
Internal validity: To what degree can we be sure that there are no alternative explanations for the results? (relevant for causal claims)

Reliability is a necessary, but not a sufficient condition for validity. It is necessary because a highly unreliable test cannot be valid. Put differently, the reliability of a test sets an upper limit for its validity. The intuition is that the highest possible correlation of a test is going to be with itself (a measure of reliability): no other test (a measure of validity) can be expected to correlate more highly with a test than the test does with itself. On the other hand, reliability is not sufficient, as demonstrated by the scales example: even a highly reliable measurement is not necessarily valid.
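
Classical test theory gives this upper limit a precise form. As a sketch (the symbols are introduced here for illustration and are not taken from Beth’s book): let $r_{xx}$ be the reliability of a test, $r_{yy}$ the reliability of the criterion it is validated against, and $r_{xy}$ the correlation between the two (the validity coefficient). Then:

```latex
r_{xy} \le \sqrt{r_{xx}\, r_{yy}} \le \sqrt{r_{xx}}
```

For example, a test with a reliability of 0.49 cannot be expected to correlate above 0.70 with any criterion, however good that criterion is. Note that the limit is the square root of the reliability rather than the reliability itself.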

References

Neisser, U., Boodoo, G., Bouchard, T. J., Jr., Boykin, A. W., Brody, N., Ceci, S. J., Halpern, D. F., Loehlin, J. C., Perloff, R., Sternberg, R. J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51(2), 77–101. https://doi.org/10.1037/0003-066X.51.2.77