Chapter 3 TV shows data exploration

Throughout your course it will be very important that you develop competencies that allow you to explore, analyse, visualise, understand and communicate quantitative data. This is often referred to as data literacy. This can also be really useful in our everyday lives. For example, as hard-working students, you might want to spend the time you can set aside for watching TV shows on only the best shows. These days, it’s not overly complicated to create an interactive table that allows you to explore TV shows available on Netflix, Amazon Prime and Disney+. You don’t need to concern yourself with the code that was used to create the table below, but in case you would like to have a look at it, you can click on the button that says “Show code”.

Feel free to play around with the table. Note that you can use the sliders to modify the shows that are displayed. You can also use the checkboxes to only display TV shows available on, for example, Netflix.2 Note how the number of rows (displayed below the table) changes when you move the sliders or tick the checkboxes. You can also search for a specific show by using the search box. Also note that you can sort the columns by clicking on the column headers. The default sorting is by Rotten Tomatoes ratings3, then by IMDb ratings. To reset the table sorting, reload the page.

# Packages used below: readr (read_csv), crosstalk (SharedData, bscols, filter_*), reactable (reactable, colDef)
library(readr)
library(crosstalk)
library(reactable)

# Read the TV shows data
df <- read_csv("assets/tv_shows.csv")

# Wrap the data so that the filter controls and the table share the same selection
df2 <- SharedData$new(df)

# Filter controls: sliders on the left, an empty spacer column, checkboxes on the right
bscols(
  widths = c(6, 1, 5),
  list(
    filter_slider("imdb", "IMDb rating", df2, column = ~IMDb, step = 0.1, width = "100%", min = 0, max = 10),
    filter_slider("rottom", "Rotten Tomatoes rating", df2, column = ~RottenTomatoes, step = 1, width = "100%", min = 0, max = 100),
    filter_slider("year", "Year", df2, column = ~Year, step = 1, width = "100%", sep = NULL)
  ),
  list(),
  list(
    filter_checkbox("netflix", "On Netflix", df2, ~Netflix, inline = TRUE),
    filter_checkbox("prime", "On Amazon Prime", df2, ~Prime, inline = TRUE),
    filter_checkbox("disney", "On Disney+", df2, ~Disney, inline = TRUE)
  )
)

# Interactive table, sorted by Rotten Tomatoes rating first and IMDb rating second
reactable(df2, defaultSorted = list(RottenTomatoes = "desc", IMDb = "desc"), searchable = TRUE,
          showPageSizeOptions = TRUE, pageSizeOptions = c(10, 25, 50), minRows = 10,
          paginationType = "jump", highlight = TRUE, resizable = TRUE, bordered = TRUE,
          style = list(fontSize = "14px"), columns = list(Title = colDef(minWidth = 140)))

Data source4
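If you are curious what the checkboxes and the default sorting do behind the scenes, the following is a minimal sketch in base R. It does not use the real data: the data frame `shows`, its titles and its values are invented for illustration, and it assumes that availability on a service is coded as 1 (available) or 0 (not available), which may differ from the actual file.

```r
# Hypothetical toy data; titles and values are made up for illustration only.
# Assumption: 1 = available on the service, 0 = not available.
shows <- data.frame(
  Title = c("Show A", "Show B", "Show C", "Show D"),
  IMDb = c(8.1, 9.0, 7.5, 8.8),
  RottenTomatoes = c(90, 95, 90, 80),
  Netflix = c(1, 1, 0, 1),
  Prime = c(1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Ticking two checkboxes combines the conditions with a logical AND:
# only shows available on Netflix AND on Amazon Prime remain.
both <- shows[shows$Netflix == 1 & shows$Prime == 1, ]

# Default sorting of the table: Rotten Tomatoes rating (descending) first,
# with IMDb rating (descending) used to break ties.
sorted <- shows[order(-shows$RottenTomatoes, -shows$IMDb), ]

print(both$Title)
print(sorted$Title)
```

In this toy example, "Show A" and "Show C" have the same Rotten Tomatoes rating, so their order is decided by the IMDb rating.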

An extremely important aspect of data literacy is asking critical questions about data. In the case of the TV shows data shown above, you might ask the following questions:

How many individual ratings are the overall ratings based on?

If an overall rating is based on only a few individual ratings, it might not be particularly reliable. As a result, the rating might change substantially as more individual ratings are added.
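To see why this matters, here is a small simulation sketch in R. It is purely illustrative and rests on simplifying assumptions that are not taken from the rating sites: individual ratings are drawn at random from the whole numbers 1 to 10, and the overall rating is a simple average. The function name `simulate_overall_rating` is made up for this example.

```r
set.seed(1)  # make the simulation repeatable

# Hypothetical helper: the overall rating of one show, computed as the
# simple mean of n randomly drawn individual ratings between 1 and 10.
simulate_overall_rating <- function(n) {
  mean(sample(1:10, size = n, replace = TRUE))
}

# Simulate 2000 shows that each received only 5 individual ratings ...
overall_few <- replicate(2000, simulate_overall_rating(5))

# ... and 2000 shows that each received 500 individual ratings.
overall_many <- replicate(2000, simulate_overall_rating(500))

# The spread of the overall ratings is much larger when they are based
# on only a few individual ratings.
print(sd(overall_few))
print(sd(overall_many))
```

Under these assumptions, overall ratings based on 5 individual ratings vary far more from show to show than those based on 500, even though all ratings come from the same source. That extra variability is what makes ratings based on few contributions less reliable.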

Who contributed the individual ratings?

For example, you might ask if it was critics (the basis for, e.g., Rotten Tomatoes’ Tomatometer) or members of the general public (the basis for, e.g., Rotten Tomatoes’ audience score and IMDb). This touches on issues of operationalisation (i.e., how should we measure the quality of a TV show or movie?).

How are the individual ratings combined to arrive at an overall rating?

For example, IMDb does not disclose how exactly individual ratings are weighted to arrive at an overall rating (see this article about IMDb ratings for more information). In scientific research, such an approach would not be acceptable as it lacks transparency and the results cannot be reproduced by other researchers.

Could the overall ratings be biased in any way?

For example, you might ask if people who rate TV shows on Rotten Tomatoes or IMDb are representative of the overall population. You can read about some of the potential biases at play. In addition, there might also be gender biases in the rating data. These issues threaten the validity of the ratings.

Could it be that Dark is completely overrated?

Yes, right? I’m really glad you also noticed this!5

Incidentally, reliability, operationalisation, reproducibility and validity are all key terms in psychological research and you will read much more about them in Beth Morling’s book (see next chapter).


  2. Note that the checkboxes are linked by logical ANDs. That is, selecting both Netflix and Amazon Prime will display only shows that are available on both Netflix and Amazon Prime.

  3. Note that these are Rotten Tomatoes’ audience scores, not the “Tomatometer” scores.

  4. Note that these data refer to shows available in the US. Thus, some shows available in the UK might not be listed, and some listed shows might not be available in the UK. What is more, which shows are available changes over time even within a country.

  5. This is of course a personal opinion and you’re most welcome to disagree!