This is AWESOME!
Jim Vallandingham
A Data Driven Exploration of Kung Fu Films
January 24th, 2017
Recently, I’ve been a bit caught up in old Kung Fu movies. Shirking any technical explorations, I have instead been diving head-first into any and all Netflix-accessible martial arts masterpieces from the 70’s and 80’s.
While I’ve definitely been enjoying the films, I realized that I had little context for what I was watching. I wondered if some films, like our latest favorite, Executioners from Shaolin, could be enjoyed even more with a better understanding of where they sit in the Kung Fu universe.
So, I began a data driven quest for truth and understanding (or at least a semi-interesting dataset to explore) of all Shaw Brothers Kung Fu movies ever made!
For those not dedicating some portion of their finite lives to these retro wonders, the Shaw Brothers Studio is the most famous (to me) Kung Fu film producer of all time. Their memorable title screen is almost always a part of my Kung Fu watching experience.

I figured this company’s entire martial arts collection would provide a consistent and thorough look at the genre. Fortunately, after a bit of searching, I stumbled on what appears to be a comprehensive list of Shaw Brothers Films. I decided to pull down details for each of these movies from Letterboxd, the amazingly useful movie-list-creation site, and explore them in a data driven way to see what patterns could be discovered and what context I could learn from those patterns.
So here is a bit of data exploration fun. The analysis is in R, using tips and tricks from Hadley Wickham’s wonderful new book, R for Data Science.
The full analysis code can be found in this R Notebook, which includes the code and graphs in an integrated format. And (spoilers!), the final Actor Collaboration Network and the rest of the code can be found on GitHub.
Come for the Kung Fu, stay for the word embedding and interactive networks!
#Shaw Brothers, Through The Ages
To get started, here is a look at the count of Shaw Brothers films by year.

I’m using the wonderful theme_fivethirtyeight for these charts. Someday, I’ll make my own.
That’s 260 films over 22 years.
The first Kung Fu Shaw Brothers film in this data set is Temple of the Red Lotus from 1965. From the reviews, it sounds like it was a bit rough around the edges - but that’s about what you would expect from this burgeoning genre.

Looks pretty sweet to me. I’ll have to check it out.
The studio hits its stride in the early 70’s, with a lull in the mid 70’s and another spike in the late 70’s / early 80’s. Keep in mind that even during the lull, most years the studio is still putting out 10 or more Kung Fu movies.
To create this graph, I first loaded my raw JSON file into R using the tidyjson package like this:
[QUOTE]# load the library
library(tidyjson)
# read the raw json as text
filename <- '../out/shaw.json'
shaw_json <- paste(readLines(filename), collapse = "")
# parse the json into a table, pulling out
# the variables we want to explore.
films <- shaw_json %>% as.tbl_json %>% gather_array %>%
  spread_values(
    title = jstring("title"),
    director = jstring("director"),
    year = jstring("year"),
    watches = jnumber("watches"),
    likes = jnumber("likes"),
    time = jnumber("time")
  )
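For anyone following along without the original file, here is a minimal sketch of the input shape the parsing step above appears to assume: a top-level JSON array with one object per film. The two records are made-up placeholders, not real Letterboxd data; only the field names come from the code above.

```r
library(tidyjson)

# Hypothetical stand-in for ../out/shaw.json: an array of film objects.
# Titles, directors and numbers are placeholders, not real data.
sample_json <- '[
  {"title": "Film A", "director": "Director X", "year": "1965",
   "watches": 120, "likes": 30, "time": 90},
  {"title": "Film B", "director": "Director Y", "year": "1978",
   "watches": 450, "likes": 110, "time": 101}
]'

# Same pipeline as above, applied to the inline string:
# one row per array element, one column per extracted field.
sample_films <- sample_json %>% as.tbl_json %>% gather_array %>%
  spread_values(
    title = jstring("title"),
    year = jstring("year"),
    watches = jnumber("watches")
  )
```

Each array element becomes one row, so the real file should yield one row per film.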
I then graphed count by year using ggplot:
# ggplot2 provides ggplot(), aes(), geom_bar() and labs()
library(ggplot2)
films %>% ggplot(aes(x = year)) +
  geom_bar() +
  labs(title = 'Shaw Bros Films by Year')
Not too shabby.
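The headline figures (number of films and the span of years) can be derived from the same table. Here is a rough sketch, using a tiny made-up `films` table in place of the real one. Note that `year` was parsed with `jstring`, so it has to be converted to a number before doing arithmetic on it.

```r
library(dplyr)

# Placeholder rows standing in for the real films table.
films <- tibble(
  title = c("Film A", "Film B", "Film C"),
  year  = c("1965", "1972", "1986")   # strings, as parsed by jstring()
)

summary_stats <- films %>%
  mutate(year = as.integer(year)) %>%
  summarise(
    n_films    = n(),
    first_year = min(year),
    last_year  = max(year),
    n_years    = last_year - first_year + 1   # inclusive span of years
  )
```

The `+ 1` makes the span inclusive, so a run from 1965 through 1986 counts as 22 years.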
#Which Shaw Brothers Film should I watch?
If you are just getting started with the Kung Fu classics, 260 movies can be difficult to wade through. How do you get to the best of the best to make your initial experience in this genre a pleasant one?
Well, we can use the Letterboxd “watches” and “likes” metrics to help winnow down to the films that are the best bang-for-your-buck.
As you might expect, these two metrics are highly correlated:

Basically, anything with more than 400 watches or 100 likes seems like a good place to start. The standout, with over 800 watches, is 1984’s Eight Diagram Pole Fighter. Not the catchiest title, but as one reviewer puts it:
Some of the raddest fights from any Shaw Bros films I’ve seen (specially that last one).
I haven’t seen this one yet, so I can’t comment - but it’s definitely on my list!
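The rule of thumb above can be expressed as a simple filter, and the correlation between the two metrics can be checked with `cor()`. This is only a sketch with made-up numbers, since the real Letterboxd values are not reproduced here.

```r
library(dplyr)

# Placeholder data; the watches and likes values are invented for illustration.
films <- tibble(
  title   = c("Film A", "Film B", "Film C", "Film D"),
  watches = c(50, 150, 450, 820),
  likes   = c(10, 40, 90, 260)
)

# Pearson correlation between the two popularity metrics
watch_like_cor <- cor(films$watches, films$likes)

# Films clearing either threshold: more than 400 watches or more than 100 likes
starters <- films %>%
  filter(watches > 400 | likes > 100) %>%
  arrange(desc(watches))
```

Using "or" rather than "and" keeps films that are widely watched but less often liked, which matches the loose wording of the rule.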
#Prolific Directors
We have the director for each movie in our dataset, so let’s see if there are any popular standouts.

I’d say! Chang Cheh directed 67 films, or roughly 26% of all Shaw Brothers Kung Fu movies!
According to his Wikipedia page, he was known as “The Godfather of Hong Kong cinema”, and rightly so - at least in terms of quantity.
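The `by_director` table used in the code that follows is not defined in this excerpt. One plausible way to build it, offered as an assumption rather than the author’s actual code, is to count films per director and sort from most to fewest. The rows below are placeholders.

```r
library(dplyr)

# Placeholder rows standing in for the real films table.
films <- tibble(
  title    = c("Film A", "Film B", "Film C", "Film D"),
  director = c("Director X", "Director X", "Director Y", "Director X")
)

by_director <- films %>%
  count(director, sort = TRUE) %>%        # one row per director, largest first
  mutate(pct = round(100 * n / sum(n)))   # share of all films, in percent
```

With the article’s figures, `round(100 * 67 / 260)` gives 26, matching the "roughly 26%" quoted above.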
Let’s pull out the top 5 directors, in terms of movie count, and see when they were most active.
Here’s the R code:
# dplyr supplies filter() and mutate(); ggthemes supplies theme_fivethirtyeight()
library(dplyr)
library(ggthemes)
# pull out just the top 5 directors
# (assumes by_director is sorted by film count, largest first)
top_directors <- by_director %>% head(n = 5)
# filter films to those directed by these titans of Kung Fu
films_top_director <- films %>% filter(director %in% top_directors$director)
# add a label to distinguish top directors from everyone else
films_top_director_all <- films %>% mutate(director_label = ifelse(director %in% top_directors$director, director, 'Other'))
# graph
films_top_director_all %>%
  ggplot(aes(x = year)) +
  geom_bar(aes(fill = director_label)) +
  labs(title = 'Shaw Bros Director Count by Year', fill = '') +
  theme_fivethirtyeight()
and plot:

We can kind of see that Chang Cheh’s reign is towards the beginning of the Shaw Brothers timeline and tapers towards the end. Let’s view the same data as a percentage of the total movies made each year:
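Here is a sketch of how such a per-year share might be computed, again on placeholder data. The column names follow the code above; the share calculation itself is an assumption about the approach, not the original code.

```r
library(dplyr)

# Placeholder rows: one row per film, labeled as in films_top_director_all.
films_labeled <- tibble(
  year = c("1970", "1970", "1970", "1970", "1980", "1980"),
  director_label = c("Director X", "Director X", "Director X", "Other",
                     "Director Y", "Other")
)

share_by_year <- films_labeled %>%
  count(year, director_label) %>%   # films per director label per year
  group_by(year) %>%
  mutate(share = n / sum(n)) %>%    # fraction of that year's films
  ungroup()
```

In ggplot2, `geom_bar(position = "fill")` draws the same normalized stacked bars directly from the raw rows, without the explicit share column.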

This shows how dominant Chang Cheh was, directing nearly half of Shaw Brothers films in some years. In the mid and later years, Chor Yuen came in to direct many films as well.[/QUOTE]
continued next post