More on the Empirical Film Study Data Set and the StoryAlity Movie Story Study Methodology

I would like to begin today’s StoryAlity post with a quote from philosopher of science Thomas Kuhn’s `The Structure of Scientific Revolutions’ (the updated 2012 edition of the 1962 classic work first published by the University of Chicago Press, and for which the University of Chicago Press rarely gets enough recognition, but which I am trying to redress by mentioning, right now):

`Three classes of problems – determination of significant fact, matching of facts with theory, and articulation of theory – exhaust, I think, the literature of normal science, both empirical and theoretical.’

(Kuhn and Hacking 2012: 34)

So – before we move on to the articulation of theory – let’s yet again look closely at the doctoral research study data set itself:

The Top 20 ROI Films:

The Top 20 Audience Reach/Budget Films of the Last 70 Years. Data Source: The-Numbers.com. Analysis: JT Velikovsky


And – the Bottom 20 ROI films:

Bottom 20 ROI Film List


This way, we are “comparing apples with – `bad apples’” (as opposed to `oranges’, or even `bad oranges’):

The Film ROI (return on investment) bell curve


We may ask – Why such a small data sample? Why only `two times twenty’ = 40 feature films?

Why not a data set composed of 1000 films – as Simonton (rightly) suggests in the (excellent) Great Flicks: Scientific Studies of Cinematic Creativity and Aesthetics (Simonton 2011: 107-08)?

Well, for one thing, as Simonton both indicates and estimates, a study of 1000 feature films – of about 2 hours running time each – would take 10 people about 5 years; and – I’m just one doctoral researcher. It took me 6 solid months to analyze the 40 feature films that I have analyzed in depth, for about 50 different `story and filmic elements’.

1000 films would take longer than my remaining lifetime, unless an age-extension drug based on telomerase comes on the market sometime soon at an affordable price (noting that a 2009 Nobel Prize was awarded for the discovery of telomerase). (I love science.*)

Bottom ROI left tail of the bell curve

For another thing, I hereby contend that: we learn much more by looking at the `tails’ (the far right, and the far left) of the data in the Gaussian bell-curve, anyway. The examples we observe are therefore more extreme; the differences stand out much more – in very stark relief, in fact.
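The `tails’ sampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the film titles and ROI figures below are invented placeholders (not values from The-Numbers.com), and `select_tails` is a hypothetical helper name, not something used in the study itself.

```python
def select_tails(films, n):
    """Return (top_n, bottom_n) from a list of (title, roi) pairs, ranked by ROI."""
    # Rank from highest ROI to lowest.
    ranked = sorted(films, key=lambda film: film[1], reverse=True)
    if 2 * n > len(ranked):
        raise ValueError("n is too large: the two tails would overlap")
    # First n entries = far right tail; last n entries = far left tail.
    return ranked[:n], ranked[-n:]


# Invented placeholder data: (title, ROI as a percentage of budget).
films = [
    ("Film A", 5000.0), ("Film B", 120.0), ("Film C", -95.0),
    ("Film D", 30.0), ("Film E", 900.0), ("Film F", -60.0),
]

top, bottom = select_tails(films, 2)
print("Right tail:", top)
print("Left tail:", bottom)
```

In the study itself, n would be 20, and the middle of the distribution is deliberately left out of the comparison.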

As a comparative proof of this, I would point directly to the three (admittedly, problematic) studies undertaken by the Marketing Department of The Wharton School, at the University of Pennsylvania: one study published in 2007, one in 2010, one in 2014:

The 2007 Wharton School paper – Eliashberg, J, Hui, SK & Zhang, ZJ 2007, ‘From Story Line to Box Office: A New Approach for Green-Lighting Movie Scripts’, Management Science, vol. 53, no. 6, pp. 881-93.

The 2010 Wharton School paper – Eliashberg, J, Hui, SK & Zhang, ZJ 2010, ‘Green-lighting Movie Scripts: Revenue Forecasting and Risk Management’.

The 2014 Wharton School paper: Eliashberg, J., Hui, S. K., & Zhang, Z. J. (2014). Assessing Box Office Performance Using Movie Scripts: A Kernel-based Approach, IEEE Transactions on Knowledge and Data Engineering, 26(11), 2639-2648. doi: 10.1109/TKDE.2014.2306681

In this blog post there is not sufficient space for a detailed analytic critique of these 3 papers (I must save that for a later post).

However some (but not all) of the key problematic points, in brief, are these:

1) The studies appear to have been conducted by researchers with expertise in Marketing, not extensive training in Story/Screenplay analysis, and are outside a Humanities discipline (by which I mean: Film Story, Literature, Narrative).

Purely from my own humble, limited and idiosyncratic perspective: I have been a professional Story Analyst for 20 years, for major film studios, for private film production companies, for government film funding bodies and for other arts organizations (such as the National Writers Guild, and the National Council for the Arts). Prior to this I studied Screenwriting full-time for 5 years at undergraduate (Communications discipline) and postgraduate (the National Film School) level, with a 6th year at postgraduate (doctoral) level, and I have also taught screenwriting at undergraduate and postgraduate level. I can say sincerely that Screenwriting for Feature Films is a recondite field.

Screenwriting is deeply complex, and I would suggest (as per various peer-reviewed creativity studies) that it takes on average 10 years to master the domain of Feature Film Screenwriting; which is to say, 10 years just to learn all aspects of: Premise, Plot, Theme, Character, Structure, Dialog, and Commercial Potential. (There are vastly more aspects than this, as film includes about 10 sub-domains that influence screenwriting (cinematography, sound, music, acting, etc.), but these 7 areas alone take roughly 10 years to master, and to understand in the necessary depth.) I would also estimate that there are approximately 1000 `rules’ to writing a screenplay (for details, please see the free summary of 100 Screenwriting Texts I have published, here).

However, the criteria for film story analysis used by the 2 above Wharton School studies omit many of these factors. This is not to criticize the Wharton School’s knowledge of the Marketing Discipline in any way; merely to point out that Marketing and Film Narrative are very distinct and mostly-separate Domains.

2) The first two Wharton School studies involved computer analysis (genre/content, bag-of-words, and semantics) of the words in `spoilers’ (plot synopses) provided by audiences on the internet, not by trained film story analysts. The third paper (2014) uses scripts, but again there are problems with its analysis of words.

Speaking as a sometime-consultant for the Singularity Institute for Artificial Intelligence (http://singularity.org/) – this method is not the same as: a trained human examining, comprehending and analyzing the data. Machine intelligence is not yet strong artificial intelligence. There are (subtextual) meanings and details within a film story (that emerge `above and beyond’ the text) that this Wharton School methodology (a textual analysis by a computer) cannot look for, nor recognize.

Gestalt Pac-Men


A good way to convey this concept is the image above: in both cases a computer will detect 4 Pac-Men, but a human observer will also detect an (invisible, yet present, emergent, `subtextual’) square in the image on the left.

This is an epistemological problem: a problem of `perceived meanings’. A textual computer analysis does not recognize emergent meanings in a film story, taken from a plot synopsis/`spoiler’.

Also – how do we know that the meaning/interpretation provided in the plot spoiler is the most commonly perceived meaning (by the majority of a film’s audience) of the film, and its story events?

(I note that – we can of course ask the same question of/level the same criticism at, my own interpretation of the `meanings’ of the 40 film stories and story events. To this I would say: please watch the 40 films of the StoryAlity data set for yourself, and closely analyze my own research findings, comparing them to your own findings and interpretations of the data, and see if you agree/are convinced that I am correct. i.e. Please check, and verify my work. If indeed the StoryAlity Theory is falsified, I will be forced to recant. Either way, on the bright side, the science of story will have progressed significantly, compared with the current unscientific, `pre-paradigm’ state of the screenwriting domain.)

3) The criteria in the film stories analyzed are not sufficient to make for meaningful results, for screenwriters or story analysts/executives. The findings of the Wharton School studies are `obvious’ in one sense, given the current screenwriting convention. The findings simply reinforce the existing screenwriting convention, as composed of the general points of consensus in the major screenplay `guru’ screenwriting manuals. And yet – the current `screenwriting convention’ itself already results in 7 in 10 films losing money, and in 98% of screenplays being rejected. Therefore, if either of these things is to change anytime soon (and make life easier for the entire film industry, but for screenwriters and filmmakers especially), then a new (scientific and conceptual) paradigm, an entirely new (different, alternative) way of looking at “What in fact empirically makes a film story successful”, is required.

StoryAlity Spirality


For example – this doctoral StoryAlity study of the Top 20 ROI films reveals:

1) all are Villain Triumphant Stories,

2) are `Villain protagonist’ stories,

3) all have a two-part structure,

4) all have no Character Arcs,

5) they are predominantly temporally Linear,

6) they are predominantly set in `the Present Day’,


7) they all have common (`primal’) Themes of: `Survival, Reproduction and Revenge’.

These characteristics are contrary to (or, are notably absent from) the current `screenwriting convention’.

Yet the Wharton School studies neither look for – nor find – that these above-mentioned, arguably counter-intuitive High-ROI film story characteristics are likely to increase the predicted ROI of a film.

Moreover, the characteristics that are identified as likely to increase the ROI of a film story (i.e. that are shown to empirically correlate with a high ROI) by the two Wharton School studies are also characteristics that are found in the Bottom 20 ROI films.

– This is similar to “the 3-Act structure” problem, in that, if films that have 3 “Acts” are in the Top 20 – and yet also the Bottom 20 ROI films, then how is “3-Act structure” in itself useful, in increasing the likelihood of ROI? – It is not a variable; it is a constant. (Every film ever made, whether profitable or not, can be demonstrated to have 3 “acts”, or, a `beginning, middle and end’. – In fact, logically, if a story does not have a beginning middle and end, it is not actually a `complete’ story.)

Bottom ROI left tail of the bell curve

If films with 3 `Acts’ are in the top ROI films – yet are also in the Bottom ROI films, how is 3 Acts a variable? – It is instead a constant, and is therefore not useful for screenwriters/filmmakers to consider. All film stories can be shown to have 3 `Acts’, or: a Beginning, Middle and an End. (If it does not, then it is not a film story.)
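The `constant, not a variable’ argument above can be made concrete with a small sketch. The data here is an invented toy example, not the study’s findings: `three_acts` stands for a feature present in every film, `feature_x` is a purely hypothetical feature that differs between the groups, and the function names are my own assumptions.

```python
def prevalence(films, feature):
    """Fraction of films (each represented as a set of feature names) with the feature."""
    return sum(1 for f in films if feature in f) / len(films)


def prevalence_gap(top, bottom, feature):
    """Difference in prevalence between the top and bottom groups.

    A gap of 0.0 means the feature cannot tell the two groups apart.
    """
    return prevalence(top, feature) - prevalence(bottom, feature)


# Invented toy data: 20 'top' films and 20 'bottom' films.
top = [{"three_acts", "feature_x"} for _ in range(20)]
bottom = ([{"three_acts", "feature_x"} for _ in range(5)]
          + [{"three_acts"} for _ in range(15)])

print(prevalence_gap(top, bottom, "three_acts"))  # present in every film of both groups
print(prevalence_gap(top, bottom, "feature_x"))   # present in 20/20 vs 5/20 films
```

A feature found in every film of both groups yields a gap of zero, so knowing that a film has it tells us nothing about which tail the film belongs to; only a feature whose prevalence differs between the groups is a candidate variable.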

Therefore, although the Wharton School’s Marketing researchers are clearly to be commended on their work, and their ingenuity and efforts in aiming to find story characteristics that might perhaps correlate with higher ROI, given that the methodology and the research paradigm utilized are themselves `conventional’, the actual results/findings of the studies are neither demonstrably effective nor practically useful for screenwriters/filmmakers.

The studies reveal no new knowledge; they merely confirm the `old knowledge’ that currently and consistently (for at least the past 20 years: see Vogel 1990) has resulted (and continues to result) in 7 in 10 films losing money.

This problem (this lack/gap in knowledge) is exactly what this StoryAlity study aims to address.

So, happily – 40 films (two sets of 20) is in fact a great start, and at any rate is vastly superior to the absent methodology of almost all of the other screenplay manuals (such as McKee in Story (1997), choosing films that are simply “illustrations of points made in the text” (McKee, Story (1997), Notes on the text). That is not Science.)

So – the empirical evidence itself (the top and bottom 20 ROI films) is clear. Although there are only two `contrasting data sets’ of twenty films each to study, there would certainly appear to be the promise of success. At this stage of our investigation, this is an optimal scenario; the sky is the limit.


Galileo Galilei shortly before he accidentally stabbed himself in the heel with a protractor

Self-Critique: Some possible flaws and defects in the StoryAlity Doctoral Research Study Methodology

Some possible flaws and defects in the research methodology include that: the box office figures may be inaccurate, as they are estimates (film distributors can be unreliable with their reporting); there may be researcher bias; some concepts may be accidentally conflated; and that the theoretical model of how the film system works (using Creative Practice Theory and Creative Practice Theory Narratology) may well be problematic.

However, the evidence to date would appear to contradict certain guidelines provided by the eight screenplay manuals. Given the notions of falsifiability, refutability, and testability (Popper 1963: 36) this enquiry may falsify some – or all – of these eight dominant screenplay paradigms.

These tentative results of the empirical research study suggest that the 8 major/dominant US screenwriting “paradigms” are problematic, and this may contribute to an understanding of why 7 in 10 feature films currently lose money, assuming that many or even all of these films are based on these story/screenplay structural paradigms and conventional screenwriting and film production methodology.

In The Structure of Scientific Revolutions (2012), Kuhn also states:

`The success of a paradigm – whether Aristotle’s analysis of motion, Ptolemy’s computations of planetary positions, Lavoisier’s application of the balance, or Maxwell’s mathematization of the electromagnetic field – is at the start largely a promise of success discoverable in selected and still incomplete examples’

(Kuhn and Hacking 2012: 24)

So, admittedly, what we now have, as a result of this doctoral research study of 40 films, are: incomplete examples.

The only way to have complete examples would be, to in fact do a study of the story elements in all films that have ever made an ROI (at the theatrical box office), compared to, all films that did not make a return on investment.

In other words, comparing all films ever released that have been profitable – to – all films ever made (not just released) that have been unprofitable. For in determining the probability of predicted film story ROI, we need to also consider the `silent initial population‘: all the films that are made, that never even obtain a theatrical cinema release (assuming that was the filmmakers’ goal, and it is not always the case), and, whose existence therefore possibly remains unknown and in fact, untraceable.

Nassim Taleb points to the reason to include the initial silent population, in his paper, The Roots of Unfairness: the Black Swan in Arts and Literature (Taleb 2004):

`So in addition to the preceding cognitive bias, there prevails an information-theoretic one as well, related to the limitations of the information at hand –and the neglect of silent evidence.

Consider the thousands of writers now completely vanished from consciousness: their record did not enter analyses. We do not see the tons of rejected manuscripts because these have never been published, or the profile of actors who never won an audition –therefore we cannot analyze their attributes.

To understand successes, the study of traits in failure need to be present. For instance some traits that seem to explain millionaires, like appetite for risk, only appear because one does not study bankruptcies. If one includes bankrupt people in the sample, then risk-taking would not appear to be a valid factor explaining success. 

Any form of analysis of art that does not take into account the silent initial population becomes close to pure verbiage.’

(Taleb 2004)

Part of the problem there, however, is: given that 98% of screenplays get rejected (and/or do not get produced), if there are 500,000 films in existence, these represent just 2% of all screenplays. That implies about 25,000,000 screenplays in total (500,000 ÷ 0.02), of which about 24,500,000 are/were unproduced. How do we find, let alone find time to analyze the problems with, 24,500,000 unproduced feature film screenplays? The key point being: why do films get produced? Because the investors/financiers feel the screenplay will be profitable, and make an ROI. How do we determine what screenplays will make an ROI?

Apparently, nobody in the entire world really knows, or – if they do – they are not telling*. That’s why I did this doctoral research study: in order to share this knowledge with screenwriters and filmmakers worldwide.

Given that Vogel estimates there are approximately 500,000 feature films (Vogel 2011: 102), and given that, on average, 7 in 10 films are unprofitable (Vogel 2011: 71), this would suggest that there would be about 350,000 feature films in the `Loss On Investment’ category (i.e. 70% of 500,000 films), and 150,000 films that have been profitable. Of course – this is an estimate.

In fact writing in 2004, De Vany and Walls suggest that the number is closer to 78%:

`Seventy-eight percent of movies lose money and only 22% are profitable.’

(De Vany and Walls 2004: 1039)
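The two loss-rate estimates above can be compared with a short calculation. This is a rough sketch only: it assumes that De Vany and Walls’ 78% figure can be applied to Vogel’s estimate of 500,000 films (neither source makes that combination), and `split_by_loss_rate` is a hypothetical helper name.

```python
TOTAL_FILMS = 500_000  # Vogel's estimate of feature films in existence


def split_by_loss_rate(total, loss_percent):
    """Split a film count into (unprofitable, profitable) for a whole-number loss percentage."""
    unprofitable = total * loss_percent // 100
    return unprofitable, total - unprofitable


for source, pct in (("Vogel 2011", 70), ("De Vany and Walls 2004", 78)):
    lost, made = split_by_loss_rate(TOTAL_FILMS, pct)
    print(f"{source}: {lost:,} unprofitable, {made:,} profitable")
```

On these assumptions, moving from a 70% to a 78% loss rate shifts 40,000 films from the profitable to the unprofitable category.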

As for Taleb’s `silent initial population’, I must therefore create an analogy.

What this doctoral research study does is: look at the subset of films that were a) made and b) obtained a theatrical cinema release. – It does not examine the additional films that were made but did not obtain a theatrical release. The analogy is: Let us study the top 20 and bottom 20 Olympic athletes of the past 70 years, and note the differences in their bio-socio-cultural characteristics. We are not examining the athletes who failed to qualify for the Olympics. That would be a separate study on: Why certain athletes failed to qualify for the Olympics.

In drawing this post on the Methodology of the empirical and scientific doctoral research to a close, one more quote from Kuhn:

`From Tycho Brahe to E. O. Lawrence, some scientists have acquired great reputations, not from any novelty of their discoveries, but from the precision, reliability and scope of the methods they developed for the redetermination of a previously known sort of fact.’

(Kuhn and Hacking 2012: 26)

And I am now going to suggest something shocking, given that in my previous posts (all 38 of them) I have argued very forcefully that we should abandon Aristotle.


Aristotle mugs for the camera as he thinks about old stuff

I in fact still think we should definitely abandon his ideas for film (and the fact that we haven’t is deeply problematic for the screenwriting convention, and is likely one reason why 7 in 10 films lose money) – but I want to note that Aristotle was right about a great many things, with regard to ancient Greek drama. (And certainly made incredible and amazing contributions to Philosophy and the Sciences, and world culture in general).

But – again to be clear – most of those points in `Poetics’ do not apply to the dramatic principles of feature films.

However – there is one thing that Aristotle suggested (in Poetics) – about what he deemed “good” ancient Greek tragedies – that I believe also clearly applies to (is empirically evident in) the Top 20 ROI films; that they have 2 structural parts: the `tying’ (desis) and the `untying’ (lusis). (Aristotle et al. 1997: 115)

The `tying’ includes the backstory and everything that happens up to the moment the hero’s fortunes change (given that in Ancient Greek tragedy, by definition, something terrible befalls the hero); and the `untying’ is everything after it.

This 2-part structure is a common element in the structure of the Top 20 ROI films – and is not a common element in all the Bottom 20 ROI films.

In fact, the Top 20 ROI film stories are structured like ancient Greek tragedies; in 2 parts.

Aristotle's Big Idea

Aristotle’s Two-Part  Structure

(Note that this certainly – and obviously – does not mean `3 Acts’, and I still maintain that “Aristotelian 3-Act structure” is a misconception, and that it is not useful to suggest stories have a `beginning, middle and end’, as – if they do not – they are not a story, and, moreover – the bottom 20 ROI films also have: a beginning, middle and end.)

But what it does mean is – this StoryAlity doctoral study has in fact potentially achieved this:

…the precision, reliability and scope of… methods… developed for the redetermination of a previously known sort of fact.

(Kuhn and Hacking 2012: 26)

This assumes that: what Aristotle prescribed for the structure of `good’ ancient Greek drama (i.e. 2-part story structure) is a fact that – possibly by coincidence – also applies to narrative fiction feature films.

This screenplay is pretty hot

A really hot screenplay

And so, to return to `the burning question’:

What is the sole reason a film succeeds?

Is it: the Story?

Given the results of this StoryAlity empirical and scientific doctoral research study on the Top 20 and Bottom 20 ROI films – presently, in 2012, I am convinced that indeed it is The Story alone.

…Thoughts, Comments, Feedback?


JT Velikovsky

High-RoI Story/Screenplay/Movie and Transmedia Researcher

The above is (mostly) an adapted excerpt, from my doctoral thesis: “Communication, Creativity and Consilience in Cinema”. It is presented here for the benefit of fellow screenwriting, filmmaking and creativity researchers. For more, see https://aftrs.academia.edu/JTVelikovsky

JT Velikovsky is also a produced feature film screenwriter and million-selling transmedia writer-director-producer. He has been a professional story analyst for major film studios, film funding organizations, and for the national writer’s guild. For more see: http://on-writering.blogspot.com/



My thanks to Gill Leahy at UTS, for the 2006 Malcolm Gladwell New Yorker article: `The Formula’.

* (In fact, I should qualify that statement: possibly the UK-based company Epagogix may know; yet they are a private company, that charges money to predict the ROI of a shooting screenplay. This service is not useful to writers setting out to conceive and write a screenplay, nor, to independent filmmakers who cannot afford that company’s services. I also cannot verify how accurate their findings/predictions are. It seems deeply problematic that their system determines ROI in part based on locations in the film: Would their system have predicted The #1 ROI film Paranormal Activity (2009), which is set entirely in a house? In fact, are any of the locations in the top 20 ROI films remarkable or `exotic’? I would suggest that perhaps none of them are.)

There have been two noteworthy articles about Epagogix:

1) A 2006 New Yorker article “The Formula” by Malcolm Gladwell (author of Outliers, 2008)

2) A 2009 CIO magazine article `Prediction Software: The New Science Behind the Art of Making Hit Movies’.

* I love Science. Who doesn’t? I would love to win a Nobel Peace Prize. If anyone wants to nominate me, here is the form: http://www.nobelprize.org/nobel_prizes/peace/nomination/


Aristotle, Baxter, John, and Atherton, Patrick (1997), Aristotle’s Poetics (London: McGill-Queen’s University Press).

De Vany, Arthur S. and Walls, W. David (2004), ‘Motion picture profit, the stable Paretian hypothesis, and the curse of the superstar’, Journal of Economic Dynamics and Control, 28 (6), 1035-57.

Eliashberg, J, Hui, SK & Zhang, ZJ 2007, ‘From Story Line to Box Office: A New Approach for Green-Lighting Movie Scripts’, Management Science, vol. 53, no. 6, pp. 881-93.

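Eliashberg, J, Hui, SK & Zhang, ZJ 2010, ‘Green-lighting Movie Scripts: Revenue Forecasting and Risk Management’.

Eliashberg, J, Hui, SK & Zhang, ZJ 2014, ‘Assessing Box Office Performance Using Movie Scripts: A Kernel-based Approach’, IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 11, pp. 2639-48.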

Gladwell, M (2006), ‘The Formula’, The New Yorker, Oct 16th.

Kuhn, Thomas S. and Hacking, Ian (2012), The Structure of Scientific Revolutions (4th edn.; Chicago ; London: University of Chicago Press) xlvi, 217 p.

Nash Information Services, LLC (2012), ‘Movie Budget Records – Most Profitable Movies, Based on Return on Investment’, Movie Budget Records http://www.the-numbers.com/movies/records/budgets.php

Popper, Karl R. (1963), Conjectures and Refutations. The growth of scientific knowledge. (Essays and lectures.) (London: Routledge & Kegan Paul).

Simonton, Dean Keith (2011), Great Flicks: Scientific Studies of Cinematic Creativity and Aesthetics (New York; Oxford: Oxford University Press).

Taleb, N. N. (2004) “Roots of Unfairness” Literary Research / Recherche Littéraire. 21 (41–42): pp. 241–254

Vogel, Harold L. (2011), Entertainment Industry Economics: A Guide For Financial Analysis (8th edn.; New York: Cambridge University Press) xxii, 655 p.
