Saturday, June 6, 2015

Tracking Netflix Usage - The easy way with Python!

Automate the Boring Stuff!

Previously, I'd generated some rather elaborate google doc spreadsheets to study my Netflix usage. This involved, copying and pasting the "Viewing Activity" listed when you login to your Netflix account, and many many fancy formulae to process the text. There is, a better way.

A Programming Book, With Great Examples For Non-Programmers

At a recent family event, my brother-in-law said he was reading this book, Automate the Boring Stuff with Python. The complete book is available online, and it's designed to be a book for non-programmers to write short scripts that eliminate tedious typing and clicking tasks.

As a programmer, who isn't familiar in particular with python, I've found the book to be a great introduction to python. One of the best things about it, is some of the powerful tools, like regular expressions, and the useful projects and practice programming assignments in the book.

So What Does This Have to Do With Netflix?

Consider the following short Python program:

import re, pyperclip
webText = pyperclip.paste()
showRegex = re.compile('\d\d/\d\d/\d\d\s+.*')
results = showRegex.findall(webText)
for each in range(len(results)):
    results[each] = results[each][3:6]  + results[each][0:3] + results[each][6:len(results[each])]
listOfItems = "\n".join(results)
cleanupRegex = re.compile('\tReport a problem.*')
finalList = cleanupRegex.sub('',listOfItems)

print (finalList)

If you highlight and copy the website with viewing activity into the clipboard, you go from this:
To this, in the clipboard:

From here, it's a very simple matter to past the data into google sheets, and within 2 minutes create:
  1. A column that calculates the year of each row
  2. A column that calculates the month of each row
  3. A column that calculates the day of each row (not used, but just for fun)
  4. A pivot table that shows the "counta" of each year, and month
  5. A chart, from that pivot table
  6. A duplicate of that pivot table and chart, that is filtered on the current year
After that, you're left with:

Now we've got charts, and graphs, and data for analysis. While I can only get number of items watched, and not duration, this still gives us something useful especially considering it took almost  no time to generate this.

The python logic, in a very few lines, converted the day/month/year, and removed all the non-relevant text from the website. In only a few minutes to write the script, it made the whole process of generating this much easier.