nteract-papermill

Replicating a piece of the Netflix workflow.

Setup

If you plan on using AWS, install papermill with S3 support, the AWS CLI, and Jupyter:

pip install papermill[s3]
sudo apt-get update
sudo apt-get install awscli
pip3 install jupyter

Without Jupyter installed, papermill fails with an error saying that there is no kernel. I tried installing Miniconda instead, but it did not work. Some guides mention running papermill inside a conda environment, but I could not make that work either. These instructions are the ones that worked best for me. Anaconda comes with over 700 packages, which is great for your own computer but not a great idea for running code on a web server.
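A quick way to check whether papermill will find a kernel is to list the kernel specs Jupyter knows about. The sketch below assumes the jupyter_client package (which comes with jupyter) is importable. The helper name installed_kernels is made up for illustration.

```python
def installed_kernels():
    """Return the sorted names of installed Jupyter kernels, or None if
    jupyter_client (installed with jupyter) cannot be imported."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return None
    # find_kernel_specs() maps each kernel name to its resource directory.
    return sorted(KernelSpecManager().find_kernel_specs())


kernels = installed_kernels()
if kernels is None:
    print("jupyter_client is not installed; run: pip3 install jupyter")
elif not kernels:
    print("No kernels found; papermill will not be able to run a notebook")
else:
    print("Available kernels:", ", ".join(kernels))
```

If the list is empty or the import fails, installing jupyter as described above is the first thing to try.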

Setting up AWS

Using AWS can be the most frustrating part of using the internet. The documentation is at best unclear, what you see and what you are supposed to see are often two different things, and you usually end up going around in circles through documentation links. The instructions below have everything you need to get papermill working with AWS.

This is the first page you see when you go to the Amazon IAM page. Click on Create individual IAM users, then click on Manage Users.

Click on Add User.

Choose a user name and give the user programmatic access. We need the AWS CLI (command line interface) to take advantage of papermill.

Under the Attach existing policies directly tab, enter s3 in the search bar and add AmazonS3FullAccess. Click through the review steps until you see the Success screen.

At this screen, you have the option to download your Access key ID and Secret access key as a .csv file. You can only view these online while this window is open, so make sure to download the keys and save them somewhere you trust.

Running Papermill with a bash script

To finally get this working, type aws configure in your terminal. Enter your Access Key ID and Secret Access Key when prompted. Other prompts follow those two, such as the default region. Set those if you really need to, but since I am in the U.S. I am fine with the defaults.
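To confirm that aws configure saved the keys, you can read the credentials file it writes. This is a minimal sketch, assuming the default location ~/.aws/credentials and the default profile. The helper name has_aws_keys is hypothetical, and the check only tests that both keys are present, not that AWS accepts them.

```python
import configparser
from pathlib import Path


def has_aws_keys(credentials_path, profile="default"):
    """Return True if the profile in the credentials file has both keys set."""
    parser = configparser.ConfigParser()
    parser.read(credentials_path)  # a missing file is silently skipped
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return bool(section.get("aws_access_key_id")) and bool(
        section.get("aws_secret_access_key")
    )


# Assumed default location used by the AWS CLI.
path = Path.home() / ".aws" / "credentials"
print("Keys found" if has_aws_keys(path) else "No keys found")
```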

Now that we have the setup ready, let's get papermill running.

If you have a notebook you want to run, you can use that; if not, you can use this notebook. The command to run my simple test is a one-liner. Since I am in the directory with my notebook, the command is:

papermill papermill_aws_permissions_test.ipynb s3://python-portfolio/saved-notebooks/save.ipynb

You can then navigate to the folder you specified in the S3 address. The command will make the folder for you if you haven't made one in the S3 console.
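It helps to know how the S3 address breaks down. The first part after s3:// is the bucket, and everything after it is the object key. S3 "folders" are just key prefixes, which is why nothing has to be created in advance. The sketch below splits an address with the standard library. The function name split_s3_url is made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// address into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url}")
    # The bucket is the host part; the key is the path without the leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_url("s3://python-portfolio/saved-notebooks/save.ipynb")
print(bucket)  # python-portfolio
print(key)     # saved-notebooks/save.ipynb
```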

Running Papermill with a Python script

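import datetime

import boto3
import papermill as pm
from decouple import config

# Set up the AWS connection; decouple reads the keys from the environment or a .env file.
# Note: this client is not passed to papermill below.
client = boto3.client(
    "s3",
    aws_access_key_id=config("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=config("AWS_SECRET_ACCESS_KEY"),
)

# Name the output notebook after today's date, e.g. test2020-1-5.ipynb.
now = datetime.datetime.now()
pm.execute_notebook(
    "notebook_to_run.ipynb",  # input notebook
    f"s3://notebooks/updates/test{now.year}-{now.month}-{now.day}.ipynb",  # output on S3
)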
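The date-stamped output name can also be built by a small helper. This is a sketch, not part of papermill, and the name dated_output_path is hypothetical. It uses date.isoformat(), so the month and day are zero-padded (2020-01-05 rather than 2020-1-5), which differs from the script above but makes the notebooks sort chronologically by name.

```python
import datetime


def dated_output_path(prefix, day):
    """Return prefix + ISO date + '.ipynb', e.g. '...test2020-01-05.ipynb'."""
    return f"{prefix}{day.isoformat()}.ipynb"


# The prefix is the one from the script above; swap in your own bucket.
print(dated_output_path("s3://notebooks/updates/test", datetime.date.today()))
```

The returned string can be passed as the second argument to pm.execute_notebook.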
