Local testing and Google Cloud Functions

Back in 2017, I wrote about local testing and AWS Lambda. At some point, I will update that post with details on how to use AWS SAM to invoke automated local tests. In this article, I will pivot away from AWS to talk about Google’s equivalent service, Cloud Functions, and again will focus on local testing.

I am using the Python runtime, which makes use of Flask to handle incoming requests. If you are already familiar with Flask patterns, you will find a lot to like about Google Cloud Functions. And as before, I am using Python 3’s unittest to discover and run my tests.

The following is an attempt to document a problem I encountered, and the solution that I settled on. There are likely other/better ways. If you have a suggestion, please leave me a comment!

Here’s where I started.

My project structure:

  • gcf-testing-demo
      • count
        • __init__.py
        • main.py
        • counter.py
      • tests
        • test_count.py


main.py:

import os
import json
from counter import Count

def document_count(request):

    # every response carries the same JSON content-type header
    headers = {
        'Content-Type': 'application/json'
    }

    try:
        request_json = request.get_json()
        document = request_json['document']
        c = Count()
        count = c.tok_count(document)
        response_body = {}
        response_body['document_count'] = count
        response = (json.dumps(response_body), 200, headers)

    except Exception as error:
        #can't parse json
        response_body = {}
        #an exception object is not JSON serializable, so send its message
        response_body['message'] = str(error)
        response = (json.dumps(response_body), 400, headers)

    return response

counter.py:

class Count:

    def tok_count(self, mystring):
        tokens = mystring.split()
        return len(tokens)


tests/test_count.py:

from count.main import document_count
import unittest
import json
from unittest.mock import Mock

class MyTests(unittest.TestCase):
    def test_count(self):
        data = {"document":"This is a test document"}
        count = 5
        request = Mock(get_json=Mock(return_value=data), args=data)
        response = document_count(request)[0]
        self.assertEqual(json.loads(response)['document_count'], count)

if __name__ == '__main__':
    unittest.main()

This function takes an input:

{"document":"this is a test document"}

And returns a token count of the input ‘document’:

{"document_count": 5}

A few details worth highlighting:

From main.py:

request_json = request.get_json()

This is an example of how Google is reusing familiar Flask patterns. The method get_json will pull any JSON out of the incoming Flask request object. We can then directly pick out properties, e.g. request_json['document'].

Also, from main.py:

response = (json.dumps(response_body), 200, headers)

Each Google Cloud Function is essentially an API method, and we must provide not just the response body, but the HTTP status code, and any headers. We return all 3 as a tuple and GCF will proxy this to the user appropriately.

The code for multiple Google Cloud Functions can be maintained in a single main.py. You will see how this looks when we deploy this function below. We can also import external modules, as we have done here with counter.py, but to enable this functionality we must include an __init__.py in our function directory.

Note that you could also keep your modules in a subdirectory, provided that subdirectory also has an __init__.py. In either case, the __init__.py can be empty.

In test_count.py you will find a single test which ensures that the document counter returns the correct value for some input. I am using the unittest.mock library to construct a Mock object equivalent to the object expected by our Google Cloud Function. Note the get_json method, which returns the contents of my test document.

request = Mock(get_json=Mock(return_value=data), args=data)
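To see what this mock does in isolation, here is a minimal sketch. The handler echo_document is a hypothetical stand-in (it is not part of the project); it only shows that a Mock configured this way answers get_json() the way a Flask request would, and that the handler's return value is a (body, status, headers) tuple.

```python
import json
from unittest.mock import Mock


def echo_document(request):
    # Hypothetical handler: returns a (body, status, headers) tuple
    # in the same shape that document_count uses.
    payload = request.get_json()
    body = json.dumps({"document": payload["document"]})
    return (body, 200, {"Content-Type": "application/json"})


data = {"document": "This is a test document"}

# get_json is itself a Mock, so calling it returns the canned data.
request = Mock(get_json=Mock(return_value=data), args=data)

body, status, headers = echo_document(request)
```

Because get_json is a Mock, the test can also check that the handler actually called it, for example with request.get_json.assert_called_once().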

Okay, things look good. Let’s run our test:

python3 -m unittest discover

Which results in the error:

ERROR: tests.test_count (unittest.loader._FailedTest)
ImportError: Failed to import test module: tests.test_count
Traceback (most recent call last):
  File "/usr/lib/python3.5/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/usr/lib/python3.5/unittest/loader.py", line 369, in _get_module_from_name
  File "/home/dfox/gcf-testing-demo/tests/test_count.py", line 1, in <module>
    from count.main import document_count
  File "/home/dfox/gcf-testing-demo/count/main.py", line 3, in <module>
    from counter import Count
ImportError: No module named 'counter'
Ran 1 test in 0.000s
FAILED (errors=1)

So, what’s happening? When I invoke my test script, it searches in local and system paths for counter.py, and finds…nothing! I have a few options at this point:

  • I could manually update my PYTHONPATH to ensure that my module directory is included.
  • I could move my test script into my module directory.
  • Or I can switch from using relative to absolute paths in my module code.
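For comparison, the first option can also be applied from inside a test script rather than in the shell. This is a minimal sketch, assuming a throwaway directory stands in for count/; the temporary folder and the inline copy of counter.py exist only to make the example self-contained.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for the count/ directory: a throwaway folder holding counter.py.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "counter.py"), "w") as f:
    f.write(
        "class Count:\n"
        "    def tok_count(self, mystring):\n"
        "        return len(mystring.split())\n"
    )

# Same effect as adding the directory to PYTHONPATH before running the tests.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

from counter import Count  # resolves now that module_dir is on sys.path

token_total = Count().tok_count("this is a test document")
```

The drawback is that every test script (or every developer's shell) has to repeat this path setup.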

Let’s try this last option, as it seems like the solution that will be easiest on any future testers/developers. Update main.py as follows:

from count.counter import Count

When we run our unittest again, we should get back a successful report:

Ran 1 test in 0.001s

Great! Let’s deploy our function. I’m using the gcloud command-line utility. More info on setting this up here. I’m also using the beta client. Change directories into your module dir and execute the following (note that the name ‘document_count’ refers to the function in main.py and not to main.py itself):

gcloud beta functions deploy document_count --runtime python37 --trigger-http

Which will return

ERROR: (gcloud.beta.functions.deploy) OperationError: code=3, message=Function failed on loading user code. Error message: Code in file main.py can't be loaded.
Did you list all required modules in requirements.txt?
Detailed stack trace: Traceback (most recent call last):
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 256, in check_or_load_user_function
  File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 166, in load_user_function
  File "", line 728, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "/user_code/main.py", line 3, in <module>
    from count.counter import Count
ModuleNotFoundError: No module named 'count'

Oh boy. Looks like my genius plan to use absolute paths will not fly with our Google Cloud Function deployment. Which makes sense, as the resulting function has no knowledge of our local project structure. It is dynamically building a package out of main.py, any imported modules, and any dependencies we may have listed in our requirements.txt.

At this point, I was a bit unsure of how to proceed. I decided to take a look at some of Google’s own sample Python applications. Here is a sample application for a Slack ‘slash command’. Let’s peek at the project structure:

  • slack/
    • README.md
    • config.json
    • main.py
    • main_test.py
    • requirements.txt

It is not a like-for-like example, since there are no modules being imported into main.py, but notice that the test script is simply in the same directory as main.py. This was an approach I had considered and discarded, simply because I was concerned about mingling my tests with function code. But if it is good enough for Google, who am I to argue?

So, let’s restructure things:

  • gcf-testing-demo
      • count
        • __init__.py
        • main.py
        • counter.py
        • test_count.py

And then switch back to relative imports in main.py:

from counter import Count

I also need to update the import in test_count.py:

from main import document_count

Now, I should be able to cd into count/ and execute my test:

python3 -m unittest discover
Ran 1 test in 0.000s

Next, I will confirm that I can deploy this function as a GCF:

gcloud beta functions deploy document_count --runtime python37 --trigger-http
Deploying function (may take a while - up to 2 minutes)...done.
availableMemoryMb: 256
entryPoint: document_count
  url: ###
  deployment-tool: cli-gcloud
name: ###
runtime: python37
serviceAccountEmail: ###
status: ACTIVE
timeout: 60s
updateTime: '2019-01-23T15:06:23Z'
versionId: '1'

It worked!

Let’s log into the console, browse the Cloud Functions resource and view our function. If I look at the ‘Source’ tab I’ll see main.py, counter.py (this is good!), and also test_count.py (this is less good). This is why I did not want to mingle my tests with my code:


Fortunately, Google provides a method to filter out those files we don’t wish to incorporate into our Cloud Function package. We need to create a .gcloudignore file (equivalent to .gitignore, which you may be familiar with) and add this to the module directory. I only need one line to filter out my tests, but I may as well also filter out __pycache__, *.pyc, and .gcloudignore itself:
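A minimal sketch of such a file, assuming the test script keeps the name test_count.py used above. The exact patterns are an assumption based on that description, written in the same pattern syntax as .gitignore:

```text
.gcloudignore
__pycache__/
*.pyc
test_count.py
```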



After I redeployed this function, the source code looks much cleaner:


Now, I can finally make a live test of the deployed function:
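One way to make that call from Python is sketched below using only the standard library. The URL is a hypothetical placeholder (the real trigger URL is printed in the deploy output), and the helper names build_request and call_function are my own, so treat this as an illustration rather than the exact call used here.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the url value from the deploy output.
FUNCTION_URL = "https://REGION-PROJECT_ID.cloudfunctions.net/document_count"


def build_request(document, url=FUNCTION_URL):
    # Package the document as the JSON body the function expects.
    body = json.dumps({"document": document}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_function(document, url=FUNCTION_URL):
    # Sends the request over the network and decodes the JSON response.
    with urllib.request.urlopen(build_request(document, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request("this is a test document")
```

Calling call_function("this is a test document") against a real deployment should return a dictionary with a document_count key, since that is what main.py serializes.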



Hierarchical multi-label classification of news content using machine learning

There is no shortage of beginner-friendly articles about text classification using machine learning, for which I am immensely grateful. In general, these posts attempt to classify some set of text into one or more categories: email or spam, positive or negative sentiment, a finite set of topical categories (e.g. sports, arts, politics). This last example can be described as a multi-class problem. Here’s a definition of multi-class taken from the scikit-learn documentation:

Multiclass classification means a classification task with more than two classes; e.g., classify a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is assigned to one and only one label: a fruit can be either an apple or a pear but not both at the same time.

This is certainly fine for a simple classification task such as slotting a news article into a broad vertical such as ‘Travel,’ or ‘Weather,’ but if our taxonomy is even a bit wider or deeper we will find ourselves struggling to assign each piece of text to a single category. Take for example, the following article:

Dancer badly injured in hit-and-run returns to the stage

PROVIDENCE, R.I. (AP) — A ballet dancer who was seriously injured in a Rhode Island hit-and-run over the summer has returned to the stage.

Festival Ballet Providence dancer Jordan Nelson was riding his bike in June when he was struck by a car. He suffered skull fractures and a broken clavicle. WLNE-TV reports doctors told Nelson he’d never dance again but he wouldn’t accept that as an answer.

How should we classify this document? Is it about dance? Or about car accidents? Or perhaps about sports injuries? If we look for inspiration in the IPTC Media Topics taxonomy, we might end up with the following topics:

accident and emergency incident http://cv.iptc.org/newscodes/mediatopic/20000139
ballet http://cv.iptc.org/newscodes/mediatopic/20000008

This kind of scenario, where a single sample can be associated with multiple targets (accident and ballet), is called multi-label classification. Let’s crib one more time from the scikit-learn documentation:

Multilabel classification assigns to each sample a set of target labels. This can be thought as predicting properties of a data-point that are not mutually exclusive, such as topics that are relevant for a document. A text might be about any of religion, politics, finance or education at the same time or none of these.

And if we look a bit closer at these topics, we might notice that ‘ballet’ is a child of ‘dance’, which is itself a child of ‘arts and entertainment’. The full hierarchy of both terms can be expressed as the following:

  • arts, culture and entertainment
    • arts and entertainment
      • dance
        • ballet
  • disaster, accident and emergency incident
    • accident and emergency incident

We’ve quickly transitioned from a ‘simple’ multi-class classification problem to a multi-label classification problem that is further complicated by a set of hierarchically structured targets. Should we only apply the narrowest of topics in our taxonomy? Do we create a classifier for all topics, broad and narrow, and does the application of one mean anything for the other?

Sadly, I was not able to find many beginner-friendly articles written about hierarchical multi-label classification. I wish I could tell you that this will be that very article, but I can’t and it won’t. Maybe if I outline the problem, someone else will be inspired to write that article. And then we all benefit!

A simple example of multi-label classification

Let’s table the discussion of hierarchy for now and start with the simplest implementation of multi-label classification we can find.

The two main methods for approaching multi-label classification are problem transformations and algorithm adaptations. You will find a good overview of the two approaches here and here. Problem transformation techniques convert the multi-label task into a set of binary classification tasks, somewhat simplifying the task. For each label in the training data we create a single binary classifier, and the set of binary classifiers is then evaluated in concert. This is also referred to as a one-vs.-rest classifier. Let’s walk through a simple example.

Our training set:

example | label
PROVIDENCE, R.I. (AP) — A ballet dancer who was seriously injured in a Rhode Island hit-and-run over the summer has returned to the stage. Festival Ballet Providence dancer Jordan Nelson was riding his bike in June when he was struck by a car. He suffered skull fractures and a broken clavicle. WLNE-TV reports doctors told Nelson he’d never dance again but he wouldn’t accept that as an answer. | ballet
PROVIDENCE, R.I. (AP) — A ballet dancer who was seriously injured in a Rhode Island hit-and-run over the summer has returned to the stage. Festival Ballet Providence dancer Jordan Nelson was riding his bike in June when he was struck by a car. He suffered skull fractures and a broken clavicle. WLNE-TV reports doctors told Nelson he’d never dance again but he wouldn’t accept that as an answer. | accident and emergency incident


You’ll notice that in the training data I have repeated the example text on two rows, one per label. Not knowing how many labels an example might have, and therefore how many columns I’d need for a single row display, this seemed the best way to encode the information. You might start with something a bit different. Regardless of where you start, we need to make some modifications before training a multi-label model.

Essentially, we need to end up here:

example | labels
PROVIDENCE, R.I. (AP) — A ballet dancer who was seriously injured in a Rhode Island hit-and-run over the summer has returned to the stage. Festival Ballet Providence dancer Jordan Nelson was riding his bike in June when he was struck by a car. He suffered skull fractures and a broken clavicle. WLNE-TV reports doctors told Nelson he’d never dance again but he wouldn’t accept that as an answer. | [ballet, accident and emergency incident]


Where our ‘labels’ value is an array of label strings. Here is how I transformed my data:

import pandas as pd

path_to_csv = 'training_data.csv'
dataset = pd.read_csv(path_to_csv,usecols=["label","example"])

#modify dataset for multilabel
#group the rows by example text, then collect each group's labels into a list
grouped = dataset.groupby('example')
df = grouped['label'].aggregate(lambda x: list(x)).reset_index(name="labels")

This will group my data by example and then pull all of the related labels into an array. This is great, but we are not quite done. Though we can intuitively understand the meaning of our lists of strings, they will be too cumbersome for our model to process. We need to convert these arrays into the expected multi-label format, a binary matrix indicating the presence (or absence) of a label. We do this using scikit-learn’s MultiLabelBinarizer:

from sklearn.preprocessing import MultiLabelBinarizer

X = df['example']
y = df['labels']

#convert the lists of label strings into a binary indicator matrix
y = MultiLabelBinarizer().fit_transform(y)
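To make the binary matrix concrete, here is a pure-Python sketch of the same idea. It is not scikit-learn's implementation; the helper name binarize and the second ("weather") row are my own illustrative assumptions.

```python
def binarize(label_lists):
    # One column per distinct label, in sorted order.
    classes = sorted({label for labels in label_lists for label in labels})
    # One row per example: 1 if the label applies, 0 otherwise.
    matrix = [
        [1 if c in labels else 0 for c in classes]
        for labels in label_lists
    ]
    return classes, matrix


classes, matrix = binarize([
    ["ballet", "accident and emergency incident"],
    ["weather"],  # hypothetical second example
])
```

Here classes holds the column order and each row of matrix marks which labels apply to that example, which is the format the classifier is trained on.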

Now that our data is in the correct format, we can train a model. The OneVsRestClassifier allows us to use the binary classifier of our choice. Let’s start with LinearSVC:

from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer

#split data into test and train
random_state = np.random.RandomState(0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2,random_state=random_state)

#our pipeline transforms our text into a vector and then applies OneVsRest using LinearSVC
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('clf', OneVsRestClassifier(LinearSVC()))
])

#fit the pipeline on the training split
pipeline.fit(X_train, y_train)



That, I think, is the simplest approach to multi-label classification. Of course, my results were (seemingly) abysmal. More on evaluation metrics later.

Other methods

Another problem transformation technique is the classifier chains method. This approach is similar to one-vs.-rest, seen above, in that it is comprised of several binary classifiers. But in a classifier chain the output of each classifier is passed on to the next classifier in the chain (along with the original input, in our case the news text). This approach is intended to improve our classifier by taking label dependencies/co-occurrences into consideration.

Our working example classed with ‘ballet’ and ‘accident and emergency incident’ is perhaps not the best representation of label interdependence, since these two topics will not co-occur with great frequency (we hope!). However, if we browse our favorite news site, mentally classifying each article into a set of topics, we should come up with a few sets of commonly co-occurring topics. ‘Elections’ and ‘campaign finance.’ ‘Football’ and ‘sports injuries.’ ‘Coal mining’ and ‘environment.’ (For the purpose of these examples, I am inventing my own news topics, rather than looking to IPTC).

There are a few, frequently cited, papers on the subject of classifier chains (such as, Classifier Chains for Multi-label Classification) and a scikit-learn implementation, described here. In the example, the order of the chains is random. The documentation notes:

Because the models in each chain are arranged randomly there is significant variation in performance among the chains. Presumably there is an optimal ordering of the classes in a chain that will yield the best performance. However we do not know that ordering a priori. Instead we can construct an voting ensemble of classifier chains by averaging the binary predictions of the chains and apply a threshold of 0.5.

Since we have an implicit order in our hierarchical taxonomy, I wonder if this can be used to improve performance. Of course, there is no guarantee that co-occurrence will be limited to labels in the same taxonomy branch. At any rate, I have yet to implement this method.

Yet another problem transformation technique is the label powerset method. In this approach, each combination of labels in the training set is considered as a unique class. So, instead of two classes for our example, ‘ballet’ and ‘accident and emergency incident,’ we would have a single class ‘ballet, accident and emergency incident.’ If you are starting with something like IPTC’s media topics, a rather large taxonomy which may also be applied in an unexpected fashion (e.g. lots of cross-hierarchy cooccurrences), the resulting set of classes may be too large. Also, we cannot guarantee that our training set will have an example for every potential combination of labels. Mostly for this latter reason, I don’t think this method is appropriate for news classification.
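The transformation itself is simple enough to sketch in a few lines of plain Python. This is only an illustration of the idea, not a library implementation; the function name and the shortened label strings are assumptions.

```python
def label_powerset(label_lists):
    # Map each distinct combination of labels to a single class id.
    class_ids = {}
    y = []
    for labels in label_lists:
        key = tuple(sorted(labels))  # order of labels should not matter
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        y.append(class_ids[key])
    return y, class_ids


y_powerset, class_ids = label_powerset([
    ["ballet", "accident"],
    ["accident", "ballet"],  # same combination, different order
    ["ballet"],
])
```

A combination that never appears in the training data gets no class id at all, which is the limitation described above: the classifier can never predict it.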

Algorithm adaptations

This brings us to algorithm adaptations, methods that modify an existing algorithm so it can directly cope with a multi-label dataset.

There are several scikit-learn estimators that are described as having support for multi-label classification, a list which includes decision tree, k-nearest neighbors, random forest, and ridge classifiers. Decision trees seem to have some promise for the problem, especially considering the issue of hierarchy. There has been some research on the subject, see Decision Trees for Hierarchical Multi-label Classification.

In addition, several algorithms are available from the scikit-multilearn library (built on top of scikit-learn and expressly designed for multi-label classification):

Next steps

Again, there are not enough examples of applying these methods for text classification, at least not enough at my level (novice). I think the scikit-multilearn library is likely the obvious next step as it implements several algorithms from commonly cited articles in the literature. Although, it may also be worthwhile to run through all the available scikit-learn multi-label-compliant algorithms, just to see if there are any easy wins to be had.

Notes on accuracy

After fitting the simple OneVsRestClassifier seen above, I was disappointed by the low accuracy score. Little did I know that the evaluation metrics I was used to using were not appropriate for a multi-label scenario. Here’s a note from the OneVsRestClassifier documentation regarding accuracy:

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

That is pretty harsh. What we need is a metric that will reflect partial accuracy. For instance, we apply ‘ballet’ correctly, but we also apply ‘weather’ incorrectly to the same example text. This is not a wholly inaccurate classification, it is only partially inaccurate. Luckily, we have a few options.

  • Hamming loss
    • the fraction of the wrong labels to the total number of labels
    • a loss function, so the optimal value is 0
    • scikit-learn implementation of hamming loss
  • Jaccard similarity coefficient score
    • the size of the intersection divided by the size of the union of the sample sets
    • ranges from 0 to 1, and 1 is the optimal score
    • scikit-learn implementation of jaccard similarity
  • Coverage error measure
    • average “depth” (how far we need to go through the ranked scores) to cover all true labels
    • the optimal value is the average number of true labels.
    • scikit-learn implementation of coverage error
  • Averaged (micro and macro) F1 scores
    • I’m having trouble understanding this one, so I’ll just point you to a seemingly useful StackOverflow post.
    • scikit-learn implementation of F1 score (see note for ‘average’ param)
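To make the first two metrics (and the harsh subset accuracy) concrete, here is a pure-Python sketch on a tiny hand-made example. These are not the scikit-learn implementations; the function names, the column order (ballet, accident, weather), and the convention of scoring an empty union as 1.0 are all my assumptions.

```python
def hamming(y_true, y_pred):
    # Fraction of individual label slots that are wrong.
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def jaccard_per_sample(y_true, y_pred):
    # Intersection over union of the label sets, averaged over samples.
    scores = []
    for true_row, pred_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(true_row, pred_row) if t and p)
        union = sum(1 for t, p in zip(true_row, pred_row) if t or p)
        scores.append(inter / union if union else 1.0)  # assumed convention
    return sum(scores) / len(scores)


def subset_accuracy(y_true, y_pred):
    # A sample only counts if its whole label set is predicted exactly.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Columns: ballet, accident, weather
y_true = [[1, 1, 0], [0, 0, 1]]
y_pred = [[1, 1, 1], [0, 0, 1]]  # 'weather' wrongly added to the first sample

h = hamming(y_true, y_pred)
j = jaccard_per_sample(y_true, y_pred)
s = subset_accuracy(y_true, y_pred)
```

With one wrong label out of six slots, the Hamming loss is small and the Jaccard score stays high, while subset accuracy drops to one half because the first sample is no longer an exact match.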

An example of hamming loss and jaccard similarity using scikit-learn:

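from sklearn.metrics import hamming_loss
from sklearn.metrics import jaccard_similarity_score

y_pred = pipeline.predict(X_test)

print(hamming_loss(y_test, y_pred))
#jaccard_similarity_score was removed in scikit-learn 0.23; newer releases provide jaccard_score instead
print(jaccard_similarity_score(y_test, y_pred))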


  • 0.0107019250706
  • 0.391602082404

Above are my results using the simple OneVsRestClassifier described earlier. The hamming loss seems good? The jaccard similarity, less so.

On hierarchy

One approach to dealing with my hierarchical taxonomy would be to simply flatten it, ignoring the hierarchy entirely. And this is exactly what I’ve done so far. This seems to be a fine approach for the short term, as I have yet to explore all of the available multi-label algorithms described above. Perhaps, the ‘flat’ approach will be good enough. Perhaps not.

If not, there are some novel ideas out there that use the hierarchy to the advantage of the classifier. Unfortunately, most of these ideas are described in academic papers, with few including any bootstrap code.

One approach which seemed interesting is described in a PyData talk by Jurgen Van Gael: Hierarchical Text Classification using Python (and friends). There is a lot to chew on here, but essentially this approach uses a set of Naïve Bayes classifiers to route a document through the branches of our hierarchical tree, and then individual classifiers for each node in the branch. Using IPTC Media Topics as our example again, we might have a set of Naïve Bayes classifiers to route a document to one of the top level terms (arts, culture and entertainment, education, environment, politics, society, sport, etc.) and then a different classifier for any subsequent nodes in the tree. I’m assuming there would be a set of Naïve Bayes classifiers for any hierarchical level where multiple paths can be followed. Van Gael also notes that if a training example is associated with a class that is 5 levels deep, that training example is copied to each of that class’s ancestors. It seems like a promising approach, but it requires a lot of orchestration and several more classifiers than the simple flattened approach.
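The step of copying a training example to each ancestor can be sketched with a simple parent lookup. This is my own illustration, not code from the talk; the PARENT mapping just encodes the two branches of the hierarchy shown earlier.

```python
# Child -> parent links for the two branches listed earlier in this post.
PARENT = {
    "ballet": "dance",
    "dance": "arts and entertainment",
    "arts and entertainment": "arts, culture and entertainment",
    "accident and emergency incident": "disaster, accident and emergency incident",
}


def with_ancestors(label, parent=PARENT):
    # Walk up the tree, collecting the label and every ancestor above it.
    path = [label]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


ballet_path = with_ancestors("ballet")
accident_path = with_ancestors("accident and emergency incident")
```

A training example tagged ‘ballet’ would then also be used as a positive example for every label in ballet_path.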

On imbalance

Another potential issue with a corpus tagged with a relatively deep taxonomy is that many of the deepest labels will have fewer examples. The more granular a concept, the less broadly it can be applied. If we look at a news corpus that has been tagged with the IPTC Media Topics taxonomy we will likely find plenty of examples for ‘health,’ but far fewer for ‘dietary supplements’ (which is 4 levels down from ‘health’).

Generally, our classification models are better served by having a balanced number of examples across the target classes. Given a large enough corpus we may be able to ensure that all classes are equally represented, but it is inevitable that some will lag behind.

A few options:

We could modify the individual binary classifiers we’ve wrapped with the one-vs.-rest classifier. For example, the LinearSVC classifier (shown above) has a ‘class_weight’ parameter which purports (if set to ‘balanced’) to automatically adjust weights inversely proportional to class frequencies in the input data. So, our instances of ‘dietary supplements’ which appear less frequently should be weighted appropriately.
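The ‘balanced’ heuristic documented by scikit-learn is n_samples / (n_classes * count of that class). The sketch below reproduces that arithmetic in plain Python for one binary classifier inside the one-vs.-rest wrapper; the 10-in-100 split is an invented example, not a figure from my corpus.

```python
from collections import Counter


def balanced_class_weights(y):
    # weight = n_samples / (n_classes * number of samples in that class)
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical binary target: 10 documents tagged 'dietary supplements', 90 not.
y_binary = [1] * 10 + [0] * 90
weights = balanced_class_weights(y_binary)
```

The rare positive class ends up with a weight of 5.0 and the common negative class with roughly 0.56, so mistakes on the rare label cost the classifier more during training.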

We could use the imbalanced-learn library. In particular, the RandomUnderSampler class can be easily added to our pipeline to equalize the number of examples per class before training begins. However, it is not clear if this will work in a multi-label scenario.

Or, if we have decided to use one of the adapted algorithms provided by scikit-multilearn (described above), we could follow their suggestion to use a k-fold cross-validation approach with sklearn.model_selection.KFold. The scikit-multilearn folks also mention:

If your data set exhibits a strong label co-occurrence structure you might want to use a label-combination based stratified k-fold.

But this method uses the label powerset approach, in which cooccurring labels are combined into unique classes. This would have the same drawbacks described above, in that our training becomes more expensive (many more classes) and we may be unable to accurately tag content in the future as the classifier can only tag content with label combinations it saw during training.


If you have any thoughts about where I should go next, or regarding any false assumptions I might have made above, I’d love to hear from you!

More resources:

Multi-label Classification: A Guided Tour

Comparative Study of Supervised Multi-Label Classification Models for Legal Case Topic Classification

Learning Hierarchical Multi-label Classification Trees from Network Data



Getting started with Serverless Framework and Python (Part 2)


Radio tower at Stax Museum of American Soul Music

This is a continuation of my previous post, which offered some tips for setting up Serverless Framework and concluded with generating a template service in Python. By this point you should have a file, serverless.yml, that will allow you to define your service. Again, the documentation is quite good. I suggest reading from Services down to Workflow. This gives a good overview and should enable you to start hacking away at your YML file and adding your functions, but I’ll call out a few areas where I had some trouble.


Lambda functions are hard to test outside the context of AWS, but any testing within AWS is going to cost you something (even if it is pennies). The Serverless folks suggest that you “write your business logic so that it is separate from your FaaS provider (e.g., AWS Lambda), to keep it provider-independent, reusable and more easily testable.” Ok! This separation, if we can create it, would allow us to write typical unit tests for each discrete function.

All of my previous Lambdas contained the handler function as well as any other functions required to process the incoming event. I was not even aware that it was possible to import local functions into my handler, but you can! And it works great!

Here’s my handler:

from getDhash import image_to_dhash

def dhash(event, context):
    image = event['image']
    dhash = image_to_dhash(image)
    return dhash

The handler accepts the contents of an image file as a string, passes them to the imported image_to_dhash function, and returns the resulting dhash.

And here is the image_to_dhash function, which I’ve stored separately in getDhash.py:

from PIL import Image
import imagehash
from io import BytesIO
def image_to_dhash(image):
    return str(imagehash.dhash(Image.open(BytesIO(image))))

Now, I can simply write my tests against getDhash.py and ignore the handler entirely. For my first test I have a local image (test/image.jpg) and a Python script, test.py, containing my unit tests:

import unittest
from getDhash import image_to_dhash

class TestLambdaFunctions(unittest.TestCase):
    # read the image as bytes, since image_to_dhash wraps it in BytesIO
    with open('test/image.jpg', 'rb') as f:
        image  = f.read()
    def testGetDhash(self):
        self.assertEqual(image_to_dhash(self.image), 'db5b513373f26f6f')
if __name__ == '__main__':
    unittest.main()

Running test.py should return some testing results:

(myenv) dfox@dfox-VirtualBox:~/myservice$ python test.py 
Ran 1 test in 0.026s


Environment variables

AWS Lambdas support the use of environment variables. These variables can also be encrypted, in case you need to store some sensitive information along with your code. In other cases, you may want to use variables to supply slightly different information to the same Lambda function, perhaps corresponding to a development or production environment. Serverless makes it easy to supply these environment variables at the time of deployment. And making use of Serverless’ feature-rich variable system we have a few options for doing so.

Referencing local environment variables:

    handler: handler.dhash

Or, referencing a different .yml file:

    handler: handler.dhash
    environment:
      MYENVVAR: ${file(./serverless.env.yml):${opt:stage}.MYENVVAR}

The above also demonstrates how to reference CLI options, in this case the stage we provided with our deploy command:

serverless deploy --stage dev

And for completeness’ sake, the serverless.env.yml file:

dev:
    MYENVVAR: "my environment variable is the best"


In the past, I found that dealing with Python dependencies and Lambda could be a real pain. Inevitably, my deployment package would be incorrectly compiled. Or, I’d build the package fine, but the unzipped contents would exceed the size limitations imposed by Lambda. Using Serverless along with a plugin, Serverless Python Requirements, makes life (specifically your Lambda-creating life) a lot easier. Here’s how it works.

Get your requirements ready:

pip freeze > requirements.txt

In my case, this produced something like the following:


Call the plugin in your serverless.yml file:

plugins:
  - serverless-python-requirements

And that’s it. 🙂

Now, if you have requirements like mine, you’re going to hit the size limitation (note the inclusion of Pillow, numpy, and scipy). So, take advantage of the built in zip functionality, by adding the following to your serverless.yml file:

custom:
  pythonRequirements:
    zip: true

This means your dependencies will remain zipped. This also means you need to unzip them when your handler is invoked.

When you run the deploy service command, the Python requirements plugin will add a new script to your directory called unzip_requirements.py. This script will extract the required dependencies when they are needed by your Lambda functions. You will have to import this module before all of your other imports. For example:

import unzip_requirements
from PIL import Image

There does seem to be a drawback here, however. Until you run the deploy command, the unzip_requirements.py will not be added to your directory and therefore all of your local tests will fail with an ImportError:

ImportError: No module named unzip_requirements

Of course, I may be doing something wrong here.
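
One workaround, which as far as I know the plugin's own documentation also suggests, is to guard the import so that a missing unzip_requirements module is ignored during local test runs. A minimal sketch (the REQUIREMENTS_UNZIPPED flag is a hypothetical name, added only to make the outcome visible):

```python
try:
    # Only present after `serverless deploy` has generated the script
    import unzip_requirements  # noqa: F401
    REQUIREMENTS_UNZIPPED = True
except ImportError:
    # Local run: dependencies are imported from the local environment instead
    REQUIREMENTS_UNZIPPED = False
```

The remaining imports, such as `from PIL import Image`, would follow this block unchanged.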


  • There are actually two Python requirements plugins for Serverless. Am I using the best one?
  • As I add functions to my service, do I reuse the existing handler.py? Or do I create new handler scripts for each function?

Getting started with Serverless Framework and Python (Part 1)

Water tower of the former Edison laboratories

For a while now I’ve been working with various AWS solutions (Lambda, Data Pipelines, CloudWatch) through the console, and sometimes using homegrown scripts that take advantage of the CLI. This approach has many limitations, but I’d actually recommend it if you’re just getting started with AWS as I found it to be a great way to learn.

But if you’re ready to truly embrace the buzzy concept of serverless architecture and let your functions fly free in the rarefied air of ‘the cloud,’ well then you’ll want to make use of a framework that is designed to make development and deployment a whole heck of a lot easier. Here’s a short list of serverless frameworks:

  • Serverless
  • Zappa
  • Chalice
  • Apex

There are many more, I’m sure, but these seem to be the popular choices. And each one has something to recommend it. Chalice is the official AWS client. Serverless is widely used. Zappa has some neat dependency packaging solutions. Apex is clearly at the apex of serverless framework technology.

I selected Serverless for a few reasons. It does seem to be used quite a bit, so there is a lot of discussion on Stack Overflow and quite a few code examples on GitHub. As I need all the help I can get, these are true benefits. Also, there are numerous plugins available for Serverless, which seems to indicate there is an active developer community. And I knew right off the bat that I would take advantage of at least two of these plugins:

  • Serverless Python Requirements
    • Coping with Python dependencies when deploying a Lambda is one of the more challenging aspects for beginners (like me). I appreciate that someone figured out how to do it well and made that method available to me with a few extra lines in the config.
  • Serverless Step Functions
    • I’m looking forward to making use of this relatively new service, and none of the other frameworks had anything built in yet for Step Functions.

The installation guide for Serverless is pretty good, actually, but I’ll call out a few things that might need some extra attention.

Once you’ve installed Serverless:

sudo npm install -g serverless

Your next concern will be authentication. Serverless provides some pretty good documentation on the subject, but as they describe a few different scenarios, here’s my recommendation.

Follow their instructions for generating your client key/secret, then authenticate using the CLI:

aws configure --profile serverless-user

This will walk you through entering your key, your secret, and your region. The reason I suggest authenticating using the CLI is that the CLI is darned useful. For example, you may want to ‘get-function’ just to see if serverless is doing what you think it is doing.

Also note that I have provided a profile to the command. I find profiles useful, you may not. But if you do like to use profiles, Serverless will let you take full advantage of them. For example, you can deploy with a particular profile:

serverless deploy --aws-profile serverless-user

Or check out this nice idea for per-stage profiles.

I found the idea of stages in AWS a bit confusing at first. This blog post does a good job of explaining the concept and how to implement it.

Installing the plugins was dead simple:

npm install --save serverless-python-requirements
npm install --save serverless-step-functions

And setting up a new service environment ain’t much harder:

serverless create --template aws-python --path myAmazingService

Now we are ready to dig in and start writing our functions. In my next post I’ll write a bit about Python dependencies, unit testing, and anything else that occurs to me in the meantime.

Mock objects and APIs

Since I do not have a computer science background (I have a BA in English Lit and an MA in Information Science), I sometimes think I’ve uncovered something entirely new which turns out to be common practice among programmers. The latest such ‘discovery’ is apparently called, by those who know better, a ‘mock object.’ Wikipedia says that a ‘programmer typically creates a mock object to test the behavior of some other object.’ Oh right, that’s exactly what I’ve done. Ok, then.

In the last year I have had to write two applications which accessed an API that did not exist yet. The reasons for this are obscure and perhaps best left unremarked upon, but I did learn something in the process. When building similar applications in the past I had found that it was incredibly useful to, you know, actually have an API to run them against, so it was suggested that I mock up the (currently non-existent) API. In fact, this turned out to be so useful that if ever I have to build another application that accesses an API I will repeat this procedure. A mock API allows you to test any and all responses that might be returned, and actually having the responses to test against is far more productive (at least it was for me) than simply reading about them in the documentation.

Having now read through the Wikipedia page on mock objects, I know that my own mock API is actually more of a ‘fake API.’ This is because I am not really testing the request itself or the submitted data. Instead I simply return a particular HTTP status code along with a bit of sample JSON in the body. Regardless of my indiscretion with the terminology, if you’d like to learn how I generated my mock/fake API, read on.

My mock fake API in Python

Setting up a simple web server is quite easy using Python’s basic HTTP server module (BaseHTTPServer in Python 2, http.server in Python 3). The following code will create a ‘things’ endpoint at which we can GET a particular thing via its Id:

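import http.server  # Python 3 name for Python 2's BaseHTTPServer
import re

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        a = re.compile(r"/things/[a-zA-Z0-9]*")
        if a.match(self.path):
            # Serve the sample JSON stored next to this script
            with open('data.json', 'rb') as f:
                body = f.read()
            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

server_class = http.server.HTTPServer
httpd = server_class(('', 1234), MyHandler)
httpd.serve_forever()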

To test this code, you also need to create a sample JSON file named ‘data.json’ in the same folder as the Python code above. Now you can access the URL http://localhost:1234/things/1234, which should return whatever snippet of JSON you’ve stored in data.json. The regex on line 6 can be altered to accommodate whatever call you wish to emulate. In this case a ‘thing’ Id can be any number of digits and lower- or upper-case letters.

It is similarly easy to handle other kinds of requests, such as PUT, POST and HEAD. Here’s a sample POST against the ‘things’ endpoint that returns an HTTP status code of 404 and a snippet of JSON stored as ‘error.json’:

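import http.server  # Python 3 name for Python 2's BaseHTTPServer

class MyHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == '/things':
            # Return a 404 along with the sample error body from error.json
            with open('error.json', 'rb') as f:
                body = f.read()
            self.send_response(404)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(body)

server_class = http.server.HTTPServer
httpd = server_class(('', 1234), MyHandler)
httpd.serve_forever()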

Should you wish to test the content of this POST and make this more of an actual mock object, you can read the contents of the submitted data using the following:

postLen = int(self.headers['Content-Length'])
postData = self.rfile.read(postLen)

Of course, this approach requires the presence of a ‘Content-Length’ header, but there are probably more direct methods you could try. Lastly, I occasionally found it useful to randomly return different status codes:
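
A minimal sketch of one way to do this in Python 3, assuming a hypothetical pool of codes (STATUS_CODES) and hypothetical names (pick_status, RandomStatusHandler, run) that are not part of the code above:

```python
import http.server
import json
import random

# Hypothetical pool of codes; adjust to whatever the emulated API may return.
STATUS_CODES = [200, 404, 500]


def pick_status(codes=STATUS_CODES):
    """Return one status code chosen at random from the pool."""
    return random.choice(codes)


class RandomStatusHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Each request gets a randomly chosen status and a small JSON body
        status = pick_status()
        body = json.dumps({"status": status}).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=1234):
    """Serve forever on the given port (blocks until interrupted)."""
    http.server.HTTPServer(("", port), RandomStatusHandler).serve_forever()
```

Calling run() starts the server on port 1234, as in the earlier examples. Passing a single-item list to pick_status pins the response to one code while debugging.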


The server code above could certainly be more dynamic, but for my use case it was easy enough to make manual edits to return a different code temporarily, or to alter the response body in some way.
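
To tie this back to automated testing, the fake API can also be started from within a test so that client code has something to call. The following is only a sketch under a few assumptions: the names FakeThingsHandler, start_fake_api, fetch_thing and SAMPLE_THING are hypothetical, the canned response is held in a dict rather than read from data.json, and the server binds to port 0 so the operating system picks a free port:

```python
import http.client
import http.server
import json
import re
import threading

# Hypothetical canned response; the examples above read it from data.json.
SAMPLE_THING = {"id": "1234", "name": "sample thing"}
THING_PATH = re.compile(r"/things/[a-zA-Z0-9]+$")


class FakeThingsHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if THING_PATH.match(self.path):
            self._send_json(200, SAMPLE_THING)
        else:
            self._send_json(404, {"message": "not found"})

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_fake_api():
    """Start the fake API on a free local port in a background thread."""
    server = http.server.HTTPServer(("127.0.0.1", 0), FakeThingsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_thing(port, thing_id):
    """Client code under test: GET a thing and decode the JSON body."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/things/%s" % thing_id)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read().decode("utf-8"))
    finally:
        conn.close()
```

A test would call start_fake_api(), read the port from server.server_address[1], run its assertions against fetch_thing, and finish with server.shutdown() and server.server_close().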