Mock objects and APIs

Since I do not have a computer science background (I have a BA in English Lit and an MA in Information Science), I sometimes think I’ve uncovered something entirely new which turns out to be common practice among programmers. The latest such ‘discovery’ is apparently called, by those who know better, a ‘mock object.’ Wikipedia says that a ‘programmer typically creates a mock object to test the behavior of some other object.’ Oh right, that’s exactly what I’ve done. Ok, then.

In the last year I have had to write two applications which accessed an API that did not exist yet. The reasons for this are obscure and perhaps best left unremarked upon, but I did learn something in the process. When building similar applications in the past I had found that it was incredibly useful to, you know, actually have an API to run them against, so it was suggested that I mock up the (currently non-existent) API. In fact, this turned out to be so useful that if ever I have to build another application that accesses an API I will repeat this procedure. A mock API allows you to test any and all responses that might be returned, and actually having the responses to test against is far more productive (at least it was for me) than simply reading about them in the documentation.

Having now read through the Wikipedia page on mock objects, I know that my own mock API is actually more of a ‘fake API.’ This is because I am not really testing the request itself or the submitted data. Instead I simply return a particular HTTP status code along with a bit of sample JSON in the body. Regardless of my indiscretion with the terminology, if you’d like to learn how I generated my mock/fake API, read on.

My mock fake API in Python

Setting up a simple web server is quite easy using Python’s BaseHTTPServer module. The following code will create a ‘things’ endpoint at which we can GET a particular thing via its Id:

import BaseHTTPServer
import re

class MyHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        a = re.compile("\/things\/[a-zA-Z0-9]*")
        if a.match(self.path):
            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(open('data.json').read())

server_class = BaseHTTPServer.HTTPServer
httpd = server_class(('', 1234), MyHandler)
httpd.serve_forever()

To test this code you also need to create a sample JSON file, named ‘data.json,’ in the same folder as the above Python code. Now you can access the URL http://localhost:1234/things/1234, which should return whatever snippet of JSON you’ve stored in data.json. The regex on line 6 can be altered to accommodate whatever call you wish to emulate. In this case a ‘thing’ Id can be any number of digits and lower or upper case letters.
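If you are on Python 3, where BaseHTTPServer has become http.server, the same idea can be exercised end to end without a browser. The following is only a sketch under a few assumptions of mine: the names ThingHandler, SAMPLE_THING and fetch_thing are made up for illustration, the sample JSON lives in a dict rather than in data.json, and the regex requires at least one character in the Id, which is slightly stricter than the pattern above.

```python
import http.client
import json
import re
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the contents of data.json (hypothetical sample data).
SAMPLE_THING = {"id": "1234", "name": "sample thing"}
THING_PATH = re.compile(r"/things/[a-zA-Z0-9]+$")


class ThingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if THING_PATH.match(self.path):
            body = json.dumps(SAMPLE_THING).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_thing(thing_id):
    """Start the fake API on a free port, GET one thing, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), ThingHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/things/" + thing_id)
        response = conn.getresponse()
        status, raw = response.status, response.read()
        conn.close()
        return status, json.loads(raw.decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_thing("1234") should hand back the status code and the decoded JSON, which is roughly what an application under test would see.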

It is similarly easy to handle other kinds of requests, such as PUT, POST and HEAD. Here’s a sample POST against the ‘things’ endpoint that returns an HTTP status code of 404 and a snippet of JSON stored as ‘error.json’:

import BaseHTTPServer

class MyHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == '/things':
            self.send_response(404)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(open('error.json').read())

server_class = BaseHTTPServer.HTTPServer
httpd = server_class(('', 1234), MyHandler)
httpd.serve_forever()

Should you wish to test the content of this POST and make this more of an actual mock object, you can read the contents of the submitted data using the following:

postLen = int(self.headers['Content-Length'])
postData = self.rfile.read(postLen)
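Once the body has been read it can be checked, which is what nudges a fake toward a mock. Here is a minimal Python 3 sketch of that step, written as a plain function so it can be tried without a running server. The function name read_json_body, the required ‘name’ field and the particular status codes are my own assumptions, not part of any real API.

```python
import io
import json


def read_json_body(headers, rfile):
    """Return a (status, payload) pair for a POST that should carry a JSON object.

    `headers` is anything with a .get() method (self.headers in a request
    handler works) and `rfile` is the readable request stream (self.rfile).
    """
    length = headers.get("Content-Length")
    if length is None:
        # Without a length we do not know how many bytes to read.
        return 411, {"error": "Content-Length header required"}
    raw = rfile.read(int(length))
    try:
        payload = json.loads(raw.decode("utf-8"))
    except ValueError:
        return 400, {"error": "body is not valid JSON"}
    # Hypothetical validation rule: a thing must be an object with a name.
    if not isinstance(payload, dict) or "name" not in payload:
        return 422, {"error": "expected an object with a 'name' field"}
    return 201, payload


# Example with an in-memory stream standing in for the request:
body = b'{"name": "thing"}'
status, payload = read_json_body({"Content-Length": str(len(body))}, io.BytesIO(body))
```

Inside do_POST the call would be read_json_body(self.headers, self.rfile), followed by send_response with the returned status.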

Of course, this approach requires the presence of a ‘Content-Length’ header, but there are probably more direct methods you could try. Lastly, I occasionally found it useful to randomly return different status codes (this needs an ‘import random’ at the top of the file):

self.send_response(random.choice([200,404]))

The server code above could certainly be more dynamic, but for my use case it was easy enough to make manual edits to return a different code temporarily, or to alter the response body in some way.
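For anyone who does want something more dynamic, one option is to keep the canned responses in a table and pick from it at request time. This is a sketch in Python 3 rather than what I actually ran; ROUTES, pick_response and the sample bodies are all hypothetical.

```python
import random
import re

# Hypothetical route table: method, path pattern, and candidate (status, body) pairs.
ROUTES = [
    ("GET", re.compile(r"/things/[a-zA-Z0-9]+$"),
     [(200, '{"id": "1234"}'), (404, '{"error": "not found"}')]),
    ("POST", re.compile(r"/things$"),
     [(201, '{"id": "5678"}'), (404, '{"error": "not found"}')]),
]


def pick_response(method, path, rng=random):
    """Return a random canned (status, body) for the request, or 501 if unmocked."""
    for route_method, pattern, candidates in ROUTES:
        if method == route_method and pattern.match(path):
            return rng.choice(candidates)
    return 501, '{"error": "not mocked"}'
```

A handler’s do_GET and do_POST could then both delegate to pick_response(self.command, self.path). Passing a seeded random.Random instance as rng makes a test run repeatable.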

Generating synonyms

If you spend any time managing vocabularies, there may come a day when you need to quickly generate a set of synonyms for your existing terms. After all, synonyms are quite useful. Thanks to synonyms, a Google search for ‘theatre’ should also return content with the word ‘theater,’ or perhaps even ‘cinema.’ We can also use synonyms to account for common misspellings like ‘loose’ for ‘lose’ (and vice versa).

Synonyms are also useful for people, places and organizations. Consider a text classification engine that only looks for the full term name ‘Apple, Inc.’ rather than the shorter and frequently used ‘Apple.’ Or perhaps, an auto-suggest search box that does not know that I really mean ‘President Barack Obama’ when I type ‘Obama.’

What dark arts must we employ to quickly generate a substantial set of synonyms? Let’s explore.

Bing Synonyms API

The Synonyms API from Bing returns alternate forms of products, people, locations and more. The free version is limited to 5000 calls per day, and the terms of service indicate that users of the service should not copy, store, or cache any Synonyms results. This pretty much excludes the service for my purposes, but it is a good demonstration of the kind of service we will be considering here.

This is a RESTful web API that can be invoked rather simply:

https://api.datamarket.azure.com/Bing/Synonyms/v1/GetSynonyms?Query=%27zimbabwe%27

This will return an Atom feed containing entries like the following:

<entry>
 <id>https://api.datamarket.azure.com/Data.ashx/Bing/Synonyms/v1/GetSynonyms?Query='zimbabwe'&$skip=4&$top=1</id>
 <title type="text">GetSynonymsEntitySet</title>
 <updated>2013-07-11T15:28:55Z</updated>
 <link rel="self"
 href="https://api.datamarket.azure.com/Data.ashx/Bing/Synonyms/v1/GetSynonyms?Query='zimbabwe'&$skip=4&$top=1"/>
 <content type="application/xml">
 <m:properties>
 <d:Synonym m:type="Edm.String">republic of zimbabwe</d:Synonym>
 </m:properties>
 </content>
</entry>
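Pulling the synonym strings out of a feed like this takes only a few lines with Python’s built-in XML parser. The sketch below makes some assumptions: the entry above does not show its namespace declarations, so I am assuming the d: prefix is bound to the usual OData data services namespace, and SAMPLE_FEED is a trimmed, hand-made stand-in rather than a real response.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (the standard OData ones); the snippet does not show them.
D_NS = "http://schemas.microsoft.com/ado/2007/08/dataservices"
M_NS = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

# Hand-made stand-in for a response, trimmed to the parts that matter here.
SAMPLE_FEED = """
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:d="%s" xmlns:m="%s">
  <entry>
    <content type="application/xml">
      <m:properties>
        <d:Synonym m:type="Edm.String">republic of zimbabwe</d:Synonym>
      </m:properties>
    </content>
  </entry>
</feed>
""" % (D_NS, M_NS)


def extract_synonyms(feed_xml):
    """Return the text of every d:Synonym element in the feed, in document order."""
    root = ET.fromstring(feed_xml)
    return [element.text for element in root.iter("{%s}Synonym" % D_NS)]


synonyms_found = extract_synonyms(SAMPLE_FEED)
```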

Wordnet

Wordnet is a large lexical database for the English language. Among other things, it groups words into sets of synonyms, each expressing a single concept. It is also a free resource and its data can be used as needed. Perfect, let’s use it!

To pull synonyms from the Wordnet database I used the NLTK Python library. The NLTK documentation includes a brief description of its Wordnet interface.

This makes getting synonyms for any term as simple as:

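from nltk.corpus import wordnet

def synonyms(word):
    # Collect the lemma names from every synset that contains the word.
    syns = []
    for synset in wordnet.synsets(word):
        # lemma_names is a method in NLTK 3 (it was an attribute in older releases)
        for syn in synset.lemma_names():
            syns.append(syn)
    return sorted(set(syns))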
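One wrinkle worth knowing about: Wordnet joins multi-word lemma names with underscores, so the raw names usually need a little cleanup before they can serve as synonyms in a vocabulary. A small sketch of that step follows. The helper name display_forms is mine, and the input list is only an illustration, not actual output from the function above.

```python
def display_forms(lemma_names):
    """Turn Wordnet-style lemma names into a sorted list of unique display strings."""
    return sorted({name.replace("_", " ") for name in lemma_names})


# Illustrative input only; real lemma names would come from the Wordnet lookup.
cleaned = display_forms(["motion_picture", "film", "motion_picture"])
```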

Freebase

Freebase is a community-curated collection of structured data, including entries for well-known people, places and things. Freebase’s data is licensed under an open, Creative Commons Attribution (CC-BY) license.

To return synonyms from Freebase I am passing an MQL (Metaweb Query Language) query to the MQL Read API.

The MQL itself is expressed in JSON and is fairly simple. Here I am asking for all aliases of a concept with the name ‘Zimbabwe’ with any type (types in Freebase provide a level of disambiguation between concepts so that ‘War’ the subject can be distinguished from ‘War’ the band):

[{
  "id": null,
  "name": "Zimbabwe",
  "/common/topic/alias": [],
  "type": []
}]

The following is a snippet of what is returned for this query:

{
"result": [{
    "id": "/en/zimbabwe",
    "/common/topic/alias": ["The Republic of Zimbabwe"],
    "name": "Zimbabwe",
    "type": [
        "/common/topic",
        "/location/location",
        "/location/country"
    ]
}]
}

We can use the ‘type’ value to indicate something about the alias (in this case the alias is a location as we would expect, but it could be a band or a person name).
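To make that concrete, here is a rough sketch of turning a response like the one above into rows that pair each alias with its preferred name and types. The function name aliases_with_types is my own invention, and SAMPLE_RESPONSE simply repeats the snippet shown earlier.

```python
import json

# The response snippet from above, kept as a string the way an API would return it.
SAMPLE_RESPONSE = """
{
  "result": [{
    "id": "/en/zimbabwe",
    "/common/topic/alias": ["The Republic of Zimbabwe"],
    "name": "Zimbabwe",
    "type": ["/common/topic", "/location/location", "/location/country"]
  }]
}
"""


def aliases_with_types(response_text):
    """Return a list of (alias, preferred name, types) tuples from an MQL response."""
    rows = []
    for topic in json.loads(response_text)["result"]:
        for alias in topic.get("/common/topic/alias", []):
            rows.append((alias, topic["name"], topic.get("type", [])))
    return rows


rows = aliases_with_types(SAMPLE_RESPONSE)
```

Each row keeps the types alongside the alias, so a later step can decide whether, say, a band alias belongs in a vocabulary of places.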

DBpedia

I’ve saved my favorite source for last. It’s my favorite because it combines structured data with the haphazard efforts (both intentional and unintentional) of Wikipedia’s many users.

Like Freebase, DBpedia is a community-curated collection of structured data, but it differs in that its data has been extracted from Wikipedia alone (whereas Freebase combines data from multiple sources with the contributions of its users).

To return synonyms from DBpedia I am making use of the page redirect information stored for each resource. If you have no idea what I mean by ‘page redirect’, bring up Wikipedia in your browser and search for ‘President Clinton.’ Now look at the page’s heading: ‘Bill Clinton.’ How did we get here? Magic. No wait, it was a page redirect. Now let’s take a look at the DBpedia version of the Bill Clinton resource:

http://dbpedia.org/page/Bill_Clinton

Scroll down the page to dbpedia-owl:wikiPageRedirects to see all of the RDF redirect triples for this resource. For example:

dbpedia:Bill_Clinton  dbpedia-owl:wikiPageRedirects   dbpedia:William_Jefferson_Clinton

This is a great way to get a bunch of synonyms for ‘Bill Clinton,’ but what if our term form is actually ‘William Jefferson Clinton’? Well, then we need to reverse the query. Here is a sample SPARQL query to do just that:

?x <http://dbpedia.org/ontology/wikiPageRedirects> <http://dbpedia.org/resource/Bill_Clinton>
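Strictly speaking the line above is a triple pattern; a SPARQL endpoint needs it wrapped in a SELECT ... WHERE block. A small sketch of building that full query string for any term follows. The helper name redirects_to is hypothetical, and the space-to-underscore step assumes the term already matches the capitalization of the DBpedia resource.

```python
def redirects_to(resource_name):
    """Build a SPARQL query for every page that redirects to the given resource."""
    # Assumes the name already matches the resource's capitalization.
    resource = "http://dbpedia.org/resource/" + resource_name.strip().replace(" ", "_")
    return (
        "SELECT ?x WHERE { "
        "?x <http://dbpedia.org/ontology/wikiPageRedirects> <%s> . "
        "}" % resource
    )


query = redirects_to("Bill Clinton")
```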

We can also get redirects of our redirects for extra synonym goodness:

<http://dbpedia.org/resource/Bill_Clinton> <http://dbpedia.org/ontology/wikiPageRedirects> ?y.
?x <http://dbpedia.org/ontology/wikiPageRedirects> ?y.

A UNION of all of the queries above would yield the following list (note the misspellings):

  • Bill Clinton
  • William Jefferson Clinton
  • BillClinton
  • Billl Clinton
  • 42nd President of the United States
  • William Jefferson Blythe III
  • Bull Clinton
  • William clinton
  • William Jefferson “Bill” Clinton
  • Bill Blythe IV
  • William Jefferson Blythe IV
  • Buddy (Clinton’s dog)
  • Bill clinton
  • William J. Clinton
  • Clinton
  • Bill
  • President Bill Clinton
  • President Clinton
  • Clinton Gore Administration
  • Bill Jefferson Clinton
  • Bill J. Clinton
  • Bil Clinton
  • WilliamJeffersonClinton
  • William Blythe III
  • William J. Blythe
  • William J. Blythe III
  • William J Clinton
  • Bill Clinton’s Post-Presidency
  • Bill Clinton’s Post Presidency
  • Bill Clinton\
  • Klin-ton

Huh, how did Buddy get in there?

Update 2016-08-24

I’ve created a Gist of the code I used to generate synonyms using DBpedia:

import sys
from SPARQLWrapper import SPARQLWrapper, JSON, XML, N3, RDF

def dbpedia(term):
    term = term.strip()
    nterm = term.capitalize().replace(' ','_')
    query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label
    WHERE
    {
        {
            <http://dbpedia.org/resource/VALUE> <http://dbpedia.org/ontology/wikiPageRedirects> ?x.
            ?x rdfs:label ?label.
        }
        UNION
        {
            <http://dbpedia.org/resource/VALUE> <http://dbpedia.org/ontology/wikiPageRedirects> ?y.
            ?x <http://dbpedia.org/ontology/wikiPageRedirects> ?y.
            ?x rdfs:label ?label.
        }
        UNION
        {
            ?x <http://dbpedia.org/ontology/wikiPageRedirects> <http://dbpedia.org/resource/VALUE>.
            ?x rdfs:label ?label.
        }
        UNION
        {
            ?y <http://dbpedia.org/ontology/wikiPageRedirects> <http://dbpedia.org/resource/VALUE>.
            ?x <http://dbpedia.org/ontology/wikiPageRedirects> ?y.
            ?x rdfs:label ?label.
        }
        FILTER (lang(?label) = 'en')
    }
    """
    nquery = query.replace('VALUE',nterm)
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery(nquery)
    rterms = []
    sparql.setReturnFormat(JSON)
    try:
        ret = sparql.query()
        results = ret.convert()
        requestGood = True
    except Exception, e:
        results = str(e)
        requestGood = False
    if requestGood == False:
        return "Problem communicating with the server: ", results
    elif (len(results["results"]["bindings"]) == 0):
        return "No results found"
    else:
        for result in results["results"]["bindings"]:
            label = result["label"]["value"]
            rterms.append(label)
        alts = ', '.join(rterms)
        alts = alts.encode('utf-8')
        return alts

if __name__ == "__main__":
    alts = dbpedia(sys.argv[1])
    print alts
