Deploying a pre-trained object detection model on Google Cloud ML Engine

The Tensorflow detection model zoo provides several extremely useful pre-trained object detection models. And though we have the option of using one of these models in a transfer learning scenario to train our own custom model, occasionally the pre-trained model will provide everything we need.

For example, we may wish to add human action detection to our own image classification pipeline. Instead of collecting and labeling images and training our own model, we might decide to employ one of the existing models trained on the AVA dataset and map a subset of its labels to our own taxonomy, for example:

  • Stand
  • Sit
  • Walk
  • Run
  • Dance
  • Fight
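
In code, such a mapping might be as simple as a dictionary from AVA label names to our own terms. This is a minimal sketch; the AVA strings on the left are illustrative, and the exact names live in the label map shipped with the model:

# Illustrative only: the AVA label names on the left are approximate;
# check the model's label map for the exact strings.
AVA_TO_OUR_TAXONOMY = {
    "stand": "Stand",
    "sit": "Sit",
    "walk": "Walk",
    "run/jog": "Run",
    "dance": "Dance",
    "fight/hit (a person)": "Fight",
}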

These pre-trained models expect a ‘tensor’ as input. We can confirm this by downloading and inspecting the model; the zoo contains just one AVA model. Let’s download and extract it:

wget http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_ava_v2.1_2018_04_30.tar.gz
tar -xzf faster_rcnn_resnet101_ava_v2.1_2018_04_30.tar.gz

If we have Tensorflow installed we can use the saved_model_cli to inspect it:

saved_model_cli show --dir faster_rcnn_resnet101_ava_v2.1_2018_04_30/saved_model/ --all

Which will return the following:


MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['inputs'] tensor_info:
        dtype: DT_UINT8
        shape: (-1, -1, -1, 3)
        name: image_tensor:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['detection_boxes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 100, 4)
        name: detection_boxes:0
    outputs['detection_classes'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 100)
        name: detection_classes:0
    outputs['detection_scores'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 100)
        name: detection_scores:0
    outputs['num_detections'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1)
        name: num_detections:0
  Method name is: tensorflow/serving/predict

From this we can tell that the expected input is a uint8 image tensor. While this may work with ML Engine, all the other examples I have seen use a different input type: encoded_image_string_tensor.

So, how can we update the saved model so it allows for this different input type? Essentially, we need to install the Tensorflow Object Detection library and then re-export this pre-trained model with the preferred input signature. Note that the outputs seen above will be fine for our purposes (as we will see later).

Follow the installation guide to install the Object Detection library.

Note 1:

I am running Anaconda on a Google Compute Engine instance and had an issue with bunzip2. I resolved it by installing bzip2 before beginning the installation process:

sudo apt-get install bzip2

Note 2:

I also had some issues installing Protobuf even though I was following the installation guide. Here are the steps that worked for me:


mkdir protobuf
cd protobuf
wget https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protoc-3.7.1-linux-x86_64.zip
unzip protoc-3.7.1-linux-x86_64.zip

Now add the following line to .bashrc in your home directory:

export PATH=/home/dfox/protobuf/bin${PATH:+:${PATH}}

And then activate the modified PATH using source:

source .bashrc
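
At this point protoc should be picked up from the new location; a quick sanity check:

protoc --version

This should print the version just downloaded (libprotoc 3.7.1).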

Lastly, you should be able to run the final step in the installation guide. Change directories into TensorFlow/models/research/ and run the following command:

protoc object_detection/protos/*.proto --python_out=.
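
If the compilation succeeds, a generated _pb2.py module should now sit alongside each .proto file, which you can verify with a quick listing:

ls object_detection/protos/*_pb2.py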

Exporting the model:

We have downloaded our pre-trained model, and we have installed the Object Detection library. At this point, we can run export_inference_graph.py to modify the input of our pre-trained model. Here is the sample command provided by Google:


python3 export_inference_graph.py \
--input_type encoded_image_string_tensor \
--pipeline_config_path path/to/sample/config \
--trained_checkpoint_prefix path/to/model/checkpoint \
--output_directory path/to/output/for/mlengine

The sample pipeline configs can be found in the Tensorflow repository under object_detection/samples/configs. Pick the config that will work with your model.

Here is the command I used for the AVA model:


python TensorFlow/models/research/object_detection/export_inference_graph.py \
--input_type encoded_image_string_tensor \
--pipeline_config_path TensorFlow/models/research/object_detection/samples/configs/faster_rcnn_resnet101_ava_v2.1.config \
--trained_checkpoint_prefix faster_rcnn_resnet101_ava_v2.1_2018_04_30/model.ckpt \
--output_directory ava_for_mlengine

Note that there is no file named model.ckpt in the AVA directory; this is the prefix of the files created for the model checkpoint.

Now we should have a directory ava_for_mlengine, with everything we need to deploy this model:


$ ls ava_for_mlengine/

checkpoint model.ckpt.data-00000-of-00001 model.ckpt.meta saved_model
frozen_inference_graph.pb model.ckpt.index pipeline.config

Now our model is ready to be deployed on ML Engine, which we can do using the gcloud command line tool.

We begin by copying our new saved model to a GCS bucket which can be accessed by ML Engine:

gsutil cp -r ava_for_mlengine/saved_model/ gs://${GCS_BUCKET_NAME}/faster_rcnn_resnet101_ava/

Next we can create our model:

gcloud ml-engine models create faster_rcnn_resnet101_ava --regions us-central1

And finally create our model version (v1):

gcloud ml-engine versions create v1 --model faster_rcnn_resnet101_ava --origin=gs://${GCS_BUCKET_NAME}/faster_rcnn_resnet101_ava/saved_model --framework tensorflow --runtime-version=1.13

Note that I have specified a runtime version equivalent to my local version of Tensorflow (which we used to export the model). Your version may be different.
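
If you are unsure which runtime version to pass, you can check the local Tensorflow version that was used for the export:

python -c "import tensorflow as tf; print(tf.__version__)"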

This step may take some time. While we wait, let’s download an image for testing:

(Image: Kick.JPG)

wget https://upload.wikimedia.org/wikipedia/commons/9/9b/Kick.JPG

You can use the code provided by Google for performing an online prediction (found here), but we first need to convert our image into an encoded string.

For a quick test, you can add the following to the script provided by Google:

import base64
import sys

if __name__ == '__main__':
    instances = []
    for image_path in sys.argv[1:]:
        with open(image_path, "rb") as image_file:
            encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
            instances.append({"b64": encoded_string})
    # predict_json() is the helper defined in Google's sample code;
    # swap in your own project id here.
    responses = predict_json("my-project", "faster_rcnn_resnet101_ava", instances)
    for response in responses:
        print(response.keys())
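
Assuming the modified script is saved as predict_test.py (the name is arbitrary), we can then run it against the downloaded image:

python predict_test.py Kick.JPG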

Which will return:

dict_keys(['detection_boxes', 'detection_classes', 'raw_detection_scores', 'detection_scores', 'num_detections', 'raw_detection_boxes'])

The boxes provide the locations of the detected objects, the classes provide a unique key that identifies the detected object, and the scores provide the confidence for each detected object.
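
For example, to turn the normalized boxes into pixel coordinates for our test image, something like the following sketch should do (it assumes the boxes come back as [ymin, xmin, ymax, xmax] values in the 0-1 range, which is what the Object Detection API normally produces):

from PIL import Image

# Sketch: scale normalized [ymin, xmin, ymax, xmax] boxes up to pixel
# coordinates for the test image.
img = Image.open("Kick.JPG")
w, h = img.size
for box, score in zip(response['detection_boxes'], response['detection_scores']):
    if score > .80:
        ymin, xmin, ymax, xmax = box
        print("box (pixels):", xmin * w, ymin * h, xmax * w, ymax * h)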

And if I am interested in knowing which classes were applied with a particular confidence level:

for c, s in zip(response['detection_classes'], response['detection_scores']):
	if s > .80:
		print(c, s)

Which, for the ‘kick’ image seen above, will return:

11.0 0.8621717691421509
80.0 0.8211425542831421
80.0 0.8173160552978516

To figure out which classes these keys indicate, I need to download the AVA concept mappings. These can be found here. The relevant labels:


label {
  name: "sit"
  label_id: 11
  label_type: PERSON_MOVEMENT
}

label {
  name: "watch (a person)"
  label_id: 80
  label_type: PERSON_INTERACTION
}

So, we did not detect any kicking with any confidence. There are two kick-related labels (35 and 71), but neither appears even in an unfiltered list of the detected classes.
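
Rather than looking these IDs up by hand, we could parse the label map into a dictionary and translate the detected class keys directly. A rough sketch, assuming the label map file shipped with the Object Detection repo (e.g. object_detection/data/ava_label_map_v2.1.pbtxt) and a simple regex rather than the protobuf API:

import re

def load_label_map(path):
    # Tiny parser: pull name/label_id pairs out of the .pbtxt label map.
    text = open(path).read()
    names = re.findall(r'name:\s*"([^"]+)"', text)
    ids = [int(i) for i in re.findall(r'label_id:\s*(\d+)', text)]
    return dict(zip(ids, names))

labels = load_label_map("object_detection/data/ava_label_map_v2.1.pbtxt")
for c, s in zip(response['detection_classes'], response['detection_scores']):
    if s > .80:
        print(labels.get(int(c), "unknown"), s)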

At this point, if my intention were to create a sports-specific action detector that performs well on photos of martial arts events, I might train my own object detection model. I might also decide to use AVA as a baseline for transfer learning.

Generating new triples in Marklogic with SPARQL CONSTRUCT (and INSERT)

SPARQL is known mostly as a query language, but it also has the capability—via the CONSTRUCT operator—to generate new triples. This can be useful for delivering a custom snippet of RDF to a user, but it can also be used to write new data back to the database, enriching what was already there. Marklogic’s triple store supports the SPARQL standard, including CONSTRUCT queries, and the results can be easily incorporated back into the data set using the XQuery Semantics API. Here’s a quick demo.

I have a set of geography terms which have already been linked to the Geonames dataset. Here’s an example:


<http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> ;
  <http://www.w3.org/2004/02/skos/core#inScheme> <http://cv.ap.org/a#geography> ;
  <http://www.w3.org/2003/01/geo/wgs84_pos#long> "-70.76255"^^xs:decimal ;
  <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "43.07176"^^xs:decimal ;
  <http://www.w3.org/2004/02/skos/core#exactMatch> <http://sws.geonames.org/5091383/> ;
  <http://www.w3.org/2004/02/skos/core#broader> <http://cv.ap.org/id/9531546082C6100487B5DF092526B43E> ;
  <http://www.w3.org/2004/02/skos/core#prefLabel> "Portsmouth"@en .

If we look at the same term via the New York Times’ Linked Open Data service we’ll see a set of equivalent terms, including the Geonames resource for Portsmouth:


<http://data.nytimes.com/10237454346559533021> <http://www.w3.org/2002/07/owl#sameAs>
  <http://data.nytimes.com/portsmouth_nh_geo> ,
  <http://dbpedia.org/resource/Portsmouth%2C_New_Hampshire> ,
  <http://rdf.freebase.com/ns/en.portsmouth_new_hampshire> ,
  <http://sws.geonames.org/5091383/> .

Oh, hey, we have the same Geonames URI. Guess what we can do with that? More links!

After ingesting the NYTimes data into Marklogic, I was able to write a SPARQL query to begin connecting the two datasets using the Geonames URI as glue.


 PREFIX cts: <http://marklogic.com/cts#>
 PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX owl: <http://www.w3.org/2002/07/owl#>
 SELECT ?s ?n 
 WHERE
 {
 ?s skos:inScheme <http://cv.ap.org/a#geography> .
 ?n skos:inScheme <http://data.nytimes.com/elements/nytd_geo> .
 ?s skos:exactMatch ?gn .
 ?n owl:sameAs ?gn .
 } 
 LIMIT 2

Returning:


<http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E> <http://data.nytimes.com/10237454346559533021>
<http://cv.ap.org/id/662030807D5B100482BDC076B8E3055C> <http://data.nytimes.com/10616800927985096861>

Now, if we want to generate triples instead of SPARQL results, we simply swap out our SELECT for a CONSTRUCT operator, like so:


PREFIX cts: <http://marklogic.com/cts#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT { ?s skos:exactMatch ?n .}
WHERE
 {
 ?n skos:inScheme <http://data.nytimes.com/elements/nytd_geo> .
 ?s skos:inScheme <http://cv.ap.org/a#geography> .
 ?s skos:exactMatch ?gn .
 ?n owl:sameAs ?gn .
 } 
LIMIT 2

Returning:


<http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://data.nytimes.com/10237454346559533021> .
<http://cv.ap.org/id/662030807D5B100482BDC076B8E3055C> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://data.nytimes.com/10616800927985096861> .

We have a few options for writing our newly generated triples back to the database, but let’s start with Marklogic’s XQuery Semantics API, in particular the sem:rdf-insert function. Here’s a bit of XQuery to run the SPARQL query above and insert the resulting triples into the <geography> graph in the database:


import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

let $sparql := "PREFIX cts: <http://marklogic.com/cts#>
                PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
                PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                PREFIX owl: <http://www.w3.org/2002/07/owl#>
                CONSTRUCT { ?s skos:exactMatch ?n .}
                WHERE
                {
                ?n skos:inScheme <http://data.nytimes.com/elements/nytd_geo> .
                ?s skos:inScheme <http://cv.ap.org/a#geography> .
                ?s skos:exactMatch ?gn .
                ?n owl:sameAs ?gn .
                } "

let $triples := sem:sparql($sparql, (), (), ())

return
(
sem:rdf-insert($triples, ("override-graph=geography"))
)

Now if we look at the triples for my original term, we should see an additional skos:exactMatch pointing at the NYTimes resource:


<http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> ;
  <http://www.w3.org/2004/02/skos/core#inScheme> <http://cv.ap.org/a#geography> ;
  <http://www.w3.org/2003/01/geo/wgs84_pos#long> "-70.76255"^^xs:decimal ;
  <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "43.07176"^^xs:decimal ;
  <http://www.w3.org/2004/02/skos/core#exactMatch> <http://sws.geonames.org/5091383/> ;
  <http://www.w3.org/2004/02/skos/core#exactMatch> <http://data.nytimes.com/10237454346559533021> ;
  <http://www.w3.org/2004/02/skos/core#broader> <http://cv.ap.org/id/9531546082C6100487B5DF092526B43E> ;
  <http://www.w3.org/2004/02/skos/core#prefLabel> "Portsmouth"@en .

Another option for writing the new triples back to the database is SPARQL itself. The most recent version, SPARQL 1.1, defines an update language which includes the useful operator INSERT. We can modify our earlier SPARQL query like so:


PREFIX cts: <http://marklogic.com/cts#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT
{ GRAPH <geography> { ?s skos:exactMatch ?n .} }
WHERE
{
  GRAPH <nytimes>
    {
    ?n skos:inScheme <http://data.nytimes.com/elements/nytd_geo> .
    ?n owl:sameAs ?gn .
    } .
 GRAPH <geography>
    {
    ?s skos:inScheme <http://cv.ap.org/a#geography> .
    ?s skos:exactMatch ?gn . 
    }.
}

The multiple GRAPH statements allow me to query across two graphs, but only write to one. And if we wanted to replace an existing skos:exactMatch triple, rather than append to our existing statements, we would precede our INSERT statement with a DELETE. This DELETE/INSERT operation is described in detail here.
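
For reference, here is a rough, untested sketch of what such a DELETE/INSERT might look like, dropping a term’s existing Geonames exactMatch and writing the NYTimes match in its place:

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
DELETE
{ GRAPH <geography> { ?s skos:exactMatch ?gn .} }
INSERT
{ GRAPH <geography> { ?s skos:exactMatch ?n .} }
WHERE
{
  GRAPH <nytimes>
    {
    ?n skos:inScheme <http://data.nytimes.com/elements/nytd_geo> .
    ?n owl:sameAs ?gn .
    }
  GRAPH <geography>
    {
    ?s skos:inScheme <http://cv.ap.org/a#geography> .
    ?s skos:exactMatch ?gn .
    }
}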

Marklogic 8, not yet released, will include support for the SPARQL 1.1 Update query language (among other new semantic capabilities). Since I am lucky enough to be part of the Early Access program for Marklogic 8, I was able to run the INSERT query above and see that it generated the new triples correctly.

Neither CONSTRUCT nor INSERT is exactly a new technology, but it’s great to see how they can be used within the context of a Marklogic application. For my own work cleaning and enriching vocabulary data, these methods have proved quite valuable, and I look forward to digging into the rest of the SPARQL 1.1 features coming to Marklogic 8 in the near future.

Searching RDF vocabulary data in Marklogic 7

Recently, I’ve been experimenting with Marklogic’s new(ish) semantic capabilities (here’s a quick overview of what Marklogic is offering with their semantics toolkit). In particular, I’ve been trying to build a simple interface for searching across vocabulary data in RDF. This turned out to be an interesting exercise since Marklogic’s current semantic efforts are targeted at an intersection of “documents, data, and now RDF triples.”

It’s fairly easy to set up the triple store; the quick start documentation should get you up and running in short order. For the purposes of this brief document I’m using a small content set from DBPedia derived from the following SPARQL query:

DESCRIBE ?s 
WHERE {
?s rdf:type <http://dbpedia.org/class/yago/ProfessionalMagicians> .
}

I downloaded my set in RDF/XML (using DBPedia’s SPARQL endpoint) and loaded it into Marklogic using mlcp:

mlcp.bat import -host localhost -port 8040 -username [name] -password [pass] -input_file_path C:\data\magicians.rdf -mode local -input_file_type RDF -output_collections magician -output_uri_prefix  /triplestore/

Now if you open up QConsole and ‘explore’ the data you’ll see that all of our triples have been packaged up into discrete documents:

/triplestore/1105189df46c20c7-0-11170.xml
/triplestore/1105189df46c20c7-0-11703.xml
/triplestore/1105189df46c20c7-0-12614.xml
/triplestore/1105189df46c20c7-0-13346.xml

Each one contains 100 triples, and each triple looks something like:

<sem:triple>
<sem:subject>http://dbpedia.org/resource/Criss_Angel</sem:subject>
<sem:predicate>http://dbpedia.org/property/birthDate</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#date">1967-12-18+02:00</sem:object>
</sem:triple>

Available query structures fall into three categories:

  • CTS queries (cts:*)
  • SPARQL queries (sem:*)
  • Hybrid CTS/SPARQL

The documentation for the available queries is here.

But before we dig into some sample queries, let’s try the Search API. It has some appeal as a solution, since it can provide easy pagination, result counts, and all the nice features of CTS (stemming/lemmatization) to boot.

Let’s search for terms which mention the word ‘paranormal’:

search:search('paranormal')

This returns the familiar results, but you will quickly realize that they are not particularly helpful if what you are interested in is subject matches rather than document matches.

<search:response snippet-format="snippet" total="21" start="1" page-length="10" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="" xmlns:search="http://marklogic.com/appservices/search">
    <search:result index="1" uri="/triplestore/ad02c265f21a9295-0-55.xml" path="fn:doc("/triplestore/ad02c265f21a9295-0-55.xml")" score="208896" confidence="0.7033283" fitness="0.8208094">
        <search:snippet>
            <search:match path="fn:doc("/triplestore/ad02c265f21a9295-0-55.xml")/sem:triples/sem:triple[97]/sem:object">http://dbpedia.org/resource/Category:<search:highlight>Paranormal</search:highlight>_investigators</search:match>
            <search:match path="fn:doc("/triplestore/ad02c265f21a9295-0-55.xml")/sem:triples/sem:triple[100]/sem:object">...and scientific skeptic best known for his challenges to <search:highlight>paranormal</search:highlight> claims and pseudoscience. Randi is the founder of the...</search:match>
        </search:snippet>
    </search:result>
</search:response>

Let’s try the same search with a direct CTS query, but this time we’ll allow for wildcarding:

cts:search(collection(),
cts:and-query((cts:collection-query("magician"), 
cts:word-query("paranormal*","wildcarded")))
)

This is nice, but it again returns the entire document containing matches. And since our triples were arbitrarily added to these documents via MLCP, we have subjects in our results that we don’t care about.

Let’s try the same query in pure SPARQL:

DESCRIBE ?s
WHERE{ 
?s ?p ?o.
FILTER regex(?o, "paranormal*", "i")
}

This works pretty well and returns all the triples for those subjects we are interested in, but it’s rather slow. I’m guessing the pure SPARQL FILTER query here is not particularly optimized. As a comparison, we can actually insert some CTS into our SPARQL query if we wish, like so:

PREFIX cts: <http://marklogic.com/cts#>
DESCRIBE ?s 
WHERE{ 
?s ?p ?o .
 FILTER cts:contains(?o, cts:word-query("paranormal")) 
}

Compared to the previous query this is blazing fast, though not yet sub-second. We can speed things up a bit more by using a hybrid CTS/SPARQL approach where we pass a cts:query as an option to sem:sparql. This reduces the set of documents in our search scope before executing the SPARQL, and so may offer a boost to performance. Of course, to continue to drill down to only the relevant subjects (not just documents), we end up specifying the search twice: once as the cts:query option and once in the SPARQL FILTER:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";
let $query := cts:word-query('paranormal',"case-insensitive")
let $sparql := "PREFIX cts: <http://marklogic.com/cts#>
                DESCRIBE ?s 
                WHERE{ 
                   ?s ?p ?o .
                   FILTER cts:contains(?o, cts:word-query('paranormal')) 
                }"
let $results := sem:sparql($sparql,(),("default-graph=magician"),($query))  
return
(
sem:rdf-serialize($results,'rdfxml')
)

If we want to dynamically generate our SPARQL queries we can send a $bindings map to sem:sparql containing our variables. Here’s a more dynamic version of the above:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";
let $q := 'paranormal'
let $query := cts:word-query($q,"case-insensitive")
let $bindings := map:map()
let $put := map:put($bindings,"q",$q)
let $sparql := "PREFIX cts: <http://marklogic.com/cts#>
                DESCRIBE ?s 
                WHERE{ 
                   ?s ?p ?o .
                   FILTER cts:contains(?o, cts:word-query(?q)) 
                }"
let $results := sem:sparql($sparql,($bindings),("default-graph=magician"),($query))  
return
(
sem:rdf-serialize($results,'rdfxml')
)

There’s an interesting byproduct of this approach, however. Once you have filtered the set of documents using the CTS query option, you have also potentially limited the triples available to your SPARQL query. So, if your subject has triples spanning two documents (which happens due to the arbitrary way MLCP splits your content into documents), and your CTS query only matches the first, any triples from the second document that you expected your SPARQL query to return will appear to be missing.

So, our approach is flawed, but let us press on anyway. What can I do with those triples in a search interface? There are a few options here. Certainly, we can flesh out our SPARQL query to use SELECT and whatever array of properties we need for our display (label, description, etc.) and then pass the results through sem:query-results-serialize to generate SPARQL XML:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at
"/MarkLogic/semantics.xqy";
let $sparql := "PREFIX cts: <http://marklogic.com/cts#>
                PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
               SELECT DISTINCT ?s ?c
                WHERE{ 
                   ?s ?p ?o .
                   ?s rdfs:comment ?c .
                   FILTER ( lang(?c) = 'en' )
                   FILTER cts:contains(?o, cts:word-query('paranormal')) 
                }"
let $results := sem:sparql($sparql,(),("default-graph=magician"),())  
return
(
sem:query-results-serialize($results)
)

Or, if you’d rather serialize the resulting triples in a particular format such as RDF/XML:

sem:rdf-serialize($results,'rdfxml')

I mentioned pagination earlier. Certainly, this would be a lot easier with the Search API, but it can be done with pure SPARQL and a bit of imagination:

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at
"/MarkLogic/semantics.xqy";
declare namespace sparql = "http://www.w3.org/2005/sparql-results#";
let $q := 'paranormal'
let $query := cts:word-query($q,"case-insensitive")

let $search-page-size := 2
let $search-start := 1
let $bindings := map:map()
let $put := map:put($bindings,"q",$q)
let $sparql :=  fn:concat(
                "PREFIX cts: <http://marklogic.com/cts#>
                SELECT DISTINCT ?s 
                WHERE{ 
                   ?s ?p ?o .
                   FILTER cts:contains(?o, cts:word-query(?q)) 
                }",
                "LIMIT ",
                $search-page-size,
                " OFFSET ",
               $search-start
               )
let $results := sem:sparql($sparql,($bindings),("default-graph=magician"),($query))  
return
(
sem:query-results-serialize($results)
)

One last issue I encountered with this approach: relying on SPARQL means that every user granted access to this search interface needs sem:sparql execute privileges. SPARQL 1.1 allows updates to be made to the database via queries. Though this feature is not currently included in Marklogic’s implementation of SPARQL, it might be in version 8. Does this mean that SPARQL privileges are not something you’d want to hand out to read-only users? Perhaps.

After building my own search interface using some of the approaches described above, I feel that Marklogic is at its best when it’s used as a document store. So, perhaps the best approach is to mirror the construction of a document repository, with each term as its own document and with embedded RDF triples. Something like this abbreviated and modified LC record for Harry Houdini:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="http://id.loc.gov/authorities/names/n79096862">
        <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
        <skos:prefLabel xml:lang="en" xmlns:skos="http://www.w3.org/2004/02/skos/core#">Houdini,
            Harry, 1874-1926</skos:prefLabel>
        <skos:exactMatch rdf:resource="http://viaf.org/viaf/sourceID/LC%7Cn+79096862#skos:Concept"
            xmlns:skos="http://www.w3.org/2004/02/skos/core#"/>
        <skos:inScheme rdf:resource="http://id.loc.gov/authorities/names"
            xmlns:skos="http://www.w3.org/2004/02/skos/core#"/>
        <skos:altLabel xml:lang="en" xmlns:skos="http://www.w3.org/2004/02/skos/core#">Weiss,
            Ehrich, 1874-1926</skos:altLabel>
    </rdf:Description>
    <sem:triples xmlns:sem="http://marklogic.com/semantics">
        <sem:triple>
            <sem:subject>http://id.loc.gov/authorities/names/n79096862</sem:subject>
            <sem:predicate>http://www.w3.org/1999/02/22-rdf-syntax-ns#type</sem:predicate>
            <sem:object>http://www.w3.org/2004/02/skos/core#Concept</sem:object>
        </sem:triple>
        <sem:triple>
            <sem:subject>http://id.loc.gov/authorities/names/n79096862</sem:subject>
            <sem:predicate>http://www.w3.org/2004/02/skos/core#prefLabel</sem:predicate>
            <sem:object xml:lang="en">Duke Thomas</sem:object>
        </sem:triple>
        <sem:triple>
            <sem:subject>http://id.loc.gov/authorities/names/n79096862</sem:subject>
            <sem:predicate>http://www.w3.org/2004/02/skos/core#exactMatch</sem:predicate>
            <sem:object>http://viaf.org/viaf/sourceID/LC%7Cn+79096862#skos:Concept</sem:object>
        </sem:triple>
        <sem:triple>
            <sem:subject>http://id.loc.gov/authorities/names/n79096862</sem:subject>
            <sem:predicate>http://www.w3.org/2004/02/skos/core#inScheme</sem:predicate>
            <sem:object>http://id.loc.gov/authorities/names</sem:object>
        </sem:triple>
        <sem:triple>
            <sem:subject>http://id.loc.gov/authorities/names/n79096862</sem:subject>
            <sem:predicate>http://www.w3.org/2004/02/skos/core#altLabel</sem:predicate>
            <sem:object xml:lang="en">Weiss, Ehrich, 1874-1926</sem:object>
        </sem:triple>
    </sem:triples>
</rdf:RDF>

I’ll be trying this approach next.