Generating new triples in MarkLogic with SPARQL CONSTRUCT (and INSERT)

SPARQL is known mostly as a query language, but it can also generate new triples through the CONSTRUCT query form. This is useful for delivering a custom snippet of RDF to a user, and it can also be used to write new data back to the database, enriching what was already there. MarkLogic’s triple store supports the SPARQL standard, including CONSTRUCT queries, and the results can easily be incorporated back into the data set using the XQuery Semantics API. Here’s a quick demo.

I have a set of geography terms that have already been linked to the Geonames dataset. Here’s an example:


<http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> ;
 <http://www.w3.org/2004/02/skos/core#inScheme> <http://cv.ap.org/a#geography> ;
 <http://www.w3.org/2003/01/geo/wgs84_pos#long> "-70.76255"^^xs:decimal ;
 <http://www.w3.org/2003/01/geo/wgs84_pos#lat> "43.07176"^^xs:decimal ;
 <http://www.w3.org/2004/02/skos/core#exactMatch> <http://sws.geonames.org/5091383/> ;
 <http://www.w3.org/2004/02/skos/core#broader> <http://cv.ap.org/id/9531546082C6100487B5DF092526B43E> ;
 <http://www.w3.org/2004/02/skos/core#prefLabel> "Portsmouth"@en .

If we look at the same term via the New York Times’ Linked Open Data service, we’ll see a set of equivalent terms, including the Geonames resource for Portsmouth:


<http://data.nytimes.com/10237454346559533021> <http://www.w3.org/2002/07/owl#sameAs> <http://data.nytimes.com/portsmouth_nh_geo> ;
 <http://dbpedia.org/resource/Portsmouth%2C_New_Hampshire> ;
 <http://rdf.freebase.com/ns/en.portsmouth_new_hampshire> ;
 <http://sws.geonames.org/5091383/> .

Oh, hey, we have the same Geonames URI. Guess what we can do with that? More links!

After ingesting the NYTimes data into MarkLogic, I wrote a SPARQL query that begins connecting the two datasets, using the Geonames URI as glue.


 PREFIX cts: <http://marklogic.com/cts#>
 PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX owl: <http://www.w3.org/2002/07/owl#>
 SELECT ?s ?n 
 WHERE
 {
 ?s skos:inScheme <http://cv.ap.org/a#geography> .
 ?n skos:inScheme <http://data.nytimes.com/elements/nytd_geo> .
 ?s skos:exactMatch ?gn .
 ?n owl:sameAs ?gn .
 } 
 LIMIT 2

Returning:


<http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E> <http://data.nytimes.com/10237454346559533021>
<http://cv.ap.org/id/662030807D5B100482BDC076B8E3055C> <http://data.nytimes.com/10616800927985096861>
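To make the join explicit, here is a small Python sketch that evaluates the same four triple patterns over an in-memory list of triples taken from (or implied by) the Portsmouth examples above. It is a toy model of basic graph pattern matching, not how MarkLogic evaluates SPARQL, and the helper names `match` and `solve` are hypothetical. The point it shows is that the shared variable `?gn` is what ties an AP term to a NYTimes term.

```python
# Toy basic-graph-pattern (BGP) matcher over an in-memory list of triples.
# Illustration only: this is not how MarkLogic evaluates SPARQL.
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"
AP_TERM = "http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E"
NYT_TERM = "http://data.nytimes.com/10237454346559533021"
GEONAMES = "http://sws.geonames.org/5091383/"

# Triples taken from (or implied by) the Portsmouth examples above.
TRIPLES: List[Triple] = [
    (AP_TERM, SKOS + "inScheme", "http://cv.ap.org/a#geography"),
    (AP_TERM, SKOS + "exactMatch", GEONAMES),
    (NYT_TERM, SKOS + "inScheme", "http://data.nytimes.com/elements/nytd_geo"),
    (NYT_TERM, OWL + "sameAs", GEONAMES),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check an earlier binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return extended


def solve(patterns: List[Triple], triples: List[Triple], binding: Binding) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (a conjunctive join)."""
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from solve(patterns[1:], triples, extended)


# The WHERE clause of the SELECT query; ?gn is the shared Geonames URI.
WHERE: List[Triple] = [
    ("?s", SKOS + "inScheme", "http://cv.ap.org/a#geography"),
    ("?n", SKOS + "inScheme", "http://data.nytimes.com/elements/nytd_geo"),
    ("?s", SKOS + "exactMatch", "?gn"),
    ("?n", OWL + "sameAs", "?gn"),
]

rows = [(b["?s"], b["?n"]) for b in solve(WHERE, TRIPLES, {})]
print(rows)
```

Each pattern can only narrow the bindings produced by the patterns before it, so the rows that survive are exactly the AP/NYTimes pairs whose Geonames URIs agree.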

Now, if we want to generate triples instead of a table of results, we simply swap the SELECT clause for a CONSTRUCT template, like so:


PREFIX cts: <http://marklogic.com/cts#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT { ?s skos:exactMatch ?n .}
WHERE
 {
 ?n skos:inScheme <http://data.nytimes.com/elements/nytd_geo> .
 ?s skos:inScheme <http://cv.ap.org/a#geography> .
 ?s skos:exactMatch ?gn .
 ?n owl:sameAs ?gn .
 } 
LIMIT 2

Returning:


<http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://data.nytimes.com/10237454346559533021> .
<http://cv.ap.org/id/662030807D5B100482BDC076B8E3055C> <http://www.w3.org/2004/02/skos/core#exactMatch> <http://data.nytimes.com/10616800927985096861> .
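Conceptually, CONSTRUCT evaluates the same WHERE clause and then fills in its template once per solution. The Python sketch below is an illustration, not MarkLogic’s implementation: it takes the two solutions returned by the SELECT query above and instantiates the template `?s skos:exactMatch ?n`. The names `instantiate` and `construct` are hypothetical.

```python
# Toy model of CONSTRUCT: instantiate a triple template once per solution.
# Illustration only; not MarkLogic's implementation.
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
SKOS = "http://www.w3.org/2004/02/skos/core#"

# CONSTRUCT { ?s skos:exactMatch ?n . }
TEMPLATE: List[Triple] = [("?s", SKOS + "exactMatch", "?n")]

# The two solutions returned by the SELECT query above.
SOLUTIONS: List[Dict[str, str]] = [
    {"?s": "http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E",
     "?n": "http://data.nytimes.com/10237454346559533021"},
    {"?s": "http://cv.ap.org/id/662030807D5B100482BDC076B8E3055C",
     "?n": "http://data.nytimes.com/10616800927985096861"},
]


def instantiate(term: str, binding: Dict[str, str]) -> Optional[str]:
    """Substitute a variable's value (None if unbound); constants pass through."""
    return binding.get(term) if term.startswith("?") else term


def construct(template: List[Triple], solutions: List[Dict[str, str]]) -> Set[Triple]:
    """Return the RDF graph (a set, so duplicates collapse) built from the template."""
    graph: Set[Triple] = set()
    for binding in solutions:
        for s, p, o in template:
            terms = (instantiate(s, binding), instantiate(p, binding), instantiate(o, binding))
            if None in terms:
                continue  # a template triple with an unbound variable is skipped
            graph.add(terms)
    return graph


for s, p, o in sorted(construct(TEMPLATE, SOLUTIONS)):
    print(f"<{s}> <{p}> <{o}> .")
```

Because the output is a graph (a set of triples), repeated solutions produce a single triple, and, as the SPARQL specification requires, a template triple that would contain an unbound variable is left out.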

We have a few options for writing the newly generated triples back to the database, but let’s start with MarkLogic’s XQuery Semantics API, in particular the sem:rdf-insert function. Here’s a bit of XQuery that runs the CONSTRUCT query above (this time without the LIMIT) and inserts the results into the geography graph:


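xquery version "1.0-ml";

import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: CONSTRUCT one skos:exactMatch triple for every AP/NYTimes pair that
   shares a Geonames URI. There is no LIMIT, so every match is returned. :)
let $sparql := "PREFIX cts: <http://marklogic.com/cts#>
                PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
                PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
                PREFIX owl: <http://www.w3.org/2002/07/owl#>
                CONSTRUCT { ?s skos:exactMatch ?n .}
                WHERE
                {
                ?n skos:inScheme <http://data.nytimes.com/elements/nytd_geo> .
                ?s skos:inScheme <http://cv.ap.org/a#geography> .
                ?s skos:exactMatch ?gn .
                ?n owl:sameAs ?gn .
                }"

(: For a CONSTRUCT query, sem:sparql returns a sequence of sem:triple values.
   The optional arguments are passed as empty sequences. :)
let $triples := sem:sparql($sparql, (), (), ())

(: Write the triples to the database, placing them in the "geography" graph. :)
return sem:rdf-insert($triples, "override-graph=geography")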
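As I read it, the `override-graph=geography` option places every inserted triple in that one named graph, regardless of where it came from. The Python sketch below models only that idea with a toy quad store (a dictionary from graph name to a set of triples); it is not MarkLogic’s storage model, and `store` and `insert` are hypothetical names. It starts from the existing Geonames link for Portsmouth and adds one of the constructed triples.

```python
# Toy quad store: graph name -> set of triples. Illustration only; this is
# not MarkLogic's storage model, and `store`/`insert` are hypothetical names.
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
SKOS = "http://www.w3.org/2004/02/skos/core#"
AP_TERM = "http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E"

store: DefaultDict[str, Set[Triple]] = defaultdict(set)
# The existing Geonames link for Portsmouth, already in the geography graph.
store["geography"].add((AP_TERM, SKOS + "exactMatch", "http://sws.geonames.org/5091383/"))


def insert(triples: Iterable[Triple], override_graph: str) -> None:
    """Add every triple to one named graph, mimicking an override-graph option."""
    store[override_graph].update(triples)


# One of the triples produced by the CONSTRUCT query above.
constructed = [(AP_TERM, SKOS + "exactMatch", "http://data.nytimes.com/10237454346559533021")]
insert(constructed, override_graph="geography")

# Every skos:exactMatch now recorded for the Portsmouth term in that graph.
links: List[str] = sorted(
    o for s, p, o in store["geography"] if s == AP_TERM and p == SKOS + "exactMatch"
)
print(links)
```

The toy store uses sets, so inserting the same triple twice has no effect; a real triple store may handle repeated inserts differently.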

Now, if we look at the triples for my original term, we should see an additional skos:exactMatch pointing to the NYTimes resource:


<http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> ;
<http://www.w3.org/2004/02/skos/core#inScheme> <http://cv.ap.org/a#geography> ;
<http://www.w3.org/2003/01/geo/wgs84_pos#long> "-70.76255"^^xs:decimal ; 
<http://www.w3.org/2003/01/geo/wgs84_pos#lat> "43.07176"^^xs:decimal ; 
<http://www.w3.org/2004/02/skos/core#exactMatch> <http://sws.geonames.org/5091383/> ;
<http://www.w3.org/2004/02/skos/core#exactMatch> <http://data.nytimes.com/10237454346559533021> ;
<http://www.w3.org/2004/02/skos/core#broader> <http://cv.ap.org/id/9531546082C6100487B5DF092526B43E> ;
<http://www.w3.org/2004/02/skos/core#prefLabel> "Portsmouth"@en .

Another option for writing the new triples back to the database is SPARQL itself. The most recent version, SPARQL 1.1, defines an update language that includes an INSERT operation. We can rewrite our earlier CONSTRUCT query as an update like so:


PREFIX cts: <http://marklogic.com/cts#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
INSERT
{ GRAPH <geography> { ?s skos:exactMatch ?n .} }
WHERE
{
  GRAPH <nytimes>
    {
    ?n skos:inScheme <http://data.nytimes.com/elements/nytd_geo> .
    ?n owl:sameAs ?gn .
    } .
 GRAPH <geography>
    {
    ?s skos:inScheme <http://cv.ap.org/a#geography> .
    ?s skos:exactMatch ?gn . 
    }.
}

The multiple GRAPH clauses let me query across two graphs while writing to only one. And if we wanted to replace an existing skos:exactMatch triple rather than add to the existing statements, we would put a DELETE clause before the INSERT in the same operation. This DELETE/INSERT operation is described in detail in the W3C SPARQL 1.1 Update specification.
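To see how a combined DELETE/INSERT behaves, here is a small Python model over quads (graph, subject, predicate, object). It follows the SPARQL 1.1 Update rule that the WHERE pattern is evaluated once against the existing data, after which the deletions are applied and then the insertions. The graph names are plain strings standing in for `<nytimes>` and `<geography>`, the quads are the Portsmouth triples from this post, and the helper names (`match`, `solve`, `fill`, `delete_insert`) are hypothetical; this is a sketch of the semantics, not MarkLogic’s implementation.

```python
# Toy model of SPARQL 1.1 DELETE/INSERT over quads (graph, s, p, o).
# Illustration only: graph names are plain strings standing in for the
# <nytimes> and <geography> IRIs, and all helper names are hypothetical.
from typing import Dict, Iterator, List, Optional, Set, Tuple

Quad = Tuple[str, str, str, str]
Binding = Dict[str, str]

SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"
AP = "http://cv.ap.org/id/F1818B152CFC464EBAAF95E407DD431E"
NYT = "http://data.nytimes.com/10237454346559533021"
GN = "http://sws.geonames.org/5091383/"


def match(pattern: Quad, quad: Quad, b: Binding) -> Optional[Binding]:
    """Extend binding b so that pattern matches quad, or return None."""
    out = dict(b)
    for term, value in zip(pattern, quad):
        if term.startswith("?"):
            if out.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return out


def solve(patterns: List[Quad], quads: Set[Quad], b: Binding) -> Iterator[Binding]:
    """Yield every binding satisfying all patterns; each pattern names its graph."""
    if not patterns:
        yield b
        return
    for quad in quads:
        nb = match(patterns[0], quad, b)
        if nb is not None:
            yield from solve(patterns[1:], quads, nb)


def fill(template: List[Quad], b: Binding) -> Set[Quad]:
    """Instantiate a template, skipping quads that contain an unbound variable."""
    out: Set[Quad] = set()
    for pattern in template:
        terms = tuple(b.get(t) if t.startswith("?") else t for t in pattern)
        if None not in terms:
            out.add(terms)
    return out


def delete_insert(store: Set[Quad], delete: List[Quad], insert: List[Quad],
                  where: List[Quad]) -> Set[Quad]:
    """Evaluate WHERE once on the original store, then delete, then insert."""
    solutions = list(solve(where, store, {}))
    to_delete: Set[Quad] = set().union(*(fill(delete, b) for b in solutions))
    to_insert: Set[Quad] = set().union(*(fill(insert, b) for b in solutions))
    return (store - to_delete) | to_insert


store: Set[Quad] = {
    ("nytimes", NYT, SKOS + "inScheme", "http://data.nytimes.com/elements/nytd_geo"),
    ("nytimes", NYT, OWL + "sameAs", GN),
    ("geography", AP, SKOS + "inScheme", "http://cv.ap.org/a#geography"),
    ("geography", AP, SKOS + "exactMatch", GN),
}
# Read from two graphs...
where = [
    ("nytimes", "?n", SKOS + "inScheme", "http://data.nytimes.com/elements/nytd_geo"),
    ("nytimes", "?n", OWL + "sameAs", "?gn"),
    ("geography", "?s", SKOS + "inScheme", "http://cv.ap.org/a#geography"),
    ("geography", "?s", SKOS + "exactMatch", "?gn"),
]
# ...but write to one.
insert = [("geography", "?s", SKOS + "exactMatch", "?n")]

appended = delete_insert(store, [], insert, where)  # INSERT only: adds a link
replaced = delete_insert(store, [("geography", "?s", SKOS + "exactMatch", "?gn")],
                         insert, where)  # DELETE/INSERT: swaps the link
print(len(appended), len(replaced))
```

`appended` corresponds to the INSERT query above. `replaced` adds a DELETE template for the old Geonames link; because the WHERE solutions are computed before anything is deleted, removing that link does not stop the NYTimes link from being inserted.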

MarkLogic 8, not yet released at the time of writing, will include support for the SPARQL 1.1 Update language (among other new semantic capabilities). Since I am lucky enough to be part of the Early Access program for MarkLogic 8, I was able to run the query above and confirm that it generated the new triples correctly.

Neither CONSTRUCT nor INSERT is exactly new technology, but it’s great to see how they can be used within a MarkLogic application. For my own work cleaning and enriching vocabulary data, these methods have proved quite valuable, and I look forward to digging into the rest of the SPARQL 1.1 features coming in MarkLogic 8.
