Sunday, May 12, 2013

Riak Links and Link Walking with F# and CorrugatedIron

Continuing my journey through the book Seven Databases in Seven Weeks, I explore links and link walking in this blog post. Riak can establish one-way relationships between entries via links, providing some of the capabilities of a graph database (the Riak documentation calls it a lightweight graph database). The Riak documentation hints that the number of links per object should be kept low, on the order of dozens, not thousands. Using the Twitter example from the Riak Handbook, you can easily find out who Don Syme follows, who those people follow, and so on. But because links are one-way, it would be difficult to find all the people who follow Don Syme using Riak's link capability.


Adding Links

Here's how you would add links with the CorrugatedIron library:

type Cage = { room : int }

(*
Linking cage 1 to polly via contains, equivalent to doing the following
curl -X PUT http://localhost:8098/riak/cages/1 \
-H "Content-Type: application/json" \
-H "Link: </riak/animals/polly>; riaktag=\"contains\"" \
-d '{"room" : 101}'
*)
client.Put(
  let cage = new RiakObject("cages","1",{room=101})
  cage.LinkTo("animals","polly","contains")
  cage)

(*
Putting ace in cage 2 and setting cage 2 next to cage 1.
Equivalent to the following sample code:
curl -X PUT http://localhost:8091/riak/cages/2 \
-H "Content-Type: application/json" \
-H "Link:</riak/animals/ace>;riaktag=\"contains\",</riak/cages/1>;riaktag=\"next_to\"" \
-d '{"room" : 101}'
*)
// Adding more than one link  
client.Put(
  let cage = new RiakObject("cages","2",{room=101})
  [("animals","ace","contains");("cages","1","next_to")]
  |> List.iter(fun (bucket,key,tag) -> cage.LinkTo(bucket,key,tag))
  cage)  


Link Walking

With CorrugatedIron, I can use the API to perform link walking and, as a bonus, I don't have to worry about extracting data from the multipart/mixed MIME response. The CorrugatedIron library takes care of all of that for us.

(*
Link walking, equivalent to 
curl http://riakhost1:8098/riak/cages/1/_,_,_
or in the new version:
curl http://riakhost1:8098/buckets/cages/keys/1/_,_,_
*)
 
let results = client.WalkLinks(new RiakObject("cages","1"),
                               [|new RiakLink(null,null,null)|])
         
// Dump results, which returns a list of RiakObject(s)
results.Value

// Since in this particular case, we have only one result, we can do the following
// and get back polly
results.Value.[0].GetObject<Animal>()

Here is the output result for results.Value:

// results.Value
val it : IList<RiakObject> =
  seq
    [CorrugatedIron.Models.RiakObject
       {BinIndexes = dict [];
        Bucket = "animals";
        CharSet = null;
        ContentEncoding = null;
        ContentType = "application/json";
        HasChanged = false;
        IntIndexes = dict [];
        Key = "polly";
        LastModified = 1359758822u;
        LastModifiedUsec = 523420u;
        Links = seq [];
        Siblings = seq [];
        UserMetaData = dict [];
        VTag = "2BTveSKTYDNOZNCiOxyryw";
        VTags = seq ["2BTveSKTYDNOZNCiOxyryw"];
        Value = [|123uy; 32uy; 34uy; 110uy; 105uy; 99uy; 107uy; 110uy; 97uy;
                  109uy; 101uy; 34uy; 32uy; 58uy; 32uy; 34uy; 83uy; 119uy;
                  101uy; 101uy; 116uy; 32uy; 80uy; 111uy; 108uy; 108uy; 121uy;
                  32uy; 80uy; 117uy; 114uy; 101uy; 98uy; 114uy; 101uy; 100uy;
                  34uy; 32uy; 44uy; 32uy; 34uy; 98uy; 114uy; 101uy; 101uy;
                  100uy; 34uy; 32uy; 58uy; 32uy; 34uy; 80uy; 117uy; 114uy;
                  101uy; 98uy; 114uy; 101uy; 100uy; 34uy; 32uy; 125uy|];
        VectorClock = [|107uy; 206uy; 97uy; 96uy; 96uy; 96uy; 204uy; 96uy;
                        202uy; 5uy; 82uy; 28uy; 169uy; 111uy; 239uy; 241uy;
                        7uy; 114uy; 230uy; 184uy; 103uy; 48uy; 37uy; 50uy;
                        230uy; 177uy; 50uy; 60uy; 59uy; 216uy; 112uy; 138uy;
                        47uy; 11uy; 0uy|];}]
      
// results.Value.[0].GetObject<Animal>()
val it : Animal = {nickname = "Sweet Polly Purebred";
                   breed = "Purebred";}     

There seem to be some limitations with link walking using the CorrugatedIron library. There is a WalkLinks() method as part of RiakClient, but it expects RiakLink objects, which take bucket, key, and tag as arguments, whereas the link spec expects bucket, tag, and keep flag. I also notice in the Riak documentation that link walking is not available as part of the Protocol Buffers Client (PBC) API, so I'm guessing the WalkLinks() method in CorrugatedIron is using either the HTTP protocol or a modified form of MapReduce. Since link walking is a special case of MapReduce querying, it may not matter much that CorrugatedIron has some limitations on link walking.

One other issue with CorrugatedIron and link walking is that when I add multiple links with the same tag and then walk them with CorrugatedIron, I do not get back all the links; I only get one of them. In order to follow the examples in the book Seven Databases in Seven Weeks, I can fall back to the HTTP REST API, called with HttpClient.

Link walking using the CorrugatedIron library:

// Link walking, equivalent to 
// curl http://localhost:8098/riak/cages/2/animals,_,_
client.WalkLinks(new RiakObject("cages","2"),
                 [|new RiakLink("animals",null,null)|])
     
// curl http://localhost:8098/riak/cages/2/_,next_to,0/animals,_,_
client.WalkLinks(new RiakObject("cages","2"),
                 [|new RiakLink(null,null,"next_to");
                   new RiakLink("animals",null,null);|])

// I can't seem to specify the keep flag with CorrugatedIron, which keeps
// intermediate results as you walk beyond primary links
// curl http://localhost:8091/riak/cages/2/_,next_to,1/_,_,_     

The book Seven Databases in Seven Weeks still uses the old URL format for HTTP link walking. The old and new formats compare as follows:

GET /riak/bucket/key/[bucket],[tag],[keep]            # Old format
GET /buckets/bucket/keys/key/[bucket],[tag],[keep]    # New format
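
To make the [bucket],[tag],[keep] triple more concrete, here is a minimal in-memory sketch of how a link walk proceeds step by step. It is a toy model, not Riak or CorrugatedIron code: the Link and Step types, the store map, and the walk function are all hypothetical names introduced for illustration. It assumes the documented conventions that "_" is a wildcard for bucket and tag, and that a keep value of "_" returns only the final step's results.

```fsharp
// Toy model of link-walk semantics; nothing here talks to Riak.
type Link = { toBucket : string; toKey : string; linkTag : string }
type Step = { bucketSpec : string; tagSpec : string; keep : string }

// Links from this post: cage 1 contains polly; cage 2 contains ace and is next to cage 1.
let store =
    Map.ofList [
        (("cages", "1"), [ { toBucket = "animals"; toKey = "polly"; linkTag = "contains" } ])
        (("cages", "2"), [ { toBucket = "animals"; toKey = "ace"; linkTag = "contains" }
                           { toBucket = "cages"; toKey = "1"; linkTag = "next_to" } ])
    ]

// "_" matches any bucket or any tag.
let matches spec value = spec = "_" || spec = value

// Returns one list of (bucket, key) pairs per kept step, in walk order.
let walk (start : string * string) (steps : Step list) =
    let lastIndex = List.length steps - 1
    let rec go current index remaining kept =
        match remaining with
        | [] -> List.rev kept
        | step :: rest ->
            // Follow every matching link from every object reached by the previous step.
            let next =
                current
                |> List.collect (fun k -> defaultArg (Map.tryFind k store) [])
                |> List.filter (fun l ->
                    matches step.bucketSpec l.toBucket && matches step.tagSpec l.linkTag)
                |> List.map (fun l -> (l.toBucket, l.toKey))
            // keep = "1" returns this step; "_" returns it only when it is the last step.
            let keepThis = step.keep = "1" || (step.keep = "_" && index = lastIndex)
            go next (index + 1) rest (if keepThis then next :: kept else kept)
    go [ start ] 0 steps []

// Same shape as /cages/2/_,next_to,1/_,_,_ : cage 1 is kept, then polly.
let withKeep =
    walk ("cages", "2")
        [ { bucketSpec = "_"; tagSpec = "next_to"; keep = "1" }
          { bucketSpec = "_"; tagSpec = "_"; keep = "_" } ]
// withKeep = [[("cages", "1")]; [("animals", "polly")]]
```

Changing the first step's keep value to "0" in this model drops cage 1 from the result and leaves only polly, which is the behaviour I could not request through WalkLinks().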

Link walking using REST API:


open System.Net.Http

let riakurl = "http://myriakhost1:8098"
let restClient = new HttpClient()

// One step of a link walk: bucket filter, tag filter, and keep flag ("_" leaves a field unspecified)
type LinkWalkSpec = 
    { bucket: string; tag: string;  keep: string; }

    member x.Link = (sprintf "%s,%s,%s" x.bucket x.tag x.keep)

// Builds <url>/buckets/<bucket>/keys/<key>/<spec>/<spec>/... and issues the GET
let linkWalker url bucket key (links:LinkWalkSpec list) =
    let baseurl = sprintf "%s/buckets/%s/keys/%s" url bucket key
    // Append each link spec as one more path segment
    links
    |> List.fold (fun acc (spec:LinkWalkSpec) -> sprintf "%s/%s" acc spec.Link) baseurl
    |> restClient.GetStringAsync

// Equivalent to: curl http://localhost:8091/riak/cages/2/_,next_to,1/_,_,_
// Returns a Task<string> holding the raw multipart/mixed response
[{bucket="_";tag="next_to";keep="1"};{bucket="_";tag="_";keep="_"}] 
|> linkWalker riakurl "cages" "2" 

Link walking results from using REST API:

val it : Task<string> =
  System.Threading.Tasks.Task`1[System.String]
    {AsyncState = null;
     CreationOptions = None;
     Exception = null;
     Id = 1;
     IsCanceled = false;
     IsCompleted = false;
     IsFaulted = false;
     Result = "
--CveXyss6PAqBxOOWxeWBCf6eXii
Content-Type: multipart/mixed; boundary=AvYXKrJYDlkeNxh1bQyqDvBAuBF

--AvYXKrJYDlkeNxh1bQyqDvBAuBF
X-Riak-Vclock: a85hYGBgzGDKBVIcqW/v8Qdy5rhnMCUy5rEy9Oi+P8WXBQA=
Location: /buckets/cages/keys/1
Content-Type: application/json
Link: </buckets/animals/keys/polly>; riaktag="contains", </buckets/cages>; rel="up"
Etag: 6gyXgkIgzvwBGRRotqHK3b
Last-Modified: Fri, 26 Apr 2013 16:55:40 GMT

{"room":101}
--AvYXKrJYDlkeNxh1bQyqDvBAuBF--

--CveXyss6PAqBxOOWxeWBCf6eXii
Content-Type: multipart/mixed; boundary=SaWqmmho48dzhMDmJy3BVcCWrzu

--SaWqmmho48dzhMDmJy3BVcCWrzu
X-Riak-Vclock: a85hYGBgzGDKBVIcqW/v8Qdy5rhnMCUy5rEyPDvYcIovCwA=
Location: /buckets/animals/keys/polly
Content-Type: application/json; charset=utf-8
Link: </buckets/animals>; rel="up"
Etag: 2BTveSKTYDNOZNCiOxyryw
Last-Modified: Fri, 01 Feb 2013 22:47:02 GMT

{ "nickname" : "Sweet Polly Purebred" , "breed" : "Purebred" }
--SaWqmmho48dzhMDmJy3BVcCWrzu--

--CveXyss6PAqBxOOWxeWBCf6eXii--
";
     Status = RanToCompletion;}      
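
With the REST approach the multipart/mixed body has to be unpacked by hand. The following is a rough, line-based sketch of one way to pull each object's Location header and body out of a response shaped like the one above. The WalkedObject type and parseLinkWalkResponse function are hypothetical names of mine, and this is not a general MIME parser: it assumes every boundary line starts with "--" and that no body line does.

```fsharp
open System

type WalkedObject = { location : string; body : string }

let parseLinkWalkResponse (response : string) : WalkedObject list =
    let lines =
        response.Replace("\r\n", "\n").Split([| '\n' |]) |> List.ofArray

    // Start a new chunk at every boundary line; otherwise add the line to the current chunk.
    let addLine (chunks : string list list) (line : string) =
        match chunks with
        | _ when line.StartsWith("--") -> [] :: chunks
        | current :: rest -> (line :: current) :: rest
        | [] -> [ [ line ] ]

    let chunks =
        lines |> List.fold addLine [ [] ] |> List.map List.rev |> List.rev

    // Headers run up to the first blank line; the remaining non-blank lines are the body.
    let toObject (chunk : string list) =
        let isBlank (l : string) = l.Trim() = ""
        let headers = chunk |> Seq.takeWhile (fun l -> not (isBlank l)) |> List.ofSeq
        let bodyLines =
            chunk
            |> Seq.skipWhile (fun l -> not (isBlank l))
            |> Seq.filter (fun l -> not (isBlank l))
        let location =
            headers
            |> List.tryPick (fun h ->
                if h.StartsWith("Location:", StringComparison.OrdinalIgnoreCase) then
                    Some(h.Substring("Location:".Length).Trim())
                else
                    None)
        // Chunks without a Location header (the nested multipart wrappers) are dropped.
        location
        |> Option.map (fun loc -> { location = loc; body = String.concat "\n" bodyLines })

    chunks |> List.choose toObject
```

Applied to the Result string above, this sketch should yield two entries, one for /buckets/cages/keys/1 and one for /buckets/animals/keys/polly, each with its JSON body. Reading .Result on the Task returned by linkWalker blocks until the request completes, which is acceptable in F# Interactive but probably not what you want in production code.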

Comments:

OJ said...

Hi John,

This is another nice post. Well done. I have just a couple of comments:

I can't seem to specify the keep flag with CorrugatedIron, which keeps intermediate results as you walk beyond primary links.

The reason for this is because CI uses the PBC interface wherever possible and the PBC interface does not support the "keep" parameter. You have to walk the links yourself. The HTTP interface has some extra magic in it to allow for that.

new RiakLink(null,null,null)

This can be replaced with RiakLink.AllLinks to make this a little more intuitive (I hope!).

Thank you again for your posts!
OJ
