REVL Visualization Guide

Read-Eval-Viz-Loop

REVL is a tool for quickly reorganizing awkward data formats so that you can inspect the data and use a variety of visualizations to find interesting relationships and properties that would be hard to spot otherwise. It works much like a powerful command line: you get data on one end, run it through a series of transformations to pick out the bits you’re interested in and stick them to other bits, and end up with just the interesting parts in a format that’s easy to comprehend or ship off to a visualization tool (of which many are included). Internally, REVL uses a result monad to do the value handling, so you’re actually working with a data structure instead of raw text, which makes it quite a bit more convenient to use than the standard command line.

Getting Started

When you open SCOT, click the Visualization link in the navbar. This will open REVL, which will look like a big blank screen with a little command prompt at the bottom. You will interact with the system by typing strings of commands at the prompt and observing the results either in the text output area (just above the prompt) or in the visualization area (the bulk of the page, which is blank white at this point).

Interacting with REVL

To get some help, click in the prompt, type help, and press Enter.

Just above the prompt, you will see a text output area. You can drag the top of this area to resize it, so drag it up now to see the REVL default help message. This message gives a little background and lists all currently loaded commands. If you can’t remember the name of something, you should be able to jog your memory by looking it up here.

Now type help map at the prompt. This will display the command-specific help for the map command, which is something you will be using a lot.

REVL tries to be convenient: if it recognizes the first word you type in a command segment as a command, it treats it as one. If not, it evaluates whatever you type in the context of the shell, which includes variable definitions, locally defined helper functions, and the entire API behind the command system. The syntax is CoffeeScript, which you can find out more about at CoffeeScript.org (http://coffeescript.org).

Type:

[1..10]

at the prompt and hit Enter. You will see the result of evaluating that coffeescript value, which is

[1,2,3,4,5,6,7,8,9,10]

This particular trick (generating a list of integers) is surprisingly useful for seeding queries later on. Keep it in mind when you want to do something like look at all of the events that came in between two other events (you can sequence their id fields using this list, e.g. [1044..1102]).

Now hit the up arrow to repeat the last command, then add to the back of it until you get this (the thing right after the list is a single backslash character):

[1..10] \ (n)->n*n

After you hit Enter, you’ll see a list of the squares of the integers from the first list: [1,4,9,16,25,36,49,64,81,100]. You just used the map command. You could also have explicitly written out the map name in front of the function definition, but this particular command is so common that it’s implied after a backslash if no other command is specified.
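For readers who don't know CoffeeScript, here is a rough plain-TypeScript equivalent of what that command computes. This is only a sketch: the helper name range is made up for the example and is not part of REVL. It mimics CoffeeScript's inclusive [a..b] range, and the built-in array map plays the role of REVL's implied map command.

```typescript
// Hypothetical helper: inclusive integer range, like CoffeeScript's [a..b].
function range(from: number, to: number): number[] {
  const out: number[] = [];
  for (let n = from; n <= to; n++) out.push(n);
  return out;
}

// Equivalent of `[1..10] \ (n)->n*n`: apply a function to every element.
const squares = range(1, 10).map((n) => n * n);
console.log(squares);
```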

Commands are chained together using the backslash (‘\’) character. Normally the pipe (‘|’) would have been used, but in this case it was just much simpler and more reliable to use the backslash because the pipe is an important character in user-defined CoffeeScript code, and it would have led to significant ambiguity in parsing the commands.
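To make the ambiguity concrete, here is a small TypeScript illustration. It is not REVL's parser (which this guide does not show); splitSegments and the sample command are invented for the example, and a naive split like this would also trip over backslashes inside string literals. It only shows why a pipe separator collides with operators such as || in user code.

```typescript
// Illustration only: REVL's real parser is not shown in this guide.
// Split a command line into segments on the backslash separator.
function splitSegments(line: string): string[] {
  return line.split("\\").map((seg) => seg.trim());
}

// Made-up command whose function body uses the `||` operator.
const command = "ents \\ (e)->e.count || 0";

// Splitting on backslash keeps the user's function in one piece.
const onBackslash = splitSegments(command);
console.log(onBackslash);

// Splitting on the pipe cuts straight through the `||` operator.
const onPipe = command.split("|").map((seg) => seg.trim());
console.log(onPipe);
```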

Using REVL with SCOT data

Now we can do something interesting. Let’s get all the entities with ids from 10000 to 10100:

entity offset:10000,limit:101

This command will take a few seconds to complete, and when it does you’ll see a list of entities in the text output. However, our plan was foiled - our first id is not 10000, it’s something else. If we want to actually get entities with ids 10000 to 10100, we’ll need to specify those ids. Let’s do that:

[10000..10100] \ (n)->API.entity id:n

After you press Enter and wait, you’ll find that you got a list of 101 somethings back, but they aren’t entities. REVL uses asynchronous calls for the API to make things a little faster. This is hidden when you use the top level commands because the shell knows to wait when the result is a promise, but when you make calls directly to the API and embed them in another data structure, you have to be more explicit. Let’s go ahead and tell it to wait on those results:

[10000..10100] \ (n)->API.entity id:n \ wait

The wait command will scan through the data it gets from the pipeline and replace all of the promises with the fulfillments of those promises as they come in. It also has an optional timeout which will cause the wait to stop if it has been more than that long since an update was received. The default timeout is 60 seconds, and you can change it by simply specifying a different number as an argument to the wait command. This argument is a full coffeescript value, so you can use variables and functions if you need to for some reason.
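The behavior described above can be sketched in TypeScript roughly as follows. This is an assumption-laden illustration, not REVL's implementation: the function name waitAll and its signature are invented here. It swaps each promise in a list for its fulfillment as it arrives, and gives up if nothing has settled for idleMs milliseconds, leaving any unfinished promises in place.

```typescript
// Hypothetical helper, not REVL's code: replace promises in a list with their
// fulfillments, stopping early if no promise settles for `idleMs` milliseconds.
function waitAll<T>(
  items: Array<T | Promise<T>>,
  idleMs: number = 60000
): Promise<Array<T | Promise<T>>> {
  return new Promise((resolve) => {
    const out: Array<T | Promise<T>> = items.slice();
    let pending = 0;
    let done = false;
    let timer: ReturnType<typeof setTimeout> | undefined;

    const finish = (): void => {
      if (done) return;
      done = true;
      if (timer !== undefined) clearTimeout(timer);
      resolve(out);
    };

    // Called at start and whenever a promise settles: finish if nothing is
    // left, otherwise restart the idle timer.
    const update = (): void => {
      if (done) return;
      if (timer !== undefined) clearTimeout(timer);
      if (pending === 0) finish();
      else timer = setTimeout(finish, idleMs);
    };

    items.forEach((item, i) => {
      if (item instanceof Promise) {
        pending += 1;
        item.then(
          (value) => {
            if (!done) out[i] = value; // swap the promise for its fulfillment
            pending -= 1;
            update();
          },
          () => {
            pending -= 1; // rejected: leave the promise in place
            update();
          }
        );
      }
    });

    update();
  });
}

// Usage: plain values pass through, promises are replaced by their results.
waitAll<number>([1, Promise.resolve(2), Promise.resolve(3)], 1000)
  .then((result) => console.log(result));
```

Restarting the timer on every update, rather than setting one deadline up front, matches the description of the timeout as time since the last update rather than total elapsed time.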

As you wait for the entities to come down, notice that there is a progress bar on top of the command line to let you know something is happening in the background, and the fraction of finished to total promises is displayed at the right end of the command line.

When it’s all said and done, you should have a list of 101 entities in your text window.

Make a bar chart

Let’s take those entities and see how they’re distributed by type. To do that, we’ll fetch the entities, then pick out the type field, group them by that field, and make a chart that has a bar for each type and shows the number of instances of that type. First, let’s get the entities again and stash them so that we don’t have to wait for them to download at each step:

[10000..10100] \ (n)->API.entity id:n \ wait \ store ents

The store command takes a variable name and stores the result of the preceding command in the scope under that name. Now you can access that list of entities using the name ents from anywhere in future commands. First, let’s strip out all of the data we don’t care about from them:

ents \ (e)->e.type

Now you should see a list of the type fields from each entity. Next we’ll group them according to that field:

ents \ (e)->e.type \ group (x)->x

This command uses the group command, which takes a function and returns an object. The function should return a name for its input that specifies what group it belongs in. In this case, all we have are names, so we just tell it to return its input unchanged (that’s what the (x)->x means - a coffeescript identity function).

The output of the group command was an object with a key for each group name, and the list of things in that group for the value. Now we’re going to replace the lists with their lengths, which will give us a nice data structure to pass to the barchart visualization primitive:

ents \ (e)->e.type \ group (x)->x \ (ls)->ls.length

This uses the map command to iterate over the keys of the object returned by group and replace each value by its length. You should now have an object with a few keys, each with a number as its value. This is exactly the format we need for a bar chart, so let’s see what we get:

ents \ (e)->e.type \ group (x)->x \ (ls)->ls.length \ barchart

You should now see a chart showing the relative frequencies of the different entity types in your set. If your text area is covering the chart, you can double click the top of it to auto-minimize. It will remember the last setting for the height, so if you double click it again it will go back to where it was.
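The group-and-count steps from this section can be sketched in plain TypeScript as below. The functions group and mapValues are stand-ins written for this example (REVL's own implementations are not shown in this guide), and the sample entities and their type names are made up.

```typescript
// Hypothetical stand-in for REVL's group command: bucket items by the name
// that `nameOf` returns for each one.
function group<T>(items: T[], nameOf: (item: T) => string): Record<string, T[]> {
  // Null-prototype object so names like "constructor" are safe as keys.
  const groups: Record<string, T[]> = Object.create(null);
  for (const item of items) {
    const name = nameOf(item);
    if (!(name in groups)) groups[name] = [];
    groups[name].push(item);
  }
  return groups;
}

// Hypothetical stand-in for mapping over an object: replace each value,
// passing the key as the second argument.
function mapValues<A, B>(
  obj: Record<string, A>,
  fn: (value: A, key: string) => B
): Record<string, B> {
  const out: Record<string, B> = Object.create(null);
  for (const key of Object.keys(obj)) out[key] = fn(obj[key], key);
  return out;
}

// Made-up sample entities; real ones come from the SCOT API.
const ents = [
  { id: 1, type: "ipaddr" },
  { id: 2, type: "domain" },
  { id: 3, type: "ipaddr" },
  { id: 4, type: "email" },
  { id: 5, type: "ipaddr" },
];

// ents \ (e)->e.type \ group (x)->x \ (ls)->ls.length
const types = ents.map((e) => e.type);
const counts = mapValues(group(types, (x) => x), (ls) => ls.length);
console.log(counts);
```

The resulting object has one key per type with a count as its value, which is the shape the text above describes as input for a bar chart.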

Event Timing

Next we’ll use a dot chart to look at the timing of a set of alerts coming in within an alert group. First, let’s get the alerts:

alertgroup id:1512214,sub:'alert'

After this comes in you should have a list of alerts. There’s a lot of data we don’t really care about there, so let’s tell the server to only send what’s important:

alertgroup id:1512214,sub:'alert',columns:['id','when']

This filters the data coming in down to just the id and when columns. For the chart below we need the timestamp stored in each alert’s data field, so we’ll request the id and data columns instead and store the result for future reference:

alertgroup id:1512214,sub:'alert',columns:['id','data'] \ store a151

We’re going to make a dot chart with time on the horizontal axis and item number on the vertical (vertical axis is just here to separate things for visibility). We need to pull out the time value for each and pair it with its position in the list:

a151 \ (alert,pos)->[pos,alert.data._time]

The map function implicitly passes the index of the current element to the handler function (or the key if it’s an object). We just use the object’s list position to get a vertical coordinate for it. Unfortunately, this timestamp is in a human-readable format, which makes it a pain to use. We can parse it using the Strings module though:

a151 \ (r)->r.data._time \
    pick Strings.pat.hms \
    (ls)->(map ls[1..],(s,i)->(60**(2-i))*(parseInt s)).reduce (a,b)->a+b

This takes the alerts and uses the Strings predefined hms (hours:minutes:seconds) pattern to parse just the clock time from the timestamp. The pattern returns the matched string along with its captured substrings, which in this case gives us the hour, minute, and second. The function mapped over it just converts this into a number of seconds since midnight. Coffeescript has a ** operator for exponentiation, if you’re trying to parse out how that function works. Now we have a list of timestamps, so let’s convert it to a list of coordinate pairs that dotchart can use:

a151 \ (r)->r.data._time \
    pick Strings.pat.hms \
    (ls)->(map ls[1..],(s,i)->(60**(2-i))*(parseInt s)).reduce (a,b)->a+b \
    (n,i)->[n,i] \
    dotchart
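As a rough illustration of the time conversion used in this pipeline, here is a TypeScript sketch. The regular expression is only a guess at what an hours:minutes:seconds pattern might look like (the actual definition of Strings.pat.hms is not shown in this guide), and the sample timestamp format is invented.

```typescript
// Assumed pattern: a stand-in for Strings.pat.hms, whose real definition
// is not shown in this guide.
const hms = /(\d{1,2}):(\d{2}):(\d{2})/;

// Convert the clock time found in a timestamp to seconds since midnight.
function secondsSinceMidnight(timestamp: string): number | null {
  const m = hms.exec(timestamp);
  if (m === null) return null;
  // m[0] is the whole match; m[1], m[2], m[3] are hour, minute, second.
  // Position i gets weight 60^(2-i): 3600 for hours, 60 for minutes, 1 for seconds.
  return m
    .slice(1)
    .map((s, i) => 60 ** (2 - i) * parseInt(s, 10))
    .reduce((a, b) => a + b, 0);
}

// Hypothetical timestamp format, for illustration only.
console.log(secondsSinceMidnight("2016-03-04 13:45:10")); // 49510
```

Here 13 hours, 45 minutes, and 10 seconds come out as 13*3600 + 45*60 + 10 = 49510 seconds.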

Whoops, looks like the timing data is all over the map! We need to sort our timestamps in ascending order since they didn’t come that way from the server:

a151 \ (r)->r.data._time \
    pick Strings.pat.hms \
    (ls)->(map ls[1..],(s,i)->(60**(2-i))*(parseInt s)).reduce (a,b)->a+b \
    sort \
    (n,i)->[n,i] \
    dotchart

sort does just what you’d think. You can optionally pass it a comparison function, which should return -1, 0, or 1 depending on whether the first argument is less than, equal to, or greater than the second. Note that JavaScript has some very weird ideas about ordering, so if you want the expected sort order for normal data (numbers, strings, etc.), REVL provides a comparison function in the Utils module called Utils.smartcmp. This basically says numbers go in numeric order and strings go in alphabetic order. By default, JavaScript converts values to strings before comparing them, so numbers sort in alphabetic order (!). Running this command, we can now see a nice progression of alerts that ended up in this alert group.
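The sorting quirk is easy to see in a few lines of TypeScript. The comparator below, smartCompare, is a made-up function in the spirit of what the text describes; it is not the source of Utils.smartcmp, which this guide does not show.

```typescript
const values = [10, 9, 1, 100];

// With no comparator, elements are compared as strings: "10" < "100" < "9".
const byDefault = values.slice().sort();
console.log(byDefault); // [1, 10, 100, 9]

// Hypothetical comparator: numbers in numeric order, everything else in
// string order. Returns -1, 0, or 1.
function smartCompare(a: number | string, b: number | string): number {
  if (typeof a === "number" && typeof b === "number") return Math.sign(a - b);
  const sa = String(a);
  const sb = String(b);
  return sa < sb ? -1 : sa > sb ? 1 : 0;
}

const byValue = values.slice().sort(smartCompare);
console.log(byValue); // [1, 9, 10, 100]
```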

Other interesting command examples

Here are some other commands you might want to play with to get a feel for the system. All of the basic commands have documentation with examples, so if you need to look something up to see how it works start with the help system.

  • Entity Frequencies over time

    Query 1000 entries, pull the entities for each of them, group them by type, and create a barchart to show the relative frequency of each type of entity:

    $ [10000...11000] \
        (n)->API.entry {id:n,sub:'entity'} \
        wait \
        (r)->Struct.tolist (Struct.map r,(v)->v.type) \
        flatten \
        group (ls)->ls[1] \
        (ls)->ls.length \
        barchart
    
  • Examine event timing over long periods

    Query 500 events, extract the creation timestamp, sort them in ascending order, rebase the time to show the time delta in minutes from the start of the record, and create a dot chart to show the timing of clusters of events and highlight gaps in the record:

    $ event limit:500 \
        (e)->e.created \
        sort \
        into (ls)->map ls,(n)->(n-ls[0])/60000.0 \
        (n,i)->[n,i] \
        dotchart
    
  • Look at sequence of alerts in alertgroup:

    $ alertgroup id:1512214,limit:100,sub:'alert' \
        (r)->r.data._time \
        pick Strings.pat.hms \
        (ls)->(map ls[1..],(s,i)->(60**(2-i))*(parseInt s)).reduce (a,b)->a+b \
        sort \
        (n,i)->[n,i] \
        dotchart
    
  • Network connections between emails mentioned together in an alert for an alert group

    Get the alerts for alertgroup 1512214, concatenate all of the strings in the data field of each, pick out all of the email addresses in the resulting strings, generate pairs from all emails that were in the same alert, and make a force-directed graph from the resulting structure:

    $ alertgroup id:1512214,limit:100,sub:'alert' \
        (r)->(squash (Struct.tolist r.data)).join ' ' \
        (s)->Strings.pick Strings.pat.email, s \
        (ls)->ls.map (m)->m[0] \
        (ls)->cmb ls,2 \
        flatten \
        forcegraph
    
  • Association matrix of emails from one alertgroup

    This is a very heavy computation, but it eventually finishes. It could use some optimization to make it more convenient, since filling out the table really explodes the size of the data set:

    $ alertgroup id:1512214,limit:100,sub:'alert' \
        (r)->(squash (Struct.tolist r.data)).join ' ' \
        (s)->Strings.pick Strings.pat.email, s \
        (ls)->ls.map (m)->m[0] \
        (ls)->cmb ls,2 \
        flatten \
        nest (n)->n \
        (row)->Struct.map row,(col)->col.$.length \
        tabulate {} \
        grid \
        eachpoly (p)->if p.input == {} then p.color='#000' else p.color=Utils.heatColor p.input,10 \
        draw
    
  • Draw a treemap from an Nspace:

    $  [1..100] \
        foldl new Nspace (s,pt) -> s.insert pt,[['x',Math.random()],['y',Math.random()]]; s  \
        into (s)->s.subdivide() \
        into (sp)->sp.leaves() \
        (l)->l.bounds \
        (bnd)-> zip bnd \
        (pts)->[[pts[0][0],pts[0][1]],[pts[0][0],pts[1][1]],[pts[1][0],pts[1][1]],[pts[1][0],pts[0][1]]] \
        (pts)->(polygon pts).scale 200 \
        into (polys)->{polygons: polys} \
        draw
    
  • Network showing relationship between events and entities

    Query an event, find all the entities associated with it, then find all the events associated with those entities. Make links accordingly, then display as a force-directed graph. Mousing over the network nodes will display the entity name or event id number depending on what kind of node it is:

    $ event id:10982,sub:'entity' \
        (e,k)->[{id:e.id,name:k},10982]  \
        tolist \
        (ls)->ls[1] \
        filter (ls)->ls[0].id not in [4802,97248,19,533065,97249] \
        (ls)-> [[[ls[0].name,ls[1]]],(API.entity sub:'event',id:ls[0].id).map (e)->([ev.id,ls[0].name]) for ev in e] \
        wait \
        flatten \
        flatten \
        forcegraph
    
  • Barchart of event count for each entity

    Fetch the entities associated with an event, then fetch all of the events for each entity and make a barchart that shows how many events are associated with each entity:

    $ event id:10982,sub:'entity' \
        (ent)->(API.entity id:ent.id,sub:'event',columns:['id']).map (ls)->ls.length \
        wait \
        filter (n)->n>20 \
        barchart
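As a worked illustration of the event-timing example in the list above (sort, rebase to minutes from the first record, pair each value with its index), here is a plain TypeScript sketch. The timestamps are made up, and they are assumed to be in milliseconds, as the division by 60000 in that command implies.

```typescript
// Made-up creation times, assumed to be epoch milliseconds (unsorted on purpose).
const created = [1500000300000, 1500000000000, 1500000090000];

// sort: ascending numeric order.
const sorted = created.slice().sort((a, b) => a - b);

// into (ls)->map ls,(n)->(n-ls[0])/60000.0 : minutes since the first record.
const minutes = sorted.map((t) => (t - sorted[0]) / 60000);

// (n,i)->[n,i] : pair each time with its position for the dot chart.
const points = minutes.map((m, i) => [m, i]);
console.log(points); // [[0, 0], [1.5, 1], [5, 2]]
```

Rebasing against the first sorted value keeps the horizontal axis small and readable, so gaps between clusters of events show up as gaps in minutes rather than differences between large raw timestamps.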