Reference Graph Visualization

1 Idea

  • Goal: get an overview of a scientific field by combining the information usually provided by reference managers with the information usually provided by task planning systems.
  • extract information from BibTeX files:
    • author-paper relation
    • number of citations per paper (custom field)
  • combine with:
    • personal progress on papers
    • personal classification on papers
  • several visualization modes
    1. Collaboration Graph
    2. Classification (WIP)

2 Usage

Open index.html (with Firefox).

When the page loads, an example graph is displayed. To display your own data, drag a file onto the canvas graph and drop it.

The layout is responsive, and nodes can be dragged with the mouse. Dragging the background will pan the view, and mouse wheel can be used to zoom.

Clicking a paper will try to open a pdf file with the same name in the current directory. (Note: this will be changed to use the file name property in the BibTex file.)

Clicking an author will redirect to a Google Scholar search with that author.

Only tested on Firefox.

2.1 Advanced

If something unexpected happened during BibTeX file loading, a warning or error message is printed to the browser's console.

Additional information in non-standard fields is taken into account:

note
number of citations, as generated by the Zotero citations plugin
state
personal reading state, causes colored representation of said state. Currently supported values: unread, started, overview, and read
classification
(WIP) classification of a paper in the form of category/sub category 1/further sub-category
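
As a rough illustration of how these fields could be interpreted, the following sketch reads them from an already parsed entry. The helper name readCustomFields, the returned object shape, and the fallback to 0 citations are assumptions made for this example and are not part of the application; only the field names and the supported state values come from the description above.

```javascript
// Hypothetical helper: interprets the non-standard fields of one parsed entry.
// `bibtex` is assumed to be a plain object mapping lower-case field names to values.
var SUPPORTED_STATES = ["unread", "started", "overview", "read"];

function readCustomFields(bibtex) {
    var citations = parseInt(bibtex.note, 10);       // NaN if the note holds no number
    var state = bibtex.state ? String(bibtex.state).toLowerCase() : undefined;
    return {
        citations: isNaN(citations) ? 0 : citations,
        state: SUPPORTED_STATES.indexOf(state) >= 0 ? state : undefined,
        // "a/b/c" becomes ["a", "b", "c"]
        classification: bibtex.classification
            ? String(bibtex.classification).split("/").map(function(s) { return s.trim(); })
            : []
    };
}

var fields = readCustomFields({
    note: "42",
    state: "Started",
    classification: "graphs/layout/force-directed"
});
console.log(fields.citations, fields.state, fields.classification);
```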

3 Implementation

The application basically consists of one html file that includes the necessary CSS and Javascript.

It depends on d3.js for the visualization, and on md5.js and a modified version of jdenticon for identicon generation.

The Parser responsible for extracting the information from the BibTex files is generated using PEG.js.

<!DOCTYPE html>
<head>
  <meta charset="utf-8">
  <title>BibTeX Graph Visualization (with d3.js)</title>
</head>
<style>

  <<css-definitions>>

</style>
<body>
  <svg height="100%" width="100%"></svg>
  <div id="footer">
    <p> Drag'n'Drop your BibTeX file onto the graph to open.
      Bugs/Ideas/Comments <a href="https://github.com/timor/refgraph"> welcome</a>
    </p>
  </div>
  <script src="http://d3js.org/d3.v3.js"></script>
  <script src="scripts/jdenticon-1.3.2.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/blueimp-md5/2.3.0/js/md5.min.js"></script>
  <script src="scripts/parser.js"></script>
  <script>

    <<javascript-graph-functionality>>

  </script>
</body>

3.1 Styling

3.1.1 General CSS Classes

General page layout:

html, body { overflow: hidden }
svg { position: fixed;
      top: 0;
      left: 0;
      width: 100%;
      height: 90%;
    }

#footer { position: fixed;
          top: 90%
        }

* {
    font-family: sans-serif;
}

CSS Classes used:

.node
elements representing nodes (papers and authors)
.node_bg
the background of the node images (to allow image containing transparency to be more visible)
.node_bg_glow
the glowing background of nodes (used for visualizing state information)
.link
the lines representing the links between nodes
.<state-class>
contains the actual specification for each state of a paper
.author
author nodes
.paper
paper nodes

All text in the nodes should use sans serif fonts, and be non-selectable, because that interferes with dragging. Depending on the personal state of the paper, different colors for highlighting nodes are chosen.

.node text {
    /* pointer-events: none; */
    font: 10px sans-serif;
    -webkit-user-select: none;  /* Chrome all / Safari all */
    -moz-user-select: none;     /* Firefox all */
    -ms-user-select: none;      /* IE 10+ */
    user-select: none;          /* Likely future */
}

.node_bg {
    fill: white;
    stroke-width: 3px;
    stroke: gray;
}

.node_bg_glow.paper {
    opacity: 0.3;
    filter: url(#glow);
}

.node_bg.paper.read {
    stroke:green;
}

.node_bg.paper.unread {
    stroke:red;
}

.node_bg.paper.started {
    stroke:orange;
}

.node_bg.paper.overview {
    stroke:yellow;
}

.link {
    stroke: #ccc;
}

The faded class marks elements that are faded out by Highlight on Hover. To make the change a bit smoother, a transition effect on opacity is added for all nodes. Note that this does not seem to work in Chrome.

.faded {
    opacity: 0.5;
}

.node {
    transition: opacity 0.2s;
}

3.1.2 Tooltip

The tooltip is styled here. (see Tooltips section for javascript part)

.tooltip_text
text of tooltips
.tooltip_bg
background (svg rect) of tooltips

.tooltip_text {
    font: 12px sans-serif;
}

.tooltip_bg{
    fill: white;
    stroke: black;
    stroke-width: 1;
    opacity: 0.85;
}

3.2 Code

3.2.1 Global Variables

For lack of better programming style, the following information is defined in global variables:

var width = 1200,               // width of the svg (not used correctly)

    height = 900,               // height of the svg portion (not used correctly)

    icon_size = 16,             // base size of icons for nodes

    jdenticon_size = 50        // base size of the identicons, note
                                // that jdenticon does not allow
                                // images smaller than 30, and padding
                                // is added to that, so 40 should be a
                                // safe minimum

For the actual data imported from BibTeX files:

var nodes=[], links=[], tree;

Other globals are defined before their respective usage.

3.2.2 SVG DOM Structure

In general, d3.js functionality is used to generate the DOM elements.

The svg element should fill the whole width of the browser page, but leave some space below for controls. Also, pointer events have to be caught explicitly. These are actually later caught by the big background rectangle (and I suppose bubbled to the svg element) to implement zooming and panning.

Note that the variable svg actually contains a g element (group).

var svg = d3.select("svg")
    .attr("pointer-events", "all")
    .append("g")
    .call(d3.behavior.zoom().on('zoom', redraw))
;

There is a transparent background rectangle for catching mouse events. It is made as big as the screen to make sure that all background is covered.

svg.append("rect")
    .attr("width", screen.width)
    .attr("height", screen.height)
    .style("fill", "none")
;

There is a container group for all interactive content. This is also the one that the zoom and pan transformations are performed upon:

var container = svg.append("g").attr("id","interactivecontainer");

When updating the layout later (e.g. by loading new data), new nodes and links will be created. If they are simply appended, links will end up on top of nodes. This clutters the view. To prevent that, links and nodes get their own respective sub-groups that they are created in. The order is important: the container for the nodes ends up on top.

var link_container = container.append("g");
var node_container = container.append("g");

3.2.2.1 Tooltips

Tooltips appear when hovering over papers, showing the full title.

There is only one tooltip consisting of a rect and text which live in the top group, and are placed as needed.

var tooltip = svg.append("rect")
    .attr("class","tooltip_bg")
    .attr("id","tooltip_bg")
    .attr("visibility", "hidden")
    .attr("rx", 4)
    .attr("ry", 4)
    .attr("height",16)
    .attr("width",52);

var tooltip_text = svg.append("text")
    .attr("class","tooltip_text")
    .attr("visibility", "hidden");

function show_tooltip(d) {
    if (d.type == "paper") {
        var x = d3.event.clientX;   // declared locally to avoid implicit globals
        var y = d3.event.clientY;
        tooltip_text
            .text(`${d.display_title} (${d.year})`)
            .attr("visibility","visible")
            .attr("x", x + 11)
            .attr("y", y + 27);
        tooltip
            .attr("visibility","visible")
            .attr("x", x + 8)
            .attr("y", y + 14)
            .attr("width", tooltip_text.node().getComputedTextLength()+8);
    }
}

function hide_tooltip(d) {
    tooltip.attr("visibility", "hidden")
    tooltip_text.attr("visibility", "hidden")
}

The show_tooltip and hide_tooltip functions are later used as onMouseover and onMouseout handlers when the actual nodes are created (TODO: link)

3.2.2.2 Filter for Node Background

The blur effect of the node background is created here. The defs node is attached directly to the svg DOM node.

var defs = d3.select("svg").append("defs");
var filter = defs.append("filter")
    .attr("id", "glow");
filter.append("feGaussianBlur")
    .attr("stdDeviation", "3.5")
    .attr("result", "coloredBlur");

3.2.3 Zooming

Zooming is provided as d3.js-provided behavior, with the following being the zoom event handler.

function redraw() {
    container.attr("transform", "translate(" + d3.event.translate + ")scale(" + d3.event.scale +")");
    // svg.attr("transform", "translate(" + d3.event.translate + ")");
};
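
To make the string concatenation in the handler concrete, here is a small stand-alone sketch. The function name zoom_transform is made up for illustration. It assumes, as in the handler, that the translation is an [x, y] array and the scale a number; JavaScript joins the array elements with a comma when the array is concatenated to a string.

```javascript
// Hypothetical helper mirroring the string built in redraw().
function zoom_transform(translate, scale) {
    // An array concatenated to a string is joined with ",": [10, 20] -> "10,20"
    return "translate(" + translate + ")scale(" + scale + ")";
}

console.log(zoom_transform([10, 20], 1.5)); // prints "translate(10,20)scale(1.5)"
```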

3.2.4 Used Layouts

The d3.js layouts (currently only one) are created here. This follows the general update pattern of d3.js.

function update_layout() {

    update_force_layout();

}

3.2.4.1 Force Layout

The force layout is used to display the collaboration graph. All the global properties are set when creating the initial force object.

The d3 selections representing the nodes and links of the layout (NOT the globals holding the actual data)

var force_link,                   // selection of created svg elements for link representation

    force_node                    // selection of created svg elements for node representation

For different modes, different settings are used for the following global variables:

var kx_mul = 0.15,              // multiplier for attractor force in x direction

    ky_mul = 0.4                // multiplier for attractor force in y direction

The actual layout object

Gravity is turned off because all paper nodes have an attractor, so the layout does not face the danger of expanding indefinitely. A charge distance is set, but it does not seem to have a notable influence on performance. Because the charges are quite high, friction was "increased" by lowering the value from the default 0.9 to 0.7, to stop high-speed movement.

var force = d3.layout.force()
    .nodes(nodes)
    .links(links)
    .linkStrength(0.5)
    .gravity(0)
    .distance(50)
    .chargeDistance(700)
    .charge(collab_charge)
    .friction(0.7)
    .size([width, height])
    .on("tick",force_tick);
  1. Node Property Helper functions

    Several node properties are data-dependent. The following definitions are used to calculate the relevant values for the layout.

    1. Node Significance

      Used as basis for other layout properties.

      The significance of authors is determined by the number of papers attributed to them (stored in the balls property), weighted using a power function with a fractional exponent, to be able to distinguish the less significant authors better, since there are usually more of them.

      The significance of papers is the number of citations they have. This is weighted logarithmically for similar reasons.

      function node_significance(d) {
          if (d.type == "author")
              // return icon_size * (1 + (d.balls/20);
              return (1 + (Math.pow((d.balls-1), 0.8) * 0.5));
          else
              return (1 + Math.log10(1 + (d.citations || 0)));
      }
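
      A few sample values may help to get a feeling for the two weighting curves. The sketch below repeats the function so that it runs on its own; the sample nodes are made up.

      ```javascript
      function node_significance(d) {
          if (d.type == "author")
              return 1 + Math.pow(d.balls - 1, 0.8) * 0.5;
          else
              return 1 + Math.log10(1 + (d.citations || 0));
      }

      var samples = [
          {type: "author", balls: 1},       // author of a single paper
          {type: "author", balls: 17},
          {type: "paper"},                  // no citation count available
          {type: "paper", citations: 99},
          {type: "paper", citations: 9999}
      ];
      samples.forEach(function(d) {
          console.log(d.type, d.balls || d.citations || 0, node_significance(d).toFixed(2));
      });
      ```

      With these inputs, the single-paper author and the uncited paper both get the base value 1, and a paper with 99 citations gets a value of about 3.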
      
    2. Node Image Positioning

      The node image size depends on the significance.

      function node_image_size(d) {
          return icon_size * node_significance(d);
      };
      

      Used to center the image for a node.

      function node_image_offset(d) {
          return - (node_image_size(d) / 2);
      }
      
    3. Node Charge

      For the collaboration layout, the node charge is made dependent on the node significance. This way, it is easier to place less significant nodes around the more central nodes.

      function collab_charge(d) {
          return (node_significance(d) * -300);
      }
      
    4. Node Attractor Targets

      The attractor positions of the papers are a virtual grid, where the papers are ordered in x-direction by the first letter of the bibtex key, and in y-direction by the year of publication. The y positions are compressed in a way that recent publications are spaced wider than older publications.

      function set_collab_paper_targets(node) {
          if (node.type == "paper") {
              // node.y_target = (((2016 - node.year))*20) + 200;
              node.y_target = (Math.sqrt(2016 - node.year) * 100) + 200;
              var xmin = "A".charCodeAt(0);   // declared locally to avoid implicit globals
              var xmax = "Z".charCodeAt(0);
              var xnode = node.name.toUpperCase().charCodeAt(0);
              node.x_target = Math.max(((xnode - xmin) / (xmax - xmin)) * width, 1);
          }
      }
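
      The following sketch works through the mapping for a few made-up keys. It is a variant of the function above, not a replacement: the function name paper_target and the parameters width and reference_year are assumptions made for this example, and the result is returned instead of being stored on the node.

      ```javascript
      function paper_target(name, year, width, reference_year) {
          var xmin = "A".charCodeAt(0),
              xmax = "Z".charCodeAt(0),
              xnode = name.toUpperCase().charCodeAt(0);
          return {
              // keys starting with "A" map to 1 rather than 0
              x: Math.max(((xnode - xmin) / (xmax - xmin)) * width, 1),
              // square root: the spacing between years shrinks the older they are
              y: (Math.sqrt(reference_year - year) * 100) + 200
          };
      }

      console.log(paper_target("adams2016", 2016, 1200, 2016));    // x is 1, y is 200
      console.log(paper_target("zimmer1991", 1991, 1200, 2016));   // x is 1200, y is 700
      console.log(paper_target("miller2012", 2012, 1200, 2016).y); // prints 400
      ```

      The lower bound of 1 presumably exists because the tick handler treats a target of 0 as "no target". Note also that a year later than the reference year yields NaN, since the square root of a negative number is not defined.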
      
  2. Node Dragging Behaviour

    Dragging is provided by a d3.js behavior, but the default event handlers are not used.

    var drag = d3.behavior.drag()
        .origin(function(d) { return d; })
        .on("dragstart", dragstarted)
        .on("drag", dragged)
        .on("dragend", dragended);
    

    Instead, the following handlers are implemented. Note that they rely on undocumented internals (the meaning of the individual bits of the fixed property). These are copied from the original functions.

    function dragstarted(d) {
        d3.event.sourceEvent.stopPropagation();
        d3.select(this).classed("dragging", true);
        // force.d3_layout_forceDragstart(d);
        d.fixed |= 2; // set bit 2
    }
    
    function dragged(d) {
        // d3.select(this).attr("cx", d.x = d3.event.x).attr("cy", d.y = d3.event.y);
        // d.x = d3.event.x, d.y = d3.event.y;
        d.px = d3.event.x, d.py = d3.event.y;
        force.resume(); // restart annealing
    }
    
    function dragended(d) {
        d3.select(this).classed("dragging", false);
        // force.d3_layout_forceDragend(d);
        d.fixed &= ~6; // unset bits 2 and 3
    }
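
    The bit operations can be tried in isolation. The sketch below assumes the following meaning of the bits of the fixed property: value 1 for "fixed by the user", value 2 for "being dragged", and value 4 for "hovered". Since these internals are undocumented, treat this layout and the constant and function names as assumptions.

    ```javascript
    // Assumed bit layout of node.fixed (undocumented d3.js v3 internals).
    var FIXED_BY_USER = 1, DRAGGING = 2, HOVERED = 4;

    function drag_start(fixed) { return fixed | DRAGGING; }
    function drag_end(fixed) { return fixed & ~(DRAGGING | HOVERED); }

    var pinned = FIXED_BY_USER;
    pinned = drag_start(pinned);        // 3: pinned and being dragged
    pinned = drag_end(pinned);          // 1: still pinned after the drag
    var free = drag_end(drag_start(0)); // 0: a free node is released again
    console.log(pinned, free);          // prints "1 0"
    ```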
    
  3. Highlight on Hover

    When hovering over a node, all non-connected nodes should be faded out to better highlight the currently selected node.

    The following function creates the mouseover and mouseout handlers. The returned handler adds or removes the faded class on all nodes that are not connected to the hovered node.

    The argument determines whether a fade-out handler (true) or a fade-in handler (false) is returned.

    function highlight_nodes(highlight_p) {
        return function(d0) {
            force_node
                .filter(function(d) {
                    var match =
                        ( d == d0) ||
                        ((d0.type == "paper") && d0.authors.includes(d)) ||
                        ((d0.type == "author") && d0.papers.includes(d));
                    return !match;
                })
                .classed("faded", highlight_p);
        }
    }
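
    The matching rule inside the filter can be shown without any DOM. In this sketch, is_connected is a hypothetical name for the match expression, and the three nodes are made up.

    ```javascript
    function is_connected(d, d0) {
        return (d == d0) ||
            ((d0.type == "paper") && d0.authors.includes(d)) ||
            ((d0.type == "author") && d0.papers.includes(d));
    }

    var alice = {type: "author", name: "Alice", papers: []};
    var bob = {type: "author", name: "Bob", papers: []};
    var paper = {type: "paper", name: "alice2016", authors: [alice]};
    alice.papers.push(paper);

    // Hovering the paper keeps the paper and its author; everything else fades.
    var faded = [alice, bob, paper].filter(function(d) { return !is_connected(d, paper); });
    console.log(faded.map(function(d) { return d.name; })); // prints [ 'Bob' ]
    ```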
    
  4. Force Layout Creation

    Here is the force layout initialization. It must be called after data is available. See 3.2.4.1.4 for what actually happens, and Force Layout Tick Handler for the tick event handler that is attached.

    It follows the examples here, and here.

    d3.js's enter selection mechanism is used to obtain the newly created svg DOM nodes for the links (lines) and the nodes (groups). The creation of the node contents is handled by functions for the specific node types. Labels are created in the same way for all node types, but link to Google Scholar searches for authors and to pdf files for papers.

    The attractor targets are computed beforehand, when the data is loaded (see load_bibtex).

    To help converging, the layout is initialized by setting all the nodes with attractor targets to their calculated target positions.

    function update_force_layout() {
        force.stop();
    
        force.links(links)
            .nodes(nodes);
    
        force_link = link_container.selectAll(".link")
        // .data(force.links(), function(d) {return d.source.name + "-" + d.target.name});
            .data(force.links());
    
        force_link
            .enter().append("line")
            .attr("class", "link");
    
        force_link.exit().remove();
    
        force_node = node_container.selectAll(".node")
            .data(force.nodes(), function(d) {return d.name+d.state+d.bibtype+d.citations+d.balls});
        // .data(force.nodes());
        force_node
            .enter().append("g")
            .attr("class", "node")
            .on("mouseover.tool_tip", show_tooltip)
            .on("mouseout.tool_tip", hide_tooltip)
            .on("mouseover.highlight", highlight_nodes(true))
            .on("mouseout.highlight", highlight_nodes(false))
            .call(draw_node)
            .call(drag)
        ;
    
        force_node.exit().remove();
    
        force.start().alpha(1);
    }
    

    The creation of all the objects and setting the attributes beneath the node element itself is delegated into draw_node

    function draw_node(node) {
        node.filter(function(n) {return n.type == "author"})
            .call(draw_author_node);
    
        node.filter(function(n) {return n.type == "paper"})
            .call(draw_paper_node);
    
        node.append("g")
            .append("a")
            .attr("xlink:href",function(d) {
                if (d.type == "author")
                    return "http://scholar.google.com/scholar?q=" + encodeURIComponent(d.name)
                else
                    return d.key+".pdf"})
            .append("text")
            .attr("dx", 12)
            .attr("dy", 16)
            .attr("text-anchor", "middle")
            .text(function(d) { return d.name });
    }
    
    1. Author Nodes

      Author nodes are represented by a generic image.

      function draw_author_node(node) {
          node.append("image")
              .attr("xlink:href", "graph-assets/user.png")
              .attr("x", node_image_offset)
              .attr("y", node_image_offset)
              .attr("width", node_image_size)
              .attr("height", node_image_size);
      }
      
    2. Paper Nodes

      For papers whose reading has at least been started, an identicon is shown; otherwise, an empty placeholder is shown. This should make it easier to recognize papers by their identicon over time, and it also shows where there are still "gaps" in the research.

      A small symbol represents the publication type, e.g. conference paper, journal paper, PhD Thesis, etc. For now, only one symbol is used, though. This symbol shall eventually be made optional, since it crowds the whole layout quite a bit. Also, this symbol is hidden by default. See 3.2.5.

      In order to make distinguishing the papers easier, md5 and jdenticon are used to calculate hash values of the bibtex key. The hash is stored in the DOM attribute data-jdenticon-hash. This is a non-standard attribute so far, and is accessed by the modified jdenticon code.

      The identicon gets a white background, to make it easier to distinguish visually.

      A small circle should surround the icon which represents the personal reading state.

      Clicking the label will open a pdf.

      function draw_paper_node(node) {
          // glow
          // node.append("rect")
          //     .attr("x", node_image_offset)
          //     .attr("y", node_image_offset)
          //     .attr("width", node_image_size)
          //     .attr("height", node_image_size)
          //     .attr("class", function(d) {
          //         var s= "node_bg_glow " + d.type;
          //         if (d.type == "paper" && d.state) s = s + " " + d.state;
          //         return s;
          //     });
      
          // white background circle with colored stroke
          node.append("circle")
              .attr("r", function(d) {return node_image_size(d) / 2.2})
              .attr("class", function(d) {
                  var s = `node_bg ${d.type}`;
                  if (d.state) s = s + " " + d.state;
                  return s;
              })
          ;
      
          //jdenticon for partially read papers
          node.filter(function(n) { return (n.state && n.state != "unread")})
              .append("g")
              .attr("class", "jdenticon")
              .attr("data-width", jdenticon_size)
              .attr("data-height", jdenticon_size)
              .attr("data-jdenticon-hash", function(d) { return md5(d.name)})
              .attr("transform", function(d) { return "scale(" + node_significance(d) * (icon_size / jdenticon_size)  + ")"; });
      
          //type symbols
          node.append("image")
              .attr("xlink:href", "graph-assets/note.svg")
              .attr("class", "pub_type")
              .attr("x", node_image_offset)
              .attr("y", node_image_offset)
              .attr("width", function(d) {return node_image_size(d) / 2.5})
              .attr("height", function(d) {return node_image_size(d) / 2.5})
              .attr("visibility", "hidden");
      
      }
      
  5. Force Layout Tick Handler

    This is the "hot loop" that actually updates all the svg elements according to the internal simulation. It implements the attraction forces and updates the position of the svg nodes as well as their links.

    function force_tick(e) {
        var kx = e.alpha * kx_mul;
        var ky = e.alpha * ky_mul;
    
        nodes.forEach(function(node) {
            if (node.x_target)
                node.x += (node.x_target - node.x) * kx;
            if (node.y_target)
                node.y += (node.y_target - node.y) * ky;
        });
    
        force_link.attr("x1", function(d) { return d.source.x; })
            .attr("y1", function(d) { return d.source.y; })
            .attr("x2", function(d) { return d.target.x; })
            .attr("y2", function(d) { return d.target.y; });
    
        force_node.attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });
    }
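
    To see how the attractor part of the handler behaves, the sketch below applies only that step to a single made-up node, with a constant alpha. In the real simulation alpha decays over time and the other forces act as well, so this is a simplification; the function name attract is hypothetical.

    ```javascript
    var kx_mul = 0.15, ky_mul = 0.4;

    // Moves the node a fraction of the remaining distance towards its target.
    function attract(node, alpha) {
        if (node.x_target) node.x += (node.x_target - node.x) * alpha * kx_mul;
        if (node.y_target) node.y += (node.y_target - node.y) * alpha * ky_mul;
    }

    var node = {x: 0, y: 0, x_target: 100, y_target: 100};
    for (var i = 0; i < 10; i++) attract(node, 1);
    // y approaches its target faster than x because ky_mul > kx_mul
    console.log(node.x.toFixed(1), node.y.toFixed(1));
    ```

    Because each step covers a fixed fraction of the remaining distance, the node approaches its target without overshooting as long as the factor stays below 1.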
    

3.2.5 Controls

At the bottom of the screen, there is space for some user interface controls. This allows live customization of the layout.

The controls are created using d3.js.

3.2.5.1 Display of Publication Type Symbols

The symbols used to display the type of publication tend to crowd the layout, that is why they can be switched on/off. (functionality deactivated, useless right now)

(Currently only one is used, and that one does not distinguish between types yet)

var controls = d3.select("#footer")
    .append("div");

// controls.append("label")
//     .text("Display Type of Publication");

// controls.append("input")
//     .attr("type", "checkbox")
//     // .attr("name", "show_symbols")
//     .on("click", toggle_symbols);

function toggle_symbols() {
    if (this.checked)
        d3.selectAll(".pub_type").attr("visibility", "visible");
    else
        d3.selectAll(".pub_type").attr("visibility", "hidden");
}

3.2.6 BibTex File Loading

A Drag handler on the SVG element reacts to dropping a file onto the canvas. This causes the file to be parsed as BibTex File (see 3.2.6.3). Nodes are compared to the existing ones. If any change happened, the layout is updated.

API reference for ".on()"

Information about FileReader with Drag'n'Drop

Information about reading file contents

3.2.6.1 Read Files on Drag'n'Drop

The drop handler triggers reading the supplied file. This (not visible here) in turn triggers processing the file, as described in the rest of the section.

d3.select("svg")
    .on("dragover", function() {
        d3.event.preventDefault(); })
    .on("drop", function() {
        d3.event.preventDefault();
        var files=d3.event.dataTransfer.files;
        if (files.length == 1) {
            var f = files[0];
            console.log("Filename: " + f.name);
            console.log("Type: " + f.type);
            console.log("Size: " + f.size + " bytes");
            reader.readAsText(f);
        }
    });

A FileReader is instantiated to asynchronously load the data. Further processing (see also 1) is initiated from its onload handler. This includes analyzing the bibtex data and (re-)creating the layout.

var reader = new FileReader();
reader.onload = function(event) {
    var content = event.target.result;
    console.log("File loaded");
    load_bibtex(content);
};

reader.onerror = function(event) {
    console.error("Unable to read file (Code: " + event.target.error.code + ")");
};

load_bibtex is used to actually perform the reading and trigger the update of the graph. It takes the content of the BibTeX file as text.

function load_bibtex(content) {
    var entries = parse_bibtex_file(content);
    var result = analyze_bibtex_entries(entries);
    nodes = result.nodes;
    links = result.links;

    nodes.forEach(function(node) {
        set_collab_paper_targets(node)  /* set the target coordinates for the papers */
        if (node.x_target) node.x = node.x_target;
        if (node.y_target) node.y = node.y_target;
    });

    update_layout();
}

3.2.6.2 Analyze BibTex file

The main information extraction happens here. Each entry corresponds to a paper. The original bibtex fields are in the `bibtex` member of the parsed entry. Some additional properties are computed and added to the entry itself for later use:

authors
list of parsed author objects for the paper
type
node type to distinguish during rendering. Right now, `paper` and `author` are supported, and all these entries are set to `paper`
name
display name of the paper
citations
number of times the paper has been cited. Determines the size of the node. Right now, hardcoded to use the value of the field "note", because that is where the zotero scholar citations plugin stores the information.
display_title
de-BibTex'd title
year
publication year, just interpret bibtex field as integer
state
personal reading state

Iterating through all the nodes, link and node information is built, and returned in a result object. authors and papers are global for debugging purposes.

The entries that the parser returned are reused as paper objects, later becoming the paper nodes of the layout.

var authors, papers;

function analyze_bibtex_entries(entries) {
    authors = [];
    papers = [];
    var links = [];
    entries.forEach(function(e) {
        if (!e.bibtex.author) { console.error(`BibTeX entry ${e.key} has no author!`); return;}
        var as = author_ws_cleanup(e.bibtex.author).split(" and ").map(normalize_author).map(find_author);
        e.authors = as; //replace author list with a list of author objects
        e.type = "paper";
        e.name = e.key;
        if (e.bibtex.state) e.state = e.bibtex.state.toLowerCase();
        e.citations = parseInt(e.bibtex.note, 10); // this is hardcoded right now according to zotero citations plugin
        if (!e.bibtex.title) {
            console.warn(`BibTeX entry ${e.key} has no title, using key instead!`)
            e.display_title = e.key;
        } else
            e.display_title = e.bibtex.title;
        e.year = parseInt(e.bibtex.year);
        e.authors.forEach(function(author) {
            author.papers.push(e);
            links.push({source: author, target: e});
            // this one is deprecated once the old json import is phased out:
            author.balls = (author.balls || 0) + 1;
        });
        papers.push(e);
    });
    return {nodes: papers.concat(authors), links: links};
}

Several helpers are used in above code.

author_ws_cleanup gets rid of leading and trailing white space of an author string, and replaces all sequences of tabs, spaces and newlines with just a single space.

function author_ws_cleanup(a) {
    return a.trim().replace(/\s+/g," ");
}

To see if a paper is attributed to a certain author, first the full author names are compared. If that does not match, only the first letter of the first name is taken into account for the comparison. This caters to the fact that authors are sometimes given by full name, sometimes only by short name. Note that this possibly results in mis-attributions in the graph, when authors with the same last name and the same first letter of the given name exist.

In this case it would be advisable to change the BibTeX source to include the author's full name(s).

Note that when adding an author to the list of known authors, the object previously returned by normalize_author is re-used and initialized with more properties. This is currently only the list of attributed papers, though.

function find_author(a) {
    var found = authors.find(function(item) {
        return item.last == a.last && item.given == a.given }) ||
        authors.find(function(item) {
            return item.last == a.last && item.given[0] == a.given[0]});
    if (found)
        return found;
    else {
        // initialize author fields here
        a.papers = [];
        a.type = "author";
        a.name = a.last;        // display name
        if (a.given) a.name = a.name + ", " + a.given;
        authors.push(a);
        return a;
    }
}
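
The two-stage comparison, including the possible mis-attribution, can be demonstrated on a small list. The sketch below performs the same lookup on an explicit list instead of the global one; the function name find_in and the names in the list are made up.

```javascript
// Same lookup as above, but on an explicit list instead of the global.
function find_in(authors, a) {
    return authors.find(function(item) {
        return item.last == a.last && item.given == a.given; }) ||
        authors.find(function(item) {
            return item.last == a.last && item.given[0] == a.given[0]; });
}

var known = [
    {given: "Jane", last: "Doe"},
    {given: "John", last: "Doe"}
];
console.log(find_in(known, {given: "John", last: "Doe"}).given); // exact match: "John"
console.log(find_in(known, {given: "J.", last: "Doe"}).given);   // initial only: "Jane"
console.log(find_in(known, {given: "Max", last: "Doe"}));        // undefined
```

The second call shows the ambiguity: "J. Doe" is attributed to whichever matching author was registered first, here Jane rather than John.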

This one is responsible for normalizing an author name of an entry. Note that this does not follow BibTex guidelines completely, but is able to extract the most common cases. When in doubt, specifying the names using "Last, First" is always the most unambiguous way.

function normalize_author(s) {
    var c = s.split(", ");
    if (c.length == 1) {        // no commas?
        c = c[0].split(" ");    // split by spaces
        var last = c.pop();
        return {given: c.join(" "), last: last};
    } else
        return {given: c[c.length-1].split(" ")[0], last: c[0]};
}
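
The following sketch runs a made-up author field through the cleanup and normalization steps. It repeats the two helpers so that it runs on its own; the names in the field are invented.

```javascript
function author_ws_cleanup(a) {
    return a.trim().replace(/\s+/g, " ");
}

function normalize_author(s) {
    var c = s.split(", ");
    if (c.length == 1) {        // no commas: last word is the last name
        c = c[0].split(" ");
        var last = c.pop();
        return {given: c.join(" "), last: last};
    } else
        return {given: c[c.length - 1].split(" ")[0], last: c[0]};
}

var field = "  Jane Q. Doe and\n   Roe, Richard M. ";
var parsed = author_ws_cleanup(field).split(" and ").map(normalize_author);
console.log(parsed);
```

Note the asymmetry: the form without a comma keeps all given names ("Jane Q."), while the "Last, First" form keeps only the first given name ("Richard").
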

3.2.6.3 BibTex Parser

BibTex files are parsed using PEG.js.

Limitations:

  • no support for "@string" (if someone tells me how to handle state during parser execution, that would be quite easy to add)
  • when encountering variables, they are not substituted

This should not really matter, since the information in such fields is not displayed anyways (yet).

This page has very nice information about the BibTex Syntax.

This expression is used to generate the parser:

start = bibtex

_ "whitespace" = [ \t\n\r]*

name = value:[a-zA-Z0-9_-]+ { return value.join("").toLowerCase() }
number = value:[0-9]+ { return parseInt(value.join(""),10) }
month_const = "jan"/"feb"/"mar"/"apr"/"may"/"jun"/"jul"/"aug"/"sep"/"oct"/"nov"/"dec"

non_brace = value: [^{}]+ { return value.join("") }
non_quote_non_brace = value: [^{}"]+ { return value.join("") }

braced_value = "{" values: (non_brace / braced_value)* "}" { return values.join("") }
quoted_value = "\"" values: (non_quote_non_brace / braced_value)* "\"" { return values.join("") }

//fallback: when encountering an unquoted or unbrace value, assume variable name, without doing substitutions
value = braced_value / quoted_value / number / month_const / name

key = value:[^,]+ { return value.join("") }

field = name:name _ "=" _ value:value { return { name:name, value:value}}
field_with_separator = _ field:field _ "," { return field }

normal_entry = "@" type:name _ "{" _ key:key _ "," fields:(field_with_separator)* _ last_field:field? _ '}'
  { if (last_field) fields.push(last_field);
    var ret = {bibtype: type, key: key, bibtex: {}};
    fields.forEach(function(f) { if (!ret.bibtex[f.name]) ret.bibtex[f.name] = f.value});
  return ret; }

string = "@STRING"i _ "{" _ f:field _ "}" {error("@string directive not supported")}

preamble = "@PREAMBLE"i _ "{" value "}"

comment_body = value:[^@]+ {return value.join("")}

comment_entry = "@COMMENT"i _ comment:braced_value { return comment }

comment = comment_entry / comment_body

entry = string / preamble / comment / normal_entry

bibtex = elements:(entry / comment)* { return elements.filter(function(e) {return typeof(e) == "object"})}
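
The shape of a parsed entry follows from the action of the normal_entry rule. The sketch below repeats that action as a plain function to show how duplicate fields are handled. It was derived from reading the grammar, not from running the generated parser; the function name build_entry and the sample fields are made up.

```javascript
// Mirrors the action of the normal_entry rule; fields is a list of
// {name, value} objects as produced by the field rule.
function build_entry(type, key, fields) {
    var ret = {bibtype: type, key: key, bibtex: {}};
    fields.forEach(function(f) {
        if (!ret.bibtex[f.name]) ret.bibtex[f.name] = f.value;
    });
    return ret;
}

var entry = build_entry("article", "doe2016", [
    {name: "author", value: "Doe, Jane"},
    {name: "year", value: 2016},          // unquoted numbers are parsed as integers
    {name: "year", value: 1999}           // ignored: the first value wins
]);
console.log(entry.bibtex.year); // prints 2016
```

Since the check is a plain truthiness test, a first value that is empty or 0 would still be replaced by a later one.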

Now we need to generate the necessary node and link data from the parsed BibTex. For debugging purposes, the parse tree is stored globally.

If an exception is encountered during parsing, the parser error message is displayed in a message box.

var parse_tree;

function parse_bibtex_file(content) {
    try {
        parse_tree = bibtex_parser.parse(content);
        return parse_tree;
    } catch (e) {
        alert(`Line ${e.location.start.line}, Column ${e.location.start.column}: ${e.message}`);
        throw(e);
    }
}

3.2.7 Application Initialization/Data Loading

When the Application is loaded, a local file "example.bib" is opened.

d3.text("example.bib", function(error, content) {
    if (error) throw(error);
    load_bibtex(content);
});

4 Hacking

This file is used to generate code and documentation. It requires org-mode which is supplied by emacs. To (re-)generate the code file, open this document and evaluate org-babel-tangle.

5 Plan

5.2 TODO rename 1

5.3 TODO re-implement classification layout without actually using the targets as attractors

blend the current attractor position with the classification target position instead. This way, no actual simulation has to be done when switching modes

5.4 TODO rename category -> classification

5.5 TODO remove hardcoding 2016, substitute for current year

5.6 TODO switch to radial cluster for classification layout

5.7 TODO fix initial positioning in 3.2.4.1.4

Problem: initial positioning causes very unstable initial layouting.

Idea: place papers first, then add authors one after each other (at one of their papers) to allow for settling a bit before adding another.

Can that be done still using the entry selection?

5.8 TODO rename icon_size -> node_size

5.9 TODO open pdf based on file property, not on inferred key value

5.10 TODO interpret bibtex types, assign icons, warn when required fields not present

The idea is to catch probably unintended mistakes, but be robust for other kinds of non-well-defined fields

Author: timor

Created: 2016-06-16 Thu 15:56
