Complete Data Model on a Page
The latest version of Augur includes a schema that brings together data around key artifacts of open source software development.
This document details how to create the schema as well as some information on its contents and design.
Complete Data Model, With Key Table Types Highlighted
Complete Data Model, For Current Release
Creating the schema
The process for creating the schema is detailed in the database section of the Getting Started guide.
The augur_data schema contains most of the information analyzed and constructed by Augur. The origins of the data inside Augur are:
workers/augur_github_worker: Pulls data from the GitHub API. Presently this is focused on issues, including issue_comments, issue_events, issue_labels and contributors. Note that all messages are stored in Augur in the messages table, which makes it easy to analyze the tone and characteristics of a project's text communication from one place.
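Because every message lands in a single table, one query can cover comments from several sources. The sketch below illustrates that idea with Python's built-in sqlite3 module and a hypothetical, heavily simplified messages table. The column names (msg_id, repo_id, source, msg_text) and the helper functions are assumptions made for illustration; they are not Augur's actual schema or code, which run against PostgreSQL.

```python
import sqlite3

# Hypothetical, simplified stand-in for Augur's messages table.
# Column names are illustrative assumptions, not Augur's real schema.
SCHEMA = """
CREATE TABLE messages (
    msg_id   INTEGER PRIMARY KEY,
    repo_id  INTEGER NOT NULL,
    source   TEXT NOT NULL,  -- e.g. 'issue_comment' or 'pr_comment'
    msg_text TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few sample messages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        (1, 10, "issue_comment", "Thanks, this fixed it!"),
        (2, 10, "pr_comment", "Please add tests before merging."),
        (3, 10, "issue_comment", "Still broken on Windows."),
        (4, 20, "pr_comment", "Looks good to me."),
    ]
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)
    return conn


def messages_per_source(conn: sqlite3.Connection, repo_id: int) -> dict:
    """Count one repository's messages by source with a single query."""
    cur = conn.execute(
        "SELECT source, COUNT(*) FROM messages WHERE repo_id = ? "
        "GROUP BY source ORDER BY source",
        (repo_id,),
    )
    return dict(cur.fetchall())


def all_text(conn: sqlite3.Connection, repo_id: int) -> list:
    """Return every message body for a repository, whatever its source."""
    cur = conn.execute(
        "SELECT msg_text FROM messages WHERE repo_id = ? ORDER BY msg_id",
        (repo_id,),
    )
    return [row[0] for row in cur.fetchall()]
```

With the demo rows, messages_per_source(conn, 10) returns {'issue_comment': 2, 'pr_comment': 1}. A tone or sentiment analyzer could consume all_text(conn, repo_id) the same way regardless of where each message originated, which is the benefit of keeping the text in one place.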
workers/facade_worker: Based on http://www.github.com/brianwarner/facade, but substantially modified in the fork located at http://github.com/sgoggins/facade. The modifications include modularization of the code, connections to PostgreSQL instead of MySQL, and other changes noted in the commit logs.
workers/insight_worker: Generates summarizations from raw data
gathered from commits, issues, and other info.
workers/linux_badge_worker: Pulls data from the Linux Foundation’s
workers/value_worker: Populates the repo_labor table using the “SCC” tool provided by the https://github.com/boyter/scc project. SCC requires Go to be installed on your system; see the official Go installation instructions.
workers/pull_request_worker: Collects pull request data, such as commits, contributors, and assignees, from the GitHub API and stores it in the Augur database.
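The value_worker's use of SCC can be pictured with the sketch below, which runs scc over a local checkout and reduces its output to per-language counts. This is not the worker's actual code and does not reproduce the repo_labor columns. It assumes the scc binary is on PATH and accepts --format json, and that each JSON entry carries Name, Count, Code, Comment, Blank, and Complexity fields; check those names against the scc version you install. The helpers run_scc, summarize_scc, and total_code_lines are hypothetical.

```python
import json
import shutil
import subprocess


def run_scc(repo_path: str) -> str:
    """Run scc over a local checkout and return its raw JSON output.

    Assumption: the scc binary is on PATH and accepts `--format json`.
    """
    if shutil.which("scc") is None:
        raise RuntimeError("scc was not found on PATH")
    result = subprocess.run(
        ["scc", "--format", "json", repo_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if scc exits non-zero
    )
    return result.stdout


def summarize_scc(json_text: str) -> dict:
    """Reduce scc's per-language JSON entries to a dict keyed by language.

    Field names (Name, Count, Code, Comment, Blank, Complexity) are assumed
    from scc's JSON format; missing numeric fields default to 0.
    """
    summary = {}
    for entry in json.loads(json_text):
        summary[entry["Name"]] = {
            "files": entry.get("Count", 0),
            "code": entry.get("Code", 0),
            "comment": entry.get("Comment", 0),
            "blank": entry.get("Blank", 0),
            "complexity": entry.get("Complexity", 0),
        }
    return summary


def total_code_lines(summary: dict) -> int:
    """Sum code lines (excluding comments and blanks) across languages."""
    return sum(stats["code"] for stats in summary.values())
```

For example, feeding summarize_scc a two-entry sample with 120 lines of Python code and 40 lines of Go code gives a total_code_lines of 160. Separating the subprocess call from the parsing keeps the parsing testable without scc installed.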
The augur_operations schema is where most of the operations tables live. A few, such as settings, remain in augur_data for now but will be moved. These tables keep records related to analytical history and data provenance for data in the schema. They also store information such as API keys.
The spdx schema stores software bill of materials and license declaration scans of projects, conducted using this fork of the DoSOCSv2 project: https://github.com/Nebrethar/DoSOCSv2