Complete Data Model on a Page
The latest version of Augur includes a schema that brings together data around key artifacts of open source software development.
This document details how to create the schema as well as some information on its contents and design.
Complete Data Model, With Key Table Types Highlighted
Complete Data Model, For Current Release
Creating the schema
The process for creating the schema is detailed in the database section of the Getting Started guide.
Schema Overview
Augur Data
The augur_data schema contains most of the information analyzed
and constructed by Augur. The data in this schema originates
from the following data collection tasks:
1. augur.tasks.github.*
: Tasks that pull data from the GitHub API.
Pull requests and issues are collected first, before more complicated
data. Note that all messages are stored in Augur in the messages
table, so that the tone and characteristics of text communication in a
project can be analyzed from one place.
2. augur.tasks.git.facade_tasks
: Based on
http://www.github.com/brianwarner/facade, but substantially modified in
the fork located at http://github.com/sgoggins/facade. The modifications
include modularization of the code, connections to a PostgreSQL database
instead of MySQL, and other changes noted in the commit logs. Further
modifications have been made so that it works with Augur and integrates
seamlessly into data collection.
3. augur.tasks.data_analysis.insight_worker.tasks
: Generates summaries from raw data
gathered from commits, issues, and other sources.
4. augur.tasks.github.pull_requests.tasks
: Collects pull request related data such as commits, contributors, assignees, etc. from the GitHub API and stores it in the Augur database.
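Because every message lands in one table, a single query can summarize a project's text communication. The following is a minimal sketch of that idea, not Augur's own code. It uses an in-memory SQLite database with an attached database named augur_data as a stand-in for the PostgreSQL schema. The table name follows the text above; the columns msg_id, repo_id, and msg_text and the helper average_message_length are assumptions made for illustration.

```python
import sqlite3


def average_message_length(conn, repo_id):
    """Return the mean character length of a repo's messages, or None if it has none."""
    row = conn.execute(
        "SELECT AVG(LENGTH(msg_text)) FROM augur_data.messages WHERE repo_id = ?",
        (repo_id,),
    ).fetchone()
    return row[0]


# Stand-in database: attaching an in-memory database under the name augur_data
# lets the schema-qualified name augur_data.messages resolve in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS augur_data")
conn.execute(
    "CREATE TABLE augur_data.messages "
    "(msg_id INTEGER PRIMARY KEY, repo_id INTEGER, msg_text TEXT)"  # hypothetical columns
)
conn.executemany(
    "INSERT INTO augur_data.messages (repo_id, msg_text) VALUES (?, ?)",
    [(1, "Thanks!"), (1, "LGTM"), (2, "Please rebase.")],
)

print(average_message_length(conn, 1))
```

Against a real Augur database, the same query shape would be run through a PostgreSQL driver, with the column names checked against the actual schema first.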
Augur Operations
The augur_operations schema is where most of the operations tables
exist. A few, like settings, remain in augur_data for now but will be
moved. The operations tables keep records related to analytical history
and data provenance for data in the schema. They also store
information, including API keys.
Some key tables in this schema include:
config
: Contains the configuration options for the application. Key options include the facade repo_directory as well as the primary API key.
collection_status
: Contains the status of each aspect of data collection for each repo added to Augur. For example, it shows the status of the facade jobs for every repository.
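As an illustration of how collection_status might be used, the sketch below lists the repositories whose facade jobs are in a given state. It is an assumption-laden example rather than Augur's implementation: an in-memory SQLite database with an attached database named augur_operations stands in for the PostgreSQL schema, and the columns repo_id and facade_status, the status values, and the helper repos_with_facade_status are all hypothetical.

```python
import sqlite3


def repos_with_facade_status(conn, status):
    """Return the ids of repos whose facade collection has the given status."""
    rows = conn.execute(
        "SELECT repo_id FROM augur_operations.collection_status "
        "WHERE facade_status = ? ORDER BY repo_id",
        (status,),
    ).fetchall()
    return [row[0] for row in rows]


# Stand-in database: SQLite with an attached database named augur_operations.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS augur_operations")
conn.execute(
    "CREATE TABLE augur_operations.collection_status "
    "(repo_id INTEGER PRIMARY KEY, facade_status TEXT)"  # hypothetical columns
)
conn.executemany(
    "INSERT INTO augur_operations.collection_status VALUES (?, ?)",
    [(1, "Success"), (2, "Pending"), (3, "Pending")],  # hypothetical status values
)

print(repos_with_facade_status(conn, "Pending"))
```

The real table tracks several aspects of collection per repository, so an actual query would likely select the status column that matches the collection phase of interest.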
SPDX
The spdx schema stores software bill of materials
and license declaration scans of projects, conducted using this fork of
the DoSOCSv2 project: https://github.com/Nebrethar/DoSOCSv2