Complete Data Model on a Page
=================================

The latest version of Augur includes a schema_ that brings together data around key artifacts of open source software development.

This document explains how to create the schema and describes its contents and design.

-------------------------------------------------------
Complete Data Model, With Key Table Types Highlighted
-------------------------------------------------------

.. image:: schema.png
  :width: 1200
  :alt: Augur Unified Schema


-------------------------------------------------------
Complete Data Model, For Current Release
-------------------------------------------------------

.. image:: images/20211011-augur-schema-v0.21.1.png
  :width: 1200
  :alt: Augur Unified Schema

-------------------------------------------------------
Creating the schema
-------------------------------------------------------

The process for creating the schema is detailed in the `database section <../getting-started/database.html>`_ of the Getting Started guide.

-------------------------------------------------------
Schema Overview
-------------------------------------------------------

Augur Data
-------------------------------------------------------

The ``augur_data`` schema contains *most* of the information analyzed
and constructed by Augur. This data comes from the following data
collection tasks, which populate the schema:

1. ``augur.tasks.github.*``: Tasks that pull data from the GitHub API.
   Pull requests and issues are collected first, before more complex
   data. All messages are stored in Augur in the ``messages`` table, so
   that the tone and characteristics of a project's text communication
   can be analyzed from one place.

2. ``augur.tasks.git.facade_tasks``: Based on Facade
   (http://www.github.com/brianwarner/facade), but substantially modified in
   the fork located at http://github.com/sgoggins/facade. The modifications
   include modularization of the code, connections to PostgreSQL instead
   of MySQL, and other changes noted in the commit logs. Further modifications
   have been made to work with Augur and to integrate Facade seamlessly into
   data collection.

3. ``augur.tasks.data_analysis.insight_worker.tasks``: Generates summaries
   from raw data gathered from commits, issues, and other sources.

4. ``augur.tasks.github.pull_requests.tasks``: Collects pull request data such as commits, contributors, assignees, etc. from the GitHub API and stores it in the Augur database.
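Because every message lands in one table, an analysis of communication in a project can run a single query instead of combining per-source tables. The sketch below illustrates that idea with Python's built-in ``sqlite3`` module and an in-memory database. The table name ``messages`` comes from the text above, but the columns (``source``, ``repo_id``, ``msg_text``) and the helper ``messages_for_repo`` are hypothetical simplifications, not Augur's actual PostgreSQL definitions.

.. code-block:: python

   import sqlite3

   # Hypothetical, simplified stand-in for Augur's message storage.
   # The real table lives in PostgreSQL and has different columns.
   conn = sqlite3.connect(":memory:")
   conn.execute(
       "CREATE TABLE messages (source TEXT, repo_id INTEGER, msg_text TEXT)"
   )
   conn.executemany(
       "INSERT INTO messages VALUES (?, ?, ?)",
       [
           ("issue", 1, "Thanks, this fixes the crash!"),
           ("pull_request", 1, "Please add a test before merging."),
           ("issue", 2, "Still failing on Windows."),
       ],
   )


   def messages_for_repo(conn, repo_id):
       """Return (source, text) pairs for one repo, whatever their origin."""
       rows = conn.execute(
           "SELECT source, msg_text FROM messages "
           "WHERE repo_id = ? ORDER BY rowid",
           (repo_id,),
       )
       return rows.fetchall()


   # Issue comments and pull request comments come back from one query.
   for source, text in messages_for_repo(conn, 1):
       print(f"{source}: {text}")
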

Augur Operations
-------------------------------------------------------

The ``augur_operations`` schema contains most of Augur's operations tables.
A few, such as ``settings``, remain in ``augur_data`` for now but will be
moved. These tables keep records related to analytical history and data
provenance for the data in the schema. They also store operational
information such as API keys.

Some key tables in this schema include:

- ``config``: contains the configuration options for the application. Key options include the Facade ``repo_directory`` and the primary API key.

- ``collection_status``: contains the status of each aspect of data collection for each repository added to Augur. For example, it records the status of the Facade jobs for every repository.
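The roles of these two tables can be pictured with a small sketch, again using ``sqlite3`` in memory. It assumes that ``config`` rows are keyed by a section name and a setting name, and that ``collection_status`` holds one text status per collection phase per repository. These column names, the ``'Facade'`` section name, the ``'Success'`` status value, the helper functions, and the directory path are all illustrative assumptions rather than Augur's exact definitions.

.. code-block:: python

   import sqlite3

   conn = sqlite3.connect(":memory:")

   # Hypothetical layout: one row per (section, setting) pair.
   conn.execute(
       "CREATE TABLE config (section_name TEXT, setting_name TEXT, value TEXT, "
       "PRIMARY KEY (section_name, setting_name))"
   )
   conn.execute(
       "INSERT INTO config VALUES ('Facade', 'repo_directory', '/tmp/augur/repos/')"
   )

   # Hypothetical layout: one status column per collection phase.
   conn.execute(
       "CREATE TABLE collection_status (repo_id INTEGER PRIMARY KEY, facade_status TEXT)"
   )
   conn.executemany(
       "INSERT INTO collection_status VALUES (?, ?)",
       [(1, "Success"), (2, "Pending"), (3, "Error")],
   )


   def get_setting(conn, section, setting):
       """Look up a single config value, or None if it is not set."""
       row = conn.execute(
           "SELECT value FROM config WHERE section_name = ? AND setting_name = ?",
           (section, setting),
       ).fetchone()
       return row[0] if row else None


   def repos_needing_facade(conn):
       """Return ids of repos whose Facade phase has not finished successfully."""
       rows = conn.execute(
           "SELECT repo_id FROM collection_status "
           "WHERE facade_status <> 'Success' ORDER BY repo_id"
       )
       return [repo_id for (repo_id,) in rows]


   print(get_setting(conn, "Facade", "repo_directory"))
   print(repos_needing_facade(conn))

Keeping settings as key/value rows lets new options be added without schema changes, while a per-phase status column makes it cheap to ask which repositories still need work.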

SPDX
-------------------------------------------------------

The ``spdx`` schema stores software bills of materials and license
declaration scans for projects, conducted using this fork of the
DoSOCSv2 project: https://github.com/Nebrethar/DoSOCSv2

.. _schema: