List of Regularly Used Data Tables In Augur
This is a list of the data tables in Augur that are regularly used, along with the tasks attached to them.
Commits
Contributor_affiliations
A list of emails and domains, with start and end dates that bound each individual’s organizational affiliation.
Populated by default when Augur is installed.
Can be edited so that an Augur instance can resolve a larger list of affiliations.
These mappings are summarized in the dm_ tables.
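As a rough illustration of how a date-bounded affiliation list can be used, the sketch below resolves an email address to an organization for a given date. The `Affiliation` class, its field names, and `resolve_affiliation` are hypothetical and are not Augur's actual column names or code; the preference for full-email matches over domain matches is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Affiliation:
    # Hypothetical field names, chosen for illustration only.
    domain: str                       # a full email or a bare domain
    organization: str
    start_date: date
    end_date: Optional[date] = None   # None means the affiliation is open-ended


def resolve_affiliation(email: str, when: date,
                        affiliations: Iterable[Affiliation]) -> Optional[str]:
    """Return the organization matching `email` on the date `when`, if any."""
    email = email.lower()
    domain = email.rsplit("@", 1)[-1]
    best = None
    for aff in affiliations:
        key = aff.domain.lower()
        # A row applies if it names the exact email or the email's domain.
        if key != email and key != domain:
            continue
        # Skip rows whose date window does not contain `when`.
        if when < aff.start_date:
            continue
        if aff.end_date is not None and when > aff.end_date:
            continue
        # Assumption: a full-email match is more specific than a domain match.
        if best is None or (key == email and best.domain.lower() != email):
            best = aff
    return best.organization if best else None


affiliations = [
    Affiliation("example.com", "Example Corp", date(2015, 1, 1), date(2019, 12, 31)),
    Affiliation("dev@example.com", "Other Org", date(2020, 1, 1)),
]
print(resolve_affiliation("dev@example.com", date(2018, 6, 1), affiliations))
print(resolve_affiliation("dev@example.com", date(2021, 6, 1), affiliations))
```

Adding rows to a list like this is what lets an instance resolve more affiliations; the date window keeps a contributor who changes employers from being attributed to the wrong organization.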
Contributor_repo
Stores a snowball sample of all the repositories that anyone in your schema has accessed on GitHub. For example, if you want to know every repository that people on your project have contributed to, this is the table to query.
Contributors
These are all the contributors to a project/repo. In Augur, all types of contributions create a contributor record. This includes issue comments, pull request comments, label additions, etc. This differs from how GitHub counts contributors, which includes only committers.
Contributors_aliases
These are all the alternate emails that the same contributor might use. These records arise almost entirely from the commit log. For example, if I use two different emails on two different computers when I make commits, then an alias is created for the second through nth email that Augur encounters. If a user’s email cannot be resolved, it is placed in the unresolved_commit_emails table. Coverage has been greater than 98% since Augur 1.2.4.
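The alias behavior described above can be sketched roughly as follows. This is not Augur's implementation: `record_commit_emails` is a hypothetical helper, and `resolve_login` stands in for whatever lookup maps a commit email to a known contributor (returning `None` when the email cannot be resolved).

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def record_commit_emails(
    commit_emails: Iterable[str],
    resolve_login: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], Dict[str, str], List[str]]:
    """Split commit emails into canonical emails, aliases, and unresolved emails."""
    canonical: Dict[str, str] = {}   # contributor login -> first email seen
    aliases: Dict[str, str] = {}     # later email -> that contributor's first email
    unresolved: List[str] = []       # emails with no matching contributor
    for email in commit_emails:
        login = resolve_login(email)
        if login is None:
            # Mirrors the idea of the unresolved_commit_emails table.
            if email not in unresolved:
                unresolved.append(email)
            continue
        first = canonical.setdefault(login, email)
        if email != first:
            # The second through nth email for a contributor becomes an alias.
            aliases[email] = first
    return canonical, aliases, unresolved


# Hypothetical lookup data, for illustration only.
known = {"a@work.com": "alice", "a@home.org": "alice", "b@x.io": "bob"}
log = ["a@work.com", "b@x.io", "a@home.org", "ghost@nowhere.net", "a@work.com"]
canonical, aliases, unresolved = record_commit_emails(log, known.get)
print(canonical, aliases, unresolved)
```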
Discourse_insights
There are nine specific discourse act types identified by the computational linguistic algorithm that underlies the discourse insights task. This task analyzes each comment on each issue or pull request sequentially so that context is applied when determining the discourse act type. These types are:
issue_assignees || issue_events || issue_labels
issue_message_ref
issues
Message
Message_analysis
Message_analysis_summary
Platform
Reference data with two rows: one for GitHub, one for GitLab.
Pull_request_analysis
pull_request_assignees || pull_request_commits || pull_request_events || pull_request_files || pull_request_labels || pull_request_message_ref
pull_request_meta || pull_request_repo || pull_request_review_message_ref || pull_request_reviewers || pull_request_reviews || pull_request_teams || pull_requests
Releases
GitHub-declared software releases or release tags. For example: https://github.com/chaoss/augur/releases
Repo
Repo_badging
A list of CII Best Practices (now OpenSSF Best Practices) badging information for a project. Reads this API endpoint: https://bestpractices.coreinfrastructure.org/projects.json
Repo_cluster_messages
Repo_dependencies
Repo_deps_libyear
Enumerates every package-managed dependency and looks up the latest release of each library that is imported into a project. It then compares the release date of that latest version with the release date of the version used in your project and calculates how old your version is relative to the latest one. The resulting statistic is “libyear”. This task runs with the facade tasks, so over time you will see whether your libraries are being kept up to date.
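The libyear arithmetic can be sketched as below. The function names are hypothetical, and two details are assumptions rather than something this page specifies: a year is taken as 365 days, and a dependency that is already at or ahead of the latest release counts as zero.

```python
from datetime import date
from typing import Dict, Tuple

DAYS_PER_YEAR = 365.0  # assumed convention for converting days to years


def libyear(used_release_date: date, latest_release_date: date) -> float:
    """Age of the version in use, in years, relative to the latest release."""
    delta_days = (latest_release_date - used_release_date).days
    # Assumption: never report a negative age.
    return max(delta_days, 0) / DAYS_PER_YEAR


def total_libyears(deps: Dict[str, Tuple[date, date]]) -> float:
    """Sum libyears over {name: (used_release_date, latest_release_date)}."""
    return sum(libyear(used, latest) for used, latest in deps.values())


# Hypothetical dependencies and dates, for illustration only.
deps = {
    "lib_a": (date(2021, 1, 1), date(2022, 1, 1)),  # 365 days behind
    "lib_b": (date(2022, 3, 1), date(2022, 3, 1)),  # already on the latest release
}
print(total_libyears(deps))
```

Tracking this sum across repeated runs is what shows whether a project's dependencies are drifting further behind or being kept current.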
Repo_deps_scorecard
Runs the OSSF Scorecard (https://github.com/ossf/scorecard) over every repository. There are 16 factors, which are explained at that repository location.
Repo_groups
Repo_info
This task gathers metadata from the platform API, including things like “number of stars” and “number of forks”. It also records the number of issues, the number of pull requests, and similar counts. We use that information to determine whether we have collected all of the PRs and issues associated with a repository.
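One way to picture that completeness check is to compare the counts reported by the platform with the number of rows actually collected. The sketch below is illustrative only: the function names and the `threshold` parameter are hypothetical, and Augur's actual comparison logic may differ.

```python
def collection_coverage(reported: int, collected: int) -> float:
    """Fraction of platform-reported items that have been collected, capped at 1.0."""
    if reported <= 0:
        # Nothing reported by the platform, so nothing is missing.
        return 1.0
    return min(collected / reported, 1.0)


def is_collection_complete(reported_issues: int, collected_issues: int,
                           reported_prs: int, collected_prs: int,
                           threshold: float = 1.0) -> bool:
    """True when both issue and PR coverage reach `threshold` (an assumed cutoff)."""
    return (collection_coverage(reported_issues, collected_issues) >= threshold
            and collection_coverage(reported_prs, collected_prs) >= threshold)


# Hypothetical counts: 200 issues reported, 150 collected; all 40 PRs collected.
print(collection_coverage(200, 150))
print(is_collection_complete(200, 150, 40, 40))
```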