Steps to Create a Metric API Endpoint

Summary

There are many ways to build a metric endpoint, but we usually follow steps along these lines:

  1. What is the CHAOSS metric we want to develop?

  2. Sometimes, a metrics endpoint integrates or visualizes several metrics.

  3. Determine which tables in the Augur Schema contain the data we need to develop this metric.

  4. Construct a very basic query that joins those tables minimally, giving us a “baseline query.”

  5. Refine the query so that it takes the standard inputs for a “standard metric,” if that is the metric's type. Alternatively, look at non-standard metrics as they are defined in AUGUR_HOME/augur/routes, or at one of the visualization metrics in AUGUR_HOME/augur/routes/contributor.py, AUGUR_HOME/augur/routes/pull_requests.py, or AUGUR_HOME/augur/routes/nonstandard_metrics.py. (This step is explained in the next section.)
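Steps 4 and 5 can be sketched with Python's standard-library sqlite3 module. This is a minimal illustration under stated assumptions, not Augur's implementation:

- The in-memory tables are heavily simplified, hypothetical stand-ins for repo and repo_labor.
- SQLite stands in for Augur's PostgreSQL database.
- The function code_lines_by_language and its repo_id, begin_date and end_date parameters are hypothetical placeholders. Augur's actual standard inputs are covered in the next section.

The sketch shows a baseline join first, and then the same join refined to accept inputs.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for Augur's repo and repo_labor
# tables; the real Augur Schema has many more columns and runs on PostgreSQL.
SCHEMA = """
CREATE TABLE repo (repo_id INTEGER PRIMARY KEY, repo_name TEXT);
CREATE TABLE repo_labor (
    repo_id INTEGER,
    programming_language TEXT,
    code_lines INTEGER,
    rl_analysis_date TEXT
);
"""

# Step 4: a baseline query that does nothing but join the tables.
BASELINE_QUERY = """
SELECT a.repo_id, b.repo_name, a.programming_language, a.code_lines
FROM repo_labor a
JOIN repo b ON a.repo_id = b.repo_id
"""

# Step 5: the same join, refined to take inputs as bound parameters.
REFINED_QUERY = """
SELECT a.repo_id, b.repo_name, a.programming_language,
       SUM(a.code_lines) AS total_code_lines
FROM repo_labor a
JOIN repo b ON a.repo_id = b.repo_id
WHERE a.repo_id = :repo_id
  AND a.rl_analysis_date BETWEEN :begin_date AND :end_date
GROUP BY a.repo_id, b.repo_name, a.programming_language
ORDER BY a.programming_language
"""


def build_demo_db():
    """Create an in-memory database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO repo VALUES (?, ?)", [(1, "augur"), (2, "other")])
    conn.executemany(
        "INSERT INTO repo_labor VALUES (?, ?, ?, ?)",
        [
            (1, "Python", 100, "2020-01-15"),
            (1, "Python", 50, "2020-01-20"),
            (1, "SQL", 40, "2020-02-01"),
            (1, "Python", 999, "2019-06-01"),  # outside the 2020 range used below
            (2, "Go", 70, "2020-01-10"),  # belongs to a different repo
        ],
    )
    return conn


def code_lines_by_language(conn, repo_id, begin_date, end_date):
    """Hypothetical metric function: code lines per language for one repo.

    ISO-8601 date strings compare correctly as text, so BETWEEN works here.
    """
    params = {"repo_id": repo_id, "begin_date": begin_date, "end_date": end_date}
    return conn.execute(REFINED_QUERY, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(len(conn.execute(BASELINE_QUERY).fetchall()))  # 5
    print(code_lines_by_language(conn, 1, "2020-01-01", "2020-12-31"))
    # [(1, 'augur', 'Python', 150), (1, 'augur', 'SQL', 40)]
```

The inputs are bound as parameters rather than spliced into the SQL string, which keeps the refined query safe from SQL injection.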

Example Query

This example query gets us started on a Labor Effort and Cost endpoint.

  1. What tables?

repo
repo_group

In the Augur Schema, effort and cost data are contained in the repo_labor table, which the query below joins to repo on repo_id.

  2. What might our initial query for building the endpoint look like?

    SELECT C.repo_id,
           C.repo_name,
           programming_language,
           SUM ( estimated_labor_hours ) AS labor_hours,
           -- 50 is a provisional hourly rate
           SUM ( estimated_labor_hours * 50 ) AS labor_cost,
           analysis_date
    FROM
        (
            -- One row per repository and programming language
            SELECT A.repo_id,
                   b.repo_name,
                   programming_language,
                   SUM ( total_lines ) AS repo_total_lines,
                   SUM ( code_lines ) AS repo_code_lines,
                   SUM ( comment_lines ) AS repo_comment_lines,
                   SUM ( blank_lines ) AS repo_blank_lines,
                   AVG ( code_complexity ) AS repo_lang_avg_code_complexity,
                   -- Provisional estimate: average complexity times code lines, plus 20
                   AVG ( code_complexity ) * SUM ( code_lines ) + 20 AS estimated_labor_hours,
                   MAX ( A.rl_analysis_date ) AS analysis_date
            FROM
                repo_labor A,
                repo b
            WHERE
                A.repo_id = b.repo_id
            GROUP BY
                A.repo_id,
                programming_language,
                repo_name
            ORDER BY
                repo_name,
                A.repo_id,
                programming_language
        ) C
    GROUP BY
        repo_id,
        repo_name,
        programming_language,
        C.analysis_date
    ORDER BY
        repo_id,
        programming_language;

  3. Over time, as CHAOSS develops a metric for labor investment, the way this query calculates hours and cost will adapt to whatever formula the CHAOSS community determines is apt.

  4. We will fit this metric into one of the types of metric API endpoints discussed in the next section.
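To see what the labor query computes, here is a sketch that runs it, reformatted, against a tiny in-memory SQLite database. This is an illustration under stated assumptions, not how Augur executes it:

- The toy tables contain only the columns the query references.
- The rows are made up.
- SQLite stands in for Augur's PostgreSQL database. The query only uses SUM, AVG and MAX, which both databases support.

The function name run_labor_query is hypothetical.

```python
import sqlite3

# Toy tables holding only the columns the labor query references; these
# definitions are assumptions for illustration, not Augur's real schema.
SCHEMA = """
CREATE TABLE repo (repo_id INTEGER PRIMARY KEY, repo_name TEXT);
CREATE TABLE repo_labor (
    repo_id INTEGER, programming_language TEXT,
    total_lines INTEGER, code_lines INTEGER, comment_lines INTEGER,
    blank_lines INTEGER, code_complexity INTEGER, rl_analysis_date TEXT
);
"""

# The labor query from this page, with whitespace and alias case tidied.
LABOR_QUERY = """
SELECT c.repo_id, c.repo_name, programming_language,
       SUM(estimated_labor_hours) AS labor_hours,
       SUM(estimated_labor_hours * 50) AS labor_cost,
       analysis_date
FROM (
    SELECT a.repo_id, b.repo_name, programming_language,
           SUM(total_lines) AS repo_total_lines,
           SUM(code_lines) AS repo_code_lines,
           SUM(comment_lines) AS repo_comment_lines,
           SUM(blank_lines) AS repo_blank_lines,
           AVG(code_complexity) AS repo_lang_avg_code_complexity,
           AVG(code_complexity) * SUM(code_lines) + 20 AS estimated_labor_hours,
           MAX(a.rl_analysis_date) AS analysis_date
    FROM repo_labor a, repo b
    WHERE a.repo_id = b.repo_id
    GROUP BY a.repo_id, programming_language, repo_name
    ORDER BY repo_name, a.repo_id, programming_language
) c
GROUP BY repo_id, repo_name, programming_language, c.analysis_date
ORDER BY repo_id, programming_language
"""


def run_labor_query():
    """Load made-up rows into an in-memory database and run the labor query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO repo VALUES (1, 'augur')")
    conn.executemany(
        "INSERT INTO repo_labor VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            # repo_id, language, total, code, comment, blank, complexity, date
            (1, "Python", 130, 100, 20, 10, 4, "2020-01-15"),
            (1, "Python", 60, 50, 5, 5, 2, "2020-01-20"),
            (1, "SQL", 45, 40, 3, 2, 0, "2020-02-01"),
        ],
    )
    return conn.execute(LABOR_QUERY).fetchall()


if __name__ == "__main__":
    for row in run_labor_query():
        print(row)
    # (1, 'augur', 'Python', 470.0, 23500.0, '2020-01-20')
    # (1, 'augur', 'SQL', 20.0, 1000.0, '2020-02-01')
```

The results for the Python rows work out as follows:

- The average complexity is (4 + 2) / 2 = 3, and there are 100 + 50 = 150 code lines.
- The estimated effort is therefore 3 × 150 + 20 = 470 hours.
- The cost is 470 × 50 = 23,500.

The subquery already produces one row per repository and language. The outer query groups by the same columns plus the analysis date, so each outer SUM covers exactly one subquery row.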

Note

Augur uses scc (https://github.com/boyter/scc) to calculate the information contained in the repo_labor table, which is populated by the value_worker tasks.