Configure only an `allowedList` of `:params` to be included in structured logs (confidential notes leaked in production_json.log)
Problem to solve
Currently, `:params` that include private or confidential user content are being recorded in the structured JSON logs. For example, `:notes` on confidential issues or issues in private projects.
This issue is an outcome of the discussion on gitlab-com/gl-infra/infrastructure#6176, so this issue is also being made confidential until resolved. In particular, the `note` fields of issues and merge requests and the `text` fields of wikis of private projects should not be logged.
- Sidney, Systems Administrator, https://design.gitlab.com/research/personas#persona-sidney
- Sam, Security Analyst, https://design.gitlab.com/research/personas#persona-sam
Further details
Sam needs to recreate the activity of an account in addition to what is recorded by the audit events. Enough information should be included in the logs to enable correlating between them. This includes the numeric or text ids involved in the activity.
Sidney, an administrator of a GitLab instance, needs to give Support enough information to debug a problem in a timely manner, without requiring excessive manual redaction or other additional work to share relevant logs.
Proposal
Only `:params` in an `allowedList` should be logged. Using an `allowedList` instead of a `blockedList` reduces the risk of regressions due to new fields being added.
Allowed parameters:
- Any numeric `_ids` (numeric), `userid`
- User name, usually `username`. These are generally not considered private, since profiles can be viewed even if all activity is hidden.
- Other enumerable text fields. For example `target_type`. Expressed another way, fields that can be optimized in Elastic as `keyword` fields without full-text indexing.
- The following fields are sensitive, but required for operation of the site, as they also appear in URIs:
  - `group_id`
  - `namespace_id`
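
The rules above could be sketched roughly as follows. This is an illustrative plain-Ruby example, not GitLab's actual implementation: the constant `ALLOWED_PARAM_KEYS`, the `'[FILTERED]'` placeholder, and the helpers `numeric_id?` and `filter_params_for_log` are hypothetical names, and the assumption that id fields end in `_id` or `_ids` is a guess at the naming convention. Anything that is neither explicitly allowed nor a numeric id is replaced, so a newly added free-text field is filtered by default.

```ruby
# Hypothetical allowlist built from the field names mentioned in this issue.
ALLOWED_PARAM_KEYS = %w[userid username target_type group_id namespace_id].freeze
FILTERED = '[FILTERED]'

# True when the key looks like an id field (assumed suffix: _id / _ids)
# and every value is an integer or a string made only of digits.
def numeric_id?(name, value)
  return false unless name.end_with?('_id', '_ids')

  Array(value).all? { |v| v.is_a?(Integer) || v.to_s.match?(/\A\d+\z/) }
end

# Returns a copy of params that is safe to log: nested hashes are
# filtered recursively, allowed values are kept, everything else
# is replaced with the FILTERED placeholder.
def filter_params_for_log(params)
  params.each_with_object({}) do |(key, value), filtered|
    name = key.to_s
    filtered[name] =
      if value.is_a?(Hash)
        filter_params_for_log(value)
      elsif ALLOWED_PARAM_KEYS.include?(name) || numeric_id?(name, value)
        value
      else
        FILTERED
      end
  end
end
```

For example, `filter_params_for_log(note: { note: 'secret', noteable_id: 7 }, username: 'sam')` keeps `noteable_id` and `username` but replaces the inner `note` text with `'[FILTERED]'`. A blocklist version of the same function would have let the note text through until someone remembered to add `note` to the list.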
What does success look like, and how can we measure that?
- Reduced load (size and/or processing) on GitLab.com logging infrastructure due to large text and other content fields no longer present in the logs.
- Security analysts are able to perform initial triage/investigation from the available logs.
- Compliance requirements regarding access to production logs, if any, are met.
Other notes on issue scope
Additional issues will be needed to address information more sensitive than that listed above that may still be present in the remaining unstructured logs.