
Enable service discovery for schema migrations in k8s deployments for GitLab.com

Context

In Add configuration to use service discovery to i... (#890 - closed) we added a way to configure and use service discovery for schema migrations. This is a follow-up to that issue to actually enable the migrations in our environments.

Update: Based on the recent findings from #1006 (comment 1610698801), I'm rescoping this issue to target only:

  • Asserting that we can use the database FQDN from the registry.
  • Removing the service discovery configuration from the GitLab.com deployments and from the registry configuration setup.

With the above tackled in this issue, we can follow up with:

  • Add a new configuration field (to the registry database config section) for specifying the primary database host used for running migrations: #1142 (closed)
  • Update GitLab.com deployments to use the new configuration in gstg and gprd: #1143 (closed)

Implementation plan

Once we finish verifying that the registry can resolve the primary address of the database (#890 (closed)), we should update the environments (pre, gstg, gprd) so that the migrations are checked and run using service discovery.

This depends on Registry: add database discovery configuration (gitlab-org/charts/gitlab!3142 - merged) and Release Version v3.72.0-gitlab (#999 - closed).

Steps:

  1. Update the K8s workloads on pre and gstg with the service discovery fields (https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/blob/master/releases/gitlab/values/pre.yaml.gotmpl#L56) added via gitlab-org/charts/gitlab!3142 (merged):

       discovery:
         enabled: false
         # In testing mode, the registry will attempt to resolve the configuration when `nameserver` is not empty.
         nameserver: fqdn.of.valid.dns
         port: 53
         primaryrecord: primary.database.fqdn.
         tcp: true

  2. Monitor the logs and search for json.msg: "DNS lookup found SRV records" and json.msg: "DNS lookup found A records", or for related errors.
  3. If we can confirm that the registry can resolve the service names:
    • Add the settings to gprd and do the same checks in the logs
    • Continue with step 4
  4. Make the necessary changes to the registry to use the resolved IP address for the database configuration to run schema migrations.
  5. Release and deploy to pre and test that schema migrations are working.
  6. Proceed to gprd!

If step 3 fails, we will need to work with infra to find the appropriate settings for the nameserver and the primaryrecord.

Note: Depending on the success of this issue, please look at chore(database): set maxopen to 1 for migrations (#922 - closed) and make sure it's still viable. If so, ask for it to be scheduled in milestone N+1.

Edited by SAhmed