Docker compose using depends_on can lead to duplicate graph traversals  #9014

Closed
@rogerhu

Description

docker-compose up fails because it tries to create the same container twice. We are not specifying replicas (the value is left unset), yet Compose appears to create the same container a second time.

 kochiku_redis:
    image: redis:4.0.10
 kochiku_mysql:
    image: mysql57:percona-5.7.30-centos
    hostname: worker-10-132-64-151.ec2
 kochiku_build:
    depends_on:
    - kochiku_mysql
    - kochiku_redis

Can yield:

Container initial_worker-kochiku_mysql-1  Creating
Container initial_worker-kochiku_redis-1  Creating
Container initial_worker-kochiku_mysql-1  Created
Container initial_worker-kochiku_redis-1  Created
Container initial_worker-kochiku_build-1  Creating
Container initial_worker-kochiku_build-1  Creating
Container initial_worker-kochiku_build-1  Created
Error response from daemon: Conflict. The container name "/initial_worker-kochiku_build-1" is already in use by container "bd810aa12bd174ee0c47e0b35c75ae95ade367c89b45b45c632f1c400e3e426c". You have to remove (or rename) that container to be able to reuse that name.

Upon further investigation, the graph traversal algorithm appears to have a race condition. When several services share a common parent, the parent can be traversed more than once, which causes container creation to fail. The algorithm appears to be:

  1. Create the dependency graph. Attach parent/child relationships.
  2. Find all the leaf nodes (kochiku_mysql, kochiku_redis) and start their respective containers up.
  3. Traverse up to the parents and start their containers.

The problem seems to be that both kochiku_mysql and kochiku_redis point to the same parent, and a separate goroutine is launched for each of them. The race appears to be between filterAdjacentByStatusFn and updateStatus: if both calls to updateStatus complete before either goroutine runs filterAdjacentByStatusFn on the parent, the parent is visited twice. I think code needs to be added to check for already-visited parent nodes.
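A minimal sketch of what such an already-visited check could look like, assuming a mutex-protected set keyed by service name. The visitOnce type and its claim method are hypothetical names for illustration and are not part of the Compose code base:

```go
package main

import (
	"fmt"
	"sync"
)

// visitOnce is a hypothetical guard (not Compose code): it records which
// services have already been scheduled, so a parent shared by several
// children is only visited by the first goroutine that reaches it.
type visitOnce struct {
	mu   sync.Mutex
	seen map[string]bool
}

func newVisitOnce() *visitOnce {
	return &visitOnce{seen: make(map[string]bool)}
}

// claim returns true for exactly one caller per service name. The check and
// the update happen under the same lock, so there is no window between them.
func (v *visitOnce) claim(service string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if v.seen[service] {
		return false
	}
	v.seen[service] = true
	return true
}

func main() {
	guard := newVisitOnce()
	var wg sync.WaitGroup
	var mu sync.Mutex
	created := 0

	// Both children finish and each tries to start the shared parent.
	for _, child := range []string{"kochiku_mysql", "kochiku_redis"} {
		wg.Add(1)
		go func(child string) {
			defer wg.Done()
			if guard.claim("kochiku_build") {
				mu.Lock()
				created++ // stands in for creating the container
				mu.Unlock()
			}
		}(child)
	}
	wg.Wait()
	fmt.Println("kochiku_build created", created, "time(s)")
}
```

The point of the sketch is that the "has this node been visited?" test and the "mark it visited" write are a single atomic step, so whichever goroutine arrives second backs off instead of creating the container again.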

Steps to reproduce the issue:

What I did

  • Running the revised test in "Docker compose bug for common parents" #9013, I stepped through the code section between lines 93 and 102:
    for _, node := range nodes {
        // Don't start this service yet if all of its children have
        // not been started yet.
        if len(traversalConfig.filterAdjacentByStatusFn(graph, node.Service, traversalConfig.adjacentServiceStatusToSkip)) != 0 {
            continue
        }
        node := node
        eg.Go(func() error {
            err := fn(ctx, node.Service)
            if err != nil {
                return err
            }
            graph.UpdateStatus(node.Service, traversalConfig.targetServiceStatus)
            return run(ctx, graph, eg, traversalConfig.adjacentNodesFn(node), traversalConfig, fn)
        })
    }
    In particular, I made sure the goroutine at line 102 fired twice to force the race condition.

Related issue

Describe the results you received:

Every so often, I'd see:

kochiku_redis
kochiku_mysql
kochiku_build
kochiku_build

Describe the results you expected:

Every run should visit each service exactly once (the GOOD case), rather than varying between that and the output above:

kochiku_redis
kochiku_mysql
kochiku_build

Additional information you deem important (e.g. issue happens only occasionally):

$ docker-compose version
Docker Compose version v2.0.1

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.3-docker)
  scan: Docker Scan (Docker Inc., v0.9.0)

Additional environment details:

Reproduced locally on macOS with IntelliJ and the latest master. The problem also shows up on the latest Linux Docker versions.
