MongoDB | Model Tree Structures

In MongoDB, there are several ways to represent tree data structures.

Parent References

Here’s what the tree might look like:

- Comment 1 (id: ObjectId("60925b2f912bf6f590cdd3a3"))
  - Comment 2 (id: ObjectId("60925b8a912bf6f590cdd3a6"))
    - Comment 3 (id: ObjectId("60925b9c912bf6f590cdd3a7"))
      - Comment 4 (id: ObjectId("60925bb4912bf6f590cdd3a8"))
    - Comment 5 (id: ObjectId("60925bc4912bf6f590cdd3a9"))
- Comment 6 (id: ObjectId("60925bd5912bf6f590cdd3aa"))

In this approach, each document has a reference to its parent. For example, consider a directory structure where each directory can have subdirectories:

{
  "_id": ObjectId("60925b2f912bf6f590cdd3a3"),
  "parent": null,
  "text": "This is a top level comment"
},
{
  "_id": ObjectId("60925b8a912bf6f590cdd3a6"),
  "parent": ObjectId("60925b2f912bf6f590cdd3a3"),
  "text": "This is a reply to the first comment"
},
{
  "_id": ObjectId("60925b9c912bf6f590cdd3a7"),
  "parent": ObjectId("60925b8a912bf6f590cdd3a6"),
  "text": "This is a reply to the second comment"
},
{
  "_id": ObjectId("60925bb4912bf6f590cdd3a8"),
  "parent": ObjectId("60925b9c912bf6f590cdd3a7"),
  "text": "This is a reply to the third comment"
},
{
  "_id": ObjectId("60925bc4912bf6f590cdd3a9"),
  "parent": ObjectId("60925b8a912bf6f590cdd3a6"),
  "text": "This is a reply to the second comment"
},
{
  "_id": ObjectId("60925bd5912bf6f590cdd3aa"),
  "parent": null,
  "text": "This is a top level comment"
}

In this example, each document has a parent field that contains the ID of its parent document. The root directory has a null value for its parent field.

Child References

Here’s what the tree might look like:

- Electronics (id: ObjectId("60925af9912bf6f590cdd3a1"))
  - Phones (id: ObjectId("60925b0a912bf6f590cdd3a2"))
    - Smartphones (id: ObjectId("60925b23912bf6f590cdd3a4"))
    - Feature Phones (id: ObjectId("60925b31912bf6f590cdd3a5"))
  - Computers (id: ObjectId("60925b1a912bf6f590cdd3a3"))

In this approach, each document has a reference to its children. For example, consider a category tree where each category can have subcategories:

{
  "_id": ObjectId("60925af9912bf6f590cdd3a1"),
  "name": "Electronics",
  "children": [
    ObjectId("60925b0a912bf6f590cdd3a2"),
    ObjectId("60925b1a912bf6f590cdd3a3")
  ],
  "parent": null
},
{
  "_id": ObjectId("60925b0a912bf6f590cdd3a2"),
  "name": "Phones",
  "children": [
    ObjectId("60925b23912bf6f590cdd3a4"),
    ObjectId("60925b31912bf6f590cdd3a5")
  ],
  "parent": ObjectId("60925af9912bf6f590cdd3a1")
},
{
  "_id": ObjectId("60925b1a912bf6f590cdd3a3"),
  "name": "Computers",
  "children": [],
  "parent": ObjectId("60925af9912bf6f590cdd3a1")
},
{
  "_id": ObjectId("60925b23912bf6f590cdd3a4"),
  "name": "Smartphones",
  "children": [],
  "parent": ObjectId("60925b0a912bf6f590cdd3a2")
},
{
  "_id": ObjectId("60925b31912bf6f590cdd3a5"),
  "name": "Feature Phones",
  "children": [],
  "parent": ObjectId("60925b0a912bf6f590cdd3a2")
}

In this example, each document has a children field that contains an array of IDs of its child documents. The root category has a null value for its parent field.

Materialized Paths

Here’s what the tree might look like:

1
└── 2
    ├── 3
    │   └── 4
    └── 5
6

In this approach, each document contains a path field that represents its position in the tree. For example, consider a comment tree where each comment can have child comments:

{
  "_id": ObjectId("60925b8a912bf6f590cdd3a6"),
  "path": "1/2/",
  "text": "This is a reply to the first comment",
},
{
  "_id": ObjectId("60925b9c912bf6f590cdd3a7"),
  "path": "1/2/3/",
  "text": "This is a reply to the second comment",
},
{
  "_id": ObjectId("60925bb4912bf6f590cdd3a8"),
  "path": "1/2/3/4/",
  "text": "This is a reply to the third comment",
},
{
  "_id": ObjectId("60925bc4912bf6f590cdd3a9"),
  "path": "1/2/5/",
  "text": "This is a reply to the second comment",
},
{
  "_id": ObjectId("60925bd5912bf6f590cdd3aa"),
  "path": "6/",
  "text": "This is a top level comment",
}

Each document has a path field that represents its position in the tree. The path field contains the IDs of all its ancestors in the tree, separated by slashes. For example, the first comment has a path of 1/ because it is a child of the top level comment. The second comment has a path of 1/2/ because it is a child of the first comment, which is a child of the top level comment. The third comment has a path of 1/2/3/ because it is a child of the second comment, which is a child of the first comment, which is a child of the top level comment.

Nested Sets

The Nested Sets pattern identifies each node in the tree as stops in a round-trip traversal of the tree. The application visits each node in the tree twice; first during the initial trip, and second during the return trip. Example of how to represent the same tree:

{
  "_id": ObjectId("60925b2f912bf6f590cdd3a3"),
  "left": 1,
  "right": 12,
  "text": "This is a top level comment"
},
{
  "_id": ObjectId("60925b8a912bf6f590cdd3a6"),
  "left": 2,
  "right": 5,
  "text": "This is a reply to the first comment"
},
{
  "_id": ObjectId("60925b9c912bf6f590cdd3a7"),
  "left": 6,
  "right": 9,
  "text": "This is a reply to the second comment"
},
{
  "_id": ObjectId("60925bb4912bf6f590cdd3a8"),
  "left": 10,
  "right": 11,
  "text": "This is a reply to the third comment"
},
{
  "_id": ObjectId("60925bc4912bf6f590cdd3a9"),
  "left": 7,
  "right": 8,
  "text": "This is a reply to the second comment"
},
{
  "_id": ObjectId("60925bd5912bf6f590cdd3aa"),
  "left": 3,
  "right": 4,
  "text": "This is a top level comment"
}

Each document has a left and right field that represent its position in the tree. The left and right values are assigned in a way that guarantees that the range between left and right contains all the descendants of the node. For example, the top level comment has a left of 1 and a right of 12, which means that all its descendants have a left value greater than 1 and a right value less than 12. The first comment has a left of 2 and a right of 5, which means that all its descendants have a left value greater than 2 and a right value less than 5. The second comment has a left of 6 and a right of 9, which means that all its descendants have a left value greater than 6 and a right value less than 9. And so on.

Array of Ancestors

One approach is to use the Array of Ancestors model, which involves adding an ancestors array to each document in the tree that contains the IDs of all its ancestors.

Here’s an example to demonstrate how the Array of Ancestors model works. Let’s say we have a simple tree structure consisting of comments on a blog post. Each comment can have zero or more child comments, forming a hierarchy. The top level comments have no parent and form the root of the tree. Here’s what the tree might look like:

Top Level Comment 1 (_id:60925b2f912bf6f590cdd3a3)
|
├── Reply to Comment 1 (_id:60925b8a912bf6f590cdd3a6)
|   ├── Reply to Comment 2 (_id:60925b9c912bf6f590cdd3a7)
|   |   └── Reply to Comment 3 (_id:60925bb4912bf6f590cdd3a8)
|   └── Reply to Comment 2 (_id:60925bc4912bf6f590cdd3a9)
|
Top Level Comment 2 (_id:60925bd5912bf6f590cdd3aa)

In this example, Comment 1 is the top level comment and has two children (Comment 2 and Comment 6). Comment 2 has two children (Comment 3 and Comment 5), and Comment 3 has one child (Comment 4).

To represent this tree using the Array of Ancestors model, we would add an ancestors field to each comment document that contains an array of ancestor IDs. For example:

{
  "_id": ObjectId("60925b2f912bf6f590cdd3a3"),
  "ancestors": [], // no ancestors, this is the top level comment
  "text": "This is a top level comment"
},
{
  "_id": ObjectId("60925b8a912bf6f590cdd3a6"),
  "ancestors": [
    ObjectId("60925b2f912bf6f590cdd3a3")
  ],
  "text": "This is a reply to the first comment"
},
{
  "_id": ObjectId("60925b9c912bf6f590cdd3a7"),
  "ancestors": [
    ObjectId("60925b2f912bf6f590cdd3a3"),
    ObjectId("60925b8a912bf6f590cdd3a6")
  ],
  "text": "This is a reply to the second comment"
},
{
  "_id": ObjectId("60925bb4912bf6f590cdd3a8"),
  "ancestors": [
    ObjectId("60925b2f912bf6f590cdd3a3"),
    ObjectId("60925b8a912bf6f590cdd3a6"),
    ObjectId("60925b9c912bf6f590cdd3a7")
  ],
  "text": "This is a reply to the third comment"
},
{
  "_id": ObjectId("60925bc4912bf6f590cdd3a9"),
  "ancestors": [
    ObjectId("60925b2f912bf6f590cdd3a3"),
    ObjectId("60925b8a912bf6f590cdd3a6")
  ],
  "text": "This is a reply to the second comment"
},
{
  "_id": ObjectId("60925bd5912bf6f590cdd3aa"),
  "ancestors": [], // no ancestors, this is the top level comment
  "text": "This is a top level comment"
}

Each document has an ancestors field that contains an array of object IDs representing all its ancestors in the tree, starting from the top level comment and ending with its immediate parent. For example, the first comment has an empty ancestors array because.