Here are some critical questions about the data structures.
cgroup carries tasks and controller information, constructs a hierarchy, and
acts as the interface that interacts with users through kernfs.
As the name suggests, css (cgroup_subsys_state) is the data structure for a cgroup-subsystem pair.
Take the blkio subsystem as an example:
blkcg_css_alloc()
---
mutex_lock(&blkcg_pol_mutex);
if (!parent_css) {
blkcg = &blkcg_root;
} else {
blkcg = kzalloc(sizeof(*blkcg), GFP_KERNEL);
}
for (i = 0; i < BLKCG_MAX_POLS ; i++) {
struct blkcg_policy *pol = blkcg_policy[i];
struct blkcg_policy_data *cpd;
cpd = pol->cpd_alloc_fn(GFP_KERNEL);
blkcg->cpd[i] = cpd;
cpd->blkcg = blkcg;
cpd->plid = i;
if (pol->cpd_init_fn)
pol->cpd_init_fn(cpd);
}
list_add_tail(&blkcg->all_blkcgs_node, &all_blkcgs);
mutex_unlock(&blkcg_pol_mutex);
return &blkcg->css;
---
css looks like the VFS inode in a filesystem. Most of the time, a filesystem
embeds the VFS inode into its own inode structure; likewise, blkcg embeds a css,
and css/blkcg contains the information to control the IO of one cgroup.
A css_set, in turn, is a collection of css's from different hierarchies.
Look at the task_struct: a task is linked into a css_set and also gets
its css information through the css_set.
task_struct
---
#ifdef CONFIG_CGROUPS
/* Control Group info protected by css_set_lock: */
struct css_set __rcu *cgroups;
/* cg_list protected by css_set_lock and tsk->alloc_lock: */
struct list_head cg_list;
#endif
---
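To keep the pointer relationships straight, here is a heavily simplified, hypothetical
sketch of the three structures; the field names follow the kernel, but everything else
(locking, RCU annotations, refcounting) is stripped, and NSUBSYS is a stand-in for the
kernel's CGROUP_SUBSYS_COUNT.
---
/* Simplified sketch, not the real kernel definitions. */
#define NSUBSYS 4                        /* stand-in for CGROUP_SUBSYS_COUNT */

struct list_head { struct list_head *next, *prev; };

struct cgroup_subsys_state {             /* "css": one subsystem on one cgroup */
        struct cgroup *cgroup;           /* the cgroup this css belongs to */
        struct cgroup_subsys_state *parent;
};

struct css_set {                         /* one css pointer per subsystem */
        struct cgroup_subsys_state *subsys[NSUBSYS];
        struct list_head tasks;          /* heads the list of member tasks */
};

struct task_struct {                     /* only the CONFIG_CGROUPS part */
        struct css_set *cgroups;         /* the css_set this task uses */
        struct list_head cg_list;        /* linked into css_set.tasks */
};
---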
blkcg_css()
-> task_css(current, io_cgrp_id);
-> task_css_check(task, subsys_id, false)
-> task_css_set_check((task), (__c))->subsys[(subsys_id)]
-> rcu_dereference_check((task)->cgroups, \
css_set_move_task()
---
cgroup_move_task(task, to_cset);
list_add_tail(&task->cg_list, use_mg_tasks ? &to_cset->mg_tasks :
&to_cset->tasks);
---
Why do we need css_set? The answer is cgroup V1 and its special hierarchy.
cpu-cg-root ' mem-cg-root ' blkio-cg-root
/ \ ' / \ ' / \
cpu-cg-0 cpu-cg-1 ' mem-cg-0 mem-cg-1 ' blkio-cg-0 blkio-cg-1
/ \ ' / \ ' / \
cpu-cg-2 cpu-cg-3' mem-cg-2 mem-cg-3' blkio-cg-2 blkio-cg-3
' '
In cgroup V1, different subsystems have different hierarchies, and
- every cgroup can have only one subsystem
- every task can be attached to different hierarchies
Due to this, css_set was created to bind these tasks and css's together.
When cgroup V2 was introduced, css_set was still needed to stay compatible with V1.
Does every cgroup then get its own css for every subsystem? The answer is NO!
In cgroup V2, a cgroup shares its parent's css if the subsystem is not enabled on it.
root (cpu, pid, io, mem)
/ \
cg0 (cpu, pid) cg1 (io, mem)
/ \
/ \
t0 t1 cg2 (io)
/\
t2 t3
t0&t1 -> cpu: cg.root, pid: cg.root, io: cg1, mem: cg1
t2&t3 -> cpu: cg.root, pid: cg.root, io: cg2, mem: cg1
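A minimal sketch of how such an effective-css lookup could work; this is a
hypothetical simplification of the kernel's cgroup_e_css(), assuming a plain
parent pointer and a subsys[] array and ignoring RCU:
---
/* Walk up until a cgroup with a css for @ssid is found. */
static struct cgroup_subsys_state *
effective_css(struct cgroup *cgrp, int ssid)
{
        while (cgrp && !cgrp->subsys[ssid])
                cgrp = cgrp->parent;     /* cgroup_parent() in the kernel */
        return cgrp ? cgrp->subsys[ssid] : NULL;
}
---
With this, t2's io css resolves to cg2's own css, while its mem css falls back
to cg1's and its cpu css to the root's.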
Is the mapping between cgroup and css_set then one-to-one? The answer is NO!
In cgroup V1, there can be multiple hierarchies, and then
a cgroup can contain tasks from different css_sets.
Look at the following example:
cpu-root mem-root blkio-root
/ \ / \ / \
cg0 cg1 cg2 cg3 cg4 cg5
(t0, t1) (t0) (t1) (t0) (t1)
t0 is in css-set (cpu-cg1, mem-cg2, blkio-cg4)
t1 is in css-set (cpu-cg1, mem-cg3, blkio-cg5)
In this case, cg1 is associated with two css-sets.
In fact, the case above is the 'flexibility' provided by
cgroup V1. However, do we really need it? Most of the time,
t0 and t1 have the same shape in the different hierarchies, as follows:
cpu-root mem-root blkio-root
/ \ / \ / \
cg0 cg1 cg2 cg3 cg4 cg5
(t0) (t1) (t0) (t1) (t0) (t1)
There is no such issue in cgroup V2, which has a unified hierarchy.
To understand the life cycle of the components in cgroup, we must know the
reference relationships between them.
task
/
css_set ----- css
\ /
cgroup
(1) task.cgroups points to a css_set
(2) task.cg_list is linked into css_set.tasks
These relationships are set up in
css_set_move_task()
---
cgroup_move_task(task, to_cset);
list_add_tail(&task->cg_list, use_mg_tasks ? &to_cset->mg_tasks :
&to_cset->tasks);
---
(1) css->cgroup
(2) cgroup->subsys[] -> css
css_create()
---
css = ss->css_alloc(parent_css);
...
init_and_link_css(css, ss, cgrp);
...
online_css()
---
(1) cgroup->cset_links <--> cgrp_cset_link <--> cgrp_links<-css_set
(2) cgroup->e_csets[] -> css_set->e_cset_node[]
(3) css_set->subsys[] -> css
All of the relationships above are built in find_css_set().
find_css_set() sets up a new css_set based on old_cset, with one cgroup updated.
The first step is to build a css template based on the old cset and the new cgroup.
find_css_set()
-> find_existing_css_set()
---
for_each_subsys(ss, i) {
if (root->subsys_mask & (1UL << i)) {
/*
* @ss is in this hierarchy, so we want the
^^^^^^^^^^^^^^^^^^^^^^^^
* effective css from @cgrp. If one css is not enabled in the cgroup,
* use its ancestor's.
*/
template[i] = cgroup_e_css_by_mask(cgrp, ss)
} else {
/*
* @ss is not in this hierarchy, so we don't want
* to change the css.
*/
template[i] = old_cset->subsys[i];
}
}
---
find_css_set()
---
memcpy(cset->subsys, template, sizeof(cset->subsys)) (3)
...
spin_lock_irq(&css_set_lock);
/* Add reference counts and links from the new css_set. */
list_for_each_entry(link, &old_cset->cgrp_links, cgrp_link) {
struct cgroup *c = link->cgrp;
if (c->root == cgrp->root)
c = cgrp;
link_css_set(&tmp_links, cset, c); (1)
}
...
for_each_subsys(ss, ssid) {
struct cgroup_subsys_state *css = cset->subsys[ssid];
list_add_tail(&cset->e_cset_node[ssid],
&css->cgroup->e_csets[ssid]); (2)
css_get(css);
}
---
In summary, all of these links are created in find_css_set(), and when a
css_set is created, its base reference count is initialized to 1:
find_css_set()
---
refcount_set(&cset->refcount, 1);
---
When a task is moved into a cgroup,
cgroup_migrate_execute()
---
spin_lock_irq(&css_set_lock);
list_for_each_entry(cset, &tset->src_csets, mg_node) {
list_for_each_entry_safe(task, tmp_task, &cset->mg_tasks, cg_list) {
get_css_set(to_cset);
to_cset->nr_tasks++;
css_set_move_task(task, from_cset, to_cset, true);
from_cset->nr_tasks--;
put_css_set_locked(from_cset);
}
}
spin_unlock_irq(&css_set_lock);
---
When a task is moved out of a cgroup,
see cgroup_migrate_execute() above.
When a task exits,
release_task()
-> cgroup_release()
---
spin_lock_irq(&css_set_lock);
css_set_skip_task_iters(task_css_set(task), task);
list_del_init(&task->cg_list);
spin_unlock_irq(&css_set_lock);
---
__put_task_struct()
-> cgroup_free()
-> put_css_set()
When moving a task from one cgroup to another, or in other words,
moving a task from one css_set to another, there can be multiple
tasks to move when a pid is echoed to cgroup.procs. To guarantee 'all or nothing',
the whole process is split into 4 steps:
(1) prepare the source css_set; get_css_set is invoked there
(2) prepare the destination css_set with find_css_set;
    get_css_set is invoked, or an initial css_set is created
(3) do the real migration: get a reference on the destination css_set
    and put the reference on the source css_set
(4) finish the migration: put_css_set_locked all of the source and destination
    css_sets. See cgroup_migrate_finish()
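From userland, this whole 4-step dance is triggered by a single write; a
minimal sketch (the cgroup path and the PID are assumptions for illustration):
---
#include <stdio.h>

int main(void)
{
        /* Writing a PID to cgroup.procs kicks off the migrate path above. */
        FILE *f = fopen("/sys/fs/cgroup/test/cgroup.procs", "w");
        if (!f) { perror("fopen"); return 1; }
        fprintf(f, "%d\n", 1234);       /* hypothetical PID to migrate */
        fclose(f);
        return 0;
}
---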
A cgroup's reference is based on cgroup.self, which is a css.
When a new cgroup is created,
cgroup_create()
---
cgroup_get_live(parent);
---
When a cgroup is linked with a css,
init_and_link_css()
-> cgroup_get_live(cgrp)
When a cgroup is linked with a css_set,
link_css_set()
---
if (cgroup_parent(cgrp)) //non-root cgroup
cgroup_get_live(cgrp);
---
Base reference:
cgroup_rmdir()
-> cgroup_destroy_locked()
-> percpu_ref_kill(&cgrp->self.refcnt)
When a css_set is freed,
put_css_set_locked()
---
list_for_each_entry_safe(link, tmp_link, &cset->cgrp_links, cgrp_link) {
list_del(&link->cset_link);
list_del(&link->cgrp_link);
if (cgroup_parent(link->cgrp))
cgroup_put(link->cgrp);
kfree(link);
}
---
When a css is freed,
css_free_rwork_fn()
---
if (ss) {
/* css free path */
struct cgroup_subsys_state *parent = css->parent;
int id = css->id;
ss->css_free(css);
cgroup_idr_remove(&ss->css_idr, id);
cgroup_put(cgrp);
if (parent)
css_put(parent);
}
---
There are two reference counts working for a css:
(1) css.refcnt, a percpu_ref
(2) css.online_cnt
refcnt:
find_css_set()
---
for_each_subsys(ss, ssid) {
struct cgroup_subsys_state *css = cset->subsys[ssid];
list_add_tail(&cset->e_cset_node[ssid],
&css->cgroup->e_csets[ssid]);
css_get(css);
}
---
online_cnt:
online_css()
---
atomic_inc(&css->online_cnt);
if (css->parent)
atomic_inc(&css->parent->online_cnt);
---
online_cnt:
cgroup_destroy_locked()
---
for_each_css(css, ssid, cgrp)
kill_css(css);
---
kill_css()
---
css_get(css);
percpu_ref_kill_and_confirm(&css->refcnt, css_killed_ref_fn);
---
This is the percpu_ref confirm callback, not the drain callback.
                       ^^^^^^^^^^^^^^^^          ^^^^^^^^^^^^^^
The confirm callback means all of the CPUs have seen the DEAD flag on the percpu_ref.
The drain callback of css is css_release().
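As a reminder of the percpu_ref protocol itself, here is a minimal kernel-style
sketch (not standalone); the my_* names are hypothetical, the percpu_ref calls
are the real API:
---
#include <linux/percpu-refcount.h>

static void my_confirm(struct percpu_ref *ref)
{
        /* runs once every CPU has observed the DEAD flag;
         * css_killed_ref_fn() plays this role for css */
}

static void my_release(struct percpu_ref *ref)
{
        /* runs when the count finally drops to zero;
         * css_release() plays this role for css */
}

static int my_setup(struct percpu_ref *ref)
{
        return percpu_ref_init(ref, my_release, 0, GFP_KERNEL);
}

static void my_teardown(struct percpu_ref *ref)
{
        percpu_ref_kill_and_confirm(ref, my_confirm);
}
---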
css_killed_ref_fn()
---
if (atomic_dec_and_test(&css->online_cnt)) {
INIT_WORK(&css->destroy_work, css_killed_work_fn);
queue_work(cgroup_destroy_wq, &css->destroy_work);
}
---
refcnt:
put_css_set_locked()
---
for_each_subsys(ss, ssid) {
list_del(&cset->e_cset_node[ssid]);
css_put(cset->subsys[ssid]);
}
---
css_killed_work_fn()
---
do {
offline_css(css);
css_put(css);
/* @css can't go away while we're holding cgroup_mutex */
css = css->parent;
} while (css && atomic_dec_and_test(&css->online_cnt));
---
.-------------------.
/ v
task -> css_set <-> cgroup <-> css
[1]
-> : holds a reference
[1]: cgroup holds the css.online_cnt, and they are bound together.
Consider the following cases:
(1) Task A in cg0 issues IO
At this moment, the task holds a reference on the css_set, the css_set holds
a reference on the cgroup, and the cgroup holds a reference on the css, so the
blkcg (css) will not go away during this.
(2) cgroup writeback control
Here, the css info is carried by the writeback kworker, so the kworker
needs to hold a reference on the css to keep it alive.
Refer to cgwb_create(), which holds references on memcg_css and blkcg_css.
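The pattern in case (2) is the usual get/put pairing around an async carrier;
a hedged, kernel-style sketch in the spirit of cgwb_create() (struct
my_writeback and the my_wb_* helpers are hypothetical):
---
struct my_writeback {
        struct cgroup_subsys_state *blkcg_css;
};

static void my_wb_init(struct my_writeback *wb,
                       struct cgroup_subsys_state *blkcg_css)
{
        css_get(blkcg_css);             /* pin: the worker may outlive the task */
        wb->blkcg_css = blkcg_css;
}

static void my_wb_exit(struct my_writeback *wb)
{
        css_put(wb->blkcg_css);         /* unpin when the worker is done */
}
---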
Do we need css_get during cgroup file read/write methods?
Look at the comment in cgroup_file_write():
/*
* kernfs guarantees that a file isn't deleted with operations in
* flight, which means that the matching css is and stays alive and
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* doesn't need to be pinned. The RCU locking is not necessary
^^^^^^^^^^^^^^^^^^^^^^^^^
* either. It's just for the convenience of using cgroup_css().
*/
How does the cgroup core implement this?
Look at the code of cgroup_destroy_locked():
---
/* initiate massacre of all css's */
for_each_css(css, ssid, cgrp)
kill_css(css);
/* clear and remove @cgrp dir, @cgrp has an extra ref on its kn */
css_clear_dir(&cgrp->self);
kernfs_remove(cgrp->kn);
...
percpu_ref_kill(&cgrp->self.refcnt);
---
The css's percpu ref and online_cnt are put after kill_css(), and
kernfs_remove() is invoked after that. How is it guaranteed that the css isn't
freed during a cgroup file read/write method, where the css is referenced?
The magic is cgroup_mutex.
cgroup_rmdir()
---
cgrp = cgroup_kn_lock_live(kn, false);
---
if (drain_offline)
cgroup_lock_and_drain_offline(cgrp);
else
mutex_lock(&cgroup_mutex);
---
ret = cgroup_destroy_locked(cgrp);
cgroup_kn_unlock(kn);
---
cgroup_destroy_locked() is invoked under cgroup_mutex.
Then let's look at
css_killed_work_fn()
---
mutex_lock(&cgroup_mutex);
do {
offline_css(css);
css_put(css);
/* @css can't go away while we're holding cgroup_mutex */
css = css->parent;
} while (css && atomic_dec_and_test(&css->online_cnt));
mutex_unlock(&cgroup_mutex);
---
The cgroup_mutex guarantees the css won't be offlined before
cgroup_destroy_locked() returns.
In this section, we would like to know how cgroup interacts with users in userland through kernfs.
Let's start from the root of the cgroup fs, which is initialized in cgroup_init().
cgroup_init()
---
BUG_ON(cgroup_init_cftypes(NULL, cgroup_base_files));
BUG_ON(cgroup_init_cftypes(NULL, cgroup1_base_files));
...
BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
---
These are the keys to the cgroup kernfs: files and hierarchy.
cgroup_base_files is an array of cftype which defines the control files in cgroup directories,
such as 'cgroup.type' and 'cgroup.procs'. cgroup_init_cftypes mainly assigns a kernfs_ops to every cftype.
static struct kernfs_ops cgroup_kf_ops = {
.atomic_write_len = PAGE_SIZE,
.open = cgroup_file_open,
.release = cgroup_file_release,
.write = cgroup_file_write,
.poll = cgroup_file_poll,
.seq_start = cgroup_seqfile_start,
.seq_next = cgroup_seqfile_next,
.seq_stop = cgroup_seqfile_stop,
.seq_show = cgroup_seqfile_show,
};
cgroup_kf_ops is a wrapper which translates the semantics from kernfs to cgroup.
cgroup_file_write()
---
struct cgroup *cgrp = of->kn->parent->priv;
struct cftype *cft = of->kn->priv;
//the parent, namely the directory, stands for the cgroup
//the file is an attribute in the cgroup directory
...
if (cft->write)
return cft->write(of, buf, nbytes, off);
rcu_read_lock();
css = cgroup_css(cgrp, cft->ss);
rcu_read_unlock();
if (cft->write_u64) {
unsigned long long v;
ret = kstrtoull(buf, 0, &v);
if (!ret)
ret = cft->write_u64(css, cft, v);
} else if (cft->write_s64) {
long long v;
ret = kstrtoll(buf, 0, &v);
if (!ret)
ret = cft->write_s64(css, cft, v);
}
---
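For comparison, a hedged sketch of what a controller knob wired through
read_u64/write_u64 could look like; all of the my_* names are hypothetical,
the cftype fields are real:
---
static u64 my_weight_read(struct cgroup_subsys_state *css,
                          struct cftype *cft)
{
        return 100;     /* placeholder: return the stored value here */
}

static int my_weight_write(struct cgroup_subsys_state *css,
                           struct cftype *cft, u64 val)
{
        return 0;       /* placeholder: validate and store 'val' here */
}

static struct cftype my_files[] = {
        {
                .name = "weight",
                .read_u64 = my_weight_read,
                .write_u64 = my_weight_write,
        },
        { }     /* terminator */
};
---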
kernfs provides two guarantees regarding a file: operations are serialized by
of->mutex, and a file isn't deleted while operations on it are in flight.
For the read path,
kernfs_fop_open()
-> seq_open(file, &kernfs_seq_ops)
kernfs_fop_read()
-> seq_read()
-> kernfs_seq_start()
-> mutex_lock(&of->mutex)
-> kernfs_seq_stop()
-> mutex_unlock(&of->mutex)
kernfs_fop_write()
---
mutex_lock(&of->mutex);
if (!kernfs_get_active(of->kn)) {
mutex_unlock(&of->mutex);
len = -ENODEV;
goto out_free;
}
ops = kernfs_ops(of->kn);
if (ops->write)
len = ops->write(of, buf, len, *ppos);
else
len = -EINVAL;
kernfs_put_active(of->kn);
mutex_unlock(&of->mutex);
---
kernfs_get_active()
---
if (!atomic_inc_unless_negative(&kn->active))
return NULL;
---
void kernfs_put_active(struct kernfs_node *kn)
{
int v;
if (unlikely(!kn))
return;
if (kernfs_lockdep(kn))
rwsem_release(&kn->dep_map, _RET_IP_);
v = atomic_dec_return(&kn->active);
if (likely(v != KN_DEACTIVATED_BIAS))
return;
wake_up_all(&kernfs_root(kn)->deactivate_waitq);
}
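kernfs' active reference can be modeled in userspace with C11 atomics; the toy
below is a sketch of the protocol (in the kernel KN_DEACTIVATED_BIAS is INT_MIN,
and the busy-waits stand in for sleeping on deactivate_waitq):
---
#include <stdatomic.h>
#include <stdbool.h>

#define DEACTIVATED_BIAS (-2000000)     /* toy stand-in for KN_DEACTIVATED_BIAS */

static atomic_int active;

static bool get_active(void)            /* like kernfs_get_active() */
{
        int v = atomic_load(&active);
        while (v >= 0)                  /* atomic_inc_unless_negative() */
                if (atomic_compare_exchange_weak(&active, &v, v + 1))
                        return true;
        return false;                   /* node is being removed */
}

static void put_active(void)            /* like kernfs_put_active() */
{
        if (atomic_fetch_sub(&active, 1) - 1 == DEACTIVATED_BIAS)
                ;                       /* last user: wake the remover here */
}

static void deactivate(void)            /* like __kernfs_remove()/kernfs_drain() */
{
        atomic_fetch_add(&active, DEACTIVATED_BIAS);
        while (atomic_load(&active) != DEACTIVATED_BIAS)
                ;                       /* the kernel sleeps on deactivate_waitq */
}

int main(void)
{
        if (get_active()) {
                /* ... operate on the node ... */
                put_active();
        }
        deactivate();                   /* new get_active() calls now fail */
        return 0;
}
---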
__kernfs_remove()
---
/*
* Kill all of its children
*/
pos = NULL;
while ((pos = kernfs_next_descendant_post(pos, kn)))
if (kernfs_active(pos))
atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
/* deactivate and unlink the subtree node-by-node */
do {
pos = kernfs_leftmost_descendant(kn);
kernfs_get(pos);
if (kn->flags & KERNFS_ACTIVATED)
kernfs_drain(pos);
---
/* but everyone should wait for draining */
wait_event(root->deactivate_waitq,
atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
---
else
WARN_ON_ONCE(atomic_read(&kn->active) != KN_DEACTIVATED_BIAS);
...
kernfs_put(pos);
} while (pos != kn);
---
Currently, there are two different kinds of cgroup hierarchy in the kernel:
cpu-cg-root ' mem-cg-root ' blkio-cg-root
/ \ ' / \ ' / \
cpu-cg-0 cpu-cg-1 ' mem-cg-0 mem-cg-1 ' blkio-cg-0 blkio-cg-1
/ \ ' / \ ' / \
cpu-cg-2 cpu-cg-3' mem-cg-2 mem-cg-3' blkio-cg-2 blkio-cg-3
' '
cg-dfl-root (cpu, mem, blkio)
/ \
cg-0(cpu, mem) cg-1 (cpu, mem, blkio)
/ \
cg-2 (blkio) cg-3(blkio)
To understand a filesystem, we must know how to mkfs, mount, and use it.
cgroup_init()
-> cgroup_setup_root(&cgrp_dfl_root)
---
kf_sops = root == &cgrp_dfl_root ?
&cgroup_kf_syscall_ops : &cgroup1_kf_syscall_ops;
root->kf_root = kernfs_create_root(kf_sops,
KERNFS_ROOT_CREATE_DEACTIVATED |
KERNFS_ROOT_SUPPORT_EXPORTOP |
KERNFS_ROOT_SUPPORT_USER_XATTR,
root_cgrp);
root_cgrp->kn = root->kf_root->kn;
...
/*
Add the cgroup_base_files into the kernfs_root
Refer to __kernfs_create_file: kernfs_new_node creates a new kernfs_node,
and kernfs_add_one adds this new node into its parent's children
rb_tree, which is indexed by name hash.
*/
ret = css_populate_dir(&root_cgrp->self);
---
kernfs_root is analogous to the superblock of a normal disk filesystem.
There are two important fields in it:
- kernfs_node, which can be deemed the root inode on disk
- kernfs_syscall_ops, vfs op -> kernfs op -> syscall op
^^^^^^^^^^
kernfs_fill_super
---
// This could be deemed as loading root inode
mutex_lock(&kernfs_mutex);
inode = kernfs_get_inode(sb, info->root->kn);
mutex_unlock(&kernfs_mutex);
...
/* instantiate and link root dentry */
root = d_make_root(inode);
...
sb->s_root = root;
---
Let's look at the inode operations of kernfs
const struct inode_operations kernfs_dir_iops = {
.lookup = kernfs_iop_lookup,
.permission = kernfs_iop_permission,
.setattr = kernfs_iop_setattr,
.getattr = kernfs_iop_getattr,
.listxattr = kernfs_iop_listxattr,
.mkdir = kernfs_iop_mkdir,
.rmdir = kernfs_iop_rmdir,
.rename = kernfs_iop_rename,
};
There are no .create and .unlink callbacks, which means that you cannot create/remove
regular files.
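Instead, directories are the interface: a cgroup is created with a plain mkdir
on the cgroup filesystem. A minimal sketch (the mount point and group name are
assumptions):
---
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
        /* ends up in kernfs_iop_mkdir() -> cgroup_mkdir() below */
        if (mkdir("/sys/fs/cgroup/mygrp", 0755))
                perror("mkdir");
        return 0;
}
---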
Look into kernfs_iop_mkdir:
kernfs_iop_mkdir()
---
struct kernfs_node *parent = dir->i_private;
struct kernfs_syscall_ops *scops = kernfs_root(parent)->syscall_ops;
int ret;
if (!scops || !scops->mkdir)
return -EPERM;
if (!kernfs_get_active(parent))
return -ENODEV;
ret = scops->mkdir(parent, dentry->d_name.name, mode);
kernfs_put_active(parent);
return ret;
---
kernfs_syscall_ops is provided by the backend of kernfs.
static struct kernfs_syscall_ops cgroup_kf_syscall_ops = {
.show_options = cgroup_show_options,
.mkdir = cgroup_mkdir,
.rmdir = cgroup_rmdir,
.show_path = cgroup_show_path,
};
cgroup_mkdir()
---
cgrp = cgroup_create(parent, name, mode);
if (IS_ERR(cgrp)) {
ret = PTR_ERR(cgrp);
goto out_unlock;
}
/*
* This extra ref will be put in cgroup_free_fn() and guarantees
* that @cgrp->kn is always accessible.
*/
kernfs_get(cgrp->kn);
...
ret = css_populate_dir(&cgrp->self);
...
/* let's create and online css's */
kernfs_activate(cgrp->kn);
---
As expected, it creates a cgroup.
Let's look at the differences between V1 and V2.
V2:
cgroup_init()
-> cgroup_setup_root(&cgrp_dfl_root, 0);
---
kf_sops = root == &cgrp_dfl_root ?
&cgroup_kf_syscall_ops : &cgroup1_kf_syscall_ops;
...
ret = rebind_subsystems(root, ss_mask);
---
V1
cgroup1_get_tree()
-> cgroup1_root_to_use()
---
root = kzalloc(sizeof(*root), GFP_KERNEL);
ctx->root = root;
init_cgroup_root(ctx);
ret = cgroup_setup_root(root, ctx->subsys_mask);
---
The differences:
(1) the root of cgroup1 is dynamically allocated
(2) all of the subsystems are bound to the default root, namely V2, by default;
when cgroup1 is mounted, the required subsystems are rebound to the new root.
cgroup_init_subsys
---
ss->root = &cgrp_dfl_root;
---
When the root of a cgroup1 hierarchy is created, the corresponding subsystems
are rebound to that root.
(3) V1 and V2 have different syscall ops.
Looking into cgroup_kf_syscall_ops and cgroup1_kf_syscall_ops, the only
difference is that cgroup1 has a .rename callback.
The biggest difference is that, when mounting cgroup V1, a new root is allocated and the subsystems are rebound.
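From userland, that rebinding is triggered by the mount itself; a minimal sketch
(the mount point and controller option are assumptions, and this needs root):
---
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
        /* ends up in cgroup1_get_tree() -> cgroup1_root_to_use() above */
        if (mount("cgroup", "/mnt/blkio", "cgroup", 0, "blkio"))
                perror("mount");
        return 0;
}
---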
cgroup_mkdir()
-> cgroup_create()
---
/*
* On the default hierarchy, a child doesn't automatically inherit
* subtree_control from the parent. Each is configured manually.
*/
if (!cgroup_on_dfl(cgrp))
cgrp->subtree_control = cgroup_control(cgrp);
---
There are 3 components in cgroup: tasks, subsystems and cgroups.
root
/ \
cg0 cg1 (cpu, io, mem)
|
+----------+
|-css_cpu |- task0
|-css_io |- task1
|_css_mem |- task2
|- ...
css : cgroup_subsys_state, per-subsystem/per-cgroup state
css_set : a structure holding pointers to a set of css
blkcg_css()
---
return task_css(current, io_cgrp_id);
-> task->cgroups->subsys[]
---
Why do we need css_set?
root (cpu, pid, io, mem)
/ \
cg0 (cpu, pid) cg1 (io, mem)
/ \
/ \
t0 t1 cg2 (io)
/\
t2 t3
How many css's are in the diagram above?
- root-cpu, root-pid, root-io, root-mem
- cg0-cpu, cg0-pid
- cg1-io, cg1-mem
- cg2-io
t0 and t1 are associated with css (root-cpu, root-pid, cg1-io, cg1-mem)
t2 and t3 are associated with css (root-cpu, root-pid, cg1-mem, cg2-io)
Because cgroup2 and cgroup1 both exist in the kernel, a task can be in both
of them.
net-root
/ \
ncg0 ncg1
/\
t3 t4
Therefore t3 has the css set:
(root-cpu, root-pid, cg1-mem, cg2-io, ncg1-net)
This is the css_set, and it can be shared by all of the tasks that have the
same cgroup assignment.
Which cgroups does the css set (root-cpu, root-pid, cg1-mem, cg2-io, ncg1-net) belong to?
The answer is cg2 and ncg1, even though this css set shares the css of the
root cgroup.
(css_set and cgroup are linked together in link_css_set())
Look into find_existing_css_set()
---
for_each_subsys(ss, i) {
if (root->subsys_mask & (1UL << i)) {
/*
* @ss is in this hierarchy, so we want the
* effective css from @cgrp.
*/
template[i] = cgroup_e_css_by_mask(cgrp, ss);
} else {
/*
* @ss is not in this hierarchy, so we don't want
* to change the css.
*/
template[i] = old_cset->subsys[i];
}
}
key = css_set_hash(template);
hash_for_each_possible(css_set_table, cset, hlist, key) {
if (!compare_css_sets(cset, old_cset, cgrp, template))
continue;
/* This css_set matches what we need */
return cset;
}
/* No existing cgroup group matched */
return NULL;
---
In summary, the relationship between cgroup, css, css-set and task is as follows:
cgrp_dfl_root
cg-root (cpu, io, mem) cg-root-pid (pid)
/ \ / \
t0 t1 t2 cg0 (io, mem) t0 t1 cg1
/ \ / \
t3 t4 t2 t3 t4
t3 and t4 belong to a css-set [cg-root:(cpu), cg0:(io, mem), cg1:(pid)], called css-set-A
\_____ _____/
v
this is a css
t2 belongs to a css-set [cg-root:(cpu, io, mem), cg1:(pid)], called css-set-B
And we can get the following diagram:
css-set-A .--- cg-root
\ /
\/
css-set-B-./\----- cg0
\ \
\ \
'-'-- cg1
Note : (1) css-set-A uses the css cg-root:(cpu) but it belongs to cg0
       (2) the many-to-many relationship between css-set and cgroup
           is due to both cgroup v1 and v2 existing in the kernel
       (3) cg-root is actually the cgrp_dfl_root, as well as the root of
           cgroup v2
What needs to be done to attach a task to a cgroup?