# Counting directed acyclic and elementary digraphs

###### Abstract

Directed acyclic graphs (DAGs) can be characterised as directed graphs whose strongly connected components are isolated vertices. Using this restriction on the strong components, we discover that when $m = cn$, where $m$ is the number of directed edges, $n$ is the number of vertices, and $c < 1$, the asymptotic probability that a random digraph is acyclic is an explicit function $p(c)$, such that $p(0) = 1$ and $p(1) = 0$. When $m = n(1 + \mu n^{-1/3})$, the asymptotic behaviour changes, and the probability that a digraph is acyclic becomes $n^{-1/3} C(\mu)$, where $C(\mu)$ is an explicit function of $\mu$. Łuczak and Seierstad (2009, Random Structures & Algorithms, 35(3), 271–293) showed that, as $\mu \to -\infty$, the strongly connected components of a random digraph with $n$ vertices and $m = n(1 + \mu n^{-1/3})$ directed edges are, with high probability, only isolated vertices and cycles. We call such digraphs elementary digraphs. We express the probability that a random digraph is elementary as a function of $\mu$. These results are obtained using techniques from analytic combinatorics, developed in particular to study random graphs.

## 1 Introduction

Directed Acyclic Graphs (DAGs) appear naturally in the study of compacted trees, automata recognizing finite languages, and partial orders. Until now, the asymptotic number of DAGs has been known only in the dense case, i.e. for DAGs with $n$ vertices and $\Theta(n^2)$ edges. In this paper, we give a solution to the sparse case, which curiously involves a phase transition in the region corresponding to the phase transition of directed graphs discovered in [luczak2009critical].

#### Exact and asymptotic enumeration.

In 1973, Robinson [robinson1973counting] obtained his beautiful formula for the number of labeled DAGs with $n$ vertices and $m$ edges

and developed a framework for the enumeration of digraphs whose strong components belong to a given family of allowed strongly connected digraphs. This made it possible to express the asymptotics of dense DAGs in [bender1986asymptotic]. The structure of random DAGs has been studied in [liskovets1976number, mckay1989shape, gessel1996counting].
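As an illustration, the numbers of labeled DAGs by vertices and edges can be tabulated with the classical inclusion-exclusion recurrence $A_n(w) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} (1+w)^{k(n-k)} A_{n-k}(w)$, where $w$ marks edges. This is a minimal sketch: the recurrence is the standard one associated with Robinson's counting and is not necessarily written in the same form as the formula referred to above, and the function names are hypothetical.

```python
from math import comb


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of w)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def one_plus_w_power(e):
    """Coefficients of (1 + w)^e."""
    return [comb(e, j) for j in range(e + 1)]


def dag_edge_polynomials(n_max):
    """polys[n][m] = number of labeled DAGs with n vertices and m edges."""
    polys = [[1]]  # the empty digraph
    for n in range(1, n_max + 1):
        total = [0] * (n * (n - 1) // 2 + 1)
        for k in range(1, n + 1):
            # k counts a chosen set of vertices without incoming edges
            sign = 1 if k % 2 == 1 else -1
            term = poly_mul(one_plus_w_power(k * (n - k)), polys[n - k])
            for j, c in enumerate(term):
                total[j] += sign * comb(n, k) * c
        polys.append(total)
    return polys
```

For example, `dag_edge_polynomials(3)[3]` lists the number of DAGs on three labeled vertices with 0, 1, 2 and 3 edges.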

We say that a digraph is *elementary* if all its strong
components are either isolated vertices or cycles.
In [luczak1990phase] and
[luczak2009critical] it was shown that
if the ratio between the numbers of edges
and vertices is less than one, then a digraph is elementary asymptotically almost surely.
More precisely, this happens when a digraph has $n$ vertices and $m = n(1 + \mu n^{-1/3})$ edges, as $\mu \to -\infty$ with $n$.
Other interesting results on the structure of random $(n,m)$-digraphs around
the point of phase transition are available
in [pittel2017birth, goldschmidt2019scaling]. In particular, the authors of [goldschmidt2019scaling] show that the
strong components are asymptotically almost surely cubic, i.e. the sum
of the degrees of each of their nodes is at most three with high probability.
This means that these cores play a role analogous to that of the classical cores in
random graphs, see [janson1993birth].
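The definition of an elementary digraph can be checked by brute force on very small instances. The sketch below is only an illustration, not part of the cited results. It assumes the simple digraph model used in this paper (no loops, at most one arc per ordered pair) and relies on the fact that a strong component with $k \geq 2$ vertices is a cycle exactly when it contains $k$ arcs. All function names are hypothetical.

```python
from itertools import combinations, permutations


def reachability(n, arcs):
    """Boolean matrix reach[i][j]: is there a directed path from i to j?"""
    reach = [[i == j for j in range(n)] for i in range(n)]
    for u, v in arcs:
        reach[u][v] = True
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


def is_elementary(n, arcs):
    """True if every strong component is an isolated vertex or a cycle."""
    reach = reachability(n, arcs)
    # Label each strong component by its smallest vertex.
    comp = [min(j for j in range(n) if reach[i][j] and reach[j][i])
            for i in range(n)]
    size = [0] * n
    inner = [0] * n
    for c in comp:
        size[c] += 1
    for u, v in arcs:
        if comp[u] == comp[v]:
            inner[comp[u]] += 1
    # A single vertex has no inner arc; a cycle on k vertices has k arcs.
    return all(inner[c] == (size[c] if size[c] > 1 else 0) for c in set(comp))


def count_elementary(n):
    """counts[m] = number of elementary simple digraphs on n vertices, m arcs."""
    all_arcs = list(permutations(range(n), 2))
    return [sum(is_elementary(n, arcs) for arcs in combinations(all_arcs, m))
            for m in range(len(all_arcs) + 1)]
```

The enumeration is exponential in $n(n-1)$, so it is only meant for sanity checks with three or four vertices.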

A forthcoming independent approach to the asymptotic analysis of DAGs
[sparserandomacyclicdigraphs] (manuscript to appear) is similar in
spirit to the tools used in [flajolet2004airy] and relies on a bivariate singularity analysis of the generating function
of DAGs.
Their technique promises to unveil the asymptotics of sparse DAGs,
covering as well the case
where the ratio of the numbers of edges and vertices is bounded,
but greater than $1$ (the *supercritical case*).

#### Our contribution.

Typically, the analysis of graphs is technically easier when loops
and multiple edges are allowed (see [janson1993birth]).
Adapting the symbolic techniques to the case of
simple graphs is essentially a technical, rather than a conceptual,
difficulty. A systematic way to account for the special cases arising for
simple graphs is given in [panafieu2016analytic] and
[collet2018threshold], see the concept of patchworks.
The same principle applies to directed graphs. Nevertheless, in the
current paper we consider the case of *simple digraphs*, where loops
and multiple edges are forbidden. In our model, however, cycles of
size 2 are allowed, because it is natural to suppose that for any two
distinct vertices both directions are allowed. The analysis
of simple digraphs is technically heavier than the analysis of
multidigraphs, but we prefer to demonstrate explicitly that such an
application is indeed possible.

First, we transform the generating function of DAGs so that it can
be decomposed into an infinite sum. Each of its summands is analysed
using a new bivariate semi-large powers lemma, which is a generalisation
of [banderier2001random]. We discover that, for $m = n(1 + \mu n^{-1/3})$,
the first term of this infinite expansion dominates in the
*subcritical* case, i.e. when $\mu \to -\infty$; in the case
when $\mu$ is bounded (the *critical* case), all the terms give
contributions of the same order.
Next, using the symbolic tools for directed graphs
from [de2019symbolic], we express the generating function of
elementary digraphs and apply similar tools to obtain explicitly the
phase transition curve in digraphs, that is, the probability that a
digraph is elementary, as a function of $\mu$.

#### Related studies.

Analytic techniques, largely covered in [flajolet2009analytic], are efficient for asymptotic analysis, because the coefficient extraction operation is naturally expressed through Cauchy's integral formula. A recent study [greenwood2018asymptotics] deals with bivariate algebraic functions. In their case, a combination of two Hankel contours, necessary for careful analysis, can have a complicated mutual configuration in two-dimensional complex space, so many details need to be accounted for. Our approach is close to theirs, but we try to avoid this difficulty in our study. The principal idea behind our bivariate semi-large powers lemma is the splitting of a double complex integral into a product of two univariate ones.

#### Structure of the paper.

## 2 Exact expressions using generating functions

Consider the following model of graphs and directed graphs. A graph is characterized by its set of labeled vertices and its set of unoriented unlabeled edges. Loops and multiple edges are forbidden. The numbers of its vertices and edges are denoted by $n$ and $m$. An $(n,m)$-graph is a graph with $n$ vertices and $m$ edges.

We consider digraphs without loops, such that from any vertex to any other vertex there is at most one directed edge. Therefore, two edges can link the same pair of vertices only if their orientations are different.
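The model can be made concrete by listing all simple digraphs on a few labeled vertices. The following sketch is a brute-force illustration only, with hypothetical function names. It enumerates every set of arcs without loops and tests acyclicity by repeatedly removing vertices without incoming arcs (Kahn's algorithm).

```python
from itertools import combinations, permutations
from math import comb


def is_acyclic(n, arcs):
    """Kahn's algorithm: a digraph is acyclic iff all vertices can be peeled off."""
    indeg = [0] * n
    out = [[] for _ in range(n)]
    for u, v in arcs:
        out[u].append(v)
        indeg[v] += 1
    stack = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return removed == n


def census(n):
    """Rows (m, number of simple digraphs, number of DAGs) for n labeled vertices."""
    all_arcs = list(permutations(range(n), 2))  # ordered pairs, no loops
    rows = []
    for m in range(len(all_arcs) + 1):
        total = 0
        acyclic = 0
        for arcs in combinations(all_arcs, m):
            total += 1
            acyclic += is_acyclic(n, arcs)
        rows.append((m, total, acyclic))
    return rows
```

With $n$ labeled vertices there are $n(n-1)$ possible arcs, so the total in each row equals $\binom{n(n-1)}{m}$.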

### 2.1 Exponential and graphic generating functions

Two helpful tools in the study of graphs and directed graphs are the
exponential and graphic generating functions.
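The *exponential generating function* (EGF) $F(z,w)$
and the *graphic generating function* (GGF) $\widehat{F}(z,w)$
associated to a graph or digraph family $\mathcal{F}$ are defined as

$$F(z,w) = \sum_{n,m \geq 0} f_{n,m}\, w^m \frac{z^n}{n!}, \qquad \widehat{F}(z,w) = \sum_{n,m \geq 0} f_{n,m}\, w^m \frac{z^n}{n!\,(1+w)^{\binom{n}{2}}},$$

where $f_{n,m}$ denotes the number of elements of $\mathcal{F}$ with $n$ vertices and $m$ edges.

The total numbers of $(n,m)$-graphs and $(n,m)$-digraphs are $\binom{n(n-1)/2}{m}$ and $\binom{n(n-1)}{m}$. The classical counting expression for directed acyclic graphs is attributed to Robinson [robinson1973counting]. The EGF of all graphs and GGF of directed acyclic graphs are given by

$$G(z,w) = \sum_{n \geq 0} (1+w)^{\binom{n}{2}} \frac{z^n}{n!}, \qquad \widehat{D}_{\mathrm{DAG}}(z,w) = \left( \sum_{n \geq 0} \frac{(-z)^n}{n!\,(1+w)^{\binom{n}{2}}} \right)^{-1}. \tag{1}$$

We can reuse the EGF of graphs (1) to obtain an alternative expression for the number $\mathrm{DAG}_{n,m}$ of $(n,m)$-DAGs:

$$\mathrm{DAG}_{n,m} = n!\,[z^n w^m]\, \frac{(1+w)^{\binom{n}{2}}}{G\!\left(-z, -\frac{w}{1+w}\right)}. \tag{2}$$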

Before considering various digraph families, we need to recall the
classical generating functions of simple graph families, namely the rooted
and unrooted labeled trees and unicycles.
A *unicycle* is a connected graph that has the same numbers of vertices
and edges. Hence, it contains exactly one cycle.

###### Proposition 1 ([janson1993birth]).

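The EGFs $T(z)$ of rooted trees, $U(z)$ of trees and $V(z)$ of unicycles are characterized by the relations

$$T(z) = z e^{T(z)}, \qquad U(z) = T(z) - \frac{T(z)^2}{2}, \qquad V(z) = \frac{1}{2} \log \frac{1}{1 - T(z)} - \frac{T(z)}{2} - \frac{T(z)^2}{4}.$$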
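These series can be expanded with exact rational arithmetic and compared with known counts. The sketch below assumes the classical relations $T = z e^{T}$, $U = T - T^2/2$ and $V = \frac{1}{2}\log\frac{1}{1-T} - \frac{T}{2} - \frac{T^2}{4}$ for simple graphs. The helper names are hypothetical, and the implementation favours clarity over speed.

```python
from fractions import Fraction
from math import factorial


def series_mul(a, b, N):
    """Product of two power series truncated at order N (lists of length N + 1)."""
    c = [Fraction(0)] * (N + 1)
    for i, x in enumerate(a):
        if x:
            for j in range(N + 1 - i):
                c[i + j] += x * b[j]
    return c


def series_exp(a, N):
    """exp(a) for a series with a[0] == 0, using E' = a' E."""
    e = [Fraction(0)] * (N + 1)
    e[0] = Fraction(1)
    for n in range(1, N + 1):
        e[n] = sum(k * a[k] * e[n - k] for k in range(1, n + 1)) / n
    return e


def tree_and_unicycle_series(N):
    """Truncated series T (rooted trees), U (trees), V (unicycles)."""
    T = [Fraction(0)] * (N + 1)
    for _ in range(N):  # each pass of T <- z * exp(T) fixes one more coefficient
        T = [Fraction(0)] + series_exp(T, N)[:N]
    T2 = series_mul(T, T, N)
    U = [t - t2 / 2 for t, t2 in zip(T, T2)]
    log_term = [Fraction(0)] * (N + 1)  # log(1 / (1 - T)) = sum_k T^k / k
    power = T
    for k in range(1, N + 1):
        log_term = [l + p / k for l, p in zip(log_term, power)]
        power = series_mul(power, T, N)
    V = [l / 2 - t / 2 - t2 / 4 for l, t, t2 in zip(log_term, T, T2)]
    return T, U, V


def labeled_counts(series):
    """Turn EGF coefficients into numbers of labeled objects."""
    return [int(c * factorial(n)) for n, c in enumerate(series)]
```

The first coefficients reproduce Cayley's formulas $n^{n-1}$ and $n^{n-2}$ for rooted and unrooted trees, and the unicycle counts start at $n = 3$ with the triangle.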

The *excess* of a graph (not necessarily connected)
is defined as the difference between
its numbers of edges and vertices. For example, trees have excess $-1$, while unicycles have excess $0$.
The bivariate EGFs of graphs of excess $r$ can be
obtained from their univariate EGFs by substituting $z \mapsto zw$
and multiplying by $w^r$. In particular, $T(z,w) = T(zw)/w$ for rooted trees, $U(z,w) = U(zw)/w$ for trees, and $V(z,w) = V(zw)$ for unicycles.

We say that a graph is *complex* if
all its connected components have a positive excess.
The EGF of complex graphs of excess $r$ is denoted by $\mathrm{Complex}_r(z,w)$.

It is known (see [janson1993birth]) that a
complex graph of excess $r$
is reducible to a *kernel* (multigraph of minimal degree at least $3$)
of the same excess,
by recursively removing vertices of degree $0$ and $1$
and fusing edges sharing a degree $2$ vertex.
The total weight of *cubic* kernels (all degrees equal to $3$) of excess $r$
is given by (3).
They are central in the study of large critical graphs,
because non-cubic kernels do not typically occur.
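The reduction to a kernel can be sketched on an edge list. This is a minimal illustration under stated assumptions: the input is a multigraph given as a list of vertex pairs (so vertices of degree $0$ are simply absent), a loop contributes $2$ to the degree of its vertex, and a vertex carrying only a loop (the trace of a unicyclic component) is left untouched. The function name `kernel` is hypothetical.

```python
from collections import Counter


def kernel(edges):
    """Prune degree-1 vertices and smooth degree-2 vertices until none is left."""
    edges = [tuple(e) for e in edges]
    while True:
        deg = Counter()
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1  # a loop (u == v) contributes 2 in total
        target = None
        for x, d in deg.items():
            # skip a vertex whose only incident edge is a loop
            if d == 1 or (d == 2 and (x, x) not in edges):
                target = x
                break
        if target is None:
            return edges
        incident = [e for e in edges if target in e]
        rest = [e for e in edges if target not in e]
        if len(incident) == 2:
            # fuse the two edges meeting at the degree-2 vertex
            ends = [u if v == target else v for u, v in incident]
            rest.append((ends[0], ends[1]))
        edges = rest
```

Each step removes one vertex and one edge, so the excess is preserved. For instance, a copy of $K_4$ with one subdivided edge and a pendant path reduces back to $K_4$, which is a cubic kernel of excess $2$.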

###### Proposition 2 ([janson1993birth, Section 6]).

For each there exists a polynomial such that

(3) |

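Clearly, any graph can be represented as a set of unrooted trees, unicycles and a complex component of excess $r \geq 0$. Therefore, the EGF of graphs is equal to

$$G(z,w) = e^{U(zw)/w + V(zw)} \sum_{r \geq 0} \mathrm{Complex}_r(z,w), \tag{4}$$

where $U(zw)/w$ and $V(zw)$ are the bivariate EGFs of trees and unicycles, $\mathrm{Complex}_r(z,w)$ is the EGF of complex graphs of excess $r$, and the empty graph is the only complex graph of excess $0$.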

### 2.2 Exact expression for directed acyclic graphs

In order to obtain the asymptotic number of DAGs, we need a decomposition different from (1). For comparison, in the expression (4) the first summand is asymptotically dominating in the case of subcritical graphs. Inside the critical window, all the summands of (4) give a contribution of the same asymptotic order.

###### Lemma 3.

The number of $(n,m)$-DAGs is equal to

###### Proof.

###### Remark 4.

The number of pairs of graphs, each on $n$ vertices, having a total of $m$ edges, is $\binom{n(n-1)}{m}$. Working as in the previous proof leads to

which looks and behaves (when $m/n$ stays smaller than or close to $1$) like the expression for the number of $(n,m)$-DAGs from the last lemma. This motivates the following intuition. Typically, those two graphs should share the edges more or less equally. Thus, when $m/n$ is close to $1$, each of the two graphs should have a number of edges close to $n/2$, so both will exhibit critical graph structure. For a smaller ratio $m/n$, both graphs will behave like subcritical graphs, containing only trees and unicycles. This heuristic explanation for the critical density for DAGs guides our analysis in the rest of the paper.
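The count of pairs of graphs in this remark can be checked numerically. The sketch below assumes ordered pairs of simple graphs on the same $n$ labeled vertices. It sums over the number $k$ of edges of the first graph and compares the result with the number $\binom{n(n-1)}{m}$ of $(n,m)$-digraphs (Vandermonde's identity). The function names are hypothetical.

```python
from math import comb


def pairs_of_graphs(n, m):
    """Ordered pairs of simple graphs on n labeled vertices with m edges in total."""
    slots = comb(n, 2)  # possible edges of one simple graph
    return sum(comb(slots, k) * comb(slots, m - k) for k in range(m + 1))


def simple_digraphs(n, m):
    """Simple digraphs (no loops, 2-cycles allowed) with n vertices and m arcs."""
    return comb(n * (n - 1), m)
```

One way to see the equality is to split the arcs of a digraph according to whether they go from a smaller label to a larger one or the reverse. Each half, with orientations forgotten, is a simple graph.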

### 2.3 Exact expression for elementary digraphs

As we discovered in our previous paper [de2019symbolic], and as was also pointed out in a different form in [robinson1973counting], the graphic generating function of the family of digraphs whose strongly connected components belong to a given set $\mathcal{S}$ with the EGF $S(z,w)$ is given by

(5) |

and is the exponential Hadamard product, characterized by
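For intuition, the exponential Hadamard product of $A(z) = \sum_n a_n z^n/n!$ and $B(z) = \sum_n b_n z^n/n!$ is $\sum_n a_n b_n z^n/n!$. The sketch below is a univariate illustration of this standard definition (in the paper the product is taken with respect to $z$, with $w$ as a parameter). It works on ordinary Taylor coefficients, and the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def exponential_hadamard(a, b):
    """Taylor coefficients of the exponential Hadamard product.

    a[n] and b[n] are the ordinary coefficients [z^n] A(z) and [z^n] B(z),
    that is a_n / n! and b_n / n!. The result has coefficients a_n b_n / n!.
    """
    return [Fraction(x) * Fraction(y) * factorial(n)
            for n, (x, y) in enumerate(zip(a, b))]
```

For example, the product of $e^{2z}$ and $e^{3z}$ is $e^{6z}$, since $2^n \cdot 3^n = 6^n$.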

In order to expand the Hadamard product, we develop the exponent and apply the simplification rule . After developing the exponent and expanding the Hadamard product we obtain a very simple expression, namely

(6) |

The following lemma is a heavier version of this expression. One of the reasons behind its visual complexity is the choice of the simple digraphs instead of multidigraphs; however, during the asymptotic analysis, most of the decorations corresponding to simple digraphs are going to disappear.

###### Lemma 5.

The number of elementary digraphs is equal to

where and

###### Proof.

Let us denote . Using the already mentioned representation

and by replacing with the generating function of graphs with vertices as in the proof of lemma 3, we can write the denominator of (6) prior to substitution as

Next, the change of variables

The proof is finished by extracting the coefficient . ∎

## 3 Asymptotic analysis

### 3.1 Bivariate semi-large powers lemma

The typical structure of critical random graphs
can be obtained by application of the *semi-large powers Theorem*
[flajolet2009analytic, Theorem IX.16, Case (ii)].
Since DAGs behave like a superposition of two graphs
(see remark 4),
we design a bivariate variant of this theorem.

###### Lemma 6.

Consider two integers and going to infinity, such that with either staying in a bounded real interval, or while ; let the function be analytic on the open torus of radii and continuous on its closure, and let and be two real values, then the following asymptotics holds as

(7) |

where the function is defined as

###### Remark 7.

###### Proof of lemma 6.

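The first step is to represent the coefficient extraction operation from (7) as a double complex integral, using Cauchy's integral formula, and to approximate this double integral with a product of two complex integrals. We start with the Puiseux expansions of the EGF of rooted labeled trees $T(z)$ and unrooted labeled trees $U(z)$:

$$T(z) = 1 - \sqrt{2}\,\sqrt{1 - ez} + \frac{2}{3}(1 - ez) + O\big((1 - ez)^{3/2}\big), \tag{8}$$

$$U(z) = \frac{1}{2} - (1 - ez) + \frac{2\sqrt{2}}{3}(1 - ez)^{3/2} + O\big((1 - ez)^{2}\big). \tag{9}$$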
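The expansions of the tree functions near the singularity $z = 1/e$ can be sanity-checked numerically. The sketch below assumes the classical forms $T(z) \approx 1 - \sqrt{2}\,s + \frac{2}{3}s^2$ and $U(z) \approx \frac{1}{2} - s^2 + \frac{2\sqrt{2}}{3}s^3$ with $s = \sqrt{1 - ez}$. It solves $T = z e^{T}$ by Newton's method, which is a choice made here for illustration. The function names are hypothetical.

```python
from math import e, exp, sqrt


def tree_function(z, iterations=100):
    """Solve T = z * exp(T) for 0 <= z < 1/e by Newton's method, starting at 0."""
    t = 0.0
    for _ in range(iterations):
        value = t - z * exp(t)
        slope = 1.0 - z * exp(t)  # stays positive below the solution
        t -= value / slope
    return t


def compare_with_expansion(s):
    """Return (T, T_approx, U, U_approx) at z = (1 - s^2) / e."""
    z = (1.0 - s * s) / e
    t = tree_function(z)
    u = t - t * t / 2.0
    t_approx = 1.0 - sqrt(2.0) * s + (2.0 / 3.0) * s ** 2
    u_approx = 0.5 - s ** 2 + (2.0 * sqrt(2.0) / 3.0) * s ** 3
    return t, t_approx, u, u_approx
```

At $s = 0.01$ the truncated expansions agree with the numerical values up to the size of the first omitted terms, of order $s^3$ for $T$ and $s^4$ for $U$.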

Applying Cauchy’s integral formula, we rewrite the coefficient extraction (7) in the form

A further step is to inject , , and , where . By using expansion (8) in order to approximate the terms and , we rewrite the answer in the form

After removal of the negligible terms, a product of integrals is obtained

Each of the integrals can be evaluated similarly to [flajolet2009analytic, Theorem IX.16, Case (ii)]: in order to evaluate such an integral, a change of variables is applied, and the integral is expressed as an infinite sum using a Hankel contour formula for the Gamma function:

∎

### 3.2 Asymptotic analysis of directed acyclic graphs

Since we are going to apply lemma 6 to each of the terms of the infinite sum of lemma 3, it is useful to introduce the following notation

where is given by proposition 2. This notation will be used throughout the next two sections.

###### Theorem 8.

When and either stays in a bounded real interval, or while as ,

In particular, for the sparse case ,

###### Proof.

In order to apply lemma 6 (bivariate semi-large powers), we develop the coefficient operator in lemma 3 using the approximation of from proposition 2 and drop the terms that give negligible contribution:

Then we apply lemma 6 and the approximation to obtain

The power of in the sum is , and the sum over of is equal to and converges to . Finally, the sums over and are decoupled and we obtain

The sum over admits a closed-form expression (see remark 7 and [janson1993birth, Section 14]). Applying Stirling’s formula, we can rescale the asymptotic number of DAGs by the total number of digraphs:

This gives the main statement. To obtain the sparse case, we need to use the fact that when , the first summand of the sum over is dominating, and therefore, this sum is asymptotically equivalent to (see [janson1993birth, Equation (10.3)]). ∎

### 3.3 Asymptotic analysis of elementary digraphs

###### Theorem 9.

When and either stays in a bounded real interval, or while as ,

where the coefficients are given by

In particular, when , ,

###### Proof.

The key ingredient is the exact expression from lemma 5. As in the proof of theorem 8, we can drop the terms that give negligible contributions and develop the coefficient operator accordingly. The key difference between the proofs is the form of the denominator: after taking out a common multiple (ignoring higher powers in variable ), the denominator can be again regarded as a formal power series in . In order to obtain the asymptotics, the transformed expression should be developed, then lemma 6 (bivariate semi-large powers) is applied, and finally the sums are decoupled. For the sum corresponding to variable , we apply again the hypergeometric summation formula from [janson1993birth]. In order to settle the subcritical case , we apply the asymptotic approximation of from remark 7. ∎

###### Remark 10.

Curiously enough, the coefficient in the subcritical probability can be given the same interpretation as a similar coefficient arising in the probability that a random graph does not contain a complex component: namely the compensation factor of the simplest cubic forbidden multigraph.

#### Acknowledgements.

We are grateful to Olivier Bodini, Naina Ralaivaosaona, Vonjy Rasendrahasina, Vlady Ravelomanana, and Stephan Wagner for fruitful discussions.