Article
Open access
Published: 03 July 2025

Statistically validated projection of bipartite signed networks

npj Complexity volume 2, Article number: 22 (2025) Cite this article

941 Accesses
2 Altmetric
Metrics details

Subjects

Abstract

Bipartite networks provide a major insight into the organisation of many real-world systems. One of the most relevant issues encountered when modelling a bipartite network is that of facing the information shortage concerning intra-layer linkages. In the present contribution, we propose an unsupervised algorithm to obtain statistically validated projections of bipartite signed networks, according to which any two nodes sharing a statistically significant number of concordant (discordant) relationships are connected by a positive (negative) edge. Our algorithm outputs a matrix of link-specific p values, from which a validated projection can be obtained upon running a multiple-hypothesis testing procedure. After testing our method on synthetic configurations output by a fully controllable generative model, we apply it to several real-world configurations: in all cases, non-trivial mesoscopic structures, induced by relationships that cannot be traced back to the constraints defining the employed benchmarks, hence revealing genuine traces of self-organisation, are detected.

Meta-validation of bipartite network projections

Article Open access 05 April 2022

Inference of monopartite networks from bipartite systems with different link types

Article Open access 19 January 2023

Unveiling the social fabric through a temporal, nation-scale social network and its characteristics

Article Open access 26 May 2025

Introduction

Network theory has emerged as a powerful framework to model different kinds of real-world systems, by representing their units as nodes and the interactions between them as links. Out of the many types of edges that have been considered so far, the signed one, offering the possibility of modelling positive as well as negative interactions, has recently seen its popularity revived^1,2,3,4.

Most works on signed configurations have focused on monopartite graphs, i.e., configurations with a unique set of nodes each of which can interact with any other one. The interest towards one-mode signed networks is ascribable to the so-called balance theory (BT), introduced by Heider in⁵ and later formalised by Cartwright and Harary employing signed graphs⁶.

BT states that a signed graph is strongly balanced (SB) if all cycles contain an even number of negative edges: from a mesoscopic perspective, this implies that the network can be split in two groups with positive intra-modular and negative inter-modular links; in⁷, Davis spoke of weakly balanced (WB) graphs, allowing for more than two groups: taken together, the SB and WB concepts define what is called Traditional Balance Theory (TBT)⁸—a framework that has found applications in contexts as diverse as the biological, ecological, economic and social ones^{8,9,10,11,12,13,14,15,16,17}.

Since, however, real-world networks often deviate from it, Doreian and Mrvar have proposed a generalisation named Relaxed Balance Theory (RBT)^8,18 which allows for positive inter-modular and negative intra-modular links as well.

The concept of balance in the bipartite context

The interest towards two-mode signed networks, i.e., configurations with two sets of vertices where (signed) connections can be established only between pairs of nodes belonging to different sets, is, instead, much more recent. One of the earliest attempts at adapting the notion of structural balance to the bipartite framework has been carried out in ref. ¹⁹, where the difference between the roles played by ‘subjects’ and ‘objects’ in Heider’s formulation of the BT was highlighted. In refs. ^20,21, the authors focused on the balance of bipartite cycles by considering the statistical significance of the shortest ones, known as ‘butterfly motifs’, ‘X-motifs’ or ‘2 × 2 bicliques’²². In²³, the authors carried out three different analyses: first, they tested the degree of balance of real-world bipartite signed networks by employing the definition of ‘butterfly motifs’ provided in ref. ²¹. Second, they inferred the sign of the missing intra-layer edges by connecting any two nodes belonging to the same set with a ‘plus one’ (‘minus one’) if they established either two positive or two negative links (one positive and one negative link) with the same set of vertices on the opposite layer. Finally, they tested the degree of balance of each layer by comparing the empirical percentages of triangles and butterflies with the ones expected under a null model, shuffling the links of the original bipartite network— i.e., implementing the microcanonical version of the Free-topology Signed Random Graph Model introduced in ref. ¹³: as a result, the patterns turned out to be more balanced than expected. In²⁴, bipartite signed networks were employed to model a group of individuals’ opinions about certain topics: the authors addressed the problem of partitioning both types of entities into two groups while maximising the number of positive intra-modular links and minimising the number of negative inter-modular links.

Projecting bipartite unsigned networks

One of the issues of major interest encountered when modelling bipartite networks is that of inferring the presence of a relationship between nodes belonging to the same layer in case a direct measurement of such a relationship is unfeasible (as for data gathering about friendships on social platforms²⁵). The simplest way of solving the problem is linking any two nodes belonging to the same layer as long as they share at least one neighbour: however, this often results in a very dense network whose topological structure is almost trivial. A different recipe prescribes retaining the information on the number of common neighbours, i.e., to project a bipartite network into a monopartite weighted network²⁵: this prescription, however, causes the nodes with larger degree to have larger strengths, thus masking the genuine, statistical relevance of the induced connections. Moreover, such a prescription lets spurious clusters of nodes emerge (e.g., cliques induced by the presence of—even—a single node connected to all the other vertices on the opposite layer).

Algorithms for retaining only the significant weights have been proposed to face this problem²⁵. Many of them are based on a thresholding procedure, a major drawback of which lies in the arbitrariness of the chosen threshold^26,27,28. A different approach prescribes calculating the statistical significance of the projected weights according to a properly defined null model²⁹: the latter, however, encodes relatively little information about the original structure, thus being more suited to analyse natively monopartite networks. A similar-in-spirit approach identifies the backbone of a monopartite weighted projection with its Minimum Spanning Tree and its communities with the trees constituting the Minimum Spanning Forest^30,31: the lack of a comparison with a benchmark, however, makes assessing the statistical relevance of the outcome difficult.

All the aforementioned approaches validate a projection a posteriori. A different class of methods focuses on recipes to obtain statistically validated projections by estimating the tendency of any two nodes belonging to the same layer to share a certain number of neighbours: all such approaches define a similarity measure that either ranges between 0 and 1^32,33 or follows a probability distribution allowing for a p value to be computed^{34,35,36,37,38}. While, in the first case, the application of an arbitrary threshold is still unavoidable, in the second case, prescriptions rooted in traditional statistics can be applied.

The approaches discussed so far lead to unsigned projections of bipartite unsigned networks. In refs. ^25,39, instead, the author carried out a two-sided test of hypothesis to decide whether any two members of the 108th U.S. Senate co-sponsored either enough bills for a political alliance (positive link) to be inferred or not enough bills for political antagonism (negative link) to be inferred. The employed null model was named Stochastic Degree Sequence Model, a principled derivation of which was provided in refs. ^22,36 where the benchmark was re-named Bipartite Configuration Model.

For a comprehensive review of methods to carry out pattern detection in bipartite networks, see ref. ⁴⁰.

Projecting bipartite signed networks

The issue of projecting bipartite signed networks has been addressed to a much less extent. A first example is provided by⁴¹, where the authors projected a user-item network onto the layer of users: the similarity of any two of them was quantified by calculating the scalar product of the corresponding rows of the biadjacency matrix⁴² and its statistical significance evaluated by comparing the empirical value with the one expected under a null model defined by randomly swapping two edges having the same sign. Under such a benchmark—which is the microcanonical version of the Free-topology Signed Configuration Model introduced ref. ¹³—the authors found that both the MovieLens and the Netflix projections were highly balanced—the degree of balance being proxied by the number of triangles having an even number of negative edges. A second example is provided by ref. ⁴³, where the authors obtained signed projections by carrying out a statistical validation of the number of neighbours shared by any two nodes belonging to the same layer via a hypergeometric-binomial mixture distribution.

Hereby, we build upon previous contributions by extending the Exponential Random Graphs framework to include null models suitable for analysing binary undirected bipartite signed networks and employ them to obtain statistically validated signed projections. To address such a problem, we extend the algorithm proposed in³⁶, based upon the idea that any two nodes sharing a significantly large number of neighbours should be linked in the corresponding monopartite projection. More precisely, we propose two variants of it, according to the way ambivalent patterns (i.e. patterns constituted by two nodes and an item, liked by one node and disliked by the other) and missing ties (characterising patterns constituted by two nodes and an item, with just one node expressing an opinion about it) are treated.

The rest of the paper is organised as follows. First, we introduce a quantity to measure the similarity of any two nodes belonging to the same layer. Second, we derive its probability distribution according to each considered benchmark. Third, we consider each pair of nodes and quantify the statistical significance of their similarity. Fourth, we link only the ones surviving a multiple-hypothesis testing procedure. Fifth, we employ our methods to obtain signed projections of several different datasets. Finally, we comment on our results.

Results

With the present contribution, we propose an unsupervised algorithm to obtain statistically validated projections of binary undirected bipartite signed networks, according to which any two nodes sharing a statistically significant number of concordant (discordant) relationships are connected by a positive (negative) edge. Before applying it to a number of real-world configurations, let us, first, illustrate how it works on a toy model.

Projection of synthetic configurations

In order to illustrate how our algorithm for projecting bipartite signed networks works, let us construct a generative model as follows. Let us consider the Bipartite Signed Stochastic Block Model (BiSSBM), induced by the finite scheme

$${b}_{i\alpha } \sim \left(\begin{array}{lll}-1&0&+1\\ {p}_{{g}_{i}{g}_{\alpha }}^{-}&{p}_{{g}_{i}{g}_{\alpha }}^{0}&{p}_{{g}_{i}{g}_{\alpha }}^{+}\end{array}\right),\quad i\in {g}_{i},\,\alpha \in {g}_{\alpha }$$

(1)

with i = 1…N and α = 1…M. Figure 1 provides a graphical representation of three configurations generated via the BiSSBM by considering parameters reading (left panels) N = 80, M = 100,

$${p}^{+}=\left(\begin{array}{ll}0.5&0.08\\ 0.08&0\end{array}\right)\quad \,{\text{and}}\,\quad {p}^{-}=\left(\begin{array}{ll}0&0.08\\ 0.08&0.5\end{array}\right);$$

(2)

(middle panels) N = 100, M = 120,

$${p}^{+}=\left(\begin{array}{ll}0.5&0.4\\ 0.4&0\end{array}\right)\quad \,{\text{and}}\,\quad {p}^{-}=\left(\begin{array}{ll}0&0.4\\ 0.4&0.5\end{array}\right);$$

(3)

(right panels) N = 100, M = 120,

$${p}^{+}=\left(\begin{array}{lll}0.6&0.2&0.1\\ 0&0.6&0.1\\ 0.15&0.1&0\end{array}\right)\,\,{\text{and}}\,\,{p}^{-}=\left(\begin{array}{lll}0&0&0.1\\ 0.2&0&0.1\\ 0.1&0.2&0.6\end{array}\right).$$

(4)

**Fig. 1: Graphical representation of three synthetic configurations, generated by considering the values of the parameters reported in the main text.**

In the first case (left panels), the nodes belonging to g_i = 1 (g_i = 2) like the items belonging to g_α = 1 (g_α = 2) but establish few connections with the items belonging to g_α = 2 (g_α = 1); in the second case (middle panels), the nodes belonging to g_i = 1 (g_i = 2) like the items belonging to g_α = 1 (g_α = 2) but the density of inter-group connections is much larger; in the third case (right panels), the nodes belonging to the first (second) group like the items belonging to the first (second) group, the nodes belonging to the third group dislike the items belonging to the third group and the density of inter-group connections resembles the one characterising the first case. Irrespectively of the exact values of the parameters defining them, however, we expect the first two synthetic configurations to induce projections obeying the statistical variant of the traditional balance theory and the third configuration to induce a projection obeying the statistical variant of the RBT^8,18.

Let us, now, project our synthetic configurations by employing the BiSRGM-FT, i.e., the global statistical benchmark induced by the zero-deflated scheme. Following⁸, one can adopt an ‘agnostic’ attitude and explore the mesoscale organisation of a projection without aligning with any specific conceptual framework: a principled approach to achieve such a goal is that of minimising the Bayesian Information Criterion (BIC) reading

$$\,{\text{BIC}}\,=\kappa \ln V-2\ln {\mathcal{L}};$$

(5)

the first addendum proxies the complexity of a model with the number of its parameters, κ, the second addendum proxies the accuracy of a model with its log-likelihood, $\ln {\mathcal{L}}$, and V = N(N−1)/2 accounts for the system dimensions. Since we aim at describing a projection mesoscale organisation, a natural choice is that of adopting the Signed Stochastic Block Model (SSBM), defined by the likelihood function

$$\begin{array}{l}{{\mathcal{L}}}_{{\rm{SSBM}}}=\mathop{\prod }\limits_{r=1}^{k}{({p}_{rr}^{+})}^{{L}_{rr}^{+}}{({p}_{rr}^{-})}^{{L}_{rr}^{-}}{(1-{p}_{rr}^{+}-{p}_{rr}^{-})}^{\left(\begin{array}{c}{N}_{r}\\ 2\end{array}\right)-{L}_{rr}}\\ \mathop{\prod }\limits_{r=1}^{k}\mathop{\prod }\limits_{\begin{array}{c}s=1\\ s > r\end{array}}^{k}{({p}_{rs}^{+})}^{{L}_{rs}^{+}}{({p}_{rs}^{-})}^{{L}_{rs}^{-}}{(1-{p}_{rs}^{+}-{p}_{rs}^{-})}^{{N}_{r}{N}_{s}-{L}_{rs}}\end{array}$$

(6)

and a number of parameters κ_SSBM = k(k + 1), amounting at twice the number of modules, k, into which the projection can be partitioned. Naturally, N_r is the number of nodes constituting block r, ${L}_{rr}^{+}={p}_{rr}^{+}{N}_{r}({N}_{r}-1)/2$ is the number of positive links within block r, ${L}_{rr}^{-}={p}_{rr}^{-}{N}_{r}({N}_{r}-1)/2$ is the number of negative links within block r, ${L}_{rs}^{+}={p}_{rs}^{+}{N}_{r}{N}_{s}$ is the number of positive links between blocks r and s and ${L}_{rs}^{-}={p}_{rs}^{-}{N}_{r}{N}_{s}$ is the number of negative links between blocks r and s.

What we obtain confirms what we expect, i.e., that any two nodes within the same bipartite group share a significantly large number of concordant motifs that, in turn, induces a positive connection in the projection. On the other hand, any two nodes belonging to different bipartite groups may share either a significantly large number of discordant motifs that, in turn, induces a negative connection in the projection or a significantly large number of concordant motifs that, in turn, induces a positive connection in the projection.

Projection of real-world configurations

Let us, now, consider some real-world networks.

U.S. senate and U.S. house of representatives

The first two datasets that we consider, described in ref. ²¹ and further analysed in refs. ^20,23, are the output of the GovTrack.us project⁴⁴ and collect vote records from the 1st to the 10th Congress of the United States. The nodes on the first layer are either senators or representatives while the nodes on the second layer are bills: a positive (negative) link between a senator/representative and a bill indicates that the senator/representative has voted ‘Yes’ (‘Nay’) for that bill.

FilmTrust

The third dataset that we consider is the output of the FilmTrust project^45,46 and collects rating data from an online community whose users assign a score, i.e., 1, 2, 3, 4, to a number of movies. The nodes on the first layer are users while the nodes on the second layer are movies. We obtain a binary signed version of this dataset by assigning a −1 to all the edges whose weight is either 1 or 2 and a + 1 to all the edges whose weight is either 3 or 4.

From a purely empirical perspective, all the aforementioned datasets are characterised by a small link density c = L/(N ⋅ M): the connectance of U.S. Senate and U.S. House of Representatives amounts to ≃ 0.1 while the connectance of FilmTrust amounts to ≃0.01. The percentages of positive and negative links are, instead, quite different: U.S. Senate and U.S. House of Representatives have ≃ 55% of positive links while FilmTrust has ≃80% of positive links (see also Table I in Appendix E).

naive projection of real-world configurations

The aforementioned configurations have been, first, naively projected according to the recipes , valid within the zero-deflated scheme, and , valid within the zero-inflated scheme.

When considering the unsigned case, the link density of the projection returned by the naive approach is typically large³⁶. As shown in Fig. 2, this is especially true within the zero-inflated scheme–although the naive projections of FilmTrust are both very dense. Let us also notice that the small link density of FilmTrust causes the naive projection obtained within the zero-inflated scheme to be solely populated by positive links. For what concerns the naive projection obtained within the zero-deflated scheme, instead, its large link density signals that the vast majority of pairs of nodes shares at least one V-motif (i.e. the absolute value of the difference between the number of ‘full’ concordant and discordant motifs is at least 1). For what concerns U.S. Senate and U.S. House of Representatives, the naive projections obtained within the zero-deflated scheme are already quite sparse, letting non-trivial mesoscopic patterns emerge—appreciable even by just looking at the adjacency matrices illustrated in the first column of Fig. 3 (see also Table 1).

Table 1 From a purely empirical perspective, the datasets considered in the present contribution are characterised by a small link density and different percentages of positive and negative links: the connectance of U.S. Senate and U.S. House of Representatives amounts to ≃0.1 while ≃55% of links is positive; the connectance of FilmTrust amounts to ≃0.01 while ≃80% of links is positive

Full size table

BIC minimisation confirms the presence of modules seemingly obeying the statistical variant of the RBT^8,18, as a non-negligible number of negative links is found not only between clusters but within clusters as well—the vast majority of links within the second block from the right-bottom angle of the (adjacency matrix of the) zero-deflated naive projection of U.S. Senate is, in fact, negative—and a non-negligible number of positive links is found not only within clusters but between clusters as well—the vast majority of links between the first and the second block from the right-bottom angle of the (adjacency matrix of the) zero-deflated naive projection of U.S. Senate is, in fact, positive.

As a last observation, let us stress that the large link density of the zero-inflated naive projections of U.S. Senate and U.S. House of Representatives does not prevent BIC minimisation from detecting statistically significant mesoscopic structures: its sensitivity to the density of signed links⁸, in fact, makes it capable of revealing the 6 different blocks constituting the zero-inflated naive projection of U.S. Senate and the 2 different blocks constituting the zero-inflated naive projection of U.S. House of Representatives, both depicted in Fig. 4.

Fig. 4: Pictorial representation of the projections of U.S. Senate (top panels) and U.S. House of Representatives (bottom panels), obtained within the zero-inflated scheme, i.e., the naive ones (left panels), the ones induced by the BiSRGM-FT (middle panels), and the ones induced by the BiSCM-FT (right panels).

Validated projection of real-world configurations within the zero-deflated scheme

Although summing motifs while retaining their own sign is, in a sense, enough to obtain sparse projections, the validation procedure proposed in this paper (also) aims at enhancing the identification of patterns encoding non-trivial information about the original structure.

For what concerns U.S. Senate, minimising BIC on the projection filtered via the global statistical benchmark named BiSRGM-FT refines the picture provided by minimising BIC on the naive projection, confirming the presence of a larger number of modules—more precisely, 10 as shown in Fig. 4. Interestingly enough, all such modules are characterised by (a vast majority of) positive links, although positive links are found between modules as well. Finally, minimising BIC on the projection filtered via the local statistical benchmark named BiSCM-FT returns a picture lying, somehow, halfway between the naive one and the one filtered via the BiSRGM-FT, as 8 modules are, now, detected; contrarily to what has been revealed with the aid of the global filter, however, applying the local filter returns a picture where negative links are found within modules as well.

The entire process is even more evident when considering U.S. House of Representatives, whose mesoscopic structure gets progressively resolved into an increasing number of increasingly negative modules (i.e., 2, 5, 6).

Overall, thus, our results confirm that the revealed modular structure seems to align better with the statistical variant of the RBT than with the statistical variant of its traditional counterpart (TBT), consistently across the different projections.

Validated projection of real-world configurations within the zero-inflated scheme

The considerations above are confirmed by the zero-inflated projections depicted in Fig. 2. For example, progressively filtering FilmTrust leads from a fully connected network to configurations whose connectance practically halves at each step. More in detail, applying the BiSRGM allows two different blocks to emerge—a result similar in spirit to the one that has been reported in ref. ³⁶, where the global statistical benchmark induces a rough partition of the system under consideration. The filtering induced by the BiSCM is, instead, more severe and qualitatively different: the projection obtained is, in fact, much sparser and populated by a comparable number of positive and negative links.

Interestingly, filtering U.S. Senate with the BiSRGM lets BIC identify the same number of modules characterising the naive projection—they are 6 in both cases—but with a different arrangements of links; this is even more evident when considering the projection induced by the BiSCM, characterised by just three modules. On the contrary, filtering U.S. House of Representatives means rising (even substantially) the number of clusters while cutting half of the edges populating its naive projection.

Discussion

A straightforward approach to project a bipartite network in the unsigned setting is that of comparing the number of neighbours shared by any two nodes i and j, i.e.

$${V}_{ij}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }{b}_{j\alpha }={[{\bf{B}}\cdot {{\bf{B}}}^{T}]}_{ij}={{\bf{r}}}_{i}\cdot {({{\bf{r}}}_{j})}^{T},$$

(7)

with the one predicted under a chosen benchmark³⁶. In the signed setting, instead, two complementary approaches can be devised, treating the missing ties in a different way: while the zero-deflated projection scheme ignores them, the zero-inflated projection scheme accounts for them as well. Loosely speaking, the zero-deflated projection scheme returns configurations that are sparser than the configurations returned by the zero-inflated projection scheme although the latter are populated by a larger number of negative links; both schemes, however, enhance the detection of mesoscopic structures, the filtered projections being characterised by a larger number of modules than the naive ones.

Additionally, we have compared the patterns revealed by minimising BIC with the ones revealed by minimising the frustration F, defined as

$$F({\boldsymbol{\sigma }})={L}_{\bullet }^{-}+{L}_{\circ }^{+}$$

(8)

according to the traditional balance theory⁸, i.e., counting the number of negative links within modules (indicated with a filled dot—see the first addendum) plus the number of positive links between modules (indicated with an empty dot—see the second addendum). The differences are highlighted in Fig. 5, showing how the optimisation of F basically returns an oversimplified picture leading, for instance, to partition the projection of U.S. House of Representatives induced by the BiSRGM within the zero-inflated scheme into 32 modules. The explanation lies in the sensitivity of F solely towards the signed membership: as the signed density is completely ignored, the groups of nodes dominated by negative links are split into singletons—and, by converse, the nodes connected by positive links are grouped together. Another example is provided by the projection of U.S. House of Representatives induced by the BiSRGM within the zero-deflated scheme: such a configuration is, now, partitioned into 2 modules, the signed membership leading the algorithm to disregard the (internal) hierarchical structure of these clusters. Similar results are found when considering the other kinds of projections.

Fig. 5: Pictorial representation of the projections of the U.S. House of Representatives, obtained within the zero-deflated scheme (first and second column) and the zero-inflated scheme (third and fourth column), induced by the BiSRGM-FT.

A more quantitative comparison can be carried out upon calculating the Wallace, Rand and Jaccard Index, that sum up the coefficients populating the confusion matrix to return three compact measures of similarity between partitions. As illustrated in Appendix F, the three aforementioned indices confirm that the mesoscale structures spotted by the BIC-based and the F-based recipes are more similar whenever a larger number of positive links populates a given projection. If, on the contrary, the latter is defined by a large number of negative links, the F-based recipe outputs many ‘false negatives’, i.e., pairs of nodes that are separated solely because they are found to be connected by a negative link. Such a result sheds further light on the observation made in refs. ¹³ and ⁸ about the (potential) ambiguity concerning the variant of the balance theory best supported by the data: if the intuitive definition of modules as ‘densely connected groups of nodes’ is extended to the signed case, then the recipe prescribing to minimise BIC should be preferred. If, on the other hand, one seeks the configuration best aligning with the TBT, then the recipe prescribing to minimise F should be preferred—although not specifically designed to spot (statistically significant) mesoscale structures.

A very last observation concerns the ‘sign prediction’ problem. In⁴⁷, such an issue is addressed within the framework of the TBT, i.e., under the assumption that signed networks evolve towards balance: within such a context, sign prediction is carried out by combining the edge-based definition of Katz centrality with the minimisation of frustration. In⁴⁸, instead, an approach based upon the ensemble of random graphs induced by the hypergeometric distribution is employed. Other methods are based upon machine learning techniques^{23,49,50,51,52,53}. Our algorithm can, in a sense, be understood as implementing an unsupervised ‘white box’ method for predicting the sign of a link: interestingly, the prescription based upon the signature of its endpoints—summarisable with the motto ‘a significantly large number of concordant motifs induces a + 1 and a significantly large number of discordant motifs induces a − 1’—formalises the tendency towards balance, advocated by other approaches, within the bipartite context.

Methods

Formalism and basic quantities

A binary undirected bipartite signed network is completely defined by its biadjacency matrix, i.e., a rectangular table B whose dimensions will be indicated with M and N, M being the number of nodes in the top layer (i.e. the number of columns of B) and N being the number of nodes in the bottom layer (i.e. the number of rows of B).

Each edge can be positive, negative or missing: since we will focus on binary networks, each edge will be ‘+1’, ‘−1’ or ‘0’. More formally, for any two nodes i and α, the corresponding entry of the biadjacency matrix will be assumed to read b_iα = −1, 0, +1.

To ease mathematical manipulations, let us define the three quantities reading

$${b}_{i\alpha }^{-}=[{b}_{i\alpha }=-1],\quad {b}_{i\alpha }^{0}=[{b}_{i\alpha }=0],\quad {b}_{i\alpha }^{+}=[{b}_{i\alpha }=+1],$$

(9)

where we have employed Iverson’s brackets notation¹³. These new variables are mutually exclusive, satisfy the relationship ${b}_{i\alpha }^{-}+{b}_{i\alpha }^{0}+{b}_{i\alpha }^{+}=1$, ∀ i, α and induce the three non-negative matrices B⁺, B⁰ and B⁻ obeying the relationships B = B⁺ −B⁻ and ∣B∣ = B⁺ + B⁻.

In the following, it will turn out to be useful to identify a biadjacency matrix with the set of its rows, i.e., ${\bf{B}}\equiv {\{{{\bf{r}}}_{i}\}}_{i = 1}^{N}$, and analogously for B⁺ and B⁻. Lastly, B^T indicates the transpose of the biadjacency matrix B.

Global connectivity and node degrees

The number of positive and negative links, respectively, read

$${L}^{+}=\mathop{\sum }\limits_{i=1}^{N}\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{+},\quad {L}^{-}=\mathop{\sum }\limits_{i=1}^{N}\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{-};$$

(10)

analogously, the positive and negative degree of node i are defined as

$${k}_{i}^{+}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{+},\quad {k}_{i}^{-}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{-}$$

(11)

while the positive and negative degree of node α are defined as

$${h}_{\alpha }^{+}=\mathop{\sum }\limits_{i=1}^{N}{b}_{i\alpha }^{+},\quad {h}_{\alpha }^{-}=\mathop{\sum }\limits_{i=1}^{N}{b}_{i\alpha }^{-}.$$

(12)

Naturally, ${L}^{+}=\mathop{\sum }\nolimits_{i=1}^{N}{k}_{i}^{+}=\mathop{\sum }\nolimits_{\alpha =1}^{M}{h}_{\alpha }^{+}$ and ${L}^{-}=\mathop{\sum }\nolimits_{i=1}^{N}{k}_{i}^{-}=\mathop{\sum }\nolimits_{\alpha =1}^{M}{h}_{\alpha }^{-}$. The advantage of adopting Iverson’s brackets is that of ensuring that each quantity is, now, computed on a matrix with positive entries, i.e., is positive as well.

The role of ambivalent patterns

The approaches to obtain a projection of a bipartite, signed network considered so far rest upon the calculation of any two nodes similarity. Evaluating such a quantity leads to two related problems, i.e., how to treat ambivalent patterns and missing ties.

Ambivalence is introduced in⁵⁴ and defined as a ‘conjunction of positive and negative relations that are psychologically secondary or derived’; in ref. ⁵⁵, Cartwright and Harary suggest that ‘attitudes of ambivalence should be unstable, changing to positive or negative attitudes so as to satisfy the criteria of balance’. In the bipartite setting we are considering here, ambivalence emerges in two, different cases: i) whenever nodes i and j establish motifs whose signature is either (+/−) or (−/+); ii) whenever nodes i and j establish the same number of (+/+) and (−/−) motifs. In both cases, devising a recipe to determine the sign of the link in the corresponding monopartite projection is not immediate.

In ref. ⁵⁶, two different recipes are proposed. The first one is based on matrix multiplication and prescribes to consider the projections induced by positive and negative links separately. The second one rely on a ‘vertex duplication’ mechanism, according to which each node belonging to the layer of interest originates a positive copy, gathering the original positive links, and a negative copy, gathering the original negative links; these connections are treated as unsigned and the network is, then, projected. Finally, the sign of the edges populating the monopartite projection is restored according to a so-called ‘vertex contraction’ rule.

In order to address the problem of the ambivalent patterns, let us define the ‘full’ dyadic motifs reading

$${V}_{ij}^{++}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{+}{b}_{j\alpha }^{+}={[({{\bf{B}}}^{+})\cdot {({{\bf{B}}}^{+})}^{T}]}_{ij}={{\bf{r}}}_{i}^{+}\cdot {({{\bf{r}}}_{j}^{+})}^{T},$$

(13)

that counts the number of nodes (belonging to the second layer) to which the nodes i and j (belonging to the first layer) are both connected via two positive links, and

$${V}_{ij}^{--}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{-}{b}_{j\alpha }^{-}={[({{\bf{B}}}^{-})\cdot {({{\bf{B}}}^{-})}^{T}]}_{ij}={{\bf{r}}}_{i}^{-}\cdot {({{\bf{r}}}_{j}^{-})}^{T},$$

(14)

that counts the number of nodes (belonging to the second layer) to which the nodes i and j (belonging to the first layer) are both connected via two negative links. It is quite intuitive to ascribe them to the class of the so-called concordant motifs, i.e., patterns capturing the ‘agreement’ between any two users establishing them: our proposal is, thus, that of connecting any two users establishing a significantly large number of concordant motifs with a + 1.

Analogously, it is quite intuitive to ascribe the expressions

$${V}_{ij}^{+-}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{+}{b}_{j\alpha }^{-}={[({{\bf{B}}}^{+})\cdot {({{\bf{B}}}^{-})}^{T}]}_{ij}={{\bf{r}}}_{i}^{+}\cdot {({{\bf{r}}}_{j}^{-})}^{T}$$

(15)

and

$${V}_{ij}^{-+}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{-}{b}_{j\alpha }^{+}={[({{\bf{B}}}^{-})\cdot {({{\bf{B}}}^{+})}^{T}]}_{ij}={{\bf{r}}}_{i}^{-}\cdot {({{\bf{r}}}_{j}^{+})}^{T},$$

(16)

that count the number of nodes (belonging to the second layer) to which the nodes i and j (belonging to the first layer) are connected with a positive and a negative link, to the class of the so-called discordant motifs, i.e., patterns capturing the ‘disagreement’ between any two users establishing them: our proposal is, thus, that of connecting any two users establishing a significantly large number of discordant motifs with a − 1.

The role of missing ties

Let us, now, address the problem of the missing ties. To this aim, let us define the ‘partial’ dyadic motifs reading

$${V}_{ij}^{00}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{0}{b}_{j\alpha }^{0}={[({{\bf{B}}}^{0})\cdot {({{\bf{B}}}^{0})}^{T}]}_{ij}={{\bf{r}}}_{i}^{0}\cdot {({{\bf{r}}}_{j}^{0})}^{T},$$

(17)

that counts the number of nodes (belonging to the second layer) to which the nodes i and j (belonging to the first layer) are both not connected,

$${V}_{ij}^{0+}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{0}{b}_{j\alpha }^{+}={[({{\bf{B}}}^{0})\cdot {({{\bf{B}}}^{+})}^{T}]}_{ij}={{\bf{r}}}_{i}^{0}\cdot {({{\bf{r}}}_{j}^{+})}^{T}$$

(18)

and

$${V}_{ij}^{+0}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{+}{b}_{j\alpha }^{0}={[({{\bf{B}}}^{+})\cdot {({{\bf{B}}}^{0})}^{T}]}_{ij}={{\bf{r}}}_{i}^{+}\cdot {({{\bf{r}}}_{j}^{0})}^{T},$$

(19)

that counts the number of nodes (belonging to the second layer) to which node i (node j) is not connected and node j (node i) is connected with a positive link,

$${V}_{ij}^{0-}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{0}{b}_{j\alpha }^{-}={[({{\bf{B}}}^{0})\cdot {({{\bf{B}}}^{-})}^{T}]}_{ij}={{\bf{r}}}_{i}^{0}\cdot {({{\bf{r}}}_{j}^{-})}^{T}$$

(20)

and

$${V}_{ij}^{-0}=\mathop{\sum }\limits_{\alpha =1}^{M}{b}_{i\alpha }^{-}{b}_{j\alpha }^{0}={[({{\bf{B}}}^{-})\cdot {({{\bf{B}}}^{0})}^{T}]}_{ij}={{\bf{r}}}_{i}^{-}\cdot {({{\bf{r}}}_{j}^{0})}^{T},$$

(21)

that count the number of nodes (belonging to the second layer) to which node i (node j) is not connected and node j (node i) is connected with a negative link. While we are led to consider ${V}_{ij}^{00}$ as a signature of concordance (both nodes have no reason to connect to any of the nodes belonging to the opposite layer), we are also led to consider ${V}_{ij}^{0+}$, ${V}_{ij}^{+0}$, ${V}_{ij}^{0-}$, ${V}_{ij}^{-0}$ as a signature of discordance (one of the nodes has no reason to connect to any of the nodes belonging to the opposite layer while the other has (This is also logically coherent with the following situation: imagine i likes item α, j has not a connection with it and k dislikes it. Should j be considered as agreeing with both i and k, they also should for transitivity. But this is definitely not the case.)). The rightmost members indicate that the same numbers can be obtained by taking the scalar products of the rows, indexed by i and j, of the biadjacency matrices B⁺, B⁰, and B⁻.

In summary, while accounting for ${V}_{ij}^{++}$, ${V}_{ij}^{--}$, ${V}_{ij}^{+-}$ and ${V}_{ij}^{-+}$ will lead to a zero-deflated projection scheme, where signs are solely determined by the ‘full’ dyadic motifs, also considering ${V}_{ij}^{00}$, ${V}_{ij}^{0+}$, ${V}_{ij}^{+0}$, ${V}_{ij}^{0-}$ and ${V}_{ij}^{-0}$ will lead to a zero-inflated projection scheme, where signs are determined by the ‘partial’ dyadic motifs too. For a pictorial illustration of the aforementioned ‘full’ and ‘partial’ dyadic motifs, we refer the reader to Figs. 6 and 7.

**Fig. 6: Pictorial representation of a bipartite signed network with blue (positive) and red (negative) edges.**

**Fig. 7: Pictorial representation of a bipartite signed network with blue (positive), red (negative), and grey (missing) edges.**

A scheme for the statistical validation of bipartite signed networks

Schematically, our validation algorithm works as follows:

A.
Focus on a specific pair of nodes belonging to the layer of interest, say i and j, and measure their similarity (see Section IV F);
B.
Quantify the statistical significance of the measured similarity, with respect to a properly defined benchmark, by computing the corresponding p value, say p_ij (see Section IV G);
C.
Repeat the step above for each pair of nodes;
D.
Apply a multiple-hypothesis testing procedure and connect the nodes i and j if and only if significantly similar (see Section IV H).

Let us stress that such a scheme is valid for both variants of our projection algorithm.

Step #1. Quantifying the similarity of any two nodes

Let us divide the next three sections in two subsections each: the first one will be devoted to devise a recipe for projecting a bipartite network that ignores the dyadic motifs constituted by, at least, one missing tie (the bipartite topology is considered ‘fixed’ and only the signs of the ‘full’ dyadic motifs are accounted for); the second one will be devoted to devise a recipe for projecting a bipartite network that accounts for the dyadic motifs constituted by, at least, one missing tie as well (the bipartite topology is considered ‘free’ and the signs of both the ‘full’ and the ‘partial’ dyadic motifs are accounted for). To avoid confusion, the quantities defined within the first framework will be underlined.

Zero-deflated projection scheme

The first step of our method prescribes measuring the degree of similarity of nodes i and j. To this aim, let us consider the quantity named signature and defined as:

$$\begin{array}{ll}{{S}_{ij}}=\mathop{\sum }\limits_{{\alpha =1}}^{{V}_{ij}}{b}_{i\alpha }{b}_{j\alpha }\\ \qquad=\mathop{\sum }\limits_{{\alpha =1}}^{{V}_{ij}}[({b}_{i\alpha }^{+}{b}_{j\alpha }^{+}+{b}_{i\alpha }^{-}{b}_{j\alpha }^{-})-({b}_{i\alpha }^{+}{b}_{j\alpha }^{-}+{b}_{i\alpha }^{-}{b}_{j\alpha }^{+})]\\ \qquad=\mathop{\sum }\limits_{{\alpha =1}}^{{V}_{ij}}({C}_{ij\alpha }-{D}_{ij\alpha })\\ \qquad={{C}_{ij}}-{{D}_{ij}}\end{array}$$

(22)

where the sum runs over the ‘full’ V-motifs (The symbol $\mathop{\sum }\nolimits_{\underline{\alpha =1}}^{{V}_{ij}}(\ldots \,)$ indicates that the sum runs over the connected pairs of nodes and is equivalent to $\mathop{\sum }\nolimits_{\alpha =1}^{M}| {b}_{i\alpha }{b}_{j\alpha }| (\ldots \,)$.), i.e., ${V}_{ij}\equiv \mathop{\sum }\nolimits_{\alpha =1}^{M}| {b}_{i\alpha }{b}_{j\alpha }|$. In words, the signature is the difference between two quantities, i.e., the concordance of nodes i and j, reading

$$\underline{{C}_{ij}}=\mathop{\sum }\limits_{\underline{\alpha =1}}^{{V}_{ij}}{C}_{ij\alpha }=\mathop{\sum }\limits_{\underline{\alpha =1}}^{{V}_{ij}}({b}_{i\alpha }^{+}{b}_{j\alpha }^{+}+{b}_{i\alpha }^{-}{b}_{j\alpha }^{-})={\underline{{V}_{ij}}}^{++}+{\underline{{V}_{ij}}}^{--}$$

(23)

and counting the number of ‘full’ concordant motifs, and the discordance of nodes i and j, reading

$$\underline{{D}_{ij}}=\mathop{\sum }\limits_{\underline{\alpha =1}}^{{V}_{ij}}{D}_{ij\alpha }=\mathop{\sum }\limits_{\underline{\alpha =1}}^{{V}_{ij}}({b}_{i\alpha }^{+}{b}_{j\alpha }^{-}+{b}_{i\alpha }^{-}{b}_{j\alpha }^{+})={\underline{{V}_{ij}}}^{+-}+{\underline{{V}_{ij}}}^{-+}$$

(24)

and counting the number of ‘full’ discordant motifs. As Fig. 6 shows, the pairs of nodes establishing motifs that are accounted for in the zero-deflated scheme are ij and in, establishing a (+/−) motif; il and im, establishing a (−/+) motif; jn, establishing a (−/−) motif; lm, establishing a (+/+) motif.

A naive way of projecting a bipartite signed network would prescribe to stop here and apply the sign function to the signature, hence connecting nodes i and j with a positive link if ${{S}_{ij}}\, > \,0$, i.e., ${{C}_{ij}} \,> \,{{D}_{ij}}$, and with a negative link if ${{S}_{ij}}\, < \,0$, i.e., ${{C}_{ij}}\, < \,{{D}_{ij}}$. More compactly,

(25)

Zero-inflated projection scheme

Let us, now, measure the degree of similarity of nodes i and j within the zero-inflated projection scheme. To this aim, let us consider the novel definition of signature reading

$${S}_{ij}=\mathop{\sum }\limits_{\alpha =1}^{M}({C}_{ij\alpha }-{D}_{ij\alpha })={C}_{ij}-{D}_{ij}$$

(26)

where the concordance between nodes i and j, now, reads

$${C}_{ij}=\mathop{\sum }\limits_{\alpha =1}^{M}{C}_{ij\alpha }={V}_{ij}^{++}+{V}_{ij}^{--}+{V}_{ij}^{00}$$

(27)

and the discordance between nodes i and j, now, reads

$$\begin{array}{ll}{D}_{ij}=\mathop{\sum }\limits_{\alpha =1}^{M}{D}_{ij\alpha }\\\qquad={V}_{ij}^{+-}+{V}_{ij}^{-+}+{V}_{ij}^{0+}+{V}_{ij}^{+0}+{V}_{ij}^{0-}+{V}_{ij}^{-0}.\end{array}$$

(28)

As Fig. 7 shows, the pairs of nodes establishing ‘full’ motifs are ij and in, establishing a (+/−) motif; il and im, establishing a (−/+) motif; jn, establishing a (−/−) motif; lm, establishing a (+/+) motif. Moreover, node i establishes a (0/+) motif with node j via node α, a (0/0) motif with nodes k, l, m and n via node α, a (+/0) motif with any other node via node β, a (0/−) motif with node k via node δ, etc.

A naive way of projecting a bipartite signed network would prescribe to stop here and apply the sign function to the signature, connecting nodes i and j with a positive link if S_ij > 0, i.e., C_ij > D_ij, and with a negative link if S_ij < 0, i.e., C_ij < D_ij. More compactly,

(29)

Notice that such a naive projection would be denser than the naive one induced by the zero-deflated definition of signature, the only possibility of observing being that of having C_ij = D_ij.

Step #2. Quantifying the statistical significance of similarity

Zero-deflated projection scheme

The second step of our method prescribes to evaluate the statistical significance of our nodes similarity. To this aim, let us find the probability distribution obeyed by $\underline{{S}_{ij}}$, after noticing that

$$-{V}_{ij}\le \underline{{S}_{ij}}\le {V}_{ij}$$

(30)

where $\underline{{S}_{ij}}=-{V}_{ij}$ if $\underline{{C}_{ij}}=0$ (i.e., if each ‘full’ V-motif is composed by a −1 and a +1) and ${{S}_{ij}}={V}_{ij}$ if ${{D}_{ij}}=0$ (i.e., if each ‘full’ V-motif is composed by either two −1s or two +1s).

Let us, now, keep the formalism as general as possible and treat links as independent non-identically distributed (i.n.i.d.) random variables. This amounts at considering the finite scheme (Such a notation, introduced by Khintchine in Mathematical Foundations of Information Theory⁵⁷, compactly represents a discrete probability distribution, by listing its support on the first row and the related probability coefficients on the second row.)

$${b}_{i\alpha } \sim \left(\begin{array}{ll}-1&+1\\ 1-{p}_{i\alpha }^{+}&{p}_{i\alpha }^{+}\end{array}\right)$$

(31)

∀ i, α such that ∣b_iα∣ = 1, further inducing

$$\begin{array}{l}{b}_{i\alpha }{b}_{j\alpha } \sim \left(\begin{array}{ll}-1&+1\\ {p}_{i\alpha }^{+}+{p}_{j\alpha }^{+}-2{p}_{i\alpha }^{+}{p}_{j\alpha }^{+}&1-{p}_{i\alpha }^{+}-{p}_{j\alpha }^{+}+2{p}_{i\alpha }^{+}{p}_{j\alpha }^{+}\end{array}\right)\\ \qquad\quad=\left(\begin{array}{ll}-1&+1\\ 1-{q}_{ij\alpha }^{+}&{q}_{ij\alpha }^{+}\end{array}\right)\end{array}$$

(32)

∀ i, j, α such that ∣b_iαb_jα∣ = 1.

Since ${{S}_{ij}}$ is the sum of i.n.i.d. Bernoulli random variables, the probability distribution obeyed by it is the Poisson-binomial

$$P({{S}_{ij}}=s)=\sum _{{C}_{k}\in {{\mathcal{C}}}_{k}}\left[\prod _{\nu \in {C}_{k}}{q}_{ij\nu }^{+}\prod _{\tau \notin {C}_{k}}(1-{q}_{ij\tau }^{+})\right]$$

(33)

where ${{\mathcal{C}}}_{k}$ is the set of all possible k-tuples of which ν and τ are instances. As we will see in Section IV I, this is precisely the case of a benchmark enforcing linear local constraints.

The formula above simplifies in the case of a benchmark enforcing linear global constraints, the links becoming independent identically distributed (i.i.d.) random variables and the Poisson-binomial above reducing to a binomial.

For more details about the benchmarks, see Sections IV I 1, IV I 1 and Appendix A.

Zero-inflated projection scheme

Let us, now, find the probability distribution obeyed by S_ij, after noticing that

$$-M\le {S}_{ij}\le M$$

(34)

where S_ij = −M if C_ij = 0 (i.e. if each V-motif is composed by a −1 and a +1) and S_ij = M if D_ij = 0 (i.e. if each V-motif is composed by either two − 1s or two + 1s).

As in the previous subsection, let us treat links link as i.n.i.d. random variables. Within the zero-inflated projection scheme, this amounts at replacing Eq. (31) with

$${b}_{i\alpha } \sim \left(\begin{array}{lll}-1&0&+1\\ {p}_{i\alpha }^{-}&{p}_{i\alpha }^{0}&{p}_{i\alpha }^{+}\end{array}\right)$$

(35)

∀ i, α, further inducing

$${C}_{ij\alpha }-{D}_{ij\alpha } \sim \left(\begin{array}{ll}-1&+1\\ 1-{q}_{ij\alpha }^{+}&{q}_{ij\alpha }^{+}\end{array}\right)$$

(36)

with

$${q}_{ij\alpha }^{+}={q}_{ij\alpha }^{++}+{q}_{ij\alpha }^{--}+{q}_{ij\alpha }^{00},$$

(37)

$$1-{q}_{ij\alpha }^{+}={q}_{ij\alpha }^{+-}+{q}_{ij\alpha }^{-+}+{q}_{ij\alpha }^{0+}+{q}_{ij\alpha }^{+0}+{q}_{ij\alpha }^{0-}+{q}_{ij\alpha }^{-0}$$

(38)

where, for example, the coefficient ${q}_{ij\alpha }^{++}$ induces the finite scheme

$${b}_{i\alpha }^{+}{b}_{j\alpha }^{+} \sim \left(\begin{array}{ll}0&+1\\ 1-{p}_{i\alpha }^{+}{p}_{j\alpha }^{+}&{p}_{i\alpha }^{+}{p}_{j\alpha }^{+}\end{array}\right)=\left(\begin{array}{ll}0&+1\\ 1-{q}_{ij\alpha }^{++}&{q}_{ij\alpha }^{++}\end{array}\right)$$

(39)

and analogously for the others.

As a consequence, S_ij becomes a sum of i.n.i.d. Bernoulli random variables obeying the Poisson-binomial reading

$$P({S}_{ij}=s)=\sum _{{C}_{k}\in {{\mathcal{C}}}_{k}}\left[\prod _{\nu \in {C}_{k}}{q}_{ij\nu }^{+}\prod _{\tau \notin {C}_{k}}(1-{q}_{ij\tau }^{+})\right]$$

(40)

where ${{\mathcal{C}}}_{k}$ is the set of all possible k-tuples of which ν and τ are instances. As in the previous subsection, if the benchmark induced by linear local constraints is replaced by the benchmark induced by linear global constraints, the links become i.i.d. random variables and the Poisson-binomial reduces to a binomial.

For more details about the benchmarks, see Sections IV I 2, IV I 2 and Appendix B.

Quantifying the statistical significance of similarity

Once we have calculated the distribution for each pair of nodes belonging to the layer of interest, we have to calculate the statistical significance of the empirical signature: as we have a signed quantity, we need both tails of such a distribution to carry out what is known as two-sided test of hypothesis. In what follows we will focus on $\underline{{S}_{ij}}$ but the same considerations hold true for S_ij.

Basically, we need to answer the two related questions i) is the empirical value of the signature significantly different from the one expected under the chosen benchmark? and ii) if so, is the deviation negative (hence, the signature is significantly smaller than expected) or positive (hence, the signature is significantly larger than expected)?

The first question can be answered upon calculating the two-sided p value reading

$${p}_{ij}=2\cdot \min \left\{F(\underline{{S}_{ij}^{* }}),1-F(\underline{{S}_{ij}^{*}})\right\}$$

(41)

with ${S}_{ij}^{* }$ being the empirical value of the signature and F being the cumulative distribution function, defined as $F(\underline{{S}_{ij}^{* }})=\sum _{x\le \underline{{S}_{ij}^{* }}}P(\underline{{S}_{ij}}=x)$: such a number evaluates the probability of observing a deviation from the expected value in either directions.

The second question can be answered upon calculating the sign of such a deviation, by determining if either

$$F({{S}_{ij}^{* }})\, < \,1-F({{S}_{ij}^{* }})$$

(42)

$$F({{S}_{ij}^{*}})\, > \,1-F({{S}_{ij}^{*}})$$

(43)

holds true. In the first case, $F({{S}_{ij}^{* }}) \,< \,1/2$, the empirical value of the signature is smaller than the median of the distribution and the deviation is negative; in the second case, $F({{S}_{ij}^{* }})\, > \,1/2$, the empirical value of the signature is larger than the median of the distribution and the deviation is positive. Upon indicating the threshold individuated by the multiple-hypothesis testing procedure with p_th (It represents the largest p value that satisfies the False Discovery Rate rejection criterion and is defined as ${p}_{th}=\hat{i}t/| H|$, with $\hat{i}$ being the largest integer satisfying the condition ${\text{}p-\text{value}}_{\hat{i}}\le {p}_{th}$, t is the single-test significance level and ∣H∣ is the total number of tested hypotheses (see also Appendix C).), the following cases can be met:

The two conditions $F({{S}_{ij}^{* }})\, < \,1/2$ and p_ij ≤ p_th indicate that nodes i and j have established a number of ‘full’ discordant motifs that is so large to induce a significantly negative signature—and, potentially, a negative link in the projection (a_ij = −1);
The two conditions $F({{S}_{ij}^{* }})\, > \,1/2$ and p_ij ≤ p_th indicate that nodes i and j have established a number of ‘full’ concordant motifs that is so large to induce a significantly positive signature—and, potentially, a positive link in the projection (a_ij = +1);
The condition p_ij > p_th individuates a value of the signature (i.e. of ‘full’ concordant/discordant motifs) that is compatible with the one predicted by the chosen benchmark: stated otherwise, the empirical number of motifs could have been observed in configurations generated by the benchmark itself—hence, inducing a null link in the projection (a_ij = 0).

For a graphical representation of our validation procedure, see Fig. 8.

**Fig. 8: Probability distribution of the signature () and its Gaussian approximation ().**

Step #3. Validating the projection

The second step of our method returns a symmetric matrix of p values. Individuating the ones associated with the hypotheses to be actually rejected requires a procedure to deal with the comparison of multiple hypotheses at the same time. In very general terms, a threshold has to be set: if the specific p value is smaller than such a threshold, the associated event is interpreted as statistically significant and the corresponding nodes are connected in the projection.

In the present paper, we apply the so-called False Discovery Rate (FDR) procedure⁵⁸, allowing one to control for the expected number of ‘false discoveries’ (i.e. incorrectly rejected null hypotheses or incorrectly validated links), irrespectively of the independence of the hypotheses tested.

For more details about the FDR procedure, see Appendix C.

Maximum entropy benchmarks for bipartite signed networks

The second step of our method prescribes to quantify the statistical significance of the similarity of any two nodes i and j. To this aim, a statistical benchmark is needed. A natural choice leads to the adoption of models belonging to the class of the so-called Exponential Random Graphs, induced by the constrained maximisation of the Shannon entropy^{22,59,60,61,62,63}

$$S=-\sum _{{\bf{B}}\in {\mathbb{B}}}P({\bf{B}})\ln P({\bf{B}}),$$

(44)

the sum running over a properly defined ensemble of configurations. Within such a framework, the generic bipartite network B is assigned the probability

$$P({\bf{B}})=\frac{{e}^{-H({\boldsymbol{\theta }},{\boldsymbol{C}}({\bf{B}}))}}{Z({\boldsymbol{\theta }})}$$

(45)

whose value is determined by the vector C(B) = {C_i(B)} of topological constraints via the Hamiltonian reading $H({\boldsymbol{\theta }},{\boldsymbol{C}}({\bf{B}}))=\sum _{i}{\theta }_{i}{C}_{i}({\bf{B}})$. In what follows, we will employ linear constraints: beside allowing our optimisation problem to be analytically solved in these cases—P(B) can be written in a factorised form, i.e., as a product of pair-specific probability coefficients—they individuate the ‘right’ amount of information to be accounted for to obtain a projection: since we are interested in evaluating the statistical significance of co-occurrences, one must discount the tendency of nodes to establish connections with many/few neighbours.

In order to determine the unknown parameters θ, the likelihood-maximisation recipe can be adopted: given an observed biadjacency matrix B^*, it translates into solving the system of equations

$$\langle {C}_{i}\rangle ({\boldsymbol{\theta }})=\sum _{{\bf{B}}\in {\mathbb{B}}}P({\bf{B}}){C}_{i}({\bf{B}})={C}_{i}({{\bf{B}}}^{* }),\quad \forall \,i$$

(46)

which prescribes to equate each ensemble average, e.g., 〈C_i〉(θ), to its observed counterpart, i.e., C_i(B^*) ^{22,61,62,63,64}.

Zero-deflated projection scheme

The first two benchmarks determine the coefficients of the two variants of the finite scheme defined in Eq. (31). As already stressed, dyadic motifs constituted by, at least, one missing tie are ignored, here: the bipartite topology is, thus, considered ‘fixed’. Upon indicating the total number of connected pairs of nodes with $L=\mathop{\sum }\nolimits_{i=1}^{N}\mathop{\sum }\nolimits_{\alpha =1}^{M}| {b}_{i\alpha }| ={L}^{+}+{L}^{-}$ and considering that any node pair can be either positively or negatively connected, the ensemble is constituted by $| {\mathbb{B}}| ={2}^{L}$ possible configurations in both cases.

Fixed-topology Bipartite signed random graph model

The Fixed-topology Bipartite Signed Random Graph Model (BiSRGM-FT) is defined by two global constraints, i.e., the total number of positive and negative links: the corresponding Hamiltonian, thus, reads

$$H({\boldsymbol{\theta }},{\bf{B}})={\beta }^{{\prime} }{L}^{+}({\bf{B}})+{\gamma }^{{\prime} }{L}^{-}({\bf{B}}).$$

(47)

Keeping the topology of the network under analysis fixed while (solely) randomising the edge signs implies that the role of random variables is played by the entries of the biadjacency matrix corresponding to the connected pairs of nodes, i.e., the ones for which ∣b_iα∣ = 1. Let us, however, notice that the Hamiltonian in Eq. (47) can be re-written as

$$H({\boldsymbol{\theta }},{\bf{B}})={\beta }^{{\prime} }{L}^{+}({\bf{B}})+{\gamma }^{{\prime} }(L-{L}^{+})({\bf{B}})$$

(48)

$$=({\beta }^{{\prime} }-{\gamma }^{{\prime} }){L}^{+}({\bf{B}})+{\gamma }^{{\prime} }L({\bf{B}})$$

(49)

and further simplified into

$$H({\boldsymbol{\theta }},{\bf{B}})=\beta {L}^{+}({\bf{B}})$$

(50)

where the constant term has been dropped, as it does not affect the computation of the probability coefficients, and the only relevant Lagrange multiplier has been re-named⁶⁵. The generic entry, thus, obeys

$$\begin{array}{rcl}P({b}_{i\alpha }=+1)&=&\displaystyle\frac{{e}^{-\beta }}{1+{e}^{-\beta }}={p}^{+},\\ P({b}_{i\alpha }=-1)&=&\displaystyle\frac{1}{1+{e}^{-\beta }}=1-{p}^{+};\end{array}$$

in words, each entry satisfying ∣b_iα∣ = 1 obeys a Bernoulli distribution whose probability coefficients are determined by the imposed constraints: each existing link is assigned a ‘plus one’ with probability p⁺ and a ‘minus one’ with probability p⁻.

To employ the BiSRGM-FT for studying real-world networks, the parameters that define it need to be properly tuned. More specifically, one needs to ensure

$${\left\langle {L}^{+}\right\rangle }_{{\rm{BiSRGM}}\text{-}{\rm{FT}}}={L}^{+}({{\bf{B}}}^{* })$$

(51)

with the symbol B^* indicating the empirical network under analysis. The maximisation of the likelihood function ${{\mathcal{L}}}_{{\rm{BiSRGM}}\text{-}{\rm{FT}}}\equiv \ln {P}_{{\rm{BiSRGM}}\text{-}{\rm{FT}}}({{\bf{B}}}^{* })$ with respect to the unknown parameters that define it leads us to find

$${p}^{+}={L}^{+}({{\bf{B}}}^{* })/L({{\bf{B}}}^{* }).$$

(52)

For an alternative, yet equivalent, resolution of the problem, see Appendix D.

Fixed-topology bipartite signed configuration model

The Fixed-topology Bipartite Signed Configuration Model (BiSCM-FT), instead, is defined by local constraints, i.e., the positive and negative degree sequences: the corresponding Hamiltonian, thus, reads

$$\begin{array}{ll}H({\boldsymbol{\theta }},{\bf{B}})=\mathop{\sum }\limits_{i=1}^{N}[{\beta }_{i}^{{\prime} }{k}_{i}^{+}({\bf{B}})+{\gamma }_{i}^{{\prime} }{k}_{i}^{-}({\bf{B}})]\\ \qquad\qquad\quad+\mathop{\sum }\limits_{\alpha =1}^{M}[{\delta }_{\alpha }^{{\prime} }{h}_{\alpha }^{+}({\bf{B}})+{\eta }_{\alpha }^{{\prime} }{h}_{\alpha }^{-}({\bf{B}})];\end{array}$$

(53)

again, the role of random variables is played by the entries of the biadjacency matrix corresponding to the connected pairs of nodes and the Hamiltonian in Eq. (53) can be further simplified into

$$H({\boldsymbol{\theta }},{\bf{B}})=\mathop{\sum }\limits_{i=1}^{N}[{\beta }_{i}^{{\prime} }{k}_{i}^{+}({\bf{B}})+{\gamma }_{i}^{{\prime} }({k}_{i}-{k}_{i}^{+}({\bf{B}}))]$$

(54)

$$+\mathop{\sum }\limits_{\alpha =1}^{M}[{\delta }_{\alpha }^{{\prime} }{h}_{\alpha }^{+}({\bf{B}})+{\eta }_{\alpha }^{{\prime} }({h}_{\alpha }-{h}_{\alpha }^{+}({\bf{B}}))]$$

(55)

$$=\mathop{\sum }\limits_{i=1}^{N}{\beta }_{i}{k}_{i}^{+}({\bf{B}})+\mathop{\sum }\limits_{\alpha =1}^{M}{\delta }_{\alpha }{h}_{\alpha }^{+}({\bf{B}})$$

(56)

with obvious meaning of the symbols⁶⁵. The generic entry, now, obeys

$$P({b}_{i\alpha }=+1)=\frac{{e}^{-({\beta }_{i}+{\delta }_{\alpha })}}{1+{e}^{-({\beta }_{i}+{\delta }_{\alpha })}}={p}_{i\alpha }^{+},$$

(57)

$$P({b}_{i\alpha }=-1)=\frac{1}{1+{e}^{-({\beta }_{i}+{\delta }_{\alpha })}}=1-{p}_{i\alpha }^{+};$$

(58)

in words, given any two connected nodes i and α, their link is assigned a + 1 with probability ${p}_{i\alpha }^{+}$ and a − 1 with probability ${p}_{i\alpha }^{-}$.

To tune the parameters defining the BiSCM-FT, we can maximise the likelihood function ${{\mathcal{L}}}_{{\rm{BiSCM}}\text{-}{\rm{FT}}}\equiv \ln {P}_{{\rm{BiSCM}}\text{-}{\rm{FT}}}({{\bf{B}}}^{* })$ with respect to the unknown parameters that define it: such a recipe leads us to find

$${\langle {k}_{i}^{+}\rangle }_{{\rm{BiSCM}}\text{-}{\rm{FT}}}={k}_{i}^{+}({{\bf{B}}}^{* }),\quad \forall \,i,$$

(59)

$${\langle {h}_{\alpha }^{+}\rangle }_{{\rm{BiSCM}}\text{-}{\rm{FT}}}={h}_{\alpha }^{+}({{\bf{B}}}^{* }),\quad \forall \,\alpha .$$

(60)

The system above can be solved only numerically, along the guidelines provided in ref. ⁶⁶.

For an alternative, yet equivalent, resolution of the problem, see Appendices D and E. Here, we have solved it by employing the SIMONA Matlab-coded package, available at this URL.

Zero-inflated projection scheme

The other two benchmarks determine the coefficients of the two variants of the finite scheme defined in Eq. (35). As already stressed, dyadic motifs constituted by, at least, one missing tie are accounted for, here: the bipartite topology is, thus, considered ‘free’. Since the total number of node pairs is N ⋅ M and any node pair can be positively connected, negatively connected or disconnected, the ensemble is constituted by $| {\mathbb{B}}| ={3}^{N\cdot M}$ possible configurations in both cases.

Free-topology bipartite signed random graph model

The Free-topology Bipartite Signed Random Graph Model (BiSRGM)¹³ is induced by the same Hamiltonian inducing its fixed-topology counterpart, i.e., H(θ, B) = βL⁺(B) + γL⁻(B) but treating the topology as ‘free’ does not allow for any simplification. The generic entry obeys

$$P({b}_{i\alpha }=+1)=\frac{{e}^{-\beta }}{1+{e}^{-\beta }+{e}^{-\gamma }}={p}^{+},$$

(61)

$$P({b}_{i\alpha }=-1)=\frac{{e}^{-\gamma }}{1+{e}^{-\beta }+{e}^{-\gamma }}={p}^{-}$$

(62)

and p⁰ ≡ 1 − p⁻ − p⁺. In words, b_iα obeys a generalised Bernoulli distribution whose probability coefficients are determined by the imposed constraints. Each positive link appears with probability p⁺, each negative link appears with probability p⁻ and each missing link has a probability p⁰.

The maximisation of the likelihood function ${{\mathcal{L}}}_{{\rm{BiSRGM}}}=\ln {P}_{{\rm{BiSRGM}}}({{\bf{B}}}^{* })$ with respect to the unknown parameters that define it leads us to find

$${p}^{+}={L}^{+}({{\bf{B}}}^{* })/(N\cdot M),$$

(63)

$${p}^{-}={L}^{-}({{\bf{B}}}^{* })/(N\cdot M)$$

(64)

and p⁰ ≡ 1 − p⁻ − p⁺.

For more details, see Appendix D.

Free-topology bipartite signed configuration model

The Free-topology Bipartite Signed Configuration Model (BiSCM)¹³ is induced by the same Hamiltonian inducing its fixed-topology counterpart, i.e., $H({\boldsymbol{\theta }},{\bf{B}})=\mathop{\sum }\nolimits_{i=1}^{N}[{\beta }_{i}{k}_{i}^{+}({\bf{B}})+{\gamma }_{i}{k}_{i}^{-}({\bf{B}})]+\mathop{\sum }\nolimits_{\alpha =1}^{M}[{\delta }_{\alpha }{h}_{\alpha }^{+}({\bf{B}})+{\eta }_{\alpha }{h}_{\alpha }^{-}({\bf{B}})]$ but, as for the BiSRGM, it cannot be further simplified. The generic entry, thus, reads

$$P({b}_{i\alpha }=+1)=\frac{{e}^{-({\beta }_{i}+{\delta }_{\alpha })}}{1+{e}^{-({\beta }_{i}+{\delta }_{\alpha })}+{e}^{-({\gamma }_{i}+{\eta }_{\alpha })}}={p}_{i\alpha }^{+},$$

(65)

$$P({b}_{i\alpha }=-1)=\frac{{e}^{-({\gamma }_{i}+{\eta }_{\alpha })}}{1+{e}^{-({\beta }_{i}+{\delta }_{\alpha })}+{e}^{-({\gamma }_{i}+{\eta }_{\alpha })}}={p}_{i\alpha }^{-}$$

(66)

and ${p}_{i\alpha }^{0}\equiv 1-{p}_{i\alpha }^{-}-{p}_{i\alpha }^{+}$. In words, b_iα obeys a generalised Bernoulli distribution whose probability coefficients are determined by the imposed constraints. Given any two nodes i and α, they are connected by a positive link with probability ${p}_{i\alpha }^{+}$, by a negative link with probability ${p}_{i\alpha }^{-}$ and are disconnected with probability ${p}_{i\alpha }^{0}$.

The maximisation of the likelihood function ${{\mathcal{L}}}_{{\rm{BiSCM}}}=\ln {P}_{{\rm{BiSCM}}}({{\bf{B}}}^{* })$ with respect to the unknown parameters that define it leads us to find

$${k}_{i}^{+}({{\bf{B}}}^{* })=\langle {k}_{i}^{+}\rangle ,\quad \forall \,i,$$

(67)

$${k}_{i}^{-}({{\bf{B}}}^{* })=\langle {k}_{i}^{-}\rangle ,\quad \forall \,i,$$

(68)

$${h}_{\alpha }^{+}({{\bf{B}}}^{* })=\langle {h}_{\alpha }^{+}\rangle ,\quad \forall \,\alpha ,$$

(69)

$${h}_{\alpha }^{-}({{\bf{B}}}^{* })=\langle {h}_{\alpha }^{-}\rangle ,\quad \forall \,\alpha .$$

(70)

The system above can be solved only numerically, along the guidelines provided in ref. ⁶⁶.

For more details, see Appendices D and E. Here, we have solved it by employing the SIMONA Matlab-coded package, available at this URL.

Let us conclude this section by mentioning that BiSCM can be obtained as a special case of the Bipartite Score Configuration Model presented in ref. ⁶⁷.

Data availability

Data concerning \textit{U.S. Senate} and \textit{U.S. House of Representatives} are described in\cite{derr2019balance} and can be found at the address https://www.govtrack.us/. Data concerning \textit{FilmTrust} is described in\cite{filmtrust} and can be found at the address http://konect.cc/networks/librec-filmtrust-ratings/.

Code availability

We released a Matlab-coded package that implements all the probabilistic models for a binary undirected bipartite signed network. Its name is SIMONA, an acronym standing for ‘Signed Models for Network Analysis’, and is freely downloadable at this URL.

References

Antal, T., Krapivsky, P. L. & Redner, S. Social balance on networks: the dynamics of friendship and enmity. Phys. D: Nonlinear Phenom. 224, 130–136 (2006).
Article MathSciNet Google Scholar
Leskovec, J., Huttenlocher, D. & Kleinberg, J. Signed networks in social media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1361–1370 (2010).
Zaslavsky, T. A mathematical bibliography of signed and gain graphs and allied areas. Electron. J. Comb. DS8–Dec (2012).
Tang, J., Chang, Y., Aggarwal, C. & Liu, H. A survey of signed network mining in social media. ACM Comput. Surv. 49, 1–37 (2016).
Google Scholar
Heider, F. Attitudes and cognitive organization. J. Psychol. 21, 107–112 (1946).
Article Google Scholar
Cartwright, D. & Harary, F. Structural balance: a generalization of Heider’s theory. Psychol. Rev. 63, 277 (1956).
Article Google Scholar
Davis, J. A. Clustering and structural balance in graphs. Hum. Relat. 20, 181–187 (1967).
Article Google Scholar
Gallo, A. et al. Assessing frustration in real-world signed networks: a statistical theory of balance. Phys. Rev. Res. 6, L042065 (2024).
Harary, F., Lim, M.-H. & Wunsch, D. C. Signed graphs for portfolio analysis in risk management. IMA J. Manag Math. 13, 201–210 (2002).
Google Scholar
Ou-Yang, L., Dai, D.-Q. & Zhang, X.-F. Detecting protein complexes from signed protein-protein interaction networks. IEEE/ACM Trans. Comput. Biol. Bioinforma. 12, 1333–1344 (2015).
Article Google Scholar
Iorio, F. et al. Efficient randomization of biological networks while preserving functional characterization of individual nodes. BMC Bioinformatics 17, 1–14 (2016).
Article Google Scholar
Saiz, H. et al. Evidence of structural balance in spatial ecological networks. Ecography 40, 733–741 (2017).
Article Google Scholar
Gallo, A., Garlaschelli, D., Lambiotte, R., Saracco, F. & Squartini, T. Testing structural balance theories in heterogeneous signed networks. Commun. Phys. 7, 154 (2024).
Article Google Scholar
Gallo, A., Saracco, F., Lambiotte, R., Garlaschelli, D. & Squartini, T. Patterns of link reciprocity in directed, signed networks. Phys. Rev. E 111, 024312 (2025).
Article MathSciNet Google Scholar
Candellone, E., van Kesteren, E.-J., Chelmi, S. & Garcia-Bernardo, J. Community detection in bipartite signed networks is highly dependent on parameter choice. Adv. Complex Syst. 28 (2025).
Talaga, S., Stella, M., Swanson, T. J. & Teixeira, A. S. Polarization and multiscale structural balance in signed networks. Commun. Phys. 6, 349 (2023).
Teixeira, A. S., Santos, F. C. & Francisco, A. P. Emergence of social balance in signed networks. In: International Workshop on Complex Networks, 185–192 (Springer, 2017).
Doreian, P. & Mrvar, A. Partitioning signed social networks. Soc. Netw. 31, 1–11 (2009).
Article Google Scholar
Mrvar, A. & Doreian, P. Partitioning signed two-mode networks. J. Math. Sociol. 33, 196–221 (2009).
Article Google Scholar
Derr, T. & Tang, J. Congressional vote analysis using signed networks. In: 2018 IEEE International Conference on Data Mining Workshops (ICDMW), 1501–1502 (IEEE, 2018).
Derr, T., Johnson, C., Chang, Y. & Tang, J. Balance in signed bipartite networks. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 1221–1230 (2019).
Saracco, F., Di Clemente, R., Gabrielli, A. & Squartini, T. Randomizing bipartite networks: the case of the World Trade Web. Sci. Rep. 5, 10595 (2015).
Article Google Scholar
Huang, J., Shen, H., Cao, Q., Tao, S. & Cheng, X. Signed bipartite graph neural networks. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 740–749 (2021).
Banerjee, S., Sarkar, K., Gokalp, S., Sen, A. & Davulcu, H. Partitioning signed bipartite graphs for classification of individuals and organizations. In: Proceedings of the 5th International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction, College Park, MD, USA,196–204 (Springer, 2012).
Neal, Z. The backbone of bipartite projections: Inferring relationships from co-authorship, co-sponsorship, co-attendance and other co-behaviors. Soc. Netw. 39, 84–97 (2014).
Article Google Scholar
Latapy, M., Magnien, C. & Del Vecchio, N. Basic notions for the analysis of large two-mode networks. Soc. Netw. 30, 31–48 (2008).
Article Google Scholar
Watts, D. J. & Strogatz, S. H. Collective dynamics of ‘small-world’networks. Nature 393, 440–442 (1998).
Article Google Scholar
Derudder, B. & Taylor, P. The cliquishness of world cities. Glob. Netw. 5, 71–91 (2005).
Article Google Scholar
Serrano, M. Á., Boguná, M. & Vespignani, A. Extracting the multiscale backbone of complex weighted networks. Proc. Natl Acad. Sci. 106, 6483–6488 (2009).
Article Google Scholar
Zaccaria, A., Cristelli, M., Tacchella, A. & Pietronero, L. How the taxonomy of products drives the economic development of countries. PloS One 9, e113770 (2014).
Article Google Scholar
Caldarelli, G. et al. A network analysis of countries’ export flows: firm grounds for the building blocks of the economy. PLoS One 7, e47278 (2012).
Linden, G., Smith, B. & York, J. Amazon. com recommendations: Item-to-item collaborative filtering. IEEE Internet Comput. 7, 76–80 (2003).
Article Google Scholar
Bonacich, P. Factoring and weighting approaches to status scores and clique identification. J. Math. Socio;. 2, 113–120 (1972).
Article Google Scholar
Tumminello, M., Micciche, S., Lillo, F., Piilo, J. & Mantegna, R. N. Statistically validated networks in bipartite complex systems. PloS One 6, e17994 (2011).
Article Google Scholar
Gualdi, S., Cimini, G., Primicerio, K., Di Clemente, R. & Challet, D. Statistically validated network of portfolio overlaps and systemic risk. Sci. Rep. 6, 39467 (2016).
Article Google Scholar
Saracco, F. et al. Inferring monopartite projections of bipartite networks: an entropy-based approach. N. J. Phys. 19, 053022 (2017).
Article Google Scholar
Dianati, N. A maximum entropy approach to separating noise from signal in bimodal affiliation networks. arXiv https://arxiv.org/abs/1607.01735 (2016).
Neal, Z. P. How strong is strong? The challenge of interpreting network edge weights. PloS One 19, e0311614 (2024).
Article Google Scholar
Neal, Z. P. A sign of the times? Weak and strong polarization in the us congress, 1973–2016. Soc. Netw. 60, 103–112 (2020).
Article Google Scholar
Neal, Z. P. et al. Pattern detection in bipartite networks: a review of terminology, applications, and methods. PLOS Complex Syst. 1, e0000010 (2024).
Article Google Scholar
Li, L., Gu, K., Zeng, A., Fan, Y. & Di, Z. Modeling online social signed networks. Phys. A: Stat. Mech. Appl 495, 345–352 (2018).
Article Google Scholar
Fouss, F., Pirotte, A., Renders, J.-M. & Saerens, M. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Trans. Knowl. data Eng. 19, 355–369 (2007).
Article Google Scholar
Baltakys, K. Inference of monopartite networks from bipartite systems with different link types. Sci. Rep. 13, 1072 (2023).
Article Google Scholar
Tracking the United States Congress, https://www.govtrack.us/.
Guo, G., Zhang, J. & Yorke-Smith, N. A novel Bayesian similarity measure for recommender systems. In: IJCAI '13: Proceedings of the Twenty-Third international joint conference on Artificial Intelligence 2619–2625 (2013).
Kunegis, J. KONECT – the Koblenz network collection. In: Proc. Int. Conf. on World Wide Web Companion, 1343–1350 (2013).
Singh, R. & Adhikari, B. Measuring the balance of signed networks and its application to sign prediction. J. Stat. Mech: Theory Exp. 2017, 063302 (2017).
Article MathSciNet Google Scholar
Andres, G., Casiraghi, G., Vaccario, G. & Schweitzer, F. Reconstructing signed relations from interaction data. Sci. Rep. 13, 20689 (2023).
Article Google Scholar
Dang, Q.-V. & Ignat, C.-L. Link-sign prediction in dynamic signed directed networks. In: 2018 IEEE 4th international conference on collaboration and internet computing (CIC), 36–45 (IEEE, 2018).
Derr, T., Ma, Y. & Tang, J. Signed graph convolutional networks. In: 2018 IEEE International Conference on Data Mining (ICDM), 929–934 (IEEE, 2018).
Li, D., Shen, D., Kou, Y. & Nie, T. Integrating sign prediction with behavior prediction for signed heterogeneous information networks. IEEE Access 7, 171357–171371 (2019).
Article Google Scholar
Zhang, Z. et al. Contrastive learning for signed bipartite graphs. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1629–1638 (2023).
Jiang, Y. et al. SBGMN: a multi-view sign prediction network for bipartite graphs. In: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, 81–96 (Springer, 2024).
Peter, S. Attitude change. 3, 86 (RoutledgeTaylor & Francis Group, London and New york1958).
Cartwright, D. & Harary, F. Ambivalence and indifference in generalizations of structural balance. Behav. Sci. 15, 497–513 (1970).
Article Google Scholar
Schoch, D. Projecting signed two-mode networks. J. Math. Sociol. 45, 37–50 (2021).
Article MathSciNet Google Scholar
Khintchine, A. Y. MathemAtical Foundations Of Information Theory (Dover, 1957).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc: Ser. B 57, 289–300 (1995).
Article MathSciNet Google Scholar
Jaynes, E. T. Information theory and statistical mechanics. Phys. Rev. 106, 620 (1957).
Article MathSciNet Google Scholar
Park, J. & Newman, M. E. J. Statistical mechanics of networks. Phys. Rev. E 70, 66117 (2004).
Article MathSciNet Google Scholar
Squartini, T. & Garlaschelli, D. Analytical maximum-likelihood method to detect patterns in real networks. N. J. Phys. 13, 083001 (2011).
Article Google Scholar
Squartini, T. & Garlaschelli, D. Maximum-Entropy Networks. Pattern Detection, Network Reconstruction and Graph Combinatorics (Springer International Publishing, 2017).
Cimini, G. et al. The statistical physics of real-world networks. Nat. Rev. Phys. 1, 58–71 (2018).
Article Google Scholar
Garlaschelli, D. & Loffredo, M. I. Maximum likelihood: extracting unbiased information from complex networks. Phys. Rev. E 78, 1–5 (2008).
Article Google Scholar
Hao, B. & Kovács, I. A. Proper network randomization is key to assessing social balance. Sci. Adv. 10, eadj0104 (2024).
Article Google Scholar
Vallarano, N. et al. Fast and scalable likelihood maximization for exponential random graph models with local constraints. Sci. Rep. 11, 15227 (2021).
Article Google Scholar
Becatti, C., Caldarelli, G. & Saracco, F. Entropy-based randomization of rating networks. Phys. Rev. E 99, 022306 (2019).
Article Google Scholar

Download references

Acknowledgements

A.G. and T.S. acknowledge support from the projects ‘SoBigData.it—Strengthening the Italian RI for Social Mining and Big Data Analytics’—IR0000013—CUP B53C22001760006, financed by European Union—Next Generation EU—National Recovery and Resilience Plan (Piano Nazionale di Ripresa e Resilienza, PNRR)—M4C2 I.3.1; ‘Reconstruction, Resilience and Recovery of Socio-Economic Networks’ RECON-NET EP_FAIR_005—PE0000013 ‘FAIR’—PNRR M4C2 Investment 1.3, financed by European Union—Next Generation EU; PRIN 2022 2022MTBB22 ‘RE-Net: Reconstructing economic networks: from physics to machine learning and back’, financed by the European Union—Next Generation EU, M4C2 Inv. 1.1 CUP D53D23002330006; PRIN 2022 PNRR P2022E93B8 ‘C2T—From Crises to Theory: towards a science of resilience and recovery for economic and financial systems’, financed by the European Union—Next Generation EU, M4C2 Inv. 1.1, CUP: D53D23019330001. F.S. was partially supported by the project ‘CODE—Coupling Opinion Dynamics with Epidemics’, financed under PNRR Mission 4 ‘Education and Research’—Component C2—Investment 1.1—Next Generation EU ‘Fund for National Research Programme and Projects of Significant National Interest’ PRIN 2022 PNRR, grant code P2022AKRZ9, CUP B53D23026080001.

Author information

Authors and Affiliations

IMT School for Advanced Studies, Lucca, Italy
Anna Gallo, Fabio Saracco & Tiziano Squartini
INdAM-GNAMPA Istituto Nazionale di Alta Matematica ‘Francesco Severi’, Rome, Italy
Anna Gallo & Tiziano Squartini
Centro Studi e Ricerche ‘Enrico Fermi’ (CREF), Rome, Italy
Fabio Saracco
Institute for Applied Computing ‘Mauro Picone’ (IAC), National Research Council, Sesto Fiorentino, FI, Italy
Fabio Saracco

Authors

Anna Gallo
View author publications
Search author on:PubMed Google Scholar
Fabio Saracco
View author publications
Search author on:PubMed Google Scholar
Tiziano Squartini
View author publications
Search author on:PubMed Google Scholar

Contributions

Study conception and design: A.G., F.S., T.S. Data collection: A.G. Analysis and interpretation of results: A.G., F.S., T.S. Draft manuscript preparation: A.G., F.S., T.S.

Corresponding author

Correspondence to Anna Gallo.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Gallo, A., Saracco, F. & Squartini, T. Statistically validated projection of bipartite signed networks. npj Complex 2, 22 (2025). https://doi.org/10.1038/s44260-025-00043-1

Download citation

Received: 12 February 2025
Accepted: 14 April 2025
Published: 03 July 2025
DOI: https://doi.org/10.1038/s44260-025-00043-1

Subjects

Abstract

Similar content being viewed by others

Meta-validation of bipartite network projections

Inference of monopartite networks from bipartite systems with different link types

Unveiling the social fabric through a temporal, nation-scale social network and its characteristics

Introduction

The concept of balance in the bipartite context

Projecting bipartite unsigned networks

Projecting bipartite signed networks

Results

Projection of synthetic configurations

Projection of real-world configurations

U.S. senate and U.S. house of representatives

FilmTrust

naive projection of real-world configurations

Validated projection of real-world configurations within the zero-deflated scheme

Validated projection of real-world configurations within the zero-inflated scheme

Discussion

Methods

Formalism and basic quantities

Global connectivity and node degrees

The role of ambivalent patterns

The role of missing ties

A scheme for the statistical validation of bipartite signed networks

Step #1. Quantifying the similarity of any two nodes

Zero-deflated projection scheme

Zero-inflated projection scheme

Step #2. Quantifying the statistical significance of similarity

Zero-deflated projection scheme

Zero-inflated projection scheme

Quantifying the statistical significance of similarity

Step #3. Validating the projection

Maximum entropy benchmarks for bipartite signed networks

Zero-deflated projection scheme

Fixed-topology Bipartite signed random graph model

Fixed-topology bipartite signed configuration model

Zero-inflated projection scheme

Free-topology bipartite signed random graph model

Free-topology bipartite signed configuration model

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links