Weakly Connected Components
SQL function: cugraph_weakly_connected_components
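Computes the weakly connected components of a graph: maximal sets of vertices that are connected to one another when edge direction is ignored. Returns one row per vertex with the label of its component.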
Signature
cugraph_weakly_connected_components(table_name [, src_col, dst_col [, weight_col [, options_json]]])
Allowed argument counts: 1, 3, 4, 5.
Quickstart
SELECT * FROM cugraph_weakly_connected_components('target_edges')
Positional arguments
| Argument | Type | Required | Default | Notes |
|---|---|---|---|---|
| table_name | Utf8 | yes | | |
| src_col | Utf8 | no | src | |
| dst_col | Utf8 | no | dst | |
| weight_col | Utf8 or null | no | | Accepted as an edge-column binding; native algorithm execution does not consume weights, so it has no semantic effect for this algorithm. |
| options_json | Utf8 | no | | |
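As a sketch of the three-argument form, the call below binds non-default column names. The table `follows_edges` and the columns `follower_id` and `followee_id` are hypothetical and not part of any dataset referenced on this page. Because `options_json` is omitted, the function's default options apply.

```sql
-- Hypothetical table and column names, for illustration only.
-- src_col and dst_col are supplied together (argument count 3).
SELECT label, COUNT(*) AS members
FROM cugraph_weakly_connected_components('follows_edges', 'follower_id', 'followee_id')
GROUP BY label
ORDER BY members DESC
LIMIT 10;
```

The aggregation simply lists the ten largest components by member count; drop it to get the raw per-vertex rows.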
JSON options
This algorithm has no algorithm-specific options.
Graph construction options
Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.
| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
| construction_policy | Utf8 | "python_cugraph" | one of "python_cugraph", "raw_libcugraph" | Edge-list construction semantics used before calling libcugraph. |
| directed | Boolean | false | | Whether graph construction treats edges as directed. |
| renumber | Boolean | true | | Whether graph construction may renumber external vertex identifiers internally. |
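For illustration, the five-argument call below spells out all three construction options with the default values listed in the table. Assuming those defaults, it should behave the same as a call that omits `options_json`; this is a sketch of the JSON shape, not a recommended setting.

```sql
-- All three graph construction options set explicitly to their documented defaults.
-- weight_col is passed as NULL because this algorithm does not consume weights.
SELECT vertex, label
FROM cugraph_weakly_connected_components(
  'citation_edges', 'src', 'dst', NULL,
  '{"construction_policy":"python_cugraph","directed":false,"renumber":true}');
```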
Output schema
| Column | Type | Nullable | Description |
|---|---|---|---|
| vertex | Int64 | no | Vertex assigned to a weakly connected component. |
| label | Int64 | no | Weakly connected component identifier for the vertex. |
These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.
Examples
These examples run on the citation network demo dataset.
Is computer science one connected literature?
WCC requires directed: false. Of the 4.15M papers that have at least one
edge, 99.4% form a single giant component:
WITH comp AS (
SELECT label, COUNT(*) AS members
FROM cugraph_weakly_connected_components('citation_edges', 'src', 'dst', NULL,
'{"directed":false}')
GROUP BY label)
SELECT COUNT(*) AS components, MAX(members) AS giant, SUM(members) AS vertices
FROM comp;
| components | giant | vertices |
|---|---|---|
| 9,751 | 4,123,602 | 4,146,772 |
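The 99.4% figure follows from the table: 4,123,602 / 4,146,772 ≈ 0.994. A minimal sketch of computing that share in the same query is shown below; the `giant_pct` column name is illustrative, and the two-argument `ROUND` is assumed to be available in the SQL engine.

```sql
-- Same aggregation as the query above, plus the giant component's share of vertices.
WITH comp AS (
  SELECT label, COUNT(*) AS members
  FROM cugraph_weakly_connected_components('citation_edges', 'src', 'dst', NULL,
                                           '{"directed":false}')
  GROUP BY label)
SELECT MAX(members) AS giant,
       SUM(members) AS vertices,
       -- 100.0 forces floating-point division before rounding to one decimal place.
       ROUND(100.0 * MAX(members) / SUM(members), 1) AS giant_pct
FROM comp;
```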
Snapshot first, then explore the islands
Component labels are assigned per execution, so a query that calls the function
twice can join labels from two different runs that do not match. Materialize one
run in the mutable datafusion.public workspace, then summarize the largest
islands that never touch the mainland, with one row per island represented by
its most-cited member:
-- Local workspace snapshot; this does not write to lake.citation_network.
CREATE TABLE wcc_snapshot AS
SELECT vertex, label
FROM cugraph_weakly_connected_components('citation_edges', 'src', 'dst', NULL,
'{"directed":false}');
WITH sizes AS (
-- OFFSET 1 skips the giant component; LIMIT 3 keeps the next three largest.
SELECT label, COUNT(*) AS members FROM wcc_snapshot
GROUP BY label ORDER BY members DESC LIMIT 3 OFFSET 1),
detail AS (
SELECT s.members, p.year, p.venue, p.title,
ROW_NUMBER() OVER (PARTITION BY s.label ORDER BY p.n_citation DESC) AS rn
FROM sizes s
JOIN wcc_snapshot c ON c.label = s.label
JOIN papers p ON p.paper_id = c.vertex)
SELECT members, year, venue, title
FROM detail WHERE rn = 1
ORDER BY members DESC;
| members | year | venue | title |
|---|---|---|---|
| 39 | 2008 | IEICE Transactions on Electronics | Motion of Break Arcs Driven by External Magnetic Field… |
| 25 | 2013 | IEICE Electronics Express | Cavity-resonator-integrated guided-mode resonance filter… |
| 21 | 2015 | Symmetry | Inflationary Cosmology in Modified Gravity Theories |
The three largest islands are a 39-paper cluster on electrical-contact arc discharge, a 25-paper photonics/electromagnetics cluster, and a 21-paper modified-gravity cosmology cluster — each citing only itself. Communities that publish in a corpus's blind spot show up as literal islands in the graph.
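Because the snapshot is an ordinary table, follow-up questions need no further GPU runs. As a sketch, the query below counts how many components exist at each size, reading only the `wcc_snapshot` table created above; the alias `sizes` and the `components` column name are illustrative.

```sql
-- Size distribution of components, largest sizes first.
-- Reads the materialized snapshot, so labels stay consistent across queries.
SELECT members, COUNT(*) AS components
FROM (
  SELECT label, COUNT(*) AS members
  FROM wcc_snapshot
  GROUP BY label) AS sizes
GROUP BY members
ORDER BY members DESC
LIMIT 10;
```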
Limitations & notes
- Dry-run validates table resolution, column presence, static dtypes, and options only.
- Dry-run does not scan edge data, construct a graph, or prove source-vertex existence.
- cuGraph requires directed=false for this algorithm, so the graph is constructed as an undirected (symmetric) view of the edge list.
Validate before running
Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:
SELECT * FROM cugraph_validate_call(
'cugraph_weakly_connected_components',
'your_edges_table',
'{"src_col":"src","dst_col":"dst"}'
);
See Discovery & validation for the full cugraph_validate_call contract.