Strongly Connected Components
SQL function: cugraph_strongly_connected_components
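Computes the strongly connected components of a directed graph: maximal sets of vertices in which every vertex can reach every other vertex along directed edges.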
Signature
cugraph_strongly_connected_components(table_name [, src_col, dst_col [, weight_col [, options_json]]])
Allowed argument counts: 1, 3, 4, 5.
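The allowed counts follow from the bracket nesting in the signature: src_col and dst_col are supplied together, so a two-argument call is not valid. As a minimal client-side sketch of that rule, the Python below renders the call text and rejects unsupported arities. The helper names (`sql_literal`, `render_scc_call`) are hypothetical and not part of Nexus or cuGraph.

```python
ALLOWED_ARG_COUNTS = {1, 3, 4, 5}  # taken from "Allowed argument counts" above


def sql_literal(value):
    """Render a Python value as a SQL string literal, or NULL for None."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"


def render_scc_call(*args):
    """Hypothetical helper: build the call text, rejecting unsupported arities."""
    if len(args) not in ALLOWED_ARG_COUNTS:
        raise ValueError(
            f"got {len(args)} arguments; allowed counts are {sorted(ALLOWED_ARG_COUNTS)}"
        )
    rendered = ", ".join(sql_literal(arg) for arg in args)
    return f"cugraph_strongly_connected_components({rendered})"


# One-argument form (defaults for src/dst) and the full five-argument form.
one_arg = render_scc_call("target_edges")
five_args = render_scc_call("citation_edges", "src", "dst", None, '{"directed":true}')
```

Passing `None` in the fourth position renders as `NULL`, which matches how the citation example further down skips the weight column while still supplying options.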
Quickstart
SELECT * FROM cugraph_strongly_connected_components('target_edges');
Positional arguments
| Argument | Type | Required | Default | Notes |
|---|---|---|---|---|
| table_name | Utf8 | yes | | |
| src_col | Utf8 | no | src | |
| dst_col | Utf8 | no | dst | |
| weight_col | Utf8\|null | no | | accepted as an edge-column binding; native algorithm execution does not consume weights; semantic effect: none for this algorithm |
| options_json | Utf8 | no | | |
JSON options
This algorithm has no algorithm-specific options.
Graph construction options
Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.
| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
construction_policy | Utf8 | "python_cugraph" | one of "python_cugraph", "raw_libcugraph" | Edge-list construction semantics used before calling libcugraph. |
| directed | Boolean | true | | Whether graph construction treats edges as directed. |
| renumber | Boolean | true | | Whether graph construction may renumber external vertex identifiers internally. |
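As a sketch of how a client might assemble options_json from the table above, the Python below checks values against the listed types and constraints before serializing. The function `build_options_json` is a hypothetical helper, not a Nexus API, and it assumes the engine accepts a JSON object keyed by the option names shown in the table.

```python
import json

# Allowed values copied from the Constraints column above.
ALLOWED_POLICIES = {"python_cugraph", "raw_libcugraph"}


def build_options_json(construction_policy="python_cugraph", directed=True, renumber=True):
    """Hypothetical helper: validate the construction options and serialize them."""
    if construction_policy not in ALLOWED_POLICIES:
        raise ValueError(f"construction_policy must be one of {sorted(ALLOWED_POLICIES)}")
    for name, value in (("directed", directed), ("renumber", renumber)):
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a Boolean")
    options = {
        "construction_policy": construction_policy,
        "directed": directed,
        "renumber": renumber,
    }
    # Compact separators keep the string short when embedded in a SQL literal.
    return json.dumps(options, separators=(",", ":"))


options_json = build_options_json()
print(options_json)
```

Checking values locally only catches obvious mistakes early; the dry-run described under "Validate before running" remains the authoritative check.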
Output schema
| Column | Type | Nullable | Description |
|---|---|---|---|
| vertex | Int64 | no | Vertex assigned to a strongly connected component. |
| label | Int64 | no | Strongly connected component identifier for the vertex. |
These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.
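To make the (vertex, label) shape concrete, here is a small pure-Python sketch that labels a toy edge list with Kosaraju's two-pass algorithm. It illustrates what a strongly connected component is; it is not how libcugraph computes the result, and the label values it picks (the first vertex reached in each component) are an arbitrary choice that need not match the identifiers the SQL function returns.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Return {vertex: label} using an iterative form of Kosaraju's algorithm."""
    graph, reverse = defaultdict(list), defaultdict(list)
    vertices = set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        vertices.update((src, dst))

    # Pass 1: record vertices in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in sorted(vertices):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this vertex is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in reverse completion order;
    # each sweep collects exactly one component.
    labels = {}
    for root in reversed(order):
        if root in labels:
            continue
        labels[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if prev not in labels:
                    labels[prev] = root
                    stack.append(prev)
    return labels


# Toy graph: 1 -> 2 -> 3 -> 1 is a cycle, 3 -> 4 leaves it, 5 <-> 6 is a mutual pair.
edges = [(1, 2), (2, 3), (3, 1), (3, 4), (5, 6), (6, 5)]
labels = strongly_connected_components(edges)
rows = sorted(labels.items())  # (vertex, label) rows, mirroring the output schema
```

Every vertex gets a row, including vertex 4, which sits in a component of size one. That is why the examples below filter on component size to find cycles.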
Examples
These examples run on the citation network demo dataset.
Citation loops should not exist — count them anyway
A citation points backward in time, so a directed cycle of citations is a
small anomaly: two or more papers that can each reach every other through a
chain of citations. SCC groups every such cycle into a component in one pass.
Snapshot the run in the mutable datafusion.public workspace (labels are
per-execution), then census the components with more than one member, called
loops here:
-- Local workspace snapshot; this does not write to lake.citation_network.
CREATE TABLE scc_snapshot AS
SELECT vertex, label
FROM cugraph_strongly_connected_components('citation_edges', 'src', 'dst', NULL,
'{"directed":true}');
WITH loops AS (
SELECT label, COUNT(*) AS members FROM scc_snapshot
GROUP BY label HAVING COUNT(*) > 1)
SELECT COUNT(*) AS loops, MAX(members) AS biggest, SUM(members) AS papers_in_loops
FROM loops;
| loops | biggest | papers_in_loops |
|---|---|---|
| 14,613 | 1,370,029 | 1,404,311 |
The surprise is the giant: 1.37M papers sit inside one strongly connected component, while the other 14,612 loops together hold only 34,282 papers. Preprint/journal double versions, simultaneous publication, and plain metadata error create enough forward-dated citations to glue a third of the graph into one component. "Time travel" is structural in bibliographic data.
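The census query can be checked by hand on a toy snapshot before trusting it at scale. The sketch below runs the same SQL through Python's built-in sqlite3 module on made-up labels; SQLite stands in for the Nexus engine here, and the seven rows are hypothetical data, not drawn from the citation dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scc_snapshot (vertex INTEGER, label INTEGER)")
# Hypothetical labels: one 3-vertex component, one pair, and two singletons.
conn.executemany(
    "INSERT INTO scc_snapshot VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 10), (5, 20), (6, 20), (4, 30), (7, 40)],
)

# Same census as above: keep components with more than one member, then summarize.
census = conn.execute(
    """
    WITH loops AS (
        SELECT label, COUNT(*) AS members
        FROM scc_snapshot
        GROUP BY label
        HAVING COUNT(*) > 1
    )
    SELECT COUNT(*) AS loops, MAX(members) AS biggest, SUM(members) AS papers_in_loops
    FROM loops
    """
).fetchone()
print(census)  # (2, 3, 5)
```

The two singleton labels drop out in the HAVING clause, so only the component of three and the pair are counted.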
Zoom into the two-paper loops
WITH pairs AS (
SELECT label FROM scc_snapshot GROUP BY label HAVING COUNT(*) = 2)
SELECT c.label, p.year, p.title
FROM pairs x
JOIN scc_snapshot c ON c.label = x.label
JOIN papers p ON p.paper_id = c.vertex
ORDER BY c.label
LIMIT 6;
| label | year | title |
|---|---|---|
| 57 | 2017 | Index Modulation Techniques for Next-Generation Wireless Networks |
| 57 | 2017 | Multidimensional index modulation in wireless communications |
| 210 | 2017 | A Survey of Research into Mixed Criticality Systems |
| 210 | 2017 | Probabilistic analysis for mixed criticality systems using fixed priority… |
| 265 | 2018 | DC programming and DCA: thirty years of developments |
| 265 | 2018 | Convergence Analysis of Difference-of-Convex Algorithm with Subanalytic Data |
12,015 of the 14,613 loops (about 82%) contain exactly two papers: same-year companion papers and surveys citing each other, which is what mutual citation looks like in practice.
Limitations & notes
- dry-run validates table resolution, column presence, static dtypes, and options only
- dry-run does not scan edge data, construct a graph, or prove source-vertex existence
Validate before running
Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:
SELECT * FROM cugraph_validate_call(
'cugraph_strongly_connected_components',
'your_edges_table',
'{"src_col":"src","dst_col":"dst"}'
);
See Discovery & validation for the full cugraph_validate_call contract.