Skip to main content

BFS

SQL function: cugraph_bfs

Multi-source-capable breadth-first search; source selectors must contain at most one source per connected component.

Signature

cugraph_bfs(table_name, source_vertex_or_vertices [, src_col, dst_col [, weight_col [, options_json]]])

Allowed argument counts: 2, 4, 5, 6.

Quickstart

SELECT * FROM cugraph_bfs('target_edges', 123, 'src', 'dst', NULL, '{"depth_limit":4}')

Positional arguments

ArgumentTypeRequiredDefaultNotes
table_nameUtf8yes
source_vertex_or_verticesInt64|Utf8|List<Int64>|List<Utf8>|null-with-source_vertices_tableyesselector: literal scalar/list source selector, or NULL with source_vertices_table/source_vertex_col options
src_colUtf8nosrc
dst_colUtf8nodst
weight_colUtf8|nullnoaccepted as an edge-column binding but ignored by BFS traversal; semantic effect: none for traversal; BFS distance is a hop count
options_jsonUtf8no

JSON options

OptionTypeDefaultConstraintsDescription
depth_limitInt64|nullnullmin 0Maximum BFS depth in hops; null means unbounded traversal.
edge_id_colUtf8|nullnullvalid when integer vertex domain onlyEdge-id column in the edge relation used by edge-id predicate options.
exclude_vertex_colUtf8|nullnullrequired when exclude_vertices_table is set; column of exclude_vertices_table; type ref vertex_domainColumn in exclude_vertices_table containing excluded vertex identifiers.
exclude_vertices_tableUtf8|nullnullside input (exclude_vertices, cols: exclude_vertex_col)Optional relation containing the vertex denylist.
include_edge_dst_colUtf8|nullnullrequired when include_edges_table is set; column of include_edges_table; type ref vertex_domainDestination endpoint column in include_edges_table.
include_edge_id_colUtf8|nullnullvalid when edge_id_col is set; column of include_edges_table; type ref edge_id_domainOptional edge-id column in include_edges_table; when omitted, include_edges matching uses only endpoint columns.
include_edge_ids_colUtf8|nullnulldefaults to edge_id_col when omitted and include_edge_ids_table is set; column of include_edge_ids_table; type ref edge_id_domainColumn in include_edge_ids_table containing allowed edge identifiers.
include_edge_ids_tableUtf8|nullnullrequired with edge_id_col; side input (include_edge_ids, cols: include_edge_ids_col)Optional relation containing allowed edge identifiers.
include_edge_src_colUtf8|nullnullrequired when include_edges_table is set; column of include_edges_table; type ref vertex_domainSource endpoint column in include_edges_table.
include_edges_tableUtf8|nullnullside input (include_edges, cols: include_edge_src_col, include_edge_dst_col, include_edge_id_col)Optional relation containing allowed edge endpoints and, when edge_id_col is configured, optional edge identifiers.
include_vertex_colUtf8|nullnullrequired when include_vertices_table is set; column of include_vertices_table; type ref vertex_domainColumn in include_vertices_table containing allowed vertex identifiers.
include_vertices_tableUtf8|nullnullside input (include_vertices, cols: include_vertex_col)Optional relation containing the vertex allowlist.
output_modeUtf8"raw"one of "raw", "normalized", "path"Selects the BFS result shape: raw traversal rows, normalized reachability rows, or one reconstructed source-to-target path.
return_target_infoBooleanfalsevalid when output_mode in [raw, normalized]Adds target_found and target_distance columns when target_vertices_table is supplied.
source_vertex_colUtf8|nullnullrequired with source_vertices_table; column of source_vertices_table; type ref vertex_domainColumn in source_vertices_table containing BFS source vertex identifiers.
source_vertices_tableUtf8|nullnullrequired with source_vertex_col; side input (source_vertices, cols: source_vertex_col)Optional relation containing BFS source vertices. Use source_vertex=NULL in the positional SQL argument when this option is set.
target_vertex_colUtf8|nullnullrequired when output_mode=path or return_target_info=true; column of target_vertices_table; type ref vertex_domainColumn in target_vertices_table containing target vertex identifiers.
target_vertices_tableUtf8|nullnullrequired when output_mode=path or return_target_info=true; side input (target_vertices, cols: target_vertex_col)Optional relation containing target vertices for path output or target reachability metadata.

Graph construction options

Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.

OptionTypeDefaultConstraintsDescription
construction_policyUtf8"python_cugraph"one of "python_cugraph", "raw_libcugraph"Edge-list construction semantics used before calling libcugraph.
directedBooleantrueWhether graph construction treats edges as directed.
renumberBooleantrueWhether graph construction may renumber external vertex identifiers internally.

Output schema

raw (default)

ColumnTypeNullableDescription
vertexInt64|Utf8noVertex reached or considered by the BFS traversal.
distanceInt64noHop-count distance from the nearest selected BFS source vertex.
predecessorInt64|Utf8yesPrevious vertex on the discovered BFS tree, null for selected source vertices or unreachable vertices depending on output mode.

normalized

ColumnTypeNullableDescription
vertexInt64|Utf8noVertex reached or considered by the BFS traversal.
distanceInt64yesHop-count distance from the nearest selected BFS source vertex.
predecessorInt64|Utf8yesPrevious vertex on the discovered BFS tree, null for selected source vertices or unreachable vertices depending on output mode.
reachableBooleannoWhether the vertex is reachable under normalized BFS output.

normalized_with_target_info

ColumnTypeNullableDescription
vertexInt64|Utf8noVertex reached or considered by the BFS traversal.
distanceInt64yesHop-count distance from the nearest selected BFS source vertex.
predecessorInt64|Utf8yesPrevious vertex on the discovered BFS tree, null for selected source vertices or unreachable vertices depending on output mode.
reachableBooleannoWhether the vertex is reachable under normalized BFS output.
target_foundBooleannoWhether a requested target vertex was reached by the BFS traversal.
target_distanceInt64yesHop-count distance to the requested target vertex, null when the target was not reached.

path

ColumnTypeNullableDescription
path_indexInt64noZero-based row position in the reconstructed source-to-target path.
sourceInt64|Utf8noSource vertex for the reconstructed BFS path.
targetInt64|Utf8noTarget vertex for the reconstructed BFS path.
vertexInt64|Utf8noVertex reached or considered by the BFS traversal.
distanceInt64noHop-count distance from the nearest selected BFS source vertex.

raw_with_target_info

ColumnTypeNullableDescription
vertexInt64|Utf8noVertex reached or considered by the BFS traversal.
distanceInt64noHop-count distance from the nearest selected BFS source vertex.
predecessorInt64|Utf8yesPrevious vertex on the discovered BFS tree, null for selected source vertices or unreachable vertices depending on output mode.
target_foundBooleannoWhether a requested target vertex was reached by the BFS traversal.
target_distanceInt64yesHop-count distance to the requested target vertex, null when the target was not reached.
note

These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.

Examples

These examples run on the citation network demo dataset. Edges point srcdst as "cites"; BFS distance is a hop count.

Swap two arguments to reverse time: the influence wave

Following citations forward ('src', 'dst') walks into a paper's reference ancestry. Swapping the two column arguments ('dst', 'src') traverses the same edges backwards — who cites me, who cites them — without building any new table. From AlexNet (2012), each BFS generation is one wave of influence:

SELECT b.distance, COUNT(*) AS papers, ROUND(AVG(p.year), 1) AS avg_year
FROM cugraph_bfs('citation_edges_by_dst', 2163605009, 'dst', 'src', NULL,
'{"depth_limit":3, "output_mode":"normalized"}') b
JOIN papers p ON p.paper_id = b.vertex
WHERE b.reachable
GROUP BY b.distance
ORDER BY b.distance;
distancepapersavg_year
012012.0
112,1852017.5
267,5172017.9
371,5412017.8

Three citation generations reach ~151k papers. (The reversed traversal reads citation_edges_by_dst, which is clustered by dst, so the scan prunes well.)

Path mode with a SQL-defined target

output_mode: "path" reconstructs the shortest hop chain between the source and a target. The target is not a literal — it is a relation that must resolve to exactly one row, so a WHERE clause on papers is enough to say "the paper titled Long short-term memory". How does BERT (2018) descend from LSTM (1997)?

CREATE VIEW lstm_target AS
SELECT paper_id AS vertex FROM papers WHERE title = 'Long short-term memory';

SELECT b.path_index, b.distance, p.year, p.title
FROM cugraph_bfs('citation_edges', 2896457183, 'src', 'dst', NULL,
'{"output_mode":"path",
"target_vertices_table":"lstm_target",
"target_vertex_col":"vertex"}') b
JOIN papers p ON p.paper_id = b.vertex
ORDER BY b.path_index;
path_indexdistanceyeartitle
002018BERT: Pre-training of Deep Bidirectional Transformers…
112015Semi-supervised Sequence Learning
221997Long short-term memory

Two hops of references separate BERT from LSTM.

Multi-source BFS from a seed view

Passing NULL as the source and pointing source_vertices_table at a view starts the traversal from every row of a SQL result. Here the three CNN classics (AlexNet, VGG, ResNet) form one combined frontier — distance becomes "hops from the nearest founding paper":

CREATE VIEW cnn_founders AS
SELECT paper_id AS vertex FROM papers
WHERE paper_id IN (2163605009, 1686810756, 2194775991);

SELECT b.distance, COUNT(*) AS papers
FROM cugraph_bfs('citation_edges_by_dst', NULL, 'dst', 'src', NULL,
'{"source_vertices_table":"cnn_founders",
"source_vertex_col":"vertex",
"depth_limit":2, "output_mode":"normalized"}') b
JOIN papers p ON p.paper_id = b.vertex
WHERE b.reachable
GROUP BY b.distance
ORDER BY b.distance;
distancepapers
03
128,776
269,038

Limitations & notes

  • dry-run validates table resolution, column presence, static dtypes, and options only
  • dry-run does not scan edge data, construct a graph, or prove source-vertex existence
  • source lists and source_vertices_table are multi-source BFS, not one BFS per source
  • source lists and source_vertices_table must contain at most one source per connected component
  • raw/normalized output does not expose origin source per vertex
  • string logical vertex-domain BFS rejects edge_id_col, include_edge_ids_*, and include_edge_id_col
  • path output requires exactly one target row at execution time
  • options_schema_json is typed registry metadata; validate_call remains the authoritative call-specific checker

Validate before running

Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:

SELECT * FROM cugraph_validate_call(
'cugraph_bfs',
'your_edges_table',
'{"src_col":"src","dst_col":"dst"}'
);

See Discovery & validation for the full cugraph_validate_call contract.