PageRank

SQL function: cugraph_pagerank

Computes a PageRank score for each vertex of the graph built from an edge table.

Signature

cugraph_pagerank(table_name [, src_col, dst_col [, weight_col [, options_json]]])

Allowed argument counts: 1, 3, 4, 5.

Quickstart

SELECT * FROM cugraph_pagerank('target_edges');

Positional arguments

| Argument | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| table_name | Utf8 | yes | | |
| src_col | Utf8 | no | src | |
| dst_col | Utf8 | no | dst | |
| weight_col | Utf8\|null | no | | Optional edge weight column for graph construction when supported by the algorithm. Semantic effect: edge weights affect algorithm results when provided. |
| options_json | Utf8 | no | | |

JSON options

| Option | Type | Default | Constraints | Description |
| --- | --- | --- | --- | --- |
| alpha | Float64 | 0.85 | min 0; max 1 | |
| epsilon | Float64 | 0.00001 | > 0 | |
| max_iterations | UInt32 | 100 | min 1 | |
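
For orientation, these three options correspond to the knobs of the standard PageRank power iteration: alpha is the damping factor, epsilon a convergence tolerance, and max_iterations a cap on the number of sweeps. The Python sketch below is a textbook version, not the cuGraph implementation. It assumes an L1-norm convergence test and uniform redistribution of rank held by vertices without out-edges; the real stopping rule and dangling-vertex handling may differ. The function name `pagerank` and its return shape are illustrative only.

```python
def pagerank(edges, alpha=0.85, epsilon=1e-5, max_iterations=100):
    """Textbook power iteration over (src, dst) pairs.

    Assumption: convergence is measured as the L1 change between sweeps.
    Returns (scores_by_vertex, iterations_used).
    """
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)
    if n == 0:
        return {}, 0

    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1

    rank = {v: 1.0 / n for v in vertices}
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        # Rank sitting on vertices with no out-edges is spread uniformly.
        dangling = sum(rank[v] for v in vertices if out_degree[v] == 0)
        base = (1.0 - alpha) / n + alpha * dangling / n
        new_rank = {v: base for v in vertices}
        # Each vertex passes alpha * rank, split evenly over its out-edges.
        for src, dst in edges:
            new_rank[dst] += alpha * rank[src] / out_degree[src]
        delta = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < epsilon:  # converged before reaching max_iterations
            break
    return rank, iteration


# Tiny made-up graph: a 3-cycle plus one extra vertex pointing into it.
scores, iterations = pagerank([(1, 2), (2, 3), (3, 1), (4, 1)])
```

In this sketch a smaller epsilon or a larger max_iterations buys tighter convergence at the cost of more sweeps, and alpha closer to 1 gives more weight to link structure relative to the uniform term.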

Graph construction options

Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.

| Option | Type | Default | Constraints | Description |
| --- | --- | --- | --- | --- |
| construction_policy | Utf8 | "python_cugraph" | one of "python_cugraph", "raw_libcugraph" | Edge-list construction semantics used before calling libcugraph. |
| directed | Boolean | true | | Whether graph construction treats edges as directed. |
| renumber | Boolean | true | | Whether graph construction may renumber external vertex identifiers internally. |
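
As a conceptual illustration of what renumbering means, the Python sketch below maps sparse external identifiers (such as large paper ids) to dense internal indices and back again. The helper names `renumber` and `to_external` are hypothetical, and this is not how Nexus or libcugraph implement the step. It only shows why scores can still be reported against the original ids.

```python
def renumber(edges):
    """Map arbitrary external vertex ids to dense internal ids 0..n-1."""
    external_ids = sorted({v for edge in edges for v in edge})
    to_internal = {ext: idx for idx, ext in enumerate(external_ids)}
    internal_edges = [(to_internal[s], to_internal[d]) for s, d in edges]
    return internal_edges, external_ids


def to_external(internal_scores, external_ids):
    """Re-key a dense score list (indexed by internal id) by external id."""
    return {external_ids[idx]: score for idx, score in enumerate(internal_scores)}


# Made-up edge list; the ids are arbitrary and far from contiguous.
edges = [(2066636486, 17), (17, 990001), (990001, 2066636486)]
internal_edges, external_ids = renumber(edges)
```

A dense 0..n-1 numbering lets per-vertex values live in plain arrays, which is the usual reason a graph library renumbers before running an algorithm.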

Output schema

| Column | Type | Nullable | Description |
| --- | --- | --- | --- |
| vertex | Int64 | no | Vertex receiving the PageRank score. |
| value | Float64 | no | PageRank score for the vertex. |

Note: These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.

Examples

These examples run on the citation network demo dataset (4.9M papers, 45.6M src-cites-dst edges).

What raw citation counts miss

PageRank over the full graph, joined back to paper metadata. Importance flows through citations: a paper cited by foundational papers can outrank a paper with more, but shallower, citations.

SELECT p.title, p.year, p.n_citation, r.value AS pagerank
FROM cugraph_pagerank('citation_edges', 'src', 'dst') r
JOIN papers p ON p.paper_id = r.vertex
ORDER BY r.value DESC
LIMIT 5;

| title | year | n_citation | pagerank |
| --- | --- | --- | --- |
| Finite automata and their decision problems | 1959 | 1,401 | 0.00139 |
| The Mathematical Theory of Communication | 1949 | 48,327 | 0.00134 |
| The reduction of two-way automata to one-way automata | 1959 | 224 | 0.00126 |
| The complexity of theorem-proving procedures | 1971 | 4,592 | 0.00073 |
| A mathematical theory of communication | 1948 | 22,122 | 0.00066 |

The #1 and #3 papers have modest raw counts (1,401 and 224 citations) — but the papers citing them are themselves the roots of computer science, and PageRank propagates exactly that. The full call — 45.6M edges, GPU graph build, 4.1M scores, join, sort — returns in about 1.5 s.

A window function over the result: PageRank ranks its own paper

The output of a cugraph_* function is a plain relation, so ROW_NUMBER() works directly on it. Where does the paper that introduced PageRank land, by its own algorithm, among 4.9 million papers?

WITH ranked AS (
    SELECT vertex, value, ROW_NUMBER() OVER (ORDER BY value DESC) AS rank
    FROM cugraph_pagerank('citation_edges', 'src', 'dst')
)
SELECT r.rank, p.title, p.year, r.value
FROM ranked r JOIN papers p ON p.paper_id = r.vertex
WHERE r.vertex = 2066636486;

| rank | title | year | value |
| --- | --- | --- | --- |
| 115 | The anatomy of a large-scale hypertextual Web search engine | 1998 | 0.000143 |

Rank #115 of 4,894,081.

SQL decides which graph the GPU sees: the pre-2000 canon

The first argument is any relation name, including a view. Joining the edge list to papers on both endpoints restricts the graph to citations that stay within an era, and PageRank then answers "what was the canon before 2000?". Because BETWEEN is inclusive, the view below also keeps papers published in 2000 itself.

CREATE VIEW edges_pre2000 AS
SELECT e.src, e.dst
FROM citation_edges e
JOIN papers ps ON ps.paper_id = e.src
JOIN papers pd ON pd.paper_id = e.dst
WHERE ps.year BETWEEN 1901 AND 2000 AND pd.year BETWEEN 1901 AND 2000;

SELECT p.year, p.title
FROM cugraph_pagerank('edges_pre2000', 'src', 'dst') r
JOIN papers p ON p.paper_id = r.vertex
ORDER BY r.value DESC
LIMIT 6;

| year | title |
| --- | --- |
| 1959 | Finite automata and their decision problems |
| 1959 | The reduction of two-way automata to one-way automata |
| 1949 | The Mathematical Theory of Communication |
| 1974 | The Design and Analysis of Computer Algorithms |
| 1958 | Preliminary report: international algebraic language |
| 1963 | Machine perception of three-dimensional solids |

Automata theory, Shannon, Aho–Hopcroft–Ullman, the ALGOL report: the view's WHERE clause rewinds the clock and the algorithm re-ranks the field.

Limitations & notes

  • dry-run validates table resolution, column presence, static dtypes, and options only
  • dry-run does not scan edge data, construct a graph, or prove source-vertex existence

Validate before running

Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:

SELECT * FROM cugraph_validate_call(
    'cugraph_pagerank',
    'your_edges_table',
    '{"src_col":"src","dst_col":"dst"}'
);

See Discovery & validation for the full cugraph_validate_call contract.
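
The constraints listed under JSON options can also be checked on the client before an options string is sent for validation. The Python sketch below is one way to do that. The helper name `build_options_json` is hypothetical, the bounds are copied from the table on this page, and treating "min 0; max 1" as inclusive is an assumption. It complements the dry-run and does not replace it.

```python
import json

UINT32_MAX = 2**32 - 1


def build_options_json(alpha=0.85, epsilon=1e-5, max_iterations=100):
    """Return a JSON string for options_json after checking documented bounds."""
    if not 0.0 <= alpha <= 1.0:  # assumption: both bounds are inclusive
        raise ValueError("alpha must be between 0 and 1")
    if not epsilon > 0.0:
        raise ValueError("epsilon must be greater than 0")
    is_int = isinstance(max_iterations, int) and not isinstance(max_iterations, bool)
    if not is_int or not 1 <= max_iterations <= UINT32_MAX:
        raise ValueError("max_iterations must be an integer from 1 to 2**32 - 1")
    return json.dumps(
        {"alpha": alpha, "epsilon": epsilon, "max_iterations": max_iterations}
    )


options_json = build_options_json(alpha=0.9, max_iterations=50)
```

The resulting string is intended for the options_json positional argument; the dry-run remains the authority on whether a given call is accepted.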