Personalized PageRank

SQL function: cugraph_personalized_pagerank

Compute personalized PageRank scores.

Signature

cugraph_personalized_pagerank(table_name, src_col, dst_col, weight_col, options_json)

Allowed argument counts: 5.

Quickstart

SELECT * FROM cugraph_personalized_pagerank('target_edges', 'src', 'dst', NULL, '{"personalization_table":"ppr_seeds","personalization_vertex_col":"vertex","personalization_value_col":"value"}')
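The quickstart assumes a seed relation named ppr_seeds already exists. Below is a minimal sketch of one, mirroring the column types used for bert_seed in the Examples section. The vertex ids 101 and 202 and their weights are placeholders, and this page does not state whether the values must sum to 1; the sketch picks values that do.

```sql
-- Hypothetical seed relation for the quickstart call.
-- 101 and 202 are placeholder vertex ids; replace them with ids from target_edges.
CREATE VIEW ppr_seeds AS
SELECT CAST(101 AS BIGINT) AS vertex, CAST(0.7 AS DOUBLE) AS value
UNION ALL
SELECT CAST(202 AS BIGINT) AS vertex, CAST(0.3 AS DOUBLE) AS value;
```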

Positional arguments

Argument	Type	Required	Default	Notes
`table_name`	`Utf8`	yes
`src_col`	`Utf8`	no	`src`
`dst_col`	`Utf8`	no	`dst`
`weight_col`	`Utf8\|null`	no		Optional edge-weight column used during graph construction, when the algorithm supports weights. When provided, edge weights affect the results.
`options_json`	`Utf8`	no
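When the edge table carries weights, the column name goes in the fourth position in place of NULL. This sketch assumes target_edges has a numeric column named weight, a hypothetical name that the quickstart does not define.

```sql
-- Assumes a hypothetical numeric column "weight" on target_edges.
SELECT *
FROM cugraph_personalized_pagerank('target_edges', 'src', 'dst', 'weight',
       '{"personalization_table":"ppr_seeds",
         "personalization_vertex_col":"vertex",
         "personalization_value_col":"value"}');
```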

JSON options

Option	Type	Default	Constraints	Description
`alpha`	`Float64`	`0.85`	min 0; max 1
`epsilon`	`Float64`	`0.00001`	> 0
`max_iterations`	`UInt32`	`100`	min 1
`personalization_table`	`Utf8`		required	Table containing personalized PageRank vertex weights.
`personalization_value_col`	`Utf8`		required; example `"value"`
`personalization_vertex_col`	`Utf8`		required; example `"vertex"`
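The three numeric options can be overridden alongside the required personalization keys. The table above gives no descriptions for them. The comments in this sketch assume the usual PageRank conventions (alpha as the damping factor, epsilon as the convergence tolerance, max_iterations as the iteration cap), and the values are illustrative only.

```sql
-- Illustrative overrides; every value stays within the documented constraints.
SELECT *
FROM cugraph_personalized_pagerank('target_edges', 'src', 'dst', NULL,
       '{"personalization_table":"ppr_seeds",
         "personalization_vertex_col":"vertex",
         "personalization_value_col":"value",
         "alpha":0.9,
         "epsilon":0.000001,
         "max_iterations":200}');
```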

Graph construction options

Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.

Option	Type	Default	Constraints	Description
`construction_policy`	`Utf8`	`"python_cugraph"`	one of `"python_cugraph"`, `"raw_libcugraph"`	Edge-list construction semantics used before calling libcugraph.
`directed`	`Boolean`	`true`		Whether graph construction treats edges as directed.
`renumber`	`Boolean`	`true`		Whether graph construction may renumber external vertex identifiers internally.
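An undirected run might look like the sketch below. It assumes the construction options are passed in the same options_json object as the algorithm options, which this page does not state explicitly.

```sql
-- Assumption: construction options share the options_json object.
SELECT *
FROM cugraph_personalized_pagerank('target_edges', 'src', 'dst', NULL,
       '{"personalization_table":"ppr_seeds",
         "personalization_vertex_col":"vertex",
         "personalization_value_col":"value",
         "directed":false}');
```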

Output schema

Column	Type	Nullable	Description
`vertex`	`Int64`	no	Vertex receiving the PageRank score.
`value`	`Float64`	no	PageRank score for the vertex.

Note

These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.
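Because the result is an ordinary relation with vertex and value columns, it can be filtered and sorted like any table. This sketch reuses the quickstart's target_edges and ppr_seeds names and drops every seed vertex from the ranking, which generalizes the single-seed exclusion to a multi-row seed table.

```sql
-- Top 10 non-seed vertices by personalized PageRank score.
SELECT r.vertex, r.value
FROM cugraph_personalized_pagerank('target_edges', 'src', 'dst', NULL,
       '{"personalization_table":"ppr_seeds",
         "personalization_vertex_col":"vertex",
         "personalization_value_col":"value"}') r
WHERE NOT EXISTS (SELECT 1 FROM ppr_seeds s WHERE s.vertex = r.vertex)
ORDER BY r.value DESC
LIMIT 10;
```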

Examples

This example runs on the citation network demo dataset.

A reading list from a biased walk, filtered by an anti-join

Personalized PageRank biases the walk toward seed vertices supplied by a relation. Seeding on BERT ranks the papers its intellectual neighborhood keeps returning to. The interesting rows are the ones BERT does not already cite — a NOT EXISTS anti-join against the edge table removes the direct references, leaving the hidden ancestry:

CREATE VIEW bert_seed AS
SELECT CAST(2896457183 AS BIGINT) AS vertex, CAST(1.0 AS DOUBLE) AS value;

SELECT ROUND(r.value, 6) AS ppr, p.year, p.title
FROM cugraph_personalized_pagerank('citation_edges', 'src', 'dst', NULL,
       '{"personalization_table":"bert_seed",
         "personalization_vertex_col":"vertex",
         "personalization_value_col":"value"}') r
JOIN papers p ON p.paper_id = r.vertex
WHERE r.vertex <> 2896457183
  AND NOT EXISTS (SELECT 1 FROM citation_edges e
                  WHERE e.src = 2896457183 AND e.dst = r.vertex)
ORDER BY r.value DESC
LIMIT 8;

ppr	year	title
0.003127	1983	A Maximum Likelihood Approach to Continuous Speech Recognition
0.002984	2003	A neural probabilistic language model
0.002392	1997	Long short-term memory
0.002358	2014	Adam: A Method for Stochastic Optimization
0.002344	1990	A statistical approach to machine translation
0.001988	1993	Building a large annotated corpus of English: the penn treebank
0.001923	1975	Design of a linguistic statistical decoder for the recognition of continuous speech
0.001872	2006	The PASCAL Recognising Textual Entailment Challenge

BERT never cites Jelinek's 1975–1983 speech-decoding papers, statistical machine translation, or the Penn Treebank — yet the walk finds them two or three references deep. The seed table, the exclusion of the seed itself, and the anti-join are all ordinary SQL wrapped around one GPU call.

Limitations & notes

Dry-run validates table resolution, column presence, static dtypes, and options only.
Dry-run does not scan edge data, construct a graph, or prove source-vertex existence.

Validate before running

Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:

SELECT * FROM cugraph_validate_call(
  'cugraph_personalized_pagerank',
  'your_edges_table',
  '{"src_col":"src","dst_col":"dst"}'
);

See Discovery & validation for the full cugraph_validate_call contract.
