Personalized PageRank

SQL function: cugraph_personalized_pagerank

Compute personalized PageRank scores.

Signature

cugraph_personalized_pagerank(table_name, src_col, dst_col, weight_col, options_json)

Allowed argument counts: 5.

Quickstart

SELECT * FROM cugraph_personalized_pagerank('target_edges', 'src', 'dst', NULL, '{"personalization_table":"ppr_seeds","personalization_vertex_col":"vertex","personalization_value_col":"value"}')

Positional arguments

| Argument | Type | Required | Default | Notes |
|---|---|---|---|---|
| table_name | Utf8 | yes | | |
| src_col | Utf8 | no | src | |
| dst_col | Utf8 | no | dst | |
| weight_col | Utf8 or null | no | | optional edge weight column for graph construction when supported by the algorithm; semantic effect: edge weights affect algorithm results when provided |
| options_json | Utf8 | no | | |

JSON options

| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
| alpha | Float64 | 0.85 | min 0; max 1 | |
| epsilon | Float64 | 0.00001 | > 0 | |
| max_iterations | UInt32 | 100 | min 1 | |
| personalization_table | Utf8 | | required | Table containing personalized PageRank vertex weights. |
| personalization_value_col | Utf8 | | required; example "value" | |
| personalization_vertex_col | Utf8 | | required; example "vertex" | |
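
The tuning options travel in the same JSON document as the required personalization options. The call below is a minimal sketch that reuses the Quickstart's `target_edges` and `ppr_seeds` names and overrides all three defaults. The specific values are illustrative, not recommendations. The table above gives no descriptions for these options, so the comments assume they mirror the cuGraph PageRank parameters of the same purpose: `alpha` as the damping factor, `epsilon` as the convergence tolerance, and `max_iterations` as the iteration cap.

```sql
-- Assumption: alpha = damping factor, epsilon = convergence tolerance,
-- max_iterations = iteration cap. Values are illustrative only.
SELECT *
FROM cugraph_personalized_pagerank('target_edges', 'src', 'dst', NULL,
  '{"personalization_table":"ppr_seeds",
    "personalization_vertex_col":"vertex",
    "personalization_value_col":"value",
    "alpha":0.7,
    "epsilon":0.000001,
    "max_iterations":200}')
ORDER BY value DESC
LIMIT 10;
```

Under that reading, a lower `alpha` should keep more of the score mass near the seed vertices, while a smaller `epsilon` may need a higher `max_iterations` to converge.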

Graph construction options

Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.

| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
| construction_policy | Utf8 | "python_cugraph" | one of "python_cugraph", "raw_libcugraph" | Edge-list construction semantics used before calling libcugraph. |
| directed | Boolean | true | | Whether graph construction treats edges as directed. |
| renumber | Boolean | true | | Whether graph construction may renumber external vertex identifiers internally. |

Output schema

| Column | Type | Nullable | Description |
|---|---|---|---|
| vertex | Int64 | no | Vertex receiving the PageRank score. |
| value | Float64 | no | PageRank score for the vertex. |

Note: These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.

Examples

This example runs on the citation network demo dataset.

A reading list from a biased walk, filtered by an anti-join

Personalized PageRank biases the walk toward seed vertices supplied by a relation. Seeding on BERT ranks the papers its intellectual neighborhood keeps returning to. The interesting rows are the ones BERT does not already cite — a NOT EXISTS anti-join against the edge table removes the direct references, leaving the hidden ancestry:

CREATE VIEW bert_seed AS
SELECT CAST(2896457183 AS BIGINT) AS vertex, CAST(1.0 AS DOUBLE) AS value;

SELECT ROUND(r.value, 6) AS ppr, p.year, p.title
FROM cugraph_personalized_pagerank('citation_edges', 'src', 'dst', NULL,
'{"personalization_table":"bert_seed",
"personalization_vertex_col":"vertex",
"personalization_value_col":"value"}') r
JOIN papers p ON p.paper_id = r.vertex
WHERE r.vertex <> 2896457183
AND NOT EXISTS (SELECT 1 FROM citation_edges e
WHERE e.src = 2896457183 AND e.dst = r.vertex)
ORDER BY r.value DESC
LIMIT 8;

| ppr | year | title |
|---|---|---|
| 0.003127 | 1983 | A Maximum Likelihood Approach to Continuous Speech Recognition |
| 0.002984 | 2003 | A neural probabilistic language model |
| 0.002392 | 1997 | Long short-term memory |
| 0.002358 | 2014 | Adam: A Method for Stochastic Optimization |
| 0.002344 | 1990 | A statistical approach to machine translation |
| 0.001988 | 1993 | Building a large annotated corpus of English: the penn treebank |
| 0.001923 | 1975 | Design of a linguistic statistical decoder for the recognition of continuous speech |
| 0.001872 | 2006 | The PASCAL Recognising Textual Entailment Challenge |

BERT never cites Jelinek's 1975–1983 speech-decoding papers, statistical machine translation, or the Penn Treebank — yet the walk finds them two or three references deep. The seed table, the exclusion of the seed itself, and the anti-join are all ordinary SQL wrapped around one GPU call.
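
The same pattern extends to more than one seed, since the personalization relation is just a table or view with a vertex column and a value column. The sketch below is a hypothetical variation, not part of the demo: it seeds on every paper from a single year. It assumes `papers.year` is numeric, that such papers exist in the dataset, and that equal weights are acceptable input. This page does not say whether the values must be normalized, so treat that as unverified.

```sql
-- Hypothetical seed view: every paper published in 2018, equally weighted.
CREATE VIEW seeds_2018 AS
SELECT CAST(paper_id AS BIGINT) AS vertex, CAST(1.0 AS DOUBLE) AS value
FROM papers
WHERE year = 2018;

-- Rank the papers the seed set keeps returning to, excluding the seeds themselves.
SELECT ROUND(r.value, 6) AS ppr, p.year, p.title
FROM cugraph_personalized_pagerank('citation_edges', 'src', 'dst', NULL,
  '{"personalization_table":"seeds_2018",
    "personalization_vertex_col":"vertex",
    "personalization_value_col":"value"}') r
JOIN papers p ON p.paper_id = r.vertex
WHERE NOT EXISTS (SELECT 1 FROM seeds_2018 s WHERE s.vertex = r.vertex)
ORDER BY r.value DESC
LIMIT 8;
```

Excluding the seeds with an anti-join against the seed view replaces the single `r.vertex <> ...` comparison used in the one-seed query.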

Limitations & notes

  • dry-run validates table resolution, column presence, static dtypes, and options only
  • dry-run does not scan edge data, construct a graph, or prove source-vertex existence
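
Because the dry run does not scan edge data, a seed vertex that is absent from the graph would pass validation unnoticed. One way to check this by hand is a plain anti-join of the seed table against both endpoint columns. The sketch assumes the Quickstart's `ppr_seeds` and `target_edges` tables with their `vertex`, `src`, and `dst` columns.

```sql
-- Seeds that appear in neither endpoint column of the edge table.
SELECT s.vertex
FROM ppr_seeds s
WHERE NOT EXISTS (SELECT 1 FROM target_edges e WHERE e.src = s.vertex)
  AND NOT EXISTS (SELECT 1 FROM target_edges e WHERE e.dst = s.vertex);
```

An empty result means every seed vertex occurs somewhere in the edge list. How the function treats seeds that do not is not documented on this page.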

Validate before running

Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:

SELECT * FROM cugraph_validate_call(
'cugraph_personalized_pagerank',
'your_edges_table',
'{"src_col":"src","dst_col":"dst"}'
);

See Discovery & validation for the full cugraph_validate_call contract.