Personalized PageRank
SQL function: cugraph_personalized_pagerank
Compute personalized PageRank scores.
Signature
cugraph_personalized_pagerank(table_name, src_col, dst_col, weight_col, options_json)
Allowed argument counts: 5.
Quickstart
SELECT * FROM cugraph_personalized_pagerank('target_edges', 'src', 'dst', NULL, '{"personalization_table":"ppr_seeds","personalization_vertex_col":"vertex","personalization_value_col":"value"}')
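Because options_json is a JSON document embedded in a SQL string literal, hand-written calls are easy to misquote. The following is a minimal Python sketch of building the Quickstart call programmatically. `build_ppr_call` is a hypothetical helper, not part of Nexus or cuGraph, and it assumes the seed table uses the column names `vertex` and `value` as in the Quickstart.

```python
import json


def build_ppr_call(edge_table, seed_table, **options):
    """Return a SQL string that calls cugraph_personalized_pagerank.

    Hypothetical helper for illustration only. Table names are interpolated
    verbatim, so pass trusted identifiers only.
    """
    opts = {
        "personalization_table": seed_table,
        "personalization_vertex_col": "vertex",
        "personalization_value_col": "value",
        **options,  # e.g. alpha=0.9, max_iterations=50
    }
    # Compact separators reproduce the layout used in the Quickstart.
    options_json = json.dumps(opts, separators=(",", ":"))
    # Double any single quote so the JSON survives as a SQL string literal.
    literal = options_json.replace("'", "''")
    return (
        "SELECT * FROM cugraph_personalized_pagerank("
        f"'{edge_table}', 'src', 'dst', NULL, '{literal}')"
    )


print(build_ppr_call("target_edges", "ppr_seeds", alpha=0.9))
```

Called with no extra options, the helper reproduces the Quickstart statement character for character; keyword arguments are appended to the JSON object as additional options.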
Positional arguments
| Argument | Type | Required | Default | Notes |
|---|---|---|---|---|
| table_name | Utf8 | yes | | |
| src_col | Utf8 | no | src | |
| dst_col | Utf8 | no | dst | |
| weight_col | Utf8 or null | no | | Optional edge weight column used for graph construction when the algorithm supports it. When provided, edge weights affect the algorithm's results. |
| options_json | Utf8 | no | | |
JSON options
| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
| alpha | Float64 | 0.85 | min 0; max 1 | |
| epsilon | Float64 | 0.00001 | > 0 | |
| max_iterations | UInt32 | 100 | min 1 | |
| personalization_table | Utf8 | | required | Table containing personalized PageRank vertex weights. |
| personalization_value_col | Utf8 | | required; example "value" | |
| personalization_vertex_col | Utf8 | | required; example "vertex" | |
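The table gives no descriptions for alpha, epsilon, and max_iterations. In the conventional formulation of personalized PageRank, alpha is the damping factor (the probability of following an edge rather than restarting at a seed), epsilon is the convergence tolerance, and max_iterations caps the loop. The plain-Python sketch below is a textbook power iteration under those conventional meanings. It is not the cuGraph implementation: seed normalization, dangling-vertex handling, and the L1 convergence test are assumptions made for illustration.

```python
def personalized_pagerank(edges, seeds, alpha=0.85, epsilon=1e-5,
                          max_iterations=100):
    """Textbook power iteration for personalized PageRank (illustrative).

    edges: iterable of (src, dst) pairs, treated as directed.
    seeds: dict mapping vertex -> positive weight (the seed table rows).
    """
    vertices = sorted({v for edge in edges for v in edge} | set(seeds))
    total = sum(seeds.values())
    # Assumption: seed weights are normalized into a restart distribution.
    restart = {v: seeds.get(v, 0.0) / total for v in vertices}

    out_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    rank = dict(restart)
    for _ in range(max_iterations):
        # With probability (1 - alpha) the walk restarts at a seed.
        nxt = {v: (1.0 - alpha) * restart[v] for v in vertices}
        for v in vertices:
            if out_neighbors[v]:
                # With probability alpha the walk follows an out-edge.
                share = alpha * rank[v] / len(out_neighbors[v])
                for dst in out_neighbors[v]:
                    nxt[dst] += share
            else:
                # Assumption: a dangling vertex sends its mass to the seeds.
                for u in vertices:
                    nxt[u] += alpha * rank[v] * restart[u]
        delta = sum(abs(nxt[v] - rank[v]) for v in vertices)
        rank = nxt
        if delta < epsilon:  # stop once the scores stop moving
            break
    return rank


# Toy graph: a 3-cycle plus vertex 4, which only points into the cycle.
edges = [(1, 2), (2, 3), (3, 1), (4, 1)]
scores = personalized_pagerank(edges, {1: 1.0})
for vertex, value in sorted(scores.items()):
    print(vertex, round(value, 6))
```

In this toy graph the seed keeps the highest score, scores decay along the cycle away from it, and vertex 4 scores zero because no walk starting from the seed can reach it.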
Graph construction options
Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.
| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
| construction_policy | Utf8 | "python_cugraph" | one of "python_cugraph", "raw_libcugraph" | Edge-list construction semantics used before calling libcugraph. |
| directed | Boolean | true | | Whether graph construction treats edges as directed. |
| renumber | Boolean | true | | Whether graph construction may renumber external vertex identifiers internally. |
Output schema
| Column | Type | Nullable | Description |
|---|---|---|---|
| vertex | Int64 | no | Vertex receiving the PageRank score. |
| value | Float64 | no | PageRank score for the vertex. |
These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.
Examples
This example runs on the citation network demo dataset.
A reading list from a biased walk, filtered by an anti-join
Personalized PageRank biases the walk toward seed vertices supplied by a
relation. Seeding on BERT ranks the papers its intellectual neighborhood
keeps returning to. The interesting rows are the ones BERT does not
already cite — a NOT EXISTS anti-join against the edge table removes the
direct references, leaving the hidden ancestry:
CREATE VIEW bert_seed AS
SELECT CAST(2896457183 AS BIGINT) AS vertex, CAST(1.0 AS DOUBLE) AS value;

SELECT ROUND(r.value, 6) AS ppr, p.year, p.title
FROM cugraph_personalized_pagerank('citation_edges', 'src', 'dst', NULL,
       '{"personalization_table":"bert_seed",
         "personalization_vertex_col":"vertex",
         "personalization_value_col":"value"}') r
JOIN papers p ON p.paper_id = r.vertex
WHERE r.vertex <> 2896457183
  AND NOT EXISTS (SELECT 1 FROM citation_edges e
                  WHERE e.src = 2896457183 AND e.dst = r.vertex)
ORDER BY r.value DESC
LIMIT 8;
| ppr | year | title |
|---|---|---|
| 0.003127 | 1983 | A Maximum Likelihood Approach to Continuous Speech Recognition |
| 0.002984 | 2003 | A neural probabilistic language model |
| 0.002392 | 1997 | Long short-term memory |
| 0.002358 | 2014 | Adam: A Method for Stochastic Optimization |
| 0.002344 | 1990 | A statistical approach to machine translation |
| 0.001988 | 1993 | Building a large annotated corpus of English: the penn treebank |
| 0.001923 | 1975 | Design of a linguistic statistical decoder for the recognition of continuous speech |
| 0.001872 | 2006 | The PASCAL Recognising Textual Entailment Challenge |
BERT never cites Jelinek's 1975–1983 speech-decoding papers, statistical machine translation, or the Penn Treebank — yet the walk finds them two or three references deep. The seed table, the exclusion of the seed itself, and the anti-join are all ordinary SQL wrapped around one GPU call.
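The filtering pattern in this example is independent of the GPU call, so it can be tried on any SQL engine. The sketch below uses Python's built-in sqlite3 module with a made-up `scores` table standing in for the function's (vertex, value) output. The vertex IDs, edges, and score values are invented for illustration and are not taken from the citation dataset.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE citation_edges (src INTEGER, dst INTEGER);
INSERT INTO citation_edges VALUES (1, 2), (1, 3), (2, 4), (3, 4), (4, 5);

-- Stand-in for the function's output; these values are made up.
CREATE TABLE scores (vertex INTEGER, value REAL);
INSERT INTO scores VALUES (1, 0.40), (2, 0.20), (3, 0.15), (4, 0.18), (5, 0.07);
""")

SEED = 1  # plays the role of the BERT paper ID

rows = con.execute("""
SELECT r.vertex, r.value
FROM scores r
WHERE r.vertex <> :seed                        -- drop the seed itself
  AND NOT EXISTS (SELECT 1 FROM citation_edges e
                  WHERE e.src = :seed
                    AND e.dst = r.vertex)      -- drop direct references
ORDER BY r.value DESC
""", {"seed": SEED}).fetchall()

print(rows)  # [(4, 0.18), (5, 0.07)]
```

Vertices 2 and 3 are cited directly by the seed and are removed by the anti-join; vertices 4 and 5, reachable only through intermediate references, remain.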
Limitations & notes
- A dry run validates table resolution, column presence, static dtypes, and options only.
- A dry run does not scan edge data, construct a graph, or prove that source vertices exist.
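Since a dry run does not prove that vertices exist in the edge data, a caller may want to check the seed table against the edge table before running. The sketch below shows one way to do that with an anti-join, again using Python's sqlite3 as a stand-in engine. The table contents are invented, and how the function itself treats a seed that is absent from the graph is not stated here, so treat the check as a precaution rather than a requirement.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE citation_edges (src INTEGER, dst INTEGER);
INSERT INTO citation_edges VALUES (1, 2), (2, 3);

-- Hypothetical seed table; vertex 99 does not appear in any edge.
CREATE TABLE ppr_seeds (vertex INTEGER, value REAL);
INSERT INTO ppr_seeds VALUES (1, 1.0), (99, 0.5);
""")

# Seeds that appear as neither a source nor a destination of any edge.
missing = [vertex for (vertex,) in con.execute("""
SELECT s.vertex
FROM ppr_seeds s
WHERE NOT EXISTS (SELECT 1 FROM citation_edges e
                  WHERE e.src = s.vertex OR e.dst = s.vertex)
ORDER BY s.vertex
""")]

print(missing)  # [99]
```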
Validate before running
Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:
SELECT * FROM cugraph_validate_call(
'cugraph_personalized_pagerank',
'your_edges_table',
'{"src_col":"src","dst_col":"dst"}'
);
See Discovery & validation for the full cugraph_validate_call contract.