Getting All Records in a Graph

Get all records in a graph using the build_graph function.

async geneagrapher_core.traverse.build_graph(start_items, *, http_semaphore=None, max_records=None, user_agent=None, cache=None, record_callback=None, report_callback=None)

Build a complete geneagraph using the records in start_items as the graph’s leaf nodes.

Parameters:
  • start_items (List[TraverseItem]) – a list of records to start from, each paired with the direction(s) in which to traverse from it

  • http_semaphore (Optional[Semaphore]) – a semaphore to limit HTTP request concurrency

  • max_records (Optional[int]) – the maximum number of records to include in the built graph

  • user_agent (Optional[str]) – a custom user agent string to use in HTTP requests

  • cache (Optional[Cache]) – a cache object for getting and storing results

  • record_callback (Optional[Callable[[TaskGroup, Record], Awaitable[None]]]) – callback function called with record data as it is retrieved

  • report_callback (Optional[Callable[[TaskGroup, int, int, int], Awaitable[None]]]) – callback function called to report graph-building progress

Return type:

Geneagraph
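The http_semaphore parameter accepts a semaphore that limits HTTP request concurrency. The sketch below shows, in isolation, how one shared asyncio.Semaphore caps the number of requests in flight. fake_fetch and its simulated latency are hypothetical stand-ins, not geneagrapher_core code, and exactly where build_graph acquires the semaphore internally is an assumption here.

```python
import asyncio


# Hypothetical stand-in for one HTTP request; not part of geneagrapher_core.
async def fake_fetch(record_id: int, sem: asyncio.Semaphore, stats: dict[str, int]) -> int:
    async with sem:  # waits here while the limit is already reached
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # simulated network latency
        stats["active"] -= 1
    return record_id


async def main() -> dict:
    sem = asyncio.Semaphore(3)  # allow at most three requests in flight
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(fake_fetch(i, sem, stats) for i in range(10)))
    return {"results": results, "peak": stats["peak"]}


outcome = asyncio.run(main())
```

With the real function you would presumably create one semaphore and pass it in, for example build_graph(start_items, http_semaphore=asyncio.Semaphore(3)), so that all record requests share the same limit.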

Example:

import asyncio

from geneagrapher_core.record import RecordId
from geneagrapher_core.traverse import TraverseDirection, TraverseItem, build_graph

# Build a graph that contains Carl Friedrich Gauß (18231) and his advisor
# tree, plus Johann Friedrich Pfaff (18230) with both his advisor tree and
# his descendant tree.
start_items = [
    TraverseItem(RecordId(18231), TraverseDirection.ADVISORS),
    TraverseItem(
        RecordId(18230),
        TraverseDirection.ADVISORS | TraverseDirection.DESCENDANTS
    ),
]
graph = asyncio.run(build_graph(start_items))
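The example combines two traversal directions with the | operator. Assuming TraverseDirection is a flag enumeration (as that usage suggests), the stand-in below shows how one combined value encodes several directions and how each can be tested for. Direction and directions_to_follow are hypothetical names for illustration, not the library's classes.

```python
from enum import Flag, auto


class Direction(Flag):
    """Hypothetical stand-in for TraverseDirection; not the library's class."""

    ADVISORS = auto()
    DESCENDANTS = auto()


def directions_to_follow(direction: Direction) -> list[str]:
    # Test each single-bit member for membership in the combined value.
    return [member.name for member in Direction if member in direction]


both = Direction.ADVISORS | Direction.DESCENDANTS
print(directions_to_follow(both))  # ['ADVISORS', 'DESCENDANTS']
print(directions_to_follow(Direction.ADVISORS))  # ['ADVISORS']
```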

Callbacks

Report callback

The build_graph function optionally takes a reporting callback function. If provided, this function will be called when new records are added to the traversal plan or when records have been retrieved.

The report_callback function is called with:

  • An asyncio.TaskGroup, which is useful if you want to do something expensive in the reporting callback and do not want to block the graph-building path.

  • Three integers that report:

    1. The number of known records yet to be retrieved.

    2. The number of records in the process of being retrieved.

    3. The number of records that have been retrieved.

Examples

Here’s an example of a simple, blocking callback:

import asyncio

async def show_progress(
    tg: asyncio.TaskGroup, to_fetch: int, fetching: int, fetched: int
) -> None:
    print(f"Todo: {to_fetch}    Doing: {fetching}    Done: {fetched}")

Here’s a more complicated example in which the callback creates a new task to do the reporting. Because the callback only schedules the task and returns immediately, it does not block progress on data retrieval.

import asyncio

async def do_expensive_network_request(
    to_fetch: int, fetching: int, fetched: int
) -> None:
    # Do something that takes a long time.
    ...

async def show_progress(
    tg: asyncio.TaskGroup, to_fetch: int, fetching: int, fetched: int
) -> None:
    tg.create_task(do_expensive_network_request(to_fetch, fetching, fetched))

Record callback

The build_graph function optionally takes a record callback function. If provided, this function will be called when a record has been retrieved. The callback function receives the record data as an argument.

The record_callback function is called with:

  • An asyncio.TaskGroup, which is useful if you want to do something expensive in the record callback and do not want to block the graph-building path.

  • A Record object containing the record data.

Examples

Here’s an example of a simple, blocking callback:

async def got_record(tg: asyncio.TaskGroup, record: Record) -> None:
    print(record)

Here’s a more complicated example in which the callback creates a new task for the expensive work. Doing so keeps the record callback from blocking progress on data retrieval.

import asyncio
from geneagrapher_core.record import Record

async def do_expensive_network_request(record: Record) -> None:
    # Do something that takes a long time.
    ...

async def got_record(tg: asyncio.TaskGroup, record: Record) -> None:
    tg.create_task(do_expensive_network_request(record))

Example Code

An example of how to use the cache and report_callback arguments to build_graph is in the repository’s examples directory.