# ToposKG-core
`ToposKG-core` contains the main library logic used to configure and execute the RDF generation pipeline.
This page documents the expected core concepts and parameters.
The core functionality of toposkg-lib is to construct custom geospatial knowledge graphs based on the ToposKG knowledge graph.
## `KnowledgeGraphBlueprint` [source]
The most basic "building block" of the Topos framework. It is responsible for collecting, managing and finally building the desired geospatial knowledge graph.
```python
KnowledgeGraphBlueprint(
output_dir: str,
sources_paths: List[str],
name: str = "ToposKG.nt",
materialization_pairs = [],
translation_targets = []
)
```
A blueprint describing how a ToposKG knowledge graph should be constructed. It stores the output location, the selected RDF source files, and optional post-processing operations such as materialization and translation.
### Parameters
| Parameter | Type | Default | Description |
|---|---:|---:|---|
| `output_dir` | `str` | required | Directory where the constructed knowledge graph will be written. |
| `sources_paths` | `List[str]` | required | List of source files or directories that should be included in the generated knowledge graph. |
| `name` | `str` | `"ToposKG.nt"` | Name of the output N-Triples file. |
| `linking_pairs` | `list` | `[]` | Pairs used for entity-linking operations. Currently stored in the blueprint, but not actively used by `construct()`. |
| `materialization_pairs` | `list` | `[]` | Pairs of source files for which geospatial materialization should be performed. |
| `translation_targets` | `list` | `[]` | Translation configuration, where each entry is expected to contain a source path and a list of predicates to translate. |
### Methods
#### `construct(validate=True, debug=False)` [source]
```python
construct(validate: bool = True, debug: bool = False) -> str
```
Constructs the knowledge graph described by the blueprint.
The method concatenates selected RDF sources, optionally validates and serializes them as N-Triples, performs materialization over configured source pairs, and applies translation over configured predicate targets.
##### Parameters
| Parameter | Type | Default | Description |
|---|---:|---:|---|
| `validate` | `bool` | `True` | If `True`, each source file is parsed with `rdflib` and serialized to N-Triples before being written. If `False` and the file already ends in `.nt`, the file is read directly. |
| `debug` | `bool` | `False` | Enables additional debug output during parsing, loading, placeholder replacement, and translation. |
##### Returns
| Type | Description |
|---|---|
| `str` | A success message containing the generated output path. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `output_dir` does not exist. |
| `ValueError` | Raised if one of the configured source paths does not exist. |
| `ValueError` | Raised if a non-local filesystem source is used during construction. |
##### Example
```python
blueprint = KnowledgeGraphBlueprint(
output_dir="./output",
sources_paths=["./data/greece.nt"],
name="Greece.nt",
)
blueprint.construct(validate=False)
```
## `KnowledgeGraphBlueprintBuilder` [source]
```python
class KnowledgeGraphBlueprintBuilder()
```
Builder class for incrementally configuring and creating a `KnowledgeGraphBlueprint`.
This is the main convenience interface for users who want to select source files, configure output options, and create a knowledge graph construction blueprint without manually passing all parameters to `KnowledgeGraphBlueprint`.
### Methods
#### `set_name(name)` [source]
```python
set_name(name) -> None
```
Sets the output file name for the generated knowledge graph.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `name` | `str` | Output file name, for example `"Greece.nt"`. |
---
#### `set_output_dir(output_dir)` [source]
```python
set_output_dir(output_dir: str) -> None
```
Sets the directory where the generated knowledge graph should be written.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `output_dir` | `str` | Output directory path. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `output_dir` is not a string. |
---
#### `build()` [source]
```python
build() -> KnowledgeGraphBlueprint
```
Creates a `KnowledgeGraphBlueprint` from the current builder configuration.
##### Returns
| Type | Description |
|---|---|
| `KnowledgeGraphBlueprint` | A configured blueprint object. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if required fields are missing. Required fields are `output_dir` and `sources_paths`. |
---
#### `set_sources_path(sources_path)` [source]
```python
set_sources_path(sources_path: list) -> None
```
Replaces the current source path collection with the given list.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `sources_path` | `list` | List of source file or directory paths. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `sources_path` is not a list. |
---
#### `add_source_path(source_path)` [source]
```python
add_source_path(source_path: str) -> None
```
Adds a single source path to the builder.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `source_path` | `str` | Path to an RDF source file or directory. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `source_path` is not a string. |
---
#### `add_source_paths_with_strings(source_paths, substrings)` [source]
```python
add_source_paths_with_strings(
source_paths: list,
substrings: List[str] | str,
) -> None
```
Adds source paths that contain all requested substrings and point to `.nt` files.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `source_paths` | `list` | Candidate source paths to filter. |
| `substrings` | `List[str] \| str` | Required substring or list of substrings. A path is added only if it contains all requested substrings. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `source_paths` is not a list of strings. |
| `ValueError` | Raised if any individual source path is not a string. |
##### Example
```python
builder.add_source_paths_with_strings(
sources_manager.get_source_paths(),
["Greece", "OSM"],
)
```
---
#### `add_source_paths_with_regex(source_paths, regex_pattern)` [source]
```python
add_source_paths_with_regex(
source_paths: list,
regex_pattern: str,
) -> None
```
Adds source paths that match a regular expression.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `source_paths` | `list` | Candidate source paths to filter. |
| `regex_pattern` | `str` | Regular expression used to select source paths. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `source_paths` is not a list of strings. |
| `ValueError` | Raised if any individual source path is not a string. |
##### Example
```python
builder.add_source_paths_with_regex(
sources_manager.get_source_paths(),
r"(?i).*Greece_(?!\d).*\.nt",
)
```
---
#### `remove_source_path(source_path)` [source]
```python
remove_source_path(source_path: str) -> None
```
Removes a source path from the builder, if source paths have already been configured.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `source_path` | `str` | Source path to remove. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `source_path` is not a string. |
---
#### `clear_source_paths()` [source]
```python
clear_source_paths() -> None
```
Clears all configured source paths and resets linking pairs, materialization pairs, and translation targets.
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if no source paths have been configured. |
---
#### `print_source_paths()` [source]
```python
print_source_paths() -> None
```
Prints the currently configured source paths.
---
#### `set_linking_pairs(linking_pairs)` [source]
```python
set_linking_pairs(linking_pairs: list) -> None
```
Sets the entity-linking pair configuration.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `linking_pairs` | `list` | Entity-linking pairs. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `linking_pairs` is not a list. |
---
#### `set_materialization_pairs(materialization_pairs)` [source]
```python
set_materialization_pairs(materialization_pairs: list) -> None
```
Sets all geospatial materialization pairs.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `materialization_pairs` | `list` | List of source-path pairs used for geospatial materialization. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `materialization_pairs` is not a list. |
---
#### `add_materialization_pair(materialization_pair)` [source]
```python
add_materialization_pair(materialization_pair: tuple) -> None
```
Adds a single pair of source paths for geospatial materialization.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `materialization_pair` | `tuple` | Tuple of two source paths. Both paths must already exist in the configured `sources_paths`. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `materialization_pair` is not a tuple of length two. |
| `ValueError` | Raised if the first element is not one of the configured source paths. |
| `ValueError` | Raised if the second element is not one of the configured source paths. |
##### Example
```python
builder.add_materialization_pair((source_a, source_b))
```
---
#### `set_translation_targets(translation_targets)` [source]
```python
set_translation_targets(translation_targets: list) -> None
```
Sets all translation targets.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `translation_targets` | `list` | List of translation target configurations. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `translation_targets` is not a list. |
---
#### `add_translation_target(translation_target)` [source]
```python
add_translation_target(translation_target: tuple) -> None
```
Adds a translation target.
Each translation target is expected to be a tuple whose first element is a source path and whose second element is a list of predicates to translate.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `translation_target` | `tuple` | Tuple of the form `(source_path, predicates_list)`. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `translation_target` is not a tuple of length two. |
| `ValueError` | Raised if the first element is not a string. |
| `ValueError` | Raised if the second element is not a list. |
##### Example
```python
builder.add_translation_target((
"./data/greece.nt",
[""],
))
```
---
---
## `KnowledgeGraphDataSource` [source]
```python
class KnowledgeGraphDataSource(
path: str,
metadata: Metadata,
)
```
Represents a single available ToposKG data source.
A data source can represent either a file or a directory. Directory-like sources may contain child `KnowledgeGraphDataSource` objects, allowing the available sources to be represented as a tree.
### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `path` | `str` | Path to the represented source file or directory. |
| `metadata` | `Metadata` | Metadata object associated with the source. May be `None` if no metadata file is available. |
### Attributes
| Attribute | Type | Description |
|---|---:|---|
| `name` | `str` | Basename of the source path. |
| `path` | `str` | Full source path. |
| `metadata` | `Metadata` | Loaded metadata for the source, or `None`. |
| `children` | `list` | Child data sources. |
### Methods
#### `print(indent=0)` [source]
```python
print(indent: int = 0) -> None
```
Prints the data source and its children as an indented tree.
##### Parameters
| Parameter | Type | Default | Description |
|---|---:|---:|---|
| `indent` | `int` | `0` | Number of indentation levels used when printing the current source. |
---
## `KnowledgeGraphSourcesManager` [source]
```python
class KnowledgeGraphSourcesManager(
sources_repositories: str = "http://localhost:10001",
sources_cache: str = "~/.toposkg/sources_cache",
)
```
Manages the available ToposKG source repositories and the local source cache.
The manager can download source files from a configured repository, create placeholders for sources that are not downloaded yet, load metadata, and expose sources either as a tree or as a flat list of paths.
### Parameters
| Parameter | Type | Default | Description |
|---|---:|---:|---|
| `sources_repositories` | `str` | `"http://localhost:10001"` | Base URL of the source repository service. |
| `sources_cache` | `str` | `"~/.toposkg/sources_cache"` | Local directory where source files or placeholders are stored. |
### Methods
#### `add_data_sources_from_repository(sources_repository)` [source]
```python
add_data_sources_from_repository(
sources_repository: str,
) -> KnowledgeGraphDataSource
```
Loads source information from a repository directory and returns the root data source.
The method recursively traverses files and directories, skips metadata directories, loads metadata when available, and represents the result as a tree of `KnowledgeGraphDataSource` objects.
##### Parameters
| Parameter | Type | Description |
|---|---:|---|
| `sources_repository` | `str` | Path or filesystem URL of the source repository. |
##### Returns
| Type | Description |
|---|---|
| `KnowledgeGraphDataSource` | Root data source loaded from the repository. |
##### Raises
| Exception | Condition |
|---|---|
| `ValueError` | Raised if `sources_repository` is not a directory. |
---
#### `get_sources_as_tree()` [source]
```python
get_sources_as_tree() -> list
```
Returns the available data sources as a tree.
##### Returns
| Type | Description |
|---|---|
| `list` | List of root `KnowledgeGraphDataSource` objects. |
---
#### `get_sources_as_list(data_sources=None)` [source]
```python
get_sources_as_list(data_sources=None) -> list
```
Flattens a source tree into a list.
##### Parameters
| Parameter | Type | Default | Description |
|---|---:|---:|---|
| `data_sources` | `list \| None` | `None` | Source tree to flatten. If `None`, the manager's current `data_sources` are used. |
##### Returns
| Type | Description |
|---|---|
| `list` | Flat list of `KnowledgeGraphDataSource` objects. |
---
#### `get_source_paths()` [source]
```python
get_source_paths() -> list
```
Returns the paths of all available sources.
##### Returns
| Type | Description |
|---|---|
| `list` | List of source paths. |
---
#### `print_available_data_sources(tree=True, filter=None)` [source]
```python
print_available_data_sources(
tree: bool = True,
filter = None,
) -> None
```
Prints the available data sources.
The sources can be printed either as a tree or as a flat list. The optional `filter` argument restricts the printed sources to paths that contain the provided substring.
##### Parameters
| Parameter | Type | Default | Description |
|---|---:|---:|---|
| `tree` | `bool` | `True` | If `True`, print sources as a tree. If `False`, print a flat list of source paths. |
| `filter` | `str \| None` | `None` | Optional substring used to filter displayed source paths. |
##### Example
```python
sources_manager = KnowledgeGraphSourcesManager(
sources_repositories="https://toposkg.di.uoa.gr",
)
sources_manager.print_available_data_sources(
tree=False,
filter="Greece",
)
```
---
## End-to-end example
```python
sources_manager = KnowledgeGraphSourcesManager(
sources_repositories="https://toposkg.di.uoa.gr",
)
sources_manager.print_available_data_sources(tree=False, filter="Greece")
builder = KnowledgeGraphBlueprintBuilder()
builder.add_source_paths_with_strings(
sources_manager.get_source_paths(),
["Greece", "OSM"],
)
builder.set_output_dir("./output")
builder.set_name("Greece.nt")
blueprint = builder.build()
blueprint.construct(validate=False)
```
## End-to-end example with materialization and translation
More details on the materialization and translation pipelines can be found in our [website](https://toposkg.di.uoa.gr/)
```python
from toposkg.toposkg_lib_core import (
KnowledgeGraphBlueprintBuilder,
KnowledgeGraphSourcesManager
)
sources_manager = KnowledgeGraphSourcesManager(
sources_repositories='https://toposkg.di.uoa.gr'
)
sources_manager.print_available_data_sources(
tree=False,
filter="Greece"
)
builder = KnowledgeGraphBlueprintBuilder()
builder.set_name("ToposKG.nt")
builder.set_output_dir("/content/")
builder.add_source_path(
"/root/.toposkg/sources_cache/toposkg/GAUL/countries/Greece/Greece_all.nt"
)
builder.add_source_path(
"/root/.toposkg/sources_cache/toposkg/OSM/forests/Greece/greece_forest.nt"
)
builder.add_translation_target(
(
"/root/.toposkg/sources_cache/toposkg/OSM/countries/Greece/Greece_1.nt",
[""]
)
)
mat_candidates = [("/root/.toposkg/sources_cache/toposkg/GAUL/countries/Greece/Greece_all.nt",
"/root/.toposkg/sources_cache/toposkg/OSM/forests/Greece/greece_forest.nt")]
builder.set_materialization_pairs(mat_candidates)
blueprint = builder.build()
blueprint.construct(validate=False)
```