NEExT package

Submodules

NEExT.embeddings module

NEExT.feature_importance module

NEExT.features module

NEExT.framework module

class NEExT.framework.NEExT(log_level='INFO')[source]

Bases: object

Main interface class for the NEExT framework.

This class maintains the state of various components and provides a unified interface for users to interact with the framework.

logger

Logger instance for the framework

__init__(log_level='INFO')[source]

Initialize the NEExT framework.

Parameters:

log_level (str) – Initial logging level (default: “INFO”)

set_log_level(level)[source]

Set the logging level for the framework.

Parameters:

level (str) – Logging level (“DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”)

Return type:

None
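
As a minimal sketch of what setting a level by name involves, the snippet below maps one of the five documented level strings onto the standard `logging` constants. The helper name `apply_log_level` and the logger name `neext_example` are hypothetical and are not part of NEExT; the framework's own validation behavior may differ.

```python
import logging

# The five level names listed in the documentation above.
VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def apply_log_level(logger: logging.Logger, level: str) -> None:
    """Set `logger` to the level named by `level` (hypothetical helper)."""
    name = level.upper()
    if name not in VALID_LEVELS:
        raise ValueError(f"Unknown logging level: {level!r}")
    # logging.DEBUG, logging.INFO, ... are module-level integer constants.
    logger.setLevel(getattr(logging, name))


logger = logging.getLogger("neext_example")
apply_log_level(logger, "DEBUG")
```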

read_from_csv(edges_path, node_graph_mapping_path, graph_label_path=None, node_features_path=None, edge_features_path=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)[source]

Read graph data from CSV files and return a graph collection.

Parameters:
  • edges_path (Union[str, Path]) – Path to edges CSV file (src_node_id, dest_node_id)

  • node_graph_mapping_path (Union[str, Path]) – Path to node-graph mapping CSV file (node_id, graph_id)

  • graph_label_path (Union[str, Path, None]) – Optional path to graph labels CSV file (graph_id, graph_label)

  • node_features_path (Union[str, Path, None]) – Optional path to node features CSV file

  • edge_features_path (Union[str, Path, None]) – Optional path to edge features CSV file

  • graph_type (str) – Backend to use (“networkx” or “igraph”). Defaults to “networkx”

  • reindex_nodes (bool) – Whether to reindex nodes to start from 0 (default: True)

  • filter_largest_component (bool) – Whether to keep only the largest connected component of each graph (default: True)

  • node_sample_rate (float) – Rate at which to sample nodes from each graph (default: 1.0). Must be between 0 and 1.

Returns:

Collection of graphs loaded from CSV files

Return type:

GraphCollection
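
To make the expected file layout concrete, the sketch below writes three small CSV files using the column names listed above and groups the edges by graph with the standard library only. It assumes each file has a header row with exactly those column names; the file names, the helper functions, and the integer labels are illustrative and do not come from NEExT.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header row followed by data rows."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


tmp = Path(tempfile.mkdtemp())
edges_path = tmp / "edges.csv"
mapping_path = tmp / "node_graph_mapping.csv"
labels_path = tmp / "graph_labels.csv"

# Two tiny graphs: a triangle (graph 0) and a single edge (graph 1).
write_csv(edges_path, ["src_node_id", "dest_node_id"],
          [(0, 1), (1, 2), (2, 0), (3, 4)])
write_csv(mapping_path, ["node_id", "graph_id"],
          [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)])
write_csv(labels_path, ["graph_id", "graph_label"], [(0, 1), (1, 0)])

# Assign every edge to the graph that owns its source node.
node_to_graph = {int(r["node_id"]): int(r["graph_id"])
                 for r in read_rows(mapping_path)}
edges_by_graph = defaultdict(list)
for r in read_rows(edges_path):
    src, dst = int(r["src_node_id"]), int(r["dest_node_id"])
    edges_by_graph[node_to_graph[src]].append((src, dst))

labels = {int(r["graph_id"]): int(r["graph_label"])
          for r in read_rows(labels_path)}
```

Files laid out this way can then be passed as `edges_path`, `node_graph_mapping_path`, and `graph_label_path`.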

get_collection_info(graph_collection)[source]

Get basic information about a graph collection.

This method is deprecated. Use graph_collection.describe() instead.

Parameters:

graph_collection (GraphCollection) – The graph collection to get information about

Returns:

Dictionary containing collection information

Return type:

dict

compute_node_features(graph_collection, feature_list, feature_vector_length=3, normalize_features=True, show_progress=True, n_jobs=-1, my_feature_methods=None)[source]

Compute node features for all graphs in the collection.

Parameters:
  • graph_collection (GraphCollection) – Collection of graphs to compute features for

  • feature_list (List[str]) – List of features to compute (e.g., [“page_rank”, “degree_centrality”])

  • feature_vector_length (int) – Length of feature vector for each node (default: 3)

  • normalize_features (bool) – Whether to normalize features across all nodes (default: True)

  • show_progress (bool) – Whether to show progress bars during computation (default: True)

  • n_jobs (int) – Number of parallel jobs (-1 for all CPUs)

  • my_feature_methods (list)

Returns:

DataFrame containing computed features for all nodes

Return type:

pd.DataFrame

compute_graph_embeddings(graph_collection, features, embedding_algorithm, embedding_dimension, feature_columns=None, random_state=42, memory_size='4G')[source]

Compute graph embeddings based on node features.

Parameters:
  • graph_collection (GraphCollection) – Collection of graphs to compute embeddings for

  • features (Features) – Features object containing node features

  • embedding_algorithm (str) – Algorithm to use for embedding computation

  • embedding_dimension (int) – Dimension of the output embeddings

  • feature_columns (Optional[List[str]]) – Specific feature columns to use (default: all)

  • random_state (int) – Random seed for reproducibility

  • memory_size (str) – Memory limit for algorithms that support it

Returns:

Embeddings object containing computed embeddings

Return type:

Embeddings

train_ml_model(graph_collection, embeddings, model_type, balance_dataset=False, sample_size=5, n_jobs=-1, parallel_backend='process')[source]

Train and evaluate a machine learning model using graph embeddings.

Parameters:
  • graph_collection (GraphCollection) – Collection of graphs with labels

  • embeddings (Embeddings) – Embeddings object containing graph embeddings

  • model_type (Literal['classifier', 'regressor']) – Type of model to train (“classifier” or “regressor”)

  • balance_dataset (bool) – Whether to balance the dataset for classification (default: False)

  • sample_size (int) – Number of training/testing iterations (default: 5)

  • n_jobs (int) – Number of parallel jobs (-1 for all CPUs)

  • parallel_backend (str) – Parallelization backend (“process” or “thread”)

Returns:

Dictionary containing model information and evaluation metrics

Return type:

Dict

compute_feature_importance(graph_collection, features, feature_importance_algorithm, embedding_algorithm='approx_wasserstein', random_state=42, n_iterations=5)[source]

Compute feature importance for graph embeddings.

Parameters:
  • graph_collection (GraphCollection) – Collection of graphs to analyze

  • features (Features) – Features object containing node features

  • feature_importance_algorithm (str) – Algorithm to use for importance analysis (“supervised_greedy”, “supervised_fast”, “unsupervised”)

  • embedding_algorithm (str) – Algorithm to use for embedding computation

  • random_state (int) – Random seed for reproducibility

  • n_iterations (int) – Number of iterations for computing average performance

Returns:

DataFrame containing feature importance results

Return type:

pd.DataFrame
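
The methods of this class are meant to be chained. The sketch below strings them together using only the signatures documented above. The function name `run_pipeline`, the CSV path arguments, and the choice of `embedding_dimension=2` are illustrative assumptions. It also assumes that the value returned by `compute_node_features` can be passed directly as `features`, even though the documented return type (`pd.DataFrame`) and the documented parameter type (`Features`) differ. `"approx_wasserstein"` is used because it is the documented default of `compute_feature_importance`; other accepted algorithm names are not listed here.

```python
def run_pipeline(edges_csv, mapping_csv, labels_csv):
    """Load graphs, compute features and embeddings, then train a classifier."""
    # Imported lazily so this sketch can be loaded without NEExT installed.
    from NEExT import NEExT

    nxt = NEExT(log_level="INFO")

    # 1. Build a graph collection from the CSV files.
    collection = nxt.read_from_csv(
        edges_path=edges_csv,
        node_graph_mapping_path=mapping_csv,
        graph_label_path=labels_csv,
        graph_type="networkx",
    )

    # 2. Compute node features (feature names taken from the docs above).
    features = nxt.compute_node_features(
        graph_collection=collection,
        feature_list=["page_rank", "degree_centrality"],
        feature_vector_length=3,
    )

    # 3. Turn node features into one embedding per graph.
    embeddings = nxt.compute_graph_embeddings(
        graph_collection=collection,
        features=features,
        embedding_algorithm="approx_wasserstein",
        embedding_dimension=2,  # arbitrary value chosen for illustration
    )

    # 4. Train and evaluate a model on the graph labels.
    return nxt.train_ml_model(
        graph_collection=collection,
        embeddings=embeddings,
        model_type="classifier",
        sample_size=5,
    )
```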

NEExT.graph module

NEExT.graph_collection module

NEExT.graph_embeddings module

NEExT.helper_functions module

NEExT.helper_functions.divide_chunks(list, chunks)[source]

NEExT.helper_functions.get_numb_of_nb_x_hops_away(G, node, max_hop_length)[source]

Compute the number of neighbors x hops away from a given node. Supports both NetworkX and iGraph backends.

Parameters:
  • G (Union[networkx.Graph, igraph.Graph]) – Graph object (NetworkX or iGraph)

  • node (int) – Source node ID

  • max_hop_length (int) – Maximum hop distance to consider

Returns:

List where index i contains number of nodes i+1 hops away

Return type:

List[int]
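
The return convention (index i holds the number of nodes i+1 hops away) can be illustrated with a breadth-first search over a plain adjacency dictionary. This is a standard-library sketch, not the library's implementation: the function name `hop_counts` and the dict-of-lists graph representation are assumptions, and whether the real function pads unreachable hops with zeros is not stated in the documentation.

```python
from typing import Dict, Hashable, List, Sequence


def hop_counts(adj: Dict[Hashable, Sequence[Hashable]],
               node: Hashable,
               max_hop_length: int) -> List[int]:
    """Count nodes at exact distance 1..max_hop_length from `node`."""
    visited = {node}
    frontier = {node}
    counts = []
    for _ in range(max_hop_length):
        next_frontier = set()
        for u in frontier:
            for v in adj.get(u, ()):
                # A node belongs to the first hop at which it is reached.
                if v not in visited:
                    visited.add(v)
                    next_frontier.add(v)
        counts.append(len(next_frontier))
        frontier = next_frontier
    return counts


# Path 0-1-2-3 with an extra branch 1-4.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
counts = hop_counts(adj, node=0, max_hop_length=3)
# counts == [1, 2, 1]: one node at hop 1, two at hop 2, one at hop 3
```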

NEExT.helper_functions.get_nodes_x_hops_away(G, node, max_hop_length)[source]

Efficiently get nodes at each hop distance from a given node, up to max_hop_length.

Return type:

Dict[int, Set[int]]

Parameters:
  • G (networkx.Graph | igraph.Graph)

  • node (int)

  • max_hop_length (int)

NEExT.helper_functions.get_all_neighborhoods_nx(G, max_hops, nodes_to_process=None)[source]

Get neighborhoods for all specified nodes in a NetworkX graph.

Parameters:
  • G – NetworkX graph

  • max_hops (int) – Maximum number of hops to consider

  • nodes_to_process (Optional[List[int]]) – List of nodes to get neighborhoods for. If None, process all nodes.

Returns:

Dictionary mapping each node to its neighborhood at each hop

Return type:

Dict

NEExT.helper_functions.get_all_neighborhoods_ig(G, max_hops, nodes_to_process=None)[source]

Get neighborhoods for all specified nodes in an iGraph graph.

Parameters:
  • G – iGraph graph

  • max_hops (int) – Maximum number of hops to consider

  • nodes_to_process (Optional[List[int]]) – List of nodes to get neighborhoods for. If None, process all nodes.

Returns:

Dictionary mapping each node to its neighborhood at each hop

Return type:

Dict

NEExT.helper_functions.get_specific_in_community_degree(G, node_id, community_partition, community_id)[source]

Computes the in-community degree of a node for a specific community.

Returns an integer: the in-community degree of the node for the specified community.

Return type:

int

Parameters:
  • community_partition (List[List[int]])

  • community_id (int)
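
As a minimal sketch, assuming that the in-community degree of a node is the number of its neighbors that belong to the given community, and that `community_id` is an index into `community_partition`, the computation can be written over an adjacency dictionary as follows. The function names and the dict-based graph are illustrative and are not the library's code.

```python
from typing import Dict, List


def in_community_degree(adj: Dict[int, List[int]],
                        node_id: int,
                        community_partition: List[List[int]],
                        community_id: int) -> int:
    """Number of neighbors of `node_id` inside one community."""
    members = set(community_partition[community_id])
    return sum(1 for nb in adj[node_id] if nb in members)


def all_in_community_degrees(adj: Dict[int, List[int]],
                             node_id: int,
                             community_partition: List[List[int]]) -> List[int]:
    """In-community degree of `node_id` for every community, in order."""
    return [in_community_degree(adj, node_id, community_partition, cid)
            for cid in range(len(community_partition))]


# Triangle 0-1-2 joined to the edge 3-4 through the edge 0-3.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
partition = [[0, 1, 2], [3, 4]]

degrees_of_node_0 = all_in_community_degrees(adj, 0, partition)
# degrees_of_node_0 == [2, 1]: two neighbors in community 0, one in community 1
```

Under the same assumptions, the per-community list summed over all communities equals the node's total degree.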

NEExT.helper_functions.get_all_in_community_degrees(G, node_id, community_partition)[source]

Computes the in-community degree of a node for each community.

Returns a list of integers, where each integer is the in-community degree of the node for the corresponding community.

Return type:

List[int]

Parameters:

community_partition (List[List[int]])

NEExT.helper_functions.get_own_in_community_degree(G, node_id, community_partition)[source]

Computes the in-community degree of a node for the community it belongs to.

Returns an integer, which is the in-community degree of the node for its community.

Return type:

int

Parameters:

community_partition (List[List[int]])

NEExT.helper_functions.get_specific_community_volume(G, community_partition, community_id)[source]

Computes the volume of a specific community in the graph. The volume is the sum of the degrees of all nodes in the community.

Returns an integer, which is the volume of the community.

Return type:

int

Parameters:
  • community_partition (List[List[int]])

  • community_id (int)
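
The volume definition above (the sum of the degrees of the nodes in a community) translates directly into code. The sketch below uses an adjacency dictionary in place of a NetworkX or iGraph object and assumes `community_id` is an index into `community_partition`; the function names are illustrative.

```python
from typing import Dict, List


def community_volume(adj: Dict[int, List[int]],
                     community_partition: List[List[int]],
                     community_id: int) -> int:
    """Sum of the degrees of all nodes in one community."""
    return sum(len(adj[n]) for n in community_partition[community_id])


def all_community_volumes(adj: Dict[int, List[int]],
                          community_partition: List[List[int]]) -> List[int]:
    """Volume of every community, in partition order."""
    return [community_volume(adj, community_partition, cid)
            for cid in range(len(community_partition))]


# Triangle 0-1-2 joined to the edge 3-4 through the edge 0-3 (5 edges).
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
partition = [[0, 1, 2], [3, 4]]

volumes = all_community_volumes(adj, partition)
# volumes == [7, 3]; the total is 10, twice the number of edges
```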

NEExT.helper_functions.get_all_community_volumes(G, community_partition)[source]

Computes the volume of each community in the graph. The volume is the sum of the degrees of all nodes in the community.

Returns a list of integers, where each integer is the volume of the corresponding community.

Return type:

List[int]

Parameters:

community_partition (List[List[int]])

NEExT.helper_functions.get_own_community_volume(G, node_id, community_partition)[source]

Computes the volume of the community the node belongs to. The volume is the sum of the degrees of all nodes in the community.

Returns an integer, which is the volume of the community.

Return type:

int

Parameters:
  • node_id (int)

  • community_partition (List[List[int]])

NEExT.io module

class NEExT.io.GraphIO(logger=None)[source]

Bases: object

Input/Output class for reading and writing graph data.

This class provides methods to read graph data from various file formats and create a GraphCollection instance.

__init__(logger=None)[source]

Initialize GraphIO with optional logger.

read_from_csv(edges_path, node_graph_mapping_path, graph_label_path=None, node_features_path=None, edge_features_path=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)[source]

Read graph data from CSV files and create a GraphCollection.

Parameters:
  • edges_path (Union[str, Path]) – Path to edges CSV file (src_node_id, dest_node_id)

  • node_graph_mapping_path (Union[str, Path]) – Path to node-graph mapping CSV file (node_id, graph_id)

  • graph_label_path (Union[str, Path, None]) – Optional path to graph labels CSV file (graph_id, graph_label)

  • node_features_path (Union[str, Path, None]) – Optional path to node features CSV file

  • edge_features_path (Union[str, Path, None]) – Optional path to edge features CSV file

  • graph_type (str) – Backend to use (“networkx” or “igraph”). Defaults to “networkx”

  • reindex_nodes (bool) – Whether to reindex nodes to start from 0 (default: True)

  • filter_largest_component (bool) – Whether to keep only the largest connected component of each graph (default: True)

  • node_sample_rate (float) – Rate at which to sample nodes from each graph (default: 1.0). Must be between 0 and 1.

Returns:

Collection of graphs created from the CSV data

Return type:

GraphCollection
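
The effect of `filter_largest_component` and `reindex_nodes` can be sketched on an adjacency dictionary: keep only the largest connected component, then relabel the surviving nodes from 0. This is an illustration under assumptions, not the library's code. The helper names are hypothetical, and the order in which NEExT applies filtering, reindexing, and node sampling is not stated in the documentation.

```python
from typing import Dict, List, Set


def largest_component(adj: Dict[int, List[int]]) -> Set[int]:
    """Return the node set of the largest connected component."""
    seen: Set[int] = set()
    best: Set[int] = set()
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal collecting one component.
        comp = {start}
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def reindex(adj: Dict[int, List[int]], keep: Set[int]) -> Dict[int, List[int]]:
    """Restrict the graph to `keep` and relabel nodes as 0..len(keep)-1."""
    mapping = {old: new for new, old in enumerate(sorted(keep))}
    return {mapping[u]: sorted(mapping[v] for v in adj[u] if v in keep)
            for u in keep}


# A 3-node star (10, 11, 12) and a separate edge (20, 21).
adj = {10: [11, 12], 11: [10], 12: [10], 20: [21], 21: [20]}
kept = largest_component(adj)
filtered = reindex(adj, kept)
# filtered == {0: [1, 2], 1: [0], 2: [0]}
```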

load_from_dfs(edges_df, node_graph_df, graph_labels_df=None, node_features_df=None, edge_features_df=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)[source]

Return type:

GraphCollection

Parameters:
  • edges_df (DataFrame)

  • node_graph_df (DataFrame)

  • graph_labels_df (DataFrame | None)

  • node_features_df (DataFrame | None)

  • edge_features_df (DataFrame | None)

  • graph_type (str)

  • reindex_nodes (bool)

  • filter_largest_component (bool)

  • node_sample_rate (float)

_organize_graph_data(edges_df, node_graph_df, node_features_df, edge_features_df, graph_labels_df)[source]

Organizes the data from DataFrames into a list of graph dictionaries.

Parameters:
  • edges_df (pd.DataFrame) – DataFrame containing edge information

  • node_graph_df (pd.DataFrame) – DataFrame containing node-to-graph mapping

  • node_features_df (Optional[pd.DataFrame]) – DataFrame containing node features

  • edge_features_df (Optional[pd.DataFrame]) – DataFrame containing edge features

  • graph_labels_df (Optional[pd.DataFrame]) – DataFrame containing graph labels

Returns:

List of dictionaries containing organized graph data

Return type:

List[Dict]

NEExT.ml_models module

NEExT.node_features module

Module contents

class NEExT.NEExT(log_level='INFO')[source]

Bases: object

Main interface class for the NEExT framework.

This class maintains the state of various components and provides a unified interface for users to interact with the framework.

logger

Logger instance for the framework

__init__(log_level='INFO')[source]

Initialize the NEExT framework.

Parameters:

log_level (str) – Initial logging level (default: “INFO”)

compute_feature_importance(graph_collection, features, feature_importance_algorithm, embedding_algorithm='approx_wasserstein', random_state=42, n_iterations=5)[source]

Compute feature importance for graph embeddings.

Parameters:
  • graph_collection (GraphCollection) – Collection of graphs to analyze

  • features (Features) – Features object containing node features

  • feature_importance_algorithm (str) – Algorithm to use for importance analysis (“supervised_greedy”, “supervised_fast”, “unsupervised”)

  • embedding_algorithm (str) – Algorithm to use for embedding computation

  • random_state (int) – Random seed for reproducibility

  • n_iterations (int) – Number of iterations for computing average performance

Returns:

DataFrame containing feature importance results

Return type:

pd.DataFrame

compute_graph_embeddings(graph_collection, features, embedding_algorithm, embedding_dimension, feature_columns=None, random_state=42, memory_size='4G')[source]

Compute graph embeddings based on node features.

Parameters:
  • graph_collection (GraphCollection) – Collection of graphs to compute embeddings for

  • features (Features) – Features object containing node features

  • embedding_algorithm (str) – Algorithm to use for embedding computation

  • embedding_dimension (int) – Dimension of the output embeddings

  • feature_columns (Optional[List[str]]) – Specific feature columns to use (default: all)

  • random_state (int) – Random seed for reproducibility

  • memory_size (str) – Memory limit for algorithms that support it

Returns:

Embeddings object containing computed embeddings

Return type:

Embeddings

compute_node_features(graph_collection, feature_list, feature_vector_length=3, normalize_features=True, show_progress=True, n_jobs=-1, my_feature_methods=None)[source]

Compute node features for all graphs in the collection.

Parameters:
  • graph_collection (GraphCollection) – Collection of graphs to compute features for

  • feature_list (List[str]) – List of features to compute (e.g., [“page_rank”, “degree_centrality”])

  • feature_vector_length (int) – Length of feature vector for each node (default: 3)

  • normalize_features (bool) – Whether to normalize features across all nodes (default: True)

  • show_progress (bool) – Whether to show progress bars during computation (default: True)

  • n_jobs (int) – Number of parallel jobs (-1 for all CPUs)

  • my_feature_methods (list)

Returns:

DataFrame containing computed features for all nodes

Return type:

pd.DataFrame

get_collection_info(graph_collection)[source]

Get basic information about a graph collection.

This method is deprecated. Use graph_collection.describe() instead.

Parameters:

graph_collection (GraphCollection) – The graph collection to get information about

Returns:

Dictionary containing collection information

Return type:

dict

read_from_csv(edges_path, node_graph_mapping_path, graph_label_path=None, node_features_path=None, edge_features_path=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)[source]

Read graph data from CSV files and return a graph collection.

Parameters:
  • edges_path (Union[str, Path]) – Path to edges CSV file (src_node_id, dest_node_id)

  • node_graph_mapping_path (Union[str, Path]) – Path to node-graph mapping CSV file (node_id, graph_id)

  • graph_label_path (Union[str, Path, None]) – Optional path to graph labels CSV file (graph_id, graph_label)

  • node_features_path (Union[str, Path, None]) – Optional path to node features CSV file

  • edge_features_path (Union[str, Path, None]) – Optional path to edge features CSV file

  • graph_type (str) – Backend to use (“networkx” or “igraph”). Defaults to “networkx”

  • reindex_nodes (bool) – Whether to reindex nodes to start from 0 (default: True)

  • filter_largest_component (bool) – Whether to keep only the largest connected component of each graph (default: True)

  • node_sample_rate (float) – Rate at which to sample nodes from each graph (default: 1.0). Must be between 0 and 1.

Returns:

Collection of graphs loaded from CSV files

Return type:

GraphCollection

set_log_level(level)[source]

Set the logging level for the framework.

Parameters:

level (str) – Logging level (“DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”)

Return type:

None

train_ml_model(graph_collection, embeddings, model_type, balance_dataset=False, sample_size=5, n_jobs=-1, parallel_backend='process')[source]

Train and evaluate a machine learning model using graph embeddings.

Parameters:
  • graph_collection (GraphCollection) – Collection of graphs with labels

  • embeddings (Embeddings) – Embeddings object containing graph embeddings

  • model_type (Literal['classifier', 'regressor']) – Type of model to train (“classifier” or “regressor”)

  • balance_dataset (bool) – Whether to balance the dataset for classification (default: False)

  • sample_size (int) – Number of training/testing iterations (default: 5)

  • n_jobs (int) – Number of parallel jobs (-1 for all CPUs)

  • parallel_backend (str) – Parallelization backend (“process” or “thread”)

Returns:

Dictionary containing model information and evaluation metrics

Return type:

Dict