NEExT package

Submodules

NEExT.embeddings module

NEExT.feature_importance module

NEExT.features module

NEExT.framework module

class NEExT.framework.NEExT(log_level='INFO')

Bases: object

Main interface class for the NEExT framework.

This class maintains the state of various components and provides a unified interface for users to interact with the framework.

logger: Logger instance for the framework

__init__(log_level='INFO')

Initialize the NEExT framework.

Parameters: log_level (str) – Initial logging level (default: “INFO”)

set_log_level(level)

Set the logging level for the framework.

Parameters: level (str) – Logging level (“DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”)
Return type: None
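The accepted level names match the standard names in Python's logging module. A minimal sketch of how such a name can be validated and translated to a numeric level follows; the helper to_logging_level is hypothetical and is not part of NEExT, and whether set_log_level performs the same validation is an assumption.

```python
import logging

# Level names listed in the set_log_level documentation.
VALID_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def to_logging_level(level: str) -> int:
    """Translate a level name into the numeric constant used by `logging`.

    Hypothetical helper for illustration; not a NEExT function.
    """
    name = level.upper()
    if name not in VALID_LEVELS:
        raise ValueError(f"Unknown log level: {level!r}")
    return getattr(logging, name)


numeric = to_logging_level("WARNING")
```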

read_from_csv(edges_path, node_graph_mapping_path, graph_label_path=None, node_features_path=None, edge_features_path=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)

Read graph data from CSV files and return a graph collection.

Parameters:

edges_path (Union[str, Path]) – Path to edges CSV file (src_node_id, dest_node_id)
node_graph_mapping_path (Union[str, Path]) – Path to node-graph mapping CSV file (node_id, graph_id)
graph_label_path (Union[str, Path, None]) – Optional path to graph labels CSV file (graph_id, graph_label)
node_features_path (Union[str, Path, None]) – Optional path to node features CSV file
edge_features_path (Union[str, Path, None]) – Optional path to edge features CSV file
graph_type (str) – Backend to use (“networkx” or “igraph”). Defaults to “networkx”
reindex_nodes (bool) – Whether to reindex nodes to start from 0 (default: True)
filter_largest_component (bool) – Whether to keep only the largest connected component of each graph (default: True)
node_sample_rate (float) – Rate at which to sample nodes from each graph (default: 1.0). Must be between 0 and 1.

Returns:

Collection of graphs loaded from CSV files

Return type:

GraphCollection
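The sketch below writes three small CSV files in the layout described by the parameters above and shows how they might be passed to read_from_csv. It assumes that the column headers are exactly the names given in parentheses and that node IDs are unique across graphs; the file names and the helpers write_csv and load_collection are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header row followed by data rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


data_dir = Path(tempfile.mkdtemp())
edges_path = data_dir / "edges.csv"
mapping_path = data_dir / "node_graph_mapping.csv"
labels_path = data_dir / "graph_labels.csv"

# Two small graphs: a triangle (graph 0) and a single edge (graph 1).
write_csv(edges_path, ["src_node_id", "dest_node_id"],
          [(0, 1), (1, 2), (2, 0), (3, 4)])
write_csv(mapping_path, ["node_id", "graph_id"],
          [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)])
write_csv(labels_path, ["graph_id", "graph_label"],
          [(0, 1), (1, 0)])


def load_collection():
    """Load the files above; requires the NEExT package to be installed."""
    # Imported lazily so the file-writing part runs without NEExT.
    from NEExT import NEExT

    nxt = NEExT(log_level="INFO")
    return nxt.read_from_csv(
        edges_path=edges_path,
        node_graph_mapping_path=mapping_path,
        graph_label_path=labels_path,
        graph_type="networkx",
    )
```

Calling load_collection() should then return a GraphCollection, provided the assumptions about the file layout hold.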

get_collection_info(graph_collection)

Get basic information about a graph collection.

This method is deprecated. Use graph_collection.describe() instead.

Parameters: graph_collection (GraphCollection) – The graph collection to get information about
Returns: Dictionary containing collection information
Return type: dict

compute_node_features(graph_collection, feature_list, feature_vector_length=3, normalize_features=True, show_progress=True, n_jobs=-1, my_feature_methods=None)

Compute node features for all graphs in the collection.

Parameters:

graph_collection (GraphCollection) – Collection of graphs to compute features for
feature_list (List[str]) – List of features to compute (e.g., [“page_rank”, “degree_centrality”])
feature_vector_length (int) – Length of feature vector for each node (default: 3)
normalize_features (bool) – Whether to normalize features across all nodes (default: True)
show_progress (bool) – Whether to show progress bars during computation (default: True)
n_jobs (int) – Number of parallel jobs (-1 for all CPUs; default: -1)
my_feature_methods (list)

Returns:

DataFrame containing computed features for all nodes

Return type:

pd.DataFrame

compute_graph_embeddings(graph_collection, features, embedding_algorithm, embedding_dimension, feature_columns=None, random_state=42, memory_size='4G')

Compute graph embeddings based on node features.

Parameters:

graph_collection (GraphCollection) – Collection of graphs to compute embeddings for
features (Features) – Features object containing node features
embedding_algorithm (str) – Algorithm to use for embedding computation
embedding_dimension (int) – Dimension of the output embeddings
feature_columns (Optional[List[str]]) – Specific feature columns to use (default: all)
random_state (int) – Random seed for reproducibility
memory_size (str) – Memory limit for algorithms that support it

Returns:

Embeddings object containing computed embeddings

Return type:

Embeddings

train_ml_model(graph_collection, embeddings, model_type, balance_dataset=False, sample_size=5, n_jobs=-1, parallel_backend='process')

Train and evaluate a machine learning model using graph embeddings.

Parameters:

graph_collection (GraphCollection) – Collection of graphs with labels
embeddings (Embeddings) – Embeddings object containing graph embeddings
model_type (Literal['classifier', 'regressor']) – Type of model to train (“classifier” or “regressor”)
balance_dataset (bool) – Whether to balance the dataset for classification (default: False)
sample_size (int) – Number of training/testing iterations (default: 5)
n_jobs (int) – Number of parallel jobs (-1 for all CPUs)
parallel_backend (str) – Parallelization backend (“process” or “thread”)

Returns:

Dictionary containing model information and evaluation metrics

Return type:

Dict

compute_feature_importance(graph_collection, features, feature_importance_algorithm, embedding_algorithm='approx_wasserstein', random_state=42, n_iterations=5)

Compute feature importance for graph embeddings.

Parameters:

graph_collection (GraphCollection) – Collection of graphs to analyze
features (Features) – Features object containing node features
feature_importance_algorithm (str) – Algorithm to use for importance analysis (“supervised_greedy”, “supervised_fast”, “unsupervised”)
embedding_algorithm (str) – Algorithm to use for embedding computation
random_state (int) – Random seed for reproducibility
n_iterations (int) – Number of iterations for computing average performance

Returns:

DataFrame containing feature importance results

Return type:

pd.DataFrame
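The methods of this class are meant to be chained. The sketch below shows one possible end-to-end sequence using only the signatures documented above. The function run_pipeline is hypothetical. It assumes that the object returned by compute_node_features is accepted as the features argument of compute_graph_embeddings, and that “approx_wasserstein” (documented here only as the default of compute_feature_importance) is also a valid embedding_algorithm. The embedding dimension of 2 is an arbitrary choice.

```python
def run_pipeline(nxt, edges_path, mapping_path, labels_path, embedding_dimension=2):
    """Chain the documented methods on `nxt`, a NEExT framework instance.

    Hypothetical helper; parameter values are illustrative assumptions.
    """
    # 1. Load the graphs from CSV files.
    collection = nxt.read_from_csv(
        edges_path=edges_path,
        node_graph_mapping_path=mapping_path,
        graph_label_path=labels_path,
    )
    # 2. Compute node features (feature names taken from the docs example).
    features = nxt.compute_node_features(
        graph_collection=collection,
        feature_list=["page_rank", "degree_centrality"],
        feature_vector_length=3,
    )
    # 3. Turn node features into one embedding per graph.
    embeddings = nxt.compute_graph_embeddings(
        graph_collection=collection,
        features=features,
        embedding_algorithm="approx_wasserstein",  # assumed to be valid here
        embedding_dimension=embedding_dimension,
        random_state=42,
    )
    # 4. Train and evaluate a model on the graph labels.
    return nxt.train_ml_model(
        graph_collection=collection,
        embeddings=embeddings,
        model_type="classifier",
        sample_size=5,
    )
```

With the package installed, a call could look like run_pipeline(NEExT(), "edges.csv", "node_graph_mapping.csv", "graph_labels.csv") after from NEExT import NEExT; the file names are placeholders.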

NEExT.graph module

NEExT.graph_collection module

NEExT.graph_embeddings module

NEExT.helper_functions module

NEExT.helper_functions.divide_chunks(list, chunks)

NEExT.helper_functions.get_numb_of_nb_x_hops_away(G, node, max_hop_length)

Compute the number of neighbors x hops away from a given node. Supports both NetworkX and iGraph backends.

Parameters:

G (Union[networkx.Graph, igraph.Graph]) – Graph object (NetworkX or iGraph)
node (int) – Source node ID
max_hop_length (int) – Maximum hop distance to consider

Returns:

List where index i contains number of nodes i+1 hops away

Return type:

List[int]

NEExT.helper_functions.get_nodes_x_hops_away(G, node, max_hop_length)

Efficiently get nodes at each hop distance from a given node, up to max_hop_length.

Parameters:

G (Union[networkx.Graph, igraph.Graph]) – Graph object (NetworkX or iGraph)
node (int) – Source node ID
max_hop_length (int) – Maximum hop distance to consider

Return type:

Dict[int, Set[int]]
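The two hop-distance helpers above can be illustrated with a breadth-first search over a plain adjacency dictionary. This is a sketch of the idea, not the library's implementation: the functions nodes_by_hop and hop_counts are hypothetical, the graph is treated as undirected, and keying the result by hop distance starting at 1 is an assumption.

```python
from collections import deque


def nodes_by_hop(adjacency, source, max_hop_length):
    """Return {hop: set of nodes exactly `hop` edges from `source`} via BFS."""
    distance = {source: 0}
    layers = {hop: set() for hop in range(1, max_hop_length + 1)}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hop_length:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                layers[distance[neighbor]].add(neighbor)
                queue.append(neighbor)
    return layers


def hop_counts(adjacency, source, max_hop_length):
    """Return a list whose index i holds the number of nodes i+1 hops away."""
    layers = nodes_by_hop(adjacency, source, max_hop_length)
    return [len(layers[hop]) for hop in range(1, max_hop_length + 1)]


# Path 0 - 1 - 2 - 3 with an extra branch 1 - 4.
adjacency = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
layers = nodes_by_hop(adjacency, 0, 3)   # {1: {1}, 2: {2, 4}, 3: {3}}
counts = hop_counts(adjacency, 0, 3)     # [1, 2, 1]
```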

NEExT.helper_functions.get_all_neighborhoods_nx(G, max_hops, nodes_to_process=None)

Get neighborhoods for all specified nodes in a NetworkX graph.

Parameters:

G – NetworkX graph
max_hops (int) – Maximum number of hops to consider
nodes_to_process (Optional[List[int]]) – List of nodes to get neighborhoods for. If None, process all nodes.

Returns:

Dictionary mapping each node to its neighborhood at each hop

Return type:

Dict

NEExT.helper_functions.get_all_neighborhoods_ig(G, max_hops, nodes_to_process=None)

Get neighborhoods for all specified nodes in an iGraph graph.

Parameters:

G – iGraph graph
max_hops (int) – Maximum number of hops to consider
nodes_to_process (Optional[List[int]]) – List of nodes to get neighborhoods for. If None, process all nodes.

Returns:

Dictionary mapping each node to its neighborhood at each hop

Return type:

Dict

NEExT.helper_functions.get_specific_in_community_degree(G, node_id, community_partition, community_id)

Compute the in-community degree of a node with respect to a specific community.

Parameters:

community_partition (List[List[int]])
community_id (int)

Returns:

The in-community degree of the node for the specified community

Return type:

int

NEExT.helper_functions.get_all_in_community_degrees(G, node_id, community_partition)

Compute the in-community degree of a node with respect to each community.

Parameters: community_partition (List[List[int]])
Returns: List of integers, one per community, each giving the in-community degree of the node for that community
Return type: List[int]

NEExT.helper_functions.get_own_in_community_degree(G, node_id, community_partition)

Compute the in-community degree of a node with respect to the community it belongs to.

Parameters: community_partition (List[List[int]])
Returns: The in-community degree of the node for its own community
Return type: int
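The three in-community degree helpers can be sketched on a plain adjacency dictionary. The sketch assumes that the in-community degree of a node is the number of its neighbors that belong to the given community, and that community_partition is a list of communities, each a list of node IDs. The function names below are hypothetical and do not call NEExT.

```python
def in_community_degree(adjacency, node_id, community):
    """Count the neighbors of `node_id` that are members of `community`."""
    members = set(community)
    return sum(1 for neighbor in adjacency[node_id] if neighbor in members)


def all_in_community_degrees(adjacency, node_id, community_partition):
    """One in-community degree per community, in partition order."""
    return [
        in_community_degree(adjacency, node_id, community)
        for community in community_partition
    ]


def own_in_community_degree(adjacency, node_id, community_partition):
    """In-community degree with respect to the community containing `node_id`."""
    for community in community_partition:
        if node_id in community:
            return in_community_degree(adjacency, node_id, community)
    raise ValueError(f"Node {node_id} is not in any community")


# Undirected graph with edges 0-1, 0-2, 0-3, 1-2, 3-4.
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
partition = [[0, 1, 2], [3, 4]]

degrees = all_in_community_degrees(adjacency, 0, partition)  # [2, 1]
own = own_in_community_degree(adjacency, 0, partition)       # 2
```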

NEExT.helper_functions.get_specific_community_volume(G, community_partition, community_id)

Compute the volume of a specific community in the graph. The volume is the sum of the degrees of all nodes in the community.

Parameters:

community_partition (List[List[int]])
community_id (int)

Returns:

The volume of the specified community

Return type:

int

NEExT.helper_functions.get_all_community_volumes(G, community_partition)

Compute the volume of each community in the graph. The volume is the sum of the degrees of all nodes in the community.

Parameters: community_partition (List[List[int]])
Returns: List of integers, one per community, each giving the volume of that community
Return type: List[int]

NEExT.helper_functions.get_own_community_volume(G, node_id, community_partition)

Compute the volume of the community that the node belongs to. The volume is the sum of the degrees of all nodes in the community.

Parameters:

node_id (int)
community_partition (List[List[int]])

Returns:

The volume of the node's community

Return type:

int
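The volume definition used by the three functions above (the sum of the degrees of all nodes in a community) can be checked on a small example. The sketch uses a plain adjacency dictionary for an undirected graph; the function names are hypothetical and do not call NEExT.

```python
def community_volume(adjacency, community):
    """Sum of the degrees of all nodes in `community`."""
    return sum(len(adjacency[node]) for node in community)


def all_community_volumes(adjacency, community_partition):
    """One volume per community, in partition order."""
    return [community_volume(adjacency, c) for c in community_partition]


# Undirected graph with edges 0-1, 0-2, 0-3, 1-2, 3-4 (5 edges in total).
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
partition = [[0, 1, 2], [3, 4]]

volumes = all_community_volumes(adjacency, partition)  # [7, 3]
```

Because every edge contributes to the degree of both endpoints, the volumes of a partition that covers the whole graph add up to twice the number of edges (here 7 + 3 = 10).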

NEExT.io module

class NEExT.io.GraphIO(logger=None)

Bases: object

Input/Output class for reading and writing graph data.

This class provides methods to read graph data from various file formats and create a GraphCollection instance.

__init__(logger=None)

Initialize GraphIO with an optional logger.

read_from_csv(edges_path, node_graph_mapping_path, graph_label_path=None, node_features_path=None, edge_features_path=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)

Read graph data from CSV files and create a GraphCollection.

Parameters:

edges_path (Union[str, Path]) – Path to edges CSV file (src_node_id, dest_node_id)
node_graph_mapping_path (Union[str, Path]) – Path to node-graph mapping CSV file (node_id, graph_id)
graph_label_path (Union[str, Path, None]) – Optional path to graph labels CSV file (graph_id, graph_label)
node_features_path (Union[str, Path, None]) – Optional path to node features CSV file
edge_features_path (Union[str, Path, None]) – Optional path to edge features CSV file
graph_type (str) – Backend to use (“networkx” or “igraph”). Defaults to “networkx”
reindex_nodes (bool) – Whether to reindex nodes to start from 0 (default: True)
filter_largest_component (bool) – Whether to keep only the largest connected component of each graph (default: True)
node_sample_rate (float) – Rate at which to sample nodes from each graph (default: 1.0). Must be between 0 and 1.

Returns:

Collection of graphs created from the CSV data

Return type:

GraphCollection
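To make the filter_largest_component and reindex_nodes options concrete, the sketch below applies both steps to a small edge list. It is an illustration under assumptions, not the library's implementation: the graph is treated as undirected, new IDs are assigned in sorted order of the old IDs, and the functions largest_component and filter_and_reindex are hypothetical.

```python
def largest_component(nodes, edges):
    """Return the node set of the largest connected component."""
    adjacency = {node: set() for node in nodes}
    for src, dest in edges:
        adjacency[src].add(dest)
        adjacency[dest].add(src)

    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        component, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for neighbor in adjacency[current]:
                if neighbor not in component:
                    component.add(neighbor)
                    stack.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def filter_and_reindex(nodes, edges):
    """Keep the largest component and renumber its nodes from 0."""
    keep = largest_component(nodes, edges)
    new_id = {old: new for new, old in enumerate(sorted(keep))}
    return [(new_id[s], new_id[d]) for s, d in edges if s in keep and d in keep]


nodes = [10, 11, 12, 20, 21]
edges = [(10, 11), (11, 12), (20, 21)]
reindexed = filter_and_reindex(nodes, edges)  # [(0, 1), (1, 2)]
```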

load_from_dfs(edges_df, node_graph_df, graph_labels_df=None, node_features_df=None, edge_features_df=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)

Create a GraphCollection from DataFrames instead of CSV files.

Parameters:

edges_df (DataFrame)
node_graph_df (DataFrame)
graph_labels_df (DataFrame | None)
node_features_df (DataFrame | None)
edge_features_df (DataFrame | None)
graph_type (str)
reindex_nodes (bool)
filter_largest_component (bool)
node_sample_rate (float)

Return type:

GraphCollection

_organize_graph_data(edges_df, node_graph_df, node_features_df, edge_features_df, graph_labels_df)

Organizes the data from DataFrames into a list of graph dictionaries.

Parameters:

edges_df (pd.DataFrame) – DataFrame containing edge information
node_graph_df (pd.DataFrame) – DataFrame containing node-to-graph mapping
node_features_df (Optional[pd.DataFrame]) – DataFrame containing node features
edge_features_df (Optional[pd.DataFrame]) – DataFrame containing edge features
graph_labels_df (Optional[pd.DataFrame]) – DataFrame containing graph labels

Returns:

List of dictionaries containing organized graph data

Return type:

List[Dict]

NEExT.ml_models module

NEExT.node_features module

Module contents

class NEExT.NEExT(log_level='INFO')

Bases: object

Main interface class for the NEExT framework.

This class maintains the state of various components and provides a unified interface for users to interact with the framework.

logger: Logger instance for the framework

Initialize the NEExT framework.

Parameters:: log_level (str) – Initial logging level (default: “INFO”)

__init__(log_level='INFO')[source]

Initialize the NEExT framework.

Parameters:: log_level (str) – Initial logging level (default: “INFO”)

compute_feature_importance(graph_collection, features, feature_importance_algorithm, embedding_algorithm='approx_wasserstein', random_state=42, n_iterations=5)[source]

Compute feature importance for graph embeddings.

Parameters:

graph_collection (GraphCollection) – Collection of graphs to analyze
features (Features) – Features object containing node features
feature_importance_algorithm (str) – Algorithm to use for importance analysis (“supervised_greedy”, “supervised_fast”, “unsupervised”)
embedding_algorithm (str) – Algorithm to use for embedding computation
random_state (int) – Random seed for reproducibility
n_iterations (int) – Number of iterations for computing average performance

Returns:

DataFrame containing feature importance results

Return type:

pd.DataFrame

compute_graph_embeddings(graph_collection, features, embedding_algorithm, embedding_dimension, feature_columns=None, random_state=42, memory_size='4G')[source]

Compute graph embeddings based on node features.

Parameters:

graph_collection (GraphCollection) – Collection of graphs to compute embeddings for
features (Features) – Features object containing node features
embedding_algorithm (str) – Algorithm to use for embedding computation
embedding_dimension (int) – Dimension of the output embeddings
feature_columns (Optional[List[str]]) – Specific feature columns to use (default: all)
random_state (int) – Random seed for reproducibility
memory_size (str) – Memory limit for algorithms that support it

Returns:

Embeddings object containing computed embeddings

Return type:

Embeddings

compute_node_features(graph_collection, feature_list, feature_vector_length=3, normalize_features=True, show_progress=True, n_jobs=-1, my_feature_methods=None)[source]

Compute node features for all graphs in the collection.

Parameters:

graph_collection (GraphCollection) – Collection of graphs to compute features for
feature_list (List[str]) – List of features to compute (e.g., [“page_rank”, “degree_centrality”])
feature_vector_length (int) – Length of feature vector for each node (default: 3)
normalize_features (bool) – Whether to normalize features across all nodes (default: True)
show_progress (bool) – Whether to show progress bars during computation (default: True)
n_jobs (int)
my_feature_methods (list)

Returns:

DataFrame containing computed features for all nodes

Return type:

pd.DataFrame

get_collection_info(graph_collection)[source]

Get basic information about a graph collection.

This method is deprecated. Use graph_collection.describe() instead.

Parameters:: graph_collection (GraphCollection) – The graph collection to get information about
Returns:: Dictionary containing collection information
Return type:: dict

read_from_csv(edges_path, node_graph_mapping_path, graph_label_path=None, node_features_path=None, edge_features_path=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)[source]

Read graph data from CSV files and return a graph collection.

Parameters:

edges_path (Union[str, Path]) – Path to edges CSV file (src_node_id, dest_node_id)
node_graph_mapping_path (Union[str, Path]) – Path to node-graph mapping CSV file (node_id, graph_id)
graph_label_path (Union[str, Path, None]) – Optional path to graph labels CSV file (graph_id, graph_label)
node_features_path (Union[str, Path, None]) – Optional path to node features CSV file
edge_features_path (Union[str, Path, None]) – Optional path to edge features CSV file
graph_type (str) – Backend to use (“networkx” or “igraph”). Defaults to “networkx”
reindex_nodes (bool) – Whether to reindex nodes to start from 0 (default: True)
filter_largest_component (bool) – Whether to keep only the largest connected component of each graph (default: True)
node_sample_rate (float) – Rate at which to sample nodes from each graph (default: 1.0). Must be between 0 and 1.

Returns:

Collection of graphs loaded from CSV files

Return type:

GraphCollection

set_log_level(level)[source]

Set the logging level for the framework.

Parameters:: level (str) – Logging level (“DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”)
Return type:: None

train_ml_model(graph_collection, embeddings, model_type, balance_dataset=False, sample_size=5, n_jobs=-1, parallel_backend='process')[source]

Train and evaluate a machine learning model using graph embeddings.

Parameters:

graph_collection (GraphCollection) – Collection of graphs with labels
embeddings (Embeddings) – Embeddings object containing graph embeddings
model_type (Literal['classifier', 'regressor']) – Type of model to train (“classifier” or “regressor”)
balance_dataset (bool) – Whether to balance the dataset for classification (default: False)
sample_size (int) – Number of training/testing iterations (default: 5)
n_jobs (int) – Number of parallel jobs (-1 for all CPUs)
parallel_backend (str) – Parallelization backend (“process” or “thread”)

Returns:

Dictionary containing model information and evaluation metrics

Return type:

Dict