NEExT package
Submodules
NEExT.embeddings module
NEExT.feature_importance module
NEExT.features module
NEExT.framework module
- class NEExT.framework.NEExT(log_level='INFO')[source]
Bases: object
Main interface class for the NEExT framework.
This class maintains the state of various components and provides a unified interface for users to interact with the framework.
- logger
Logger instance for the framework
- __init__(log_level='INFO')[source]
Initialize the NEExT framework.
- Parameters:
log_level (str) – Initial logging level (default: “INFO”)
- set_log_level(level)[source]
Set the logging level for the framework.
- Parameters:
level (str) – Logging level (“DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”)
- Return type:
None
- read_from_csv(edges_path, node_graph_mapping_path, graph_label_path=None, node_features_path=None, edge_features_path=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)[source]
Read graph data from CSV files and return a graph collection.
- Parameters:
edges_path (Union[str,Path]) – Path to edges CSV file (src_node_id, dest_node_id)
node_graph_mapping_path (Union[str,Path]) – Path to node-graph mapping CSV file (node_id, graph_id)
graph_label_path (Union[str,Path,None]) – Optional path to graph labels CSV file (graph_id, graph_label)
node_features_path (Union[str,Path,None]) – Optional path to node features CSV file
edge_features_path (Union[str,Path,None]) – Optional path to edge features CSV file
graph_type (str) – Backend to use (“networkx” or “igraph”). Defaults to “networkx”
reindex_nodes (bool) – Whether to reindex nodes to start from 0 (default: True)
filter_largest_component (bool) – Whether to keep only the largest connected component of each graph (default: True)
node_sample_rate (float) – Rate at which to sample nodes from each graph (default: 1.0). Must be between 0 and 1.
- Returns:
Collection of graphs loaded from CSV files
- Return type:
GraphCollection
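The column names given for the three core CSV files can be illustrated with a small, standard-library-only sketch. The file names, the sample rows, and the choice to keep node IDs unique across graphs are assumptions made for illustration; only the column headers come from the parameter descriptions above.

```python
import csv
import tempfile
from pathlib import Path


def write_example_csvs(directory):
    """Write three small CSV files using the column names listed above."""
    directory = Path(directory)
    # File names and rows are hypothetical; only the headers follow the docs.
    tables = {
        "edges.csv": (["src_node_id", "dest_node_id"], [(0, 1), (1, 2), (3, 4)]),
        "node_graph_mapping.csv": (
            ["node_id", "graph_id"],
            [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)],
        ),
        "graph_labels.csv": (["graph_id", "graph_label"], [(0, 0), (1, 1)]),
    }
    paths = {}
    for filename, (header, rows) in tables.items():
        path = directory / filename
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        paths[filename] = path
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = write_example_csvs(tmp)
    # Read the edge list back to confirm the layout.
    with paths["edges.csv"].open(newline="") as handle:
        edge_rows = list(csv.DictReader(handle))
```

Files laid out this way would presumably be passed as edges_path, node_graph_mapping_path and graph_label_path.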
- get_collection_info(graph_collection)[source]
Get basic information about a graph collection.
This method is deprecated. Use graph_collection.describe() instead.
- Parameters:
graph_collection (GraphCollection) – The graph collection to get information about
- Returns:
Dictionary containing collection information
- Return type:
dict
- compute_node_features(graph_collection, feature_list, feature_vector_length=3, normalize_features=True, show_progress=True, n_jobs=-1, my_feature_methods=None)[source]
Compute node features for all graphs in the collection.
- Parameters:
graph_collection (GraphCollection) – Collection of graphs to compute features for
feature_list (List[str]) – List of features to compute (e.g., [“page_rank”, “degree_centrality”])
feature_vector_length (int) – Length of feature vector for each node (default: 3)
normalize_features (bool) – Whether to normalize features across all nodes (default: True)
show_progress (bool) – Whether to show progress bars during computation (default: True)
n_jobs (int)
my_feature_methods (list)
- Returns:
DataFrame containing computed features for all nodes
- Return type:
pd.DataFrame
- compute_graph_embeddings(graph_collection, features, embedding_algorithm, embedding_dimension, feature_columns=None, random_state=42, memory_size='4G')[source]
Compute graph embeddings based on node features.
- Parameters:
graph_collection (GraphCollection) – Collection of graphs to compute embeddings for
features (Features) – Features object containing node features
embedding_algorithm (str) – Algorithm to use for embedding computation
embedding_dimension (int) – Dimension of the output embeddings
feature_columns (Optional[List[str]]) – Specific feature columns to use (default: all)
random_state (int) – Random seed for reproducibility
memory_size (str) – Memory limit for algorithms that support it
- Returns:
Embeddings object containing computed embeddings
- Return type:
Embeddings
- train_ml_model(graph_collection, embeddings, model_type, balance_dataset=False, sample_size=5, n_jobs=-1, parallel_backend='process')[source]
Train and evaluate a machine learning model using graph embeddings.
- Parameters:
graph_collection (GraphCollection) – Collection of graphs with labels
embeddings (Embeddings) – Embeddings object containing graph embeddings
model_type (Literal['classifier','regressor']) – Type of model to train (“classifier” or “regressor”)
balance_dataset (bool) – Whether to balance the dataset for classification (default: False)
sample_size (int) – Number of training/testing iterations (default: 5)
n_jobs (int) – Number of parallel jobs (-1 for all CPUs)
parallel_backend (str) – Parallelization backend (“process” or “thread”)
- Returns:
Dictionary containing model information and evaluation metrics
- Return type:
Dict
- compute_feature_importance(graph_collection, features, feature_importance_algorithm, embedding_algorithm='approx_wasserstein', random_state=42, n_iterations=5)[source]
Compute feature importance for graph embeddings.
- Parameters:
graph_collection (GraphCollection) – Collection of graphs to analyze
features (Features) – Features object containing node features
feature_importance_algorithm (str) – Algorithm to use for importance analysis (“supervised_greedy”, “supervised_fast”, “unsupervised”)
embedding_algorithm (str) – Algorithm to use for embedding computation
random_state (int) – Random seed for reproducibility
n_iterations (int) – Number of iterations for computing average performance
- Returns:
DataFrame containing feature importance results
- Return type:
pd.DataFrame
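The methods above appear to chain into a load, featurize, embed, train sequence. The following is a minimal sketch of that flow, not a verified recipe. It uses only the signatures documented here and rests on several assumptions: that the result of compute_node_features can be passed directly as the features argument, that “approx_wasserstein” (the documented default in compute_feature_importance) is also accepted by compute_graph_embeddings, and that an embedding_dimension of 3 is sensible. The wrapper name run_pipeline is hypothetical.

```python
def run_pipeline(edges_path, node_graph_mapping_path, graph_label_path):
    """Sketch of an end-to-end run using only the methods documented above."""
    # Imported lazily so the sketch can be loaded without the package installed.
    from NEExT import NEExT

    nxt = NEExT(log_level="INFO")

    # 1. Load the graphs from CSV files.
    collection = nxt.read_from_csv(
        edges_path=edges_path,
        node_graph_mapping_path=node_graph_mapping_path,
        graph_label_path=graph_label_path,
        graph_type="networkx",
    )

    # 2. Compute node features (feature names taken from the docs' example).
    features = nxt.compute_node_features(
        graph_collection=collection,
        feature_list=["page_rank", "degree_centrality"],
        feature_vector_length=3,
    )

    # 3. Turn node features into one embedding per graph.
    #    Algorithm name and dimension are assumptions, see the lead-in.
    embeddings = nxt.compute_graph_embeddings(
        graph_collection=collection,
        features=features,
        embedding_algorithm="approx_wasserstein",
        embedding_dimension=3,
    )

    # 4. Train and evaluate a model on the graph labels.
    return nxt.train_ml_model(
        graph_collection=collection,
        embeddings=embeddings,
        model_type="classifier",
    )
```

Calling run_pipeline requires the NEExT package and real CSV paths; the return value would be the dictionary of model information and metrics described for train_ml_model.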
NEExT.graph module
NEExT.graph_collection module
NEExT.graph_embeddings module
NEExT.helper_functions module
- NEExT.helper_functions.get_numb_of_nb_x_hops_away(G, node, max_hop_length)[source]
Compute the number of neighbors x hops away from a given node. Supports both NetworkX and iGraph backends.
- Parameters:
G (Union[Graph,Graph]) – Graph object (NetworkX or iGraph)
node (int) – Source node ID
max_hop_length (int) – Maximum hop distance to consider
- Returns:
List where index i contains number of nodes i+1 hops away
- Return type:
List[int]
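The return convention (index i holds the number of nodes i+1 hops away) can be illustrated with a plain breadth-first search. This sketch works on an adjacency dictionary rather than a NetworkX or iGraph object, and the function name count_neighbors_per_hop is hypothetical; it is not the library's implementation.

```python
from collections import deque


def count_neighbors_per_hop(adjacency, node, max_hop_length):
    """Return a list whose index i holds the number of nodes exactly i+1 hops from `node`."""
    distances = {node: 0}
    queue = deque([node])
    counts = [0] * max_hop_length
    while queue:
        current = queue.popleft()
        if distances[current] == max_hop_length:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                counts[distances[neighbor] - 1] += 1
                queue.append(neighbor)
    return counts


# Path 0 - 1 - 2 - 3 with an extra branch 1 - 4.
adjacency = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
hop_counts = count_neighbors_per_hop(adjacency, 0, 3)  # [1, 2, 1]
```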
- NEExT.helper_functions.get_nodes_x_hops_away(G, node, max_hop_length)[source]
Efficiently get nodes at each hop distance from a given node, up to max_hop_length.
- Return type:
Dict[int,Set[int]]
- Parameters:
G (Graph | Graph)
node (int)
max_hop_length (int)
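A frontier-expansion sketch shows one way to produce a Dict[int,Set[int]] of nodes per hop distance. The key convention (hops numbered from 1, empty sets kept for unreachable distances), the adjacency-dictionary input and the name nodes_by_hop are assumptions for illustration, not the documented behavior.

```python
def nodes_by_hop(adjacency, node, max_hop_length):
    """Map each hop distance 1..max_hop_length to the set of nodes at exactly that distance."""
    visited = {node}
    frontier = {node}
    result = {}
    for hop in range(1, max_hop_length + 1):
        next_frontier = set()
        for current in frontier:
            # Only nodes not seen at a shorter distance belong to this hop.
            next_frontier.update(n for n in adjacency[current] if n not in visited)
        visited |= next_frontier
        result[hop] = next_frontier
        frontier = next_frontier
    return result


adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
hops = nodes_by_hop(adjacency, 0, 3)  # {1: {1, 2}, 2: {3}, 3: {4}}
```

Expanding whole frontiers at once visits each edge at most twice, which is one plausible reading of "efficiently" in the description.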
- NEExT.helper_functions.get_all_neighborhoods_nx(G, max_hops, nodes_to_process=None)[source]
Get neighborhoods for all specified nodes in a NetworkX graph.
- Parameters:
G – NetworkX graph
max_hops (int) – Maximum number of hops to consider
nodes_to_process (Optional[List[int]]) – List of nodes to get neighborhoods for. If None, process all nodes.
- Returns:
Dictionary mapping each node to its neighborhood at each hop
- Return type:
Dict
- NEExT.helper_functions.get_all_neighborhoods_ig(G, max_hops, nodes_to_process=None)[source]
Get neighborhoods for all specified nodes in an iGraph graph.
- Parameters:
G – iGraph graph
max_hops (int) – Maximum number of hops to consider
nodes_to_process (Optional[List[int]]) – List of nodes to get neighborhoods for. If None, process all nodes.
- Returns:
Dictionary mapping each node to its neighborhood at each hop
- Return type:
Dict
- NEExT.helper_functions.get_specific_in_community_degree(G, node_id, community_partition, community_id)[source]
Compute the in-community degree of a node for a specific community.
Returns an integer: the in-community degree of the node for the specified community.
- Return type:
int
- Parameters:
community_partition (List[List[int]])
community_id (int)
- NEExT.helper_functions.get_all_in_community_degrees(G, node_id, community_partition)[source]
Compute the in-community degree of a node for each community.
Returns a list of integers, one per community, each giving the in-community degree of the node for that community.
- Return type:
List[int]
- Parameters:
community_partition (List[List[int]])
- NEExT.helper_functions.get_own_in_community_degree(G, node_id, community_partition)[source]
Compute the in-community degree of a node for the community it belongs to.
Returns an integer: the in-community degree of the node for its own community.
- Return type:
int
- Parameters:
community_partition (List[List[int]])
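The three in-community degree helpers can be pictured with a short sketch. It assumes that the in-community degree counts a node's neighbors that lie inside the given community, and that community_id is an index into community_partition; neither is stated explicitly in the docs. The graph is a plain adjacency dictionary and the function names are hypothetical.

```python
def in_community_degree(adjacency, node_id, community_partition, community_id):
    """Count neighbors of `node_id` that belong to the community at index `community_id`."""
    members = set(community_partition[community_id])
    return sum(1 for neighbor in adjacency[node_id] if neighbor in members)


def all_in_community_degrees(adjacency, node_id, community_partition):
    """One in-community degree per community, in partition order."""
    return [
        in_community_degree(adjacency, node_id, community_partition, cid)
        for cid in range(len(community_partition))
    ]


def own_in_community_degree(adjacency, node_id, community_partition):
    """In-community degree for the community that contains `node_id`."""
    for cid, members in enumerate(community_partition):
        if node_id in members:
            return in_community_degree(adjacency, node_id, community_partition, cid)
    raise ValueError(f"node {node_id} is not in any community")


# Triangle 0-1-2 joined to the pair 3-4 through the edge 0-3.
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
partition = [[0, 1, 2], [3, 4]]
degrees = all_in_community_degrees(adjacency, 0, partition)  # [2, 1]
own = own_in_community_degree(adjacency, 0, partition)       # 2
```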
- NEExT.helper_functions.get_specific_community_volume(G, community_partition, community_id)[source]
Compute the volume of a specific community in the graph. The volume is the sum of the degrees of all nodes in the community.
Returns an integer: the volume of the community.
- Return type:
int
- Parameters:
community_partition (List[List[int]])
community_id (int)
- NEExT.helper_functions.get_all_community_volumes(G, community_partition)[source]
Compute the volume of each community in the graph. The volume is the sum of the degrees of all nodes in the community.
Returns a list of integers, one per community, each giving the volume of that community.
- Return type:
List[int]
- Parameters:
community_partition (List[List[int]])
- NEExT.helper_functions.get_own_community_volume(G, node_id, community_partition)[source]
Compute the volume of the community the node belongs to. The volume is the sum of the degrees of all nodes in the community.
Returns an integer: the volume of the community.
- Return type:
int
- Parameters:
node_id (int)
community_partition (List[List[int]])
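The volume definition used by these helpers (the sum of the degrees of the nodes in a community) is simple enough to sketch directly. The adjacency-dictionary input, the function names, and the reading of community_id as an index into community_partition are assumptions for illustration.

```python
def community_volume(adjacency, community_partition, community_id):
    """Sum of the degrees of all nodes in the community at index `community_id`."""
    return sum(len(adjacency[node]) for node in community_partition[community_id])


def all_community_volumes(adjacency, community_partition):
    """Volume of every community, in partition order."""
    return [
        community_volume(adjacency, community_partition, cid)
        for cid in range(len(community_partition))
    ]


def own_community_volume(adjacency, node_id, community_partition):
    """Volume of the community that contains `node_id`."""
    for cid, members in enumerate(community_partition):
        if node_id in members:
            return community_volume(adjacency, community_partition, cid)
    raise ValueError(f"node {node_id} is not in any community")


# Triangle 0-1-2 joined to the pair 3-4 through the edge 0-3.
adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
partition = [[0, 1, 2], [3, 4]]
volumes = all_community_volumes(adjacency, partition)  # [7, 3]
```

Note that degrees count every incident edge, so the edge 0-3 contributes to the volume of both communities.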
NEExT.io module
- class NEExT.io.GraphIO(logger=None)[source]
Bases: object
Input/Output class for reading and writing graph data.
This class provides methods to read graph data from various file formats and create a GraphCollection instance.
Initialize GraphIO with optional logger.
- read_from_csv(edges_path, node_graph_mapping_path, graph_label_path=None, node_features_path=None, edge_features_path=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)[source]
Read graph data from CSV files and create a GraphCollection.
- Parameters:
edges_path (Union[str,Path]) – Path to edges CSV file (src_node_id, dest_node_id)
node_graph_mapping_path (Union[str,Path]) – Path to node-graph mapping CSV file (node_id, graph_id)
graph_label_path (Union[str,Path,None]) – Optional path to graph labels CSV file (graph_id, graph_label)
node_features_path (Union[str,Path,None]) – Optional path to node features CSV file
edge_features_path (Union[str,Path,None]) – Optional path to edge features CSV file
graph_type (str) – Backend to use (“networkx” or “igraph”). Defaults to “networkx”
reindex_nodes (bool) – Whether to reindex nodes to start from 0 (default: True)
filter_largest_component (bool) – Whether to keep only the largest connected component of each graph (default: True)
node_sample_rate (float) – Rate at which to sample nodes from each graph (default: 1.0). Must be between 0 and 1.
- Returns:
Collection of graphs created from the CSV data
- Return type:
GraphCollection
- load_from_dfs(edges_df, node_graph_df, graph_labels_df=None, node_features_df=None, edge_features_df=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)[source]
- Return type:
GraphCollection
- Parameters:
edges_df (DataFrame)
node_graph_df (DataFrame)
graph_labels_df (DataFrame | None)
node_features_df (DataFrame | None)
edge_features_df (DataFrame | None)
graph_type (str)
reindex_nodes (bool)
filter_largest_component (bool)
node_sample_rate (float)
- _organize_graph_data(edges_df, node_graph_df, node_features_df, edge_features_df, graph_labels_df)[source]
Organizes the data from DataFrames into a list of graph dictionaries.
- Parameters:
edges_df (pd.DataFrame) – DataFrame containing edge information
node_graph_df (pd.DataFrame) – DataFrame containing node-to-graph mapping
node_features_df (Optional[pd.DataFrame]) – DataFrame containing node features
edge_features_df (Optional[pd.DataFrame]) – DataFrame containing edge features
graph_labels_df (Optional[pd.DataFrame]) – DataFrame containing graph labels
- Returns:
List of dictionaries containing organized graph data
- Return type:
List[Dict]
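The idea of turning flat edge and mapping tables into one dictionary per graph can be sketched without pandas. This is an illustration only: it takes lists of tuples instead of DataFrames, the dictionary keys ("graph_id", "graph_label", "nodes", "edges") are hypothetical, and dropping edges whose endpoints map to different graphs is an assumption rather than documented behavior.

```python
from collections import defaultdict


def organize_graph_data(edges, node_graph, graph_labels=None):
    """Group nodes and edges by graph ID and return one dictionary per graph."""
    node_to_graph = dict(node_graph)
    graphs = defaultdict(lambda: {"nodes": [], "edges": []})

    # Assign every node to its graph.
    for node_id, graph_id in node_graph:
        graphs[graph_id]["nodes"].append(node_id)

    # Assign each edge to the graph that contains both of its endpoints.
    for src, dst in edges:
        graph_id = node_to_graph[src]
        if node_to_graph[dst] == graph_id:
            graphs[graph_id]["edges"].append((src, dst))

    labels = dict(graph_labels or [])
    return [
        {"graph_id": gid, "graph_label": labels.get(gid), **data}
        for gid, data in sorted(graphs.items())
    ]


organized = organize_graph_data(
    edges=[(0, 1), (1, 2), (3, 4)],
    node_graph=[(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)],
    graph_labels=[(0, "a"), (1, "b")],
)
```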
NEExT.ml_models module
NEExT.node_features module
Module contents
- class NEExT.NEExT(log_level='INFO')[source]
Bases: object
Main interface class for the NEExT framework.
This class maintains the state of various components and provides a unified interface for users to interact with the framework.
- logger
Logger instance for the framework
- __init__(log_level='INFO')[source]
Initialize the NEExT framework.
- Parameters:
log_level (str) – Initial logging level (default: “INFO”)
- compute_feature_importance(graph_collection, features, feature_importance_algorithm, embedding_algorithm='approx_wasserstein', random_state=42, n_iterations=5)[source]
Compute feature importance for graph embeddings.
- Parameters:
graph_collection (GraphCollection) – Collection of graphs to analyze
features (Features) – Features object containing node features
feature_importance_algorithm (str) – Algorithm to use for importance analysis (“supervised_greedy”, “supervised_fast”, “unsupervised”)
embedding_algorithm (str) – Algorithm to use for embedding computation
random_state (int) – Random seed for reproducibility
n_iterations (int) – Number of iterations for computing average performance
- Returns:
DataFrame containing feature importance results
- Return type:
pd.DataFrame
- compute_graph_embeddings(graph_collection, features, embedding_algorithm, embedding_dimension, feature_columns=None, random_state=42, memory_size='4G')[source]
Compute graph embeddings based on node features.
- Parameters:
graph_collection (GraphCollection) – Collection of graphs to compute embeddings for
features (Features) – Features object containing node features
embedding_algorithm (str) – Algorithm to use for embedding computation
embedding_dimension (int) – Dimension of the output embeddings
feature_columns (Optional[List[str]]) – Specific feature columns to use (default: all)
random_state (int) – Random seed for reproducibility
memory_size (str) – Memory limit for algorithms that support it
- Returns:
Embeddings object containing computed embeddings
- Return type:
Embeddings
- compute_node_features(graph_collection, feature_list, feature_vector_length=3, normalize_features=True, show_progress=True, n_jobs=-1, my_feature_methods=None)[source]
Compute node features for all graphs in the collection.
- Parameters:
graph_collection (GraphCollection) – Collection of graphs to compute features for
feature_list (List[str]) – List of features to compute (e.g., [“page_rank”, “degree_centrality”])
feature_vector_length (int) – Length of feature vector for each node (default: 3)
normalize_features (bool) – Whether to normalize features across all nodes (default: True)
show_progress (bool) – Whether to show progress bars during computation (default: True)
n_jobs (int)
my_feature_methods (list)
- Returns:
DataFrame containing computed features for all nodes
- Return type:
pd.DataFrame
- get_collection_info(graph_collection)[source]
Get basic information about a graph collection.
This method is deprecated. Use graph_collection.describe() instead.
- Parameters:
graph_collection (GraphCollection) – The graph collection to get information about
- Returns:
Dictionary containing collection information
- Return type:
dict
- read_from_csv(edges_path, node_graph_mapping_path, graph_label_path=None, node_features_path=None, edge_features_path=None, graph_type='networkx', reindex_nodes=True, filter_largest_component=True, node_sample_rate=1.0)[source]
Read graph data from CSV files and return a graph collection.
- Parameters:
edges_path (Union[str,Path]) – Path to edges CSV file (src_node_id, dest_node_id)
node_graph_mapping_path (Union[str,Path]) – Path to node-graph mapping CSV file (node_id, graph_id)
graph_label_path (Union[str,Path,None]) – Optional path to graph labels CSV file (graph_id, graph_label)
node_features_path (Union[str,Path,None]) – Optional path to node features CSV file
edge_features_path (Union[str,Path,None]) – Optional path to edge features CSV file
graph_type (str) – Backend to use (“networkx” or “igraph”). Defaults to “networkx”
reindex_nodes (bool) – Whether to reindex nodes to start from 0 (default: True)
filter_largest_component (bool) – Whether to keep only the largest connected component of each graph (default: True)
node_sample_rate (float) – Rate at which to sample nodes from each graph (default: 1.0). Must be between 0 and 1.
- Returns:
Collection of graphs loaded from CSV files
- Return type:
GraphCollection
- set_log_level(level)[source]
Set the logging level for the framework.
- Parameters:
level (str) – Logging level (“DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”)
- Return type:
None
- train_ml_model(graph_collection, embeddings, model_type, balance_dataset=False, sample_size=5, n_jobs=-1, parallel_backend='process')[source]
Train and evaluate a machine learning model using graph embeddings.
- Parameters:
graph_collection (GraphCollection) – Collection of graphs with labels
embeddings (Embeddings) – Embeddings object containing graph embeddings
model_type (Literal['classifier','regressor']) – Type of model to train (“classifier” or “regressor”)
balance_dataset (bool) – Whether to balance the dataset for classification (default: False)
sample_size (int) – Number of training/testing iterations (default: 5)
n_jobs (int) – Number of parallel jobs (-1 for all CPUs)
parallel_backend (str) – Parallelization backend (“process” or “thread”)
- Returns:
Dictionary containing model information and evaluation metrics
- Return type:
Dict