Quick Start ========== This guide demonstrates how to use NEExT for graph analysis using a real-world dataset. Basic Example ------------ .. code-block:: python from NEExT import NEExT import numpy as np def main(): # Define data URLs - using the NCI1 dataset # NCI1 is a chemical compound dataset where each graph represents a molecule, # labeled as either active or inactive against non-small cell lung cancer edge_file = "https://raw.githubusercontent.com/AnomalyPoint/NEExT_datasets/refs/heads/main/real_world_networks/csv_format/NCI1/edges.csv" node_graph_mapping_file = "https://raw.githubusercontent.com/AnomalyPoint/NEExT_datasets/refs/heads/main/real_world_networks/csv_format/NCI1/node_graph_mapping.csv" graph_label_file = "https://raw.githubusercontent.com/AnomalyPoint/NEExT_datasets/refs/heads/main/real_world_networks/csv_format/NCI1/graph_labels.csv" # Initialize NEExT framework nxt = NEExT() nxt.set_log_level("INFO") # Set logging level for detailed progress information # Load graph data from CSV files # - reindex_nodes=True: Ensures consistent node indexing across graphs # - filter_largest_component=True: Keeps only the largest connected component of each graph # - graph_type="networkx": Uses NetworkX as the backend (alternatively can use "igraph") graph_collection = nxt.read_from_csv( edges_path=edge_file, node_graph_mapping_path=node_graph_mapping_file, graph_label_path=graph_label_file, reindex_nodes=True, filter_largest_component=True, graph_type="networkx" ) # Print collection info print("\nGraph Collection Info:") print(graph_collection.describe()) # Compute node features # - feature_list=["all"]: Computes all available node features including: # * Degree centrality: Measures node connectivity # * Betweenness centrality: Measures node importance in information flow # * Closeness centrality: Measures node's average distance to all others # * Page rank: Measures node importance based on neighbor importance # * Clustering coefficient: Measures local clustering around node # - feature_vector_length=3: Aggregates features from 3-hop neighborhoods features = nxt.compute_node_features( graph_collection=graph_collection, feature_list=["all"], feature_vector_length=3, show_progress=True ) # Normalize features using StandardScaler # This ensures all features are on the same scale features.normalize(type="StandardScaler") # Print feature information print("\nComputed Node Features:") print(f"Number of nodes: {len(features.features_df)}") print(f"Features computed: {list(features.features_df.columns)}") # Compute graph embeddings using the Wasserstein distance # This creates fixed-size vector representations for each graph # - embedding_algorithm="approx_wasserstein": Uses approximate Wasserstein distance # - embedding_dimension=3: Creates 3-dimensional embeddings embeddings = nxt.compute_graph_embeddings( graph_collection=graph_collection, features=features, embedding_algorithm="approx_wasserstein", embedding_dimension=3, random_state=42 ) # Print embedding information print("\nComputed Graph Embeddings:") print(f"Number of graphs: {len(embeddings.embeddings_df)}") print(f"Embedding dimensions: {len(embeddings.embedding_columns)}") print(f"Embedding algorithm: {embeddings.embedding_name}") # Train a classification model # - model_type="classifier": For classification tasks (use "regressor" for regression) # - sample_size=50: Number of train/test splits for robust evaluation # - balance_dataset=False: Use original class distribution model_results = nxt.train_ml_model( graph_collection=graph_collection, embeddings=embeddings, model_type="classifier", sample_size=50, balance_dataset=False ) # Print model results print("\nClassification Model Results:") print(f"Average Accuracy: {np.mean(model_results['accuracy']):.4f}") print(f"Average F1 Score: {np.mean(model_results['f1_score']):.4f}") print(f"Classes: {model_results['classes']}") # Compute feature importance using supervised fast algorithm # This determines which node features are most important for the classification task importance_df = nxt.compute_feature_importance( graph_collection=graph_collection, features=features, feature_importance_algorithm="supervised_fast", embedding_algorithm="approx_wasserstein", n_iterations=5 ) # Print feature importance results print("\nFeature Importance Results:") print(importance_df) if __name__ == '__main__': main() Understanding the Output ---------------------- The code above will produce several outputs: 1. Graph Collection Info: - Number of graphs in the dataset - Graph backend being used - Whether graphs have labels 2. Node Features: - Number of nodes across all graphs - List of computed features for each node - Features are normalized using StandardScaler 3. Graph Embeddings: - Number of embedded graphs - Dimensionality of embeddings - Algorithm used for embedding 4. Model Results: - Average accuracy across multiple train/test splits - Average F1 score for classification performance - List of unique classes in the dataset 5. Feature Importance: - Ranking of node features by importance - Performance scores for each feature - Total computation time This example demonstrates the complete workflow from loading graph data to analyzing feature importance, using NEExT's high-level interface while maintaining flexibility and configurability at each step.