GraphSL package

Submodules

GraphSL.utils module

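class GraphSL.utils.Metric(acc, pr, re, f1, auc)

Bases: object

Container for evaluation results: accuracy (acc), precision (pr), recall (re), F1 score (f1), and area under the curve (auc).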
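The Metric fields appear to be standard binary classification scores computed over nodes (1 = source, 0 = non-source). The following is a minimal sketch of how such scores can be computed, assuming 0/1 label and prediction vectors plus real-valued scores for the AUC. `compute_metric` and this `Metric` dataclass are hypothetical stand-ins, not GraphSL's implementation.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    # Hypothetical stand-in mirroring the documented fields of GraphSL.utils.Metric.
    acc: float
    pr: float
    re: float
    f1: float
    auc: float


def compute_metric(labels, preds, scores):
    """Score a 0/1 source prediction against 0/1 labels.

    labels, preds: sequences of 0/1; scores: real-valued source scores used for AUC.
    """
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    acc = sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * pr * re / (pr + re) if pr + re else 0.0
    # AUC as the probability that a random source outscores a random
    # non-source (ties count as one half).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else 0.0
    return Metric(acc, pr, re, f1, auc)


m = compute_metric(labels=[1, 0, 0, 1], preds=[1, 0, 1, 0], scores=[0.9, 0.1, 0.6, 0.4])
print(m)  # Metric(acc=0.5, pr=0.5, re=0.5, f1=0.5, auc=0.75)
```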

GraphSL.utils.diffusion_generation(graph, sim_num=10, diff_type='IC', time_step=10, repeat_step=10, seed_ratio=0.1, infect_prob=0.1, recover_prob=0.005, threshold=0.5, random_seed=0)

Generate diffusion matrices for a graph.

Args:

graph (dict): Dictionary containing the graph information.
sim_num (int): Number of simulations.
diff_type (str): Type of diffusion model: 'IC', 'LT', 'SI', 'SIS', or 'SIR'. IC stands for Independent Cascade, LT for Linear Threshold, SI for Susceptible-Infected, SIS for Susceptible-Infected-Susceptible, and SIR for Susceptible-Infected-Recovered.
time_step (int): Number of time steps in each simulation.
repeat_step (int): Number of repetitions for each simulation.
seed_ratio (float): Fraction of nodes chosen as seed (source) nodes.
infect_prob (float): Infection probability, used in SI, SIS, and SIR.
recover_prob (float): Recovery probability, used in SIS and SIR.
threshold (float): Threshold parameter, used in IC and LT.
random_seed (int): Random seed.
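The infect_prob and recover_prob arguments correspond to the usual per-step probabilities of compartmental epidemic models. Below is a minimal discrete-time SIR sketch, assuming synchronous updates and a graph given as a plain adjacency dict. `simulate_sir` is a hypothetical helper, not GraphSL's simulator, which may differ in update order and in how repeat_step is used. Setting recover_prob to 0 reduces it to SI, and sending recovered nodes back to "S" instead of "R" would give SIS.

```python
import random


def simulate_sir(neighbors, seeds, infect_prob, recover_prob, time_step, rng):
    """Discrete-time SIR on a graph given as {node: [neighbor, ...]}.

    Returns the set of nodes that were ever infected (infected or recovered).
    """
    state = {v: "S" for v in neighbors}
    for s in seeds:
        state[s] = "I"
    for _ in range(time_step):
        infected = [v for v in neighbors if state[v] == "I"]
        newly_infected = set()
        for v in infected:
            for u in neighbors[v]:
                # Each infected node tries once per step to infect each susceptible neighbor.
                if state[u] == "S" and rng.random() < infect_prob:
                    newly_infected.add(u)
        for v in infected:
            # Infected nodes recover independently with probability recover_prob.
            if rng.random() < recover_prob:
                state[v] = "R"
        for u in newly_infected:
            state[u] = "I"
    return {v for v, st in state.items() if st != "S"}


# A 5-node path graph 0-1-2-3-4 with node 0 as the only seed.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
reached = simulate_sir(path, seeds=[0], infect_prob=1.0, recover_prob=0.0,
                       time_step=2, rng=random.Random(0))
print(sorted(reached))  # [0, 1, 2]
```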

Returns:

dataset (dict): Dictionary with two entries: 'adj_mat', the adjacency matrix of shape (number of nodes, number of nodes), and 'diff_mat', the diffusion matrices of shape (number of simulations, number of nodes, 2), where column 0 is the source vector and column 1 is the diffusion vector.

Example:

import os
from GraphSL.utils import download_dataset, load_dataset, diffusion_generation
curr_dir = os.getcwd()
download_dataset(curr_dir)
data_name = 'karate'
graph = load_dataset(data_name, data_dir=curr_dir)
dataset = diffusion_generation(graph=graph, infect_prob=0.3, diff_type='IC', sim_num=100, seed_ratio=0.1)
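The layout of diff_mat can be illustrated with an Independent Cascade sketch. It assumes nodes numbered 0..n-1, a single uniform activation probability on every edge, and seeds sampled uniformly with seed_num = int(seed_ratio * n). How GraphSL maps its threshold and infect_prob arguments onto IC is not stated here, and `independent_cascade` and `generate_diff_mat` are hypothetical names.

```python
import random


def independent_cascade(neighbors, seeds, prob, rng):
    """Independent Cascade: each newly activated node gets a single chance to
    activate each inactive neighbor with probability `prob`."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in neighbors[v]:
                if u not in active and rng.random() < prob:
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active


def generate_diff_mat(neighbors, sim_num, seed_ratio, prob, random_seed=0):
    """Build a sim_num x num_nodes x 2 nested list: column 0 is the seed
    (source) indicator, column 1 the final infection indicator."""
    rng = random.Random(random_seed)
    nodes = sorted(neighbors)
    seed_num = max(1, int(seed_ratio * len(nodes)))
    diff_mat = []
    for _ in range(sim_num):
        seeds = rng.sample(nodes, seed_num)
        active = independent_cascade(neighbors, seeds, prob, rng)
        diff_mat.append([[int(v in seeds), int(v in active)] for v in nodes])
    return diff_mat


# A 10-node ring graph.
ring = {v: [(v - 1) % 10, (v + 1) % 10] for v in range(10)}
diff_mat = generate_diff_mat(ring, sim_num=3, seed_ratio=0.2, prob=0.3)
print(len(diff_mat), len(diff_mat[0]), len(diff_mat[0][0]))  # 3 10 2
```

Each row of a simulation pairs a node's source indicator with its final infection state, so every seed is also marked as infected.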

GraphSL.utils.download_dataset(data_dir)

Download the dataset files from a remote URL into data_dir.

Args:

data_dir (str): The directory where the downloaded dataset files are stored.

GraphSL.utils.generate_seed_vector(top_nodes, seed_num, G, random_seed)

Generate a seed vector for diffusion simulation.

Args:

top_nodes (list): List of top nodes based on node degree.
seed_num (int): Number of seed nodes.
G (networkx.Graph): The graph object.
random_seed (int): Random seed.

Returns:

seed_vector (list): Seed vector for diffusion simulation.
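One plausible reading of generate_seed_vector is that seeds are drawn at random from the high-degree candidates. The sketch below assumes exactly that, with nodes labeled 0..n-1 and a node count passed instead of a networkx graph. `seed_vector_sketch` is hypothetical, and the library's sampling rule may differ.

```python
import random


def seed_vector_sketch(top_nodes, seed_num, num_nodes, random_seed):
    """Hypothetical sketch: draw seed_num distinct seeds from top_nodes and
    return a 0/1 list of length num_nodes (nodes assumed to be 0..num_nodes-1)."""
    rng = random.Random(random_seed)
    seeds = rng.sample(top_nodes, seed_num)
    seed_vector = [0] * num_nodes
    for v in seeds:
        seed_vector[v] = 1
    return seed_vector


# Degrees of a 5-node graph: node 0 linked to all others, plus an edge 2-4.
degree = {0: 4, 1: 1, 2: 2, 3: 1, 4: 2}
# Keep the 3 highest-degree nodes as candidates (stable sort keeps ties in order).
top_nodes = sorted(degree, key=degree.get, reverse=True)[:3]
vec = seed_vector_sketch(top_nodes, seed_num=2, num_nodes=5, random_seed=0)
print(top_nodes, vec)
```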

GraphSL.utils.load_dataset(dataset, data_dir)

Load a dataset from a pickle file.

Args:

dataset (str): Name of the dataset: 'karate', 'dolphins', 'jazz', 'netscience', 'cora_ml', or 'power_grid'.
data_dir (str): The directory where the dataset files are stored.

Returns:

graph (dict): A dictionary containing the dataset.

GraphSL.utils.split_dataset(dataset, train_ratio: float = 0.6, seed: int = 0)

Split the dataset into training and testing sets.

Args:

dataset (dict): Dictionary containing the dataset.
train_ratio (float): Ratio of training data. Default is 0.6.
seed (int): Random seed for reproducibility. Default is 0.

Returns:

adj (scipy.sparse.csr_matrix): The adjacency matrix of the graph.
train_dataset (torch.utils.data.dataset.Subset): The training set, of shape (number of training simulations, number of nodes, 2), where column 0 is the seed vector and column 1 is the diffusion vector.
test_dataset (torch.utils.data.dataset.Subset): The test set, of shape (number of test simulations, number of nodes, 2), with the same column layout.

Example:

import os
from GraphSL.utils import download_dataset, load_dataset, diffusion_generation, split_dataset
curr_dir = os.getcwd()
download_dataset(curr_dir)
data_name = 'karate'
graph = load_dataset(data_name, data_dir=curr_dir)
dataset = diffusion_generation(graph=graph, infect_prob=0.3, diff_type='IC', sim_num=100, seed_ratio=0.1)
adj, train_dataset, test_dataset = split_dataset(dataset)
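The split amounts to a reproducible shuffle of simulation indices. Here is a stdlib-only sketch, assuming the first int(train_ratio * n) shuffled indices go to training. GraphSL returns torch Subset objects and may shuffle differently, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(num_samples, train_ratio=0.6, seed=0):
    """Shuffle sample indices reproducibly and cut them into train/test parts."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    num_train = int(train_ratio * num_samples)
    return indices[:num_train], indices[num_train:]


train_idx, test_idx = split_indices(100, train_ratio=0.6, seed=0)
print(len(train_idx), len(test_idx))  # 60 40
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.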

GraphSL.utils.visualize_source_prediction(adj: csr_matrix, predictions: ndarray, labels: ndarray, save_dir: str, save_name: str)

Visualize source predictions.

Args:

adj (csr_matrix): The adjacency matrix of the graph.
predictions (numpy.ndarray): Predicted source vector; each entry is 0 or 1, where 1 marks a source node and 0 a non-source node.
labels (numpy.ndarray): Ground-truth source vector, encoded the same way as predictions.
save_dir (str): Directory in which the figure is saved.
save_name (str): File name of the saved figure.

Example:

import os
from GraphSL.GNN.GCNSI.main import GCNSI
from GraphSL.utils import load_dataset, diffusion_generation, split_dataset, download_dataset, visualize_source_prediction
curr_dir = os.getcwd()
download_dataset(curr_dir)
data_name = 'karate'
graph = load_dataset(data_name, data_dir=curr_dir)
dataset = diffusion_generation(graph=graph, infect_prob=0.3, diff_type='IC', sim_num=100, seed_ratio=0.2)
adj, train_dataset, test_dataset = split_dataset(dataset)
print("GCNSI:")
gcnsi = GCNSI()
gcnsi_model, thres, auc, f1, pred = gcnsi.train(adj, train_dataset)
print(f"train auc: {auc:.3f}, train f1: {f1:.3f}")
pred = (pred >= thres)
visualize_source_prediction(adj, pred[:, 0], train_dataset[0][:, 0].numpy(), save_dir=curr_dir, save_name="GCNSI_source_prediction")

GraphSL.Prescribed module

class GraphSL.Prescribed.LPSI

Bases: object

Implements the Label Propagation based Source Identification (LPSI) algorithm.

Wang, Zheng, et al. “Multiple source detection without knowing the underlying propagation model.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1. 2017.

predict(laplacian, num_node, alpha, diff_vec)

Predict source nodes with the LPSI algorithm.

Args:

laplacian (torch.Tensor): The Laplacian matrix of the graph.

num_node (int): Number of nodes in the graph.

alpha (float): The fraction of label information that a node gets from its neighbors (between 0 and 1).

diff_vec (torch.Tensor): The diffusion vector.

Returns:

x (torch.Tensor): Prediction of source nodes.
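As we read the LPSI paper, labels are propagated with the symmetrically normalized adjacency S = D^{-1/2} A D^{-1/2}, whose fixed point is (1 - alpha)(I - alpha S)^{-1} y, and nodes whose converged score exceeds all of their neighbors' scores are reported as sources. The sketch below reaches that fixed point by iteration rather than matrix inversion and builds S itself instead of taking the laplacian argument. `lpsi_scores` and `local_peaks` are hypothetical, and GraphSL's train/test appear to threshold the scores rather than take local peaks.

```python
import math


def lpsi_scores(neighbors, diff_vec, alpha, num_iter=100):
    """Label propagation: iterate x <- alpha * S x + (1 - alpha) * y with the
    symmetrically normalized adjacency S = D^-1/2 A D^-1/2."""
    deg = {v: len(neighbors[v]) for v in neighbors}
    x = dict(diff_vec)
    for _ in range(num_iter):
        x = {
            v: alpha * sum(x[u] / math.sqrt(deg[v] * deg[u]) for u in neighbors[v])
            + (1 - alpha) * diff_vec[v]
            for v in neighbors
        }
    return x


def local_peaks(neighbors, scores):
    """Nodes whose score is strictly larger than every neighbor's score."""
    return sorted(v for v in neighbors
                  if all(scores[v] > scores[u] for u in neighbors[v]))


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
diff_vec = {0: 0, 1: 1, 2: 1, 3: 1, 4: 0}  # nodes 1-3 observed as infected
scores = lpsi_scores(path, diff_vec, alpha=0.5)
print(local_peaks(path, scores))  # [2]
```

With alpha = 0.5 the scores converge to 5/6 for nodes 1 and 3 and 11/12 for node 2, so the middle of the infected segment is the only local peak.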

test(adj, test_dataset, alpha, thres)

Test the LPSI algorithm.

Args:

adj (scipy.sparse.csr_matrix): The adjacency matrix of the graph.

test_dataset (torch.utils.data.dataset.Subset): The test dataset.

alpha (float): The fraction of label information that a node gets from its neighbors.

thres (float): Threshold used to convert predicted scores into binary source predictions.

Returns:

metric (Metric): Evaluation metric containing accuracy, precision, recall, F1 score, and AUC.

train(adj, train_dataset, alpha_list=[0.001, 0.01, 0.1], num_thres=10)

Train the LPSI algorithm.

Args:

adj (scipy.sparse.csr_matrix): The adjacency matrix of the graph.

train_dataset (torch.utils.data.dataset.Subset): The training dataset.

alpha_list (list): List of alpha values to try.

num_thres (int): Number of threshold values to try.

Returns:

opt_alpha (float): Optimal fraction of label information that a node gets from its neighbors.

opt_thres (float): Optimal threshold value.

opt_auc (float): Optimal Area Under the Curve (AUC) value.

opt_f1 (float): Optimal F1 score value.

opt_pred (torch.Tensor): Prediction of training seed vector given opt_alpha.
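The alpha_list and num_thres arguments suggest a grid search over alpha and a set of candidate thresholds. The sketch below rests on two assumptions: thresholds are num_thres evenly spaced values between the minimum and maximum score, and the best (alpha, threshold) pair is the one with the highest F1. GraphSL also reports an AUC and may select on it. `search_threshold`, `grid_search`, and the toy scores are hypothetical.

```python
def f1_score(labels, preds):
    tp = sum(1 for y, p in zip(labels, preds) if y and p)
    fp = sum(1 for y, p in zip(labels, preds) if not y and p)
    fn = sum(1 for y, p in zip(labels, preds) if y and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def search_threshold(scores, labels, num_thres=10):
    """Try num_thres evenly spaced cut-offs between min and max score and keep
    the one with the highest F1."""
    lo, hi = min(scores), max(scores)
    best_thres, best_f1 = lo, -1.0
    for i in range(num_thres):
        thres = lo + (hi - lo) * i / num_thres
        preds = [s >= thres for s in scores]
        f1 = f1_score(labels, preds)
        if f1 > best_f1:
            best_thres, best_f1 = thres, f1
    return best_thres, best_f1


def grid_search(score_fn, alpha_list, labels, num_thres=10):
    """score_fn(alpha) -> per-node scores; returns (alpha, thres, f1) with the best F1."""
    best = None
    for alpha in alpha_list:
        thres, f1 = search_threshold(score_fn(alpha), labels, num_thres)
        if best is None or f1 > best[2]:
            best = (alpha, thres, f1)
    return best


toy_scores = {
    0.01: [0.2, 0.9, 0.8, 0.1, 0.7],  # ranks non-sources highest
    0.1: [0.9, 0.2, 0.1, 0.8, 0.3],   # separates sources cleanly
}
best = grid_search(toy_scores.get, [0.01, 0.1], labels=[1, 0, 0, 1, 0])
print(best)
```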

class GraphSL.Prescribed.NetSleuth

Bases: object

Implements the NetSleuth algorithm.

Prakash, B. Aditya, Jilles Vreeken, and Christos Faloutsos. “Spotting culprits in epidemics: How many and which ones?” 2012 IEEE 12th International Conference on Data Mining. IEEE, 2012.

predict(G, k, diff_vec)

Predict source nodes with the NetSleuth algorithm.

Args:

G (networkx.Graph): The input graph.
k (int): Number of source nodes to identify.
diff_vec (torch.Tensor): The diffusion vector.

Returns:

seed_vec (torch.Tensor): A binary tensor representing identified source nodes.
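As we understand the NetSleuth paper, its single-seed step ranks infected nodes by the eigenvector belonging to the smallest eigenvalue of L_I, the graph Laplacian restricted to the infected rows and columns (degrees still counted in the full graph). The sketch below finds that eigenvector by power iteration on the shifted matrix cI - L_I. It omits the iterative multi-seed search and the MDL criterion NetSleuth uses to choose the number of seeds, and `best_single_seed` is a hypothetical helper rather than GraphSL's predict.

```python
def best_single_seed(neighbors, infected, num_iter=500):
    """Return the infected node with the largest entry in the eigenvector of the
    smallest eigenvalue of L_I (Laplacian restricted to infected rows/columns)."""
    nodes = sorted(infected)
    deg = {v: len(neighbors[v]) for v in nodes}  # degrees in the full graph
    c = 2 * max(deg.values()) + 1  # shift so that c*I - L_I has a positive spectrum
    x = {v: 1.0 for v in nodes}
    for _ in range(num_iter):
        # y = (c*I - L_I) x, where (L_I x)_v = deg(v) x_v - sum of x_u over infected neighbors u.
        y = {
            v: (c - deg[v]) * x[v] + sum(x[u] for u in neighbors[v] if u in infected)
            for v in nodes
        }
        norm = max(abs(val) for val in y.values())
        x = {v: val / norm for v, val in y.items()}
    return max(nodes, key=lambda v: x[v])


# Path 0-1-...-6 with nodes 1 to 5 infected: the infection is bordered by
# uninfected nodes on both sides, so the eigenvector peaks in the middle.
path7 = {v: [u for u in (v - 1, v + 1) if 0 <= u < 7] for v in range(7)}
print(best_single_seed(path7, {1, 2, 3, 4, 5}))  # 3
```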

test(adj, test_dataset, k)

Test the NetSleuth algorithm.

Args:

adj (scipy.sparse.csr_matrix): The adjacency matrix of the graph.
test_dataset (torch.utils.data.dataset.Subset): The test dataset.
k (int): Number of source nodes.

Returns:

metric (Metric): Evaluation metric containing accuracy, precision, recall, F1 score, and AUC.

train(adj, train_dataset, k_list=[2, 5, 10])

Train the NetSleuth algorithm.

Args:

adj (scipy.sparse.csr_matrix): The adjacency matrix of the graph.
train_dataset (torch.utils.data.dataset.Subset): The training dataset.
k_list (list): List of the numbers of source nodes to try.

Returns:

opt_k (int): Optimal number of source nodes.
opt_auc (float): Optimal Area Under the Curve (AUC) value.
train_f1 (float): Training F1 score value.

class GraphSL.Prescribed.OJC

Bases: object

Implements the Optimal-Jordan-Cover (OJC) algorithm.

Zhu, Kai, Zhen Chen, and Lei Ying. “Catch’em all: Locating multiple diffusion sources in networks with partial observations.” Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1. 2017.

Candidate(G, Y, I, target)

Identify candidate source nodes.

Args:

G (networkx.Graph): The input graph.
Y (int): Number of desired source nodes.
I (list): List of diffused nodes.
target (torch.Tensor): Target vector.

Returns:

K (list): List of potential source nodes.
G_bar (networkx.Graph): Subgraph containing potential source nodes.

get_K_list(G, Y, I, target)

Get the list of potential source nodes.

Args:

G (networkx.Graph): The input graph.
Y (int): Number of desired source nodes.
I (list): List of diffused nodes.
target (torch.Tensor): Target vector.

Returns:

K (list): List of potential source nodes.

predict(G, Y, I, target, num_source)

Predict source nodes with the OJC algorithm.

Args:

G (networkx.Graph): The input graph.
Y (int): Number of source nodes.
I (list): List of diffused nodes.
target (torch.Tensor): Target vector.
num_source (int): Maximal number of source nodes.

Returns:

x (torch.Tensor): A binary vector representing identified potential source nodes.
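OJC targets a Jordan cover: a set of Y nodes such that every infected node lies close to one of them, measured by the largest distance from an infected node to its nearest chosen source. Below is a brute-force sketch of that objective for tiny graphs, assuming unweighted hop distances and all nodes as candidates. The OJC algorithm prunes candidates (compare Candidate and get_K_list) instead of enumerating them, and `jordan_cover` is a hypothetical helper.

```python
from collections import deque
from itertools import combinations


def bfs_distances(neighbors, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in neighbors[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def jordan_cover(neighbors, infected, num_source):
    """Brute force: choose num_source nodes minimizing the largest distance
    from any infected node to its nearest chosen source."""
    dist = {v: bfs_distances(neighbors, v) for v in neighbors}
    best_set, best_radius = None, float("inf")
    for cand in combinations(sorted(neighbors), num_source):
        radius = max(
            min(dist[s].get(v, float("inf")) for s in cand) for v in infected
        )
        if radius < best_radius:  # keep the first set reaching a new minimum
            best_set, best_radius = cand, radius
    return best_set, best_radius


# Path 0-1-...-8 with two infected clusters at the ends.
path9 = {v: [u for u in (v - 1, v + 1) if 0 <= u < 9] for v in range(9)}
print(jordan_cover(path9, infected={0, 1, 7, 8}, num_source=2))  # ((0, 7), 1)
```

The enumeration grows combinatorially with the number of nodes, which is why a practical method must restrict the candidate set first.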

test(adj, test_dataset, Y)

Test the OJC algorithm.

Args:

adj (scipy.sparse.csr_matrix): The adjacency matrix of the graph.
test_dataset (torch.utils.data.dataset.Subset): The test dataset.
Y (int): Number of source nodes.

Returns:

metric (Metric): Evaluation metric containing accuracy, precision, recall, F1 score, and AUC.

train(adj, train_dataset, Y_list=[2, 5, 10])

Train the OJC algorithm.

Args:

adj (scipy.sparse.csr_matrix): The adjacency matrix of the graph.
train_dataset (torch.utils.data.dataset.Subset): The training dataset.
Y_list (list): List of numbers of source nodes to try.

Returns:

opt_Y (int): Optimal number of source nodes.
opt_auc (float): Optimal Area Under the Curve (AUC) value.
train_f1 (float): Training F1 score value.
