Converting MySQL Table Data to a Graphml File
I recently found myself in the situation where I was given access to a huge MySQL database that contained network traffic flows and IDS signature match data. As I work a lot with graph-based approaches, I needed to convert the table's flow data into a graphml file for later visualization and analysis with scripts I have already written. Now without further ado here's the code:
import pymysql.cursors
import pymysql
import networkx as nx
import sys
# Connect to the database
conn = pymysql.connect(
host='localhost',
user='root',
password='my-secret-pw',
db='flowdata',
charset='utf8mb4',
cursorclass=pymysql.cursors.SSCursor
)
graph = nx.DiGraph()
cursor = conn.cursor()
cursor.execute('SELECT src_ip, dst_ip FROM flows')
for i, row in enumerate(cursor):
sys.stdout.write("\rReading line %s" % i)
sys.stdout.flush()
graph.add_edge(row[0], row[1])
nx.write_graphml(graph, "trente-flowgraph.graphml")
It's obvious to see that I only need the data from the first two columns as they contain source and destination IP. The trick here is to use pymysql.cursors.SSCursor
. This will prevent pymysql
from loading the whole result set from the SELECT * ...
query into RAM. Another catch is that pymysql
apparently is not available for Python 3 yet. SQLAlchemy is a good workaround for bigger projects (such as my Pastebin Scraper) but in this case it's complete overkill. Just run the script with python2.7
and you're good.