Iron March - Fascist Forum Network Analysis (incomplete)
Introduction
In November 2019, the SQL database of the defunct fascist forum Iron March was leaked to the public. Iron March, which closed in 2017 for reasons that remain unclear, served as a breeding ground for right-wing political extremism, often edging into terrorism. Analyzing the users and patterns found on this forum offers insight into the trends present in the far right through 2017 and may expose avenues for further research.
Data
For this analysis, I’ve pulled all the necessary data directly from the MySQL dump, converting each user’s IP address to a set of estimated coordinates through the IPStack API. From here it’s been exported to CSV and is ready for use. We’ll start by converting our CSV into a Pandas DataFrame.
import pandas as pd
members = pd.read_csv('data/core_members_coords_extra.csv', index_col=0)
Let’s get a quick idea of what our data looks like.
members.sort_values(by="Member ID").head()
| | Member ID | Name | Email | IP | Latitude | Longitude |
---|---|---|---|---|---|---|
0 | 1 | Александр Славрос | slavros_a@mail.ru | 178.140.119.217 | 55.764408 | 37.636600 |
5 | 2 | PhalNat | illuminatienlightened@hotmail.com | 68.37.21.125 | 42.250519 | -83.172760 |
9 | 3 | Blood and Iron | renegader23@aim.com | 68.10.255.89 | 36.764111 | -76.341103 |
10 | 4 | Mierce | hominemcura@gmail.com | 82.29.169.221 | 52.471390 | -1.735000 |
11 | 5 | Will to Power | tashkentfox@hotmail.com | 90.214.150.70 | 53.810280 | -1.544440 |
Great! This is well-formatted, easy-to-read data. However, it doesn’t tell us much by itself. We want to perform analysis on this data, and I’m particularly interested in two things: generating a clear visualization of the forum and identifying key figures in the community. Let’s start with a visualization.
Visualization
There are a number of options at our disposal for visualizing our data (in fact, my earliest approach used QGIS and can be found here). To keep this analysis in the Python family, let’s use the geopandas library. Here, we’ll read in a few geographic datasets for visualization.
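import geopandas as gpd

# Note: gpd.datasets was removed in GeoPandas 1.0; on newer versions, point read_file at a local copy of Natural Earth.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
states = gpd.read_file('data/gis/cb_2018_us_state_500k').to_crs(epsg=4326)
members_points = gpd.GeoDataFrame(members, geometry=gpd.points_from_xy(members.Longitude, members.Latitude), crs="epsg:4326")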
Let’s make sure each of our datasets shares the same coordinate reference system (CRS). We’ll use all(world.crs == x.crs for x in [states, members_points]) because:
- It makes it clear that we want world to set our baseline CRS.
- It’s one operation performed against a list of objects, and I prefer to keep that obvious and explicit. Daisy-chaining comparisons obscures the purpose of the operation.
all(world.crs == x.crs for x in [states, members_points])
True
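To make the comparison of styles concrete, here is a minimal sketch that uses plain strings as stand-ins for CRS objects. The variable names and values are hypothetical and not taken from the notebook.

```python
# Stand-ins for the .crs attribute of each dataset (hypothetical values).
baseline = "EPSG:4326"
others = {"states": "EPSG:4326", "members_points": "EPSG:4326"}

# Chained form: works, but grows awkwardly with each new dataset.
chained = baseline == others["states"] == others["members_points"]

# Generator form: one baseline checked against any number of datasets.
explicit = all(baseline == crs for crs in others.values())

# The same loop shape makes it easy to report which dataset disagrees.
mismatched = [name for name, crs in others.items() if crs != baseline]

print(chained, explicit, mismatched)
```

Both forms give the same answer here; the generator form mainly pays off when the list of datasets grows or when a mismatch needs to be traced to a specific dataset.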
Now that we’re sure everything matches up, let’s take a look at what’s changed in our data.
members_points.sort_values(by="Member ID").head()
| | Member ID | Name | Email | IP | Latitude | Longitude | geometry |
---|---|---|---|---|---|---|---|
0 | 1 | Александр Славрос | slavros_a@mail.ru | 178.140.119.217 | 55.764408 | 37.636600 | POINT (37.63660 55.76441) |
5 | 2 | PhalNat | illuminatienlightened@hotmail.com | 68.37.21.125 | 42.250519 | -83.172760 | POINT (-83.17276 42.25052) |
9 | 3 | Blood and Iron | renegader23@aim.com | 68.10.255.89 | 36.764111 | -76.341103 | POINT (-76.34110 36.76411) |
10 | 4 | Mierce | hominemcura@gmail.com | 82.29.169.221 | 52.471390 | -1.735000 | POINT (-1.73500 52.47139) |
11 | 5 | Will to Power | tashkentfox@hotmail.com | 90.214.150.70 | 53.810280 | -1.544440 | POINT (-1.54444 53.81028) |
You’ll notice we now have a geometry column filled with POINTs! members_points is what’s called a GeoDataFrame, an extension of the standard Pandas DataFrame that understands geographic information.
Plotting our Data
To plot our data, we’ll set our axis to be world and plot each additional set of information (individual states and the location of each member) on top of it.
ax = world.plot(figsize=(30,30), color='none', edgecolor='grey')
states.plot(ax=ax, color='none', edgecolor='grey')
members_points.plot(ax=ax, markersize=5)
<AxesSubplot:>
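def get_member_state(member_id):
    # members_usa is not defined in this excerpt; it is assumed to be a US-only subset with a "State" column.
    return members_usa.loc[members_usa["Member ID"] == member_id]["State"]

def get_member_location(member_id):
    """Return (longitude, latitude) for a member, or (0, 0) if no location is found."""
    match = members_points.loc[members_points["Member ID"] == member_id].geometry
    try:
        return (match.x.values[0], match.y.values[0])
    except (IndexError, ValueError):
        # No usable row for this member ID.
        return (0, 0)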
from shapely.geometry import LineString
to = gpd.GeoSeries(LineString([get_member_location(1), get_member_location(7)]))
edges = pd.read_csv("edges.csv")
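def dict_head(d, n):
    # Print the first n (key, value) pairs of a dictionary.
    for item in list(d.items())[0:n]:
        print(item)

def get_member_name(member_id):
    # .item() raises ValueError unless exactly one row matches.
    try:
        return members.loc[members["Member ID"] == member_id].Name.item()
    except ValueError:
        return ''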
messages = pd.read_csv("all_messages_headers.csv", index_col="mt_id")
connections = pd.read_csv("distinct_connections.csv")
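from collections import defaultdict
import networkx as nx

# Map each sender to the list of members they messaged.
d = defaultdict(list)
connection_tuples = zip(connections["msg_author_id"], connections["mt_to_member_id"])
for sender, recipient in connection_tuples:
    d[sender].append(recipient)

# Build an undirected graph, skipping messages a member sent to themselves.
g = nx.Graph()
for k, v in d.items():
    g.add_edges_from([(k, t) for t in v if k != t])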
with open("edges.csv", 'w') as f:
    f.write("Sender,Recipient\n")
    for edge in g.edges():
        f.write(f"{edge[0]},{edge[1]}\n")
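Because nx.Graph is undirected, a message from A to B and a reply from B to A become a single edge, so the Sender and Recipient columns in edges.csv no longer carry direction. The following library-free sketch shows the same collapsing on a few hypothetical (sender, recipient) pairs and writes the result with the standard csv module; it illustrates the idea and is not the notebook’s own code.

```python
import csv
import io

# Hypothetical (sender, recipient) pairs standing in for message headers.
pairs = [(1, 3), (3, 1), (1, 3), (7, 7), (7, 1)]

# An undirected edge ignores direction, so store each pair in sorted order.
# Self-messages (sender == recipient) are skipped, as in the graph-building loop.
edges = sorted({tuple(sorted(p)) for p in pairs if p[0] != p[1]})

# Write the edge list with the csv module, here to an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Sender", "Recipient"])
writer.writerows(edges)

print(edges)
print(buffer.getvalue())
```

Five raw pairs reduce to two edges: the three messages between members 1 and 3 collapse into one, and the self-message from member 7 is dropped.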
def convert_connection_to_location(tuple_of_ids):
    return (get_member_location(tuple_of_ids[0]), get_member_location(tuple_of_ids[1]))

for edge in list(g.edges)[0:5]:
    print(edge, convert_connection_to_location(edge))
(1, 3) ((37.63660049438477, 55.76440811157226), (-76.34110260009766, 36.76411056518555))
(1, 11) ((37.63660049438477, 55.76440811157226), (-6.267000198364258, 53.29199981689453))
(1, 20) ((37.63660049438477, 55.76440811157226), (0, 0))
(1, 23) ((37.63660049438477, 55.76440811157226), (115.8829574584961, -31.987459182739254))
(1, 25) ((37.63660049438477, 55.76440811157226), (25.65834045410156, 60.97164916992188))
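Note that member 20 resolves to (0, 0) in the output above, the fallback returned when no location is found. Any line drawn to that point ends in the Gulf of Guinea rather than at a real location. One way to avoid this, sketched here with hypothetical toy data rather than the real dataset, is to treat unknown members as missing and skip their edges.

```python
# Hypothetical member ID -> (longitude, latitude) lookup; member 20 is missing.
locations = {1: (37.6, 55.8), 3: (-76.3, 36.8), 11: (-6.3, 53.3)}
edges = [(1, 3), (1, 11), (1, 20)]

def edge_to_line(edge):
    """Return a pair of coordinates, or None if either endpoint is unknown."""
    start, end = locations.get(edge[0]), locations.get(edge[1])
    if start is None or end is None:
        return None
    return (start, end)

# Keep only edges whose endpoints both have a known location.
lines = [line for line in map(edge_to_line, edges) if line is not None]
skipped = [edge for edge in edges if edge_to_line(edge) is None]

print(len(lines), skipped)
```

Keeping the skipped edges in a separate list makes it possible to report how many connections were left off the map.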
# Build one LineString per edge and collect them directly into a GeoDataFrame.
lineseries_gpd = gpd.GeoDataFrame(
    geometry=[LineString(convert_connection_to_location(edge)) for edge in g.edges],
    crs="epsg:4326",
)
lineseries_gpd.head()
| | geometry |
---|---|
0 | LINESTRING (37.63660 55.76441, -76.34110 36.76... |
1 | LINESTRING (37.63660 55.76441, -6.26700 53.29200) |
2 | LINESTRING (37.63660 55.76441, 0.00000 0.00000) |
3 | LINESTRING (37.63660 55.76441, 115.88296 -31.9... |
4 | LINESTRING (37.63660 55.76441, 25.65834 60.97165) |
ax = world.plot(figsize=(30,30), color='none', edgecolor='grey')
states.plot(ax=ax, color='none', edgecolor='grey')
members_points.plot(ax=ax, markersize=5)
lineseries_gpd.plot(ax=ax, color='black', alpha=0.05)
<AxesSubplot:>
bt = nx.betweenness_centrality(g)
# Sort members by centrality score, highest first.
centrality = [(get_member_name(k), k, v) for k, v in sorted(bt.items(), key=lambda x: x[1], reverse=True)]
for rank, (name, member_id, score) in enumerate(centrality[:50], start=1):
    print(f"#{rank}\t{member_id}\t{name:<20}\t{score}")
#1 1 Александр Славрос 0.24749951007026946
#2 7 Daddy Terror 0.11138219902228376
#3 7600 Odin 0.08294539794003127
#4 9174 Atlas 0.062343767883346626
#5 353 SpookHunter 0.05620958276449609
#6 2170 Zeiger 0.0542133015520747
#7 130 Pro Patria Mori 0.05101861417992674
#8 2306 Sammy 0.04833233980581861
#9 6113 Aquila 0.04319930048452017
#10 6168 Raycis 0.04010608213921796
#11 35 American_Blackshirt 0.03932823869527507
#12 132 Myrrysmies 0.037157120344800613
#13 3491 New Canadian Empire 0.035986004270294886
#14 7424 Bear 0.03426406418693662
#15 49 Владимир_Борисов 0.033637839606292116
#16 960 0.03328299868224244
#17 7816 kllш 0.03159984495073807
#18 9393 Blackshirt 13 0.02838381655749529
#19 4873 Rintrah 0.026853258702452924
#20 2075 EvilCatholicNaziGoy 0.02620845655983659
#21 2 PhalNat 0.024780602906112358
#22 168 Growth of the Soil 0.023543498873975305
#23 1558 Spöket 0.02315096126098888
#24 6322 Асенов 0.022951479245250105
#25 9503 HermannTheGerman 0.022390651394737424
#26 9304 TheWeissewolfe 0.021976726108282035
#27 6249 The Yank 0.021918791051853674
#28 72 T-34-85Forthewin 0.020986748048481427
#29 288 Clive Bissel 0.02020353366276106
#30 4 Mierce 0.019875739014479567
#31 16 Talleyrand 0.019532703339924507
#32 9144 Neizbezhnost 0.018914314909373346
#33 7481 RadioFreeGB 0.01825898645824197
#34 3721 suspiciouscelerystalk 0.018018732299192576
#35 150 Insurrectionist 0.01767605239362129
#36 3 Blood and Iron 0.016697829349899367
#37 315 Бронеизтребител 0.01624798989444871
#38 9927 Victor Breivik 0.01622369540683362
#39 67 État de Stase 0.015685250972063098
#40 293 ☧☧☧ 0.015275113481666165
#41 84 FascistCapitalist 0.014628389068116072
#42 161 Loved I Not Honour More 0.014489454003828783
#43 2392 gingertoast 0.012930395203228334
#44 9475 mengligiraykhan 0.01289428196441316
#45 11 Four Suited Jack 0.012484008161833971
#46 7757 Carcamano 0.012451261917570088
#47 9288 Змајевит 0.011751561429145494
#48 2220 Kulturkampf 0.010976563605598308
#49 6155 AlbaNuadh 0.010884779568055595
#50 158 TotalitarianSocialist 0.010717973897778022
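Betweenness centrality scores a node by the fraction of shortest paths between other pairs of nodes that pass through it, so a high score suggests a member who bridges otherwise separate groups of correspondents. The following is a library-free sketch of Brandes’ algorithm for unweighted, undirected graphs, run on a hypothetical four-node toy graph. It is meant to illustrate the measure rather than replace nx.betweenness_centrality; the division by (n - 1)(n - 2) is chosen to match the normalized scores networkx reports for undirected graphs.

```python
from collections import deque

def betweenness(adj):
    """Normalized betweenness centrality for an unweighted, undirected graph.

    adj maps each node to a list of its neighbours.
    """
    nodes = list(adj)
    scores = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        # Breadth-first search, counting shortest paths (sigma) to each node.
        order = []
        preds = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    n = len(nodes)
    # Every pair is counted from both endpoints, hence (n - 1)(n - 2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: score * scale for v, score in scores.items()}

# Hypothetical toy graph: "a" links "d" to the pair "b" and "c".
toy = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
result = betweenness(toy)
print(result)
```

In the toy graph, two of the three pairs that exclude "a" can only reach each other through it, so "a" scores 2/3 and every other node scores 0. The same logic explains why a forum administrator who messages many otherwise unconnected members ends up at the top of the ranking.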
get_member_name(1)
'Александр Славрос'