Dan’s Data Viz - The New Space Race

Recently, I listened to a story from the New York Times about how the world is entering a new space race. The US, China, and Russia are all vying for military dominance and developing new weapons to target satellites. The story also touched on the increasing role of private companies in space.

I was curious to see just how much satellite launches are accelerating. I found a few nice visualizations, like this one from the Economist. Let’s see if we can recreate these figures, but with an x-axis that extends to the present day.

For the data, we’ll turn to Jonathan’s Space Report, a fantastic (and old school) website with a massive amount of information on the entire history of spaceflight. It also has tons of cool data visualizations. The website appears to be the passion project of Jonathan McDowell, a researcher at the Harvard-Smithsonian Center for Astrophysics. So thank you, Jonathan!

First, let’s download and read in the data.

import wget
import pandas as pd
import os.path

# download database of launches, if it doesn't already exist
if not os.path.isfile("satcat.tsv"):
  print("downloading satcat.tsv")
  url = 'https://planet4589.org/space/gcat/tsv/cat/satcat.tsv'
  wget.download(url)

# read in tsv file to data frame
sats = pd.read_table("satcat.tsv", sep='\t')

# we need to remove the first row, since it does not contain data
sats = sats.drop(index=0)

print(f"there are {len(sats.index)} satellites in this dataset")

# examine data table
sats.head()

there are 59046 satellites in this dataset

C:\Users\DanJuliaPC\AppData\Local\Temp\ipykernel_33912\2828006063.py:12: DtypeWarning:

Columns (1,18,20,22,24,26,28,32,36) have mixed types. Specify dtype option on import or set low_memory=False.

	#JCAT	Satcat	Piece	Type	Name	PLName	LDate	Parent	SDate	Primary	...	ODate	Perigee	Apogee	Inc	OpOrbit	OQUAL	AltNames
1	S00001	1.0	1957 ALP 1	R2	8K71PS No. M1-10 Stage 2	8K71A M1-10 (M1-1PS)	1957 Oct 4	-	1957 Oct 4 1933	Earth	...	1957 Oct 4	214.0	938	65.1	LLEO/I	-	-
2	S00002	2.0	1957 ALP 2	P	1-y ISZ	PS-1	1957 Oct 4	S00001	1957 Oct 4 1933	Earth	...	1957 Oct 4	214.0	938	65.1	LLEO/I	-	:RE,:RC
3	S00003	3.0	1957 BET 1	P A	2-y ISZ	PS-2	1957 Nov 3	A00002	1957 Nov 3 0235	Earth	...	1957 Nov 3	211.0	1659	65.33	LEO/I	-	:RE,:RC
4	S00004	4.0	1958 ALP	P A	Explorer I	Explorer 1	1958 Feb 1	A00004	1958 Feb 1 0355	Earth	...	1958 Feb 1	359.0	2542	33.18	LEO/I	-	:UA,:UB,DEAL I:IA
5	S00005	5.0	1958 BET 2	P	Vanguard I	Vanguard Test Satellite H	1958 Mar 17	S00016	1958 Mar 17 1224	Earth	...	1959 May 23	657.0	3935	34.25	MEO	-	:UA,:VA

5 rows × 41 columns

Let’s plot the total number of satellites launched through time. First, we’ll need to reformat the LDate (launch date) column and deal with problematic entries.

import dateutil.parser as dateparser
import numpy as np

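ldates = sats['LDate']

LDate_fmt = []
probs = 0  # count of launch dates that could not be parsed
for raw_date in ldates:
  try:
    LDate_fmt.append(dateparser.parse(raw_date).strftime("%Y-%m-%d"))
  except Exception:
    # unparseable or missing date: mark as NaN so the row can be dropped later
    LDate_fmt.append(np.nan)
    probs += 1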

# add new formatted column
sats['LDate_fmt'] = LDate_fmt

# remove rows with NaN for date
sats_noNa = sats[sats['LDate_fmt'].notna()].copy()

# sort data frame by date
sats_noNa.sort_values(by=['LDate_fmt'], inplace=True)

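# add cumulative count column (1, 2, 3, ... in launch order)
# note: avoid naming this variable `max`, which would shadow the built-in
n_stop = len(sats_noNa.index) + 1  # exclusive upper bound for range()
print(n_stop)
sats_noNa['cumsum'] = list(range(1, n_stop))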

print(str(probs) + " rows had problematic launch dates, replaced with NaN")

58966
81 rows had problematic launch dates, replaced with NaN
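
If you would rather not lean on dateutil's lenient guessing, here is a minimal standard-library sketch of the same parse-or-flag idea with a strict format. It assumes launch dates look like "1957 Oct 4", as in the table above. The helper name parse_launch_date and the "-" placeholder entry are made up for illustration, and this is not how the cell above does it.

```python
from datetime import datetime

def parse_launch_date(text, fmt="%Y %b %d"):
    """Return the date as 'YYYY-MM-DD', or None if text does not match fmt."""
    try:
        return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
    except (ValueError, AttributeError):
        # ValueError: string does not match the format
        # AttributeError: value is not a string (e.g. a float NaN)
        return None

# three dates taken from the table above, plus one hypothetical bad entry
samples = ["1957 Oct 4", "1958 Feb 1", "-", "1958 Mar 17"]

parsed = [parse_launch_date(s) for s in samples]
n_bad = sum(p is None for p in parsed)

print(parsed)
print(n_bad, "rows had problematic launch dates")
```

A strict format will reject anything that deviates from the expected pattern, so it may flag more rows than dateutil does. Whether that is desirable depends on how messy the real column turns out to be.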

Now for a quick plot.

import plotly.express as px

fig = px.line(sats_noNa, x='LDate_fmt', y="cumsum",
 title="Cumulative number of global satellite launches",
 template="plotly_dark",
 line_shape='hv') # line_shape will plot lines as steps
fig.update_xaxes(title_text="year")
fig.update_yaxes(title_text="satellites")
fig.update_traces(line_color='cyan', line_width=3)

# reduce margins for better viewing on mobile
fig.update_layout(margin=dict(l=20, r=20, b=20))

fig.show()

Let’s break it down into a yearly bar plot.

# add column with just launch year (LDate_fmt is already a "YYYY-MM-DD" string)
sats_noNa['LYear'] = sats_noNa['LDate_fmt'].str[:4]

# get table of launches by year, in chronological order
LYear_table_df = (sats_noNa['LYear'].value_counts().sort_index()
                  .rename_axis('year').reset_index(name='launches'))

# now make a bar plot
fig = px.bar(LYear_table_df, x='year', y='launches',
 title="Global satellite launches per year",
 template="plotly_dark")
fig.update_traces(marker_color='cyan')
fig.show()

Our plot looks a little strange. There are massive peaks that seem out of place in 1999, 1982, and a few other years. Were there really more satellites launched into orbit in 1999 than in 2023?

We can see that the issue also shows up in our previous line plot of cumulative launches. There are a few points where the line goes vertical, indicating that many satellites were launched on the exact same day. The most obvious of these spikes occurred on May 10, 1999.
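
One quick way to put numbers on those vertical jumps is to count rows per launch date and list the busiest days. The sketch below uses a tiny made-up list in place of the real date column (something like sats_noNa['LDate_fmt'].tolist() would be the assumed real input), so the counts here are illustrative only and say nothing about the actual catalog.

```python
from collections import Counter

# hypothetical stand-in for the formatted launch-date column
launch_dates = [
    "1999-05-10", "1999-05-10", "1999-05-10",
    "1999-05-11",
    "2023-01-03", "2023-01-03",
]

# count entries per date and keep the two most frequent dates
busiest = Counter(launch_dates).most_common(2)

for date, n in busiest:
    print(date, n)
```

On the real data, the top few dates from a count like this would be the natural places to start looking at individual rows.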

Is this real? Or is this an artifact of the data structure we downloaded from Jonathan’s Space Report?

Let’s dig into this in our next post.