The New Space Race

python, visualization, GIS, politics, satellites

Author: Dan MacGuigan

Published: May 14, 2024

Recently, I listened to a story from the New York Times about how the world is entering a new space race. The US, China, and Russia are all vying for military dominance and developing new weapons to target satellites. The story also touched on the increasing role of private companies in space.

I was curious to see just how quickly satellite launches are accelerating. I found a few nice visualizations, like this one from The Economist. Let’s see if we can recreate these figures, but with an x-axis that extends to the present day.

For the data, we’ll turn to Jonathan’s Space Report, a fantastic (and old school) website with a massive amount of information on the entire history of spaceflight. It also has tons of cool data visualizations. This website appears to be the passion project of Jonathan McDowell, a researcher at the Harvard-Smithsonian Center for Astrophysics. So thank you Jonathan!

First, let’s download and read in the data.

import wget
import pandas as pd
import os.path

# download database of launches, if it doesn't already exist
if not os.path.isfile("satcat.tsv"):
  url = 'https://planet4589.org/space/gcat/tsv/cat/satcat.tsv'
  wget.download(url)

# read in tsv file to data frame
# low_memory=False reads the file in one pass, which avoids the
# mixed-type DtypeWarning that pandas otherwise raises for this file
sats = pd.read_table("satcat.tsv", sep='\t', low_memory=False)

# we need to remove the first row, since it does not contain data
sats = sats.drop(index=0)

print("there are " + str(len(sats.index)) + " satellites in this dataset")

# examine data table
sats.head()
there are 59046 satellites in this dataset
| | #JCAT | Satcat | Piece | Type | Name | PLName | LDate | Parent | SDate | Primary | ... | ODate | Perigee | PF | Apogee | AF | Inc | IF | OpOrbit | OQUAL | AltNames |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S00001 | 1.0 | 1957 ALP 1 | R2 | 8K71PS No. M1-10 Stage 2 | 8K71A M1-10 (M1-1PS) | 1957 Oct 4 | - | 1957 Oct 4 1933 | Earth | ... | 1957 Oct 4 | 214.0 | | 938 | | 65.1 | | LLEO/I | - | - |
| 2 | S00002 | 2.0 | 1957 ALP 2 | P | 1-y ISZ | PS-1 | 1957 Oct 4 | S00001 | 1957 Oct 4 1933 | Earth | ... | 1957 Oct 4 | 214.0 | | 938 | | 65.1 | | LLEO/I | - | :RE,:RC |
| 3 | S00003 | 3.0 | 1957 BET 1 | P A | 2-y ISZ | PS-2 | 1957 Nov 3 | A00002 | 1957 Nov 3 0235 | Earth | ... | 1957 Nov 3 | 211.0 | | 1659 | | 65.33 | | LEO/I | - | :RE,:RC |
| 4 | S00004 | 4.0 | 1958 ALP | P A | Explorer I | Explorer 1 | 1958 Feb 1 | A00004 | 1958 Feb 1 0355 | Earth | ... | 1958 Feb 1 | 359.0 | | 2542 | | 33.18 | | LEO/I | - | :UA,:UB,DEAL I:IA |
| 5 | S00005 | 5.0 | 1958 BET 2 | P | Vanguard I | Vanguard Test Satellite H | 1958 Mar 17 | S00016 | 1958 Mar 17 1224 | Earth | ... | 1959 May 23 | 657.0 | | 3935 | | 34.25 | | MEO | - | :UA,:VA |

5 rows × 41 columns


Let’s plot the total number of satellites launched through time. First, we’ll need to reformat the LDate (launch date) column and deal with problematic entries.

import dateutil.parser as dateparser
import numpy as np

# parse each launch date; entries that cannot be parsed become NaN
LDate_fmt = []
probs = 0
for raw_date in sats['LDate']:
  try:
    LDate_fmt.append(dateparser.parse(raw_date).strftime("%Y-%m-%d"))
  except Exception:
    LDate_fmt.append(np.nan)
    probs += 1

# add new formatted column
sats['LDate_fmt'] = LDate_fmt

# remove rows with NaN for date
sats_noNa = sats[sats['LDate_fmt'].notna()].copy()

# sort data frame by date
sats_noNa.sort_values(by=['LDate_fmt'], inplace=True)

# add cumulative count column (1, 2, 3, ...) in launch order
n_sats = len(sats_noNa.index)
sats_noNa['cumsum'] = list(range(1, n_sats + 1))

print(str(probs) + " rows had problematic launch dates, replaced with NaN")
81 rows had problematic launch dates, replaced with NaN
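
As an aside, the same cleanup can be written without an explicit loop. Here is a minimal sketch, assuming the launch dates follow the "YYYY Mon D" pattern seen in the table preview. The small `demo` data frame and its values are made up for illustration and stand in for `sats`. Note that `pd.to_datetime` with an explicit format is stricter than `dateutil`, so on the real data it may reject a different number of rows than the loop does.

```python
import pandas as pd

# hypothetical sample standing in for sats['LDate'] (values are illustrative)
demo = pd.DataFrame(
    {"LDate": ["1957 Oct 4", "1958 Feb 1", "1957 Nov 3", "-", "1999 May 10"]}
)

# errors="coerce" turns entries that do not match the format into NaT
parsed = pd.to_datetime(demo["LDate"], format="%Y %b %d", errors="coerce")

# count the problematic entries
probs = int(parsed.isna().sum())

# keep only rows with a valid date, store the formatted date, sort by it
demo_noNa = demo[parsed.notna()].copy()
demo_noNa["LDate_fmt"] = parsed[parsed.notna()].dt.strftime("%Y-%m-%d")
demo_noNa = demo_noNa.sort_values(by="LDate_fmt")

# add cumulative count column (1, 2, 3, ...) in launch order
demo_noNa["cumsum"] = list(range(1, len(demo_noNa) + 1))

print(str(probs) + " rows had problematic launch dates")
```

Whether the stricter parsing is desirable depends on how the ambiguous entries should be treated, so it is worth comparing the rejected rows from both approaches before switching.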


Now for a quick plot.

import plotly.express as px

fig = px.line(sats_noNa, x='LDate_fmt', y="cumsum",
 title="Cumulative number of global satellite launches",
 template="plotly_dark",
 line_shape='hv') # line_shape will plot lines as steps
fig.update_xaxes(title_text="year")
fig.update_yaxes(title_text="satellites")
fig.update_traces(line_color='cyan', line_width=3)

# reduce margins for better viewing on mobile
fig.update_layout(margin=dict(l=20, r=20, b=20))

fig.show()


Let’s break it down into a yearly bar plot.

import collections

# add column with just launch year
dates = sats_noNa['LDate_fmt']
LDate_year=[]
for i in dates:
  LDate_year.append(dateparser.parse(i).strftime("%Y"))
sats_noNa['LYear'] = LDate_year

# get table of launches by year
LYear_table = dict(collections.Counter(sats_noNa['LYear'].tolist()))
df_data = []
for key in LYear_table:
  df_data.append([key, LYear_table[key]])
LYear_table_df = pd.DataFrame(df_data, columns=['year', 'launches']) # convert lists to dataframe

# now make a bar plot
fig = px.bar(LYear_table_df, x='year', y='launches',
 title="Global satellite launches per year",
 template="plotly_dark")
fig.update_traces(marker_color='cyan')
fig.show()


Our plot looks a little strange. There are massive peaks that seem out of place in 1999, 1982, and a few other years. Were there really more satellites launched into orbit in 1999 than in 2023?

We can see the issue also manifests in our previous line plot of cumulative launches. There are a few points where the line goes vertical, indicating that many satellites were launched on the exact same day. The most obvious of these spikes occurred on May 10, 1999.
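
One quick way to find these vertical jumps is to count how many rows share each launch date. Below is a minimal sketch, assuming a formatted date column like `LDate_fmt` from earlier. The `demo` data frame is a made-up stand-in for `sats_noNa`, and its dates and counts are invented purely for illustration.

```python
import pandas as pd

# hypothetical sample standing in for sats_noNa['LDate_fmt'] (invented values)
demo = pd.DataFrame(
    {
        "LDate_fmt": [
            "1999-05-10", "1999-05-10", "1999-05-10",
            "2001-01-01", "2001-01-01",
            "2002-02-02",
        ]
    }
)

# count rows per launch date; value_counts sorts from most to fewest
per_day = demo["LDate_fmt"].value_counts()

# the dates with the most catalog entries
top_days = per_day.head(2)
print(top_days)
```

Running the same two lines on `sats_noNa['LDate_fmt']` should list the dates responsible for the steepest steps in the cumulative plot.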

Is this real? Or is this an artifact of the data structure we downloaded from Jonathan’s Space Report?

Let’s dig into this in our next post.