
EDA | Air Traffic Passenger Data

mellishamallikage

Updated: Sep 10, 2022

An examination of the movement of passengers through San Francisco International Airport.

Introduction

Working in the aerospace sector, I find air traffic extremely fascinating. The demand for air travel dictates the buoyancy of the aerospace manufacturing sector. Furthermore, commercial flights are frequently used to transport parts to their destination as part of the globalised supply chain. Therefore, an understanding of air traffic, its trends and its issues is pivotal.

In this project, the Air Traffic Passenger data, a dataset published by the City and County of San Francisco, will be reviewed.


Overview

The dataset contains over 15,000 entries and covers air traffic data from 2005 to 2016 for planes flying in and out of San Francisco International Airport. The majority of the variables are strings holding data about specific flights. It also holds information on passenger numbers and the year and month of the flights.


Given the type of data, there are very few null values, with the exception of the IATA code columns, which each have approx. 50 null rows. Reviewing the other columns shows that these are simply codes for the data in "Operating Airline" and "Published Airline". Consequently, the IATA code columns could be dropped without negatively impacting the overall structure of the dataset.
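The null check and the column drop can be sketched in pandas as below. This is only an illustration on a toy frame: the column names are assumptions based on the description above, and the rows are invented rather than taken from the real file.

```python
import pandas as pd

# Toy stand-in for the dataset; column names are assumed, values are invented.
df = pd.DataFrame({
    "Operating Airline": ["United Airlines", "Air Canada", "Cargo Carrier A"],
    "Operating Airline IATA Code": ["UA", "AC", None],
    "Published Airline": ["United Airlines", "Air Canada", "Cargo Carrier A"],
    "Published Airline IATA Code": ["UA", "AC", None],
    "Passenger Count": [27000, 35000, 12],
})

# Count missing values per column.
null_counts = df.isnull().sum()

# Drop the code columns, since the airline name columns carry the same information.
iata_cols = [col for col in df.columns if "IATA" in col]
df_clean = df.drop(columns=iata_cols)

print(null_counts)
print(df_clean.columns.tolist())
```

Dropping the columns, rather than dropping the rows with nulls, keeps every flight record in the analysis.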

Another feature to note is that the Month column is a string object. This may cause issues in some instances, as calendar order begins with January whilst alphabetical order begins with April. Therefore it will need to be reformatted. Converting the year and month to a datetime will also enable time series analysis of the data.
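One way to do this conversion is sketched below. It assumes separate "Year" and "Month" columns with full English month names, which may not match the real file exactly. An explicit month list is used so the result does not depend on the machine's locale.

```python
import pandas as pd

# Toy rows; column names and the full month names are assumptions.
df = pd.DataFrame({
    "Year": [2006, 2006, 2006],
    "Month": ["April", "January", "March"],
    "Passenger Count": [100, 200, 300],
})

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Map each month name to its calendar number (January -> 1, ...).
month_number = df["Month"].map({name: i for i, name in enumerate(MONTHS, start=1)})

# Assemble a datetime from year/month/day parts, fixing the day to the 1st.
df["Date"] = pd.to_datetime(
    pd.DataFrame({"year": df["Year"], "month": month_number, "day": 1})
)

# Sorting on the new column now gives calendar order, not alphabetical order.
df = df.sort_values("Date").reset_index(drop=True)
print(df)
```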


International

One of the fundamental features of air traffic is whether a flight is international or domestic. Separating the dataset by this feature reveals that it is skewed in favour of international flights, which account for 61% of entries. This is understandable as the airport in question focuses on international flights.


There are 54 airlines servicing international routes, in contrast to only 36 airlines servicing domestic routes. Furthermore, as the data is from San Francisco International Airport, there is an over-representation of US and North American airlines such as United Airlines and Air Canada.
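Both figures in this section (the share of entries and the number of distinct airlines) come from simple counts. A rough sketch on toy data, where the "GEO Summary" column name is an assumption:

```python
import pandas as pd

# Toy rows; the column names are assumed and the rows are invented.
df = pd.DataFrame({
    "GEO Summary": ["International", "International", "International",
                    "Domestic", "Domestic"],
    "Operating Airline": ["Air Canada", "United Airlines", "Air Canada",
                          "United Airlines", "United Airlines"],
})

# Share of entries that are international vs domestic.
share = df["GEO Summary"].value_counts(normalize=True)

# Number of distinct airlines serving each type of route.
airlines_per_type = df.groupby("GEO Summary")["Operating Airline"].nunique()

print(share)
print(airlines_per_type)
```

Note that an airline flying both kinds of route is counted once in each group, so the two counts need not add up to the total number of airlines.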


For reference, United Airlines appears twice, with one entry stating "pre 7th Jan 2013". This is due to the merger between United Airlines and Continental. The merger is likely to have impacted both operations, so merging the two entries together would distort the observations (Encyclopaedia Britannica, 2021).

Geo Region

As the name suggests, domestic flights are flights made within the US. International flights cover a number of regions, particularly Asia.

In addition, Canada and Mexico are the only values in this column where the name of a country is used rather than a region. This is likely to avoid the confusion that a term such as "North America" would cause, as the US would be excluded from it. That being said, temporarily merging the data for Canada and Mexico shows that a combined 2533 entries record flights for these destinations, making them the second most popular destination for flights in and out of San Francisco International Airport.
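The temporary merge can be done without altering the original column, for example as below. The "GEO Region" column name, the "US" label for domestic rows and the combined "Canada/Mexico" label are all assumptions made for this sketch.

```python
import pandas as pd

# Toy rows; column name and region labels are assumed.
df = pd.DataFrame({
    "GEO Region": ["US", "US", "US", "Asia", "Asia", "Asia",
                   "Canada", "Mexico", "Europe"],
})

# Relabel into a separate Series so the original column is left untouched.
merged_region = df["GEO Region"].replace(
    {"Canada": "Canada/Mexico", "Mexico": "Canada/Mexico"}
)

# Rank the international regions by number of entries.
counts = merged_region[merged_region != "US"].value_counts()
print(counts)
```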


Price

One may assume that ticket prices will vary between international and domestic flights, with international flights being more expensive due to the distance travelled and the additional regulations involved in flying over foreign airspace.


This is reflected in the dataset, as international flights dominate the "other" price category. However, interestingly, there appear to be expensive domestic flights too.


Further analysis of international flights indicates that these other prices predominately belong to neighbouring nations such as Canada and Mexico.
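Breakdowns like the ones described in this section can be produced with a cross-tabulation. The sketch below uses invented rows and assumed column names ("Price Category Code" with the values "Low Fare" and "Other"), so the counts carry no meaning beyond showing the shape of the output.

```python
import pandas as pd

# Toy rows; column names and category labels are assumptions.
df = pd.DataFrame({
    "GEO Summary": ["International", "International", "International",
                    "Domestic", "Domestic", "Domestic"],
    "GEO Region": ["Canada", "Mexico", "Asia", "US", "US", "US"],
    "Price Category Code": ["Low Fare", "Other", "Other",
                            "Low Fare", "Low Fare", "Other"],
})

# Entries per price category, split by international vs domestic.
price_table = pd.crosstab(df["GEO Summary"], df["Price Category Code"])

# The same breakdown by region, for international flights only.
intl = df[df["GEO Summary"] == "International"]
region_table = pd.crosstab(intl["GEO Region"], intl["Price Category Code"])

print(price_table)
print(region_table)
```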

Terminal

Typically, to ensure the smooth running of the airport, set terminals would cater for specific flights and airlines. This is mirrored in this dataset as international flights overwhelmingly used the international terminal whilst domestic flights used terminal 1 and terminal 3 more than international flights.

Other Terminals

One aspect revealed in the above examination of terminals is that some flights used what is classed as "other" terminals. Examining this data reveals that the following airlines used the "other" terminal.


The airlines in this list are predominantly cargo airlines. These specialise in carrying goods and therefore accommodate a limited number of passengers. In this dataset, the maximum number of passengers travelling through such "other" terminals is 65. The preparation of these flights for arrival and departure also differs from that of commercial flights, hence their separation from commercial ones.
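Listing the airlines and the maximum passenger count for the "other" terminal is a simple filter. In this sketch the "Terminal" column, the "Other" label and the carrier names are placeholders, not values taken from the real data.

```python
import pandas as pd

# Toy rows; carrier names are hypothetical and column names are assumed.
df = pd.DataFrame({
    "Operating Airline": ["United Airlines", "Cargo Carrier A",
                          "Cargo Carrier B", "Cargo Carrier A"],
    "Terminal": ["Terminal 3", "Other", "Other", "Other"],
    "Passenger Count": [30000, 12, 65, 4],
})

# Keep only the rows recorded against the "other" terminal.
other = df[df["Terminal"] == "Other"]

other_airlines = sorted(other["Operating Airline"].unique())
max_other_passengers = other["Passenger Count"].max()

print(other_airlines)
print(max_other_passengers)
```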


Passenger Count

Prior to exploring the passenger count, it may be wise to exclude the cargo flights, as those travelling on them are unlikely to be commercial passengers. Following this, the passenger figures for international and domestic commercial flights can be examined. Although the majority of the flights operating in and out of the airport are international flights, in terms of passenger numbers, domestic flights dominate.
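The contrast between entry counts and passenger totals can be sketched as follows. It assumes that cargo flights are identified by the "Other" terminal label, which is one possible way to exclude them and not necessarily the one used in this project. The rows are invented.

```python
import pandas as pd

# Toy rows; column names and labels are assumptions.
df = pd.DataFrame({
    "GEO Summary": ["International", "International", "International",
                    "International", "Domestic", "Domestic"],
    "Terminal": ["International", "International", "International",
                 "Other", "Terminal 1", "Terminal 3"],
    "Passenger Count": [10000, 12000, 9000, 20, 40000, 35000],
})

# Exclude the cargo flights (assumed here to be the "Other" terminal rows).
commercial = df[df["Terminal"] != "Other"]

# Number of entries vs total passengers for each type of flight.
entries = commercial["GEO Summary"].value_counts()
passengers = commercial.groupby("GEO Summary")["Passenger Count"].sum()

print(entries)
print(passengers)
```

In the toy data, as in the observation above, international flights have more entries while domestic flights carry more passengers.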


Over time, the passenger figures for domestic and international flights are as follows. The 2005 and 2016 data are limited and do not cover a full 12-month period. As such, these years have been excluded from the graph.

Splitting the international and domestic data indicates some similarities in the characteristics of passenger numbers.
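A sketch of how the monthly totals could be built and the partial years dropped is given below. It assumes a "Date" column has already been derived from the year and month. The date range and passenger values are invented, with incomplete first and last years to mimic the situation described.

```python
import pandas as pd

# Toy data: one row per month and flight type, with partial first and last years.
dates = pd.date_range("2005-07-01", "2016-03-01", freq="MS")
df = pd.DataFrame({
    "Date": list(dates) * 2,
    "GEO Summary": ["Domestic"] * len(dates) + ["International"] * len(dates),
    "Passenger Count": [2] * len(dates) + [1] * len(dates),
})

# Monthly passenger totals, one column per flight type.
monthly = df.groupby(["Date", "GEO Summary"])["Passenger Count"].sum().unstack()

# Keep only the years that have all 12 months present.
months_per_year = monthly.groupby(monthly.index.year).size()
full_years = months_per_year[months_per_year == 12].index
monthly_full = monthly[monthly.index.year.isin(full_years)]

print(monthly_full.index.min(), monthly_full.index.max())
```

Filtering on the month count, rather than hard-coding 2005 and 2016, means the same code still works if the dataset is later extended.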

Moreover, looking deeper it is apparent that the passenger count is subject to seasonal fluctuations.

This aspect can be further explored by decomposing the time series.
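For readers unfamiliar with decomposition, the sketch below splits a monthly series into trend, seasonal and residual parts. It is a plain pandas version of a classical additive decomposition on a synthetic series, not necessarily the method used in this project. In practice a library routine such as seasonal_decompose from statsmodels is the more common choice.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: a linear trend plus a cycle that peaks in July.
idx = pd.date_range("2006-01-01", periods=120, freq="MS")
t = np.arange(120)
month = idx.month.to_numpy()
cycle = -10 * np.cos(2 * np.pi * (month - 1) / 12)
series = pd.Series(1000 + 2 * t + cycle, index=idx)

# Trend: centred 2x12 moving average (undefined for the first and last 6 months).
trend = series.rolling(12).mean().rolling(2).mean().shift(-6)

# Seasonal: average detrended value per calendar month, centred on zero.
detrended = series - trend
month_effect = detrended.groupby(detrended.index.month).mean()
month_effect = month_effect - month_effect.mean()
seasonal = pd.Series(month_effect.loc[idx.month].to_numpy(), index=idx)

# Residual: whatever the trend and seasonal parts do not explain.
residual = series - trend - seasonal

print(month_effect.round(2))
```

Because the synthetic series contains no noise, the residual here is effectively zero. On real passenger data it would show the irregular month-to-month variation.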


Decomposition of passenger numbers for domestic flights

For domestic flights, the passenger numbers have some patterns.

  • Passenger numbers decrease in the colder months and peak in the hotter months.

  • Over time, numbers have shown a steady increase.

  • There is some noise in the data but the level remains relatively stable throughout the time period.

Decomposition of passenger numbers for international flights

Likewise, some factors are similar in the passenger numbers for international flights. However, there are also some variations.

  • Passenger numbers decrease in the colder months and peak in the hotter months.

  • Between 2008 and 2010, passenger numbers decreased, but after 2010 they began to increase once more.

  • There is some noise in the data but the level remains relatively stable throughout the time period.


The dip in passenger numbers between 2008 and 2010 could be linked to the global economic recession (Pettinger, 2019). If so, it appears that international flights are more susceptible to economic conditions. This in turn poses an interesting challenge for the airport, as international flights are more profitable but also more sensitive to external factors.


Clusters

The cluster and heatmap mirror the findings of the time series. Typically, fewer passengers travelled through the airport in the winter months than in the spring and summer months. This is a general trend in air traffic and corresponds to the supporting information provided for this dataset.
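The table behind a year-by-month heatmap can be prepared with a pivot, as in the rough sketch below. The column names are assumed and the rows invented. Plotting is left out, though the resulting table could be passed to a plotting function such as seaborn's heatmap.

```python
import pandas as pd

# Toy rows; column names are assumptions.
df = pd.DataFrame({
    "Year": [2014, 2014, 2014, 2015, 2015, 2015],
    "Month": ["January", "July", "July", "January", "July", "January"],
    "Passenger Count": [100, 300, 50, 120, 320, 30],
})

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Rows are years, columns are months, cells are total passengers.
heat = df.pivot_table(index="Year", columns="Month",
                      values="Passenger Count", aggfunc="sum")

# Put the month columns into calendar order rather than alphabetical order.
heat = heat.reindex(columns=[m for m in MONTHS if m in heat.columns])

print(heat)
```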



Activity Type Code

Another feature which the dataset provides is information on the type of journey the flights were conducting as they travelled through the airport. For both domestic and international flights, the data is relatively similar (i.e. there were more deplaned and enplaned flights than transit). However, interestingly, there were proportionally fewer transits for international flights than domestic flights. This may be due to the location of the airport relative to the flight routes.
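Comparing proportions rather than raw counts matters here because the two groups differ in size. A minimal sketch with invented rows, where the "Activity Type Code" column and its labels are assumed:

```python
import pandas as pd

# Toy rows: 5 domestic and 10 international entries; labels are assumptions.
df = pd.DataFrame({
    "GEO Summary": ["Domestic"] * 5 + ["International"] * 10,
    "Activity Type Code": (
        ["Deplaned", "Deplaned", "Enplaned", "Enplaned", "Transit"]
        + ["Deplaned"] * 5 + ["Enplaned"] * 4 + ["Transit"]
    ),
})

# normalize="index" makes each row sum to 1, giving within-group proportions.
activity_share = pd.crosstab(
    df["GEO Summary"], df["Activity Type Code"], normalize="index"
)

print(activity_share)
```

In the toy data both groups have a single transit entry, yet the transit share is 20% for domestic and 10% for international.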

For domestic flights, enplaned and deplaned follow a generally similar trend. Meanwhile, transit flights remain low throughout the dataset.

As for international flights, there seem to be some fluctuations between the deplaned and enplaned flights. One possible reason is cabin crew requiring rest before departure, though further investigation is required to verify this. As for transit, it has seen a gradual decrease over time, which may be due to improvements in engines and aircraft.


In terms of destinations, it is clear that transit flights are more prone to occur for specific regions. Curiously, this includes Canada and Mexico, which are closer to the US than destinations such as the Middle East, which did not have any transit flights.

In terms of price, international flights had more “other” prices as noted before. However, as the below breakdown shows, transit flights were all under “other”. This is likely to be an indication that transit flights are long haul and therefore more expensive.

For international flights, transit mainly occurred in the "other" price category.


For domestic flights, the data is more diverse: transit flights occurred in both the low fare and "other" categories, so it is possible to secure a low fare ticket for a transit flight.

That being said, "other" was more likely than low fare.









Conclusion

Airports are complex centres with various factors needing to operate smoothly to ensure efficient running. Depending on its location, an airport can cater for different types of flights, with San Francisco International Airport handling a high number of international flights, particularly to and from Asia. International flights are also likely to be more expensive and therefore should yield more profit per passenger for the airport.


However, the majority of the passengers travelling through the airport use domestic flights. These passenger numbers also show an upward trend. Domestic flights also appear less sensitive to economic changes and therefore should be a more stable source of income for the airport.


Regardless, international and domestic flights both have a seasonal trend with summer months proving to be the busiest for the airport in terms of passenger numbers.

The airport also caters for cargo flights, which are handled separately from commercial flights through the "other" terminals.


There are a few aspects not covered in this project which may be enlightening. Firstly, how do the observations made in this project differ for a different airport? Secondly, how have Covid-19 and the conflict in Russia affected the dataset? For reference, following the conflict, some routes to and from Asia have needed to be redirected to avoid the region (Harper, 2022).

“The journey of a thousand miles begins with a single step.” Lao Tzu

Please see GitHub for the Python code used in this project.
