Understanding planet formation processes through analysis of data from single- and multi-exoplanetary systems


I have always been excited to explore the world we live in, and I am fascinated by new findings in astronomy. In the last few years, many exoplanetary systems have been discovered. While much work has focused on finding other habitable Earth-like planets, a lot of exoplanetary data has still not been fully analysed. In my project, I used data from the NASA exoplanet archive to study various aspects of planetary system formation. In particular, I examined whether the distributions of planet mass, radius, orbital period, stellar temperature, and metallicity are statistically different in single- and multi-planetary systems by analysing and comparing the cumulative frequency distributions of these parameters. Planet properties (mass, radius, and period) and stellar metallicity appear significantly different for single- and multi-planetary systems. The orbital period ratios of planets in multi-planetary systems were also evaluated to look for common types of two-body resonances. I observe robust resonances at ratios 3:2, 5:3 and 2:1, with smaller but significant resonances at 4:3, 5:2 and 3:1 ratios. My findings can help us better understand planet formation and evolution mechanisms, and they also show that the solar system is a very atypical multi-planetary system. In future, I plan to perform simulations to figure out the initial conditions that might lead to the formation of such systems. I hope that data from new missions like TESS will help me find solutions to unanswered questions that I have come across during the course of my project.

Question / Proposal


The discovery of exoplanets has completely changed our perception of the universe we live in. Thanks to various new discoveries we now know that there are more planets than stars in our galaxy. Exoplanet research is one of the fastest-moving and most exciting areas for professional astronomers. An immense amount of publicly available data has been generated that has not been fully analysed! For my project I decided to examine this data to answer the following question:

Can a statistical analysis of various parameters of exoplanetary systems help us understand how our own solar system formed and evolved? In particular,

  • How are single planetary systems different from multiple planetary systems?
  • Are the distributions of planetary mass, radius, and orbital period similar in both types of systems?
  • Do single planetary systems evolve differently than multi-planetary systems?
  • Are the differences real, or are they artefacts of our detection techniques or of insufficient data?
  • What kind of resonances exist in multi-planetary systems? Are they similar to those seen in our solar system?



Analysis of exoplanetary data, specifically the comparison of single and multiple planetary systems, will provide us with information about how planetary formation and evolution occurs, and we can apply this information to our own solar system to better understand its formation.

A closer look at single and multi-planetary systems, based on observation and analysis of trends in cumulative frequency distributions, histograms, and statistical measures of various planetary/stellar parameters, might reveal differences which have not been studied before.


Research in exoplanets is one of the most active and expanding areas in astronomy today. While the first discoveries were made using ground-based telescopes, a lot of the current information on exoplanets has been obtained thanks to space telescopes like NASA’s Kepler mission, which was in operation from 2009 until its retirement in October 2018. The exoplanetary data is not only analysed by professional astronomers, but is also publicly accessible at two online databases: the NASA exoplanet archive and the Exoplanet Orbit Database. One of the major motivations for this research is the search for exoplanets which may lie in the “habitable zone”, where temperatures would allow liquid water to exist. However, a lot of exoplanetary data has still not been fully analysed. Detailed analysis of exoplanetary systems can help in understanding the origin and evolution of planetary systems, including our solar system.

I started by reading standard textbook chapters on exoplanets (e.g. books by Seager and by Chaisson and McMillan). My mentor taught me how to use the analysis tools on the online database and some basic Python scripting. I began my research by analysing the distribution of the observed exoplanets in terms of their mass, planet radius, separation from the parent star, orbital period, density, etc. to see if patterns could be found, taking help of published work (e.g. Winn2014, Bashi2017). I looked for correlations between the properties of the parent star and the orbiting planet(s). I then narrowed my study to compare how single-planetary systems are different from multi-planetary systems, and to analyse resonances in multi-planetary systems.

Exoplanetary parameters have been studied in detail, but a statistical comparison of single- and multi-planetary systems has not been reported before. I decided to focus on this particular area when I noticed that the NASA database had an option to “filter” systems with >1 planet, and wondered what the differences would be.

There has been earlier research on two-body resonances in multi-planetary systems [Popova2015, Aschwanden2017]. However, the newer 2016 Kepler data release more than doubled the dataset size. I report previously unseen resonances and compare them to resonances in the Solar System.

Fulton et al. have shown that a gap exists in the radius distribution between “Super-Earths” and “Mini-Neptunes” (terms that were previously used interchangeably), suggesting that they belong to two different planet families. I show that this gap exists in both single- and multi-planetary systems.

My research will help us better understand the formation and evolution of planetary systems. We cannot travel back in time to see how our own Solar System evolved; however, observing different exoplanetary systems at various points in their evolution can provide important insights. This research provides clues to the answers to fundamental questions we ask ourselves: how did the Earth come to exist, and what happens next? With TESS already observing thousands of stars and on the verge of detecting 200+ planet candidates, and follow-up missions like JWST, TMT etc. helping to better characterise these planets, we will gain an improved understanding of the difference in their formation mechanisms.

Method / Testing and Redesign

Data for single- and multi-planetary systems were obtained from the NASA exoplanet archive (Data Release 22 (2018)) and the Exoplanet Orbit Database. Python scripts were used to plot cumulative frequency distributions, resonance histograms, and perform statistical comparisons. For the resonance calculations, orbital period ratios were determined using spreadsheets. All calculations were done on my laptop computer.

Using a two-sample Kolmogorov-Smirnov (K-S) test, I aimed to determine whether the underlying distributions of planetary and stellar parameters are different in single- vs. multi-planetary systems. The planetary parameters I analysed were mass, radius, and orbital period; the stellar parameters were effective temperature and metallicity. These were chosen because they are the fundamental parameters from which other information (e.g. density) can be derived.

The K-S test is used because it is a nonparametric hypothesis test (it does not assume normal distributions) that assesses whether two datasets are consistent with being drawn from the same parent population. This is particularly valuable in astronomy, as it does not require one to know the mathematical distribution of observed properties of planets, stars, etc. If the p-value returned by the K-S test is smaller than 0.05, the hypothesis that the two samples come from the same underlying distribution can be rejected at the 5% significance level (95% confidence). I also compared the data using the Anderson-Darling test, which gave similar conclusions.
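A minimal sketch of how such a two-sample K-S comparison could be run in Python is given here. It assumes SciPy is available and uses `scipy.stats.ks_2samp`; the helper name `compare_distributions` and the metallicity values are made up for illustration and are not the project's script or data.

```python
from scipy.stats import ks_2samp


def compare_distributions(single_values, multi_values, alpha=0.05):
    """Run a two-sample K-S test and report whether p falls below alpha."""
    result = ks_2samp(single_values, multi_values)
    return float(result.statistic), float(result.pvalue), bool(result.pvalue < alpha)


# Made-up [Fe/H] values, for illustration only (not archive data).
single_feh = [0.05, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.30, 0.35]
multi_feh = [-0.40, -0.35, -0.30, -0.25, -0.20, -0.15, -0.12, -0.10, -0.08, -0.05]

stat, p_value, differs = compare_distributions(single_feh, multi_feh)
print(f"D = {stat:.2f}, p = {p_value:.2e}, different at 5% level: {differs}")
```

The returned statistic D is the largest vertical gap between the two cumulative frequency curves, and the p-value is compared against the 0.05 threshold described above. SciPy also provides `scipy.stats.anderson_ksamp` for a k-sample Anderson-Darling comparison.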


(code used for metallicity KS test)

For some of my initial work, I also made histograms and scatter plots using the in-built plotting tools on the online databases.

I encountered a few issues while analysing the data:

a) While writing Python scripts for the K-S tests, initially I did not realize that for stellar parameters, the same value was read more than once in multi-planetary systems (once per planet), thereby biasing the results. I fixed this error by dropping the duplicates in the host-name column, but then found that planetary parameters, which are different for each planet in a multi-planetary system, were not read correctly, since dropping duplicate host names keeps only one planet per system. The easiest solution was to make separate files and write the code separately for each parameter.

b) While calculating the multi-planetary resonances, I noticed that the planet periods were not in ascending or descending order, since planets are named in order of their discovery and in many cases a shorter-period planet had been discovered after a longer-period planet. To fix this, I had to manually sort periods for each system from largest to smallest.
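Both issues can be illustrated with a short pandas sketch. The column names (`hostname`, `orbital_period`, `st_metallicity`) and the six-row table are assumptions made up for this example; the actual archive column names may differ, and this is not the script used in the project.

```python
import pandas as pd

# Tiny made-up table: one row per planet (column names are assumptions).
planets = pd.DataFrame({
    "hostname": ["A", "B", "B", "C", "C", "C"],
    "orbital_period": [3.5, 20.0, 10.0, 45.0, 5.0, 15.0],   # days
    "st_metallicity": [0.10, -0.20, -0.20, 0.05, 0.05, 0.05],
})

# Count the known planets around each host and flag multi-planetary systems.
planets["n_planets"] = planets.groupby("hostname")["hostname"].transform("count")
planets["is_multi"] = planets["n_planets"] > 1

# Issue (a): stellar parameters should be counted once per star,
# so keep a single row per host for the stellar comparison only.
stars = planets.drop_duplicates(subset="hostname")

# Planetary parameters keep every row. Issue (b): sort the periods
# within each system from largest to smallest instead of by hand.
sorted_planets = planets.sort_values(
    ["hostname", "orbital_period"], ascending=[True, False]
)

print(stars[["hostname", "st_metallicity", "is_multi"]])
print(sorted_planets[["hostname", "orbital_period"]])
```

Keeping two views of the same table (one row per star, one row per planet) avoids the need for separate input files per parameter, although separate files work equally well.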


(code for histogram of resonances)

My mentor independently rechecked my calculations to ensure they were correct and the conclusions meaningful.


I checked if the distributions of planet mass, radius, orbital period, stellar temperature, and metallicity are statistically different in single- and multi-planetary systems by plotting and comparing the cumulative frequency distributions of these parameters and performing a two-sample Kolmogorov-Smirnov test. For planet mass, radius, and period, and stellar metallicity, the p-value returned by the K-S test indicated that the underlying distributions were statistically significantly different.
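For reference, a cumulative frequency distribution and the K-S statistic built from it can be computed directly with NumPy. This is a sketch under the assumption that each parameter is available as a simple list of numbers; the function names are hypothetical.

```python
import numpy as np


def cumulative_distribution(values):
    """Return sorted values and the fraction of the sample at or below each one."""
    x = np.sort(np.asarray(values, dtype=float))
    fraction = np.arange(1, x.size + 1) / x.size
    return x, fraction


def ks_statistic(sample_a, sample_b):
    """Largest vertical distance between the two cumulative distributions."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    grid = np.concatenate([a, b])  # the gap can only change at observed values
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Made-up numbers, for illustration only.
x, fraction = cumulative_distribution([3, 1, 2, 4])
print(x, fraction)
print(ks_statistic([1, 2, 3], [4, 5, 6]))
```

Plotting `fraction` against `x` for the single and multi samples on the same axes gives the curves being compared; the K-S statistic is the maximum gap between them.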

In the graph comparing planet mass, the cumulative frequency distributions look clearly different. In single-planetary systems 60% of planets are heavier than Jupiter, whereas in multi-planetary systems only 30% have a mass greater than Jupiter's. The lack of low-mass single planets could arise because a single large planet may preferentially accrete material from the protoplanetary disc, not allowing multiple planets to form. However, this could also be an observational limitation, since detecting an isolated small planet is difficult (unlike in multi-planetary systems, where transit-timing variations can reveal small planets).

Comparing planet radii, the frequency distributions show that single-planetary systems have two distinct families, with a clear subset of giant planets, while multi-planetary systems have mostly small planets with very few planets larger than Neptune. An interesting observation is that the gap in the radius distribution between super-Earths and mini-Neptunes persists in both single and multi-planetary systems! This suggests that the gap is intrinsic to the planet formation process rather than the natal environment.

Frequency distributions for the orbital period show statistically significant differences between single and multi-planetary systems. Most single planets orbit their host star with periods of less than 10 days, while the average period in multi-planetary systems is larger.

Comparing host star metallicity again brings out statistically significant differences. While single planets are found around stars in a narrow metallicity range around the solar value (zero), 60% of planets in multi-planetary systems orbit stars with sub-solar metallicity (less than zero). Larger stellar metallicity would imply larger metallicity in the protoplanetary disc and hence more solid material, making it easier to form the 10-15 Earth-mass cores needed for giant planets, which may then leave too little material to form multiple planets.

The cumulative frequency distributions of stellar temperature show no significant statistical differences. This may just be due to Kepler preferentially looking at Sun-like stars to increase the likelihood of finding Earth-like planets.

An analysis of pairs of orbital periods in 2- to 8-planet systems shows robust 2:1 and 3:2 resonances, similar to the common resonances seen in the solar system, as well as other resonances (5:2, 5:3 etc.). My analysis is based on a substantially larger dataset compared to earlier works on exoplanetary system resonances. In some cases the resonances are slightly shifted from exact integer ratios. Most probably these systems are evolving into or out of the resonance condition, or are affected by perturbations from other bodies.
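One way to turn sorted periods into resonance labels is sketched here. It assumes that only adjacent planet pairs are compared and that a ratio counts as "near" a resonance when it lies within 2% of the exact value; both choices, along with the function names and the example periods, are assumptions for illustration rather than the method used in the project.

```python
from fractions import Fraction

# Resonances named in the text; the 2% tolerance below is an assumption.
RESONANCES = [Fraction(4, 3), Fraction(3, 2), Fraction(5, 3),
              Fraction(2, 1), Fraction(5, 2), Fraction(3, 1)]


def adjacent_period_ratios(periods):
    """Sort periods from largest to smallest and divide each by the next one."""
    ordered = sorted(periods, reverse=True)
    return [outer / inner for outer, inner in zip(ordered, ordered[1:])]


def nearest_resonance(ratio, tolerance=0.02):
    """Return (label, fractional offset) for the closest listed resonance.

    The label is None when the ratio is further than `tolerance` from it.
    """
    best = min(RESONANCES, key=lambda r: abs(ratio - float(r)))
    offset = (ratio - float(best)) / float(best)
    label = f"{best.numerator}:{best.denominator}" if abs(offset) <= tolerance else None
    return label, offset


# Made-up periods in days for a three-planet system, in discovery order.
periods = [20.1, 10.0, 30.3]
labels = [nearest_resonance(r)[0] for r in adjacent_period_ratios(periods)]
print(labels)
```

The fractional offset returned alongside the label measures how far a pair sits from the exact integer ratio, which is the kind of small shift described above.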

My results show our Solar System to be an extremely unusual planetary system that does not follow the typical trends seen in exoplanetary systems. These results can help better understand the formation and evolution of the Solar System and refine search techniques for future exoplanet missions.


On comparing the various properties of single- and multi-planetary systems, the results of my project show that single- and multi-exoplanetary systems have vastly different stellar and planetary properties, be it the mass and radius of the planet, or the metallicity of the star.

Comparing our Solar System to exoplanetary systems, I find that the Solar System is very different from a typical multi-planetary system. Most multi-planetary systems have planets with sizes between 1 and 4 Earth radii, but there is no such planet in the Solar System. Most multi-planetary systems are also compact, with planets close to the host star and orbital periods of less than 100 days, unlike our Solar System. Furthermore, multi-planetary systems tend to have only smaller, Earth- to Neptune-sized planets, whereas single-planetary systems tend to have more massive Jupiter-sized planets. In contrast, our Solar System has planets of greatly varying radii. The small-integer-ratio resonances found while examining exoplanet orbital periods also point to compact planetary systems.

The results of my analysis conclusively answered the questions I had asked initially, and this research will help us gain a better understanding of how planetary systems form and evolve.

The main limitation in my results is that since Kepler was gathering data mainly from Sun-like stars in a select, specific region of the sky, the data collected could be inherently biased. This may explain why the comparison of effective temperature of host stars did not show differences in single- and multi-planetary systems. Further, there could be observational biases that have not been taken into account –  e.g. planets with orbital periods much greater than one year may not have been detected yet, as at least three transits need to be detected for consideration as a planet candidate. Hence some single-planetary systems may actually be multi-planetary systems, where other planets have not yet been detected.

In future, I would like to perform dynamical simulations, if possible, to figure out the initial conditions that might lead to the formation of such systems. Follow-up observations of these planetary systems could also be proposed to better characterize their host stars and explore in depth various stellar properties like chemical composition (carbon-to-oxygen or magnesium-to-iron ratios) to look for differences. These kinds of studies can help us better understand some of the processes happening during the formation and evolution of planetary systems. Thinking beyond astronomy, some statistical tools used for analysing exoplanetary systems may even have wider applicability in other areas, e.g. evaluating learning outcomes in single-large vs. multiple-small schools.

I learnt a lot during the course of the project, and also came up with several questions that I still need to answer. For example, I observed that high-eccentricity orbits seem to become more prevalent as planet mass increases, but there does not seem to be an obvious explanation.

I would like to find answers to some of these questions as I believe it will help us to better understand the amazing universe we live in.

About me

I’m a sophomore at the GD Somani Memorial School in Mumbai. My favourite subjects are physics and chemistry. Beyond science, I love reading, quizzing, playing football, solving cryptic crosswords and training in Hindustani classical vocal music.

Living in a light-polluted metropolis, I’ve always been amazed at the starry night sky from a dark place. As a young child I vividly remember tagging along with my father on stargazing trips. At one of these events, while hunting for clusters and double-stars, a chance conversation with my eventual mentor about his field of research so intrigued me that I read about exoplanets the moment I reached home.

I'm inspired by many women in STEM, from textbook examples like Marie Curie to today's researchers. I was completely awed by Jocelyn Bell-Burnell when I had the opportunity to attend a talk of hers, and I admire Sara Seager for her groundbreaking research and for her immense strength of character.

I don’t really know what I want to be when I grow up! I’ve met so many people from different fields who have opened my eyes to the infinite possibilities around me. I do know that I want to motivate and inspire young women in STEM fields, especially since in my country only 14% of researchers are female.

Winning the Google Science Fair would be an incredible honour. It would enable me to convey my passion for research to a wide section of people, and hopefully inspire other young girls to pursue careers in STEM. 

Health & Safety

I did this project under the guidance of Mayank Narang, a graduate student in the Department of Astronomy and Astrophysics at the Tata Institute of Fundamental Research, Mumbai.

Contact Information:

Email: mayank.narang@tifr.res.in

Phone: +91 22 2278 2387

Since this was a theoretical project analyzing data, no specific health and safety procedures for experimentation were required. All the calculations were done on a personal laptop computer.

Bibliography, references, and acknowledgements


“Exoplanets”, by Sara Seager, Univ. of Arizona Press (2011)

“Welcome to the Universe: An Astrophysical Tour”, by Neil deGrasse Tyson, et al., Princeton Univ. Press (2016)

Chapter 15, “Exoplanets”, in “Astronomy Today”, 8th ed., by Eric Chaisson and Steve McMillan, Pearson (2013)



Web resources






Journal and arXiv articles

B.J. Fulton, E.A. Petigura et al., “The California-Kepler Survey. III. A Gap in the Radius Distribution of Small Planets”, Astronomical Journal 154, 109 (2017) DOI: 10.3847/1538-3881/aa80eb

D. Bashi, R. Helled, S. Zucker and C. Mordasini, “Two empirical regimes of the planetary mass-radius relation”, Astronomy & Astrophysics 604, A83 (2017) DOI: 10.1051/0004-6361/201629922

J.N. Winn, D.C. Fabrycky, “The Occurrence and Architecture of Exoplanetary Systems”, Annual Review of Astronomy and Astrophysics, 53, 409 (2015) DOI: 10.1146/annurev-astro-082214-122246

E.A. Popova, I.I. Shevchenko, “Orbital resonances in exoplanetary systems”, J. Phys. Conf. Series 572, 012006 (2014) DOI: 10.1088/1742-6596/572/1/012006

M.C. Ghilea, "Statistical distributions of mean motion resonances and near-resonances in multiplanetary systems", arXiv:1410.2478v3 [astro-ph.EP] (2015)

M.J. Aschwanden, F. Scholkmann, "Exoplanet Predictions Based on Harmonic Orbit Resonances", arXiv:1705.07138v1 [astro-ph.EP] (2017)

C.E. Munoz Romero, E. Kempton, "No Metallicity Correlation Associated with the Kepler Dichotomy", Astronomical Journal 155, 134 (2018) DOI: 10.3847/1538-3881/aaab5e

M. Narang, P. Manoj et al. “Properties and Occurrence Rates for Kepler Exoplanet Candidates as a Function of Host Star Metallicity from the DR25 Catalog”, Astronomical Journal 156, 221 (2018) DOI: 10.3847/1538-3881/aae391



I profusely thank Mayank Narang, a research scholar at the Tata Institute of Fundamental Research, Mumbai, for his overall guidance, patiently answering all my questions (even the silly ones!), and for teaching me how to use the database and plotting tools, and python scripting.

I also thank Prof. Manoj Puravankara for his invaluable suggestions.

I thank my father for introducing me to the wonders of the night sky and supporting my curiosity.

Official acknowledgement statements: This research has made use of 1) the Exoplanet Orbit Database and the Exoplanet Data Explorer at exoplanets.org, and 2) the NASA Exoplanet Archive, which is operated by the California Institute of Technology under contract with NASA under the Exoplanet Exploration Program.