Classifying celestial objects with ML

1 minute read

The aim of this notebook is to classify observations of astronomical objects into three categories: stars, galaxies, and quasars. To do so I use data from the Sloan Digital Sky Survey.

See on GitHub

The data

The Sloan Digital Sky Survey is a project that periodically releases space-imaging data free to the public. This project specifically uses data from the DR14 data release.

For this purpose a dedicated 2.5 m telescope was built at the Apache Point Observatory in New Mexico, USA. The telescope uses a camera of 30 CCD chips with 2048×2048 pixels each, arranged in 5 rows of 6 chips. Each row observes the sky through a different optical filter (u, g, r, i, z) at wavelengths of approximately 354, 476, 628, 769, and 925 nm.


The dataset contains the following features:

Object identity and position

  • objid = Object Identifier
  • ra = J2000 Right Ascension (r-band)
  • dec = J2000 Declination (r-band)

Filters

  • u = better of DeV/Exp magnitude fit (u band)
  • g = better of DeV/Exp magnitude fit (g band)
  • r = better of DeV/Exp magnitude fit (r band)
  • i = better of DeV/Exp magnitude fit (i band)
  • z = better of DeV/Exp magnitude fit (z band)

Acquisition metadata

  • run = Run Number
  • rerun = Rerun Number
  • camcol = Camera column
  • field = Field number

Object info

  • specobjid = Spectroscopic Object Identifier
  • class = object class (galaxy, star or quasar object)
  • redshift = Final Redshift
  • plate = plate number
  • mjd = MJD of observation
  • fiberid = fiber ID
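Before any modeling, a quick look at the table helps. Below is a minimal sketch of loading and inspecting the data with pandas; the CSV file name is an assumption and may differ from the actual export.

```python
import pandas as pd

# Load the DR14 sample; "sdss_dr14.csv" is a hypothetical file name.
df = pd.read_csv("sdss_dr14.csv")

print(df.shape)                     # rows x columns
print(df["class"].value_counts())  # distribution of STAR / GALAXY / QSO
```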

Some features are removed because they are unique identifiers or are irrelevant to the classification of astronomical objects: objid, specobjid, fiberid, mjd, plate, rerun, run

The following color channels are also removed, as they are highly correlated both with each other and with the remaining bands: r, i, z
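A minimal sketch of this preprocessing step, assuming the data is loaded as above (column names follow the feature list; the printed correlation matrix is what motivates dropping r, i and z):

```python
import pandas as pd

df = pd.read_csv("sdss_dr14.csv")  # hypothetical file name, as above

# Drop identifiers and acquisition metadata that carry no physical signal.
df = df.drop(columns=["objid", "specobjid", "fiberid", "mjd",
                      "plate", "rerun", "run"])

# Inspect pairwise correlations between the five photometric bands.
print(df[["u", "g", "r", "i", "z"]].corr())

# r, i and z correlate strongly with the other bands, so drop them.
df = df.drop(columns=["r", "i", "z"])
```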

Training and optimizing the model

The algorithm chosen for the classification task is the XGBoost classifier. It is optimized with a grid search, and the parameters of the best-performing model are: {'gamma': 1, 'learning_rate': 0.05, 'max_depth': 6, 'min_child_weight': 1, 'n_estimators': 100}
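A sketch of the training and tuning step, assuming scikit-learn and xgboost are installed; the grid values below are illustrative (they merely include the winning combination) and the 80/20 split ratio is an assumption:

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

df = pd.read_csv("sdss_dr14.csv")  # hypothetical file name, as above
df = df.drop(columns=["objid", "specobjid", "fiberid", "mjd",
                      "plate", "rerun", "run", "r", "i", "z"])

X = df.drop(columns=["class"])
y = LabelEncoder().fit_transform(df["class"])  # GALAXY/QSO/STAR -> 0/1/2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Illustrative grid containing the best-performing combination above.
param_grid = {
    "gamma": [0, 1],
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 6],
    "min_child_weight": [1, 5],
    "n_estimators": [100, 200],
}
grid = GridSearchCV(XGBClassifier(), param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)

print(grid.best_params_)
print("train:", grid.best_estimator_.score(X_train, y_train))
print("test: ", grid.best_estimator_.score(X_test, y_test))
```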

After optimization, the scores on the training and test sets are 0.998 and 0.99, respectively.