Data Discovery for Earth Science (Devise an Earth science dataset recommender to guide users to relevant datasets for studying events such as those represented in the Earth Observatory images and descriptions)
The challenge here is to help researchers, or anyone else interested in the topics of articles posted on Earth Observatory, find the datasets they are looking for by providing a separate dataset recommendation list at the bottom of each article. Through our service, general readers can also learn about the topics discussed in the articles in more depth.
We developed a built-in service that NASA can implement on its Earth Observatory site. When used on a webpage, it searches through the article text and extracts the topics and keywords that a researcher would use in their research. For the search, the program uses Elasticsearch and scores the keywords with TF-IDF (term frequency-inverse document frequency). The 10 highest-scoring keywords are extracted from the text.
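The TF-IDF scoring step can be sketched in plain Python. This is a minimal illustration, assuming the article is scored against a background corpus of other articles; it does not call Elasticsearch (whose own relevance scoring uses BM25 by default in recent versions), and the tokenizer, the small stopword list, and the `top_keywords` helper are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real deployment would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "on", "to", "is", "are", "for",
             "with", "as", "by", "that", "this", "it", "from", "at", "was", "were", "over"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_keywords(article, corpus, k=10):
    """Rank the article's terms by TF-IDF against a background corpus of other articles."""
    docs = [set(tokenize(doc)) for doc in corpus]
    tokens = tokenize(article)
    tf = Counter(tokens)
    n_docs = len(docs) + 1  # count the article itself as a document
    scores = {}
    for term, count in tf.items():
        df = 1 + sum(term in doc for doc in docs)  # the article always contains the term
        idf = math.log(n_docs / df) + 1.0  # +1 keeps terms found everywhere from scoring zero
        scores[term] = (count / len(tokens)) * idf
    # Highest score first; ties broken alphabetically so the output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


article = "Wildfire smoke spread over the Pacific. Satellites tracked the wildfire smoke plume."
corpus = ["Smoke from fires in Canada reached New York.",
          "Dust from the Sahara crossed the Atlantic."]
print(top_keywords(article, corpus, k=3))  # ['wildfire', 'smoke', 'pacific']
```

Both "wildfire" and "smoke" occur twice in the article, but "smoke" also appears in the background corpus, so its inverse document frequency is lower and it ranks below "wildfire".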
When scoring the keywords, the program also searches other web content, such as the Nature and Science news journals, to collect recently raised topics and match them against the keywords generated from the Earth Observatory article. Keywords with the most matches receive higher scores.
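The trend-matching step could look like the sketch below. It assumes the recent headlines have already been collected from sources such as Nature or Science, and it uses a hypothetical boost rule, `score * (1 + weight * matches)`, that the original design does not specify; `boost_by_trends` and `weight` are illustrative names only.

```python
import re


def boost_by_trends(scores, trending_texts, weight=0.5):
    """Re-rank keywords by how many recent headlines mention them.

    scores: {keyword: base TF-IDF score}
    trending_texts: headline strings, assumed to be fetched beforehand.
    Matching is exact and token-based, so this sketch handles single-word keywords only.
    """
    trending_tokens = [set(re.findall(r"[a-z]+", text.lower())) for text in trending_texts]
    boosted = {}
    for keyword, score in scores.items():
        matches = sum(keyword in tokens for tokens in trending_tokens)
        boosted[keyword] = score * (1 + weight * matches)  # hypothetical boost rule
    # Highest boosted score first; ties broken alphabetically.
    return sorted(boosted, key=lambda kw: (-boosted[kw], kw))


base_scores = {"smoke": 0.3, "wildfire": 0.2, "pacific": 0.1}
headlines = ["Wildfire season starts early", "A new wildfire risk model"]
print(boost_by_trends(base_scores, headlines))  # ['wildfire', 'smoke', 'pacific']
```

Here "wildfire" starts below "smoke" but appears in both headlines, so the boost moves it to the top of the list.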
The final list of keywords is then passed to Aleph (a search API) to check whether any datasets can be linked to the keywords. Using Aleph's internal search filtering, the most relevant dataset is provided for each keyword.
Each linked keyword carries both a link to download the dataset and a description of the dataset. If no dataset matches a keyword, that keyword is instead listed in a separate field within the article as a suggested search term.
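The linking-and-fallback logic can be sketched independently of any particular search backend. This does not use Aleph's real API: it assumes a hypothetical `search` callable that takes a keyword and returns candidate datasets ranked best first, and the `Dataset` record, `link_keywords` function, and example.org URLs are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Dataset:
    title: str
    download_url: str
    description: str


# Hypothetical search interface: keyword in, ranked candidate datasets out (best first).
SearchFn = Callable[[str], List[Dataset]]


def link_keywords(keywords: List[str], search: SearchFn) -> Tuple[Dict[str, Dataset], List[str]]:
    """Attach the top dataset to each keyword; collect keywords with no match as suggestions."""
    linked: Dict[str, Dataset] = {}
    unmatched: List[str] = []
    for keyword in keywords:
        results = search(keyword)
        if results:
            linked[keyword] = results[0]  # keep only the most relevant dataset
        else:
            unmatched.append(keyword)  # shown later as a suggested search term
    return linked, unmatched


# In-memory stand-in for the real search backend, for demonstration only.
CATALOG = {
    "smoke": [Dataset("Example aerosol dataset", "https://example.org/aerosol",
                      "Placeholder description of an aerosol dataset")],
}
linked, unmatched = link_keywords(["smoke", "pacific"], lambda kw: CATALOG.get(kw, []))
print(linked["smoke"].title)  # Example aerosol dataset
print(unmatched)              # ['pacific']
```

Passing the search function in as a parameter keeps the fallback logic testable with a fake catalog and lets the real backend be swapped in without changing this code.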
The keywords are embedded in the article text as hyperlinks, so readers who click a word in the text go directly to the link for downloading that dataset. The recommendation list is placed at the bottom of the article, so readers can explore the topics at their own discretion.
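A minimal rendering sketch is shown below, assuming the article body is available as plain text. It links only the first occurrence of each keyword and appends the unmatched keywords as a list at the bottom. The `render_article` function and the `suggested-searches` CSS class are hypothetical, and a production version would work on the page's existing HTML rather than plain text.

```python
import html
import re


def render_article(text, links, suggestions):
    """Return HTML where the first occurrence of each linked keyword becomes a hyperlink,
    followed by a list of suggested search terms for keywords without a dataset."""
    body = html.escape(text)
    for keyword, url in links.items():
        href = html.escape(url)
        # Whole-word, case-insensitive match; assumes keywords are plain words.
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        body = pattern.sub(lambda m, href=href: f'<a href="{href}">{m.group(0)}</a>',
                           body, count=1)
    out = f"<p>{body}</p>"
    if suggestions:
        items = "".join(f"<li>{html.escape(s)}</li>" for s in suggestions)
        out += f'\n<ul class="suggested-searches">{items}</ul>'
    return out


page = render_article("Smoke from the wildfire drifted. More smoke followed.",
                      {"smoke": "https://example.org/aerosol"}, ["pacific"])
print(page)
```

Linking only the first occurrence keeps the text readable when a keyword repeats many times in one article.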
The huge number of datasets in NASA's archives and on the EarthData website is hard to navigate without prior experience in Earth science research. General citizens are more likely to read articles on Earth Observatory, find topics that interest them, and then pursue their questions further in NASA's repositories. Earth Observatory is a great archive for general readers to learn about and experience NASA's projects and research. We therefore wanted to give readers a more comprehensive tool on the page itself to help them navigate their journey into NASA datasets.
We decided to provide a bookmarklet for each article page, so readers don't have to open any other application to search for the research keywords provided in the article. We believe this is the most efficient and user-friendly way to interact with readers.
The tools we used include Apache Airflow (to create a pipeline that processes the text of each article) and Aleph (an open-source search engine that can filter down to the most relevant datasets within NASA repositories).
A key difficulty was how to implement the program inside the NASA Earth Observatory page. We did not want to add new pages, but rather to integrate the recommendation list into the existing ones. Additional front-end work would need to be done by web-page editors, who can add our program's features to the existing pages.
Designer
Product Manager
Data Analyst