Knowledge Globe (KnoGlo)

Visit KnoGlo’s official website at tinyurl.com/knoglo

We present a new way to see how knowledge evolves: by looking at which fields researchers around the world put their effort into over a given period of time. By looking at what people were thinking about and working on, academically and professionally, we can observe the evolution of specific areas of study, and even how the world changes from a global and historical perspective.

Introducing Knowledge Globe (KnoGlo), a powerful tool to analyze and visualize the distribution of research interests for a selected topic and year. You can see how these research interests are distributed across subjects, keywords, countries, publishers, and publication types.

For example, if you ask:

“In which country do researchers show the most interest in, and put the most effort into, the field of the Internet?”

“In the United States, when did social media start to be mentioned in publications?”

“How many articles about LGBTQ topics were published in 1965 compared with 2015?”

“What specific topics (subjects/keywords) did researchers focus on in the field of digital media in 2017?”

KnoGlo answers such questions intuitively by delivering several types of data visualizations.

How does KnoGlo work?
1. Where to get the data?
2. How to get the data? (research challenge)
3. How to manage data?
4. How to visualize data?
Our vision

How does KnoGlo work?

Our goal is to create datasets of statistics on academic publications from all over the world, and then to manage and visualize them.

1. Where to get the data?

Our tool gathers information from Springer, one of the world’s leading scholarly databases. Through the Springer Nature Metadata API, which provides metadata for academic and professional publications, we can access 12 million online documents, including journal articles, book chapters, and protocols. Using these data, we can count how many articles were published on a selected topic within a selected time range.

2. How to get the data? (research challenge)

First things first, we need to get the data from Springer Nature, which was also the main research challenge of this project. The Springer Nature Metadata API provides raw data in JSON format, so we built a tool in Python that requests the JSON data filtered by topic and year and gathers the statistics from it.
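The retrieval step can be sketched as follows. This is a minimal illustration, not the KnoGlo source code. It assumes the Metadata API endpoint `https://api.springernature.com/meta/v2/json`, the query syntax `"topic" year:YYYY`, and the `p` (page size) and `api_key` parameters, all of which should be checked against Springer Nature's API documentation. The function names `build_query_url` and `fetch_json` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Springer Nature Metadata API (verify in the official docs).
BASE_URL = "https://api.springernature.com/meta/v2/json"


def build_query_url(topic: str, year: int, api_key: str, page_size: int = 10) -> str:
    """Build a request URL filtered by topic and year.

    The query syntax '"topic" year:YYYY' and the parameter names
    are assumptions for illustration.
    """
    params = {
        "q": f'"{topic}" year:{year}',  # quoted topic plus a year constraint
        "p": page_size,                 # assumed name of the page-size parameter
        "api_key": api_key,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Download the raw JSON response and parse it into a Python dict."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_query_url("digital media", 2017, api_key="YOUR_API_KEY")
    print(url)
    # data = fetch_json(url)  # needs a valid API key and network access
```

The sketch builds the URL separately from the network call. This makes it easy to inspect or log each query before it is sent.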

Figure 1. A screenshot of the command-line interface of the Python tool.

The JSON response contains several sections, and the one we need is “facets”. It holds statistics showing how many articles were published for each subject, keyword, country, publisher, type (books or journals), and year. In other words, we need to process the data from these six attributes in “facets”. Each attribute contains entries with two variables, “count” and “value”: “count” is the number of publications associated with each “value”.
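Reading the “facets” section might look like the sketch below. It assumes each facet is an object with a `name` and a list of `values`, where each entry carries a `value` and a `count`. The count may arrive as a string. The function name `extract_facets` is hypothetical.

```python
from typing import Dict, List, Tuple


def extract_facets(data: dict) -> Dict[str, Tuple[List[str], List[int]]]:
    """Turn the "facets" section into {attribute: (values, counts)}.

    Assumed input shape:
    {"facets": [{"name": ..., "values": [{"value": ..., "count": ...}, ...]}, ...]}
    """
    facets: Dict[str, Tuple[List[str], List[int]]] = {}
    for facet in data.get("facets", []):
        values = [item["value"] for item in facet["values"]]
        # Counts may be encoded as strings in the JSON, so convert them to int.
        counts = [int(item["count"]) for item in facet["values"]]
        facets[facet["name"]] = (values, counts)
    return facets
```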

Another research challenge was how to organize the data. After gathering it, the Python tool creates six datasets, one for each attribute in “facets”. Each dataset consists of two columns, “count” and “value”. For each attribute, the tool builds one list of counts and one list of values. It then writes them to a CSV file with one list per column.
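The CSV step can be sketched with Python's standard `csv` module. The sketch writes one file per attribute with the columns “count” and “value”. The function names and the `topic_year_attribute.csv` file naming scheme are hypothetical. `facets` is assumed to map each attribute name to a `(values, counts)` pair of lists.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple


def write_facet_csv(path: Path, values: List[str], counts: List[int]) -> None:
    """Write one attribute as a two-column CSV with a header row."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["count", "value"])
        # zip pairs each count with its value, one row per pair.
        writer.writerows(zip(counts, values))


def write_all_facets(out_dir: Path, topic: str, year: int,
                     facets: Dict[str, Tuple[List[str], List[int]]]) -> List[Path]:
    """Write one CSV per attribute and return the paths (naming is hypothetical)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, (values, counts) in facets.items():
        path = out_dir / f"{topic}_{year}_{name}.csv"
        write_facet_csv(path, values, counts)
        paths.append(path)
    return paths
```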

3. How to manage data?

In some cases, we would like to compare years and see how the numbers change. For example, we might want to see how the number of publications about computers from several countries changed between 1988 and 2018, in five-year steps. After creating the datasets for 1988, 1993, 1998, 2003, 2008, 2013, and 2018, we merge these seven datasets into one. We can implement this in LibreOffice by creating a query.
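The merge was done with a LibreOffice query. For readers who prefer to stay in Python, a rough equivalent is a full outer join on “value” across years, sketched below with the standard library only. This is an alternative, not the project's method. The input shape `{year: {value: count}}` and the function name `merge_by_value` are assumptions. A value missing from a given year is left empty (`None`) rather than set to zero, because a facet list may not include every value.

```python
from typing import Dict, List, Union

Cell = Union[str, int, None]


def merge_by_value(yearly: Dict[int, Dict[str, int]]) -> List[List[Cell]]:
    """Merge {year: {value: count}} into one wide table.

    The first row is the header ("value", year1, year2, ...); each following
    row holds one value and its count for every year (None if absent).
    """
    years = sorted(yearly)
    # Every value seen in any year, de-duplicated while keeping first-seen order.
    all_values = list(dict.fromkeys(v for year in years for v in yearly[year]))
    table: List[List[Cell]] = [["value", *years]]
    for value in all_values:
        # dict.get returns None when the value is absent from that year's dataset.
        table.append([value, *(yearly[year].get(value) for year in years)])
    return table
```

The resulting rows can be written to a single CSV file with Python's `csv.writer`.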

Figure 2. Create a query in design view.

Figure 3. Merged datasets.

4. How to visualize data?

Once we have the data, we can visualize it as graphs using the R programming language. These graphs present the results intuitively and tell stories about a selected time and place.

Say you are asking:

“In 2013, which subjects did researchers in the field of computers show the most interest in?”

We can easily visualize this in a bar plot:

Figure 4. Bar plot visualization.

This bar plot shows the 20 most frequent subjects. Among all publications about computers in the Springer Nature database in 2013, “computer science” and “artificial intelligence” are the most common subjects. This suggests that artificial intelligence was one of the most popular research focuses in the computing world. We can also see some cross-disciplinary subjects, such as “Medicine & Public Health”, which ranks 7th.
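The plot itself was drawn in R. The step of picking the 20 most frequent subjects from a dataset can be sketched in Python as shown below. `top_n` is a hypothetical helper that takes the two parallel lists of values and counts kept for each attribute.

```python
from typing import List, Tuple


def top_n(values: List[str], counts: List[int], n: int = 20) -> List[Tuple[str, int]]:
    """Return the n (value, count) pairs with the highest counts, largest first.

    sorted() is stable even with reverse=True, so ties keep their input order.
    """
    pairs = zip(values, counts)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]
```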

We can even generate interactive graphs using rbokeh. For example,

“In which countries did researchers focus most on digital media in 2017?”

See this visualization in rbokeh. It is an interactive graph exported in HTML format. Hover your mouse over the data points to see more details.

In 2017, the largest numbers of articles about digital media were published in the United Kingdom and the United States.

Our Vision

KnoGlo is designed for everyone, especially students, researchers, historians, and professionals.

People who would like to research the history of a topic and how it has changed.
Professionals who would like to see the changes in the industries they are involved in.
Students who need statistical data about how a topic has changed for their own research.

In future updates, we’d like to add more features, cover a broader range of research, and provide a better experience.

Another database and API that can show which institutions, organizations, or companies the academic and professional publications came from.
More filters besides topic and year (query implementation in Python).
More visualizations, including a map view.
GUI for the Python tool.

We believe that KnoGlo will help us make predictions, observe today’s changing world, and shape our future. It does this by letting us observe the history of academic research and see how researchers from all over the world drive development and make an impact.

About KnoGlo’s Team

Knowledge Globe is a final project for the Data Management and Data Analysis courses offered by the University of Chicago. The project was created by students and faculty from UChicago’s brand-new graduate program, Digital Studies of Language, Culture, and History.

Learn more about UChicago’s Digital Studies program.

Junshu Liu is the founder and director of this project. He is a graduate student at UChicago majoring in Digital Studies of Language, Culture, and History. Learn more about Junshu and his past projects at junshuliu.com.

Professor Jeffrey R. Tharsen is the instructor of the Data Analysis course and one of the advisors for this project. He is an excellent scholar who has mastered not only data science but also East Asian languages and cultures. With Dr. Tharsen’s help, I learned a lot of R programming skills. Learn more about Dr. Tharsen at his website, tharsen.net.

Professor Miller Prosser is the instructor of the Data Management course and is also an advisor for Knowledge Globe. Dr. Prosser is a professional data analyst who is proficient in many data management software packages and tools. As a beginner in data science, I learned a lot with Dr. Prosser’s help over the past two months. Learn more about Dr. Prosser.

Notes

This project was demonstrated in class on December 5 and 7, 2018.
This project is now officially available. It was released on December 15, 2018.
Stay tuned for future updates with more new features.

Visit KnoGlo’s official website at

tinyurl.com/knoglo

See KnoGlo’s GitHub repository at

github.com/JunshuTedLiu/KnowledgeGlobe