System for collecting and processing biobibliographic data
Institute of Literary Research

The Institute of Literary Research (IBL) was established in 1948 as a research institute of the Polish Academy of Sciences. Its main purpose is to conduct research on the history of Polish literature, literary theory, cultural history and literary documentation. Its staff consists mainly of professors, associate professors and researchers. IBL participates in many international research projects, among them the program "Polish cultural heritage in the new Europe", within which it cooperates closely with many foreign cultural centers.

The Institute of Literary Research of the Polish Academy of Sciences asked us to create a system supporting teamwork on the creation and expansion of a biobibliographical dictionary of Polish writers and scholars of Polish literature. The dictionary is the culmination of years of work by a research team of many people. IBL needed a tool that would keep track of the changes made to the dictionary, provide a simple and clear way of passing information between collaborators and, in the long run, make the vast knowledge base on Polish literature easier to access for literary researchers from all over the world.

As part of the commission, we created a system adapted to the specifics of processing large volumes of data originally written and edited in natural language. We analyzed the needs of the academics in depth and built an advanced process for information flow between users. In addition, we imported and integrated large collections of historical biobibliographic data, making records from several sources identified by the client consistent with one another.

A great asset of our solution was the structuring and consistency of the dictionary through the use of tools that enforce adherence to a set editorial convention. As a result, all team members using our solution work on texts with a well-defined structure.
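To illustrate what enforcing an editorial convention can look like in practice, here is a minimal sketch of an entry validator. It is not the checker used in the project: the entry shape (a dict with a `headword` and an ordered list of `sections`) and the list of required sections are assumptions made for the example.

```python
# Assumed convention, for illustration only: every entry has a headword
# and contains these sections in this order.
REQUIRED_SECTIONS = ["biography", "bibliography"]


def validate_entry(entry):
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    if not entry.get("headword", "").strip():
        problems.append("missing headword")

    # Section names in the order they appear in the entry.
    sections = [name for name, _text in entry.get("sections", [])]
    for name in REQUIRED_SECTIONS:
        if name not in sections:
            problems.append(f"missing section: {name}")

    # Compare the actual order of required sections with the expected order.
    present = [s for s in sections if s in REQUIRED_SECTIONS]
    expected = [s for s in REQUIRED_SECTIONS if s in sections]
    if present != expected:
        problems.append("sections duplicated or out of order")
    return problems


good = {"headword": "Kowalski Jan",
        "sections": [("biography", "..."), ("bibliography", "...")]}
bad = {"headword": "", "sections": [("bibliography", "...")]}
good_result = validate_entry(good)
bad_result = validate_entry(bad)
```

Running such a check whenever an entry is saved is one simple way to keep every editor working within the same structure.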

One of the sources we imported was the online dictionary "Polish writers and literary scholars of the late 20th and early 21st centuries." The content was downloaded using web scraping tools, then analyzed, organized and transferred to the system we created. It is worth mentioning that our client's dictionary is unique in many respects: in addition to the basic biographical and bibliographical sections, it also includes selected studies of the authors' works, as well as materials collected over decades of correspondence with writers.

We built a platform for displaying, entering and managing data. To make it a professional tool for our client, we added a number of plug-ins that display additional panels and information. IBL is very pleased with the WYSIWYG text editor created specifically for the project, which makes it easier for researchers to create entries and content. Thanks to our system, IBL employees can search and analyze information intuitively, so the system has also become a tool for data analysis. The search functionality has been enhanced with a proprietary tagging algorithm developed especially for this project.

One of the unusual challenges we encountered while working for IBL was importing a 10-volume dictionary in PDF format into the system. The dictionary was run through OCR for text recognition. We were also given access to a draft version of the same dictionary in QRT format. However, because of the numerous errors present in both the PDF and QRT versions, we had to apply proprietary heuristics that allowed us to merge the entries from both sources. While merging the headwords, we relied on statistics we had prepared on the differences between the sources. It is worth mentioning that the tenth volume of the dictionary contained additions to the headwords appearing in the nine previous volumes, so we introduced a function into our system that displays a panel of additions and shows which item supplements a given headword.
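The heuristics used in the project are proprietary and are not described here. As a rough illustration of the general idea only, the sketch below pairs headwords from an OCR source with headwords from a draft source by string similarity, using Python's standard `difflib`. The function names, the data shape, the sample names and the 0.85 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_headwords(ocr_entries, draft_entries, threshold=0.85):
    """Pair each OCR headword with its most similar unused draft headword.

    Both arguments are dicts mapping headword -> entry text (assumed shape).
    Returns (pairs, unmatched), where pairs holds
    (ocr_headword, draft_headword, score) tuples.
    """
    pairs, unmatched = [], []
    remaining = set(draft_entries)
    for ocr_head in ocr_entries:
        best = max(remaining, key=lambda d: similarity(ocr_head, d), default=None)
        score = similarity(ocr_head, best) if best is not None else 0.0
        if best is not None and score >= threshold:
            pairs.append((ocr_head, best, score))
            remaining.discard(best)  # each draft headword is used at most once
        else:
            unmatched.append(ocr_head)  # left for manual review
    return pairs, unmatched


# Invented sample data: OCR typically confuses similar glyphs and drops diacritics.
ocr = {"Kowalskl Jan": "...", "Wisniewska Anna": "...", "Nowak Piotr": "..."}
draft = {"Kowalski Jan": "...", "Wiśniewska Anna": "..."}
pairs, unmatched = match_headwords(ocr, draft)
```

Headwords that fall below the threshold are returned separately, so that an editor can resolve them by hand instead of the system guessing.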

In accordance with the client's requirements, we added a panel for translators to the system. The panel is fully integrated with the Polish headwords. When a change is made in the Polish dictionary, the translator is automatically informed of the headword and position in which the change was made, and can immediately correct the English version of the dictionary. In this way, communication between academics and translators has been greatly improved.
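A notification of this kind boils down to comparing two versions of the dictionary and reporting the headword and position of each difference. The following is a minimal sketch under an assumed data model (each entry is a list of strings, one per numbered position); it is not the system's actual implementation.

```python
def changed_positions(old_entry, new_entry):
    """Return 1-based positions whose text differs between two entry versions.

    Each entry is a list of strings, one per numbered position (assumed model).
    Positions added or removed at the end also count as changed.
    """
    length = max(len(old_entry), len(new_entry))
    changed = []
    for i in range(length):
        old = old_entry[i] if i < len(old_entry) else None
        new = new_entry[i] if i < len(new_entry) else None
        if old != new:
            changed.append(i + 1)
    return changed


def translator_notifications(old_dict, new_dict):
    """List (headword, position) pairs the English version should revisit."""
    notes = []
    for headword, new_entry in new_dict.items():
        old_entry = old_dict.get(headword, [])  # a new headword: all positions changed
        for position in changed_positions(old_entry, new_entry):
            notes.append((headword, position))
    return notes


# Invented sample data for illustration.
old = {"Kowalski Jan": ["Born 1920.", "Debuted 1946."]}
new = {"Kowalski Jan": ["Born 1921.", "Debuted 1946.", "Died 1999."]}
notes = translator_notifications(old, new)
```

Here the translator would be pointed to positions 1 and 3 of the sample entry, while the untouched position 2 is not reported.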

Further development of the digital dictionary

After the positive reception of the first version of the application, we undertook further development of the software together with the Institute of Literary Research. The work focused mainly on creating an interface for external users. The digital biobibliographical dictionary "Polish Writers and Literary Scholars of the 20th and 21st Centuries" can now be used by anyone, including unregistered users.

It features entries on around 2,300 authors who began their careers as writers, both in Poland and abroad, after 1918. The target audience includes researchers from various fields, as well as students, teachers and a wide range of readers looking for reliable information on the Internet. It is worth noting that the digital dictionary replaces the multi-volume paper dictionary published by the Institute of Literary Research for many years.

Thanks to our solution, the efficiency of the editorial work of IBL's scientific staff has increased significantly. ImpiCode imported and integrated large biobibliographic collections, which enabled IBL staff to avoid months of work involving manual transcription of dictionaries. In addition, the client obtained a tool for advanced data management. Satisfaction with the solution received and the course of cooperation is evidenced by the references provided.


ImpiCode managed the project well, adhering to the budget and taking scope changes in stride. The responsive and patient team provided actionable suggestions. Management and executives were communicative and readily available.

Marlena Sęczek, Researcher, Institute of Literary Research