DOI REGISTRATION: 10.70773/revistatopicos/775974905
ABSTRACT
This bibliometric study maps the scientific production on microbial enzymes in the biochemical/industrial biotechnology sector over 2000–2024, operationalized through the PRIA workflow (planning, refinement, integration, analysis). Searches were iteratively structured in Web of Science, Scopus, and PubMed; records were harmonized and deduplicated (DOI and title/year), yielding a unified corpus of 297 documents. Analyses were conducted in Bibliometrix/Biblioshiny, combining descriptive indicators with Bradford and Lotka regularities, co-occurrence networks, three-field plots, and factorial mapping. Results indicate steady annual growth (1.26%), dispersion across 139 sources and 885 authors, 14.81% international co-authorship, and 5.18 authors per document. The Bradford core is led by Bioresource Technology, Applied Microbiology and Biotechnology, and Applied Biochemistry and Biotechnology; local source impact (h-index) is 12, 10, and 9, respectively, while total citations highlight Biotechnology for Biofuels. The conceptual structure converges to two dominant axes: (i) lignocellulosic biomass conversion and biorefinery (enzymatic hydrolysis, bioethanol/biofuels) and (ii) biocatalysis and enzyme performance (activity, immobilization, metabolism), bridged by “biotechnology,” “fermentation,” and “hydrolysis.” Author productivity follows Lotka’s law, with a small core of highly prolific researchers, such as Sharma S, Martins LO, and Satyanarayana T. Limitations include incomplete metadata (~6% lacking DOI) and under-indexing for 2024. Overall, the field appears mature and expanding, anchored in industrial biocatalysis and the bioeconomy, with clear trajectories for sustainable innovation.
Keywords: microbial enzymes; industrial biotechnology; biocatalysis; biorefinery; bibliometrics; co-occurrence; Bradford law; Lotka law.
INTRODUCTION
The application of microbial enzymes in industrial processes represents one of the most promising frontiers of modern biotechnology, particularly in light of contemporary demands for sustainable solutions and low-impact processes. These enzymes, derived from selected microorganisms, exhibit high catalytic versatility and stability under extreme conditions, making them particularly attractive for use in the food, pharmaceutical, energy, and bioprocessing industries (Aria & Cuccurullo, 2017).
In recent decades, the consolidation of the bioeconomy has driven the transition from conventional chemical processes to biotechnological pathways based on enzymatic biocatalysts. In particular, advances in microbial biocatalysis have enabled the conversion of lignocellulosic biomass into biofuels and high-value-added bioproducts, placing such technologies at the core of green industrial policies and circular economy strategies (Garfield, 2006). This trend has been accompanied by a significant growth in scientific output on the subject, driven by innovations in genetic engineering, enzyme immobilization, and solid-state fermentation.
Given the breadth and complexity of this field, it becomes essential to understand how science has evolved in terms of publication volume, collaboration networks, core journals, and emerging thematic clusters. Bibliometrics, in this regard, constitutes a robust methodological approach capable of uncovering hidden patterns, identifying key authors, and mapping knowledge frontiers. Classical laws such as those of Bradford and Lotka, combined with co-occurrence visualizations and factorial analysis, allow for a deeper understanding of the structure and dynamics of scientific research (Lotka, 1926; Bradford, 1934).
This study therefore aims to map and critically analyze the scientific production related to microbial enzymes applied to industrial biotechnology, covering the period from 2000 to 2024. To this end, it adopts the PRIA method – Integrated Review Process with Artificial Intelligence – which articulates systematic phases of search, screening, integration, and bibliometric analysis, ensuring traceability, replicability, and methodological rigor. The intention is to provide a consolidated overview of the scientific and technological trends in the field, thereby contributing to strategic decision-making by researchers, institutions, and policymakers in science, technology, and innovation.
THEORETICAL FRAMEWORK
The application of microbial enzymes in industrial contexts represents a consolidated field of biotechnology, driven by the pursuit of more efficient, selective, and environmentally sustainable processes. Enzymes produced by microorganisms present significant advantages compared to those of plant or animal origin, such as higher thermal stability, substrate specificity, and large-scale production capacity through controlled fermentation (Singh et al., 2016).
In the context of industrial biocatalysis, such enzymes play a central role in the conversion of lignocellulosic biomass into biofuels, acting in the hydrolysis of complex polymers such as cellulose and hemicellulose. This activity is directly linked to the feasibility of biorefineries, a concept that integrates production chains for the generation of energy, inputs, and by-products from agro-industrial residues (Zhang et al., 2021). Advances in enzyme immobilization and modification techniques have also contributed to increased catalytic efficiency and reuse in continuous processes (Sheldon & van Pelt, 2013).
The specialized literature also highlights the growing relevance of industrial microbiology and metabolic engineering in optimizing enzyme production. Strategies such as heterologous expression, site-directed mutagenesis, and directed evolution have enabled the customization of enzymes with properties tailored to specific operational conditions, thereby expanding their applicability across different industrial sectors (Bornscheuer et al., 2012).
From the perspective of scientific evaluation, bibliometrics has established itself as an effective tool for understanding the dynamics of academic production in strategic areas. Through the analysis of co-occurrence, co-authorship, and citation networks, it is possible to identify structural and thematic patterns that guide scientific development (Waltman, 2016). Classical laws such as Lotka’s, which describes the concentration of productivity among a few authors, and Bradford’s, which highlights the concentration of production in core journals, are widely applied in studies that seek to delineate the structure of emerging fields (Aria & Cuccurullo, 2017).
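For reference, the two regularities are commonly written in the textbook form below. This is a general statement of the laws, not a fit to the present corpus; the symbols C, β, n_0, and k are notation introduced here for illustration.

```latex
% Lotka's law: number of authors f(n) who published exactly n documents.
% C is a constant for the field; the classical exponent is close to 2.
f(n) = \frac{C}{n^{\beta}}, \qquad \beta \approx 2

% Bradford's law: when journals are ranked by productivity and split into
% zones that each yield about the same number of articles, the number of
% journals per zone grows roughly geometrically (n_0 journals in the core).
n_0 : n_0 k : n_0 k^{2}
```

Under the classical exponent, the authors with two documents number roughly one quarter of the authors with a single document, which is why a small core of prolific authors is expected.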
In this context, bibliometric studies gain even greater relevance when integrated into systematic methodologies such as the PRIA method. This approach promotes rigor and transparency in all stages of the scientific review process, from planning to analysis, and has proven particularly effective in interdisciplinary areas such as the interface between applied microbiology, industrial biochemistry, and bioprocess engineering (Aria & Cuccurullo, 2017).
METHODOLOGY
To conduct this bibliometric study on microbial enzymes in the biochemical industry, the PRIA method (Integrated Review Process with Artificial Intelligence) was adopted. This choice is grounded in the need to combine scientific rigor with operational efficiency, ensuring the systematization of data collection, selection, and analysis of scientific production. PRIA is organized into five sequential phases: formulation of the research question, systematic search, automated screening, critical analysis and synthesis, and final manuscript preparation. Applied in an integrated manner, these phases minimize redundancies, reduce bias, and enhance the robustness of the results obtained.
This methodological approach ensures not only the traceability of each stage of the investigative process but also the possibility of replication in different scientific contexts, establishing itself as a model applicable to interdisciplinary studies involving biotechnology and industrial biochemistry.
The first stage of the PRIA method consists of formulating the research question, a central element that guides all subsequent phases of the study. In the context of this chapter, the aim was to develop a clear, objective, and methodologically precise question, capable of guiding the retrieval and analysis of international scientific production.
Thus, the investigative problem was defined as follows: “What are the scientific and technological trends in the application of microbial enzymes in the biochemical industry during the period 2000 to 2024, considering temporal evolution, collaboration networks, and predominant thematic axes?” This formulation establishes not only the temporal and thematic scope of the investigation but also delineates the expected outcomes, which include the identification of knowledge cores, high-impact journals, and gaps in the literature.
After the initial formulation, the research question was subjected to a methodological refinement process, as established in the PRIA method. This step aimed to ensure greater conceptual clarity, precision in temporal and thematic boundaries, and alignment with bibliometric criteria. The process involved an iterative review of the formulated question, supported by digital artificial intelligence tools used to assess the semantic consistency of the terms and their adherence to the scope of the investigation. This procedure made it possible to reduce ambiguities, eliminate redundancies, and ensure that the main descriptors were compatible with the selected databases.
Based on this refinement, the final research question retained its investigative essence but was adjusted in form to enhance analytical objectivity: “What are the scientific and technological trends in the application of microbial enzymes in the biochemical industry between 2000 and 2024, considering indicators of temporal evolution, scientific collaboration networks, and emerging thematic clusters?” This structured version, more aligned with bibliometric logic, follows Garfield’s (2006) recommendation that question formulation should guide the entire process of analysis and systematization of scientific production. Moreover, it aligns with the guidance of Aria and Cuccurullo (2017), who argue that the quality of bibliometric analyses depends directly on the clarity and operationality of the research question.
The practical operationalization consisted of an iterative process in three stages. First, the formulated question was tested through exploratory queries in the Web of Science and Scopus databases in order to identify the breadth and relevance of the initial results. Second, the retrieved terms were compared with controlled vocabularies and thesauri from the databases themselves, allowing for the inclusion of synonyms and the exclusion of less representative terms.
Finally, the refined version of the question was consolidated based on criteria of precision (retrieving studies effectively related to the topic) and comprehensiveness (avoiding the omission of relevant works). This practical operationalization ensured that the final question was not only conceptually robust but also technically applicable to the systematic search strategy that would guide the subsequent phases of PRIA.
With the research question refined, the next step was to define the structured search strategy, a fundamental stage to ensure the traceability and reproducibility of the study. The databases selected were Web of Science (WoS), Scopus, and PubMed, internationally recognized for their broad and multidisciplinary coverage in biotechnology, biochemistry, and applied sciences.
Access to the Web of Science database was carried out through the CAPES Journals Portal, using institutional CAFE/UNIR authentication. Figure 1 shows the portal interface with the Web of Science database available for consultation.
Figure 1 – Access to the Web of Science database via the CAPES Journals Portal
The established time frame covered the period from 2000 to 2024, in accordance with the scope of the research question, ensuring both the analysis of historical trends and the identification of recent advances.
The formulation of the search string was based on Boolean operators and combinations of free and controlled terms. Main descriptors such as “microbial enzymes,” “industrial enzymes,” and “enzymatic biocatalysis” were used, associated with application-related terms such as “biochemical industry,” “bioprocessing,” and “biotechnology applications.” Thus, the typical strategy took the form: (“microbial enzymes” OR “industrial enzymes” OR “enzymatic biocatalysis”) AND (“biochemical industry” OR “bioprocessing” OR “biotechnology applications”). This set of descriptors was validated through pilot searches, ensuring that it retrieved a representative and relevant corpus without compromising precision.
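The structure of this string can be sketched programmatically. The following is a minimal Python illustration, not the authors' tooling (which is R-based); the helper names `or_group` and `build_query` are hypothetical, and the terms are those listed above.

```python
def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Combine several OR-groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


# Descriptors taken from the text; helper names are illustrative only.
enzyme_terms = ["microbial enzymes", "industrial enzymes", "enzymatic biocatalysis"]
application_terms = ["biochemical industry", "bioprocessing", "biotechnology applications"]

query = build_query(enzyme_terms, application_terms)
print(query)
```

Keeping the descriptor lists separate from the assembly step makes it easier to add synonyms later without retyping the Boolean structure.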
The inclusion criteria comprised original peer-reviewed articles published in English or Portuguese, with an explicit focus on the application of microbial enzymes in biochemical industrial processes. Narrative reviews without empirical basis, opinion papers, conference abstracts, and publications without full-text access were excluded. The execution of the structured search resulted in an initial set of records that would be submitted to the subsequent automated screening stage, as recommended by the PRIA method.
In the initial stage of the structured search, the application of the preliminary string retrieved only 27 documents, an insufficient number to compose a robust bibliometric corpus. Figure 2 illustrates the search interface and the filters applied at this initial stage.
Figure 2 – First attempt at searching the Web of Science database, with 27 documents retrieved (2000–2024)
This result highlighted the need for adjustments in the descriptors and the scope of the query. Based on this diagnosis, the strategy was refined by including synonyms in both English and Portuguese, in addition to expanding the terms related to industrial applications. As a result, the application of the new string in the Web of Science Core Collection increased the number of records to 101 documents, ensuring greater thematic and temporal representativeness, as illustrated in Figure 3.
Figure 3 – Second search strategy in the Web of Science database, with 101 documents retrieved after reformulating the search string
In addition to the search carried out in Web of Science, the refined strategy was also applied to the Scopus database in order to increase the representativeness of the bibliographic corpus. To ensure compatibility between platforms, the string was adapted to Scopus’s specific syntax, using the TITLE-ABS-KEY field to retrieve records in titles, abstracts, and keywords. The final query was structured as follows: TITLE-ABS-KEY (“enzimas microbianas” OR “enzimas industriais” OR “biocatálise enzimática” OR “microbial enzymes” OR “industrial enzymes” OR “enzymatic biocatalysis” OR “enzyme technology” OR “biorefinery enzymes”) AND TITLE-ABS-KEY (“indústria bioquímica” OR “bioprocessamento” OR “biotecnologia industrial” OR “biochemical industry” OR “bioprocessing” OR “industrial biotechnology” OR “biorefinery”).
Filters were applied for the period 2000 to 2024, considering both original articles and reviews published in English and Portuguese. This procedure ensured methodological consistency across databases and contributed to expanding the breadth of retrieved records. For Web of Science, the refined search culminated in the export of a dataset containing 101 records in BibTeX format, as shown in Figure 4.
Figure 4 – Export of search records from Web of Science in BibTeX format
In the Scopus database, the application of the refined string adapted to its syntax resulted in the retrieval of 131 documents, covering the period from 2000 to 2024. Both original articles and reviews published in English and Portuguese were included, maintaining the same filtering logic applied in Web of Science. Figure 5 presents the export screen of the Scopus database, highlighting the selection of detailed fields such as titles, abstracts, authorship, keywords, and cited references, ensuring metadata completeness for subsequent analysis in R/Biblioshiny.
Figure 5 – Export of bibliographic data from Scopus with complete metadata
The number of articles retrieved confirms the relevance of the topic in the industrial and biochemical context, enhancing the representativeness of the bibliographic corpus. The integration of the results obtained from Scopus with the Web of Science records ensures broader journal coverage and interdisciplinary scope, consolidating a more robust basis for subsequent screening and bibliometric analysis.
Complementing the searches conducted in Web of Science and Scopus, the strategy was also adapted and applied to the PubMed database, in order to encompass the scientific production indexed in the fields of biotechnology, microbiology, and health sciences. To ensure compatibility with the query syntax, the [tiab] field was used to retrieve the terms of interest in titles and abstracts, as shown in Figure 6.
Figure 6 – PubMed interface with search results using the applied string
The search string included synonyms in both English and Portuguese, ensuring consistency with the other databases. Filters were applied for the period from 2000 to 2024, covering original articles and reviews in English and Portuguese. This step reinforced the interdisciplinary scope of the corpus, ensuring that relevant studies published in biomedical journals were also incorporated into the bibliographic dataset to be submitted to screening.
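The exact PubMed string is not reproduced in the text. As a hedged sketch, the adaptation could look like the Python fragment below, which tags each term with [tiab]. The choice of terms (a subset of the English descriptors used in the other databases) and the `2000:2024[dp]` date clause are assumptions made for illustration; the authors may have applied the date limit through the interface filters instead.

```python
def tiab_group(terms):
    """Tag every quoted term with PubMed's title/abstract field and OR them."""
    return "(" + " OR ".join(f'"{term}"[tiab]' for term in terms) + ")"


# Assumed subset of the descriptors; the real PubMed string is not in the text.
enzyme_terms = ["microbial enzymes", "industrial enzymes", "enzymatic biocatalysis"]
application_terms = ["biochemical industry", "bioprocessing", "industrial biotechnology"]

pubmed_query = " AND ".join(
    [
        tiab_group(enzyme_terms),
        tiab_group(application_terms),
        "2000:2024[dp]",  # assumed publication-date range clause
    ]
)
print(pubmed_query)
```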
Next, the workflow of the PRIA method is presented, as shown in Figure 7.
Figure 7 – PRIA methodology
In the third phase of the PRIA method, the records retrieved from the three databases were integrated into a single analysis corpus. For this purpose, metadata were exported in their original formats: BibTeX (.bib) for Web of Science and Scopus, and NBIB (.nbib) for PubMed. The files were imported into the Bibliometrix package within the RStudio environment using the convert2df function, which enables the standardized conversion of records into tabular format. This process ensures the standardization and analysis of scientific metadata, as shown in Figure 8.
Figure 8 – RStudio interface showing the import of data from Web of Science, Scopus, and PubMed using the convert2df() function
This procedure ensured the unification of 101 records from Web of Science, 131 from Scopus, and 66 from PubMed, totaling 298 raw entries. Subsequently, the screening stage was initiated, which involved detecting and removing duplicates, as well as applying the previously defined inclusion and exclusion criteria. This process guaranteed the consistency of the bibliographic corpus, establishing the definitive basis for the subsequent bibliometric analyses.
Subsequently, the processed records were exported in Excel (.xlsx) format, facilitating manual control and data verification. The final, revised version of the database was later imported into Biblioshiny, the graphical interface of Bibliometrix, where the interactive bibliometric analyses were conducted. Figure 9 shows the initial interface of this stage, highlighting the database upload and the selection of the corresponding format.
Figure 9 – Biblioshiny interface for uploading the unified bibliographic file in WoS format
This methodological workflow, consisting of environment setup, generation of the unified database, export and processing in Excel, and subsequent uploading into Biblioshiny, ensured traceability, transparency, and reproducibility of the study, consolidating the empirical foundations necessary for exploring scientific and technological trends related to microbial enzymes in the biochemical industry.
RESULTS
The first step in the bibliometric analysis was the import of the unified database into Biblioshiny. The system automatically generates a metadata integrity report, allowing the identification of potential limitations and information gaps that may affect subsequent analysis. Figure 10 presents the report generated for the 298 documents from the unified database.
Figure 10 – Metadata completeness report (PubMed) generated by Biblioshiny
According to the report shown in Figure 10, essential fields such as document type (DT), source (SO), language (LA), year of publication (PY), title (TI), and total citations (TC) showed 100% completeness, classified as excellent.
However, information such as corresponding author (RP), cited references (CR), and keywords (DE and ID) showed completeness levels ranging from acceptable to poor, which indicates the need for caution when analyzing co-authorship, co-citation, and keyword networks. The field of scientific categories (WC) is entirely absent, which is expected for PubMed records, as this database does not adopt such classification.
During the structural mapping stage of the databases, a comparative analysis was carried out on the variables contained in the files from PubMed, Scopus, and Web of Science (WoS). The main objective was to identify the columns common to all datasets, ensuring compatibility and integrity in the construction of the unified database to be used in the bibliometric analyses.
As a result, it was found that only 25 variables were simultaneously present across the three databases, forming a minimal standardized core. Among these, the most relevant are: AU (authors), TI (title), AB (abstract), PY (year of publication), SO (source), DI (DOI), CR (cited references), DE (author keywords), TC (total citations), ID (indexed keywords), DT (document type), J9 (source abbreviation), RP (corresponding author), VL (volume), PP (starting page), AF (full author names), C1 (institutional affiliations), LA (language), SR (short reference), SR_FULL (full reference), AB_raw, DE_raw, KW_Merged, DB (database of origin), and TI_raw.
On the other hand, exclusive variables were identified in each database, many of them related to internal identifiers, indexing formats, or redundant fields. In the case of PubMed, the exclusive variables were PMID, PMC, MHDA, PHST, STAT, EDAT, VI, PG, FAU, and JT. In Scopus, the exclusive fields included Page.start, Page.end, URL, OA, Correspondence.Address, Correspondence.E-mail, Author.Keywords, Funding.Details, and Article.Type. Finally, Web of Science presented unique variables such as U2, usage.count.last.180.days, usage.count.since.2013, Research.Areas, Categories, Z9, C2, PU, BP, EI, FX, GP, NR, SC, and SN.
In this scenario, it was decided to consider, during the unification stage, only the set of variables common to the three databases, in order to ensure interoperability and robustness of the analyses within the Bibliometrix package and its graphical interface Biblioshiny. The exclusive variables were preserved separately for potential complementary analyses or future documentary consultation, but were not incorporated into the main consolidated database.
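The identification of common and exclusive variables amounts to set operations on the column names. The Python sketch below illustrates the idea with abbreviated column lists drawn from the fields named above; the full exports contain many more columns, and the variable names are illustrative.

```python
# Abbreviated column sets; the real exports are much wider.
columns = {
    "PUBMED": {"AU", "TI", "PY", "DI", "PMID", "FAU", "JT"},
    "SCOPUS": {"AU", "TI", "PY", "DI", "URL", "OA"},
    "WOS": {"AU", "TI", "PY", "DI", "Z9", "NR", "SN"},
}

# Variables present in all three databases (the standardized core).
common = set.intersection(*columns.values())

# Variables found in exactly one database.
exclusive = {}
for name, cols in columns.items():
    others = set().union(*(c for other, c in columns.items() if other != name))
    exclusive[name] = cols - others

print(sorted(common))
print({name: sorted(cols) for name, cols in exclusive.items()})
```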
During the preliminary inspection of the imported databases, bibliographic records lacking a digital identifier (DOI), an essential element for tracking, retrieval, and later reclassification of articles, were identified. Automated filtering revealed the presence of 19 articles without DOI, distributed among the three databases analyzed: PubMed (8), Web of Science (4), and Scopus (7). These records were consolidated into a specific spreadsheet containing the essential fields for future manual or semi-automatic retrieval, including title, authors, year of publication, journal, and internal ID. After review, these records were excluded from the database because they fell outside the scope of the topic.
During the refinement stage of the bibliographic databases, a systematic pre-processing protocol was applied, essential to ensure data integrity, standardization, and compatibility before unification. Initially, records without DOI were identified and marked with a control variable (NO_DOI) for traceability purposes, although they were not automatically removed, respecting their potential informational value in thematic analyses. Subsequently, internal duplicates were eliminated based on the combination of title and digital identifier fields, preserving only the first occurrence.
TABLE OF BIBLIOMETRIC PRE-PROCESSING CRITERIA
No. | Criterion | Technical description | Methodological justification |
1 | Identification of records without DOI | Creation of the NO_DOI column, marked as TRUE when DI is absent or empty | Allows traceability and later analytical decision-making without premature exclusion |
2 | Creation of DB (database of origin) column | Adds field "PUBMED," "WOS," or "SCOPUS" to all records | Ensures control, segmentation, and post-unification tracking |
3 | Removal of internal duplicates | Exclusion of records with the same TI + DI combination | Eliminates redundancy in production and citation analysis |
4 | Creation of missing essential fields | Adds mandatory fields (AU, TI, SO, DI, PY, CR, etc.) with value "ND" | Ensures compatibility with Bibliometrix functions and prevents execution errors |
5 | Selective filling of null values | Replaces NA and "" with "ND" only when the field is empty | Prevents calculation errors while preserving original data |
6 | Text sanitization of bibliometric fields | Removal of breaks, tabs, extra spaces, and standardization to UTF-8 | Prevents noise in textual, co-authorship, and co-word analyses |
7 | Validation of publication year (PY) | Conversion to YYYY format; invalid values replaced with "ND" | Prevents distortions in temporal curves and period segmentation |
8 | Normalization of authors (AU) and sources (SO) | Conversion to uppercase while preserving accents | Ensures consistency in co-authorship networks and journal analysis |
9 | Preservation of valid original data | No valid field is overwritten; changes occur only when data are missing or inconsistent | Maintains fidelity to the original database |
10 | Inclusion of additional compatible fields | Adds fields such as DT, AB, C1, RP, ID, TC, etc. | Ensures full functionality of Biblioshiny |
11 | Reorganization of column structure | Standard ordering of main columns at the front | Facilitates visualization, export, and cross-database compatibility |
Source: Author, 2025
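Several of the criteria in the table can be sketched in a few lines. The Python fragment below is an illustration under stated assumptions, not the authors' R scripts: it covers criteria 1 to 8 on records held as plain dictionaries, treats duplicates as exact TI + DI matches after text sanitization, and uses invented sample records. The function names are hypothetical.

```python
import re

ESSENTIAL_FIELDS = ("AU", "TI", "SO", "DI", "PY", "CR")


def clean_text(value):
    """Criterion 6: collapse line breaks, tabs, and repeated spaces."""
    return re.sub(r"\s+", " ", value).strip()


def preprocess(records, db_name):
    """Apply criteria 1-8 to the records of one database."""
    seen = set()
    cleaned = []
    for record in records:
        row = {k: clean_text(v) if isinstance(v, str) else v for k, v in record.items()}
        row["NO_DOI"] = not row.get("DI")  # criterion 1: flag, do not delete
        row["DB"] = db_name                # criterion 2: database of origin
        key = (row.get("TI") or "", row.get("DI") or "")
        if key in seen:                    # criterion 3: keep first occurrence
            continue
        seen.add(key)
        for field in ESSENTIAL_FIELDS:     # criteria 4-5: fill only empty fields
            if row.get(field) in (None, ""):
                row[field] = "ND"
        year = str(row["PY"])              # criterion 7: four-digit year or "ND"
        row["PY"] = year if re.fullmatch(r"\d{4}", year) else "ND"
        row["AU"] = str(row["AU"]).upper() # criterion 8: uppercase, accents kept
        row["SO"] = str(row["SO"]).upper()
        cleaned.append(row)
    return cleaned


# Invented sample records, for illustration only.
sample = [
    {"AU": "Silva, J.", "TI": "Enzyme  stability\nstudy", "SO": "Journal A", "DI": "10.1000/x1", "PY": "2015"},
    {"AU": "Silva, J.", "TI": "Enzyme stability study", "SO": "Journal A", "DI": "10.1000/x1", "PY": "2015"},
    {"AU": "Souza, M.", "TI": "Lipase production", "SO": "Journal B", "DI": "", "PY": "20l9"},
]
cleaned = preprocess(sample, "SCOPUS")
print(len(cleaned))
```

Sanitizing text before building the duplicate key matters here: the first two sample records differ only in whitespace and would otherwise both survive.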
The bibliometric analysis was operationalized through the use of the Bibliometrix package, in its graphical interface Biblioshiny, with the aim of interactively exploring the conceptual, intellectual, and social structures of the unified bibliographic corpus. To ensure process integrity, the data were previously processed and consolidated into a single file in .xlsx format, named base_bibliometrica_unificada.xlsx, and stored locally in the project directory. Biblioshiny does not connect to external databases (such as PubMed, Scopus, or Web of Science); therefore, all analyses rely exclusively on the dataset provided by the user.
During the initialization process, the consolidated database was manually uploaded using the “Import File” option available in the main panel of the interface. From that point on, the application’s functionalities operated on this local file, which already contained the essential bibliographic fields for performing calculations and generating visualizations. This approach ensures traceability, reproducibility, and methodological control over the information used in the subsequent analyses.
For the interactive data analysis stage, the Biblioshiny graphical interface, part of the Bibliometrix package in the R environment, was used. The application was executed locally after configuring the virtual environment through the renv manager, ensuring reproducibility and isolation of the packages used. To start the interface, it was necessary to explicitly load the library with library(bibliometrix), followed by the command biblioshiny(). This action launched a local server accessible via a web browser, where all exploratory analyses of the unified bibliographic database were carried out.
During the activation process, any additional dependencies were automatically installed, ensuring the full functionality of the application. It should be noted that Biblioshiny operates exclusively with locally stored files that have been previously structured and validated, without establishing external connections to scientific databases. This configuration guarantees full control over the analyzed corpus and reinforces the criteria of transparency and scientific traceability required by the PRIA methodology adopted in this study.
After unifying the databases from PubMed, Web of Science, and Scopus, with a consolidated total of 298 articles, a critical verification step was carried out to ensure the compatibility of the resulting file for use in the Biblioshiny interface of the Bibliometrix package. During the export process to .xlsx format, an error was detected related to the character limit per cell imposed by Excel (32,767 characters). The analysis identified that only one cell, located in row 231 of the CR column (cited references), exceeded this limit, with a total of 36,364 characters.
To overcome this issue without compromising the overall integrity of the database, the content of this cell was truncated, limited to the first 200 characters. This measure was sufficient to allow the correct export of the .xlsx file without violating the technical standards of the writexl library.
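The check and the truncation can be sketched as follows. This Python fragment is illustrative only (the original export used the writexl library in R); the function names are hypothetical, the rows are synthetic, and the row index is zero-based rather than the spreadsheet row number.

```python
EXCEL_CELL_LIMIT = 32_767  # maximum characters Excel accepts in one cell


def find_oversized_cells(rows, limit=EXCEL_CELL_LIMIT):
    """Return (row_index, column, length) for every string cell above the limit."""
    return [
        (index, column, len(value))
        for index, row in enumerate(rows)
        for column, value in row.items()
        if isinstance(value, str) and len(value) > limit
    ]


def truncate_cells(rows, keep, limit=EXCEL_CELL_LIMIT):
    """Shorten only the oversized cells to their first `keep` characters."""
    for index, column, _ in find_oversized_cells(rows, limit):
        rows[index][column] = rows[index][column][:keep]
    return rows


# Synthetic rows: the second CR cell mimics the 36,364-character case.
rows = [
    {"TI": "Short title", "CR": "ref; " * 10},
    {"TI": "Another title", "CR": "x" * 36_364},
]
oversized = find_oversized_cells(rows)
truncate_cells(rows, keep=200)
print(oversized)
```

Because only the flagged cell is touched, all other cited-reference fields remain intact; the truncated record simply contributes fewer references to analyses that depend on CR.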
Subsequently, a consistency analysis of the unified database structure was carried out, verifying:
the existence of completely empty columns;
the presence of mandatory fields;
the standardization of variable names compatible with Bibliometrix/Biblioshiny requirements;
the UTF-8 structure of the file and overall data integrity.
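A minimal sketch of such a consistency check is given below in Python, as an illustration rather than the procedure actually run. The list of mandatory fields follows the pre-processing table (AU, TI, SO, DI, PY, CR); the function name and the sample rows are hypothetical.

```python
MANDATORY_FIELDS = ("AU", "TI", "SO", "DI", "PY", "CR")


def check_structure(rows, mandatory=MANDATORY_FIELDS):
    """Report empty columns, missing mandatory fields, and non-UTF-8 cells."""
    columns = set().union(*(row.keys() for row in rows))
    empty_columns = sorted(
        col for col in columns if all(row.get(col) in (None, "") for row in rows)
    )
    missing_fields = sorted(set(mandatory) - columns)
    bad_encoding = []
    for index, row in enumerate(rows):
        for col, value in row.items():
            if isinstance(value, str):
                try:
                    value.encode("utf-8")
                except UnicodeEncodeError:
                    bad_encoding.append((index, col))
    return {
        "empty_columns": empty_columns,
        "missing_fields": missing_fields,
        "bad_encoding": bad_encoding,
    }


# Invented rows: WC is empty everywhere and CR is absent.
rows = [
    {"AU": "A", "TI": "T1", "SO": "S", "DI": "d1", "PY": "2020", "WC": ""},
    {"AU": "B", "TI": "T2", "SO": "S", "DI": "", "PY": "2021", "WC": ""},
]
report = check_structure(rows)
print(report)
```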
It was verified that, except for the previously adjusted cell, all other columns were properly structured and no blocking inconsistencies were found. The file was saved in the official project directory under the name:
base bibliometrica_unificada.xlsx
It was also confirmed that the file name does not interfere with import into Biblioshiny, provided that:
the format is correct (.xlsx, .bib, or .csv);
the first row contains the column names;
and the fields are properly standardized.
Thus, it was concluded that the database was ready to be uploaded into Biblioshiny for the execution of the bibliometric analyses.
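The structural checks listed above (empty columns and mandatory fields) can be approximated with a short routine. The sketch below is in Python and is only an illustration under stated assumptions: the function name is hypothetical, the two-letter tags follow the Web of Science field-tag convention used by Bibliometrix, and the set of fields treated as mandatory here is an assumption, not the list applied in the study.

```python
def check_table(rows, required_columns):
    """Report structural problems in a table stored as a list of dicts."""
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    def is_blank(value):
        return value is None or str(value).strip() == ""

    # A column is "empty" when every record has a blank value for it.
    empty_columns = [c for c in columns if all(is_blank(r.get(c)) for r in rows)]
    # A required column is "missing" when no record carries it at all.
    missing_columns = [c for c in required_columns if c not in columns]
    return {"empty_columns": empty_columns, "missing_columns": missing_columns}


# Hypothetical records; the required-field list is an assumption.
sample_rows = [
    {"AU": "AUTHOR A", "TI": "Title A", "PY": 2020, "SO": "", "CR": "ref 1"},
    {"AU": "AUTHOR B", "TI": "Title B", "PY": 2021, "SO": "", "CR": ""},
]
check_result = check_table(sample_rows, ["AU", "TI", "SO", "PY", "DE"])
print(check_result)
```

In this toy table the SO column exists but is blank in every record, and the DE column is absent, so both would be flagged before any upload.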
To ensure traceability, reproducibility, and transparency of all stages of the bibliometric study, a public repository was created on GitHub containing the scripts used for preprocessing, importing, and unifying the databases, as well as the consolidated bibliometric dataset and the export generated by Biblioshiny. The repository can be accessed at: https://github.com/ProfMacielDavid/Capitulo_01_Estudo-Bibliom-trico_Enzimas-Microbianas-na-Ind.-Bioquimica-e-suas-tend-ncias. This repository is licensed under the MIT license, allowing free academic use with proper credit.
ANALYSIS
After the validation and import of the data, the initial panel of Biblioshiny presents a summary of the main bibliometric indicators of the analyzed collection, as illustrated in Figure 11. These data provide an overview of the scientific production on microbial enzymes in the context of industrial biotechnology between 2000 and 2024.
Figure 11 – General indicators of scientific production (Biblioshiny)
A total of 297 documents was identified for the 2000–2024 period, with an average annual growth rate of 1.26%, indicating a relatively stable line of scientific development. The 139 sources and 885 authors reveal both a diversity of journals and a broad base of contributing researchers.
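For readers unfamiliar with the indicator, an annual growth rate of this kind is commonly computed as a compound rate between the first and last year of the series. The sketch below, in Python, assumes that definition; the exact formula applied by Biblioshiny may vary between versions, and the yearly counts are hypothetical, not the study's data.

```python
def annual_growth_rate(counts_by_year):
    """Compound annual growth rate (%) between the first and last year.

    `counts_by_year` maps a publication year to the number of documents.
    Only the two endpoint years enter the calculation.
    """
    years = sorted(counts_by_year)
    first, last = years[0], years[-1]
    periods = last - first
    if periods == 0 or counts_by_year[first] == 0:
        raise ValueError("need at least two years and a non-zero first-year count")
    ratio = counts_by_year[last] / counts_by_year[first]
    return (ratio ** (1 / periods) - 1) * 100


# Hypothetical counts (not the corpus analyzed in this study).
toy_counts = {2000: 10, 2001: 12, 2002: 9, 2003: 15, 2004: 20}
toy_rate = annual_growth_rate(toy_counts)
print(round(toy_rate, 2))
```

Because the rate depends only on the endpoint years, a modest value can coexist with pronounced fluctuations in between, and an incompletely indexed final year pulls it down.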
The international co-authorship indicator (14.81%) highlights the presence of global partnerships in the field, while the average of 5.18 authors per document reflects the collaborative nature of research in industrial biotechnology. Finally, the average of 45.29 citations per document and the mean age of 7.13 years per publication suggest the existence of high-impact works that remain relevant within the scientific community.
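The collaboration indicators quoted above can be read more precisely with a small example. The Python sketch below assumes that the authors-per-document figure counts author appearances (an author is counted once in each document signed) and that a document is international when its affiliations span more than one country; the record layout and sample data are hypothetical.

```python
def collaboration_indicators(documents):
    """Compute simple collaboration indicators.

    `documents` is a list of dicts with keys:
      'authors'   -> list of author names on the document
      'countries' -> list of affiliation countries on the document
    """
    n_docs = len(documents)
    unique_authors = {name for doc in documents for name in doc["authors"]}
    appearances = sum(len(doc["authors"]) for doc in documents)
    international = sum(1 for doc in documents if len(set(doc["countries"])) > 1)
    return {
        "unique_authors": len(unique_authors),
        "coauthors_per_doc": appearances / n_docs,
        "international_pct": 100 * international / n_docs,
    }


# Hypothetical three-document corpus.
toy_docs = [
    {"authors": ["A", "B", "C"], "countries": ["X", "Y"]},
    {"authors": ["A", "D"], "countries": ["X"]},
    {"authors": ["B"], "countries": ["Y"]},
]
toy_indicators = collaboration_indicators(toy_docs)
print(toy_indicators)
```

The distinction matters for this corpus: 885 unique authors over 297 documents gives about 2.98 unique authors per document, so the reported 5.18 is consistent with a count of author appearances, in which recurring authors are counted in every document they sign.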
Scientific production is one of the main indicators of the dynamics of evolution and consolidation of a research field. Figure 12 presents the annual distribution of the number of articles published on the topic “microbial enzymes in industrial biotechnology,” covering the period from 2000 to 2024, based on the unified database processed through Biblioshiny.
Figure 12 – Annual evolution of scientific production on microbial enzymes in industrial biotechnology (2000–2024)
The analysis of Figure 12 shows a gradual increase in scientific production between 2000 and 2010, followed by a pattern of fluctuations with an upward trend in the subsequent years. Three marked growth milestones stand out: the years 2016, 2022, and especially 2024, which recorded the highest number of publications in the period (over 40 articles).
These increases may be related to the advancement of emerging technologies, such as enzymatic biocatalysis and genetic engineering applied to industrial microbiology, as well as to the growing appreciation of sustainable biotechnological solutions within the context of the bioeconomy.
The interrelationship between authors, countries, and keywords is illustrated in Figure 13, allowing for an understanding of the main axes of collaboration and thematic focus of recent research.
Figure 13 – Three-field plot showing relationships among authors, countries, and frequent title words.
Figure 13 shows the network of connections between the most productive authors, the countries in which they operate, and the main terms used in the titles of articles on microbial enzymes in industrial biotechnology. It is observed that India, China, Brazil, and the United States are the countries with the highest concentration of active authors in this field, highlighting the strong role of the Global South and emerging economies in advancing research applied to biocatalysis and waste utilization.
Authors such as Martins LO (Brazil), Kulkarni BD, and Satyanarayana T (India) stand out for contributing several publications and are connected to keywords such as enzymes, production, microbial, and industrial, which highlight the centrality of microbial processes in biotechnological production.
The thematic diversity ranges from practical applications (waste, biorefinery, valorization) to more technical concepts (characterization, enzymatic, lignocellulosic), indicating the breadth of research, both in the structural understanding of enzymes and in their use in sustainable industrial processes.
In addition, the visualization shows that countries such as Egypt, Iran, and Nigeria also appear connected, although with lower density, which may indicate growing research activity in these regions or occasional collaborations with more consolidated centers.
This analysis demonstrates that the field is undergoing broad global expansion, with a strong focus on the bioeconomy, waste treatment, and the replacement of chemical processes with more sustainable bioprocesses.
The journals that published the most on microbial enzymes applied to industrial biotechnology are presented in Figure 14, highlighting the main periodicals that concentrate scientific production on the topic.
Figure 14 – Leading journals by number of publications on microbial enzymes in industrial biotechnology.
Figure 14 shows the most productive journals in the field of microbial enzymes applied to industrial biotechnology, based on the frequency of published articles. The highlight is Bioresource Technology, with 18 publications, consolidating itself as the main vehicle for scientific dissemination in this domain. This journal has a scope focused on sustainable technologies for waste conversion and renewable resources, aligning perfectly with the topic of microbial enzymes in industrial applications.
Next, Applied Microbiology and Biotechnology (15 articles) and Applied Biochemistry and Biotechnology (12 articles) stand out, both with a strong tradition in bioprocesses and industrial applications of microorganisms and enzymes. This concentration highlights the centrality of journals operating at the intersection of applied microbiology, biochemical engineering, and technological innovation.
Other relevant journals include Applied and Environmental Microbiology, Frontiers in Bioengineering and Biotechnology, and the International Journal of Biological Macromolecules, each with six publications. The balance observed among journals focused on environmental impact, biocatalysis, and bioengineering demonstrates the diversity of approaches and the multidisciplinarity of the field.
Finally, the presence of journals such as Journal of Environmental Management, PLOS ONE, Biotechnology for Biofuels, and Chemosphere indicates a growing interest in sustainable applications, reinforcing the importance of microbial enzymes in the circular bioeconomy and in the reutilization of organic waste.
The distribution of publications according to Bradford’s Law is presented in Figure 15, allowing the identification of the core journals that concentrate most of the scientific production on microbial enzymes applied to industrial biotechnology.
Figure 15 – Core sources according to Bradford’s Law
Figure 15 applies Bradford’s Law to identify the most relevant journals in the field of microbial enzymes and industrial biotechnology. The shaded area of the graph represents the so-called Bradford core sources, which concentrate the journals with the highest number of published articles on the subject. These journals are the most strategic for monitoring the main scientific trends and advances in the field.
The three journals that clearly stand out as core sources are:
Bioresource Technology;
Applied Microbiology and Biotechnology;
Applied Biochemistry and Biotechnology.
These journals had already been highlighted in Figure 14, and their recurrence now confirmed by Bradford’s Law reinforces their centrality in disseminating knowledge on the subject. They represent the outlets with the highest density of publications and are essential for monitoring scientific updates on microbial enzymes applied to industrial processes, including biocatalysis, waste recovery, and biotechnological production.
The shape of the curve also reinforces Bradford’s premise: as one moves down the ranked list of journals, the number of articles per source decreases sharply. This suggests that the literature in the field is relatively concentrated in a few high-impact journals, which facilitates the identification of key repositories for researchers and professionals in the sector.
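The zoning logic behind a Bradford plot can be illustrated with a short routine: sources are ranked by decreasing output and split into zones that each hold roughly one third of the articles, so the core zone contains the fewest sources. The Python sketch below follows that textbook description; the cut-off rule used internally by Bibliometrix may differ, and the journal labels and counts are hypothetical.

```python
def bradford_zones(articles_per_source, n_zones=3):
    """Assign each source to a Bradford zone (1 = core).

    Sources are ranked by decreasing article count (ties broken by name);
    a source moves to the next zone once the cumulative article count has
    passed another 1/n_zones share of the total.
    """
    ranked = sorted(articles_per_source.items(), key=lambda item: (-item[1], item[0]))
    total = sum(articles_per_source.values())
    zones, cumulative = {}, 0
    for source, count in ranked:
        cumulative += count
        zones[source] = min(n_zones, 1 + ((cumulative - 1) * n_zones) // total)
    return zones


# Hypothetical output of eight journals (18 articles in total).
toy_sources = {"J1": 6, "J2": 4, "J3": 2, "J4": 2, "J5": 1, "J6": 1, "J7": 1, "J8": 1}
toy_zones = bradford_zones(toy_sources)
print(toy_zones)
```

In this toy case one journal forms the core, two fill the second zone, and five are needed for the third, which mirrors the steep decline in articles per source described above.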
This result contributes to the definition of more efficient strategies for scientific monitoring, journal selection for article submission, and access to relevant content in the context of enzymatic biotechnology.
The evaluation of the local impact of the sources, measured by the h-index, is presented in Figure 16. This metric makes it possible to identify the journals with the greatest influence within the analyzed set of documents, revealing the outlets of highest scientific relevance in the specific context of the study.
Figure 16 – Local impact of sources (h-index)
Figure 16 presents the distribution of the local impact of journals based on the h-index, which combines productivity and impact by considering both the number of publications and their respective citations. This approach reveals which journals stand out not only for the quantity of articles but also for the relevance these articles hold within the analyzed corpus.
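As a reminder of how the metric works, a source has h-index h when h of its documents in the corpus have at least h citations each. The Python sketch below implements that standard definition; the citation counts are hypothetical and the function name is illustrative.

```python
def h_index(citations):
    """Largest h such that h documents have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


# Hypothetical citation counts for six documents of one journal.
toy_citations = [25, 8, 5, 3, 3, 0]
toy_h = h_index(toy_citations)
toy_total = sum(toy_citations)
print(toy_h, toy_total)
```

Two properties follow directly from the definition: the local h-index can never exceed the number of documents a source has in the corpus, and a single heavily cited document raises total citations without raising h, which is why rankings by h-index and by total citations can differ.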
Three journals stand out with the highest h-index values:
Bioresource Technology (H = 12);
Applied Microbiology and Biotechnology (H = 10);
Applied Biochemistry and Biotechnology (H = 9).
These three journals constitute the high-impact core of the studied field, reaffirming their centrality already evidenced in previous analyses (Figures 14 and 15). Furthermore, the high h-index confirms that the articles published in these journals are not only numerous but also consistently cited, indicating that these outlets concentrate relevant and reference-worthy scientific contributions.
Other journals, such as Applied and Environmental Microbiology, Biotechnology for Biofuels, Frontiers in Bioengineering and Biotechnology, and Journal of Environmental Management, presented an h-index of 4, signaling significant presence but with lower impact density compared to the leading journals in the ranking.
These data reinforce the importance of directing future submissions and bibliographic searches to the journals with the highest local h-index, since they play a central role in consolidating knowledge on biocatalysis, waste recovery, fermentation, microbial enzymes, and their industrial applications.
Figure 17 presents the impact of scientific sources based on the total accumulated citations (TC – Total Citations) within the analyzed corpus, highlighting the most influential journals in disseminating knowledge on microbial enzymes and industrial biotechnology.
Figure 17 – Source impact by total citations (TC)
The Total Citations (TC) metric, shown in Figure 17, highlights the global impact of publications from each journal in the analyzed thematic field. Unlike the h-index, which measures local consistency, TC indicates the reach and external influence of published contributions.
The journal Biotechnology for Biofuels stands out prominently with 1,998 citations, taking the lead in impact. This result suggests that the articles published in this source not only address central themes of biocatalysis and biofuels but are also highly referenced by other researchers in the field, indicating high visibility and international relevance.
Other noteworthy sources include:
Applied Microbiology and Biotechnology (856 citations);
Microbial Cell Factories (824);
Bioresource Technology (697);
Trends in Biotechnology (618).
The presence of the journals Science and Nature Reviews Microbiology, although with a lower frequency of documents, reinforces the transversality of the topic and its connection with high-impact research in both basic and applied science.
The analysis of TC broadens the understanding of the influence of publications and allows the development of strategies for future submissions and literature reviews, prioritizing journals with greater reach and scientific recognition.
Figure 18 presents the authors with the highest number of publications in the field of microbial enzymes and biocatalysis applied to industrial biotechnology, according to the number of documents recorded in the analyzed corpus.
Figure 18 – Most relevant authors by number of publications
The analysis of author productivity, shown in Figure 18, reveals the main researchers leading publications in the studied thematic area. Author Sharma S stands out with 9 publications, followed by Martins LO and Satyanarayana T, both with 6 documents, indicating strong individual contributions and recurring involvement in research related to the application of microbial enzymes.
Next are authors with 5 publications, such as Kumar V and Singh S, along with a group of researchers with 4 and 3 articles each, revealing a relatively homogeneous distribution of contributions among different research groups. This pattern indicates that, although evident leaders exist, the field is not overly concentrated in a few authors, which reinforces its collaborative and multidisciplinary character.
This panorama indicates the existence of a consolidated group of authors who systematically contribute to the scientific advancement of the topic, possibly being associated with reference laboratories, international collaboration networks, or projects focused on innovation in industrial bioprocesses.
Understanding who the most productive authors are enables the identification of scientific leaders and potential strategic partners for academic and institutional collaboration in future projects.
Figure 19 presents the distribution of author productivity in the analyzed corpus based on Lotka’s Law, highlighting the behavior of scientific production in the field of microbial enzymes applied to industrial biotechnology.
Figure 19 – Distribution of author productivity according to Lotka’s Law
Lotka’s Law states that, in a given field of knowledge, most authors publish only one article, while a small minority accounts for multiple publications; in its classical form, the number of authors with n publications is roughly 1/n² of the number with a single publication. The curve shown in Figure 19 is consistent with this pattern: more than 50% of the authors produced only one document, and the number of authors falls sharply as the number of published articles increases.
The solid line represents the empirical data obtained, whereas the dashed line shows the theoretical distribution predicted by Lotka. The partial overlap between the two indicates reasonable adherence of the dataset to the model, supporting the hypothesis that scientific productivity is concentrated in a small share of authors.
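The theoretical curve can be reproduced from the classical inverse-square form of the law. The Python sketch below fixes the exponent at 2 as an assumption (software implementations typically estimate it from the data), and the observed counts are hypothetical, not those of Figure 19.

```python
def lotka_expected(authors_with_one, max_papers, exponent=2.0):
    """Expected number of authors with n papers under Lotka's law.

    Uses expected(n) = authors_with_one / n ** exponent,
    for n = 1 .. max_papers. The exponent of 2 is the classical value.
    """
    return {n: authors_with_one / n ** exponent for n in range(1, max_papers + 1)}


# Hypothetical observed distribution: papers per author -> number of authors.
toy_observed = {1: 100, 2: 22, 3: 12, 4: 5}
toy_expected = lotka_expected(toy_observed[1], max_papers=4)

for n in sorted(toy_observed):
    print(n, toy_observed[n], round(toy_expected[n], 2))
```

Comparing the two columns row by row is the numerical counterpart of inspecting the overlap between the solid and dashed lines.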
This pattern is typical of specialized research fields, such as the study of microbial enzymes and biocatalysis, where a few researchers lead scientific production over time. It also highlights the presence of consolidated groups and key authors, already identified in Figure 18.
This analysis is important for understanding the dynamics of scientific production and the concentration of knowledge in certain centers of excellence, serving as a basis for strategic actions in collaboration, funding, and the development of academic networks.
Figure 20 presents the word cloud generated from the corpus of analyzed articles, allowing the identification of the most frequent and recurring terms in the field of study on microbial enzymes in industrial biotechnology.
Figure 20 – Word cloud of the analyzed corpus
The word cloud provides a synthetic and visual overview of the most frequent terms in the publications, reflecting the main thematic foci of the analyzed field. The most prominent terms include “biotechnology,” “enzymes,” “biorefinery,” “enzyme activity,” “fermentation,” and “nonhuman,” terms directly associated with the use of microbial enzymes in industrial processes.
The prominent presence of “biomass,” “lignocellulose,” “lignin,” “cellulose,” and “biofuels” suggests a strong emphasis on the valorization of lignocellulosic residues and their conversion through fermentative and enzymatic processes, consistent with advances in biotechnology aimed at sustainability and the circular economy.
In addition, terms such as “enzyme immobilization,” “catalysis,” “hydrolysis,” and “biocatalyst” point to technological approaches aimed at increasing enzyme efficiency and stability in industrial contexts.
The term “nonhuman,” which is visually prominent, most likely originates from database indexing rather than from author keywords. Its frequency suggests the predominance of in vitro experiments with microorganisms or plant biomass and reinforces the non-clinical nature of most studies, which are directed toward applications in environmental or industrial biotechnology.
Finally, the recurrence of words such as “review” and “article” indicates a well-consolidated bibliographic body, with a significant number of systematic or narrative reviews that are useful for mapping the state of the art.
This graphical representation reinforces thematic coherence and the convergence of studies around sustainable enzymatic solutions applied to biomass conversion and the production of biofuels and bioproducts.
Figure 21 presents the co-occurrence network of keywords from the analyzed documents, highlighting the conceptual structure of the field of study through thematic clusters.
Figure 21 – Keyword co-occurrence network with thematic clusters
The keyword co-occurrence network highlights two main clusters in the field of industrial biotechnology focusing on microbial enzymes. The red cluster, located at the top of the image, brings together terms most directly related to the biological and enzymatic aspects of the topic, with emphasis on the terms “nonhuman,” “enzymes,” “fermentation,” “microbial enzyme,” and “metabolism.” These terms indicate a conceptual core centered on experimental research, particularly in laboratory contexts, with a focus on enzymatic activities, metabolic regulation, and applications in controlled fermentation.
The blue cluster, positioned at the bottom, focuses on terms such as “biomass,” “lignin,” “biorefinery,” “biofuels,” “hydrolysis,” and “lignocellulose,” referring to industrial and environmental applications, especially those aimed at plant biomass conversion. This cluster represents the use of enzymes as biocatalysts in the processing of agro-industrial residues and the production of clean energy, aligning with the circular economy and the bioeconomy.
The word “biotechnology,” located at the center of the network, acts as a semantic bridge between the two groups, revealing the interdisciplinarity between basic science (enzymology) and technological application (bioprocesses).
The strong interconnection among the nodes demonstrates the thematic maturity of the field, with a high density of relationships between topics, and shows that the area of microbial enzymes in industrial biotechnology is consolidated from both theoretical and applied perspectives.
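The raw material of such a network is a count of how often two keywords appear in the same document. The Python sketch below shows only that counting step, under the assumption that keywords are compared case-insensitively and counted once per document; the clustering and layout performed by Biblioshiny are not reproduced, and the keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count, for every keyword pair, the documents in which both appear."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace; count each keyword once per document.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical keyword lists for four documents.
toy_keywords = [
    ["Biotechnology", "Enzymes", "Fermentation"],
    ["biotechnology", "biomass", "hydrolysis"],
    ["Biotechnology", "Enzymes"],
    ["biomass", "hydrolysis", "biofuels"],
]
toy_pairs = keyword_cooccurrence(toy_keywords)

# Number of distinct neighbors per keyword (its degree in the network).
toy_degree = Counter()
for first, second in toy_pairs:
    toy_degree[first] += 1
    toy_degree[second] += 1
print(toy_pairs.most_common(2), toy_degree["biotechnology"])
```

In this toy corpus "biotechnology" is linked to terms from both the enzyme-oriented and the biomass-oriented documents, which is the kind of bridging position described for the real network.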
Figure 22 presents the factorial analysis of the keywords extracted from the analyzed documents, allowing the identification of the most representative conceptual clusters in the scientific literature on microbial enzymes in industrial biotechnology. The factorial analysis of keywords, based on co-occurrence correspondence, reveals thematic clusters organized along factorial dimension 1 (61.15%) and dimension 2 (15.61%).
Figure 22 – Factorial analysis of keywords by co-occurrence
The bidimensional factorial analysis presented in Figure 22 reveals the underlying conceptual structure of scientific production on microbial enzymes in industrial biotechnology. The distribution of terms across the two dimensions (Dim 1 and Dim 2) makes it possible to observe five main semantic clusters, highlighting the thematic heterogeneity of the field.
In the upper right quadrant, a cluster is observed centered on biomass conversion and bioenergy, with terms such as biomass, bioethanol, biofuel, biorefineries, and enzymatic hydrolysis, reflecting the focus on sustainable technologies for obtaining biofuels from lignocellulosic residues.
In the lower right quadrant, a group related to biocatalysis and enzymatic activity stands out, with terms such as enzymes, enzyme activity, nonhuman, biocatalyst, and metabolism. This thematic concentration indicates studies focused on enzyme kinetics, microorganism screening, and optimal operating conditions.
At the center, terms such as fermentation, hydrolysis, and biotechnology appear, functioning as linking pivots between different thematic domains. The lower left quadrant is dominated by terms such as industrial biotechnology and solid-state fermentation, which reinforce the interest in industrial applications and high-efficiency fermentative processes.
Finally, the upper left quadrant, although less dense, groups terms such as purification and valorization, suggesting approaches aimed at improving downstream processes and adding value to final products.
The analysis shows that the field is structured around two main axes: (i) industrial and energy applications of microbial enzymes, and (ii) mechanistic and biochemical investigations of enzymatic activities, confirming the interdisciplinarity and maturity of the field.
CONCLUSION
The rigorous application of the method made it possible to build a consolidated and auditable corpus, integrating Web of Science, Scopus, and PubMed. After syntactic refinements of the search strings, metadata normalization, and deduplication by DOI/title-year, an analytical base of 297 documents (2000–2024) was obtained. Data cleaning included addressing occasional DOI absences, standardizing critical fields, and correcting one record with excessively long references, ensuring traceability and reproducibility of the analyses performed in Biblioshiny.
The descriptive indicators demonstrate consistent growth in production, with recent peaks and an average annual growth rate of 1.26%, an average of 5.18 co-authors per document, and 14.81% international co-authorship, indicating moderate and stable collaboration. The literature is distributed across 139 journals, with the Bradford core led by Bioresource Technology, Applied Microbiology and Biotechnology, and Applied Biochemistry and Biotechnology. In terms of impact, the h-index of the sources stands out (e.g., Bioresource Technology, h=12; Applied Microbiology and Biotechnology, h=10), along with robust total citations in Biotechnology for Biofuels, Applied Microbiology and Biotechnology, and Microbial Cell Factories, corroborating the centrality and visibility of the topic.
The thematic structure, derived from the word cloud, co-occurrence network, and factorial analysis, reveals two dominant axes. The first converges toward lignocellulosic biomass → enzymatic hydrolysis → biofuels (biomass, lignin, cellulase, hydrolysis, bioethanol/biofuel, biorefineries), reflecting the transition to the circular bioeconomy. The second encompasses biocatalysis and enzymatic activity (enzymes, enzyme activity, metabolism, biocatalyst), with emphasis on performance, immobilization, and process conditions. Complementary subthemes include solid-state fermentation, purification, and downstream valorization, suggesting an integrated value chain from laboratory research to industrial application.
In terms of authorship and leadership, a concentration consistent with Lotka’s Law was observed, in which a small number of authors account for a larger share of production (e.g., Sharma S, Martins LO, Satyanarayana T), supporting specialized groups that connect topics, countries, and terms in the three-field plot. This arrangement, combined with the core of highly cited journals, indicates a mature field with moderate methodological entry barriers and strong potential for technological transfer.
As limitations, metadata gaps were recorded (e.g., ~6% of missing DOIs) inherent to the databases, as well as the linguistic scope (English), which may underestimate local productions. Even so, the results provide a robust map of trends: (i) biomass deconstruction via cellulolytic/hemicellulolytic complexes; (ii) enzyme immobilization and thermostability for process intensification; and (iii) upstream–downstream integration in biorefineries. In summary, the field of microbial enzymes in the biochemical industry is expanding and structurally anchored in industrial biocatalysis and biomass conversion, with promising technoscientific trajectories for sustainable innovation.
BIBLIOGRAPHIC REFERENCES
Aria, M., & Cuccurullo, C. (2017). bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics, 11(4), 959–975. https://doi.org/10.1016/j.joi.2017.08.007
Bornscheuer, U. T., Huisman, G. W., Kazlauskas, R. J., Lutz, S., Moore, J. C., & Robins, K. (2012). Engineering the third wave of biocatalysis. Nature, 485(7397), 185–194. https://doi.org/10.1038/nature11117
Bradford, S. C. (1934). Sources of information on specific subjects. Engineering, 137, 85–86.
Garfield, E. (2006). The history and meaning of the journal impact factor. JAMA, 295(1), 90–93. https://doi.org/10.1001/jama.295.1.90
Lotka, A. J. (1926). The frequency distribution of scientific productivity. Journal of the Washington Academy of Sciences, 16(12), 317–323.
Sheldon, R. A., & van Pelt, S. (2013). Enzyme immobilisation in biocatalysis: Why, what and how. Chemical Society Reviews, 42(15), 6223–6235. https://doi.org/10.1039/C3CS60075K
Singh, R., Kumar, M., Mittal, A., & Mehta, P. K. (2016). Microbial enzymes: Industrial progress in 21st century. 3 Biotech, 6(2), 174. https://doi.org/10.1007/s13205-016-0485-8
Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365–391. https://doi.org/10.1016/j.joi.2016.02.007
Zhang, C., Zhang, R., Liu, G., & Peng, X. (2021). Advances in biomass conversion technologies and biorefineries. Renewable and Sustainable Energy Reviews, 139, 110685. https://doi.org/10.1016/j.rser.2020.110685
1 Master of Science in Emergent Technologies in Education, MUST University, USA. Doctoral student in the Doctoral Program in Regional Development and Environment (PGDRA/UNIR). E-mail: [email protected]
2 Nurse. Bachelor’s Degree in Nursing from Faculdade Interamericana de Porto Velho (UNIRON), Brazil. Postgraduate in Public Health Management (IFAM), Oncology (FAP), Urgency and Emergency (FAVENI), and Health Audit (FaHol/DNA)
3 Computer Engineering – UniSAPIENS. Porto Velho – RO.
4 PhD in Physics (UFC), with post-doctorate in Scientific Regional Development (DCR/CNPq). MBA in Software Engineering/Systems Analysis and Development – UniÚnica. Researcher of the Doctoral and Master Program in Regional Development and Environment (PGDRA/UNIR). E-mail: [email protected]