How to extract the fonts used on each page of a PDF file using PyPDF2 version 3.0.1?

To list the fonts used on each page of a PDF file with PyPDF2 version 3.0.1, you can use the following code snippet:

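import PyPDF2

with open('example.pdf', 'rb') as pdf_file:
    # PyPDF2 3.x: PdfReader and .pages replace PdfFileReader, numPages and getPage
    pdf_reader = PyPDF2.PdfReader(pdf_file)

    for page_num, page in enumerate(pdf_reader.pages, start=1):
        # Not every page has a /Resources or /Font entry (e.g. image-only pages)
        if '/Resources' in page and '/Font' in page['/Resources']:
            font_list = list(page['/Resources']['/Font'].keys())
        else:
            font_list = []
        print(f'Fonts used in page {page_num}: {font_list}')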


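Here, we first open the PDF file in binary mode and create a reader object with PyPDF2. Then we loop through each page and read the fonts declared for that page from the '/Font' entry of its '/Resources' dictionary. The keys of that dictionary are the resource names the page uses to refer to its fonts (for example /F1 or /F2), not the typeface names themselves; the actual font name is usually stored in the '/BaseFont' entry of each font object. Note that PyPDF2 3.0.0 removed the older PdfFileReader, numPages and getPage names, so with version 3.0.1 the reader class is PdfReader and pages are accessed through its pages attribute.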
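If you want the actual font names rather than the resource keys, one possible approach is to look up '/BaseFont' on each font object. The sketch below is only an illustration: the helper names base_font_names and print_fonts are made up for this example and are not part of PyPDF2, and it only looks at fonts declared directly in the page's '/Resources', so fonts used only inside embedded form XObjects would be missed.

```python
def base_font_names(page):
    """Return the sorted /BaseFont names declared in a page's /Font resources.

    `page` can be a PyPDF2 page object or any nested dict with the same keys.
    """
    # Use `in` and [] rather than .get(): on PyPDF2 objects, [] resolves
    # indirect references, which plain dict.get() does not.
    if '/Resources' not in page or '/Font' not in page['/Resources']:
        return []
    fonts = page['/Resources']['/Font']
    names = set()
    for resource_name in fonts:
        font = fonts[resource_name]
        if '/BaseFont' in font:
            names.add(str(font['/BaseFont']))
        else:
            # Some fonts (e.g. Type 3) may have no /BaseFont entry;
            # fall back to the resource name in that case.
            names.add(str(resource_name))
    return sorted(names)


def print_fonts(path):
    """Print the font names found on each page of the PDF at `path`."""
    # Imported here so base_font_names can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    for page_num, page in enumerate(reader.pages, start=1):
        print(f'Fonts used in page {page_num}: {base_font_names(page)}')
```

Calling print_fonts('example.pdf') prints one line per page. Names of subset-embedded fonts typically carry a six-letter prefix followed by a plus sign, which you can strip if you only need the typeface name.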

I hope this helps! Let me know if you have any other questions.
