How to extract fonts of text in each page of a PDF file using PyPDF2 library version 3.0.1?

April 14, 2023

To list the fonts used on each page of a PDF file with PyPDF2 version 3.0.1, you can use the following code snippet:

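import PyPDF2

# PyPDF2 3.x removed PdfFileReader, numPages and getPage();
# use PdfReader and the reader.pages sequence instead.
with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)

    for page_num, page in enumerate(pdf_reader.pages, start=1):
        # Not every page has a /Resources or /Font entry.
        resources = page['/Resources'] if '/Resources' in page else {}
        fonts = resources['/Font'] if '/Font' in resources else {}
        font_list = list(fonts.keys())

        print(f'Fonts used in page {page_num}: {font_list}')

Here, we first open the PDF file in binary mode and create a PdfReader object. PyPDF2 3.x removed the older PdfFileReader class along with numPages and getPage(), so those names no longer work in version 3.0.1. Then we loop over pdf_reader.pages and read the font dictionary of each page through its ‘/Resources’ and ‘/Font’ keys, skipping pages that have neither. Note that the keys of this dictionary are the resource names the page uses to refer to its fonts (for example /F1), not the font names themselves. The actual font name is usually stored in each font's ‘/BaseFont’ entry.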
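If you need the font names rather than the resource keys, the following is a minimal sketch of one way to read them. The helper names base_font_names and print_fonts are made up for this example and are not part of PyPDF2. The sketch assumes each font dictionary exposes its name under ‘/BaseFont’, which some fonts (Type 3 fonts, for instance) do not have, and it does not handle resources inherited from a parent node of the page tree.

```python
def base_font_names(page):
    """Map each font resource name on a page (e.g. '/F1') to its /BaseFont.

    Hypothetical helper: works on a PyPDF2 page object or any nested
    dict-like object with the same keys.
    """
    if "/Resources" not in page:
        return {}
    resources = page["/Resources"]
    if "/Font" not in resources:
        return {}
    fonts = resources["/Font"]

    names = {}
    for key in fonts:
        font = fonts[key]  # indexing resolves indirect references in PyPDF2
        # Fonts without a /BaseFont entry are reported as None.
        names[str(key)] = str(font["/BaseFont"]) if "/BaseFont" in font else None
    return names


def print_fonts(path):
    """Print the font mapping for every page of the PDF at `path`."""
    # Imported here so base_font_names itself has no PyPDF2 dependency.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    for number, page in enumerate(reader.pages, start=1):
        print(f"Page {number}: {base_font_names(page)}")
```

Calling print_fonts('example.pdf') would then print one dictionary per page, with resource names as keys and base font names as values.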

I hope this helps! Let me know if you have any other questions.
