How to extract fonts of text in each page of a PDF file using PyPDF2 library version 3.0.1?
To extract fonts of text in each page of a PDF file using PyPDF2 library version 3.0.1, you can use the following code snippet:
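import PyPDF2

# Open in binary mode; the with-block closes the file when done
with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    for page_num, page in enumerate(pdf_reader.pages, start=1):
        # Not every page has a /Font dictionary (e.g. image-only pages)
        resources = page['/Resources'] if '/Resources' in page else {}
        fonts = resources['/Font'] if '/Font' in resources else {}
        font_list = list(fonts.keys())
        print(f'Fonts used in page {page_num}: {font_list}')
Here, we first open the PDF file in binary mode and create a PdfReader object. The older PdfFileReader, numPages, and getPage names were removed in PyPDF2 3.0.0, so version 3.0.1 needs PdfReader and its pages attribute instead. Then we loop through each page and get the fonts used on that page by accessing the '/Resources' and '/Font' keys of the page object, skipping pages that have no font dictionary. Note that the keys printed here are the resource names the page uses internally (such as /F1), not the real font names.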
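To get the actual font names rather than the resource keys, each font entry's '/BaseFont' value can be read. The following is a minimal sketch, not a definitive implementation: the helper names base_font_names and fonts_per_page are made up for illustration, and it only looks at fonts declared directly in a page's resources, so fonts used only inside embedded form XObjects would be missed.

```python
def base_font_names(page):
    """Return a sorted list of /BaseFont names referenced by one page.

    `page` is anything that behaves like a dict (a PyPDF2 page object does).
    Only `in` and `[]` are used, because PyPDF2 resolves indirect
    references on `[]` access.
    """
    if '/Resources' not in page:
        return []
    resources = page['/Resources']
    if '/Font' not in resources:
        return []
    fonts = resources['/Font']
    names = set()
    for key in fonts:
        font = fonts[key]
        # Some fonts (e.g. Type 3) may lack /BaseFont; fall back to the resource name
        names.add(str(font['/BaseFont']) if '/BaseFont' in font else str(key))
    return sorted(names)


def fonts_per_page(path):
    """Map 1-based page numbers to the font names found on each page."""
    # Imported here so base_font_names itself has no PyPDF2 dependency
    from PyPDF2 import PdfReader
    reader = PdfReader(path)
    return {n: base_font_names(p) for n, p in enumerate(reader.pages, start=1)}
```

Calling fonts_per_page('example.pdf') should return a dictionary keyed by page number. Embedded subset fonts usually show up with a six-letter prefix followed by a plus sign in front of the font name.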
I hope this helps! Let me know if you have any other questions.