In this Python tutorial, we will discuss what PyPDF2 is and walk through the methods of the PdfFileWriter class with Python examples.
We will learn about the PdfFileWriter class and its methods. It is the class from the PyPDF2 module that is widely used to write PDF files in Python. We will cover the following examples:
- How to Add Attachment in PDF in Python
- How to Add Blank Page in Python
- Add Bookmark to PDF in Python
- How to Add JavaScript to PDF file in Python
- How to Add Link to PDF file in Python
- Add Metadata to PDF file in Python using PyPDF2
- How to Add Pages to PDF file in Python
- Add Encryption to PDF file in Python using PyPDF2
- Get Number of Pages using PyPDF2 in Python
- How to get page with Page Number from PDF file in Python
- Get Page Layout of PDF file in Python
- Get Page Mode of PDF file in Python
- Insert Blank Page to PDF file in Python
- Insert Page to PDF file in Python
- Remove Images from PDF in Python
- Remove Links from PDF in Python using PyPDF2
- Set Page Layout of PDF in Python using PyPDF2
- Set Page Mode of PDF in Python
- Update Page Form Field Values of Interactive PDF using PyPDF2 in Python
- Write to a PDF in Python
PyPDF2 Python Library
- Python is used for a wide variety of purposes and has libraries and classes for almost every kind of task. One of these tasks is working with PDF files, for example reading their text.
- PyPDF2 offers classes that help us read, merge, and write PDF files.
- PdfFileReader is used to perform all the operations related to reading a file.
- PdfFileMerger is used to merge multiple PDF files together.
- PdfFileWriter is used to perform write operations on a PDF.
- All of these classes have various methods that let a programmer control a PDF and perform operations on it.
- PyPDF2 was not updated for several years, so a few of its functions may be broken on newer Python versions, but it is still widely used to work with PDFs. In this tutorial, we will cover everything about the PdfFileWriter class and point out which functions are deprecated or broken.
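Install PyPDF2 in Python
To use the PyPDF2 library in Python, we first need to install it. Run the following command to install the PyPDF2 module on your system:
```shell
pip install PyPDF2
```
The examples in this tutorial use the classic PyPDF2 1.x API (PdfFileWriter, PdfFileReader, and camelCase methods such as addBlankPage()). PyPDF2 3.0 and later raise a DeprecationError for these names, so to run the examples as written, install an earlier release, for example with pip install "PyPDF2<3.0".
After reading this tutorial, you will know every method of the PdfFileWriter class, and we will demonstrate the methods with examples.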
PdfFileWriter in Python
- This class supports writing PDF files out, given pages produced by another class (typically PdfFileReader).
- It provides various methods that help us write PDF files.
- The first step towards using this class is importing the PyPDF2 module:
```python
import PyPDF2
```
- The next step is to create an instance of the class:
```python
writer = PyPDF2.PdfFileWriter()
```
Like the other PyPDF2 classes, PdfFileWriter has methods that let the programmer add various features to a program. Let's look at the available methods.
PdfFileWriter Python Examples
In this section, we will cover all the methods available in the PyPDF2.PdfFileWriter class and explain them with examples.
Please note that the PyPDF2 module was not updated for several years, so a few functions may be broken or may not work as expected. We are using Python 3.8, and we will leave a note wherever that happens.
1. Add Attachment in PDF file using PyPDF2 in Python
PyPDF2 provides the method addAttachment(fname, fdata), which adds an attachment to a PDF in Python.
- This function embeds an attachment in the PDF file. While writing a PDF in Python, if you want to add an attachment such as an image, a video, or a GIF, you can insert it using this function.
- It takes two arguments: the file name and the file data.
- Parameters:
- fname (str): the name to give the embedded file
- fdata (str): the data that you want to embed. On Python 3, pass the data as bytes (for example, the contents of a file opened in 'rb' mode), because it is written as-is into the binary PDF output.
2. Add Blank Page to PDF file in Python
PyPDF2 offers the method addBlankPage(width=None, height=None), which adds a blank page to a PDF in Python.
- Appends a blank page to the PDF file and returns it. If no page size is specified, the size of the last page is used.
- If no page size is specified and there is no last page to take the size from, a PageSizeNotDefinedError exception is raised.
- Parameters:
- width (float) – The width of the new page expressed in default user space units.
- height (float) – The height of the new page expressed in default user space units.
- Here is an example of addBlankPage() using PyPDF2 in Python.
Code Snippet:
In this code, we create a PDF file named 'blankPdf.pdf' and add 2 blank pages with different widths and heights.
```python
from PyPDF2 import PdfFileWriter

writer = PdfFileWriter()
# first blank page: 200 x 200 points
writer.addBlankPage(
    width=200,
    height=200
)
# second blank page: 100 x 500 points
writer.addBlankPage(
    width=100,
    height=500
)
# the with block closes the file, so everything is flushed to disk
with open('blankPdf.pdf', 'wb') as output:
    writer.write(output)
```
Output:
In this output, we have displayed the result of the addBlankPage() method using PyPDF2 in Python.
In the second picture, you can see that there are two blank pages with different heights and widths.
This is how we can add a blank page to a PDF file in Python using the PyPDF2 library.
3. Add Bookmark to PDF file in Python using PyPDF2
PyPDF2 in Python offers the method addBookmark(), which adds a bookmark to a PDF document in Python.
```python
addBookmark(
    title,
    pagenum,
    parent=None,
    color=None,
    bold=False,
    italic=False,
    fit='/Fit',
    *args
)
```
- This function adds a bookmark to the PDF file. Bookmarks make it easy to navigate between the pages of a PDF.
- Parameters:
- title (str) – Title to use for this bookmark.
- pagenum (int) – Page number this bookmark will point to.
- parent – A reference to a parent bookmark to create nested bookmarks.
- color (tuple) – Color of the bookmark as a red, green, blue tuple from 0.0 to 1.0
- bold (bool) – Bookmark is bold
- italic (bool) – Bookmark is italic
- fit (str) – The fit of the destination page.
- Here is an example of the addBookmark() method.
Code Snippet:
In this code, we use PdfFileReader to read the PDF file and then PdfFileWriter to add a bookmark and write a new PDF file using the PyPDF2 module.
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))
    # bookmark pointing at the first page (index 0) of the new file
    writer.addBookmark(
        title='Grade-1998',
        pagenum=0,
        parent=None,
        color=None,
        bold=True,
        italic=False,
        fit='/Fit',
    )
    # write while the input file is still open: copied pages are read lazily from it
    with open('NewGrades.pdf', 'wb') as output:
        writer.write(output)
```
Output:
In this output, we have opened the newly created PDF file, and you can see the bookmark appearing on the left.
This is how to add bookmark to PDF using PyPDF2 in Python.
4. Add JavaScript to PDF file using PyPDF2 in Python
PyPDF2 provides the method addJS(javascript), which adds JavaScript to a PDF file in Python.
- This function adds JavaScript that runs when the PDF is opened (in viewers that support JavaScript).
- It is mostly used when creating interactive PDFs.
- Parameters:
- javascript (str) – the JavaScript code to run
5. How to Add Link to PDF in Python using PyPDF2
PyPDF2 in Python provides the method addLink(pagenum, pagedest, rect, border=None, fit='/Fit', *args), which adds an internal link to a PDF page in Python.
- This function adds an internal link from a rectangular area to the specified page.
- Parameters:
- pagenum (int) – index of the page on which to place the link.
- pagedest (int) – index of the page to which the link should go.
- rect – RectangleObject or array of four integers specifying the clickable rectangular area [xLL, yLL, xUR, yUR], or string in the form “[ xLL yLL xUR yUR ]”.
- border – if provided, an array describing border-drawing properties: [horizontal corner radius, vertical corner radius, width], optionally followed by a dash array. See the PDF spec for details. No border will be drawn if this argument is omitted.
- fit (str) – Page fit or ‘zoom’ option (see below). Additional arguments may need to be supplied. Passing None will be read as a null value for that coordinate.
Available zoom options are as follows:
Fit option | Additional arguments |
---|---|
/Fit | No additional arguments |
/XYZ | [left] [top] [zoomFactor] |
/FitH | [top] |
/FitV | [left] |
/FitR | [left] [bottom] [right] [top] |
/FitB | No additional arguments |
/FitBH | [top] |
/FitBV | [left] |
Here is an example of the addLink() method of PyPDF2 in Python.
Code Snippet:
In this code, we read the PDF file using PdfFileReader in PyPDF2, add a link to it, and create NewGrades.pdf using PdfFileWriter.
```python
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import RectangleObject

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))  # new page 0: the link is placed here
    writer.addPage(reader.getPage(2))  # new page 1: the link points here
    writer.addLink(
        pagenum=0,
        pagedest=1,
        # clickable area [xLL, yLL, xUR, yUR] in points
        rect=RectangleObject([0, 20, 100, 120]),
        # [horizontal corner radius, vertical corner radius, width]
        border=[0, 0, 1],
        fit='/Fit'
    )
    with open('NewGrades.pdf', 'wb') as output:
        writer.write(output)
```
Output:
Here is the output of the above code using PyPDF2 in Python. In this code, we read Grades.pdf and create NewGrades.pdf, which has a link in it.
The second image shows the pages of the new PDF. There is a link on page 1 which, when clicked, takes the user directly to page 3 of Grades.pdf.
This is how to add a link to a PDF using PyPDF2 in Python.
6. How to Add Metadata to PDF file in Python using PyPDF2
PyPDF2 provides the method addMetadata(infos), which adds metadata to a PDF file in Python.
- This function adds custom metadata to the output.
- Parameters:
- infos (dict) – a Python dictionary where each key is a field and each value is new metadata.
- Here is an example of the addMetadata() method using PyPDF2 in Python.
Code Snippet:
```python
from PyPDF2 import PdfFileWriter, PdfFileReader
from PyPDF2.generic import RectangleObject

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))
    writer.addPage(reader.getPage(1))
    writer.addPage(reader.getPage(2))
    writer.addLink(
        pagenum=1,  # page on which the link will be placed
        pagedest=2,  # page to which the link will go
        rect=RectangleObject([0, 20, 100, 120]),
        border=[0, 0, 1],
        fit='/Fit'
    )
    # keys are PDF document-information names, so they start with a slash
    metadata = {
        '/Title': "Richard's Grade",
        '/Subject': "Richard's updated performance report",
        '/Author': 'PythonGuides'
    }
    writer.addMetadata(metadata)
    with open('NewGrades.pdf', 'wb') as output:
        writer.write(output)
```
Output:
In this output, the first image shows the PDF document properties without metadata.
The second picture shows the document properties with the metadata added.
This is how to add Metadata to PDF file using PyPDF2 in Python.
7. Add Pages to PDF file in Python using PyPDF2
PyPDF2 in Python provides the method addPage(page), which adds a page to a PDF document in Python.
- This function adds a page to the PDF file. The page is usually acquired from a PdfFileReader instance.
- Parameters: page (PageObject) – The page to add to the document. Should be an instance of PageObject.
8. Add Encryption to PDF file in Python using PyPDF2
PyPDF2 in Python provides the method encrypt(user_pwd, owner_pwd=None, use_128bit=True), which adds encryption to a PDF file in Python.
- Encrypts the PDF file with the PDF Standard encryption handler. Use this function to password-protect a PDF in Python.
- Parameters:
- user_pwd (str) – The “user password”, which allows for opening and reading the PDF file with the restrictions provided.
- owner_pwd (str) – The “owner password”, which allows for opening the PDF file without any restrictions. By default, the owner password is the same as the user password.
- use_128bit (bool) – flag as to whether to use 128-bit encryption. When false, 40-bit encryption will be used. By default, this flag is on.
- Here is an example of the encrypt() method of PdfFileWriter in the PyPDF2 module in Python.
Code Snippet:
In this code, we apply encryption to a PDF using PyPDF2 in Python.
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))
    writer.addPage(reader.getPage(1))
    writer.addPage(reader.getPage(2))
    # adding encryption: 'Test' opens the file with restrictions, 'test' without
    writer.encrypt(
        user_pwd='Test',
        owner_pwd='test',
        use_128bit=True
    )
    with open('NewGrades.pdf', 'wb') as output:
        writer.write(output)
```
Output:
In this output, the PDF has been secured with encryption. There is a lock on the PDF, and it requires a password to open it.
This is how we can password-protect a PDF file in Python using the PyPDF2 library.
9. How to Get Number of Pages of PDF file in Python
PyPDF2 provides the method getNumPages(), which returns the total number of pages in the writer.
Here is an example of the getNumPages() method using PyPDF2 in Python.
Code Snippet:
Here is the code for displaying the number of pages in a PDF using PyPDF2 in Python.
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))
    writer.addPage(reader.getPage(1))
    writer.addPage(reader.getPage(2))
    # three pages were added above, so this prints 3
    print(writer.getNumPages())
```
Output:
In this output, we added 3 pages to the PdfFileWriter with addPage(), so when we call getNumPages(), 3 is shown on the terminal.
This is how to get number of pages using PyPDF2 in Python.
10. Get Page with Page Number of PDF file in Python
PyPDF2 in Python provides the method getPage(pageNumber), which retrieves the page at a specific page number.
- Retrieves a page by number from the PDF file in Python. It returns the page at the index given by the page number.
- Parameters:
- pageNumber (int) – The page number to retrieve (pages begin at zero)
- Here is an example of the getPage() method using PyPDF2 in Python.
Code Snippet:
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))
    writer.addPage(reader.getPage(1))
    writer.addPage(reader.getPage(2))
    # index 1 is the second page; printing it shows the raw page dictionary
    print(writer.getPage(1))
```
Output:
In this output, the page at index 1 (the second page) is printed to the terminal as a raw PDF dictionary. To see its text in a human-readable format, use the extractText() method of the page object.
This is how to get a page by its page number using PyPDF2 in Python.
11. Get Page Layout of PDF file in Python
PyPDF2 in Python provides the method getPageLayout(), which returns the page layout currently used in the PDF document, or None if no layout has been set.
Here is the list of layouts that getPageLayout() can return:
Layout | Explanation |
---|---|
/NoLayout | Layout explicitly not specified |
/SinglePage | Show one page at a time |
/OneColumn | Show one column at a time |
/TwoColumnLeft | Show pages in two columns, odd-numbered pages on the left |
/TwoColumnRight | Show pages in two columns, odd-numbered pages on the right |
/TwoPageLeft | Show two pages at a time, odd-numbered pages on the left |
/TwoPageRight | Show two pages at a time, odd-numbered pages on the right |
Here is an example of the getPageLayout() method using PyPDF2 in Python.
Code Snippet:
In this code, we display the page layout of a PDF using the getPageLayout() method in PyPDF2.
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))
    writer.addPage(reader.getPage(1))
    writer.addPage(reader.getPage(2))
    # no layout has been set on this writer yet
    print(writer.getPageLayout())
```
Output:
In this output, None is displayed on the terminal. This means no page layout has been set for this PDF.
This is how to get the page layout of a PDF using PyPDF2 in Python.
12. Get Page Mode of PDF using PyPDF2 in Python
PyPDF2 offers a method getPageMode() which returns the current mode used in the PDF document in Python.
Here is the list of valid modes that the getPageMode() method of PyPDF2 can return:
Modes | Explanation |
---|---|
/UseNone | Do not show outlines or thumbnails panels |
/UseOutlines | Show outlines (aka bookmarks) panel |
/UseThumbs | Show page thumbnails panel |
/FullScreen | Fullscreen view |
/UseOC | Show Optional Content Group (OCG) panel |
/UseAttachments | Show attachments panel |
13. How to Insert Blank Page in Python
PyPDF2 offers the method insertBlankPage(width=None, height=None, index=0), which inserts a blank page into a PDF document in Python.
- Inserts a blank page into this PDF file and returns it. If no page size is specified, the size of the last page is used.
- If no page size is specified and there is no existing page to take the size from, an exception is raised:
Raises PageSizeNotDefinedError
- Parameters:
- width (float) – The width of the new page expressed in default user space units.
- height (float) – The height of the new page expressed in default user space units.
- index (int) – Position to add the page.
- Here is an example of the insertBlankPage() method using PyPDF2 in Python.
Code Snippet:
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))
    writer.addPage(reader.getPage(1))
    writer.addPage(reader.getPage(2))
    # insert a blank page at index 1, between the first and second pages
    writer.insertBlankPage(index=1)
    # insert another blank page at index 0, before all other pages
    writer.insertBlankPage(index=0)
    with open('NewGrades.pdf', 'wb') as output:
        writer.write(output)
```
Output:
In this output, we first show the PDF without any blank pages.
The next output shows the result after running this program: two blank pages have been inserted into the PDF.
This is how to insert blank page using PyPDF2 in Python.
14. Insert Page to PDF file in Python using PyPDF2
PyPDF2 offers a method insertPage(page, index=0) using which new page can be inserted in the PDF document in Python.
- Insert a page in the PDF file in Python. The page is usually acquired from a PdfFileReader instance.
- Parameters:
- page (PageObject) – The page to add to the document. This argument should be an instance of PageObject.
- index (int) – Position at which the page will be inserted.
15. Remove Images from PDF in Python using PyPDF2
PyPDF2 in Python provides the method removeImages(ignoreByteStringObject=False), which removes images from a PDF file in Python.
- Removes images from the output PDF.
- Parameters:
- ignoreByteStringObject (bool) – optional parameter to ignore ByteString objects
- Here is an example of the removeImages() method using PyPDF2 in Python.
Code Snippet:
In this code, we use the removeImages() method in PyPDF2 to remove images from the PDF in Python.
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Smallpdf.pdf', 'rb') as infile:
    newreader = PdfFileReader(infile)
    writer.addPage(newreader.getPage(0))
    # remove images from every page added to the writer so far
    writer.removeImages(ignoreByteStringObject=False)
    with open('Digital.pdf', 'wb') as output:
        writer.write(output)
```
Output:
This is the PDF that we are using in this example. In the next image, you will see the same PDF with all the images removed using PyPDF2 in Python.
The second image shows the PDF after the program is executed. Compare it with the first image and you will find that the images are missing.
This is how we can remove images from PDF file using PyPDF2 in Python.
16. Remove Links from PDF in Python using PyPDF2
PyPDF2 provides the method removeLinks(), which removes links from a PDF file in Python. It removes links and annotations from the output PDF.
Here is an example of the removeLinks() method using PyPDF2 in Python.
Code Snippet:
In this code, we remove all the links from the PDF using PyPDF2 in Python. Please note that the color of the linked text remains unchanged, but it no longer points to a page or web page.
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Smallpdf.pdf', 'rb') as infile:
    newreader = PdfFileReader(infile)
    writer.addPage(newreader.getPage(0))
    # strip links and other annotations from the pages added so far
    writer.removeLinks()
    with open('Digital.pdf', 'wb') as output:
        writer.write(output)
```
Output:
This is the PDF we are using; all the circled words are hyperlinks. Using the removeLinks() method, we will remove all the links from the document.
In this picture, the marked text is no longer a link. The text color remains the same, but the hyperlink has been removed by the removeLinks() method in PyPDF2.
This is how we can remove links from PDF file using PyPDF2 in Python.
17. Set Page Layout of PDF in Python
PyPDF2 in Python provides the method setPageLayout(layout), which assigns a page layout to a PDF in Python.
- Sets the page layout.
- Just as the getPageLayout() method lets us view the page layout, this method lets us assign one.
- Here is the list of layouts that can be applied:
Layout | Explanation |
---|---|
/NoLayout | Layout explicitly not specified |
/SinglePage | Show one page at a time |
/OneColumn | Show one column at a time |
/TwoColumnLeft | Show pages in two columns, odd-numbered pages on the left |
/TwoColumnRight | Show pages in two columns, odd-numbered pages on the right |
/TwoPageLeft | Show two pages at a time, odd-numbered pages on the left |
/TwoPageRight | Show two pages at a time, odd-numbered pages on the right |
Here is an example of the setPageLayout() method using PyPDF2 in Python.
Code Snippet:
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))
    writer.addPage(reader.getPage(1))
    writer.addPage(reader.getPage(2))
    # show pages in two columns, odd-numbered pages on the right
    writer.setPageLayout('/TwoColumnRight')
    with open('NewGrades.pdf', 'wb') as output:
        writer.write(output)
```
Output:
This is how to set page layout of PDF using PyPDF2 in Python.
18. Set Page Mode of PDF using PyPDF2 in Python
PyPDF2 provides the method setPageMode(mode), which sets the page mode of a PDF.
Here is the list of valid modes available:
Modes | Explanation |
---|---|
/UseNone | Do not show outlines or thumbnails panels |
/UseOutlines | Show outlines (aka bookmarks) panel |
/UseThumbs | Show page thumbnails panel |
/FullScreen | Fullscreen view |
/UseOC | Show Optional Content Group (OCG) panel |
/UseAttachments | Show attachments panel |
Code Snippet:
In this code, we set the page mode to full screen using the PyPDF2 module in Python.
```python
from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
with open('Grades.pdf', 'rb') as infile:
    reader = PdfFileReader(infile)
    writer.addPage(reader.getPage(0))
    writer.addPage(reader.getPage(1))
    writer.addPage(reader.getPage(2))
    # open the document in full-screen view
    writer.setPageMode('/FullScreen')
    with open('NewGrades.pdf', 'wb') as output:
        writer.write(output)
```
Output:
In this output, we have set the page mode to full screen, so every time the PDF is opened, it opens in full-screen mode.
19. Update Page Form Field Values of Interactive PDF using PyPDF2 in Python
PyPDF2 provides the method updatePageFormFieldValues(page, fields), which updates the form fields of a PDF.
- Updates the form field values for a given page from a fields dictionary, copying field texts and values from fields to the page.
- This method can be used with interactive PDFs. Interactive PDFs allow users to provide their input.
- Parameters:
- page – Page reference from PDF writer where the annotations and field data will be updated.
- fields – a Python dictionary of field names (/T) and text values (/V)
20. Write a PDF using PyPDF2 in Python
PyPDF2 provides the method write(stream), which writes the PDF out in Python.
- Writes the collection of pages added to this object out as a PDF file.
- Parameters:
- stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.
- If you have been following this tutorial, you already have an idea of what the write() method does: whatever operations we perform on the writer, we call write() at the end to create a new PDF with those settings applied.
So in this tutorial, we have learned about the PdfFileWriter class and its methods using the PyPDF2 module in Python. We also covered 20 PdfFileWriter Python examples:
- How to Add Attachment in PDF in Python
- How to Add Blank Page in Python
- Add Bookmark to PDF in Python
- How to Add JavaScript to PDF file in Python
- How to Add Link to PDF file in Python
- Add Metadata to PDF file in Python using PyPDF2
- How to Add Pages to PDF file in Python
- Add Encryption to PDF file in Python using PyPDF2
- Get Number of Pages using PyPDF2 in Python
- How to get page with Page Number from PDF file in Python
- Get Page Layout of PDF file in Python
- Get Page Mode of PDF file in Python
- Insert Blank Page to PDF file in Python
- Insert Page to PDF file in Python
- Remove Images from PDF in Python
- Remove Links from PDF in Python using PyPDF2
- Set Page Layout of PDF in Python using PyPDF2
- Set Page Mode of PDF in Python
- Update Page Form Field Values of Interactive PDF using PyPDF2 in Python
- Write to a PDF in Python
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time, I gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.