In this tutorial, we’re gonna look at way to use openpyxl
module to read, write Excel spreadsheet files in Python program.
Excel spreadsheet files
An Excel file is called a workbook which is saved on PC as .xlsx extension.
Each workbook contains one or more sheets.
Each sheet has columns (letters: A, B, C…) and rows (numbers: 1, 2, 3…).
A box addressed by [column, row] pair is called a cell.
Read/Write Excel files in Python
Install openpyxl module
Open cmd, then run:
pip install openpyxl
Once the installation is successful, we can see openpyxl
folder at Python\Python[version]\Lib\site-packages
.
(In this tutorial, we use openpyxl 2.5.12
)
Now we can import the module by running import openpyxl
.
Read Excel file
Open workbook
We use openpyxl.load_workbook(filename)
function with input filename
to get a Workbook
object that represents the Excel file.
>>> import openpyxl
>>> gkzWb = openpyxl.load_workbook('ozenero.xlsx')
>>> type(gkzWb)
Get sheets
We call Workbook's get_sheet_names()
method to get list of all the sheet names in the workbook.
>>> gkzWb.get_sheet_names()
['Python Tutorials', 'Java Tutorials']
Then we can obtain a Worksheet
object with the sheet name using Workbook's get_sheet_by_name()
method, or we can get the Workbook’s active sheet using get_active_sheet()
method.
>>> pySheet = gkzWb.get_sheet_by_name('Python Tutorials')
>>> type(pySheet)
>>> pySheet.title
'Python Tutorials'
>>> gkzWb.get_active_sheet()
We can get size of Worksheet
object using its max_row
and max_column
attributes.
>>> pySheet.max_row
11
>>> pySheet.max_column
3
Get cells
With a Worksheet
object, we can access Cell
objects inside.
>>> pySheet['A2']
>>> pySheet['A1'].value
'Title'
>>> pySheet['A2'].value
'How to parse HTML in Python using BeautifulSoup module'
>>> pySheet['B2'].value
'beautifulsoup, parse html, python, web crawling, web scraping\t'
>>> pySheet['C2'].value
datetime.datetime(2019, 1, 19, 0, 0)
|
After column Z
, the columns continues with two letters: AA
, AB
... To make things simple, we can also get a Cell
using Worksheet's cell()
method with input integers for row
and column
arguments.
*Note: The first row or column integer is 1
.
>>> pySheet.cell(row=1, column=1)
>>> pySheet.cell(row=1, column=2)
>>> pySheet.cell(row=3, column=3)
>>> pySheet.cell(3, 3)
>>> pySheet.cell(3, 27)
>>> pySheet.cell(3, 28)
| | | | | |
We can also accessCell
object location information with its row
, column
, and coordinate
attributes.
>>> cellC1 = pySheet['C1']
>>> cellC1.column
'C'
>>> cellC1.row
1
>>> cellC1.coordinate
'C1'
We can also slice Worksheet
object to get list of Cell objects in a rectangular area:
>>> pySheet['A2':'C5']
(
(, , ),
(, , ),
(, , ),
(, , )
)
>>> for row in pySheet['A2':'C5']:
... for cell in row:
... print(cell.coordinate, cell.value)
... print('---')
...
A2 How to parse HTML in Python using BeautifulSoup module
B2 beautifulsoup, parse html, python, web crawling, web scraping
C2 2019-01-19 00:00:00
---
A3 How to copy, move, rename, delete files/folders in Python
B3 python, file
C3 2019-01-13 00:00:00
---
A4 How to read/write files in Python
B4 python, file, read file, write file
C4 2019-01-11 00:00:00
---
A5 Python Regular Expression to extract email from text
B5 python, regex, regular expression
C5 2019-01-06 00:00:00
| | | | | | | | | | | |
For cells in a particular row
or column
, we can also use a Worksheet object's rows
and columns
attribute.
>>> list(pySheet.columns)
[
(, , ..., ),
(, , ..., ),
(, , ..., )
]
>>> column = list(pySheet.columns)[0]
>>> for cell in column:
... print(str(cell.row) + ': ' + cell.value)
...
1: Title
2: How to parse HTML in Python using BeautifulSoup module
3: How to copy, move, rename, delete files/folders in Python
4: How to read/write files in Python
5: Python Regular Expression to extract email from text
6: Python Regular Expression to extract phone number
7: Python Regular Expression
8: Python String methods
9: Python Dictionary Data Structure
10: How to iterate over a List in Python
11: Python List functions
>>> row = list(pySheet.rows)[1]
>>> for cell in row:
... print(cell.value)
...
How to parse HTML in Python using BeautifulSoup module
beautifulsoup, parse html, python, web crawling, web scraping
2019-01-19 00:00:00
| | | | | | | | |
Write Excel file
Create workbook
We use openpyxl.Workbook()
function that returns a new blank Workbook
object.
>>> gkzWb = openpyxl.Workbook()
>>> type(gkzWb)
>>> gkzWb.get_sheet_names()
['Sheet']
Create & Remove sheet
We use Workbook's create_sheet()
method to create a new Worksheet
object.
By default, it is set to be the last sheet in the workbook named SheetX
.
We can also specify the index and name of the new sheet by passing them as arguments to code>create_sheet() method.
>>> gkzWb.create_sheet()
>>> gkzWb.create_sheet()
>>> gkzWb.get_sheet_names()
['Sheet', 'Sheet1', 'Sheet2']
>>> gkzWb.create_sheet(index=0, title='Python Tuts')
>>> gkzWb.create_sheet(index=2, title='Java Tuts')
>>> gkzWb.get_sheet_names()
['Python Tuts', 'Sheet', 'Java Tuts', 'Sheet1', 'Sheet2']
To remove a Worksheet
object from a Workbook, we use remove_sheet()
method and pass that Worksheet
object as argument (not the sheet name).
>>> gkzWb.remove_sheet(gkzWb.get_sheet_by_name('Sheet')) >>> gkzWb.get_sheet_names() ['Python Tuts', 'Java Tuts', 'Sheet1', 'Sheet2'] >>> gkzWb.remove_sheet(gkzWb.get_sheet_by_name('Sheet1')) >>> gkzWb.remove_sheet(gkzWb.get_sheet_by_name('Sheet2')) >>> gkzWb.get_sheet_names() ['Python Tuts', 'Java Tuts']
Write value to cells
It is just like writing values to keys in a Python dictionary. In this case, key are cell's coordinate
.
>>> pySheet = gkzWb.get_sheet_by_name('Python Tuts')
>>> pySheet['A1'] = 'Title'
>>> pySheet['B1'] = 'Tags'
>>> pySheet['C1'] = 'Date'
Save workbook
All the changes we have made on Workbook
object will not be saved until we call Workbook's save()
method.
Whenever we edit a file, we should always save the new, edited one to a different filename than the original. Passing a different filename to save()
method (such as 'ozenero_copy_1.xlsx'
) will save the changes to a copy of the file.
>>> gkzWb.save('ozenero.xlsx')
>>> pySheet['A2'] = 'How to parse HTML in Python using BeautifulSoup module'
>>> pySheet['A3'] = 'How to copy, move, rename, delete files/folders in Python'
>>> gkzWb.save('ozenero-copy-1.xlsx')
Setting cells
Font object
To customize font styles in cells, we import Font
from the openpyxl.styles
module.
>>> import openpyxl
>>> from openpyxl.styles import Font
>>> gkzWb = openpyxl.load_workbook('ozenero.xlsx')
>>> sheet = gkzWb.get_sheet_by_name('Python Tuts')
>>> fontObj = Font(name='Times New Roman', size=23, bold=True, italic=True)
>>> sheet['A1'].font = fontObj
>>> gkzWb.save('ozenero-copy.xlsx')
Formula
Adding formulas to cells programmatically is just like any normal value.
>>> import openpyxl
>>> sumWb = openpyxl.Workbook()
>>> sh = sumWb.get_active_sheet()
>>> sh['A1'] = 100
>>> sh['A2'] = 200
>>> sh['A3'] = 300
>>> sh['A4'] = '=SUM(A1:A3)'
>>> sumWb.save('sum.xlsx')
>>> sh['A4'].value
'=SUM(A1:A3)'
Size of cells
We can set row_dimensions
and column_dimensions
attributes value of Worksheet
object to change row heights (using numbers) and column widths (using letters).
>>> import openpyxl
>>> gkzWb = openpyxl.load_workbook('ozenero.xlsx')
>>> sheet = gkzWb.get_active_sheet()
>>> sheet.row_dimensions[3].height = 68
>>> sheet.column_dimensions['B'].width = 42
>>> gkzWb.save('gkz-size.xlsx')
Merge and Unmerge cells
We can use WorkSheet
's merge_cells()
method to merge a rectangular area into a single cell.
>>> import openpyxl
>>> wb = openpyxl.Workbook()
>>> sheet = wb.get_active_sheet()
>>> sheet.merge_cells('A1:C3')
>>> sheet['A1'] = 'ozenero Python Tutorials'
>>> wb.save('gkz-merged.xlsx')