This lesson introduces Python file processing.

Objectives and Skills

Objectives and skills for this lesson include:[1]

  • Standard Library
    • os module
    • sys module
  • Input Output
    • Files I/O

Readings

  1. Wikipedia: File system
  2. Wikipedia: Directory (computing)
  3. Wikipedia: Directory structure
  4. Wikipedia: Text file
  5. PythonLearn: Automating common tasks on your computer
  6. Python for Everyone: Files

Multimedia

  1. YouTube: Python for Informatics - Chapter 7 Files
  2. YouTube: Python - How to Read and Write Files

Examples

The os.getcwd() Method

The os.getcwd() method returns a string representing the current working directory.[2]

import os

print("Current working directory:", os.getcwd())

Output:

Current working directory: /home/ubuntu/workspace

The os.chdir() Method

The os.chdir() method changes the current working directory to the given path.[3]

import os

directory = os.getcwd()
print("Current working directory:", directory)
os.chdir("..")
print("Changed to:", os.getcwd())
os.chdir(directory)
print("Changed back to:", directory)

Output:

Current working directory: /home/ubuntu/workspace
Changed to: /home/ubuntu
Changed back to: /home/ubuntu/workspace
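Because os.chdir() changes the working directory for the whole program, it is worth making sure the original directory is restored even if something fails in between. A minimal sketch of that pattern, assuming a hypothetical helper named list_in_directory and using tempfile and os.listdir (neither is part of the original example) so it runs anywhere:

```python
import os
import tempfile

def list_in_directory(target):
    """Hypothetical helper: change into target, list it, then change back."""
    original = os.getcwd()
    os.chdir(target)
    try:
        return sorted(os.listdir("."))  # "." now refers to target
    finally:
        os.chdir(original)  # runs even if listing raises an exception

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()  # create an empty file
    before = os.getcwd()
    entries = list_in_directory(tmp)
    after = os.getcwd()
    print(entries)          # ['example.txt']
    print(before == after)  # True
```

The try/finally block is what guarantees the second os.chdir() call, so later relative paths in the program still resolve from the original directory.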

The os.path.isdir() Method

The os.path.isdir() method returns True if the given path is an existing directory.[4]

import os

path = os.getcwd()
if os.path.isdir(path):
    print("Current working directory exists.")
else:
    print("Current working directory does not exist.")

Output:

Current working directory exists.

The os.path.join() Method

The os.path.join() method joins one or more path components intelligently, avoiding extra directory separator (os.sep) characters.[5]

import os

path = os.getcwd()
directory = os.path.join(path, "__python_demo__")
print("path:", path)
print("directory:", directory)

Output:

path: /home/ubuntu/workspace
directory: /home/ubuntu/workspace/__python_demo__
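A few more os.path.join() cases show what "intelligently" means in practice. This sketch calls posixpath.join directly, which is the implementation os.path uses on Unix-like systems, so the output is the same on every platform; on Windows, os.path would use backslashes instead.

```python
import posixpath  # what os.path refers to on Unix-like systems

joined = posixpath.join("/home/ubuntu", "workspace", "demo.txt")
trailing = posixpath.join("/home/ubuntu/", "workspace")  # no doubled "/"
absolute = posixpath.join("/home/ubuntu", "/etc", "hosts")  # absolute part resets

print(joined)    # /home/ubuntu/workspace/demo.txt
print(trailing)  # /home/ubuntu/workspace
print(absolute)  # /etc/hosts
```

The last case is a common surprise: when a later component is an absolute path, everything before it is discarded.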

The os.mkdir() Method

The os.mkdir() method creates a directory with the given path.[6]

import os

path = os.getcwd()

directory = os.path.join(path, "__python_demo__")
if os.path.isdir(directory):
    raise Exception("Path already exists. Can't continue.")
os.mkdir(directory)
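Instead of checking os.path.isdir() first, a program can simply attempt os.mkdir() and catch the FileExistsError it raises when the directory is already there. A minimal sketch, using a tempfile directory (an assumption for the demo) so nothing in the real working directory is touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    directory = os.path.join(tmp, "__python_demo__")
    os.mkdir(directory)  # first call succeeds
    try:
        os.mkdir(directory)  # second call: the directory already exists
        outcome = "created twice"
    except FileExistsError:
        outcome = "already exists"
    print(outcome)  # already exists
```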

The os.rmdir() Method

The os.rmdir() method removes (deletes) the directory with the given path.[7]

import os

path = os.getcwd()
directory = os.path.join(path, "__python_demo__")
if os.path.isdir(directory):
    raise Exception("Path already exists. Can't continue.")
os.mkdir(directory)
print("Created directory:", directory)
os.chdir(directory)
print("Changed to:", os.getcwd())
os.chdir(path)
print("Changed back to:", os.getcwd())
os.rmdir(directory)
print("Removed directory:", directory)

Output:

Created directory: /home/ubuntu/workspace/__python_demo__
Changed to: /home/ubuntu/workspace/__python_demo__
Changed back to: /home/ubuntu/workspace
Removed directory: /home/ubuntu/workspace/__python_demo__
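os.rmdir() only removes empty directories. This sketch (using a tempfile directory as an assumed sandbox) puts one file inside the demo directory and shows that os.rmdir() raises OSError and leaves the directory in place:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    directory = os.path.join(tmp, "__python_demo__")
    os.mkdir(directory)
    open(os.path.join(directory, "keep.txt"), "w").close()  # make it non-empty
    try:
        os.rmdir(directory)
        removed = True
    except OSError:
        removed = False  # rmdir refuses to delete a directory with contents
    still_exists = os.path.isdir(directory)
    print("Removed:", removed)            # Removed: False
    print("Still exists:", still_exists)  # Still exists: True
```

To remove a directory, its files must be deleted first (for example with os.remove()).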

The os.walk() Method

For each directory in the tree rooted at the given path, the os.walk() method yields a 3-tuple of the directory path string, a list of its subdirectory names, and a list of its filenames.[8]

import os

for path, directories, files in os.walk(os.getcwd()):
    for directory in directories:
        print(os.path.join(path, directory))
    for file in files:
        print(os.path.join(path, file))

Output:

... <all subdirectories and files in the current working directory>
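Since os.walk() visits every level of the tree, it is a natural fit for tasks such as finding all Python files under a folder. A minimal sketch, assuming a hypothetical helper find_python_files and a small throwaway tree built with tempfile:

```python
import os
import tempfile

def find_python_files(root):
    """Hypothetical helper: sorted paths (relative to root) of all .py files."""
    matches = []
    for path, directories, files in os.walk(root):
        for name in files:
            if name.endswith(".py"):
                full = os.path.join(path, name)
                matches.append(os.path.relpath(full, root))
    return sorted(matches)

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "pkg"))
    for name in ["main.py", "notes.txt", os.path.join("pkg", "util.py")]:
        open(os.path.join(tmp, name), "w").close()  # create empty files
    found = find_python_files(tmp)
    print(found)
```

The .txt file is skipped, and util.py is found even though it sits one level down in pkg.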

The os.path.isfile() Method

The os.path.isfile() method returns True if the given path is an existing file.[9]

import os

path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if os.path.isfile(filename):
    print("File exists.")
else:
    print("File does not exist.")

Output:

File does not exist.
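os.path.isfile() and os.path.isdir() are mutually exclusive for an existing path, and both return False for a path that does not exist. A small sketch comparing the three cases, using a tempfile directory (an assumption for the demo):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "notes.txt")
    missing_path = os.path.join(tmp, "missing.txt")
    with open(file_path, "w") as f:
        f.write("hello")
    results = {
        "directory": (os.path.isdir(tmp), os.path.isfile(tmp)),
        "file": (os.path.isdir(file_path), os.path.isfile(file_path)),
        "missing": (os.path.isdir(missing_path), os.path.isfile(missing_path)),
    }
    for kind, (is_dir, is_file) in results.items():
        print(kind, "isdir:", is_dir, "isfile:", is_file)
```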

The open() Function

The open() function opens the given file in the given mode (read, write, append) and returns a file object.[10]

file = open(filename, "r")    # read
file = open(filename, "w")    # write
file = open(filename, "a")    # append
file = open(filename, "r+")   # read + write
file = open(filename, "w+")   # read + write (new / cleared file)
file = open(filename, "a+")   # read + append (position starts at end of file)
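The practical difference between "w" and "a" is what happens to existing content. This sketch uses the with statement, which closes the file automatically at the end of the block, and a tempfile directory (both assumptions beyond the original snippet):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "modes.txt")

    with open(filename, "w") as f:  # "w" creates the file (or empties it)
        f.write("first")
    with open(filename, "a") as f:  # "a" adds to the end of existing content
        f.write(" second")
    with open(filename, "r") as f:
        after_append = f.read()

    with open(filename, "w") as f:  # "w" again: previous content is discarded
        f.write("third")
    with open(filename, "r") as f:
        after_rewrite = f.read()

print(after_append, "|", after_rewrite)  # first second | third
```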

The file.write() Method

The file.write() method writes the contents of the given string to the file, returning the number of characters written.[11]

file.write("Temporary Python Demo File")
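In text mode, the value returned by file.write() counts characters, not bytes. The difference shows up with non-ASCII text. This sketch assumes UTF-8 encoding (set explicitly) and a tempfile directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "write_demo.txt")
    with open(filename, "w", encoding="utf-8") as f:
        count = f.write("caf\u00e9")  # "café": 4 characters
    size = os.path.getsize(filename)  # size on disk in bytes

print(count, size)  # 4 5
```

The "é" takes two bytes in UTF-8, so the file is one byte longer than the character count.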

The file.close() Method

The file.close() method closes the file and frees any system resources taken up by the open file.[12]

import os

path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
file = open(filename, "w")
file.write("Temporary Python Demo File")
file.close()
if os.path.isfile(filename):
    print("Created %s" % filename)

Output:

Created /home/ubuntu/workspace/__python_demo.tmp
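A file object's closed attribute reports whether close() has taken effect. The with statement closes the file automatically when its block ends, even if an error occurs. A minimal sketch comparing both approaches, using a tempfile directory as an assumed sandbox:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "__python_demo.tmp")

    file = open(filename, "w")
    file.write("Temporary Python Demo File")
    file.close()
    closed_manually = file.closed  # True after close()

    with open(filename, "r") as file:  # closed automatically at block end
        text = file.read()
    closed_by_with = file.closed  # True, although close() was never called

print(closed_manually, closed_by_with, text)
```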

The file.read() Method

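The file.read() method reads up to the given number of characters (text mode) or bytes (binary mode) from the file, or the entire remaining content if no size is given, and returns what was read as a string (text mode) or bytes object (binary mode).[13]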

import os

path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if os.path.isfile(filename):
    file = open(filename, "r")
    text = file.read()
    file.close()
    print("File text:", text)

Output:

File text: Temporary Python Demo File
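Passing a size to file.read() reads the file in pieces. Each call continues from where the previous one stopped, and an empty string signals the end of the file. A small sketch, using a tempfile directory (an assumption for the demo):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "read_demo.txt")
    with open(filename, "w") as f:
        f.write("Temporary Python Demo File")
    with open(filename, "r") as f:
        first = f.read(9)  # up to 9 characters
        rest = f.read()    # everything after the current position
        at_end = f.read()  # nothing left, so an empty string

print(repr(first), repr(rest), repr(at_end))
# 'Temporary' ' Python Demo File' ''
```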

Reading Lines

For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code.[14]

import os

path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
file = open(filename, "r")
for line in file:
    print(line, end='')
file.close()

Output:

Temporary Python Demo File
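Each line produced by the loop keeps its trailing newline, which is why the example above prints with end=''. When processing data such as the score files in the Practice section, it is usually easier to strip the newline. A sketch using enumerate for line numbers, with sample data borrowed from the Practice section and a tempfile directory (both assumptions for the demo):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "scores.txt")
    with open(filename, "w") as f:
        f.write("Larry Fine: 80\nCurly Howard: 70\nMoe Howard: 90\n")

    lines = []
    with open(filename, "r") as f:
        for number, line in enumerate(f, start=1):
            text = line.rstrip("\n")  # drop the trailing newline
            lines.append(text)
            print(number, text)
```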

The file.tell() Method

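The file.tell() method returns an integer giving the file object’s current position in the file. In binary mode this is the number of bytes from the beginning of the file; in text mode it is an opaque number.[15]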

import os

path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if os.path.isfile(filename):
    file = open(filename, "a+")
    print("Open file position:", file.tell())
    file.write(" - Appended to the end of the file")
    print("Write file position:", file.tell())
    file.close()

Output:

Open file position: 26
Write file position: 60

The file.seek() Method

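The file.seek() method moves the file position to the given offset from the given reference point. Reference points are 0 for the beginning of the file, 1 for the current position, and 2 for the end of the file. In text mode, only seeks relative to the beginning of the file are allowed, except for seek(0, 2), which moves to the very end.[16]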

import os

path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if os.path.isfile(filename):
    file = open(filename, "a+")
    print("Open file position:", file.tell())
    file.seek(0, 0)
    print("Seek file position:", file.tell())
    text = file.read()
    file.close()
    print("File text:", text)

Output:

Open file position: 60
Seek file position: 0
File text: Temporary Python Demo File - Appended to the end of the file
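All three reference points work with any offset in binary mode, where positions are plain byte counts. This sketch (using a tempfile directory as an assumed sandbox) reads the last word by seeking relative to the end, moves relative to the current position, and then shows that a nonzero end-relative seek in text mode raises io.UnsupportedOperation:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "seek_demo.bin")
    with open(filename, "wb") as f:
        f.write(b"Temporary Python Demo File")

    with open(filename, "rb") as f:  # binary mode: positions are byte offsets
        f.seek(-4, 2)         # 4 bytes before the end of the file
        last_word = f.read()  # b'File'
        f.seek(10, 0)         # 10 bytes from the beginning
        f.seek(7, 1)          # 7 more bytes from the current position
        position = f.tell()   # 17
        middle = f.read(4)    # b'Demo'

    with open(filename, "r") as f:  # text mode
        try:
            f.seek(-4, 2)
            text_mode_allowed = True
        except io.UnsupportedOperation:
            text_mode_allowed = False

print(last_word, position, middle, text_mode_allowed)
# b'File' 17 b'Demo' False
```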

The os.rename() Method

The os.rename() method renames the given source file or directory to the given destination name.[17]

import os

path = os.getcwd()
filename = os.path.join(path, "__python_demo.tmp")
if not os.path.isfile(filename):
    raise Exception("File doesn't exist. Can't continue.")
filename2 = os.path.join(path, "__python_demo2.tmp")
if os.path.isfile(filename2):
    raise Exception("File already exists. Can't continue.")

os.rename(filename, filename2)
if os.path.isfile(filename2):
    print("Renamed %s to %s" % (filename, filename2))

Output:

Renamed /home/ubuntu/workspace/__python_demo.tmp to /home/ubuntu/workspace/__python_demo2.tmp
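Because the destination can be a full path, os.rename() can also move a file into another directory, as long as both locations are on the same file system (otherwise it may raise OSError). A minimal sketch, using a tempfile directory and a hypothetical archive subdirectory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "__python_demo.tmp")
    archive = os.path.join(tmp, "archive")
    destination = os.path.join(archive, "__python_demo.tmp")
    open(source, "w").close()
    os.mkdir(archive)

    os.rename(source, destination)  # same name, different directory: a move
    moved = (not os.path.isfile(source)) and os.path.isfile(destination)
    print("Moved:", moved)  # Moved: True
```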

The os.remove() Method

The os.remove() method removes (deletes) the given file.[18]

import os

path = os.getcwd()
filename2 = os.path.join(path, "__python_demo2.tmp")
if not os.path.isfile(filename2):
    raise Exception("File doesn't exist. Can't continue.")

os.remove(filename2)
if not os.path.isfile(filename2):
    print("Removed %s" % filename2)

Output:

Removed /home/ubuntu/workspace/__python_demo2.tmp
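The Pythonic alternative to checking os.path.isfile() before deleting is to call os.remove() directly and catch the FileNotFoundError it raises for a missing file. A sketch with a hypothetical helper remove_if_present and a tempfile directory:

```python
import os
import tempfile

def remove_if_present(filename):
    """Hypothetical helper: delete filename, returning False if it is missing."""
    try:
        os.remove(filename)
        return True
    except FileNotFoundError:
        return False

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "__python_demo2.tmp")
    open(filename, "w").close()
    first = remove_if_present(filename)   # True: the file existed
    second = remove_if_present(filename)  # False: already gone
    print(first, second)  # True False
```

This also avoids a race in which the file disappears between the check and the removal.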

The sys.argv Attribute

The sys.argv attribute is the list of command line arguments passed to a Python script. argv[0] is the script name.[19]

import sys

for i, arg in enumerate(sys.argv):
    print("sys.argv[%d]: %s" % (i, arg))

Output (when run with the arguments test1 and test2):

sys.argv[0]: /home/ubuntu/workspace/argv.py
sys.argv[1]: test1
sys.argv[2]: test2
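Several Practice activities ask a program to use a filename from the command line if one is given, and otherwise prompt for it. One way to sketch that, assuming a hypothetical helper get_filename that takes the argument list as a parameter so it can be tested without a real command line:

```python
def get_filename(argv, ask=input):
    """Hypothetical helper: return argv[1] if present, otherwise prompt."""
    if len(argv) > 1:
        return argv[1]
    return ask("Enter a filename: ")

# In a real script (with import sys): filename = get_filename(sys.argv)
print(get_filename(["scores.py", "quiz.txt"]))  # quiz.txt
print(get_filename(["scores.py"], ask=lambda prompt: "typed.txt"))  # typed.txt
```

Passing ask as a parameter is only a testing convenience; in normal use the default input() prompts the user.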

Activities

Tutorials

  1. Complete one or more of the following tutorials:

Practice

  1. Create a Python program that displays high, low, and average quiz scores based on input from a file. Check for a filename parameter passed from the command line. If there is no parameter, ask the user to input a filename for processing. Verify that the file exists and then use RegEx methods to parse the file and add each score to a list. Display the list of entered scores sorted in descending order and then calculate and display the high, low, and average for the entered scores. Include error handling in case the file is formatted incorrectly. Create a text file of names and grade scores to use for testing based on the following format:
        Larry Fine: 80
        Curly Howard: 70
        Moe Howard: 90
  2. Create a Python program that asks the user for a file that contains HTML tags, such as:
        <p><strong>This is a bold paragraph.</strong></p>
    Check for a filename parameter passed from the command line. If there is no parameter, ask the user to input a filename for processing. Verify that the file exists and then use RegEx methods to search for and remove all HTML tags from the text, saving each removed tag in a dictionary. Print the untagged text and then use a function to display the list of removed tags sorted in alphabetical order and a histogram showing how many times each tag was used. Include error handling in case an HTML tag isn't entered correctly (an unmatched < or >). Use a user-defined function for the actual string processing, separate from input and output. For example:
        </p>: *
        </strong>: *
        <p>: *
        <strong>: *
  3. Create a Python program that asks the user for a file that contains lines of dictionary keys and values in the form:
        Larry Fine: 80
        Curly Howard: 70
        Moe Howard: 90
    Keys may contain spaces but should be unique. Values should always be an integer greater than or equal to zero. Check for a filename parameter passed from the command line. If there is no parameter, ask the user to input a filename for processing. Verify that the file exists and then use RegEx methods to parse the file and build a dictionary of key-value pairs. Then display the dictionary sorted in descending order by value (score). Include input validation and error handling in case the file accidentally contains the same key more than once.
  4. Create a Python program that checks all Python (.py) files in a given directory / folder. Check for a folder path parameter passed from the command line. If there is no parameter, ask the user to input a folder path for processing. Verify that the folder exists and then check all Python files in the folder for an initial docstring. If the file contains an initial docstring, continue processing with the next file. If the file does not start with a docstring, add a docstring to the beginning of the file similar to:
        """Filename.py"""
    Add a blank line between the docstring and the existing file code and save the file. Test the program carefully to be sure it doesn't alter any non-Python files and doesn't delete existing file content.

Lesson Summary

File Concepts

  • A file system is used to control how data is stored and retrieved. There are many different kinds of file systems. Each one has different structure and logic, properties of speed, flexibility, security, size and more.[20]
  • File systems are responsible for arranging storage space; reliability, efficiency, and tuning with regard to the physical storage medium are important design considerations.[21]
  • File systems allocate space in a granular manner, usually multiple physical units on the device.[22]
  • A filename (or file name) is used to identify a storage location in the file system.[23]
  • File systems typically have directories (also called folders) which allow the user to group files into separate collections.[24]
  • A file system stores all the metadata associated with the file—including the file name, the length of the contents of a file, and the location of the file in the folder hierarchy—separate from the contents of the file.[25]
  • Directory utilities may be used to create, rename and delete directory entries.[26]
  • File utilities create, list, copy, move and delete files, and alter metadata.[27]
  • All file systems have some functional limit that defines the maximum storable data capacity within that system.[28]
  • A directory is a file system cataloging structure which contains references to other computer files, and possibly other directories.[29]
  • A text file is a kind of computer file that is structured as a sequence of lines of electronic text.[30]
  • MS-DOS and Windows use a common text file format, with each line of text separated by a two-character combination: CR and LF, which have ASCII codes 13 and 10.[31]
  • Unix-like operating systems use a common text file format, with each line of text separated by a single newline character, normally LF.[32]

Python Files

  • The os.getcwd() method returns a string representing the current working directory.[33]
  • The os.chdir() method changes the current working directory to the given path.[34]
  • The os.path.isdir() method returns True if the given path is an existing directory.[35]
  • The os.path.join() method joins one or more path components intelligently, avoiding extra directory separator (os.sep) characters.[36]
  • The os.mkdir() method creates a directory with the given path.[37]
  • The os.rmdir() method removes (deletes) the directory with the given path.[38]
  • For each directory in the tree rooted at the given path, the os.walk() method yields a 3-tuple of the directory path string, a list of its subdirectory names, and a list of its filenames.[39]
  • The os.path.isfile() method returns True if the given path is an existing file.[40]
  • The open() function opens the given file in the given mode (read, write, append) and returns a file object.[41]
  • The file.write() method writes the contents of the given string to the file, returning the number of characters written.[42]
  • The file.close() method closes the file and frees any system resources taken up by the open file.[43]
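  • The file.read() method reads up to the given number of characters (text mode) or bytes (binary mode) from the file, or the entire remaining content if no size is given, and returns what was read as a string (text mode) or bytes object (binary mode).[44]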
  • For reading lines from a file, you can loop over the file object using a for loop. This is memory efficient, fast, and leads to simple code.[45]
  • The file.tell() method returns an integer giving the file object’s current position in the file.[46]
  • The file.seek() method moves the file position to the given offset from the given reference point. Reference points are 0 for the beginning of the file, 1 for the current position, and 2 for the end of the file.[47]
  • The os.rename() method renames the given source file or directory to the given destination name.[48]
  • The os.remove() method removes (deletes) the given file.[49]
  • The sys.argv attribute is the list of command line arguments passed to a Python script. argv[0] is the script name.[50]
  • Python text mode file processing converts platform-specific line endings (\n on Unix, \r\n on Windows) to \n when reading, and converts \n back to the platform-specific line ending when writing.[51]
  • Binary mode file processing must be used when reading and writing non-text files to prevent newline translation.[52]
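The newline translation described above can be observed by writing Windows-style line endings in binary mode and then reading the same file back both ways. A minimal sketch, using sample data from the Practice section and a tempfile directory (both assumptions for the demo):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "windows_style.txt")
    with open(filename, "wb") as f:  # binary: bytes written exactly as given
        f.write(b"Larry Fine: 80\r\nMoe Howard: 90\r\n")

    with open(filename, "r") as f:   # text mode: \r\n is read as \n
        text = f.read()
    with open(filename, "rb") as f:  # binary mode: no translation
        raw = f.read()

print(repr(text))          # 'Larry Fine: 80\nMoe Howard: 90\n'
print(raw.count(b"\r\n"))  # 2
```

The same translation would corrupt binary data such as images, which is why non-text files should be opened in binary mode.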

Key Terms

catch
To prevent an exception from terminating a program using the try and except statements.[53]
newline
A special character used in files and strings to indicate the end of a line.[54]
Pythonic
A technique that works elegantly in Python. “Using try and except is the Pythonic way to recover from missing files”.[55]
Quality Assurance
A person or team focused on ensuring the overall quality of a software product. QA is often involved in testing a product and identifying problems before the product is released.[56]
text file
A sequence of characters stored in permanent storage like a hard drive.[57]

Review Questions

  1. A file system is _____.
    A file system is used to control how data is stored and retrieved. There are many different kinds of file systems. Each one has different structure and logic, properties of speed, flexibility, security, size and more.
  2. File systems are responsible for _____.
    File systems are responsible for arranging storage space; reliability, efficiency, and tuning with regard to the physical storage medium are important design considerations.
  3. File systems allocate _____.
    File systems allocate space in a granular manner, usually multiple physical units on the device.
  4. A filename (or file name) is used to _____.
    A filename (or file name) is used to identify a storage location in the file system.
  5. File systems typically have directories (also called folders) which _____.
    File systems typically have directories (also called folders) which allow the user to group files into separate collections.
  6. A file system stores all the metadata associated with the file—including _____.
    A file system stores all the metadata associated with the file—including the file name, the length of the contents of a file, and the location of the file in the folder hierarchy—separate from the contents of the file.
  7. Directory utilities may be used to _____.
    Directory utilities may be used to create, rename and delete directory entries.
  8. File utilities _____.
    File utilities create, list, copy, move and delete files, and alter metadata.
  9. All file systems have some functional limit that defines _____.
    All file systems have some functional limit that defines the maximum storable data capacity within that system.
  10. A directory is _____.
    A directory is a file system cataloging structure which contains references to other computer files, and possibly other directories.
  11. A text file is _____.
    A text file is a kind of computer file that is structured as a sequence of lines of electronic text.
  12. MS-DOS and Windows use a common text file format, with _____.
    MS-DOS and Windows use a common text file format, with each line of text separated by a two-character combination: CR and LF, which have ASCII codes 13 and 10.
  13. Unix-like operating systems use a common text file format, with _____.
    Unix-like operating systems use a common text file format, with each line of text separated by a single newline character, normally LF.
  14. The os.getcwd() method _____.
    The os.getcwd() method returns a string representing the current working directory.
  15. The os.chdir() method _____.
    The os.chdir() method changes the current working directory to the given path.
  16. The os.path.isdir() method _____.
    The os.path.isdir() method returns True if the given path is an existing directory.
  17. The os.path.join() method _____.
    The os.path.join() method joins one or more path components intelligently, avoiding extra directory separator (os.sep) characters.
  18. The os.mkdir() method _____.
    The os.mkdir() method creates a directory with the given path.
  19. The os.rmdir() method _____.
    The os.rmdir() method removes (deletes) the directory with the given path.
  20. The os.walk() method _____.
    For each directory in the tree rooted at the given path, the os.walk() method yields a 3-tuple of the directory path string, a list of its subdirectory names, and a list of its filenames.
  21. The os.path.isfile() method _____.
    The os.path.isfile() method returns True if the given path is an existing file.
  22. The open() function _____.
    The open() function opens the given file in the given mode (read, write, append) and returns a file object.
  23. The file.write() method _____.
    The file.write() method writes the contents of the given string to the file, returning the number of characters written.
  24. The file.close() method _____.
    The file.close() method closes the file and frees any system resources taken up by the open file.
  25. The file.read() method _____.
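    The file.read() method reads up to the given number of characters (text mode) or bytes (binary mode) from the file, or the entire remaining content if no size is given, and returns what was read as a string (text mode) or bytes object (binary mode).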
  26. For reading lines from a file, you can _____.
    For reading lines from a file, you can loop over the file object using a for loop. This is memory efficient, fast, and leads to simple code.
  27. The file.tell() method _____.
    The file.tell() method returns an integer giving the file object’s current position in the file.
  28. The file.seek() method _____.
    The file.seek() method moves the file position to the given offset from the given reference point. Reference points are 0 for the beginning of the file, 1 for the current position, and 2 for the end of the file.
  29. The os.rename() method _____.
    The os.rename() method renames the given source file or directory to the given destination name.
  30. The os.remove() method _____.
    The os.remove() method removes (deletes) the given file.
  31. The sys.argv attribute _____. argv[0] is _____.
    The sys.argv attribute is the list of command line arguments passed to a Python script. argv[0] is the script name.
  32. Python text mode file processing _____.
    Python text mode file processing converts platform-specific line endings (\n on Unix, \r\n on Windows) to \n when reading, and converts \n back to the platform-specific line ending when writing.
  33. Binary mode file processing must be used when _____.
    Binary mode file processing must be used when reading and writing non-text files to prevent newline translation.

Assessments

See Also

References

  1. Vskills: Certified Python Developer
  2. Python.org: Miscellaneous operating system interfaces
  3. Python.org: Miscellaneous operating system interfaces
  4. Python.org: os.path
  5. Python.org: os.path
  6. Python.org: Miscellaneous operating system interfaces
  7. Python.org: Miscellaneous operating system interfaces
  8. Python.org: Miscellaneous operating system interfaces
  9. Python.org: os.path
  10. Python.org: Built-in Functions
  11. Python.org: Input and Output
  12. Python.org: Input and Output
  13. Python.org: Input and Output
  14. Python.org: Input and Output
  15. Python.org: Input and Output
  16. Python.org: Input and Output
  17. Python.org: Miscellaneous operating system interfaces
  18. Python.org: Miscellaneous operating system interfaces
  19. Python.org: System-specific parameters and functions
  20. Wikipedia: File system
  21. Wikipedia: File system
  22. Wikipedia: File system
  23. Wikipedia: File system
  24. Wikipedia: File system
  25. Wikipedia: File system
  26. Wikipedia: File system
  27. Wikipedia: File system
  28. Wikipedia: File system
  29. Wikipedia: Directory (computing)
  30. Wikipedia: Text file
  31. Wikipedia: Text file
  32. Wikipedia: Text file
  33. Python.org: Miscellaneous operating system interfaces
  34. Python.org: Miscellaneous operating system interfaces
  35. Python.org: os.path
  36. Python.org: os.path
  37. Python.org: Miscellaneous operating system interfaces
  38. Python.org: Miscellaneous operating system interfaces
  39. Python.org: Miscellaneous operating system interfaces
  40. Python.org: os.path
  41. Python.org: Built-in Functions
  42. Python.org: Input and Output
  43. Python.org: Input and Output
  44. Python.org: Input and Output
  45. Python.org: Input and Output
  46. Python.org: Input and Output
  47. Python.org: Input and Output
  48. Python.org: Miscellaneous operating system interfaces
  49. Python.org: Miscellaneous operating system interfaces
  50. Python.org: System-specific parameters and functions
  51. Python.org: Input and Output
  52. Python.org: Input and Output
  53. PythonLearn: Files
  54. PythonLearn: Files
  55. PythonLearn: Files
  56. PythonLearn: Files
  57. PythonLearn: Files