Applied Programming/Collection
Applied Programming
Computer programming (often shortened to programming) is a process that leads from an original formulation of a computing problem to executable computer programs. Programming involves activities such as analysis, developing understanding, generating algorithms, verification of requirements of algorithms including their correctness and resource consumption, and implementation (commonly referred to as coding) of algorithms in a target programming language.[1]
This course comprises 15 lessons on applied programming. Each lesson includes a combination of Wikipedia and Internet-based readings, YouTube videos, and hands-on, interactive learning activities. It is an extension of the Python Programming course, intended to apply the same programming concepts using a variety of programming languages.
Preparation
This is a third-semester, college-level course. Learners should already be familiar with computer programming.
Lessons
- Variables
- Functions
- Conditions
- Loops
- Testing
- Strings
- Files
- Lists and Tuples
- Dictionaries and Sets
- RegEx
- Internet Data
- Databases
- Modules and Classes
- GUI
- Web Design
See Also
Bibliography
- Severance, C. (2016). Python for Everybody
References
Lesson 1 - Variables
This lesson introduces variables, expressions, and statements.
Objectives and Skills
Objectives and skills for this lesson include:[1]
- Evaluate an expression to identify the data type assigned to each variable
- Identify str, int, float, and bool data types
- Perform data and data type operations
- Convert from one data type to another type; construct data structures; perform indexing and slicing operations
- Determine the sequence of execution based on operator precedence
- Assignment; Comparison; Logical; Arithmetic; Identity (is); Containment (in)
- Select the appropriate operator to achieve the intended result
- Assignment; Comparison; Logical; Arithmetic; Identity (is); Containment (in)
- Construct and analyze code segments that perform console input and output operations
- Read input from console; print formatted text; use of command line arguments
Readings
- Wikipedia: Expression (computer science) - includes constants, variables, operators, and functions
- Wikipedia: Data type
- Wikipedia: Statement (computer science)
- Wikipedia: Assignment (computer science)
- Wikipedia: Order of operations
- Wikipedia: Input/output
- Wikipedia: Coding conventions
- Wikipedia: Self-documenting code
Multimedia
- YouTube: Introduction to Variables
- YouTube: Programming/Scripting Concepts Explained (Variables, Arrays, Strings, & Length)
- YouTube: Programming For Beginners - Variables
- YouTube: Programming For Beginners - Data Types
- YouTube: Duck Typing
- YouTube: Introduction to Programming - Basics
- YouTube: Declaring and using variables and constants
- YouTube: Performing arithmetic operations
- YouTube: CodeTips: What is Self-Documenting Code?
- YouTube: Variables - Coding Basics
Examples
Activities
Programming Languages
- Research different programming languages and select a programming language to use for this course. If unsure, Python3 is currently a popular choice. Based on your selected programming language, use Programming Fundamentals/Introduction#Examples and one of the free online IDE links provided to try running a Hello World program.
- Research free downloadable tools for your selected programming language (interpreter/compiler, IDE, etc.). Consider downloading and installing a development environment on your system. If you set up your own development environment, test the environment using the Hello World program.
- Review Wikipedia: Coding conventions. Research style guides or coding standards for your selected programming language. If applicable, discuss coding conventions with your classmates and agree on a standard to follow for this course.
Coding
Complete one or more of the following coding activities in your selected programming language. Use self-documenting variable and constant names that follow the naming conventions for your selected programming language. Include comments at the top of the source code that describe the program and are consistent with the program or module documentation conventions for your selected programming language.
- Review Wikipedia: Body mass index and MathsIsFun: Metric - US/Imperial Conversion Charts. Create a program that asks users for their weight in pounds and their height in feet and inches. Calculate and display their BMI. Format the output to one decimal place. Include a legend that displays value ranges for underweight, normal, and overweight. Be sure to indicate the source for your BMI range recommendations.
Lesson Summary
- A variable is a storage location (a memory address) paired with an identifier.[2]
- The value contained inside a variable is mutable and can be overwritten during the execution of the program.[2]
- In statically typed languages, a variable is permanently associated with a specified data type upon declaration.[3]
- In order to write self-documenting code, identifiers must be comprehensible and describe the purpose or function of a given object. For example, instead of variables x and y, you might have variables feet and inches.[4]
- Self-documenting code needs to be readily understood and maintained by other and future programmers, even those without intimate knowledge of the project at hand.[4] Sticking to your language’s coding conventions will also help other programmers understand your work more easily.[5]
- An expression uses values, constants, variables, operators and functions to produce another value.[6]
- The returned value computed by an expression can be of primitive data types as well as complex data types.[6]
- An expression statement evaluates an expression; in many languages it is terminated by a semicolon.[6]
- Order of operations (also referred to as precedence rules[7]) is a set of rules that dictate the order in which operations are performed when evaluating an expression.[8]
- PEMDAS (parentheses, exponents, multiplication, division, addition, subtraction) is a common mnemonic for the default order. Multiplication and division share the same precedence, as do addition and subtraction.[8]
- These rules allow for the use of parentheses to force a desired order of operations.[8]
- An exception is the unary minus (-) symbol, which in certain applications such as Excel takes precedence over exponentiation, so that -3^2 evaluates to 9 rather than -9.[8]
- Data type is a classification of data which tells the compiler or interpreter how the programmer intends to use the data.[9]
- Data types include: integers, booleans, characters, floating-point (or real) numbers and alphanumeric strings.[9]
- A programmer might create a new data type named "complex number" that would include real and imaginary parts.[9]
- Inputs are the signals or data received by the system and outputs are the signals or data sent from it.[10]
- I/O devices are the pieces of hardware used by a human (or other system) to communicate with a computer.[10]
- An I/O interface is required whenever the I/O device is driven by the processor.[10]
- Channel I/O requires the use of instructions that are specifically designed to perform I/O operations.[10]
- Memory-mapped I/O (MMIO) and port-mapped I/O (PMIO) (which is also called isolated I/O) are two complementary methods of performing input/output (I/O) between the central processing unit (CPU) and peripheral devices in a computer.[10]
- A constant is a value that cannot be altered by the program.[4] To visually differentiate constants from variables, your language’s style guide may recommend different casing for each.[11]
- Constants are useful for self-documenting code and make it easier to reason about correctness.[4]
- There are three ways to express a data value that cannot be altered:[4]
- literal
- macro
- constant
- Constants can be defined in a variety of programming languages with the qualifiers const, final, and readonly, used in C/C++, Java, and C# respectively.[4]
- It is common for some programming languages such as Ruby and Python to use capitals and underscores for constants.[4]
- Coding conventions are common practices and guidelines aimed to unofficially standardize the structure and coding style for the specific programming language.[12]
- Consistency in indentation, comments, declarations, line length, statements, white space, and naming conventions improves readability and maintainability and reduces the time needed to understand code.[12]
- Some guides are published by the creators of the language, for example, PEP 8 for Python.[13]
- According to PEP 8, each line of Python code should be limited to 79 characters. This avoids wrapped lines and makes more code readable at once. Docstrings and comments should be limited to 72 characters. If a team agrees, the line limit can be extended to 99 characters as long as docstrings and comments are still wrapped at 72 characters.[14]
- PEP 8 recommends using 4 spaces per indentation level. Spaces are preferred over tabs, and Python 3 does not allow mixing tabs and spaces for indentation.[15]
- Some libraries have their own style, but in general it is recommended to have a descriptive naming style using variations of snake case (words separated by underscores). More specific variations can be found on the PEP 8 page.[16]
- A statement is a syntactic unit that expresses some action to be carried out.[17]
- Examples of simple statements can include, but are not limited to:[17]
- Assertion Statements
- Assignment Statements
- Goto Statements
- Return Statement
- Function/Module Call statements
- Examples of compound statements can include, but are not limited to:[17]
- Block Statements
- Do-Loop Statements
- For-Statements
- If-Statements
- While-Loop Statements
- Programming languages are characterized by the type of statements they use.[17]
- Most statement parameters are call-by-name parameters which are evaluated when needed.[17]
- Statements do not return results and are executed solely for their side effects.[17]
- Most languages have a fixed set of statements defined by the language, but there have been experiments with extensible languages that allow the programmer to define new statements.[17]
- An assignment statement sets and/or re-sets the value stored in the storage location(s) denoted by a variable name; in other words, it copies a value into the variable.[18]
- In an assignment, the expression is evaluated in the current state of the program and the variable is assigned the computed value, replacing the prior value of that variable.[18]
- Augmented assignment is where the assigned value depends on the variable's previous value, using an operator such as *=. For example, a = 2 * a can instead be written as a *= 2.[18]
- A chained assignment is when a single value is assigned to multiple variables in one statement, as in a = b = 0.[18]
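The summary points on constants, operator precedence, and assignment can be illustrated with a short Python sketch. This is only an illustration and is not part of the course materials; the names INCHES_PER_FOOT, feet, inches, and total_inches are assumptions chosen for the example.

```python
# Illustrative names only; none of these come from the course materials.
INCHES_PER_FOOT = 12  # constant by convention: capitals and underscores

feet = 5
inches = 7

# Multiplication binds more tightly than addition, so no parentheses are needed.
total_inches = feet * INCHES_PER_FOOT + inches

# Parentheses force a different order of evaluation.
without_parentheses = 2 + 3 * 4
with_parentheses = (2 + 3) * 4

# Augmented assignment: a *= 2 is shorthand for a = 2 * a.
a = 5
a *= 2

# Chained assignment: one value is assigned to several variables.
width = height = 0

print(total_inches, without_parentheses, with_parentheses, a, width, height)
```

Python has no const qualifier, so the upper-case name marks INCHES_PER_FOOT as a constant by convention only.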
Key Terms
- array
- A collection of data values of a similar type that are grouped together.[19]
- assignment
- Sets the value saved in the storage location denoted by a given variable name.[3]
- Boolean
- A data type having two values, typically denoted true and false.[20]
- constant
- A value that cannot be altered by the program during normal execution.[21]
- data type
- A classification of data which tells the compiler or interpreter how the programmer intends to use the data.[22]
- declaration
- A language construct that specifies the properties of a given identifier.[23]
- expression
- A combination of one or more explicit values, constants, variables, operators, and functions that a programming language interprets and computes to produce another value.[24]
- floating point
- The formulaic representation that approximates a real number to a fixed number of significant digits.[25]
- integer
- A number that can be written without a fractional component.[26]
- modulus
- The remainder after division of one number by another.[27]
- objects
- A combination of related variables, constants and other data structures which can be selected and manipulated together.[28]
- operator
- A programming language construct that performs a calculation from zero or more input values to an output value.[29]
- order of operations
- A collection of rules that reflect conventions about which procedures to perform first in order to evaluate a given mathematical expression.[30]
- pointer
- A variable that contains the address of a location in the memory. The location is the commencing point of an object, such as an element of the array or an integer.[31]
- program
- An organized collection of instructions that, when executed, performs a specific task or function. The instructions are processed by the central processing unit (CPU) of the computer.[32]
- real number
- A value that represents a quantity along a line, including integers, fractions, and irrational numbers.[33]
- statement
- The smallest standalone element of an imperative programming language that expresses some action to be carried out.[34]
- string
- A sequence of characters, either as a literal constant or as some kind of variable.[35]
- variable
- A storage location paired with an associated symbolic name (an identifier), which contains some known or unknown quantity of information referred to as a value.[2]
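Several of the key terms above (data type, integer, floating point, Boolean, string, and modulus) can be seen in a few lines of Python. This is a minimal sketch with made-up variable names, not an example from the course.

```python
# Each value below has one of the basic data types named in this lesson.
count = 3            # int
ratio = 0.1 + 0.2    # float: an approximation of a real number
is_valid = True      # bool
name = "Ada"         # str

type_names = [type(value).__name__ for value in (count, ratio, is_valid, name)]

# Floating-point values are approximations, so 0.1 + 0.2 is not exactly 0.3.
is_exact = ratio == 0.3

# Modulus: the remainder after dividing one number by another.
remainder = 17 % 5

# Converting from one data type to another.
as_text = str(count)
as_number = int("42")

print(type_names, is_exact, remainder, as_text, as_number)
```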
See Also
- Computer Programming/Variables
- PlanetGeek: Comprehensive Guide to Self-Documenting Programs
- SlideShare: Examples of Primitive Data Types
- FreeCodeCamp: How to Think Like a Programmer
- FreeCodeCamp: How the let, const, and var Keywords Work in JavaScript
- LearnPython: Python Variables and Types
- Tutorials Point: Computer Programming - Variables
References
- Microsoft: Exam 98-381 Introduction to Programming Using Python
- Wikipedia: Variable (computer science)
- Wikipedia: Type safety
- Wikipedia: Self-documenting code
- "Coding conventions". Wikipedia. 2018-03-16. https://en.wikipedia.org/w/index.php?title=Coding_conventions&oldid=830687922.
- Wikipedia: Expression (computer science)
- "Order of operations". Wikipedia. 2018-03-14. https://en.wikipedia.org/w/index.php?title=Order_of_operations&oldid=830361255.
- Wikipedia: Order of operations
- Wikipedia: Data type
- Wikipedia: Input/output
- "Naming convention (programming)". Wikipedia. 2018-02-19. https://en.wikipedia.org/w/index.php?title=Naming_convention_(programming)&oldid=826478696.
- Wikipedia: Coding conventions
- "Python (programming language)". Wikipedia. 2018-03-16. https://en.wikipedia.org/w/index.php?title=Python_(programming_language)&oldid=830690084.
- PEP 8 - Style Guide for Python Code
- PEP 8 - Style Guide for Python Code
- [1]
- Wikipedia: Statement
- Wikipedia: Assignment
- "Top Programming Terms and Definitions for Beginners [Updated]". Hackr.io. Retrieved 2021-01-22.
- Wikipedia: Boolean data type
- Wikipedia: Constant (computer programming)
- Wikipedia: Data type
- Wikipedia: Declaration (computer programming)
- Wikipedia: Expression (computer science)
- Wikipedia: Floating point
- Wikipedia: Integer
- Wikipedia: Modulo operation
- "Top Programming Terms and Definitions for Beginners [Updated]". Hackr.io. Retrieved 2021-01-22.
- Wikipedia: Operation (mathematics)
- Wikipedia: Order of operations
- "Top Programming Terms and Definitions for Beginners [Updated]". Hackr.io. Retrieved 2021-01-22.
- "Top Programming Terms and Definitions for Beginners [Updated]". Hackr.io. Retrieved 2021-01-22.
- Wikipedia: Real number
- Wikipedia: Statement (computer science)
- Wikipedia: String (computer science)
Lesson 2 - Functions
This lesson introduces functions and related code documentation.
Objectives and Skills
Objectives and skills for this lesson include:[1]
- Document code segments using comments and documentation strings
- Use indentation, white space, comments, and documentation strings; generate documentation by using pydoc
- Construct and analyze code segments that include function definitions
- Call signatures; default values; return; def; pass
Readings
- Wikipedia: Modular programming
- Wikipedia: Function (computer science)
- Wikipedia: Parameter (computer programming)
- Wikipedia: Scope (computer science)
- Wikipedia: Naming convention (programming)
Multimedia
- YouTube: Introduction to Structured Programming
- YouTube: The Disadvantages of Spaghetti Code
- YouTube: The Advantages of Modularization
- YouTube: Modularity with Functions
- YouTube: Programming Basics - Statements & Functions
- YouTube: Programming Tutorial - Function Parameters
- YouTube: How to Use Functions in Python
- YouTube: Python Programming - Variable Scope
- YouTube: Naming Convention with Programming Languages
- YouTube: Python Programming: Docstrings
Examples
Activities
Programming Languages
- Review Wikipedia: Coding conventions. Research style guides or coding standards for your selected programming language, looking specifically for function documentation. If applicable, discuss coding conventions with your classmates and agree on a standard to follow for this course.
Coding
Complete one or more of the following coding activities in your selected programming language. Use self-documenting variable and constant names that follow the naming conventions for your selected programming language. Include comments at the top of the source code that describe the program and are consistent with the program or module documentation conventions for your selected programming language.
- Extend the BMI program from the previous lesson. Use separate functions for input, each type of conversion, BMI calculation, and output. Avoid global variables by passing parameters and returning results. Define constants for height and weight conversions, and use self-documenting function, variable, and constant names that follow the naming conventions for your selected programming language. Include comments at the top of the source code and comments for each function that are consistent with the documentation conventions for your selected programming language.
Lesson Summary
- Modular programming is a software design technique that emphasizes separating the functionality of a program into reusable, independent modules.[2]
- This concept is related to structured and object-oriented programming; all have the same end goal of deconstructing a comprehensive program into smaller pieces.[2]
- Structured programming enforces consistent structures (sequential, conditional, and repetition), requiring them to have a single entry and a single exit.[3]
- Object-oriented programming entails the use of classes, which are templates for object instantiation. A class is a user-defined data type; each object created from it carries its own attributes (fields) and behaviors (methods).[4]
- A subroutine, function, or method is a sequence of instructions designed to complete a given task.[5]
- Subroutines aid in the decomposition of complicated, multi-step tasks. They also facilitate the related concepts of re-usability and maintainability.[5]
- A subroutine may receive arguments upon invocation and return values upon termination.[5]
- A subroutine may call other subroutines, or itself recursively.[5]
- A subroutine has a side-effect when it modifies external data beyond its scope. For example, a subroutine may alter its supplied argument, changing its value in the calling environment.[5]
- A variable's scope is the portion of the source code in which the binding of an identifier (the user-defined name of a program element[6]) to the underlying entity is valid.[7]
- The scope of a name binding is the region where a given identifier can be recognized and used; it can apply to variables, functions, and user-created structures.[7]
- A variable's scope can vary from a single expression to an entire program, with many gradations and flavors in-between.[7]
- The six levels of scope, in order from the most to least inclusive:[7]
- global scope
- module scope
- file scope
- subroutine/function scope
- block scope
- expression scope
- A declaration is global in scope if the name has effect throughout an entire program. The use of non-constant variables with global recognition is considered a harmful practice, due to the increased likelihood of colliding identifiers and unintentional masking.[7]
- When the scope is limited to a module, the identifier can only be referenced within that module (which could potentially span multiple files of source code).[7]
- This idea of module scope can also be extended to class hierarchies where certain attributes or behaviors may be specified as private, only to be accessed from within the class itself.[7]
- When variables are declared outside a subroutine, they usually are declared at the file-level; their scope ranges from the point of declaration until the end of the file.[7]
- Function scope variables are local to a specific function. They go 'out of scope' when the function returns.[7]
- In most cases, the lifetime of these variables spans the duration of the function call; the entity gets created when the variable is declared, only to be permanently destroyed when the function returns.[7]
- However, some languages also have static local variables, where the lifetime of the variable is the entire lifetime of the program. The variable is only known in the context of its resident function.[7]
- A variable known only to a block is usually restricted to a construct such as a conditional statement or a loop.[7]
- Some functional languages offer a let-expression mechanism that confines a declaration's scope to a single expression.[7]
- This could be useful if you only need a temporary, intermediate value for a computation.[7]
- There are two strategies employed to resolve names: lexical scoping and dynamic scoping.[7]
- With lexical scoping, the entity referenced is known prior to run-time, determined by the lexical structure and content of the program's source code.[7]
- With dynamic scoping, the entity referenced is deduced at run-time, depending on the variable's relative position in the call stack. This method is far less popular among implemented programming languages.[7]
- Parameters are variables that receive data inside a subroutine or function.[8]
- The subroutine or function head defines a parameter list, the standard by which future calls are to be made.[8]
- Subroutines may use a ‘call by value’ strategy where the arguments passed set up local variables inside the function. Changes to these local copies do not affect data in the calling environment, so the caller's data remains safe from accidental modification.[8]
- The arguments passed to a subroutine are evaluated and assigned to the corresponding parameters.[8]
- Subroutines may also use a ‘call by reference’ strategy where the original argument itself gets referenced—not just a copy of its value; by referring to the actual memory location of the variable, it can be directly affected, producing an observable side-effect.[8]
- There is a slight distinction to be made between an "argument" and a "parameter." The argument is the value or reference supplied when calling the function; the parameter is the variable inside the function that receives the argument. For variables that are ‘passed by reference,’ both aliases would refer to the same object.[8]
- Function parameters can have default parameter values written in parameter = expression form. When calling a function with default parameters, the corresponding argument can be omitted in a call, and the default value will be used instead.[9]
- In Python, if one positional function parameter has a default value, then all following positional parameters must also have default values.[10]
- Following a naming convention means naming variables, functions, and other identifiers according to shared standards.[11]
- Having a consistent naming convention allows other developers to focus on issues of logic rather than syntactical concerns.[11]
- Identifiers should be relatively brief, yet long enough to adequately describe the object being identified.[11]
- A popular naming convention in some programming languages is to use lower camel case, where the first word is lower-cased and subsequent words are capitalized (e.g., getHeight(), printChart(), and so on).[11]
- Constants are typically named with all upper-case letters and if applicable, underscores separating the words (e.g., WEIGHT_CONVERSION, HEIGHT_CONVERSION, and so on).[11]
- Different programming languages have different naming conventions; following your language's convention is not essential to writing code, but it is strongly advised. When peers review your code, they will already be familiar with your naming style.[11]
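The summary points on parameters, default values, and side effects can be sketched in Python. Python passes object references, so it does not match either ‘call by value’ or ‘call by reference’ exactly; the sketch only shows the observable behavior. All function names and the conversion constant are assumptions made for this example.

```python
POUNDS_PER_KILOGRAM = 2.20462  # hypothetical constant for the example


def pounds_to_kilograms(pounds, digits=1):
    """Return pounds converted to kilograms, rounded to digits places."""
    return round(pounds / POUNDS_PER_KILOGRAM, digits)


def append_reading(readings, value):
    """Append value to readings; the caller's list is modified (a side effect)."""
    readings.append(value)


def rebind_number(number):
    """Rebinding a parameter does not change the caller's variable."""
    number = number + 1
    return number


weights = [150]
append_reading(weights, 160)      # the argument and the parameter name the same list

original = 10
result = rebind_number(original)  # original is still 10 afterwards

print(pounds_to_kilograms(150))     # default value used for digits
print(pounds_to_kilograms(150, 3))  # default overridden by the caller
print(weights, original, result)
```

Note that the parameter with a default value (digits) comes after the parameter without one (pounds).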
Key Terms
- argument
- Data that gets passed to a function upon invocation. Depending on the evaluation strategy, the argument supplied may consist of variables/constants, literal values, other function calls, or an expression involving operators and the aforementioned objects.[8]
- call-by-reference
- Arguments are passed to the subroutine by direct reference, typically using the argument's address.[5]
- call-by-value
- Arguments are evaluated and a copy of the value is passed to the subroutine.[5]
- docstrings
- A string literal which appears as the first expression in a class, function or module. While ignored when the suite is executed, it is recognized by the compiler and put into the __doc__ attribute of the enclosing class, function or module. Since it is available via introspection, it is the canonical place for documentation of the object.[12]
- module
- A collection of programming objects (functions, variables, and other mechanisms) that get packaged together as reusable, adaptable code.[2]
- parameter
- Placeholder variables that receive the supplied arguments so they can be accessed inside the function.[8]
- return value
- A function may optionally return a value or another object to its calling environment upon termination. This return value may be captured or simply ignored.[13]
- scope
- The block or section of code in which the variable exists and can be referenced. This is known as its visibility.[7]
- subroutine
- A sequence of program instructions that perform a specific task, packaged as a unit.[5]
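The docstring and return value entries above can be illustrated with a short Python sketch. The function name feet_to_inches is made up for this example.

```python
def feet_to_inches(feet):
    """Return the given number of feet expressed in inches."""
    return feet * 12


# The docstring is stored in the function's __doc__ attribute.
documentation = feet_to_inches.__doc__

# A return value may be captured...
inches = feet_to_inches(6)

# ...or simply ignored.
feet_to_inches(6)

print(documentation)
print(inches)
```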
See Also
- Computer Programming/Functions
- Cornell University Lecture: Modular Programming
- University of Utah Resource: Functions
- DataCamp: Docstrings in Python
- GitHub: Python Docstring Style Guide
References
- Microsoft: Exam 98-381 Introduction to Programming Using Python
- Wikipedia: Modular programming
- Wikipedia: Structured programming
- Wikipedia: Object-oriented programming
- Wikipedia: Subroutine
- https://www.techopedia.com/definition/1810/identifier-c
- Wikipedia: Scope (computer science)
- Wikipedia: Parameter (computer programming)
- Python Reference Manual: Function definitions
- Python Reference Manual: Function definitions
- Wikipedia: Naming convention (programming)
- Python 3.8.1 Glossary
- Wikipedia: Return statement
Lesson 3 - Conditions
This lesson introduces conditions, validation, exception handling, and defensive programming.
Objectives and Skills
Objectives and skills for this lesson include:[1]
- Construct and analyze code segments that use branching statements
- if; elif; else; nested and compound conditional expressions
- Analyze, detect, and fix code segments that have errors
- Syntax errors; logic errors; runtime errors
- Analyze and construct code segments that handle exceptions
- Try; except; else; finally; raise; assert
Readings
- Wikipedia: Structured programming
- Wikipedia: Conditional (computer programming)
- Wikipedia: Data validation
- Wikipedia: Exception handling
- Wikipedia: Python syntax and semantics#Exceptions - EAFP vs. LBYL
- Wikipedia: Defensive programming
Multimedia
- YouTube: Introduction to Structured Programming
- YouTube: The Three Basic Structures—Sequence, Selection, and Loop
- YouTube: Programming For Beginners - Relational Operators
- YouTube: Introduction to Programming - Control Flow
- YouTube: Exception Handling in Java
- YouTube: Defensive Programming
- YouTube: Python 3 Programming Tutorial: If Statement
- YouTube: Python 3 Programming Tutorial - Try and Except error Handling
- YouTube: Assert statements and unit tests (Python)
Examples
Activities
Modify your program from the previous lesson to add input validation, parameter validation, assertions, and exception handling.
- Review Wikipedia: Data validation. Add input validation to ensure that only valid data may be entered for each input. Validate for both data type and range. Invalid input should terminate the program with an appropriate error message.
- Review Wikipedia: Parameter validation. Add parameter validation to all calculation / conversion functions to ensure that only valid parameters are passed to each function. Validate for both data type and range. Throw or raise appropriate exceptions for invalid parameters.
- Review Wikipedia: Assertion (software development). Add assertions to the output function to assure that only valid parameters are passed. Validate for both data type and range.
- Review Wikipedia: Exception handling. Add exception handling to your main function to catch any errors thrown during processing and terminate the program gracefully.
- Update program and function documentation regarding parameters and exceptions, consistent with the documentation standards for your selected programming language.
Lesson Summary
- An assertion is a predicate (a Boolean-valued function over the state space, usually expressed as a logical proposition using the variables of a program) connected to a point in the program that should always evaluate to true at that point in code execution. Assertions can help a programmer read the code, help a compiler compile it, or help the program detect its own defects.[2]
- Exception handling is a means to respond to and ideally address run-time errors that occur during the execution of a program—these are errors that either cannot be handled by another mechanism or their handling would make for clumsy and inelegant design.[3]
- There are viable alternatives. Critical functions may return error codes that are subject to explicit checks by the programmer. For example, the malloc function in C allocates dynamic memory and returns a pointer to it. If the system cannot satisfy the request, the returned pointer holds the special value NULL, indicating the failure of the desired operation. (In standard C++, a plain new expression instead throws an exception on failure; only its nothrow form returns a null pointer.)[3]
- Where possible, you may also use other data validation techniques (making use of conditions and loops) to "preemptively filter exceptional cases."[3]
- Designated handling constructs separate the main logic of the program from the handling of exceptional cases into semantic blocks, resulting in code that is simpler to read and maintain.[3]
- Most implementations of exception handling feature a try-catch or in the specific case of Python, a try-except, statement. Despite syntactical differences between languages, the concept of exception handling is similar across the board.[3]
- The try clause houses code that is to be attempted. If an exception occurs (raised either by the run-time environment or manually by the programmer), further processing halts and the exception is passed on to the corresponding catch or except clause. There may be several of these clauses.[3]
- Generally, the intent of exception handling is to ameliorate the problem with grace. If this is not possible, you can throw or raise the exception again to be caught by a higher-level exception handler.[3]
- In Python, as well as many other languages, you may specify to catch an exception generically or precisely (by denoting the exact name of the exceptional case, either language-specified or user-defined). You can even group exceptional cases to handle by adjoining them into a tuple data structure.[4]
- There are two models to approach code that may result in exceptional cases: EAFP and LBYL.[5]
- The LBYL, "look before you leap," mentality has the programmer write code such that a precondition is tested before accessing the sought after resource.[5]
- This, however, doesn't always work as intended. If the state of an object were changing in real-time, the resource that was once safe can be rendered unsafe between the evaluation of the precondition and the later accessing of that resource. This is a specific bug known as a time of check to time of use (abbreviated TOCTTOU) race condition.[5]
- The EAFP, "it's easier to ask for forgiveness than permission," form is a staple of Python idioms. Abiding by such a philosophy, you are to simply attempt the desired action. If there is a resultant exception, it's to be handled by a try-except block. This sidesteps the potentiality for a change of state and the ensuing "race condition" that it might cause, as described above.[5]
- Structured programming aims to improve the clarity, quality, and development time of a program by making extensive use of subroutines, block structures, and loops.[6]
- The discovery of what is now known as the structured program theorem contributed to the acceptance of structured programming by computer scientists.[6]
- Structured programs are built from control structures: sequence, selection, iteration, and recursion. The structured program theorem showed that the first three are sufficient to express any computable function, so goto statements are not necessary to write programs. Unrestrained use of goto had led to spaghetti code, which has a complex structure and is difficult to read and maintain.[6]
- Conditional statements in computer science allow for the selection between alternatives at runtime.[7]
- If-then-else is widely used in many programming languages. An additional else-if can be used multiple times to combine several conditions. Switch statements also offer the same concept.[7]
- Data Validation is the process of ensuring that data is both correct and useful. [8]
- There are 4 general kinds of validation: Data type validation, Range and constraint validation, Code and Cross-reference validation, and Structured validation. [8]
- Data type validation verifies that the individual characters provided through user input are consistent with the expected characters of one or more known primitive data types: integer, float (decimal), or string. [8]
- Simple range and constraint validation examines user input to ensure that it falls within a minimum/maximum range or passes a test that evaluates a sequence of characters.[8]
- Code and cross-reference validation include tests for data type validation, combined with one or more operations to verify that the user-supplied data is consistent with one or more external rules. These additional validity constraints may involve cross-referencing supplied data with a known look-up table or directory information service. [8]
- Structured validation combines basic data type validation steps with more complex processing. Data that passes the individual checks is not necessarily the correct data, so this method examines whether the data as a whole was entered correctly.[8]
- Different methods of validation include but are not limited to: Allowed character checks, Batch totals, Cardinality check, Check digits, Consistency checks, Control totals, Cross-system consistency checks, Data type checks, File existence checks, Format or picture check, Hash totals, Limit check, Logic check, Presence check, Range check, Referential integrity, Spelling and Grammar check, Uniqueness check, and Table Look Up check. [8]
- Post Validation Actions include:
- Enforcement Action - typically rejects the data entry request and requires the input actor to make a change that brings the data into compliance.[8]
- Advisory Actions - typically allow data to be entered unchanged but sends a message to the source actor indicating those validation issues that were encountered. [8]
- Verification Actions - special cases of advisory actions in which the source actor can be asked to verify that this data is what they would really want to enter, in the light of a suggestion to the contrary. [8]
- Defensive programming is a form of defensive design intended to ensure the continuing function of a piece of software under unforeseen circumstances. [9]
- Defensive programming practices are often used where high availability, safety or security is needed. [9]
- Defensive programming is an approach to improve software and source code, in terms of:
- General quality – reducing the number of software bugs and problems. [9]
- Making the source code comprehensible – the source code should be readable and understandable, so it is approved in a code audit. [9]
- Making the software behave in a predictable manner despite unexpected inputs or user actions.[9]
- Offensive programming is a category of defensive programming, with the added emphasis that certain errors should not be handled defensively. [9]
- Defensive Programming Techniques:
- Intelligent Source Code Reuse - If existing code is tested and known to work, reusing it may reduce the chance of bugs being introduced. However, reusing code is not always a good practice, because it also amplifies the damages of a potential attack on the initial code. [9]
- Canonicalization - libraries that can be employed to avoid bugs due to non-canonical input. [9]
- Low tolerance against "potential" bugs – Assume that code constructs that appear to be problem prone are bugs and potential security flaws. [9]
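As a rough illustration of the LBYL and EAFP approaches and of data type and range validation described above, the following Python sketch validates a text value as an integer within a range, once in each style. The function names, the 0 to 120 range, and the sample inputs are assumptions made for this example only.

```python
def parse_age_lbyl(text, low=0, high=120):
    """LBYL: test the preconditions before converting the text."""
    text = text.strip()
    # Data type check: accept only plain ASCII digits.
    if not (text.isascii() and text.isdigit()):
        return None
    value = int(text)
    # Range check.
    if value < low or value > high:
        return None
    return value


def parse_age_eafp(text, low=0, high=120):
    """EAFP: attempt the conversion and handle any resulting exception."""
    try:
        value = int(text)
    except (TypeError, ValueError):  # exceptions grouped in a tuple
        return None
    if not low <= value <= high:
        return None
    return value


for sample in ["42", "abc", "200"]:
    print(sample, parse_age_lbyl(sample), parse_age_eafp(sample))
```

The two versions are not strictly equivalent: the EAFP version accepts anything that int() accepts, such as "+7", while the LBYL version accepts only what its precondition allows.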
Key Terms
- assertion
- An assertion is a predicate connected to a point in the program, that always should evaluate to true at that point in code execution. Assertions can help a programmer read the code, help a compiler compile it, or help the program detect its own defects.[2]
- Boolean expression
- An expression in a programming language that produces a Boolean value when evaluated, i.e. one of true or false.[10]
- conditional statements
- Allow for selection between alternatives at runtime.[7]
- data validation
- The process of ensuring that data have undergone data cleansing to ensure they have data quality, that is, that they are both correct and useful.[11]
- defensive programming
- A form of defensive design intended to ensure the continuing function of a piece of software under unforeseen circumstances.[9]
- EAFP (Easier to Ask Forgiveness than Permission)
- Approach that first attempts the desired action then handles any resulting exceptions.[5]
- exception handling
- The process of responding to the occurrence, during computation, of exceptions. Anomalous or exceptional conditions requiring special processing, often changing the normal flow of program execution.[3]
- GoTo statement
- Performs a one-way transfer of control to another line of code; in contrast, a function call normally returns control.[12]
- if statement
- An if statement is a programming conditional statement that, if proved true, performs a function or displays information.[13]
- LBYL (Look Before You Leap)
- Approach in which a precondition is tested before accessing the sought-after resource.[5]
- logic error
- Error that makes the program deliver unexpected results without crashing it.[14]
- relational operator
- A programming language construct or operator that tests or defines some kind of relation between two entities, including numerical equality (e.g., 5 = 5) and inequalities (e.g., 4 ≥ 3).[15]
- runtime error
- Error produced by the runtime system if something goes wrong while a syntactically correct program is running.[14]
- software bug
- An error, flaw, failure or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways.[16]
- structured programming
- A programming paradigm aimed at improving the clarity, quality, and development time of a computer program.[6]
- syntax
- Syntax of a computer language is the set of rules that defines the combinations of symbols that are considered to be a correctly structured document or fragment in that language.[17]
- syntax error
- Error that indicates something is wrong with program syntax. It's produced by Python during the translation of the source code into byte code.[14]
- truth table
- A mathematical table used in logic, truth tables can be used to show whether a propositional expression is true for all legitimate input values, that is, logically valid.[18]
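To connect the terms Boolean expression, truth table, and assertion, here is a minimal Python sketch that builds a truth table for a Boolean expression and uses an assertion to state what should be true afterwards. The helper name truth_table and the sample expressions are hypothetical choices for this example.

```python
from itertools import product


def truth_table(expression, names):
    """Return one row per combination of Boolean inputs, ending with the result."""
    rows = []
    for values in product([False, True], repeat=len(names)):
        rows.append((*values, expression(*values)))
    return rows


# Sample expression chosen for illustration: p and not q
rows = truth_table(lambda p, q: p and not q, ["p", "q"])
for row in rows:
    print(row)

# An assertion states a condition that should always hold at this point:
# two Boolean inputs give four rows.
assert len(rows) == 4

# An expression is logically valid if it is true in every row.
always_true = truth_table(lambda p: p or not p, ["p"])
print(all(row[-1] for row in always_true))
```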
See Also
- Computer Programming/Conditions
- Jeff Knupp: LBYL vs. EAFP
- TutorialsPoint: Conditions Tutorial with Examples
- Open Book Project: Error Handling
- Scott Dorman: What is "Defensive Programming"?
- Python for Beginners: Exception Handling in Python
- Digital Ocean: How To Write Conditional Statements in JavaScript
References
- ↑ Microsoft: Exam 98-381 Introduction to Programming Using Python
- ↑ 2.0 2.1 Wikipedia: Assertion (software development)
- ↑ 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 Wikipedia: Exception handling
- ↑ 4.0 4.1 4.2 Python Documentation: Errors and Exceptions
- ↑ 5.0 5.1 5.2 5.3 5.4 5.5 Wikipedia: Python syntax and semantics
- ↑ 6.0 6.1 6.2 6.3 Wikipedia: Structured programming
- ↑ 7.0 7.1 7.2 Wikipedia: Conditional (computer programming)
- ↑ 8.00 8.01 8.02 8.03 8.04 8.05 8.06 8.07 8.08 8.09 Wikipedia: Data Validation
- ↑ 9.00 9.01 9.02 9.03 9.04 9.05 9.06 9.07 9.08 9.09 Wikipedia: Defensive programming
- ↑ Wikipedia: Boolean expression
- ↑ Wikipedia: Data validation
- ↑ Wikipedia: Goto
- ↑ Definition: If statement
- ↑ 14.0 14.1 14.2 How to Think Like a Computer Scientist (Swarthmore Comp Sci Version): Debugging
- ↑ Wikipedia: Relational operator
- ↑ Wikipedia: Software bug
- ↑ Wikipedia: Syntax (programming languages)
- ↑ Wikipedia: Truth table
Lesson 4 - Loops
This lesson introduces loops, including while, for, and do loops.
Objectives and Skills
Objectives and skills for this lesson include:[1]
- Construct and analyze code segments that perform iteration
- while; for; break; continue; pass; nested loops and loops that include compound conditional expressions
Readings
Multimedia
Loop Concepts
- YouTube: Introduction to Programming - Iteration
- YouTube: While Loop Mistakes
- YouTube: Loops and Iterations - For/While Loops
Python Loops
- YouTube: Python 3 Programming Tutorial: While Loop
- YouTube: Python 3 Programming Tutorial - For loop
- YouTube: Python Nested For Loops
Examples
Activities
- Extend the BMI program from the previous lesson. Rather than terminating the program on invalid input, display an error message and ask the user to enter valid input. Provide a way for the user to terminate the program instead if they choose.
- Extend the BMI program above. Add a loop to the main program that continues asking for input after displaying each BMI result. Provide a way for the user to terminate the program instead if they choose.
- Update program and function documentation, consistent with the documentation standards for your selected programming language.
- Extend the BMI program to display a BMI table, with columns for height from 58 inches to 76 inches in 2-inch increments and rows for weight from 100 pounds to 250 pounds in 10-pound increments. Reuse the conversion/calculation functions from the BMI program above. Include appropriate program and function documentation, consistent with the documentation standards for your selected programming language.
Lesson Summary
- A for loop is a control flow statement for specifying iteration, which allows code to be executed repeatedly.[2]
- A for loop is distinguished by an explicit loop counter or loop variable, which lets the body of the loop know which iteration is being executed.[2]
- A for loop is typically used when the number of iterations is known before entering the loop.[2]
- A C-style for loop is a common source of infinite loops because the fundamental steps of iteration are completely under the programmer's control.[2]
- Foreach loop means "do this to everything in this set", rather than "do this x times" which is what the traditional for loop means.[3]
- When using a count-controlled loop to search through a table, it might be desirable to stop searching as soon as the required item is found. Some programming languages provide a statement such as break, which terminates the current loop immediately, and transfers control to the statement immediately after that loop.[4]
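- A loop variant is an integer expression that has an initial non-negative value and must decrease on every iteration without ever becoming negative; it is used to show that a loop terminates.[4]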
- A loop invariant is an assertion which must be true before the first loop iteration and remain true after each iteration.[4]
- Modern languages have a specialized structured construct for exception handling which does not rely on the use of GoTo or multi-level breaks or returns.[4]
- A do-while loop is an exit-condition loop. Unlike a while loop, the code enters the loop body first and the condition is evaluated afterwards.[5]
- A loop (like a do-while loop) that checks its condition after the execution of the loop body is called a post-test loop.[5]
- A do-while loop is executed at least once and continues to loop as long as the condition is true. It stops when the condition becomes false; if the condition never becomes false, an infinite loop is created.[5]
- Some languages use a different naming convention for this particular type of loop such as repeat until, while true, repeat while, and exit when.[5]
- Often break statements are used with infinite loops to allow for termination.[5]
- As long as the system is responsive, an infinite loop can usually be interrupted by pressing Control-C in the terminal or by ending the process in a task manager.[6]
- Continue statements suspend the current iteration and then resume as normal with the next iteration. If the iteration is the last one in the loop, the statement will end the entire loop early.[4]
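The loop constructs summarized above can be sketched in Python as follows. Python has no do-while statement, so the post-test loop is emulated here with while True and break. The variable names and sample data are assumptions for illustration.

```python
# Count-controlled for loop: the number of iterations is known in advance.
total = 0
for number in range(1, 6):
    total += number

# break: stop searching as soon as the required item is found.
names = ["Ada", "Grace", "Linus"]
found_index = -1
for index, name in enumerate(names):
    if name == "Grace":
        found_index = index
        break

# continue: skip the rest of the current iteration.
odds = []
for number in range(1, 8):
    if number % 2 == 0:
        continue
    odds.append(number)

# Post-test loop emulated with an infinite loop and break:
# the body always runs at least once.
count = 0
while True:
    count += 1
    if count >= 3:
        break

print(total, found_index, odds, count)
```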
Key Terms
- break
- A keyword available in most languages (the functionality is near-ubiquitous regardless) that exits the innermost loop structure.[4]
- continue
- A keyword that suspends the current iteration and jumps directly onto the next.[4]
- do-while loop
- A post-test loop, meaning the body is guaranteed to execute at least once, as the condition is evaluated after the fact. Often post-test loops are indefinite loops, which iterate until a sentinel value gets encountered.[5]
- for loop
- A specialized loop for iterating a given number of times (definitely), usually featuring an encapsulated head of initialization, Boolean, and incrementation (or decrementation) components.[2]
- for-each loop
- A take on the for loop, where the definite iteration can occur directly over a collection or sequence of data, e.g. a list or an array, avoiding the possibility of an off-by-one error.[3]
- infinite loop
- A common pitfall where the user is locked into a never-ending process because of an unsatisfied terminating condition.[7]
- loop counter
- The variable that controls the iterations of a loop.[2]
- nested loop
- A loop that is enclosed by an outer, surrounding loop, useful when it has two or more dimensions of elements to parse.[4]
- pass
- The pass statement is used as a placeholder for future code. It does nothing but avoid an error being thrown.[8]
- raise
- Allows the programmer to throw an exception at any time. If no expression is present, the raise statement re-raises the exception that is currently being handled.[9]
- sentinel
- A value that signals the completion of a process. An example is the EOF (end-of-file) marker present in many languages, representing not another character to read, but the end of the file and thus its processing.[10]
- while loop
- A pre-test loop, meaning there is no guarantee the body will execute. Like their do-while counterparts, these loops are usually indefinite loops, iterating until a sentinel value is encountered.[11]
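A sentinel-controlled while loop, as described in the terms above, might look like the following sketch. The function name, the choice of -1 as the sentinel, and the sample list standing in for user input are assumptions for this example.

```python
def sum_until_sentinel(values, sentinel=-1):
    """Add values until the sentinel appears; the sentinel itself is not added."""
    iterator = iter(values)
    total = 0
    # If the data runs out, treat that the same as reaching the sentinel.
    value = next(iterator, sentinel)
    while value != sentinel:  # pre-test loop: the body may run zero times
        total += value
        value = next(iterator, sentinel)
    return total


print(sum_until_sentinel([4, 8, 15, -1, 16]))  # prints 27
```

Because the condition is tested before the body, a sequence that begins with the sentinel produces a total of 0 without executing the loop body at all.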
See Also
- Computer Programming/Loops
- Wikipedia: While Loops
- Wikipedia: For Loops
- Tutorials Point: Loops
- Dr. Andrew N. Harrington – If Statements
References
- ↑ Microsoft: Exam 98-381 Introduction to Programming Using Python
- ↑ 2.0 2.1 2.2 2.3 2.4 2.5 2.6 Wikipedia: For loop
- ↑ 3.0 3.1 3.2 3.3 Wikipedia: Foreach loop
- ↑ 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 Wikipedia: Control flow
- ↑ 5.0 5.1 5.2 5.3 5.4 5.5 Wikipedia: Do while loop
- ↑ Wikipedia: Infinite loop
- ↑ Wikipedia: Infinite loop
- ↑ https://www.w3schools.com/python/ref_keyword_pass.asp
- ↑ https://www.python.org/dev/peps/pep-0317/
- ↑ Wikipedia: Sentinel value
- ↑ Wikipedia: While loop
Lesson 5 - Strings
This lesson introduces strings and string processing.
Objectives and Skills
Objectives and skills for this lesson include:[1]
- Evaluate an expression to identify the data type assigned to each variable
- Identify str, int, float, and bool data types
- Perform data and data type operations
- Convert from one data type to another type; construct data structures; perform indexing and slicing operations
- Determine the sequence of execution based on operator precedence
- Assignment; Comparison; Logical; Arithmetic; Identity (is); Containment (in)
- Select the appropriate operator to achieve the intended result
- Assignment; Comparison; Logical; Arithmetic; Identity (is); Containment (in)
Readings
Multimedia
- YouTube: Python Tutorial: Slicing Lists and Strings
- YouTube: Python Tutorial: String Formatting
- YouTube: Python Simple String Manipulation
- YouTube: Strings, Escape Sequences and Comments : Python Tutorial #4
- YouTube: Implement Run-length Encoding of Strings
- YouTube: Iterating Over a Python String
- YouTube: Python Tutorial for Beginners 7: Loops and Iterations - For/While Loops
- YouTube: Computer Programming - Strings
- YouTube: 20 String Methods in 7 Minutes - Beau teaches JavaScript
- YouTube: JavaScript Strings
Examples
Activities
- Review Wikipedia: Run-length encoding. Create a program that asks the user for an input string of alphabetic characters. Convert the string to a run-length encoded (RLE) string of characters and numbers. Use the compressed format, where a single instance of a character has no count. For example, AAABCC would be A3BC2. Use a separate function for string processing. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
- Enhance the RLE program above to check to see if a string has numbers in it. If so, it is already in RLE format. Decode RLE strings and display the results. Strings that have no encoding should be encoded as above. Use a separate function for decoding. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language.
- Review Wikipedia: Escape sequence. Enhance the RLE program above by allowing a # symbol to be used as an escape sequence, indicating that the following number is a number character rather than an encoding count. Use a pair of # symbols (##) to indicate a # character. This change should allow any input sequence to be encoded. Enhance the decoding function to support the new format. Begin an encoded sequence with ##00 to indicate it is already encoded. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language.
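The compressed format from the first activity (AAABCC becomes A3BC2) can be sketched in Python as shown below. This is one possible starting point under stated assumptions, not a reference solution: the function names are hypothetical, only alphabetic input is handled, and the # escape format from the third activity is not implemented.

```python
def rle_encode(text):
    """Encode each run as the character followed by its count (count omitted for 1)."""
    if not isinstance(text, str) or not text.isalpha():
        raise ValueError("text must be a non-empty string of letters")
    pieces = []
    run_char = text[0]
    run_length = 1
    for char in text[1:]:
        if char == run_char:
            run_length += 1
        else:
            pieces.append(run_char + (str(run_length) if run_length > 1 else ""))
            run_char = char
            run_length = 1
    # Flush the final run.
    pieces.append(run_char + (str(run_length) if run_length > 1 else ""))
    return "".join(pieces)


def rle_decode(encoded):
    """Expand each character by the count that follows it (default 1)."""
    pieces = []
    index = 0
    while index < len(encoded):
        char = encoded[index]
        index += 1
        digits = ""
        while index < len(encoded) and encoded[index] in "0123456789":
            digits += encoded[index]
            index += 1
        pieces.append(char * (int(digits) if digits else 1))
    return "".join(pieces)


print(rle_encode("AAABCC"))  # prints A3BC2
print(rle_decode("A3BC2"))   # prints AAABCC
```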
Lesson Summary
- Run-length encoding (RLE) is a form of lossless data compression in which runs of data are stored as a single data value and count.[2]
- An escape sequence is a combination of characters that has a meaning other than the literal characters contained therein; it is marked by one or more preceding (and possibly terminating) characters.[3]
- A string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. Strings are widely used in almost all programming languages as they are quite powerful. [4]
- A character in a programming language is the smallest unit of textual information.[5]
- Strings are lists or containers of individual characters 'strung' together, integrated with other useful functionality (like the ability to 'find' a character or a sub-string).[5]
- Characters may be alphabetic letters, numeric digits, punctuation marks and other symbols, whitespace, or 'control characters'.[5]
- Control characters are not printed like other symbols; instead, they communicate a more abstract idea, like the signaling of a newline or the ringing of a bell.[6]
- In some lower-level languages like C++ or Java, individual characters are of a specific data type, usually termed 'char'.[5]
- Since information stored in a computer is represented in binary format, the need for a standardized table, pairing numbers with characters, became readily apparent. Thus, ASCII, the American Standard Code for Information Interchange, was born.[7]
- ASCII is a near-ubiquitous example of an encoding scheme or a character set, the systematic mapping of code points to symbols of language.[7]
- Originally, ASCII only defined 128 characters, encompassing the entire alphabet (both lower- and upper-case), the digits (0-9), and other characters of importance.[7]
- It was subsequently expanded through various unofficial revisions in order to make full use of the 2^8, or 256, possibilities offered by one byte of data.[8]
- Although still relevant, ASCII is being phased out by Unicode, a superset of ASCII that implements universal (not just English-based) symbols by extending the original set of 128 characters.[9]
- A string literal is a quoted sequence of characters (formally "bracketed delimiters"), as in x = "foo", where "foo" is a string literal with value foo – the quotes are not part of the value.[10]
- There are numerous alternate notations for specifying string literals and the exact notation depends on the individual programming language.[10]
- An empty string is literally written by a pair of quotes with no character at all in between.[10]
- The string must begin and end with the same kind of quotation mark and the type of quotation mark may give slightly different semantics.[10]
- A number of languages provide for paired delimiters, where the opening and closing delimiters are different.[10]
- For example: “Hi There!”, ‘Hi There!’, „Hi There!“, «Hi There!»
- String literals can contain literal newlines, spanning several lines. Alternatively, newlines can be escaped, most often as \n.[10]
- Python has a special form of string, designed for multiline literals, called triple quoting. These literals are used especially for inline documentation, known as docstrings.[10]
- Strings are a data type typically implemented as an array data structure of bytes. [4]
- In general, there are two types of strings, fixed and variable-length strings.[4]
- Fixed-length strings contain a fixed maximum length to be determined at compile time and the same amount of memory will be used.[4]
- Variable-length strings do not have a fixed length and can vary the amount of memory to be used at runtime. Variable-Length strings are also more common in modern programming languages.[4]
- Python string objects are immutable, i.e., the state of a string can't be modified once it's created. String variables, on the other hand, are changeable. A string variable can be assigned a new value, but this action won't affect the original string.[11][12]
- There are a variety of algorithms for processing strings:[4]
- String searching algorithms for finding a given substring or pattern
- String manipulation algorithms
- Sorting algorithms
- Regular expression algorithms
- Parsing a string
- Sequence mining
- Most programming languages offer string functions in order to manipulate a string. Refer to String functions for a list of string functions used in various languages.
- String indexing and slicing ...
- Strings and substrings can be accessed by index, with the first character receiving an index of 0, and all others with an index incremented from that of the previous character.
- All characters including white spaces are given an index.
- Though implementations vary between languages, operations such as slicing and concatenation can be performed using indexes. Refer to String functions.
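The indexing, slicing, and immutability points above can be tried out with a short Python sketch. The sample string and variable names are assumptions for illustration.

```python
text = "Hello, World"

first = text[0]                # index 0 is the first character
last = text[-1]                # negative indexes count from the end
space = text[6]                # white space has an index too
word = text[7:12]              # slice from index 7 up to, not including, 12
position = text.find("World")  # index where the substring starts

# Strings are immutable: assigning to an index raises TypeError.
try:
    text[0] = "J"
    modified_in_place = True
except TypeError:
    modified_in_place = False

# A new string can be built instead; the original is unchanged.
greeting = "J" + text[1:]

print(first, last, repr(space), word, position)
print(modified_in_place, greeting, text)
```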
Key Terms
- concatenation
- The joining of two strings end to end: the sequence of symbols in string S followed by the sequence of symbols in string T is denoted ST.[4]
- escape sequence
- An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal, but is translated into another character or a sequence of characters that may be difficult or impossible to represent directly. [13]
- fixed-length strings
- Fixed length strings have a fixed, maximum length to be determined at compile time and use the same amount of memory whether the maximum is needed or not.[4]
- prefix
- A string A = a1, a2, …, an has a prefix Â = a1, a2, …, am when m ≤ n. A proper prefix of the string A is not equal to A itself (0 ≤ m < n).[4]
- reversal
- The reverse of a string is a string with the same symbols but in reverse order.[4]
- rotation
- A string s = uv is said to be a rotation of t if t = vu.[4]
- run-length encoding
- run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run.[14]
- string
- Traditionally a sequence of characters, either as a literal constant or as some kind of variable.[4]
- string datatype
- A datatype modeled on the idea of a formal string.[4]
- string literal
- When a string appears literally in source code, also known as an anonymous string.[4]
- substring
- Occurs when one string is a prefix of a suffix of an original string, and equivalently a suffix of a prefix.[4]
- suffix
- Any substring of an original string that includes the original string’s last letter, including the string itself. A proper suffix of a string is not equal to the original string itself.[4]
- variable-length strings
- Variable-length strings have a length that is not arbitrarily fixed and can use varying amounts of memory depending on the actual requirements at run time.[4]
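Several of the string terms above (concatenation, prefix, suffix, substring, reversal, and rotation) map onto short Python expressions. In this sketch the helper names is_rotation and prefixes and the sample words are hypothetical; the built-in methods startswith and endswith are standard Python.

```python
def is_rotation(s, t):
    """Return True if s = uv and t = vu for some strings u and v."""
    return len(s) == len(t) and s in t + t


def prefixes(text):
    """Return every prefix of text, from the empty string to text itself."""
    return [text[:m] for m in range(len(text) + 1)]


s, t = "bear", "hug"
print(s + t)                          # concatenation ST
print("banana".startswith("ban"))     # "ban" is a prefix
print("banana".endswith("nana"))      # "nana" is a suffix
print("nan" in "banana")              # "nan" is a substring
print("banana"[::-1])                 # reversal
print(is_rotation("abcde", "cdeab"))  # rotation with u = "ab", v = "cde"
print(prefixes("abc")[:-1])           # proper prefixes exclude "abc" itself
```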
See Also
- Programming Fundamentals/Strings
- Python Programming/Strings
- Python Concepts/Strings
- Wikipedia: Comparison of programming languages (string functions)
- Programming Basics: Simple Introduction to Strings (activity)
- Wikipedia: Comparison of programming languages (strings)
- Wikipedia: ASCII
- Wikipedia: String literal
- Stack Abuse: Run-Length Encoding
- YouTube: Run-length encoding explained
- Mark's Head: State Machines
- YouTube: Data Compression: Run Length Encoding (RLE)
References
- ↑ Microsoft: Exam 98-381 Introduction to Programming Using Python
- ↑ https://en.wikipedia.org/wiki/Run-length_encoding
- ↑ https://en.wikipedia.org/wiki/Escape_sequence
- ↑ 4.00 4.01 4.02 4.03 4.04 4.05 4.06 4.07 4.08 4.09 4.10 4.11 4.12 4.13 4.14 4.15 4.16 Wikipedia: String (computer science)
- ↑ 5.0 5.1 5.2 5.3 Wikipedia: Character (computing)
- ↑ Wikipedia: Control character
- ↑ 7.0 7.1 7.2 Wikipedia: ASCII
- ↑ Wikipedia: Extended ASCII
- ↑ Wikipedia: Unicode
- ↑ 10.0 10.1 10.2 10.3 10.4 10.5 10.6 Wikipedia: String literal
- ↑ How to Think Like a Computer Scientist: Interactive Edition
- ↑ Wikipedia: Immutable object
- ↑ Wikipedia: Escape Sequences in C
- ↑ https://en.wikipedia.org/wiki/Run-length_encoding
Lesson 6 - Files
This lesson introduces files and file processing.
Objectives and Skills
Objectives and skills for this lesson include:[1]
- Construct and analyze code segments that perform file input and output operations
- Open; close; read; write; append; check existence; delete; with statement
- Construct and analyze code segments that perform console input and output operations
- Read input from console; print formatted text; use of command line arguments
Readings
- Wikipedia: Computer file
- Wikipedia: File system
- Wikipedia: Directory (computing)
- Wikipedia: Directory structure
- Wikipedia: Text file
- Wikipedia: Binary file
Multimedia
- YouTube: File Systems
- YouTube: Reading and Writing to Files in Python
- YouTube: Reading and Writing to Files in Java
- YouTube: Writing Binary Files
Examples
Activities
- Review Wikipedia: Password strength. Create a program that asks the user for an input password. If your programming language or library supports it, get the input without echoing the characters as they are entered. Determine the entropy of the input password based on length of password and the number of different character sets used in the password (e.g. Entropy/Strength Test). Use a separate function to determine password entropy. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
- Review Wikipedia: Dictionary attack. Enhance the program above by downloading a dictionary of English words as a text file (e.g., GitHub). Use a separate function to check the password and see if it matches one of the dictionary words. Inform the user if their password is susceptible to a dictionary attack. Use exception handling for all file operations. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language.
- Enhance the program above by downloading a text file of common passwords (e.g., GitHub). Use the function above to check the password and see if it matches a common password. Inform the user if their password is susceptible to a common password attack. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language.
- Enhance the program above by saving all passwords entered by the user in a text file. Use the function above to check the password and see if it matches a previously entered password. Validate parameters and update program and function documentation, consistent with the documentation standards for your selected programming language. The final program will check password strength and validate passwords against an English dictionary, a common password list, and a recently-used password list.
Lesson Summary
- Python allows you to open files in four different modes: read ("r"), write ("w"), append ("a"), and create ("x").[2]
- A computer's file system is a division of information and data into files and directories (folders).[3]
- A file system consists of two or three layers. Sometimes the layers are explicitly separated, and sometimes the functions are combined.[3]
- The logical file system is the first layer and is responsible for interaction with the user application.[3]
- The second optional layer is the virtual file system which allows support for multiple concurrent instances of physical file systems.[3]
- The third layer is the physical file system. This layer is concerned with the physical operation of the storage device and handles tasks such as buffering and memory management.[3]
- Windows makes use of the FAT, NTFS, exFAT, Live File System, and ReFS file systems, while macOS uses the HFS Plus file system, also known as "Mac OS Extended".[3]
- File systems include utilities to initialize, alter parameters of and remove an instance of the file system.[3]
- Directory utilities may be used to create, rename and delete directory entries.[3]
- File utilities create, list, copy, move and delete files, and alter metadata. They may be able to truncate data, truncate or extend space allocation, append to, move, and modify files in-place.[3]
- Some of the most important features of file system utilities involve supervisory activities which may involve bypassing ownership or direct access to the underlying device.[3]
- A directory-based file system is one where directories coordinate the organization and retrieval of information.[4]
- Although rare, some embedded computers have no directories (everything is a file) or no ability to store directories inside other directories (thereby flattening the computer's storage).[4]
- If one is referring to a container of documents, the term folder is more appropriate. The term directory refers to the way a structured list of document files and folders is stored on the computer.[4]
- Current operating systems typically allow for long filenames, more than 250 characters per pathname element.[5]
- A fully qualified filename is a textual string that includes the path of the file, as well as its unique identifier and extension (e.g., C:\Users\cklei\Desktop\hello_world.py).[5]
- UNIX-like operating systems use the Filesystem Hierarchy Standard. All files and directories appear under the root directory "/", even if they are stored on different physical devices.[5]
- On Windows, files are arranged in a hierarchical forest of trees, where each tree root is a "drive letter" labeling the storage space, such as C in the path C:\Program Files.[3]
- Text files are employed for human-readable storage of information, notable for their simplicity.[6]
- On Windows, a newline is signaled by the carriage return and line feed characters in unison (CRLF); on UNIX-like systems, including macOS devices, the newline is simply communicated by the line feed character.[6]
- Binary files are usually thought of as a sequence of bytes, where each byte is a group of eight bits.[7]
- They often represent things other than characters (otherwise, you'd likely opt for a text file).[7]
- Binary files will sometimes be handled by mechanisms that can only deal with textual data; Base64 is an encoding scheme that makes such a translation possible.[7]
- When two computers (or systems) can run the same executable, they are said to be 'binary compatible'.
- Some software companies produce applications for Windows and the Macintosh that are binary compatible, which means that a file produced in a Windows environment is interchangeable with a file produced on a Macintosh. [7]
- This also makes it possible to run programs built for deprecated versions of Windows on newer systems.[7]
- A hex editor is specially designed to view binary files as chunks of hexadecimal (or decimal or binary) values. If a binary file is opened in a text editor, meanwhile, each group of eight bits is usually translated into a single character.[7]
- When the 'with' keyword is used together with the built-in open() function in Python, the file doesn't need to be closed explicitly. It is closed automatically when the 'with' statement finishes executing.[8]
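The file modes and the 'with' statement summarized above can be tried out with a minimal sketch such as the following. The file name notes.txt and the temporary directory are assumptions made only for illustration; they are not part of the lesson.

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory, used only for this sketch.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")

    # "x" creates a new file and raises FileExistsError if it already exists.
    with open(path, "x") as file:
        file.write("first line\n")

    # "a" appends to the end of the existing file.
    with open(path, "a") as file:
        file.write("second line\n")

    # "r" reads; the file is closed automatically when the with block ends.
    with open(path, "r") as file:
        lines = file.read().splitlines()
    closed_after_with = file.closed

    # "w" truncates the file before writing.
    with open(path, "w") as file:
        file.write("replaced\n")
    with open(path, "r") as file:
        replaced = file.read()

print(lines)
print(closed_after_with)
print(replaced)
```

Because every open() call sits inside a 'with' statement, no explicit close() call is needed, even if an exception is raised inside the block.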
Key Terms
- absolute path
- A full path or location of a file, including the root directory, that points to the same location in a file system regardless of current working directory.[9]
- binary file
- A file that is non-textual—a raw sequence of bytes.[7]
- directory structure
- Files are stored in a hierarchical tree structure as the childless leaves of directories (or folders).[5]
- directory
- A structure that contains references to other computer files and possibly other subdirectories.[4]
- file system
- The way in which data is stored and retrieved on the machine.[3]
- file utilities
- File utilities allow users to create, list, copy, move and delete files, and alter metadata.[3]
- fully-qualified filename
- A string that uniquely identifies a file stored on the computer by including the path, name, and extension of the file.[5]
- metadata
- Information that is typically associated with each file within a file system. File systems might store the file creation time, the time it was last accessed, etc.[3]
- parent & children
- A parent directory houses "children" files or subdirectories. A child is a file or subdirectory housed in a parent directory.[4]
- path
- The location of a file among the hierarchy of directories (possibly indicating a storage device, as well).[4]
- relative path
- A path or location of a file that starts from the current working directory.
- root
- The highest-level directory in the file system hierarchy, found in UNIX-like operating systems.[10]
- text file
- A file that is structured as a sequence of lines of electronic text and is stored as data within a computer file system.[6]
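The path terminology above (absolute path, relative path, root, parent) can be explored with Python's pathlib module. This is only a sketch: the directory and file names are hypothetical, and PurePosixPath is used so that the UNIX-style results are the same on any operating system.

```python
from pathlib import PurePosixPath

# Hypothetical paths, written in UNIX style for illustration only.
working_directory = PurePosixPath("/home/student/project")
relative = PurePosixPath("data/customers.csv")

# A relative path is interpreted starting from the current working directory.
absolute = working_directory / relative

print(relative.is_absolute())    # False
print(absolute.is_absolute())    # True
print(absolute)                  # /home/student/project/data/customers.csv
print(absolute.root)             # /
print(absolute.parent)           # /home/student/project/data
print(absolute.name, absolute.suffix)
```

In a real program, pathlib.Path (rather than PurePosixPath) follows the conventions of the operating system the program is running on.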
See Also
- Computer Programming/Strings
- Wikipedia: Comparison of programming languages (string functions)
- Wikipedia: Exception handling syntax
- UFS Explorer: Understanding File Systems
- YouTube: Windows File Structures and Paths
- Python Documentation: Reading and Writing Files
References
- ↑ Microsoft: Exam 98-381 Introduction to Programming Using Python
- ↑ https://www.w3schools.com/python/python_file_handling.asp
- ↑ 3.00 3.01 3.02 3.03 3.04 3.05 3.06 3.07 3.08 3.09 3.10 3.11 3.12 3.13 Wikipedia: File system
- ↑ 4.0 4.1 4.2 4.3 4.4 4.5 Wikipedia: Directory (computing)
- ↑ 5.0 5.1 5.2 5.3 5.4 Wikipedia: Directory structure
- ↑ 6.0 6.1 6.2 Wikipedia: Text file
- ↑ 7.0 7.1 7.2 7.3 7.4 7.5 7.6 Wikipedia: Binary file
- ↑ 8 PEP 343: The 'with' statement
- ↑ Wikipedia: Path (computing)
- ↑ Wikipedia: Root directory
Lesson 7 - Lists and Tuples
This lesson introduces lists and tuples.
Objectives and Skills
Objectives and skills for this lesson include:
Readings
- Wikipedia: Data structure
- Wikipedia: Array data structure
- Wikipedia: List (computer science)
- Wikipedia: Record (computer science)
- Python for Everyone: Tuples
Multimedia
- YouTube: Python Tutorial for Beginners: Lists, Tuples, and Sets
- YouTube: Multi-Dimensional Lists
- YouTube: Introduction to Arrays
- YouTube: Python - Looping Through Two Dimensional Lists
- YouTube: Introduction to Linked List
- YouTube: Data Structures: Linked Lists
- YouTube: Difference Between List, Tuple, Set, and Dictionary in Python
- YouTube: Tuples in Python: Create, Access, Concat, and More
- YouTube: Python Tutorial: Sorting Lists, Tuples, and Objects
Examples
Activities
- Download Northwind Customers. Write a program to read the file and create a customers list, where each element in the list is a customer list or tuple.
- Provide an interface for the program above that allows the user to:
- Display company name, contact name, and phone number for all customers sorted by company name.
- Display contact name, company name, and phone number for all customers sorted by contact name.
- Search for a given company name or part of a name and display matching records with fields labeled.
- Search for a given contact name or part of a name and display matching records with fields labeled.
- For each of the above, use separate functions for each type of processing. Reuse functions where possible, such as in sorting and searching. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
Lesson Summary
- A data structure is a means to organize and store data for later retrieval and modification. An array, for example, is typically implemented as a contiguous, homogeneous container of elements, either dynamic or fixed in size.[1]
- Data structures include:[1]
- Arrays or associative arrays
- Lists or linked lists
- Records
- Objects
- An array holds elements that can be randomly accessed through the use of an index—without the need for sequential traversal.[2]
- Typically, this index is zero-based, since the index is used to calculate an offset from the base address.[2]
- A one-dimensional array is a type of linear array where access is limited to just one subscript which can either represent a row or column index.[2]
- For a multidimensional array, the element with indices i, j would have address B + c · i + d · j, where the coefficients c and d are the row and column address increments, respectively.[2]
- Arrays are used to implement mathematical vectors and matrices, as well as other kinds of rectangular tables. Many databases, small and large, consist of (or include) one-dimensional arrays whose elements are records.[2]
- Arrays can be used to implement other data structures, such as lists, heaps, hash tables, deques, queues, stacks, strings, and VLists.[2]
- Data structures provide a means to manage large amounts of data efficiently for uses such as large databases and internet indexing services. Usually, efficient data structures are key to designing efficient algorithms.[1]
- A list can often be constructed by writing the items in sequence, separated by commas, semicolons, and/or spaces, within a pair of delimiters such as parentheses, brackets, braces, or angle brackets. Lists are then able to expand and shrink. They are stored dynamically in memory.[3]
- A list is an abstract data type—a mathematical model for implementing a concrete representation—that holds a finite number of items or elements.[3]
- Often, this is realized through the use of a linked list data structure, where each node contains not just data but also a link to the next (and sometimes previous) node.[3]
- Lists can hold data, but they can also contain any number of sublists, which are capable of having their own sublists further down the chain.[3]
- A record is a data structure that collects together different variables, usually of different types.[4]
- A record may sometimes have a key (or several). This key is a field or set of fields in the record that serves as the record's identifier. A unique key is often called the primary key, or simply the record key.[4]
- If you were storing employee records, you could use the unique company ID field to look-up employees without knowing any other information (remember that even names may conflict).[4]
- Modern computer languages usually allow the programmer to define their own record type.[4]
- When you define a record type, you specify the data type of each field and the associated identifier through which it can be accessed.[4]
- Records may exist in any medium, but they are often written to persistent storage devices such as magnetic tapes or hard disks. Records are fundamental components of many data structures, especially linked data structures.[4]
- Record operations include but are not limited to:
- Declaration of a new record, including the relative position in the record, type, and name of each field.
- Declaration and construction of a variable as belonging to a given record type.[4]
- Selection of a record field by using an explicit name.[4]
- Assignment of a record to another.[4]
- Comparison of two records for equality.[4]
- You could even view the parameters of a function as record fields which receive values (the arguments).[4]
- In Python, tuples are much like lists; the important distinction is that tuples are immutable, whereas lists are mutable.[5]
- The values stored in Python tuples can be of any type; elements are indexed by integers.[5]
- Tuples and lists are ordered collections of objects: the order in which the elements are defined is an intrinsic property of the data structure and does not change unless explicitly altered.[6]
- Python provides various techniques for sorting data.
- The built-in sorted() function accepts any iterable, including lists and tuples, and returns a new list sorted in ascending order by default.[7]
- The list.sort() method sorts a list in place in ascending order and returns None. Tuples are immutable and have no sort() method.[8]
- sort() takes two optional keyword parameters: key and reverse. The key is a function that computes the value used to compare each element. The reverse parameter is a Boolean that defaults to False; set it to True to sort in descending order.[9]
- Both list.sort() and sorted() allow reverse sorting by accepting a reverse parameter with a boolean value.[10]
- csv is a Python standard library module that parses delimited text based on a selected character, usually ",". This is extremely helpful for reading "Northwind Customers." For more info on how to use csv: Reading CSV Files in Python
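The sorting techniques and the csv module described above can be combined in a short sketch. The sample rows and column names below are hypothetical stand-ins for a customers file, and io.StringIO is used only so that the example runs without a file on disk.

```python
import csv
import io

# Hypothetical sample data standing in for a customers file.
SAMPLE = """CompanyName,ContactName,Phone
Cedar Inc,Max Moore,555-0102
Acme Ltd,Zed Young,555-0100
Birch Co,Amy Adams,555-0101
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)                        # skip the header row
customers = [tuple(row) for row in reader]   # each record is a tuple

# sorted() returns a new list; the original list is left unchanged.
by_company = sorted(customers, key=lambda record: record[0])

# list.sort() sorts in place and returns None; reverse=True gives descending order.
by_contact_desc = list(customers)
result = by_contact_desc.sort(key=lambda record: record[1], reverse=True)

# Tuples are immutable, so assigning to an element raises TypeError.
try:
    customers[0][0] = "New Name"
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

for company, contact, phone in by_company:
    print(company, contact, phone)
```

When reading a real file, the csv documentation recommends opening it with newline="" before passing it to csv.reader().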
Key Terms
- array
- A container of elements stored in contiguous space, typically of the same data type.[1]
- class
- A data structure that contains members, like a record, as well as various methods which operate on these members.[1]
- data structure
- A collection of related values, often organized in lists, dictionaries, tuples, etc.[11]
- field
- A variable which is one of multiple parts of a record.[4]
- hash table (hash map)
- A data structure that implements an associative array abstract data type, a structure that can map keys to values.[12]
- heap
- A specialized tree-based data structure that satisfies the heap property: if P is a parent node of C, then the key (the value) of P is either greater than or equal to (in a max heap) or less than or equal to (in a min heap) the key of C.[13]
- linked list
- A collection of elements called nodes, where each node has a value and points to the next node in the list (and sometimes the previous).[1]
- list
- A list is a data structure that is a mutable, or changeable, ordered sequence of elements. Each element or value that is inside of a list is called an item. Lists are defined by having values between square brackets.
- member
- A single datum of a record; for example, the 'Name' field of a 'Person' record.[4]
- record
- A structure used to collect multiple variables, often of different types stored as fields.[4]
- struct
- Another word for a record.[4]
- tagged union
- A union that contains one additional field indicating the current type for enhanced type safety.[1]
- tuple
- An immutable sequence of objects. Unlike lists, tuples cannot be changed after they are created. In Python, tuples are written with parentheses, whereas lists use square brackets.
- union
- A data structure that specifies which of a number of permitted primitive types may be stored in its instances. Unlike a struct or record, a union holds only one of those values at a time.[1]
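The record, field, and key terms above can be illustrated in Python with typing.NamedTuple, which is one of several ways to define a record type. The Employee type and its values are hypothetical and exist only for this sketch.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """A hypothetical record type with three typed fields."""
    employee_id: int   # serves as the record's unique (primary) key
    name: str
    salary: float


# Construction of variables belonging to the record type.
first = Employee(employee_id=1001, name="Pat Lee", salary=52000.0)
second = Employee(1001, "Pat Lee", 52000.0)

selected_name = first.name        # selection of a field by its explicit name
records_equal = first == second   # comparison of two records for equality
copy = first                      # assignment of one record to another

# Look up a record by its key without knowing any other field.
by_id = {record.employee_id: record for record in (first,)}
print(by_id[1001].salary)
```

A NamedTuple is also a tuple, so its fields cannot be reassigned after construction; a dataclass would be a reasonable alternative when mutable fields are needed.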
See Also
- Data structures
- Python Programming/Lists
- YouTube: How To Implement Linked Lists With Test Driven Development In JavaScript
- JavaScript - Parse CSV data into an array
- Implementing Bubble Sort in JavaScript
References
- ↑ 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 Wikipedia: Data structure
- ↑ 2.0 2.1 2.2 2.3 2.4 2.5 Wikipedia: Array data structure
- ↑ 3.0 3.1 3.2 3.3 Wikipedia: List (abstract data type)
- ↑ 4.00 4.01 4.02 4.03 4.04 4.05 4.06 4.07 4.08 4.09 4.10 4.11 4.12 4.13 4.14 Wikipedia: Record (computer science)
- ↑ 5.0 5.1 Python: Tuples
- ↑ https://realpython.com/python-lists-tuples/
- ↑ Python: Sorting
- ↑ Python: Sorting
- ↑ https://www.w3schools.com/python/ref_list_sort.asp
- ↑ Python: Sorting
- ↑ "Chapter 10 | Python For Everyone | Trinket". books.trinket.io. Retrieved 2018-07-10.
- ↑ Wikipedia: Hash table
- ↑ Wikipedia: Heap (data structure)
Lesson 8 - Dictionaries and Sets
This lesson introduces dictionaries and sets.
Objectives and Skills
Objectives and skills for this lesson include:
Readings
- Wikipedia: Data structure
- Wikipedia: Dictionary (data structure)
- Wikipedia: Hash table
- Wikipedia: Key-value database
- Wikipedia: Set (abstract data type)
- Python for Everyone: Dictionaries
Multimedia
- YouTube: Python Dictionary
- YouTube: Sort by Lambda
- YouTube: Python Tutorial - 24. Sets and Frozen Sets
- YouTube: Python: Combining Dictionaries with Lists
- YouTube: Python Programming Tutorial- 15 Using Dictionaries with For Loops
- YouTube: How to Sort a List of Dictionaries
- YouTube: Reading CSV File Into a Python Dictionary
Examples
Activities
- Download GitHub: Northwind Customers. Write a program to read the file and create a customers list, where each element in the list is a dictionary of information for a given customer.
- Provide an interface for the program above that allows the user to:
- Display company name, contact name, and phone number for all customers sorted by company name.
- Display contact name, company name, and phone number for all customers sorted by contact name.
- Search for a given company name or part of a name and display matching records with fields labeled.
- Search for a given contact name or part of a name and display matching records with fields labeled.
- For each of the above, use separate functions for each type of processing. Reuse functions where possible, such as in sorting and searching. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
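As a starting point for the activities above, the following sketch shows one possible way to read CSV text into a list of dictionaries and search it by part of a name. The sample rows, the column names, and the helper functions read_customers() and search() are all hypothetical; the actual Northwind file may use different headers.

```python
import csv
import io

# Hypothetical sample data; the real file's column names may differ.
SAMPLE = """CompanyName,ContactName,Phone
Cedar Inc,Max Moore,555-0102
Acme Ltd,Zed Young,555-0100
"""


def read_customers(text_file):
    """Return a list of dictionaries, one per row, keyed by the header names."""
    return [dict(row) for row in csv.DictReader(text_file)]


def search(customers, field, text):
    """Return the customers whose field contains text, ignoring case."""
    text = text.lower()
    return [customer for customer in customers if text in customer[field].lower()]


customers = read_customers(io.StringIO(SAMPLE))
matches = search(customers, "CompanyName", "acme")

for customer in matches:
    for field, value in customer.items():
        print(f"{field}: {value}")
```

Passing the customers list and search text as parameters, and returning results, keeps the functions reusable and avoids global variables.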
Lesson Summary
- A data structure is a particular way of organizing and storing data in a computer so that it can be accessed and modified efficiently.[1]
- An associative array, map, or dictionary is an abstract data type that collects key-value pairs, where each key only appears a single time in the container.[2]
- This is often implemented using a hash table—an array with a hash function that maps keys to indices, avoiding collisions when possible.[2]
- Unlike lists or tuples, these data structures are collections of unique key-value pairs that are not accessed by position; in general, an associative array does not guarantee any particular element order.[3]
- Dictionaries differ from lists: a list is a sequence of objects accessed by position, whereas a dictionary can hold any object and uses keys to access its values. Finding a value in a list requires looping and comparing, while a dictionary lookup needs only the key, such as an integer or a string.
- The operations that are usually defined for an associative array are:[2]
- Add or insert: add a new key, value pair to the collection, binding the new key to its new value. The arguments to this operation are the key and the value.[2]
- Reassign: replace the value in one of the key, value pairs that are already in the collection, binding an old key to a new value. As with an insertion, the arguments to this operation are the key and the value.[2]
- Remove or delete: remove a key, value pair from the collection, unbinding a given key from its value. The argument to this operation is the key.[2]
- Lookup: find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some associative array implementations raise an exception.[2]
- In addition, associative arrays may also include other operations such as determining the number of bindings or constructing an iterator to loop over all the bindings.[2]
- Dictionaries are typically implemented in programming languages as either a hash table or a search tree. [4]
- Python dictionaries are collections that are indexed by keys, which can be any immutable type.
- Dictionaries may be nested and are accessed using the syntax dictionary[key][subkey]
- Dictionary items may be accessed with a for loop.
- Python dictionaries preserve insertion order as of Python 3.7; earlier versions did not guarantee any order. Dictionary items may be displayed in key order by sorting a list of keys using the sorted() function and the dictionary.keys() method.
- Serialization, which produces a text or binary representation of the original objects that can be written directly to a file, offers a solution to use associative arrays in permanent form.[2]
- Many programming languages provide hash table functionality, either as built-in associative arrays or as standard library modules.[5]
- When storing a value into a hash table, its key is manipulated to produce a valid array index (non-negative integer). As long as there is no element occupying the location, the value is then placed into this 'bucket'.[5]
- When retrieving the value using the associated key, this process is reversed. This, of course, becomes tricky when collisions have to be resolved.[5]
- The idea of hashing is to distribute the entries (key/value pairs) across an array of buckets. Given a key, the algorithm computes an index that suggests where the entry can be found.[5]
- The main advantage of hash tables over other table data structures is speed. This advantage is more apparent when the number of entries is large.[5]
- Hash tables are particularly efficient when the maximum number of entries can be predicted in advance, so that the bucket array can be allocated once with the optimum size and never resized.[5]
- Hash collisions are practically unavoidable when hashing a random subset of a large set of possible keys. Therefore, almost all hash table implementations have some collision resolution strategy to handle such events, such as separate chaining or open addressing.[5]
- A critical statistic for a hash table is the load factor, defined as: Load Factor = n/k[5]
- Here n is the number of entries occupied in the hash table and k is the number of buckets. As the load factor grows larger, the hash table becomes slower.[5]
- If one considers every structure yielded by packaging and/or indexing, there are four basic data structures:[6]
- A set is a data structure that stores values without any particular order and without repeated values.[6]
- Unlike other containers, rather than retrieving a specific element, one typically tests a value for membership in the set.[6]
- Depending on the language, sets may be frozen, meaning they do not change after they are constructed.[6]
- Static sets allow only query operations on their elements — such as checking whether a given value is in the set or enumerating the values in some arbitrary order.[6]
- Dynamic or mutable sets allow the insertion and deletion of elements from the set.[6]
- Key-value databases can use consistency models ranging from eventual consistency to serializability. Some support ordering of keys. Some maintain data in memory (RAM), while others employ solid-state drives or rotating disks.[7]
- Because optional values are not represented by placeholders as in most relational databases, key-value databases often use far less memory to store the same database.[7]
- Key-value systems treat the data as a single opaque collection, which may have different fields for every record. This offers considerable flexibility and more closely follows modern concepts like object-oriented programming.[7]
- There are multiple ways to create a dictionary in Python.
- The dict() constructor builds dictionaries directly from sequences of key-value pairs.[8]
- Example: dimensions = dict([('width', 12), ('length', 10), ('depth', 3)])
- Dictionary comprehensions can be used to create dictionaries from arbitrary key and value expressions.[8]
- Example: dimensions = {key: value for key, value in [('width', 12), ('length', 10), ('depth', 3)]}
- When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments.[8]
- Example: dimensions = dict(width=12, length=10, depth=3)
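The associative array operations, nested access, key-ordered display, and set behavior summarized above can be sketched in Python as follows. The dictionary contents and variable names are hypothetical and serve only as an illustration.

```python
# A hypothetical dictionary used only for illustration.
dimensions = {"width": 12, "length": 10}

dimensions["depth"] = 3             # add: bind a new key to a value
dimensions["width"] = 15            # reassign: bind an existing key to a new value
del dimensions["length"]            # remove: unbind a key from its value
depth = dimensions["depth"]         # lookup: raises KeyError if the key is missing
height = dimensions.get("height")   # lookup that returns None instead of raising

# Display items in key order by sorting the keys.
ordered = [(key, dimensions[key]) for key in sorted(dimensions.keys())]

# Nested dictionaries are accessed as dictionary[key][subkey].
boxes = {"small": {"width": 2}, "large": {"width": 20}}
large_width = boxes["large"]["width"]

# Sets hold unique values and are typically used for membership tests.
colors = {"red", "green", "red"}
frozen = frozenset(colors)          # a frozen set cannot be changed after creation

for key, value in ordered:
    print(key, value)
print("green" in frozen)
```

The get() method is one way to handle missing keys; catching KeyError or testing with the in operator before the lookup are alternatives.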
Key Terms
- associative array, dictionary, hash, map
- Alternative names for a data structure that contains key-value pairs, where each key only appears a single time in the container.[2]
- coalesced hashing
- A hybrid of chaining and open addressing, coalesced hashing links together chains of nodes within the table itself.[5]
- frozen set
- Also known as static sets, these sets do not change after they are constructed. Static sets allow only query operations on their elements.[6]
- hash collisions
- Occur when the hash function maps two different keys to the same index. This has a high chance of occurring when hashing a random subset of a large set of possible keys.[5]
- hash table (hash map)
- A data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.[5]
- key
- A datum used to access the value placed in the corresponding dictionary bucket. If the ID number of an employee were a unique field, entering this as the 'key' would pull up the rest of the employee's information.[5]
- multiset
- Similar to a plain set, but one that allows repeated values.[6]
- merge algorithms
- A family of algorithms that take multiple sorted lists as input and produce a single list as output, containing all the elements of the input lists in sorted order. These algorithms are used as subroutines in various sorting algorithms, most famously merge sort.[9]
- multimap
- A generalization of an associative array that allows multiple values to be associated with a single key.[2]
- serialization
- Serialization produces a text or binary representation of the original objects that can be written directly to a file and offers a solution to use associative arrays in permanent form.[2]
- set
- A data structure that stores values without any particular order and without repeated values.[6]
- value
- The data that is retrieved using the key. It doesn't have to be a single object; it could be another container or structure itself.[5]
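The hash table, hash collision, key, and value terms above can be made concrete with a toy implementation. This is only a sketch: the ChainedHashTable class is hypothetical, it resolves collisions by separate chaining (one of several possible strategies), and it is not how Python's built-in dict is implemented. The sample keys rely on the fact that, in CPython, small non-negative integers hash to themselves.

```python
class ChainedHashTable:
    """A toy hash table that resolves collisions by separate chaining."""

    def __init__(self, bucket_count=4):
        self.buckets = [[] for _ in range(bucket_count)]
        self.count = 0

    def _index(self, key):
        # Reduce the key's hash to a valid, non-negative bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for position, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[position] = (key, value)   # reassign an existing key
                return
        bucket.append((key, value))               # insert a new key
        self.count += 1

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)

    def load_factor(self):
        # Load factor = n / k (entries divided by buckets).
        return self.count / len(self.buckets)


table = ChainedHashTable(bucket_count=4)
# Keys 1, 5, and 9 all map to bucket 1 (key % 4 == 1), so they collide.
for employee_id, name in [(1, "Ana"), (5, "Ben"), (9, "Cy")]:
    table.put(employee_id, name)

print(table.get(5))
print(table.load_factor())
```

Every colliding entry lands in the same bucket's list, so a lookup must scan that list; this is why a growing load factor slows the table down.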
See Also
- Data structures
- Python Programming/Dictionaries
- Tutorials Point: Python Dictionaries
- Tutorials Point: Java Dictionaries
- SlideServe: Lists and dictionaries (data structure)
- YouTube: Implementing Dictionaries using the Map Class in Java
- Interactive Java Tutorial
- Kite: Read Rows of CSV File
- Coursera: Using Tuples and Data Dictionaries
References
- ↑ Wikipedia: Data structure
- ↑ 2.00 2.01 2.02 2.03 2.04 2.05 2.06 2.07 2.08 2.09 2.10 2.11 Wikipedia: Associative array
- ↑ https://python-reference.readthedocs.io/en/latest/docs/dict/
- ↑ https://en.wikiversity.org/wiki/Python_Programming/Dictionaries#cite_note-20
- ↑ 5.00 5.01 5.02 5.03 5.04 5.05 5.06 5.07 5.08 5.09 5.10 5.11 5.12 5.13 5.14 5.15 5.16 5.17 Wikipedia: Hash table
- ↑ 6.00 6.01 6.02 6.03 6.04 6.05 6.06 6.07 6.08 6.09 6.10 6.11 6.12 Wikipedia: Set (abstract data type)
- ↑ 7.0 7.1 7.2 Wikipedia: Key-value database
- ↑ 8.0 8.1 8.2 Python: Data structures
- ↑ Wikipedia: Merge algorithm
Lesson 9 - RegEx
This lesson introduces regular expressions.
Objectives and Skills
Objectives and skills for this lesson include:
- Understand regular expression use and syntax
- Read directories and files
- Apply regular expression processing to large data files
Readings
- Wikipedia: Regular expression
- Wikipedia: Metacharacter
- Wikipedia: Kleene star
- Wikipedia: Greedy algorithm
- Python for Everyone: Regular expressions
Multimedia
- YouTube: Regular Expressions
- YouTube: Matching Patterns
- YouTube: Matching Patterns part 2
- YouTube: Regex Pattern in Java
- YouTube: Regular Expressions (RegEx) Tutorial Playlist
- YouTube: Learn Regular Expressions: Groups
Examples
Activities
- Complete one or more of the following tutorials:
- Review Wikitech: Analytics/Archive/Data/Pagecounts-all-sites. Use Wikimedia Dumps to download two files of hourly log data for all Wikimedia projects. The filename of each log file is in the format pageviews-yyyymmdd-hh0000. The format for each log file is: domain_code page_title count_views total_response_size. Wikiversity's domain_code is en.v. Sample data files are available at Sample Data 1 and Sample Data 2. Write a program that reads all pageview log files in the current directory and uses RegEx groups to parse the data. For en.v records only, create a dictionary using page_title as the key and the sum of count_views as the value.
- After reading all files and summing count_views, display the top 100 pages and corresponding count_views sorted in descending order by count_views, and alphabetically in the case of a tie.
- The format for a page_title is Title[/Subpage...]. The title without subpages may be considered the overall learning project. Iterate over the dictionary and use RegEx to separate titles from subpages. Create a separate dictionary with a key for each learning project and the sum of the page and its subpage count_views as the value. Display the top 100 learning projects and corresponding count_views sorted in descending order by count_views, and alphabetically in the case of a tie.
- For each of the above, use separate functions for each type of processing. Reuse functions where possible, such as in sorting and searching. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
Lesson Summary
- Regular expressions, also known as regexes, form a small language of their own for expressing patterns that match substrings and text.[1]
- The phrase regular expressions is often used to mean the specific, standard textual syntax for representing patterns to which matching text must conform. Each character in a regular expression is either a metacharacter, with a special meaning, or a regular character, with its literal meaning.[1]
- Regular expressions are used in search engines, search and replace dialogs of word processors and text editors, in text processing utilities such as sed and AWK and in lexical analysis.[1]
- Other applications include data validation, data scraping (especially web scraping), data wrangling, simple parsing, the production of syntax highlighting systems, and many other tasks (see Regular-Expressions.info).
- While regexes would be useful on Internet search engines, processing them across the entire database could consume excessive computer resources depending on the complexity and design of the regex.[1]
- Metacharacters and literal characters can be used to identify textual material of a given pattern, or process a number of instances of it. Pattern-matches can vary from a precise equality to a very general similarity (controlled by the metacharacters).[1]
- Capturing groups (...) are useful to capture a sub-expression (or full expression) of a regular expression match as a numbered group which can then be back-referenced by that number.[1]
- The caret '^' and the dollar sign '$' are examples of regex metacharacters, both referred to as anchors. The caret matches from the beginning of a line or string, while the dollar sign represents the end of a line or string.[1]
- The language is separated into metacharacters that are imbued with a special meaning by the engine and literal characters that are to be taken as is.[2]
- If you want to literally use a symbol that is a metacharacter, you must escape it first, usually with a backslash ('\'). [2]
- Depending on the regex processor, there are about 14 metacharacters. To use any of these characters as a literal in a regex, escape it with a backslash so that it drops its special meaning and is treated literally inside an expression:[2]
- the open/close square brackets, "[" and "]"
- the backslash "\"
- the caret "^"
- the dollar sign "$"
- the period or dot "."
- the vertical bar or pipe symbol "|"
- the question mark "?"
- the asterisk "*"
- the plus-sign "+"
- open/close curly braces, "{" and "}"
- open/close parentheses, "(" and ")".
- In many regular expression engines, the . (dot) character matches any character, not just a dot.[2]
- In many programming languages, strings are delimited using quotes. Using an escape character is one method to avoid delimiter collision. Example : "He said : \"Hello\"".[2]
- In mathematical logic and computer science, the Kleene star (or Kleene operator), which is widely used for regular expressions, is a unary operation, either on sets of strings or on sets of symbols or characters.[3]
- The Kleene star (*) is a quantifier used for matching zero or more repetitions of the preceding element, whereas the Kleene plus (+) is used for matching one or more repetitions.[3]
- These operations are named after the mathematician Stephen Cole Kleene who formulated the basic conceptual theory behind regex.[1]
- A greedy algorithm is one that chooses the "local optimum" at every possible stage in the solving of a problem, hoping to find a holistic solution that is maximally optimized or at least passable.[4]
- In the context of regular expressions, a greedy match is one that matches as many characters as it can given the pattern. A lazy match is one that matches as few characters as it can while still satisfying the pattern.[4]
- A greedy algorithm could fail to produce the optimal solution, and may even produce the unique worst possible solution.[4]
- The matching pursuit is an example of greedy algorithm applied on signal approximation.[4]
- In Python, a dictionary has no sort method of its own, so to process its contents in sorted order it is often useful to build a sorted list of tuples from the dictionary's items. Thomas Cokelaer's Blog contains the syntax for creating a sorted list of tuples easily.
- Usually, regex patterns are expressed in Python code using raw string notation, for example r"\n".[5]
- Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals.[5]
- The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'.[5]
- The difference between re.findall() and re.finditer():[6]
- re.findall(): returns all non-overlapping matches of pattern in string, as a list of strings (or a list of tuples if the pattern contains more than one group).
- re.finditer(): return an iterator yielding match objects over all non-overlapping matches for the RE pattern in string.
- The difference between group(), groups(), and groupdict():[7]
- A group() expression returns one or more subgroups of the match.
- A groups() expression returns a tuple containing all the subgroups of the match.
- A groupdict() expression returns a dictionary containing all the named subgroups of the match, keyed by the subgroup name.
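The points above on greedy and lazy matching, escaping, and the match-object methods can be illustrated with a minimal sketch using Python's standard re module. The sample text and the group names (year, month) are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .* consumes as much as possible, so one match spans every tag.
greedy = re.findall(r"<.*>", text)

# Lazy: .*? consumes as little as possible, so each tag matches separately.
lazy = re.findall(r"<.*?>", text)

# Escaping: \. matches a literal dot, while an unescaped . matches any character.
literal_dot = re.findall(r"\d\.\d", "1.5 and 2x5")

dates = "2019-03 and 2020-11"
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})"

# findall() returns a list; with two groups in the pattern, each item is a tuple.
as_list = re.findall(pattern, dates)

# finditer() yields match objects, which expose group(), groups(), and groupdict().
matches = list(re.finditer(pattern, dates))
first = matches[0]
whole = first.group()        # the entire match
year = first.group(1)        # the first numbered subgroup
parts = first.groups()       # tuple of all subgroups
named = first.groupdict()    # dictionary keyed by subgroup name

print(greedy, lazy, literal_dot, as_list, whole, year, parts, named, sep="\n")
```

Raw string notation is used for every pattern so that backslashes reach the regex engine unchanged.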
Key Terms
- escape character
- If you want to literally use a symbol that is a metacharacter, you must escape it first, usually with a backslash ('\'). Using an escape character is one method to avoid delimiter collision.[2]
- greedy algorithm
- An algorithmic paradigm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum.[4]
- Kleene star
- A quantifier used for matching zero or more occurrences of the preceding element. The Kleene plus (+) is used for matching one or more occurrences.[3]
- metacharacter
- A character that fulfills a special purpose or function and is no longer meant to be taken literally.[1]
- monoid
- An algebraic structure with a single associative binary operation and an identity element.[3]
- pattern
- A regular expression against which text is either matched or captured.[1]
- quantifier
- A modifier that follows another regex token, enumerating the number of times you expect it to appear.[1]
- regex processor
- A processor that translates a regular expression in the above syntax into an internal representation which can be executed and matched against a string representing the text being searched.[1]
- regular expression
- A sequence of characters that defines a search pattern. Usually, this pattern is used by string searching algorithms for "find" or "find and replace" operations on strings, or for input validation.[8]
- string-matching algorithms
- An important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text.[9]
- wildcard
- The dot (.) metacharacter is a wildcard, a generic character that stands in for anything.[1]
See Also
- Regular expressions
- RegexOne: Learn Regular Expressions
- Regex 101: Online Regex Tester and Debugger
- TutorialsPoint: Python Regular Expression
- RegExLib: Regex Cheat Sheet (common expressions)
- RegExr: RegEx Interactive Website
- Guru99: Python Regular Expressions Complete Tutorial
- Python.org: Regular Expression HOWTO
- Python.org: Regular Expressions Documentation
- Regular Expressions
- YouTube: 2.6: Regular Expressions: test() and match() - Programming with Text
References
1. Wikipedia: Regular expressions
2. Wikipedia: Metacharacter
3. Wikipedia: Kleene star
4. Wikipedia: Greedy algorithm
5. Python: Regular expression operations
6. Python.org: Regular expression operations
7. Python.org: Regular expression operations
8. Wikipedia: Regular expression
9. Wikipedia: String-searching algorithm
Lesson 10 - Internet Data
This lesson introduces Internet data, including XML and JSON.
Objectives and Skills
Objectives and skills for this lesson include:
Readings
- Wikipedia: HTML
- Wikipedia: JSON
- Wikipedia: Markup language
- Wikipedia: Query string
- Wikipedia: Representational state transfer
- Wikipedia: Tree (data structure)
- Wikipedia: XML
Multimedia
- YouTube: JSON overview
- YouTube: Python 3 Programming Tutorial - Parsing Websites with re and urllib
- YouTube: Nested Dictionaries with For Loops
- YouTube: JSON in Python
- YouTube: Iterating Through JSON
- YouTube: Python Tutorial: Datetime Module - How to work with Dates, Times, Timedeltas, and Timezones
- YouTube: XML Video Tutorial
- YouTube: JSON in Python with json Library
Examples
Activities
XML Activities
- Review mw:Help:Export and mw:Manual:Parameters to Special:Export. Create a program that uses a link to Special:Export to export all pages in Category:Applied_Programming.
- Use an XML library to parse the XML and display site name, namespace (<ns>) by name, page title, and page text for each page in the category. You will need to build and use a dictionary to save namespaces as key-value pairs with the ns value and corresponding name.
- Display a list of all page titles and last revision date sorted in descending order by timestamp. Display the date and time in local time rather than Zulu (UTC) time.
- For each of the above, use separate functions for each type of processing. Reuse functions where possible, such as in sorting and searching. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
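As a starting point for the XML activities, the following minimal sketch uses Python's xml.etree.ElementTree on a small, hypothetical stand-in for export output. The sample document is an assumption made for illustration: real Special:Export output contains more elements and typically declares an XML namespace on the root element, which would also need to be handled. The helper names are likewise made up.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical, simplified stand-in for Special:Export output.
SAMPLE = """<mediawiki>
  <siteinfo>
    <sitename>Wikiversity</sitename>
    <namespaces>
      <namespace key="0" />
      <namespace key="14">Category</namespace>
    </namespaces>
  </siteinfo>
  <page>
    <title>Applied Programming</title>
    <ns>0</ns>
    <revision>
      <timestamp>2019-03-01T12:30:00Z</timestamp>
      <text>Page text here.</text>
    </revision>
  </page>
</mediawiki>"""


def get_namespaces(root):
    """Return a dictionary mapping each ns key to its name."""
    namespaces = {}
    for element in root.iter("namespace"):
        # The main namespace has no name, so a placeholder label is used.
        namespaces[element.get("key")] = element.text or "(Main)"
    return namespaces


def parse_timestamp(value):
    """Convert a Zulu (UTC) timestamp string to a local-time datetime."""
    utc = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    return utc.astimezone()


def get_pages(root, namespaces):
    """Return (title, namespace name, local timestamp) tuples, newest first."""
    pages = []
    for page in root.iter("page"):
        title = page.findtext("title")
        ns_name = namespaces.get(page.findtext("ns"), "unknown")
        stamp = parse_timestamp(page.findtext("revision/timestamp"))
        pages.append((title, ns_name, stamp))
    return sorted(pages, key=lambda item: item[2], reverse=True)


root = ET.fromstring(SAMPLE)
site_name = root.findtext("siteinfo/sitename")
namespaces = get_namespaces(root)
pages = get_pages(root, namespaces)
for title, ns_name, stamp in pages:
    print(site_name, ns_name, title, stamp.strftime("%Y-%m-%d %H:%M"))
```

Keeping the namespace lookup, timestamp conversion, and page extraction in separate functions follows the activity's requirement to pass parameters and return results rather than rely on global variables.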
JSON Activities
- Review Wikitech:Analytics/AQS/Pageviews. Create a program that gets the top 1000 most visited articles for each month within the previous 12 months. Only one month may be requested at a time, so 12 different requests will be required. Use the all-days option to get data for all days within the month.
- Use a JSON library to parse the JSON and add each article and views count to a dictionary of key-value pairs, summing the corresponding views counts for pages visited during multiple months. Display the top 1000 articles and views in descending order by views, and alphabetically in the case of a tie.
- The format for an article title is Title[/Subpage...]. The title without subpages may be considered the overall learning project. Iterate over the dictionary and use RegEx to separate titles from subpages. Create a separate dictionary with a key for each learning project and the sum of the page and its subpage views as the value. Display the top 100 learning projects and corresponding views sorted in descending order by views, and alphabetically in the case of a tie.
- For each of the above, use separate functions for each type of processing. Reuse functions where possible, such as in sorting and searching. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
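The summing, title-splitting, and sorting steps in the JSON activities can be sketched as follows. The JSON text here is a hypothetical, simplified sample, not the actual Pageviews API response (which should be checked against the API documentation), and the function names are made up for illustration.

```python
import json
import re

# Hypothetical, simplified sample; the real API response may be shaped differently.
SAMPLE = """{"articles": [
    {"article": "Applied_Programming/RegEx", "views": 40},
    {"article": "Applied_Programming", "views": 70},
    {"article": "Python_Programming/Loops", "views": 100},
    {"article": "Algebra", "views": 100}
]}"""


def sum_views(articles, totals):
    """Add each article's views to the running totals dictionary."""
    for item in articles:
        title = item["article"]
        totals[title] = totals.get(title, 0) + item["views"]
    return totals


def sum_projects(totals):
    """Combine page and subpage views under the title before the first slash."""
    projects = {}
    for title, views in totals.items():
        match = re.match(r"([^/]+)", title)
        project = match.group(1) if match else title
        projects[project] = projects.get(project, 0) + views
    return projects


def rank(totals):
    """Sort by views descending, then alphabetically by title for ties."""
    return sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))


data = json.loads(SAMPLE)
totals = sum_views(data["articles"], {})   # call once per month, reusing totals
ranked_projects = rank(sum_projects(totals))
for project, views in ranked_projects:
    print(f"{views:>6}  {project}")
```

Sorting on the tuple (-views, title) handles both requirements in one pass: negating the count gives descending order by views, and the title breaks ties alphabetically.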
Lesson Summary
- A query string communicates key-value pairs to a web server in order to pull up a desired resource through a GET HTTP request.[1]
- By convention, the query string follows a question mark (?) separator; keys and values are then adjoined with an equal sign (=); multiple pairs are chained together using ampersands (&).[1]
- Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document.[1]
- For example, given the URL https://en.wikiversity.org?title=Applied_Programming/Internet_Data#Lesson_Summary, the substring before the question mark delimiter is the location of the Internet resource (which should point to a script). The substring after the question mark is the query string. Here there is only a single key-value pair, that being a title key and its value which references this very lesson summary.[1]
- Behind the scenes, Wikiversity has a server program that extracts this information from our query string, handling the desired action; in this case, we are simply redirected to a page, but much can be accomplished through such a request.[1]
- Before forms were added to HTML, browsers rendered the <isindex> element as a single-line text-input control.[1]
- The text entered into this control was sent to the server as a query string addition to a GET request for the base URL.[1]
- A markup language is a way to give semantic meaning to pure text.[2]
- Examples include typesetting instructions such as those found in troff, TeX and LaTeX, or structural markers such as XML tags. Markup instructs the software that displays the text to carry out appropriate actions, but is omitted from the version of the text that users see.[2]
- Some markup languages, such as the widely used HTML, have pre-defined presentation semantics—meaning that their specification prescribes how to present the structured data. Others, such as XML, do not have them and are general purpose.[2]
- Perhaps the most well-known markup language is HTML, powering the underlying technology of the web. If a phrase is marked by opening and closing <i>...</i> tags (known holistically as the <i> element), the phrase is targeted for italicization.[3]
- By itself, HTML does nothing. The use of tags, however, gives meaning and context to words, allowing the browser's rendering engine to interpret and construct the document according to HTML's standard.[3]
- HTML elements are the building blocks of HTML pages. With HTML constructs, images and other objects, such as interactive forms, may be embedded into the rendered page. It provides a means to create structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, and quotes.[3]
- HTML elements are delineated by tags, written using angle brackets.[3]
- HTML can embed programs written in a scripting language such as JavaScript which affect the behavior and content of web pages.[3]
- In addition to HTML, which defines a fixed set of elements, there is XML, whose elements are defined by a user-written schema. If you mark a string with <employee>...</employee> tags, a parser will understand this information to be referring to an employee (assuming the schema were defined this way).[4]
- In modern word-processing systems, presentational markup is often saved in descriptive-markup-oriented systems such as XML, and then processed procedurally by implementations.[2]
- In order for search-engine spiders to be able to rate the significance of pieces of text they find in HTML documents, and also for those creating mashups and other hybrids as well as for more automated agents as they are developed, the semantic structures that exist in HTML need to be widely and uniformly applied to bring out the meaning of published text.[3]
- In computing, Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.[4]
- The design goals of XML emphasize simplicity, generality, and usability across the Internet. It is a textual data format with strong support via Unicode for different human languages.[4]
- XML allows the use of any of the Unicode-defined encodings, and any other encodings whose characters also appear in Unicode. XML also provides a mechanism whereby an XML processor can reliably, without any prior knowledge, determine which encoding is being used.[4]
- The design goals of XML include, "It shall be easy to write programs which process XML documents."[4]
- The XML specification defines an XML document as a well-formed text, meaning that it satisfies a list of syntax rules provided in the specification. Some key points in the fairly lengthy list include:[4]
- The document contains only properly encoded legal Unicode characters.[4]
- None of the special syntax characters such as < and & appear except when performing their markup-delineation roles.[4]
- The start-tag, end-tag, and empty-element tag that delimit elements are correctly nested, with none missing and none overlapping.[4]
- Tag names are case-sensitive; the start-tag and end-tag must match exactly.[4]
- Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~, nor a space character, and cannot begin with "-", ".", or a numeric digit.[4]
- A single root element contains all the other elements.[4]
- A tree is a widely used abstract data type (ADT)—or data structure implementing this ADT—that simulates a hierarchical tree structure, with a root value and subtrees of children with a parent node, represented as a set of linked nodes.[5]
- A node is a structure which may contain a value or condition, or represent a separate data structure (which could be a tree of its own).[5]
- A tree data structure can be defined recursively (locally) as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value, together with a list of references to nodes (the "children"), with the constraints that no reference is duplicated, and none points to the root.[5]
- JSON is a language-independent data format.[6]
- JSON was originally devised for JavaScript, but many languages now include their own standardized JSON libraries (including Python).[6]
- JSON files use the extension .json; the associated media or MIME type is application/json.[6]
- To some, JSON is a preferable alternative to XML when dealing with AJAX applications.[6]
- JSON grew out of a need for a stateful, real-time server-to-browser communication protocol that did not rely on browser plugins such as Flash or Java applets, the dominant methods used in the early 2000s.[6]
- JSON's basic data types are number, string, boolean, array, object, and null.[6]
- Numbers in JSON are agnostic with regard to their representation within programming languages. No differentiation is made between an integer and floating-point value: some implementations may treat 42, 42.0, and 4.2E+1 as the same number while others may not.[6]
- JSON does not allow for comments whereas XML does.[6]
- In JSON limited whitespace is allowed and ignored around or between syntactic elements (values and punctuation, but not within a string value).[6]
- Only four specific characters are considered whitespace for this purpose: space, horizontal tab, line feed, and carriage return.[6]
- JSON Schema:[6]
- specifies a JSON-based format to define the structure of JSON data for validation, documentation, and interaction control.[6]
- It provides a contract for the JSON data required by a given application, and how that data can be modified.[6]
- JSON Schema is based on the concepts from XML Schema (XSD), but is JSON-based.[6]
- JSON vs. XML[7][8]
- JSON - quicker and requires less memory, simpler to parse, smaller in size/lightweight, more human readable, doesn't support commenting, better for data transfer, multiple data types, less secure, lacks namespace support[7][8]
- XML - slower to parse, more verbose, handles metadata better in tag attributes, in use for longer time/more support in some language libraries, better for formatting, lacks intrinsic data type support, more secure, has namespace support[7][8]
- REST, or representational state transfer, is an architectural style defined by a set of constraints. A web service that satisfies every architectural constraint imposed is said to be RESTful.[9]
- Often, public services provide a REST API, as does Wikimedia at Wikimedia REST API.[9]
- REST allows clients and servers to exchange and make use of information on the Internet.[9]
- RESTful services typically permit you to access or manipulate web resources through a predefined set of stateless operations.[9]
- The constraints of the REST architectural style affect the following architectural properties:[9]
- Performance – component interactions can be the dominant factor in user-perceived performance and network efficiency.[9]
- Scalability to support large numbers of components and interactions among components.[9]
- Simplicity of a uniform Interface.[9]
- Modifiability of components to meet changing needs (even while the application is running).[9]
- Visibility of communication between components by service agents.[9]
- Portability of components by moving program code with the data.[9]
- Reliability is the resistance to failure at the system level in the presence of failures within components, connectors, or data.[9]
- This lesson requires you to find the current date to calculate the previous dates. In Python, you will need to use the proper datetime syntax: JournalDev: Python Datetime Syntax.[10]
- Python’s JSON library has multiple functions for working with Internet data, including json.load(), json.loads(), json.dump(), and json.dumps().[11]
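Two of the ideas in this summary, query strings and JSON parsing, can be sketched with Python's standard library alone. The JSON document and the key names passed to urlencode() below are made up for illustration; the URL is the example used earlier in this summary.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# The example URL from this summary, split into its parts.
url = ("https://en.wikiversity.org"
       "?title=Applied_Programming/Internet_Data#Lesson_Summary")
parts = urlsplit(url)
query = parse_qs(parts.query)    # each key maps to a list of values
fragment = parts.fragment        # the text after the # character

# urlencode() joins pairs with = and &, and encodes characters such as spaces.
encoded = urlencode({"title": "Internet Data", "action": "raw"})

# A made-up JSON document covering the basic JSON data types.
text = ('{"title": "Internet Data", "lesson": 10, "tags": ["XML", "JSON"], '
        '"draft": false, "notes": null}')

# json.loads() parses a JSON string into Python objects:
# object -> dict, array -> list, true/false -> bool, null -> None.
record = json.loads(text)
kinds = {key: type(value).__name__ for key, value in record.items()}

# json.dumps() serializes Python objects back to JSON text.
# The added indentation is insignificant whitespace and does not change the data.
pretty = json.dumps(record, indent=2, sort_keys=True)
round_trip = json.loads(pretty)

print(query, fragment, encoded, kinds, sep="\n")
```

json.load() and json.dump() work the same way but read from and write to file objects rather than strings.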
Key Terms
- application programming interface (API)
- When used in the context of web development, an API is typically defined as a set of specifications, such as Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of response messages, which is usually in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format.[12]
- attribute
- An attribute is a markup construct consisting of a name–value pair that exists within a start-tag or empty-element tag. An example is <img src="madonna.jpg" alt="Madonna" />, where the names of the attributes are "src" and "alt", and their values are "madonna.jpg" and "Madonna" respectively.[4]
- document type declaration / DTD
- HTML documents are required to start with a Document Type Declaration (informally, a "doctype"). The DTD to which the DOCTYPE refers contains a machine-readable grammar specifying the permitted and prohibited content for a document.[3]
- element
- An element is a logical document component that either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters between the start-tag and end-tag, if any, are the element's content, and may contain markup, including other elements, which are called child elements.[4]
- escape character
- Due to the nature of XML’s syntax, there are some characters that pose problems. For example, "<" is reserved, and "&lt;" would have to be used in its place.[4]
- HTML
- Hypertext Markup Language, a regimented standard that describes the structure and content of a document, facilitating its rendering by a web browser.[3]
- JSON
- JavaScript Object Notation, a format that houses data so they can be exchanged fluidly.[6]
- markup language
- A means of tagging or marking up textual content that is to be interpreted a certain way.[2]
- MIME Type
- A media type (also known as a Multipurpose Internet Mail Extensions or MIME type) is a standard that indicates the nature and format of a document, file, or assortment of bytes.[13]
- node
- A structure which may contain a value or a condition or represent an entirely separate data structure.[5]
- path
- A sequence of nodes and edges connecting a node with a descendant.[5]
- query string
- A variadic list of key-value pairs appended to a URL and sent to a web server for processing.[1]
- REST
- Representational State Transfer, a design philosophy or architectural style that defines a set of limitations for accessing resources in order to improve reliability and scalability.[14]
- serializable
- Describes data that can undergo serialization, the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later (possibly in a different computer environment).[15]
- tag
- A tag is a markup construct that begins with < and ends with >. Tags come in three flavors:[4]
- start-tag, such as <section>
- end-tag, such as </section>
- empty-element tag, such as <line-break />
- tree
- A hierarchical data structure comprised of nodes with a single element, known as the root, on the highest or topmost layer. Both HTML and XML documents are best represented as trees.[5]
- web browser
- Software that receives HTML documents from a web server or from local storage and renders the documents into multimedia web pages.[3]
- XML
- Extensible Markup Language, a language that generalizes the marking up of documents, so users can define their own semantics.[4]
See Also
- Internet Fundamentals
- API:Categorymembers for XML data
- Python: Using XML.etree
- Python: Using JSON Library
- W3 Schools - XML Tutorial
- W3 Schools - JSON Introduction
- What is JSON?
References
1. Wikipedia: Query string
2. Wikipedia: Markup language
3. Wikipedia: HTML
4. Wikipedia: XML
5. Wikipedia: Tree (data structure)
6. Wikipedia: JSON
7. [2]
8. [3]
9. Wikipedia: Representational state transfer
10. JournalDev
11. YouTube: JSON in Python
12. "Application programming interface". Wikipedia. 2018-07-11. https://en.wikipedia.org/w/index.php?title=Application_programming_interface&oldid=849739408.
13. Wikipedia: Media type
14. Wikipedia: Representational state transfer
15. Wikipedia: Serialization
Lesson 11 - Databases
This lesson introduces database access through programming.
Objectives and Skills
Objectives and skills for this lesson include:
- Create, read, and search for elements in databases
- Utilize Structured Query Language (SQL) for database management
- Format header and table columns for database entities
Readings
Multimedia
- YouTube: Database Basics
- YouTube: Update & Delete
- YouTube: SQL Commands
- YouTube: Python SQLite Basics
- YouTube: Python SQLite Tutorial
- YouTube: DB Browser for SQLite Tutorial
- YouTube: Using the Cursor in SQLite
- YouTube: Using variables in SQLite.
- YouTube: Node JS SQlite tutorial - How to create a database, table, and insert data
Examples
Activities
Tutorials
- Complete one or more of the following tutorials:
Create a Database
- Download and install SQLiteBrowser: DB Browser for SQLite. Run the program and create a new database named Northwind.
- Copy the SQL statements from Northwind SQL. Use Execute SQL in the DB Browser for SQLite to run the script and create the Northwind tables and data.
Read a Database
- Create a program to list the tables in the Northwind database. The following query may be used to select table names:
SELECT name FROM sqlite_master WHERE type='table';
- Use a list to save table names. Provide an interface that allows the user to select one of the listed tables and then display all records in the table.
- Use the cursor.description attribute and a list or dictionary to save field names for the table. Determine the maximum length of each field and display the data in appropriately sized columns. Include column headings and row numbers.
- For each of the above, use separate functions for each type of processing. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
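The table-listing query and the cursor.description attribute mentioned above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not a full solution: it uses an in-memory database with one made-up table as a stand-in for Northwind, and the function names are hypothetical.

```python
import sqlite3


def get_table_names(connection):
    """Return a list of the table names in the database."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type='table';")
    return [row[0] for row in cursor.fetchall()]


def get_fields_and_rows(connection, table, valid_tables):
    """Return the field names and all rows for a table in valid_tables."""
    if table not in valid_tables:
        raise ValueError(f"unknown table: {table}")
    # Table names cannot be bound as query parameters, so the name is
    # validated against the known list before it is placed in the SQL.
    cursor = connection.execute(f'SELECT * FROM "{table}";')
    fields = [column[0] for column in cursor.description]
    return fields, cursor.fetchall()


def column_widths(fields, rows):
    """Return the maximum display width needed for each column."""
    widths = [len(field) for field in fields]
    for row in rows:
        for index, value in enumerate(row):
            widths[index] = max(widths[index], len(str(value)))
    return widths


# In-memory stand-in for the Northwind database (hypothetical data).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Shippers (ShipperID INTEGER, Name TEXT)")
connection.execute("INSERT INTO Shippers VALUES (1, 'Speedy Express')")

tables = get_table_names(connection)
fields, rows = get_fields_and_rows(connection, tables[0], tables)
widths = column_widths(fields, rows)

# Print a heading line, then each record with a row number.
print("#", " ".join(f.ljust(w) for f, w in zip(fields, widths)))
for number, row in enumerate(rows, start=1):
    print(number, " ".join(str(v).ljust(w) for v, w in zip(row, widths)))
connection.close()
```

Each entry in cursor.description is a tuple whose first item is the column name, which is why the sketch takes column[0].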
Modify a Database
- Create a program to list the tables in the Northwind database. Use a list to save table names. Provide an interface that allows the user to select one of the listed tables and then display all records in the table with column names and row numbers.
- Allow users to choose (I)nsert, (U)pdate, or (D)elete. If they choose Insert, ask for field values and insert the new record. If they choose Update, provide an interface that allows the user to select a row and field to update and then update the record. If they choose Delete, allow the user to select the record and then delete the record.
- For each of the above, use separate functions for each type of processing. Avoid using global variables by passing parameters and returning results. Include appropriate data validation and parameter validation. Add program and function documentation, consistent with the documentation standards for your selected programming language.
Lesson Summary
- At the highest level, a database is simply an organized collection of data.[1]
- Existing DBMSs provide various functions that allow management of a database and its data which can be classified into four main functional groups:[1]
- Data definition – Creation, modification and removal of definitions that define the organization of the data. [1]
- Update – Insertion, modification, and deletion of the actual data.[1]
- Retrieval – Providing information in a form directly usable or for further processing by other applications. The retrieved data may be made available in a form basically the same as it is stored in the database or in a new form obtained by altering or combining existing data from the database.[1]
- Administration – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure.[1]
- A database management system provides three views of the database data:[1]
- The external level defines how each group of end-users sees the organization of data in the database. A single database can have any number of views at the external level.[1]
- The conceptual level unifies the various external views into a compatible global view. It provides the synthesis of all the external views. It is out of the scope of the various database end-users, and is rather of interest to database application developers and database administrators.[1]
- The internal level (or physical level) is the internal organization of data inside a DBMS. It is concerned with cost, performance, scalability and other operational matters. It deals with storage layout of the data, using storage structures such as indexes to enhance performance. Occasionally it stores data of individual views (materialized views), computed from generic data, if performance justification exists for such redundancy. It balances all the external views' performance requirements, possibly conflicting, in an attempt to optimize overall performance across all activities.[1]
- Database languages are special-purpose languages, which allow one or more of the following tasks, sometimes distinguished as sublanguages:[1]
- Data control language (DCL) – controls access to data;[1]
- Data definition language (DDL) – defines data types such as creating, altering, or dropping tables and the relationships among them;[1]
- Data manipulation language (DML) – performs tasks such as inserting, updating, or deleting data occurrences;[1]
- Data query language (DQL) – allows searching for information and computing derived information.[1]
- The DBMS provides various functions that allow entry, storage, and retrieval of large quantities of information and provides ways to manage how that information is organized.[1]
- In the relational model, data gets organized into tuples or records (the aggregate of which forms a relation or table) and is further subdivided into columns or attributes which describe individual records.[2]
- The relation can be thought of as some entity. The row is an instance of that entity, and its columns describe features attributed to that particular instance.[2]
- For example, let's consider a 'Person' entity. An instance of that entity is a realized person like 'Joe'. Joe has common features shared by everybody (skin and hair), but specific characterizations that may or may not differ in manifestation from other people (black skin, black hair), fleshing out his unique persona. Each of these idiosyncrasies or traits is a column value in and of itself.[2]
- The purpose of the relational model is to provide a declarative method for specifying data and queries.[2]
- The consistency of a relational database is enforced, not by rules built into the applications that use it, but rather by constraints, declared as part of the logical schema and enforced by the DBMS for all applications.[1]
- Constraints make it possible to further restrict the domain of an attribute. For instance, a constraint can restrict a given integer attribute to values between 1 and 10.[3]
- Constraints provide one method of implementing business rules in the database and support subsequent data use within the application layer.[3]
- A relation consists of a heading and a body.[2]
- SQL:
- The most popular database model is the relational model which is often associated with SQL, a query language.[1]
- Most relational databases use the SQL data definition and query language; these systems implement what can be regarded as an engineering approximation to the relational model.[2]
- A table in an SQL database schema corresponds to a predicate variable; the contents of a table to a relation; key constraints, other constraints, and SQL queries correspond to predicates.[2]
- SQL is a domain-specific (limited and narrowly specialized in use) query language for interfacing with databases.[4]
- With SQL you can retrieve, update, insert, and delete information using standardized commands.[4]
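- SQL offers two main advantages over older read-write APIs such as ISAM or VSAM: it can access many records with a single command, and it eliminates the need to specify how to reach a record, for example with or without an index.[4]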
- The SQL language is subdivided into several language elements, including:[4]
- Clauses, which are constituent components of statements and queries. (In some cases, these are optional.)[4]
- Expressions, which can produce either scalar values, or tables consisting of columns and rows of data[4]
- Predicates, which specify conditions that can be evaluated to SQL three-valued logic (3VL) (true/false/unknown) or Boolean truth values and are used to limit the effects of statements and queries, or to change program flow.[4]
- Queries, which retrieve the data based on specific criteria. This is an important element of SQL.[4]
- Statements, which may have a persistent effect on schemata and data, or may control transactions, program flow, connections, sessions, or diagnostics.[4]
- SQL's controversial "null" value is neither true nor false (predicates with terms that return a null value return null rather than true or false). Features such as outer-join depend on null values.[4]
- SQLite:
- SQLite stores the entire database (definitions, tables, indices, and the data itself) as a single cross-platform file on a host machine.[5]
- It implements this simple design by locking the entire database file during writing. SQLite read operations can be multitasked, though writes can only be performed sequentially.[5]
- The design goals of SQLite were to allow the program to be operated without installing a database management system or requiring a database administrator.[5]
- SQLite uses an unusual type system for an SQL-compatible DBMS: instead of assigning a type to a column as in most SQL database systems, types are assigned to individual values. In language terms, it is dynamically typed.[5]
- Unlike client–server database management systems, the SQLite engine has no standalone processes with which the application program communicates. Instead, the SQLite library is linked in and becomes an integral part of the application program, which makes it a popular choice for widely used browsers, operating systems, and mobile phones. The library can also be called dynamically.[5]
- SQLite is included by default in Windows 10, Google's version of Android, FreeBSD, and Samsung's Tizen.[5]
- SQLite has bindings to many programming languages such as Python, Java, JavaScript, C++, C#, Objective-C, Ruby, Swift as well as a variety of other languages.[5]
- Unlike some other RDBMSs, SQLite allows only one writer at a time, so it is better suited to small and medium-sized projects.
- In SQLite, parameterized queries are used to improve performance and increase security by preventing injection attacks.[6]
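Several of the points above (parameterized queries, per-value typing, and null handling) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration rather than a definitive implementation: the users table, its columns, and the sample values are hypothetical, and an in-memory database is assumed so that nothing is written to disk.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
con = sqlite3.connect(":memory:")  # a file path here would create a single database file
cur = con.cursor()

# The age column is declared without a type, so SQLite keeps each value's own type.
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age)")
cur.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("Alice", 30), ("Bob", "unknown"), ("Carol", None)],
)
con.commit()

# Parameterized query: the ? placeholder passes the value as data, not as SQL text,
# so the quote characters below cannot change the meaning of the statement.
malicious = "Alice' OR '1'='1"
safe_rows = cur.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

# Types belong to values, not columns: typeof() reports each stored value's type.
types = cur.execute("SELECT name, typeof(age) FROM users ORDER BY id").fetchall()

# A comparison with NULL is neither true nor false, so "= NULL" matches no rows;
# IS NULL is the test that finds missing values.
eq_null = cur.execute("SELECT name FROM users WHERE age = NULL").fetchall()
is_null = cur.execute("SELECT name FROM users WHERE age IS NULL").fetchall()

print(safe_rows)
print(types)
print(eq_null, is_null)
```

Building the same query by concatenating the malicious string into the SQL text would instead return every row, which is the injection risk that placeholders avoid.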
Key Terms
- ACID
- Atomicity, consistency, isolation, durability, four properties expected from typical database interactions.[1]
- atomicity
- A component property of ACID transactions; a database operation will never be executed partway or halfheartedly—either the whole operation gets performed or it's cancelled and nothing occurs.[8]
- attribute, column, property, field
- A particular feature which describes and is characteristic of a record.[2]
- body
- A set of n-tuples.[2]
- COMMIT
- A COMMIT statement in SQL ends a transaction within a relational database management system (RDBMS) and makes all changes visible to other users.[9]
- connection
- A read-only attribute of a Cursor object that provides the SQLite database Connection the cursor uses. A Cursor object created by calling con.cursor() has a connection attribute that refers to con.
- consistency
- A component property of ACID transactions; data will always be held to standards established by the schema.[10]
- cursor
- A database cursor is an object that enables traversal over the rows of a result set. It allows you to process each individual row returned by a query.
- database
- An organized collection of data.[2]
- Database Management System / DBMS
- Software for creating and managing databases. There are multiple types of DBMS (relational, object-oriented, etc.), but each variant allows reading, inserting, updating, and deleting data in a database, and provides general administrative and security tools.[1]
- DELETE
- Removes one or more records from a table.[11]
- durability
- A component property of ACID transactions; once a commit has been submitted and is reported, it's guaranteed to be reflected in the database—even in the case of an immediate system failure.[12]
- foreign key
- A set of one or more columns in a table which provide a link or reference to the primary key in another table maintaining a relationship between the tables.
- heading
- A set of attributes.[2]
- INSERT
- Adds one or more records to a single table.[13]
- isolation
- A component property of ACID transactions; concurrent requests for data should only be permitted once in-progress transactions working on that data have been completed in their entirety.[14]
- PRAGMA
- In SQLite, a special command used to control various environmental variables and state flags within the SQLite environment.[15]
- primary key
- A specific choice of one or more columns (attributes) whose data uniquely identifies each row (tuple) in a table (relation).[16]
- query
- Retrieves/creates data based on specific criteria. Queries must follow SQL rules and syntax.[4]
- Relational model
- An approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd,[1][2] where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.[2]
- Retrieval
- Providing information in a form directly usable or for further processing by other applications.[1]
- RDBMS
- Relational database management system, a program or application that maintains a relational database.[1]
- record, row, tuple
- An instance of an entity in a database table.[2]
- relation, table
- A series of records gathered together under an entity type.[2]
- schema
- A set of integrity constraints that structures and establishes the rules for data in a database.[17]
- SELECT
- A common DQL command used to return a result set of records, either from a single or from multiple tables.[18]
- SQL
- Structured Query Language, a programmatic standard for querying data from a relational database.[4]
- SQLite
- A relational database management system (abbreviated RDBMS) embedded directly into a program.[5]
- transaction
- A single transaction consists of one or more independent units of work, each reading and/or writing information to a database or other data store. When the system processes a COMMIT statement, the transaction ends with successful completion.[19]
- UPDATE
- Changes specified data of one or more records in a table.[20]
- view
- The result set of a stored query on the data, which database users can query just as they would a persistent database collection object.[5]
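The terms transaction, COMMIT, atomicity, foreign key, PRAGMA, and cursor can be seen working together in a short sketch using Python's sqlite3 module. The authors and books tables and their rows are hypothetical, and the example assumes an in-memory database. It relies on two documented behaviors: SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and a sqlite3 connection used as a context manager commits on success and rolls back when an exception escapes the block.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # foreign keys are not enforced unless requested

# Hypothetical schema: each book must reference an existing author.
con.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
con.execute(
    "CREATE TABLE books ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES authors(id))"
)
con.execute("INSERT INTO authors (id, name) VALUES (1, 'Codd')")
con.commit()  # COMMIT ends the transaction and makes the change permanent

try:
    with con:  # commits if the block succeeds, rolls back if it raises
        con.execute("INSERT INTO books (title, author_id) VALUES ('First', 1)")
        # author 99 does not exist, so the foreign key constraint fails here
        con.execute("INSERT INTO books (title, author_id) VALUES ('Orphan', 99)")
except sqlite3.IntegrityError:
    rolled_back = True
else:
    rolled_back = False

# Atomicity: the first INSERT was undone along with the failed one.
cur = con.cursor()
book_count = cur.execute("SELECT COUNT(*) FROM books").fetchone()[0]

print(rolled_back, book_count)
print(cur.connection is con)  # the cursor's connection attribute refers back to con
```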
See Also
- Databases
- Internet Fundamentals/Databases
- Codecademy SQL Tutorial
- SQLite3 Python Documentation
- Sorting with Python
- Wiki: Relational Database
- SQL Fiddle
- SQLite Tutorial - Delete
- SQLite Cheat Sheet
- Pragma
References
- [1] Wikipedia: Database
- [2] Wikipedia: Relational model
- [3] "Relational database". Wikipedia. 2018-07-09. https://en.wikipedia.org/w/index.php?title=Relational_database&oldid=849448905.
- [4] Wikipedia: SQL
- [5] Wikipedia: SQLite
- [6] ZetCode: SQLite Python tutorial
- [7] Python: sqlite3
- [8] Wikipedia: Atomicity (database systems)
- [9] "Commit (data management)". Wikipedia. 2018-05-04. https://en.wikipedia.org/w/index.php?title=Commit_(data_management)&oldid=839669111.
- [10] Wikipedia: Consistency (database systems)
- [11] Wikipedia: Delete (SQL)
- [12] Wikipedia: Durability (database systems)
- [13] Wikipedia: Insert (SQL)
- [14] Wikipedia: Isolation (database systems)
- [15] [4]
- [16] Wikipedia: Primary key
- [17] Wikipedia: Database schema
- [18] Wikipedia: Select (SQL)
- [19] "Database transaction". Wikipedia. 2018-07-08. https://en.wikipedia.org/w/index.php?title=Database_transaction&oldid=849319681.
- [20] Wikipedia: Update (SQL)
Lesson 12 - Modules and Classes
This lesson introduces modules and classes.
Objectives and Skills
Objectives and skills for this lesson include:
- Creating classes and modules
- Managing git branches
- Building familiarity with object-oriented programming (OOP)
Readings
Multimedia
- YouTube: Python Classes
- YouTube: Classes and Instances
- YouTube: Using Methods with Classes
- YouTube: Understanding OOP vs. Functional Programming
- YouTube: Classes in JavaScript
- YouTube: Python getter and setter Methods
- YouTube: Method Types in Python OOP: @classmethod, @staticmethod, and Instance Methods
- YouTube: Classes and Objects - Basics
- YouTube: Classes and Objects
Examples
Activities
- Review Gordon.edu: ATM Example Class Diagram. Depending on time available, develop one or more class modules for an ATM based on the diagram provided. If applicable, work together with your classmates and have each person implement a separate module and class.
- For each class module, include appropriate data validation and parameter validation. Add module and method documentation, consistent with the documentation standards for your selected programming language.
- If you are working as a class, read about git to learn more about team-based programming contribution management and a commonly used version control system.
Lesson Summary
- Modular programming is a software design technique that emphasizes separating the functionality of a program into reusable, independent modules.[1]
- This concept is related to structured programming and object-oriented programming; all have the same end goal of deconstructing a comprehensive program into smaller pieces.[1]
- Modular programming refers to the high-level decomposition of an entire program's code into pieces, structured programming refers to the low-level use of structured control flow, and object-oriented programming refers to the use of objects, a kind of data structure.[1]
- Languages that formally support the module concept include Ada, Algol, BlitzMax, C#, Clojure, COBOL, D, Dart, eC, Erlang, Elixir, F, F#, Fortran, Go, Haskell, IBM/360 Assembler, IBM i Control Language (CL), IBM RPG, Java, MATLAB, ML, Modula, Modula-2, Modula-3, Morpho, NEWP, Oberon, Oberon-2, Objective-C, OCaml, several derivatives of Pascal (Component Pascal, Object Pascal, Turbo Pascal, UCSD Pascal), Perl, PL/I, PureBasic, Python, Ruby, Rust, JavaScript, Visual Basic .NET and WebDNA.[1]
- Modular programming can be performed even where the programming language lacks explicit syntactic features to support named modules, like, for example, in C. This is done by using existing language features, together with, for example, coding conventions, programming idioms and the physical code structure.[1]
- With modular programming, concerns are separated such that modules perform logically discrete functions, interacting through well-defined interfaces.[1]
- When creating a modular system, instead of creating a monolithic application (where the smallest component is the whole), several smaller modules are written separately so that, when composed together, they construct the executable application program.[1]
- This makes modularly designed systems, if built correctly, far more reusable than a traditional monolithic design, since all (or many) of these modules may then be reused in other projects. This also facilitates the "breaking down" of projects into several smaller projects.[1]
- Object-oriented programming is a language model based on the concept of objects. This approach to programming is an evolution of design practices that enabled the reuse of software.[2]
- The most popular object-oriented languages are class-based, meaning objects are instances of particular classes. The most popular OOP languages include Python, Java, JavaScript, C++, C#, Ruby, PHP, Object Pascal, Objective-C, and Swift.[2]
- Objects are accessed somewhat like variables with complex internal structure, and in many languages are effectively pointers, serving as actual references to a single instance of said object in memory within a heap or stack.[2]
- Some languages are called "pure" OO languages because everything in them is treated consistently as an object, from primitives such as characters and punctuation, all the way up to whole classes, prototypes, blocks, modules, etc.[2]
- They were designed specifically to facilitate, even enforce, OO methods. Examples: Python, Ruby, Scala, Smalltalk, Eiffel, Emerald, JADE, Self.[2]
- Objects can contain other objects in their instance variables; this is known as object composition.[2]
- For example, an object in the Employee class might contain (either directly or through a pointer) an object in the Address class, in addition to its own instance variables like "first_name" and "position".[2]
- Object-oriented programming entails the use of classes, which are templates for object instantiation. A class is a user-defined data type that specifies attributes (fields) and behaviors (methods); an object is an instance of a class that carries its own attribute values.[2]
- Languages that support classes almost always support inheritance. This allows classes to be arranged in a hierarchy that represents "is-a-type-of" relationships. For example, class Employee might inherit from class Person. All the data and methods available to the parent class also appear in the child class with the same names.[2]
- Both object-oriented programming and relational database management systems (RDBMSs) are extremely common in software today. Since relational databases don't store objects directly (though some RDBMSs have object-oriented features to approximate this), there is a general need to bridge the two worlds.[2]
- OOP can be used to associate real-world objects and processes with digital counterparts. However, not everyone agrees that OOP facilitates direct real-world mapping or that real-world mapping is even a worthy goal; one argument is that a program is not a model of the world but a model of some part of the world.[2]
- In languages that support open recursion, object methods can call other methods on the same object, including themselves, typically using a special variable or keyword called this or self.[2]
- In the design of a system, many classes are grouped together in a class diagram that helps to determine the static relations between them. With detailed modelling, the classes of the conceptual design are often split into a number of subclasses.[3]
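- In a class diagram, each class is drawn as a box with three compartments: the class name at the top, its attributes in the middle, and its operations (methods) at the bottom.[3]
- The UML specifies two types of scope for members: instance and classifier. The latter is represented by underlined names.[3]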
- Classifier members are commonly recognized as “static” in many programming languages. The scope is the class itself.[3]
- To indicate a classifier scope for a member, its name must be underlined. Otherwise, instance scope is assumed by default.[3]
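- Instance-level relationships consist of dependency, association, aggregation, and composition.[3]
- Class-level relationships consist of generalization (inheritance) and realization (implementation).[3]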
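The Employee, Person, and Address example mentioned above can be sketched in Python to show inheritance, object composition, and open recursion through self. This is only an illustration under stated assumptions: the method names greeting and badge, the constructor parameters, and the sample values are hypothetical and do not come from the readings.

```python
class Address:
    """A simple class used as a component of Employee (hypothetical fields)."""

    def __init__(self, street, city):
        self.street = street
        self.city = city


class Person:
    """Parent class: holds data and methods shared by all people."""

    def __init__(self, first_name):
        self.first_name = first_name

    def greeting(self):
        return f"Hello, {self.first_name}"


class Employee(Person):
    """Child class: an Employee is a type of Person (inheritance)."""

    def __init__(self, first_name, position, address):
        super().__init__(first_name)  # reuse the parent's initialization
        self.position = position
        self.address = address  # composition: an Employee contains an Address

    def badge(self):
        # Open recursion: one method calls another method on the same object via self.
        return f"{self.greeting()} ({self.position}, {self.address.city})"


employee = Employee("Ada", "Engineer", Address("1 Main St", "Springfield"))
print(employee.badge())
print(isinstance(employee, Person))
```

The greeting method is defined only on Person, yet it is available on Employee under the same name, which is the "is-a-type-of" relationship described in the summary.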
Key Terms
- class
- A blueprint for constructing an object—a set of variables and methods that defines an entity.[2]
- class diagram
- A graphical representation of the structure of an object-oriented system that displays their attributes and relationships.[3]
- class methods
- Methods that belong to the class as a whole and have access only to class variables and inputs from the procedure call.[2]
- class variable
- A variable that belongs to an entire class; there is only one such variable shared between all objects.[2]
- encapsulation
- The act of hiding implementation details, either to protect internal data or for the purpose of abstraction.[2]
- instance methods
- Methods that belong to individual objects and have access to the instance variables of the specific object they are called on, to inputs, and to class variables.[2]
- instance variable
- A variable that is unique and belongs to each instance of a class.[2]
- library
- When a program invokes a library, it gains the behavior implemented inside that library without having to implement that behavior itself. Libraries encourage the sharing of code in a modular fashion, and ease the distribution of the code.[4]
- me, self, this
- A keyword that refers to the current object of focus.[2]
- member variable
- A variable that is either a class or instance variable.[2]
- method
- A function that is defined inside a class.[2]
- object
- A particular instance of a class.[2]
- object composition
- The practice of having objects contain other objects in their instance variables.[2]
- pointer
- A pointer references a location in memory. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page.[5]
- private
- An access modifier which limits external access.[6]
- property
- An intermediary between a variable and a method, providing the functionality of both.[2]
- public
- An access modifier which opens up external access.[6]
- Procedural programming
- A programming paradigm, derived from structured programming, based upon the concept of the procedure call. Procedures, also known as routines, subroutines, or functions, simply contain a series of computational steps to be carried out.[7]
- static
- A keyword used in some languages to designate a variable or method as shared between objects.[2]
- state diagram
- A type of diagram used in computer science and related fields to describe the behavior of systems.[8]
- UML
- Unified Modeling Language, a general-purpose modeling language that provides a standard way to visualize the design of a system, such as an object-oriented system.[9]
- Variables
- Store information formatted in a small number of built-in data types like integers and alphanumeric characters.[2]
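Several of these terms (class variable, instance variable, class method, static, property, and encapsulation) can be illustrated with one small Python class. The Account class below is a hypothetical sketch, not part of the ATM activity diagram. Python has no private keyword, so the leading underscore on _balance is assumed here as the usual convention for signalling limited external access.

```python
class Account:
    """Hypothetical account class illustrating the key terms above."""

    interest_rate = 0.02  # class variable: one value shared by all instances

    def __init__(self, owner, balance=0):
        self.owner = owner       # instance variable: unique to each object
        self._balance = balance  # private by convention only (leading underscore)

    @property
    def balance(self):
        # Property: read like a variable, implemented like a method.
        return self._balance

    def deposit(self, amount):
        # Instance method with parameter validation; it changes only this object.
        if not self.is_valid_amount(amount):
            raise ValueError("amount must be a positive number")
        self._balance += amount

    @classmethod
    def set_rate(cls, rate):
        # Class method: works on the class as a whole, not on one object.
        cls.interest_rate = rate

    @staticmethod
    def is_valid_amount(amount):
        # Static method: needs neither the instance nor the class.
        return isinstance(amount, (int, float)) and amount > 0


first = Account("Ada", 100)
second = Account("Grace")
first.deposit(50)
Account.set_rate(0.03)

print(first.balance, second.balance)
print(first.interest_rate, second.interest_rate)
```

Because balance is exposed as a property with no setter, code outside the class can read it but an assignment such as first.balance = 1 raises AttributeError, which is one way to encapsulate the internal value.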
See Also
- Object Oriented Programming
- Stéphane Ducasse - Free Online Books: Designing Object Systems
- Classes in Python Docs
- Differences Between OOP and Functional Programming
- Differences Between OOP and Procedural Programming Languages
References
- [1] Wikipedia: Modular programming
- [2] Wikipedia: Object-oriented programming
- [3] Wikipedia: Class diagram
- [4] "Library (computing)". Wikipedia. 2018-06-05. https://en.wikipedia.org/w/index.php?title=Library_(computing)&oldid=844509652.
- [5] "Pointer (computer programming)". Wikipedia. 2018-07-07. https://en.wikipedia.org/w/index.php?title=Pointer_(computer_programming)&oldid=849208568.
- [6] Wikipedia: Access modifiers
- [7] Wikipedia: Procedural programming
- [8] Wikipedia: State diagram
- [9] Wikipedia: Unified Modeling Language