Safe Python: Secure coding techniques

This article discusses an important matter every cool ~~hacker~~ programmer should care about: secure code. Perhaps you think it’s boring and difficult? Not at all! Today I will share some of my experience with you and show how to write Python code you can be proud of.

Limit the scope of variables and functions

The scope of a variable is the context where this variable is defined and accessible. If a variable is accessible throughout the entire program, it’s called global. If a variable is only accessible within a function or method, it’s called local.

If you use a global variable:

secret = "my super secret data"
def print_secret():
    # Using global variable
    print(secret)
print_secret()

This approach can be dangerous since global variables are accessible throughout the entire program and can be easily changed. And what if it’s an important variable that must never be changed no matter what the circumstances? An attacker can take advantage of this and damage your program.

Therefore, it’s preferable to use local variables:

def print_secret():
    # Declaring a local variable
    secret = "my super secret data"
    print(secret)
print_secret()

Now the secret variable is only accessible inside the print_secret() function. This approach makes your code not only more secure, but also easier to read, debug, and maintain.

Split your code into modules

Instead of a huge piece of source code containing descriptions of all objects and functions, you can create several modules so that each of them performs its own task. This is a better approach since such modules can be easily used in other projects.

But does modularity make your code safer? The point is that the smaller the pieces are, the easier it is to look for errors in them, and the lower is the chance to accidentally break something when you make changes. Well-organized code is easy to modify, and if it’s split into isolated sections, then changes in one section won’t affect the others.

Split your code not only into packages that can be imported, but also into functions and objects.

Below is an example of bad code:

def do_something():
    # Doing multiple tasks here
    ...
    # Doing something else here
    ...
    # And something here
    ...

This code contains one huge function that does plenty of different things. This is bad because if you find a vulnerability in one of these things, your changes could affect other parts of the function. The larger it is, the more difficult it is to predict the result of your edits.

Below is a good example:

def do_something_1():
    # Doing something here
    ...
def do_something_2():
    # Doing something here
    ...
def do_something_3():
    # Doing something here
    ...

A large function has been split into several smaller ones, and each of them does its own job. This is much safer: if you find a vulnerability in one of these functions, you can fix it without affecting other functions. It also makes your code easier to read and maintain because each function performs only one task.

Another good way to isolate code and make it reusable involves Python classes and objects. Classes make it possible to group related functions and data together, thus, making your code more manageable and secure.

Below is a good example of code with classes:

class MyAwesomeClass:
    def __init__(self, some_data):
        self.some_data = some_data
    def do_something_1(self):
        # Doing something with some_data here
        ...
    def do_something_2(self):
        # Doing something else with some_data here
        ...

In this example, you create a class called MyAwesomeClass that contains two methods: do_something_1 and do_something_2. Each of these methods deals with data that you pass when you create a class object. This enables you to control how these data are used and processed. And security immediately goes up!

The main conclusion is as follows: the simpler, clearer, and easier to maintain your code is, the more secure it is.

Protect your code against code injections

What are these injections about? Imagine that some evil user feeds your application not the requested data, but executable code – and, for some reason, your application takes and executes this code. Furthermore, this malicious input doesn’t have to be Python code: it can be a query to an SQL database or an OS command… Sounds scary, doesn’t it? Let’s find out why such ~~shit~~ things happen sometimes.

Bad example

Look at the piece of code below. What’s wrong with it?

def get_user(name):
    query = "SELECT * FROM users WHERE name = '" + name + "'"
    return execute_query(query)

You simply take a username and insert it into an SQL query. And what if the user inputs something like John'; DROP TABLE users;--? The quote closes the string literal, and the rest may be executed as a separate SQL statement. Congrats, you have just lost all your users! This example is considered a classic one.

The secure version of this code is as follows:

def get_user(name):
    query = "SELECT * FROM users WHERE name = ?"
    return execute_query(query, (name,))

Here you use a parameterized query: the username is passed separately from the SQL text, and the database driver binds it as data rather than as part of the statement. In other words, even if the user attempts to enter SQL code, it will be treated as a string, and nothing bad will happen.
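
To see the effect for yourself, here is a minimal sketch that uses the sqlite3 module from the standard library. The in-memory database, the users table layout, and the extra conn argument are assumptions made for the demo; the original execute_query() helper isn’t shown in the article.

```python
import sqlite3

def get_user(conn, name):
    # The "?" placeholder makes the driver bind `name` as data, not as SQL text
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()

# In-memory database with a hypothetical `users` table for the demo
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("John",), ("Alice",)])

print(get_user(conn, "John"))                        # [(1, 'John')]
print(get_user(conn, "John'; DROP TABLE users;--"))  # []
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 2
```

The injection attempt is simply looked up as a (nonexistent) username, and the table stays intact.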

And this is just the beginning. Remember: user data should be trusted as much as you trust a stranger who, all of a sudden, offers you sweet candy.

Use secure serialization and deserialization methods

What do these scary words – “serialization” and “deserialization” – mean? Do they cause derealization? Don’t be afraid! Serialization is a procedure that converts various structures (e.g. lists and dictionaries) into a string that can be easily stored on disk or transmitted over the network. Deserialization is the reverse procedure that converts a sequence of characters into a structure.

And this process involves a whole class of vulnerabilities. If you convert strings into structures carelessly, then an attacker can manipulate the data and seize control of your program.

Below is an example of dangerous code:

import pickle
# Never do this!
def unsafe_deserialization(serialized_data):
    return pickle.loads(serialized_data)

In this example, the pickle module is used to deserialize data. This is convenient, but pickle doesn’t ensure security. If an attacker tampers with serialized data, arbitrary code can be executed on your computer.

A good example:

import json
# Much better!
def safe_deserialization(serialized_data):
    return json.loads(serialized_data)

In this example, the json module is used for deserialization. It’s safer since it doesn’t allow arbitrary code to be executed. Always keep the risks in mind and choose safe methods!
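
The difference is easy to demonstrate. The sketch below builds a harmless pickle payload: the Payload class is hypothetical, and eval("6 * 7") stands in for whatever an attacker would really want to run.

```python
import json
import pickle

class Payload:
    # pickle calls __reduce__ to learn how to rebuild the object; the callable
    # returned here is invoked by pickle.loads() on the receiving side
    def __reduce__(self):
        return (eval, ("6 * 7",))

malicious_bytes = pickle.dumps(Payload())

# The receiver only wanted data, yet eval() ran inside pickle.loads()
result = pickle.loads(malicious_bytes)
print(result)  # 42

# json.loads() only builds plain data types and rejects everything else
try:
    json.loads(malicious_bytes)
    json_rejected = False
except ValueError:
    json_rejected = True
print(json_rejected)  # True
```

Replace the arithmetic with an OS command, and you get remote code execution. That’s why pickle should only be used with data you produced yourself and whose integrity you can verify.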

info

Vulnerabilities originating from deserialization errors are discovered in commercial products on a regular basis. For instance, such a vulnerability was found in GoAnywhere MFT in 2023. It enabled remote attackers to execute code without authentication.

Use the principle of least privilege

This principle states: assign to a program only those privileges that it actually needs to perform its task.

This is extremely important from the security perspective: if somebody hacks your program, the attacker will gain the same privileges as the program has. If its privileges are limited, the attacker’s malicious capacity will be limited as well.

Imagine that you have a function that writes data to a file:

def write_to_file(file_path, data):
    with open(file_path, 'w') as f:
        f.write(data)

This function doesn’t need any privileges besides the ability to write data to a specific file. But if you run this function with admin privileges, an attacker who hacks it will be able to do anything in the system.

Always assign to your functions and programs only the privileges they actually need and nothing more.
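
The same idea applies to the files your program creates. Below is a minimal sketch, assuming a POSIX system: the hypothetical write_private_file() helper creates the file so that only its owner can read or write it.

```python
import os
import stat
import tempfile

def write_private_file(file_path, data):
    # 0o600: read/write for the owner only, no access for group or others.
    # The mode applies when the file is created; an existing file keeps its mode.
    fd = os.open(file_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(data)

# Demo in a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "report.txt")
write_private_file(demo_path, "hello")
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
```

On Windows, permission bits work differently, so treat this as an illustration of the principle rather than a portable recipe.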

Avoid authentication- and authorization-related vulnerabilities

Secure user authentication is a broad topic with zillions of pitfalls. However, some of them can be easily avoided.

Secure password storage

Let’s start with what is absolutely unacceptable. Never (ever!) store passwords in clear text. For instance, like this:

users = {
    "alice": "password123",
    "bob": "qwerty321"
}

If these data leak somewhere (which is always possible), then all passwords of your users would be disclosed.

But how do you do this the right way? Use password hashing! Hashing is a one-way procedure that turns a password into a string of a fixed length. The same password always produces the same hash, while even a slight change in the original password completely changes its hash.

In Python, you can use the hashlib module for hashing. The example below shows how it works:

import hashlib
password = "password123"
hashed_password = hashlib.sha256(password.encode()).hexdigest()
print(hashed_password)

Now, even if the database leaks, hackers will only see password hashes, not the original passwords.

Add salt to passwords

Simple hashing is also far from perfect. Hackers can use so-called rainbow tables (precomputed tables of hashes) to guess passwords. To complicate their task, use salt: a random string added to the password prior to hashing. This way, each password will have a unique hash, even if two users have the same password.

import hashlib
import os
password = "password123"
salt = os.urandom(16)  # Generating salt
salted_password = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100000)
print(salted_password)

In the above example, the pbkdf2_hmac function from the hashlib module is used: it enables you to apply a salt to a password. Salt is generated using the os.urandom function, and then it’s used together with the password and number of iterations to create a hashed password.

In other words, even if two users have the same password, their hashes will be different because of different salts. This makes password guessing much more difficult.

However, now you have to store salt somewhere. Usually, salt and hash are stored together, for instance:

import hashlib
import os
password = "password123"
# Generating salt
salt = os.urandom(16)
salted_password = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100000)
# Storing salt together with hash
stored_password = salt + salted_password
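
Storing the salt next to the hash also makes verification straightforward. The sketch below shows one possible way to do it; the helper names and the fixed 16-byte salt size are assumptions that match the example above.

```python
import hashlib
import hmac
import os

SALT_SIZE = 16
ITERATIONS = 100_000

def hash_password(password):
    salt = os.urandom(SALT_SIZE)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, ITERATIONS)
    return salt + digest

def verify_password(password, stored_password):
    # Split the stored value back into salt and hash
    salt, expected = stored_password[:SALT_SIZE], stored_password[SALT_SIZE:]
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, ITERATIONS)
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(candidate, expected)

stored = hash_password("password123")
print(verify_password("password123", stored))  # True
print(verify_password("qwerty321", stored))    # False
```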

Now you are aware of basic tools that ensure secure password storage. Let’s proceed to input validation.

Validate input data thoroughly

Input validation is a cornerstone of secure programming. Yes, at first it seems to be extra work. But trust me, it will save you from many troubles in the long run.

When your program receives data from some external source, it’s imperative to ensure that the input matches the expected format and doesn’t contain malicious code. Remember: not all people are bona fide users. Some of them may try to hack your system. Therefore, be careful, especially if you use the old str.format() formatting method.

Below is an example of vulnerable code with str.format() (source: security.stackexchange.com):

from http.server import HTTPServer, BaseHTTPRequestHandler
secret = 'abc123'
class Handler(BaseHTTPRequestHandler):
    name = 'xakep'
    msg = 'welcome to {site.name}'
    def do_GET(self):
        res = ('<title>' + self.path + '</title>\n' + self.msg).format(site=self)
        self.send_response(200)
        self.send_header('content-type', 'text/html')
        self.end_headers()
        self.wfile.write(res.encode())
HTTPServer(('localhost', 8888), Handler).serve_forever()

This code starts a simple web server that processes a GET request, inserts the site name into the welcome message, and returns the resulting HTML. The site name is taken from the name attribute of the Handler class.

The key vulnerability of this code is that it uses the self.path value (i.e. part of the URL submitted by the user) as part of the format string in the res line. This enables an attacker to manipulate the format string, which can result in unwanted behavior.

Exploitation looks as follows:

$ python3 example.py

$ curl 'http://localhost:8888/test'
<title>/test</title>
welcome to xakep

But the attacker can access global variables as well:

$ curl -g 'http://localhost:8888/XXX{site.do_GET.__globals__[secret]}'
<title>/XXXabc123</title>
welcome to xakep

In this case, {site.do_GET.__globals__[secret]} is used to read the secret global variable whose value is abc123. When the server processes this request, it inserts the value abc123 into the page title.

This happens because self.path is controlled by the user, and you can use it to change the formatted string.

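So, what can you do? Never let user input become part of the format string. The simplest way is to use f-strings: the template is a literal in your source code, so user data can only be inserted as a value. On top of that, they work faster, and the code looks much better.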

In the above example, the following safe formatting should be used:

res = f"<title>{self.path}</title>\nwelcome to {self.name}"
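
The root of the problem can be reproduced without any web server. In the sketch below, the Site class and its api_key attribute are hypothetical; the point is the difference between user input used as a template and user input used as a value.

```python
class Site:
    name = 'xakep'
    api_key = 'abc123'  # hypothetical value that must never reach the user

site = Site()
user_text = '{site.api_key}'  # attacker-controlled input

# Unsafe: the user input becomes part of the template
unsafe = ('Hello ' + user_text + ', welcome to {site.name}').format(site=site)

# Safe: the template is a constant, and the user input is only a value
safe = 'Hello {user}, welcome to {site.name}'.format(user=user_text, site=site)

print(unsafe)  # Hello abc123, welcome to xakep
print(safe)    # Hello {site.api_key}, welcome to xakep
```

As you can see, str.format() itself isn’t the culprit: it becomes dangerous only when the format string is assembled from untrusted data.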

Sometimes you have to filter user input for certain characters. For instance, you may need to get the username, and it should contain only letters:

def say_hello(name):
    if not isinstance(name, str) or not name.isalpha():
        raise ValueError("The name must be a string and contain only letters")
    print(f"Hello {name}!")
try:
    user_input = input("Enter your name: ")
    say_hello(user_input)
except ValueError as e:
    print(f"Error: {e}")

In this example, you check whether the entered name is a string and whether it contains only letters. If the entered data don’t meet these requirements, an exception is raised. As a result, you avoid potentially dangerous situations associated with incorrect input.
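
Note that str.isalpha() accepts letters of any alphabet, not just Latin ones. If you need a stricter rule, an allow-list pattern is a common alternative. The policy below (3 to 20 ASCII letters, digits, or underscores) is just an assumption for the sake of the example.

```python
import re

# Hypothetical policy: 3-20 characters, ASCII letters, digits, or underscore
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")

def validate_username(name):
    # fullmatch() requires the whole string to fit the pattern
    if not isinstance(name, str) or USERNAME_RE.fullmatch(name) is None:
        raise ValueError("Invalid username")
    return name

print(validate_username("alice_01"))  # alice_01
try:
    validate_username("bob; rm -rf /")
except ValueError as e:
    print(f"Error: {e}")  # Error: Invalid username
```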

Input validation is important, but input sanitization is just as important. If you develop web apps, this operation becomes absolutely mandatory since it helps you avoid a wide range of attacks, including SQL injections and cross-site scripting (XSS).

In Python, the escape function and the Bleach library are used for this purpose.

The html module from the standard Python library includes the escape function. It converts special characters (e.g. <, >, &, and quotes) into their HTML equivalents. This allows user input to be safely displayed on web pages without the risk of executing malicious code.

The example below shows how the escape function can be used:

from html import escape
user_input = "<script>malicious_code();</script>"
safe_input = escape(user_input)
print(safe_input)

Result:

&lt;script&gt;malicious_code();&lt;/script&gt;

Bleach is a third-party library that offers a broader set of tools for HTML and text sanitizing and cleaning. Bleach can remove unwanted or potentially malicious tags and attributes from HTML.

Bleach application example:

import bleach
user_input = "<script>malicious_code();</script>"
safe_input = bleach.clean(user_input)
print(safe_input)  # Result: &lt;script&gt;malicious_code();&lt;/script&gt;

By default, bleach.clean() keeps a small set of harmless formatting tags and escapes all others. If you want to choose the allowed tags yourself, you can pass them with the tags parameter:

safe_input = bleach.clean(user_input, tags=['b', 'i', 'u'])

In this example, only the <b>, <i>, and <u> tags are allowed, while all others will be escaped. To remove disallowed tags instead of escaping them, pass strip=True as well.

Pay due attention to session management

Are you developing a web app? ~~Good luck to you then…~~ Seriously though, let’s discuss sessions. A session is a way to save data between user requests. When a user logs in, you create a session that continues until this user logs out (or the session times out).

Session management is a serious matter, and you can encounter multiple vulnerabilities there, including session hijacking and session fixation. Therefore, proper session management is critical.

Let’s briefly examine the basic principles.

Use secure cookies

The cookie mechanism is often used to store session identifiers. In such cases, you have to set the Secure and HttpOnly flags for your cookies. Secure means that cookies will only be transmitted over HTTPS, while HttpOnly prohibits access to cookies via JavaScript, so a cross-site scripting (XSS) attack can’t simply steal the session cookie.

from flask import session, Flask
app = Flask(__name__)
app.config.update(
    SESSION_COOKIE_SECURE=True,
    SESSION_COOKIE_HTTPONLY=True,
    SESSION_COOKIE_SAMESITE='Lax',
)
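
If you don’t use Flask, the same flags can be set with the standard library alone. Below is a minimal sketch based on http.cookies; the cookie name and value are placeholders.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "opaque-random-value"  # hypothetical placeholder
cookie["session_id"]["secure"] = True      # send over HTTPS only
cookie["session_id"]["httponly"] = True    # hide from JavaScript
cookie["session_id"]["samesite"] = "Lax"   # limit cross-site sending
cookie["session_id"]["path"] = "/"

# Ready-to-send Set-Cookie header line
header = cookie.output()
print(header)
```

The samesite attribute is supported by http.cookies starting from Python 3.8.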

Regenerate session ID

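Every time a user logs in or out of the system, the session ID should be regenerated. This prevents session fixation: an attack where the victim is tricked into using a session ID that the attacker already knows.

from flask import session
# `app` is the Flask application created in the previous example
@app.route('/login', methods=['POST'])
def login():
    # ...
    # Validating credentials
    # ...
    # Flask's built-in session object has no regenerate() method.
    # With the default cookie-based sessions, drop the pre-login state instead;
    # if you use a server-side session extension, check its documentation
    # for the way to rotate the session ID.
    session.clear()
    return "Logon successful!"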
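
What does regeneration actually mean? Below is a framework-agnostic sketch with a hypothetical in-memory session store: the session data are moved under a fresh random ID, and the old ID stops working. A real application would keep sessions in a database or cache rather than in a dict.

```python
import secrets

sessions = {}  # in-memory store: session ID -> session data (demo only)

def create_session(data=None):
    # token_urlsafe() produces an unpredictable, URL-safe identifier
    session_id = secrets.token_urlsafe(32)
    sessions[session_id] = dict(data or {})
    return session_id

def regenerate_session(old_id):
    # Move the data under a fresh random ID and invalidate the old one
    data = sessions.pop(old_id, {})
    return create_session(data)

anonymous_id = create_session({"cart": ["book"]})
logged_in_id = regenerate_session(anonymous_id)
sessions[logged_in_id]["user"] = "alice"

print(anonymous_id in sessions)      # False
print(logged_in_id != anonymous_id)  # True
```

Even if an attacker planted or learned the pre-login ID, it’s useless after the logon.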

Set session timeout

Infinite sessions are a bad thing. Always set a timeout for sessions.

from flask import Flask, session
from datetime import timedelta
app = Flask(__name__)
# Applies only to sessions marked with session.permanent = True (e.g. at login)
app.permanent_session_lifetime = timedelta(minutes=15)

Remember: sessions are a powerful tool, but they require due care.

Be cautious with eval() and exec()

Python has built-in functions called eval() and exec(); both of them execute Python code passed to them as a string, but with some differences.

The eval() function expects a string containing a Python expression and returns the value of this expression. For instance, if you pass '1 + 2' to the eval() function, it will return 3.

An example:

x = 1
print(eval('x + 1'))  # Result: 2

The exec() function executes a string containing one or more Python statements. Unlike eval(), it always returns None, but it can execute any statements contained in the string. For instance, you can use exec() to define new functions or classes.

An example:

exec('x = 1\ny = 2\nprint(x + y)') # Result: 3

In other words, the main difference between eval() and exec() is that eval() returns the value of an expression and can process only one expression; while exec() executes a block of code without returning a value.

But the catch is that these functions can execute code that wasn’t intended to be executed. And, of course, this opens the door to hackers. If an attacker can influence the arguments passed to eval() or exec(), this person can run any Python code with all the consequences that come with it.

Let’s have a look at good and bad examples of code with eval().

A bad example:

import os
def bad_eval(input_string):
    return eval(input_string)
# Imagine that the following string was received from the user
user_input = "os.system('rm -rf /')"
result = bad_eval(user_input)

In this example, eval() is used to execute the string entered by the user. As a result, a malicious user can enter a string that will delete all files on the disk.

A good example:

def good_eval(input_string):
    safe_list = ['+', '-', '*', '/', ' ', '4', '2']
    for i in input_string:
        if i not in safe_list:
            return "Error! Unsafe input."
    return eval(input_string)
# Even if the user attempts to enter dangerous code, nothing will happen
user_input = "4 / 2 * os.system('rm -rf /')"
result = good_eval(user_input)
print(result)
# Output: "Error! Unsafe input."

In this example, you limit what can be fed to eval() as input, thus, reducing risks. You create a list of safe characters and check the input: it must not contain anything other than these characters. If a character entered by the user is not on the list, the program returns an error message, and eval() isn’t executed.

However, even with such precautions, the use of eval() isn’t completely safe since you have to take into account all possible input options. This isn’t always feasible, especially when input becomes more complex.

The best way is to avoid eval() altogether if possible. There are many other ways to handle user input that don’t incur such risks.

For instance, if you have to process mathematical expressions, you can use specialized libraries (e.g. SymPy). They contain their own functions that parse and evaluate mathematical expressions.

The example below shows how SymPy can be used together with input filtering:

from sympy import sympify
def safe_eval(input_string):
    safe_list = ['+', '-', '*', '/', ' ', '4', '2']
    for i in input_string:
        if i not in safe_list:
            return "Error! Unsafe input."
    return sympify(input_string)
user_input = "4 / 2 * 2"
result = safe_eval(user_input)
print(result) # Will display 4

In this example, the sympify function from the SymPy library is used to evaluate a mathematical expression entered by the user. Keep the character check in place, though: according to the SymPy documentation, sympify relies on eval() internally and shouldn’t be fed unsanitized input.
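
Plain arithmetic can also be evaluated without eval() at all. The sketch below is one possible approach rather than a complete calculator: it parses the input with the ast module from the standard library and walks the tree, accepting only numbers and the four basic operators. The safe_arithmetic() name is made up for this example.

```python
import ast
import operator

# Only these operators are permitted; anything else is rejected
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_arithmetic(expression):
    def evaluate(node):
        # Numeric literals (bool is deliberately excluded)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        # Names, calls, attribute access, etc. end up here
        raise ValueError("Unsupported expression")
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("Unsupported expression") from exc
    return evaluate(tree.body)

print(safe_arithmetic("4 / 2 * 2"))  # 4.0
try:
    safe_arithmetic("4 / 2 * os.system('rm -rf /')")
except ValueError as e:
    print(f"Error: {e}")  # Error: Unsupported expression
```

The input is parsed but never executed, so a function call inside the expression is just a tree node that the evaluator refuses to process.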

The use of eval() can be justified: for instance, when you dynamically execute Python code received as a string. But even in such cases, be extremely cautious and always validate the input to avoid possible vulnerabilities.

Let’s examine some examples where eval() can be reasonably called.

The most obvious situation is when you create your own Python interpreter or REPL (read-eval-print loop). You may need eval() to execute code entered by the user.

while True:
    user_input = input(">>> ")
    try:
        print(eval(user_input))
    except Exception as e:
        print("Error: ", e)

Sometimes eval() is used to dynamically import modules. For instance, you may need to load some modules listed as string values. However, in such cases, it’s better to use the importlib library for this purpose.
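
A minimal sketch of this approach is shown below. The allow-list of module names is an assumption: the idea is that a string received from outside should never be able to load an arbitrary module.

```python
import importlib

# Hypothetical allow-list: only these modules may be loaded by name
ALLOWED_MODULES = {"json", "csv", "math"}

def load_module(module_name):
    if module_name not in ALLOWED_MODULES:
        raise ValueError(f"Module {module_name!r} is not allowed")
    return importlib.import_module(module_name)

math_module = load_module("math")
print(math_module.sqrt(16))  # 4.0
```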

Remember: eval() is a powerful tool, but with great power comes great responsibility. Use it with caution and only in situations when other options are unavailable.

Use Python virtual environment

A virtual environment is an isolated zone where specific versions of Python and libraries are installed. This mechanism protects your project from changes in the system. In addition, virtual environments provide some extra security.

How it works

Let’s say you are working on two projects: Project_A and Project_B. Project_A requires Django version 1.11; while Project_B requires Django version 2.2. If you install both Django versions globally, a version conflict would occur. A virtual environment solves this problem: it enables you to use two separate ‘copies’ of Python and libraries in each project.

To create a virtual environment for Project_A, open the terminal, go to the Project_A directory, and enter:

python3 -m venv env

This command creates a virtual environment called env. To activate this environment, use the following command:

source env/bin/activate
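
If you want to make sure that a script really runs inside the environment, you can check it from Python itself. The in_virtualenv() helper below is a small sketch that relies on the way the venv module sets sys.prefix.

```python
import sys

def in_virtualenv():
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
print(sys.prefix)
```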

Simple, reliable, and more secure! The reasons are as follows:

Dependency isolation. Each virtual environment has its own set of dependencies that are isolated from the system-level Python. By default, packages installed system-wide aren’t visible inside the environment, so a vulnerable system package won’t end up in your project by accident;
Version control. Virtual environments enable you to control versions of libraries and packages used in your projects. You can pin the specific versions you have reviewed and update them deliberately;
Lower risks. If you accidentally install a malicious package, it stays out of the system-level Python and your other projects. Keep in mind, though, that a virtual environment is not a sandbox: code inside it still runs with your user’s privileges;
Global space remains clean. If you install packages globally, it can create plenty of problems, especially if you work with different versions of Python. Virtual environments enable you to avoid this risk by keeping the global space clean and organized; and
Easy to reproduce and deploy. When you deploy an application on a server or send your code to another developer, virtual environments make it possible to easily reproduce the required conditions, including all dependencies.

Not only are virtual environments handy, but they also represent an important aspect of secure Python programming.

Conclusions

Now you know the secure coding basics. Adhere to them, and you will be less likely to introduce vulnerabilities, which makes your code more reliable. Always keep up with the latest threats and use best practices to minimize risks.

Remember: security isn’t something you can add at the end of development. You have to keep it in mind when you write every line of code. Yes, it takes more time and effort, but it will definitely pay off in the long run.
