Making Code Safer, Starting Here-Jade Guqin Logic

Keeping Secrets

Have you ever wondered why passwords are always the first to be stolen when websites are hacked? This is because many programmers don't know how to securely store passwords. Don't worry, I'll teach you a simple and effective method!

We know that plaintext passwords should not be stored directly in the database, because if the database is attacked, all passwords will be leaked. So how should passwords be stored? The answer is to use hash functions!

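A hash function is a one-way function that converts input of any length into output of a fixed length. It is not encryption: encryption can be reversed with a key, while a hash cannot be reversed at all. In other words, the original password cannot be computed directly from the hash value. This way, even if the database is stolen, attackers do not get the plaintext passwords. They have to guess candidate passwords and hash each guess, which a deliberately slow algorithm makes expensive.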
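To make the "fixed length" and "one-way" ideas concrete, here is a small sketch using SHA-256 from the standard library. The helper name sha256_hex is made up for this illustration. SHA-256 is used here only to show the properties of hash functions; a fast general-purpose hash like this is not suitable for storing passwords on its own.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string (illustration only)."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"abc")
long_digest = sha256_hex(b"a much longer input " * 1000)

# Fixed-length output: both digests are 64 hex characters (256 bits).
print(len(short_digest), len(long_digest))

# Deterministic: equal inputs give equal digests, so two users with the
# same password would share a hash if no salt were mixed in.
print(sha256_hex(b"hunter2") == sha256_hex(b"hunter2"))

# A one-character change in the input gives a different digest.
print(sha256_hex(b"hunter2") == sha256_hex(b"hunter3"))
```

The second print is the reason salts matter: without a salt, identical passwords are visible as identical hashes, and precomputed tables become practical.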

In Python, we can use the bcrypt library to hash passwords. bcrypt is deliberately slow, with an adjustable cost factor, which makes large-scale guessing expensive. It also mixes a random salt into every hash, which defeats rainbow table attacks. Let's see how to use it:

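import bcrypt

# bcrypt operates on bytes, so encode str passwords before hashing
password = b"my_secret_password"

# gensalt() creates a random salt; hashpw() embeds it in the resulting hash
hashed = bcrypt.hashpw(password, bcrypt.gensalt())
print(hashed)  # Output similar to: b'$2b$12$kGMcbBfd3KFkSjRfUQaWQu9dNOyUXoEONj5vMdtxqhyTJPVXXXXXX'

# checkpw() re-hashes the candidate with the stored salt and compares the results
if bcrypt.checkpw(password, hashed):
    print("Password matches")
else:
    print("Password does not match")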

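See, using bcrypt is very simple! We first use the hashpw() function to hash the password, and then use the checkpw() function to check a submitted password against the stored hash during verification. This way, even if the database is leaked, attackers still have to guess each password one by one, which is slow for strong passwords. Weak passwords can still be guessed, so it is worth enforcing a sensible password policy as well.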
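If installing bcrypt is not an option, the same salt-then-slow-hash idea can be sketched with PBKDF2 from the standard library. Note that this swaps in a different algorithm than bcrypt. The function names and the ITERATIONS value are assumptions made for this illustration, not recommendations from the bcrypt project.

```python
import hashlib
import hmac
import os

# Illustrative work factor (an assumption); tune it upward for your hardware.
ITERATIONS = 200_000

def hash_password(password: str):
    """Return (salt, derived_key) for a new password."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected_key)

salt, key = hash_password("my_secret_password")
print(verify_password("my_secret_password", salt, key))
print(verify_password("wrong_password", salt, key))
```

Unlike bcrypt, which packs the salt and cost into one string, this sketch leaves you to store the salt (and ideally the iteration count) next to the derived key yourself.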

However, one thing to note is that the hash value generated by bcrypt is irreversible, so if users forget their passwords, you will not be able to retrieve the original passwords. In this case, you need to provide a password reset feature.
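A password reset feature usually revolves around a one-time token sent to the user. The following is a minimal sketch under stated assumptions: the function names, the record layout, and the 15-minute lifetime are all hypothetical, and sending the email and saving the record are left out. Only a hash of the token is stored, for the same reason we do not store plaintext passwords.

```python
import hashlib
import hmac
import secrets
import time

RESET_TTL_SECONDS = 15 * 60  # assumed token lifetime

def create_reset_token(now=None):
    """Return (token_to_send_to_user, record_to_store_in_database)."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # unguessable random token
    record = {
        "token_hash": hashlib.sha256(token.encode("ascii")).hexdigest(),
        "expires_at": now + RESET_TTL_SECONDS,
    }
    return token, record

def is_reset_token_valid(token, record, now=None):
    """Check expiry first, then compare hashes in constant time."""
    now = time.time() if now is None else now
    if now > record["expires_at"]:
        return False
    candidate = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, record["token_hash"])

token, record = create_reset_token()
print(is_reset_token_valid(token, record))
```

A plain SHA-256 is acceptable here, unlike for passwords, because the token is long and random rather than something a human chose. A real implementation would also delete the record once the token has been used.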

Dodging Bullets

We've talked a lot about passwords, so let's talk about something else! As a Python developer, you often need to get input from users, such as form data and query parameters. But have you ever thought about the security risks hidden in user input, such as SQL injection and cross-site scripting attacks?

An SQL injection attack is when an attacker inserts malicious SQL fragments into the input fields of a web application in order to read or modify data in the database. For example, entering this in the login box:

' OR '1'='1

If you directly concatenate this input into an SQL statement and execute it, it's equivalent to running:

SELECT * FROM users WHERE username = '' OR '1'='1';

Since '1'='1' is always true, this statement will return all records in the users table, causing data leakage.

So how do we prevent this? I'll teach you a trick: use parameterized queries! Each database library has its own syntax for this. Taking SQLAlchemy as an example:

from sqlalchemy import create_engine, text

engine = create_engine('sqlite:///mydatabase.db')
with engine.connect() as connection:
    username = "' OR '1'='1"
    result = connection.execute(text("SELECT * FROM users WHERE username = :username"), {"username": username})

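With a parameterized query, the SQL text and the input values are sent to the database driver separately, so the input is always treated as data and never parsed as SQL. That is what defeats the injection. You can get the same protection from an ORM, which builds parameterized queries for you. Manually escaping input and concatenating it into SQL is error-prone and best avoided; parameterized queries are the simplest and most effective way.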
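To see the difference with your own eyes, here is a small self-contained sketch that uses the standard library's sqlite3 module instead of SQLAlchemy, so it runs without any setup. The table contents are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)

malicious = "' OR '1'='1"

# Vulnerable: the input becomes part of the SQL text itself.
unsafe_sql = "SELECT * FROM users WHERE username = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a value and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT * FROM users WHERE username = ?", (malicious,)
).fetchall()

print(len(unsafe_rows))  # 2: every row leaks
print(len(safe_rows))    # 0: no user has that literal name
```

The concatenated query returns the whole table, while the parameterized one simply looks for a user whose name is the odd-looking string and finds nobody.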

In addition to SQL injection, we also need to pay attention to other security risks when handling user input, such as cross-site scripting attacks (XSS), file upload vulnerabilities, etc. Let me teach you a trick to validate user input.

Data Inspector

Before processing user input, we need to validate it to make sure it has the form we expect. For example, when registering a new user, you may need to verify that the email address format is correct. We can use regular expressions to implement this:

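import re

def validate_email(email):
    # A simplified pattern: local part, "@", domain, and at least one dot
    pattern = r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$'
    # fullmatch() also rejects a trailing newline, which match() with $ would accept
    return re.fullmatch(pattern, email) is not None

print(validate_email("user@example.com"))  # Output: True
print(validate_email("invalid@email"))  # Output: False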

This function uses a regular expression that covers common email formats. It is a simplification, and it will reject some unusual but valid addresses. If you're not familiar with regex, that's okay. Python has many third-party libraries for validating user input, such as Cerberus.

from cerberus import Validator

schema = {
    'email': {'type': 'string', 'regex': r'^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$'},
    'password': {'type': 'string', 'minlength': 8}
}

validator = Validator(schema)

document = {
    'email': 'user@example.com',
    'password': 'mypassword'
}

if validator.validate(document):
    print("Input is valid")
else:
    print(validator.errors)

Cerberus allows you to define validation rules such as field types, length limits, regular expressions, and then perform batch validation on input data. This method is not only more concise but can also validate multiple fields simultaneously, which is very convenient.

Input validation is a small step, but it's crucial for the security of applications. It narrows the room for injection attacks (it complements parameterized queries and output escaping rather than replacing them), and it also avoids exceptions when the program handles malformed input. So whether you use regex or third-party libraries, be sure to validate user input before processing it.
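For the XSS risk mentioned earlier, validation is usually paired with escaping at output time. Here is a minimal sketch using html.escape from the standard library. The render_comment function is a made-up example; template engines such as Jinja2 can do this escaping for you when autoescaping is enabled.

```python
import html

def render_comment(comment: str) -> str:
    """Wrap user-supplied text in a paragraph, escaping HTML special characters."""
    return "<p>" + html.escape(comment) + "</p>"

payload = "<script>alert('xss')</script>"
print(render_comment(payload))
# <p>&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;</p>
```

The browser now displays the script as harmless text instead of running it.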

Watch Out for Traps

When building Web APIs, security is also an issue that cannot be ignored. We need to not only protect the API from external attacks but also prevent data from being stolen during transmission.

First, we must use HTTPS to encrypt data in transit. HTTPS establishes secure connections through the TLS protocol (the successor to SSL), which protects against man-in-the-middle attacks and ensures that data is not eavesdropped on or tampered with during transmission. In Python, we can use the requests library to send HTTPS requests. It verifies the server's certificate by default:

import requests

response = requests.get('https://api.example.com/data')
print(response.text)
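Much of the protection HTTPS gives comes from certificate verification, so it should never be switched off (for example, by passing verify=False to requests) outside of throwaway experiments. As a rough sketch of what "verification on" means, this is the equivalent default configuration in the standard library's ssl module. The TLS 1.2 minimum is an assumption about what your servers support.

```python
import ssl

# The default context is configured for secure client connections.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # the certificate must validate
print(context.check_hostname)                    # the hostname must match the certificate

# Optionally refuse protocol versions older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

A context like this can be passed to standard library clients such as urllib.request.urlopen through its context parameter.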

Another thing to note is authentication and authorization. We must ensure that only authorized users can access sensitive data and functions in the API. Common authentication methods include password-based methods (HTTP Basic Auth) and token-based methods (OAuth2, JWT, etc.).

Taking Flask as an example, we can quickly implement HTTP basic authentication using the flask-httpauth extension:

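from flask import Flask
from flask_httpauth import HTTPBasicAuth
from werkzeug.security import generate_password_hash, check_password_hash

app = Flask(__name__)
auth = HTTPBasicAuth()

# Store password hashes rather than plaintext passwords
users = {
    "admin": generate_password_hash("secret")
}

@auth.verify_password
def verify_password(username, password):
    # Return the username on success; returning None rejects the request
    if username in users and check_password_hash(users[username], password):
        return username
    return None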

@app.route('/api/data')
@auth.login_required
def get_data():
    return "This is some sensitive data"

if __name__ == '__main__':
    app.run()

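In this example, we defined a simple dictionary of users and their credentials. The verify_password function is called on each request to verify the user's identity, and only authenticated requests can access the data returned by the /api/data route. Because HTTP Basic Auth sends the username and password with every request in an easily decoded form (Base64), it should only be used over HTTPS.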
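The token-based methods mentioned above work differently: the client logs in once and then presents a signed token instead of the password. The sketch below shows only the core idea, using an HMAC signature from the standard library. It is a simplified illustration, not an implementation of JWT or OAuth2, and SECRET_KEY, issue_token, and verify_token are hypothetical names. For real projects, a maintained library is the safer choice.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing key; in practice, load a long random value from configuration.
SECRET_KEY = b"replace-with-a-random-secret"

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")

def issue_token(username, ttl_seconds=3600, now=None):
    """Return 'payload.signature', both parts Base64-encoded."""
    now = time.time() if now is None else now
    payload = json.dumps({"sub": username, "exp": now + ttl_seconds}).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)

def verify_token(token, now=None):
    """Return the username if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid Base64
        return None
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # payload was tampered with or signed with another key
    claims = json.loads(payload)
    if now > claims["exp"]:
        return None  # token has expired
    return claims["sub"]

token = issue_token("admin")
print(verify_token(token))
```

The signature is checked before the payload is parsed, so nothing in the token is trusted until it has been shown to come from the server.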

In addition, when handling API requests, we need to pay attention to some other security best practices, such as:

Limiting request rates to prevent brute force attacks
Using CSRF protection (such as anti-CSRF tokens) to prevent cross-site request forgery
Filtering and validating user input
Minimizing error information to avoid leaking sensitive information
Using the latest versions of libraries and frameworks to fix known vulnerabilities
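As an illustration of the first item in the list above, request rate limiting can be sketched as a sliding window kept in memory. The class name and parameters are made up for this example. A real deployment would more likely rely on a shared store or an existing extension, since an in-process dictionary does not work across multiple server processes.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

limiter = SlidingWindowLimiter(max_requests=5, window_seconds=60)
print([limiter.allow("203.0.113.7") for _ in range(6)])
```

In a login endpoint, a False result would typically translate into an HTTP 429 response, which slows brute-force guessing to a crawl.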

Protecting your API will build a solid defense for your application.

Hiding Secrets

When writing Python programs, we inevitably need to use some sensitive information, such as database passwords and API keys. If this information leaks, it poses a serious security risk to the application. So how do we properly keep these "secrets"?

The least wise approach is to hardcode key information directly in the code, because once the code is leaked, the secrets will have nowhere to hide.

A relatively safe approach is to store the keys in a configuration file and then read that file in the code. However, this method also has a flaw: if you upload the configuration file to a public code repository (such as GitHub), the secrets will still be leaked.

So, the best practice I recommend is: use environment variables to store keys!

Environment variables are key-value pairs that the operating system hands to a process when it starts, so they live outside the code. We can set these variables when deploying the application without ever writing them into the source.

In Python, you can use the os.getenv() function to read environment variables:

import os

db_password = os.getenv('DB_PASSWORD')

If the environment variable doesn't exist, getenv() will return None. You can also provide a default value as the second parameter.
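For secrets, a default value is rarely what you want: it is usually better to stop at startup than to run with a missing password. Here is one possible sketch. The helper name require_env and the LOG_LEVEL variable are assumptions made for illustration.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail fast if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Non-sensitive settings can fall back to a default value.
log_level = os.getenv("LOG_LEVEL", "INFO")

# Secrets should not have defaults; this raises RuntimeError when DB_PASSWORD is unset:
# db_password = require_env("DB_PASSWORD")
```

Failing early like this turns a confusing database error later on into a clear message at startup.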

For local development environments, we can create a .env file in the project root directory to store environment variables. Remember to add .env to your .gitignore so that it never ends up in the code repository:

DB_PASSWORD=mysecretpassword

Then use the python-dotenv library to load this file when the program starts:

import os

from dotenv import load_dotenv

load_dotenv()  # load variables from .env into the process environment
db_password = os.getenv('DB_PASSWORD')

For production environments, we need to set environment variables on the server. Different operating systems and cloud platforms have different ways of doing this, so consult the corresponding documentation.

In addition to keys, we can also store other sensitive data in environment variables, such as JWT secrets and message queue connection strings. Keeping these values out of the code and out of version control removes one of the most common ways secrets leak. Just take care not to print them in logs or error messages either.

Summary

Through this blog post, we've learned some best practices for ensuring application security in Python programming:

Use libraries like bcrypt to securely store password hash values
Use parameterized queries or an ORM to avoid SQL injection
Strictly validate user input, and escape it on output, to reduce the risk of attacks like XSS
Use HTTPS encryption for transmission in Web APIs, implement authentication and authorization
Use environment variables to store sensitive information like keys to avoid leakage

I hope these practices can help you write more secure and reliable Python programs. After all, as people often say: "There is no absolute security in programs, only sufficient security." Let's start now and add a "security lock" to our code!