Python from absolute zero. Learning to work with strings, files, and the Internet

One day, Crocodile Gena and Cheburashka were asked to write an essay on the topic “How I Spent My Summer.” The problem was that the friends had spent the whole summer drinking beer. Gena, who can’t lie, wrote it exactly that way, so Cheburashka had to replace some of the words. And since Cheburashka was a Python coder, he did it with a string method. In this article, I’ll show you how to keep up with Cheburashka: you will learn how to work with strings and files and how to make requests to websites in Python.

From the editors

Recently, we conducted a survey among our readers and found that many of them would like to learn Python from scratch. As an experiment, we published the article “Python from Absolute Zero. Learn to code without boring books”, which covered the basics of Python: variables, conditions, loops, and lists. The feedback was positive, and we decided to continue introducing readers to Python in our signature fun style.

This article, like the previous one, is available without a paid subscription, so feel free to share these links with your friends who dream of learning Python!

Let’s start with strings. To solve the friends’ problem, Cheburashka used the .replace() method, which replaces one substring in a string with another.

First, he declared a variable s and stored in it the text that Gena had sent him.

s = 'We drank beer all summer. So one day I open the door and there on the threshold is Cheburashka, all drunk, drunk, and a bottle is sticking out of his pocket.'

Next, Cheburashka defined a dictionary of the words that needed to be replaced.

slova = {'drank':'read', 'beer':'books', 'drunk':'well-read', 'bottle':'encyclopedia'}

Then, using a for loop, Cheburashka went through the dictionary and replaced each word (the key) with the corresponding value from the dictionary (slova[key]):

for key in slova:
    s = s.replace(key, slova[key])
print(s)
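One detail worth knowing: strings in Python are immutable, so .replace() never changes the original string. It returns a new one, which is why the loop assigns the result back to s. The method also accepts an optional third argument, the maximum number of replacements. A small illustration (the sample string is made up for this example):

```python
s = 'drunk, drunk, drunk'
# Replace only the first occurrence; s itself stays unchanged
t = s.replace('drunk', 'well-read', 1)
print(s)  # drunk, drunk, drunk
print(t)  # well-read, drunk, drunk
# Without the third argument, every occurrence is replaced
print(s.replace('drunk', 'well-read'))  # well-read, well-read, well-read
```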

info

Dictionaries are much like lists, but their elements are stored in pairs: a key and a value. You look up a value by its key. If the keys of a list are its indices (0, 1, 2…), the keys of a dictionary can be, for example, strings.
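Here is a quick sketch of the basic dictionary operations, using a shortened copy of Cheburashka’s slova dictionary: looking up a value by key, adding a pair, checking whether a key exists, and going through all the pairs. The key 'vodka' is made up to show a missing key.

```python
slova = {'drank': 'read', 'beer': 'books'}
# Look up a value by its key
print(slova['beer'])  # books
# Add a new key-value pair
slova['bottle'] = 'encyclopedia'
# Check whether a key is present
print('vodka' in slova)  # False
# .get() returns a default value instead of raising KeyError for a missing key
print(slova.get('vodka', 'no replacement'))  # no replacement
# Iterate over keys and values at the same time
for key, value in slova.items():
    print(key, '->', value)
```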

The .replace() method is also useful for removing words from a string entirely: just replace them with an empty string (an opening quotation mark immediately followed by a closing one gives you an empty string):

s = '''I do not like drinking beer.
It's tasteless and unhealthy!'''
s = s.replace('not ', '')
print(s)

info

To store text that spans several lines in a variable, wrap it in triple quotes (''' or """) and put the line breaks directly in the code.

To get the number of characters in a string, use the len() function.

s = 'If you really cannot sit still, write the code any way you can!'
n = len(s)
print(n)

And, as I mentioned in the previous article, you can take slices of strings just as you do with lists: specify the start and end positions of the substring in square brackets after the variable. Positions are counted from zero, and the character at the end position is not included in the slice.

s = 'My name is Bond, James Bond'
a = s[11:15]
print('Last name: ' + a)

If the slice starts at the beginning of the string, you can omit the first index: s[:5] is the same as s[0:5].
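A few more slicing forms, using the same string as in the Bond example: omitting the start, omitting the end, and counting from the end with a negative index. In every case the character at the end index is excluded.

```python
s = 'My name is Bond, James Bond'
print(s[:7])     # My name     (from the start up to index 7, exclusive)
print(s[17:])    # James Bond  (from index 17 to the end)
print(s[-4:])    # Bond        (the last four characters)
print(s[17:22])  # James
```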

Let’s say you need to find the strings in a list that start with https. We iterate over them with for, check whether the first five characters of each string are https, and if so, print the string:

mas = [ 'This is just a string', 'https://xakep.ru', 'Another string', 'https://habr.ru' ]
for x in mas:
    if x[:5] == 'https':
        print(x)
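Comparing a slice by hand works, but you have to count the characters yourself. Strings also have the built-in .startswith() method (and its counterpart .endswith()), which reads more clearly and can take a tuple of several prefixes. A sketch of the same filter; the extra 'http://xakep.ru' string is made up for the tuple example:

```python
mas = ['This is just a string', 'https://xakep.ru', 'Another string', 'https://habr.ru']
found = []
for x in mas:
    # Same check as x[:5] == 'https', without counting characters
    if x.startswith('https'):
        found.append(x)
print(found)  # ['https://xakep.ru', 'https://habr.ru']
# A tuple of prefixes matches if any one of them matches
print('http://xakep.ru'.startswith(('http://', 'https://')))  # True
print('https://habr.ru'.endswith('.ru'))  # True
```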

To count the number of occurrences of a substring in a string, you can use the .count() method:

s = 'Guess what, in short, I bam him with an exploit on the port, and he, in short, crashed right away!'
n = s.count('short')
print(n)
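Keep in mind that .count() looks for exact, case-sensitive matches and does not count overlapping occurrences. If case should not matter, convert the string to lowercase first with .lower(). An illustration with made-up strings:

```python
s = 'In short, short, SHORT!'
print(s.count('short'))          # 2 ('SHORT' does not match)
print(s.lower().count('short'))  # 3
# Matches do not overlap: 'aaaa' contains 'aa' only twice
print('aaaa'.count('aa'))        # 2
```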

Sometimes there are extra spaces or line breaks at the beginning or end of a string. You can remove them with the .strip() method:

s = 'There is no such thing as too much beer!  \n'
s = s.strip()
print(s)
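.strip() has two one-sided relatives, .lstrip() (left side only) and .rstrip() (right side only), and all three accept an optional string of characters to remove instead of whitespace. A short sketch with made-up strings; repr() is used so the remaining spaces are visible:

```python
s = '  ***beer***  \n'
print(repr(s.strip()))             # '***beer***'
# Strip the asterisks too by passing them as an argument
print(repr(s.strip().strip('*')))  # 'beer'
print(repr('  beer  '.lstrip()))   # 'beer  '
print(repr('  beer  '.rstrip()))   # '  beer'
```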

info

A line break is written as \n, which works in all operating systems; Windows text files traditionally use the \r\n pair. There are other special characters as well: for example, \t is a tab.

To determine whether a substring exists in a string s, you can use the .find() method:

n = s.find('the string we are looking for')

If the substring is found, its position in the string (the index of its first character) is stored in the variable n; if not, n is equal to -1.
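If you only need a yes/no answer, the in operator is the more readable choice; .find() is handy when you also need the position. A short comparison on a made-up string:

```python
s = 'Contact: vasya@xakep.ru'
print('@xakep.ru' in s)     # True
print(s.find('@xakep.ru'))  # 14 (the index of the @ character)
print(s.find('@mail.ru'))   # -1 (not found)
```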

Let’s try to determine if the string contains an email address from Xakep.ru, that is, we will look for the substring @xakep.ru.

But first, we need one more string method: .split(). It splits a string into parts, using the separator string you pass as an argument. For example, s.split('\n') splits text into lines at the line break character. If you leave the parentheses empty, the string is split on whitespace (spaces, tabs, and line breaks).

s = 'This is a normal string, and it contains the email address vasya@xakep.ru'
words = s.split()
for w in words:
    n = w.find('@xakep.ru')
    if n != -1:
        print('Found email: ' + w + ' (@xakep.ru starts at position ' + str(n) + ' of this word)')
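Calling .split() without arguments and .split(' ') are not quite the same. Without arguments, any run of whitespace (spaces, tabs, line breaks) counts as one separator and empty strings are dropped; an explicit separator is taken literally, so two spaces in a row produce an empty string. A second argument limits the number of splits. The sample strings are made up:

```python
text = 'one  two\tthree\n'
print(text.split())     # ['one', 'two', 'three']
print(text.split(' '))  # ['one', '', 'two\tthree\n']
# Split only once, at the first colon
print('login:pass:extra'.split(':', 1))  # ['login', 'pass:extra']
```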

The .join() method does the opposite: it glues strings together. It takes a list of strings and returns a single string in which the elements are joined by the string you called the method on.

separator = ', '
list1 = ['one', 'two', 'three']
print('Virus is being introduced: ' + separator.join(list1) + '...')
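Note that .join() accepts only strings: joining a list of numbers raises a TypeError, so convert each element with str() first (map(str, ports) applies str() to every element). Joining with '\n' is also the usual way to assemble several lines of text. A sketch with a made-up list of ports:

```python
ports = [22, 80, 443]
# ', '.join(ports) would raise TypeError, because the elements are ints
line = ', '.join(map(str, ports))
print(line)  # 22, 80, 443
# Glue lines together with the newline character
text = '\n'.join(['first line', 'second line'])
print(text)
```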

Formatting strings

So far, we have printed things by joining strings with simple addition. This is not always convenient, especially since numbers first have to be converted to strings with the str() function. There is a neater and more convenient way to substitute variable values into strings; more precisely, two slightly different ones.

Method 1 – using the .format() method

You can put pairs of curly braces into a string as placeholders, then call the string’s .format() method and pass it the values in the order in which they should be substituted.

name = 'Vasya Pupkin'
age = 20
address = 'Pushkin street, Kolotushkin house'
info = 'Name: {}. Age: {}. Address: {}'.format(name, age, address)
print(info)

You can also pass the values as a list by unpacking it with an asterisk:

data = ['Vasya Pupkin', 20, 'Pushkin street, Kolotushkin house']
info = 'Name: {}. Age: {}. Address: {}'.format(*data)
print(info)
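Placeholders can also carry a position number or a name, which lets you reuse a value or pass arguments by keyword, and a format specification after a colon, such as :.2f for two decimal places. A few illustrative lines with made-up values:

```python
# Positional indices: {0} is used twice
a = '{0}, {1}, {0}!'.format('beer', 'books')
# Named placeholders are filled from keyword arguments
b = 'Name: {name}. Age: {age}'.format(name='Vasya Pupkin', age=20)
# Format specification: a float with two digits after the point
c = 'Pi is about {:.2f}'.format(3.14159)
print(a)  # beer, books, beer!
print(b)  # Name: Vasya Pupkin. Age: 20
print(c)  # Pi is about 3.14
```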

Method 2 – via f-strings

Another option is to put the letter f before the opening quote and write the variables directly inside the curly braces.

name = 'Vasya Pupkin'
age = 20
address = 'Pushkin street, Kolotushkin house'
info = f'Name: {name.upper()}. Age: {age}. Address: {address}'
print(info)

The main advantage of this method is readability: each variable appears right where its value will go, and you can use the same variable in a string as many times as you like. In addition, you can put any expression inside the braces: Python first evaluates it and then substitutes the result into the string. For example, the .upper() method in the example above converts all letters to uppercase.
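To illustrate, here is a sketch with made-up values: arithmetic inside the braces, a format specification after a colon (the same mini-language as in .format()), and one variable used twice.

```python
price = 2.5
qty = 4
name = 'Vasya Pupkin'
# The expression is evaluated first, then formatted with two decimals
total = f'Total: {price * qty:.2f}'
# Any expression works, including arithmetic and method calls
birthday = f'{name} will be {20 + 1} next year'
# The same variable can appear several times
twice = f'{name}, {name.lower()}'
print(total)     # Total: 10.00
print(birthday)  # Vasya Pupkin will be 21 next year
print(twice)     # Vasya Pupkin, vasya pupkin
```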

Files

These methods are enough to do almost anything you want with strings. But where do the strings come from? Most often they are stored in files, so now I will show you how to work with files in Python.

To work with a file, you need to open it. The open() function is used for this, and it works like this:

f = open('file name with path and extension', 'file mode', encoding='Text encoding')

There are several file modes, but these are the ones you will need most:

r: open a file for reading;
w: open a file for writing (creates a new file; if the file already exists, its contents are erased);
a: open a file for appending (new data is added to the end of the file, which is created if it doesn’t exist);
a+: open a file for both appending and reading.

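To avoid problems with backslashes in Windows paths, double them (\\), or put the letter r before the opening quote to make a raw string, in which backslashes are not treated as special characters. In older code you may also see the letter u before the quote; in Python 3 every string is already Unicode, so it is not needed:

f = open('D:\\test.txt', 'r', encoding='UTF-8')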

You can read the entire contents of a file into a string with the .read() method:

f = open('test.txt', 'r', encoding='UTF-8')
s = f.read()
print(s)

Alternatively, you can read the file line by line with a for loop. Each line keeps its trailing \n, so we remove it before printing to avoid blank lines in the output:

f = open('test.txt', 'r', encoding='UTF-8')
for x in f:
    print(x.rstrip('\n'))

Once you have finished working with the file, you need to close it.

f.close()

info

To work with binary files, add the letter b to the mode when opening a file:

f = open('myfile.bin', 'rb')
d = f.read()
print("d = ", d)

We will talk more about binary data in one of the following articles.
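To see what binary mode returns, here is a minimal round trip: write a few raw bytes with 'wb' and read them back with 'rb'. The result is a bytes object, not a str. The file name demo.bin is made up, the file lives in a temporary directory so nothing is left behind, and the with construction used here simply closes the file automatically (it is explained a little further on).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.bin')  # hypothetical file name
    # Write four raw byte values
    with open(path, 'wb') as f:
        f.write(bytes([0, 1, 2, 255]))
    # Read them back in binary mode
    with open(path, 'rb') as f:
        d = f.read()

print(type(d))  # <class 'bytes'>
print(d)        # b'\x00\x01\x02\xff'
print(list(d))  # [0, 1, 2, 255]
```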

Let’s now try to create a new text file in the same directory as our script and write the values of some variables into it.

s1 = 'One, two, three, four, five\n'
s2 = 'I am going to break the server...\n'
f = open('poems.txt', 'w', encoding='UTF-8')
f.write(s1)
f.write(s2)
f.close()

Note that each string ends with \n, the newline character, which moves the text to a new line.

Let’s say you want to add a third line to the end of this file. This is where append mode comes in handy!

s3 = 'Oh, they will get tired of fixing it!\n'
f = open('poems.txt', 'a', encoding='UTF-8')
f.write(s3)
f.close()

It is also very convenient to open files with the with open('file name with path and extension', 'file mode') as f construction: thanks to with, the file is closed automatically when the block ends (even if an error occurs), so you don’t have to think about it.

s = 'If you close this file, your disk will be formatted!\nJoke\n'
with open('test.txt', 'w', encoding='UTF-8') as f:
    f.write(s)
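Putting the pieces together, here is a minimal sketch that writes poems.txt with 'w', appends a line with 'a', and reads the result back line by line, all through with. To keep the example tidy, it works in a temporary directory; in your own script you would simply use the file name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'poems.txt')
    # 'w' creates the file (or erases an existing one)
    with open(path, 'w', encoding='UTF-8') as f:
        f.write('One, two, three, four, five\n')
        f.write('I am going to break the server...\n')
    # 'a' adds to the end without erasing what is already there
    with open(path, 'a', encoding='UTF-8') as f:
        f.write('Oh, they will get tired of fixing it!\n')
    # Read line by line, removing the trailing newline from each line
    lines = []
    with open(path, 'r', encoding='UTF-8') as f:
        for line in f:
            lines.append(line.rstrip('\n'))
    print(f.closed)  # True: with has already closed the file

for line in lines:
    print(line)
```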

Working with the web

Let’s learn how to get information from web pages. First, you need to install two third-party modules. Type in the command line:

pip install requests
pip install html2text

The requests module lets you send GET and POST requests to web pages. The html2text module converts the HTML code of a web page into readable plain text, stripping out the HTML tags.

We import our new modules at the beginning of the program and try to get some page from the Internet.

import requests
# Make a GET request
s = requests.get('http://xakep.ru')
# Print the server response code
print(s.status_code)
# Print the HTML code
print(s.text)

The program will print out a lot of HTML code that makes up the magazine’s main page. But what if you just want the site text, not a jumble of tags? Here html2text will help. It will extract text, headings and images from the code and return them without HTML tags.

import requests
import html2text
# Make a GET request
s = requests.get('http://xakep.ru')
# Server response code
print(s.status_code)
# An instance of the parser is created
d = html2text.HTML2Text()
# A parameter that affects how links are parsed
d.ignore_links = True
# Text without HTML tags
c=d.handle(s.text)
print(c)

In addition to GET requests, there are so-called POST requests, which are used to send large texts or files to the server. If you see a form on a website, especially with a file upload, then most likely, when you click the “Submit” button, a POST request will be made.

The requests library can send POST requests as well. This is useful for simulating user actions, for example, if you need to automate work with a website. You can even use it as a homemade Burp alternative!

Let’s see how to send a regular POST request. Suppose the site.ru website has a guest.php script that receives a user name (the user field) and a message (the message field) from a form via a POST request and then posts them to the guestbook.

import requests
# Variables to be sent via POST request
user = 'coolhacker'
message = 'You have been pwned!!!'
# We make a POST request and pass a dictionary of fields
r = requests.post("http://site.ru/guest.php", data={'user': user, 'message': message})
print(r.status_code)
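What does data= actually send? For a plain dictionary, requests encodes the fields as an application/x-www-form-urlencoded body, the same format a browser uses for an ordinary HTML form. You can preview that encoding offline with the standard library’s urllib.parse.urlencode. This is only an illustration of the format, not a call into requests itself:

```python
from urllib.parse import urlencode

user = 'coolhacker'
message = 'You have been pwned!!!'
# Spaces become '+', and characters such as '!' are percent-encoded
body = urlencode({'user': user, 'message': message})
print(body)  # user=coolhacker&message=You+have+been+pwned%21%21%21
```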

Now let’s send the payload.php file as an attachment, along with the same two form fields as in the previous request, this time to an upload.php script. The file will arrive at the server under the name misc.php.

import requests
user = 'kitty2007'
message = '(* ^ ω ^)'
# Open the file in binary mode
with open('payload.php', 'rb') as f:
    # files maps a form field name to a (file name, file object) pair;
    # the field name 'file' is an assumption: use the one the target form expects
    r = requests.post('http://site.ru/upload.php', files={'file': ('misc.php', f)}, data={'user': user, 'message': message})
print(r.status_code)

All that’s left is to learn how to download files. This is a lot like requesting pages, but it’s best done in streaming mode (stream=True), so that the file is not loaded into memory all at once. We will also need the shutil module and its convenient copyfileobj function, which copies data from one file-like object to another: in our case, from the network stream to a file on disk.

import requests
import shutil
import os
# File to download
s = 'https://xakep.ru/robots.txt'
# Using the os.path.split(s) function, we extract the path to the file and its name from the string
dirname, filename = os.path.split(s)
# GET request in stream=True mode to download a file
r = requests.get(s, stream=True)
# If the server response is successful (200)
if r.status_code == 200:
    # Create a file and open it in binary mode for writing
    with open(filename, 'wb') as f:
        # Decode the data stream based on the content-encoding header
        r.raw.decode_content = True
        # Copying data stream from the Internet to a file using the shutil module
        shutil.copyfileobj(r.raw, f)

info

Server response codes tell you how your request was handled. Code 200 means that the server successfully processed the request and returned a response, 404 means the page was not found, 500 is an internal server error, 503 means the service is unavailable, and so on. A full list of status codes can be found on Wikipedia.
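If you don’t want to memorize the codes, the standard library’s http.HTTPStatus enumeration knows their official names. Its members are integers, so a comparison such as r.status_code == HTTPStatus.OK also works. A small sketch:

```python
from http import HTTPStatus

# Print the standard reason phrase for each code
for code in (200, 404, 500, 503):
    print(code, HTTPStatus(code).phrase)
# 200 OK
# 404 Not Found
# 500 Internal Server Error
# 503 Service Unavailable

# Members compare equal to plain integers
print(HTTPStatus.OK == 200)  # True
```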

Error handling

Before we move on to a more real-world example, I need to show you one more language construct that is indispensable when working with files and the network: exception handling, that is, handling errors.

Programs often run into problems while running: a file is not found, the network is down, the disk runs out of space. If the programmer hasn’t anticipated this, the Python interpreter simply stops the program with an error. But you can anticipate problems right in the code and keep working: that is what the try... except construct is for.

It looks like this:

try:
    # There are some commands here,
    # which may lead to an error
    pass
except:
    # Our actions if an error occurs
    pass

You can catch specific types of errors by specifying the type name after the except keyword. For example, KeyboardInterrupt is raised when the user tries to stop the program by pressing Ctrl+C. It is in our power to prevent this!

Heck, we can even allow division by zero if we catch the ZeroDivisionError error. This is what it will look like:

try:
    k = 1 / 0
except ZeroDivisionError:
    k = 'over 9000'
print(k)
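A single try can have several except clauses, one for each error type, and except SomeError as e gives you the error object with its description. A sketch with a made-up safe_divide helper:

```python
def safe_divide(a, b):
    """Hypothetical helper: divide a by b, handling two kinds of errors."""
    try:
        result = a / b
    except ZeroDivisionError:
        # Division by zero: return the joke value from the example above
        result = 'over 9000'
    except TypeError as e:
        # Wrong operand types, e.g. a string instead of a number
        result = 'bad input: ' + str(e)
    return result

print(safe_divide(10, 4))   # 2.5
print(safe_divide(1, 0))    # over 9000
print(safe_divide('1', 2))  # bad input: unsupported operand type(s) for /: 'str' and 'int'
```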

www

Full list of exception types

Writing a port scanner

Now we’ll write our own port scanner! It will be simple, but quite functional. The socket module, which implements work with sockets, will help us with this.

info

A socket is an interface for exchanging data between processes. There are client and server sockets. The server socket listens on a specific port waiting for clients to connect, and the client socket connects to the server. Once a connection has been established, data exchange begins.

This is what the code will look like.

import socket
# List of ports to scan
ports = [20, 21, 22, 23, 25, 42, 43, 53, 67, 69, 80, 110, 115, 123, 137, 138, 139, 143, 161, 179, 443, 445, 514, 515, 993, 995, 1080, 1194, 1433, 1702, 1723, 3128, 3268, 3306, 3389, 5432, 5060, 5900, 5938, 8080, 10000, 20000]
host = input('Enter the site name without http/https or IP address: ')
print("Please wait, port scanning in progress!")
# In a loop, we iterate through the ports from the list
for port in ports:
    # Create a socket
    s = socket.socket()
    # Set a timeout of one second
    s.settimeout(1)
    # Catch errors
    try:
        # Try to connect, passing host and port as a tuple
        s.connect((host, port))
    # If the connection caused an error (port closed, timeout, host unreachable)
    except socket.error:
        # then we do nothing
        pass
    else:
        # No error: the port is open
        print(f"{host}: {port} active")
    # Close the socket whether or not the connection succeeded
    s.close()
print("Scanning complete!")

As you can see, nothing complicated!
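The same logic can be packed into a reusable function, which also makes the first homework task easier. This sketch (scan_ports is a hypothetical name) uses the socket method connect_ex(), which returns 0 on success and an error code, instead of raising an exception, when the connection fails. Each socket is opened with with, so it is always closed. Problems such as an unresolvable host name can still raise an exception, so they are caught too.

```python
import socket

def scan_ports(host, ports, timeout=1.0):
    """Hypothetical helper: return the ports on host that accept a TCP connection."""
    open_ports = []
    for port in ports:
        # with closes the socket automatically at the end of each iteration
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                # connect_ex returns 0 if the connection succeeded
                if s.connect_ex((host, port)) == 0:
                    open_ports.append(port)
            except OSError:
                # For example, the host name could not be resolved
                pass
    return open_ports

# Usage: print(scan_ports('127.0.0.1', [22, 80, 443]))
```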

Homework

Make the port scanner get the list of IPs from one file and write the scan results to another.
In the previous article, you learned how to work with the clipboard. Write a program that runs continuously and periodically reads the clipboard contents. Whenever the contents change, it should append them to the end of monitoring.txt. Try to log only the intercepted strings that contain Latin letters and digits: this way you are more likely to catch passwords.
Write a program that reads a file like this:
Ivan Ivanov|ivanov@mail.ru|Password123
Dima Lapushok|superman1993@xakep.ru|1993superman
Vasya Pupkin|pupok@yandex.ru|qwerty12345
Frodo Baggins|Frodo@mail.ru|MoRdOr100500
Kevin Mitnick|kevin@xakep.ru|dontcrackitplease
User Userson|uswer@yandex.ru|aaaa321
The program should sort the lines by the domain of the email address, create a file for each domain, and write the list of that domain’s email addresses into it.
Write a program that goes through a list of sites, downloads their robots.txt and sitemap.xml files, and saves them to disk. If a file is not found, the program should display a message about it.

That’s all for today. In the next article, you’ll learn how to work with the OS file system, understand functions, discover the power of regular expressions, and write a simple SQL vulnerability scanner. Don’t miss it!
