I think most Python programmers are quite familiar with using the builtin open function to open a file, by which I mean a real file stored on disk, like a txt file. But we can also use the same open function to open a file descriptor. Normally, this is a more convenient way to read and write than calling os.read and os.write directly, which are both low-level interfaces. In this blog post I'll share what I know about opening a file descriptor.
What is a file descriptor?
Technically speaking, a file descriptor is just a non-negative integer which
represents an I/O resource within a process, such as a disk file, a
pipe, a socket or a device. We often use fd as a shorthand name.
Everything is a file in Linux!
The best-known file descriptors in each process are 0, 1 and 2, which represent stdin, stdout and stderr respectively. They are set up automatically for each process by the operating system.
We can check these file descriptors in Python:
>>> import sys
>>> sys.stdin.fileno()  # fileno() returns the file descriptor number
0
>>> sys.stdout.fileno()
1
>>> sys.stderr.fileno()
2
When you open a file stored on disk, there is a hidden file descriptor you might not have noticed before:
>>> with open('testfile.xml') as f:
...     print(f.fileno())
...
3 # it could be any non-negative number bigger than 2
But normally we don't need to know it.
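When the number is needed, it is an ordinary int that the low-level os functions accept. Here is a minimal sketch, assuming a throwaway temporary file (the file and its contents are made up for illustration), which asks the operating system for the file size through the descriptor:

```python
import os
import tempfile

# A temporary file stands in for any file on disk (hypothetical content).
with tempfile.NamedTemporaryFile('w', suffix='.txt') as f:
    f.write('hello')
    f.flush()                    # push buffered data down to the descriptor
    fd = f.fileno()              # the hidden descriptor, a plain int
    size = os.fstat(fd).st_size  # os.fstat takes the descriptor, not the object

print(type(fd).__name__, size)
```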
What's hidden when opening a file on disk?
When you open a file on disk, the open system call is invoked to open the file in the mode you specified, and it returns a file descriptor to represent the file. Python then wraps this file descriptor in a so-called file-like object. What we get in our code is this file-like object.
A file-like object is good because it gives us more convenient ways to read and write the file, and a buffer in user space is also created for the I/O resource. We can avoid low-level interfaces such as os.read and os.write, which are more difficult to handle.
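To make the difference concrete, here is a rough sketch that reads the same file twice: once with os.read only, and once by wrapping the descriptor with the builtin open. The helper names and the demo file are hypothetical, chosen just for this illustration.

```python
import os
import tempfile

def read_lines_low_level(path):
    """Read all lines with os.open/os.read only; we do the buffering ourselves."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while chunk := os.read(fd, 4096):  # os.read returns b'' at end of file
            chunks.append(chunk)
    finally:
        os.close(fd)
    return b''.join(chunks).decode().splitlines()

def read_lines_wrapped(path):
    """Get a descriptor from os.open, then wrap it with the builtin open."""
    fd = os.open(path, os.O_RDONLY)
    with open(fd) as f:  # closing f also closes fd (closefd=True by default)
        return [line.rstrip('\n') for line in f]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.txt')
    with open(demo_path, 'w') as out:
        out.write('first\nsecond\n')
    low = read_lines_low_level(demo_path)
    wrapped = read_lines_wrapped(demo_path)

print(low == wrapped)
```

Both functions return the same lines, but the wrapped version gets line iteration, decoding and buffering for free.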
Example: read sys.stdin after redirection
Here is an example that demonstrates how to read stdin in a Python script. The code in this example is quite useful when you want your program to be aware of input that is redirected through a pipe. Almost all command line tools support this.
First, we have a Python script file called openfd.py:
import sys

if not sys.stdin.isatty():
    # opening a file descriptor is just like opening a disk file
    with open(sys.stdin.fileno()) as f:
        while line := f.readline():
            print('*', line, end='')
        print()
The code above first checks whether stdin is a TTY with the sys.stdin.isatty() method. When no pipe or redirection is used on the command line, isatty returns True, which means the input comes from the keyboard. Otherwise, isatty returns False, which means stdin is redirected and the input data comes from some source other than the keyboard.
If a pipe is involved, the code opens the file descriptor of stdin (returned by sys.stdin.fileno()) and reads from it. Here is the point: we open a file descriptor, not a disk file, with the same builtin open. Then readline is called on the returned file-like object f. The convenient readline method exists only on the file-like object, not on the bare descriptor.
Finally, a little * is added to the head of each line for pretty display.
Then we can test it from the shell:
$ echo -en 'abcde\n12345\npython\nCS4096.com' | python openfd.py
* abcde
* 12345
* python
* CS4096.com
Perfect! You can see that the code read 4 lines from stdin, which was connected to the output of the echo command.
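One detail of openfd.py is worth knowing: by default, closing the object returned by open also closes the descriptor underneath, so after the with block descriptor 0 is closed. If the descriptor should stay open, the builtin open accepts closefd=False. The sketch below uses a pipe instead of the real stdin, purely so that it can run anywhere; the sample data is made up.

```python
import os

r, w = os.pipe()                  # r is the read end, w is the write end
os.write(w, b'abcde\n12345\n')
os.close(w)                       # closing the write end signals end of input

# closefd=False: closing f leaves the descriptor r open
with open(r, closefd=False) as f:
    lines = list(f)

still_open = True
try:
    os.fstat(r)                   # raises OSError if r had been closed
except OSError:
    still_open = False
os.close(r)                       # now we close it ourselves

print(lines, still_open)
```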
Example: autopass
This is a very tiny project I found on Github.com. It has fewer than 200 lines of code, but it offers a very convenient function on the Linux platform: entering the password for ssh, sudo and scp automatically.
In it you can find code that opens the file descriptor of a pipe, then reads and writes. There are also a few lines that use os.read and os.write to manipulate an unseekable file descriptor. I think it is a very good tiny project for studying many different important things. I hope you enjoy it. :)
Project address: https://github.com/xinlin-z/autopass
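The snippet below is not taken from autopass. It is only a small sketch of the same two ideas, under the assumption of a plain os.pipe: the write end is opened with the builtin open, and the read end is read with os.read. The string written is a placeholder.

```python
import os

r, w = os.pipe()

# Wrap the write end in a file-like object and write text to it.
with open(w, 'w') as writer:
    writer_seekable = writer.seekable()  # a pipe cannot seek
    writer.write('secret\n')
# leaving the with block flushes the buffer and closes w

# Read the other end with the low-level interface.
data = os.read(r, 1024)
os.close(r)

print(writer_seekable, data)
```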
Now you understand what a file descriptor is, and how and why you should use it through the builtin open function provided by Python. There are many more details to explore on this topic, and they are impossible to cover in a single blog post. The most important thing is that you are now equipped with the essential concept. So, good luck! :)
