Linux offers various utilities for handling extensive text files and live data streams.

The less command is useful for scrolling through and searching file contents, and you can pipe command output directly into it.

For very large files, the split utility can divide them into manageable pieces, while the head and tail commands let you view selected parts of their content.

If you need to process very large text files for information, Linux comes equipped with a suite of built-in tools perfect for the job. These tools are also effective with live text streams.

Linux often operates quietly; a lack of error messages generally means your operations are succeeding.

However, there are instances where Linux provides an overwhelming amount of data. Prior to the adoption of systemd and journalctl, the methods discussed here would primarily serve to manage extensive log files, yet these techniques are applicable to any type of file.

Furthermore, these methods can be used with data streams and the outputs from various commands.

The less command lets you scroll forward and backward through a file a line at a time using the Up Arrow and Down Arrow, or a full screen at a time using Page Up and Page Down, or you can move to the beginning or end of the file using the Home and End keys.
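
For example, to open a file in less (big-file.txt here is just a stand-in for whatever file you're working with):

less big-file.txt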

Line numbers can be displayed by including the -N (line numbers) option in the command line.
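
To turn line numbers on, add the -N option:

less -N big-file.txt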

Despite potentially dealing with hundreds of thousands of lines in a file, less remains quite responsive, even on a moderately equipped virtual machine.

Less also incorporates a search utility. To use it, press the forward slash /, enter your search term, and hit Enter. If there’s a match, less will show the part of the file containing the match with the term highlighted.

You can navigate forward between matches by pressing n. To navigate in reverse, press N.

With the use of pipes, the output of a command can be directly fed into less.
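
For example, piping the output of dmesg into less (dmesg is just one handy example of a command that produces a lot of output):

dmesg | less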

Similar to interacting with a file, you can move forward, move backward, and perform string searches within less.

Although piping a command’s output to less is effective for immediate needs, remember that once you exit less, that output disappears. If you anticipate needing this output later, it’s advisable to save it permanently and then view that saved copy in less.

This is easily done by redirecting the output of the command into a file, and opening the file with less.
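
As a sketch, using a recursive listing of the root filesystem as the example command and listing.txt as a placeholder file name:

ls -R / > listing.txt    # the listing is written to the file; any permission errors still appear on the terminal
less listing.txt         # browse the saved copy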

Note that warnings or errors are sent to the terminal window so that you can see them. If they were sent to the file, you might miss them. To read the file, we open it with less in the usual way.

The > operator redirects the output to the named file, and creates the file afresh each time. If you want to capture two (or more) different sets of information in the same file, you need to use the >> operator to append the data to the existing file.
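
For example (the directories here are just placeholders):

ls ~/Documents > listing.txt     # > creates or overwrites listing.txt
ls ~/Downloads >> listing.txt    # >> appends to the existing file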

Note the use of >> in the second command.

By capturing the output from a command, we’re actually capturing one of the three Linux streams. A command can accept input (stream 0, STDIN), generate output (stream 1, STDOUT), and can raise error messages (stream 2, STDERR).

We’ve been capturing stream 1, the standard output stream. If you want to capture error messages too, you’ll need to capture stream 2, the standard error stream, at the same time. We need to use this odd-looking construct 2>&1, which redirects stream 2 (STDERR) into stream 1 (STDOUT).
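
Sticking with the recursive listing as a placeholder example, the command looks like this:

ls -R / > listing.txt 2>&1    # STDERR is redirected into STDOUT, so error messages land in the file as well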

You can search the file for strings such as warning, error, or other phrases you know appear in the error messages.

If your files are so large that less slows down and becomes laggy, you can split the original file into more manageable chunks.

I’ve got a file named big-file.txt. This file contains 132.8 million lines of text, and its size exceeds 2GB.
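
If you want to check a file's vital statistics for yourself, wc counts the lines and ls -lh reports a human-readable size:

wc -l big-file.txt     # number of lines
ls -lh big-file.txt    # file size in human-readable form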

Using less allows for efficient browsing; however, activities such as jumping directly to the beginning or end of the file, or conducting reverse searches from the file’s end, are slower.

Although effective, the process is sluggish.

This is where the split command proves useful. As indicated by its name, it divides a large file into more manageable pieces, while keeping the original file intact.

You can divide a file into a specific number of smaller files, or by specifying a size for each split file, which then determines the number of resulting files. However, using these methods may result in split lines or even words across separate files.

Since we’re working with text files, splitting a line or a word across two files is often problematic. It is more practical to divide files by the number of lines instead. We used the wc command earlier to find the total number of lines.

For the splitting process, I employed the -l (lines) option to set the split at 500,000 lines per file. Additionally, I included the -d (digits) option for sequential file numbering and the -a (suffix length) option to ensure numbers are padded with zeros up to three digits. The prefix for the names of the split files is ‘chunk.’
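
Putting those options together gives a command along these lines:

# -l 500000: 500,000 lines per output file
# -d: numeric suffixes, -a 3: pad them to three digits
split -l 500000 -d -a 3 big-file.txt chunk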

In this scenario, we end up with 267 files (numbered from 000 to 266), generating more manageable file sizes for less powerful computers.

Each file contains 500,000 complete lines, apart from the last file, which holds however many lines were left over.

The head and tail commands let you take a look at a selection of lines from the top or end of a file.

By default, you’re shown 10 lines. You can use the -n (lines) option to ask for more or fewer lines.
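
For example, to see the first or last 20 lines of a file:

head -n 20 big-file.txt    # first 20 lines
tail -n 20 big-file.txt    # last 20 lines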

If you know the region within a file that interests you, you can select that region by piping the output from tail through head. The only quirk is, you need to specify the first line of the region you want to see counted backward from the end of the file, not forward from the start of the file.

The file contains half a million lines. To view 20 lines starting at line 1,240, tell tail to output the last 498,761 lines, which is 500,000 minus 1,239. Subtracting 1,239 rather than 1,240 means the output begins at line 1,240 itself.

The results from tail are then processed by head, which isolates the first 20 lines of this segment for viewing.
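
Assuming the half-million-line file is one of the chunk files from earlier (chunk000 here), the pipeline looks like this:

tail -n 498761 chunk000 | head -n 20    # the last 498,761 lines, then the first 20 of those: lines 1,240 to 1,259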

Checking the same region of the file in less, starting at line 1,240, shows that the displayed content matches.

An alternate method involves splitting the file starting from the desired line and utilizing head to capture and display the initial segment of the split file.
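
The exact command isn't shown here, but one way to read this is tail's '+' form, which outputs everything from a given line number onward, with head trimming the result:

tail -n +1240 chunk000 | head -n 20    # start at line 1,240 and keep the first 20 lines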

Another trick tail offers is monitoring changing data. If you have a file that is being updated, such as a log file, the -f (follow) option tells tail to display the bottom of the file whenever it changes.
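
For example, to watch a log file as new entries arrive (the path is just an example; pick a log file that exists on your system):

tail -f /var/log/syslog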

The grep command is very powerful. The simplest example uses grep to find lines in a file that contain the search string.
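
A minimal example, with 'average' standing in for whatever string you're searching for:

grep average big-file.txt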

We can add line numbers with the -n (line numbers) option so that we can locate those lines in the file if we want to. The -T (initial tab) option aligns the output on tab stops, making it easier to read.
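
Adding those options to the same search:

grep -n -T average big-file.txt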

Searching through multiple files is just as easy.
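
For example, searching all of the chunk files from earlier in one go:

grep -n -T average chunk*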

The filename and the line number are provided for every match.

Grep is also capable of searching through live streams of data.
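
For example, combining it with tail -f to filter a growing log file (the path and search term are placeholders again):

tail -f /var/log/syslog | grep error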

While using grep, you can employ more complex search criteria including regular expressions. It is advisable to refer to the manual pages of these commands, as they offer additional functionalities and options which may be beneficial depending on your specific needs.
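
As a small illustration (the pattern is just an example), the -E option enables extended regular expressions, letting you match either of two words:

grep -E 'error|warning' big-file.txt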

