System crashes

Sunday, 18 February 2007, michuk

Every desktop operating system crashes. Nobody who builds software this complex can avoid it, not even Microsoft. The failures come in hundreds of forms: the wireless network stops working for no apparent reason, or a hibernated computer refuses to wake up. Drivers cause instability as well when they are badly written or applied to unsupported hardware. Installed programs overwrite the settings of other applications, causing them to behave abnormally. I’m sure every one of you has at least once encountered a strange message box with some completely out-of-context information that “something did not succeed”, or even blank pop-up windows appearing during your system’s start-up.

This happens in GNU/Linux, MS Windows, MacOS, Haiku, Syllable and others. The choice of operating system does not prevent it. What an operating system can do, however, is provide the means to cope with those problems in a standard way. I will be frank here: MS Windows is the only OS in which I simply cannot diagnose most of the errors. I’m sorry, but I don’t have the slightest idea where to look when a driver crashes or some software becomes unstable. Really. And here is why.

Full automation

Windows has been designed with “normal” users in mind. It does its best to hide most low-level operations, such as mounting file systems, running daemons (programs that run in the background) or even installing hardware drivers (the famous “plug & play”). Windows always attempts to do things automatically. Another Microsoft rule states that users should not have to make decisions in matters where they lack the competence. This concerns things like choosing a file system (NTFS is the de facto standard), the sound system, the preferred desktop environment, default applications and settings. And this is all very thoughtful (no irony here!). Full automation is what desktop operating systems should offer. So the Microsoft system does a pretty good job of making decisions for its users and taking actions without their knowledge or consent. In most cases this approach makes the lives of many unaware users much easier: they get one more problem solved for them at no cost.

Unfortunately (yes, there’s always a ‘but’), there is one major issue with this approach. When the magic fails, finding the reason for the failure and fixing it manually can be hell. Used to automation, we trust the OS blindly. We don’t know what operations our system performs, so we can hardly do anything reasonable when things go wrong. Indeed, it is not easy to find a cause when we don’t even know where to look!

Most GNU/Linux distributions work in a somewhat different manner. Most of the magic is still there, but it is the user who decides whether to use it or not. Most of the operations usually performed by wizards and background processes can also be performed manually (provided that one has the knowledge such tasks require) by editing configuration files or using lower-level console-based helper apps. This way, the user who needs automation still gets it… but is not condemned to using it! I hope the difference is clear enough.

Pic.2 MS Access error. Here we have a slight chance: the app returned an error code!

System logs

Trying to find the cause of an error in MS Windows is somewhat like searching for a needle in a haystack. The main reason is the lack of detailed system logging that registers the incidents occurring while the OS runs. In GNU/Linux, by contrast, every time the OS detects an incompatible video driver, a corrupted networking configuration, or a reference to a non-existent or protected file, an appropriate message explaining what happened is stored in a log. This makes it much easier to analyze the cause of errors and prepare a recovery plan. And even if the error message does not mean anything to you, there is always a fair chance that it can be understood by a specialist (and you can contact one through forums, IRC and such). Even the creators of the first UNIX systems back in the 1970s understood that proper logging is a key feature of an OS. Unfortunately, MS Windows still lacks logging at this level of detail, and the seldom-helpful event logging system available in XP puts a smile on the faces of UNIX experts around the world.
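To make the idea of "looking in the log" concrete, here is a minimal Python sketch that filters log lines for likely trouble. The keyword list and the sample entries are made up for illustration. On many distributions the real logs live under /var/log (for example /var/log/syslog or /var/log/messages), but the exact file name depends on the distribution, so treat it as an assumption.

```python
import re
from typing import Iterable, List

# Hypothetical keyword list, chosen for illustration only.
ERROR_PATTERN = re.compile(r"\b(error|fail(ed|ure)?|segfault|denied)\b", re.IGNORECASE)


def find_error_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a likely problem."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]


if __name__ == "__main__":
    # Made-up entries written in the traditional syslog style.
    sample = [
        "Feb 18 10:01:02 box kernel: eth0: link up\n",
        "Feb 18 10:01:05 box kernel: ipw2200: firmware load failed\n",
        "Feb 18 10:02:11 box sshd[812]: Permission denied for user guest\n",
    ]
    for hit in find_error_lines(sample):
        print(hit)
```

To try it on a real log, pass an open file object instead of the sample list. Reading system logs may require root privileges, depending on the distribution.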

Verbose and debug mode

Most Windows programs (even ones as simple as Notepad or Paint) are closed source, which more or less eliminates any chance of serious code debugging and error analysis by the user. Proprietary applications are just supposed to work. If they don’t, the vendor needs to correct the errors, which can take months or may never happen. Open source programs, on the other hand, are written with the community in mind. They can usually be run in a verbose or debug mode that lets the user see the operations the program is performing as it runs.
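For readers who have never seen a verbose switch from the inside, the following is a rough sketch of how a program might offer one, using Python's standard argparse and logging modules. The toy program, the `copy_settings` function and the `-v/--verbose` flag name are all hypothetical. Real applications pick their own flag names, so check the manual of the program you are debugging.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Command line for a toy program with a single verbose switch."""
    parser = argparse.ArgumentParser(description="Toy program with a verbose switch")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print every step the program performs")
    return parser


def log_level(verbose: bool) -> int:
    """Show debug messages only when the user asked for them."""
    return logging.DEBUG if verbose else logging.WARNING


def copy_settings(source: dict, log: logging.Logger) -> dict:
    """Hypothetical task that reports each step at debug level."""
    result = {}
    for key, value in source.items():
        log.debug("copying %s=%r", key, value)
        result[key] = value
    log.debug("copied %d settings", len(result))
    return result


if __name__ == "__main__":
    args = build_parser().parse_args()
    logging.basicConfig(level=log_level(args.verbose),
                        format="%(levelname)s: %(message)s")
    copy_settings({"theme": "dark", "font_size": 12}, logging.getLogger("toy"))
```

Run without arguments, the script stays silent. Run with `--verbose`, it reports each step, which is the kind of trace that helps when something misbehaves.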

Pic.3 Internet Explorer critical error. When we encounter this, the only thing we can do is call Bill the Great :)

Console mode recovery

The console is a powerful feature of GNU/Linux and UNIX in general and I’m going to cover its role in detail in part III of this article. For now, I just want to mention that it’s irreplaceable in case of major system crashes.

It happens every now and then that the GUI stops responding to mouse gestures and key strokes. Usually the problem is that a single program (process) has taken over the processor, making all other processes unresponsive for a while. If the problem is more severe, for instance the program enters an infinite loop or tries to use a corrupted driver, the only option may be to kill it (force it to close). In Windows you usually do this by pressing CTRL+ALT+DEL and closing the program manually from the Task Manager window. Unfortunately, if the Task Manager doesn’t respond, we can hardly do anything but wait.

Not so if you’re using GNU/Linux. In Linux you can usually leave the graphical mode and switch to a text console by pressing CTRL+ALT+F1. It often works even when the GUI seems unresponsive. Once you are logged in on the console, you can easily check which program is causing the instability (the top, ps, and lsof commands are useful here) and kill it manually (kill -9 process_id). Moreover, even if the console doesn’t want to show up or works extremely slowly (this may happen in very severe cases), you can still connect to your computer from another machine (of course you need another machine on your network to do so :P, and an SSH server running on the frozen one) and execute the same kill command over an SSH connection. This works just fine and, in the majority of cases, saves you from a forced reboot and from risking your precious data. It always surprises me that after the nasty app is killed, the desktop comes back as if nothing had happened and I can get back to work immediately.

Nice thing, right? Status in MS Windows: not yet implemented.
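The console routine described above (list the processes, spot the greedy one, kill it) can be sketched in a few lines of Python. This is only an illustration: it assumes the process listing comes from `ps -eo pid,pcpu,comm`, the sample listing and the process names in it are invented, and `force_kill` is a hypothetical helper that does what `kill -9` does.

```python
import os
import signal
from typing import List, Optional, Tuple

Row = Tuple[int, float, str]  # (pid, cpu percent, command name)

# Invented sample of what `ps -eo pid,pcpu,comm` might print.
SAMPLE = """\
  PID %CPU COMMAND
    1  0.0 init
 2301 97.5 runaway-app
 2288  1.2 Xorg
"""


def parse_ps(output: str) -> List[Row]:
    """Turn the text printed by `ps -eo pid,pcpu,comm` into rows."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((int(parts[0]), float(parts[1]), parts[2].strip()))
    return rows


def busiest(rows: List[Row]) -> Optional[Row]:
    """Return the row with the highest CPU usage, or None for an empty list."""
    return max(rows, key=lambda row: row[1], default=None)


def force_kill(pid: int) -> None:
    """Same effect as `kill -9 pid`; works only on processes you may signal."""
    os.kill(pid, signal.SIGKILL)


if __name__ == "__main__":
    # Only inspects the canned sample; nothing is killed here.
    print("busiest process:", busiest(parse_ps(SAMPLE)))
```

The script deliberately never calls `force_kill` on its own. SIGKILL gives the program no chance to save its data, so it is worth double-checking the process ID before sending it.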

Error logging and debugging give us, the users, a chance to detect almost any unwanted system behavior and fix it manually or ask someone to fix it for us (e.g. on Usenet, IRC or online forums). It’s way easier to fix errors when we know what has failed. With the Windows black box, we can only guess what might have gone wrong and try to find the possible cause. You can choose the approach that sounds saner to you.


2 Comments

Noob_IN_linuxOS, Wednesday, 12 March 2008, 7:00 am

Could anyone guide me on how to use the Linux console, and which commands would work if I do not have USER privileges?
thank you
Linux fan

 
marco, Thursday, 3 April 2008, 2:49 am

If you need to know which program is meant for which task, the command “apropos” can help you find the right manpage.

In a shell, type “apropos” followed by a keyword and press Enter.

Example:

marco@siduxbox: # apropos password
passwd (1) - change the password of a user

If you find an interesting manpage, you can read it with the man command.

Example:

marco@siduxbox: # man passwd

 