It wasn't quite the blue screen of death. The system wanted me to fix some errors on the disk partition manually. I was uncomfortable. Just 2 weeks earlier, the same thing had happened. The command 'fsck -y' had fixed all the errors! At the end, the only visible folder was 'lost+found'.
Even when I booted from another partition, I couldn't mount the damaged partition to take a backup until I had cleaned it up with fsck. On the previous occasion, I had changed the SATA data cable, which seemed to have fixed the problem. This time, at the suggestion of the hardware supplier, I changed the power cable as well. The system seemed more stable now. It was no longer grinding to a near halt with the log reporting “ata4: hard resetting link”. However, it was too soon to rejoice.
I finally gave in and ran fsck. It cleaned up quite a few files and directories. The system booted with errors and X wouldn't run. This was just the system partition, with nothing more than the Fedora 9 installation, though I had added a fair number of additional packages. It would probably have been faster to just reformat the partition and reinstall the OS. This time, I had taken the precaution of caching the downloaded RPMs on a different partition! So, the 24-hour download would not be needed for the updates and the additional packages.
However, this seemed to be an interesting problem. Could I recover a system that was so badly trashed? Based on the problems noticed, I used 'rpm -V' on some packages and found that some libraries were missing. Some packages were trying to access information beyond the partition. To make matters worse, I had been in the middle of an update (my wife would say: when am I not?).
The first step was to at least measure the scope of the problem. I took a list of all the installed packages.
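A minimal sketch of this step in present-day Python 3, assuming the list comes from 'rpm -qa'. The function names parse_names and installed_packages are mine, chosen for illustration.

```python
import subprocess


def parse_names(text):
    """Return the non-empty lines of `rpm -qa` output as a list of package names."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def installed_packages():
    """Run `rpm -qa` and return the installed package names (needs rpm on the PATH)."""
    result = subprocess.run(
        ["rpm", "-qa"], capture_output=True, text=True, check=True
    )
    return parse_names(result.stdout)
```

Keeping the parsing apart from the call to rpm means the parsing can be tried out on any machine, even one without rpm installed.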
I wasn't about to manually verify each one of the 1500 or so packages! So, a small Python script would be useful:
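A sketch of what such a script could look like, assuming that a package counts as damaged whenever 'rpm -V' prints anything for it. The names rpm_verify and damaged_packages are hypothetical, and the verify parameter exists only so the loop can be exercised without touching a real RPM database.

```python
import subprocess


def rpm_verify(package):
    """Return whatever `rpm -V package` prints; an empty string means no complaints."""
    result = subprocess.run(["rpm", "-V", package], capture_output=True, text=True)
    return result.stdout.strip()


def damaged_packages(packages, verify=rpm_verify):
    """Yield (package, report) for every package whose verification prints something."""
    for package in packages:
        report = verify(package)
        if report:
            yield package, report
```

Counting the results, for example with len(list(damaged_packages(names))), gives the size of the problem.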
It was a relief to know that only about 400 packages were in a damaged state! Even this was too large a number to manually handle. Surprisingly, there were some version issues. This turned out to be because there were multiple entries for some packages thanks to the failed update.
Fortunately, I had written a utility over a year ago to solve that problem. A combination of a power failure and the UPS running out of battery in the middle of an upgrade had left my system in an inconsistent state.
The utility pieces were as follows. We create a dictionary of package names with attributes like version, release, arch which will help identify duplicates.
Especially on an x64 architecture, a package with the same name may also be installed for the i386 architecture. That is not a duplicate. So, for each name and arch combination, we need to check whether there are any duplicates.
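A sketch of such a dictionary, keyed on the (name, arch) pair. It assumes the input lines are tab-separated, as produced by a query such as rpm -qa --qf '%{NAME}\t%{VERSION}\t%{RELEASE}\t%{ARCH}\n'. The function names build_package_table and duplicated are illustrative only.

```python
from collections import defaultdict


def build_package_table(lines):
    """Map (name, arch) to the list of (version, release) pairs installed for it."""
    table = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # ignore blank or malformed lines
        name, version, release, arch = fields
        table[(name, arch)].append((version, release))
    return table


def duplicated(table):
    """Keep only the (name, arch) keys that have more than one installed version."""
    return {key: versions for key, versions in table.items() if len(versions) > 1}
```

With this layout, glibc for x86_64 and glibc for i386 land under different keys, so neither is reported as a duplicate of the other.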
The actual work of checking duplicates is done in chk_dups. We assume that there is only one package with the maximum version.
If I were writing this program today, I would have avoided the filter function and used a list comprehension instead, e.g.
newPkg = [ x for x in dup_pkgs if (x,x) == max_version]
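To make the idea concrete, here is a hedged sketch of a chk_dups written with a list comprehension. It assumes each entry in dup_pkgs is a (version, release) pair, and it compares versions with a simple "natural" ordering (natural_key is my own helper). Real RPM version comparison has more rules than this, so treat the ordering as an approximation.

```python
import re


def natural_key(text):
    """Split text so that digit runs compare as integers: '3.10' sorts after '3.9'."""
    parts = re.split(r"(\d+)", text)
    # re.split with a capturing group alternates: text, digits, text, digits, ...
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


def chk_dups(dup_pkgs):
    """Return (newest, older) for a list of (version, release) pairs.

    Assumes, as the article does, that exactly one entry has the maximum version.
    """
    def key(pair):
        version, release = pair
        return (natural_key(version), natural_key(release))

    max_version = max(dup_pkgs, key=key)
    older = [pair for pair in dup_pkgs if pair != max_version]
    return max_version, older
```

The older list is what would be handed on for removal; the newest entry is kept.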
While I could have deleted the rpm's in the program, I felt more comfortable getting a list of duplicate package names and then deleting them from the command line.
Now as root, I ran:
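One way to remove packages from the command line is rpm's erase option, 'rpm -e', which accepts labels of the form name-version-release.arch. The sketch below only builds such a command line from a list of duplicates and does not run it. The dictionary layout and the function names are assumptions made for illustration.

```python
def package_label(pkg):
    """Format a package as name-version-release.arch."""
    return "{name}-{version}-{release}.{arch}".format(**pkg)


def erase_command(duplicates):
    """Build (but do not run) an `rpm -e` command line for the given packages."""
    return ["rpm", "-e"] + [package_label(pkg) for pkg in duplicates]
```

Printing " ".join(erase_command(older)) gives a line that can be reviewed before it is pasted into a root shell.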
Having deleted some packages, I needed to get a fresh list of the installed packages and the packages which failed the verification.
The next step was to reinstall all the packages with problems. Since the RPMs were in various subdirectories of /var/cache/yum, I collected all of them in '/opt/yum/RPMS/'. The script used was:
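A minimal sketch of such a script, under two assumptions of mine: the cached files all end in '.rpm', and the reinstall is done with 'rpm -Uvh --replacepkgs', which asks rpm to install a package even when the same version is already recorded as installed. The function names are hypothetical.

```python
import os
import shutil


def collect_rpms(cache_dir, target_dir):
    """Copy every *.rpm found under cache_dir into target_dir; return the new paths."""
    os.makedirs(target_dir, exist_ok=True)
    copied = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for filename in sorted(filenames):
            if filename.endswith(".rpm"):
                destination = os.path.join(target_dir, filename)
                shutil.copy2(os.path.join(dirpath, filename), destination)
                copied.append(destination)
    return copied


def reinstall_command(rpm_files):
    """Build (but do not run) a reinstall command line for the given RPM files."""
    return ["rpm", "-Uvh", "--replacepkgs"] + list(rpm_files)
```

For the layout described above, the call would be collect_rpms("/var/cache/yum", "/opt/yum/RPMS"), followed by a reinstall of only those files that match the damaged packages.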
Some downloaded packages were lost. So, the final step was to use 'yum update' to update the missing packages.
On the first occasion when I had to reinstall from scratch, it had taken me well over 2 days to fully recover. Most of the time was spent downloading updates and packages not on the distribution DVD. Part of the difficulty is that it is hard to remember all the additional packages installed. The memory was often triggered by a high-priority interrupt from my wife, e.g. 'where's sylpheed?'
This time, I recovered the system in a little over a day, and over half of that time was spent figuring out the issues and developing the code. But now, if the system winds up in the same state, I am sure I can recover in much less than half a day.
Actually, I will recover much faster because I now have a dual boot system. I bought another disk and have a fully configured installation on that disk as well.
I recently came across an excellent presentation on using generators - http://www.dabeaz.com/generators/. I realised that I had created temporary intermediary lists or dictionaries in order to keep the code easier to follow. How would the programming for fixing the issue of duplicate rpm's be different if I approached it from the perspective of generators?
I want to iterate over each package which is a duplicate and then take action on it. Let us just create a list of them. The code needed is:
The function duplicate_packages looks, feels and behaves like an iterator.
If we iterate over each package, we can determine which package is a duplicate. We will examine the header of each package. A package will be identified by the name, arch pair. The unique version is determined by the version, release pair.
The keyword yield has converted this function into a generator, so we can iterate over the duplicate packages. The method 'get_older' is straightforward.
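A hedged sketch of how duplicate_packages and get_older could fit together. Plain dictionaries stand in for the package headers here, and versions are compared with a simplified natural ordering rather than RPM's own rules; both are assumptions of this sketch.

```python
import re
from collections import defaultdict


def _natural(text):
    """Digit runs compare as integers, so '3.10' sorts after '3.9'."""
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def get_older(entries):
    """Given all headers for one (name, arch) pair, return all but the newest."""
    newest = max(entries, key=lambda h: (_natural(h["version"]), _natural(h["release"])))
    return [h for h in entries if h is not newest]


def duplicate_packages(headers):
    """Yield every header that is an older duplicate of an installed package."""
    seen = defaultdict(list)
    for header in headers:
        # A package is identified by its (name, arch) pair.
        seen[(header["name"], header["arch"])].append(header)
    for entries in seen.values():
        if len(entries) > 1:
            for older in get_older(entries):
                yield older
```

The caller can then write list(duplicate_packages(headers)), or loop over the generator directly and act on each duplicate as it arrives.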
The method package_headers is another generator.
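As an illustration, package_headers could be sketched as a generator over lines of text, where each line holds a name, version, release and arch separated by whitespace. That line format is my assumption; the real thing would read headers from the RPM database.

```python
def package_headers(lines):
    """Yield one header-like dict per 'name version release arch' line."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, version, release, arch = fields
        yield {"name": name, "version": version, "release": release, "arch": arch}
```

Because it is a generator, nothing is read until the consumer asks for the next header, so it can be fed an open file or a pipe without first building a list.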
The fascinating thing is that this code looks flat even though it is equivalent to nested code. It looks cleaner and is shorter.
Unfortunately, I have to wait for the system to have problems before I can test it properly. Or as some weird law of nature goes – now that I have backups, I may never get a chance!