What follows is an account of what happened on 7 and 8 March, and of the after-effects a month later, when I thought everything had been fixed.
Software is fragile not because it lacks something, but because it is becoming extremely complex.
It has to run on a broad array of hardware architectures, and on a broad array of operating systems, each running on yet another architecture.
Linux is the best example of robust software.
But what happens when software isn’t robust? When new features are
added, or when security issues require a fix?
Then you issue an update!
And so our journey begins. An update can either break something or
enhance something. Having trusted Linux systems for the past 10
years, I can say that an update or upgrade rarely breaks the system.
If something does go wrong, it’s usually minor: you can fix it, or
it won’t bother you much.
I’ve used Debian, Ubuntu, Fedora and Linux Mint; each has had its share of my time.
The setup
In 2016 I decided to go with Debian once again as my desktop
system.
Debian is great from a security standpoint: it ships older, "well tested"
versions of programs.
Version 8 was a big improvement. Of course it didn’t come close to
the latest features of Fedora or Linux Mint, but hey, it works!
Forensic Analysis Expert
There were a few times when I bricked my system by changing some settings
or playing around with various things.
At one point I decided to play forensic analysis hero. Without
knowing it, I broke my whole partition table with a fat finger:
/sda instead of /sdd.
It’s at moments like these that you realize you’re just one week
away from the monthly disaster-recovery backup.
Command-line fu to the rescue: I rebuilt the partition table from the metadata on each
disk, plus a little bit of luck.
Remember, kids: forensic analysis should be done in virtual machines.
Debian Upgrade
A few months later I decided to upgrade to version 9.
Everything worked fine, except that my setup had 3 disks: 1 SSD holding
the / partition and another fast partition, plus 2 disks in
RAID 0 managed by mdadm. I had also manually set up encryption for my home
folder.
After the upgrade I got an unusual message: "You need to wait for 1 min and
30 seconds..". Once the wait was over, I landed in EMERGENCY MODE, where I
could hit CTRL+D.
Oh crap! What happened?
Looking at the logs, it seemed that the /dev/md0 entry in
/etc/fstab was the culprit: the device couldn’t be mounted at boot.
But after I pressed CTRL+D I got the login screen.
Even after digging through the logs to figure out what went wrong, I
still can’t understand, to this day, WTF happened!?
Anyway, I got rid of the 90-second wait by editing
/etc/fstab and adding x-systemd.device-timeout=2 to the mount options of the /dev/md0 entry.
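For reference, the option belongs in the fourth column (the mount options) of an fstab line. Here is a small sketch, assuming a whitespace-separated fstab; the function name add_device_timeout is made up for illustration and /dev/md0 is only the example device. It prints the result instead of editing /etc/fstab in place, so you can diff it first:

```shell
#!/bin/sh
# Sketch (hypothetical helper): append x-systemd.device-timeout=2 to the
# options (4th field) of every fstab entry whose device (1st field) equals
# the given device. Prints the new fstab to stdout; the input is not modified.
# Note: awk rejoins edited lines with single spaces, so column alignment on
# those lines is lost (fstab itself does not care).
add_device_timeout() {
    fstab_file=$1
    device=$2
    awk -v dev="$device" '
        # Keep comments and blank lines exactly as they are
        /^[ \t]*#/ || NF == 0 { print; next }
        # Only touch matching entries that do not already have a timeout
        $1 == dev && $4 !~ /x-systemd\.device-timeout=/ {
            $4 = $4 ",x-systemd.device-timeout=2"
        }
        { print }
    ' "$fstab_file"
}
```

For example, `add_device_timeout /etc/fstab /dev/md0 > /tmp/fstab.new`, then `diff /etc/fstab /tmp/fstab.new` before copying the new file over as root. Running it a second time leaves an already edited entry alone.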
The error was still there, but everything worked fine. I guess it’s a minor
bug.
Fast forward one year, to 2018.
Don’t dig too deep for Fossils
I was working on an Elixir project. I updated Fossil from 1.37 to 2.5 by
downloading the single self-contained executable from the Fossil website.
Everything seemed to work fine, except that the clone and push commands crashed
with a segmentation fault.
I started digging and found out that some library on my Debian install was
crashing it.
I tried it on another Debian 9 machine and the binary worked flawlessly. Great,
so it’s my Debian that has issues.
I decided it was time to check for updates. I installed them all.
Rebooted. Same error.
CTRL+D didn’t seem to react. I could, however, switch to another TTY, but I
couldn’t start the Xorg server.
The logs showed some weird errors. I thought, hey, let’s do a
dist-upgrade.
Reboot. This time, whenever I hit CTRL+D or tried to switch to a TTY, my
screen started flickering.
Great God, NO! I had multiple errors and tried to fix multiple things, but every step brought a new error.
Reinstall Me Please!
Reinstalling Linux is a breeze. I’ve always kept my / and
/home partitions separate, and I encourage you to do
the same.
I usually reinstall Linux only if my fiddling goes wrong,
or if I decide to make another distro my main one BEFORE I’ve tested it
in a virtual machine.
I decided to give Linux Mint another try.
During the installation process I saw I couldn’t select my RAID disks.
So I took a look in GParted. The RAID was still there.
Then I decided to check it with mdadm.
But mdadm wasn’t installed. Another WTF! Why wouldn’t they ship it?
    $ sudo apt-get install mdadm
    $ sudo mdadm --examine --scan
    ARRAY /dev/md/0 metadata=1.2 UUID=d8c71eda:0f21c7b4:2c5e0ced:71537788 name=zamolxes:0
    $ sudo su -c "mdadm --examine --scan > /etc/mdadm/mdadm.conf"
mdadm --examine --scan looks at all disks, reads their RAID superblocks, and prints an ARRAY line for every array it finds. So whenever you brick your system, your data is most likely still there: don’t start formatting everything!
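One caveat with redirecting the scan straight into /etc/mdadm/mdadm.conf: it replaces the whole file, including any DEVICE, HOMEHOST or MAILADDR lines already there. A gentler sketch, assuming the scan output has been saved to a file first, prints only the ARRAY lines whose UUID is not in the config yet, so they can be appended. The function name new_array_lines is hypothetical:

```shell
#!/bin/sh
# Sketch (hypothetical helper): print the ARRAY lines from saved
# `mdadm --examine --scan` output whose UUID=... token does not already
# appear in an existing mdadm.conf. Nothing is written to disk.
new_array_lines() {
    scan_file=$1
    conf_file=$2
    grep '^ARRAY ' "$scan_file" | while IFS= read -r line; do
        # Pull out the UUID=... token from the ARRAY line
        uuid=$(printf '%s\n' "$line" | tr ' ' '\n' | grep '^UUID=')
        # Skip lines without a UUID and arrays the config already lists
        if [ -n "$uuid" ] && ! grep -qF "$uuid" "$conf_file"; then
            printf '%s\n' "$line"
        fi
    done
}
```

For example: `sudo mdadm --examine --scan > /tmp/scan.txt`, then `new_array_lines /tmp/scan.txt /etc/mdadm/mdadm.conf | sudo tee -a /etc/mdadm/mdadm.conf`. On Debian and its derivatives the initramfs carries its own copy of mdadm.conf, so `sudo update-initramfs -u` is the usual follow-up.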
Great, it still exists. Before starting the installation, I decided to
back up everything.
I had some external backup HDDs and, since RAID 0 offers no redundancy at all,
whatever happened I would still have my data there.
To be on the safe side, let me back up everything again.
Remounting everything with ecryptfs was a bit of a hassle.
Then I hit another roadblock while backing up: the sheer size of the
data, and the lovely NTFS.
This was the last time I’ll ever use NTFS on external backup HDDs for
"interoperability" with Windows machines.
Ext4 all the way, baby (or something else Linux can work with easily)!
Without a proper recent backup (the last one was missing 2 months of changes), I set out to reinstall, telling myself: if the installation bricks my encrypted mdadm RAID, it’s a sign I should become a lumberjack, painter, priest or anything else except a programmer/Linux enthusiast.
All our operators are currently busy bricking your installation, please wait until we brick it again
I reinstalled Linux Mint and tried to log in.
POOF! Linux Mint was stuck endlessly starting up. It’s a sign: I’ll have to let
my beard grow.
Then it occurred to me that the fresh install probably had neither my settings nor mdadm.
I couldn’t log in on a TTY either, since there was no home directory.
Eventually I got a recovery shell. Yep, no mdadm.
I rebooted from the Linux Mint USB stick again.
I checked /etc/fstab, and everything was there.
I uncommented the /home entry in fstab.
Rebooted, created /home/myuser as root, and logged
in.
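Switching the /home line of fstab off and on again, as in the recovery above, can also be done with sed instead of an editor. A sketch, assuming the mount point column is exactly /home and that a disabled line starts with a bare "#"; both function names are hypothetical and, like the other sketches, they print rather than edit in place:

```shell
#!/bin/sh
# Sketch (hypothetical helpers): disable or re-enable the fstab entry whose
# mount point (2nd field) is exactly /home. Output goes to stdout.
disable_home_entry() {
    # Prefix "#" to a non-comment line whose second field is /home
    sed 's|^\([^#[:blank:]][^[:blank:]]*[[:blank:]][[:blank:]]*/home[[:blank:]]\)|#\1|' "$1"
}

enable_home_entry() {
    # Drop a "#" placed directly in front of such a line
    sed 's|^#\([^#[:blank:]][^[:blank:]]*[[:blank:]][[:blank:]]*/home[[:blank:]]\)|\1|' "$1"
}
```

Write the output to a temporary file, check it, and only then copy it over /etc/fstab as root. Entries such as /home2 or /home/user are not matched, because /home must be followed by whitespace.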
Then I installed mdadm and generated the mdadm config again:
    $ sudo su -c "mdadm --examine --scan > /etc/mdadm/mdadm.conf"
Time to find out whether it would work or fail.
After the reboot, I logged in as my user and voilà, ecryptfs worked
automatically.
I ignored the fact that Cinnamon crashed and dropped into fallback mode over and over,
probably because of my old Cinnamon settings.
This was later fixed with:
    $ dconf reset -f /org/cinnamon/
Yes, my data is still there! Thank you Super Robust Linux features!
Now for the fun part: I had to back up everything again. That meant
copying data from 3 external HDDs to my disk, formatting each of them
as Ext4, and copying everything back.
All this for a binary that segfaulted!
I could have settled for the old 1.37 version shipped with
Debian. It worked, but why settle for an older version when my VPS
has the newer one?
I love Linux for its modularity and for the surprises it brings.
I’m pretty sure something was rotten, and it wasn’t the binary’s fault
but something else in the OS, or some library.
I eventually had luck reinstalling Fossil, and this finally worked without any
segmentation fault:
    $ fossil clone https://core.tcl.tk/tcl/ tcl.fossil
If you don’t know what Fossil is, I encourage you to download it and
play around with it. Start here:
https://fossil-scm.org/index.html/doc/trunk/www/index.wiki
Is everything working?
99% of things work as expected.
One thing that persists in FAILING is suspend. I had
the same problem back in 2009.
It’s 2018 now, and this is so annoying that it reminds me why I
chose Debian in the first place: there, it just worked out
of the box.
On a workstation, suspend is a MUST. On my laptop, suspend under Linux Mint
seems to work.
Kernel update to the rescue.
1 month later: another blog update, OpenSSL stuff
One month later I decided to update something that wasn’t working on my
Phoenix blog.
I also decided to do it late at night before my birthday. Bad
decision.
Since I used Distillery to roll out everything, including the Erlang
binary distribution and all the packages, I thought everything would work
out.
WRONG. It turns out that the OpenSSL version in Debian 9.3 is 2 years NEWER than
the one shipped with Linux Mint 18.3 (based on Ubuntu Xenial).
I thought I could fix it by compiling the newer version of OpenSSL on my
system. Wrong again: compiling against it succeeded, but the library
had some missing pointers.
So the keepers of Linux Mint and Ubuntu Xenial never thought to update OpenSSL from its 2015 version. I know it’s an LTS, but I think OpenSSL is a package that shouldn’t lag 3 years behind.
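A mismatch like this can be caught before deploying. Even a release that bundles its own Erlang runtime typically still loads the system’s shared OpenSSL library at run time through the crypto application (unless Erlang was built with OpenSSL linked statically), so the build machine and the server should report the same OpenSSL series. A sketch comparing two `openssl version` strings by their first two version numbers, which is where the shared library ABI changed in the 1.0.x / 1.1.x era; the function names and the sample strings in the comments are illustrative:

```shell
#!/bin/sh
# Sketch (hypothetical helpers): compare the major.minor series of two
# `openssl version` strings. Exit status 0 means "same series".
openssl_series() {
    # Example input: "OpenSSL 1.0.2g  1 Mar 2016" -> prints "1.0"
    printf '%s\n' "$1" | awk '{ split($2, v, "."); print v[1] "." v[2] }'
}

same_openssl_series() {
    [ "$(openssl_series "$1")" = "$(openssl_series "$2")" ]
}
```

For example, `same_openssl_series "$(openssl version)" "$(ssh myvps openssl version)" || echo "build on a matching system"`, where myvps is a hypothetical host alias for the server.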
I decided to do what every professional would do:
use a staging/development/testing virtual machine for the build.
After setting up a Vagrant box with the Debian version I wanted and
provisioning it with all the tools needed, I was finally able to update
the blog.
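Such a build VM needs very little configuration. A sketch that writes a minimal Vagrantfile into a directory; the box name debian/stretch64 and the package list are assumptions (pick the box matching the Debian release your server runs, and add Erlang/Elixir and whatever else the build needs), and write_build_vagrantfile is a made-up name:

```shell
#!/bin/sh
# Sketch (hypothetical helper): write a minimal Vagrantfile for a throwaway
# Debian 9 build VM into the given directory. Box name and packages are
# assumptions; adjust them to the server you deploy to.
write_build_vagrantfile() {
    target_dir=$1
    mkdir -p "$target_dir"
    cat > "$target_dir/Vagrantfile" <<'EOF'
Vagrant.configure("2") do |config|
  # Debian 9 (stretch) box, meant to match the server the release runs on
  config.vm.box = "debian/stretch64"
  # Basic build tools; Erlang/Elixir would be installed here as well
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y build-essential git
  SHELL
end
EOF
}
```

Then `cd` into that directory, run `vagrant up`, and build the release inside the VM after `vagrant ssh`, so it links against the same libraries the server has.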
Conclusion(s)
Always have a backup. Even if you already have one, make another
one BEFORE you start updating your system.
Whenever you update your system, accept the fact that it can brick
everything.
Don’t spend time figuring out what went wrong: a clean install is the
fastest way to fix a system bricked by upgrades/updates.
If you use VMs for services, servers, etc., you won’t lose a thing.
If you dig for fossils, you will end up reinstalling your system
anyway.
Never roll out an update at night or on the weekend.
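On the "make another backup BEFORE updating" point: a quick snapshot of the most important directories takes a minute. A sketch that packs a list of directories into a dated tarball; the function name snapshot_before_update is hypothetical, and the archive should land on a separate disk, such as an external backup drive:

```shell
#!/bin/sh
# Sketch (hypothetical helper): archive the given directories into
# DEST/pre-update-YYYY-MM-DD.tar.gz and print the archive path.
# Usage: snapshot_before_update DEST DIR [DIR...]
snapshot_before_update() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    archive="$dest/pre-update-$(date +%Y-%m-%d).tar.gz"
    # GNU tar strips the leading "/" from member names (and says so on stderr)
    tar -czf "$archive" "$@" || return 1
    # Print the archive path so callers can check or copy it elsewhere
    printf '%s\n' "$archive"
}
```

For example, `snapshot_before_update /media/backup /etc /home/myuser` right before `apt-get dist-upgrade`, where /media/backup stands in for wherever your external drive is mounted.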