Showing posts with label bug. Show all posts

Sunday, 14 November 2010

OpenSolaris (and OpenIndiana) Spends 50% of CPU Time in Kernel

A couple of days ago my client decided to prepare some new Java EE development environments and, when asked which OS to choose, I suggested that he give Solaris a try: since my client's production servers run Solaris 10, he would benefit from a more homogeneous set of environments.

We installed a couple of test machines, one with Solaris 10 and another with OpenSolaris 2009.06, and began installing the development environments and the required runtime components. The installation packages were SVR4: installation was straightforward on Solaris 10, while on OpenSolaris we had to resolve a couple of glitches. After a couple of days, the test users were leaning towards OpenSolaris, mostly because of its newer desktop environment, so we started installing the remaining machines and upgrading OpenSolaris to the latest dev release (b134).

Reduced Performance: CPU Time in Kernel When Idle

The latest OpenSolaris dev release (b134) has some known issues that didn't concern me, since I had already fought with them in the past and they can easily be resolved.

The surprise was discovering that all of the upgraded machines were affected by another problem: as soon as users rebooted into their b134 boot environment, the machine performed noticeably worse than it did under the older (b111) boot environment.

prstat showed no misbehaving process, while vmstat indicated that the system was spending a constant 50% of its time in the kernel. A quick search pointed me to this bug:


Repeating the steps outlined in the bug discussion confirmed that we were hitting the same bug. We therefore disabled cpupm in /etc/power.conf and the problem disappeared.
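For reference, here is a minimal sketch of how that change could be scripted across several machines. It assumes the setting is expressed as a single `cpupm disable` line, as described in power.conf(4); `set_cpupm_disable` is a helper name made up for this example, not a Solaris tool, and the exact contents of your power.conf may differ.

```shell
#!/bin/sh
# Sketch: force "cpupm disable" in a power.conf-style file.
# set_cpupm_disable is a hypothetical helper, not part of Solaris.
set_cpupm_disable() {
    conf="$1"
    tmp="${conf}.tmp.$$"
    # Drop any existing cpupm line, then append the disabling one.
    grep -v '^cpupm' "$conf" > "$tmp"
    echo 'cpupm disable' >> "$tmp"
    # Overwrite in place so the original owner and permissions are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Possible use on an affected machine (as root), after a backup:
#   cp /etc/power.conf /etc/power.conf.orig
#   set_cpupm_disable /etc/power.conf
#   pmconfig    # makes the power management framework re-read the file
```

The helper only rewrites the file it is given, so it can be tried on a copy first. Running pmconfig afterwards (or rebooting) is what should make the new setting take effect.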

Upgrading to OpenIndiana

Although the bug is still listed as ACCEPTED, we decided to give OpenIndiana a try and upgrade a machine following the upgrade path from OpenSolaris b134. The upgrade went smoothly and in no time we were rebooting into OpenIndiana b147.

The cpupm bug is still there, though. Nevertheless, it's a great opportunity for my client to test drive OpenIndiana and decide whether it fits his needs. For now, users will notice almost no difference between OpenSolaris and OpenIndiana (except for the branding). As time goes by, we'll discover whether and when Oracle will put sources back into OpenSolaris or whether OpenIndiana is destined to diverge from its step-brother.



Sunday, 16 August 2009

Microsoft Word 2007 table of contents feature seems to be buggy

The first time it happened, I couldn't believe it. I thought it was I who had screwed up that document: I wondered what I could have done to the document styles to get those spurious entries into the table of contents. Then it happened again. And again. It simply couldn't be me.

I'm working for a client that is committed to producing its documents with Microsoft Office 2007. No way to change that: I had to purchase a license and install it on my virtualized Windows. When I'm at work, I just use the client's computers. When I'm at home, I run Windows on a Solaris host with Sun xVM VirtualBox to get the job done.

A few days ago, just before sending the last revision of a document to print, I realized that the table of contents was screwed up! Instead of just listing headings up to level 3, it was showing spurious lines here and there. Some of them were image captions, too. The first thing I tried was selecting the guilty lines, checking the paragraph options (which incidentally seemed ok), and reapplying the original style. It worked, but every time I opened the document, the table of contents was screwed up again. I tried to understand what had happened, given that the spurious lines weren't that spurious (they were always the same), but found nothing. I googled for a couple of minutes just to confirm I wasn't alone.

I later discovered how I could reproduce the problem: it always happened when I closed the guilty document with the document map feature on. Switching off the document map solved the problem, and the next time I opened the document it was just fine.

I recognize that's not a great solution but hey, it works.

Friday, 19 June 2009

NTP goes into maintenance mode: /sbin/sh: /lib/svc/method/xntp: not found

After live upgrading to Solaris Express Community Edition build 116, at the first reboot, I noticed that the NTP service had gone into maintenance mode:
# svcs -xv
svc:/network/ntp:default (Network Time Protocol (NTP) Version 4)
State: maintenance since June 20, 2009 12:44:33 AM CEST
Reason: Start method failed repeatedly, last exited with status 1.
See: http://sun.com/msg/SMF-8000-KS
See: man -M /usr/share/man -s 1M ntpq
See: man -M /usr/share/man -s 1M ntpd
See: man -M /usr/share/man -s 4 ntp.conf
See: /var/svc/log/network-ntp:default.log
Impact: This service is not running.
and the log file clearly showed me the problem:
# tail /var/svc/log/network-ntp\:default.log
[ Jun 20 00:44:32 Executing start method ("/lib/svc/method/xntp"). ]
/sbin/sh: /lib/svc/method/xntp: not found
[ Jun 20 00:44:33 Method "start" exited with status 1. ]
[ Jun 20 00:44:33 Executing start method ("/lib/svc/method/xntp"). ]
/sbin/sh: /lib/svc/method/xntp: not found
[ Jun 20 00:44:33 Method "start" exited with status 1. ]
[ Jun 20 00:44:33 Executing start method ("/lib/svc/method/xntp"). ]
/sbin/sh: /lib/svc/method/xntp: not found
[ Jun 20 00:44:33 Method "start" exited with status 1. ]
[ Jun 20 00:44:43 Rereading configuration. ]

I checked and xntp isn't there. This is related to Solaris switching to NTP version 4: if you have just live upgraded to build 116, don't worry. It's just a glitch during the first reboot: SMF should update the ntp service's manifest during that boot, and you won't see the message anymore.
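If you want to spot this kind of failure quickly on other services, a small filter over the SMF log does the job. This is only a sketch based on the log lines shown above: `missing_methods` is a name I made up for the example, and it assumes the shell reports a missing method in exactly the "path: not found" form seen in that log.

```shell
#!/bin/sh
# Sketch: list method scripts that an SMF service log reports as missing.
# missing_methods is a hypothetical helper; it only reads text from stdin.
missing_methods() {
    # Matching lines look like: /sbin/sh: /lib/svc/method/xntp: not found
    # Keep only the path of the missing script and drop the repeats.
    sed -n 's|^[^:]*: \(/[^:]*\): not found$|\1|p' | sort -u
}

# Possible use against the log quoted above:
#   missing_methods < /var/svc/log/network-ntp:default.log
```

Against the log above it should print the xntp path once, however many times the start method was retried. If a service stays in maintenance even after its manifest has been updated, svcadm clear is the usual way to ask SMF to try it again.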

ludelete fails to delete the boot environment: mount point is already in use

While I was live upgrading from Solaris Express Community Edition build 115 to build 116, I decided to free some hard disk space and delete the oldest boot environment I had: snv_114.

That's easy:
# ludelete snv_114
But there was a small problem... ludelete complained that it was unable to use the automatic mount point for snv_114 because the mount point was already in use. Do you know what? It wasn't in use before launching ludelete but, for reasons I don't understand, once the boot environment was mounted it was reported as busy.

My Google finger was about to be triggered to look for some known bug out there, but fortunately a forced unmount of the boot environment solved the problem:
# luumount -f snv_114
Pretty strange, indeed.

OpenSolaris bug 6844307: bonobo-activation-server consumes one entire core

As you know, I replaced Thunderbird with Evolution on my Solaris desktop. I'm pretty happy with it but I hit a bug: bonobo-activation-server sometimes starts to spin and consumes one entire core. The user experience isn't completely compromised, because I obviously have a multiprocessor machine, but it's a pretty annoying bug. It's summer and in Spain it's very hot, you know. One CPU running at 100% means one fan going nuts and spinning all the time.

Googling was useless because the bug is reported in a closed database, but there it is. I must thank the guys at the OpenSolaris Communities for helping me find it. A fix is reported to be delivered in build 117; meanwhile, you can safely kill bonobo-activation-server: I experienced no problems whatsoever.