Wednesday, November 11, 2009

US EPA Big Polluters

Heard of the decision by the EPA to classify CO2 and other greenhouse gasses as pollutants, and regulate them?  Whether you have or not, here are three other things you should know.  First, that initial finding was supported in no small part by the 2,000 people who gathered in Seattle outside the EPA hearing on that topic.  Second, the EPA is proposing to move forward to the next step, actual regulation, starting with the “big polluters”, that is, new or modified sources of 25,000 tons or more of CO2 or equivalent emissions.  Third, just like with the endangerment finding, the EPA is holding two public hearings, and one of them is next week here in Chicago.

This is an important event for climate change politics, and you, like the 2,000 who gathered in Seattle, can play a pivotal role.  The Sierra Club is organizing supporters and we would love to see you there.  There are two ways you can help.

First, you can attend and testify, telling the EPA that you support the rulemaking.  You don’t need to be a climate scientist; the EPA wants to hear from the public.  In addition, Sierra Club will have a room at the hearing where you can stop by to learn more and get help preparing your testimony.

If you don’t testify, attending is still awesome.  Pick up a sticker, listen to others, and show your support.

Last but not least, spread the word.

Sierra Club Flyer - http://tiny.cc/GzwS9

Sunday, July 19, 2009

Software Engineering, Control, Measurement and Trust

There is much clamor over this article, “Software Engineering, an idea whose time has come and gone?”, by Tom DeMarco.  I love it.  At work, I know a manager who has the tag line “You can’t control what you can’t measure” posted on his office door.  It’s always given me the heebie-jeebies.  Tom hits one of the main points right on the head: that it implies “that control is an important aspect, maybe the most important, of any software project”.  But as Tom admits, “it isn’t.”

The other half is how it plays together with the "What isn't easily measurable, doesn't exist" rule.  Tom sees this too, and describes it like this:

“Now apply “You can’t control what you can’t measure” to the teenager. Most things that really matter—honor, dignity, discipline, personality, grace under pressure, values, ethics, resourcefulness, loyalty, humor, kindness—aren’t measurable. You must steer your child as best you can without much metric feedback.”

I don’t fully agree with his prescription, to require the product team to be ready to ship in any given week of development.  Like the original guidance, there is a kernel of truth here, but an overreaching of application.  There are many good reasons to pursue a “ready to ship” strategy, but you need to be flexible.  You need to accept that there are some tasks that will break that goal.  You cannot replace important subsystems without causing at least a bump, and sometimes you need to do that. 

If you have the goal in place, but are flexible, then you can create a branch, do most of the work there, and once it reaches a certain level of maturity (no longer experimental), integrate.  Integration will cause some pain, on both sides, and you have to balance between disruption to the mainline team, and your ready to ship goal, and the probability that if you insist on zero mainline pain that the branch may never merge.

But anyhow, though it sounds like a prescription, Tom says it’s an example, and as an example it’s perfectly valid.

Like raising a teenager, when you have important things, that aren’t easy to measure, what you need is trust.  You need to be able to trust that your team understands your motivations, your goals.  They need to be able to trust that you will treat them fairly if they help achieve those goals.  A ready to ship strategy is just an example of “don’t do anything too crazy”.  It can help build trust because management can visibly see progress, and because hopefully when management sees that they say “thank you”.

Sunday, May 03, 2009

S+S Synchronization

Software and Services (S+S) solutions have a number of attractions, but one common attraction is the ability to work offline.  Some S+S solutions have clients that require persistent internet connections, or are read-only in offline mode.  However, most developers that choose S+S as their architectural blueprint do so, at least in part, with a desire to provide users strong offline capabilities.

The difficulty with offline editing is the possibility of conflicts and the need to provide conflict resolution.  Some applications ignore this difficulty because they are single user, and are willing to leave that user responsible for any data loss that results from accessing services directly, or from multiple clients without synchronizing work done offline on others.  Other applications are entirely read-only, eliminating the difficulty in a different way.

Many applications don’t fit inside those constraints, and when they don’t, synchronization inevitably comes up as a topic.  Discussions of synchronization often start out with a bit of wishful thinking.  That is, one or many of the participants believe there is a synchronization black box they can throw data at and all will be automatically resolved.  To my knowledge, that doesn’t exist.  In fact, I’ll go as far as to say that I believe it cannot exist.  Unless you design your data to fit a strict structure that communicates your business rules, and those rules never require escalation to human judgment, such a system cannot correctly resolve all conflicts.

I’ve come to the conclusion that synchronization cannot be a black box.  Synchronization requires more than read/write to be exposed to the developer.  For S+S solutions, the synchronization architectures I prefer are those that expose more data to the client.  There are a number of ways to implement that, but I’ll explain one of the simplest.

First, begin with a centralized master, your service, that is the authoritative source for data.  This service needs to support two things for every synchronizable item.  First, it needs an identifier that is guaranteed to be unique and unchanging over the lifetime of the item.  Second it needs a versioning identifier.  It could be a sequentially incremented number or timestamp, but I prefer another unique id.

Next, in your service, implement (or re-use) a read-only caching system.  If this sounds pretty vanilla so far, that’s because it is.  These first two steps can be achieved through the use of HTTP, URIs and ETags.

Where many implementations go wrong is the next step, where they attempt to convert their cache into a read-write store.  The most obvious problem with that choice is it breaks everything you’ve built so far, since a cache is one way.  The other problem is you’ve implemented a store that has two sources of data, and have no mechanism to rationalize which one wins.  You could create a mechanism, but there isn’t any perfect mechanism.


Instead of forcing the cache to transform into something more complex, leave it alone and create a separate store for modifications.  The client is then responsible for attempting to keep that modified store as empty as possible by submitting the modifications to your services.  Every submission is tagged with the versioning identifier of the item that was present in the cache when modification began.  If this sounds like the HTTP “If-Match”, then you have the idea.  Services should not accept modifications that are unaware of the content of the latest version they have accepted.
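To make the service side of this concrete, here is a minimal in-memory sketch in Python.  It is only an illustration under my own assumptions: the names ItemService, VersionConflict, submit and if_match are hypothetical, and a real service would express the same check through HTTP ETags and the “If-Match” header rather than method arguments.

```python
import uuid


class VersionConflict(Exception):
    """Raised when a submission was based on a version that is no longer current."""


class ItemService:
    """Hypothetical authoritative store: every item has a stable id and a version id."""

    def __init__(self):
        self._items = {}  # item_id -> (version_id, data)

    def create(self, data):
        item_id = str(uuid.uuid4())   # unique and unchanging for the item's lifetime
        version = str(uuid.uuid4())   # the versioning identifier is another unique id
        self._items[item_id] = (version, data)
        return item_id, version

    def read(self, item_id):
        """Return (version_id, data), which is what a read-only cache would hold."""
        return self._items[item_id]

    def submit(self, item_id, data, if_match):
        """Accept a modification only if it was based on the latest accepted version."""
        current_version, _ = self._items[item_id]
        if if_match != current_version:
            raise VersionConflict(current_version)
        new_version = str(uuid.uuid4())
        self._items[item_id] = (new_version, data)
        return new_version
```

A client that read version v1, made a change, and submits with if_match=v1 succeeds only if nobody else got there first; otherwise the service raises VersionConflict and the stored data is left untouched.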

Now, if a submission is rejected, you’ve detected a conflict.  Your options are many at this point.  No option is perfect, and the choice is going to depend on many things, not the least of which are your user’s requirements.  But no option has been eliminated yet.  Without any additional implementation you have a first-in-wins strategy, which happens to be the safest bet without more complex insight into the data’s structure, or user intervention.

If you want a last-in-wins strategy, re-cache, update the versioning identifier and resubmit.  Since this would potentially destroy a previous set of updates, it would be bad practice to do so without some kind of user prompt or notification, but you and your users are in control, not the API.
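Here is a rough sketch of the client side, again in Python and again under my own assumptions.  FakeService is a stand-in for the real service, and the Client class with its cache, pending, edit and flush members is hypothetical naming, not an existing API.  It shows the separate modification store, the default first-in-wins outcome, and the opt-in last-in-wins resubmission.

```python
class Conflict(Exception):
    """The service rejected a submission based on a stale version."""


class FakeService:
    """Stand-in for the real service: holds one version id per item."""

    def __init__(self):
        self.items = {}    # item_id -> (version, data)
        self._counter = 0

    def get(self, item_id):
        return self.items[item_id]

    def put(self, item_id, data, if_match=None):
        current = self.items.get(item_id)
        if current is not None and current[0] != if_match:
            raise Conflict(item_id)
        self._counter += 1
        version = f"v{self._counter}"
        self.items[item_id] = (version, data)
        return version


class Client:
    def __init__(self, service):
        self.service = service
        self.cache = {}     # read-only copies: item_id -> (version, data)
        self.pending = {}   # separate modification store: item_id -> (base_version, data)

    def refresh(self, item_id):
        self.cache[item_id] = self.service.get(item_id)

    def edit(self, item_id, new_data):
        # Tag the edit with the version that was cached when modification began.
        base_version = self.pending.get(item_id, self.cache[item_id])[0]
        self.pending[item_id] = (base_version, new_data)

    def flush(self, item_id, last_in_wins=False):
        """Try to empty the pending store; return False if a conflict remains."""
        base_version, new_data = self.pending[item_id]
        try:
            self.service.put(item_id, new_data, if_match=base_version)
        except Conflict:
            if not last_in_wins:
                return False                    # first-in-wins: the edit stays pending
            self.refresh(item_id)               # re-cache
            base_version = self.cache[item_id][0]   # update the versioning identifier
            # Resubmit; another conflict here would propagate to the caller.
            self.service.put(item_id, new_data, if_match=base_version)
        del self.pending[item_id]
        self.refresh(item_id)
        return True
```

In a real application the last_in_wins flag would only be set after the user has been prompted, since it overwrites someone else’s accepted update.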

If merging is necessary, the essential complexity of the merge itself remains, but not much else.  You have a copy of the original version, you can retrieve a copy of the current version, and you have a copy of the modifications.  Two-way merge, three-way merge, automated merge, manual merge: whatever is necessary is possible and not any more difficult than absolutely necessary.
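As one example of what those three copies make possible, here is a small field-level three-way merge in Python.  It is a sketch under simplifying assumptions of my own: items are flat dictionaries, a missing key and a None value are treated the same, and the function name three_way_merge is hypothetical.  Real data will usually need rules that reflect its structure.

```python
def three_way_merge(base, mine, theirs):
    """Merge two sets of changes made against the same base version.

    base   -- the original version held in the cache when editing began
    mine   -- the local modifications
    theirs -- the current version retrieved from the service
    Returns (merged, conflicts), where conflicts lists keys both sides changed.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:
            value = m                 # both sides agree (or neither changed it)
        elif m == b:
            value = t                 # only the service copy changed
        elif t == b:
            value = m                 # only the local copy changed
        else:
            conflicts.append(key)     # both changed it differently
            value = t                 # keep the service value until someone decides
        if value is not None:         # None means the key was deleted or never set
            merged[key] = value
    return merged, conflicts


base = {"title": "Plan", "owner": "ann", "due": "Mon"}
mine = {"title": "Plan v2", "owner": "ann", "due": "Tue"}
theirs = {"title": "Plan", "owner": "bob", "due": "Wed"}
merged, conflicts = three_way_merge(base, mine, theirs)
```

Here the title and owner changes merge cleanly, while the “due” field lands in the conflicts list for a user, or a business rule, to settle.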

Wednesday, April 22, 2009

How to load test: Step 1 – Create a realistic load

Load testing isn’t the easiest job invented.  Depending on your business model, load testing can vary from important to absolutely critical.  So despite the pains, every project at least makes a token gesture toward load testing.  Unfortunately, either knowingly, or unknowingly, it’s often not much more than that, a token gesture.

The primary failure in the average load test is not creating a realistic load.  There are plenty of excuses for this.  There aren’t any servers comparable to the production servers.  It’s too hard to produce test data that simulates true data.  Or worse, you don’t even know what production load will look like.  Those aren’t minor obstacles.  Calling them excuses isn’t meant to trivialize them; it’s more a reflection of the true importance of load testing, and of doing it right when you do it.

The less and less realistic the load you generate, the more your test becomes performance analysis.  Performance analysis is great, but load testing and performance analysis are different animals.  You are doing yourself a disservice if you use a fishing rod to catch a great white, or a harpoon for goldfish.  If half your team is trying to do performance analysis and half is trying to load test then you will waste time you wouldn’t with a clear mission.

There are other things that distinguish performance analysis from load testing, but the number one is the type of load you generate.  A performance analysis load may sometimes resemble a true load, but it usually should not.  A performance analysis load should be structured to make it easy to pinpoint performance issues.  A true load makes this more difficult by being too complex or too chaotic.  So unless you’re tuning something that only performs badly in complex or chaotic scenarios, simplify and isolate for performance analysis.

Load testing, by definition, needs a true load, or as close to it as you can approximate.  Load testing is a validation.  Load testing is developing reasons to be confident that under expected conditions, your system won’t fall over.  Load testing is about giving assurances that it’s not a bad idea to depend upon the reliability of your system.

So all that said, how do you create a realistic load?  If your software is like most, there is one, or probably many points where in the real world, a user takes some action.  Since in the real world you have lots of users, you’d ideally want to automate all of those steps.  Sometimes that’s not that difficult, and if it’s not, that’s the path to take.  There are tools that can simulate clicking “submit” on a web form.  Many of those same tools can simulate filling it with some data, or even using an AJAX control.  But all of this has limits.  If you’re within those limits, take the easy path.  If you’re not, you’re either going to have to take the next step, or settle for a sub-par load test.

Moving work from the server to the client is sometimes an effective strategy to improve scalability, among other possible benefits, but it’s definitely going to complicate your ability to automate your steps with cookie cutter solutions.  There are two paths you can take in that situation.  One choice is to start from scratch and use your knowledge of your software to generate data from a template, substituting in values coming from preceding steps, and maybe some randomness.  The second is to reuse code from your application and wire those pieces together.
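The first path, generating data from a template, might look something like this in Python.  Treat it as a sketch: the JSON payload shape and the field names session, item_id and quantity are invented for illustration, and your own templates would mirror whatever your client really sends.

```python
import random
from string import Template

# Hypothetical request body; in practice, capture a real one and parameterize it.
ORDER_TEMPLATE = Template(
    '{"session": "$session", "item_id": $item_id, "quantity": $quantity}'
)


def make_order(session, rng):
    """Fill the template with a value from a preceding step plus some randomness."""
    return ORDER_TEMPLATE.substitute(
        session=session,                       # value carried over from an earlier step
        item_id=rng.randint(1000, 1999),       # random but plausible item
        quantity=rng.choice([1, 1, 1, 2, 5]),  # skewed toward small orders
    )


rng = random.Random(42)  # seeded so that a test run can be repeated exactly
payloads = [make_order(f"session-{n}", rng) for n in range(3)]
```

Seeding the random generator is a deliberate choice here: when a load test does topple something, being able to replay the same sequence of requests makes the follow-up analysis much easier.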

Which choice you make is going to depend on your code.  Using your application code has many things going for it, but you have to surmount several common challenges.  First, if you have thousands of users (or millions), you’ll need your application code to simulate more than one.  No matter how minimal your application, it’s very unlikely you can run hundreds or thousands of copies of it on a single box at a time. 

Depending on how well you meet the first challenge, you may also need to make sure your application code can run in parallel on the same hardware.  Usually this means multiple processes.  Why wouldn’t you need multiple processes?  If you can make one instance of your application code simulate a large number of users, and thus consume the overall load generating capacity of the hardware, then the second challenge can be skipped.
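One way to have a single process stand in for many users is a thread pool, sketched below in Python.  The function names simulate_user and run_users, and the action callback, are my own placeholders; the callback is where your reused application code (or a templated request) would go.  Threads suit this when each user spends most of its time waiting on the network; CPU-heavy client code may still need multiple processes.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_user(user_id, steps, action, think_time=0.0):
    """Run one simulated user's session and collect the result of each step."""
    rng = random.Random(user_id)       # per-user randomness, repeatable across runs
    results = []
    for step in range(steps):
        results.append(action(user_id, step))
        time.sleep(rng.uniform(0, think_time))  # pause like a real user would
    return results


def run_users(user_count, steps, action, max_workers=50, think_time=0.0):
    """Simulate user_count users, with at most max_workers active at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(simulate_user, user_id, steps, action, think_time)
            for user_id in range(user_count)
        ]
        return [future.result() for future in futures]  # ordered by user id
```

For example, run_users(1000, 20, send_request, max_workers=200, think_time=2.0) would push a thousand users through twenty steps each, where send_request is whatever function performs one step against your service.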

Either way, with a large number of users, you may find that a single machine isn’t going to have the ability to generate sufficient load.  If one, two or three machines are necessary you may feel happy with manually starting instances.  At the least do yourself the favor of using a tool like PsExec to let you do that from a batch file or something.

Summary

Whatever path you’ve taken, you are now generating a load.  The challenges aren’t over though.  Now you need to validate that you’ve met your load and that it hasn’t caused anything to topple over.  That means monitoring and analyzing stats and logs.  And if you do find a problem, you’ll need to switch hats back into performance analysis to home in on the exact cause and find a solution.  Since those aren’t topics to be taken lightly, I’ll reserve them for a future date.

Saturday, April 04, 2009

How higher gas taxes benefit you

The United States ought to raise the gas tax, and we ought to do it soon.  There are so many good reasons it’s going to be hard to explain them all.  To start with, we have 1 or 2 years until crude oil prices start to climb again.  We ought to be prepared for that when it happens, and the best way to do that now is to pretend as if those prices are here today, when we can collect the money and send it to the Federal Treasury, rather than pay the price later to a Saudi sheik.

The second reason is both current and future prices.  The more effectively we control demand, by making choices that won’t cost us in the future, the better we protect ourselves from high prices in the future too.  In other words, we can delay the rebound in crude oil prices by an extra year or two, maybe even more, by dampening demand through a tax.  Every Prius substituted for an SUV today lowers the pre-tax price of gasoline by some fraction in 2010, 2011, 2012, etc.  It’s also non-linear.  If demand stays below today’s production, prices will stay at today’s prices, or lower.  If demand exceeds production, then all hell breaks loose again.

Those two reasons are symbiotic.  I’ll illustrate with some hypotheticals.  In the status quo scenario, no additional tax, gas might stay around $2.00/gallon for the next 2 years.  But in the meantime, the economy may (hopefully) recover, and about a year from now with a good economy and low gas prices, I wouldn’t be surprised to see SUVs flying off the shelves all over again.  Keep that up for a year, and demand could grow back from 19.5 mbd to 22 mbd.  If that happens, then certainly crude will skyrocket again, and we’ll see $4.00+ gas in late 2011.

On the other hand, if an additional 50-75 cent gas tax were added, I think we would hear a different message, and demand might even decline a little further, say to 18.5 mbd.  Here is an example of what the implications might be in terms of the amount of dollars sent to the Middle East, and the taxes collected.  In the short term there would be some consumer costs, but many of these costs might be offset by tax reductions in other areas.

image

In the long term, consumer costs actually end up lower because a) they were prepared to use less gas, and b) demand was lower resulting in lower gas prices.

Most importantly, a 10% difference in demand can translate into a 50% difference in the amount of money sent to the Middle East, of which we know a certain percent falls into the hands of terrorists.  With any luck, the people of Saudi Arabia, Iran, or at least those with the checkbooks, will see terrorism as the first “discretionary” spending item and strangle those funds much more than 50%.

Friday, April 03, 2009

New Version of deSleeper, and manual.

I’ve posted a new version of deSleeper, v2.0, to codeplex.  This version adds some functions to help network administrators set up a couple hundred machines to work with deSleeper with fairly minimal effort.  And to help the non-admin user (and probably the admin too…), I’ve finally put together a deSleeper manual.  You’ll always be able to find it on codeplex, but here’s a little RSS copy too.

If you're setting up deSleeper in a network, you may want to read the deSleeper Setup & Architecture Guide as well. This guide will cover the features of deSleeper from simplest to most complex.

Wake Up!

If you're using deSleeper, the first screen you'll see is the Wake Up Page. If you're not setting up servers this might be the only tab you ever use.

WakeUpPage

To start off you need to supply information about the computer you're trying to wake up (the "target"), and how to get your request from your PC to the target.

Initially you have no targets configured, so click on the New Target button. You can type whatever you like in the Description field; it's for you alone. If you're using deSleeper, it's probably because you want to use the proxy functions. What's a proxy? It's a service that helps get a message from one place to another. If you set up the proxy, then I hope you know the host name, and if you didn't, hopefully your friendly network guy can fill in the blanks. Either way, type the name into the proxy field.

You've got two choices for how to identify your target, MAC Address, and host name. If you've never heard of a MAC Address, don't fret, you only need one, and host will do fine. For more advanced users, MAC Address is a little more surefire (though a lot harder to memorize!).

Once that is sorted out, click on Wake Up Now, and you'll either see a nice little success message, or an ugly yellow error. Let's hope for the first.

Network Card Configuration

The second most likely place for a casual user to wander is the Network Card Configuration Page. There's not a lot here, but it consolidates three important items you'd have to hunt all over your PC for otherwise.

ConfigurationPage

The first is you can find out your MAC Address here. Of course, it's the MAC Address of the PC you've just run deSleeper on, not the one you're trying to wake up, but there isn't anything preventing you from installing deSleeper to the PC you want to wake up. Actually everything on this page is best done on the PC you’re trying to wake up.

The second item here is the ability to enable your network card to listen for the “magic packets”. Such a nice name. Magic packets are the magic that takes a computer sipping 1 watt and turns it back on as if you walked over and pushed the power button.
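For the curious, a magic packet is nothing exotic: six bytes of 0xFF followed by the target's MAC Address repeated sixteen times, usually sent as a UDP broadcast. The Python sketch below shows that standard Wake-on-LAN format. It is not deSleeper's own code; the function names, the broadcast address and the choice of port 9 are assumptions for illustration.

```python
import socket


def build_magic_packet(mac):
    """Return the 102-byte Wake-on-LAN payload for a MAC like 00:11:22:33:44:55."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("a MAC Address has 12 hex digits: " + mac)
    mac_bytes = bytes.fromhex(hex_digits)   # raises ValueError on non-hex input
    return b"\xff" * 6 + mac_bytes * 16     # sync bytes, then the MAC 16 times


def send_magic_packet(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the packet on the local network (address and port are assumptions)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

Because the packet is a broadcast, it normally doesn't cross routers, which is one reason a proxy sitting on the target's own network segment is useful.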

Once deSleeper has given you a reliable way to wake up your PC remotely, you’ll want to configure the PC to use its built-in power saving features. You can configure these in more detail through your computer’s power options control panel, but for convenience the main setting, the sleep timeout, can be updated here.

Service Installation

If no network admin has set up a proxy for you, and “Wake up a machine on your local network” isn’t working, it’s not hard to set up your own. All you need is a PC which will remain on. Most offices, unfortunately, have hundreds of these, so take advantage of one. You install a service through the Service Installation Page.

ServicePage

The easiest thing to do is to install deSleeper on the machine you want to use as a proxy, come to this page and click Install. There is no need to change any of the default settings if you don’t understand them.

It will, however, be helpful to type the names of the machines you want to wake up into the Precache Hosts field before clicking Install. This option makes your first wake up easier and more reliable. It’s optional, but highly recommended.

The other option is to do a remote install. This function is really for more advanced users as it requires access rights the average network user won’t have, and some of the error messages that come back if you’re missing one of those rights are, somewhat of necessity, not all that user friendly.

One common gotcha of remote installs is that .NET 3.5 SP1 needs to be installed before you hit the Install button. To try and prevent confusion, by default, deSleeper checks to see if .NET (and the right version) is installed. But to do so requires a service, the Remote Registry Service, be enabled, which many users disable for security reasons. To skirt this issue, click suppress check for .NET. If one of the installs fails you may have to manually check if .NET is installed.

There is yet one more function available from this page. The Prepare Hosts button will take each PC in the Precache Hosts field and attempt to remotely enable the Wake-on-LAN setting on that PC’s network card. Like remote service installs, this requires administrator, or close to it, access rights. For the techies, I’ll mention that this feature, and the remote install feature, are made possible by RCtrlX, a utility from Leon Sodhi. Thank you Leon!

Summary

So that’s about it. If you get an error message when doing anything, you’ve got a couple of options. The first is to head over to the deSleeper discussion list. The second is to check the logs. The client writes log entries to a file, deSleeperClient.log, which is in the same folder as the executable, C:\Program Files\deSleeper (at least for now; by all standards it should be in AppData, but for now it’s in the much easier location).

The service writes most errors to the Application or System Event Logs, which you can get to through Event Viewer. As with all networking related tools, it helps to know a little about what your network is composed of, but in the interest of not overcomplicating this little guide, I’ll leave those as topics for another day.

Thursday, March 26, 2009

deSleeper Version 1.11

I’ve released a new version of deSleeper.  This update provides an alternative way to “precache” hostnames in a kind of offline ARP table.  It also fixes some errors that occurred when additional types of network cards (such as those VMware installs) were present.