Monday, October 29, 2012

How to work around Amazon EC2 outages « James Cohen

I am working on an AWS design for high availability and found this insight from the school of hard knocks very helpful.


A few of these options are good in principle, but are not necessarily informed by operational experience with the more common failure modes of AWS at medium to large scale (~50-100+ instances).
The author recommends using EBS volumes to provide for backups and snapshots. However, Amazon’s EBS system is one of the more failure-prone components of the AWS infrastructure, and lies at the heart of this morning’s outage [1]. Any steps you can take to reduce your dependence upon a service that is both critical to operation and failure-prone will limit your exposure to such outages. While the snapshotting ability of EBS is nice, waking up to a buzzing pager to find that half of the EBS volumes in your cluster have dropped out, hosing each of the striped RAID arrays you’ve set up to achieve reasonable IO throughput, is not. Instead, consider using the ephemeral drives of your EC2 instances, switching to a non-snapshot-based backup strategy, and replicating data to other instances and AZs to improve resilience.
The author also recommends Elastic Load Balancers to distribute load across services in multiple availability zones. Load balancing across availability zones is excellent advice in principle, but still succumbs to the problem above in the event of EBS unavailability: ELB instances are also backed by Amazon’s EBS infrastructure. ELBs can be excellent day-to-day and provide some great monitoring and introspection. However, having a quick Chef script to spin up an Nginx or HAProxy balancer and flip DNS could save your bacon in the event of an outage that also affects ELBs, like today’s.
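As a rough illustration of the "flip DNS to a standby balancer" idea, the decision itself can be kept very small. The sketch below is only an assumption about how one might structure it: the `Balancer` record, the names, and the `example.com` hostnames are hypothetical, and the actual health probe and DNS update (which depend on your provider) are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Balancer:
    name: str        # hypothetical label for the balancer
    dns_target: str  # hypothetical hostname the DNS record would point at
    healthy: bool    # result of your own health probe (not shown here)


def choose_dns_target(balancers: List[Balancer]) -> Optional[str]:
    """Return the DNS target of the first healthy balancer.

    The list is treated as a priority order: primary first, standbys after.
    Returns None when nothing is healthy, so the caller can page a human.
    """
    for balancer in balancers:
        if balancer.healthy:
            return balancer.dns_target
    return None


# Example: the primary ELB is failing its probe, the HAProxy standby is fine.
balancers = [
    Balancer("elb-primary", "primary.example.com", healthy=False),
    Balancer("haproxy-standby", "standby.example.com", healthy=True),
]
target = choose_dns_target(balancers)
print(target)
```

Keeping the choice as a pure function makes it easy to test before an outage, independently of whatever script actually updates the DNS record.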
With each service provider incident, you learn more about your availability, dependencies, and assumptions, along with what must improve. Proportional investment following each incident should reduce the impact of subsequent provider issues. Naming and shaming providers in angry Twitter posts will not solve your problem, and it most certainly won’t solve your users’ problem. Owning your availability by taking concrete steps following each outage to analyze what went down and why, mitigating your exposure to these factors, and measuring your progress during the next incident will. It is exciting to see these investments pay off.
Some of these steps:
– *Painfully* thorough monitoring of every subsystem of every component of your infrastructure. When you get paged, it’s good to know *exactly* what’s having issues rather than checking each manually in blind suspicion.
– Threshold-based alerting.
– Keeping failover for all systems as automated, quick, and transparent as is reasonably possible.
– Spreading your systems across multiple availability zones and regions, with the ideal goal of being able to lose an entire AZ/region without a complete production outage.
– Team operational reviews and incident analysis that expose the root cause of an issue, but also spider out across your system’s dependencies to preemptively identify other components which are vulnerable to the same sort of problem.
[1] See the response from AWS in the first reply here: https://forums.aws.amazon.com/thread.jspa?messageID=239106&tstart=0
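The threshold-based alerting item in the list above can be sketched in a few lines. This is a minimal illustration, not the commenter's setup: the `ThresholdAlert` class, the threshold of 100, and the "three consecutive breaches" rule are all assumptions chosen to show one common way of avoiding pages on a single noisy sample.

```python
from collections import deque


class ThresholdAlert:
    """Fire only after `consecutive` samples in a row exceed `threshold`."""

    def __init__(self, threshold: float, consecutive: int) -> None:
        self.threshold = threshold
        # Keep only the most recent `consecutive` breach/no-breach results.
        self.recent = deque(maxlen=consecutive)

    def observe(self, value: float) -> bool:
        """Record one sample and return True if the alert should fire."""
        self.recent.append(value > self.threshold)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and all(self.recent)


# Hypothetical latency samples in milliseconds.
alert = ThresholdAlert(threshold=100.0, consecutive=3)
samples = [40.0, 120.0, 130.0, 90.0, 110.0, 115.0, 125.0]
fired = [alert.observe(sample) for sample in samples]
print(fired)
```

With these samples the alert fires only on the last one, because the dip to 90 resets the run of breaches.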

Thursday, October 25, 2012

Forrester Research : Research : File Storage Costs Less In The Cloud Than In-House


FOR INFRASTRUCTURE & OPERATIONS PROFESSIONALS

FILE STORAGE COSTS LESS IN THE CLOUD THAN IN-HOUSE

We Calculate A 74% Cost Reduction, But It's Hard To Compare Apples To Apples

BIM, SaaS and mobile driving Newforma developments

Great article by Paul Wilkinson

Once scathing of the attractions of Software-as-a-Service, Newforma is now actively embracing the cloud, SaaS, BIM and mobile, and eyeing potential social media ideas for future product development.


Friday, October 5, 2012

BIM-in-The-Cloud Has New Competition With Newforma | ENR: Engineering News Record | McGraw-Hill Construction

Model-driven project delivery on jobsites could be taking a big leap forward with a license deal between project information management (PIM) provider Newforma and M-SIX's 3D software platform, called VEO.
...
Accessing BIM models on projects by construction teams "can be a challenge for mere mortals to master," Batcheler says. Using the VEO platform, project teams can access the geometry of commonly-used 3D models through the cloud "without the risk of an untrained person damaging or corrupting the model," he adds.