Model-driven Fault-tolerance Provisioning for Component-based Distributed Real-time Embedded Systems

Ph.D. Proposal, April 2010
Sumant Tambe
sutambe@dre.vanderbilt.edu
www.dre.vanderbilt.edu/~sutambe
Model-driven Fault-tolerance Provisioning for Component-based Distributed Real-time Embedded Systems
Department of Electrical Engineering & Computer Science
Vanderbilt University, Nashville, TN, USA
Presentation Road Map
  • Technology Context Overview
  • Related Research & Unresolved Challenges
  • Research Progress: Model-driven Fault-tolerance Provisioning for Component-based DRE Systems
  • Proposed Research
  • Research Contributions & Dissertation Timeline
  • Concluding Remarks
Context: Distributed Real-time Embedded (DRE) Systems
  • Heterogeneous soft real-time applications
  • Stringent simultaneous QoS demands
  • High availability, predictability (CPU & network), etc.
  • Efficient resource utilization
  • Operation in dynamic & resource-constrained environments
  • Process/processor failures
  • Changing system loads
  • Examples
  • Total shipboard computing environment
  • NASA’s Magnetospheric Multi-scale mission
  • Warehouse Inventory Tracking Systems
  • Component-based development
  • Separation of Concerns
  • Composability
  • Reuse of commodity-off-the-shelf (COTS) components
Operational Strings & End-to-end QoS
  • Operational String model of component-based DRE systems
  • A multi-tier processing model focused on the end-to-end QoS requirements
  • Functionality is a chain of tasks scheduled on a pool of computing nodes
  • Resources, QoS, & deployment are managed end-to-end
  • End-to-end QoS requirements
  • Critical Path: The chain of tasks that is time-critical from source to destination
  • Need predictable scheduling of computing resources across components
  • Need network bandwidth reservations to ensure timely packet delivery
  • Failures may compromise end-to-end QoS
  • Must support highly available operational strings!
Availability Requirements of DRE Operational Strings (1/4)
  • The fault model consists of fail-stop failures
  • Failures cause delays & require software/hardware redundancy
  • Recovery must be quick to meet the deadline (soft real-time)
  • What are reliability alternatives?
  • Roll-back recovery
  • Transactional
  • Roll-forward recovery: replication schemes
  • Active replication (multiple concurrent executions)
  • Passive replication (primary-backup approach)
  • Trade-offs among the alternatives: resources, non-determinism, & recovery time
Availability Requirements of DRE Operational Strings (2/4)
  • What is failover granularity for passive replication?
  • Single component failover
  • Larger than a single component
  • Scenario 1: Must tolerate catastrophic faults (DoD-centric)
  • Pool Failure
  • Network failure
  • The whole operational string must fail over
  • (Figure: clients fail over from the operational string in Pool 1 to its replica in Pool 2)
Availability Requirements of DRE Operational Strings (3/4)
  • Scenario 2: Must tolerate Bohrbugs
  • A Bohrbug repeats itself predictably when the same state reoccurs
  • Preventing Bohrbugs: Reliability through diversity
  • Diversity via non-isomorphic replication
  • Non-isomorphic workflow & implementation of the replica
  • Different end-to-end QoS (thread pools, deadlines, priorities)
  • The whole operational string must fail over
Availability Requirements of DRE Operational Strings (4/4)
  • Scenario 3: Must tolerate non-determinism
  • Sources of non-determinism in DRE systems
  • Local information (sensors, clocks), thread-scheduling, timers, timeouts, & more
  • Enforcing determinism is not always possible
  • Scenario 3a: Must tolerate side-effects of replication + non-determinism
  • Problem: Orphan request & orphan state
  • Solutions based on single-component failover require costly roll-backs
  • Fault-tolerance provisioning should be transparent
  • Separation of availability concerns from the business logic
  • Improves reusability, productivity, & perceived availability of the system
  • Non-determinism combined with replication creates potential orphan state
  • High availability of operational strings needs transparent group failover with passive replication!
Integrating Availability in the Component-based Development Lifecycle
  • Answers needed in all phases of development
  • Specification: How to specify FT & end-to-end QoS requirements?
  • Composition & Deployment: How to compose & deploy application components & their replicas?
  • Configuration: How to configure the underlying middleware to provision QoS?
  • Run-time: How to deal with the side effects of replication & non-determinism at run-time?
Related Research: QoS & FT Modeling
  • Recovery block modeling and QoS for SOA
  • Lightweight & heavyweight UML extensions
  • MoC = service logic graphs, state machines, Java extensions
Unresolved Challenges: QoS & FT Modeling
  • Crosscutting availability requirements
  • Tangled with primary structural dimension
  • Tangled with secondary dimensions (deployment, QoS)
  • Composing replicated & non-replicated functionality
  • Example: Replicas must be modeled, composed, & deployed
  • Imposes modeling overhead
  • Supporting non-isomorphic replication
  • Reliability through diversity (structural & QoS)
  • Supporting graceful degradation through diversity
Unresolved Challenges: QoS & FT Modeling (continued)
  • Variable granularity of failover
  • Whole operational string, sub-string, or a component group
  • Variable QoS association granularity
  • Network-level QoS specification (connection level)
  • Differentiated service based on traffic class & flow
  • Example: High priority, high reliability, low latency
  • Bidirectional bandwidth requirements
  • QoS association granularity across the lifecycle: port-level, connection-level, & component-level
Related Research: Transparent FT Provisioning
  • M2M transformation & code generation
  • Performance improvement for FT using AOP
Unresolved Challenges: Transparent FT Provisioning
  • Not all the necessary steps are supported coherently
  • Automatic component instrumentation for fault-handling code
  • Deciding placement of components & their replicas
  • Deploying primaries, replicas, & monitoring infrastructure
  • Platform-specific metadata synthesis (XML)
  • Missing domain-specific recovery semantics (run-time middleware)
  • Group failover is DRE-specific & often neglected
  • Costly to modify the middleware
  • Application-level solutions lose transparency & reusability
  • Missing transparent network QoS provisioning (D&C middleware)
  • Configuration of network resources (edge routers)
  • Configuration of containers for correct packet marking
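The container-side packet marking mentioned in the last bullet ultimately amounts to setting the DiffServ code point (DSCP) on each connection's socket so that provisioned edge routers can give the flow its per-hop behavior. Below is a minimal POSIX-sockets sketch of that step; markFlow is a hypothetical helper (not any middleware API), and the DSCP value would come from the modeled network-level QoS.

    // Minimal sketch: mark a flow's packets with a DiffServ code point.
    #include <netinet/in.h>
    #include <netinet/ip.h>
    #include <sys/socket.h>

    // Returns 0 on success, -1 on error (errno is set by setsockopt).
    int markFlow(int socketFd, int dscp) {
      int tos = dscp << 2;  // DSCP occupies the upper 6 bits of the TOS byte
      return setsockopt(socketFd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
    }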
  • How to add domain-specific recovery semantics in COTS middleware retroactively?
  • How to automate it to improve productivity & reduce cost?
Overview of the Solution Approach
  • Component QoS Modeling Language (CQML): aspect-oriented modeling for modularizing QoS concerns (resolves challenges in Specification & Composition)
  • Generative Aspects for Fault-Tolerance (GRAFT): a multi-stage model-driven development process that weaves dependability concerns into system artifacts & provides model-to-model, model-to-text, & model-to-code transformations (resolves challenges in Deployment & Configuration)
  • End-to-end Reliability of Non-deterministic Stateful Components: proposed research (resolves challenges at Run-time)
Resolution 1: QoS Specification
  • Component QoS Modeling Language (CQML)
  • A modeling framework for declarative QoS specification
  • Reusable for multiple composition modeling languages
  • Failover unit for Fault-tolerance (see the data-model sketch after this list)
  • Capture the granularity of failover
  • Specify # of replicas
  • Network-level QoS
  • Annotate component connections
  • Specify priority of communication traffic
  • Bidirectional bandwidth requirements
  • Security QoS
  • Real-time CORBA configuration
  • Event channel configuration
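To make the annotations above concrete, here is an illustrative data model (plain C++, not CQML's actual metamodel; all names are hypothetical) of two kinds of requirements a specification captures: a failover unit with its replication degree, and a connection-level network QoS annotation with a traffic class & bidirectional bandwidth reservations.

    // Illustrative data model only; it does not mirror CQML's metamodel.
    #include <string>
    #include <vector>

    struct FailoverUnit {
      std::vector<std::string> components;  // granularity: components that fail over together
      unsigned numReplicas;                 // # of replicas of the whole unit
    };

    enum class TrafficClass { HighPriority, NormalPriority, BestEffort };

    struct ConnectionQoS {
      std::string sourcePort, destinationPort;  // the annotated component connection
      TrafficClass trafficClass;                // e.g., high priority, low latency
      unsigned forwardKbps, reverseKbps;        // bidirectional bandwidth requirements
    };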
Separation of Concerns in CQML
  • Resolving tangling of functional composition & QoS concerns
  • Separate Structural view from the QoS view
  • GRAFT transformations use aspect-oriented model weaving to coalesce both the views of the model
Resolving Granularity of QoS Associations in CQML
  • Commonality/Variability analysis of composition modeling languages
  • e.g., PICML for CCM, J2EEML for J2EE, ESML for Boeing Bold-Stroke
  • Feature model of composition modeling languages
  • Dictates QoS association granularity
Composition Modeling Language
  • Enhance the composition modeling language (PICML, J2EEML, or ESML) to model QoS
  • GME meta-model composition
Composing CQML (1/3, 2/3, 3/3)
  • Goal: create reusable & loosely coupled associations between the composition modeling language & concrete QoS elements
  • Dependency Inversion Principle: the CQML join-point model decouples the composition language from the concrete QoS elements
  • Abstract QoS elements group concrete QoS elements using an is-a relationship
Evaluating Composability of CQML
  • Three composition modeling languages
  • PICML
  • J2EEML
  • ESML
  • Available feature-set determines the extent of applicability of the join-point model
  • Three composite languages with varying QoS modeling capabilities
  • PICML’
  • J2EEML’
  • ESML’
Relevant Publications (first-authored)
  • CQML: Aspect-oriented Modeling for Modularizing & Weaving QoS Concerns in Component-based Systems, ECBS 2009
  • Towards a QoS Modeling & Modularization Framework for Component-based Systems, AQuSerM 2008
  • Model-driven Engineering for Development-time QoS Validation of Component-based Software Systems, RTWS 2006
Resolution 2: Post-specification Phases of the Development Lifecycle
  • Generative Aspects for Fault-Tolerance (GRAFT) resolves challenges in Deployment & Configuration
Generative Aspects for Fault Tolerance (GRAFT)
  • Multi-stage model-driven generative process
  • Incremental model-refinement using transformations
  • Model-to-model
  • Model-to-text
  • Model-to-code
  • Weaves dependability concerns in system artifacts
Stage 1: Isomorphic M2M Transformation
  • Step 1: Model the structural composition of the operational string
  • Step 2: Annotate components with failover unit(s), marking them “fault-tolerant” in the QoS view
  • Step 3: Use an aspect-oriented M2M transformation developed using the Embedded Constraint Language (ECL) of C-SAW
  • Step 4: Component replicas & interconnections are generated automatically
  • Step 5: FOU annotations are removed but other QoS annotations are cloned (uses the Dependency Inversion Principle of CQML)
  • Step 6: The isomorphic clone can be modified manually (reliability through diversity)
Stage 2: Determine Component Placement
  • Strategic placement of components
  • Improves availability of the system
  • Several constraint satisfaction algorithms exist
  • Placement comparison heuristic (sketched after this list)
  • Hop-count between replicas
  • Formulation based on the co-failure probabilities captured using Shared Risk Group (SRG)
  • E.g., shared power supply, A/C, fire zone
  • Reduces simultaneous failure probability
  • GRAFT transformations weave the decisions back into the model
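A minimal sketch of the comparison heuristic described above, assuming independent failures of shared risk groups and hypothetical Node/Placement types; this is an illustration, not the dissertation's actual formulation.

    // Compare two candidate primary/replica placements by the co-failure
    // probability induced by shared risk groups; break ties by hop count.
    #include <set>
    #include <string>

    struct Node {
      std::string name;
      std::set<std::string> riskGroups;  // e.g., shared power supply, A/C, fire zone
    };

    struct Placement {
      Node primaryHost;
      Node replicaHost;
      int hopCount;  // network hops between the two hosts
    };

    // P(both hosts fail together) = 1 - prod(1 - p_g) over the shared risk
    // groups, given per-group failure probability p_g and independence.
    double coFailureProbability(const Placement& p, double perGroupFailureProb) {
      double jointSurvival = 1.0;
      for (const auto& g : p.primaryHost.riskGroups)
        if (p.replicaHost.riskGroups.count(g))
          jointSurvival *= (1.0 - perGroupFailureProb);
      return 1.0 - jointSurvival;
    }

    // Prefer the placement with the lower simultaneous-failure probability;
    // break ties by placing the replica more hops away from the primary.
    bool betterPlacement(const Placement& a, const Placement& b, double p) {
      double pa = coFailureProbability(a, p);
      double pb = coFailureProbability(b, p);
      if (pa != pb) return pa < pb;
      return a.hopCount > b.hopCount;
    }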
Stage 3: Synthesizing Fault Monitoring Infrastructure
  • A transformation algorithm takes the failover unit, the QoS view, & the structural view and synthesizes a fault detector plus collocated heartbeat components
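What such heartbeat-based detection amounts to can be sketched as follows; names are hypothetical, and the synthesized infrastructure consists of deployed components rather than this class.

    // Each monitored component sends periodic heartbeats; the detector suspects
    // a fail-stop failure when a heartbeat misses its deadline.
    #include <chrono>
    #include <map>
    #include <string>

    class FaultDetector {
      using Clock = std::chrono::steady_clock;
     public:
      explicit FaultDetector(std::chrono::milliseconds timeout) : timeout_(timeout) {}

      // Called by a collocated heartbeat component on every beat.
      void heartbeat(const std::string& componentId) {
        lastBeat_[componentId] = Clock::now();
      }

      // True if the component missed its heartbeat deadline.
      bool suspected(const std::string& componentId) const {
        auto it = lastBeat_.find(componentId);
        return it == lastBeat_.end() || Clock::now() - it->second > timeout_;
      }

     private:
      std::chrono::milliseconds timeout_;
      std::map<std::string, Clock::time_point> lastBeat_;
    };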
Stage 4: Synthesizing Code for Group Failover (1/2)
  • Code generation for fault handling
  • Reliable fault detection
  • Transparent fault masking
  • Fast client failover
  • Location of failure determines handling behavior
  • (Figure: an operational string with head & tail components inside a failover unit (FOU))
  • FOU shutdown is achieved using seamless integration with D&C middleware APIs
  • e.g., Domain Application Manager (DAM) of CCM
  • Shutdown method calls are generated in fault-handling code
Stage 4: Synthesizing Code for Group Failover (2/2)
  • Two behaviors based on component position
  • FOU participant’s behavior
  • Detects the failure
  • Shuts down the FOU including itself
  • FOU client’s behavior
  • Detects the failure
  • Does an automatic failover to a replica FOU
  • Optionally shuts down the FOU to save resources
  • Generated code: AspectC++
  • AspectC++ compiler weaves in the generated code in the respective component stubs
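The woven advice itself is generated AspectC++ and is not reproduced here; the plain-C++ sketch below only illustrates the kind of client-side fault-masking logic it wraps around a remote invocation (CommFailure, shutdown(), and the call signature are hypothetical).

    // Try the current failover unit; on failure, shut it down (e.g., via the
    // D&C Domain Application Manager) and fail over to the next replica FOU.
    #include <cstddef>
    #include <stdexcept>
    #include <vector>

    struct CommFailure : std::runtime_error {
      using std::runtime_error::runtime_error;
    };

    template <typename ReplyT, typename FouRef, typename Call>
    ReplyT invokeWithGroupFailover(std::vector<FouRef>& fouReplicas, Call call) {
      for (std::size_t i = 0; i < fouReplicas.size(); ++i) {
        try {
          return call(fouReplicas[i]);   // attempt the remote invocation
        } catch (const CommFailure&) {
          fouReplicas[i].shutdown();     // shut down the failed failover unit
          // fall through: transparent failover to the next replica FOU
        }
      }
      throw CommFailure("all failover-unit replicas exhausted");
    }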
Stage 5: Synthesizing Platform-specific Metadata
  • GRAFT synthesizes the necessary artifacts for transparent FT provisioning for DRE operational strings
  • Component Technologies use XML metadata to configure middleware
  • Existing model interpreters can be reused without any modifications
  • CQML’s FT modeling is opaque to existing model interpreters
  • GRAFT model transformations are transparent to the model interpreters
Evaluating Modeling Effort Reduction Using GRAFT
  • Case study: Warehouse Inventory Tracking System
  • GRAFT’s isomorphic M2M transformation eliminates human modeling efforts of replicas
  • Components
  • Connections
  • QoS requirements
Evaluating Programming Effort Reduction Using GRAFT
  • GRAFT’s code generator reduces human programming efforts
  • Code for fault-detection, fault-masking, & failover
  • # of try blocks
  • # of catch blocks
  • Total # of lines
Evaluating Client-Perceived Failover Latency Using GRAFT
  • Client perceived failover latency
  • Sensitive to the location of failure
  • Sensitive to the implementation of DAM
  • Head component failure
  • Constant failover latency
  • Tail component failure
  • Linear increase in failover latency
  • (Graphs: client-perceived failover latency for head component failure vs. tail component failure)
Relevant Publications (first-authored & co-authored)
  • Fault-tolerance for Component-based Systems – An Automated Middleware Specialization Approach, ISORC 2009
  • MDDPro: Model-Driven Dependability Provisioning in Enterprise Distributed Real-Time & Embedded Systems, ISAS 2007
  • Supporting Component-based Failover Units in Middleware for Distributed Real-time & Embedded Systems, JSA Elsevier (under submission)
Proposed Research: End-to-end Reliability of Non-deterministic Stateful Components
  • Addresses the run-time phase of the development lifecycle
Execution Semantics & High Availability
  • Execution semantics in distributed systems
  • May-be – No more than once, not all subcomponents may execute
  • At-most-once – No more than once, all-or-none of the subcomponents will be executed (e.g., Transactions)
  • Transaction abort decisions are not transparent
  • At-least-once – All or some subcomponents may execute more than once
  • Applicable to idempotent requests only
  • Exactly-once – All subcomponents execute once & once only
  • Enhances perceived availability of the system
  • Exactly-once semantics should hold even upon failures
  • Equivalent to single fault-free execution
  • Roll-forward recovery (replication) may violate exactly-once semantics
  • Side-effects of replication must be rectified
  • Partial execution should seem like a no-op upon recovery
Exactly-once Semantics, Failures, & Determinism
  • Deterministic component A
  • Caching of the request/reply at component B is sufficient & rectifies the problem (see the sketch after this list)
  • Non-deterministic component A
  • Two possibilities upon failover
  • No invocation
  • Different invocation
  • Caching of request/reply does not help
  • Non-deterministic code must re-execute
  • Orphan requests & orphan state
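As noted above, when the upstream component is deterministic, caching the request/reply at the downstream component is enough to preserve exactly-once behavior across a failover. A minimal sketch follows (hypothetical names, assuming every request carries a unique id); this is exactly the scheme that breaks down when a recovered non-deterministic component issues a different request, or none at all.

    // Downstream-side duplicate suppression: replay the cached reply when the
    // same request id arrives again after the upstream component fails over.
    #include <map>
    #include <string>

    struct Reply { std::string payload; };

    class RequestReplyCache {
     public:
      // Executes the request at most once; duplicates get the cached reply.
      template <typename Execute>
      Reply invokeOnce(const std::string& requestId, Execute execute) {
        auto it = cache_.find(requestId);
        if (it != cache_.end())
          return it->second;        // duplicate after failover: no re-execution
        Reply r = execute();        // first (and only) real execution
        cache_[requestId] = r;      // remember the reply for future duplicates
        return r;
      }
     private:
      std::map<std::string, Reply> cache_;
    };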
Related Research: End-to-end Reliability
  • Database in the last tier
  • Deterministic scheduling
  • Program analysis to compensate for non-determinism
Unresolved Challenges: End-to-end Reliability of Non-deterministic Stateful Components
  • Integration of replication & transactions
  • Applicable to multi-tier transactional web-based systems only
  • Overhead of transactions (fault-free situation)
  • Join operations in the critical path
  • 2 phase commit (2PC) protocol at the end of invocation
  • Overhead of transactions (faulty situation)
  • Must rollback to avoid orphan state
  • Re-execute & 2PC again upon recovery
  • Complex tangling of QoS: Schedulability & Reliability
  • Schedulability of rollbacks & join must be ensured
  • Transactional semantics are not transparent
  • Developers must implement prepare, commit, & rollback (the 2PC phases) by hand (see the interface sketch below)
  • (Figure: without transactions the potential orphan state keeps growing; with transactions the orphan state is bounded within components B, C, & D)
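To make the transparency problem concrete, the sketch below shows the kind of two-phase-commit participant interface a component developer ends up implementing when transactions are used to bound orphan state; the interface is hypothetical, not a specific transaction service's API.

    // 2PC participant obligations a developer must implement manually.
    #include <string>

    class TwoPhaseCommitParticipant {
     public:
      virtual ~TwoPhaseCommitParticipant() = default;

      // Phase 1: persist tentative state updates; vote on whether they can commit.
      virtual bool prepare(const std::string& txnId) = 0;

      // Phase 2a: make the tentative state updates permanent.
      virtual void commit(const std::string& txnId) = 0;

      // Phase 2b: discard tentative updates (avoiding orphan state after a
      // mid-invocation failure); the invocation is then re-executed & 2PC re-run.
      virtual void rollback(const std::string& txnId) = 0;
    };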
  • Enforcing determinism …