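Deploying Claude Code Skills to an AWS AgentCore Managed Harness Agent

Solution Overview

This solution packages the AWS best-practice assessment capabilities that previously ran as Skills in Claude Code / Cursor and deploys them to an Amazon Bedrock AgentCore Managed Harness Agent. The result is a cloud-hosted AI agent service that needs no local environment and is invoked on demand.

Core Idea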

Assessment Skill files (SKILL.md + references/)    AWS Knowledge MCP Server
        ↓ baked into the container                       ↓ direct remote_mcp connection
Ubuntu arm64 container image                       https://knowledge-mcp.global.api.aws
  (AWS CLI + kubectl + 6 Skills)                         ↓
        ↓ pushed to ECR                            Harness tool configuration
        ↓                                                ↓
AgentCore Managed Harness ←─── skills[] + tools[remote_mcp] + CWL Observability
        ↓
invoke_harness API → Agent loads Skill + searches docs via MCP + runs shell commands → report

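1. Architecture Comparison

Dimension	Claude Code (local)	AgentCore Managed Harness
Runtime environment	Local terminal	AWS-managed microVM (isolated sandbox)
Model invocation	Claude API	Amazon Bedrock (choice of models)
Skill loading	`.agents/skills/` directory	Pre-installed in the container + `skills` parameter pointing to the paths
Tool execution	Local shell	Shell inside the microVM (built in)
MCP tools	Local connection to an MCP Server	Native direct connection via `remote_mcp`
State management	None	Stateful sessions (can be persisted)
Scalability	Single user	API calls, supports concurrent multi-user access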

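2. Skill and Tool Inventory

Six assessment Skills are pre-installed in the container; MCP tools are attached natively through remote_mcp.

Assessment Skills in the Container

Skill	Purpose
`aws-best-practice-research`	AWS service best-practice research and assessment
`eks-workload-best-practice-assessment`	In-depth EKS workload-level assessment
`aws-service-chaos-research`	Chaos engineering research
`aws-fis-experiment-prepare`	AWS FIS experiment preparation
`aws-fis-experiment-execute`	AWS FIS experiment execution
`app-service-log-analysis`	Application/service log analysis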

Remote MCP Tools (connected directly via the Harness `tools` configuration)

{
  "type": "remote_mcp",
  "name": "aws-knowledge-mcp",
  "config": {
    "remoteMcp": {
      "url": "https://knowledge-mcp.global.api.aws"
    }
  }
}

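Limitation: remote_mcp accepts only a URL plus optional headers and does not support SigV4 signing. To call an MCP Server on AgentCore Runtime that requires SigV4, you must wrap it with an AgentCore Gateway (see the Gateway tools section below).

The Harness automatically discovers the following MCP tools and injects them for the Agent to use:

MCP Tool	Purpose
`aws___search_documentation`	Search AWS documentation (categorized by topic)
`aws___read_documentation`	Read the content of a specific AWS documentation page
`aws___recommend`	Get recommendations for related documentation
`aws___get_regional_availability`	Look up the regional availability of services/APIs
`aws___list_regions`	List all AWS regions
`aws___retrieve_agent_sop`	Retrieve standard operating procedures (SOPs)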

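AgentCore Gateway Tools (calling SigV4 MCP Servers through a Gateway)

When the target MCP Server is deployed on AgentCore Runtime (and therefore requires SigV4 authentication), the Harness cannot connect to it directly with remote_mcp. The workaround:

Create an AgentCore Gateway with inbound auth set to AWS_IAM
Add an MCP server target with outbound auth set to GATEWAY_IAM_ROLE and service=bedrock-agentcore
Mount it in the Harness using the agentcore_gateway tool type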

{
  "type": "agentcore_gateway",
  "name": "billing-gateway",
  "config": {
    "agentCoreGateway": {
      "gatewayArn": "arn:aws:bedrock-agentcore:us-east-1:123456789012:gateway/my-gateway-id"
    }
  }
}

IAM permission requirements:

Gateway role: needs the bedrock-agentcore:InvokeAgentRuntime permission (to call the target MCP runtime)
Harness role: needs the bedrock-agentcore:InvokeGateway permission (to call the Gateway itself)
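
The two permissions above can be sketched as minimal single-statement policies. This is only an illustration: the helper name single_action_policy and the runtime ARN are hypothetical, and the resource scoping may need to be adjusted (for example, widened to cover runtime endpoints) for a real deployment.

```python
import json


def single_action_policy(action: str, resource_arn: str) -> dict:
    """Build an IAM policy document that allows one action on one resource."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": action, "Resource": resource_arn}
        ],
    }


# Hypothetical ARN of the MCP server runtime behind the Gateway.
RUNTIME_ARN = "arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/my-mcp-runtime-id"
# Gateway ARN in the same shape as the agentcore_gateway example above.
GATEWAY_ARN = "arn:aws:bedrock-agentcore:us-east-1:123456789012:gateway/my-gateway-id"

# Gateway role: allowed to invoke the target MCP runtime.
gateway_role_policy = single_action_policy(
    "bedrock-agentcore:InvokeAgentRuntime", RUNTIME_ARN
)
# Harness role: allowed to invoke the Gateway itself.
harness_role_policy = single_action_policy(
    "bedrock-agentcore:InvokeGateway", GATEWAY_ARN
)

# Serialized form, suitable for saving to a file and passing to
# `aws iam put-role-policy --policy-document file://...`.
gateway_role_policy_json = json.dumps(gateway_role_policy, indent=2)
harness_role_policy_json = json.dumps(harness_role_policy, indent=2)
```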

Gateway tool names are automatically prefixed with <target-name>___, for example billing-mcp-target___cost-explorer.
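
The naming rule can be expressed as a pair of small helpers. This is a sketch that assumes the separator is always exactly three underscores, as in the examples in this document; the function names are hypothetical.

```python
from typing import Tuple

SEPARATOR = "___"  # three underscores, as seen in the tool names above


def gateway_tool_name(target_name: str, tool_name: str) -> str:
    """Return the prefixed name under which a Gateway target's tool is exposed."""
    return f"{target_name}{SEPARATOR}{tool_name}"


def split_tool_name(full_name: str) -> Tuple[str, str]:
    """Split a prefixed tool name into (prefix, tool) at the first separator."""
    prefix, sep, tool = full_name.partition(SEPARATOR)
    if not sep:
        raise ValueError(f"{full_name!r} has no {SEPARATOR!r} separator")
    return prefix, tool
```

Splitting the name this way can help when filtering logs or traces by the target that served a tool call.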

3. Building the Container Image

3.1 Directory Structure

harness-container/
├── Dockerfile
└── skills/
    ├── aws-best-practice-research/
    │   ├── SKILL.md
    │   ├── references/
    │   │   ├── assessment-workflow.md
    │   │   ├── output-template.md
    │   │   └── search-queries.md
    │   ├── README.md
    │   └── README_CN.md
    ├── eks-workload-best-practice-assessment/
    │   ├── SKILL.md
    │   ├── references/
    │   └── ...
    ├── aws-service-chaos-research/
    ├── aws-fis-experiment-prepare/
    ├── aws-fis-experiment-execute/
    └── app-service-log-analysis/
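
Before building, it can help to confirm that every directory under skills/ really contains a SKILL.md. The following is a minimal sketch; find_invalid_skills is a hypothetical helper, not part of any AgentCore tooling.

```python
from pathlib import Path
from typing import List


def find_invalid_skills(skills_root: str) -> List[str]:
    """Return the names of skill directories that lack a SKILL.md file."""
    root = Path(skills_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not (entry / "SKILL.md").is_file()
    )
```

Calling find_invalid_skills("harness-container/skills") and failing the build when the result is non-empty catches an incomplete copy before the image is pushed.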

3.2 Dockerfile

FROM ubuntu:24.04

ENV DEBIAN_FRONTEND=noninteractive
ENV PATH="/usr/local/bin:${PATH}"

# Core tools
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl unzip jq git python3 python3-pip ca-certificates groff less \
    && rm -rf /var/lib/apt/lists/*

# AWS CLI v2 (arm64)
RUN curl -sL "https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip" -o /tmp/awscliv2.zip \
    && cd /tmp && unzip -qo awscliv2.zip && ./aws/install \
    && rm -rf /tmp/aws /tmp/awscliv2.zip

# kubectl
RUN curl -sLO "https://dl.k8s.io/release/$(curl -sL https://dl.k8s.io/release/stable.txt)/bin/linux/arm64/kubectl" \
    && install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl \
    && rm kubectl

# eksctl
RUN curl -sL "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_Linux_arm64.tar.gz" \
    | tar xz -C /usr/local/bin

# Assessment skills baked in (MCP tools via remote_mcp, no wrapper skills needed)
COPY skills/ /opt/.agents/skills/

# Verification
RUN aws --version && kubectl version --client=true 2>/dev/null && echo "Build OK"

Key constraint: AgentCore requires the container image to be built for the linux/arm64 architecture.
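
A quick local check of the architecture constraint might look like the sketch below. It assumes the docker CLI is installed and the image exists locally; the function names are hypothetical.

```python
import subprocess

REQUIRED_PLATFORM = "linux/arm64"


def is_supported_platform(platform: str) -> bool:
    """Check an 'os/arch' string against the platform AgentCore requires."""
    return platform.strip().lower() == REQUIRED_PLATFORM


def image_platform(image: str) -> str:
    """Ask the local Docker daemon for an image's 'os/arch' string."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Os}}/{{.Architecture}}", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

For example, is_supported_platform(image_platform("harness-skills-agent:latest")) should be true before the image is pushed.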

3.3 Build and Push

# Prepare the skill files
mkdir -p harness-container/skills

# Copy the assessment skills (from the GitHub repository)
git clone --depth 1 https://github.com/<YOUR_ORG>/skills.git /tmp/skills-repo
for d in /tmp/skills-repo/*/; do
  [ -f "${d}SKILL.md" ] && cp -r "$d" harness-container/skills/
done

# Build the arm64 image
docker build --platform linux/arm64 -t harness-skills-agent:latest harness-container/

# Create the ECR repository
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
REGION="us-east-1"
aws ecr create-repository --repository-name harness-skills-agent --region "$REGION"

# Push to ECR
ECR_URI="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/harness-skills-agent:latest"
aws ecr get-login-password --region "$REGION" \
  | docker login --username AWS --password-stdin "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
docker tag harness-skills-agent:latest "$ECR_URI"
docker push "$ECR_URI"

# Alternative: push with crane (works around Docker daemon networking issues)
# docker save harness-skills-agent:latest -o /tmp/image.tar
# crane auth login "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com" \
#   -u AWS -p "$(aws ecr get-login-password --region "$REGION")"
# crane push /tmp/image.tar "$ECR_URI"

4. IAM Configuration

4.1 Creating the Execution Role

cat > trust-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role \
  --role-name AgentCoreHarnessExecutionRole \
  --assume-role-policy-document file://trust-policy.json

4.2 Attaching Permission Policies

# Pull images from ECR
aws iam attach-role-policy --role-name AgentCoreHarnessExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

# CloudWatch Logs (Observability)
aws iam attach-role-policy --role-name AgentCoreHarnessExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess

# X-Ray (distributed tracing)
aws iam attach-role-policy --role-name AgentCoreHarnessExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess

# Custom: Bedrock model invocation + read-only access to target services
cat > harness-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockModelInvocation",
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
      "Resource": "*"
    },
    {
      "Sid": "AWSReadAccess",
      "Effect": "Allow",
      "Action": [
        "eks:Describe*", "eks:List*",
        "ec2:Describe*",
        "autoscaling:Describe*",
        "iam:GetRole", "iam:GetPolicy", "iam:ListAttachedRolePolicies",
        "elasticloadbalancing:Describe*",
        "logs:DescribeLogGroups", "logs:GetLogEvents",
        "cloudwatch:GetMetricData", "cloudwatch:DescribeAlarms",
        "sts:GetCallerIdentity", "tag:GetResources"
      ],
      "Resource": "*"
    }
  ]
}
EOF

aws iam put-role-policy --role-name AgentCoreHarnessExecutionRole \
  --policy-name HarnessExecutionPolicy \
  --policy-document file://harness-policy.json
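
Because the Agent runs shell commands with this role, it is worth checking that the AWSReadAccess statement stays read-only as it grows. The sketch below uses a simple naming heuristic (actions whose verb starts with Describe, Get, or List); that heuristic is an assumption made for illustration, not an IAM feature, and the function name is hypothetical.

```python
from typing import List

# Heuristic: verbs that conventionally denote read-only AWS API actions.
READ_ONLY_VERBS = ("Describe", "Get", "List")


def non_read_only_actions(policy: dict, sid: str) -> List[str]:
    """Return actions in the statement `sid` that do not look read-only."""
    for statement in policy["Statement"]:
        if statement.get("Sid") == sid:
            actions = statement["Action"]
            if isinstance(actions, str):
                actions = [actions]
            # "eks:Describe*" -> verb part "Describe*"; a bare "*" has no verb part.
            return [
                action
                for action in actions
                if not action.partition(":")[2].startswith(READ_ONLY_VERBS)
            ]
    raise KeyError(f"no statement with Sid {sid!r}")
```

Loading harness-policy.json with json.load and passing it to non_read_only_actions(policy, "AWSReadAccess") should return an empty list for the policy shown above.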

5. Creating the Harness Agent

5.1 Creating with boto3

import boto3

client = boto3.client('bedrock-agentcore-control', region_name='us-east-1')

ACCOUNT_ID = "<YOUR_ACCOUNT_ID>"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/AgentCoreHarnessExecutionRole"
CONTAINER_URI = f"{ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/harness-skills-agent:latest"

SKILL_PATHS = [
    "/opt/.agents/skills/aws-best-practice-research",
    "/opt/.agents/skills/eks-workload-best-practice-assessment",
    "/opt/.agents/skills/aws-service-chaos-research",
    "/opt/.agents/skills/aws-fis-experiment-prepare",
    "/opt/.agents/skills/aws-fis-experiment-execute",
    "/opt/.agents/skills/app-service-log-analysis",
]

resp = client.create_harness(
    harnessName="skills_agent",
    executionRoleArn=ROLE_ARN,

    environmentArtifact={
        "containerConfiguration": {"containerUri": CONTAINER_URI}
    },

    model={
        "bedrockModelConfig": {
            "modelId": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
            "maxTokens": 16384,
        }
    },

    systemPrompt=[{
        "text": """You are an expert AWS Solutions Architect with access to
multiple assessment skills and AWS documentation tools.

## Assessment Skills (pre-installed at /opt/.agents/skills/)

- aws-best-practice-research: AWS service best practice assessment
- eks-workload-best-practice-assessment: EKS workload deep assessment
- aws-service-chaos-research: Chaos engineering research
- aws-fis-experiment-prepare / execute: FIS experiment lifecycle
- app-service-log-analysis: Application/service log analysis

## AWS Documentation Tools (via MCP)
You have aws-knowledge-mcp-server tools: search_documentation,
read_documentation, recommend, get_regional_availability, list_regions,
retrieve_agent_sop.

When asked to perform an assessment:

1. Load the appropriate SKILL.md and follow its workflow
2. Use MCP tools to search/read AWS documentation for best practices
3. Use shell to run AWS CLI, kubectl commands for live checks
4. Produce a comprehensive report

All CLI tools (aws, kubectl, eksctl, jq, git) are pre-installed.
Always use --region flag. Output in Chinese unless specified otherwise."""
    }],

    # remote_mcp connects natively and directly to the AWS Knowledge MCP Server
    tools=[{
        "type": "remote_mcp",
        "name": "aws-knowledge-mcp",
        "config": {
            "remoteMcp": {
                "url": "https://knowledge-mcp.global.api.aws"
            }
        }
    }],

    skills=[{"path": p} for p in SKILL_PATHS],

    environment={
        "agentCoreRuntimeEnvironment": {
            "lifecycleConfiguration": {
                "idleRuntimeSessionTimeout": 1800,
                "maxLifetime": 7200,
            },
            "networkConfiguration": {"networkMode": "PUBLIC"}
        }
    },

    maxIterations=50,
    timeoutSeconds=3600,

    tags={
        "project": "best-practice-assessment",
        "owner": "<YOUR_ALIAS>",
    },
)

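5.2 Observability (CloudWatch Logs)

The Harness automatically emits traces, logs, and metrics to CloudWatch with no extra configuration:

Traces: each invoke automatically records the timing and payload of model calls, tool executions, and shell commands
Logs: streamed to CloudWatch Logs
Metrics: session count, latency, token usage, and error rate (OpenTelemetry format)

Prerequisites:

The Execution Role needs the CloudWatchLogsFullAccess and AWSXRayDaemonWriteAccess policies
CloudWatch Transaction Search must be enabled once per account

Viewing logs: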

# AgentCore CLI
agentcore logs --harness skills_agent
agentcore traces list --harness skills_agent

# CloudWatch Console → AgentCore Observability Dashboard

5.3 Notes on Model Selection

Error	Cause	Fix
`The provided model identifier is invalid`	A foundation model ID was used directly	Use an inference profile ID instead
`Invocation with on-demand throughput isn't supported`	A foundation model ID was used	Use a profile with the `us.xxx` or `global.xxx` prefix
`marked by provider as Legacy`	The model is outdated (Legacy)	Choose a model in ACTIVE status

# List the available inference profiles
aws bedrock list-inference-profiles --region us-east-1 \
  --query "inferenceProfileSummaries[?contains(inferenceProfileId,'sonnet') && status=='ACTIVE'].{id:inferenceProfileId,name:inferenceProfileName}" \
  --output table
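
A cheap guard against the first two errors is to check the prefix of the model ID before creating the Harness. This is a heuristic sketch: the text above only mentions the us. and global. prefixes, while eu. and apac. are added here as further known geography prefixes, and the list is not exhaustive.

```python
# Prefixes that mark a cross-Region inference profile ID (not exhaustive).
PROFILE_PREFIXES = ("us.", "eu.", "apac.", "global.")


def looks_like_inference_profile(model_id: str) -> bool:
    """Heuristically tell an inference profile ID from a bare foundation model ID."""
    return model_id.startswith(PROFILE_PREFIXES)
```

The authoritative source remains the output of list-inference-profiles; this check only catches the common mistake of pasting a foundation model ID.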

6. Invoking the Agent

6.1 Direct Invocation (Python)

import boto3, uuid
from botocore.config import Config

config = Config(read_timeout=900)  # assessments run long, so use a generous timeout
client = boto3.client('bedrock-agentcore', region_name='us-east-1', config=config)

HARNESS_ARN = "arn:aws:bedrock-agentcore:us-east-1:<ACCOUNT_ID>:harness/<HARNESS_ID>"
SESSION_ID = str(uuid.uuid4()) + "-assessment"

resp = client.invoke_harness(
    harnessArn=HARNESS_ARN,
    runtimeSessionId=SESSION_ID,
    skills=[{"path": "/opt/.agents/skills/aws-best-practice-research"}],
    messages=[{
        "role": "user",
        "content": [{"text": "Run a best-practice assessment of the my-cluster EKS cluster in us-west-2"}]
    }],
)

for event in resp["stream"]:
    if "contentBlockDelta" in event:
        delta = event["contentBlockDelta"].get("delta", {})
        if "text" in delta:
            print(delta["text"], end="", flush=True)
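
When the full reply is needed as a string (for example, to save the report), the same event handling can be wrapped in a helper. This sketch assumes only the event shape used in the loop above; collect_text is a hypothetical name.

```python
from typing import Iterable


def collect_text(events: Iterable[dict]) -> str:
    """Concatenate the text deltas from a stream of harness response events."""
    parts = []
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        text = delta.get("text")
        if text:
            parts.append(text)
    return "".join(parts)
```

With a live response this would be called as collect_text(resp["stream"]); events without a text delta are skipped.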

6.2 Reading Files Generated by the Agent

resp = client.invoke_agent_runtime_command(
    agentRuntimeArn=HARNESS_ARN,
    runtimeSessionId=SESSION_ID,
    body={"command": "cat /tmp/eks-report.md"},
)
for event in resp["stream"]:
    chunk = event.get("chunk", {})
    if "contentDelta" in chunk:
        if "stdout" in chunk["contentDelta"]:
            print(chunk["contentDelta"]["stdout"], end="")

6.3 Multi-Turn Conversations (Same Session)

# Turn 1: assessment
client.invoke_harness(
    harnessArn=HARNESS_ARN,
    runtimeSessionId=SESSION_ID,  # keep the same session
    messages=[{"role": "user", "content": [{"text": "Assess my-cluster"}]}],
)

# Turn 2: follow-up (the agent retains the context)
client.invoke_harness(
    harnessArn=HARNESS_ARN,
    runtimeSessionId=SESSION_ID,  # same session
    messages=[{"role": "user", "content": [{"text": "Generate a remediation script for me"}]}],
)

7. Pitfalls

7.1 Region Support

AgentCore Managed Harness is currently in Preview and is available only in:

us-east-1 (N. Virginia)
us-west-2 (Oregon)
eu-central-1 (Frankfurt)
ap-southeast-2 (Sydney)

Tokyo (ap-northeast-1) does not support the Harness, but the Agent can still operate on AWS resources in other regions (via the --region flag).
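
The split between the region that hosts the Harness and the region of the resources being assessed can be made explicit in code. This is a small sketch based on the Preview region list above, which may change; the names are hypothetical.

```python
# Regions listed above as supporting Managed Harness during Preview.
HARNESS_REGIONS = {"us-east-1", "us-west-2", "eu-central-1", "ap-southeast-2"}


def pick_harness_region(preferred: str, fallback: str = "us-east-1") -> str:
    """Host the Harness in `preferred` if supported, otherwise in `fallback`."""
    return preferred if preferred in HARNESS_REGIONS else fallback
```

For a cluster in ap-northeast-1, the Harness would be hosted in the fallback region while the prompt and CLI commands still pass --region ap-northeast-1.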

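7.2 The Base Environment Is Minimal

The default AgentCore base environment is Amazon Linux 2023 minimal:

No aws CLI
No pip / python3-pip
No curl (only curl-minimal, which has package conflicts)
No find, which, or unzip

Solution: build a custom container image with every dependency pre-installed.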

7.3 ECR Permissions

The Harness Execution Role must have ECR pull permissions to use a custom container:

aws iam attach-role-policy --role-name AgentCoreHarnessExecutionRole \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly

Skipping this step results in: ECR access denied ... ecr:GetAuthorizationToken

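7.4 Streaming Read Timeouts

While the Agent runs an assessment, long tool calls (shell commands) can leave the stream idle long enough for the client to time out. Raise the client's read timeout:

import boto3
from botocore.config import Config

config = Config(read_timeout=900)  # 15 minutes
client = boto3.client('bedrock-agentcore', config=config)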

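7.5 Container ENTRYPOINT

AgentCore overrides the container's ENTRYPOINT and CMD. Do not rely on the container startup command; put all initialization in RUN steps in the Dockerfile, or run it through InvokeAgentRuntimeCommand after the session starts.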

7.6 remote_mcp Does Not Support SigV4

The Harness remote_mcp tool type accepts only a URL plus optional headers and does not support SigV4 signing. This means:

Public MCP Servers (such as https://knowledge-mcp.global.api.aws) can be connected directly
MCP Servers deployed on AgentCore Runtime (such as the billing MCP) must be wrapped with an AgentCore Gateway and attached using the agentcore_gateway tool type

Architecture comparison:

Target MCP Server	Authentication	Harness tool type
Public HTTP(S)	None / API key	`remote_mcp` (URL + headers)
AgentCore Runtime MCP	SigV4	`agentcore_gateway` (requires creating a Gateway + MCP target)
MCP requiring OAuth	OAuth 2.0	`agentcore_gateway` (Gateway + OAuth credential provider)
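
The decision table can be sketched as a helper that returns the matching tools entry. The dictionary shapes follow the two JSON examples in section 2; the function name and the auth labels are hypothetical, and header configuration for API keys is left out because its field name is not shown in this document.

```python
from typing import Optional


def mcp_tool_config(
    name: str,
    auth: str,
    url: Optional[str] = None,
    gateway_arn: Optional[str] = None,
) -> dict:
    """Build a Harness tool entry for an MCP server based on its auth method."""
    if auth in ("none", "api_key"):
        # Public endpoints can be attached directly by URL.
        if not url:
            raise ValueError("remote_mcp requires a url")
        return {"type": "remote_mcp", "name": name,
                "config": {"remoteMcp": {"url": url}}}
    if auth in ("sigv4", "oauth"):
        # SigV4 and OAuth targets must sit behind an AgentCore Gateway.
        if not gateway_arn:
            raise ValueError("agentcore_gateway requires a gateway_arn")
        return {"type": "agentcore_gateway", "name": name,
                "config": {"agentCoreGateway": {"gatewayArn": gateway_arn}}}
    raise ValueError(f"unknown auth method: {auth!r}")
```

The returned dictionaries can be placed in the tools list passed when creating the Harness.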

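7.7 MCP-Only Harness Without Shell Dependencies

If the Harness works only through MCP tools (no shell calls to the aws CLI or kubectl), the container image can be stripped down to the bare minimum: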

FROM ubuntu:24.04
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*

Real-world example: billing_query_agent uses a minimal ubuntu arm64 image (~30MB); all cost queries go through Gateway → billing MCP tools, so the AWS CLI is not needed.

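8. Comparison of Skill Loading Methods

Method	Use case	Trade-offs
Pre-installed in the container image (this solution)	Production	Starts in seconds, versions are controlled; updates require rebuilding the image
Pulled from S3 at session start	Development and testing	Flexible, no image rebuild needed; adds latency to every session
Installing public Skills with `npx`	Using community Skills	Convenient; limited to published public Skills

# Container image method (recommended): register the skill paths when creating the harness
skills=[{"path": "/opt/.agents/skills/aws-best-practice-research"}]

# S3 method: pull the skills with an exec command after the session starts
agentcore invoke --exec --harness my-agent --session-id "$SID" \
  "aws s3 cp s3://my-bucket/skills/ .agents/skills/ --recursive"

# npx method: install community Skills
agentcore invoke --exec --harness my-agent --session-id "$SID" \
  "npx @anthropic-ai/agent-skills add xlsx github"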

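9. Future Improvements

CI/CD automation: use GitHub Actions to build the image, push it to ECR, and update the Harness automatically
Multiple Harnesses: create dedicated Harnesses by function (security assessment, cost optimization, chaos engineering)
Memory persistence: enable AgentCore Memory to remember assessment history across sessions
Custom JWT authentication: configure inbound auth on the Harness to expose a secure API externally
AgentCore Gateway: ✅ done. The billing MCP Server is registered with a Gateway and the Harness calls it through the agentcore_gateway tool (see Section 7.6)
More MCP Servers: attach more remote MCP servers (such as GitHub MCP or Jira MCP) to extend the Agent's capabilities
Inline Functions: add a human-in-the-loop approval tool so that critical operations require manual confirmation first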