MCP For Software Engineers | Part 1: Building Your First Server
New to MCP? Here's how to build your first MCP server and a host application that integrates and uses the server.
When I first tried out the Model Context Protocol (MCP) from Anthropic (March 2025), the developer experience was rough. Integrating and bundling MCP with existing applications was challenging, and there were a few security pitfalls that developers had to navigate. I wrote about that experience here.
Model Context Protocol (MCP) is a standard for how AI applications connect to tools and data sources. Simply put: your AI app needs to call tools, monitor requests, handle prompts, and get user approval. MCP standardizes all of this.
However, like all good standards or protocols, MCP has evolved and gotten better (see the recent changelog, including fixes to SDKs, improved support for remote servers, and improved auth). And I can now see it solving several critical problems faced by teams building AI applications.
Integration: Without MCP, Team X spends weeks integrating Team Z's new tool. With MCP, Team X announces day-one support for any MCP tool (including Team Z's new capabilities!).
Distribution: Without MCP, Team X writes a Cursor extension, a Windsurf plugin, a VSCode extension, a Claude Desktop add-on, and more. With MCP, Team X writes one MCP server that works everywhere.
Discovery: Without MCP, teams ask "Does anyone have a tool that does X?" With MCP, there's a central registry where teams publish and find tools.
Security: Without MCP, each team implements (or skips) their own security/auth for the tools or resources that LLMs use. With MCP, it's possible to implement centralized auth and a managed registry of MCP servers.
Runtime Flexibility: Without MCP, you're stuck with hard-coded tool configurations. With MCP, tools can be dynamically discovered based on context. Also, it can be helpful to have aspects of the application logic (e.g., tool execution on remote MCP servers) managed by an MCP server as opposed to running within the host application.
In this tutorial (part 1 of a series on MCP for Software Engineers), we will cover the following:
Building an MCP server that exposes tools (fetch news from TechCrunch)
Creating a client to connect to the server
Building a host application that uses an LLM to translate user requests to tool calls on the MCP server.
Choosing between stdio and Streamable HTTP transports for MCP
Bonus: how to use the MCP server we create in VSCode (or any other tool)
Note: This series is not for those seeking the "latest productivity hacks with MCP" in some existing host application (Cursor, Windsurf, etc.).
Key MCP Concepts in Brief
MCP has an excellent and well-maintained documentation site. In brief, here are the key concepts to get started.
Server: MCP servers expose capabilities through a standardized interface. A single server can provide multiple tools (functions to call), resources (data to read), prompts (templates for LLM interactions), and sampling (request LLM completions from the client).
Client: MCP clients maintain 1:1 connections with servers and handle the protocol communication. Hosts embed clients to talk to servers.
Host: MCP hosts are user-facing applications like Claude Desktop, Cursor, or VSCode. They use clients to connect to servers and decide which tools to call based on user needs.
Transport: MCP uses JSON-RPC 2.0 with two transport options - stdio and Streamable HTTP. Stdio runs the server as a subprocess using standard input/output - ideal for local integrations where the server runs as a subprocess of the client (e.g., IDE extensions, local development tools). Streamable HTTP uses network requests - better for web applications, distributed systems, multiple clients connecting to one server, and easier debugging/monitoring.
For this tutorial, we'll use Streamable HTTP, as it provides a better learning experience with clearer separation of concerns (you can run the server on a remote machine) and easier debugging.
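To make the choice concrete, here is a minimal sketch using the Python SDK's FastMCP API (covered properly in the next section): the same server can run over either transport by changing a single argument.

# transport_sketch.py - minimal sketch: one server, two possible transports
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Demo Server")

@mcp.tool()
def ping() -> str:
    """Trivial tool to verify connectivity."""
    return "pong"

if __name__ == "__main__":
    # Option A - stdio: the host launches this script as a subprocess
    # mcp.run(transport="stdio")
    # Option B - Streamable HTTP: the server listens on a network port
    mcp.run(transport="streamable-http")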
Building Your First MCP Server
Now that we've covered the core concepts, let's build your first MCP server. We'll use the Python MCP SDK, which is mature and widely adopted, but the same concepts apply to other languages.
Protocol vs. SDK?
MCP mostly defines a protocol, or standard - essentially a set of rules stating what clients and servers MUST/SHOULD/SHALL NOT do in order to communicate. SDKs are implementations of these rules.
While you can build your own compliant servers/clients, in general it is recommended that you use an SDK for more standardized behavior where possible.
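To see what the SDK handles for you, here is roughly the JSON-RPC initialize request a client sends at the start of a session, sketched as a Python dict (values are illustrative; the exact shape depends on the protocol revision):

initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",  # a published protocol revision
        "capabilities": {},               # what this client supports
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}

With the SDK, a single call to `session.initialize()` constructs, sends, and validates this exchange for you.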
Our goal: build a tool that can answer news-related queries like "What is the latest AI news on TechCrunch?"
1. Set Up Your Project
Create a new Python project and install the MCP SDK, along with requests (our server uses it to fetch pages) and beautifulsoup4 (optional, for cleaner text extraction).
Using uv (recommended):
uv init mcp-news-demo
cd mcp-news-demo
uv add "mcp[cli]"
Using pip:
pip install "mcp[cli]"
2. Create the MCP Server
Let's create a simple MCP server with a tool that fetches TechCrunch news.
# server.py
import os

import requests
from mcp.server.fastmcp import FastMCP

# Host/port are configurable via environment variables
mcp = FastMCP(
    "TechCrunch News Server",
    host=os.environ.get("MCP_SERVER_HOST", "localhost"),
    port=int(os.environ.get("MCP_SERVER_PORT", 8011)),
)

@mcp.tool(title="Fetch from TechCrunch")
def fetch_from_techcrunch(category: str = "latest") -> str:
    """Fetch the latest news from TechCrunch for a given category."""
    # Only allow a known set of categories; fall back to the front page
    allowed = {"ai", "startup", "security", "venture", "latest"}
    cat = category.lower()
    if cat not in allowed:
        cat = "latest"
    url = f"https://techcrunch.com/tag/{cat}/" if cat != "latest" else "https://techcrunch.com/"
    try:
        response = requests.get(url, timeout=10)
        if response.ok:
            try:
                # Strip the HTML down to readable text if BeautifulSoup is available
                from bs4 import BeautifulSoup
                soup = BeautifulSoup(response.text, "html.parser")
                text = soup.get_text(separator=" ", strip=True)
                return text[:1000] + ("..." if len(text) > 1000 else "")
            except ImportError:
                # Fall back to raw HTML if bs4 isn't installed
                return response.text[:1000] + ("..." if len(response.text) > 1000 else "")
        return "Failed to fetch news."
    except Exception as e:
        return f"Error fetching news: {str(e)}"

if __name__ == "__main__":
    mcp.run(transport="streamable-http")
3. Run Your Server
Start your server:
python server.py
You should see:
INFO: Started server process [65618]
INFO: Waiting for application startup.
[07/01/25 12:40:34] INFO StreamableHTTP session manager started
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8011 (Press CTRL+C to quit)
Your server is now running at http://localhost:8011/mcp and ready to accept requests.
Note: This example uses the high-level FastMCP API with Streamable HTTP transport. For advanced use cases, check the MCP Python SDK documentation for the low-level API.
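As a quick sanity check, the mcp[cli] extra also ships an mcp command that can launch your server alongside the MCP Inspector for interactive testing (the invocation below is the one I'd expect; check the SDK docs if it differs):

mcp dev server.py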
Building Your First MCP Client
MCP clients connect to servers and handle communication. They operate within "sessions" - logical groupings of requests and responses. Let's build a client that connects to our TechCrunch server. Note that we will use the same Streamable HTTP transport as the server:
# client.py
import asyncio
import os

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def run_client():
    # Connect to the HTTP MCP server
    host = os.environ.get("MCP_SERVER_HOST", "localhost")
    port = os.environ.get("MCP_SERVER_PORT", "8011")
    server_url = f"http://{host}:{port}/mcp"
    async with streamablehttp_client(server_url) as (read, write, _):
        async with ClientSession(read, write) as session:
            # Initialize the connection
            await session.initialize()
            # List available tools
            tools_response = await session.list_tools()
            print("Available tools:")
            for tool in tools_response.tools:
                print(f"- {tool.name}: {tool.description}")
            # Call a tool
            result = await session.call_tool(
                "fetch_from_techcrunch",
                arguments={"category": "ai"},
            )
            print(f"\nTool result: {result.content}")

if __name__ == "__main__":
    asyncio.run(run_client())
You should now see a list of tools and the result of calling the tool via `session.call_tool`:
Available tools:
- fetch_from_techcrunch: Fetch the latest news from TechCrunch for a given category.
Tool result: [TextContent(type='text', text='AI | TechCrunch AI | TechCrunch TechCrunch Desktop Logo TechCrunch Mobile Logo Latest Startups Venture Apple Security AI Apps Events Podcasts Newsletters Search Submit Site Search Toggle Mega Menu Toggle Topics Latest AI Amazon ...
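One detail worth noting: result.content is a list of content items (text, images, embedded resources), not a plain string. A small sketch for extracting just the text, assuming text-only results:

# Join the text of all TextContent items in the result
text = "\n".join(item.text for item in result.content if item.type == "text")
print(text)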
Building the Host Application
The host application is where the magic happens. It bridges MCP with the outside world (users or business applications), turning user queries or business tasks into tool calls and responses.
Most MCP tutorials skip over how complex host applications really can be. They're not just pass-through layers - they're intelligent orchestrators that often must:
Accept user requests
Discover available tools from MCP servers
Use an LLM to choose the right tools
Execute tool calls with proper parameters
Handle errors gracefully
Present results in a user-friendly way
Let's build a simple host that uses OpenAI to intelligently orchestrate our MCP tools.
Setup
First, install dependencies:
uv add openai
# or: pip install openai
Set your OpenAI API key:
export OPENAI_API_KEY="your-api-key-here"
The Host Application
# app.py
import asyncio
import json
import os
import sys

from openai import AsyncOpenAI

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

def convert_mcp_tools_to_openai_format(tools):
    """Convert MCP tool definitions to OpenAI function calling format."""
    openai_tools = []
    for tool in tools:
        openai_tools.append({
            "type": "function",
            "function": {
                "name": tool.name,
                "description": tool.description,
                "parameters": tool.inputSchema if hasattr(tool, "inputSchema") else {},
            },
        })
    return openai_tools

async def handle_user_request(session, openai_client, tools, user_input: str):
    """Process a user request using the LLM and MCP tools."""
    openai_tools = convert_mcp_tools_to_openai_format(tools)
    # Ask the LLM to decide which tools to use
    response = await openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant that can fetch news from TechCrunch. Use tools when needed.",
            },
            {"role": "user", "content": user_input},
        ],
        tools=openai_tools,
        tool_choice="auto",
    )
    message = response.choices[0].message
    # If the LLM wants to use a tool, execute it
    if message.tool_calls:
        tool_call = message.tool_calls[0]
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)
        print(f"🔧 Calling tool: {function_name} with args: {function_args}")
        result = await session.call_tool(function_name, arguments=function_args)
        # Format the response
        content = str(result.content)[:500]
        return f"Here's what I found:\n\n{content}...\n\nFor more details, visit TechCrunch directly."
    else:
        return message.content

async def main():
    if not os.getenv("OPENAI_API_KEY"):
        print("Error: Set OPENAI_API_KEY environment variable")
        return
    # Get user input from the command line (with a sensible default)
    user_input = " ".join(sys.argv[1:]).strip() or "What is the latest news on AI?"
    # Initialize OpenAI client
    openai_client = AsyncOpenAI()
    # Connect to MCP server
    server_url = "http://localhost:8011/mcp"
    async with streamablehttp_client(server_url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Get available tools
            tools = (await session.list_tools()).tools
            print(f"Task: {user_input}\n")
            # Process the request
            response = await handle_user_request(
                session, openai_client, tools, user_input
            )
            print(response)

if __name__ == "__main__":
    # Make sure the server is running first!
    asyncio.run(main())
The host application above combines three key components: an MCP client, an LLM (OpenAI), and user interface logic. When you run python app.py "What's the latest AI news?", here's what happens. First, the host connects to the MCP server using its embedded client and calls list_tools() to discover available tools. It then converts these MCP tool definitions into OpenAI's function-calling format using convert_mcp_tools_to_openai_format(). Next, it sends your question along with the tool definitions to OpenAI's API. OpenAI analyzes your question and returns which tool to call - in this case, fetch_from_techcrunch with category='ai'. The host then executes this tool call through session.call_tool(), gets the raw results from TechCrunch, and formats them into a readable response. This architecture lets you ask natural language questions without knowing which tools exist or how to call them - the LLM figures that out based on what's available.
A similar host application format can be extended to build complex agentic or multi-agent applications.
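For instance, the host above returns raw tool output directly; a more agentic variant feeds tool results back to the LLM and loops until the model stops requesting tools. A minimal sketch of that loop, reusing the session and openai_client from app.py (hypothetical helper; error handling omitted):

async def agent_loop(session, openai_client, openai_tools, messages, max_turns=5):
    """Let the LLM call tools repeatedly, feeding results back each turn."""
    for _ in range(max_turns):
        response = await openai_client.chat.completions.create(
            model="gpt-4", messages=messages, tools=openai_tools, tool_choice="auto"
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content  # no more tool requests: final answer
        messages.append(message)  # record the assistant's tool request
        for tool_call in message.tool_calls:
            result = await session.call_tool(
                tool_call.function.name,
                arguments=json.loads(tool_call.function.arguments),
            )
            # Feed the tool output back so the model can reason over it
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result.content),
            })
    return "Reached max turns without a final answer."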
Putting It All Together
You now have a complete MCP application:
Server (server.py) - Exposes the TechCrunch fetching tool
Client (client.py) - Shows how to connect and call tools directly
Host (app.py) - Uses AI to intelligently orchestrate tool usage
To run the full system:
# Terminal 1: Start the server
python server.py
# Terminal 2: Run the host application
python app.py "What's the latest AI news?"
Oh, and the server above can be used in any of your favorite MCP host applications. For example, you can add it to your VSCode agent by adding an MCP server config to your user settings. A similar configuration will work for Claude Desktop, Cursor, Windsurf, and more.
"demo": {
"type": "http",
"url": "http://localhost:8011/mcp/",
"headers": { "VERSION": "1.2" }
}
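In VS Code specifically, that entry sits under the mcp → servers section of your user settings (the nesting below reflects VS Code's format at the time of writing; other hosts use slightly different wrappers):

"mcp": {
  "servers": {
    "demo": {
      "type": "http",
      "url": "http://localhost:8011/mcp/",
      "headers": { "VERSION": "1.2" }
    }
  }
}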