// Route specification
Squarespace → Astro
A crossing for a Squarespace blog: posts, pages, images, and categories mapped to Astro content collections via the WordPress XML export — with a crawl fallback for the pages the export leaves on the dock.
01 Overview
This route carries a Squarespace blog into an Astro project as content collections backed by Markdown or MDX. The primary source is Squarespace's WordPress-format XML (WXR) export — a single, parseable file — supplemented by an optional crawl of the live site for content the export omits.
Squarespace's export is deliberately narrow: it covers the primary blog and basic pages, and little else. Portage treats the WXR as the spine of the crossing and the crawl as connective tissue, then reconciles both into one manifest before writing Astro-native files.
In scope: published blog posts, basic pages, inline images, categories and tags, authors, publish dates, and the blog permalink structure.
02 Export & crawl
A · WordPress XML export
From Settings → Import & Export → Export → WordPress, Squarespace produces a .xml (WXR) file. Portage parses each <item>: title, wp:post_name (slug), content:encoded (body), excerpt:encoded, pubDate, category (tags & categories), dc:creator (author), and wp:status.
    $ npx portage extract --from squarespace \
        --export ./Squarespace-Wordpress-Export.xml --to ./astro-project
    → 96 posts · 5 pages · 22 categories · 3 authors · 204 images referenced
B · Crawl fallback
Because the export omits index, gallery, portfolio, album, event, and commerce collections, Portage can crawl the live site from its sitemap.xml to recover pages the WXR leaves out, parsing rendered block HTML and SEO tags.
    $ npx portage extract --from squarespace --export ./export.xml \
        --crawl https://www.example.com --to ./astro-project
Squarespace exports only the primary blog page. If the site runs multiple blogs, export each separately, or lean on the crawl to recover the rest.
03 Content mapping
Each WXR <item> becomes one Markdown/MDX file. Posts and pages split into separate collections by wp:post_type.
| WXR field | Type | | Astro frontmatter | Notes |
|---|---|---|---|---|
| title | string | → | title | Required. |
| wp:post_name | string | → | (filename) | Slug & entry id. |
| excerpt:encoded | html | → | description | Falls back to a generated excerpt. |
| content:encoded | html | → | (body) | Converted to Markdown/MDX — see §04. |
| pubDate · wp:post_date | date | → | pubDate | Coerced by zod. |
| category[domain=post_tag] | string[] | → | tags | |
| category[domain=category] | string[] | → | categories | Disambiguated by domain attribute. |
| dc:creator | string | → | authors | |
| wp:status | enum | → | draft | draft → draft: true. |
| wp:post_type | enum | → | (collection) | post → blog; page → pages. |
| (first inline image) | url | → | heroImage | Derived — WXR has no feature-image field; --hero controls it. |
| (crawled <title> · meta) | string | → | seo | Only when --crawl is on — SEO meta is not in the WXR. |
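
The mapping table above can be read as a single function from a parsed item to a collection entry. The following is a minimal sketch, not Portage's implementation: the `WxrItem` and `MappedEntry` shapes, the `mapItem` name, and the 160-character excerpt length are all assumptions made for illustration, and the hero image and crawled SEO fields are left out.

```typescript
// Hypothetical shape of one parsed WXR <item>; field names are illustrative.
interface WxrItem {
  title: string;
  postName: string;      // wp:post_name
  excerptHtml: string;   // excerpt:encoded
  contentHtml: string;   // content:encoded
  pubDate: string;       // pubDate / wp:post_date
  categories: { domain: string; label: string }[]; // <category domain="...">
  creator: string;       // dc:creator
  status: string;        // wp:status
  postType: string;      // wp:post_type
}

interface MappedEntry {
  collection: 'blog' | 'pages';
  filename: string;
  frontmatter: {
    title: string;
    description: string;
    pubDate: string;
    tags: string[];
    categories: string[];
    authors: string[];
    draft: boolean;
  };
}

// Crude tag stripper, used only for the excerpt fallback.
function plainText(html: string): string {
  return html.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
}

function mapItem(item: WxrItem, ext: 'md' | 'mdx' = 'md'): MappedEntry {
  // Tags and categories share the <category> element; split by domain.
  const byDomain = (domain: string): string[] =>
    item.categories.filter((c) => c.domain === domain).map((c) => c.label);

  // Fall back to a generated excerpt when excerpt:encoded is empty.
  // The 160-character cut-off is an assumption, not a documented default.
  const description =
    plainText(item.excerptHtml) || plainText(item.contentHtml).slice(0, 160);

  return {
    // page -> pages; everything else is treated as a blog post in this sketch.
    collection: item.postType === 'page' ? 'pages' : 'blog',
    filename: `${item.postName}.${ext}`,
    frontmatter: {
      title: item.title,
      description,
      pubDate: item.pubDate,
      tags: byDomain('post_tag'),
      categories: byDomain('category'),
      authors: item.creator ? [item.creator] : [],
      draft: item.status === 'draft',
    },
  };
}
```

Keeping the mapping a pure function makes it easy to diff its output during a dry run before any file is written.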
04 Content transforms
Squarespace stores body content as content:encoded HTML — rendered blocks wrapped in sqs-block containers with inline styles. Portage strips the wrapper markup and inline style attributes, keeping the semantic content, then converts to Markdown. Common blocks map as follows.
| Squarespace block | | Output (Markdown) | Output (MDX) |
|---|---|---|---|
| text / markdown | → | Paragraphs & lists | Same |
| image + caption | → | `![alt](src)` + caption | <Figure> |
| gallery / summary | → | Image sequence | <Gallery> |
| code | → | Fenced block | Fenced block |
| quote | → | Blockquote | Blockquote |
| button | → | Markdown link | <Button> |
| embed (YouTube, …) | → | Link fallback | <Embed> |
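
The wrapper-stripping step that precedes these block mappings could look roughly like the sketch below. It is regex-based for brevity and assumes every `<div>` in the body is a Squarespace wrapper; a real pipeline would more likely walk a parsed HTML tree. The function name `stripSquarespaceWrappers` is hypothetical.

```typescript
// Remove wrapper <div> tags and inline style attributes, keeping inner content.
// Assumption: all <div> elements in content:encoded are sqs-* wrappers.
function stripSquarespaceWrappers(html: string): string {
  return html
    // Drop opening and closing <div> tags but keep whatever they contain.
    .replace(/<\/?div\b[^>]*>/gi, '')
    // Drop inline style attributes, double- or single-quoted.
    .replace(/\sstyle\s*=\s*(?:"[^"]*"|'[^']*')/gi, '')
    .trim();
}
```

For example, `<div class="sqs-block html-block"><p style="white-space:pre-wrap;">Hi</p></div>` reduces to `<p>Hi</p>`, which is then ready for Markdown conversion.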
05 Assets
Squarespace serves images from images.squarespace-cdn.com with a size query (?format=750w, ?format=original). Portage resolves every reference, fetches the original, and rewrites the path.
- Strip the format query — the original is requested with ?format=original, and Astro regenerates responsive sizes.
- Download & dedupe — assets are content-hashed; duplicates collapse to one file.
- Rewrite references — hero and in-body images point at local paths.
- External images stay remote unless --images localize-external is set.
--images assets (default) writes to src/assets/blog/ for full Astro optimization. --images public writes stable URLs to public/images/.
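
The URL-normalization step described above might be sketched as follows. Only the CDN hostname and the `format` query parameter come from this specification; the helper names are hypothetical, and content hashing of the downloaded bytes is not shown.

```typescript
const SQUARESPACE_CDN_HOST = 'images.squarespace-cdn.com';

// Rewrite a Squarespace CDN reference so it requests the original asset.
// Images on any other host are returned unchanged.
function toOriginalUrl(src: string): string {
  const parsed = new URL(src);
  if (parsed.hostname !== SQUARESPACE_CDN_HOST) return src;
  parsed.searchParams.set('format', 'original');
  return parsed.toString();
}

// Collapse size variants of the same image into one download target.
function uniqueOriginals(refs: string[]): string[] {
  return Array.from(new Set(refs.map(toOriginalUrl)));
}
```

Normalizing before download means `?format=750w` and `?format=1500w` references to the same file are fetched once; byte-level deduplication would then catch identical images stored under different paths.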
06 Routes & redirects
Squarespace namespaces blog posts under a collection prefix (often /blog/). Portage preserves the structure by default and emits a redirect map for anything that changes.
| Squarespace route | | Astro route | Handled by |
|---|---|---|---|
| /blog/{slug} | → | /blog/{slug}/ | Blog collection (--route-base) |
| /{page-slug} | → | /{page-slug}/ | Pages collection |
| /tag/{slug} · /category/{slug} | → | /tag/ · /category/ | Generated taxonomy pages |
| ?format=rss | → | /rss.xml | @astrojs/rss + redirect |
- Slug prefix — preserved by default; flatten with --route-base /.
- Existing 301s — Squarespace's built-in URL mappings can be exported and merged into the redirect map.
- Trailing slashes & sitemap — trailingSlash: 'always' and @astrojs/sitemap configured to the new base URL.
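
The route rules above can be illustrated with a small sketch that maps old blog paths to new ones and emits a rule only when the path actually moves. The function names and the `RedirectRule` shape are hypothetical; the output uses the plain `from to status` line format of a Netlify `_redirects` file, and it assumes the host treats a missing trailing slash as the same path.

```typescript
interface RedirectRule {
  from: string;
  to: string;
  status: 301;
}

// Ensure exactly one trailing slash, matching trailingSlash: 'always'.
function withTrailingSlash(path: string): string {
  return path.replace(/\/+$/, '') + '/';
}

// `blogPrefix` is the Squarespace collection prefix; `routeBase` plays the
// role of --route-base ('/blog' preserves the prefix, '/' flattens it).
function mapBlogRoute(oldPath: string, blogPrefix: string, routeBase: string): string {
  const slug = oldPath.slice(blogPrefix.length).replace(/^\/+|\/+$/g, '');
  const base = routeBase.replace(/\/+$/, ''); // '/' becomes '', '/blog' stays
  return withTrailingSlash(`${base}/${slug}`);
}

function buildRedirects(oldPaths: string[], blogPrefix: string, routeBase: string): RedirectRule[] {
  const rules: RedirectRule[] = [];
  for (const oldPath of oldPaths) {
    const to = mapBlogRoute(oldPath, blogPrefix, routeBase);
    // Skip paths that differ only by the trailing slash.
    if (withTrailingSlash(oldPath) !== to) {
      rules.push({ from: oldPath, to, status: 301 });
    }
  }
  return rules;
}

// One "<from> <to> <status>" line per rule.
function toRedirectsFile(rules: RedirectRule[]): string {
  return rules.map((r) => `${r.from} ${r.to} ${r.status}`).join('\n');
}
```

With the default route base, `/blog/harbor-notes` produces no rule; with the base flattened to `/`, it yields `/blog/harbor-notes /harbor-notes/ 301`.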
07 Out of scope
The WXR export is narrow by design, and much of Squarespace has no static equivalent. Portage migrates content and reports — explicitly — what it could not carry.
- Commerce — products, checkout, inventory, and orders.
- Member areas & paywalls — gated content and the auth around it.
- Index, portfolio, gallery, album, event & audio collections — absent from the WXR; partially recoverable via crawl.
- Site styles — design, custom CSS, fonts, and layout (rebuilt in Astro).
- Forms, scheduling (Acuity), donations, and other dynamic features.
- Secondary blogs — only the primary blog exports.
Treat the export as the spine and the crawl as connective tissue. Expect to hand-place a few pages the export and crawl can't fully reconstruct — Portage lists every one of them in the manifest.
08 Edge cases
- Pages missing from the export — recovered from the crawl when --crawl is set; otherwise listed as gaps.
- No feature-image field — hero is derived from the first inline image; disable with --hero none.
- SEO meta absent from WXR — recovered via crawl, else left to Astro defaults.
- Drafts — Squarespace doesn't export unpublished posts; only published items arrive.
- Tags vs. categories — both arrive as <category>; split by the domain attribute.
- Inline styles & sqs- wrappers — stripped to leave clean semantic markup.
- CDN size variants — collapsed to the downloaded original.
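
The hero-image rule in the list above is simple enough to sketch. This assumes the body still contains plain `<img src="...">` tags at the point the hero is derived; `deriveHero` and `HeroMode` are hypothetical names mirroring the `--hero` values.

```typescript
type HeroMode = 'first-image' | 'none';

// Return the src of the first inline image, or undefined when there is none
// or when hero derivation is disabled.
function deriveHero(bodyHtml: string, mode: HeroMode = 'first-image'): string | undefined {
  if (mode === 'none') return undefined;
  // Require whitespace before "src" so attributes like data-src are skipped.
  const match = /<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/i.exec(bodyHtml);
  return match ? match[1] : undefined;
}
```

Because the schema marks `heroImage` as optional, returning `undefined` for image-free posts leaves the frontmatter valid.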
09 Output
A predictable, buildable Astro project, with a manifest ledger at the root.
    astro-project/
    ├── src/
    │   ├── content/
    │   │   ├── blog/              ← 96 posts (WXR)
    │   │   └── pages/             ← 5 pages (WXR + crawl)
    │   ├── assets/blog/           ← 171 localized images
    │   ├── components/portage/    ← MDX block stubs (if --content mdx)
    │   └── content.config.ts
    ├── public/_redirects          ← 18 generated redirects
    ├── portage.manifest.json      ← extract / transform / load ledger
    └── astro.config.mjs           ← trailingSlash · sitemap · redirects
Content collection schema
    // src/content.config.ts
    import { defineCollection, z } from 'astro:content';
    import { glob } from 'astro/loaders';

    const blog = defineCollection({
      // Load every Markdown/MDX file under the blog collection directory.
      loader: glob({ pattern: '**/*.{md,mdx}', base: './src/content/blog' }),
      schema: ({ image }) =>
        z.object({
          title: z.string(),
          description: z.string(),
          pubDate: z.coerce.date(), // accepts date strings from frontmatter
          heroImage: image().optional(), // resolved relative to the entry file
          tags: z.array(z.string()).default([]),
          categories: z.array(z.string()).default([]),
          authors: z.array(z.string()).default([]),
          draft: z.boolean().default(false),
          // Present only when the crawl recovered SEO meta.
          seo: z
            .object({
              title: z.string().optional(),
              description: z.string().optional(),
            })
            .optional(),
        }),
    });

    export const collections = { blog };
Sample migrated post
    ---
    title: "Harbor Notes, Week One"
    description: "First impressions after leaving the builder."
    pubDate: 2025-09-30
    heroImage: ../../assets/blog/harbor-notes.jpg
    tags: ["field-notes"]
    categories: ["Journal"]
    authors: ["Dana Reyes"]
    ---

    The chart table is clearer than the dashboard ever was…
10 CLI
Three stages, run in order. Add --crawl at extract time to fill the export's gaps.
    $ npx portage extract --from squarespace --export ./export.xml --crawl $URL --to ./astro-project
    $ npx portage transform --schema content-collections --content mdx
    $ npx portage load --images assets --redirects netlify
| Flag | Values | Default | Purpose |
|---|---|---|---|
| --export | path | — | WordPress XML (WXR) file. Required. |
| --crawl | url | — | Recover pages & SEO the WXR omits. |
| --route-base | string | /blog | Preserve or flatten the blog prefix. |
| --hero | first-image · none | first-image | How heroImage is derived. |
| --content | markdown · mdx | markdown | Body format & block handling. |
| --images | assets · public · localize-external | assets | Image placement. |
| --dry-run | flag | off | Plan & diff only. |
11 Verification
Every crossing is auditable. The dry-run reconciles the export against the crawl so you can see exactly what each source contributed.
- Dry-run first — --dry-run prints the plan and a diff; nothing is written.
- Source attribution — the manifest records whether each entry came from the WXR, the crawl, or both.
- Counted on, counted off — gaps (collections absent from the export, missing SEO) are listed explicitly, never hidden.
    $ npx portage load --dry-run
    → 96 posts (wxr)         → 96 files     ✓ reconciled
    → 5 pages (wxr+crawl)    → 5 files      ✓
    → 204 images             → 171 unique   ✓ deduped
    → 18 redirects mapped                   ✓
    → 2 gallery collections                 ⚠ not in export — crawl or hand-place
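
Source attribution of the kind reported above can be sketched as a merge of two slug lists. This is an illustration under assumptions, not the manifest's actual schema: `ManifestEntry`, `EntrySource`, and `reconcile` are hypothetical names, and entries are matched on slug alone.

```typescript
type EntrySource = 'wxr' | 'crawl' | 'both';

interface ManifestEntry {
  slug: string;
  source: EntrySource;
}

// Merge slugs seen in the export with slugs seen in the crawl, recording
// which source (or both) contributed each entry.
function reconcile(wxrSlugs: string[], crawlSlugs: string[]): ManifestEntry[] {
  const sources = new Map<string, EntrySource>();
  for (const slug of wxrSlugs) sources.set(slug, 'wxr');
  for (const slug of crawlSlugs) {
    const prior = sources.get(slug);
    // Only upgrade to 'both' when the export already had this slug, so a
    // slug repeated in the crawl stays 'crawl'.
    sources.set(slug, prior === 'wxr' || prior === 'both' ? 'both' : 'crawl');
  }
  return Array.from(sources, ([slug, source]) => ({ slug, source }));
}
```

Entries marked `crawl` are the ones the export left out, so they are the natural candidates for manual review after the dry run.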