
Local file quickstart

If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Starter account at https://platform.unstructured.io. The following procedure walks through partitioning a local file.
  1. After you are signed in, the Start page appears.
  2. In the Welcome area, do one of the following:
    • Click one of the sample files, such as realestate.pdf, to have Unstructured parse and transform that sample file.
    • Click Browse files, and then browse to and select one of your own files, to have Unstructured parse and transform it.
  3. After Unstructured has finished parsing and transforming the file (a process known as partitioning), you will see the file’s contents in the center and Unstructured’s results on the right.
  4. The view on the right shows a formatted view of Unstructured’s results, which is designed for human readability. To see the underlying JSON view of the results, which is designed for RAG and agentic AI, click JSON at the top of the view on the right side of the screen. Learn about what’s in the JSON view.
You can also do the following:
  • To download the results as a local JSON file, click the download icon to the left of the Formatted and JSON buttons.
  • To have Unstructured partition a different file, click Add new file on the left, and then browse to and select the target file.
  • To view the results for a file that was previously partitioned during this session, click the file’s name in the Recent files list on the left.
  • To return to the Start page, click the X (close) button at the top left of the page, next to Transform.
  • To have Unstructured do more than just partitioning, such as chunking, enriching, and embedding, click Edit in Workflow Editor at the top right of the page, or skip over to the walkthrough.
  • To get an associated code snippet that you can use to have Unstructured parse and transform a file programmatically instead of by using the Unstructured user interface, click the down arrow next to Copy curl command at the top right of the page, and then do one of the following:
    • Click Show options to see the associated curl, Unstructured Python SDK, and Unstructured JavaScript/TypeScript SDK code snippets. Then do one of the following:
      • Click the Copy icon in the top right corner to copy the active code snippet to your system’s clipboard.
      • Click My API keys to get your Unstructured API key, which is necessary when calling Unstructured programmatically.
      • Click API Documentation to learn how to set up, customize, and run the code.
    • Click Copy curl command to copy the curl code snippet to your system’s clipboard without viewing the code snippet first.
    • Click Copy Python SDK code to copy the Unstructured Python SDK code snippet to your system’s clipboard without viewing the code snippet first.
    • Click Copy JavaScript code to copy the Unstructured JavaScript/TypeScript SDK code snippet to your system’s clipboard without viewing the code snippet first.
Learn how to add chunking, enrichments, and embeddings after partitioning.
Learn more about the Unstructured user interface.
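After you download the results as a local JSON file, a short script can give a quick overview of what partitioning produced. The following is a minimal sketch, not an official Unstructured utility. It assumes that the downloaded file is a JSON array of element objects and that each element carries a "type" field and a "text" field. The function names and the sample data are hypothetical and exist only for illustration.

```python
import json
from collections import Counter
from pathlib import Path


def load_elements(path):
    """Read a downloaded results file and return its list of element dicts."""
    with Path(path).open(encoding="utf-8") as f:
        elements = json.load(f)
    if not isinstance(elements, list):
        raise ValueError("expected a JSON array of elements")
    return elements


def count_by_type(elements):
    """Count elements per value of the (assumed) "type" field."""
    return Counter(el.get("type", "Unknown") for el in elements)


def joined_text(elements, separator="\n\n"):
    """Concatenate the non-empty (assumed) "text" fields in document order."""
    return separator.join(el["text"] for el in elements if el.get("text"))


# Hypothetical sample data standing in for a real downloaded file.
sample = [
    {"type": "Title", "text": "Listing overview", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "Three bedrooms, two baths.", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "", "metadata": {"page_number": 2}},
]

print(count_by_type(sample))
print(joined_text(sample))
```

To run this against a real download, pass the path of the saved file to load_elements and hand the returned list to the other two functions. If the field names in your file differ from the ones assumed here, adjust the lookups accordingly.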

Remote quickstart

The following quickstart shows you how to use the Unstructured UI to process remote files (or data). The requirements are as follows.
  • A compatible source (input) location that contains your data for Unstructured to process. See the list of supported source types. If your source (input) location is not in this list, or if you do not yet have any source locations for Unstructured to process, stop here and skip over to the Dropbox source connector quickstart instead. That quickstart guides you through creating a free Dropbox account, uploading your files to Dropbox, and creating a source connector to connect Unstructured to those files.
  • For document-based source locations, compatible files in that location. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured repo on GitHub.
  • A compatible destination (output) location for Unstructured to put the processed data. See the list of supported destination types. If your destination (output) location is not in this list, or if you do not yet have any destination locations for Unstructured to send its processed data, stop here and skip over to the Pinecone destination connector quickstart instead. That quickstart guides you through creating a free Pinecone account and creating a destination connector to connect Unstructured to a Pinecone dense serverless index within your Pinecone account.
Step 1: Sign up and sign in

  1. If you do not already have an Unstructured account, sign up for free. After you sign up, you are automatically signed in to your new Unstructured Starter account, at https://platform.unstructured.io.
    To sign up for a Team or Enterprise account instead, contact Unstructured Sales, or learn more.
  2. If you have an Unstructured Starter or Team account and are not already signed in, sign in to your account at https://platform.unstructured.io.
    For an Enterprise account, see your Unstructured account administrator for instructions, or email Unstructured Support at support@unstructured.io.
Step 2: Set the source (input) location

  1. From your Unstructured dashboard, in the sidebar, click Connectors.
  2. Click Sources.
  3. Click New or Create Connector.
  4. For Name, enter a unique name for this connector.
  5. In the Provider area, click the source location type that matches yours.
  6. Click Continue.
  7. Fill in the fields with the appropriate settings. Learn more.
  8. If a Continue button appears, click it, and fill in any additional settings fields.
  9. Click Save and Test.
Step 3: Set the destination (output) location

  1. In the sidebar, click Connectors.
  2. Click Destinations.
  3. Click New or Create Connector.
  4. For Name, enter a unique name for this connector.
  5. In the Provider area, click the destination location type that matches yours.
  6. Click Continue.
  7. Fill in the fields with the appropriate settings. Learn more.
  8. If a Continue button appears, click it, and fill in any additional settings fields.
  9. Click Save and Test.
Step 4: Define the workflow

  1. In the sidebar, click Workflows.
  2. Click New Workflow.
  3. Next to Build it for Me, click Create Workflow.
    If a radio button appears instead of Build it for Me, select it, and then click Continue.
  4. For Workflow Name, enter a unique name for this workflow.
  5. In the Sources dropdown list, select your source location from Step 2.
  6. In the Destinations dropdown list, select your destination location from Step 3.
    You can select multiple source and destination locations. Files will be ingested from all of the selected source locations, and the processed data will be delivered to all of the selected destination locations.
  7. Click Continue.
  8. The Reprocess All box applies only to blob storage connectors such as the Amazon S3, Azure Blob Storage, and Google Cloud Storage connectors:
    • Checking this box reprocesses all documents in the source location on every workflow run.
    • Unchecking this box causes future runs to process only documents that were added to, or updated in, the source location since the last workflow run. Previously processed documents are not processed again. However:
      • Even if this box is unchecked, a renamed file is always treated as a new file, regardless of whether the file’s original contents have changed.
      • Even if this box is unchecked, a file that is removed but is added back later with the same file name is processed on future runs only if the file’s contents have changed since the file was originally processed.
  9. Click Continue.
  10. If you want this workflow to run on a schedule, in the Repeat Run dropdown list, select one of the scheduling options, and fill in the scheduling settings. Otherwise, select Don’t repeat.
  11. Click Complete.
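The Reprocess All rules described in this step can be modeled with a small function. This is an illustrative sketch only, not a description of how Unstructured tracks files internally. It assumes that a processed file is remembered by its name together with a hash of its contents, and that this record is kept even after the file is removed from the source. The function names and the sample file names are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest used to detect content changes."""
    return hashlib.sha256(data).hexdigest()


def files_to_process(in_source_now, previously_processed, reprocess_all):
    """Pick the files that a run would process under the stated rules.

    in_source_now: dict of file name -> current contents (bytes).
    previously_processed: dict of file name -> hash recorded when the
        file was last processed (assumed to survive file removal).
    reprocess_all: True models a checked Reprocess All box.
    """
    selected = []
    for name, data in in_source_now.items():
        if reprocess_all:
            # Box checked: every document is processed on every run.
            selected.append(name)
        elif name not in previously_processed:
            # New name, which includes renamed files: treated as new.
            selected.append(name)
        elif previously_processed[name] != content_hash(data):
            # Known name whose contents changed since it was processed.
            selected.append(name)
    return sorted(selected)


# Hypothetical state recorded by earlier runs.
previously_processed = {
    "a.pdf": content_hash(b"A"),
    "b.pdf": content_hash(b"B"),
    "old.pdf": content_hash(b"C"),
}

# Source now: a.pdf was removed and added back unchanged, b.pdf was
# updated, and old.pdf was renamed to renamed.pdf without other changes.
in_source_now = {"a.pdf": b"A", "b.pdf": b"B2", "renamed.pdf": b"C"}

print(files_to_process(in_source_now, previously_processed, reprocess_all=False))
# ['b.pdf', 'renamed.pdf']
print(files_to_process(in_source_now, previously_processed, reprocess_all=True))
# ['a.pdf', 'b.pdf', 'renamed.pdf']
```

With the box unchecked, the unchanged file that was removed and added back is skipped, while the updated file and the renamed file are both processed.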
Step 5: Process the documents

  1. If you did not choose to run this workflow on a schedule in Step 4, you can run the workflow now: in the sidebar, click Workflows.
  2. Next to your workflow from Step 4, click Run.
Step 6: Monitor the processing job

  1. In the sidebar, click Jobs.
  2. In the list of jobs, wait for the job’s Status to change to Finished.
  3. Click the row for the job.
  4. After Overview displays Finished, go to the next step.
Step 7: View the processed data

Go to your destination location to view the processed data.
Learn more about Unstructured source connectors, destination connectors, workflows, jobs, and managing your account.