🤖 Mastering Chatbot Training: Using Website Content and Documents for Enhanced Responses

📚 Learn how to train your chatbot with your website texts or PDF documents. This tutorial guides you through setting up training sources, automating updates, and managing Q&A to enhance your chatbot’s capabilities.



Welcome to our comprehensive guide on training your chatbot with your website’s content and PDF documents! This tutorial will walk you through the process of setting up your chatbot to learn from various sources, including URLs, text files, and PDFs.

With this feature, you can train your chatbot on your website text or PDF documents. Once the training completes successfully, the chatbot can answer questions about your content.

  • To start the chatbot training, open the chatbot area and enter your training sources: website URLs, text or PDF files, or XML sitemaps.
  • Once the sources are set, click the Train chatbot button and wait for the training to complete.
  • You can add and manage custom questions and answers from Chatbot > Training > Q&A and from the chatbot training window.

  • To train the chatbot automatically with your website content at regular intervals, create a cron job that requests the URL https://omnichat.planifyx.com/script/include/api.php?open-ai-training=true&cloud=API-TOKEN, or run it via a command such as */59 * * * * wget https://omnichat.planifyx.com/script/include/api.php?open-ai-training=true. Replace API-TOKEN with your own API token, which you can find at https://omnichat.planifyx.com/account/?tab=installation. The cron job can be executed at most once every 7 days. Automatic training works only with websites. We strongly recommend providing an XML sitemap instead of the website URL for performance reasons.
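As a rough illustration of the scheduled call described above, the following Python sketch builds the training URL and requests it once. It could be run from a cron entry instead of wget. The function names and the OMNICHAT_API_TOKEN environment variable are assumptions made for this example and are not part of OmniChat. The response format is not documented here, so the script simply prints whatever the endpoint returns.

```python
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the cron job instructions above.
API_ENDPOINT = "https://omnichat.planifyx.com/script/include/api.php"


def build_training_url(api_token: str) -> str:
    """Return the training URL with the API token filled in."""
    query = urlencode({"open-ai-training": "true", "cloud": api_token})
    return f"{API_ENDPOINT}?{query}"


def trigger_training(api_token: str, timeout: float = 60.0) -> str:
    """Request the training URL and return the raw response body as text."""
    with urlopen(build_training_url(api_token), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical environment variable holding your API token.
    token = os.environ.get("OMNICHAT_API_TOKEN")
    if token:
        print(trigger_training(token))
    else:
        print("Set OMNICHAT_API_TOKEN before running this script.")
```

Reading the token from the environment keeps it out of the crontab file and out of version control.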

Information

  • Only PDF and TXT files can be uploaded.
  • You can provide the website URL, and all child URLs will be crawled and included. For large websites, however, an XML sitemap is more efficient than the website URL and less prone to errors and infinite link loops. You can create one with a service like https://www.xml-sitemaps.com.
  • To train your chatbot on specific pages instead of the whole website, use an XML sitemap. Create one with a tool like https://www.xml-sitemaps.com, then open the file in a text editor and remove the pages you do not want to include. Upload the sitemap to your server or to an external online location, then add its URL in Chatbot > Training > Website.
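For sitemaps with many entries, removing pages by hand in a text editor can be tedious. The sketch below shows one possible way to do the same filtering in Python, assuming the sitemap follows the standard sitemaps.org format with url and loc elements. The function name filter_sitemap and the example.com URLs are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def filter_sitemap(xml_text: str, excluded_prefixes: list[str]) -> str:
    """Return the sitemap XML without entries whose URL starts with an excluded prefix."""
    # Keep the default namespace unprefixed in the serialized output.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text)
    # Iterate over a copy because entries are removed during the loop.
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        location = (loc.text or "").strip() if loc is not None else ""
        if any(location.startswith(prefix) for prefix in excluded_prefixes):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sitemap with one page to keep and one to drop.
    sample = (
        f'<urlset xmlns="{SITEMAP_NS}">'
        "<url><loc>https://example.com/pricing</loc></url>"
        "<url><loc>https://example.com/blog/old-post</loc></url>"
        "</urlset>"
    )
    print(filter_sitemap(sample, ["https://example.com/blog/"]))
```

The filtered output can then be saved as a new XML file, uploaded, and added in Chatbot > Training > Website as described above.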

  • You can upload large files and your XML sitemap with a service like https://tmpfiles.org.
  • If you are training the chatbot with a multilingual website, you can restrict it to retrieving answers only from pages in the user’s language. To activate this feature, go to Settings > Artificial Intelligence > OpenAI > Multilingual Training Sources. For OmniChat to detect the language of your web pages, the <html> tag must include the lang attribute.
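Before enabling Multilingual Training Sources, it may help to confirm that your pages actually declare a language. The following sketch reads the lang attribute of the <html> tag using Python's standard HTML parser. It only illustrates the requirement stated above and is not how OmniChat itself detects the language. The function name detect_page_language is an assumption.

```python
from html.parser import HTMLParser
from typing import Optional


class _HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> tag encountered."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def detect_page_language(html_text: str) -> Optional[str]:
    """Return the value of <html lang="...">, or None if it is missing or empty."""
    parser = _HtmlLangParser()
    parser.feed(html_text)
    return parser.lang or None


if __name__ == "__main__":
    print(detect_page_language('<!DOCTYPE html><html lang="it"><body></body></html>'))
    print(detect_page_language("<html><body></body></html>"))
```

A page for which this returns None would need a lang attribute added to its <html> tag before it can be matched to a user's language.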

  • As soon as the training is completed, the uploaded files are removed.
  • To add new training sources, simply train the chatbot again. The previous training sources will not be lost, and only the new sources will be added.
  • The OmniChat articles are used as training sources automatically.
  • The OmniChat conversations are used as training sources automatically. The training is done via cron job every 24 hours. Only user and agent messages are used; chatbot messages are ignored.
  • There are character limits for training the chatbot. You can view the character limits here.
  • The embedding model is essential for training your chatbot and handling all user messages. We currently use the text-embedding-3-small model. It is necessary for these scenarios and cannot be disabled or changed. You can find pricing information at https://openai.com/pricing. Check out the pricing for the text-embedding-3-small model in the Embedding models section.
  • Responses generated by OpenAI can include a link to the website page from which the answer was sourced.
  • Go to Chatbot > Training > Information, and click the Delete all training data button to remove all previous training data for the chatbot.

  • The embeddings are stored as JSON files in the OmniChat uploads folder and are secured using the password-by-filename approach.

If you have any further questions, you can always contact us.