I need help with adding the documents and then querying the vector index for answers.
Example Code
Description
The embeddings are created in the Databricks vector index, but when I query the index I get the error below.
79 vector_store.add_documents(documents=documents, ids=[str(i) for i in range(1, len(documents) + 1)])
83 #vector_store.add_documents(documents=documents, ids=["2"])
---> 84 results = vector_store.similarity_search(
85 query="Capital of India?", k=1, filter={"source": "https://en.wikipedia.org/wiki/India"}
86 )
87 for doc in results:
88 print(f"* {doc.page_content} [{doc.metadata}]")
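One thing that may be worth checking locally before querying is whether the `filter` value matches the `source` metadata exactly as it was stored. The sketch below is only an illustration in plain Python: `matches_filter` is a hypothetical helper that does exact-match comparison on a list of metadata dicts. It is not how Databricks Vector Search evaluates filters, and it does not explain the error above on its own.

```python
from typing import Any, Dict, List


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True when every key in flt exists in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in flt.items())


# Metadata as it would be attached to the three documents (assumed, mirrors the sample URLs).
stored_metadata: List[Dict[str, Any]] = [
    {"source": "https://en.wikipedia.org/wiki/India"},
    {"source": "https://en.wikipedia.org/wiki/Japan"},
    {"source": "https://en.wikipedia.org/wiki/Australia"},
]

query_filter = {"source": "https://en.wikipedia.org/wiki/India"}

# Keep only the entries whose metadata satisfies the filter.
hits = [m for m in stored_metadata if matches_filter(m, query_filter)]
print(len(hits))  # 1
```

If a check like this finds no hits, the filter string and the stored `source` values differ (for example by a trailing slash), which is worth ruling out first.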
Sample:
import requests
from bs4 import BeautifulSoup
from langchain_core.documents import Document

# Function to fetch webpage content
def fetch_filtered_content(url):
    response = requests.get(url)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # Extract only paragraphs
    paragraphs = [p.get_text() for p in soup.find_all('p')]
    return "\n".join(paragraphs)

# Define URLs
urls = [
    "https://en.wikipedia.org/wiki/India",
    "https://en.wikipedia.org/wiki/Japan",
    "https://en.wikipedia.org/wiki/Australia",
]

# Fetch content and create documents
documents = []
for url in urls:
    # Call the function defined above (the original called an undefined fetch_webpage_content)
    page_content = fetch_filtered_content(url)
    document = Document(page_content=page_content, metadata={"source": url})
    documents.append(document)

# Add documents to vector store (example; vector_store is assumed to be created earlier)
vector_store.add_documents(documents=documents, ids=[str(i) for i in range(1, len(documents) + 1)])
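To check the document-building step without network access or a vector store, the loop above can be sketched as a small function that takes the fetcher as a parameter. Everything named here is an assumption for illustration: `Doc` is a stand-in for `langchain_core.documents.Document`, and `build_documents` and `fake_fetch` are hypothetical helpers, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def build_documents(
    urls: List[str], fetch: Callable[[str], str]
) -> Tuple[List[Doc], List[str]]:
    """Create one Doc per URL plus a parallel list of string ids ("1", "2", ...)."""
    documents = [Doc(page_content=fetch(url), metadata={"source": url}) for url in urls]
    ids = [str(i) for i in range(1, len(documents) + 1)]
    return documents, ids


def fake_fetch(url: str) -> str:
    # Placeholder text so the example runs offline; swap in the real fetcher later.
    return f"Placeholder text for {url}"


urls = [
    "https://en.wikipedia.org/wiki/India",
    "https://en.wikipedia.org/wiki/Japan",
    "https://en.wikipedia.org/wiki/Australia",
]

documents, ids = build_documents(urls, fake_fetch)
print(ids)  # ['1', '2', '3']
print(documents[0].metadata)  # {'source': 'https://en.wikipedia.org/wiki/India'}
```

Passing the fetcher in as an argument keeps the id and metadata logic testable on its own, so any remaining failure can be narrowed down to the fetch or the vector store call.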
System Info
Package Version
aiohappyeyeballs 2.4.4
aiohttp 3.11.10
aiosignal 1.3.1
alembic 1.14.0
annotated-types 0.7.0
anyio 4.7.0
asttokens 2.0.5
astunparse 1.6.3
attrs 24.2.0
azure-core 1.30.2
azure-storage-blob 12.19.1
azure-storage-file-datalake 12.14.0
backcall 0.2.0
beautifulsoup4 4.12.3
black 23.3.0
blinker 1.9.0
boto3 1.34.39
botocore 1.34.39
bs4 0.0.2
cachetools 5.3.3
certifi 2023.7.22
cffi 1.15.1
chardet 4.0.0
charset-normalizer 2.0.4
click 8.1.7
cloudpickle 2.2.1
comm 0.1.2
contourpy 1.0.5
cryptography 41.0.3
cycler 0.11.0
Cython 0.29.32
databricks-ai-bridge 0.0.3
databricks-langchain 0.0.3
databricks-sdk 0.38.0
databricks-vectorsearch 0.43
dataclasses-json 0.6.7
dbus-python 1.2.18
debugpy 1.6.7
decorator 5.1.1
Deprecated 1.2.15
deprecation 2.1.0
distlib 0.3.8
distro 1.7.0
distro-info 1.1+ubuntu0.2
docker 7.1.0
entrypoints 0.4
executing 0.8.3
facets-overview 1.1.1
filelock 3.13.4
Flask 3.1.0
fonttools 4.25.0
frozenlist 1.5.0
gitdb 4.0.11
GitPython 3.1.43
google-api-core 2.18.0
google-auth 2.31.0
google-cloud-core 2.4.1
google-cloud-storage 2.17.0
google-crc32c 1.5.0
google-resumable-media 2.7.1
googleapis-common-protos 1.63.2
graphene 3.4.3
graphql-core 3.2.5
graphql-relay 3.2.0
greenlet 3.1.1
grpcio 1.60.0
grpcio-status 1.60.0
gunicorn 23.0.0
h11 0.14.0
httpcore 1.0.7
httplib2 0.20.2
httpx 0.28.1
httpx-sse 0.4.0
idna 3.4
importlib-metadata 6.0.0
ipyflow-core 0.0.198
ipykernel 6.25.1
ipython 8.15.0
ipython-genutils 0.2.0
ipywidgets 7.7.2
isodate 0.6.1
itsdangerous 2.2.0
jedi 0.18.1
jeepney 0.7.1
Jinja2 3.1.4
jiter 0.8.2
jmespath 0.10.0
joblib 1.2.0
jsonpatch 1.33
jsonpointer 3.0.0
jupyter_client 7.4.9
jupyter_core 5.3.0
keyring 23.5.0
kiwisolver 1.4.4
langchain 0.3.10
langchain-community 0.3.10
langchain-core 0.3.23
langchain-databricks 0.1.1
langchain-openai 0.2.12
langchain-text-splitters 0.3.2
langsmith 0.1.147
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
Mako 1.3.8
Markdown 3.7
MarkupSafe 3.0.2
marshmallow 3.23.1
matplotlib 3.7.2
matplotlib-inline 0.1.6
mlflow 2.18.0
mlflow-skinny 2.18.0
more-itertools 8.10.0
multidict 6.1.0
mypy-extensions 0.4.3
nest-asyncio 1.5.6
numpy 1.26.4
oauthlib 3.2.0
openai 1.57.1
opentelemetry-api 1.28.2
opentelemetry-sdk 1.28.2
opentelemetry-semantic-conventions 0.49b2
orjson 3.10.12
packaging 23.2
pandas 1.5.3
parso 0.8.3
pathspec 0.10.3
patsy 0.5.3
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
pip 23.2.1
platformdirs 3.10.0
plotly 5.9.0
prompt-toolkit 3.0.36
propcache 0.2.1
proto-plus 1.24.0
protobuf 4.24.1
psutil 5.9.0
psycopg2 2.9.3
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 14.0.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pyccolo 0.0.52
pycparser 2.21
pydantic 2.10.3
pydantic_core 2.27.1
pydantic-settings 2.6.1
Pygments 2.15.1
PyGObject 3.42.1
PyJWT 2.3.0
pyodbc 4.0.38
pyparsing 3.0.9
python-apt 2.4.0+ubuntu3
python-dateutil 2.8.2
python-dotenv 1.0.1
python-lsp-jsonrpc 1.1.1
pytz 2022.7
PyYAML 6.0
pyzmq 23.2.0
regex 2024.11.6
requests 2.31.0
requests-toolbelt 1.0.0
rsa 4.9
s3transfer 0.10.2
scikit-learn 1.3.0
scipy 1.11.1
seaborn 0.12.2
SecretStorage 3.3.1
setuptools 68.0.0
six 1.16.0
smmap 5.0.1
sniffio 1.3.1
soupsieve 2.6
SQLAlchemy 2.0.36
sqlparse 0.5.0
ssh-import-id 5.11
stack-data 0.2.0
statsmodels 0.14.0
tabulate 0.9.0
tenacity 8.2.2
threadpoolctl 2.2.0
tiktoken 0.8.0
tokenize-rt 4.2.1
tornado 6.3.2
tqdm 4.67.1
traitlets 5.7.1
typing_extensions 4.12.2
typing-inspect 0.9.0
tzdata 2022.1
ujson 5.4.0
unattended-upgrades 0.1
urllib3 1.26.16
virtualenv 20.24.2
wadllib 1.3.6
wcwidth 0.2.5
Werkzeug 3.1.3
wheel 0.38.4
wrapt 1.17.0
yarl 1.18.3
zipp 3.11.0