


Python Program: Find the starting and ending index of all words in a string
Sometimes we need the index at which each word in a string starts and the index at which it ends. A sentence consists of words separated by spaces. This article shows two ways to find the start and end indices of every word in a given string. Example 1 iterates over the characters of the string and treats each space as the boundary between two words. Example 2 uses the Natural Language Toolkit (nltk) to find the same indices.
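To make the terms concrete, here is a minimal sketch of what "start index" and "end index" mean for a single word. It assumes the end index is inclusive (it points at the last character of the word), which matches the results shown later in this article. The names sentence and word are chosen only for this illustration.

```python
sentence = "Keep your face"
word = "your"

# str.find returns the index of the first character of the first match
start = sentence.find(word)
# The inclusive end index is the index of the last character of the word
end = start + len(word) - 1

print(start, end)  # 5 8
# Slicing needs end + 1 because the upper bound of a slice is exclusive
print(sentence[start:end + 1])  # your
```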
Example 1 - Find the start and end index of all words in a string by iterating over it.
Algorithm
Step 1 - Take a string and name it givenStr.
Step 2 - Create a function called StartandEndIndex that takes givenStr, iterates over it while checking for spaces, and returns a list of tuples holding the start and end indices of all words.
Step 3 - Create a list of words using the split method.
Step 4 - Combine the values of the two lists above into a dictionary.
Step 5 - Run the program and check the results.
The Python file contains the following code:

    # Return the (start, end) indices of every word in the given string
    def StartandEndIndex(givenStr):
        indexList = []
        startNum = 0
        lengthOfSentence = len(givenStr)
        # Iterate through the given string
        for indexitem in range(lengthOfSentence):
            # A space marks the end of the current word
            if givenStr[indexitem] == " ":
                # Skip empty spans caused by leading or repeated spaces
                if indexitem > startNum:
                    indexList.append((startNum, indexitem - 1))
                startNum = indexitem + 1
        # Add the last word, which is not followed by a space
        if startNum != lengthOfSentence:
            indexList.append((startNum, lengthOfSentence - 1))
        return indexList

    givenStr = 'Keep your face always toward the sunshine and shadows will fall behind you'
    # Call the function to get the start and end indices of all words
    indexListt = StartandEndIndex(givenStr)
    # Make a separate list of words
    listofwords = givenStr.split()
    print("\nThe given String or Sentence is ")
    print(givenStr)
    print("\nThe list of words is ")
    print(listofwords)
    # Make a dictionary using words and their indices
    resDict = {listofwords[indx]: indexListt[indx] for indx in range(len(listofwords))}
    print("\nWords and their indices : " + str(resDict))
View results - Example 1
To see the results, run the Python file in a cmd window.
    The given String or Sentence is
    Keep your face always toward the sunshine and shadows will fall behind you

    The list of words is
    ['Keep', 'your', 'face', 'always', 'toward', 'the', 'sunshine', 'and', 'shadows', 'will', 'fall', 'behind', 'you']

    Words and their indices : {'Keep': (0, 3), 'your': (5, 8), 'face': (10, 13), 'always': (15, 20), 'toward': (22, 27), 'the': (29, 31), 'sunshine': (33, 40), 'and': (42, 44), 'shadows': (46, 52), 'will': (54, 57), 'fall': (59, 62), 'behind': (64, 69), 'you': (71, 73)}
Figure 1: Displaying results in the command window.
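One caveat of storing the result in a dictionary keyed by word: if a word occurs more than once, later occurrences overwrite earlier ones. The sketch below is one possible way to keep every occurrence by returning a list of (word, start, end) triples instead. The function name word_spans and the sample sentence are hypothetical and not part of the original program; the sketch also checks that every pair of indices slices back to its word.

```python
def word_spans(text):
    """Return (word, start, end) triples; end is the inclusive last index."""
    spans = []
    start = None  # index where the current word began, or None between words
    for i, ch in enumerate(text):
        if ch == " ":
            if start is not None:
                spans.append((text[start:i], start, i - 1))
                start = None
        elif start is None:
            start = i
    # The last word is not followed by a space
    if start is not None:
        spans.append((text[start:], start, len(text) - 1))
    return spans

sample = "the cat and the dog"
spans = word_spans(sample)

# Every (start, end) pair should slice back to exactly its word
for word, start, end in spans:
    assert sample[start:end + 1] == word

print(spans)
# A dictionary keeps only the last occurrence of the repeated word "the"
as_dict = {word: (start, end) for word, start, end in spans}
print(len(spans), len(as_dict))  # 5 4
```

A list of triples preserves word order and duplicates; a dictionary is still convenient when every word in the sentence is known to be unique, as in the sentence used in this example.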
Example 2 - Use nltk (Natural Language Toolkit) to find the start and end indices of all words in a string.
Algorithm
Step 1 - Install nltk using the pip command, then import align_tokens from nltk.tokenize.util.
Step 2 - Take givenStr as the test string, split it into words using the split function, and call the result listofwords.
Step 3 - Call align_tokens with listofwords as the tokens and givenStr as the source string.
Step 4 - align_tokens returns a list of (start, end) pairs in which end is the position just after the last character of the word. Subtract one from each end value so that it points to the last character of the word.
Step 5 - Combine the values of the two lists above into a dictionary.
Step 6 - Run the program and check the results.
The Python file contains the following code:

    # Install the library first with: pip install nltk
    # Import align_tokens
    from nltk.tokenize.util import align_tokens

    # Specify a string for testing
    givenStr = 'Keep your face always toward the sunshine and shadows will fall behind you'
    # Make a list of words
    listofwords = givenStr.split()
    print("\nThe given String or Sentence is ")
    print(givenStr)
    print("\nThe list of words is ")
    print(listofwords)
    # align_tokens returns (start, end) pairs whose end is one past the last character of each word
    indices_includingspace = align_tokens(listofwords, givenStr)
    indices_withoutspace = []
    # Reduce the end index of each word by one so it points to the last character
    for item in indices_includingspace:
        indices_withoutspace.append((item[0], item[1] - 1))
    print(indices_withoutspace)
    # Make the dictionary of all words in the string with their indices
    resDict = {listofwords[indx]: indices_withoutspace[indx] for indx in range(len(listofwords))}
    print("\nWords and their indices : " + str(resDict))
View results - Example 2
Open the cmd window and run the Python file to view the results.

    The given String or Sentence is
    Keep your face always toward the sunshine and shadows will fall behind you

    The list of words is
    ['Keep', 'your', 'face', 'always', 'toward', 'the', 'sunshine', 'and', 'shadows', 'will', 'fall', 'behind', 'you']
    [(0, 3), (5, 8), (10, 13), (15, 20), (22, 27), (29, 31), (33, 40), (42, 44), (46, 52), (54, 57), (59, 62), (64, 69), (71, 73)]

    Words and their indices : {'Keep': (0, 3), 'your': (5, 8), 'face': (10, 13), 'always': (15, 20), 'toward': (22, 27), 'the': (29, 31), 'sunshine': (33, 40), 'and': (42, 44), 'shadows': (46, 52), 'will': (54, 57), 'fall': (59, 62), 'behind': (64, 69), 'you': (71, 73)}
Figure 2: Displaying words and their indexes.
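The subtraction in Step 4 is needed because the end offset is exclusive, the same half-open convention that Python slices use. The sketch below illustrates that convention with the standard library re module as a stand-in for nltk, so it runs without installing anything. It is not how align_tokens is implemented, and the function name inclusive_spans is hypothetical.

```python
import re

def inclusive_spans(text):
    """Return (start, end) pairs where end is the index of the last character."""
    # \S+ matches each run of non-whitespace characters, that is, one word
    return [(m.start(), m.end() - 1) for m in re.finditer(r"\S+", text)]

text = "Keep your face"

# Match.span() gives half-open offsets: end is one past the last character
half_open = [m.span() for m in re.finditer(r"\S+", text)]
print(half_open)              # [(0, 4), (5, 9), (10, 14)]
print(inclusive_spans(text))  # [(0, 3), (5, 8), (10, 13)]
```

With half-open offsets, text[start:end] returns the word directly; with inclusive offsets, the slice has to be text[start:end + 1].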
This article presented two ways to find the start and end indices of all words in a string. In Example 1, this is done by iterating over the characters of the string and treating each space as the boundary between two words. In Example 2, the nltk library (Natural Language Toolkit) is used. It is first installed using pip, and then the align_tokens function is imported from nltk.tokenize.util. Calling this function with the list of words as tokens, together with the original string, gives the indices of all words.