Python Program: Find the starting and ending index of all words in a string

WBOY
Release: 2023-08-28 09:17:06

Sometimes we need both the index at which a word starts in a string and the index at which it ends. A sentence is made up of words separated by spaces. This article shows two ways of finding the start and end indices of every word in a sentence or any given string, each with its own example. Example 1 iterates over the characters of the string and treats each space as the boundary between one word and the next. Example 2 uses the Natural Language Toolkit (nltk) to find the start and end indices of all words in the string.

Example 1 - Find the start and end index of all words in a string by iterating over it.

Algorithm

Step 1 - Take a string and name it givenStr.

Step 2 - Create a function called StartandEndIndex that takes givenStr, iterates over it, checks for spaces, and returns a list of tuples holding the start and end indices of every word.

Step 3 - Create a list of words using the split method.

Step 4 - Combine the two lists above into a dictionary that maps each word to its indices.

Step 5 - Run the program and check the results.

The Python file contains the following code:

#function that returns the (start, end) indices of every word
def StartandEndIndex(givenStr):
   indexList = []
   startNum = 0
   lengthOfSentence = len(givenStr)
   #iterate through the given string
   for indexitem in range(lengthOfSentence):
      #a space marks the end of the current word
      if givenStr[indexitem] == " ":
         #skip empty spans caused by leading or repeated spaces
         if indexitem > startNum:
            indexList.append((startNum, indexitem - 1))
         #the next word can start right after this space
         startNum = indexitem + 1

   #add the last word, which is not followed by a space
   if startNum != lengthOfSentence:
      indexList.append((startNum, lengthOfSentence - 1))
   return indexList
 

givenStr = 'Keep your face always toward the sunshine and shadows will fall behind you'
#call the function StartandEndIndex(givenStr) 
#and get the list having starting and ending indices of all words
indexListt = StartandEndIndex(givenStr)

# make a list of words separately
listofwords= givenStr.split()
print("\nThe given String or Sentence is ")
print(givenStr)
print("\nThe list of words is ")
print(listofwords)

#make a dictionary using words and their indices
#(a word that occurs more than once keeps only its last pair)
resDict = dict(zip(listofwords, indexListt))
print("\nWords and their indices : " + str(resDict))

View results - Example 1

To see the results, run the Python file in a cmd window.

The given String or Sentence is
Keep your face always toward the sunshine and shadows will fall behind you

The list of words is
['Keep', 'your', 'face', 'always', 'toward', 'the', 'sunshine', 'and', 'shadows', 'will', 'fall', 'behind', 'you']

Words and their indices : {'Keep': (0, 3), 'your': (5, 8), 'face': (10, 13), 'always': (15, 20), 'toward': (22, 27), 'the': (29, 31), 'sunshine': (33, 40), 'and': (42, 44), 'shadows': (46, 52), 'will': (54, 57), 'fall': (59, 62), 'behind': (64, 69), 'you': (71, 73)}

Figure 1: Displaying results in the command window.
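
One quick way to sanity-check index pairs like these is to slice the original string with them. Because the end index in this example is inclusive, the slice needs end + 1. The sketch below is only a minimal illustration: the names given_str, word_spans and recovered are chosen for this example, and the three pairs are copied from the output above.

```python
# Sentence used in the example above
given_str = 'Keep your face always toward the sunshine and shadows will fall behind you'

# A few (start, end) pairs copied from the printed result; end is inclusive
word_spans = {'Keep': (0, 3), 'sunshine': (33, 40), 'you': (71, 73)}

# Slicing needs end + 1 because Python slices exclude the stop position
recovered = {word: given_str[start:end + 1] for word, (start, end) in word_spans.items()}

for word, text in recovered.items():
    print(word, '->', repr(text))
```

If every recovered slice equals its word, the pairs are consistent with the string.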

Example 2: Use nltk (Natural Language Toolkit) to find the start and end indices of all words in a string.

Algorithm

Step 1 - Install nltk using the pip command, then import align_tokens from nltk.tokenize.util.

Step 2 - Take givenStr as the test string, split it into words with the split function, and call the result listofwords.

Step 3 - Call align_tokens with listofwords as the tokens and givenStr as the sentence.

Step 4 - It returns a list of (start, end) offsets in which each end is exclusive, that is, one position past the word's last character. Subtract one from each end value to get the index of the last character.

Step 5 - Combine the two lists above into a dictionary.

Step 6 - Run the program and check the results.
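
The subtraction in Step 4 follows from how the offsets are reported: with an exclusive end, slicing the string from start to end gives the word itself. The pure-Python sketch below imitates that kind of alignment with str.index so that it runs without nltk. Note that align_words is a hypothetical helper written for this illustration, not nltk's own code, and the short sentence is an assumed sample.

```python
def align_words(tokens, sentence):
    """Hypothetical helper: locate each token in order.

    Returns (start, end) pairs in which end is exclusive.
    """
    offsets = []
    point = 0  # search position, so a repeated word matches its own occurrence
    for token in tokens:
        start = sentence.index(token, point)  # raises ValueError if the token is missing
        point = start + len(token)
        offsets.append((start, point))
    return offsets


sentence = 'Keep your face'
exclusive = align_words(sentence.split(), sentence)

# Subtract one from each end to get the index of the word's last character
inclusive = [(start, end - 1) for start, end in exclusive]

print(exclusive)
print(inclusive)
```

The first list can be used directly for slicing, while the second matches the inclusive indices printed in this article.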

The Python file contains the following code:

#Use pip install nltk to install this library

#import align tokens
from nltk.tokenize.util import align_tokens

#specify a string for testing
givenStr = 'Keep your face always toward the sunshine and shadows will fall behind you'

#make a list of words
listofwords= givenStr.split()

print("\nThe given String or Sentence is ")
print(givenStr)
print("\nThe list of words is ")
print(listofwords)

#align_tokens returns (start, end) offsets in which end is exclusive, as in slicing
indices_endexclusive = align_tokens(listofwords, givenStr)

#subtract one from each end offset to get the index of the word's last character
indices_withoutspace = [(start, end - 1) for start, end in indices_endexclusive]
print(indices_withoutspace)

#make the dictionary of all words in a string with their indices
#(a word that occurs more than once keeps only its last pair)
resDict = dict(zip(listofwords, indices_withoutspace))
print("\nWords and their indices : " + str(resDict))

View results - Example 2

Open the cmd window and run the python file to view the results.

The given String or Sentence is
Keep your face always toward the sunshine and shadows will fall behind you

The list of words is
['Keep', 'your', 'face', 'always', 'toward', 'the', 'sunshine', 'and', 'shadows', 'will', 'fall', 'behind', 'you']
[(0, 3), (5, 8), (10, 13), (15, 20), (22, 27), (29, 31), (33, 40), (42, 44), (46, 52), (54, 57), (59, 62), (64, 69), (71, 73)]

Words and their indices : {'Keep': (0, 3), 'your': (5, 8), 'face': (10, 13), 'always': (15, 20), 'toward': (22, 27), 'the': (29, 31), 'sunshine': (33, 40), 'and': (42, 44), 'shadows': (46, 52), 'will': (54, 57), 'fall': (59, 62), 'behind': (64, 69), 'you': (71, 73)}

Figure 2: Displaying words and their indexes.
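
Both examples build a dictionary keyed by word, so a sentence that repeats a word keeps only the last pair for it. One possible alternative, sketched here with the standard re module rather than either approach above, is to collect a list of (word, start, end) entries from re.finditer, which also treats any run of whitespace as a separator. The word_spans function name and the sample sentence are assumptions made for this illustration.

```python
import re


def word_spans(text):
    """Return (word, start, end) for every run of non-whitespace; end is inclusive."""
    # match.end() is exclusive, so subtract one for the last character's index
    return [(m.group(), m.start(), m.end() - 1) for m in re.finditer(r'\S+', text)]


sample = 'the sun  and the moon'  # note the double space and the repeated word
spans = word_spans(sample)
for entry in spans:
    print(entry)
```

Because the result is a list, both occurrences of a repeated word appear with their own indices.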

This article presented two ways of finding the start and end index of every word in a string, each with an example. Example 1 iterates over the characters of the string and treats each space as the boundary between words. Example 2 uses nltk, the Natural Language Toolkit: after installing it with pip, the align_tokens function is imported from nltk.tokenize.util. Given the list of words as tokens and the original string, it returns the offsets from which the indices of all words are obtained.

source:tutorialspoint.com